          Data

          Introduction to Data Lake architecture

          DURATION 2 Days

          One of the essential elements of a modern data-driven architecture is the Data Lake. A Data Lake collects all potentially useful data in one place to support data-driven services such as self-service BI, predictive analytics, machine learning (AI), and NLP services. The most crucial elements of a Data Lake architecture are:

          • Data Ingestion – the process of loading data in near real time, regardless of how the data will be used
          • Access layer – the design of optimal access points for each consumer
          • Data Catalog – each data set needs to be appropriately categorized and have its Data Lineage tracked, using metadata management policies and standards
          • Data Privacy – the process of managing privacy according to purpose and need (GDPR)

          It is crucial to implement all these elements together to create a coherent and sustainable architecture and to avoid turning the Data Lake into an unusable and unmaintainable Data Swamp.
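
          As a rough illustration of the Data Catalog and Data Lineage idea, the sketch below models a tiny in-memory catalog in Python. It is not part of the course material: the `CatalogEntry` fields, the dataset names, and both helper functions are hypothetical and chosen only for this example. It shows two checks that help keep a lake from drifting toward a swamp: finding data sets that were ingested but never cataloged, and tracing which sources a derived data set depends on.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    category: str
    contains_personal_data: bool = False  # hint for privacy handling
    upstream: list[str] = field(default_factory=list)  # lineage: direct sources


def uncataloged(lake_datasets: list[str], catalog: dict[str, CatalogEntry]) -> list[str]:
    """Return data sets present in the lake but missing from the catalog."""
    return [name for name in lake_datasets if name not in catalog]


def lineage(name: str, catalog: dict[str, CatalogEntry]) -> list[str]:
    """Walk upstream references to list every source a data set derives from."""
    seen: list[str] = []
    stack = list(catalog[name].upstream)
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited, avoids loops on cyclic metadata
        seen.append(current)
        if current in catalog:
            stack.extend(catalog[current].upstream)
    return seen


# Example data, invented for this sketch.
catalog = {
    "raw_orders": CatalogEntry("raw_orders", "sales"),
    "raw_customers": CatalogEntry("raw_customers", "crm", contains_personal_data=True),
    "orders_enriched": CatalogEntry(
        "orders_enriched", "sales", upstream=["raw_orders", "raw_customers"]
    ),
}
lake = ["raw_orders", "raw_customers", "orders_enriched", "tmp_export_v2"]

print(uncataloged(lake, catalog))
print(sorted(lineage("orders_enriched", catalog)))
```

          In a real Data Lake these checks would typically run against a dedicated metadata store rather than a dictionary, but the principle is the same: every ingested data set should have a catalog entry, and every derived data set should be traceable to its sources.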

          This training is primarily intended for architects who have not yet used a Data Lake or who plan to use one. The course builds on methodology and best practices for creating a sustainable and extensible Data Lake architecture with Data Governance in mind.

          The course is divided into two modules (days), allowing you to choose the modules you are most interested in – whether it’s an introduction to Data Lake architecture or a deep dive into sustainability and data governance. Every module is accompanied by examples and exercises based on real-life scenarios.

          Day/Module 1 – Introduction to Data Lake

          Why Data Lake?

          Data Lake architectures

          Deep dive into the Data Ingestion process (patterns for batch and real-time)

          How to build an optimal access layer?

          Day/Module 2 – Data Lake sustainability

          Implement Data Catalog with Data Governance in mind

          How to build a roadmap and a sustainable Data Lake architecture?

          How to avoid a Data Swamp and the most common mistakes?

          How to support GDPR and implement proper Data Privacy?

          Prerequisites for attending the course

          To successfully follow the course, knowledge of databases, data warehouse concepts, and data modeling is required. Knowledge of big data technologies is an advantage.

          Recommended audience

          This training is intended for the following roles:

          • Data and software architects
          • Business and data analysts
          • Data engineers and data scientists

          Today, a Data Lake is crucial in every modern enterprise data architecture and is standard wherever there is a need to support data-driven services and products. Through our various Data Lake projects in the context of data architecture modernization, we have implemented all of these crucial elements over a couple of years.

          After this course, you will be able to design and build a sustainable Data Lake architecture in your environment. We talk about it in our blog.

          APPLY TO COURSE

          For all inquiries regarding education, please contact us at learn@croz.net or apply online.
