Introduction to Data Lake architecture
One of the essential elements of a modern data-driven architecture is the Data Lake. Its role is to collect all potentially useful data in one place to support data-driven services such as self-service BI, predictive analytics, machine learning (AI), and NLP services. The most crucial elements of a Data Lake architecture are:
- Data Ingestion – building the process that loads data in near real time, regardless of how the data will later be used
- Access Layer – designing optimal access points for each consumer
- Data Catalog – categorizing each data set appropriately and tracking its Data Lineage, using metadata management policies and standards
- Data Privacy – designing the process for managing privacy approaches according to purpose and need (e.g. GDPR)
It is crucial to implement all these elements together to create a coherent and sustainable architecture and to avoid turning the Data Lake into an unusable and unmaintainable Data Swamp.
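To illustrate why ingestion, catalog, lineage, and privacy metadata belong together, here is a minimal Python sketch. It is not tied to any particular catalog product. The names CatalogEntry, lineage, and uncataloged, as well as the sample data sets, are hypothetical and exist only for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data set in the lake."""
    name: str
    category: str                    # business category used for discovery
    contains_personal_data: bool     # privacy flag, e.g. relevant for GDPR
    upstream: list[str] = field(default_factory=list)  # direct lineage parents


def lineage(catalog: dict[str, CatalogEntry], name: str) -> list[str]:
    """Return all upstream data sets of `name`, nearest ancestors first."""
    seen: list[str] = []
    queue = list(catalog[name].upstream)
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.append(current)
        if current in catalog:       # sources missing from the catalog end the trail
            queue.extend(catalog[current].upstream)
    return seen


def uncataloged(ingested: list[str], catalog: dict[str, CatalogEntry]) -> list[str]:
    """Data sets that were ingested but never described: early swamp warning."""
    return [name for name in ingested if name not in catalog]


# Assumed sample data, purely illustrative.
catalog = {
    "crm_raw": CatalogEntry("crm_raw", "customer", True),
    "crm_clean": CatalogEntry("crm_clean", "customer", True, ["crm_raw"]),
    "sales_report": CatalogEntry(
        "sales_report", "sales", False, ["crm_clean", "orders_raw"]
    ),
}

print(lineage(catalog, "sales_report"))
print(uncataloged(["crm_raw", "orders_raw", "clickstream"], catalog))
```

In this sketch, a data set that is ingested without a catalog entry shows up in uncataloged, and a report built on personal data can be traced back to its sources through lineage. A real implementation would typically rely on a dedicated metadata store rather than an in-memory dictionary.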
This training is primarily intended for architects who have not yet used a Data Lake system or who plan to use one. The course builds on methodology and best practices for creating a sustainable and extensible Data Lake architecture with Data Governance in mind.
The course is divided into two modules (days), so you can choose the module that interests you more: an introduction to Data Lake architecture or a deep dive into sustainability and data governance. Every module is accompanied by examples and exercises based on real-life scenarios.
Day/Module 1 – Introduction to Data Lake
Why Data Lake?
Data Lake architectures
Deep dive into the Data Ingestion process (patterns for batch and real-time)
How to build an optimal access layer?
Day/Module 2 – Data Lake sustainability
Implement Data Catalog with Data Governance in mind
How to build a roadmap and a sustainable Data Lake architecture?
How to avoid a Data Swamp and the most common mistakes?
How to support GDPR and implement proper Data Privacy?
Prerequisites for attending the course
To follow the course successfully, knowledge of databases, data warehouse concepts, and data modeling is required. Knowledge of big data technologies is an advantage.
This training is intended for the following roles:
- Data and software architects
- Business and data analysts
- Data engineers and data scientists
Today, a Data Lake is a crucial part of every modern enterprise data architecture and the standard choice when data-driven services and products need to be supported. Over the past few years, across our various Data Lake projects for data architecture modernization, we have implemented all of these crucial elements.
After this course, you will be able to design and build a sustainable Data Lake architecture in your environment. We also discuss this topic on our blog.