Do I Need a Data Catalog?
If you spend a lot of time searching for data, worry about its quality and security, or still rely on centralized IT to get access to specific data, you definitely need a data catalog! A data catalog lets all your data be inventoried, categorized, and easily accessed by users for analysis. Check out all the benefits of a data catalog!
What is a Data Catalog?
A data catalog is a metadata store that helps companies organize and search data held across multiple systems. It enables the right users to find and understand datasets so they can extract business value. Just as a library catalog has details on books and journals, a data catalog contains information about tables, files, and databases. A good data catalog should contain the location of all data entities, critical information on each piece of data, statistics and summaries of the data, and its lineage.
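To make that list concrete, here is a minimal sketch of what a single catalog record might look like. The class and field names (`CatalogEntry`, `ColumnStats`, `upstream`) and the sample values are illustrative assumptions, not the schema of any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnStats:
    """Summary statistics for one column (hypothetical fields)."""
    name: str
    dtype: str
    null_fraction: float  # share of missing values, between 0.0 and 1.0


@dataclass
class CatalogEntry:
    """One dataset as the catalog sees it: metadata only, never the data itself."""
    name: str
    location: str       # where the data entity lives
    description: str    # critical information in business language
    owner: str
    row_count: int
    columns: List[ColumnStats] = field(default_factory=list)  # statistics and summaries
    upstream: List[str] = field(default_factory=list)         # lineage: source datasets


# Made-up example record for illustration.
entry = CatalogEntry(
    name="crm.customers",
    location="postgres://crm-db/public/customers",
    description="Client master data with contact details",
    owner="sales-data-team",
    row_count=120000,
    columns=[ColumnStats("phone", "text", 0.12)],
    upstream=["web.signup_forms"],
)
```

Note that the record holds only descriptions of the data: the table itself stays where it is.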
Why Do I Need a Data Catalog?
There is no single answer to that question, because every organization has its own needs and challenges when defining processes, stewardship, ownership, policies, standards, etc. Even when resources are available for defining data governance, the only concrete deliverable is often a PowerPoint with logical processes, a roadmap, and key guidelines. For most organizations, clearly defined policies and standards are hard to apply in practice, because access is managed through a waterfall or top-down process, which cannot be sustained without military discipline given how quickly things change.
To avoid this, it’s important to start with a hybrid approach: gather all the requirements, then implement them step by step (bottom-up). It’s also crucial to listen to clients’ needs and problems, define an MVP, start developing (bottom-up), and continuously align development with the data governance methodology (top-down).
- Create one minimal use case
- Implement it quickly
- Observe it critically
- Revise it and establish the next steps
- Create a “fail fast” workspace
Average Business User or Business Analyst
Before I describe one of our projects, implementing a Data Catalog in a modern data architecture, I want to share a vision from the perspective of the average business user or analyst.
Imagine if an average business user could search for data sets the same way we search and buy stuff online. For example, with a single request (e.g. “client contact”) the user gets a list of data sets that hold contact information. Today, contact information contains much more than just a phone number and an address: everyone uses social networks, messaging services, maps with coordinates, etc. Besides the real world, we also live and communicate in a virtual one. All this data exists in numerous databases used for various purposes. The question is which data set suits the business user or business analyst best.
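As a rough illustration of that kind of request, the sketch below runs a naive keyword search over catalog metadata and ranks data sets by how many query terms they match. The sample catalog, its field names, and the `search` function are made up for this example. A real catalog would typically use a proper search index rather than substring matching.

```python
# Hypothetical catalog metadata; names, descriptions, and tags are invented.
CATALOG = [
    {"name": "crm.customers",
     "description": "Client master data with phone and postal address",
     "tags": ["contact", "client"]},
    {"name": "web.social_profiles",
     "description": "Social network handles linked to a client",
     "tags": ["contact", "social"]},
    {"name": "finance.invoices",
     "description": "Issued invoices per policy",
     "tags": ["billing"]},
]


def search(catalog, query):
    """Return data set names ordered by the number of query terms they match."""
    terms = query.lower().split()
    hits = []
    for ds in catalog:
        # Search across name, description, and tags as one lowercase string.
        text = " ".join([ds["name"], ds["description"]] + ds["tags"]).lower()
        score = sum(1 for term in terms if term in text)
        if score > 0:
            hits.append((score, ds["name"]))
    # Best score first; ties are broken alphabetically by name.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


results = search(CATALOG, "client contact")
```

Here `results` lists the two contact-related data sets and leaves out the invoices table, which matches neither term.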
Business users or business analysts can see detailed information about each data set: which data it contains, which business language it uses, how current and correct the data is, how much data is missing, etc. They can also see the comments of other analysts. A Data Catalog of this kind encourages collaboration, is transparent, has integrated Data Quality KPIs, and supports data lineage and IT lineage.
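One of the simplest Data Quality KPIs behind “how much data is missing” is completeness per column. The sketch below computes the share of missing values for a handful of rows. The function name, the sample rows, and the rule that `None` and empty strings count as missing are assumptions made for illustration.

```python
def missing_share(rows, columns):
    """Return, per column, the fraction of rows where the value is missing."""
    total = len(rows)
    shares = {}
    for col in columns:
        # Assumption: None, an empty string, or an absent key all count as missing.
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        shares[col] = missing / total if total else 0.0
    return shares


# Invented sample rows from a contact data set.
rows = [
    {"name": "Ada", "phone": "5551234", "email": "ada@example.com"},
    {"name": "Bob", "phone": None, "email": ""},
    {"name": "Cleo", "phone": "5559876", "email": None},
    {"name": "Dan", "phone": "5550000", "email": "dan@example.com"},
]

kpi = missing_share(rows, ["name", "phone", "email"])
```

For these rows, a quarter of the phone numbers and half of the e-mail addresses are missing, which is exactly the kind of figure an analyst would want to see next to a data set before choosing it.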
This is what we dream about in every modern organization. Then we wake up and look at our own system: we don’t know where our data is, so we send hundreds of e-mails, hold dozens of meetings, workshops, and calls, and contact many people to get the information we need. Data quality is not defined, let alone measured, and terminology is inconsistent and often ambiguous and vague. We spend a lot of energy, time, and money on information that may or may not be current and accurate.
This is where Data Catalog functionalities like search and discovery shine. But a Data Catalog has to offer much more: data lineage, data quality, data classification for authorized access, and dynamic data masking. Another functionality is defining a data contract between the information producer and the consumer, including an SLA and monitoring of the results.
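Here is a minimal sketch of how data classification and dynamic data masking can work together: each column carries a classification, and values are masked at read time unless the reader is cleared for that classification. The classification labels, the masking rule (keep the last two characters), and the function names are assumptions for illustration only.

```python
# Hypothetical column classifications as they might be recorded in the catalog.
CLASSIFICATION = {"name": "public", "phone": "pii", "email": "pii"}


def mask_value(value):
    """Hide all but the last two characters of a value."""
    text = str(value)
    return "*" * max(len(text) - 2, 0) + text[-2:]


def read_row(row, allowed):
    """Return the row with every column masked that the reader is not cleared for.

    Columns without a classification are treated as "restricted", so they are
    masked by default.
    """
    result = {}
    for col, value in row.items():
        label = CLASSIFICATION.get(col, "restricted")
        result[col] = value if label in allowed else mask_value(value)
    return result


row = {"name": "Ada", "phone": "5551234", "email": "ada@example.com"}

analyst_view = read_row(row, allowed={"public"})
steward_view = read_row(row, allowed={"public", "pii"})
```

The stored data never changes. The analyst sees the phone number as `*****34`, while the data steward, who is cleared for `pii`, sees the row unmasked.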
Let’s go back to 2017, when everyone was working hard to prepare for the GDPR. Imagine we had had a Data Catalog with a clear overview of client information at that moment. How much less money, time, and human effort would have been spent? Instead of a tiresome search for data, we could have had all the required data only a few clicks away, and spent our time more wisely: developing and advancing services that bring profit and additional value.
Over the last decade, everyone has talked about digital transformation supported by a data-driven strategy, which calls for a data-driven architecture with self-service BI and AI functionalities. We dream of maximum automation of business processes based on precise data. Imagine if searching for data felt like browsing Amazon. Our heads may be in the cloud, but in the real world we must come down to earth and use the dream as a clear guiding light towards building the Data Catalog.
The main goal of the project was to design a Data Catalog for one of the largest non-life insurance companies in the Nordics.