Tag: DATA ENGINEERING
DBImport is an open-source ingestion tool that uses Sqoop or Spark to ingest data from relational databases into a Hive database in a Hadoop cluster.
Partitioning improves query performance on large tables: queries can access partitions in parallel, data deletion and data loads are faster, manageability is better, and there are many other benefits.
If you want to feel accepted, understood and stimulated for new challenges, apply for Summer Accelerator – a student internship program at CROZ.
Delta Lake provides great features and solves some of the biggest issues that come with a data lake. On top of that, it is easy to use! Keep reading this post for some useful tips and tricks.
The main focus of DataOps is monitoring and optimizing the so-called data factory.
The neat thing about graph databases is that they are graphs, and there are a lot of graph algorithms that can be used to find a solution to your problem. That is why they have been gaining traction lately and why I decided to look deeper into the matter.
If you need help with your data or if you need someone to give you more insight into your data, why not call our Data Engineering department?
By using data visualization, we can depict all the data we possess. Try it!
Data anonymization is the process of manipulating data in order to hide sensitive information.
In this blog post, we will present an interesting architecture that combines proven IBM technologies with open-source big data technologies.