
          What’s up with data engineering?

New partnerships and winning first place in the sixth edition of the EUDATATHON’22 organized by the European Commission are just some of the things our data team enjoyed in 2022! Thanks to the new technologies and knowledge we gained in 2022, we firmly believe that 2023 will be just as exciting. So, keep reading to find out what amazing things happened last year and what’s planned for the year ahead.

          All we do is win, win, win!

Prepare for a round of applause as we have won first place in the EUDATATHON’22 “A European Green Deal”. We are very proud of our “renEUwable team”, made up of data engineers, AI engineers, UX/UI designers, testers, project managers, and agile coaches. How have we managed to do this? Quite simply, by focusing on customer needs and developing an application prototype that helps people make the environment cleaner and more energy efficient.

We wanted to change human habits and motivate people to use energy more efficiently. Thanks to AI, this application would generate reminders that nudge users to take further steps, delivered by a smart assistant called Mr.Green.
          — Darko Benšić, Data Architect

          The importance of investing in knowledge

          As you already know, new technologies, architectures, patterns and concepts come to market every year. That’s why we invest in internal research and development so that we can offer our clients the most suitable solutions for their needs, completely independent of manufacturers.

          Treat your data right!

Our projects cover all areas of data engineering, from databases, data entry and transformation, business intelligence and reporting to advanced analytics and AI. We complement our skills with those of other CROZ teams, such as DevOps and Infrastructure, to bring in all the required expertise. In all these areas, we explore methods that can and should be applied to have a well-functioning data stack that delivers good quality data and insights.

In 2022, we were involved in projects with clients that transformed their organization and data teams around the principles of the data mesh. This includes treating data as a product and having appropriate data governance in place, including data quality, data lineage, and a data catalogue. From what we see, the data mesh will gain even more traction this year.
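To make “data quality in place” slightly more tangible, here is a minimal sketch of a declarative data quality check with great_expectations, one of the tools mentioned below; it assumes the classic Pandas API (newer GX releases use a different one), and the file and column names are hypothetical.

```python
# Sketch: declarative data quality checks with great_expectations
# (classic Pandas API). "customers.csv" and its columns are hypothetical.
import great_expectations as ge

# Wrap a CSV in a dataset object so expectations can run against it
df = ge.read_csv("customers.csv")

# Expectations double as executable documentation of the data contract
df.expect_column_values_to_not_be_null("customer_id")
df.expect_column_values_to_be_unique("customer_id")
df.expect_column_values_to_be_between("age", min_value=18, max_value=120)

# Validate and fail fast if the data product breaks its contract
results = df.validate()
if not results.success:
    raise ValueError("data quality checks failed")
```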

          Open-source tools and cloud technologies were trending

The use of open-source tools in data engineering stacks has increased, and it looks like it will continue to grow this year. Notable open-source technologies that we are using heavily include Apache Airflow, Apache Kafka, Apache Flink, Trino, Airbyte, dbt, Apache Superset, great_expectations, OpenMetadata, and Debezium. Another trend we experienced last year is the growing adoption of cloud technologies.
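For a flavour of these tools, here is a minimal Apache Airflow 2.x DAG sketch of a daily ingest-then-transform pipeline; the task bodies are placeholders, not a real CROZ pipeline.

```python
# Sketch: a minimal Airflow 2.x DAG with two dependent daily tasks.
# Task bodies are placeholders standing in for real ingest/transform logic.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest():
    print("pull new records from the source system")


def transform():
    print("clean and model the ingested records")


with DAG(
    dag_id="daily_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    ingest_task >> transform_task  # transform runs only after ingest succeeds
```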

          2022 WAS THE YEAR WE HAD THE MOST CLOUD PROJECTS IN AMAZON AWS AND MICROSOFT AZURE SO FAR

It looks like the future will be hybrid for most of our customers. Adoption of cloud technologies is likely to continue to grow, but due to the sensitivity of the data, the size of the data, and regulatory requirements, organizations will keep some data on-premises and continue to work in hybrid (on-premises and cloud) environments.

          New partnership

          More and more companies are using streaming technologies like Apache Kafka to get fresh data that is updated in near real-time. We have been working with Kafka and related technologies such as Kafka Streams, Apache Flink and Apache Spark for many years.

An important milestone in our streaming journey was becoming a Confluent Plus level partner.

          What are we looking forward to in 2023?

          Well, we will focus on some of the most important technologies, methodologies, and concepts such as:

          • Airbyte, Fivetran and dbt for ingestion and transformation of data (ELT)
• Trino and Denodo for data virtualization (a Trino sketch follows this list)
          • Superset and Looker for BI and reporting
          • Data mesh, data-as-a-product, data quality, and streaming governance in organization and methodology
• Running on different combinations of on-premises/cloud environments
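As promised in the list above, here is a minimal sketch of querying data through Trino with its Python client (pip install trino); the host, user, catalog, and table are hypothetical placeholders.

```python
# Sketch: one SQL dialect over many sources via Trino (data virtualization).
# Host, user, catalog, and table below are hypothetical placeholders.
import trino

conn = trino.dbapi.connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="postgresql",  # the source Trino virtualizes behind SQL
    schema="public",
)

cur = conn.cursor()
cur.execute("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")
for customer_id, total in cur.fetchall():
    print(customer_id, total)
```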

In summary, customer needs and use cases should always be the focus. It is now more important than ever to have data available in near real-time, enabling companies to gain insights from their data as quickly as possible and providing clients with the right information at the right time. Cloud usage is increasing, but the future is hybrid, and not all businesses and use cases are suitable for the cloud. Cutting-edge technologies are always a great thing, but they should always be complemented by proper governance, methods and patterns.

CUTTING-EDGE TECHNOLOGIES ARE ALWAYS A GREAT THING, BUT THEY SHOULD ALWAYS BE COMPLEMENTED BY PROPER GOVERNANCE, METHODS AND PATTERNS.

          And finally, we are also looking forward to further expanding our data teams in Rijeka and Osijek offices in the coming years.

          Finding value with AI & ML

2022 saw many exciting advances in artificial intelligence, some of which are highlighted below:

April – OpenAI introduced DALL-E 2, a deep learning model that can synthesize images.

July – DeepMind announced that its AlphaFold AI model has successfully predicted the shape of almost every known protein of almost every organism on Earth with a sequenced genome.

August – Stability AI released the Stable Diffusion model as an open project, including source code and model checkpoints.

The year ended with the launch of ChatGPT, a novel chatbot powered by the GPT-3 large language model. As 2023 progresses, we are already witnessing the tremendous impact that generative AI models are bringing to our industry, and this will for sure be at the forefront of our thoughts and efforts this year.

Although all these projects represent major breakthroughs, the enterprise adoption numbers tell a more sobering story.

According to a McKinsey study, only around 50% of companies had successfully deployed AI in at least one business unit by 2022.

However, high-performing companies are characterized by a clearly defined AI vision and strategy, the ability to rapidly integrate data into machine learning models, and a full life cycle approach to developing and deploying machine learning models.

In terms of use cases, machine learning is now used in almost all industries and functions. In banking, for example, machine learning models can help segment customers based on structured data in CRM and leverage unstructured data collected by banks (transaction text, user behaviour in mobile banking, …). These models can help determine the optimal communication channel and create personalized offers for each customer. If a customer has any questions, the bank can offer a 24/7 ML-based chatbot for assistance. AIOps solutions can also be used to monitor mobile banking and core services for technical anomalies that could affect service. In the background, each transaction is passed to various machine learning models to detect and prevent fraud or money laundering in real time.

As the use of artificial intelligence continues to grow in enterprises, there is an increasing need for AI platforms or “AI factories” to help companies scale and maintain their machine learning use cases. These platforms provide a centralized, efficient way to develop, deploy, and manage AI models and enable organizations to quickly and easily incorporate new AI capabilities into their operations.
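As a small illustration of the anomaly-based screening step mentioned above, here is a minimal sketch using scikit-learn's IsolationForest; the features, data, and contamination rate are hypothetical placeholders, and a real bank would combine several models and rule engines.

```python
# Sketch: anomaly-based fraud screening with scikit-learn's IsolationForest.
# The two features (amount, seconds since previous transaction) and all
# numbers are hypothetical; production systems use far richer feature sets.
import numpy as np
from sklearn.ensemble import IsolationForest

# Historical transactions: [amount_eur, seconds_since_last_txn]
history = np.array([
    [25.0, 3600], [40.0, 7200], [12.5, 1800], [60.0, 5400],
    [33.0, 4000], [18.0, 2500], [55.0, 6000], [29.0, 3000],
])

model = IsolationForest(contamination=0.05, random_state=42)
model.fit(history)

incoming = np.array([[9500.0, 30]])  # unusually large, seconds after the last
# predict() returns -1 for anomalies and 1 for inliers
if model.predict(incoming)[0] == -1:
    print("flag transaction for manual review")
```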

          Omnichannel in real time with Event Streaming platforms

More and more companies are trying to manage their customers’ interactions with the company, and for this we have the best strategy up our sleeve: an Omnichannel business strategy whose goal is to provide a seamless, high-quality user experience, offline or online and without much effort! Companies used to treat each channel as a separate entity, but thanks to the Internet and an always-connected society, customers now expect a cohesive and integrated experience.

What did that look like in the beginning?

Early attempts relied on integrating data from different channels through data-level ETL or direct integration through some type of API. Despite initial successes, problems crept in as the solution evolved and more and more features were added. The main challenge is integrating the data and keeping response times near real-time as the number of integrated channels increases.

Streaming platform to the rescue

A preferred way to tackle these challenges today is to adopt an Event Streaming platform as the key technology for solving Omnichannel challenges. In short, this means setting up a central architectural component that mediates communication and integration between different channels. In this approach, the streaming platform inevitably becomes a single source of truth that takes all the data generated in the edge systems and passes the corresponding data on to other systems.
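As a minimal sketch of what that mediation looks like from an edge channel's point of view, here is a producer publishing an interaction event to a central topic; Apache Kafka and the confluent-kafka Python client are assumed, and the broker address, topic, and payload are illustrative.

```python
# Sketch: an edge channel hands its events to the central streaming platform.
# Assumes Apache Kafka and `pip install confluent-kafka`; broker, topic,
# and payload below are illustrative placeholders.
import json

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "kafka:9092"})

event = {"customer_id": "42", "channel": "mobile", "action": "viewed_offer"}

# Key by customer so all of one customer's events stay ordered in a partition
producer.produce(
    "customer-interactions",
    key=event["customer_id"],
    value=json.dumps(event).encode("utf-8"),
)
producer.flush()  # block until the broker confirms delivery
```

Keying events by customer is one common choice here, since it keeps each customer's interaction history ordered without any coordination across channels.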

          The streaming platform is not just a technology stack

Even though we can’t escape the discussion about the technology stack required to set up a successful platform, at CROZ we believe it’s important to change the culture, structure, and processes that drive day-to-day operations to get the maximum benefit from such technical solutions. One way to help this process is to apply the reverse Conway manoeuvre; in other words, the organization should be structured to optimize workflow. Another complementary approach is to establish a “platform” team dedicated to developing and maintaining the platform and supporting other teams that will use the Streaming platform to implement Omnichannel initiatives.

          Technology stack

To support a decentralized, asynchronous, fault-tolerant architecture, the central component of such a solution is the event broker. Usually, Apache Kafka takes on this role, and all other components are selected based on this first choice. Other popular options include Apache Pulsar and Solace. Once we have selected the event broker, we are still missing some key components.

The exact choice of implementation for each component depends on various external and internal factors. It’s important to have a solution that performs the functions described in the diagram. However, special care should be taken when choosing the implementation of the Schema registry component, as its role is to store all metadata about event record structures and their evolution as business requirements change.

In conclusion, customers now expect companies to implement an Omnichannel business strategy, and the quality of execution is directly related to a positive customer experience. Streaming platforms are a technology that enables a company to implement performant, modular, and sustainable technical solutions that will serve customers well in the future.
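To close the technology discussion with something concrete, here is a minimal sketch of registering an Avro schema version with Confluent Schema Registry over its REST API; the registry URL, subject name, and schema are hypothetical placeholders.

```python
# Sketch: registering an Avro schema with Confluent Schema Registry via REST.
# URL and subject are placeholders; compatibility rules configured on the
# subject govern how this schema may evolve as requirements change.
import json

import requests

avro_schema = {
    "type": "record",
    "name": "CustomerInteraction",
    "fields": [
        {"name": "customer_id", "type": "string"},
        {"name": "channel", "type": "string"},
        {"name": "action", "type": "string"},
    ],
}

resp = requests.post(
    "http://schema-registry:8081/subjects/customer-interactions-value/versions",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    data=json.dumps({"schema": json.dumps(avro_schema)}),
)
resp.raise_for_status()
print("registered schema id:", resp.json()["id"])
```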

Tips for designing large-scale Cloud migrations

          Migrating to the cloud can be difficult, especially if you have a large IT landscape to move. It’s all about finding the right balance between getting everything done on time and making sure everything runs smoothly in the cloud. That’s why it’s so important to plan each step and know exactly what you want to accomplish.

PHASE #1: Planning

In this phase, we gather enough insight into what needs to be migrated through a series of data collections. We may conduct server scans, surveys, and interviews with application owners. Based on the data we collect, we prioritize and make sure we focus on the most important workloads first.

          PHASE #2: Decision Making

What is next in line? Determining the cloud migration backlog to address in the next phase. Using the 6-R decision matrix, we divide the workloads among the Rehost, Replatform, Repurchase, Refactor, Retain, and Retire migration approaches, as sketched below.
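Here is a toy sketch of how such a decision matrix might be encoded; the rules are deliberately simplistic placeholders, since real 6-R assessments weigh cost, risk, licensing, dependencies, and much more.

```python
# Toy 6-R decision helper. The rules are illustrative placeholders only;
# real assessments weigh cost, risk, licensing, dependencies, and more.
def six_r_decision(workload: dict) -> str:
    if workload.get("end_of_life"):
        return "Retire"
    if workload.get("compliance_locked"):
        return "Retain"      # must stay on-premises for now
    if workload.get("saas_alternative"):
        return "Repurchase"  # replace with a SaaS offering
    if workload.get("cloud_native_rewrite_planned"):
        return "Refactor"
    if workload.get("managed_service_candidate"):
        return "Replatform"  # e.g. self-hosted DB -> managed database
    return "Rehost"          # default: lift and shift

workloads = [
    {"name": "legacy-crm", "saas_alternative": True},
    {"name": "core-ledger", "compliance_locked": True},
    {"name": "intranet-wiki"},
]
for w in workloads:
    print(w["name"], "->", six_r_decision(w))
```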

          PHASE #3: Migrate – Plan big, start small

In a large-scale cloud migration, it’s important to get the basics right. Therefore, some key concepts should be tested before the large-scale migration begins. Technology is only one part of the migration; the other part is organizational change and closing knowledge gaps.

          PHASE #4: Improve – and do it continuously!

Every migration yields lessons that should be used to implement improvements, whether in security, performance, or more efficient use of resources.

          And now you can start your journey more easily. Good luck!

          10 most read tech blogs

          Technology is constantly evolving, and sometimes it can be difficult to keep up with the latest developments. That’s where our blogs come in! Here are 10 of the most read tech blogs written by our team members. High-quality content, expert knowledge, and up-to-date information are guaranteed!

          1 – How to run a SoapUI mockservicerunner in a Docker container (read here)

          Learn how to use an amazing tool to create SOAP mock services that you can run in a Docker container using a mock service runner.

          2 – XFRM Programming (read here)

When communicating with XFRM, the programmer manages the SAD and SPD databases, adding, updating, or deleting entries on request.

          3 – Using datefudge to fake Docker date&time for testing (read here)

For one project, we wanted to find a way to force our service to behave as if it were invoked at a specific date and time, not just at the current time of test execution. We found a way to do this using datefudge, which helped us fake the date and time.

4 – Why you should upgrade to IBM App Connect Enterprise v12 (read here)

Well, it brings a lot of innovation to the table, and that alone is a good reason to switch to IBM App Connect Enterprise v12.

          5 – Why do you need Apache Airflow? (read here)

Discover the key components and benefits of an open-source workflow management system designed for creating, scheduling and monitoring workflows organized as DAGs.

6 – The benefits of migrating from IBM APIC v5 to v10 (read here)

Increased value, better architecture, and improved performance are just some of the benefits that a better container and cloud experience can bring.

          7 – Installing IBM API Connect v2018.x Into Single Virtual Machine (read here)

          There is often a need to access an isolated APIC environment, but with the new microservice architecture, installing API Connect 2018 can be a challenge. But we have prepared a video tutorial to make it easier!

          8 – IBM DB2 Docker container startup speedup (read here)

          For those developing cloud-native applications using the DB2 database from IBM, we have introduced a DB2 Docker image for development, testing and CI/CD requirements.

          9 – Deploying IBM DataPower into AWS (read here)

The steps for deploying a Docker version of IBM DataPower on Amazon Elastic Container Service, your way!

          10 – HRK2EUR Converter – a simple tool for dual currency display (read here)

          Meet a simple open-source tool that automatically displays prices on web pages in both currencies without programming.

Applying DevOps principles in the real world

The idea of achieving flow is universal and not reserved for modern, cloud-native organizations. It’s true that smooth operations can sometimes be easier to achieve in cloud-native organizations because modern technology helps. But it’s by no means the case that these flow ideas can’t be applied in traditional organizations. For example, we’re working with the largest Swiss bank on an initiative to improve software delivery flow. It’s hard to imagine a more traditional company than a Swiss bank that uses mainframes.

Yet we apply flow ideas and automate implementations on the mainframe with modern automation platforms. In this way, we eliminate the software deployment friction caused by manual steps and allow staff to focus on creating real value for their customers. Another interesting collaboration is with a U.S. insurtech company that operates in the AWS public cloud. Although the industries and tools are different, the goals are the same: to eliminate friction in the software delivery process by automating manual steps and eliminating unnecessary ones. By speeding up the process, we enable rapid changes, short feedback loops, and fast learning.

          We can’t all be 10x engineers. But each of us can significantly increase our productivity if the process and platform work in our favour. Let’s build 10x teams!

          — Ivan Krnić, Director of Engineering