Artificial Intelligence

Designing a Novel Fraud Detection System for Nexi Croatia

Our team helped Nexi Group, Europe’s leading PayTech company, design an additional module for its existing fraud management system by combining machine learning techniques with large amounts of historical data.

Client

In order to develop an ML model with better performances, Nexi Croatia started cooperation with CROZ. The first step in the fraud model improvement was to get better system architecture for near real-time inferencing, to try state-of-the-art models and to compare the results with the fraud model currently used for detection and prevention of fraudulent activities. The prediction quality of the newly developed LightGBM model vs currently used Logistic Regression model is greatly improved across all metrics, significantly more effective in fraud detection, while also reducing the number of false positives.

- Maja Žuvić, Data scientist & Nataša Benčić, Senior Product Expert - Nexi Croatia

Client & context

NEXI Croatia is part of Nexi Group, one of Europe’s leading PayTech companies, which provides fast and safe payment solutions to people, businesses, and financial institutions in over 25 countries. The group also holds the number 1 spot in both the number of payment cards managed and the total transaction value processed.

With almost 3 billion active credit cards in the world, the number of fraudulent transactions is consistently growing. In 2018, over 130 000 credit card fraud reports were recorded in the US alone, while the total value of fraudulent transactions worldwide was almost 25 billion USD. These numbers alone give financial institutions enough reason to try to detect fraud and, where possible, prevent it. NEXI Croatia had previously developed a rule-based fraud management system that flags potentially suspicious card transactions and triggers alerts for them.

Traditional fraud detection systems are based on explicit, hard-coded sets of rules. These rules are often easy to interpret, which is very important in domains like PayTech, where decisions must be explained. However, what they gain in interpretability they lose in flexibility and adaptability, and they are difficult to maintain. In today’s rapidly changing environment, hand-crafted rules quickly become outdated and unable to properly detect fraud.

Therefore, modern fraud detection systems rely on almost-as-interpretable machine learning techniques to achieve better results with more flexibility and adaptability. Supported by modern tools for dataset versioning, distributed processing, and model tracking, these techniques are also easier to maintain. The system currently in use at NEXI Croatia is mostly traditional, using ML outputs only as one of the inputs to the rule-based system.

Challenge

One of CROZ’s tasks was to modernize the system by incorporating ML techniques and automating as much work as possible. The problems NEXI Croatia had with their current fraud system can be summarized into four categories:

large amounts of data
an unsuitable environment
a gradual drop in the current model’s performance
the lack of a near real-time ML solution.

The large amounts of data caused the processing time for calculating the feature set to be measured in days while limiting the expressivity of the features themselves – particularly the amount of history each feature’s calculation is based on. The development environment was underpowered or underutilized: the calculation of features wasn’t parallelized, the models and tools were not sufficient to capture the patterns of fraudulent transactions, and the horizontal scaling capabilities were limited. Due to technical and business limitations, the model currently in use was trained on a comparatively small set of features and is difficult to maintain, which might result in feature drift.

Additionally, the features are calculated based on a relatively short historical period, insufficient to capture user and merchant behaviors over longer periods. Finally, while this model can be and is used in near real-time fraud detection, the current near real-time architecture is not sufficient to support more complex models and feature sets.

Solution

Our team focused on tackling all four problems of the current system simultaneously. In particular, CROZ helped NEXI Croatia with the following tasks:

Environment preparation and infrastructure setup:
Installing all tools and utilities required to develop the solution, process large datasets, calculate long-term aggregations and train first models
Improving big data processing:
Preparing the final feature dataset by processing large raw transaction datasets. Incorporating complex features into the feature dataset, such as various short- and long-term aggregations, with resolutions ranging between 5 minutes and 9 months. Splitting the feature dataset into a train and test set and implementing various techniques to mitigate class imbalance.
Creating a new fraud classifier:
Experimenting with SotA models and tuning their hyperparameters. Training the fraud classifiers on the prepared train sets. Evaluating models using both standard classification metrics (recall, precision, F1 score, AUC-PR…) and business-oriented metrics in comparison with the currently used model (number and value of detected frauds, missed frauds, and false positives).
Feature analysis:
Analyzing all calculated features and their importance, ranking them and determining their overall relevance. Categorizing features into business-level semantic categories and ranking the categories of features.
Near real-time architecture:
Designing and planning the proposed architecture which would be used to calculate features for transactions and classify them in near real-time.
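
To make the short- and long-term aggregations mentioned under big data processing more concrete, here is a minimal sketch of how per-card window features could be computed. It is an illustration only, not the code used in the project: the field names (card_id, ts, amount), the function name add_window_features and the three window sizes are assumptions chosen for the example, and the source only states that resolutions ranged from 5 minutes to 9 months.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical window sizes; the case study only gives the range 5 minutes to 9 months.
WINDOWS = {
    "5min": timedelta(minutes=5),
    "1d": timedelta(days=1),
    "270d": timedelta(days=270),
}


def add_window_features(transactions):
    """Return one feature dict per transaction, in chronological order.

    Each transaction is a dict with the (assumed) keys card_id, ts and amount.
    Features for a transaction are built only from transactions that precede
    it on the same card, so the current transaction never leaks into its own
    features.
    """
    # card_id -> (sorted timestamps, prefix sums of amounts)
    history = defaultdict(lambda: ([], [0.0]))
    rows = []
    for tx in sorted(transactions, key=lambda t: t["ts"]):
        times, prefix = history[tx["card_id"]]
        feats = {"card_id": tx["card_id"], "ts": tx["ts"]}
        for name, width in WINDOWS.items():
            # Index of the first earlier transaction inside [ts - width, ts).
            start = bisect_left(times, tx["ts"] - width)
            feats[f"count_{name}"] = len(times) - start
            feats[f"sum_{name}"] = prefix[-1] - prefix[start]
        rows.append(feats)
        # Only now does the current transaction become part of the history.
        times.append(tx["ts"])
        prefix.append(prefix[-1] + tx["amount"])
    return rows
```

Keeping sorted timestamps and prefix sums per card means each window lookup is a binary search rather than a rescan of the history, which is one simple way to keep long windows affordable. A production system processing months of transactions would more likely rely on a distributed processing engine, as the tasks above suggest.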

Results

By the end of the project, the work CROZ did with NEXI Croatia on the fraud system had produced the following results:

Increased dataset quality: more transactions, more features (~400)
Larger historic scope of the dataset: increase from several days to several months
Speed-up of feature creation: a drop from days to hours
Simpler feature creation code that is easier to maintain
An ML model with a greater capacity: from logistic regression to LightGBM
Prediction quality of the model is greatly improved across all metrics
The new model is significantly more effective at detecting frauds (both in the number of frauds and value) within the top 1000 alerts compared to the current model, while also reducing the number of false positives.
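
The comparison within the top 1000 alerts can be illustrated with a small sketch of how such business-oriented metrics might be computed from model scores. This is an assumed formulation for illustration, not the evaluation code used in the project: the function name top_k_alert_metrics and its parameters are hypothetical, and the exact metric definitions used by NEXI Croatia are not given in the source.

```python
def top_k_alert_metrics(scores, is_fraud, amounts, k=1000):
    """Business-oriented metrics for the k highest-scoring transactions.

    scores, is_fraud and amounts are parallel sequences (assumed inputs):
    the model score, the true fraud label and the transaction value.
    """
    # Rank transactions by score, highest first; the top k become alerts.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    alerted = order[:k]
    detected = [i for i in alerted if is_fraud[i]]
    total_frauds = sum(1 for f in is_fraud if f)
    return {
        "detected_frauds": len(detected),
        "detected_value": sum(amounts[i] for i in detected),
        "false_positives": len(alerted) - len(detected),
        "missed_frauds": total_frauds - len(detected),
        "precision_at_k": len(detected) / len(alerted) if alerted else 0.0,
        "recall_at_k": len(detected) / total_frauds if total_frauds else 0.0,
    }
```

Running the same function on the scores of two models over the same test set gives directly comparable counts of detected frauds, detected value, missed frauds and false positives for a fixed alert budget.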