Blueprint of a transaction monitoring solution on top of Azure and a custom ML algorithm for money laundering detection

Radu Vunvulea
Jul 5, 2019

Source: http://vunvulearadu.blogspot.com/2019/07/blueprint-of-transaction-monitoring.html

In this post, we look at how the cloud can enable real-time analytics of bank account activity using our own custom Machine Learning algorithm.

Business scenario

Imagine that you are working for a bank that has subsidiaries in multiple regions around the world. You want to develop a system that can monitor, in real time, the bank activities happening across subsidiaries and identify any suspicious transactions or accounts.

Machine Learning Algorithm

The bank has already developed a custom ML algorithm that can detect and flag any account or transaction that looks suspicious. The algorithm detects suspicious transactions so reliably that, in specific situations, you decide to take automatic actions such as suspending access to the account or delaying the transaction for a defined time interval pending further investigation.

Business problem

The bank lacks a solution that can aggregate all the audit data from account activity and run a custom Machine Learning system on top of it. An internal audit across the different subsidiaries has already found a way to remove customer-specific information so that the data can be collected and analysed in one location.

Technical challenges

The first challenge is to collect all account activities in a central location, as a stream of data. The current volume that needs to be ingested is between 40,000 and 150,000 activities per minute. The growth rate is estimated at 15–20% per year, mainly because of the online platforms.

The second challenge is to build a solution that can apply the ML algorithm to a stream of data without requiring a new data centre. The board approved a test period of 18 months, but they cannot commit for 5 years until they see that the solution works as promised.

The third challenge is to provide a more user-friendly dashboard that allows the security analytics team to interact with the system and drill down into the data more easily. The dashboards of the current solution cannot be customised much, and the team would like to be able to write queries in a more human-friendly way.

The last big challenge is to aggregate the streams of data into a single data source that can be processed by the ML algorithm. Each country or region produces its own stream of data, and these streams need to be merged based on the timestamp, as sketched below.
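To illustrate the merge semantics, here is a minimal Python sketch using heapq.merge. The record schema and the regional streams are made up for illustration; in the real platform the merge happens on live streams, not in-memory lists.

```python
import heapq

# Hypothetical activity records from three regional streams, each
# already ordered by timestamp.
stream_eu = [{"ts": 1, "region": "EU", "account": "a1"},
             {"ts": 4, "region": "EU", "account": "a2"}]
stream_us = [{"ts": 2, "region": "US", "account": "b1"}]
stream_apac = [{"ts": 3, "region": "APAC", "account": "c1"}]

# heapq.merge yields items from the sorted inputs in global timestamp
# order, which is exactly the merge the platform needs.
merged = heapq.merge(stream_eu, stream_us, stream_apac,
                     key=lambda activity: activity["ts"])

for activity in merged:
    print(activity["ts"], activity["region"], activity["account"])
```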

Cloud approach

An approach for such a solution would be to use Microsoft Azure or AWS and build a platform on top of the cloud. There are no upfront costs, and both offer many SaaS services that enable fast implementation. The most significant advantage in this context is that the bank has already received the green light from an external audit company, allowing it to push content outside subsidiary regions as long as customer identification information is removed.

Solution overview on top of Microsoft Azure

In the next section, let's take a look at a solution built on top of Microsoft Azure, identifying the key services that enable a platform that scales easily, requires a low initial investment, and keeps operational costs as low as possible.

Azure Blueprint

Inside each subsidiary, we already have a local system that can provide us with a stream of audit and log information. This is our entry point for collecting the account activities. Inside each subsidiary datacenter, a custom component needs to be installed (sketched after the list) that will:

1. Collect relevant activities

2. Remove customer identification information

3. Push the stream of data inside Microsoft Azure
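To make this concrete, here is a minimal sketch of such a component in Python, using the azure-eventhub SDK. The PII field list, the activity schema, and the environment variable names are assumptions for illustration, not the bank's real ones.

```python
import json
import os

from azure.eventhub import EventData, EventHubProducerClient

# Fields that could identify a customer; the exact list is hypothetical
# and would come from the internal audit mentioned earlier.
PII_FIELDS = {"customer_name", "customer_id", "iban", "address"}

def anonymise(activity: dict) -> dict:
    """Remove customer identification information before data leaves the subsidiary."""
    return {k: v for k, v in activity.items() if k not in PII_FIELDS}

def push_activities(activities: list) -> None:
    # Each subsidiary has its own Event Hub; the connection string and
    # hub name are assumed to come from the environment.
    producer = EventHubProducerClient.from_connection_string(
        conn_str=os.environ["EVENT_HUB_CONN_STR"],
        eventhub_name=os.environ["EVENT_HUB_NAME"],
    )
    with producer:
        batch = producer.create_batch()
        for activity in activities:
            batch.add(EventData(json.dumps(anonymise(activity))))
        producer.send_batch(batch)
```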

Because we have multiple subsidiaries that need to push content to the platform, the best approach is to use a dedicated Azure Event Hub for each of them. Each subsidiary has its own instance of Azure Event Hub where activities are pushed.

All the activities are collected into a central Azure Event Hub by an instance of Azure Event Grid. Azure Event Grid is capable of merging the streams of data and connecting them to external references stored inside Azure Storage. This gives us the flexibility to add pre-processing or transformation steps in the future and to connect other data sources without changing the platform architecture.

The main instance of Azure Event Hub is connected to Azure Stream Analytics, which becomes the central location where the real magic happens. Azure Stream Analytics allows us to connect to an Azure Machine Learning solution and analyse the activity stream using our custom algorithm hosted inside it.

The combination of Azure ML and Azure Stream Analytics enables us to apply our ML algorithm on top of a stream of data without requiring any custom configuration or development. This is the most significant advantage Microsoft Azure offers our platform, and it is the differentiating factor.
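To give an idea of what the hosted algorithm could look like, here is a minimal sketch of a scoring entry script following the init/run convention that Azure ML web services expect. The model file name, input schema, and the 0.5 threshold are assumptions, not the bank's real contract.

```python
import json

import joblib
import pandas as pd

model = None

def init():
    # Called once when the scoring container starts; loads the trained
    # anti-money-laundering model. The file name is an assumption.
    global model
    model = joblib.load("aml_model.pkl")

def run(raw_data: str) -> str:
    # Called for each batch of activities sent by Azure Stream Analytics.
    # Assumes the model is a scikit-learn pipeline that accepts a DataFrame.
    records = pd.DataFrame(json.loads(raw_data)["data"])
    confidence = model.predict_proba(records)[:, 1]
    return json.dumps({
        "results": [
            {"suspicious": bool(c >= 0.5), "confidence": float(c)}
            for c in confidence
        ]
    })
```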

The number of activities marked as suspicious is quite low, under 0.05%. We need to take into account that some of them will land in a repository where the security analyst team reviews them, while others need to trigger specific actions.

A good candidate is Azure Service Bus, which would allow us to hook up multiple consumers. Consumers that trigger automated actions can filter the output from Azure Stream Analytics and accept only suspicious activities where the confidence is above a specific threshold, so automatic actions can be taken.
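One way to express that threshold is a SQL rule on a Service Bus topic subscription, so only high-confidence activities ever reach the automated-action consumers. A minimal sketch with the azure-servicebus management client, assuming the confidence score is set as an application property on each message; the topic, subscription, and threshold values are illustrative.

```python
import os

from azure.servicebus.management import (
    ServiceBusAdministrationClient,
    SqlRuleFilter,
)

# Names are illustrative, not the platform's real ones.
TOPIC = "suspicious-activities"
SUBSCRIPTION = "automatic-actions"

admin = ServiceBusAdministrationClient.from_connection_string(
    os.environ["SERVICE_BUS_CONN_STR"]
)

# Drop the default catch-all rule, then accept only messages whose
# 'confidence' application property crosses the action threshold.
admin.delete_rule(TOPIC, SUBSCRIPTION, "$Default")
admin.create_rule(
    TOPIC,
    SUBSCRIPTION,
    "high-confidence-only",
    filter=SqlRuleFilter("confidence >= 0.95"),
)
```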

Information from Azure Service Bus is pushed into two different systems using Azure Functions. The first is a dedicated subscription consumed by an Azure Function that takes the automatic actions. This function calls a REST API exposed by each subsidiary, which is used to suspend accounts or delay transactions.
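A minimal sketch of such a function, using the Azure Functions Python v2 programming model; the subsidiary API URLs, payload shape, and thresholds are assumptions for illustration.

```python
import json
import logging
import urllib.request

import azure.functions as func

app = func.FunctionApp()

# Hypothetical map from subsidiary code to its action REST API.
SUBSIDIARY_API = {
    "EU": "https://eu.bank.example/api/actions",
    "US": "https://us.bank.example/api/actions",
}

@app.service_bus_topic_trigger(
    arg_name="msg",
    topic_name="suspicious-activities",
    subscription_name="automatic-actions",
    connection="ServiceBusConnection",
)
def take_action(msg: func.ServiceBusMessage) -> None:
    activity = json.loads(msg.get_body().decode("utf-8"))
    # Suspend the account or delay the transaction via the subsidiary's
    # REST API; the endpoint and payload shape are assumptions.
    payload = json.dumps({
        "account": activity["account"],
        "action": "suspend" if activity["confidence"] >= 0.99 else "delay",
    }).encode("utf-8")
    request = urllib.request.Request(
        SUBSIDIARY_API[activity["region"]],
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        logging.info("Subsidiary responded with HTTP %s", response.status)
```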

The second Azure Function consumes content from another Azure Service Bus subscription and pushes it to a repository such as Azure Cosmos DB, which is used by Power BI to generate custom reports and a monitoring dashboard for the security analysts. On top of this, a bot developed with Azure Bot Service is used by the security analyst team to query the storage and extract insights about different items.
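Continuing the previous sketch (same FunctionApp and imports), the second function could use a Cosmos DB output binding; the database, container, and connection setting names are illustrative.

```python
# Same FunctionApp and imports as the previous sketch.
@app.service_bus_topic_trigger(
    arg_name="msg",
    topic_name="suspicious-activities",
    subscription_name="analyst-review",
    connection="ServiceBusConnection",
)
@app.cosmos_db_output(
    arg_name="doc",
    database_name="monitoring",
    container_name="activities",
    connection="CosmosDbConnection",
)
def store_for_review(msg: func.ServiceBusMessage,
                     doc: func.Out[func.Document]) -> None:
    # Copy the suspicious activity into Cosmos DB, where Power BI and
    # the analyst dashboard read it.
    activity = json.loads(msg.get_body().decode("utf-8"))
    doc.set(func.Document.from_dict(activity))
```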

Conclusion

As we can see from the infrastructure and services perspective, the proposed solution uses services offered in SaaS and PaaS models. With this approach, the initial investment is kept as low as possible, and most of the concerns around scalability, operations, and similar activities are solved out of the box.

