Find the outlier: Detecting sales fraud with machine learning
We spoke to data expert Canburak Tümer about how machine learning is being used to detect fraud in sales transactions. Find out how ML technology is helping to keep this tricky job under control and what it looks for when crunching the data.
Canburak Tümer: Let me first define what I mean by a sales point. Sales points are the locations where Turkcell Superonline gathers new subscribers. A sales point can be a shop belonging to Turkcell, a franchise, or sometimes a booth at an event. Anomalies in sales usually show up in the number of new subscriptions: if a shop usually sells x subscriptions a day and suddenly sells twice as many on a given day, that is an anomaly, and it may point to fraud. We report this anomaly to the revenue assurance teams to investigate.
The other type of anomaly is between different shops. We expect shops of the same type in the same town to have similar numbers, but there can be outliers. These outliers should be investigated for potential fraud. So an anomaly in sales may indicate fraudulent activity.
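As an illustration of the first kind of anomaly, a shop suddenly selling about twice its usual daily count, here is a minimal Python sketch. It is not Turkcell's implementation: the function name `is_daily_spike`, the use of the median as the "usual" level, and the default factor of 2 are all assumptions made for the example.

```python
from statistics import median


def is_daily_spike(history, today, factor=2.0):
    """Return True if today's count is at least `factor` times the shop's typical day.

    `history` is a list of past daily subscription counts for one shop.
    The median is used as the baseline (an assumption, chosen because it is
    not distorted by one earlier unusual day).
    """
    if not history:
        raise ValueError("history must contain at least one day")
    baseline = median(history)
    # A baseline of zero gives no meaningful ratio, so never flag in that case.
    return baseline > 0 and today >= factor * baseline


# Hypothetical daily counts for a single shop.
past_days = [10, 12, 9, 11, 10, 13, 11]
print(is_daily_spike(past_days, 24))  # True: the median is 11 and 24 >= 22
print(is_daily_spike(past_days, 14))  # False
```

A flagged day is only a candidate for review; as the interview notes, it is passed on for investigation rather than treated as proof of fraud.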
JAXenter: What parameters do you look for when looking for an anomaly?
Canburak Tümer: Our main parameter is the number of new subscriptions over different intervals (daily, weekly, monthly, 6 months), supported by the town and sales point type information. In further research, we will also look at the cancellation numbers for these new subscriptions, complaint numbers, and average churn tenure.
JAXenter: How can outlying sales points be identified?
Canburak Tümer: To detect the outlying shop in a town, we currently use the interquartile range method. This is a basic and trusted method for detecting outliers in a set of records. We are also evaluating hierarchical clustering with a well-chosen cut-off point. Hierarchical clustering can help us detect non-normal points in the data.
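The interquartile range method mentioned here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1.5 multiplier is the conventional fence and is not given in the interview, the shop names and counts are invented, and `iqr_outliers` is a hypothetical helper name.

```python
from statistics import quantiles


def iqr_outliers(sales_by_shop, k=1.5):
    """Return the shops whose count lies outside [Q1 - k*IQR, Q3 + k*IQR].

    `sales_by_shop` maps a shop name to its subscription count for one period.
    `k=1.5` is the conventional multiplier, assumed here for illustration.
    """
    counts = list(sales_by_shop.values())
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(counts, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {shop: n for shop, n in sales_by_shop.items() if n < low or n > high}


# Hypothetical weekly counts for shops of one type in one town.
town_sales = {
    "shop_a": 40, "shop_b": 42, "shop_c": 38,
    "shop_d": 45, "shop_e": 41, "shop_f": 95,
}
print(iqr_outliers(town_sales))  # {'shop_f': 95}
```

With these numbers the quartiles are 40.25 and 44.25, so the fences sit at 34.25 and 50.25 and only `shop_f` falls outside them.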
JAXenter: Why is it more complex to find outlier sales points? What is necessary for this?
Canburak Tümer: For a single sales point, it is easier to detect the trend, then predict the sales for the next time interval and check whether the actual figure matches the prediction. But when it comes to comparing different sales points, new features come into play. First of all, the location and its population affect the sales.
Then there is the type of sales point: an online or telesales point cannot be compared to a local shop. As the number of features increases, model complexity increases along with it. To keep things simple, we group the sales points by location and type, then use simple methods to detect outliers.
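The grouping step described above might look like the following Python sketch: records are bucketed by town and sales point type, and an interquartile range check is run inside each bucket so that an online channel is never compared with a local shop. The record layout, the name `outliers_by_group`, the `min_size` threshold and the 1.5 multiplier are assumptions for the example, not details from the interview.

```python
from collections import defaultdict
from statistics import quantiles


def outliers_by_group(records, k=1.5, min_size=4):
    """Flag outlying sales points within each (town, type) group.

    `records` is an iterable of (shop, town, shop_type, count) tuples.
    Groups with fewer than `min_size` members are skipped, because quartiles
    of a handful of values say little (threshold assumed for illustration).
    """
    groups = defaultdict(list)
    for shop, town, shop_type, count in records:
        groups[(town, shop_type)].append((shop, count))

    flagged = []
    for key, members in groups.items():
        if len(members) < min_size:
            continue
        q1, _, q3 = quantiles([c for _, c in members], n=4, method="inclusive")
        spread = q3 - q1
        low, high = q1 - k * spread, q3 + k * spread
        flagged += [(shop, key) for shop, c in members if c < low or c > high]
    return flagged


# Hypothetical records: five local shops and two online points in one town.
records = [
    ("s1", "town_1", "shop", 40), ("s2", "town_1", "shop", 42),
    ("s3", "town_1", "shop", 38), ("s4", "town_1", "shop", 41),
    ("s5", "town_1", "shop", 90),
    ("o1", "town_1", "online", 300), ("o2", "town_1", "online", 310),
]
print(outliers_by_group(records))  # [('s5', ('town_1', 'shop'))]
```

The online points sell far more than any shop, yet they are not flagged, because they are only ever compared with each other.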
JAXenter: Thank you!