Abstract
The mobile market has grown substantially in the past years and its penetration nowadays to the developed societies is at the level of 150%. While telecommunication operators allow a person to have more than one mobile contract, they are also faced with the fact that such mobile contracts are often the target of fraudulent behavior due to the discounts on expensive devices they offer. Here, we describe a fully-automated business solution to identify fraudulent groups of mobile contracts. The framework is already “into production” by one of our clients and it offers a great increase in fraud detection efficiency. Furthermore, it brings considerable financial value not only via the fraud detection per se but also by reducing the labor time of this task. On the technical side, the workflow makes use of mSQL Server 2014 Integration Services and Azure ML Web services. It runs fully automatically on a schedule and executes the following tasks:
i) transferring and pre-processing of new contract-data to an Azure DB
ii) calling a custom-made Azure ML web-service that detects fraudulent groups of contracts, and
iii) writing back the results to the on-prem DB that are used to generate daily reports.
Introduction
According to the Swiss Federal Communications commission, the Swiss mobile telephony market included nearly 12 million contracts at the end of 2014 [1]. The increasing penetration of the smartphone market and the continuously more attracting and diverse contract-offers from telecommunication providers create a huge turnover of new or renewed mobile contracts every day. Such contracts can be the target of fraudulent behavior, especially for the postpaid-type, which come with great discounts on mobile devices.
It’s a common practice for telecommunication companies to allow a person to have more than one mobile contract. However, they also pose restrictions with respect to the number of allowed contracts per individual, especially to external companies that operate on their behalf. Whatever the case, a company needs always to assure that a new contract is permissible under a set of restrictions. Such a process requires comparing information of each new contract with information from past contracts and identifying a group of contracts that potentially belong to the same individual. But, how easy is it to do that?
Well, it depends on the information we can operate on and the type of fraud we try to identify. In many situations, contract-data do not include a unique identifier for each individual. Furthermore, names in contracts may be misspelled, first and last names flipped, middle names included or excluded. Identifying groups of contracts that potentially belong to the same person by use of only string-similarity metrics is by far not adequate. A smart contract-comparison-process needs to also account for other factors, such as the date of the contract, the geographical location of the shop that the contract took place, the responsible salesperson, etc. Now, if we assume that 50% of all mobile-contracts in Switzerland are renewed within the year, an average of appr. 1’000 new mobile contracts are made every day from a company that operates on behalf of all Swiss-providers and controls 6% of this market. In such case, fraud can get easily undetected and cause considerable financial damage, if manual practices are employed to compare all new contracts with past contracts stored in the database.
A modern solution around this problem needs to employ data analytics, be fully automatic, be easily expandable, and of course, be integrable with other processes of its environment. Additionally it needs to produce a result which can be easily consumed by a variety of end users. Here, we show how we can realize such a solution by combining SQL Server 2014 Integration Services with Azure ML. The described framework runs already daily for one of our customers and aids its Controlling Department to process large batches of mobile contracts and detect automatically potentially fraudulent groups of contracts. The solution offers, among others, a vast reduction of labor time associated with this task and a more than 100-fold increase to the number of detected fraudulent groups compared to the previously manual procedure.
Solution
Overview
The solution we are presenting here is coordinated through SQL Server 2014 Integration Services, and can easily run on a schedule. There are three important processes that take place (Figure 1):
- Transfer contract-data from the on-premise database to the Azure SQL database.
- Run the Fraud Detection Process, which reads the contract-data from the Azure SQL database, runs the fraud detection algorithm and writes the results back to the Azure SQL database.
- Transfer data (fraud groups) from the Azure SQL database to the on-premise database.
Building the Workflow with Microsoft SQL Server 2014 Integration services
Microsoft SQL Server 2014 Integration services (SSIS) is a very convenient platform for performing complex workflow functions that, for example, execute SQL statements or custom C# code. For our solution, we use Visual Studio as an environment for designing an SSIS job which performs the following operations and is scheduled to run twice a day using SQL Server Agent (Figure 2):
a. Data Extraction (DF_Extract_Data): In this step we use SQL to extract the contract-data from the on-premise Oracle database and write them to the Azure SQL Database. Depending on the characteristics of the analysis, we may decide to write to the Azure DB only a part of the original contract-data. For example, if we decide to compare every new contract with all the contracts of the past 6 months, we would initially load the data of all these contracts to the Azure SQL DB, and then, each day, only append the data of the new contracts for the given day. Such a strategy will make this part of the process run very fast on a daily basis.
b. Pre-Processing of Contract Data (UPD_IsCancelled_ContractNumber, UPD_IsCancelled_Phone, DEL_OldRecords): The modules of this step use SQL to filter contract-data from the Azure SQL DB that are not eligible for analysis. For example, contracts which get cancelled or are more than 6 months old, get deleted, and hence, are not analysed from the subsequent fraud detection process.
c. Perform the Fraud Detection (EXEC_FraudDetectionJob): The module of this step uses a generic C# script to call an Azure ML web service which performs the data analysis process and writes the result to a table in the Azure SQL database. Using a polling mechanism, the dataflow waits for the Azure ML script to complete before the next step of the ETL-flow is executed. In the next section we will describe in detail how we can create this web service.
d. Write the results to the on-premise database (PullDataToOnPrem, Truncate OutPut): When the Azure ML job is finished, the PullDataToOnPrem task captures the newly generated fraud group data in the Azure SQL database and appends a table on the on-premise SQL server with this data. In the last task, the result table in the Azure SQL database is truncated as a preparation for the next execution of the whole process. As in step (b), we also use SQL to perform these operations.
When the pipeline is built, we deploy it as a package and execute as many times as we want during the day by scheduling accordingly an SQL Server Agent Job.
Building the Azure ML Web Service
As previously described, the Azure ML web service reads the contract-data from the Azure SQL database, runs the fraud detection algorithm, and writes the results back to the Azure SQL database. Such a process can be created with Azure ML (see Figure 3). The Azure ML pipeline contains the following modules:
- A “Reader” module, which reads the data from the Azure SQL database
- A number of “R Execution Script” modules, which use R to implement a custom-made fraud detection process. The heart of the process is a classifier which decides if two contracts are suspiciously similar or not. To make this decision, the classifier takes various factors into account, like name similarity, date of the contracts, geographical location of the shops that the contracts took place, the responsible salespersons, etc. When all possible pair of contracts are analyzed, the data analytics performs a graph-theory-based process, which creates groups of similar contracts. Each group of contracts might belong to the same individual and hence, might constitute a fraud case.
- A “Writer” module, which writes the results (potentially fraudulent groups of contracts) to the Azure SQL database.
The Azure ML pipeline can be easily deployed as a web service (with just one click of a button) and called with any programming language and from any device that has a network connection, SSL capabilities to perform HTTPS requests, and the ability to parse JSON [2]. As described in the previous section, we call this web service using a simple C# script and use it as part of a larger process built with Microsoft SQL Integration Services, which can run automatically on a schedule
Conclusion
In this article we showed how to use SQL Server 2014 Integration Services and Azure ML in order to build a fully automatic solution for detecting groups of mobile contracts that potentially violate restrictions from telecommunication operators with respect to the number of allowed contracts per individual. The described pipeline is ideal for internal controlling processes in a company, since it’s almost impossible to manually identify such groups of contracts. Finally, the fraud detection process could be easily expanded to the point-of-sales level: The Azure ML web-service is very fast and could be called and executed on-line from the salesperson each time a new contract is made.
References
- Swiss federal Communications Commission (ComCom), „Annual Report 2014“, March, 2015.
- Jeff Barnes, “Microsoft Azure Essentials: Azure Machine Learning”, Microsoft Press, April, 2015