Elsevier

Expert Systems with Applications

Volume 104, 15 August 2018, Pages 1-21

An artificial intelligence system for predicting customer default in e-commerce

https://doi.org/10.1016/j.eswa.2018.03.025

Highlights

  • The first Artificial Intelligence system for credit scoring in e-commerce.

  • A new Genetic Programming system for credit scoring.

  • Improving the state of the art for credit scoring in e-commerce.

  • Transparent predictive models for credit scoring.

Abstract

The growing number of e-commerce orders is increasing the need for risk management to prevent default in payment. Default in payment is the failure of a customer to settle a bill within 90 days of receipt. Credit scoring (CS) is frequently employed to estimate customers’ default probability. CS has been widely studied, and many computational methods have been proposed. The primary aim of this work is to develop a CS model to replace the pre-risk check of the e-commerce risk management system Risk Solution Services (RSS), which is currently one of the most used systems to estimate customers’ default probability. The pre-risk check uses data from the order process and includes exclusion rules and a generic CS model. The new model is intended to replace the whole pre-risk check and has to work both in isolation and in integration with the RSS main risk check. This paper presents an application of genetic programming (GP) to CS. The model was developed on a real-world dataset provided by a well-known German financial solutions company. The dataset contains order requests processed by RSS. The results show that GP outperforms the generic CS model of the pre-risk check in both classification accuracy and profit. GP achieved classification accuracy competitive with several state-of-the-art machine learning methods, such as logistic regression, support vector machines, and boosted trees. Furthermore, the GP model can be used in combination with the RSS main risk check to create a model with even higher discriminatory power.

Introduction

E-commerce vendors in Germany have to deal with a peculiarity: commonly used payment types like credit cards and PayPal have relatively low market shares, and the majority of orders are instead processed using open invoice. With open invoice, a vendor bills customers for goods and services only after delivery. Thus, the vendor grants customers credit to the extent of the invoice. Usually, the vendor sends customers an invoice statement as soon as the products are delivered or the services are provided. The invoice contains a detailed statement of the transaction. Because the customer receives the purchase before paying, the invoice is called open, and it is closed once the payment is received. Around 28% of customers in Germany choose open invoice as their payment type (Frigge, 2016), and around 68% of customers name open invoice as one of their favorite payment types (Fittkau & Maaß Consulting, 2014). However, open invoice is prone to payment disruptions. Vendors report that, most commonly, customers simply forget to settle the bill or delay the payment on purpose. Moreover, around 53% of vendors state that insolvency is one of the most common reasons for payment disruption (Weinfurner, Weisheit, Wittmann, Stahl, & Pur, 2011). Nowadays, the majority of the cases that end in default on payment in Germany are orders with open invoices, with more than 8% of all orders defaulting (Seidenschwarz, Weinfurtner, Stahl, & Wittmann, 2014). E-commerce vendors therefore find themselves in a conflict: offering open invoice incentivizes many clients to confirm their purchases but, at the same time, increases the default rate. The former has a positive effect on revenue, while the latter drives it down. Additionally, default on payment has a negative impact on the profit margin, due to costs arising from the provision of services and advance payments to third parties.
To break this vicious circle, vendors can fall back on a plethora of methods. Many tackle the conflict by implementing exclusion rules for customer groups they consider especially default-prone (for instance, customers who are unknown to the vendor or whose order values are conspicuously high). Another approach, used by more than 30% of e-commerce vendors in Germany, is to rely on external risk management services (Weinfurner et al., 2011). Risk management applications aim to detect customers with a high risk of default, and they are frequently built using credit scoring (CS) models. CS analyzes historical data to isolate meaningful characteristics that are used to predict the probability of default (Mester, 1997). However, the probability of default is not an attribute of potential customers but merely a vendor’s assessment of whether a potential customer is a risk worth taking. Over the years, CS has evolved from a vendor’s subjective “gut” decision into a method based on statistically sound models (Thomas, Edelman, & Crook, 2002). Among the providers of risk management services in Germany is the risk management division of Arvato Financial Solutions (AFS), which provides a number of services, including identification of individuals, evaluation of creditworthiness, and fraud recognition. The AFS databases contain 21 million solvency observations covering 7 million individuals in Germany, together with address and change-of-address information, bank account information, phone numbers, email addresses, and device information. AFS’s risk management service for e-commerce is called Risk Solution Services (RSS). RSS covers the entire order process and provides services for every stage of it. The main service for evaluating customers’ default probability is called the risk check, and it is split into a pre-risk check and a main risk check.
The main risk check is based on a credit agency score that uses country-specific solvency information on individuals. Hence, the main risk check is inoperable in countries without accessible solvency information. By contrast, the pre-risk check was designed to always be operable and to ensure that the risk check returns an evaluation of the customer’s default probability. For this purpose, the pre-risk check uses data transmitted by the customer during the order process. However, in many industrial settings the pre-risk check is nowadays based on a generic model, sometimes without even a statistically sound foundation (Lessmann, Baesens, Seow, & Thomas, 2015).
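The exclusion-rule approach mentioned earlier (refusing open invoice to unknown customers or to conspicuously high order values) can be illustrated with a minimal sketch. The `Order` fields and the 500 EUR limit are hypothetical choices for illustration only; they are not the rules used by any vendor or by the RSS pre-risk check.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical order attributes; not the RSS data schema.
    customer_known: bool  # has the vendor dealt with this customer before?
    order_value: float    # order value in EUR


# Hypothetical limit above which an order value counts as "conspicuously high".
MAX_OPEN_INVOICE_VALUE = 500.0


def open_invoice_allowed(order: Order) -> bool:
    """Exclusion rules: refuse open invoice to unknown customers and to high-value orders."""
    if not order.customer_known:
        return False
    return order.order_value <= MAX_OPEN_INVOICE_VALUE


print(open_invoice_allowed(Order(customer_known=True, order_value=120.0)))   # True
print(open_invoice_allowed(Order(customer_known=False, order_value=120.0)))  # False
print(open_invoice_allowed(Order(customer_known=True, order_value=900.0)))   # False
```

Rules like these are transparent but coarse: every order in an excluded group is treated alike, regardless of its other characteristics.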

The objective of this work is to use genetic programming (GP) to build a CS model to replace the existing RSS pre-risk check. This continues a recent line of research aimed at using technology to improve risk management (Lessmann et al., 2015). Inspired by Darwin’s theory of evolution, GP (Koza, 1992a) is a computational intelligence (CI) method that employs evolutionary mechanisms such as inheritance, selection, crossover, and mutation to gradually evolve new solutions to a problem. In a CS setting, GP initializes a population of discriminant functions that classify customers as bad or good (hereafter called bads and goods for simplicity). This population is then evolved to find the best possible discriminant function. The motivation for using a CI method comes from Marques, Garcia, and Sanchez (2013), who discuss five major characteristics of CI systems that are especially appealing in CS: learning, adaptation, flexibility, transparency, and discovery. Learning is the ability to learn decisions and tasks from historical data. Adaptation is the capability to adapt to a changing environment, i.e., without being restricted to specific situations or economic conditions. Flexibility allows CI systems to be used even with incomplete or unreliable datasets. Furthermore, Marques and colleagues state that CI systems may be transparent, in the sense that the resulting decisions may be visible and thus, in some cases, at least partially explainable. Lastly, discovery is the ability to find previously unknown relationships. Within the broad field of CI, our focus on GP follows the same motivations as Ong, Huang, and Tzeng (2005), who argue that GP has a number of attractive characteristics for CS. First, it is a non-parametric tool that is not restricted to specific situations or datasets and can be used in a wide range of contexts.
Second, it automatically determines the most fitting discriminant function. Last but not least, GP can automatically select the most important variables during the learning phase. Indeed, research has already shown the benefits of GP and its utility in CS (see Section 3 for a detailed discussion of the state of the art). However, CS is usually applied to data from the financial sector, while other sectors have rarely been considered so far. In this work, for the first time to the best of our knowledge, we extend current research in CS by applying GP to a dataset that contains orders from e-commerce vendors.
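The evolutionary loop described above can be sketched as follows. This is a toy illustration, not the GP system used in this work: individuals are expression trees over the input variables (nested Python tuples), fitness is a pairwise ROC-AUC, and the operator set (`add`, `sub`, `mul`), tournament size 3, crossover rate 0.7, depth limit 5, and all other parameters are arbitrary assumptions. The hypothetical helper `used_features` illustrates the point about variable selection: any variable that the evolved tree never references has effectively been dropped.

```python
import random

FUNCS = ("add", "sub", "mul")  # internal nodes; leaves are ("x", i) or ("c", value)


def random_tree(n_features, depth, rng):
    """Grow a random expression tree with at most `depth` levels of operators."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_features))
        return ("c", rng.uniform(-1.0, 1.0))
    return (rng.choice(FUNCS),
            random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))


def evaluate(tree, x):
    """Discriminant value of `tree` for one feature vector `x` (higher = more likely bad)."""
    op = tree[0]
    if op == "x":
        return x[tree[1]]
    if op == "c":
        return tree[1]
    a, b = evaluate(tree[1], x), evaluate(tree[2], x)
    return a + b if op == "add" else a - b if op == "sub" else a * b


def depth(tree):
    return 1 + max(depth(tree[1]), depth(tree[2])) if tree[0] in FUNCS else 0


def paths(tree, prefix=()):
    """Index paths of every node, used to pick crossover and mutation points."""
    if tree[0] not in FUNCS:
        return [prefix]
    return [prefix] + paths(tree[1], prefix + (1,)) + paths(tree[2], prefix + (2,))


def get(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def put(tree, path, sub):
    """Return a copy of `tree` with the node at `path` replaced by `sub`."""
    if not path:
        return sub
    node = list(tree)
    node[path[0]] = put(tree[path[0]], path[1:], sub)
    return tuple(node)


def used_features(tree):
    """Indices of the input variables a tree actually uses (implicit variable selection)."""
    if tree[0] == "x":
        return {tree[1]}
    if tree[0] == "c":
        return set()
    return used_features(tree[1]) | used_features(tree[2])


def auc(scores, labels):
    """Pairwise ROC-AUC: share of (bad, good) pairs ranked correctly, ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return sum((p > n) + 0.5 * (p == n) for p in pos for n in neg) / (len(pos) * len(neg))


def evolve(X, y, pop_size=50, generations=15, max_depth=5, seed=0):
    """Return (fitness, tree) of the best discriminant function found."""
    rng = random.Random(seed)
    n = len(X[0])
    fitness = lambda t: auc([evaluate(t, x) for x in X], y)
    pop = [random_tree(n, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(t), t) for t in pop]
        new_pop = [max(scored, key=lambda p: p[0])[1]]  # elitism: best tree survives
        while len(new_pop) < pop_size:
            # Tournament selection of size 3.
            parent = max(rng.sample(scored, 3), key=lambda p: p[0])[1]
            if rng.random() < 0.7:  # subtree crossover with a second tournament winner
                donor = max(rng.sample(scored, 3), key=lambda p: p[0])[1]
                child = put(parent, rng.choice(paths(parent)), get(donor, rng.choice(paths(donor))))
            else:  # subtree mutation: replace a random node with a fresh random subtree
                child = put(parent, rng.choice(paths(parent)), random_tree(n, 2, rng))
            # Reject oversized offspring to limit bloat and numeric blow-up.
            new_pop.append(child if depth(child) <= max_depth else parent)
        pop = new_pop
    return max(((fitness(t), t) for t in pop), key=lambda p: p[0])


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Toy data (assumed scaled to [0, 1]); only feature 0 determines the label.
    X = [[data_rng.random(), data_rng.random()] for _ in range(80)]
    y = [1 if x[0] > 0.5 else 0 for x in X]
    fit, best = evolve(X, y)
    print(round(fit, 3), best, used_features(best))
```

A production system would add protected division, richer primitives, and a held-out evaluation; the sketch only shows how inheritance, selection, crossover, and mutation fit together.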

This work is organized as follows. Section 2 introduces the theoretical framework of CS. In Section 3, previous and related work is analyzed and discussed. Section 4 presents RSS and the services it provides for every stage of the order process. In Section 5, we describe the dataset used in this work, which was provided by AFS. Section 6 presents the organization of our experimental study and discusses our experimental settings. In Section 7, we present and discuss the obtained experimental results. Finally, Section 8 concludes the paper and proposes ideas for future research. The paper closes with Appendix A, in which we briefly introduce GP for readers who are not familiar with this computational method and suggest bibliographic material for a deeper understanding of the subject.

Section snippets

Theoretical framework

CS is widely used by financial institutions to determine applicants’ default probability and subsequently classify them as good applicants (the “goods”, for simplicity) or bad applicants (the “bads”) (Thomas et al., 2002). Consequently, applicants may be rejected or accepted as customers based on that classification. Thus, CS represents a binary classification problem (Henley, 1995). The binary response variable represents a default in payment by the customer, or potential default in …
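Framed as binary classification, a scoring model outputs a numeric score per applicant, and a cutoff turns scores into goods and bads. The sketch below assumes that higher scores mean higher default risk and uses a hypothetical cutoff of 0.5 with made-up toy scores; it is not the decision rule of any system discussed here.

```python
from typing import List, Sequence


def classify(scores: Sequence[float], cutoff: float) -> List[int]:
    """Label each applicant 1 ("bad", predicted default) if its score reaches the cutoff, else 0 ("good")."""
    return [1 if s >= cutoff else 0 for s in scores]


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Share of applicants whose predicted class matches the observed class."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


scores = [0.92, 0.15, 0.60, 0.33]  # toy scores, higher = riskier (assumption)
actual = [1, 0, 0, 0]              # toy outcomes: 1 = defaulted
pred = classify(scores, cutoff=0.5)
print(pred)                        # [1, 0, 1, 0]
print(accuracy(pred, actual))      # 0.75
```

Moving the cutoff trades rejected goods against accepted bads, which is why threshold-free measures such as ROC-AUC are often used to compare scoring models.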

Literature review

CS is currently a widely studied research field, and several important contributions have appeared. For a detailed survey of classification algorithms for CS, the reader is referred to Lessmann et al. (2015). Since exhaustively covering all existing contributions would be unrealistic in the available space, we organize this section as follows: in the first part, we present the history and evolution of the field, while in the last part, we focus on the most …

Risk solution service

Risk Solution Services (RSS) is a risk management service that aims to cover the whole order process of e-commerce retailers’ customers. Its objectives are threefold. First, to increase the conversion rate and customer retention in the e-shop by improving the differentiation and management of payment methods. Second, to enhance cost control by providing innovative pricing models and configurable standard solutions at different service levels. Third, to improve discriminatory power by combining current and …

Dataset

The dataset used in this work consists of order requests processed by RSS between October 1, 2014 and December 31, 2015, and was provided by AFS. It contains 56,669 order requests, of which 15,535 (≈27%) are labeled as “bad” and the remaining 41,134 as “good”. These order requests were subjected to a stratified random split into a training set with 31,669 (≈56%) observations, a test set with 10,000 (≈18%) observations, and a validation set with 15,000 (≈26%) …
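The snippet does not say how the stratified split was implemented. One minimal way to reproduce the stated sizes, assuming binary labels (1 = bad, 0 = good) and setting the number of bads in the test and validation sets to the overall bad rate (rounded), is sketched below. The function name `stratified_split` is hypothetical, and the labels are placeholders that carry only the class counts reported above.

```python
import random


def stratified_split(labels, sizes, seed=0):
    """Split indices into parts of the given sizes, each with (about) the overall bad rate.

    Parts after the first get round(size * bad_rate) bads; the first part
    (here: training) takes whatever remains.
    """
    rng = random.Random(seed)
    bads = [i for i, y in enumerate(labels) if y == 1]
    goods = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(bads)
    rng.shuffle(goods)
    assert sum(sizes) == len(labels)
    bad_rate = len(bads) / len(labels)
    parts = []
    for size in sizes[1:]:
        n_bad = round(size * bad_rate)
        parts.append(bads[:n_bad] + goods[:size - n_bad])
        bads, goods = bads[n_bad:], goods[size - n_bad:]
    return [bads + goods] + parts


labels = [1] * 15_535 + [0] * 41_134  # class counts from the text; no real features
train, test, valid = stratified_split(labels, [31_669, 10_000, 15_000])
for name, part in [("train", train), ("test", test), ("valid", valid)]:
    share = sum(labels[i] for i in part) / len(part)
    print(name, len(part), round(share, 3))
# train 31669 0.274
# test 10000 0.274
# valid 15000 0.274
```

Library helpers such as scikit-learn's `train_test_split` with its `stratify` argument achieve the same effect in two calls; the explicit version shows where the rounding happens.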

Experimental organization and settings

When GP is employed to solve complex problems like the one tackled in this paper, choosing an appropriate fitness function is often a crucial step. In this work, after considering several other possible measures, we decided to use the area under the receiver operating characteristic (ROC) curve (ROC-AUC). ROC-AUC is the single-scalar representation of the ROC curve (Abdou & Pointon, 2011). The ROC curve is used when a classifier returns a numeric value that has to be interpreted as a …
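ROC-AUC equals the probability that a randomly chosen bad receives a higher score than a randomly chosen good (the Mann-Whitney U formulation), so it can be computed from score ranks without drawing the curve. The sketch below uses that rank formula, counting tied scores as one half; in a GP run, each candidate discriminant function's ROC-AUC on the training data would serve as its fitness. It is an illustrative implementation, not the code used in the study.

```python
def roc_auc(scores, labels):
    """ROC-AUC via the Mann-Whitney U statistic (labels: 1 = bad, 0 = good).

    Tied scores receive the average of their ranks, which counts each
    tied (bad, good) pair as one half.
    """
    pairs = sorted(zip(scores, labels))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = 0.0  # sum of 1-based ranks of the bads
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of ranks i+1 .. j for this tie group
        rank_sum += avg_rank * sum(y for _, y in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

Because only the ordering of scores matters, ROC-AUC does not require GP outputs to be probabilities, which makes it convenient as a fitness measure before any calibration step.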

Experimental results

The presentation of the experimental results is organized as follows: in Section 7.1, we present the results obtained by GP in the CS problem described so far, and we dedicate particular attention to a discussion and an interpretation of the best model evolved by GP. In Section 7.2, we discuss the results we have obtained in the calibration phase. In Section 7.3, we compare GP and other machine learning methods. Finally, in Section 7.4, we discuss the results we obtained when GP was first …

Conclusions and future work

The objective of this work was to develop a credit scoring (CS) model to replace the pre-risk check of the e-commerce risk management system Risk Solution Services (RSS), which is currently one of the most used systems to estimate customers’ default probabilities. The pre-risk check uses data from the order process and includes exclusion rules and a generic CS model. The new model was intended to replace the whole pre-score and had to be able to work in isolation and in …

References (75)

  • Neto, R., et al. (2017). A framework for data transformation in credit behavioral scoring applications based on model driven development. Expert Systems with Applications.
  • Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector...
  • Rätsch, G., et al. (2001). Soft margins for AdaBoost. Machine Learning.
  • Weinfurner, S., Weisheit, S., Wittmann, G., Stahl, E., & Pur, S. (2011). Zahlungsabwicklung im...
  • Wiginton, J.C. (1980). A note on the comparison of logit and discriminant models of consumer credit behavior. The Journal of Financial and Quantitative Analysis.
  • Zadrozny, B., et al. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. ICML.
  • Abdou, H.A., et al. (2011). Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intelligent Systems in Accounting, Finance & Management.
  • Abellán, J., et al. (2014). Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring. Expert Systems with Applications.
  • Alves, B.C., et al. (2015). Survival mixture models in behavioral scoring. Expert Systems with Applications.
  • Baesens, B., et al. (2003). Benchmarking state of the art classification algorithms for credit scoring. Journal of the Operational Research Society.
  • Brier, G.W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review.
  • Castelli, M., et al. (2015). A C++ framework for geometric semantic genetic programming. Genetic Programming and Evolvable Machines.
  • Code, U. S. (1974). Equal credit opportunity...
  • Fittkau & Maaß Consulting (2014). 38. WWW-Benutzer-Analyse W3B: Kaufentscheidung im...
  • Davis, R.H., et al. (1992). Machine-learning algorithms for credit-card applications. IMA Journal of Management Mathematics.
  • DeGroot, M.H., et al. (1983). The comparison and evaluation of forecasters. The Statistician.
  • Desai, V.S., et al. (1996). A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research.
  • Eiben, A.E., et al. (2003). Introduction to evolutionary computing.
  • Fitzpatrick, T., et al. (2016). An empirical comparison of classification algorithms for mortgage default prediction: Evidence from a distressed mortgage market. European Journal of Operational Research.
  • Fortin, F.-A., et al. (2012). DEAP: Evolutionary algorithms made easy. Journal of Machine Learning Research.
  • Frydman, H., et al. (1985). Introducing recursive partitioning for financial classification: The case of financial distress. The Journal of Finance.
  • García, V., et al. (2014). An insight into the experimental design for credit risk and corporate bankruptcy prediction systems. Journal of Intelligent Information Systems.
  • Giovanni, S., et al. (2010). Ensemble methods in data mining: Improving accuracy through combining predictions. Synthesis Lectures on Data Mining and Knowledge Discovery.
  • Goldberg, D.E. (1989). Genetic algorithms in search, optimization and machine learning.
  • Han Ju, Y., & Young Sohn, S. (2014). Updating a credit-scoring model based on new attributes without realization of...
  • Hand, D.J., et al. (1997). Statistical classification methods in consumer credit scoring: A review. Journal of the Royal Statistical Society: Series A (Statistics in Society).
  • Hastie, T., et al. (2009). Multi-class AdaBoost. Statistics and its Interface.