Optimising customer engagement efforts using machine learning: Application on retail banking

Valerie Lim
5 min read · Feb 20, 2020

Background

Telemarketing is a strategy marketers commonly use to promote products or services because it is more cost-effective than conducting roadshows, and banks often adopt it too. But the pool of potential customers can be huge, and contacting all of them is extremely resource-intensive and inefficient. If banks can identify which customers are more likely to subscribe and target them first, they can stretch their marketing budget further, which in turn increases sales revenue and thus profits. In this article, I will walk you through how I used machine learning models to predict customers’ subscription behaviour, so marketers can focus their efforts on higher-propensity leads.

The Process

The dataset, obtained from Kaggle, comes from a Portuguese bank and contains over 45,000 individual observations. I tested several machine learning models to see which gave the best performance in predicting customers’ subscription behaviour.

I then created a Flask app and deployed it on Heroku, so that marketing teams can key in a customer’s relevant attributes, find out their likelihood of subscribing, and adjust their marketing efforts accordingly.

Feature Engineering

  1. Second half of the month: From day and month that the customer was last contacted, I found that more subscriptions happened in the second half of the month.
  2. New customers: From the duration of engagement, I can determine whether the customer was existing or new (i.e. 0 seconds of engagement).
  3. Cluster type: Customer segmentation can augment a company’s ability to target customers and design better campaigns, which in turn improves customer relationships and translates into business profitability. As such, prior to building the classification model, I applied k-modes clustering to segment the customer population into different profiles and used the resulting cluster labels as an additional input for the model.
List of original dataset features and engineered features. Colours indicate the features from which the engineered features were created.
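The first two engineered features above can be sketched with a few lines of pandas. The column names (`day`, `duration`) follow the Kaggle bank-marketing dataset; the exact implementation here is my illustration, not the author’s original code.

```python
import pandas as pd

# Toy subset of the bank-marketing columns (illustrative values)
df = pd.DataFrame({
    "day": [5, 20, 28, 12],         # day of month of last contact
    "duration": [120, 0, 340, 0],   # seconds of engagement
})

# 1. Second half of the month: last contact happened after day 15
df["second_half"] = (df["day"] > 15).astype(int)

# 2. New customers: zero seconds of engagement means no prior interaction
df["new_customer"] = (df["duration"] == 0).astype(int)

# 3. Cluster type would be added similarly, e.g. with the `kmodes` package:
#    labels = KModes(n_clusters=5).fit_predict(categorical_df)
```

The third feature relies on k-modes clustering (for example via the `kmodes` package), whose labels are appended as one more column before modelling.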

Feature Selection

I selected features based on their Information Value (IV), which ranks features by how well they predict the target. Using these nine features to build the model, I obtained results similar to using all of the features listed earlier, suggesting that the dropped features were very weak predictors of subscription behaviour.

Selected features and their Information Value (IV) scores.
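For a categorical feature, IV is computed from the Weight of Evidence of each category: IV = Σ (%events − %non-events) × ln(%events / %non-events). A minimal sketch (my own helper, with a small smoothing constant `eps` to avoid log of zero — the author’s exact binning may differ):

```python
import numpy as np
import pandas as pd

def information_value(feature, target, eps=0.5):
    """IV of a categorical feature against a binary target (1 = event)."""
    # Counts of events (target==1) and non-events per category
    tab = pd.crosstab(feature, target).astype(float) + eps  # eps avoids log(0)
    dist_event = tab[1] / tab[1].sum()      # share of events per category
    dist_non = tab[0] / tab[0].sum()        # share of non-events per category
    woe = np.log(dist_event / dist_non)     # Weight of Evidence per category
    return ((dist_event - dist_non) * woe).sum()
```

A feature whose categories split events and non-events identically scores an IV of zero; one that separates them cleanly scores high, which is the basis of the ranking above.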

Model Building

The dataset was split into a 20% test set and an 80% cross-validation set. I used stratified 5-fold cross-validation so that each fold has a similar percentage of samples for each class. Using the five training-validation folds, the data was fitted on three supervised learning algorithms:

  1. LightGBM
  2. Catboost
  3. Random Forest
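The split-and-validate procedure above can be sketched as follows. The synthetic data, the ~12% positive rate, and the use of Random Forest as the example estimator are my assumptions for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold, train_test_split

# Synthetic stand-in for the bank dataset (~12% positive class)
X, y = make_classification(n_samples=2000, weights=[0.88], random_state=42)

# 80/20 split, stratified so the held-out test set keeps the class ratio
X_cv, X_test, y_cv, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Stratified 5-fold: each fold preserves the class proportions
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = []
for train_idx, val_idx in skf.split(X_cv, y_cv):
    clf = RandomForestClassifier(n_estimators=50, random_state=42)
    clf.fit(X_cv[train_idx], y_cv[train_idx])
    scores.append(f1_score(y_cv[val_idx], clf.predict(X_cv[val_idx])))
```

The same loop is repeated per algorithm, and the fold scores are averaged to compare models before touching the test set.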

I also tried fitting the data with a simpler model, Logistic Regression, but its performance was not as good as the gradient boosting and bagging algorithms above. This could be due to the categorical nature of my features, which would need to be one-hot encoded, producing a sparse dataset and poorer model performance. Gradient boosting models such as LightGBM and CatBoost, on the other hand, have built-in categorical encoding support and can handle categorical features without one-hot encoding, which improves their performance. LightGBM was eventually chosen over CatBoost because of its lower log loss and higher F1 score. However, the recall score is relatively low across all three models, at about 0.4. This means that, out of every 10 customers who would subscribe, we only engaged around 4.

Test score using various metrics

Can we improve the recall score? Yes! After balancing the dataset by oversampling the people who subscribed, the recall score increased by 40%. In this context, recall rather than precision is the metric of concern, because I want to minimise false negatives (i.e. people who would subscribe but whom we did not engage). Nevertheless, the precision score is decent (i.e. out of 10 people we engaged, 4 subscribed).

Test score from LGBM before and after handling class imbalances
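The oversampling step can be done in several ways (e.g. SMOTE); a minimal sketch of plain random oversampling with scikit-learn’s `resample`, on toy data of my own making:

```python
import numpy as np
from sklearn.utils import resample

# Toy imbalanced set: 3 subscribers ("positives") vs 17 non-subscribers
X = np.arange(20).reshape(-1, 1)
y = np.array([1] * 3 + [0] * 17)

pos, neg = X[y == 1], X[y == 0]
# Randomly resample the minority class with replacement up to the majority count
pos_up = resample(pos, replace=True, n_samples=len(neg), random_state=42)

X_bal = np.vstack([neg, pos_up])
y_bal = np.array([0] * len(neg) + [1] * len(pos_up))
```

Crucially, oversampling is applied only to the training folds, never to the validation or test data, so the reported scores stay honest.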

To best understand how the model works, let’s dive into the features. The plot below shows the relative importance of each feature.

Feature importances

Stronger predictors of customers’ subscription were the month, the customer’s age group, the duration of engagement, the cluster the customer belongs to, and the outcome of the previous campaign. Relatively less important predictors were the mode of contact, whether the customer has a housing loan, the number of days since the customer was last contacted, and the number of engagements performed before this campaign.
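Tree-based models expose this ranking directly via `feature_importances_`. A sketch with Random Forest (one of the three models tried); the feature names and synthetic data here are illustrative, not the real column set:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative names standing in for some of the real columns
names = ["month", "age_group", "duration", "cluster", "poutcome"]
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Importances sum to 1; sorting gives the ranking behind the plot
importances = (pd.Series(clf.feature_importances_, index=names)
                 .sort_values(ascending=False))
```

LightGBM exposes the same attribute through its scikit-learn wrapper, so the plotting code is identical across the three models.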

Conclusion

This model can tell telemarketers which key leads they should focus their marketing efforts on. As seen in the lift curve, by focusing their efforts on the top 20% of scored leads, the proportion of customers who would subscribe to the bank’s term deposit is four times the average.
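Lift at the top 20% is simply the subscription rate among the highest-scored fifth of leads divided by the overall subscription rate. A small helper of my own to make that concrete:

```python
import numpy as np

def lift_at(y_true, y_score, frac=0.2):
    """Lift = positive rate among the top `frac` scored leads / overall rate."""
    y_true = np.asarray(y_true)
    n_top = int(len(y_true) * frac)
    order = np.argsort(y_score)[::-1]          # highest scores first
    top_rate = y_true[order[:n_top]].mean()    # positive rate in top slice
    return top_rate / y_true.mean()

# Perfect ranking, 10% base rate: the top 20% captures all subscribers,
# so the rate there is 50% and the lift is 5x
lift = lift_at([1] * 10 + [0] * 90, list(range(100, 0, -1)))
```

A lift of 4 at 20%, as reported above, means a telemarketer calling only the model’s top fifth of leads reaches four times as many subscribers per call as random calling.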

Building a Flask app and deploying it on Heroku

As a minimum viable product, I have created a Flask app. You can access it here or view it here.

Marketing teams can enter a customer’s attributes in the app. Clicking the ‘submit’ button generates the likelihood of the customer subscribing to the term deposit. Assuming the marketing team’s internal threshold is 60%: with the inputs shown in the video, the model predicted an 80% likelihood of this customer subscribing. Since this is greater than the threshold, the team could decide to invest more time in sharing the benefits of the term deposit with this customer.
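The app’s core logic reduces to a small Flask endpoint. This is a minimal sketch, not the deployed code: the route name, the 60% threshold, and the stand-in `predict_proba` scoring rule are all my assumptions (the real app loads the trained LightGBM model instead):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
THRESHOLD = 0.6  # hypothetical internal threshold from the marketing team

def predict_proba(features):
    # Stand-in for the trained LightGBM model: a made-up scoring rule
    return 0.8 if features.get("poutcome") == "success" else 0.3

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()
    p = predict_proba(features)
    # Return the likelihood and whether it clears the engagement threshold
    return jsonify({"probability": p, "engage": p >= THRESHOLD})
```

The form on the page simply posts its fields to this endpoint and renders the returned probability for the marketer.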

Future Work

Besides demographic and campaign related features, I could

  1. Include other social and economic context features that could affect customers’ decisions to subscribe (e.g. consumer price index, unemployment rate).
  2. Tune the classification threshold based on the business’s needs.
  3. Explore whether an ensemble model would improve the model’s performance.

[This project was done as part of an immersive data science program called Metis. You can find the files for this project at my GitHub and the slides here.]

Feel free to reach out with any questions :)

