Arvato Customer Segmentation Project

Table of Contents

  1. Dependencies
  2. Description
  3. Data files
  4. Project Motivation
  5. File Description
  6. Results
  7. Licensing, Authors, Acknowledgements
  8. References

Dependencies

Description

This project is part of the Data Science Nanodegree Program by Udacity, in collaboration with Arvato Bertelsmann.

The project is divided into the following sections:

  1. Customer Segmentation Report: In this section, unsupervised learning techniques are used to identify characteristics that distinguish the company's existing customers from the general population of Germany.

  2. Supervised Learning Model: In this section, a supervised learning model is used with the mailout_train and mailout_test datasets to predict which individuals are most likely to respond to a mailout campaign.

  3. Kaggle Competition: After the best model is chosen, its predictions are submitted to the Kaggle competition.
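The comparison in section 1 can be sketched as follows. This is a minimal illustration, assuming every person has already been assigned a cluster label by some clustering step (this README does not name the algorithm). The function name `cluster_representation` and the toy labels are hypothetical and are not taken from the notebooks.

```python
from collections import Counter


def cluster_representation(population_labels, customer_labels):
    """Return, per cluster, the customer share divided by the population share.

    A ratio above 1 means the cluster is over-represented among customers;
    a ratio below 1 means it is under-represented.
    """
    pop_counts = Counter(population_labels)
    cust_counts = Counter(customer_labels)
    n_pop = len(population_labels)
    n_cust = len(customer_labels)

    ratios = {}
    for cluster, pop_count in pop_counts.items():
        pop_share = pop_count / n_pop
        cust_share = cust_counts.get(cluster, 0) / n_cust
        ratios[cluster] = cust_share / pop_share
    return ratios


# Toy labels, purely illustrative (not taken from the Arvato data).
population = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customers = [0, 1, 1, 1, 1]
ratios = cluster_representation(population, customers)
print(ratios)
```

Clusters with a high ratio are the ones whose characteristics are worth inspecting when describing the customer segment.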

Data files

  • azdias: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features.

  • customers: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features.

  • mailout_train: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 features, including each person's response.

  • mailout_test: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 features.

Two further files describe the attributes and their values. The main dataset files are not included in this repository because the data is private to Arvato.
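The feature counts listed above differ between files (366 for azdias, 369 for customers), so the columns need to be aligned before the same cleaning steps can be applied to both. A minimal sketch of that check follows. The helper `split_columns` and all column names are hypothetical placeholders, since the real column names are not given in this README.

```python
def split_columns(reference_columns, other_columns):
    """Split other_columns into those shared with the reference and the extras.

    The original order of other_columns is preserved in both results.
    """
    reference = set(reference_columns)
    shared = [c for c in other_columns if c in reference]
    extra = [c for c in other_columns if c not in reference]
    return shared, extra


# Hypothetical column names, for illustration only.
azdias_columns = ["ID", "FEATURE_A", "FEATURE_B"]
customers_columns = ["ID", "FEATURE_A", "FEATURE_B", "EXTRA_1", "EXTRA_2", "EXTRA_3"]

shared, extra = split_columns(azdias_columns, customers_columns)
print(shared)  # columns that both datasets can be cleaned on
print(extra)   # columns present only in the second dataset
```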

Project Motivation

The main goal of this project is to characterize the customer segment of the population and to build a model that predicts likely customers for Arvato Financial Solutions.

File Description

There are two main notebooks:

Arvato Project Customer Segmentation Report.ipynb: Contains the data analysis and the unsupervised learning techniques used to compare the general population with the company's customers.

Arvato Project ML prediction.ipynb: Contains the supervised learning techniques used to predict which individuals are most likely to respond to a mailout campaign.

And two Python files:

cleaning.py: Contains the data preprocessing and cleaning functions for the azdias and customers datasets, and the unsupervised learning function.

ml.py: Contains the data preprocessing and cleaning functions for the mailout_train and mailout_test datasets, and the model evaluation functions.
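This README does not state which metric the model evaluation functions in ml.py use. The ROC references listed at the end suggest ROC AUC, so the sketch below assumes that metric. The function `roc_auc` is a hypothetical illustration, not code from ml.py. It computes AUC directly from its definition: the probability that a randomly chosen responder receives a higher score than a randomly chosen non-responder.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and scores, purely illustrative.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, y_score)
print(auc)
```

The pairwise loop is quadratic in the number of rows, so it is only meant to show the idea. On the full datasets, `sklearn.metrics.roc_auc_score(y_true, y_score)` would be the practical choice.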

Results

The main findings can be found in the Customer Segmentation Report available here.

Licensing, Authors, Acknowledgements

References

https://www.dataschool.io/roc-curves-and-auc-explained/

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

https://www.kaggle.com/alexisbcook/titanic-tutorial