Customer Churn Prediction

This Jupyter notebook evaluates a customer churn dataset, the triggers of churn, and the main features that influence customer churn at a telecommunications provider.

1. Installation

The project uses Python 3.8.5, along with the following modules and functions:

- sklearn.preprocessing > MinMaxScaler
- sklearn.linear_model > LogisticRegression
- sklearn.model_selection > train_test_split
- sklearn.metrics > confusion_matrix
- seaborn
- awswrangler

The dataset for this project can be downloaded after installing the Kaggle API (https://github.com/Kaggle/kaggle-api). To download the competition data, run the following bash command: "kaggle competitions download -c customer-churn-prediction-2020". The link to this Kaggle competition, with the data description, can be found below in Acknowledgments.
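Once the download finishes, the training file can be read straight from the archive. The following is a minimal sketch, not the notebook's actual loading code: the archive name, the member name "train.csv", and the helper `load_training_data` are assumptions made for illustration, and the demo builds a tiny stand-in archive with made-up rows so that it runs without the real data.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd


def load_training_data(archive_path, member="train.csv"):
    """Read one CSV member of the downloaded archive into a DataFrame.

    The member name "train.csv" is an assumption, not confirmed by this README.
    """
    with zipfile.ZipFile(archive_path) as archive:
        with archive.open(member) as handle:
            return pd.read_csv(handle)


# Demo on a tiny stand-in archive so the sketch runs without the real data.
# The column names and values below are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_archive = Path(tmp) / "customer-churn-prediction-2020.zip"
    with zipfile.ZipFile(demo_archive, "w") as archive:
        archive.writestr("train.csv", "state,churn\nOH,no\nNJ,yes\nOH,no\n")
    demo_df = load_training_data(demo_archive)

print(demo_df.shape)
```

With the real archive, the same helper would be called with the path of the file written by the Kaggle command.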

2. Project Motivation

This project is part of the selection process for a data scientist position at Clin/Devexo. The challenge was to solve a Kaggle competition from 2020. The goal was to compute and visualize statistics and inferences from the customer churn dataset of a telecom provider in the USA.

4. How To Interact With the Project

Methodology - CRISP-DM: The following steps were used to address the churn problem in the Kaggle dataset proposed by Clin/Devexo for its selection process.

1. Business Understanding:
    Why do customers of this telecom provider churn?
    Does the customer's state influence churn?
    Which features influence churn the most?
2. Data Understanding:
    Customer churn data from a USA telecom provider.
3. Data Preparation:
    Evaluation of which features to drop, add, or transform so the data can answer the questions above.

4. Prediction Model
    A logistic regression model was used. This part is still being implemented.

5. Model Evaluation
    This part is still being implemented.
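Steps 3 to 5 can be sketched with the scikit-learn functions listed under Installation. This is a minimal illustration under stated assumptions, not the notebook's implementation: the data are synthetic (random numeric features with a minority churn class), and the split ratio, random seeds, and `max_iter` value are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 500 customers, 4 numeric features.
# Churn probability rises with the first feature (an arbitrary choice),
# and churners are the minority class.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
logits = 2.0 * X[:, 0] - 2.0
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Hold out 25% of the rows, keeping the churn ratio equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Fit the scaler on the training rows only, then reuse it on the test rows
# so that no information from the test set leaks into preprocessing.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# Rows are true classes (0 = stayed, 1 = churned), columns are predictions.
cm = confusion_matrix(y_test, model.predict(X_test_scaled), labels=[0, 1])
print(cm)
```

Reading the confusion matrix row by row shows how many churners the model misses, which is usually more informative than overall accuracy when churners are rare.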

5. Results

The results obtained so far suggest that the imbalanced training dataset probably led to an overfitted model. Future work will balance the data to improve model building.
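One common way to handle class imbalance is to reweight the classes during training. The sketch below is only one possible option and not the approach chosen in this project, which is still open: it uses synthetic data and compares recall on the churn class for a plain logistic regression and one fitted with `class_weight="balanced"`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: churners (label 1) are a small minority.
rng = np.random.default_rng(1)
n_rows = 2000
X = rng.normal(size=(n_rows, 3))
logits = 1.5 * X[:, 0] - 3.0
y = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Stratify so train and test keep the same share of churners.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Plain model: every training row has the same weight.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Reweighted model: rows are weighted inversely to their class frequency,
# so mistakes on the rare churn class cost more during fitting.
balanced = LogisticRegression(max_iter=1000, class_weight="balanced")
balanced.fit(X_train, y_train)

recall_plain = recall_score(y_test, plain.predict(X_test))
recall_balanced = recall_score(y_test, balanced.predict(X_test))
print(f"recall (plain):    {recall_plain:.2f}")
print(f"recall (balanced): {recall_balanced:.2f}")
```

Reweighting typically raises recall on the minority class at the cost of more false positives, so it should be judged together with precision or the full confusion matrix.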

6. Licensing, Authors, Acknowledgments

I would like to thank Kaggle for providing the dataset used in this project, which was originally published in a Kaggle 2020 competition (customer-churn-prediction-2020). I would also like to thank Clin/Devexo for this challenging opportunity.

natalia-kitaoka/churn-prediction
