Keywords: NLP, Tfidf, Pipeline, GridSearch, Multi-label Classification, Flask, Plotly
In this project, data engineering and machine learning skills were applied to disaster data from Figure Eight to build a model for an API that classifies disaster messages.
In the project repo, messages.csv contains real messages that were sent during disaster events, while categories.csv contains the category information for those messages. A single message can belong to more than one category (multi-label classification). The real-world goal of this project is a machine learning pipeline that categorizes these events so that each message can be routed to the appropriate disaster relief agency.
A web app was created, where a new message can be classified into several categories. The web app also displays three visualizations of the data.
There are three components to this project:

- ETL Pipeline: In the preparation phase, the data were processed in a Jupyter Notebook; refer to `ETL Pipeline Preparation.ipynb` for details. A data cleaning pipeline, `process_data.py` (in the Workspace/Data folder), was then written as a Python script that:
  - Loads the messages and categories datasets
  - Merges the two datasets
  - Cleans the data
  - Stores it in a SQLite database
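A minimal sketch of those four steps is shown below. The `id` and `categories` column names and the `related-1;request-0;...` label encoding are assumptions about the Figure Eight CSV layout; the actual `process_data.py` may differ.

```python
import sqlite3

import pandas as pd


def load_and_clean(messages: pd.DataFrame, categories: pd.DataFrame) -> pd.DataFrame:
    """Merge the two datasets and expand the label string into binary columns."""
    df = messages.merge(categories, on="id")             # merge on the shared id
    cats = df["categories"].str.split(";", expand=True)  # one column per label
    cats.columns = [v.split("-")[0] for v in cats.iloc[0]]  # "related-1" -> "related"
    for col in cats.columns:
        cats[col] = cats[col].str[-1].astype(int)        # keep the trailing 0/1 flag
    df = pd.concat([df.drop(columns="categories"), cats], axis=1)
    return df.drop_duplicates()                          # clean exact duplicates


def save(df: pd.DataFrame, db_path: str, table: str = "messages") -> None:
    """Store the cleaned frame in a SQLite database."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql(table, conn, index=False, if_exists="replace")
```

Each raw label looks like `related-1`, so the column name comes from the text before the hyphen and the binary value from the last character.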
- Machine Learning Pipeline: In the preparation phase, the data were processed in a Jupyter Notebook; refer to `ML Pipeline Preparation.ipynb` for details. A machine learning pipeline, `train_classifier.py` (in the Workspace/Model folder), was then written as a Python script that:
  - Loads data from the SQLite database
  - Splits the dataset into training and test sets
  - Builds a text processing and machine learning pipeline
  - Trains and tunes a model using GridSearchCV
  - Outputs results on the test set
  - Exports the final model as a pickle file
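The pipeline-building and export steps can be sketched as follows. The estimator choice and the parameter grid are illustrative assumptions, not the tuned settings in `train_classifier.py`.

```python
import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def build_model() -> GridSearchCV:
    """Text processing + multi-label classification pipeline, tuned with GridSearchCV."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),        # tokenize and TF-IDF-weight the messages
        ("clf", MultiOutputClassifier(       # one binary classifier per category
            RandomForestClassifier(n_estimators=10, random_state=0))),
    ])
    params = {"tfidf__ngram_range": [(1, 1), (1, 2)]}  # illustrative grid only
    return GridSearchCV(pipeline, params, cv=2)


def export_model(model, path: str) -> None:
    """Export the fitted model as a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

`MultiOutputClassifier` wraps the base estimator so each category column gets its own classifier, which is what makes the multi-label setup work with a single pipeline.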
- Flask Web App: Here's the file structure of the project:
  - app
    - template
      - master.html # main page of web app
      - go.html # classification result page of web app
    - run.py # Flask file that runs the app
  - data
    - disaster_categories.csv # data to process
    - disaster_messages.csv # data to process
    - process_data.py # data processing module
    - InsertDatabaseName.db # database to save clean data to
  - models
    - train_classifier.py # model training module
    - classifier.pkl # saved model
  - README.md
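Given that structure, the project would typically be run in three steps. The exact command-line arguments below are assumptions; check each script's usage message before running.

```shell
# 1. Clean the data and store it in SQLite (argument order assumed)
python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/InsertDatabaseName.db

# 2. Train and tune the model, then export it as a pickle file
python models/train_classifier.py data/InsertDatabaseName.db models/classifier.pkl

# 3. Launch the Flask web app
python app/run.py
```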