IMDB Data Analysis Pipeline

Objective:

The aim of this project is to analyse movie data from multiple sources (IMDb, MovieLens, The Numbers, and Box Office Mojo), covering movies, cast, box office revenue, and movie brands and franchises, and to run the ETL processes with Talend.

Technologies Used:

ER/Studio

SQL Server Developer Edition

Microsoft SQL Server Management Studio

Talend Real-Time Data Platform 7.1

Tableau Desktop

Microsoft Power BI

Dataset Links:

https://datasets.imdbws.com/

https://www.boxofficemojo.com/franchise/?ref_=bo_nb_fr_secondarytab

https://www.boxofficemojo.com/brand/?ref_=bo_nb_frs_secondarytab

https://grouplens.org/datasets/movielens/25m/

https://www.the-numbers.com/movies/franchises

https://www.the-numbers.com/movies/franchise/Marvel-Cinematic-Universe#tab=summary

https://www.the-numbers.com/movie/Avengers-The-(2012)#tab=box-office
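Before loading these files into staging tables, it can help to check how they parse. The following is a minimal Python sketch, not part of this repository: it assumes the IMDb files are gzipped, tab-separated, UTF-8 text that mark missing values with a literal `\N`, and that the MovieLens `links.csv` file stores the numeric part of the IMDb ID without the `tt` prefix. The function names and the sample row are illustrative only.

```python
import csv
import gzip
import io

# Assumption: IMDb dataset files use a literal \N for missing values.
IMDB_NULL = r"\N"


def read_imdb_tsv(lines):
    # Yield one dict per row, replacing the \N placeholder with None.
    # QUOTE_NONE keeps quote characters inside titles as plain text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {key: (None if value == IMDB_NULL else value)
               for key, value in row.items()}


def open_imdb_gz(path):
    # Open a downloaded .tsv.gz file as UTF-8 text for read_imdb_tsv.
    return gzip.open(path, mode="rt", encoding="utf-8", newline="")


def movielens_imdb_id_to_tconst(imdb_id):
    # Assumption: MovieLens stores only the digits; IMDb keys look like
    # "tt" followed by the digits, zero-padded to at least 7 places.
    return "tt" + str(imdb_id).zfill(7)


# Illustrative in-memory sample in the shape of a title.basics row.
sample = io.StringIO(
    "tconst\ttitleType\tprimaryTitle\tstartYear\tendYear\n"
    "tt0848228\tmovie\tThe Avengers\t2012\t\\N\n"
)
rows = list(read_imdb_tsv(sample))
```

For a real file, pass the result of `open_imdb_gz("title.basics.tsv.gz")` to `read_imdb_tsv` instead of the in-memory sample.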

Code Walkthrough:

Step 1: Run the following scripts in SSMS to set up the staging database:

The Number - stage tables.sql

stg imdb tables - core tables.sql

stg imdb tables expanded part 2.sql

stg_ml_tables.sql
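As an alternative to opening each script in SSMS, the scripts could be run from the command line with the `sqlcmd` utility. The sketch below is a hedged example, not part of this repository: it assumes `sqlcmd` is installed and on the PATH, that Windows authentication (`-E`) is in use, and that the target database already exists (drop `-d` if the scripts create the database themselves). The server and database names in the usage note are hypothetical, and the scripts are run in the order listed above.

```python
import subprocess

# Script names as listed in Step 1, in the order given there.
STAGING_SCRIPTS = [
    "The Number - stage tables.sql",
    "stg imdb tables - core tables.sql",
    "stg imdb tables expanded part 2.sql",
    "stg_ml_tables.sql",
]


def build_sqlcmd_command(server, database, script):
    # -S server, -d database, -E trusted connection,
    # -b stop with a non-zero exit code on error, -i input script file.
    return ["sqlcmd", "-S", server, "-d", database, "-E", "-b", "-i", script]


def run_staging_scripts(server, database, scripts=STAGING_SCRIPTS):
    # Run each script in turn; check=True raises if sqlcmd reports failure.
    for script in scripts:
        subprocess.run(build_sqlcmd_command(server, database, script), check=True)
```

A call such as `run_staging_scripts("localhost", "imdb_stage")` would then execute all four scripts against a hypothetical local instance and database.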

Step 2: Open Talend and set up your database connections and input file connections. Once the connections succeed, run the jobs.

Step 3: Build the visualizations in Tableau and Power BI. Refer to the Tableau workbook for the current visualizations; new use cases will be added soon. A Microsoft Power BI file is also to be added soon.

References:

https://elearning.tableau.com/

https://help.talend.com/reader/KxVIhxtXBBFymmkkWJ~O4Q/8RlpZdAdKhP0IaMHXRV7yw

https://www.talend.com/

https://grouplens.org/datasets/movielens/