This repository was created to study and explore Apache Spark, mainly how to build a cluster and integrate it with an application such as a notebook, for example JupyterLab. Docker containers and Docker Compose provide the infrastructure for this project.
The repository uses Spark 3.2.1, pre-built for Hadoop 3.2.
Download here: Apache Spark.
The dataset used for this project is Ebola Cases, obtained from Data World.
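As a rough illustration of the notebook integration described above, a JupyterLab cell could connect to the cluster and load the dataset along these lines. This is only a sketch under assumptions: the helper names `spark_master_url` and `load_csv`, the application name, and any host name or file path you pass in are hypothetical and not taken from this repository. Port 7077 is the default for a Spark standalone master.

```python
def spark_master_url(host: str, port: int = 7077) -> str:
    """Build a standalone-mode master URL such as spark://host:7077."""
    return f"spark://{host}:{port}"


def load_csv(master_url: str, csv_path: str):
    """Connect to the cluster and read a CSV file into a DataFrame."""
    # Imported here so the helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url)
        .appName("ebola-cases")  # hypothetical application name
        .getOrCreate()
    )
    # header=True uses the first row as column names;
    # inferSchema=True asks Spark to guess the column types.
    return spark.read.csv(csv_path, header=True, inferSchema=True)
```

In a notebook, something like `load_csv(spark_master_url("spark-master"), "data/ebola.csv")` would return a DataFrame, where `spark-master` and `data/ebola.csv` stand in for whatever service name and path your Compose setup actually uses.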
These are my references:
- Apache Spark Cluster on Docker (ft. a JupyterLab Interface) by André Perez.
- Apache Spark documentation.
- Formación Apache Spark by Albert Coronado Calzada.