Guilherme-Esplugues/spark-cluster-with-jupyterlab-on-docker

Spark cluster with JupyterLab on Docker

This repository was created to study and explore Apache Spark, mainly how to build a cluster and integrate it with a notebook application such as JupyterLab. Docker containers and Docker Compose provide the infrastructure for the project.
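As a rough illustration of the notebook-to-cluster integration described above, the sketch below shows how a notebook cell might connect to a standalone Spark master with PySpark. The hostname `spark-master` and the application name `notebook-demo` are assumptions, not values taken from this repository; 7077 is the default port of a Spark standalone master.

```python
from typing import Optional


def master_url(host: str = "spark-master", port: int = 7077) -> str:
    """Build a Spark standalone master URL.

    "spark-master" is a hypothetical container hostname; 7077 is the
    default port of a standalone master.
    """
    return f"spark://{host}:{port}"


def build_session(app_name: str = "notebook-demo", master: Optional[str] = None):
    """Create (or reuse) a SparkSession attached to the cluster."""
    # Imported lazily so the URL helper above works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master or master_url())
        .getOrCreate()
    )
```

In a notebook, `spark = build_session()` followed by `spark.range(5).show()` would be a quick way to check that the workers respond, assuming the master is reachable under that hostname.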

Apache Spark

This repository uses Spark 3.2.1, pre-built for Hadoop 3.2.

Download here: Apache Spark.
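For reference, Spark binary archives are named `spark-<version>-bin-hadoop<version>.tgz`, so the file for this build can be derived from the two version numbers. The sketch below assumes the archive is fetched from `archive.apache.org`; the repository itself may use a different mirror or ship the file directly.

```python
SPARK_VERSION = "3.2.1"
HADOOP_VERSION = "3.2"


def spark_archive_name(spark_version: str = SPARK_VERSION,
                       hadoop_version: str = HADOOP_VERSION) -> str:
    """Return the file name of the pre-built Spark archive."""
    return f"spark-{spark_version}-bin-hadoop{hadoop_version}.tgz"


def spark_archive_url(spark_version: str = SPARK_VERSION,
                      hadoop_version: str = HADOOP_VERSION) -> str:
    """Return a download URL on the Apache archive (assumed host)."""
    name = spark_archive_name(spark_version, hadoop_version)
    return f"https://archive.apache.org/dist/spark/spark-{spark_version}/{name}"


if __name__ == "__main__":
    print(spark_archive_url())
```

Keeping the versions in two constants makes it easier to keep the Docker images and the notebook environment on the same Spark release.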

Dataset sample - Ebola cases

The dataset used for this project is Ebola Cases, obtained from Data World.
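A minimal sketch of how such a dataset might be loaded into a Spark DataFrame, assuming it is stored as a CSV file with a header row. The path `data/ebola_cases.csv` is hypothetical, and the column names are not assumed here.

```python
def load_cases(spark, path: str = "data/ebola_cases.csv"):
    """Read a CSV file into a Spark DataFrame.

    `spark` is an existing SparkSession; `path` is a hypothetical
    location that every worker must be able to read.
    """
    # header=True takes column names from the first row;
    # inferSchema=True makes Spark scan the data to guess column types.
    return spark.read.csv(path, header=True, inferSchema=True)
```

Calling `load_cases(spark).printSchema()` in a notebook would then show the inferred columns. On a cluster, the file needs to sit on storage shared by all containers, for example a mounted Docker volume.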

Reference

These are my references:

  1. Apache Spark Cluster on Docker (ft. a JupyterLab Interface) by André Perez.
  2. Apache Spark documentation.
  3. Formación Apache Spark by Albert Coronado Calzada.
