Set up a Kafka cluster with Zookeeper to handle streaming data. The pipeline uses a Cassandra database as a sink, but you can easily swap that part out.
git clone https://github.com/n-traore/kafka-docker.git && cd kafka-docker
# tell git to ignore local changes to the tracked .env file
git update-index --assume-unchanged .env
docker run --rm --name cassandra_sink -d cassandra:3.11.5
docker cp cassandra-sink.cql cassandra_sink:/
# wait a few seconds for cassandra to be reachable, then run
docker exec -i cassandra_sink cqlsh -f cassandra-sink.cql
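As a side note, if you'd rather create the schema programmatically than ship the .cql file into the container, a sketch like the one below would do it with the cassandra-driver package. The sink keyspace and spotify_playlist table names match the repo, but the columns shown are purely illustrative guesses; the real definitions live in cassandra-sink.cql. It also assumes port 9042 is reachable from the host, which requires adding -p 9042:9042 to the docker run command above.

```python
# Sketch only: creates a schema shaped like cassandra-sink.cql.
# Keyspace/table names match the repo; the columns are illustrative, not the real ones.
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])  # assumes 9042 was published with -p 9042:9042
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS sink
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS sink.spotify_playlist (
        id uuid PRIMARY KEY,
        track_name text,
        artist text,
        added_at timestamp
    )
""")
cluster.shutdown()
```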
To check that everything went well, you can connect to the container and describe the table like this:
docker exec -it cassandra_sink cqlsh
describe sink.spotify_playlist
Next, choose whether you want to run a single-node or a multi-node cluster.
For a single node, run this command:
# Start a single-node cluster
docker compose -f docker-compose.zookeeperkafka.yml up -d
docker network connect kafka_net cassandra_sink
To check that everything went well:
docker compose -f docker-compose.zookeeperkafka.yml logs broker | grep started
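As an alternative check, you can probe the broker from Python with the kafka-python package. This is a sketch that assumes the compose file publishes the broker on localhost:9092; adjust the address if your .env maps a different port.

```python
# Quick broker reachability check with kafka-python (sketch, assumes localhost:9092).
from kafka import KafkaAdminClient

# Raises NoBrokersAvailable if the broker isn't up yet.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
print("Topics:", admin.list_topics())
admin.close()
```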
For a multi-node cluster, run this command:
# Start a 3-node cluster
docker compose -f docker-compose.zookeeperkafka_multi.yml up -d
docker network connect kafka_net cassandra_sink
Then start the producer and consumer that stream data through the cluster. Both are defined as simple Python apps that you can run with the following command:
docker compose up
This runs in the foreground so you can watch messages arrive on the console every 10 seconds. Stop it with Ctrl+C.
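For reference, the producer side boils down to something like the sketch below. It assumes the kafka-python package and a broker published on localhost:9092; the topic name and payload are illustrative, and the actual apps live in this repo.

```python
# Minimal producer sketch, not the repo's exact code.
# Assumes kafka-python, a broker on localhost:9092, and an illustrative topic name.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

while True:
    # Illustrative payload; the real app streams Spotify playlist records.
    producer.send("spotify_playlist", {"track_name": "example", "artist": "example"})
    producer.flush()
    time.sleep(10)  # matches the 10-second cadence you see on the console
```

The consumer mirrors this with a KafkaConsumer loop that deserializes each message and inserts it into sink.spotify_playlist through the cassandra-driver package.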
You can view the cluster in a UI with Kafdrop by going to http://localhost:9010
Finally, stop and delete all containers. For example, for Cassandra and the multi-node cluster containers:
docker stop cassandra_sink
docker compose -f docker-compose.zookeeperkafka_multi.yml down