The details of how the Spark images are built in different layers can be read in the blog post written by André Perez on Towards Data Science (Medium).
```bash
# Build Spark Images
./build.sh

# Create Network
docker network create kafka-spark-network

# Create Volume
docker volume create --name=hadoop-distributed-file-system

# Start Docker-Compose (within the kafka and spark folders)
docker compose up -d
```
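Once the services are up, a quick way to confirm that the Kafka broker is reachable from the host is a small Python check like the sketch below. This is an illustration only: it assumes the broker advertises a listener on localhost:9092, so adjust the address to whatever your docker-compose file exposes.

```python
# Smoke test: connect to the broker and list existing topics.
# localhost:9092 is an assumption; use the port exposed by your compose file.
from kafka import KafkaConsumer

consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
print("Broker reachable, topics:", consumer.topics())
consumer.close()
```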
For an in-depth explanation of Kafka listeners, see the blog post "Explanation of Kafka Listeners".
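To illustrate why listeners matter, the sketch below shows how the bootstrap servers differ depending on where the client runs: clients on the host use the externally advertised listener, while clients running inside kafka-spark-network use the broker's container hostname and internal listener. The hostnames and ports here are assumptions based on a typical setup; check the advertised listeners in your docker-compose file.

```python
from kafka import KafkaProducer

# From the host machine: use the listener advertised as localhost
# (port 9092 is an assumption; check your compose file).
host_producer = KafkaProducer(bootstrap_servers="localhost:9092")

# From another container on kafka-spark-network: use the broker's container
# hostname and the internal listener port instead, for example:
# in_network_producer = KafkaProducer(bootstrap_servers="broker:29092")
```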
```bash
# Stop Docker-Compose (within the kafka and spark folders)
docker compose down

# Delete all containers
docker rm -f $(docker ps -a -q)

# Delete all volumes
docker volume rm $(docker volume ls -q)
```
# Stream-Processing with Python
In this document, you will find information about stream processing using different Python libraries (kafka-python, confluent-kafka, pyspark, faust).
This Python module can be separated into the following modules:
- Docker: Dockerfiles and docker-compose definitions to run Kafka and Spark in Docker containers. Setting up these services is a prerequisite step for running the following modules.
- Kafka Producer-Consumer Examples: a JSON producer-consumer example using the kafka-python library, and an Avro producer-consumer example using the confluent-kafka library. Both examples require up-and-running Kafka services, so please complete the steps under the docker README first. A hedged Avro sketch follows this list, and a JSON sketch appears after the run commands below.
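As a rough illustration of the Avro flow, here is a minimal producer sketch using confluent-kafka together with a Schema Registry. The topic name, schema, registry URL, and ports are assumptions made for illustration; the actual scripts in the example folder may differ.

```python
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# Hypothetical schema and connection details; adjust to your setup.
SCHEMA_STR = """
{
  "type": "record",
  "name": "ExampleRecord",
  "fields": [
    {"name": "vendor_id", "type": "int"},
    {"name": "passenger_count", "type": "int"}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})
avro_serializer = AvroSerializer(schema_registry, SCHEMA_STR)

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "value.serializer": avro_serializer,
})

# Values are plain dicts; the serializer validates them against the schema
# and registers/looks up that schema in the Schema Registry.
producer.produce(topic="example_avro_topic", value={"vendor_id": 1, "passenger_count": 2})
producer.flush()
```

Compared to the JSON example, the Avro route adds a Schema Registry dependency but enforces a schema on every message.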
To run the producer-consumer examples, run the following commands in the respective example folder:
```bash
python3 producer.py
python3 consumer.py
```
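For orientation, a minimal JSON producer/consumer pair built on kafka-python might look roughly like the sketch below. The topic name, port, and message shape are assumptions; refer to the actual producer.py and consumer.py in the example folder.

```python
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "example_json_topic"  # hypothetical topic name

# producer.py (sketch): serialize dicts to JSON bytes and send them.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, value={"vendor_id": 1, "passenger_count": 2})
producer.flush()

# consumer.py (sketch): read from the beginning and decode JSON values.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(message.value)
```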