Apache Pinot is a distributed OLAP data store built to deliver real-time analytics with low latency. It can ingest from batch data sources (such as Hadoop HDFS, Amazon S3, Azure ADLS, or Google Cloud Storage) as well as stream data sources (such as Apache Kafka). In terms of scale, the largest Pinot production clusters are known to handle more than 1M events/sec and more than 170k queries/sec, with latencies of a few milliseconds.
Superset is fast, lightweight, intuitive, and loaded with options that make it easy for users of all skill sets to explore and visualize their data, from simple line charts to highly detailed geospatial charts.
The Covid Tracker Service performs stream ingestion of registered Covid cases across different states (4 cities each). Approximately 2k events are pushed every second through Kafka into Apache Pinot. Several dashboards are created using Apache Superset to visualize the data:
- Overall Covid Cases Across Cities.
- Covid Tracker State Wise.
- Covid Cases Grouped By Cities In Telangana State.
- Covid Tracker Where direct dependents > 4.
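The dashboards listed above each boil down to a grouped count over the CovidCasesTracker table. The following is a minimal sketch of what those queries could look like. The column names `state`, `city`, and `directDependents` are assumptions made for illustration (the real names live in the schema file under the pinot folder), and grouping the last dashboard by state is also an assumption.

```python
# Sketch only: column names below are hypothetical, not taken from the real schema.
TABLE = "CovidCasesTracker"


def count_query(group_by, where=None, limit=50):
    """Build a grouped COUNT(*) query of the kind a Superset chart would issue."""
    sql = f"SELECT {group_by}, COUNT(*) AS cases FROM {TABLE}"
    if where:
        sql += f" WHERE {where}"
    sql += f" GROUP BY {group_by} ORDER BY COUNT(*) DESC LIMIT {limit}"
    return sql


# One query per dashboard in the list above.
DASHBOARD_QUERIES = {
    "cases_by_city": count_query("city"),
    "cases_by_state": count_query("state"),
    "telangana_by_city": count_query("city", where="state = 'Telangana'"),
    "dependents_over_4": count_query("state", where="directDependents > 4"),
}

for name, sql in DASHBOARD_QUERIES.items():
    print(f"{name}: {sql}")
```

Queries like these are the ones a star-tree index is meant to speed up, since they aggregate over a small set of dimensions.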
- Docker Compose.
- Spring Boot 2.
- Apache Superset (installed locally on macOS; please visit this site for installation steps).
- Java 11.
- Apache Kafka (through docker).
Note: Set the JAVA_HOME environment variable.
$ docker-compose up -d
Wait until everything comes up, then check the container status:
$ docker-compose ps
Name                          Command                          State   Ports
covid-tracker-pinot_pinot_1   ./bin/pinot-admin.sh Quick ...   Up      0.0.0.0:8000->8000/tcp,:::8000->8000/tcp, 8096/tcp, 8097/tcp, 8098/tcp, 8099/tcp, 0.0.0.0:9000->9000/tcp,:::9000->9000/tcp
kafka                         /etc/confluent/docker/run        Up      0.0.0.0:29092->29092/tcp,:::29092->29092/tcp, 0.0.0.0:9092->9092/tcp,:::9092->9092/tcp
zookeeper                     /docker-entrypoint.sh zkSe ...   Up      0.0.0.0:2181->2181/tcp,:::2181->2181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
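A container in the Up state is not necessarily ready to accept requests. As a rough sketch, the wait can be automated by polling the Pinot controller's `/health` endpoint on port 9000. The base URL, attempt count, and delay below are assumptions for a local setup, and the helper names are made up for this example.

```python
import time
import urllib.error
import urllib.request


def controller_is_healthy(base_url="http://localhost:9000", timeout_s=2.0):
    """Return True if the Pinot controller answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready yet.
        return False


def wait_until(check, attempts=30, delay_s=2.0, sleep=time.sleep):
    """Call check() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay_s)
    return False
```

Calling `wait_until(controller_is_healthy)` before creating the table avoids sending the AddTable request to a controller that is still starting.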
The application already contains a pinot folder that holds the schema definition and the index specification. A star-tree index is configured there for pre-aggregation purposes.
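To illustrate what a star-tree specification looks like, here is a sketch that builds the relevant fragment of a Pinot table config and prints it as JSON. The keys follow Pinot's star-tree index configuration, but the dimension names and the chosen aggregation are assumptions; the actual values are in covid-cases-table-definition.json and may differ.

```python
import json

# Hypothetical star-tree definition: pre-aggregate case counts by state, then city.
star_tree_index = {
    "dimensionsSplitOrder": ["state", "city"],  # assumed column names
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": ["COUNT__*"],        # pre-computed aggregation
    "maxLeafRecords": 10000,
}

# Star-tree configs sit under tableIndexConfig in the table definition.
table_config_fragment = {
    "tableIndexConfig": {"starTreeIndexConfigs": [star_tree_index]}
}

print(json.dumps(table_config_fragment, indent=2))
```

With a tree split on state and then city, a query that counts cases grouped by either dimension can be answered from pre-aggregated nodes instead of scanning raw rows.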
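- Create the table in Apache Pinot (use the Pinot container name reported by docker-compose ps; it depends on the project directory name)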
docker exec pinot-covid-tracker-fork_pinot_1 bin/pinot-admin.sh AddTable -tableConfigFile /pinot/covid-cases-table-definition.json -schemaFile /pinot/covid-cases-schema-definition.json -exec
Executing command: AddTable -tableConfigFile /pinot/covid-cases-table-definition.json -schemaFile /pinot/covid-cases-schema-definition.json -controllerHost -controllerPort 9000 -exec
Sending request: to controller: af0a1b06cabb, version: Unknown
{"status":"Table CovidCasesTracker_REALTIME succesfully added"}
Apache Pinot can be accessed on port 9000 at http://localhost:9000/, where you can have a look at the CovidCasesTracker table.
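The same check can be scripted against the controller's REST API, which lists table names under `GET /tables`. The sketch below assumes the controller is reachable at http://localhost:9000; the helper names are made up for this example.

```python
import json
import urllib.request


def list_tables(base_url="http://localhost:9000", timeout_s=5.0):
    """Fetch the table names known to the Pinot controller."""
    with urllib.request.urlopen(f"{base_url}/tables", timeout=timeout_s) as resp:
        payload = json.load(resp)
    return payload.get("tables", [])


def has_table(tables, name):
    """Return True if `name` appears in the controller's table list."""
    return name in tables
```

For example, `has_table(list_tables(), "CovidCasesTracker")` should be true once the AddTable command has succeeded.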
See the Dockerfile in the repository root directory, which builds on the apache/superset:latest image. The Dockerfile installs Superset dependencies, configures the admin login, applies updates, initializes Superset, and sets the entrypoint for the image. The image is then referenced in docker-compose.yaml.
In the Superset UI, select the Data menu > + Database link, then use a SQLAlchemy MSSQL connection string:
mssql+pymssql://superset:<password>@<host>:1433/Assessments-Dev
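Connection strings of this kind are URLs, so special characters in the user name or password need to be percent-encoded. The sketch below builds the MSSQL string shown above from its parts. It also builds a Pinot string in the `pinot://<broker>/query/sql?controller=<controller-url>` form used by the pinotdb driver. The host names, ports, and the sample password are placeholders, and using broker port 8000 and controller port 9000 is an assumption based on the ports published by the Pinot container.

```python
from urllib.parse import quote_plus


def mssql_uri(user, password, host, port, database):
    """Build a mssql+pymssql SQLAlchemy URI with credentials percent-encoded."""
    return (
        f"mssql+pymssql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def pinot_uri(broker="localhost:8000", controller="http://localhost:9000/"):
    """Build a pinotdb-style SQLAlchemy URI (hosts and ports are assumptions)."""
    return f"pinot://{broker}/query/sql?controller={controller}"


# Placeholder credentials and host, for illustration only.
print(mssql_uri("superset", "p@ss/word", "db.example.internal", 1433, "Assessments-Dev"))
print(pinot_uri())
```

When Superset runs in a container, `localhost` refers to the Superset container itself, so the Pinot host would need to be the service name or an address reachable from that container.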
- Debugger for Java
- Extension Pack for Java