A Golang application that tracks all data changes and stores them in Kafka.
- For Linux:
  GOOS=linux GOARCH=amd64 CGO_ENABLED=0 go build -o main
- For Windows (PowerShell, cross-compiling a Linux binary):
  $env:GOOS="linux"; $env:GOARCH="amd64"; $env:CGO_ENABLED=0; go build -o main
Run `docker-compose up`, then go to: http://localhost:9000
Binlog Storage:
- All data changes are recorded in the binlog file. Each entry has a log position, which is used to track changes. If the application or server crashes, we can resume from the next log position.
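A minimal sketch of resuming from a saved position with canal (the library this project currently uses); the connection settings and position values here are placeholders, not the project's actual configuration:

```go
package main

import (
	"log"

	"github.com/go-mysql-org/go-mysql/canal"
	"github.com/go-mysql-org/go-mysql/mysql"
)

func main() {
	// Placeholder connection settings.
	cfg := canal.NewDefaultConfig()
	cfg.Addr = "127.0.0.1:3306"
	cfg.User = "root"

	c, err := canal.NewCanal(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Resume streaming from the last recorded log position instead of
	// the beginning of the binlog, so a crash does not lose our place.
	pos := mysql.Position{Name: "mysql-bin.000003", Pos: 4}
	if err := c.RunFrom(pos); err != nil {
		log.Fatal(err)
	}
}
```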
Event Data and Commit Log Topics:
- We use separate topics for event data and commit logs. The event data topic contains the actual changes, while the commit log topic tracks which messages have been committed.
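For illustration, the two topics and the event shape might look like the following sketch; the topic names and the `ChangeEvent` type are assumptions, not the project's actual definitions:

```go
package tracker

// Assumed topic names; the project's actual names may differ.
const (
	eventDataTopic = "data-changes" // carries the actual change events
	commitLogTopic = "commit-log"   // carries log positions confirmed written
)

// A change event carries its binlog position so that the commit log can
// refer back to it once the event is confirmed in Kafka.
type ChangeEvent struct {
	LogFile string `json:"log_file"`
	LogPos  uint32 `json:"log_pos"`
	Payload []byte `json:"payload"`
}
```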
Message Handling:
- Each data change event is pushed into a `waiting queue` and sent to a `pending channel`.
- A goroutine processes messages from the `pending channel` every 0.1 seconds. Upon successfully sending a message to Kafka, a callback triggers an update in the `commit log hashmap` (using go-cache in this project).
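A minimal sketch of this flow, assuming sarama as the Kafka client and go-cache as the `commit log hashmap`; the function, channel, and topic names are illustrative, not the project's identifiers:

```go
package tracker

import (
	"fmt"
	"time"

	"github.com/IBM/sarama"
	gocache "github.com/patrickmn/go-cache"
)

// processPending drains the pending channel every 0.1 seconds and hands
// events to an async Kafka producer. A separate goroutine consumes the
// producer's success callbacks and marks each position as committed in
// the commit log hashmap.
func processPending(producer sarama.AsyncProducer, pending <-chan uint32,
	commitLog *gocache.Cache) {
	// Callback path: requires config.Producer.Return.Successes = true.
	go func() {
		for msg := range producer.Successes() {
			pos := msg.Metadata.(uint32) // binlog position attached on send
			commitLog.Set(fmt.Sprint(pos), true, gocache.NoExpiration)
		}
	}()

	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
	drain:
		for {
			select {
			case pos := <-pending:
				producer.Input() <- &sarama.ProducerMessage{
					Topic:    "data-changes", // assumed topic name
					Value:    sarama.StringEncoder(fmt.Sprintf("event@%d", pos)),
					Metadata: pos,
				}
			default:
				break drain
			}
		}
	}
}
```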
Watcher Goroutine:
- A background goroutine (the watcher) tracks which data has been successfully stored in Kafka by monitoring the `commit log hashmap`.
- The watcher compares the first log position in the `waiting queue` with the commit log. If the first log position has been written, it releases this position from both the `commit log hashmap` and the `waiting queue`.
- Released log positions are written into the commit log topic.
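Under the same assumptions (a go-cache `commit log hashmap` and a sarama producer), the watcher loop might look roughly like this; the queue type and locking details are simplified for brevity:

```go
package tracker

import (
	"container/list"
	"fmt"
	"time"

	"github.com/IBM/sarama"
	gocache "github.com/patrickmn/go-cache"
)

// watch releases positions from the head of the waiting queue once they
// appear in the commit log hashmap, then records them in the commit log
// topic. Synchronization around the queue is omitted for brevity.
func watch(waitingQueue *list.List, commitLog *gocache.Cache,
	producer sarama.AsyncProducer) {
	for {
		front := waitingQueue.Front()
		if front == nil {
			time.Sleep(100 * time.Millisecond)
			continue
		}
		pos := front.Value.(uint32)

		// Only ever release the head of the queue: this is what keeps
		// the commit log topic strictly ordered.
		key := fmt.Sprint(pos)
		if _, committed := commitLog.Get(key); !committed {
			time.Sleep(100 * time.Millisecond)
			continue
		}
		commitLog.Delete(key)
		waitingQueue.Remove(front)

		// Record the released position so a restart can resume from it.
		producer.Input() <- &sarama.ProducerMessage{
			Topic: "commit-log", // assumed topic name
			Value: sarama.StringEncoder(key),
		}
	}
}
```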
Maintaining Order:
- We need both the `commit log hashmap` and the `waiting queue` to maintain message order when producing to Kafka. This ensures that messages are processed in the correct sequence.

Handling Failures:
- Without maintaining order, a failure could result in skipped events. For example, if messages 3 and 4 are sent to Kafka and only message 4 succeeds, a crash could lead to message 3 being missed upon restart: the system would only see message 4 in the commit log topic and would not resend message 3.

Handling Duplicates:
- The primary concern with this approach is handling duplicate events. If an event is processed more than once, it can be skipped safely, as duplicates do not cause errors in the system.
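This works because positions are released strictly in order: everything at or below the last entry in the commit log topic has definitely reached Kafka. A sketch of the duplicate check, where `lastCommitted` would come from reading the final entry of the commit log topic on startup (a helper this sketch assumes but does not show):

```go
package tracker

// shouldSkip reports whether a re-delivered event can be safely dropped.
// Comparison is simplified to a single binlog file's offset; a real check
// would also compare the binlog file name.
func shouldSkip(eventPos, lastCommitted uint32) bool {
	// The watcher releases positions strictly in order, so everything at
	// or below the last committed position has already reached Kafka.
	return eventPos <= lastCommitted
}
```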
TODO:
- Get the last offset from Kafka after a restart
- Write a new binlog reader (currently this project uses canal)