In the second phase, when RocksDB starts writing the cq offset (RocksDBConsumeQueueStore#bufferDRList), there are scenarios where cq data held in memory has not yet been fully persisted. After a reboot, the original file-based cq is intact, but part of the cq in RocksDB is lost. How is transactional consistency between the original cq and the cq in RocksDB ensured?
Summary
Support upgrading CQ files and Offset from the JSON version to the RocksDB version
Motivation
Our million-queue support introduced RocksDB to store the CQ and Offset, but there is currently no way to upgrade in place from the JSON version to the RocksDB version. This proposal aims to close that gap.
Describe the Solution You'd Like
For the Offset, the broker converts it to the RocksDB format at startup when transferOffsetJsonToRocksdb is enabled. For the ConsumeQueue (CQ), we provide a dual-write mode that covers both CQ formats. It is controlled by the rocksdbCQWriteEnable switch, which determines whether both formats of the CQ are dispatched at the same time. We also provide a tool that reports dual-write progress, to help pick an appropriate time to switch traffic.
The scheme provides three switches in total: transferMetadataJsonToRocksdb, transferOffsetJsonToRocksdb, and rocksdbCQWriteEnable. Combined with the existing storeType setting, they allow an in-place upgrade from the file version to RocksDB.
The entire process is divided into three stages:
Stage 1: read and write only the file-version CQ
storeType=default
rocksdbCQDoubleWriteEnable=false
Stage 2: dual write to RocksDB, read only from the file version
storeType=default
rocksdbCQDoubleWriteEnable=true
Stage 3: read and write only the RocksDB-version CQ, converting Metadata and Offset at the same time
storeType=defaultRocksdb
rocksdbCQDoubleWriteEnable=false
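The three stages can be read as a small state model. The sketch below is illustrative only: the enum UpgradeStage and all of its methods are hypothetical names, not RocketMQ classes. Only the storeType values and the rocksdbCQDoubleWriteEnable key come from the stage list (the surrounding prose spells the same switch rocksdbCQWriteEnable).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of the three upgrade stages; not part of RocketMQ. */
enum UpgradeStage {
    FILE_ONLY("default", false),
    DUAL_WRITE("default", true),
    ROCKSDB_ONLY("defaultRocksdb", false);

    final String storeType;
    final boolean doubleWrite;

    UpgradeStage(String storeType, boolean doubleWrite) {
        this.storeType = storeType;
        this.doubleWrite = doubleWrite;
    }

    /** The CQ format that serves reads in this stage. */
    String readsFrom() {
        return storeType.equals("defaultRocksdb") ? "rocksdb" : "file";
    }

    /** True when the file-version CQ receives dispatched entries. */
    boolean writesFile() {
        return storeType.equals("default");
    }

    /** True when the RocksDB-version CQ receives dispatched entries. */
    boolean writesRocksdb() {
        return doubleWrite || storeType.equals("defaultRocksdb");
    }

    /** The two settings for this stage, keyed as in the stage list. */
    Map<String, String> settings() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("storeType", storeType);
        m.put("rocksdbCQDoubleWriteEnable", String.valueOf(doubleWrite));
        return m;
    }

    /** Rolling back means returning to the previous stage (Stage 1 stays put). */
    UpgradeStage previous() {
        return this == FILE_ONLY ? this : values()[ordinal() - 1];
    }
}
```

Modelled this way, Stage 2 is the only stage in which both formats are written, and reads still come from the file version, which is why stepping back to Stage 1 needs no data conversion.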
In Stage 2, the only change is that one additional index is dispatched. This consumes some extra storage space but has no other impact, and it is easy to roll back if any issue arises.
In cqUnit, the position is specified at the time of message dispatch. The newly added index therefore starts at the current position rather than at zero, so its positions stay continuous with the existing CQ. After a period of dual writing, the tails of the two cq formats are guaranteed to align.
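To make the continuity argument concrete, here is a minimal sketch of a dual-write dispatcher. It rests on simplifying assumptions: SketchConsumeQueue and DualWriteDispatcher are hypothetical classes, a CQ is reduced to a sorted map from queue offset to commit log position, and the offset counter lives inside the dispatcher. None of this mirrors the real RocketMQ implementation.

```java
import java.util.TreeMap;

/** Hypothetical stand-in for one queue's CQ: queue offset -> commit log position. */
class SketchConsumeQueue {
    final TreeMap<Long, Long> units = new TreeMap<>();

    void put(long queueOffset, long commitLogPos) {
        units.put(queueOffset, commitLogPos);
    }

    /** Smallest stored queue offset, or -1 when the CQ is empty. */
    long minOffset() {
        return units.isEmpty() ? -1L : units.firstKey();
    }

    /** Largest stored queue offset, or -1 when the CQ is empty. */
    long maxOffset() {
        return units.isEmpty() ? -1L : units.lastKey();
    }
}

/** Hypothetical dispatcher that writes the file CQ and, optionally, the RocksDB CQ. */
class DualWriteDispatcher {
    final SketchConsumeQueue fileCq = new SketchConsumeQueue();
    final SketchConsumeQueue rocksdbCq = new SketchConsumeQueue();
    boolean doubleWriteEnabled = false;
    private long nextQueueOffset = 0;

    /** The queue offset is fixed once per message and shared by both formats. */
    void dispatch(long commitLogPos) {
        long queueOffset = nextQueueOffset++;
        fileCq.put(queueOffset, commitLogPos);
        if (doubleWriteEnabled) {
            rocksdbCq.put(queueOffset, commitLogPos);
        }
    }

    /** True once the RocksDB CQ has data and both tails end at the same offset. */
    boolean tailsAligned() {
        return !rocksdbCq.units.isEmpty() && fileCq.maxOffset() == rocksdbCq.maxOffset();
    }
}
```

If dual writing is switched on after 100 messages have been dispatched, the RocksDB CQ in this sketch begins at offset 100 rather than 0, and from then on both tails advance together.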
In addition, we provide two admin tools: DiffConsumeQueueCommand (referred to as CheckRocksdbCqWriteProgressCommand in the Chinese version of this text) and ExportMetadataInRocksDBCommand. The former checks the dual-write progress of the consume queue. The latter exports metadata from RocksDB into JSON format, mainly to fill in metadata missing from the JSON files in rollback scenarios.
When the topic parameter is included, the tool displays the specific details of each queue. When it is not included, the tool only shows whether the check succeeded.
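The kind of comparison such a progress check might perform can be sketched as follows. This is an assumption-laden illustration, not the tool's actual logic or output format: CqProgressCheck, QueueResult, checkQueue, report, and the printed strings are all hypothetical, and each CQ is again reduced to a sorted map from queue offset to commit log position.

```java
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical dual-write progress check; names and output format are invented. */
class CqProgressCheck {

    /** Outcome of comparing the two CQ formats for a single queue. */
    static final class QueueResult {
        final int queueId;
        final long fileMax;
        final long rocksdbMin;
        final long rocksdbMax;
        final boolean ok;

        QueueResult(int queueId, long fileMax, long rocksdbMin, long rocksdbMax, boolean ok) {
            this.queueId = queueId;
            this.fileMax = fileMax;
            this.rocksdbMin = rocksdbMin;
            this.rocksdbMax = rocksdbMax;
            this.ok = ok;
        }
    }

    /** A queue passes when both tails match and every shared offset maps to the same position. */
    static QueueResult checkQueue(int queueId,
                                  NavigableMap<Long, Long> fileCq,
                                  NavigableMap<Long, Long> rocksdbCq) {
        long fileMax = fileCq.isEmpty() ? -1L : fileCq.lastKey();
        long rocksdbMin = rocksdbCq.isEmpty() ? -1L : rocksdbCq.firstKey();
        long rocksdbMax = rocksdbCq.isEmpty() ? -1L : rocksdbCq.lastKey();
        boolean ok = fileMax == rocksdbMax;
        for (Map.Entry<Long, Long> e : rocksdbCq.entrySet()) {
            Long filePos = fileCq.get(e.getKey());
            // Offsets absent from the file CQ (for example, already expired) are skipped.
            if (filePos != null && !filePos.equals(e.getValue())) {
                ok = false;
            }
        }
        return new QueueResult(queueId, fileMax, rocksdbMin, rocksdbMax, ok);
    }

    /** Detailed mode lists every queue; summary mode only states the overall result. */
    static String report(List<QueueResult> results, boolean detailed) {
        boolean allOk = true;
        StringBuilder sb = new StringBuilder();
        for (QueueResult r : results) {
            allOk &= r.ok;
            if (detailed) {
                sb.append(String.format("queueId=%d fileMax=%d rocksdbMin=%d rocksdbMax=%d ok=%b%n",
                        r.queueId, r.fileMax, r.rocksdbMin, r.rocksdbMax, r.ok));
            }
        }
        sb.append(allOk ? "check success" : "check failed");
        return sb.toString();
    }
}
```

The detailed flag plays the role of the topic parameter here: with it, every queue is listed before the verdict, and without it only the verdict is returned.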
Describe Alternatives You've Considered
None.
Additional Context
In summary, this solution is fairly simple. It does not block broker startup for long, which would delay message consumption, and it leaves ample room to roll back if something goes wrong. It also leaves the internal logic of the two cq storage implementations untouched: dual writing is implemented simply by adding a Dispatcher that holds a reference to rocksdbStore. The three switches keep the logic added by this PR safe and free of side effects.