[Enhancement] Support JSON CQ Files and offset In-place Upgrade to RocksDB #8589

Closed · 1 task done
LetLetMe opened this issue Aug 28, 2024 · 2 comments · Fixed by #8600
Comments

@LetLetMe
Contributor

LetLetMe commented Aug 28, 2024

Before Creating the Enhancement Request

  • I have confirmed that this should be classified as an enhancement rather than a bug/feature.

Summary

Support upgrading CQ files and offsets from the JSON version to the RocksDB version.

Motivation

We have introduced RocksDB to store the CQ and offsets for our million-queue scenario, but there is currently no solution for upgrading in place from the JSON version to the RocksDB version. This proposal aims to address that.

Describe the Solution You'd Like


For offsets, the conversion into RocksDB format is performed during broker startup, controlled by the transferOffsetJsonToRocksdb switch. For the ConsumeQueue (CQ), we provide a dual-write mode for the two CQ formats, controlled by the rocksdbCQWriteEnable switch, which determines whether both formats of the CQ are dispatched simultaneously. We also offer a tool that monitors the progress of the dual write, which helps identify an appropriate time to switch traffic over.
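As a rough sketch of what that startup conversion could look like, the snippet below copies every entry from the broker's JSON-backed consumer offset table into a RocksDB-backed store. The class and method names (OffsetUpgradeSketch, RocksdbOffsetStore, putOffset, flush) are hypothetical stand-ins, not the actual RocketMQ API.

```java
import java.util.Map;

public class OffsetUpgradeSketch {

    /** Minimal stand-in for a RocksDB-backed offset store (hypothetical API). */
    public interface RocksdbOffsetStore {
        void putOffset(String topicAtGroup, int queueId, long offset);
        void flush();
    }

    /**
     * Copies every (topic@group -> queueId -> offset) entry from the JSON-backed
     * table into the RocksDB-backed store once, during broker startup.
     */
    public static void transferOffsetsOnStartup(Map<String, Map<Integer, Long>> jsonOffsetTable,
                                                RocksdbOffsetStore rocksdbStore) {
        for (Map.Entry<String, Map<Integer, Long>> byKey : jsonOffsetTable.entrySet()) {
            String topicAtGroup = byKey.getKey();
            for (Map.Entry<Integer, Long> byQueue : byKey.getValue().entrySet()) {
                rocksdbStore.putOffset(topicAtGroup, byQueue.getKey(), byQueue.getValue());
            }
        }
        rocksdbStore.flush(); // make the converted offsets durable before serving reads
    }
}
```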

In the overall scheme we provide three switches: transferMetadataJsonToRocksdb, transferOffsetJsonToRocksdb, and rocksdbCQWriteEnable. Combined with the existing storeType option, they allow the in-place upgrade from the file version to RocksDB to be completed.

The entire process is divided into three stages (a sketch of how these switches drive the write/read paths follows the list):

  1. Read and write only the file-based CQ
    storeType=default
    rocksdbCQWriteEnable=false
  2. Dual-write the CQ to RocksDB, read only the file-based CQ, write offsets only to JSON (only one extra index is dispatched at this stage)
    storeType=default
    rocksdbCQWriteEnable=true
  3. Read and write only the RocksDB CQ, converting metadata and offsets at the same time
    storeType=defaultRocksdb
    rocksdbCQWriteEnable=false
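The following sketch is only illustrative of how the two settings could drive the write and read paths in each stage; the class and method names are made up for the example and are not taken from the actual implementation.

```java
public class CqUpgradeStageSketch {

    enum CqFormat { FILE, ROCKSDB }

    /** The file-based CQ keeps being written as long as the broker still runs with storeType=default. */
    static boolean writeFileCq(String storeType) {
        return "default".equals(storeType);
    }

    /** The RocksDB CQ is written in dual-write mode (stage 2) or after switching to defaultRocksdb (stage 3). */
    static boolean writeRocksdbCq(String storeType, boolean rocksdbCQWriteEnable) {
        return rocksdbCQWriteEnable || "defaultRocksdb".equals(storeType);
    }

    /** Reads always follow storeType, so stage 2 keeps serving consumers from the file-based CQ. */
    static CqFormat readCq(String storeType) {
        return "defaultRocksdb".equals(storeType) ? CqFormat.ROCKSDB : CqFormat.FILE;
    }
}
```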

As we can see, Stage 2 only dispatches one additional index. This consumes some extra storage space but has no other impact, and if any issues arise it can easily be rolled back.

The position in each cqUnit is assigned at message dispatch time, so the initial position of the newly added index stays continuous rather than starting from zero. After a period of dual writing, the tails of the two CQ formats are guaranteed to be aligned.

In addition, we provide two admin tools: CheckRocksdbCqWriteProgressCommand and ExportMetadataInRocksDBCommand. The former checks the dual-write progress of the consume queue, while the latter exports metadata from RocksDB into JSON format (mainly to fill in metadata missing from the JSON files in rollback scenarios).

When the topic parameter is supplied, the command displays the detailed progress of each queue.

When the topic parameter is omitted, it only shows whether the check succeeded.
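Conceptually, such a progress check can be reduced to comparing, for every queue, the tail (max) offset of the file-based CQ with that of the RocksDB CQ. The sketch below illustrates this idea; the ConsumeQueueView interface is a simplified stand-in rather than the real RocketMQ API.

```java
import java.util.Set;

public class CqWriteProgressCheckSketch {

    /** Simplified view of one CQ storage format (hypothetical interface). */
    public interface ConsumeQueueView {
        Set<String> topics();
        int queueNum(String topic);
        long maxOffset(String topic, int queueId);
    }

    /** Returns true when every queue's RocksDB CQ tail has caught up with the file-based CQ tail. */
    public static boolean aligned(ConsumeQueueView fileCq, ConsumeQueueView rocksdbCq) {
        for (String topic : fileCq.topics()) {
            for (int queueId = 0; queueId < fileCq.queueNum(topic); queueId++) {
                long fileMax = fileCq.maxOffset(topic, queueId);
                long kvMax = rocksdbCq.maxOffset(topic, queueId);
                if (kvMax < fileMax) {
                    return false; // RocksDB CQ is still behind for this queue
                }
            }
        }
        return true;
    }
}
```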

Describe Alternatives You've Considered

None.

Additional Context

In summary, this solution is fairly simple overall. It neither blocks broker startup long enough to delay message consumption, nor does it leave us without an ample rollback path if anything unexpected happens. It also does not touch the internal logic of either CQ storage scheme; it simply adds a Dispatcher that holds a reference to the rocksdbStore to implement the dual write. The three switches further ensure that the logic added by this PR is safe and free of side effects.
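A minimal sketch of that dual-write dispatcher idea is shown below, using simplified stand-in types (the real CommitLogDispatcher and DispatchRequest in RocketMQ carry more fields and a different API); it also illustrates why the RocksDB CQ starts at a continuous offset rather than zero.

```java
public class DualWriteDispatcherSketch {

    /** Simplified stand-in for a dispatch request built while replaying the commit log. */
    public static class DispatchRequest {
        public final String topic;
        public final int queueId;
        public final long consumeQueueOffset; // assigned once at dispatch time, shared by both CQ formats
        public DispatchRequest(String topic, int queueId, long consumeQueueOffset) {
            this.topic = topic;
            this.queueId = queueId;
            this.consumeQueueOffset = consumeQueueOffset;
        }
    }

    /** Simplified stand-in for a CQ store that can persist one index entry. */
    public interface CqStore {
        void putCqEntry(DispatchRequest request);
    }

    /** Extra dispatcher registered only while rocksdbCQWriteEnable=true. */
    public static class RocksdbCqDispatcher {
        private final CqStore rocksdbStore;

        public RocksdbCqDispatcher(CqStore rocksdbStore) {
            this.rocksdbStore = rocksdbStore;
        }

        public void dispatch(DispatchRequest request) {
            // Because consumeQueueOffset is fixed at dispatch time, the RocksDB CQ
            // begins at the current offset and its tail converges with the file-based CQ.
            rocksdbStore.putCqEntry(request);
        }
    }
}
```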

@fuyou001
Contributor

fuyou001 commented Sep 2, 2024

In the second stage, when RocksDB starts writing CQ entries, there are scenarios where the in-memory CQ data (RocksDBConsumeQueueStore#bufferDRList) has not been fully persisted. After a restart, the original file-based CQ is intact, but some entries are missing from the CQ in RocksDB. How do we guarantee transactional consistency between the original CQ and the CQ in RocksDB?

@LetLetMe
Contributor Author


  1. If a restart during dual write causes the kv CQ to fall behind, we do not need to do anything; just keep dual writing for a while longer and wait for the two CQs to align again.
  2. If the restart happens in kv mode, the normal recover logic runs.

lizhimins pushed a commit that referenced this issue Sep 23, 2024