[ISSUE #51] Optimize the source task commit method #52
What is the background for this optimization?
*/
public void commit(final List<ConnectRecord> connectRecords) throws InterruptedException {
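In scenarios where a Source keeps its own offsets, committing a batch of records at once lets the Source use the partition information to pick the latest offset for each partition and commit only that, which reduces the commit frequency. With single-record commits, this kind of scenario is hard to handle.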
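
A minimal sketch of the per-partition selection described above. It assumes a hypothetical `OffsetRecord` type with a string partition key and a numeric offset; the real `ConnectRecord` API is not shown in this thread and may look different.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a record; not the real ConnectRecord API. */
final class OffsetRecord {
    final String partition;
    final long offset;

    OffsetRecord(String partition, long offset) {
        this.partition = partition;
        this.offset = offset;
    }
}

final class LatestOffsets {
    private LatestOffsets() {
    }

    /** Returns the highest offset seen for each partition in the batch. */
    static Map<String, Long> latestPerPartition(List<OffsetRecord> batch) {
        Map<String, Long> latest = new HashMap<>();
        for (OffsetRecord r : batch) {
            // Keep the larger of the stored offset and this record's offset.
            latest.merge(r.partition, r.offset, Math::max);
        }
        return latest;
    }
}
```

With this shape, a batch `commit` would issue one offset commit per partition instead of one per record.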
> In scenarios where a Source keeps its own offsets, committing a batch of records at once lets the Source use the partition information to pick the latest offset for each partition and commit only that, which reduces the commit frequency. With single-record commits, this kind of scenario is hard to handle.
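The main considerations here are the following:
- A record can be committed as soon as it has passed through transform, converter, and send successfully. With batch commits, the system would also have to cache the record offset information, and if that cached data is lost, records would be pulled again from an old offset, producing duplicates.
- If the goal is to reduce the commit frequency through batching, the plugin can cache a list itself as needed, which solves the same problem.
So I think committing a single record is the cleaner approach.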
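
The second point, where the plugin buffers on its own side, could be sketched roughly as follows. `BufferedCommitter`, its `batchSize` parameter, and the `flusher` callback are hypothetical names for illustration and are not part of this PR or the connector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical plugin-side helper: accepts per-record commits, flushes in batches. */
final class BufferedCommitter<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher;
    private final List<T> buffer = new ArrayList<>();

    BufferedCommitter(int batchSize, Consumer<List<T>> flusher) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    /** Called once per successfully sent record; flushes when the buffer is full. */
    synchronized void commit(T record) {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Hands a copy of the buffered records to the flusher and clears the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        flusher.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

A plugin that wants fewer offset commits would call `commit(record)` from its single-record commit hook and do the actual offset write inside `flusher`. Records still in the buffer when the task stops need an explicit `flush()`, otherwise they are re-read after a restart.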
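My understanding is that you are describing the case of sending a single record, but in some scenarios we need to support sending a batch of records.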
> My understanding is that you are describing the case of sending a single record, but in some scenarios we need to support sending a batch of records.
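Yes, I considered this. However, the data pulled by one source task may not all be written to the same topic; it may go to several, depending on the plugin's logic. So batch send may not be needed for now, or possibly ever. It can be added if the need arises.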