
backend: do not copy buffer when creating read tx #12529

Closed

Conversation

Contributor

@mlmhl mlmhl commented Dec 8, 2020

Currently, the backend buffer is copied once for each read request, which may bring significant additional overhead. For example, in a Kubernetes cluster all write requests are Txn requests, and each of them triggers a read request to check the Compare assertion. What's more, kube-apiserver watches etcd with the previous KV required, so each watch event also triggers a read request. In a busy Kubernetes cluster there are many read and write requests at the same time, resulting in a large buffer and a large number of buffer copy operations.

However, since the buffer is managed as a sorted array, the cost of a single read operation is lower than that of copying the entire buffer, so we can remove the buffer copy and simply hold the read lock while invoking the buffer's range operation.
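To make the idea concrete, here is a minimal sketch (not etcd's actual implementation; `readTx`, `bufferEntry`, and the field names are illustrative) of a read transaction that shares the sorted buffer with writers and takes the read lock only for the duration of a range call:

```go
package backend

import "sync"

// bufferEntry is an illustrative key/value pair kept in the sorted read buffer.
type bufferEntry struct {
	key, val []byte
}

// readTx shares the buffer with the write path instead of copying it when the
// read transaction is created.
type readTx struct {
	mu  *sync.RWMutex // shared with the writer that flushes into buf
	buf []bufferEntry // sorted by key; shared, not copied per read tx
}

// UnsafeRange returns up to limit entries in [key, end). The read lock is held
// only while ranging over the sorted buffer, which is cheaper than copying the
// whole buffer when the read transaction is created.
func (t *readTx) UnsafeRange(key, end []byte, limit int64) (keys, vals [][]byte) {
	t.mu.RLock()
	defer t.mu.RUnlock()

	for _, e := range t.buf {
		if string(e.key) < string(key) {
			continue
		}
		if len(end) > 0 && string(e.key) >= string(end) {
			break
		}
		keys = append(keys, e.key)
		vals = append(vals, e.val)
		if limit > 0 && int64(len(keys)) >= limit {
			break
		}
	}
	return keys, vals
}
```

In the real backend the results would still be merged with the underlying BoltDB read transaction; the sketch only shows where the locking moves.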

I developed a simple test tool that generates concurrent read and write requests at the same time, and used it to test the versions before and after this optimization (a rough sketch of such a tool follows the table below). Below is a preliminary test result: the key size is 64 bytes, the concurrency of both read and write operations is 500, and read and write requests are executed 100,000 and 300,000 times respectively. It seems that this optimization can significantly improve the performance of read operations.

| value size (bytes) | read qps | optimized read qps | write qps | optimized write qps |
|---:|---:|---:|---:|---:|
| 128 | 11313 | 24093 | 6276 | 13118 |
| 256 | 11538 | 24229 | 6405 | 12997 |
| 512 | 11669 | 23000 | 6447 | 12062 |
| 1024 | 10999 | 19594 | 6285 | 10036 |
| 2048 | 10354 | 13780 | 5595 | 6157 |
| 3072 | 10202 | 12112 | 5197 | 5297 |
| 4096 | 8217 | 9351 | 3773 | 3794 |
| 5120 | 7507 | 8564 | 3448 | 3462 |
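For reference, below is a rough sketch of the kind of load tool described above; it is not the actual tool used for the numbers in the table, and the endpoint, client import path, key layout, and per-worker counts are illustrative assumptions:

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"sync"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3" // assumed module path; older releases use go.etcd.io/etcd/clientv3
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	const concurrency = 500     // concurrent readers and concurrent writers
	const readsPerWorker = 200  // 500 * 200 = 100,000 reads in total
	const writesPerWorker = 600 // 500 * 600 = 300,000 writes in total
	value := string(bytes.Repeat([]byte("v"), 1024)) // value size under test

	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		key := fmt.Sprintf("/bench/%057d", i) // 64-byte key
		wg.Add(2)
		go func(k string) { // writer
			defer wg.Done()
			for j := 0; j < writesPerWorker; j++ {
				if _, err := cli.Put(context.Background(), k, value); err != nil {
					panic(err)
				}
			}
		}(key)
		go func(k string) { // reader, running concurrently with the writers
			defer wg.Done()
			for j := 0; j < readsPerWorker; j++ {
				if _, err := cli.Get(context.Background(), k); err != nil {
					panic(err)
				}
			}
		}(key)
	}
	wg.Wait()
	fmt.Printf("finished in %v\n", time.Since(start))
}
```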

Contributor

tangcong commented Dec 8, 2020

@mlmhl If etcd holds the read lock, it will cause the "large read blocking write" issue (#10525) in mvcc/backend. The buffer copy was introduced by the PR that solved that issue.

```diff
@@ -704,8 +704,6 @@ func clusterVersionFromBackend(lg *zap.Logger, be backend.Backend) *semver.Versi
 func downgradeInfoFromBackend(lg *zap.Logger, be backend.Backend) *DowngradeInfo {
 	dkey := backendDowngradeKey()
 	tx := be.ReadTx()
 	tx.Lock()
```
Contributor

@ptabor ptabor Jan 31, 2021


Could you please comment on why this transaction does not need to take the lock, and why it's worth not taking it?

The method seems to be called very infrequently (during recovery).
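For context, here is a sketch of the locked read pattern this review thread refers to, reconstructed from the hunk context above; the `clusterBucketName` bucket and the unmarshalling details are assumptions based on the surrounding membership code, not a quote of the actual file:

```go
func downgradeInfoFromBackend(lg *zap.Logger, be backend.Backend) *DowngradeInfo {
	dkey := backendDowngradeKey()
	tx := be.ReadTx()
	tx.Lock() // presumably one of the lines this hunk removes; it serializes access to the shared read buffer
	defer tx.Unlock()

	_, vals := tx.UnsafeRange(clusterBucketName, dkey, nil, 0)
	if len(vals) == 0 {
		return nil
	}
	var d DowngradeInfo
	if err := json.Unmarshal(vals[0], &d); err != nil {
		lg.Panic("failed to unmarshal downgrade information", zap.Error(err))
	}
	return &d
}
```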

Contributor

ptabor commented Jan 31, 2021

Very promising improvement, thank you. Could you please re-trigger the test?

@jingyih could you please take a second look? It LGTM, but you have worked a lot in this area.

Contributor

ptabor commented May 14, 2021

I assume it's obsoleted by #12933.

Thank you for the idea.

@ptabor ptabor closed this May 14, 2021