RFC: Load-based replica read #105
Signed-off-by: Yilin Chen <[email protected]>
This design only maintains the load of TiKVs in the client, and the client receives the load info only when [...]. I'm still not confident about the retry strategy in this document. Completely different strategy designs are welcome.
text/0105-load-based-replica-read.md
Outdated
The current queue length is easily known, but we have to predict the average time slice in the near future. We can use the EWMA of the previous time slices to estimate it. $S_{now}$ is the average time slice length of the read pool in the past second. We update the latest EWMA $S_{i}$ every second using the following formula:

$$S_{i}=\alpha \cdot S_{now}+(1-\alpha) \cdot S_{i-1}$$
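As a rough illustration of the update rule quoted above, here is a minimal Go sketch. The `EWMA` type, its field names, the choice of `alpha`, and seeding the average with the first sample are all assumptions made for this example; none of them come from the RFC or from TiKV.

```go
package main

import (
	"fmt"
	"time"
)

// EWMA tracks an exponentially weighted moving average of the read pool's
// time slice length. The type and its fields are hypothetical names.
type EWMA struct {
	alpha  float64       // smoothing factor in (0, 1]; the value used below is an assumption
	value  time.Duration // S_{i-1}, the previous estimate
	seeded bool          // false until the first sample arrives
}

// Update applies S_i = alpha*S_now + (1-alpha)*S_{i-1} and returns S_i.
// The first sample seeds the average directly (an assumption of this sketch).
func (e *EWMA) Update(sNow time.Duration) time.Duration {
	if !e.seeded {
		e.value, e.seeded = sNow, true
		return e.value
	}
	e.value = time.Duration(e.alpha*float64(sNow) + (1-e.alpha)*float64(e.value))
	return e.value
}

func main() {
	e := &EWMA{alpha: 0.5}
	samples := []time.Duration{10 * time.Millisecond, 20 * time.Millisecond, 40 * time.Millisecond}
	for _, s := range samples {
		fmt.Println(e.Update(s))
	}
}
```

A larger `alpha` makes the estimate follow recent samples more closely, while a smaller one smooths out short spikes.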
This seems a bit vague to me. Does Oh I get it.
Now I use
text/0105-load-based-replica-read.md
Outdated
Knowing the current queue length $L$ and the average time slice $S$ of the read pool, we can estimate that the wait duration is $T_{waiting} = L \cdot S$.

The current queue length is easily known, but we have to predict the average time slice in the near future. We can use the EWMA of the previous time slices to estimate it. $S_{now}$ is the average time slice length of the read pool in the past second. We update the latest EWMA $S_{i}$ every second using the following formula:
It seems the mechanism can take up to 1 second to recognize a load spike. Underestimating the load might undermine the optimization.
Would a shorter interval improve the sensitivity without introducing much more overhead?
I changed it to 200ms. The average time slice does not change much under a load spike, so the update interval needn't be very short.
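To make the discussion in this thread concrete, the following Go sketch shows one way the per-interval average $S_{now}$ and the estimate $T_{waiting} = L \cdot S$ could be computed. The names `sliceStats`, `Record`, `Flush`, `EstimateWait`, and `updateInterval` are hypothetical, and reporting "no sample" for an empty interval is an assumption of this sketch rather than something the RFC specifies.

```go
package main

import (
	"fmt"
	"time"
)

// updateInterval mirrors the 200ms mentioned in the reply above.
const updateInterval = 200 * time.Millisecond

// sliceStats accumulates the time slices observed during one update interval.
type sliceStats struct {
	total time.Duration
	count int
}

// Record adds one observed time slice to the current interval.
func (s *sliceStats) Record(d time.Duration) {
	s.total += d
	s.count++
}

// Flush returns S_now, the average slice length of the interval, and resets
// the counters. ok is false when no slice was observed, so the caller can
// decide what to do with an empty interval (an assumption of this sketch).
func (s *sliceStats) Flush() (avg time.Duration, ok bool) {
	if s.count == 0 {
		return 0, false
	}
	avg = s.total / time.Duration(s.count)
	*s = sliceStats{}
	return avg, true
}

// EstimateWait computes T_waiting = L * S.
func EstimateWait(queueLen int, avgSlice time.Duration) time.Duration {
	return time.Duration(queueLen) * avgSlice
}

func main() {
	var st sliceStats
	st.Record(1 * time.Millisecond)
	st.Record(3 * time.Millisecond)
	if sNow, ok := st.Flush(); ok {
		fmt.Println(updateInterval, sNow, EstimateWait(8, sNow))
	}
}
```

In a long-running process, `Flush` would presumably be driven by a ticker firing every `updateInterval`, with its result fed into the EWMA.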
text/0105-load-based-replica-read.md
Outdated
When load is extremely low (e.g. there is only 1 large read request, or even 0), could it misestimate
Yes, that's a good point. I added a paragraph below for this case.
To make use of as many resources as possible, the load we predict should not be larger than the current load. Otherwise, we may skip a node that is already free to execute requests and not get the best performance.

We use `estimatedWait - time.Since(waitTimeUpdatedAt)` as the estimated waiting duration in the client. This estimate is almost certainly smaller than the real wait, because TiKV keeps accepting requests in the meantime and some queries don't finish in a single time slice.
My initial thought was to let the client use observed metrics like `cop_task_avg_wait_duration` over a recent time interval, or something similar, to decide which replica to choose next. This `estimatedWait - time.Since(waitTimeUpdatedAt)` looks simpler and could avoid retrying already busy replicas 🤔
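The client-side estimate discussed here can be sketched in Go as follows. The `storeLoad` type, the `remainingWait` and `pickLeastLoaded` helpers, and the clamp at zero are assumptions made for illustration; the sketch does not reproduce the retry strategy the RFC actually specifies.

```go
package main

import (
	"fmt"
	"time"
)

// storeLoad is a hypothetical client-side record of one TiKV store's load.
type storeLoad struct {
	estimatedWait     time.Duration // wait duration last reported by the store
	waitTimeUpdatedAt time.Time     // when the client received that report
}

// remainingWait returns estimatedWait minus the time elapsed since the report,
// clamped at zero (the clamp is an assumption of this sketch).
func (s storeLoad) remainingWait(now time.Time) time.Duration {
	left := s.estimatedWait - now.Sub(s.waitTimeUpdatedAt)
	if left < 0 {
		return 0
	}
	return left
}

// pickLeastLoaded returns the index of the store with the smallest remaining
// wait, or -1 if the slice is empty.
func pickLeastLoaded(stores []storeLoad, now time.Time) int {
	best := -1
	var bestWait time.Duration
	for i, s := range stores {
		if w := s.remainingWait(now); best == -1 || w < bestWait {
			best, bestWait = i, w
		}
	}
	return best
}

func main() {
	now := time.Now()
	stores := []storeLoad{
		{estimatedWait: 300 * time.Millisecond, waitTimeUpdatedAt: now.Add(-100 * time.Millisecond)},
		{estimatedWait: 500 * time.Millisecond, waitTimeUpdatedAt: now.Add(-450 * time.Millisecond)},
	}
	fmt.Println(pickLeastLoaded(stores, now))
}
```

Because the estimate only decays between reports, a store that has not been heard from for a while gradually looks idle again, which matches the goal of not skipping a node that may already be free.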
Because we will retry in replica-read mode, we don't need the follower or learner to issue a read index RPC again after knowing the applied index.
What will the replica-read node do when its applied index does not satisfy the request?
It waits until it has applied the index. This saves the read index RPC, and the other procedures are the same as in the original replica read.
Please also consider the cross-AZ data transfer fee when deploying TiKV across AZs.
If user experience is more important, this feature is still worth considering in spite of the extra cost. Anyway, this mode is not currently available to users of the closest-replica/adaptive mode.
/merge