Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modifies FATE to use a single thread to find work #4042

Merged

Conversation

keith-turner
Copy link
Contributor

This change modifies FATE to use singe thread to find work. It also cleans up some of the signaling between threads in FATE and fixes a synchronization bug in FATE that was introduced in #4017. The bug introduced in #4017 is that somethings are syncronizing on the wrong object because a new inner class was introduced.

These changes were pulled from #3964 and cleaned up and improved.

This change modifies FATE to use singe thread to find work.  It also
cleans up some of the signaling between threads in FATE and fixes a
synchronization bug in FATE that was introduced in apache#4017.  The bug
introduced in apache#4017 is that somethings are syncronizing on the wrong
object because a new inner class was introduced.

These changes were pulled from apache#3964 and cleaned up and improved.
@keith-turner keith-turner requested a review from cshannon December 7, 2023 22:03
@keith-turner
Copy link
Contributor Author

Prior to this change every fate thread would read everything from ZK when it wanted to find work. Think this will read a lot less from zookeeper and eventually the metadata table.

@keith-turner
Copy link
Contributor Author

The bug from #4017, that this fixes, is causing fate operations to take 5 seconds which is causing some test to timeout

Copy link
Contributor

@cshannon cshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the updates here a lot, I think using a single thread makes a lot of sense. I've been reviewing this for a bit and there are a lot of complex changes here so it's a bit hard to know if there is a bug with this or something not quire right. I think we will likely just need to merge and then test it out to try and discover if there's anything unexpected that will pop up that the ITs don't catch.

@dlmarion
Copy link
Contributor

dlmarion commented Dec 8, 2023

It looks like with the transfer queue change, you are no longer queue'ing things up and instead only using it to transfer work to an available worker. Is that correct? There was a comment in an earlier version about the queue needing to change size on a config change.

@keith-turner
Copy link
Contributor Author

It looks like with the transfer queue change, you are no longer queue'ing things up and instead only using it to transfer work to an available worker. Is that correct? There was a comment in an earlier version about the queue needing to change size on a config change.

Yeah that is what it is doing now. I removed that comment when switching to transfer queue.

@dlmarion
Copy link
Contributor

dlmarion commented Dec 8, 2023

It looks like with the transfer queue change, you are no longer queue'ing things up and instead only using it to transfer work to an available worker. Is that correct? There was a comment in an earlier version about the queue needing to change size on a config change.

Yeah that is what it is doing now. I removed that comment when switching to transfer queue.

I think that's a good approach, conserves memory in the Manager.

@keith-turner
Copy link
Contributor Author

I think that's a good approach, conserves memory in the Manager.

Yeah, and eventually the impl that stores data in an accumulo table may be able to be more memory efficient than the ZK impl.

@keith-turner keith-turner merged commit 8e78562 into apache:elasticity Dec 8, 2023
@keith-turner keith-turner deleted the fate-single-thread-work-finder branch December 8, 2023 21:12
@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants