Add support for get_many #16

Merged
merged 6 commits into from
Feb 20, 2024

@ianbishop ianbishop commented Feb 13, 2024

Add support for bulk gets to hbase. Makes use of a protobuf MultiRequest that I discovered while digging around.

Also fixed a host of bugs; I'll add some comments inline.

for key in keys:
    dest_region = self._find_hosting_region(table, key)
    # we must call each region server, which can serve many key ranges
    grouped_by_server[dest_region.region_client.host][dest_region].append(key)

The format for using this call is pretty straightforward, but here is some context if you're not too familiar with HBase:

Each region server hosts one or (in our case) many key ranges for a given table. We need to assign each key to the key range that contains it. Once we've done that, we can re-group the keys by server and send each server a single request covering all keys that fall in any key range it hosts.
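
To make the grouping concrete, here is a minimal standalone sketch, not the code from this PR. `Region`, `find_hosting_region`, and `group_keys` are hypothetical names, and it assumes regions are sorted by start key and cover the table contiguously:

```python
from bisect import bisect_right
from collections import defaultdict
from typing import NamedTuple


class Region(NamedTuple):
    """Hypothetical stand-in for a region: a key range plus its hosting server."""
    start_key: bytes  # inclusive; b"" means "from the beginning of the table"
    host: str


def find_hosting_region(regions, key):
    """Return the region whose range contains key.

    Assumes `regions` is sorted by start_key and the ranges are contiguous,
    so the owner is the last region whose start_key <= key.
    """
    starts = [r.start_key for r in regions]
    return regions[bisect_right(starts, key) - 1]


def group_keys(regions, keys):
    """Group keys as {host: {region: [keys]}}, one outer entry per server."""
    grouped = defaultdict(lambda: defaultdict(list))
    for key in keys:
        region = find_hosting_region(regions, key)
        grouped[region.host][region].append(key)
    return grouped


# Two servers; "rs1" hosts two separate key ranges of the same table.
regions = [
    Region(b"", "rs1"),
    Region(b"g", "rs2"),
    Region(b"n", "rs1"),
]
grouped = group_keys(regions, [b"apple", b"kiwi", b"pear", b"zebra"])
for host, by_region in sorted(grouped.items()):
    print(host, {r.start_key: ks for r, ks in by_region.items()})
```

Each outer entry corresponds to one request to one server, and each inner entry to one key range within that request.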

brilliant

@@ -128,6 +129,8 @@ def _send_request(self, rq, lock_timeout=10):

# send and receive the request
future = self.thread_pool.submit(self.send_and_receive_rpc, my_id, rq, to_send)
if _async:
    return future

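We submit to a thread pool here but then immediately block on the result. That would work in a gevent world, but in normal Python execution it defeats the entire point of the thread pool. Returning the future when _async is set lets the caller decide when to block.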
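
A rough illustration of the difference, using only the standard library. `send_and_receive` and `send_request` are hypothetical stand-ins, not the client's real methods:

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def send_and_receive(request_id, delay=0.05):
    """Hypothetical stand-in for a blocking RPC round trip."""
    time.sleep(delay)
    return f"response-{request_id}"


def send_request(pool, request_id, _async=False):
    """Submit the call; block for the result unless the caller asks for the future."""
    future = pool.submit(send_and_receive, request_id)
    if _async:
        return future
    return future.result()


with ThreadPoolExecutor(max_workers=4) as pool:
    # All four calls are in flight before we block on any of them.
    futures = [send_request(pool, i, _async=True) for i in range(4)]
    responses = [f.result() for f in futures]

print(responses)
```

With `_async=False` the same four calls would run one after another, because each call waits for its own result before the next is submitted.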

@@ -240,7 +246,7 @@ def NewClient(host, port, pool_size, secondary=False, call_timeout=60):
     s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
     s.connect((c.host, int(port)))
     _send_hello(s)
-    s.settimeout(2)
+    s.settimeout(call_timeout)

We would submit these onto a thread pool and then immediately block for up to call_timeout (60 seconds by default). However, the actual timeout on the socket was hardcoded to 2 seconds, so you would never actually wait for as long as your call_timeout.
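
The mismatch can be reproduced in isolation. This is a sketch with scaled-down, made-up timeouts (`SOCKET_TIMEOUT` and `CALL_TIMEOUT` are illustrative constants, not values from the client) and a local socket pair standing in for the region server connection:

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor

SOCKET_TIMEOUT = 0.05  # stands in for the hardcoded 2 seconds
CALL_TIMEOUT = 1.0     # stands in for call_timeout (60 seconds by default)

client, server = socket.socketpair()  # `server` never replies
client.settimeout(SOCKET_TIMEOUT)

with ThreadPoolExecutor(max_workers=1) as pool:
    start = time.monotonic()
    future = pool.submit(client.recv, 1024)
    try:
        # The caller is willing to wait CALL_TIMEOUT for the result...
        future.result(timeout=CALL_TIMEOUT)
        outcome = "received"
    except socket.timeout:
        # ...but the worker's recv() gives up after SOCKET_TIMEOUT.
        outcome = "socket timed out"
    elapsed = time.monotonic() - start

client.close()
server.close()
print(outcome, f"after {elapsed:.2f}s")
```

The shorter of the two timeouts always wins, which is why setting the socket timeout from call_timeout makes the parameter meaningful.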

@ianbishop ianbishop merged commit 393cb97 into master Feb 20, 2024
@ianbishop ianbishop deleted the ibishop/support-multi branch February 20, 2024 16:08