Shell out to model handlers to collect byte sizes #28182
R: @tvalentyn
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control
Codecov Report
@@ Coverage Diff @@
## master #28182 +/- ##
==========================================
+ Coverage 72.29% 72.31% +0.01%
==========================================
Files 678 678
Lines 99848 99855 +7
==========================================
+ Hits 72189 72208 +19
+ Misses 26084 26072 -12
Partials 1575 1575
Flags with carried forward coverage won't be shown.
... and 10 files with indirect coverage changes
def test_keyed_model_handler_multiple_models_get_num_bytes(self):
  mhs = [
    base.KeyMhMapping(['key1'], FakeModelHandler(num_bytes_per_element=10)),
do you still plan to change the name of KeyMhMapping?
Yes, I'm waiting until I have no in-flight PRs touching this before renaming it, to avoid conflicts (right now #28026 uses KeyMhMapping).
return batch_bytes + self._unkeyed.get_num_bytes(unkeyed_batch)

batch_by_key = defaultdict(list)
for pair in batch:
nit: consider using mnemonic names for readability if pair can be unpacked:
for key, examples in batch:
  batch_by_key[key].append(examples)
Also, this conflates a batch of keyed prediction inputs with a batch of elements for a single key. Not sure how to clarify; could use a courtesy variable like:
if self._single_model:
  return batch_bytes + self._unkeyed.get_num_bytes(unkeyed_batch)
else:
  keyed_batches = batch
  for key, examples in keyed_batches:
    ...
I updated to use key/examples.
> also there is conflation of batch of keyed prediction inputs vs a batch of elements for a single key. not sure how to clarify, could use a courtesy variable like:
I don't think this is right. batches means batch of keyed prediction inputs in all contexts here; batch_by_key represents batches of elements per key.
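To make the two terms in this thread concrete, here is a minimal sketch of the grouping step, assuming the incoming batch is a sequence of (key, example) pairs. The helper name `group_examples_by_key` is hypothetical and is not part of the PR.

```python
from collections import defaultdict


def group_examples_by_key(batch):
    """Turn a batch of keyed inputs into per-key batches of examples.

    `batch` is assumed to be an iterable of (key, example) pairs.
    """
    batch_by_key = defaultdict(list)
    for key, example in batch:
        batch_by_key[key].append(example)
    # Return a plain dict so missing keys raise instead of being created.
    return dict(batch_by_key)


# One batch of keyed prediction inputs ...
batch = [("key1", 1), ("key2", 2), ("key1", 3)]
# ... becomes one batch of elements per key.
print(group_examples_by_key(batch))  # {'key1': [1, 3], 'key2': [2]}
```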
* Shell out to model handlers to collect byte sizes
* naming
Right now, we're not calculating batch sizes correctly for keyed examples in the per-key model handler implementation; we should be shelling out to the underlying model handler, which can do this more effectively. This PR adds support for that.
Part of #27628
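The delegation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `FixedSizeHandler` and `PerKeyHandler` are hypothetical stand-ins, and the sketch counts only the bytes reported by the underlying handlers, ignoring any overhead for the keys themselves.

```python
from collections import defaultdict


class FixedSizeHandler:
    """Hypothetical stand-in for a model handler with a known per-element size."""

    def __init__(self, num_bytes_per_element):
        self._num_bytes_per_element = num_bytes_per_element

    def get_num_bytes(self, batch):
        return self._num_bytes_per_element * len(batch)


class PerKeyHandler:
    """Hypothetical keyed wrapper that routes each key to its own handler."""

    def __init__(self, key_to_handler):
        self._key_to_handler = key_to_handler

    def get_num_bytes(self, batch):
        # Group the keyed inputs into one batch of examples per key.
        batch_by_key = defaultdict(list)
        for key, example in batch:
            batch_by_key[key].append(example)
        # Ask each underlying handler for the size of its own examples
        # (assumption: key overhead is not counted in this sketch).
        return sum(
            self._key_to_handler[key].get_num_bytes(examples)
            for key, examples in batch_by_key.items())


handler = PerKeyHandler({
    "key1": FixedSizeHandler(num_bytes_per_element=10),
    "key2": FixedSizeHandler(num_bytes_per_element=20),
})
print(handler.get_num_bytes([("key1", 1), ("key2", 2), ("key1", 3)]))  # 40
```

Each handler knows best how large its own inputs are, so the wrapper only groups and sums rather than estimating sizes itself.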