
Benchmark iterator, avoid redundant queue, remove managers. #3119

Merged: 77 commits merged into allenai:master on Aug 29, 2019

Conversation

@brendan-ai2 (Contributor) commented on Aug 7, 2019

  • Adds a script to benchmark iterators (see the timing-loop sketch after this list).
    • Average speed
    • Introspects queues
  • Removes a bottleneck when `MultiprocessDatasetReader` and `MultiprocessIterator` are used in conjunction.
    • Specifically, removes a redundant queue that was populated by a single process.
  • Removes the use of multiprocessing managers, which have significant overhead.
  • Results from benchmarking the iterator for `training_config/bidirectional_language_model.jsonnet`:
    • original code, no multiprocessing: 0.047 s/b over 10000 batches
    • original code, workers = 1: 0.073 s/b over 10000 batches
    • original code, workers = 10: 0.078 s/b over 10000 batches
    • this PR (-queue), workers = 1: 0.073 s/b over 10000 batches
    • this PR (-queue), workers = 10: 0.046 s/b over 10000 batches
    • this PR (-queue, -manager), workers = 1: 0.063 s/b over 10000 batches
    • this PR (-queue, -manager), workers = 10: 0.020 s/b over 10000 batches
    • Notably, previously we did not see any benefit from scaling to multiple workers. Now we do, albeit worse than linearly. More work required there.
    • To be clear, this is testing the reader + iterator in isolation, not training.
  • Related issues: #2962 ("How can I make batch instances (converting input text fields to tensor objects) faster or efficiently?") and #1890 ("Unable to see the gains by using MultiProcessDatasetReader").
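
As a rough illustration of what the benchmark measures: at its core it is just a timing loop that pulls batches from the iterator and reports seconds per batch (the `s/b` figure above). The sketch below is a minimal, hypothetical version, not the script added in this PR; the real script also introspects the multiprocessing queues, and the reader/iterator names in the usage comment are illustrative.

```python
import time
from typing import Any, Iterable


def average_seconds_per_batch(batches: Iterable[Any], num_batches: int = 10000) -> float:
    """Pull up to `num_batches` batches and return the average seconds
    per batch (the `s/b` figure reported above)."""
    start = time.perf_counter()
    count = 0
    for _ in batches:
        count += 1
        if count == num_batches:
            break
    return (time.perf_counter() - start) / max(count, 1)


# Hypothetical usage with an AllenNLP reader + iterator (names illustrative):
#
#     reader = MultiprocessDatasetReader(base_reader, num_workers=10)
#     iterator = MultiprocessIterator(base_iterator, num_workers=10)
#     batches = iterator(reader.read(data_path), num_epochs=1)
#     print(average_seconds_per_batch(batches), "s/b")
```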

@brendan-ai2 (Contributor, Author)

Thanks, feedback addressed!

@joelgrus (Contributor) left a comment

looks good, modulo a comment on a non-obvious test

# Half of 100 files * 4 sentences / file
i = 0
for instance in reader.read(self.identical_files_glob):
    if i == 200:

@joelgrus (Contributor) commented on this snippet:

I would probably just add a comment here, along the lines of "i == 200 means this is the 201st instance, which is too many, so break out and fail the test"

@brendan-ai2 (Contributor, Author):

Good call, done
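
For readers skimming the thread, the resolved snippet presumably reads roughly like the hedged reconstruction below. `reader` and `self.identical_files_glob` are the test fixtures from the excerpt above, the comment wording follows the suggestion, and the `assert False` failure mechanism is a guess.

```python
# Half of 100 files * 4 sentences / file
i = 0
for instance in reader.read(self.identical_files_glob):
    if i == 200:
        # i == 200 means this is the 201st instance, which is too many,
        # so break out and fail the test.
        assert False, "read more instances than expected"
    i += 1
```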

@brendan-ai2 (Contributor, Author)

Thanks for the review! (I'm dragging my heels on merging due to some test discrepancies between TC and my local machine. I'll ping for another pass if the fix is anything major.)

@@ -33,13 +36,51 @@ def instances() -> Iterator[Instance]:

output_queue.put(index)

# We need to ensure we've gotten all the tensors out of this queue before

@brendan-ai2 (Contributor, Author):

@joelgrus, if you're interested, this comment describes the fix for the issue I alluded to earlier. Given all these edge cases I see why you preferred using a Manager in the first version of this code!
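
For readers unfamiliar with the gotcha being referenced: a `multiprocessing.Queue` hands items to a background feeder thread, so joining a worker process while items it has `put()` are still buffered can block indefinitely. The `output_queue.put(index)` line above suggests workers signal completion with a sentinel; the standalone sketch below illustrates the general drain-before-join pattern under that assumption and is not the PR's actual code.

```python
import multiprocessing


def worker(index, output_queue):
    # Produce some items, then signal completion by putting our own index.
    for item in range(1000):
        output_queue.put(("data", item))
    output_queue.put(index)


def run(num_workers: int = 4) -> None:
    output_queue = multiprocessing.Queue()
    processes = [
        multiprocessing.Process(target=worker, args=(i, output_queue))
        for i in range(num_workers)
    ]
    for process in processes:
        process.start()

    # Drain the queue until every worker has signalled completion.  Joining
    # first could hang, because a worker does not exit cleanly while items
    # it put are still sitting in the queue's internal buffer.
    finished = 0
    while finished < num_workers:
        item = output_queue.get()
        if isinstance(item, int):
            finished += 1
        # else: hand the ("data", ...) item to whoever consumes it

    for process in processes:
        process.join()


if __name__ == "__main__":
    run()
```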

https://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python/.
"""
for process in self.processes:
    process.terminate()

@brendan-ai2 (Contributor, Author):

@joelgrus, last bug should be fixed now. It was a nasty race in the tests due to stray child processes. I ended up using `__del__` here for that. I know that using `with` is preferred in Python, but I'm not sure we have that as an option given that we're attempting to fit the `Iterable` interface. We aren't holding any circular refs here, so this should be safe, IIUC. Certainly it fixes the problem empirically. Any concerns?

@joelgrus (Contributor):

if it works, I have no concerns 😇

(I am somewhat far from this code at this point, and I'm not really a multiprocessing expert)
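
For context on the pattern discussed above: an object that only promises the `Iterable` interface cannot force callers into a `with` block, so cleanup of stray worker processes has to happen in `__del__`, which is safe as long as the object is not part of a reference cycle (see the linked article). Below is a rough, hypothetical sketch of that arrangement; it is not the PR's actual class, and the names are made up.

```python
import multiprocessing
from typing import Iterator


def _produce(output_queue):
    # Stand-in for a real worker: put a few items, then a None sentinel.
    for item in range(10):
        output_queue.put(item)
    output_queue.put(None)


class QueuedBatches:
    """Illustrative Iterable that owns worker processes for its lifetime."""

    def __init__(self, num_workers: int = 2) -> None:
        self.queue = multiprocessing.Queue()
        self.processes = [
            multiprocessing.Process(target=_produce, args=(self.queue,))
            for _ in range(num_workers)
        ]
        for process in self.processes:
            process.start()

    def __iter__(self) -> Iterator[int]:
        # Yield items until every worker has sent its None sentinel.
        finished = 0
        while finished < len(self.processes):
            item = self.queue.get()
            if item is None:
                finished += 1
            else:
                yield item

    def __del__(self) -> None:
        # Callers may abandon iteration partway through, and we can't rely
        # on a `with` block because we only expose the Iterable interface,
        # so terminate any workers still alive.  Safe because this object
        # holds no circular references; see
        # https://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python/
        for process in getattr(self, "processes", []):
            if process.is_alive():
                process.terminate()
```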

@brendan-ai2 merged commit bbaf1fc into allenai:master on Aug 29, 2019
@brendan-ai2 (Contributor, Author)

Thanks for the review!

@brendan-ai2 mentioned this pull request on Sep 27, 2019
reiyw pushed a commit to reiyw/allennlp that referenced this pull request Nov 12, 2019