Unable to see the gains by using MultiProcessDatasetReader (#1890)
@HarshTrivedi how large was your original dataset? We have not tested this code thoroughly, but we're going through it soon, so if there are issues we might sort them out then.
@schmmd My original dataset is about 2M pairs of sentences. I wasn't able to see a gain there, so I made this small case with only 50K entailment pairs (from SNLI), with and without the multiprocess reader enabled. It is easily reproducible without any part of my own code, so it might help debug the problem.
Thanks @HarshTrivedi for putting together the repro case. That said, troubleshooting this issue isn't a high priority for our team right now, but if there are any issues we should run into them with some of our current work for the quarter.
I'm hitting this now with the Calypso port. Investigating.
One issue is that you'll also likely need the multiprocess iterator. Add something like the following to your config:
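The exact snippet isn't reproduced here; as a rough sketch, and assuming the allennlp 0.x API, the `MultiprocessIterator` wraps an ordinary base iterator (in a jsonnet config this corresponds to an iterator with `"type": "multiprocess"`):

```python
# Sketch only, assuming the allennlp 0.x API: wrap a normal iterator in the
# multiprocess iterator so batch/tensor creation happens in worker processes.
from allennlp.data.iterators import BasicIterator, MultiprocessIterator

base_iterator = BasicIterator(batch_size=64)  # batch size is an arbitrary example value
iterator = MultiprocessIterator(base_iterator, num_workers=4)
```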
I'm still studying this code to fully understand the interaction between the reader and iterator. Should have more for you soon.
@brendan-ai2 Thanks for looking into this! Can you confirm that you could reproduce the issue? Unfortunately, I am out of town till Thursday at least and won't be able to try your suggestions till then. Btw, I am not using it. In my particular case, since the dataset fits in memory, changing the dataset reader to use a multiprocessing map turned out to work well (@matt-gardner suggested it in #1891).
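For reference, that multiprocessing-map workaround might look roughly like the following sketch; it is plain Python rather than the AllenNLP reader API, and the function names and tab-separated format are hypothetical:

```python
# Illustration only: parallelize per-line example creation with a process pool,
# assuming the whole dataset fits in memory.
from multiprocessing import Pool
from typing import Dict, List

def line_to_example(line: str) -> Dict[str, object]:
    # Stand-in for the real text_to_instance logic (tokenization, indexing, ...).
    premise, hypothesis, label = line.rstrip("\n").split("\t")
    return {"premise": premise.split(), "hypothesis": hypothesis.split(), "label": label}

def read_all(file_path: str, num_workers: int = 4) -> List[Dict[str, object]]:
    with open(file_path) as f:
        lines = f.readlines()
    with Pool(processes=num_workers) as pool:
        # map materializes every example up front, which is fine when the data fits in memory.
        return pool.map(line_to_example, lines)
```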
@HarshTrivedi, I have not attempted to reproduce with your exact code, but I have reproduced with my own, which is fairly similar. My suspicion for
Quick update: I've written a version of the vocabulary creation that counts tokens over the dataset shards in parallel.
Is there such a strong need to parallelize counting the tokens? It only happens once, right? I thought the main idea of the multiprocess stuff was to parallelize the tensor generation (which happens over and over again), but I could be missing something.
Yeah, I guess I'd like to see absolute numbers on the performance difference before being sure we should invest heavily in fixing the vocab creation part. If it really is a large bottleneck it might be worth it; I just don't know how bad the problem is.
The token counting was proceeding at a rate at which it would have taken about 5-6 hours to complete. Processing the shards independently brought that down to about 30 minutes. Yes, in the ideal case that cost is borne only once, but it's still a pretty unfortunate user experience, and it will be incurred whenever the dataset changes or a vocab parameter is tweaked.

I'm still trying to grok how the current setup intends to parallelize the tensor generation. My concern is that the queue is going to be a bottleneck even when it's properly exposed as a queue instead of an iterable. I'll be able to verify this once my branch is cleaned up. Effectively I need to pass the underlying queue through everywhere and remove any code that naively combines datasets as iterables.
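As a rough illustration of that shard-level parallelism (the whitespace tokenization and glob pattern here are placeholders, not the code from the branch in question):

```python
# Illustration only: count tokens per shard in worker processes, then merge
# the per-shard Counters into a single count.
from collections import Counter
from glob import glob
from multiprocessing import Pool

def count_tokens_in_shard(path: str) -> Counter:
    counts = Counter()
    with open(path) as f:
        for line in f:
            counts.update(line.split())  # whitespace tokenization as a stand-in
    return counts

def count_tokens(shard_glob: str, num_workers: int = 8) -> Counter:
    with Pool(processes=num_workers) as pool:
        per_shard = pool.map(count_tokens_in_shard, sorted(glob(shard_glob)))
    total = Counter()
    for shard_counts in per_shard:
        total.update(shard_counts)
    return total
```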
@brendan-ai2 Any updates on this?
Hi @rangwani-harsh, thanks for the ping. For now I've discarded my code that parallelizes counting the tokens; designing an API for that which fits with the existing AllenNLP codebase is challenging. Instead I've pre-generated a vocabulary (which can be done with the `dry-run` or `make-vocab` commands). I'd suggest taking the same approach for now. So, when pre-generating the vocabulary, don't use the `MultiprocessDatasetReader`. I hope that helps. Let me know if you have any followup questions.
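A minimal sketch of that workflow through the Python API, assuming the allennlp 0.x `SnliReader` and `Vocabulary` interfaces and using placeholder paths:

```python
# Sketch: build the vocabulary once with the ordinary single-process reader,
# save it to disk, and point the actual training run at the saved directory.
from allennlp.data import Vocabulary
from allennlp.data.dataset_readers import SnliReader

reader = SnliReader()                                 # the plain reader, not the multiprocess wrapper
instances = reader.read("data/snli_1.0_train.jsonl")  # placeholder path
vocab = Vocabulary.from_instances(instances)
vocab.save_to_files("vocabulary/")
```

The training run can then load the saved vocabulary files instead of re-counting tokens over the full dataset.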
Hi, in the PR that introduced the lazy dataset reader, there is an assumption that laziness is only required for training. Can you please share some insight on this, @joelgrus? And is it not possible for dry-run and make-vocab to work with laziness? Thanks.
@brendan-ai2 I still can't get through as
- Adds a script to benchmark iterators.
  - Average speed
  - Introspects queues
- Removes a bottleneck when `MultiprocessDatasetReader` and `MultiprocessIterator` are used in conjunction.
  - Specifically, removes a redundant queue that was populated by a single process.
  - Removes managers, which have significant overhead.
- Results on training_config/bidirectional_language_model.jsonnet:
  - original code, no multiprocessing: 0.047 s/b over 10000 batches
  - original code, workers = 1: 0.073 s/b over 10000 batches
  - original code, workers = 10: 0.078 s/b over 10000 batches
  - this PR (-queue), workers = 1: 0.073 s/b over 10000 batches
  - this PR (-queue), workers = 10: 0.046 s/b over 10000 batches
  - this PR (-queue, -manager), workers = 1: 0.063 s/b over 10000 batches
  - this PR (-queue, -manager), workers = 10: 0.020 s/b over 10000 batches
- Notably, previously we did not see any benefit from scaling to multiple workers. Now we do, albeit worse than linearly. More work required there.
- Related issues: #2962, #1890
With the merge yesterday, can this be closed? @DeNeutoy, @brendan-ai2?
@matt-gardner, yes, we should close this. #3119 addressed one fundamental problem with the multiprocess data/iterator setup, but we're trying to move away from that anyway with @DeNeutoy's new iterator PR: #3386. Of course, that PR is still in flight, but it's the likely path forward. We're not intending to invest more effort into the existing multiprocess reader/iterator setup.
Superseding issue: #3079
@joelgrus, I am unable to see the gains by using the `MultiprocessDatasetReader`. In fact, it's taking 7x more time to `dry-run` in this particular case. I could be doing something wrong here, so please let me know if so.

I have created a minimally reproducible branch here. It runs ESIM textual entailment (one already present in `allennlp`) by reading the SNLI train dataset. Edit: actually, ESIM is only for borrowing the config file; I am only doing `dry-run` here.

The DatasetReader config is as follows:
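As a hedged stand-in for that config (not the exact snippet from the branch), and assuming the allennlp 0.x parameters, the multiprocess reader simply wraps a base reader and a worker count (in jsonnet, a reader with `"type": "multiprocess"`); it is shown here via the Python API:

```python
# Hedged sketch (not the config from the branch): wrap the snli reader in the
# multiprocess reader. The multiprocess reader expects a glob of dataset shards
# rather than a single file, so the path below is only a placeholder.
from allennlp.data.dataset_readers import MultiprocessDatasetReader, SnliReader

reader = MultiprocessDatasetReader(base_reader=SnliReader(), num_workers=4)
# instances = reader.read("data/snli_train_shards/*.jsonl")  # placeholder shard glob
```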
To reproduce, just run the following on that branch:

vs

Btw, I did change the snli reader to use the allennlp `Token` instead of spacy tokens, as in #1887. You can check that in the last commit on that branch.