`train_on_dataset` much slower when using `ActiveLearningDataset` compared to torch `Dataset` #264
Hello! I was able to reproduce with this example:

```python
# test that active learning is fast
from torchvision.datasets import CIFAR10
from baal.active.dataset import ActiveLearningDataset

dataset = CIFAR10(root='/tmp', train=True, download=True)
al_dataset = ActiveLearningDataset(dataset)
al_dataset.label_randomly(len(dataset))

%timeit [x for x in al_dataset]
%timeit [x for x in dataset]
```

I have a possible fix where we cache the result of [...]. I'll try to merge this quickly and make a release, but I'm away for the long weekend. Coming back on Monday.
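For context, here is a minimal sketch of the kind of caching described in the comment above, assuming the slow step is translating a labelled-dataset index into an index of the underlying dataset on every `__getitem__` call. The class and attribute names below are hypothetical; this is not the actual patch in #265.

```python
# Hypothetical illustration of caching the labelled-index mapping so that
# __getitem__ does not recompute it for every sample. This is NOT the
# actual change in #265; the class and attribute names are made up.
import numpy as np
from torch.utils.data import Dataset


class SimpleActiveDataset(Dataset):
    def __init__(self, dataset):
        self._dataset = dataset
        self.labelled = np.zeros(len(dataset), dtype=bool)
        self._labelled_indices = np.array([], dtype=int)  # cached mapping

    def label_randomly(self, n):
        pool = np.flatnonzero(~self.labelled)
        self.labelled[np.random.choice(pool, n, replace=False)] = True
        # Recompute the mapping once per labelling step instead of per item.
        self._labelled_indices = np.flatnonzero(self.labelled)

    def __len__(self):
        return int(self.labelled.sum())

    def __getitem__(self, index):
        # Without the cache, something like np.flatnonzero(self.labelled)[index]
        # would run here for every sample, which scales with the pool size.
        return self._dataset[self._labelled_indices[index]]
```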
I opened #265, not super happy with the solution, but that's the best I can do for now. Now it is "only" 2x slower; I will revisit next week, but feel free to use the branch.
Thank you for the fix! I would be happy with the "only 2x slower" training time.
Hello! I want to be sure that we are not blocking you. If so, I'll immediately merge and deploy a minor release ASAP.
I'm currently on holiday and need to work on a paper revision when I return to the office (not related to Baal). As such, I won't be working with Baal for the next 4 weeks, so the fix is not urgent for me. If the minor release is not out by then, I'll install it from source. Thank you for your message.
Fixed in #265.
Describe the bug
When the entire pool is labelled (i.e. training on the entire training set), the `train_on_dataset` function is much slower with an `ActiveLearningDataset` than with a regular torch `Dataset`. In the MNIST experiment below, it is 17x slower (!!!). I suspect the discrepancy grows with the size of the labelled pool, because there is no difference when only 20 samples are labelled.
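A small timing sketch that makes the scaling suspicion checkable: it iterates an `ActiveLearningDataset` at increasing labelled-pool sizes, following the same pattern as the maintainer's reproduction above. MNIST is only a stand-in dataset here, and the printed timings are machine-dependent illustrations, not results from the gist.

```python
# Rough timing of per-item cost at different labelled-pool sizes.
# MNIST is only a stand-in dataset; timings are illustrative.
import time

from torchvision import transforms
from torchvision.datasets import MNIST
from baal.active.dataset import ActiveLearningDataset

dataset = MNIST(root='/tmp', train=True, download=True,
                transform=transforms.ToTensor())

for n_labelled in (20, 2_000, 60_000):
    al_dataset = ActiveLearningDataset(dataset)
    al_dataset.label_randomly(n_labelled)
    start = time.perf_counter()
    for _ in al_dataset:
        pass
    per_item = (time.perf_counter() - start) / n_labelled
    print(f"{n_labelled:>6} labelled: {per_item * 1e6:.1f} µs per item")
```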
To Reproduce
In this gist, a LeNet-5 model with MC Dropout is trained on the entire MNIST training set for 1 epoch. Note that this script does not perform any active learning, as no acquisitions are done and the pool set is empty. The script was intended to compare training times across AL packages.
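For readers without access to the gist, here is a hedged sketch of what the comparison roughly boils down to. The model, criterion, and hyperparameters are placeholders, the `use_ald` flag stands in for the script's `--use-ald` option described below, and the `ModelWrapper` / `train_on_dataset` calls follow the baal 1.x API rather than the gist's exact code.

```python
# Hedged sketch of the gist's comparison (not its exact code): train once on
# the full MNIST training set, either through the plain torch Dataset or
# through an ActiveLearningDataset with every sample labelled.
import torch
from torch import nn, optim
from torchvision import transforms
from torchvision.datasets import MNIST

from baal.active.dataset import ActiveLearningDataset
from baal.modelwrapper import ModelWrapper

use_ald = True  # stands in for the --use-ald flag in the script

train_set = MNIST(root='/tmp', train=True, download=True,
                  transform=transforms.ToTensor())

if use_ald:
    dataset = ActiveLearningDataset(train_set)
    dataset.label_randomly(len(train_set))  # label the entire pool
else:
    dataset = train_set

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder model
wrapper = ModelWrapper(model, nn.CrossEntropyLoss())
optimizer = optim.SGD(model.parameters(), lr=0.01)

wrapper.train_on_dataset(dataset, optimizer, batch_size=64, epoch=1,
                         use_cuda=torch.cuda.is_available())
```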
The script has a `--use-ald` option, which uses the `ActiveLearningDataset` in the `train_on_dataset` function instead of the regular torch `Dataset`. Please refer to lines 83-94 in the gist for the relevant code. Results are as follows:

`python baal_mnist.py --use-ald` => "Elapsed training time: 0:1:23"
`python baal_mnist.py` => "Elapsed training time: 0:0:5"

Here is the full output:
Expected behavior
I would expect the training time with `ActiveLearningDataset` to be a few percent slower, but not 17x slower.

Version (please complete the following information):
Additional context
I want to use active learning in my experiments, so just using the torch `Dataset` is not an appropriate solution. Any ideas why this is the case and whether it could be fixed?
Thank you!