This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

train model failed again #8

Closed
akafen opened this issue May 24, 2021 · 3 comments

Comments

@akafen

akafen commented May 24, 2021

I am using the latest code, and training fails again with the error below:

Traceback (most recent call last):
File "scripts/train_models.py", line 82, in <module>
[job.result() for jobs in jobs_dict.values() for job in jobs]
File "scripts/train_models.py", line 82, in <listcomp>
[job.result() for jobs in jobs_dict.values() for job in jobs]
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/submitit/core/core.py", line 261, in result
r = self.results()
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/submitit/local/debug.py", line 72, in results
return [self._submission.result()]
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/submitit/core/utils.py", line 128, in result
self._result = self.function(*self.args, **self.kwargs)
File "/home/liuyijiao/muss/muss/utils/training.py", line 19, in wrapped_func
return func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/utils/training.py", line 39, in wrapped_func
return func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/utils/submitit.py", line 41, in wrapped_func
return func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/utils/training.py", line 49, in wrapped_func
result = func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/utils/helpers.py", line 470, in wrapped_func
return func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/fairseq/main.py", line 228, in fairseq_train_and_evaluate_with_parametrization
recommended_preprocessors_kwargs = print_running_time(find_best_parametrization)(exp_dir, **kwargs)
File "/home/liuyijiao/muss/muss/utils/helpers.py", line 470, in wrapped_func
return func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/fairseq/main.py", line 174, in find_best_parametrization
return find_best_parametrization_nevergrad(exp_dir, preprocessors_kwargs, *args, **kwargs)
File "/home/liuyijiao/muss/muss/fairseq/main.py", line 150, in find_best_parametrization_nevergrad
recommendation = optimizer.minimize(evaluate_parametrization, verbosity=0)
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/nevergrad/optimization/base.py", line 460, in minimize
result = job.result()
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/nevergrad/optimization/utils.py", line 133, in result
self._result = self.func(*self.args, **self.kwargs)
File "/home/liuyijiao/muss/muss/fairseq/main.py", line 130, in evaluate_parametrization
scores = evaluate_simplifier(simplifier, **kwargs.get('evaluate_kwargs', {'test_set': 'asset_valid'}))
File "/home/liuyijiao/muss/muss/evaluation/general.py", line 20, in evaluate_simplifier
sys_sents_path = simplifier(orig_sents_path)
File "/home/liuyijiao/muss/muss/simplifiers.py", line 42, in wrapped
simplifier(complex_filepath, pred_filepath)
File "/home/liuyijiao/muss/muss/simplifiers.py", line 30, in wrapped
simplifier(complex_filepath, pred_filepath)
File "/home/liuyijiao/muss/muss/simplifiers.py", line 68, in preprocessed_simplifier
preprocessed_pred_filepath = simplifier(preprocessed_complex_filepath)
File "/home/liuyijiao/muss/muss/simplifiers.py", line 42, in wrapped
simplifier(complex_filepath, pred_filepath)
File "/home/liuyijiao/muss/muss/simplifiers.py", line 30, in wrapped
simplifier(complex_filepath, pred_filepath)
File "/home/liuyijiao/muss/muss/simplifiers.py", line 54, in fairseq_simplifier
fairseq_generate(complex_filepath, output_pred_filepath, exp_dir, **kwargs)
File "/home/liuyijiao/muss/muss/fairseq/base.py", line 278, in fairseq_generate
**kwargs,
File "/home/liuyijiao/muss/muss/utils/training.py", line 60, in wrapped_func
return func(*args, **kwargs)
File "/home/liuyijiao/muss/muss/fairseq/base.py", line 231, in _fairseq_generate
generate.cli_main()
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/fairseq_cli/generate.py", line 382, in cli_main
main(args)
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/fairseq_cli/generate.py", line 41, in main
return _main(args, sys.stdout)
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/fairseq_cli/generate.py", line 179, in _main
for sample in progress:
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __iter__
for obj in iterable:
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/fairseq/data/iterators.py", line 59, in __iter__
for x in self.iterable:
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/fairseq/data/iterators.py", line 591, in __next__
raise item
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/fairseq/data/iterators.py", line 522, in run
for item in self._source:
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 435, in __next__
data = self._next_data()
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1068, in _next_data
idx, data = self._get_data()
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1034, in _get_data
success, data = self._try_get_data()
File "/home/liuyijiao/torch_venv/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 905, in _try_get_data
" at the beginning of your code") from None
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using ulimit -n in the shell or change the sharing strategy by calling torch.multiprocessing.set_sharing_strategy('file_system') at the beginning of your code
36%|█████████████████▎ | 46/128 [59:14:20<105:36:00, 4636.10s/it]

By the way, it happens in find_best_parametrization. But when I change max_update from 50000 to 50 in fairseq/base.py, it seems fine. I checked the ulimit on my machine:

$ ulimit -Sn
1024
$ ulimit -Hn
1048576
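
With a soft limit of 1024 and a much higher hard limit, one possible workaround is to raise the soft limit from inside the Python process, which is roughly what `ulimit -n` does in the shell. The following is only a sketch using the standard-library `resource` module (Unix only); the helper name `raise_open_file_limit` and the target value 65536 are illustrative assumptions, not part of MUSS, fairseq, or torch.

```python
import resource


def raise_open_file_limit(target=65536):
    """Try to raise the soft open-file limit to `target` (hypothetical helper).

    The new soft limit is capped at the hard limit. Returns the (soft, hard)
    limits in effect after the attempt.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # An unprivileged process may raise its soft limit only up to the hard limit.
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
        except (ValueError, OSError):
            # Some systems reject the request; keep the current limits.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


limits = raise_open_file_limit()
print("open-file limits (soft, hard):", limits)
```

This would have to run before any DataLoader workers are started, since child processes inherit the limits of the process that creates them.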

By the way, the max-sentences error also happens in find_best_parametrization when using get_mbart_kwargs.

@louismartin
Contributor

Hi @akafen, thanks for raising this issue!

It seems that this is not a bug in MUSS but rather comes from torch or your system. Have you checked pytorch/pytorch#11201?
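
For reference, the other workaround named in the error message (changing the sharing strategy) could look roughly like the sketch below. It assumes the call is made at the very beginning of the entry script, before any DataLoader is created; the wrapper `use_file_system_sharing` is a hypothetical helper, and only the `torch.multiprocessing` functions it calls are real torch API.

```python
def use_file_system_sharing():
    """Switch torch to the 'file_system' sharing strategy (hypothetical helper).

    Returns the strategy in effect afterwards, or None if torch is not installed.
    """
    try:
        import torch.multiprocessing as mp
    except ImportError:
        return None
    # Only switch if this platform actually offers the strategy.
    if "file_system" in mp.get_all_sharing_strategies():
        mp.set_sharing_strategy("file_system")
    return mp.get_sharing_strategy()


strategy = use_file_system_sharing()
print("sharing strategy:", strategy)
```

Whether this is enough depends on the setup; it changes how tensors are shared between worker processes, and the open-file limit itself stays the same.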

@louismartin
Contributor

louismartin commented May 27, 2021

Concerning the max-sentences error, can you check that this commit fixes the issue? #9

@akafen
Author

akafen commented May 27, 2021

@louismartin ok, thanks
