Parallel smooth fails at 2e6 spectra #941

Open
keflavich opened this issue Feb 25, 2025 · 0 comments

I've consistently seen a failure at the 2 millionth(ish) spectrum:

[Parallel(n_jobs=16)]: Done 2158288 tasks      | elapsed: 33.8min
_RemoteTraceback:
"""
Traceback (most recent call last):
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/externals/loky/process_executor.py", line 426, in _process_worker
    call_item = call_queue.get(block=True, timeout=timeout)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/multiprocessing/queues.py", line 122, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/_memmapping_reducer.py", line 245, in _strided_from_memmap
    return make_memmap(
           ^^^^^^^^^^^^
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/backports.py", line 133, in make_memmap
    mm = np.memmap(filename, dtype=dtype, mode=mode, offset=offset,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/numpy/core/memmap.py", line 229, in __new__
    f_ctx = open(os_fspath(filename), ('r' if mode == 'c' else mode)+'b')
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/local/59392354/tmpzhz849fv'
"""

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File /orange/adamginsburg/ACES/reduction_ACES/aces/analysis/downsample_all_spectrally.py:56
    if os.path.exists(outname):
  File /orange/adamginsburg/ACES/reduction_ACES/aces/analysis/downsample_all_spectrally.py:47 in main
    target_path = '/red/adamginsburg/workdir/mosaics/'
  File /orange/adamginsburg/ACES/reduction_ACES/aces/imaging/make_mosaic.py:1063 in downsample_spectrally
    dscube = cube.spectral_smooth(kernel, use_memmap=True, num_cores=num_cores, verbose=verbose)
  File /blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/spectral_cube.py:124 in wrapper
    return func(*args, **kwargs)
  File /blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/spectral_cube.py:3216 in spectral_smooth
    return self.apply_function_parallel_spectral(convolve,
  File /blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/spectral_cube.py:3151 in apply_function_parallel_spectral
    return self._apply_function_parallel_base(iteration_data=spectra,
  File /blue/adamginsburg/adamginsburg/repos/spectral-cube/spectral_cube/spectral_cube.py:3014 in _apply_function_parallel_base
    Parallel(n_jobs=num_cores,
  File /blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/parallel.py:2007 in __call__
    return output if self.return_generator else list(output)
  File /blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/parallel.py:1650 in _get_outputs
    yield from self._retrieve()
  File /blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/parallel.py:1754 in _retrieve
    self._raise_error_fast()
  File /blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/parallel.py:1789 in _raise_error_fast
    error_job.get_result(self.timeout)
  File /blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/parallel.py:745 in get_result
    return self._return_or_raise()
  File /blue/adamginsburg/adamginsburg/miniconda3/envs/python312/lib/python3.12/site-packages/joblib/parallel.py:763 in _return_or_raise
    raise self._result
BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.

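I can't quite tell what's gone on here: either joblib has some maximum number of total jobs it can handle, or memmap does, or... something else? Note that the underlying error in the worker is a FileNotFoundError for the temporary memmap file under /scratch/local/, which joblib then reports as a failure to un-serialize.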
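
One way to probe the task-count hypothesis would be to dispatch far fewer, larger tasks and see whether the failure still appears. The following is only a minimal sketch under that assumption: `batched_pixel_indices` and `smooth_batch` are hypothetical helpers, not part of spectral-cube or joblib, and the 1500 x 1500 footprint is an illustrative size chosen to be roughly 2 million spectra.

```python
from itertools import islice, product


def batched_pixel_indices(ny, nx, batch_size):
    """Yield lists of (y, x) pixel indices, at most batch_size per list.

    Hypothetical helper: groups spectra so that one parallel task
    handles many spectra instead of exactly one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    pixels = product(range(ny), range(nx))
    while True:
        batch = list(islice(pixels, batch_size))
        if not batch:
            return
        yield batch


def smooth_batch(spectra, func):
    """Apply func to every spectrum of one batch inside a single task."""
    return [func(spectrum) for spectrum in spectra]


# Illustrative footprint (an assumption, not the real cube shape):
# 1500 * 1500 = 2,250,000 spectra, grouped 10,000 per task.
n_tasks = sum(1 for _ in batched_pixel_indices(1500, 1500, 10_000))
print(n_tasks)  # 225 tasks instead of 2,250,000
```

If the failure disappears with a few hundred tasks, that would point toward something tied to the number of dispatched tasks or to how long the run lasts; if it persists, the temporary memmap file itself is the more likely culprit. In that case, `joblib.Parallel` accepts `temp_folder` and `max_nbytes` arguments that control where (and whether) large arrays are auto-memmapped, though whether spectral-cube exposes them to the caller is not established here.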
