
CPU inference freezes on server with SLURM task manager #10736

Closed
dawnmy opened this issue Mar 2, 2022 · 4 comments
Labels
core runtime issues related to core runtime

Comments


dawnmy commented Mar 2, 2022

Describe the bug

I tried to run inference with Python multiprocessing on a server managed by the SLURM task manager. The program freezes (onnxruntime.InferenceSession blocks) and cannot be terminated with Ctrl+C, but the same code works fine on a server without SLURM. The relevant code lines: https://github.com/hzi-bifo/RiboDetector/blob/ae40ae4a49ceb63a39297c3ae7b6d92581c6ab7b/ribodetector/detect_cpu.py#L73-L79

I also tried setting:

so.intra_op_num_threads = 1
so.inter_op_num_threads = 1

and OMP_NUM_THREADS=1. The program then starts running, but the CPU load of each process is very low: the combined load of all processes stays at about 200% no matter how many processes I launch with multiprocessing. On a server without SLURM, each process runs at the normal ~100% CPU load.

Urgency
I use onnxruntime in software I developed: https://github.com/hzi-bifo/RiboDetector. It has recently gained many users, and every user running it under SLURM has hit this issue.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7, Ubuntu 18.04, Fedora release 29
  • ONNX Runtime installed from (source or binary): binary with pip3
  • ONNX Runtime version: 1.10.0, 1.7.0
  • Python version: Python3.8, Python3.9

To Reproduce
Run the following lines under SLURM:

import onnxruntime

so = onnxruntime.SessionOptions()
so.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
model = onnxruntime.InferenceSession(model_file, so)

The model file is: https://github.com/hzi-bifo/RiboDetector/blob/pip/ribodetector/data/ribodetector_600k_variable_len70_101_epoch47.onnx
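
For completeness, a minimal self-contained sketch of the multiprocessing setup that triggers the hang (the model path and worker count below are placeholders, not the exact RiboDetector code):

import multiprocessing as mp
import onnxruntime

MODEL_FILE = "ribodetector_600k_variable_len70_101_epoch47.onnx"  # placeholder path

def worker(idx):
    so = onnxruntime.SessionOptions()
    so.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
    # On a SLURM-managed node this call blocks and never returns
    sess = onnxruntime.InferenceSession(MODEL_FILE, so)
    print(f"worker {idx}: session created")

if __name__ == "__main__":
    workers = [mp.Process(target=worker, args=(i,)) for i in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()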

Expected behavior
The InferenceSession is created successfully and inference runs at ~100% CPU load per process.



dawnmy commented Mar 2, 2022

When invoked with python -m trace --trace:

detect_cpu.py(71):             cd, self.config['state_file'][model_file_ext]).replace('.pth', '.onnx')
detect_cpu.py(70):         self.model_file = os.path.join(
detect_cpu.py(74):         so = onnxruntime.SessionOptions()
detect_cpu.py(77):         so.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
detect_cpu.py(79):         self.model = onnxruntime.InferenceSession(self.model_file, so)
 --- modulename: onnxruntime_inference_collection, funcname: __init__
onnxruntime_inference_collection.py(315):         Session.__init__(self)
 --- modulename: onnxruntime_inference_collection, funcname: __init__
onnxruntime_inference_collection.py(104):         self._sess = None
onnxruntime_inference_collection.py(105):         self._enable_fallback = True
onnxruntime_inference_collection.py(317):         if isinstance(path_or_bytes, str):
onnxruntime_inference_collection.py(318):             self._model_path = path_or_bytes
onnxruntime_inference_collection.py(319):             self._model_bytes = None
onnxruntime_inference_collection.py(326):         self._sess_options = sess_options
onnxruntime_inference_collection.py(327):         self._sess_options_initial = sess_options
onnxruntime_inference_collection.py(328):         self._enable_fallback = True
onnxruntime_inference_collection.py(329):         self._read_config_from_model = os.environ.get('ORT_LOAD_CONFIG_FROM_MODEL') == '1'
 --- modulename: _collections_abc, funcname: get
_collections_abc.py(659):         try:
_collections_abc.py(660):             return self[key]
 --- modulename: os, funcname: __getitem__
os.py(671):         try:
os.py(672):             value = self._data[self.encodekey(key)]
 --- modulename: os, funcname: encode
os.py(749):             if not isinstance(value, str):
os.py(751):             return value.encode(encoding, 'surrogateescape')
os.py(673):         except KeyError:
os.py(675):             raise KeyError(key) from None
_collections_abc.py(661):         except KeyError:
_collections_abc.py(662):             return default
onnxruntime_inference_collection.py(332):         disabled_optimizers = kwargs['disabled_optimizers'] if 'disabled_optimizers' in kwargs else None
onnxruntime_inference_collection.py(334):         try:
onnxruntime_inference_collection.py(335):             self._create_inference_session(providers, provider_options, disabled_optimizers)
 --- modulename: onnxruntime_inference_collection, funcname: _create_inference_session
onnxruntime_inference_collection.py(347):         available_providers = C.get_available_providers()
onnxruntime_inference_collection.py(350):         if 'TensorrtExecutionProvider' in available_providers:
onnxruntime_inference_collection.py(353):             self._fallback_providers = ['CPUExecutionProvider']
onnxruntime_inference_collection.py(356):         providers, provider_options = check_and_normalize_provider_args(providers,
onnxruntime_inference_collection.py(357):                                                                         provider_options,
onnxruntime_inference_collection.py(358):                                                                         available_providers)
onnxruntime_inference_collection.py(356):         providers, provider_options = check_and_normalize_provider_args(providers,
 --- modulename: onnxruntime_inference_collection, funcname: check_and_normalize_provider_args
onnxruntime_inference_collection.py(48):     if providers is None:
onnxruntime_inference_collection.py(49):         return [], []
onnxruntime_inference_collection.py(359):         if providers == [] and len(available_providers) > 1:
onnxruntime_inference_collection.py(366):         session_options = self._sess_options if self._sess_options else C.get_default_session_options()
onnxruntime_inference_collection.py(367):         if self._model_path:
onnxruntime_inference_collection.py(368):             sess = C.InferenceSession(session_options, self._model_path, True, self._read_config_from_model)


dawnmy commented Mar 7, 2022

This seems to be solved by setting

so.intra_op_num_threads = 1
so.inter_op_num_threads = 1

and running the SLURM job with --cpus-per-task <num_cpus> --threads-per-core 1.
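
Put together, a minimal sketch of the combination that works for me (the srun flags and model path are illustrative, not the exact RiboDetector code):

# Launched with e.g.: srun --cpus-per-task <num_cpus> --threads-per-core 1 python detect.py
import onnxruntime

so = onnxruntime.SessionOptions()
so.intra_op_num_threads = 1   # limit intra-op parallelism to a single thread
so.inter_op_num_threads = 1   # likewise for inter-op parallelism
so.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_ALL
model = onnxruntime.InferenceSession("model.onnx", so)  # "model.onnx" stands in for the real model file

With these settings, parallelism comes only from the multiprocessing worker processes rather than from ONNX Runtime's internal thread pools.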


gevro commented May 8, 2024

Hi, I ran into the same issue with a different program, but I don't have access to its source code. Is there any workaround for a user who cannot modify the source code to add those two parameters?

I'm also curious why ONNX Runtime hasn't been fixed internally to handle SLURM correctly.


wangsl commented May 11, 2024

The issue is caused by the CPU affinity set for newly created threads: when cgroups are enabled, the CPU core assigned by default may not be available to the job under the scheduler. One workaround is to override the function pthread_setaffinity_np. The C code is available at

https://raw.githubusercontent.com/wangsl/pthread-setaffinity/main/pthread-setaffinity.c

To compile the code:

gcc -fPIC -shared -Wl,-soname,libpthread-setaffinity.so -ldl -o libpthread-setaffinity.so pthread-setaffinity.c

then set

export LD_PRELOAD=libpthread-setaffinity.so

Now it should work.
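
As a quick sanity check (not part of the fix above), the cgroup-restricted CPU set that a SLURM job is actually allowed to use can be inspected from Python with os.sched_getaffinity; cores outside this set are the ones the default thread pinning can collide with, per the explanation above:

import os

# CPU cores this process is allowed to run on under the SLURM cgroup (Linux only)
allowed = os.sched_getaffinity(0)
print(f"{len(allowed)} usable cores: {sorted(allowed)}")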
