Cudnn PoolForward launch failed exception #98

Closed
hzlmn opened this issue Aug 10, 2020 · 2 comments

hzlmn commented Aug 10, 2020

Hello, thanks for your work on this package. We periodically get the following exception from cuDNN. Any hints about what could cause this problem?
Env:
- tensorflow-gpu==1.14
- CUDA 10.1
- cuDNN 7.6.5.32
- mtcnn==0.0.9

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
"caught error while running engine ops
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
  (1) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/leapi/worker/pipeline_item_celery.py", line 118, in run_engine_proc
    out_payload = engine.run_ops(task.pipeline.operations, payload)
  File "/app/leapi/pipeline/engine.py", line 80, in run_ops
    payload = self.run_op(op, payload, warmup)
  File "/app/leapi/pipeline/engine.py", line 87, in run_op
    payload_out = f(payload, **op._kwargs)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 949, in resize_upscale_with_faces
    p = self.detect_and_extract_faces(p, face_method=face_method)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 559, in detect_and_extract_faces
    faces_json = self.mtcnn_detector.detect_faces(win)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 418, in detect_faces
    result = stage(img, result[0], result[1])
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 528, in __stage2
    out = self.__rnet.feed(tempimg1)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/network.py", line 108, in feed
    return self._feed(image)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 103, in _feed
    return self._session.run(['rnet/fc2-2/fc2-2:0', 'rnet/prob1:0'], feed_dict={'rnet/input:0': image})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
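The traceback shows the failure surfacing from `detect_faces` (stage 2, `rnet`) inside a Celery worker, and the reporter describes it as periodic. The thread never identifies a root cause, so the following is only a stop-gap sketch, not a fix: a small retry wrapper with exponential backoff and an optional fallback. The helper name `detect_with_retry` and its parameters are hypothetical; the exception types to retry on are injected so the sketch runs without TensorFlow installed.

```python
import time
from typing import Any, Callable, Optional, Tuple, Type


def detect_with_retry(
    detect: Callable[[Any], Any],
    image: Any,
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    delay_s: float = 0.5,
    fallback: Optional[Callable[[Any], Any]] = None,
) -> Any:
    """Call detect(image), retrying on the given exception types.

    Hypothetical helper. If every attempt fails and a fallback is given,
    return fallback(image); otherwise re-raise the last error. Exceptions
    not listed in retry_on propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            return detect(image)
        except retry_on as err:
            last_error = err
            if attempt + 1 < attempts:
                # Exponential backoff: delay_s, 2*delay_s, 4*delay_s, ...
                time.sleep(delay_s * (2 ** attempt))
    if fallback is not None:
        # e.g. a second detector pinned to the CPU (assumption, see below)
        return fallback(image)
    assert last_error is not None
    raise last_error
```

In the reporter's pipeline, `detect` would presumably be `self.mtcnn_detector.detect_faces` and `retry_on` would be `(tf.errors.InternalError,)`, the public alias of the `InternalError` in the traceback. A `fallback` could be a separate detector running on the CPU. Be aware that a failed cuDNN launch may leave the GPU context unusable, so if retries in the same process keep failing, restarting the worker may be the only reliable recovery.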

owlhtchen commented

I also got "tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed". Did you end up fixing this issue? I am using tensorflow-gpu 1.12, CUDA 9.0, and cuDNN 7.6.5.

hzlmn commented Mar 5, 2022

To be honest, I don't remember :) I guess I ended up playing with versions and environment configuration.

ipazc pushed a commit that referenced this issue Oct 7, 2024
…tch processing support

- Completely refactored the MTCNN implementation following best coding practices.
- Optimized code by removing unnecessary transpositions, resulting in faster computation. Fixes #22.
- Transposed convolutional layer weights to eliminate the need for additional transpositions during preprocessing and postprocessing, improving overall efficiency.
- Converted preprocessing and postprocessing functions into matrix operations to accelerate computation. Fixes #14, #110.
- Added batch processing support to enhance performance for multiple input images. Fixes #9, #71.
- Migrated network architecture to TensorFlow >= 2.12 for improved compatibility and performance. Fixes #80, #82, #90, #91, #93, #98, #104, #112, #114, #115, #116.
- Extensively documented the project with detailed explanations of thresholds and parameters. Fixes #12, #41, #52, #57, #99, #122, #117.
- Added support for selecting computation backends (CPU, GPU, etc.) with the `device` parameter. Fixes #23.
- Added new parameters to control the result format (support for x1, y1, x2, y2 instead of x1, y1, width, height) and the ability to return tensors instead of dictionaries. Fixes #72.
- Configured PyLint support to ensure code quality and style adherence.
- Organized functions into specific modules (`mtcnn.utils.*` and `mtcnn.stages.*`) for better modularity.
- Created Jupyter notebooks for visualization and ablation studies of each stage, allowing detailed exploration of layers, weights, and intermediate results. Fixes #88, #102.
- Added a comprehensive training guide for the model. Fixes #35, #39.
- Updated README with information on the new version, including the complete Read the Docs documentation that describes the process, theoretical background, and usage examples. Fixes #53, #73.
- Configured GitHub Actions for continuous integration and delivery (CI/CD).
- Fixed memory leak by switching to a more efficient TensorFlow method (`model(tensor)` instead of `model.predict(tensor)`). Fixes #87, #109, #121, #125, #128.
- Made TensorFlow an optional dependency to prevent conflicts with user-installed versions. Fixes #95.
- Added comprehensive unit tests for increased reliability and coverage.
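This commit closed the issue partly by migrating to TensorFlow >= 2.12, whereas both environments reported in the thread ran TensorFlow 1.x. Before upgrading mtcnn, the installed TensorFlow version can be checked with the standard library, as in the sketch below. The helper names are hypothetical, and the 2.12 floor is taken from the commit message. Versions are compared numerically, because a plain string comparison would rank "2.9" above "2.12". Pre-release suffixes are ignored for simplicity; a full PEP 440 comparison would use `packaging.version`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.12.0rc1' -> (2, 12, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at a pre-release or local suffix such as 'rc1'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if installed >= minimum, comparing release numbers only.

    Pre-release tags are ignored, so '2.12.0rc1' counts as 2.12.0.
    """
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that (2, 12) and (2, 12, 0) compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_tensorflow_version() -> Optional[str]:
    """Return the installed TensorFlow version, or None if none is found."""
    # The thread's environments used the 'tensorflow-gpu' distribution name.
    for dist in ("tensorflow", "tensorflow-gpu", "tensorflow-cpu"):
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    found = installed_tensorflow_version()
    # Minimum taken from the commit message ("TensorFlow >= 2.12").
    if found is None:
        print("TensorFlow is not installed")
    elif meets_minimum(found, "2.12"):
        print(f"TensorFlow {found} satisfies the >= 2.12 requirement")
    else:
        print(f"TensorFlow {found} is older than 2.12")
```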
ipazc mentioned this issue Oct 8, 2024
ipazc pushed a commit that referenced this issue Oct 8, 2024
ipazc closed this as completed Oct 8, 2024