Cudnn PoolForward launch failed exception #98

hzlmn · 2020-08-10T15:23:06Z

Hello, thanks for your work on this package. We periodically get exceptions like the one below from cuDNN. Any hints as to what could cause this?
Env:
tensorflow-gpu==1.14
cuda 10.1
cudnn 7.6.5.32
mtcnn==0.0.9

tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
"caught error while running engine ops
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
  (1) Internal: cudnn PoolForward launch failed
	 [[{{node rnet/pool1}}]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/app/leapi/worker/pipeline_item_celery.py", line 118, in run_engine_proc
    out_payload = engine.run_ops(task.pipeline.operations, payload)
  File "/app/leapi/pipeline/engine.py", line 80, in run_ops
    payload = self.run_op(op, payload, warmup)
  File "/app/leapi/pipeline/engine.py", line 87, in run_op
    payload_out = f(payload, **op._kwargs)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 949, in resize_upscale_with_faces
    p = self.detect_and_extract_faces(p, face_method=face_method)
  File "/app/leapi/util/timeit.py", line 10, in timed
    result = f(*args, **kwargs)
  File "/app/leapi/pipeline/neural.py", line 559, in detect_and_extract_faces
    faces_json = self.mtcnn_detector.detect_faces(win)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 418, in detect_faces
    result = stage(img, result[0], result[1])
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 528, in __stage2
    out = self.__rnet.feed(tempimg1)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/network.py", line 108, in feed
    return self._feed(image)
  File "/usr/local/lib/python3.6/dist-packages/mtcnn/mtcnn.py", line 103, in _feed
    return self._session.run(['rnet/fc2-2/fc2-2:0', 'rnet/prob1:0'], feed_dict={'rnet/input:0': image})
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: 2 root error(s) found.
  (0) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
  (1) Internal: cudnn PoolForward launch failed
	 [[node rnet/pool1 (defined at usr/local/lib/python3.6/dist-packages/mtcnn/layer_factory.py:175) ]]
	 [[rnet/prob1/_111]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.


owlhtchen · 2022-03-03T17:01:54Z

I also got "tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed". Did you end up fixing this issue? I am using tensorflow-gpu 1.12, cuda 9.0, cudnn 7.6.5.

hzlmn · 2022-03-05T19:03:14Z

To be honest, I don't remember :) I guess I ended up playing with versions and env config.
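
Since both reports come down to version combinations, one thing that may be worth ruling out is a mismatch between the installed CUDA toolkit and the CUDA version the TensorFlow binary was built against. Below is a minimal sketch, assuming the pairs listed in TensorFlow's published tested build configurations (1.14 with CUDA 10.0 / cuDNN 7.4, 1.12 with CUDA 9 / cuDNN 7). `TESTED_BUILDS`, `major_minor` and `check_env` are hypothetical names used for illustration; they are not part of mtcnn or TensorFlow.

```python
# Assumed from TensorFlow's "tested build configurations" table (Linux, GPU).
# Only the two releases mentioned in this thread are listed; verify against
# the official table before relying on these values.
TESTED_BUILDS = {
    "1.12": {"cuda": "9.0", "cudnn": "7"},
    "1.14": {"cuda": "10.0", "cudnn": "7.4"},
}


def major_minor(version):
    """Return the 'major.minor' prefix of a dotted version string."""
    return ".".join(version.split(".")[:2])


def check_env(tf_version, cuda_version):
    """Compare an installed CUDA version with the one a TF release targets.

    Returns (ok, message); ok is None when the release is not in the table.
    """
    expected = TESTED_BUILDS.get(major_minor(tf_version))
    if expected is None:
        return None, f"no table entry for TensorFlow {tf_version}"
    if major_minor(cuda_version) == expected["cuda"]:
        return True, f"CUDA {cuda_version} matches the tested build"
    return False, (
        f"TensorFlow {tf_version} was tested with CUDA {expected['cuda']}, "
        f"but CUDA {cuda_version} is installed"
    )


if __name__ == "__main__":
    # Environments quoted in this thread.
    print(check_env("1.14", "10.1"))
    print(check_env("1.12", "9.0"))
```

With the environment from the original report (`tensorflow-gpu==1.14`, CUDA 10.1) this flags a mismatch, while the second report (1.12, CUDA 9.0) passes the check. Whether a mismatch is what actually caused the failure is not confirmed anywhere in this thread.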

…tch processing support

- Completely refactored the MTCNN implementation following best coding practices.
- Optimized code by removing unnecessary transpositions, resulting in faster computation. Fixes #22.
- Transposed convolutional layer weights to eliminate the need for additional transpositions during preprocessing and postprocessing, improving overall efficiency.
- Converted preprocessing and postprocessing functions into matrix operations to accelerate computation. Fixes #14, #110.
- Added batch processing support to enhance performance for multiple input images. Fixes #9, #71.
- Migrated network architecture to TensorFlow >= 2.12 for improved compatibility and performance. Fixes #80, #82, #90, #91, #93, #98, #104, #112, #114, #115, #116.
- Extensively documented the project with detailed explanations of thresholds and parameters. Fixes #12, #41, #52, #57, #99, #122, #117.
- Added support for selecting computation backends (CPU, GPU, etc.) with the `device` parameter. Fixes #23.
- Added new parameters to control the result format (support for x1, y1, x2, y2 instead of x1, y1, width, height) and the ability to return tensors instead of dictionaries. Fixes #72.
- Configured PyLint support to ensure code quality and style adherence.
- Organized functions into specific modules (`mtcnn.utils.*` and `mtcnn.stages.*`) for better modularity.
- Created Jupyter notebooks for visualization and ablation studies of each stage, allowing detailed exploration of layers, weights, and intermediate results. Fixes #88, #102.
- Added a comprehensive training guide for the model. Fixes #35, #39.
- Updated README with information on the new version, including the complete Read the Docs documentation that describes the process, theoretical background, and usage examples. Fixes #53, #73.
- Configured GitHub Actions for continuous integration and delivery (CI/CD).
- Fixed memory leak by switching to a more efficient TensorFlow method (`model(tensor)` instead of `model.predict(tensor)`). Fixes #87, #109, #121, #125, #128.
- Made TensorFlow an optional dependency to prevent conflicts with user-installed versions. Fixes #95.
- Added comprehensive unit tests for increased reliability and coverage.

ipazc mentioned this issue Oct 8, 2024

MTCNN v1.0.0 #132

Closed

ipazc mentioned this issue Oct 8, 2024

Refactored MTCNN codebase with significant optimizations. Version 1.0.0 #133

Merged

ipazc closed this as completed Oct 8, 2024
