Actual behaviour
Evaluation and inference break with a centered instance model, specifically during the call to find_global_peaks_rough:
File "d:\sleap_develop\sleap\nn\inference.py", line 2265, in call
if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
File "d:\sleap_develop\sleap\nn\inference.py", line 2274, in call
peaks_output = self.instance_peaks(crop_output)
File "C:\Miniconda3\envs\sleap_develop\lib\site-packages\keras\utils\traceback_utils.py", line 65, in error_handler
return fn(*args, **kwargs)
File "C:\Miniconda3\envs\sleap_develop\lib\site-packages\keras\engine\base_layer.py", line 1097, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "C:\Miniconda3\envs\sleap_develop\lib\site-packages\keras\utils\traceback_utils.py", line 96, in error_handler
return fn(*args, **kwargs)
File "d:\sleap_develop\sleap\nn\inference.py", line 2110, in call
if self.offsets_ind is None:
File "d:\sleap_develop\sleap\nn\inference.py", line 2112, in call
peak_points, peak_vals = sleap.nn.peak_finding.find_global_peaks(
File "d:\sleap_develop\sleap\nn\peak_finding.py", line 366, in find_global_peaks
rough_peaks, peak_vals = find_global_peaks_rough(
File "d:\sleap_develop\sleap\nn\peak_finding.py", line 224, in find_global_peaks_rough
channel_subs = tf.range(total_peaks, dtype=tf.int64) % channels
Node: 'mod'
2 root error(s) found.
(0) UNKNOWN: JIT compilation failed.
[[{{node mod}}]]
[[top_down_inference_model/find_instance_peaks_1/RaggedFromValueRowIds_1/RowPartitionFromValueRowIds/bincount/Minimum/_436]]
(1) UNKNOWN: JIT compilation failed.
[[{{node mod}}]]
Yes, we were getting JIT errors, but we found that we could upgrade TensorFlow; the newer versions just need some help finding the CUDA directory (located at our conda prefix).
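For reference, a sketch of the kind of environment tweak that points newer TensorFlow builds at a conda-installed CUDA toolkit (the flag is XLA's standard --xla_gpu_cuda_data_dir; the fallback path here is hypothetical, so substitute your actual env prefix):

```shell
# Sketch, assuming a conda environment with the CUDA toolkit installed.
# Point XLA's JIT compiler at the CUDA directory so it can find libdevice:
export XLA_FLAGS="--xla_gpu_cuda_data_dir=${CONDA_PREFIX:-$HOME/miniconda3/envs/sleap}"
# On Linux, also let the dynamic loader see conda's CUDA libraries:
export LD_LIBRARY_PATH="${CONDA_PREFIX:-$HOME/miniconda3/envs/sleap}/lib:${LD_LIBRARY_PATH:-}"
```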
Bug description
As part of an experimental move to the latest version of TensorFlow available on Windows (v2.10), we are now facing an issue during inference.
The logs below reveal that we're getting a weird error when using find_global_peaks_rough, specifically on this line: sleap/sleap/nn/peak_finding.py, line 224 (at eb14764).
The exception (below) is hinting at the modulo operation being the problem. There are obviously mathematical workarounds that avoid the modulo operation, but it'd be good to dig into it.
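One such workaround (a sketch I haven't benchmarked, shown with NumPy for clarity; the equivalent floordiv/multiply/subtract ops also exist in TensorFlow) would avoid emitting a Mod op entirely:

```python
import numpy as np

def mod_without_mod(x, c):
    # x - (x // c) * c is mathematically identical to x % c for c > 0,
    # and uses only floordiv/multiply/subtract instead of a Mod op.
    # In TF this would be: tf.range(total_peaks, dtype=tf.int64)
    #   - (tf.range(total_peaks, dtype=tf.int64) // channels) * channels
    return x - (x // c) * c

total_peaks, channels = 12, 5
x = np.arange(total_peaks, dtype=np.int64)
assert np.array_equal(mod_without_mod(x, channels), x % channels)
```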
In the meantime, I've deleted the conda package in our Anaconda channel, since it was getting downloaded even for the stable release (v1.3.3) because we version-fenced SLEAP permissively to allow newer versions of TensorFlow. This issue may have affected a small number of users (~100-200) who installed SLEAP since I pushed that conda package, though.
If we need to rebuild that conda package for testing, we can just rerun the jobs in this workflow to rebuild and reupload TensorFlow v2.10 to our conda channel. We should probably change the tag from main to dev to prevent users from downloading the new release until it's fixed, though.
The short-term fix, if others run into this, is to just pip install tensorflow==2.7 and everything should work.
Your personal set up
eb14764
Environment packages
Logs
Screenshots
How to reproduce
Run inference with a top-down model (specifically the centered instance portion).