Training metrics become NaN after 2k iterations in stage1 #77

Open
Asnly1 opened this issue Jan 6, 2025 · 2 comments

Comments


Asnly1 commented Jan 6, 2025

Hello, thank you for your amazing work. I ran into a problem when training stage 1 on my own dataset: all evaluation metrics become NaN after 2000 iterations. Training starts normally but cannot continue past this point. I tried to prepare my data in the same format as the provided data. Are there any specific data format requirements I should check? Or what else might cause this problem?
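When metrics turn into NaN partway through training, it can help to pin down the first iteration and the first metric that goes non-finite. The snippet below is a minimal sketch in plain Python; the function name `first_non_finite` and the logged values are hypothetical and are not taken from this repository.

```python
import math

def first_non_finite(metrics_per_iter):
    """Return (iteration, metric_name) of the first NaN/inf value, or None."""
    for iteration, metrics in metrics_per_iter:
        for name, value in metrics.items():
            if not math.isfinite(value):
                return iteration, name
    return None

# Hypothetical logged values, for illustration only.
history = [
    (1998, {"loss": 0.41, "epe": 1.9}),
    (1999, {"loss": 0.40, "epe": float("nan")}),
    (2000, {"loss": float("nan"), "epe": float("nan")}),
]
print(first_non_finite(history))  # (1999, 'epe')
```

Knowing which metric breaks first narrows down which batch and which part of the data pipeline to inspect.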


Asnly1 commented Jan 11, 2025

I found the cause: in the function get_rectified_stereo_data in human_loader.py, new_mask0 consists entirely of zeros. As a result, the flow in the following part becomes 0, which results in NaN. Has anyone else encountered this problem?
(two screenshots attached)
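One common way an all-zero mask produces NaN is a masked average that divides by the number of valid pixels. Whether this is the exact mechanism in this repository is an assumption. The sketch below uses plain Python lists instead of tensors, and the function names are hypothetical.

```python
import math

def masked_mean_naive(values, mask):
    total = sum(v * m for v, m in zip(values, mask))
    count = sum(mask)
    # Tensor libraries return NaN for 0/0 instead of raising; mimic that here.
    return total / count if count else float("nan")

def masked_mean_safe(values, mask, eps=1e-6):
    total = sum(v * m for v, m in zip(values, mask))
    count = sum(mask)
    # Clamp the denominator so an empty mask contributes 0 instead of NaN.
    return total / max(count, eps)

errors = [0.5, 1.5, 2.0]      # hypothetical per-pixel flow errors
empty_mask = [0, 0, 0]        # no valid pixel, as reported for new_mask0

print(math.isnan(masked_mean_naive(errors, empty_mask)))  # True
print(masked_mean_safe(errors, empty_mask))               # 0.0
print(masked_mean_safe(errors, [1, 0, 1]))                # 1.25
```

Clamping the denominator only hides the symptom. A sample with no valid pixels still carries no training signal, so the data that produces the empty mask is worth checking as well.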

@pengcanon

Did you rectify your training data offline? It seems as if you are doing rectification while training. If so, have you tried rectifying your training data first before conducting stage 1 training?
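If the data is rectified offline, the resulting masks can be scanned once before training to find samples with no valid pixels. This is a minimal sketch, assuming each mask is available as a 2D list of 0/1 values keyed by a sample name; `empty_mask_samples` and the sample names are hypothetical, and loading the masks from disk is left out.

```python
def empty_mask_samples(masks):
    """Return names of samples whose mask has no non-zero pixel."""
    return [
        name
        for name, mask in masks.items()
        if not any(pixel for row in mask for pixel in row)
    ]

# Hypothetical masks, for illustration only.
masks = {
    "sample_000": [[0, 1], [0, 0]],
    "sample_001": [[0, 0], [0, 0]],
}
print(empty_mask_samples(masks))  # ['sample_001']
```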
