
Torch CUDA synchronize update #1826

Merged · 4 commits merged into master from cuda_synchronize on Jan 3, 2021
Conversation

@glenn-jocher (Member) commented on Jan 3, 2021

Fix for #1816. Torch CUDA synchronization is no longer attempted on CUDA-enabled machines when --device cpu is requested.
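
A minimal sketch of the guarded-synchronization pattern, assuming the helper is time_synchronized in utils/torch_utils.py as referenced in the key changes below; the exact merged code may differ:

```python
import time

import torch


def time_synchronized():
    # Wait for pending CUDA kernels before reading the clock, but only when CUDA
    # is actually available, so a '--device cpu' run on a CUDA-capable machine
    # never touches the CUDA runtime.
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
```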

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved CUDA device handling and reproducibility configuration in PyTorch utilities.

📊 Key Changes

  • Unified the assignment of cudnn.deterministic and cudnn.benchmark for clarity.
  • Simplified CPU request handling in select_device by setting the environment variable CUDA_VISIBLE_DEVICES to '-1' (a combined sketch follows this list).
  • Enhanced device selection string output to be more informative and user-friendly.
  • Ensured batch_size is a multiple of GPU count, raising an error for incompatibility.
  • Optimized GPU device properties fetching and logging.
  • Made the time_synchronized function more concise.
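
A minimal sketch of the seeding and device-selection behavior described above, assuming helpers named init_torch_seeds and select_device as in utils/torch_utils.py; the exact signatures, logging format, and error messages in the merged code may differ:

```python
import os

import torch
import torch.backends.cudnn as cudnn


def init_torch_seeds(seed=0):
    # Seed PyTorch; seed == 0 favors reproducibility, any other seed favors speed.
    torch.manual_seed(seed)
    if seed == 0:  # slower, more reproducible
        cudnn.deterministic, cudnn.benchmark = True, False
    else:  # faster, less reproducible
        cudnn.deterministic, cudnn.benchmark = False, True


def select_device(device='', batch_size=None):
    # device = 'cpu' or '0' or '0,1,2,3'
    cpu_request = device.lower() == 'cpu'
    if cpu_request:
        os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide GPUs so torch.cuda.is_available() is False
    elif device:
        os.environ['CUDA_VISIBLE_DEVICES'] = device  # expose only the requested GPUs
        assert torch.cuda.is_available(), f'CUDA unavailable, invalid device {device} requested'

    cuda = not cpu_request and torch.cuda.is_available()
    if cuda:
        n = torch.cuda.device_count()
        if n > 1 and batch_size:  # batch size must divide evenly across GPUs
            assert batch_size % n == 0, f'batch-size {batch_size} not multiple of GPU count {n}'
        for i in range(n):
            p = torch.cuda.get_device_properties(i)
            print(f'CUDA:{i} ({p.name}, {p.total_memory / 1024 ** 2:.0f}MB)')
    else:
        print('Using CPU')

    return torch.device('cuda:0' if cuda else 'cpu')
```

Setting CUDA_VISIBLE_DEVICES to '-1' before CUDA is initialized makes torch.cuda.is_available() return False for the rest of the process, which is what lets the time_synchronized() guard above skip torch.cuda.synchronize() on CPU-only runs.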

🎯 Purpose & Impact

  • Streamlining CUDA configurations improves code clarity and reproducibility when seeding for randomness 🔁.
  • Clarifications in device selection facilitate better handling of CPU-only requests and ensure better error messaging 🖥️.
  • More informative logging assists users in understanding their device utilization and system capabilities 📈.
  • Batch size check against GPU count helps prevent runtime errors due to configuration mismatches 🛠️.
  • Cleaner codebase with compact and improved readability, aiding maintenance and development 🧹.

These changes help provide a smoother experience for users working with different devices and configurations, aiming to improve the overall reliability and performance of machine learning workflows running on the Ultralytics YOLOv5 project.

@glenn-jocher (Member, Author) commented:
PR is verified, all tests passing: single GPU, multi-GPU, specific multi-GPU order, CPU.

[Screenshot: verification test results, Jan 3, 2021, 11:16 AM]
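
For reference, the four verified configurations map roughly onto calls like these, using the select_device sketch shown earlier (illustrative only; the actual tests were run through the repository's training and inference scripts via the --device flag, and the batch sizes here are arbitrary):

```python
select_device('0', batch_size=16)        # single GPU
select_device('0,1,2,3', batch_size=64)  # multi-GPU
select_device('1,0', batch_size=32)      # specific multi-GPU order
select_device('cpu')                     # CPU, even on a CUDA-capable machine
```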

@glenn-jocher glenn-jocher merged commit 9f5a18b into master Jan 3, 2021
@glenn-jocher glenn-jocher deleted the cuda_synchronize branch January 3, 2021 19:23
KMint1819 pushed a commit to KMint1819/yolov5 that referenced this pull request May 12, 2021
* torch.cuda.synchronize() update

* torch.cuda.synchronize() update

* torch.cuda.synchronize() update

* newline
taicaile pushed a commit to taicaile/yolov5 that referenced this pull request Oct 12, 2021
* torch.cuda.synchronize() update

* torch.cuda.synchronize() update

* torch.cuda.synchronize() update

* newline
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022
* torch.cuda.synchronize() update

* torch.cuda.synchronize() update

* torch.cuda.synchronize() update

* newline
Linked issue: time_synchronized() when using CPU for inference on a GPU-enabled workstation? (#1816)