Prefer nvidia channel for conda builds #5648

Merged: malfet merged 3 commits into main from malfet/prefer-nvidia-channel-for-conda-builds on Mar 21, 2022

@malfet malfet (Contributor) commented Mar 20, 2022

Fixes #5635
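
As a rough illustration of what "preferring" a channel means, here is a minimal sketch that builds a `conda install` command with the `nvidia` channel listed first. It relies on conda searching `-c` channels in the order they are given. The helper name `conda_install_command` and the package spec below are hypothetical, and this is not the change made to the packaging scripts in this PR.

```python
from typing import List, Sequence


def conda_install_command(packages: Sequence[str], channels: Sequence[str]) -> List[str]:
    """Build a `conda install` argv with channels in priority order.

    Channels passed with -c are searched in the order given, so the
    first entry in `channels` is the preferred one.
    """
    cmd = ["conda", "install", "--yes"]
    for channel in channels:
        cmd += ["-c", channel]
    cmd += list(packages)
    return cmd


if __name__ == "__main__":
    # Hypothetical package spec; "nvidia" comes first so it is preferred.
    argv = conda_install_command(["cudatoolkit=11.3"], ["nvidia", "pytorch"])
    print(" ".join(argv))
```

The list form can be passed directly to `subprocess.run` without going through a shell.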

@facebook-github-bot facebook-github-bot commented Mar 20, 2022

💊 CI failures summary and remediations

As of commit 876bb30 (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build binary_linux_conda_py3.8_cu115 (1/3)

Step: "packaging/build_conda.sh"

$SRC_DIR/torchvision/csrc/io/decoder/decoder.cp...tFormat*’ to ‘AVInputFormat*’ [-fpermissive]
          ^~~~~~~~~~~~~~~
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:33:10: note: suggested alternative: ‘AV_LOG_ERROR’
     case AV_LOCK_DESTROY:
          ^~~~~~~~~~~~~~~
          AV_LOG_ERROR
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In lambda function:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:206:5: error: ‘av_lockmgr_register’ was not declared in this scope
     av_lockmgr_register(&ffmpeg_lock);
     ^~~~~~~~~~~~~~~~~~~
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In member function ‘virtual bool ffmpeg::Decoder::init(const ffmpeg::DecoderParameters&, ffmpeg::DecoderInCallback&&, std::vector<ffmpeg::DecoderMetadata>*)’:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:280:33: error: invalid conversion from ‘const AVInputFormat*’ to ‘AVInputFormat*’ [-fpermissive]
       fmt = av_find_input_format(fmtName);
             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In member function ‘int ffmpeg::Decoder::getFrame(size_t)’:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:509:27: warning: ‘void av_init_packet(AVPacket*)’ is deprecated [-Wdeprecated-declarations]
   av_init_packet(&avPacket);
                           ^
In file included from $BUILD_PREFIX/include/libavcodec/avcodec.h:45:0,
                 from $SRC_DIR/torchvision/csrc/io/decoder/defs.h:12,
                 from $SRC_DIR/torchvision/csrc/io/decoder/seekable_buffer.h:3,
                 from $SRC_DIR/torchvision/csrc/io/decoder/decoder.h:5,
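
To pull the actionable lines out of a log like the one above, a small regular expression over GCC-style diagnostics is usually enough. The sketch below is only an illustration of pattern-based matching, not Dr. CI's actual matcher; the names `GCC_DIAGNOSTIC` and `extract_diagnostics` are made up for this example, and the sample text is copied from the log above.

```python
import re
from typing import Dict, List

# Matches GCC-style diagnostics: <file>:<line>:<col>: <kind>: <message>
GCC_DIAGNOSTIC = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<kind>error|warning|note): (?P<message>.*)$"
)


def extract_diagnostics(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per GCC diagnostic line found in log_text."""
    found = []
    for raw_line in log_text.splitlines():
        match = GCC_DIAGNOSTIC.match(raw_line.strip())
        if match:
            found.append(match.groupdict())
    return found


SAMPLE_LOG = """\
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In lambda function:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:206:5: error: ‘av_lockmgr_register’ was not declared in this scope
     av_lockmgr_register(&ffmpeg_lock);
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:280:33: error: invalid conversion from ‘const AVInputFormat*’ to ‘AVInputFormat*’ [-fpermissive]
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:509:27: warning: ‘void av_init_packet(AVPacket*)’ is deprecated [-Wdeprecated-declarations]
"""

if __name__ == "__main__":
    # Print only the hard errors, skipping warnings and notes.
    for diag in extract_diagnostics(SAMPLE_LOG):
        if diag["kind"] == "error":
            print(f'{diag["file"]}:{diag["line"]}: {diag["message"]}')
```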

See CircleCI build binary_linux_conda_py3.10_cu115 (2/3)

Step: "packaging/build_conda.sh"

$SRC_DIR/torchvision/csrc/io/decoder/decoder.cp...tFormat*’ to ‘AVInputFormat*’ [-fpermissive]
          ^~~~~~~~~~~~~~~
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:33:10: note: suggested alternative: ‘AV_LOG_ERROR’
     case AV_LOCK_DESTROY:
          ^~~~~~~~~~~~~~~
          AV_LOG_ERROR
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In lambda function:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:206:5: error: ‘av_lockmgr_register’ was not declared in this scope
     av_lockmgr_register(&ffmpeg_lock);
     ^~~~~~~~~~~~~~~~~~~
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In member function ‘virtual bool ffmpeg::Decoder::init(const ffmpeg::DecoderParameters&, ffmpeg::DecoderInCallback&&, std::vector<ffmpeg::DecoderMetadata>*)’:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:280:33: error: invalid conversion from ‘const AVInputFormat*’ to ‘AVInputFormat*’ [-fpermissive]
       fmt = av_find_input_format(fmtName);
             ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp: In member function ‘int ffmpeg::Decoder::getFrame(size_t)’:
$SRC_DIR/torchvision/csrc/io/decoder/decoder.cpp:509:27: warning: ‘void av_init_packet(AVPacket*)’ is deprecated [-Wdeprecated-declarations]
   av_init_packet(&avPacket);
                           ^
In file included from $BUILD_PREFIX/include/libavcodec/avcodec.h:45:0,
                 from $SRC_DIR/torchvision/csrc/io/decoder/defs.h:12,
                 from $SRC_DIR/torchvision/csrc/io/decoder/seekable_buffer.h:3,
                 from $SRC_DIR/torchvision/csrc/io/decoder/decoder.h:5,

See CircleCI build binary_linux_conda_py3.7_cu115 (3/3)

Step: "packaging/build_conda.sh"

error_prefix='Error compiling objects for extension')
    self._build_extensions_serial()
  File "/opt/conda/conda-bld/torchvision_1647881675966/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/opt/conda/conda-bld/torchvision_1647881675966/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/opt/conda/conda-bld/torchvision_1647881675966/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 534, in build_extension
    depends=ext.depends)
  File "/opt/conda/conda-bld/torchvision_1647881675966/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 593, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/opt/conda/conda-bld/torchvision_1647881675966/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1473, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/opt/conda/conda-bld/torchvision_1647881675966/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_p/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1805, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
Traceback (most recent call last):
  File "/opt/conda/bin/conda-build", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.9/site-packages/conda_build/cli/main_build.py", line 488, in main
    execute(sys.argv[1:])
  File "/opt/conda/lib/python3.9/site-packages/conda_build/cli/main_build.py", line 477, in execute
    outputs = api.build(args.recipe, post=args.post, test_run_post=args.test_run_post,

This comment was automatically generated by Dr. CI (expand for details).


@malfet malfet requested review from atalman and datumbox March 20, 2022 04:17
datumbox previously approved these changes Mar 20, 2022

@datumbox datumbox (Contributor) left a comment

Thanks @malfet.

Is this a temporary update to fix the issue or does it mean we will change our recommended channel on the download page at pytorch.org?

@datumbox datumbox (Contributor) commented

@malfet The issue persists on the cmake_linux_gpu job:
ImportError: libcupti.so.11.3: cannot open shared object file: No such file or directory
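
For anyone reproducing this locally, a minimal sketch of a check that asks the dynamic loader for the library directly, without importing torch. The helper name `can_load_shared_library` is hypothetical; the sonames are the ones quoted in this thread and in the linked issue.

```python
import ctypes


def can_load_shared_library(name: str) -> bool:
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the loader cannot find or open the library.
        return False
    return True


if __name__ == "__main__":
    # Sonames taken from the error messages quoted in this thread.
    for soname in ("libcupti.so.11.3", "libcupti.so.10.2"):
        status = "found" if can_load_shared_library(soname) else "MISSING"
        print(f"{soname}: {status}")
```

Run it inside the same conda environment as the failing job, since the result depends on that environment's library search path.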

The binary_linux_conda_*_cu115 jobs are failing because they use ffmpeg-5.0.0, which deprecates or removes some functionality that we currently use (see the `av_lockmgr_register` and `AVInputFormat` errors in the logs above). There is another PR that attempts to fix this at #5644.

I think we should merge this after the cmake job is also fixed and then resolve the rest of the jobs on the other PR.
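
To tell whether a given environment would hit the ffmpeg-5.0.0 failures described above, one option is to query the libavcodec version through `ctypes`. This is a sketch under two assumptions: that FFmpeg 5.0 ships libavcodec major version 59 (earlier 4.x releases use 58), and that `ctypes.util.find_library` resolves the same libavcodec the build would use, which may not hold inside a conda build prefix. The helper names are hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def decode_av_version(packed: int) -> Version:
    """Unpack FFmpeg's AV_VERSION_INT(major, minor, micro) encoding."""
    return (packed >> 16, (packed >> 8) & 0xFF, packed & 0xFF)


def installed_avcodec_version() -> Optional[Version]:
    """Return the libavcodec version found by the system loader, if any."""
    name = ctypes.util.find_library("avcodec")
    if name is None:
        return None
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # avcodec_version() returns the packed version as an unsigned int.
    lib.avcodec_version.restype = ctypes.c_uint
    return decode_av_version(lib.avcodec_version())


def is_ffmpeg5_or_newer(version: Version) -> bool:
    """Assumption: libavcodec major >= 59 corresponds to FFmpeg 5.0 or later."""
    return version[0] >= 59


if __name__ == "__main__":
    version = installed_avcodec_version()
    if version is None:
        print("libavcodec not found")
    else:
        print(f"libavcodec {version}, FFmpeg 5+ API: {is_ffmpeg5_or_newer(version)}")
```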

@datumbox datumbox dismissed their stale review March 21, 2022 09:18

1 job still failing

@malfet malfet force-pushed the malfet/prefer-nvidia-channel-for-conda-builds branch from 33a72da to 876bb30 Compare March 21, 2022 16:52

@datumbox datumbox (Contributor) left a comment

LGTM, thanks for patching cmake.

Let's merge on "green-ish" CI.

@malfet malfet merged commit fbc8ea4 into main Mar 21, 2022
@malfet malfet deleted the malfet/prefer-nvidia-channel-for-conda-builds branch March 21, 2022 17:58
@github-actions github-actions commented

Hey @malfet!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@datumbox datumbox added the labels "topic: build", "module: ci", and "other" (if you have no clue or if you will manually handle the PR in the release notes) Mar 21, 2022
lezwon pushed a commit to lezwon/vision that referenced this pull request Mar 23, 2022
To mitigate missing `libcupti.so` dependency
pmeier added a commit that referenced this pull request Mar 25, 2022
* added usps dataset

* fixed type issues

* fix mobilnet norm layer test (#5643)

* xfail mobilnet norm layer test

* fix test

* More robust check in tests for 16 bits images (#5652)

* Prefer nvidia channel for conda builds (#5648)

To mitigate missing `libcupti.so` dependency

* fix torchdata CI installation (#5657)

* update urls for kinetics dataset (#5578)

* update urls for kinetics dataset

* update urls for kinetics dataset

* remove errors

* update the changes and add test option to split

* added test to valid values for split arg

* change .txt to .csv for annotation url of k600

Co-authored-by: Nicolas Hug <[email protected]>

* Port Multi-weight support from prototype to main (#5618)

* Moving basefiles outside of prototype and porting Alexnet, ConvNext, Densenet and EfficientNet.

* Porting googlenet

* Porting inception

* Porting mnasnet

* Porting mobilenetv2

* Porting mobilenetv3

* Porting regnet

* Porting resnet

* Porting shufflenetv2

* Porting squeezenet

* Porting vgg

* Porting vit

* Fix docstrings

* Fixing imports

* Adding missing import

* Fix mobilenet imports

* Fix tests

* Fix prototype tests

* Exclude get_weight from models on test

* Fix init files

* Porting googlenet

* Porting inception

* porting mobilenetv2

* porting mobilenetv3

* porting resnet

* porting shufflenetv2

* Fix test and linter

* Fixing docs.

* Porting Detection models (#5617)

* fix inits

* fix docs

* Port faster_rcnn

* Port fcos

* Port keypoint_rcnn

* Port mask_rcnn

* Port retinanet

* Port ssd

* Port ssdlite

* Fix linter

* Fixing tests

* Fixing tests

* Fixing vgg test

* Porting Optical Flow, Segmentation, Video models (#5619)

* Porting raft

* Porting video resnet

* Porting deeplabv3

* Porting fcn and lraspp

* Fixing the tests and linter

* Porting docs, examples, tutorials and galleries (#5620)

* Fix examples, tutorials and gallery

* Update gallery/plot_optical_flow.py

Co-authored-by: Nicolas Hug <[email protected]>

* Fix import

* Revert hardcoded normalization

* fix uncommitted changes

* Fix bug

* Fix more bugs

* Making resize optional for segmentation

* Fixing preset

* Fix mypy

* Fixing documentation strings

* Fix flake8

* minor refactoring

Co-authored-by: Nicolas Hug <[email protected]>

* Resolve conflict

* Porting model tests (#5622)

* Porting tests

* Remove unnecessary variable

* Fix linter

* Move prototype to extended tests

* Fix download models job

* Update CI on Multiweight branch to use the new weight download approach (#5628)

* port Pad to prototype transforms (#5621)

* port Pad to prototype transforms

* use literal

* Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624)

Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>

* pre-download model weights in CI docs build (#5625)

* pre-download model weights in CI docs build

* move changes into template

* change docs image

* Regenerated config.yml

Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>

* Porting reference scripts and updating presets (#5629)

* Making _preset.py classes

* Remove support of targets on presets.

* Rewriting the video preset

* Adding tests to check that the bundled transforms are JIT scriptable

* Rename all presets from *Eval to *Inference

* Minor refactoring

* Remove --prototype and --pretrained from reference scripts

* remove  pretained_backbone refs

* Corrections and simplifications

* Fixing bug

* Fixing linter

* Fix flake8

* restore documentation example

* minor fixes

* fix optical flow missing param

* Fixing commands

* Adding weights_backbone support in detection and segmentation

* Updating the commands for InceptionV3

* Setting `weights_backbone` to its fully BC value (#5653)

* Replace default `weights_backbone=None` with its BC values.

* Fixing tests

* Fix linter

* Update docs.

* Update preprocessing on reference scripts.

* Change qat/ptq to their full values.

* Refactoring preprocessing

* Fix video preset

* No initialization on VGG if pretrained

* Fix warning messages for backbone utils.

* Adding star to all preset constructors.

* Fix mypy.

Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>

* Apply suggestions from code review

Co-authored-by: Philip Meier <[email protected]>

* use decompressor for extracting bz2

* Apply suggestions from code review

Co-authored-by: Philip Meier <[email protected]>

* Apply suggestions from code review

Co-authored-by: Philip Meier <[email protected]>

* fixed lint fails

* added tests for USPS

* check image shape

* fix tests

* check shape on image directly

* Apply suggestions from code review

Co-authored-by: Philip Meier <[email protected]>

* removed test and comments

* Update test/test_prototype_builtin_datasets.py

Co-authored-by: Nicolas Hug <[email protected]>

Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Sahil Goyal <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
facebook-github-bot pushed a commit that referenced this pull request Apr 5, 2022
Summary:
To mitigate missing `libcupti.so` dependency

(Note: this ignores all push blocking failures!)

Reviewed By: datumbox

Differential Revision: D35216765

fbshipit-source-id: 99e9ac632b08961011b56a6e9b9a9ecce670fe48
facebook-github-bot pushed a commit that referenced this pull request Apr 5, 2022
Summary:
* added usps dataset

* fixed type issues

* fix mobilnet norm layer test (#5643)

* xfail mobilnet norm layer test

* fix test

* More robust check in tests for 16 bits images (#5652)

* Prefer nvidia channel for conda builds (#5648)

To mitigate missing `libcupti.so` dependency

* fix torchdata CI installation (#5657)

* update urls for kinetics dataset (#5578)

* update urls for kinetics dataset

* update urls for kinetics dataset

* remove errors

* update the changes and add test option to split

* added test to valid values for split arg

* change .txt to .csv for annotation url of k600

* Port Multi-weight support from prototype to main (#5618)

* Moving basefiles outside of prototype and porting Alexnet, ConvNext, Densenet and EfficientNet.

* Porting googlenet

* Porting inception

* Porting mnasnet

* Porting mobilenetv2

* Porting mobilenetv3

* Porting regnet

* Porting resnet

* Porting shufflenetv2

* Porting squeezenet

* Porting vgg

* Porting vit

* Fix docstrings

* Fixing imports

* Adding missing import

* Fix mobilenet imports

* Fix tests

* Fix prototype tests

* Exclude get_weight from models on test

* Fix init files

* Porting googlenet

* Porting inception

* porting mobilenetv2

* porting mobilenetv3

* porting resnet

* porting shufflenetv2

* Fix test and linter

* Fixing docs.

* Porting Detection models (#5617)

* fix inits

* fix docs

* Port faster_rcnn

* Port fcos

* Port keypoint_rcnn

* Port mask_rcnn

* Port retinanet

* Port ssd

* Port ssdlite

* Fix linter

* Fixing tests

* Fixing tests

* Fixing vgg test

* Porting Optical Flow, Segmentation, Video models (#5619)

* Porting raft

* Porting video resnet

* Porting deeplabv3

* Porting fcn and lraspp

* Fixing the tests and linter

* Porting docs, examples, tutorials and galleries (#5620)

* Fix examples, tutorials and gallery

* Update gallery/plot_optical_flow.py

* Fix import

* Revert hardcoded normalization

* fix uncommitted changes

* Fix bug

* Fix more bugs

* Making resize optional for segmentation

* Fixing preset

* Fix mypy

* Fixing documentation strings

* Fix flake8

* minor refactoring

* Resolve conflict

* Porting model tests (#5622)

* Porting tests

* Remove unnecessary variable

* Fix linter

* Move prototype to extended tests

* Fix download models job

* Update CI on Multiweight branch to use the new weight download approach (#5628)

* port Pad to prototype transforms (#5621)

* port Pad to prototype transforms

* use literal

* Bump up LibTorchvision version number for Podspec to release Cocoapods (#5624)

* pre-download model weights in CI docs build (#5625)

* pre-download model weights in CI docs build

* move changes into template

* change docs image

* Regenerated config.yml

* Porting reference scripts and updating presets (#5629)

* Making _preset.py classes

* Remove support of targets on presets.

* Rewriting the video preset

* Adding tests to check that the bundled transforms are JIT scriptable

* Rename all presets from *Eval to *Inference

* Minor refactoring

* Remove --prototype and --pretrained from reference scripts

* remove  pretained_backbone refs

* Corrections and simplifications

* Fixing bug

* Fixing linter

* Fix flake8

* restore documentation example

* minor fixes

* fix optical flow missing param

* Fixing commands

* Adding weights_backbone support in detection and segmentation

* Updating the commands for InceptionV3

* Setting `weights_backbone` to its fully BC value (#5653)

* Replace default `weights_backbone=None` with its BC values.

* Fixing tests

* Fix linter

* Update docs.

* Update preprocessing on reference scripts.

* Change qat/ptq to their full values.

* Refactoring preprocessing

* Fix video preset

* No initialization on VGG if pretrained

* Fix warning messages for backbone utils.

* Adding star to all preset constructors.

* Fix mypy.

* Apply suggestions from code review

* use decompressor for extracting bz2

* Apply suggestions from code review

* Apply suggestions from code review

* fixed lint fails

* added tests for USPS

* check image shape

* fix tests

* check shape on image directly

* Apply suggestions from code review

* removed test and comments

* Update test/test_prototype_builtin_datasets.py

(Note: this ignores all push blocking failures!)

Reviewed By: datumbox

Differential Revision: D35216783

fbshipit-source-id: 556a63a89f15d1541ac2b479244a7b6c564eff14

Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Philip Meier <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Co-authored-by: Sahil Goyal <[email protected]>
Co-authored-by: Vasilis Vryniotis <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Co-authored-by: Anton Thomma <[email protected]>
Labels
ciflow/default, cla signed, module: ci, other (if you have no clue or if you will manually handle the PR in the release notes), topic: build
Development

Successfully merging this pull request may close these issues.

ImportError: libcupti.so.10.2: cannot open shared object file: No such file or directory
3 participants