Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise informative missing s3 extra #7501

Merged
merged 7 commits into from
Apr 13, 2022
Merged

Conversation

nsorros
Copy link
Contributor

@nsorros nsorros commented Mar 25, 2022

As described in #6870 when s3 extra missing dvc does not always raise the informative message that it should be installed. This has to do with catching a generic DvcException and adding a debug message which I am overwriting here in that case to allow the Exception to propagate.

I am not sure this is the right place to catch the exception and additionally this might affect other cases as well which I am happy to address like non directory based hashes. Assuming this is the right place to catch the exception, happy to work on adding a test for this.

Fixes #6870 ?

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@nsorros nsorros requested a review from a team as a code owner March 25, 2022 08:40
@nsorros nsorros requested a review from pared March 25, 2022 08:40
@nsorros nsorros marked this pull request as draft March 25, 2022 08:41
@nsorros nsorros marked this pull request as ready for review March 25, 2022 08:53
@pared
Copy link
Contributor

pared commented Mar 31, 2022

@nsorros I am sorry for the late response. Could you verify that the problem still exists on current main? I am trying to reproduce and I am getting a seemingly proper error:

Everything is up to date.                                             
ERROR: failed to pull data from the cloud - URL 's3://' is supported but requires these missing dependencies: ['s3fs', 'boto3']. To install dvc with those dependencies, run:

        pip install 'dvc[s3]'

When running with dvc installed from this branch I get:

Everything is up to date.                                             
ERROR: failed to pull data from the cloud - URL 's3://' is supported but requires these missing dependencies: ['s3fs', 'boto3']. Please report this bug to <https://github.com/iterative/dvc/issues>. Thank you!

So both seem ok, though the problem is still Everything is up to date, which should not show up.

@nsorros
Copy link
Contributor Author

nsorros commented Mar 31, 2022

@nsorros I am sorry for the late response. Could you verify that the problem still exists on current main? I am trying to reproduce and I am getting a seemingly proper error:

Everything is up to date.                                             
ERROR: failed to pull data from the cloud - URL 's3://' is supported but requires these missing dependencies: ['s3fs', 'boto3']. To install dvc with those dependencies, run:

        pip install 'dvc[s3]'

When running with dvc installed from this branch I get:

Everything is up to date.                                             
ERROR: failed to pull data from the cloud - URL 's3://' is supported but requires these missing dependencies: ['s3fs', 'boto3']. Please report this bug to <https://github.com/iterative/dvc/issues>. Thank you!

So both seem ok, though the problem is still Everything is up to date, which should not show up.

I am getting the same error using the latest main which I install using pip install git+https://github.com/iterative/dvc.git. The message I get is

WARNING: No file hash info found for '/Users/nsorros/code/test/issue/data'. It won't be created.
1 file failed
ERROR: failed to pull data from the cloud - Checkout failed for following targets:
/Users/nsorros/code/test/issue/data
Is your cache up to date?
<https://error.dvc.org/missing-files>

If you have trouble reproducing this as described in the issue, maybe I could provide a more complete script?

@pared
Copy link
Contributor

pared commented Mar 31, 2022

@nsorros I would be thankful. Mine looks like this:

#!/bin/bash

set -exu

wsp=test_wspace
rep=test_repo

pushd $TMPDIR
rm -rf $wsp && mkdir $wsp && pushd $wsp
mkdir $rep && pushd $rep

git init
dvc init

echo data >> data

pip install "dvc[s3]"

dvc add data
dvc remote add storage -d s3://prd-dvc-test/cache
git add -A
git commit -am "init"

dvc push

popd

git clone test_repo clone
pushd clone

pip uninstall -y s3fs boto3 dvc
pip install git+https://github.com/iterative/dvc.git
# dvc cache dir $TMPDIR/tmp-cache
dvc pull

And the error:

ERROR: failed to pull data from the cloud - URL 's3://' is supported but requires these missing dependencies: ['s3fs', 'boto3']. Please report this bug to <https://github.com/iterative/dvc/issues>. Thank you!

@nsorros
Copy link
Contributor Author

nsorros commented Apr 1, 2022

This is a script to reproduce the problem

#!/bin/bash

set -e

mkdir issue
cd issue
python -m venv venv
venv/bin/pip install dvc[s3]
git init
venv/bin/dvc init
mkdir data
touch data/data.csv
venv/bin/dvc add data
venv/bin/dvc remote add remote s3://dvc-issue
git add .
git commit -m "Add issue"
venv/bin/dvc push -r remote
cd ..
git clone issue test_issue
cd test_issue
python -m venv venv
venv/bin/pip install dvc
venv/bin/dvc pull

this gives me

ERROR: failed to pull data from the cloud - Checkout failed for following targets:
/Users/nsorros/code/test/repro_issue/test_issue/data
Is your cache up to date?
<https://error.dvc.org/missing-files>

@pared
Copy link
Contributor

pared commented Apr 1, 2022

@nsorros thanks, right, I didn't make a dir, but a file, that's the problem. I can reproduce. Reviewing...

@nsorros
Copy link
Contributor Author

nsorros commented Apr 1, 2022

@nsorros thanks, right, I didn't make a dir, but a file, that's the problem. I can reproduce. Reviewing...

Indeed, I just tested with a file and the error is the correct one, so this seems to affect directories only.

@pared
Copy link
Contributor

pared commented Apr 1, 2022

Ok, so indeed, I can confirm this fix improves error message raised upon download of dir, when dependencies for remote is missing. It would be great to add a test for that. However, this alone will not fix all aspects of #6870, as another problem is that we are displaying "Everythin is up to date".

@nsorros would you be willing to write a test?

I think we could close #6870 with this PR, and create a new issue, regarding the "Everything is up to date", as its not related to original problem of #6870, which was unclear messaging for lacking deps.

@nsorros
Copy link
Contributor Author

nsorros commented Apr 1, 2022

Ok, so indeed, I can confirm this fix improves error message raised upon download of dir, when dependencies for remote is missing. It would be great to add a test for that. However, this alone will not fix all aspects of #6870, as another problem is that we are displaying "Everythin is up to date".

@nsorros would you be willing to write a test?

I think we could close #6870 with this PR, and create a new issue, regarding the "Everything is up to date", as its not related to original problem of #6870, which was unclear messaging for lacking deps.

Happy to add a test, I assume this would go in tests/unit/output/test_output.py? It would be great to receive some guidance as it seems quite complicated by looking at it as multiple things may need patching. Is there another test I can use as a starting point?

@pared
Copy link
Contributor

pared commented Apr 1, 2022

@nsorros So what we need to do would be to make a test that would verify, that upon trying to fetch a dir, RemoteMissing error actually gets passed, instead of being skipped. So what I would be aiming to do would be:

  1. generate a dir data (search for tmp_dir.dvc_gen usages)
  2. remove the cache and created dir
  3. configure any remote (search for with dvc.config.edit())
  4. patch _check_requires from FileSystem to raise RemoteMissing error
  5. verify that when trying to dvc.fetch() this error rises.

@nsorros
Copy link
Contributor Author

nsorros commented Apr 5, 2022

I made a start on the test but I am slightly confused as to why it passes at this point instead of raising an exception. I am including the test below

def test_remote_missing_data_dir(tmp_dir, scm, dvc, make_remote):
     make_remote("default", default=True)
     tmp_dir.dvc_gen({"dir": {"subfile": "file2 content"}}, commit="add dir")
     dvc.push()

     remove("dir")
     remove(dvc.odb.local.cache_dir)

     with patch(
         "dvc.fs.base.FileSystem._check_requires",
         side_effect=RemoteMissingDepsError,
     ):
         dvc.pull

I will keep investigating, let me know if you have any thoughts @pared

@nsorros nsorros force-pushed the fix-pull-missing-s3 branch from 0bc66e0 to 844713c Compare April 5, 2022 17:23
@nsorros
Copy link
Contributor Author

nsorros commented Apr 5, 2022

I made a start on the test but I am slightly confused as to why it passes at this point instead of raising an exception. I am including the test below

def test_remote_missing_data_dir(tmp_dir, scm, dvc, make_remote):
     make_remote("default", default=True)
     tmp_dir.dvc_gen({"dir": {"subfile": "file2 content"}}, commit="add dir")
     dvc.push()

     remove("dir")
     remove(dvc.odb.local.cache_dir)

     with patch(
         "dvc.fs.base.FileSystem._check_requires",
         side_effect=RemoteMissingDepsError,
     ):
         dvc.pull

I will keep investigating, let me know if you have any thoughts @pared

Resolved dvc.pull was missing a parenthesis 😂

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsorros
Something is off, the codecov says the fix has not been tested and the test passes regardless if fix is in place or not, seems like my suggestion how to mock it was not that good, I'll take a look.

dvc/output.py Outdated Show resolved Hide resolved
@nsorros nsorros force-pushed the fix-pull-missing-s3 branch from 4b8f1b1 to 69138aa Compare April 6, 2022 08:04
@nsorros
Copy link
Contributor Author

nsorros commented Apr 6, 2022

@nsorros
Something is off, the codecov says the fix has not been tested and the test passes regardless if fix is in place or not, seems like my suggestion how to mock it was not that good, I'll take a look.

Patching dvc.data_cloud.DataCloud.get_remote_odb should do it. I think _chech_requires is also checked for the local file system before looking for remote.

@pared
Copy link
Contributor

pared commented Apr 6, 2022

Yeah, it's called whenever fs is created, so basically any operation on filesystem :) My bad.

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move the test to
tests/unit/output/test_output.py.

Comment on lines 586 to 589
with patch(
"dvc.data_cloud.DataCloud.get_remote_odb",
side_effect=RemoteMissingDepsError("remote missing"),
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To maintain consistency between tests I would suggest using mocker fixture.
then just mocker.patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsorros
Copy link
Contributor Author

nsorros commented Apr 7, 2022

I think we should move the test to
tests/unit/output/test_output.py.

Addressed in 8a37f70

@nsorros
Copy link
Contributor Author

nsorros commented Apr 11, 2022

Anything else I need to do on this PR @pared? What's the process for merging?

Copy link
Contributor

@pared pared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nsorros sorry, did not have time on friday to check that one.

@pared pared requested a review from a team April 12, 2022 09:41
@nsorros
Copy link
Contributor Author

nsorros commented Apr 13, 2022

I am assuming someone from DVC will merge this as merging seems to be restricted to authorised users. Let me know if you need me to do anything else on this.

@pared
Copy link
Contributor

pared commented Apr 13, 2022

@nsorros yeah, I will wait till tomorrow if one more person finds time to check it, if not I will merge it.

@pared pared merged commit dd9bc11 into iterative:main Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pull: Unclear error message when remote support is not installed
3 participants