push: not updating anything on the remote #50

Closed
rlleshi opened this issue May 24, 2023 · 11 comments · Fixed by iterative/dvc#9524 or #51
Labels
bug Something isn't working

Comments

@rlleshi

rlleshi commented May 24, 2023

Bug Report

dvc push does not actually upload modified files to the remote, even though it reports them as pushed.

Description

The remote is an Azure blob storage that has versioning enabled.

When I run dvc push, I get the confirmation "1 file pushed", but nothing is actually pushed to the remote blob storage.

I can confirm this both by browsing the files in the blob container (I have enabled version_aware and can see that the modified timestamps correspond to the old files) and by running dvc pull in another repo.

Reproduce

  1. Set remote to azure blob: dvc remote add -d my_azure azure://my-blob/
  2. dvc repro, then dvc push
  3. Make changes
  4. dvc repro, then dvc push again
  5. No changes are actually pushed to the remote, though dvc.lock is updated accordingly

Expected

Modified files should be updated on the remote.

Environment information

❯ dvc doctor
DVC version: 2.57.0 (pip)
-------------------------
Platform: Python 3.10.6 on Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 0.51.0
        dvc_objects = 0.22.0
        dvc_render = 0.5.2
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.13.0),
        http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3)
Config:
        Global: /home/user/.config/dvc
        System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sdb
Caches: local
Remotes: azure
Workspace directory: ext4 on /dev/sdb
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/32582e8b1552224ea25e5d697a41250a
@dberenbaum

I'm able to confirm that I can push new files but modified files aren't being uploaded.

@dberenbaum dberenbaum added bug Something isn't working p0-critical Handle immediately labels May 26, 2023
@dberenbaum

Simple reproduction:

$ dvc remote add -d myremote azure://storage/test
$ dvc remote modify myremote version_aware true
$ dvc remote modify myremote --local connection_string ...
$ echo test > test.txt
$ dvc add test.txt
$ dvc push
1 file pushed
$ echo test2 > test.txt
$ dvc add test.txt
$ dvc push
1 file pushed
$ az storage blob list --container-name storage --prefix test/test.txt --account-name dberenbaum --include v --query "[[].name, [].versionId]"
[
  [
    "test/test.txt"
  ],
  [
    "2023-05-26T17:22:47.6495641Z"
  ]
]

When I manually upload, I can see both versions:

$ az storage blob upload -f test.txt --overwrite -c storage -n test/test.txt --account-name dberenbaum
$ az storage blob list --container-name storage --prefix test/test.txt --account-name dberenbaum --include v --query "[[].name, [].versionId]"
[
  [
    "test/test.txt",
    "test/test.txt"
  ],
  [
    "2023-05-26T17:22:47.6495641Z",
    "2023-05-26T17:55:22.3061212Z"
  ]
]
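
To make the comparison between the two listings explicit, here is a minimal Python sketch that groups version IDs by blob name. It assumes only the two-list JSON shape produced by the `--query` expression above; the helper name `versions_by_blob` is hypothetical and not part of DVC or the Azure CLI.

```python
import json


def versions_by_blob(az_output):
    """Group version IDs by blob name from the two-list JSON shown above."""
    names, version_ids = json.loads(az_output)
    grouped = {}
    for name, version_id in zip(names, version_ids):
        grouped.setdefault(name, []).append(version_id)
    return grouped


# Listings copied from the two transcripts in this comment.
after_dvc_push = '[["test/test.txt"], ["2023-05-26T17:22:47.6495641Z"]]'
after_manual_upload = (
    '[["test/test.txt", "test/test.txt"],'
    ' ["2023-05-26T17:22:47.6495641Z", "2023-05-26T17:55:22.3061212Z"]]'
)

count_after_push = len(versions_by_blob(after_dvc_push)["test/test.txt"])
count_after_upload = len(versions_by_blob(after_manual_upload)["test/test.txt"])
```

After two dvc push runs only one version is listed, while the manual upload adds a second one, which is what a version-aware remote should show after every push of a modified file.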

@dberenbaum

It looks like dvc push is actually reverting to the previous/already pushed version. It reverts all info in the .dvc file, even for the local md5.

$ dvc remote add -d myremote azure://storage/test
$ dvc remote modify myremote version_aware true
$ dvc remote modify myremote --local connection_string ...
$ echo test > test.txt
$ dvc add test.txt
$ dvc push
$ cat test.txt.dvc
outs:
- md5: d8e8fca2dc0f896fd7cb4cb0031ba249
  size: 5
  path: test.txt
  cloud:
    myremote:
      etag: '"0x8DB5E145F3B323E"'
      version_id: '2023-05-26T18:09:39.3129022Z'
$ echo test2 > test.txt
$ dvc add test.txt
$ cat test.txt.dvc
outs:
- md5: 126a8a51b9d1bbd07fddc65819a542c3
  size: 6
  path: test.txt
$ dvc push
$ cat test.txt.dvc
outs:
- md5: 126a8a51b9d1bbd07fddc65819a542c3
  size: 5
  path: test.txt
  cloud:
    myremote:
      etag: '"0x8DB5E145F3B323E"'
      version_id: '2023-05-26T18:09:39.3129022Z'
$ cat test.txt
test2
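
The stale metadata can be detected mechanically. The following is a rough sketch, not DVC's own validation: `find_mismatches` is a hypothetical helper, the recorded values are copied by hand from the .dvc files above rather than parsed from YAML, and it assumes `echo` wrote a trailing newline.

```python
import hashlib


def find_mismatches(content, recorded):
    """Return the recorded fields that disagree with the actual file content."""
    problems = []
    if recorded["md5"] != hashlib.md5(content).hexdigest():
        problems.append("md5")
    if recorded["size"] != len(content):
        problems.append("size")
    return problems


# Values copied from test.txt.dvc after the first and second push.
first_push = {"md5": "d8e8fca2dc0f896fd7cb4cb0031ba249", "size": 5}
after_second_push = {"md5": "126a8a51b9d1bbd07fddc65819a542c3", "size": 5}

ok = find_mismatches(b"test\n", first_push)
stale = find_mismatches(b"test2\n", after_second_push)
```

For the second push the check flags the size: the local file is 6 bytes, but the .dvc file records the 5 bytes of the previously pushed version.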

@dberenbaum

Also confirmed that this is specific to Azure.

@efiop
Contributor

efiop commented May 26, 2023

Might be related to dvc-data's checkout changes. If that is the case, downgrading to dvc 2.56.0 (make sure pip check doesn't complain about a newer dvc-data) should fix it.

@dberenbaum

Good thought, but unfortunately I don't see any difference:

DVC version: 2.56.0
-------------------
Platform: Python 3.10.10 on macOS-13.3.1-arm64-arm-64bit
Subprojects:
        dvc_data = 0.47.5
        dvc_objects = 0.22.0
        dvc_render = 0.3.1
        dvc_task = 0.2.1
        scmrepo = 1.0.3
Supports:
        azure (adlfs = 2023.4.0, knack = 0.10.1, azure-identity = 1.12.0),
        gdrive (pydrive2 = 1.15.3),
        gs (gcsfs = 2022.11.0),
        hdfs (fsspec = 2022.11.0, pyarrow = 11.0.0),
        http (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.7.4.post0, aiohttp-retry = 2.8.3),
        oss (ossfs = 2021.8.0),
        s3 (s3fs = 2022.11.0, boto3 = 1.24.59),
        ssh (sshfs = 2023.4.1),
        webdav (webdav4 = 0.9.8),
        webdavs (webdav4 = 0.9.8),
        webhdfs (fsspec = 2022.11.0)
Config:
        Global: /Users/dave/Library/Application Support/dvc
        System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: local, azure
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/6414c2ff59800d13bb7dc40946807396

@rlleshi
Author

rlleshi commented May 30, 2023

Can confirm, downgrading to 2.56.0 doesn't solve it.

@daavoo daavoo self-assigned this May 30, 2023
@daavoo
Contributor

daavoo commented May 31, 2023

I have reproduced the issue and am taking a look.

@daavoo
Contributor

daavoo commented May 31, 2023

The bug comes from c1b4fd2

You can install the previous version of dvc-azure as a workaround for now:

pip install 'dvc-azure<2.21.1'

The problem is that we stopped wrapping the methods correctly (wrapping the private methods does not appear to work as expected, cc @pmrowla).

For put_file, we stopped passing overwrite=True. With cloud versioning, this raises a FileExistsError whenever the file already exists on the remote, and that error is silently ignored during transfer (I assume from the comment that this is intentional for cache operations but probably not in this case):

https://github.com/iterative/dvc-objects/blob/33e4d8f0e9247569905b859a808eb4d8e9c02ffb/src/dvc_objects/fs/generic.py#L314-L328
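
A toy model of this failure mode, as a sketch only: `VersionedStore` and `transfer` below are hypothetical stand-ins, not the real adlfs or dvc-objects code. They assume nothing beyond what is described above, namely that put_file without overwrite=True raises FileExistsError for an existing path and that the transfer step swallows that error.

```python
class VersionedStore:
    """Toy stand-in for a version-aware remote (hypothetical, not adlfs)."""

    def __init__(self):
        self.versions = {}  # path -> list of stored payloads, oldest first

    def put_file(self, path, data, overwrite=False):
        if path in self.versions and not overwrite:
            raise FileExistsError(path)
        self.versions.setdefault(path, []).append(data)


def transfer(store, path, data, **put_kwargs):
    """Upload one file, treating "already exists" as success."""
    try:
        store.put_file(path, data, **put_kwargs)
    except FileExistsError:
        # Reasonable for content-addressed cache objects,
        # but it hides the failed upload for versioned paths.
        pass
    return 1  # counted as pushed either way


store = VersionedStore()
transfer(store, "test/test.txt", b"test\n")
pushed = transfer(store, "test/test.txt", b"test2\n")  # overwrite not passed
versions_without_overwrite = len(store.versions["test/test.txt"])

transfer(store, "test/test.txt", b"test2\n", overwrite=True)
versions_with_overwrite = len(store.versions["test/test.txt"])
```

In this model the second transfer still counts as one pushed file while the store keeps a single version, matching the "1 file pushed" output seen earlier in the thread; passing overwrite=True is what creates the second version.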

@daavoo
Contributor

daavoo commented May 31, 2023

Upgrading dvc-azure fixes the issue:

pip install --upgrade dvc-azure

The bump will be included in the next release.

daavoo referenced this issue in iterative/dvc May 31, 2023
mergify bot referenced this issue in iterative/dvc May 31, 2023
Closes #9506 via iterative/dvc-azure#48

(cherry picked from commit a1b891c)
daavoo referenced this issue in iterative/dvc May 31, 2023
Closes #9506 via iterative/dvc-azure#48

(cherry picked from commit a1b891c)
@pmrowla pmrowla reopened this Jun 21, 2023
@pmrowla
Contributor

pmrowla commented Jun 21, 2023

The dvc-azure changes for this issue only resolve the issue in specific cases, and reintroduce the problem in other cases.

see: https://github.com/iterative/dvc-azure/pull/48/files#r1236138314

discord context: https://discord.com/channels/485586884165107732/485596304961962003/1120799820605575209

@pmrowla pmrowla transferred this issue from iterative/dvc Jun 21, 2023
@pmrowla pmrowla assigned pmrowla and unassigned daavoo Jun 21, 2023
@pmrowla pmrowla removed the p0-critical Handle immediately label Jun 21, 2023
@github-project-automation github-project-automation bot moved this from Todo to Done in DVC Jun 21, 2023