Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

push: gdrive_service_account_json_file_path must be set for GDRIVE_CREDENTIALS_DATA to work #19

Closed
0x2b3bfa0 opened this issue Jun 25, 2021 · 5 comments
Labels
bug Something isn't working

Comments

@0x2b3bfa0
Copy link
Member

0x2b3bfa0 commented Jun 25, 2021

Description

The GDRIVE_CREDENTIALS_DATA environment variable won't be honored unless the gdrive_service_account_json_file_path configuration option is set beforehand. In most of the habitual use cases, this requirement doesn't make too much sense from the user standpoint.

Is this the recommended way of passing Google Drive (Service Account) credentials through an environment variable?

Additionally, after reading the pertaining documentation, I don't understand why I can't use the GDRIVE_CREDENTIALS_DATA environment variable to pass the DVC-specific .dvc/tmp/gdrive-user-credentials.json file contents, instead of having to pass the raw service account file contents.

How to reproduce

The following GitHub Actions workflow provides a self-contained example to reproduce this issue. Uncommenting the only line prefixed with a hash # will make it work as expected.

on: push
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: iterative/setup-dvc@v1
      - run: |
          git init
          dvc init
          
          dvc remote add origin gdrive://root
          dvc remote modify origin --local gdrive_use_service_account true
          # dvc remote modify origin --local gdrive_service_account_json_file_path /dev/null

          date > file
          dvc add file

          dvc push --verbose --remote origin
        env:
          GDRIVE_CREDENTIALS_DATA: ${{ secrets.ORIGINAL_SERVICE_ACCOUNT_JSON }}

Note: the ${{ secrets.ORIGINAL_SERVICE_ACCOUNT_JSON }} variable represents a GCP Service Account file in the standard JSON format provided by Google, not a DVC credentials JSON file.

Actual output

$ dvc push --verbose --remote origin
2021-06-25 22:04:09,030 ERROR: failed to push data to the cloud - To use service account, set `gdrive_service_account_json_file_path`, and optionally`gdrive_service_account_user_email` in DVC config
Learn more at <https://man.dvc.org/remote/modify>
------------------------------------------------------------
Traceback (most recent call last):
  File "dvc/command/data_sync.py", line 57, in run
  File "dvc/repo/__init__.py", line 51, in wrapper
  File "dvc/repo/push.py", line 44, in push
  File "dvc/data_cloud.py", line 77, in push
  File "dvc/data_cloud.py", line 38, in get_remote
  File "dvc/data_cloud.py", line 58, in _init_remote
  File "dvc/remote/__init__.py", line 8, in get_remote
  File "dvc/fs/gdrive.py", line 128, in __init__
  File "dvc/fs/gdrive.py", line 155, in _validate_config
dvc.exceptions.DvcException: To use service account, set `gdrive_service_account_json_file_path`, and optionally`gdrive_service_account_user_email` in DVC config
Learn more at <https://man.dvc.org/remote/modify>
2021-06-25 22:04:09,033 DEBUG: Analytics is enabled.
------------------------------------------------------------
2021-06-25 22:04:09,076 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp8tbexj0k']'
2021-06-25 22:04:09,077 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp8tbexj0k']'

Expected output

$ dvc push --verbose --remote origin
2021-06-25 22:17:58,441 DEBUG: Preparing to upload data to 'gdrive://root'
2021-06-25 22:17:58,441 DEBUG: Preparing to collect status from gdrive://root
2021-06-25 22:17:58,441 DEBUG: Collecting information from local cache...
2021-06-25 22:17:58,441 DEBUG: Collecting information from remote cache...
2021-06-25 22:17:58,441 DEBUG: Matched '0' indexed hashes
2021-06-25 22:17:58,442 DEBUG: Querying 1 hashes via object_exists
2021-06-25 22:17:58,672 DEBUG: GDrive remote auth with config '***'client_config_backend': 'settings', 'client_config_file': 'client_secrets.json', 'save_credentials': True, 'oauth_scope': ['https://www.googleapis.com/auth/drive', 'https://www.googleapis.com/auth/drive.appdata'], 'save_credentials_backend': 'file', 'save_credentials_file': '/home/runner/work/test/test/.dvc/tmp/.5VsCA2nJs9ZQgeBZ8pWNAk.tmp', 'get_refresh_token': True, 'service_config': ***'client_user_email': None, 'client_json_file_path': '/home/runner/work/test/test/.dvc/tmp/.mpgSFHPn7NBgMe35nngenb.tmp'***'.
2021-06-25 22:17:59,308 DEBUG: Uploading '.dvc/cache/5d/afc8549edbdc83c20f11cbfde93cc4' to 'gdrive://root/5d/afc8549edbdc83c20f11cbfde93cc4'
1 file pushed
2021-06-25 22:18:01,059 DEBUG: Analytics is enabled.
2021-06-25 22:18:01,113 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp0ocbbm6a']'
2021-06-25 22:18:01,116 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp0ocbbm6a']'

Environment information

$ dvc doctor
DVC version: 2.4.1 (deb)
---------------------------------
Platform: Python 3.8.10 on Linux-5.8.0-1033-azure-x86_64-with-glibc2.7
Supports: All remotes
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: gdrive
Workspace directory: ext4 on /dev/root
Repo: dvc, git

Additional information

https://github.com/iterative/dvc/blob/4e792ae61c5927ab2e5f6a6914d985d43aa705b4/dvc/fs/gdrive.py#L128

https://github.com/iterative/dvc/blob/4e792ae61c5927ab2e5f6a6914d985d43aa705b4/dvc/fs/gdrive.py#L149-L162

See also

@pmrowla
Copy link
Contributor

pmrowla commented Jun 26, 2021

With the current gdrive remote implementation, for service accounts, gdrive_use_service_account and gdrive_service_account_json_file_path are both required

GDRIVE_CREDENTIALS_DATA is only valid for user account credentials, not service accounts.

This needs to be more explicit in our docs, but it does say it here:

Alternatively, a GDRIVE_CREDENTIALS_DATA can be set to pass user credentials in CI/CD systems, production setup, read-only file systems, etc.


In addition to clarifying the docs, it would probably be better for us to support a separate env var for passing service account credentials into DVC

@0x2b3bfa0
Copy link
Member Author

0x2b3bfa0 commented Jun 26, 2021

With the current gdrive remote implementation, for service accounts, gdrive_use_service_account and gdrive_service_account_json_file_path are both required.

Exactly! Failing to provide the latter would raise the DvcException above: «To use service account, set gdrive_service_account_json_file_path [...] in DVC config»

GDRIVE_CREDENTIALS_DATA is only valid for user account credentials, not service accounts.

Surprisingly, GDRIVE_CREDENTIALS_DATA also works for service accounts as I pointed out above:

  1. Set gdrive_service_account_json_file_path to any readable file; even /dev/null would suffice
  2. Pass the original Google Service Account JSON contents through the same environment variable

Alternatively, a GDRIVE_CREDENTIALS_DATA can be set to pass user credentials in CI/CD systems, production setup, read-only file systems, etc.

Yes, it would need a bit of clarification. This cursory mention can be easily overlooked by occasional readers. Especially, utilizing user credentials as opposed to service account credentials without any context could be slightly misleading.

In addition to clarifying the docs, it would probably be better for us to support a separate env var for passing service account credentials into DVC

As an unwary user, I'd probably expect the same environment variable to serve both purposes.

Diagram

@pmrowla pmrowla added the bug Something isn't working label Jun 26, 2021
@pmrowla
Copy link
Contributor

pmrowla commented Jun 26, 2021

In addition to the docs issues, this is the bug:

Surprisingly, GDRIVE_CREDENTIALS_DATA also works for service accounts as I pointed out above:

Set gdrive_service_account_json_file_path to any readable file; even /dev/null would suffice

@isidentical
Copy link

See also #20

@daavoo
Copy link
Contributor

daavoo commented Apr 28, 2022

I might be missing something but I think this is fixed by iterative/dvc#7213 . Could you confirm @0x2b3bfa0 ?

@efiop efiop transferred this issue from iterative/dvc Jan 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants