Cannot pass API tokens to data import scripts #69

Open
mjugl opened this issue Dec 12, 2024 · 1 comment

Comments

mjugl commented Dec 12, 2024

We deployed cBioPortal with Keycloak for user authentication and token-based data access. I have confirmed that the issued tokens work for accessing the cBioPortal API from the command line. However, importing studies with the validateData.py and metaImport.py scripts fails, because there is no way to pass an API token into the Authorization header of the requests these scripts send to the cBioPortal API.

These are the logs I get. Confidential URLs have been redacted.

validateData.py -u https://######## -s study/minimal_example/ -v
DEBUG: -: Requesting info from portal at 'https://########'
Traceback (most recent call last):
  File "/usr/local/bin/validateData.py", line 5132, in request_from_portal_api
    response.raise_for_status()
  File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://########/api/info

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/validateData.py", line 5674, in <module>
    exit_status = main_validate(parsed_args)
  File "/usr/local/bin/validateData.py", line 5647, in main_validate
    portal_instance = load_portal_info(server_url, logger)
  File "/usr/local/bin/validateData.py", line 5280, in load_portal_info
    parsed_json = request_from_portal_api(path, api_name, logger)
  File "/usr/local/bin/validateData.py", line 5134, in request_from_portal_api
    raise ConnectionError(
ConnectionError: Failed to fetch metadata from the portal at [https://########/api/info]

This is the snippet that causes the issue. Note that requests.get is called with neither auth nor headers, so no Authorization header is ever sent.

    logger.debug("Requesting %s from portal at '%s'",
                 api_name, server_url)
    # this may raise a requests.exceptions.RequestException subclass,
    # usually because the URL provided on the command line was invalid or
    # did not include the http:// part
    response = requests.get(service_url)
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        raise ConnectionError(
            'Failed to fetch metadata from the portal at [{}]'.format(service_url)
        ) from e

To confirm, here is the output showing that manually adding the Authorization header fixes the issue.

curl https://########/api/info -H "Accept: application/json" -H "Authorization: Bearer ########" | jq '.'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   570  100   570    0     0  12038      0 --:--:-- --:--:-- --:--:-- 12127
{
  "portalVersion": "5.4.2-dirty-SNAPSHOT",
  "dbVersion": "2.13.1",
  "gitBranch": "bf8fac6040e80da43cf652827df8fe050c9258ee",
  "gitCommitId": "bf8fac6040e80da43cf652827df8fe050c9258ee",
  "gitCommitIdDescribe": "v5.4.2-dirty",
  "gitCommitIdDescribeShort": "v5.4.2-dirty",
  "gitCommitMessageFull": "Merge pull request #10392 from cBioPortal/frontend-v5.4.2\n\nFrontend v5.4.2",
  "gitCommitMessageShort": "Merge pull request #10392 from cBioPortal/frontend-v5.4.2",
  "gitCommitMessageUserEmail": "[email protected]",
  "gitCommitMessageUserName": "Gaofei Zhao",
  "gitDirty": true
}
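For reference, the same check can be scripted in Python. The sketch below is a stdlib-only equivalent of the curl call above; it uses urllib.request rather than requests so that it runs without extra dependencies. build_info_request and fetch_portal_info are hypothetical helper names, not part of cBioPortal.

```python
import json
import urllib.request


def build_info_request(server_url, token):
    """Build a GET request for <server_url>/api/info carrying a bearer token.

    Hypothetical helper that mirrors the curl call: it asks for JSON and sends
    the token in the Authorization header.
    """
    return urllib.request.Request(
        server_url.rstrip("/") + "/api/info",
        headers={
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def fetch_portal_info(server_url, token):
    """Send the request and decode the JSON body.

    urlopen raises urllib.error.HTTPError for non-2xx responses, e.g. a 401
    when the token is missing or rejected.
    """
    with urllib.request.urlopen(build_info_request(server_url, token)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a valid token, fetch_portal_info("https://portal.example.org", token) (placeholder URL) should return the same JSON document as the curl command.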

I'm aware we're running an older version of cBioPortal, which is why I checked the main branch of this repository to confirm that the issue is still present.

I suggest reading the API token from an environment variable, from a command-line option, or ideally from both, with the environment variable taking precedence.
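A minimal sketch of that suggestion, assuming a hypothetical --api-token option and the CBP_API_TOKEN variable name used in the workaround in this thread. None of these names exist in the scripts today. resolve_api_token gives the environment variable precedence as proposed, and auth_headers builds a dict that could be passed as headers= to each requests.get call in validateData.py and metaImport.py.

```python
import argparse
import os

# Variable name borrowed from the workaround in this thread; not an official setting.
TOKEN_ENV_VAR = "CBP_API_TOKEN"


def resolve_api_token(cli_token, environ=os.environ):
    """Return the token to use, preferring the environment variable over the CLI value."""
    env_token = environ.get(TOKEN_ENV_VAR)
    # Treat an empty variable (e.g. CBP_API_TOKEN=) as unset.
    if env_token:
        return env_token
    return cli_token or None


def auth_headers(token):
    """Headers to attach to every portal request; empty when no token is configured."""
    return {"Authorization": f"Bearer {token}"} if token else {}


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical option; the existing scripts do not define it.
    parser.add_argument("--api-token", default=None,
                        help="cBioPortal API token sent as a bearer token")
    return parser.parse_args(argv)
```

Treating an empty CBP_API_TOKEN as unset means the scripts never send a bare "Bearer " header, which the portal would most likely reject with the same 401 as above.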


mjugl commented Dec 12, 2024

I implemented a quick workaround by patching validateData.py as follows.

    logger.debug("Requesting %s from portal at '%s'",
                api_name, server_url)
    # this may raise a requests.exceptions.RequestException subclass,
    # usually because the URL provided on the command line was invalid or
    # did not include the http:// part
-   response = requests.get(service_url)
+   headers = {}
+   api_token = os.environ.get("CBP_API_TOKEN")
+   if api_token is not None:
+       headers["Authorization"] = f"Bearer {api_token}"
+   response = requests.get(service_url, headers=headers)
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        raise ConnectionError(
            'Failed to fetch metadata from the portal at [{}]'.format(service_url)
        ) from e

The script can then be called with CBP_API_TOKEN=XXX_token_goes_here_XXX validateData.py.
