Incompatibility of pybiomart and omnipath #54

Closed
PauBadiaM opened this issue Feb 16, 2024 · 3 comments

@PauBadiaM
Member

Hi @deeenes, there seems to be a weird bug involving pybiomart and omnipath. At the beginning of the pseudobulk vignette of decoupler I use pybiomart to retrieve gene symbols, and later I import collectri. The problem is that when I run the gene symbol query, it generates a hidden file called .pybiomart.sqlite, which triggers a huge number of warnings from omnipath. This used not to be the case, since you could pass the argument use_cache=False, but now it creates the file anyway. I've opened an issue about it: scverse/scanpy#2861

I tried deleting the file before loading collectri, but omnipath still spams warnings and falls back to the static tables. The only way to make it work is to load collectri first and then biomart, which breaks the flow of the vignette. Is there anything that can be done on the omnipath side to solve this? Any clue what might be causing it?

You can reproduce the error by running:

import scanpy as sc
import decoupler as dc

annot = sc.queries.biomart_annotations(
    'hsapiens',
    ['ensembl_gene_id', 'external_gene_name'],
    use_cache=False
)

collectri = dc.get_collectri(organism='human', split_complexes=False)

Thanks!

@deeenes deeenes self-assigned this Feb 17, 2024
@deeenes
Member

deeenes commented Feb 17, 2024

Hi Pau,

I think it has nothing to do with the sqlite file (that would indeed be surprising); the most likely cause I can think of is pybiomart doing something with urllib3 parameters. A more minimal example to reproduce the issue:

import pybiomart
import omnipath as op

omnipath already downloads some resource metadata at import time, hence this is enough to trigger the Connection broken: IncompleteRead errors. The only condition is that pybiomart has to be imported first, so it has a chance to do something. The question is what this something is.

Also, this seems to be a pybiomart issue; I would open an issue in their repo. I'm not sure if the project is active, as it hasn't been updated for 7 years. I found that setting up requests_cache triggers the issue:

import requests_cache
requests_cache.install_cache('foobar')
import omnipath as op
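
The import-order dependence is consistent with how requests_cache documents install_cache: it patches requests.Session globally, so any session created afterwards is a caching one. A toy, stdlib-only sketch of that mechanism follows; `fake_requests`, `PlainSession`, `CachingSession`, `install_cache` and `library_download` are hypothetical stand-ins, not the real requests or requests_cache code:

```python
import types

# Hypothetical stand-in for the shared `requests` module namespace.
fake_requests = types.SimpleNamespace()


class PlainSession:
    """Stand-in for an unpatched session class."""

    def send(self, request):
        return f"plain:{request}"


class CachingSession(PlainSession):
    """Stand-in for a caching session that wraps every response."""

    def send(self, request):
        return f"cached:{super().send(request)}"


fake_requests.Session = PlainSession


def install_cache():
    """Toy global patch: swap the Session class in the shared namespace."""
    fake_requests.Session = CachingSession


def library_download(request):
    """A library that builds its session from the shared namespace."""
    return fake_requests.Session().send(request)


before = library_download("GET /resources")
install_cache()  # e.g. a side effect of importing some other package first
after = library_download("GET /resources")

print(before)  # plain:GET /resources
print(after)   # cached:plain:GET /resources
```

The library code is unchanged between the two calls; only the globally shared class was replaced, which is why the library that triggers the error is not necessarily the one that caused it.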

I'm not familiar with requests_cache, but the omnipath module relies entirely on requests (a higher-level API on top of urllib3). I found a few similar issues, for example this one. The developers of requests claim they've already fixed the issue. Indeed, the tracebacks point to requests_cache, specifically to requests_cache/models/raw_response.py, line 77, in from_response, in the third traceback below (the first one is the original urllib3 error, the second one is urllib3 re-raising it as a ProtocolError, and the third one starts in omnipath's downloader and passes through requests_cache):

WARNING:root:Failed to download from `http://no-tls.omnipathdb.org/`.
WARNING:root:Traceback (most recent call last):
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/urllib3/response.py", line 738, in _error_catcher
    yield
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/urllib3/response.py", line 875, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(-196 bytes read, -196 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/urllib3/response.py", line 1035, in stream
    data = self.read(amt=amt, decode_content=decode_content)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/urllib3/response.py", line 955, in read
    data = self._raw_read(amt)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/urllib3/response.py", line 852, in _raw_read
    with self._error_catcher():
  File "/usr/lib/python3.11/contextlib.py", line 155, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/urllib3/response.py", line 755, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(-196 bytes read, -196 more expected)', IncompleteRead(-196 bytes read, -196 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/denes/omnipath_python/local/omnipath/_core/downloader/_downloader.py", line 144, in maybe_download
    res = self._download(req)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/local/omnipath/_core/downloader/_downloader.py", line 179, in _download
    with self._session.send(
         ^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests_cache/session.py", line 205, in send
    response = self._send_and_cache(request, actions, cached_response, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests_cache/session.py", line 233, in _send_and_cache
    self.cache.save_response(response, actions.cache_key, actions.expires)
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests_cache/backends/base.py", line 89, in save_response
    cached_response = CachedResponse.from_response(response, expires=expires)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests_cache/models/response.py", line 102, in from_response
    obj.raw = CachedHTTPResponse.from_response(response)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests_cache/models/raw_response.py", line 77, in from_response
    _ = response.content  # This property reads, decodes, and stores response content
        ^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/denes/omnipath_python/venv/lib/python3.11/site-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(-196 bytes read, -196 more expected)', IncompleteRead(-196 bytes read, -196 more expected))
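
To read a chained traceback like the one above: "direct cause" marks an explicit `raise ... from e`, while "During handling of the above exception" marks an exception raised inside an `except` block without `from`. A minimal sketch that reproduces the same chain shape, using locally defined stand-in exception classes (named after the real ones, but not imported from urllib3 or requests) and hypothetical function names:

```python
class IncompleteRead(Exception):
    """Stand-in for the original low-level read error."""


class ProtocolError(Exception):
    """Stand-in for the error raised explicitly from the original one."""


class ChunkedEncodingError(Exception):
    """Stand-in for the error raised while handling ProtocolError."""


def raw_read():
    raise IncompleteRead("body length does not match what was announced")


def stream():
    try:
        raw_read()
    except IncompleteRead as e:
        # Explicit chaining: printed as "The above exception was the direct cause".
        raise ProtocolError("Connection broken") from e


def content():
    try:
        stream()
    except ProtocolError as e:
        # No `from`: printed as "During handling of the above exception".
        raise ChunkedEncodingError(e)


def chain(exc):
    """Return exception class names, from the outermost back to the original."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__cause__ or exc.__context__
    return names


try:
    content()
except ChunkedEncodingError as err:
    links = chain(err)

print(links)  # ['ChunkedEncodingError', 'ProtocolError', 'IncompleteRead']
```

The frames listed under the last traceback block show where the outermost exception surfaced, which is how the requests_cache frames can be located.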

Note that the -196, -196 is there because by this point I had already tinkered with this file; originally it was a double read error, with 392 read, -196 more expected. We have no chance of working around this issue in omnipath, because it happens within a single method in requests_cache, so I sent them a PR. We also can't point to this PR in our packages, because requests_cache is not a direct dependency of ours. pybiomart could pin the fixed version in its requirements, if it has an active maintainer.
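
The original numbers (392 read, -196 more expected) are what simple byte bookkeeping would show if a 196-byte body were accounted for twice. The sketch below only illustrates that arithmetic; `BodyCounter` is a hypothetical class, not urllib3's actual implementation, and the 196-byte body length is an assumption inferred from the figures above:

```python
from dataclasses import dataclass


@dataclass
class BodyCounter:
    """Hypothetical bookkeeping for a response body of known length."""

    length_remaining: int
    bytes_read: int = 0

    def account(self, n: int) -> None:
        """Record n bytes as read and reduce the remaining length."""
        self.bytes_read += n
        self.length_remaining -= n


counter = BodyCounter(length_remaining=196)

counter.account(196)  # first read consumes the whole body
first = (counter.bytes_read, counter.length_remaining)

counter.account(196)  # the same body accounted for a second time
second = (counter.bytes_read, counter.length_remaining)

print(first)   # (196, 0)
print(second)  # (392, -196)
```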

Btw, pypath has a simple BioMart client:

from pypath.inputs import biomart
import pandas as pd

human_genesymbols = pd.DataFrame(biomart.biomart_query('external_gene_name', gene = True))

human_genesymbols
       ensembl_gene_id external_gene_name
0      ENSG00000210049              MT-TF
1      ENSG00000211459            MT-RNR1
2      ENSG00000210077              MT-TV
3      ENSG00000210082            MT-RNR2
4      ENSG00000209082             MT-TL1
...                ...                ...
70706  ENSG00000288629                   
70707  ENSG00000288678                   
70708  ENSG00000290825            DDX11L2
70709  ENSG00000227232             WASH7P
70710  ENSG00000290826                   

[70711 rows x 2 columns]

@PauBadiaM
Member Author

Wow @deeenes! You seem to have solved this mystery; after installing your requests_cache PR it works flawlessly. Thanks for digging into this! 😉

@JWCook

JWCook commented Feb 17, 2024

@PauBadiaM I released @deeenes's fix in the latest version of requests-cache.

I see pybiomart doesn't pin any of its dependencies, so you can pin requests-cache in your own project if you'd like, i.e. requests-cache~=1.2.
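
For reference, `~=1.2` is a "compatible release" specifier: it accepts 1.2 and any later 1.x release, but not 2.0. A simplified, stdlib-only sketch of that rule is below; `parse_release` and `is_compatible_release` are hypothetical helpers that handle plain numeric versions only (no pre-releases or other suffixes), and a real project would more likely rely on pip or on `packaging.specifiers.SpecifierSet`:

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Split a plain numeric version such as '1.2.0' into (1, 2, 0)."""
    return tuple(int(part) for part in version.split("."))


def is_compatible_release(installed: str, spec: str) -> bool:
    """Simplified `~=spec` check.

    True if `installed` is at least `spec` and agrees with `spec` on every
    component except the last one.
    """
    inst = parse_release(installed)
    base = parse_release(spec)
    if len(base) < 2:
        raise ValueError("a compatible-release spec needs at least two components")
    prefix = base[:-1]
    return inst[: len(prefix)] == prefix and inst >= base


for candidate in ["1.1.9", "1.2", "1.2.0", "1.3.1", "2.0"]:
    print(candidate, is_compatible_release(candidate, "1.2"))
```

In a requirements file, the pin itself is the single line `requests-cache~=1.2`.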
