[Question] How can we debug gevent related segfaults? #1751

Closed
LefterisJP opened this issue Jan 20, 2021 · 5 comments
Labels
Type: Question User support and/or waiting for responses

Comments

@LefterisJP

  • gevent version: 21.1.1
  • Python version: 3.7.9
  • Operating System: Ubuntu 18.04.5

Description:

For a very long time in our project we had this constraint on gevent in requirements:

gevent==1.5a2
greenlet==0.4.16  # adding this constraint since 0.4.17 does not work with gevent 1.5a2

All worked well. But we decided to try to upgrade to the latest gevent, since the version we use is quite old, and also to remove the greenlet constraint since, from my understanding, these two go together.

But now our test suite segfaults in CI, always at the same place, if and only if the entire directory of tests is run. If the tests are run individually, no segfault happens. That makes it hard to debug, since that directory contains tests that take a long time.

The traceback is:

rotkehlchen/tests/api/test_uniswap.py ..........                         [ 26%]
Fatal Python error: Segmentation fault

Thread 0x00007f4f07de5700 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/gevent/_threading.py", line 68 in wait
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/gevent/_threading.py", line 158 in get
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/gevent/threadpool.py", line 187 in run

Thread 0x00007f4f0ce4c700 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/gevent/_threading.py", line 68 in wait
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/gevent/_threading.py", line 158 in get
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/gevent/threadpool.py", line 187 in run

Current thread 0x00007f4f2269e740 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/web3/_utils/request.py", line 31 in _get_session
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/web3/_utils/request.py", line 37 in make_post_request
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/web3/providers/rpc.py", line 95 in make_request
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/web3/providers/base.py", line 103 in isConnected
  File "/opt/hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/web3/main.py", line 257 in isConnected
  File "/home/runner/work/rotki/rotki/rotkehlchen/chain/ethereum/manager.py", line 304 in attempt_connect
rotkehlchen/tests/api/test_users.py ..
/home/runner/work/_temp/68865a12-daeb-4725-9068-66797053593e.sh: line 6: 25810 Segmentation fault      (core dumped) python pytestgeventwrapper.py $COVERAGE_ARGS rotkehlchen/tests
Error: Process completed with exit code 139.
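The `Fatal Python error: Segmentation fault` block above, with one stack per thread, is the format printed by Python's standard `faulthandler` module. pytest enables it by default, and outside pytest it can be turned on with `python -X faulthandler` or `PYTHONFAULTHANDLER=1`. A minimal sketch of enabling it explicitly at the top of a wrapper script (the `enable_crash_tracebacks` helper is hypothetical, not part of the project):

```python
import faulthandler
import sys


def enable_crash_tracebacks() -> None:
    """Dump the Python stack of every thread if the process dies on a fatal signal.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    It needs a real file descriptor, so this writes to the original stderr
    (sys.__stderr__) in case sys.stderr has been replaced by a capturing object.
    """
    faulthandler.enable(file=sys.__stderr__, all_threads=True)


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ...then start the test run here (hypothetical wrapper logic).
```

Under gevent, the "Current thread" stack appears to be that of the greenlet that was running at the moment of the crash, which would explain why it starts at `attempt_connect` rather than at a test function.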

What I've run:

As described above, we ran a particular test directory of our test suite.

Here is the run: https://github.com/rotki/rotki/pull/2132/checks?check_run_id=1731324419

And here is the PR: rotki/rotki#2132

Naturally, I don't expect you to debug it for me. I would just like some hints on where to look, or any gotchas I may be missing, to find the root cause.

Keep in mind that, for some reason, everything works fine with gevent==1.5a2 and greenlet==0.4.16. I have since tried various newer version combinations, and they all fail with the same traceback.

Thanks for your time! And if this is not the right place to ask please point me to the right one.

@jamadden
Member

jamadden commented Jan 20, 2021

My first step in debugging would be to run with PURE_PYTHON=1 and see what happens. Disabling a lot of the C code that way can make coding errors more apparent, whether in gevent or the application.

@jamadden
Member

In addition, I would also try running with GEVENT_LOOP=libev-cffi; again, this removes some C code and can produce more scrutable errors.
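Both suggestions are environment variables that gevent reads when it starts up. A minimal sketch of re-running the suite from Python with both set, assuming the same `pytestgeventwrapper.py` entry point and `rotkehlchen/tests` directory as the CI command above (the `debug_env` and `run_tests` helpers are hypothetical):

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Return a copy of the environment with less of gevent's C code in play.

    PURE_PYTHON=1 makes gevent use its pure-Python implementations instead of
    the compiled Cython modules; GEVENT_LOOP=libev-cffi selects the CFFI-based
    libev event loop. Both must be set before the test process starts.
    """
    env = dict(os.environ if base is None else base)
    env["PURE_PYTHON"] = "1"
    env["GEVENT_LOOP"] = "libev-cffi"
    return env


def run_tests(args):
    # Mirrors the CI invocation; adjust the script path to your checkout.
    cmd = [sys.executable, "pytestgeventwrapper.py", *args]
    return subprocess.run(cmd, env=debug_env()).returncode


if __name__ == "__main__":
    sys.exit(run_tests(sys.argv[1:] or ["rotkehlchen/tests"]))
```

Setting the variables directly in the shell (`PURE_PYTHON=1 GEVENT_LOOP=libev-cffi python pytestgeventwrapper.py ...`) achieves the same thing; what matters is that they are in the environment before gevent is imported.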

@jamadden jamadden added the Type: Question User support and/or waiting for responses label Jan 20, 2021
@LefterisJP
Author

After a long time debugging, I have managed to narrow it down to a combination of a few tests running in a specific order.
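Narrowing a crash down to "a few tests in a specific order" can be done by bisection. A minimal sketch, assuming a single earlier test leaves behind the state that makes the failing test crash, and that a hypothetical `fails(tests)` predicate runs the given tests, in order, in a fresh process and reports whether that run crashed (`find_polluter` and `crashes_in_subprocess` are illustrative names, not existing tools):

```python
import subprocess
import sys
from typing import Callable, Sequence


def find_polluter(
    candidates: Sequence[str],
    target: str,
    fails: Callable[[Sequence[str]], bool],
) -> str:
    """Binary-search for the one earlier test that makes `target` fail.

    Assumes exactly one test in `candidates` causes the failure when it runs
    before `target`, and that `fails(tests)` runs `tests` in the given order.
    """
    if not candidates:
        raise ValueError("no candidate tests to search")
    if not fails([*candidates, target]):
        raise ValueError("the full sequence does not reproduce the failure")
    lo, hi = 0, len(candidates)  # invariant: the culprit is in candidates[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails([*candidates[lo:mid], target]):
            hi = mid  # culprit is in the first half
        else:
            lo = mid  # culprit is in the second half
    return candidates[lo]


def crashes_in_subprocess(tests: Sequence[str]) -> bool:
    """Hypothetical predicate: run `tests` in a fresh interpreter.

    pytest normally runs explicitly listed node IDs in command-line order.
    A negative return code means the child was killed by a signal
    (-11 for SIGSEGV on Linux, which a shell reports as exit code 139).
    """
    cmd = [sys.executable, "pytestgeventwrapper.py", *tests]
    return subprocess.run(cmd).returncode < 0
```

Each probe is a full subprocess run, so the search needs roughly log2(n) runs instead of one run per candidate test.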

I am running with both GEVENT_LOOP=libev-cffi and PURE_PYTHON=1, but I don't see any difference. The segfault is always there.

On one run I also got a non-segfault error at the same location, which may be helpful:

[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: http://localhost:8545, Method: web3_clientVersion 
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://api.mycryptoapi.com/eth, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://mainnet-nethermind.blockscout.com/, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://web3.1inch.exchange, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://mainnet.eth.cloud.ava.do/, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://web3.1inch.exchange, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://nodes.mewapi.io/rpc/eth, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] DEBUG web3.providers.HTTPProvider: Making request HTTP. URI: https://cloudflare-eth.com/, Method: web3_clientVersion
[20/01/2021 18:48:17 CET] WARNING rotkehlchen.user_messages: You do not have an Etherscan API key configured. Rotki etherscan queries will still work but will be very slow. If you are not using your own ethereum node, it is recommended to go to https://etherscan.io/register, create an API key and then input it in the external service credentials setting of Rotki
[20/01/2021 18:48:17 CET] DEBUG rotkehlchen.externalapis.etherscan: Querying etherscan: https://api.etherscan.io/api?module=proxy&action=eth_call&to=0x3d9819210A31b4961b30EF54bE2aeD79B9c9Cd3B&data=0xbb82aa5e
KeyError: <rotkehlchen.user_messages.MessagesAggregator object at 0x7f44f2824c90>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "src/gevent/greenlet.py", line 906, in gevent._gevent_cgreenlet.Greenlet.run
  File "/home/lefteris/w/rotkehlchen/rotkehlchen/chain/ethereum/manager.py", line 304, in attempt_connect
    is_connected = web3.isConnected()
  File "/home/lefteris/.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/main.py", line 257, in isConnected
    return self.provider.isConnected()
  File "/home/lefteris/.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/providers/base.py", line 103, in isConnected
    response = self.make_request(RPCEndpoint('web3_clientVersion'), [])
  File "/home/lefteris/.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/providers/rpc.py", line 95, in make_request
    **self.get_request_kwargs()
  File "/home/lefteris/.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/_utils/request.py", line 37, in make_post_request
    session = _get_session(endpoint_uri)
SystemError: PyEval_EvalFrameEx returned a result with an error set
2021-01-20T17:48:17Z <Greenlet at 0x7f44f7037950: <bound method EthereumManager.attempt_connect of <rotkehlchen.chain.ethereum.manager.EthereumManager object at 0x7f44f6f9c050>>(name=<NodeName.MYETHERWALLET: 6>, ethrpc_endpoint='https://nodes.mewapi.io/rpc/eth', mainnet_check=True)> failed with SystemError

../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/requests/api.py:134: in put
    return request('put', url, data=data, **kwargs)
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/requests/api.py:61: in request
    return session.request(method=method, url=url, **kwargs)
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/requests/sessions.py:428: in __exit__
    self.close()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/requests/sessions.py:747: in close
    v.close()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/requests/adapters.py:325: in close
    self.poolmanager.clear()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/urllib3/poolmanager.py:222: in clear
    self.pools.clear()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/urllib3/_collections.py:100: in clear
    self.dispose_func(value)
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/urllib3/poolmanager.py:173: in <lambda>
    self.pools = RecentlyUsedContainer(num_pools, dispose_func=lambda p: p.close())
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/urllib3/connectionpool.py:490: in close
    conn = old_pool.get(block=False)
/usr/lib/python3.7/queue.py:181: in get
    self.not_full.notify()
/usr/lib/python3.7/threading.py:345: in notify
    if not self._is_owned():
/usr/lib/python3.7/threading.py:258: in _is_owned
    if self._lock.acquire(0):
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/gevent/thread.py:141: in acquire
    sleep()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/gevent/hub.py:159: in sleep
    waiter.get()
src/gevent/_waiter.py:143: in gevent._gevent_c_waiter.Waiter.get
    ???
src/gevent/_waiter.py:154: in gevent._gevent_c_waiter.Waiter.get
    ???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:61: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_greenlet_primitives.py:65: in gevent._gevent_c_greenlet_primitives.SwitchOutGreenletWithLoop.switch
    ???
src/gevent/_gevent_c_greenlet_primitives.pxd:35: in gevent._gevent_c_greenlet_primitives._greenlet_switch
    ???
src/gevent/greenlet.py:906: in gevent._gevent_cgreenlet.Greenlet.run
    ???
rotkehlchen/chain/ethereum/manager.py:304: in attempt_connect
    is_connected = web3.isConnected()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/main.py:257: in isConnected
    return self.provider.isConnected()
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/providers/base.py:103: in isConnected
    response = self.make_request(RPCEndpoint('web3_clientVersion'), [])
../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/providers/rpc.py:95: in make_request
    **self.get_request_kwargs()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

endpoint_uri = 'https://nodes.mewapi.io/rpc/eth', data = b'{"jsonrpc": "2.0", "method": "web3_clientVersion", "params": [], "id": 0}', args = ()
kwargs = {'headers': {'Content-Type': 'application/json', 'User-Agent': "Web3.py/5.15.0/<class 'web3.providers.rpc.HTTPProvider'>"}, 'timeout': 10}

    def make_post_request(endpoint_uri: URI, data: bytes, *args: Any, **kwargs: Any) -> bytes:
        kwargs.setdefault('timeout', 10)
>       session = _get_session(endpoint_uri)
E       SystemError: PyEval_EvalFrameEx returned a result with an error set

../../.virtualenvs/rotkipy37/lib/python3.7/site-packages/web3/_utils/request.py:37: SystemError

I am at a loss as to how to continue debugging this. I can drop into pdb in the test a few calls before the segfault, but I'm not even sure what to check to figure out why it happens.

@jamadden
Member

This turns out to be a bug in web3 and its use of a C data structure. The same problem can happen with threads.
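The diagnosis above points to a general hazard for any cache implemented as a C data structure: if code that runs in the middle of a cache operation can block, then under gevent (or with real threads) another caller can enter the same structure before the first operation has finished. In the pytest traceback above, a `Session.close()` call blocks on a gevent-patched lock (`gevent/thread.py`, `acquire`), the hub switches greenlets, and the following frames are another greenlet failing inside `_get_session`. One common way to keep such a cache safe is to make sure nothing that can block runs while it is being mutated, and to do cleanup such as closing evicted sessions only afterwards. The sketch below is a pure-Python illustration of that pattern built on `collections.OrderedDict`; it is not web3's code, and `ClosingLRUCache` and its `factory` parameter are hypothetical.

```python
import threading
from collections import OrderedDict
from typing import Any, Callable, Hashable, List


class ClosingLRUCache:
    """Hypothetical LRU cache that closes evicted values outside its critical section.

    Nothing that can block, and therefore nothing that can switch greenlets
    under gevent, runs while the cache is being mutated, so another caller
    never observes it half-updated.
    """

    def __init__(self, maxsize: int, factory: Callable[[Hashable], Any]) -> None:
        self._maxsize = maxsize
        self._factory = factory  # assumed to be cheap and non-blocking
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        # Guards against real threads; with gevent's monkey-patching applied,
        # threading.Lock is greenlet-aware as well.
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Any:
        evicted: List[Any] = []
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as most recently used
                value = self._items[key]
            else:
                value = self._factory(key)
                self._items[key] = value
                while len(self._items) > self._maxsize:
                    _, old = self._items.popitem(last=False)  # least recently used
                    evicted.append(old)
        # Cleanup such as requests.Session.close() may block and, under gevent,
        # yield to other greenlets; it only runs once the cache is consistent.
        for old in evicted:
            old.close()
        return value
```

A cache keyed by endpoint URI, like the `_get_session(endpoint_uri)` call in the traceback, could pass `lambda uri: requests.Session()` as the factory.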

@LefterisJP
Author

Thanks a lot for all the help, @jamadden. I guess we can now close this issue, as there is nothing actionable on gevent's side.
