Queries towards Nebula time out #370

Open
magnudae opened this issue Jan 21, 2025 · 5 comments

Comments

magnudae commented Jan 21, 2025

Bug / Question

We are experiencing some consistency issues with our Nebula setup.

It runs on Azure in its own container instance group with GraphD, MetaD, and StorageD, inspired by the minimal Docker setup.
They are communicating properly and are operational.

But we are experiencing timeout issues towards the server from time to time.

    .venv/lib/python3.12/site-packages/nebula3/gclient/net/Connection.py", line 211, in execute_parameter
        raise IOErrorException(
    nebula3.Exception.IOErrorException: TSocket read 0 bytes

This error makes queries towards the graph time out quite often.

Our setup is to create a session pool and then call execute_py to pass a query with params.

This also runs asynchronously, wrapped in asyncio.to_thread:

    async def execute_with_params(self, query: str, params: dict) -> dict:
        # Run the blocking SessionPool call in a worker thread so it does not
        # block the event loop.
        result = await asyncio.to_thread(self._pool.execute_py, query, params)
        return result
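
For reference, the pool itself is created roughly like this (a minimal sketch following the nebula3-python SessionPool example; the address, credentials, space name, and query are placeholders, not our real config):

    from nebula3.Config import SessionPoolConfig
    from nebula3.gclient.net.SessionPool import SessionPool

    # Placeholder address, credentials and space name: replace with the real values.
    config = SessionPoolConfig()
    pool = SessionPool("root", "nebula", "my_space", [("graphd.example.com", 9669)])
    pool.init(config)

    # Parameterised query through the pool (this is what self._pool wraps above).
    result = pool.execute_py("MATCH (v) WHERE id(v) == $vid RETURN v", {"vid": "player100"})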

Has anyone experienced the same? Could it be the lack of async support?

magnudae (Author)

I've been trying a few different ways of connecting, but I seem to get timeouts if it has been a while since the last I/O call.
After that it goes smoothly.

This is the latest error, using plain connections: nebula3.Exception.IOErrorException: Socket read failed: [Errno 60] Operation timed out

@wey-gu
Is there some kind of warm-up/downtime on the access point?
Is this something you have encountered before?

The solution for now is to run a retry, but that still gives slow I/O calls from time to time, which isn't great when we want to scale this up.
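
The retry itself looks roughly like this (a minimal sketch; the retry count and backoff are arbitrary choices, and pool is the SessionPool from above):

    import asyncio

    from nebula3.Exception import IOErrorException

    async def execute_with_retry(pool, query: str, params: dict, retries: int = 3):
        last_error = None
        for attempt in range(retries):
            try:
                # Same off-event-loop execution as in the snippet above.
                return await asyncio.to_thread(pool.execute_py, query, params)
            except IOErrorException as exc:
                # "TSocket read 0 bytes" / "Operation timed out" usually means the
                # connection was dropped; back off briefly and try again.
                last_error = exc
                await asyncio.sleep(0.5 * (attempt + 1))
        raise last_error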

wey-gu (Contributor) commented Jan 22, 2025

My guess is that the session has been idle for a long time and graphd considers it legitimate to release it.

Could you please try increasing client_idle_timeout_secs in the graphd conf? By default it's equivalent to 8 hours, which is why the timeout looks like a cold start/warm-up.

We could first reduce it to an extremely small value to see if we can reproduce the issue in a short feedback loop.
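
For example, in nebula-graphd.conf (gflags style; 30 below is just an arbitrarily small value for reproducing, and 28800 is the 8-hour default mentioned above):

    # nebula-graphd.conf
    # Default is 28800 (8 hours). Set it very small to reproduce the idle timeout,
    # or raise it for long-lived client sessions.
    --client_idle_timeout_secs=30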

Because this normally won't happen for a production online service, we didn't document it as we should have (at least in the FAQ).

cc @ChrisChen2023 for the docs perspective.

wey-gu (Contributor) commented Jan 22, 2025

@HarrisChu @Nicole00 @BeautyyuYanli

I recall that when I was hacking on some long-running nebula-python based things, I encountered this every now and then (e.g. a Jupyter session left open for a whole day; when connecting to Nebula again, I got this).

Maybe on the client side it would be reasonable to identify such a timeout and recreate the connection on demand in a proper way? Or are there other ways to improve this case from the client side?
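
Something along these lines, perhaps (a rough sketch, not an existing API in nebula3-python; the pool-factory callable and the single retry are assumptions):

    from nebula3.Exception import IOErrorException

    class ReconnectingPool:
        """Hypothetical wrapper: rebuilds the session pool once when an idle
        timeout surfaces as IOErrorException."""

        def __init__(self, make_pool):
            # make_pool is a callable returning a fresh, initialised SessionPool.
            self._make_pool = make_pool
            self._pool = make_pool()

        def execute_py(self, query: str, params: dict):
            try:
                return self._pool.execute_py(query, params)
            except IOErrorException:
                # graphd likely released the idle session; recreate the pool and retry once.
                self._pool = self._make_pool()
                return self._pool.execute_py(query, params)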

magnudae (Author)

@wey-gu thanks, I'll try that.

lremember

I have the same problem. How do I solve it?
