Queries towards Nebula time out #370

Open
magnudae opened this issue Jan 21, 2025 · 5 comments

Comments

magnudae commented Jan 21, 2025

Bug / Question

We are experiencing some consistency issues with our Nebula setup.

It runs on Azure in its own container instance group with GraphD, MetaD, and StorageD, inspired by the minimal Docker setup.
They are communicating properly and are operational.

But we are experiencing timeout issues towards the server from time to time.

    .venv/lib/python3.12/site-packages/nebula3/gclient/net/Connection.py", line 211, in execute_parameter
        raise IOErrorException(
    nebula3.Exception.IOErrorException: TSocket read 0 bytes

This error makes queries towards the graph time out quite often.

Our setup is to create a session pool and then call execute_py to pass a query with params.

This also runs asynchronously, wrapped in asyncio.to_thread:

    async def execute_with_params(self, query: str, params: dict) -> dict:
        # Run the blocking SessionPool call in a worker thread so it does not
        # block the event loop.
        result = await asyncio.to_thread(self._pool.execute_py, query, params)
        return result
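
For reference, the pool itself is created roughly like this (a minimal sketch following the nebula3-python SessionPool example; the address, credentials, space name, and query are placeholders, not our real config):

    from nebula3.Config import SessionPoolConfig
    from nebula3.gclient.net.SessionPool import SessionPool

    # Placeholder address, credentials and space name: replace with the real values.
    config = SessionPoolConfig()
    pool = SessionPool("root", "nebula", "my_space", [("graphd.example.com", 9669)])
    pool.init(config)

    # Parameterised query through the pool (this is what self._pool wraps above).
    result = pool.execute_py("MATCH (v) WHERE id(v) == $vid RETURN v", {"vid": "player100"})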

Has anyone experienced the same? Could it be the lack of async support?

magnudae (Author)

I've been trying a few different ways of connecting, but I seem to get timeouts if it has been a while since the last I/O call.
After that it goes smoothly.

This is the latest error, using plain connections: nebula3.Exception.IOErrorException: Socket read failed: [Errno 60] Operation timed out

@wey-gu
Is there some kind of warm-up/downtime on the access point?
Is this something you have encountered before?

The solution for now is to run a retry, but that still gives slow I/O calls from time to time, which isn't great when we want to scale this up.
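
The retry itself looks roughly like this (a minimal sketch; the retry count and backoff are arbitrary choices, and pool is the SessionPool from above):

    import asyncio

    from nebula3.Exception import IOErrorException

    async def execute_with_retry(pool, query: str, params: dict, retries: int = 3):
        last_error = None
        for attempt in range(retries):
            try:
                # Same off-event-loop execution as in the snippet above.
                return await asyncio.to_thread(pool.execute_py, query, params)
            except IOErrorException as exc:
                # "TSocket read 0 bytes" / "Operation timed out" usually means the
                # connection was dropped; back off briefly and try again.
                last_error = exc
                await asyncio.sleep(0.5 * (attempt + 1))
        raise last_error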

wey-gu (Contributor) commented Jan 22, 2025

My guess is that the session has been idle for a long time and graphd considers it legitimate to release it.

Could you please try increasing client_idle_timeout_secs in the graphd conf? By default it's equivalent to 8 hours, which is why the timeout looks like a cold start/warm-up.

We could first reduce it to an extremely small value to see if we can reproduce the issue in a short feedback loop.
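
For example, in nebula-graphd.conf (gflags style; 30 below is just an arbitrarily small value for reproducing, and 28800 is the 8-hour default mentioned above):

    # nebula-graphd.conf
    # Default is 28800 (8 hours). Set it very small to reproduce the idle timeout,
    # or raise it for long-lived client sessions.
    --client_idle_timeout_secs=30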

Because this normally won't happen for a production online service, we didn't document it as we should have (at least in the FAQ).

cc @ChrisChen2023 for the docs perspective.

wey-gu (Contributor) commented Jan 22, 2025

@HarrisChu @Nicole00 @BeautyyuYanli

I recall that when I was hacking on some long-running nebula-python based things, I encountered this every now and then (e.g. a Jupyter session left open for a whole day; when connecting to Nebula again, I got this).

Maybe on the client side it would be reasonable to identify such a timeout and recreate the connection on demand in a proper way? Or are there other ways to improve this case from the client side?
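
Something along these lines, perhaps (a rough sketch, not an existing API in nebula3-python; the pool-factory callable and the single retry are assumptions):

    from nebula3.Exception import IOErrorException

    class ReconnectingPool:
        """Hypothetical wrapper: rebuilds the session pool once when an idle
        timeout surfaces as IOErrorException."""

        def __init__(self, make_pool):
            # make_pool is a callable returning a fresh, initialised SessionPool.
            self._make_pool = make_pool
            self._pool = make_pool()

        def execute_py(self, query: str, params: dict):
            try:
                return self._pool.execute_py(query, params)
            except IOErrorException:
                # graphd likely released the idle session; recreate the pool and retry once.
                self._pool = self._make_pool()
                return self._pool.execute_py(query, params)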

magnudae (Author)

@wey-gu thanks, I'll try that.

lremember

I have the same problem. How do I solve it?
