Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dlt_user_shared: Add timeout to writev #385

Merged
merged 1 commit into from
May 17, 2023

Conversation

alexmohr
Copy link
Contributor

This timeout is necessary to prevent
blocking writev indefinitely.
Without the timeout dlt-daemon, may block
indefinitely when an app id is re-used
very frequently.
In that case dlt-daemon won't accept anymore
new connections and further communication
in any way is not possible anymore.

The program was tested solely for our own use cases, which might differ from yours.
Licensed under Mozilla Public License Version 2.0

Alexander Mohr, [email protected], Mercedes-Benz Tech Innovation GmbH, imprint

@alexmohr alexmohr force-pushed the prevent-timeouts-on-send branch from 11b94ca to 502119a Compare August 29, 2022 11:58
@alexmohr alexmohr closed this Jan 31, 2023
@alexmohr alexmohr reopened this Feb 15, 2023
FD_SET(handle, &fds);

struct timeval tv = { 0, 100000000 };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100000000 usec = 100 sec.
Isn't this a bit to much?
Please add a configurable value for all these values. Is 10 seconds sufficient as default value? This would be more readable:
struct timeval tv = { 10, 0 };

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I checked our internal fork and I somehow screwed up while applying this patch here. We're actually using 1 second timeout and not 100.
I adjusted the value to reflect this.
Also made it configurable via CMake

This timeout is necessary to prevent
blocking writev indefinitely.
Without the timeout dlt-daemon, may block
indefinitely when an app id is re-used
very frequently.
In that case dlt-daemon won't accept anymore
new connections and further communication
in any way is not possible anymore.

Signed-off-by: Alexander Mohr <[email protected]>
@alexmohr alexmohr force-pushed the prevent-timeouts-on-send branch from 502119a to 3fdeedb Compare February 22, 2023 11:23
@michael-methner
Copy link
Collaborator

Hello @alexmohr ,
i have tested your changes and working fine with me as well.

But I am still a bit hesitant to merge it as it might cause problems when a system is under high cpu load and timeout of 1 sec is triggered by accident.

Can you please describe how to reproduce the issue?

You stated above "indefinitely when an app id is re-used very frequently." Does this mean the app-id is registered from the same process several times? When it is registered from the different process, it should not matter as each process uses its own fifo.

@alexmohr
Copy link
Contributor Author

alexmohr commented Apr 24, 2023

Hello @alexmohr , i have tested your changes and working fine with me as well.

But I am still a bit hesitant to merge it as it might cause problems when a system is under high cpu load and timeout of 1 sec is triggered by accident.

Can you please describe how to reproduce the issue?

I was never to reproduce this in a static test but only during test drives w/o clear idea on how this happens.
It seems to happen when a receiving client (i.e. dlt-receive) is very slow or if the cpu load is high.

You stated above "indefinitely when an app id is re-used very frequently." Does this mean the app-id is registered from the same process several times? When it is registered from the different process, it should not matter as each process uses its own fifo.

It happened when multiple applications (processes) are using the same app id. dlt-daemon will frequently re-create the mapping between the app id and the process id, which takes some time because quite a few syscalls are made

@michael-methner michael-methner merged commit 6005b92 into COVESA:master May 17, 2023
@michael-methner
Copy link
Collaborator

I gave another try to reproduce the behavior but failed. Nevertheless, I think the patch will be an improvement for these high-load scenarios -> merging.

@alexmohr : Thank you for the contribution.

alexmohr added a commit to mercedes-benz/dlt-daemon that referenced this pull request Jun 28, 2024
This timeout is necessary to prevent
blocking writev indefinitely.
Without the timeout dlt-daemon, may block
indefinitely when an app id is re-used
very frequently.
In that case dlt-daemon won't accept anymore
new connections and further communication
in any way is not possible anymore.

Signed-off-by: Alexander Mohr <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants