Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggested fix for issue #9131 #9163

Open
wants to merge 1 commit into
base: development
Choose a base branch
from

Conversation

puzzlepaint
Copy link
Contributor

This is a suggestion for a fix to issue #9131. Changes:

  • retry_controls_work_around: Instead of sleeping for 50 milliseconds already after the first time a USB command fails (due to the device being busy), the strategy for retrying is changed to behave as follows:
    • During the first 2 milliseconds (or at least 3 times), retry immediately. This allows to retrieve USB responses quickly that are delayed only shortly.
    • Afterwards, retry with a delay in-between, as before. This avoids to stress the CPU constantly in case the device is busy for a longer period of time.
  • The retry delay for commands, DELAY_FOR_RETRIES, is reduced from 50 to 10 milliseconds, since 50 milliseconds seems rather long to me. MAX_RETRIES is correspondingly increased to keep the time period for retries the same (in addition to the added short period of immediate retries).
  • The "global time" mechanism only accepts samples if they are received within the time window where retries are done immediately. (If the response took longer, then system_time_finish is very likely inaccurate, as explained in the added comment.)

…iseconds, and reduce the retry delay for subsequent retries; time_diff_keeper: Discard delayed responses, as they may have been received earlier than they were returned, which means that system_time_finish is very likely inaccurate
const int POLLING_DEVICES_INTERVAL_MS = 5000;
const uint16_t MAX_RETRIES = 500;
const uint8_t DEFAULT_V4L2_FRAME_BUFFERS = 4;
const uint16_t DELAY_FOR_RETRIES = 10;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the other changes but please note that "DELAY_FOR_RETRIES" is used in other places in LRS SDK. (for example on l500-device.cpp).
Changing the value will effect other processes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your comment. I overlooked this because I initially made the commit in this PR based on the master branch and then cherry-picked it on top of the development branch for submitting the PR. Based on grep-ing the source, it seems that "DELAY_FOR_RETRIES" was not used anywhere else in master, but in development, it has been introduced into three pieces of identically-looking code which start on these lines:

const int MAX_ITERATIONS_FOR_DEVICE_DISCONNECTED_LOOP = (POLLING_DEVICES_INTERVAL_MS + 1000) / DELAY_FOR_RETRIES;

const int MAX_ITERATIONS_FOR_DEVICE_DISCONNECTED_LOOP = (POLLING_DEVICES_INTERVAL_MS + 1000) / DELAY_FOR_RETRIES;

const int MAX_ITERATIONS_FOR_DEVICE_DISCONNECTED_LOOP = (POLLING_DEVICES_INTERVAL_MS + 1000) / DELAY_FOR_RETRIES;

Based on a quick look at these pieces of code, my impression is that the exact delay there probably does not matter much, since the code only seems to query a bool behind a mutex based on this delay and keeps the overall query duration constant. But I don't know which delay would be most appropriate here; I could easily revert it to 50 milliseconds if desired.

My reasoning for reducing the delay in the other place (the retry interval for USB commands) was that with a D415 device with slightly non-current firmware, and the command to read out the device clock, there seemed to be three distinct outcomes in the code that I tested:

  1. The response to the command could be read immediately
  2. A single retry (without any delay) was needed to get the response after getting EBUSY in the first try
  3. It took about 82 +- 1 milliseconds to get the response, with EBUSY being returned in the meantime

Waiting 50 milliseconds as the original code does seemed to cause a lot of unnecessary waiting in cases 2 and 3. The changed retry strategy in this PR should be able to get the response as soon as possible in cases 1 and 2, and reduce the waiting in case 3.

This is just based on observations though; I do not have any previous experience with low-level USB handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants