Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Commander/preflight checks: Add monitoring to ESC failures #16704

Merged
merged 3 commits into from
Apr 22, 2021

Conversation

cmic0
Copy link
Contributor

@cmic0 cmic0 commented Jan 31, 2021

Resurrected version of #14800.

Describe problem solved by this pull request
This PR extends monitoring to failures reported by ESCs with telemetry.

ESC checks are enabled/disabled with the COM_ARM_CHK_ESCS parameter.

Test data / coverage
I've been flying this exact code with smart ESCs since the first PR was opened.

@cmic0 cmic0 requested review from MaEtUgR and dagar January 31, 2021 21:08
@cmic0
Copy link
Contributor Author

cmic0 commented Jan 31, 2021

@dagar this was also a suggestion you had back in the time

Did you consider splitting the esc_status update check and error reporting and separately check for timeouts and missing ESCs? You shouldn't need to copy esc_status (a rather large message with 8 esc_reports) every iteration.

I think would make sense to do it


if (esc_status.esc[index].failures != _last_esc_failure[index]) {

if (esc_status.esc[index].failures & esc_report_s::FAILURE_OVER_CURRENT_MASK) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These aren't really generic failures (not used by UAVCAN, DSHOT, etc), perhaps the error message should be confined to the specific module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh not really, I've seen this as a recurring pattern where people with proprietary ESCs ended up doing the logic in the driver (and the failures ended up to be almost the same). I think that this is a good first step towards supporting such edge cases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, let's see how things continue to evolve.
I'm trying to fine the line where commander is tied in such that it's aware of the components and high level health status, but doesn't have to have specific intimate knowledge unless it's actually going to do something with that information.

} else {

if (_esc_status_was_updated && !_status_flags.condition_escs_error) {
mavlink_log_critical(&_mavlink_log_pub, "ESCs telemetry timeout");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this user actionable? If it's not going to block arming or trigger a failsafe in flight then I don't think it's worth sending to a user. You could change it to PX4_WARN to catch in the logs for review later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually now i remember the use case for this:
if you have ESC that are reporting telemetry but the telemetry line is completely disconnected from the vehicle you would be allowed to arm without the telemetry line.
This check guarantees that if this situation happens user is then warned. What i usually do is publishing an esc_status message right when the driver starts, such that then the logic would kick in if no further statuses are received

mavlink_log_critical(&_mavlink_log_pub, "%soffline", esc_fail_msg);
} else {

if (_esc_status_was_updated && !_status_flags.condition_escs_error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't _esc_status_was_updated is always going to be true within esc_status_check()?

Copy link
Contributor Author

@cmic0 cmic0 Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not completely, IIRC i added that condition to guarantee that the check does not trigger a false positive at boot

@dagar
Copy link
Member

dagar commented Feb 2, 2021

@dagar this was also a suggestion you had back in the time

Did you consider splitting the esc_status update check and error reporting and separately check for timeouts and missing ESCs? You shouldn't need to copy esc_status (a rather large message with 8 esc_reports) every iteration.

I think would make sense to do it

@cmic0 yes, I think it's simply a matter of only doing the work of esc_status_check() when the topic updates OR if you haven't had an update for the timeout period (700_ms). In the correct code once you receive the first ORB_ID(esc_status) commander is now copying it every single iteration (100 Hz) even when there's no new data.

@mrpollo
Copy link
Contributor

mrpollo commented Feb 15, 2021

@cmic0 thanks for resurrecting your PR, much appreciated, and really cool feature. 🏆

Could you please address @dagar comments above?

@dagar could you advise on how to fix some of the failed tests? I think this pushes FMUv2 out of flash again.

region `flash' overflowed by 445 bytes

@dagar
Copy link
Member

dagar commented Mar 29, 2021

@cmic0 can you rebase to see if this fits now?

@cmic0
Copy link
Contributor Author

cmic0 commented Mar 29, 2021

yep - i am actually trying improve the code to avoid copying the esc_status if the escs are already timing out

@cmic0 cmic0 force-pushed the pr-esc-failure-checks branch 2 times, most recently from 69f3266 to c85ca27 Compare March 30, 2021 09:36
@cmic0
Copy link
Contributor Author

cmic0 commented Mar 30, 2021

@dagar with c85ca27 i moved out the timeout check from the esc_status_check(), this should avoid that we keep copying the esc_status message for nothing

@cmic0 cmic0 requested a review from bkueng March 30, 2021 09:42
@dagar
Copy link
Member

dagar commented Mar 30, 2021

SITL tests are failing.

Screenshot from 2021-03-30 09-30-00

@Igor-Misic Igor-Misic force-pushed the pr-esc-failure-checks branch from bd01cd0 to 478cca0 Compare April 15, 2021 19:15
@Igor-Misic Igor-Misic force-pushed the pr-esc-failure-checks branch 4 times, most recently from 4551bbd to 764c977 Compare April 20, 2021 13:22
@Igor-Misic Igor-Misic force-pushed the pr-esc-failure-checks branch 2 times, most recently from 419a105 to b8f304f Compare April 21, 2021 12:09
src/modules/commander/Commander.hpp Outdated Show resolved Hide resolved
src/modules/commander/Commander.cpp Outdated Show resolved Hide resolved
COM_ARM_CHK_ESCS set to 0. The user will need to enable it manually.
Co-authored-by: Beat Küng <[email protected]>
@Igor-Misic Igor-Misic force-pushed the pr-esc-failure-checks branch from 9c35196 to cf2b25d Compare April 22, 2021 07:34
@Igor-Misic
Copy link
Member

FYI: I am not authorized to merge this PR.

Igor-Misic added a commit to PX4/PX4-user_guide that referenced this pull request Apr 22, 2021
@TSC21 TSC21 merged commit d631a5d into master Apr 22, 2021
@TSC21 TSC21 deleted the pr-esc-failure-checks branch April 22, 2021 11:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants