Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[visualization] Fix meshcat control deletion paths #22028

Merged

Conversation

rpoyner-tri
Copy link
Contributor

@rpoyner-tri rpoyner-tri commented Oct 11, 2024

Fix a footgun that has been hiding in plain sight for some time: the mismatch among the meshcat controls API, its (mis-)uses, and the fact that Python bindings make destructor ordering unpredictable.

Why hasn't it fired? Largely because pybind memory management was configured to leak diagrams and leaf systems forever, so destructors would never run. In order to open the way to a fix for memory leaks, the destructor hazard(s) must be addressed first.

  • Introduce a non-strict deletion mode, that warns instead of throws.
  • Use non-strict deletion when called from destructors.
  • In addition, catch all exceptions in destructors, and abort if any are raised.
  • Update bindings, doc, and tests.

Note that this patch does not solve any of the inherent hazards of the meshcat controls API: the delusion of control ownership, problems with overlapping client lifetimes, etc. It merely stops the destructor-operated deletion paths from throwing.


This change is Reviewable

@rpoyner-tri rpoyner-tri added the release notes: fix This pull request contains fixes (no new features) label Oct 11, 2024
Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@jwnimmer-tri or delegate for feature review.

Reviewable status: LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Toward #14387 .

Reviewable status: LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


multibody/meshcat/joint_sliders.h line 99 at r1 (raw file):

  constructor, not any slider values.

  @throws std::exception if any of the sliders registered by this class are no

BTW If you prefer, I would be okay with this class (and its twin) not having any API changes. These classes did not make any detailed API promises about deletion, so just doing strict = false in the cc file would be OK by me.

If you want to keep them like this for consistency, that's OK by me too.

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


multibody/meshcat/joint_sliders.h line 99 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

BTW If you prefer, I would be okay with this class (and its twin) not having any API changes. These classes did not make any detailed API promises about deletion, so just doing strict = false in the cc file would be OK by me.

If you want to keep them like this for consistency, that's OK by me too.

This scheme saves me one private method, but I could see the appeal of not changing API.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM on meshcat.

Reviewed 5 of 13 files at r1.
Reviewable status: 6 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


geometry/meshcat.cc line 62 at r1 (raw file):

template <typename Mapping>
void ComplainThingNotFound(std::string_view thing, std::string_view name,
                           Mapping thing_map,

nit Here and throughout, we should use const Mapping& to prevent copying. I'm pretty sure the unadorned template ends up copying the entire map.

(This is a pre-existing defect, but we're adding even more copies now.)


geometry/meshcat.cc line 69 at r1 (raw file):

  }
  std::string message = fmt::format(
      "Meshcat does not have any {} named {}.  The "

nit The string literal split into two lines is weird/irregular now.


geometry/meshcat.cc line 1625 at r1 (raw file):

  // This function is public via the PIMPL.
  bool DeleteButton(std::string name, bool strict = true) {

nit Do not put default values in the cc file, only the header.


geometry/meshcat.cc line 1765 at r1 (raw file):

  // This function is public via the PIMPL.
  bool DeleteSlider(std::string name, bool strict = true) {

nit Do not put default values in the cc file, only the header.


geometry/test/meshcat_test.cc line 862 at r1 (raw file):

  // Removing the button then asking for clicks is an error.
  meshcat.DeleteButton("alice");

nit Add EXPECT_TRUE here (per missing test coverage of the impl's return statement).


multibody/meshcat/joint_sliders.h line 99 at r1 (raw file):

Previously, rpoyner-tri (Rick Poyner (rico)) wrote…

This scheme saves me one private method, but I could see the appeal of not changing API.

Right, I see now. I am fine either way.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 6 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


multibody/meshcat/joint_sliders.h line 99 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

Right, I see now. I am fine either way.

Actually, we don't need a new method anyway?

The Delete() implementation can hard-code strict = False unconditionally. It's fine that a users's call to Delete ends up as non-strict.

That will be vastly simpler, I think.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 7 unresolved discussions, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


multibody/meshcat/joint_sliders.cc line 232 at r1 (raw file):

    Delete(/*strict = */false);
  } catch(...) {
    ::abort();

We need a better error message than this. Probably using DRAKE_UNREACHABLE(); is enough (gives a filename and line number).

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, LGTM missing from assignee jwnimmer-tri(platform), needs at least two assigned reviewers


geometry/meshcat.cc line 69 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

nit The string literal split into two lines is weird/irregular now.

it was before as well?


multibody/meshcat/joint_sliders.cc line 232 at r1 (raw file):

Previously, jwnimmer-tri (Jeremy Nimmer) wrote…

We need a better error message than this. Probably using DRAKE_UNREACHABLE(); is enough (gives a filename and line number).

Done.

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: feature.

Reviewed 11 of 11 files at r2, all commit messages.
Reviewable status: 1 unresolved discussion, needs at least two assigned reviewers


geometry/meshcat.h line 773 at r2 (raw file):

   @throws std::exception if `strict` is true and `name` is not a registered
           button. If `strict` is false and `name` is missing, it logs a
           warning. */

BTW I was probably the one to suggest a warning, but having seen the whole thing now I would also be OK without any logging. We are returning a status-bool, so the caller can decide whether and how to warn if they really want to. Not warning would be less code, and plausibly nicer for callers who don't want any console output in their situation. (I am OK for you to choose which you like.)

Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+@ggould-tri for next-available on call platform review.

Reviewable status: LGTM missing from assignee ggould-tri(platform)

Copy link
Collaborator

@jwnimmer-tri jwnimmer-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: LGTM missing from assignee ggould-tri(platform)

Copy link
Contributor

@ggould-tri ggould-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:; trivial nits.

Reviewed 1 of 13 files at r1, 9 of 11 files at r2, 3 of 3 files at r3, all commit messages.
Reviewable status: 4 unresolved discussions


geometry/test/meshcat_test.cc line 869 at r3 (raw file):

  DRAKE_EXPECT_THROWS_MESSAGE(meshcat.DeleteButton("alice"),
                              "Meshcat does not have any button named alice.*");
  EXPECT_FALSE(meshcat.DeleteButton("alice", /*strict = */false));

nit meta: This file is inconsistent with its peers as to whether there's a space following the keyword comment. (here and below)

Suggestion:

*/ f

multibody/meshcat/joint_sliders.cc line 220 at r3 (raw file):

    for (const auto& [position_index, slider_name] : position_names_) {
      unused(position_index);
      meshcat_->DeleteSlider(slider_name, /*strict =  */ false);

typo double whitespace


visualization/meshcat_pose_sliders.cc line 149 at r3 (raw file):

    for (int i = 0; i < 6; ++i) {
      if (visible_[i]) {
        meshcat_->DeleteSlider(slider_names_[i], /*strict =  */ false);

typo double whitespace


bindings/pydrake/geometry/test/visualizers_test.py line 215 at r3 (raw file):

        self.assertEqual(meshcat.GetButtonClicks(name="alice"), 1)
        meshcat.DeleteButton(name="alice")
        meshcat.DeleteButton(name="alice", strict=False)

minor: Since this has a return value, the assert should make sure it's wired.
(here and below)

Suggestion:

        self.AssertTrue(meshcat.DeleteButton(name="alice"))
        self.AssertFalse(meshcat.DeleteButton(name="alice", strict=False))

Fix a footgun that has been hiding in plain sight for some time: the
mismatch among the meshcat controls API, its (mis-)uses, and the fact
that Python bindings make destructor ordering unpredictable.

Why hasn't it fired? Largely because pybind memory management was
configured to leak diagrams and leaf systems forever, so destructors
would never run. In order to open the way to a fix for memory leaks, the
destructor hazard(s) must be addressed first.

* Introduce a non-strict deletion mode, that returns false instead of
  throws.
* Use non-strict deletion when called from destructors.
* In addition, catch all exceptions in destructors, and abort if any are
  raised.
* Update bindings, doc, and tests.

Note that this patch does *not* solve any of the inherent hazards of the
meshcat controls API: the delusion of control ownership, problems with
overlapping client lifetimes, etc. It merely stops the destructor-operated
deletion paths from throwing.
@rpoyner-tri rpoyner-tri added the release notes: feature This pull request contains a new feature label Oct 14, 2024
Copy link
Contributor Author

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also +(release notes: feature) I guess?. Adds the strict=false alternative.

Reviewable status: :shipit: complete! all discussions resolved, LGTM from assignees ggould-tri(platform),jwnimmer-tri(platform)

@rpoyner-tri rpoyner-tri merged commit 417cc3f into RobotLocomotion:master Oct 14, 2024
9 checks passed
RussTedrake pushed a commit to RussTedrake/drake that referenced this pull request Dec 15, 2024
…2028)

Fix a footgun that has been hiding in plain sight for some time: the
mismatch among the meshcat controls API, its (mis-)uses, and the fact
that Python bindings make destructor ordering unpredictable.

Why hasn't it fired? Largely because pybind memory management was
configured to leak diagrams and leaf systems forever, so destructors
would never run. In order to open the way to a fix for memory leaks, the
destructor hazard(s) must be addressed first.

* Introduce a non-strict deletion mode, that returns false instead of
  throws.
* Use non-strict deletion when called from destructors.
* In addition, catch all exceptions in destructors, and abort if any are
  raised.
* Update bindings, doc, and tests.

Note that this patch does *not* solve any of the inherent hazards of the
meshcat controls API: the delusion of control ownership, problems with
overlapping client lifetimes, etc. It merely stops the destructor-operated
deletion paths from throwing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: feature This pull request contains a new feature release notes: fix This pull request contains fixes (no new features)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants