clutter.html

<!DOCTYPE html>

<html>

  <head>
    <title>Ch. 5 - Bin Picking</title>
    <meta name="Ch. 5 - Bin Picking" content="text/html; charset=utf-8;" />
    <link rel="canonical" href="http://manipulation.csail.mit.edu/clutter.html" />

    <script src="https://hypothes.is/embed.js" async></script>
    <script type="text/javascript" src="htmlbook/book.js"></script>

    <script src="htmlbook/mathjax-config.js" defer></script> 
    <script type="text/javascript" id="MathJax-script" defer
      src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js">
    </script>
    <script>window.MathJax || document.write('<script type="text/javascript" src="htmlbook/MathJax/es5/tex-chtml.js" defer><\/script>')</script>

    <link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
    <script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
    <script>hljs.initHighlightingOnLoad();</script>

    <link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
  </head>

<body onload="loadChapter('manipulation');">

<div data-type="titlepage" pdf="no">
  <header>
    <h1><a href="index.html" style="text-decoration:none;">Robotic Manipulation</a></h1>
    <p data-type="subtitle">Perception, Planning, and Control</p> 
    <p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
    <p style="font-size: 14px; text-align: right;"> 
      &copy; Russ Tedrake, 2020-2021<br/>
      Last modified <span id="last_modified"></span>.</br>
      <script>
      var d = new Date(document.lastModified);
      document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
      <a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
    </p>
  </header>
</div>

<p pdf="no"><b>Note:</b> These are working notes used for <a
href="http://manipulation.csail.mit.edu/Fall2021/">a course being taught
at MIT</a>. They will be updated throughout the Fall 2021 semester.  <!-- <a 
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture  videos are available on YouTube</a>.--></p> 

<table style="width:100%;" pdf="no"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=pose.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=segmentation.html>Next Chapter</a></td>
</tr></table>


<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 4"><h1>Bin Picking</h1>
  <a href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb" style="float:right; margin-top:-70px;background:white;border:0;" target="clutter">
  <img src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a>
  <div style="clear:right;"></div>

  <todo>Consider moving / teaching this BEFORE geometric pose estimation.  It could actually be nice to introduce cameras / point clouds with this material instead of with pose estimation.</todo>

  <p>Our initial study of geometric perception gave us some powerful tools, but
  also revealed some major limitations.  In the next chapter, we will begin
  applying techniques from deep learning to perception.  Spoiler alert: those
  methods are going to be insatiable in their hunger for data. So before we get
  there, I'd like to take a brief excursion into a nice subproblem that might
  help us feed that need.</p>

  <p>In this chapter we'll consider the simplest version of the bin picking
  problem: the robot has a bin full of random objects and simply needs to move
  those objects from one bin to the other.  We'll be agnostic about what those
  objects are and about where they end up in the other bin, but we would like
  our solution to achieve a reasonable level of performance for a very wide
  variety of objects.  This turns out to be a pretty convenient way to create a
  training ground for robot learning -- we can set the robot up to move objects
  back and forth between bins all day and night, and intermittently add and
  remove objects from the bin to improve diversity.  Of course, it is even
  easier in simulation!</p>

  <p>Bin picking has potentially very important applications in industries such
  as logistics, and there are significantly more refined versions of this
  problem. For example, we might need to pick only objects from a specific
  class, and/or place the objects in known position (e.g. for "packing").  But
  let's start with the basic case.</p>

  <section><h1>Generating random cluttered scenes</h1>
  
    <p>If our goal is to test a diversity of bin picking situations, then the
    first task is to figure out how to generate diverse simulations.  How
    should we populate the bin full of objects?  So far we've set up each
    simulation by carefully setting the initial positions (in the Context) for
    each of the objects, but that approach won't scale.</p>

    <p>The first change that you'll see in this chapter is that I will start
    using Drake's "<a
    href="https://github.com/RobotLocomotion/drake/blob/master/multibody/parsing/README_model_directives.md">Model
    Directives</a>" for <a
    href="https://drake.mit.edu/doxygen_cxx/namespacedrake_1_1multibody_1_1parsing.html#a57aafdf49e089598a1e344d8e7a74249">parsing</a>
    more complex environments from yaml files.  That will help us set up the
    bins, cameras, and the robot which can be welded in fixed positions.  But
    generating the distributions over objects and object initial conditions is
    more bespoke, and we need an algorithm for that.</p>

    <subsection><h1>Falling things</h1>

      <p>In the real world, we would probably just dump the random objects into
      the bin.  That's a decent strategy for simulation, too.  We can roughly
      expect our simulation to faithfully implement multibody physics as long as
      our initial conditions (and time step) are reasonable; the physics isn't
      well defined if we initialize the Context with multiple objects occupying
      the same physical space.  The simplest and most common way to avoid this is
      to generate a random number of objects in random poses, with their
      vertical positions staggered so that they trivially start out of
      penetration.</p>

      <p>If you look for them, you can find animations of large numbers of
      falling objects in the demo reels for most advanced multibody
      simulators.  But we're doing it for a purpose!</p>

      <todo>maybe cite the falling things paper, but make it clear that the idea
      is not new?</todo></p>

      <example><h1>Piles of foam bricks in 2D</h1>

        <p>Here is the 2D case.  I've added many instances of our favorite red
        foam brick to the plant.  Note that it's possible to write highly
        optimized 2D simulators; that's not what I've done here.  Rather, I've
        added a planar joint connecting each brick to the world, and run our
        full 3D simulator.  The planar joint has three degrees of freedom.  I've
        oriented them here to be $x$, $z$, and $\theta$ to constrain the
        objects to the $xz$ plane.</p>

        <p>I've set the initial positions for each object in the Context to be
        uniformly distributed over the horizontal position, uniformly rotated,
        and staggered every 0.1m in their initial vertical position.  We only
        have to simulate for a little more than a second for the bricks to come
        to rest and give us our intended "initial conditions".</p>

        <p><a
        href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
        style="background:none; border:none;" class="deepnote"
        target="clutter">
        <img
        src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
  
        <figure>
          <iframe style="border:0;height:300px;width:540px;" src="data/falling_bricks_2d.html?height=240px" pdf="no"></iframe>
          <p pdf="only"><a href="data/falling_bricks_2d.html">Click here for the animation.</a></p>
        </figure>

      </example>

      <p>It's not really any different to do this with any random objects --
      here is what it looks like when I run the same code, but replace the
      brick with a random draw from a few objects from the <a
      href="https://www.ycbbenchmarks.com/">YCB dataset</a>.  It somehow amuses
      me that we can see the <a
      href="https://en.wikipedia.org/wiki/Central_limit_theorem">central limit
      theorem</a> hard at work, even when our objects are slightly
      ridiculous.</p>

      <figure>
      <img style="width:70%" src="data/ycb_planar_clutter.png"/>
      </figure>

      <example><h1>Filling bins with clutter</h1>

        <p>The same technique also works in 3D.  Setting uniformly random
        orientations in 3D requires a little more thought, but Drake supplies
        the method <code>UniformlyRandomRotationMatrix</code> (and also one for
        quaternions and roll-pitch-yaw) to do that work for us.</p>

        <p><a
          href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
          style="background:none; border:none;" class="deepnote"
          target="clutter">
          <img
          src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
  
        <p>I had to decide how to visualize the results of this one for you.
        The mesh and texture map files for the YCB objects are very large, so
        downloading many of them to your browser from an online notebook felt a
        bit too painful.  If you've decided to run the notebooks on your local
        machine, then go ahead and run <code>drake_visualizer</code> before
        running this test to see the live simulation (drake visualizer will
        load the mesh files directly from your disk, so avoids the download).
        For the cloud notebooks, I've decided to add a camera to the scene and
        take a picture after simulating for a few seconds.  After all, that's
        perhaps the data that we're actually looking for.</p>

        <figure>
          <img style="width:70%" src="data/ycb_clutter.png"/>
        </figure>

      </example>

      <p>Please appreciate that bins are a particularly simple case for
      generating random scenarios. If you wanted to generate random kitchen
      environments, for example, then you won't be as happy with a solution
      that drops refrigerators, toasters, and stools from uniformly random
      i.i.d. poses.  In those cases, authoring reasonable distributions gets
      much more interesting; we will revisit the topic of generative scene
      models <elib>Izatt20</elib> later in the notes.</p>

    </subsection>

    <subsection><h1>Static equilibrium with frictional contact</h1>

      <p>Even in the case of bins, we should try to think critically about
      whether dropping objects from a height is really the best solution.  Given
      our discussion in the last chapter about writing optimizations with
      non-penetration constraints, I hope you are already asking yourself: why
      not use those constraints again here?  Let's explore that idea a bit
      further.</p>
  
      <p>I won't dive into a full discussion of multibody dynamics nor multibody
      simulation, though I do have more notes <a
      href="http://underactuated.mit.edu/multibody.html">available here</a>.
      What is important to understand here is that the equations of motion of
      our multibody system are described by differential equations of the form:
      $$M(q)\dot{v} + C(q,v)v = \tau_g(q) + \sum_i J^T_i(q)F^{c_i}.$$  The left
      side of the equation is just a generalization of "mass times
      acceleration", with the mass matrix, $M$, and the Coriolis terms $C$.  The
      right hand side is the sum of the (generalized) forces, with $\tau_g(q)$
      capturing the terms due to gravity, and $F^{c_i}$ is the <a
      href="https://drake.mit.edu/doxygen_cxx/group__multibody__spatial__vectors.html">spatial
      force</a> due to the $i$th contact.  $J_i(q)$ is the $i$th "contact
      Jacobian" -- it is the Jacobian that maps from the generalized velocities
      to the spatial velocity of the $i$th contact frame.</p>

      <p>Our interest here is in finding (stable) steady-state solutions to
      these differential equations that can serve as good initial conditions for
      our simulation.  At steady-state we have $v=\dot{v}=0$, and conveniently
      all of the terms on the left-hand side of the equations are zero.  This
      leaves us with just the force-balance equations $$\tau_g(q) = - \sum_i
      J^T_i(q) F^{c_i}.$$  But we still need to understand where the contact
      forces come from.</p>

      <subsubsection><h1>Collision geometry</h1>

      <p>Geometry engines for robotics, like <code>SceneGraph</code> in
      <drake></drake>, distinguish between a few different <a
      href="https://drake.mit.edu/doxygen_cxx/group__geometry__roles.html"></a>roles
      that geometry can play</a> in a simulation.  In <a
      href="http://sdformat.org/spec">robot description files</a>, we
      distinguish between <a
      href="http://sdformat.org/spec?ver=1.8&elem=link#link_visual">visual</a>
      and <a
      href="http://sdformat.org/spec?ver=1.8&elem=collision">collision</a>
      geometries.  In particular, every rigid body in the simulation can have
      multiple
      <i>collision</i>
      geometries associated with it (playing the "proximity" role). Collision
      geometries are often much simpler than the visual geometries we use for
      illustration and simulating perception -- sometimes they are just a
      low-polygon-count version of the visual mesh and sometimes we actually
      use much simpler geometry (like boxes, spheres, and cylinders).  These
      simpler geometries make the physics engine faster and more robust.</p>

      <p>For subtle reasons we will explore below, in addition to simplifying
      the geometry, we sometimes over-parameterize the collision geometry in
      order to make the numerics more robust.  For example, when we simulate
      the red brick, we actually use <i>nine</i> collision geometries for
      the one body.</p>

      <figure><img style="width:50%"
      src="data/foam_brick_contact_geometry.png"/><figcaption>The (exaggerated)
      contact geometry used for robust simulation of boxes.  We add contact
      "points" (epsilon radius spheres) to each corner, and have a slightly
      inset box for the remaining contacts.  <a
      href="data/foam_brick_contact_geometry.html">Here</a> is the interactive
      version.</figcaption></figure>
    
      <example><h1>Collision geometry inspector</h1>
      
        <p>Coming soon: I'll provide a little notebook gui that lets you view
        both the visual and collision geometry for your favorite URDF/SDF
        files.</p>

      </example>

      <p>SceneGraph also implements the concept of a <a
      href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1geometry_1_1_collision_filter_declaration.html">collision
      filter</a>.  It can be important to specify that, for instance, the iiwa
      geometry at link 5 cannot collide with the geometry in links 4 or 6.
      Specifying that some collisions should be ignored not only speeds up the
      computation, but it also facilitates the use of simplified collision
      geometry for links.  It would be extremely hard to approximate link 4 and
      5 accurately with spheres, and cylinders if I had to make sure that those
      spheres and cylinders did not overlap in any feasible joint angle.  The
      default collision filter settings should work fine for most applications,
      but you can tweak them if you like.</p>

      <p>So where do the contact forces, $F^{c_i}$, come from?  There is
      potentially an equal and opposite contact force for every <i>pair</i> of
      collision geometries that are not filtered out by the collision filter.
      In <code>SceneGraph</code>, the <a
      href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1geometry_1_1_scene_graph_inspector.html#a79b84ada25718e2a38fd2d13adb4d839"><code>GetCollisionCandidates</code></a>
      method returns them all.  We'll take to calling the two bodies in a collision pair "body A" and "body B".</p>

    </subsubsection>

    <subsubsection><h1>The Contact Frame</h1>

      <p>We still need to decide the magnitude of these spatial forces, and to
      write down the magnitude, we need to specify the <i>contact frame</i> in
      which the spatial force is to be applied.  For instance, we <a
      href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_point_pair_contact_info.html">might
      use</a> $C_B$ to denote a contact frame on body $B$, with the forces
      applied at the origin of the frame.</p>

      <p>Most simulators summarize the contact between two bodies as a
      translational force (e.g. zero torque) at one or more contact points. Our
      convention will be to align the positive $z$ axis with the "contact
      normal", with positive forces resisting penetration. Defining this normal
      can be deceptively complicated.  For instance, what is the normal at the
      corner of a box?  Taking the normal as the gradient of the
      signed-distance function for the collision geometry provides a reliable
      definition that will extend to the distance between two bodies and into
      generalized coordinates. The $x$ and $y$ axes of the contact frame are
      any orthogonal basis for the tangential coordinates.</p>

      <example><h1>Brick on a half-plane</h1>

        <p>Let's work through these details on a simple example -- our foam
        brick sitting on the ground.  The ground is welded to the world, so has
        no degrees of freedom; we can ignore the forces applied to the ground
        and focus on the forces applied to the brick.</p>

        <p>Where are the contact forces and contact frames?  If we used only
        box-on-box geometry, then the locations of the contact forces are
        ambiguous during this "face-on-face" contact; this is even more true in
        3D (where three forces would be sufficient).  But by adding extra
        contact spheres to the corners of our box, we are telling the physics
        engine that we specifically want contact forces at each of the corners
        (and all four of the corners in 3D).  I've labeled the frames $C_1$ and
        $C_2$ in the sketch below.</p>

        <figure>
          <img src="data/brick_on_half_plane.jpg" width=50%/>
        </figure>

        <p>In order to achieve static equilibrium, we require that all of the
        forces and torques on the brick be in balance.  This is a simple
        manifestation of the equations $$\tau_g(q) = - \sum_i J^T_i(q)
        F^{c_i},$$ because the configuration $q$ is the $x,z, \theta$ position
        and orientation of the brick; we have three equations and three
        unknowns: \begin{gather*} 0 = \sum_i f^{c_i}_{C_{i,x}} \\ -mg = -\sum_i
        f^{c_i}_{C_{i,z}} \\ 0 = \sum_i \left[ \left[ p^{C_i} - p^{B_{cm}}
        \right] \times f^{c_i} \right]_{W_y}.\end{gather*}  The last equation
        represents the torque balance, taken around the center of mass of the
        brick which I've call $p^{B_{cm}}$.  Torque about $\theta$ corresponds
        to the $y$ component of the cross product.  The torque balance ensures
        that $f^{c_1}_{C_{1,z}} = f^{c_2}_{C_{2,z}}$ assuming the center of
        mass is in the middle of the box; force balance in $z$ set them equal
        to $\frac{mg}{2}$.  We haven't said anything about the horizontal
        forces yet ($0$ is a reasonable solution here). Let's develop that
        next.</p>

      </example>
      
    </subsubsection>

    <subsubsection><h1>The (Coulomb) Friction Cone</h1>

      <p>Now the rules governing contact forces can begin to take shape.  First
      and foremost, we demand that there is no force at a distance. Using
      $\phi_i(q)$ to denote the distance between two bodies in configuration
      $q$, we have $$\phi(q) > 0 \Rightarrow F^{c_i} = 0.$$  Second, we demand
      that the normal force only resists penetration; bodies are never pulled
      into contact: $$f^{c_i}_{C_z} \ge 0.$$  In <i>rigid</i> contact models, we
      solve for the smallest normal force enforces the non-penetration
      constraint (this is known as the principle of least constraint).  In
      <i>soft</i> contact models, we define the force to be a function of the
      penetration depth and velocity.</p>  <todo>Citations</todo>

      <todo>Need figures here</todo>

      <p>Forces in the tangential directions are due to friction.  The most
      commonly used model of friction is Coulomb friction, which states that
      $$|f^{c_i}_{C_{x,y}}|_2 \le \mu f^{c_i}_{C_z},$$ with $\mu$ a
      non-negative scalar <i>coefficient of friction</i>.  Typically we define
      both a $\mu_{static}$, which is applied when the tangential velocity is
      zero, and $\mu_{dynamic}$, applied when the tangential velocity is
      non-zero. In the Coulomb friction model, the tangential contact force is
      the force within this friction cone which produces maximum dissipation.
      </p>

      <p>Taken together, the geometry of these constraints forms a <a
      href="https://en.wikipedia.org/wiki/Convex_cone">cone</a> of admissable
      contact forces.  It is famously known as the "friction cone", and we will
      refer to it often.</p>

      <example><h1>Brick on an inclined half-plane</h1>

        <p>If we take our last example, but tilt the table to an angle relative
        to gravity, then the horizontal forces start becoming important.
        Before going through the equations, let's check your intuition.  Will
        the magnitude of the forces on the two corners stay the same in this
        configuration?  Or will there be more contact force on the lower
        corner?</p>

        <figure>
          <img src="data/brick_incline.jpg" width=50%/>
        </figure>

        <p>In the illustration above, you'll notice that the contact frames
        have rotated so that the $z$ axis is aligned with the contact normals.
        I've sketched two possible friction cones (dark green and lighter
        green), corresponding to two different coefficients of friction.  We
        can tell immediately by inspection that the smaller value of $\mu$
        (corresponding to the darker green) cannot produce contact forces that
        will completely counteract gravity (the friction cone does not contain
        the vertical line).  In this case, the box will slide and no static
        equilibrium exists in this configuration.</p>

        <p>If we increase the coefficient of (static) friction to the range
        corresponding to the lighter green, then we can find contact forces
        that produce an equilibrium.  Indeed, for this problem, we need some
        amount of friction to even have an equilibrium (we'll explore this in
        the <a href="#static_equilibrium">exercises</a>).  We also need for the
        vertical projection of the center of mass onto the ramp to land between
        the two contact points, otherwise the brick will tumble over the bottom
        edge.  We can see this by writing our same force/torque balance
        equations.  We can write them in body frame, B, assuming the center of
        mass in the center of the brick and the brick has length $l$ and height
        $h$:\begin{gather*} f^{c_1}_{B_x} + f^{c_2}_{B_x} = - mg\sin\gamma \\
        f^{c_1}_{B_z}  + f^{c_2}_{B_z} = mg\cos\gamma \\ -h f^{c_1}_{B_x} + l
        f^{c_1}_{B_z} = h f^{c_2}_{B_x} + l f^{c_2}_{B_z} \\ f^{c_1}_{B_z} \ge
        0, \quad f^{c_2}_{B_z} \ge 0 \\ |f^{c_1}_{B_x}| \le \mu f^{c_1}_{B_z},
        \quad |f^{c_2}_{B_x}| \le \mu f^{c_2}_{B_z} \end{gather*} </p>

        <p>So, are the magnitude of the contact forces the same or different?
        Substituting the first equation into the third reveals $$f^{c_2}_{B_z}
        = f^{c_1}_{B_z} + \frac{mgh}{l}\sin\gamma.$$</p>
      
      </example>


    </subsubsection>

    <subsubsection><h1>Static equilibrium as an optimization problem</h1>

      <p>Rather than dropping objects from a random height, perhaps we can
      initialize our simulations using optimization to find the initial
      conditions that are already in static equilibrium.  In <drake></drake>,
      the
      <a
      href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_static_equilibrium_problem.html"><code>StaticEquilbriumProblem</code></a>
      collects all of the constraints we enumerated above into an optimization
      problem: \begin{align*} \find_q \quad \subjto \quad& \tau_g(q) = - \sum_i
      J^T_i(q) F^{c_i} & \\ & f^{c_i}_{C_z} \ge 0 & \forall i, \\ &
      |f^{c_i}_{C_{x,y}}|_2 \le \mu f^{c_i}_{C_z} & \forall i, \\ & \phi_i(q)
      \ge 0 & \forall i, \\ & \phi(q) = 0 \textbf{ or } f^{c_i}_{C_z} = 0
      &\forall i, \\ & \text{joint limits}.\end{align*} This is a nonlinear
      optimization problem: it includes the nonconvex non-penetration
      constraints we discussed in the last chapter.  The second-to-last
      constraints (a logical or) is particularly interesting; constraints of
      the form $x \ge 0, y \ge 0, x =0 \textbf{ or } y = 0$ are known as
      complementarity constraints, and are often written as $x \ge 0, y \ge 0,
      xy = 0$.  We can make the problem easier for the nonlinear optimization
      solver by relaxing the equality to $0 \le \phi(q) f^{c_i}_{C_z} \le
      \text{tol}$, which provides a proper gradient for the optimizer to follow
      at the cost of allowing some force at a distance.</p>  
      
      <p>It's easy to add additional costs and constraints; for initial
      conditions we might use an objective that keeps $q$ close to an initial
      configuration-space sample.</p>

      <example><h1>Tall Towers</h1></example>

      <p>So how well does it work?  </p>  <todo>finish this...</todo>

    </subsubsection>

    </subsection>

  </section>

  <section><h1>A few of the nuances of simulating contact</h1>
    
    <example><h1>Contact force inspector</h1>
    
      <p>I've created a simple GUI that allows you to pick any two primitive
      geometry types and inspect the contact information that is computed when
      those object are put into penetration.  There is a lot of information
      displayed there!  Take a minute to make sure you understand the colors,
      and the information provided in the textbox display.
      </p>

      <p><a
        href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
        style="background:none; border:none;" class="deepnote"
        target="clutter">
        <img
        src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>

    </example>

    <p>When I play with the GUI above, I feel almost overwhelmed.  First,
    overwhelmed by the sheer number of cases that we have to get right in the
    code; it's unfortunately extremely common for open-source tools to have
    bugs in here.  But second, overwhelmed by the sense that we are asking the
    wrong question.  Asking to summarize the forces between two bodies in deep
    penetration with a single Cartesian force applied at a point is fraught
    with peril.  As you move the objects, you will find many discontinuities;
    this is a common reason why you sometimes see rigid body simulations
    "explode".  It might seem natural to try to use multiple contact points
    instead of a single contact point to summarize the forces, and some
    simulators do, but it is very hard to write an algorithm that only depends
    on the current configurations of the geometry which applies forces
    consistently from one time step to the next with these approaches.</p>

    <p>I currently feel that the only fundamental way to get around these
    numerical issues is to start reasoning about the entire volume of
    penetration.  The Drake developers have recently proposed a version of
    this that we believe we can make computationally tractable enough to be
    viable for real-time simulation <elib>Elandt19</elib>, and are working
    hard to bring this into full service in Drake.  You will find many
    references to "hydroelastic" contact throughout the code.  We hope to turn
    it on by default soon.</p>

    <todo>Add the hydroelastic skittles video here.  https://youtu.be/5-k2Pc6OmzE</todo>

    <p>In the mean time, we can make simulations robust by carefully curating the contact geometry...</p>

  </section>

  <section><h1>Grasp selection</h1>

    <p>Now that we have that ability to populate our bins full of random
    clutter, we need to determine where we will grasp.  The pipeline that one
    might expect, and indeed the pipeline that robotics was somewhat stuck on
    for decades, is that the first step should be to write some form of
    perception system to look into the bin and segment the point cloud into
    individual objects.  These days that's not impossible, but it's still
    hard!</p>

    <p>The newer approach that has proven incredibly useful for bin picking is
    to just look for graspable areas directly on the (unsegmented) point cloud.
    Some of the most visible proponents of this approach are
    <elib>tenPas17+Mahler17+Zeng18</elib>.  If you look at those references,
    you'll see that they all use learning to somehow decide where in the point
    cloud to grasp.  And I <b>do</b> think learning has a lot to offer in terms
    of choosing good grasps -- it's a nice and natural learning formulation and
    there is significant information besides just what appears in the
    immediate sensor returns that can contribute to deciding where to grasp. But
    often you will see that underlying those learning approaches is an approach
    to selecting grasps based on geometry alone, if only to produce the original
    training data.  I also think that the community doesn't regularly
    acknowledge just how well the purely geometric approaches still work.  I'd
    like to discuss those first.</p>

    <subsection><h1>Point cloud pre-processing</h1>
    
      <p>To get a good view into the bin, we're going to set up multiple RGB-D
      cameras.  I've used three per bin in all of my examples here.  And those
      cameras don't only see the objects in the bin; they also see the bins, the
      other cameras, and anything else in the viewable workspace.  So we have a
      little work to do to merge the point clouds from multiple cameras into a
      single point cloud that only includes the area of interest.</p>

      <p>First, we can <i>crop</i> the point cloud to discard any points that
      are from outside the area of interest (which we'll define as an
      axis-aligned bounding box immediately above the known location of the
      bin).</p>

      <p>As we will discuss in some detail below, many of our grasp selection
      strategies will benefit from <i>estimating the "normals"</i> of the point
      cloud (a unit vector that estimates the normal direction relative to the
      surface of the underlying geometry).  It is actually better to estimate
      the normals on the individual point clouds, making use of the camera
      location that generated those points, than to estimate the normal after
      the point cloud has been merged.</p>
        
      <p>For sensors mounted on the real world, <i>merging point clouds</i>
      requires high-quality camera calibration and must deal with the messy
      depth returns.  All of the tools from the last chapter are relevant, as
      the tasks of merging the point clouds is another instance of the
      point-cloud-registration problem. For the perfect depth measurements we
      can get out of simulation, given known camera locations, we can skip this
      step and simply concatenate the list of points in the point clouds
      together.</p>

      <p>Finally, the resulting raw point clouds likely include many more points
      then we actually need for our grasp planning.  One of the standard
      approaches for <i>down-sampling</i> the point clouds is using a <a
      href="https://en.wikipedia.org/wiki/Voxel">voxel</a> grid -- regularly
      sized cubes tiled in 3D.  We then summarize the original point cloud with
      exactly one point per voxel (see, for instance <a
      href="http://www.open3d.org/docs/latest/tutorial/Advanced/voxelization.html">Open3D's
      note on voxelization</a>).  Since point clouds typically only occupy a
      small percentage of the voxels in 3D space, we use sparse data structures
      to store the voxel grid.  In noisy point clouds, this voxelization step is
      also a useful form of filtering.</p>

      <example><h1>Mustard bottle point clouds</h1>

        <figure><img style="width:60%"
          src="data/mustard_normals.png"></figure>

        <p><a
          href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
          style="background:none; border:none;" class="deepnote"
          target="clutter">
          <img
          src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
      
        <p>I've produced a scene with three cameras looking at our favorite YCB
        mustard bottle.  I've taken the individual point clouds (already
        converted to the world frame by the
        <code>DepthImageToPointCloud</code> system), converted them into
        Open3D's point cloud format, cropped the point clouds (to get rid of the
        geometry from the other cameras), estimated their normals, merged the
        point clouds, then down-sampled the point clouds.  The order is
        important!</p>

        <p>I've pushed all of the point clouds to meshcat, but with many of them
        set to not be visible by default.  Use the drop-down menu to turn them
        on and off, and make sure you understand basically what is happening on
        each of the steps.  For this one, I can also give you the <a
        href="data/mustard_bottle_point_clouds.html">meshcat output
        directly</a>, if you don't want to run the code.</p>

      </example>

    </subsection>

    <subsection><h1>Estimating normals and local curvature</h1>
    
      <p>It's worth going through the basic estimation of the local geometry in
      a point cloud.  You'll see it has a lot in common with the point cloud
      registration we studied in the last chapter.  Let's think about the
      problem of fitting a plane, in a least-squares sense, to a set of
      points<elib>Shakarji98</elib>. We can describe a plane in 3D with a
      position $p$ and a unit length normal vector, $n$.  The distance between a
      point $p^i$ and a plane is simply given by the magnitude of the inner
      product, $\left| (p^i - p)^T n \right|.$  So our least-squares
      optimization becomes $$\min_{p, n} \quad \sum_{i=1}^N \left| (p^i - p)^T n
      \right|^2, \quad \subjto \quad |n| = 1. $$  Taking the gradient of the
      Lagrangian with respect to $p$ and setting it equal to zero gives us that
      $$p^* = \frac{1}{N} \sum_{i=1}^N p^i.$$ Inserting this back into the
      objective, we can write the problem as $$\min_n n^T W n, \quad \subjto
      \quad |n|=1, \quad \text{where } W = \left[ \sum_i  (p^i - p^*) (p^i -
      p^*)^T \right].$$  Geometrically, this objective is a quadratic bowl
      centered at the origin, with a unit circle constraint.  So the optimal
      solution is given by the (unit-length) eigenvector corresponding to the
      <i>smallest</i> eigenvalue of the data matrix, $W$. And for any optimal
      $n$, the "flipped" normal $-n$ is also optimal.  We can pick arbitrarily
      for now, and then flip the normals in a post-processing step (to make sure
      that the normals all point towards the camera origin).</p>

      <p>What is really interesting is that the second and third
      eigenvalues/eigenvectors also tell us something about the local geometry.
      Because $W$ is symmetric, it has orthogonal eigenvectors, and these
      eigenvectors form a (local) basis for the point cloud.  The smallest
      eigenvalue pointed along the normal, and the <i>largest</i> eigenvalue
      corresponds to the direction of least curvature (the squared dot product
      with this vector is the largest).  This information can be very useful for
      finding and planning grasps.  <elib>tenPas17</elib> and others before them
      use this as a primary heuristic in generating candidate grasps.</p>
  
      <p>In order to approximate the local curvature of a mesh represented by a
      point cloud, we can use our fast nearest neighbor queries to find a
      handful of local points, and use this plane fitting algorithm on just
      those points. When doing normal estimation directly on a depth image,
      people often forgo the nearest-neighbor query entirely; simply using the
      approximation that neighboring points in pixel coordinates are often
      nearest neighbors in the point cloud.  We can repeat that entire
      procedure for every point in the point cloud.</p>  
      
      <p>I remember when working with point clouds started to become a bigger
      part of my life, I thought that surely doing anything moderately
      computational like this on every point in some dense point cloud would be
      incompatible with online perception.  But I was wrong! Even years ago,
      operations like this one were often used inside real-time perception
      loops. (And they pale in comparison to the number of <a
      href="https://en.wikipedia.org/wiki/FLOPS">FLOPs</a> we spend these days
      evaluating large neural networks).</p>

      <example id="normal_estimation"><h1>Normals and local curvature of the mustard bottle.</h1>
    
        <figure><img style="width:60%"
        src="data/mustard_surface_estimation.png"></figure>

        <p><a
          href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
          style="background:none; border:none;" class="deepnote"
          target="clutter">
          <img
          src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
  
        <p>I've coded up the basic least-squares surface estimation algorithm,
        with the query point in green, the nearest neighbors in blue, and the
        local least squares estimation drawn with our RGB$\Leftrightarrow$XYZ
        frame graphic.  You should definitely slide it around and see if you can
        understand how the axes line up with the normal and local curvature.</p>

      </example>
    
      <p>You might wonder where you can read more about algorithms of this type.
      I don't have a great reference for you.  But Radu Rusu was the main author
      of the point cloud library<elib>Rusu11</elib>, and his thesis has a lot of
      nice summaries of the point cloud algorithms of
      2010<elib>Rusu10</elib>.</p>

    </subsection>

    <subsection id="grasp_candidates"><h1>Evaluating a candidate grasp</h1>

      <p>Now that we have processed our point cloud, we have everything we need
      to start planning grasps.  I'm going to break that discussion down into
      two steps.  In this section we'll come up with a cost function that scores
      grasp candidates.  In the following section, we'll discuss some very
      simple ideas for trying to find grasps candidates that have a low
      cost.</p>

      <p>There is a massive literature in robotics on "grasp metrics" and "grasp
      analysis".  The Handbook of Robotics<elib>Siciliano16</elib> has an entire
      chapter on grasping devoted to it (which I
      recommend)<elib>Prattichizzo08</elib>.  For a course on manipulation
      (especially from someone who cares a lot about dynamics) to spend so
      little time on this topic is perhaps surprising, but a great deal of that
      literature is focused on analyzing point contacts between a dexterous hand
      and an object with known geometry.  That's simply not the situation we
      find ourselves in for a lot of manipulation work today.  We have ourselves
      a simple two fingered grippers with relatively a lot of grasping force (>
      80 N) and only local point clouds to represent our geometry.</p>

      <p>But there is one key idea that I do want to extract from that
      literature: the value of <b>antipodal grasps</b>.  Roughly speaking, once
      we pick up an object -- or whatever happens to be between our fingers when
      we squeeze -- then we will expect the contact forces between our fingers
      to have to resist at least the <i>gravitational wrench</i> (the spatial
      force due to gravity) of the object.  The closing force provided by our
      gripper is in the gripper's $x$-axis, but if we want to be able to pick up
      the object without it slipping from our hands, then we need forces inside
      the friction cones of our contacts to be able to resist the gravitational
      wrench.  Since we don't know what that wrench will be (and are somewhat
      constrained by the geometry of our fingers from tilting our contact
      normals into the vertical direction), a reasonable strategy is to look for
      places on the point cloud where the normals will align with $x$-axis of
      the gripper.  That means finding a pair of points on the point cloud with
      normals that are collinear with the line connecting the points and are
      pointing in opposite directions, these are
      <i>antipodal points</i>.  In a real point cloud, we are unlikely to find
      perfect antipodal pairs, but finding areas with normals pointing in nearly
      opposite directions is a good strategy for grasping!</p>


      <example><h1>Scoring grasp candidates</h1>

        <p>In practice, the contact between our fingers and the object(s) will
        be better described by a <i>patch contact</i> than by a point contact
        (due to the deflection of the rubber fingertips and any deflection of
        the objects being picked).  So it makes sense to look for patches of
        points with agreeable normals.  There are many ways one could write
        this, I've done it here by transforming the processed point cloud of the
        scene into the candidate frame of the gripper, and cropped away all of
        the points except the ones that are inside a bounding box between the
        finger tips (I've marked them in red in MeshCat).  The first term in my
        grasping cost function is just reward for all of the points in the point
        cloud, based on how aligned their normal is to the $x$-axis of the
        gripper: $$\text{cost} = -\sum_i (n^i_{G_x})^2,$$ where $n^i_{G_x}$ is
        the $x$ component of the $i$th point in the cropped point cloud
        expressed in the gripper frame.</p>

        <figure><img style="width:80%"
          src="data/grasp_candidate_evaluation.png"></figure>
    
        <p>There are other considerations for what might make a good grasp, too.
        For our kinematically limited robot reaching into a bin, we might favor
        grasps that put the hand in favorable orientation for the arm.  In the
        grasp metric I've implemented in the code, I added a cost for the hand
        deviating from vertical.  I can reward the dot product of the vector
        world $-z$ vector, $[0, 0, -1]$ with the $y$-axis in gripper frame
        rotated into world frame with : $$\text{cost} \mathrel{{+}{=}} -\alpha
        \begin{bmatrix} 0 & 0 &-1\end{bmatrix}R^G \begin{bmatrix}0 \\ 1 \\
        0\end{bmatrix} = \alpha R^G_{3,2},$$ where $\alpha$ is relative cost
        weight, and $R^G_{3,2}$ is the scalar element of the rotation matrix in
        the third row and second column.</p>        

        <p>Finally, we need to consider collisions between the candidate grasp
        and both the bins and with the point cloud.  I simply return infinite
        cost when the gripper is in collision.  I've implemented all of those
        terms in the notebook, and given you a sliders to move the hand around
        and see how the cost changes.</p>

        <p><a
          href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
          style="background:none; border:none;" class="deepnote"
          target="clutter">
          <img
          src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
    
      </example>

    </subsection>

    <subsection><h1>Generating grasp candidates</h1>

      <p>We've defined a cost function that, given a point cloud from the scene
      and a model of the environment (e.g. the location of the bins), can score
      a candidate grasp pose, $X^G$.  So now we would like to solve the
      optimization problem: find $X^G$ that minimizes the cost subject to the
      collision constraints.</p>

      <p>Unfortunately, this is an extremely difficult optimization problem,
      with a highly nonconvex objective and constraints.  Moreover, the cost
      terms corresponding to the antipodal points is zero for most $X^G$ --
      since most random $X^G$ will not have any points between the fingers.  As
      a result, instead of using the typical mathematical programming solvers,
      most approaches in the literature resort to a randomized sampling-based
      algorithm.  And we do have strong heuristics for picking reasonable
      samples.</p>

      <p>One heuristic, used for instance in <elib>tenPas17</elib>, is to use
      the local curvature of the point cloud to propose grasp candidates that
      have the point cloud normals pointing into the palm, and orients the hand
      so that the fingers are aligned with the direction of maximum curvature.
      One can move the hand in the direction of the normal until the fingers are
      out of collision, and even sample nearby points.  We have written <a
      href="#gpg">an exercise</a> for you to explore this heuristic.  But for
      our YCB objects, I'm not sure it's the best heuristic; we have a lot of
      boxes, and boxes don't have a lot of information to give in their local
      curvature.</p>

      <p>Another heuristic is to find antipodal point pairs in the point cloud,
      and then sample grasp candidates that would align the fingers with those
      antipodal pairs.  Many 3D geometry libraries support "ray casting"
      operations at least for a voxel representation of a point cloud; so a
      reasonable approach to finding antipodal pairs is to simply choose a point
      at random in the point cloud, then ray cast into the point cloud in the
      opposite direction of the normal.  If the normal of the voxel found by ray
      casting is sufficiently antipodal, and if the distance to that voxel is
      smaller than the gripper width, then we've found a reasonable antipodal
      point pair.</p>

      <example><h1>Generating grasp candidates</h1>

        <p>Open3D doesn't implement ray casting (yet), so I've implemented an
        even simpler heuristic in my code example: I simply choose a point at
        random, and start sampling grasps in orientation (starting from
        vertical) that align the $x$-axis of the gripper with the normal of that
        point.  Then I mostly just rely on the antipodal term in my scoring
        function to allow me to find good grasps.</p>

        <figure><img style="width:60%"
          src="data/candidate_grasp_mustard.png"></figure>

        <p>I do implement one more heuristic -- once I've found the points in
        the point cloud that are between the finger tips, then I move the hand
        out along the gripper $x$-axis so that those points are centered in the
        gripper frame.  This helps prevent us knocking over objects as we close
        our hands to grasp them.</p>

        <p>But that's it!  It's a very simple strategy.  I sample a handful of
        candidate grasps and just draw the top few with the lowest cost.  If you
        run it a bunch of times, I think you will find it's actually quite
        reasonable.  Every time it runs, it is simulating the objects falling
        from the sky; the actual grasp evaluation is actually quite fast.</p>

        <figure><img style="width:80%"
            src="data/grasp_candidates.png"></figure>
  
        <p><a
          href="https://deepnote.com/project/Ch-5-Bin-Picking-jhCzASd2RI--Q8f2-1T6Hw/%2Fclutter.ipynb"
          style="background:none; border:none;" class="deepnote"
          target="clutter">
          <img
          src="https://deepnote.com/buttons/launch-in-deepnote-white.svg"></a></p>
        
      </example>


    </subsection>


  </section>

  <section><h1>The corner cases</h1>
  
    <p>If you play around with the grasp scoring I've implemented above a little
    bit, you will find deficiencies.  Some of them are addressed easily (albeit
    heuristically) by adding a few more terms to the cost.  For instance, I
    didn't check collisions of the pre-grasp configuration, but this could be
    added easily.</p>

    <p>There are other cases where grasping alone is not sufficient as a
    strategy.  Imagine that you place an object right in one of the corners of
    the bin.  It might not be possible to get the hand around both sides of the
    object without being in collision with either the object or the side.  The
    strategy above will never choose to try to grab the very corner of a box
    (because it always tried to align the sample point normal with the gripper
    $x$), and it's not clear that it should.  This is probably especially true
    for our relatively large gripper.  In the setup we used at TRI, we
    implemented an additional simple "pushing" heuristic that would be used if
    there were point clouds in the sink, but no viable grasp candidate could be
    found.   Instead of grasping, we would drive the hand down and nudge that
    part of the point cloud towards the middle of the bin.  This can actually
    help a lot!
    </p>

    <p>There are other deficiencies to our simple approach that would be very
    hard to address with a purely geometric approach.  Most of them come down to
    the fact that our system so far has no notion of "objects".  For instance,
    it's not uncommon to see this strategy result in "double picks" if two
    objects are close together in the bin.  For heavy objects, it might be
    important to pick up the object close to the center of mass, to improve our
    chances of resisting the gravitational wrench while staying in our friction
    cones.  But our strategy here might pick up a heavy hammer by squeezing just
    the very far end of the handle.</p>

    <p>Interestingly, I don't think the problem is necessarily that the point
    cloud information is insufficient, even without the color information.  I
    could show you similar point clouds and you wouldn't make the same mistake.
    These are the types of examples where learning a grasp metric could
    potentially help.  We don't need to achieve artificial general intelligence
    to solve this one; just experience knowing that when we tried to grasp in
    someplace before we failed would be enough to improve our grasp heuristics
    significantly.</p>
  
    <todo>render some point clouds that a human can distinguish but out algorithm would not.</todo>

  </section>

  <section><h1>Putting it all together</h1>
  
    <subsection><h1>A simple state machine</h1></subsection>
  
  </section>

  <section id="exercises"><h1>Exercises</h1>

    <exercise id="static_equilibrium"><h1>Assessing static equilibrium</h1>
        <p>For this problem, we'll use the Coulomb friction model, where $|f_t| \le \mu f_n$. In other words, the friction force is large enough to resist any movement up until the force required would exceed $\mu f_n$, in which case $|f_t| = \mu f_n$.
        <ol type="a">
          <li>
            <p>Consider a box with mass $m$ sitting on a ramp at angle $\theta$, with coefficient of box $\mu$ in between the sphere and the ramp:</p>
            <figure>
                <img style="width:50%", src="figures/exercises/static_equilibrium_ramp.png"/>
            </figure>
            <p>For a given angle $\theta$, what is the minimum coefficient of friction required for the box to not slip down the plane? Use $g$ for acceleration due to gravity.</p>
          </li>
        </ol>
        <p>Now consider a flat ground plane with three solid (uniform density) spheres sitting on it, with
             radius $r$ and mass $m$.
             Assume they have the same coefficient of friction $\mu$ between each other as with the ground.</p>

        <p>For each of the following configurations: could the spheres be in
        static equilibrium for some $\mu\in[0,1]$, $m > 0$, $r > 0$? Explain
        why or why not. Remember that both torques and forces need to be
        balanced for all bodies to be in equilibrium.</p>

        <p>To help you solve these problems, we have <a
        href="https://deepnote.com/project/Exercise-51-Assessing-static-equilibrium-2I5TX7SESPGgkr533KhVgw/%2Fstatic_equilibrium.ipynb">a
        notebook</a> to help you build intuition and test your answers. It lets
        you specify the configuration of the spheres and then uses the <a
        href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_static_equilibrium_problem.html">StaticEquilbriumProblem</a>
        class to solve for static equilibrium. Use this notebook to help
        visualize and think about the system, but for each of the
        configurations, you should have a physical explanation for your answer.
        (An example of such a physical explanation would be a free body diagram
        of the forces and torques on each sphere, and equations for how they
        can or cannot sum to zero. This is essentially what
        StaticEquilbriumProblem checks for.)</p>
        <ol start="2" type="a">
          <li> Spheres stacked on top of each other:
            <figure>
                <img style="width:50%", src="figures/exercises/static_equilibrium_stacked.png"/>
            </figure>
          </li>
          <li> One sphere on top of another, offset:
            <figure>
                <img style="width:50%", src="figures/exercises/static_equilibrium_tilted.png"/>
            </figure>
          </li>
          <li> Spheres stacked in a pyramid:
            <figure>
                <img style="width:50%", src="figures/exercises/static_equilibrium_pyramid.png"/>
            </figure>
          </li>
          <li> Spheres stacked in a pyramid, but with a distance $d$ in between the bottom two:
            <figure>
                <img style="width:50%", src="figures/exercises/static_equilibrium_pyramid_with_space.png"/>
            </figure>
          </li>
        </ol>
        <p>Finally, a few conceptual questions on the StaticEquilbriumProblem:</p>
        <ol start="6" type="a">
            <li>Why does it matter what initial guess we specify for the system? (Hint: what type of optimization problem is this?)</li>
            <li>Take a look at the Drake documentation for <a href="https://drake.mit.edu/doxygen_cxx/classdrake_1_1multibody_1_1_static_equilibrium_problem.html">StaticEquilbriumProblem</a>. It lists the constraints that are used when it's solve for equilibrium. Which two of these can a free body diagram answer?</li>
        </ol>
      </exercise>

    <exercise><h1>Normal Estimation from Depth</h1>
      <p>For this exercise, you will investigate a slightly different approach to normal vector estimation. In particular, we can exploit the spatial structure that is already in a depth image to avoid computing nearest neighbors. You will work exclusively in <a href="https://deepnote.com/project/52-Normal-Estimation-from-Depth-plsl4e2hTd63nWEgo9khBw/%2Fnormal_estimation_depth.ipynb" target="_blank">this notebook</a>. You will be asked to complete the following steps:</p>
      <ol type="a">
        <li> Implement a method to estimate normal vectors from a depth image, without computing nearest neighbors. </li>
        <li> Reason about a scenario where the depth image-based solution will not be as performant as computing nearest-neighbors.</li>
      </ol>    
    </exercise>

    <exercise><h1>Analytic Antipodal Grasping</h1>
      <p>So far, we have used sampling-based methods to find antipodal grasps - but can we have find one analytically if we knew the equation of the object shape? For this exercise, you will analyze and implement a correct-by-construction method to find antipodal points using symbolic differentiation and MathematicalProgram. You will work exclusively in <a href="https://deepnote.com/project/53-Analytic-Antipodal-Grasping-ukPUQmOZShSgyP8TvAG3lw/%2Fanalytic_antipodal_grasps.ipynb" target="_blank">this notebook</a>. You will be asked to complete the following steps:</p>
      <ol type="a">
        <li> Prove that antipodal points are critical points of an energy function defined on the shape. </li>
        <li> Prove the converse does not hold.</li>
        <li> Implement a method to find these antipodal points using MathematicalProgram.</li>
        <li> Analyze the Hessian of the energy function and its relation to the type of antipodal grasps.</li>
      </ol>    
    </exercise>

    <exercise id="gpg"><h1>Grasp Candidate Generation</h1>

      <p>In the chapter Colab notebook, we generated grasp candidates using the
      antipodal heuristic.  In this exercise, we will investigate an alternative
      method for generating grasp candidates based on local curvature, similar
      to the one proposed in <elib>tenPas17</elib>. This exercise is
      implementation-heavy, and you will work exclusively in <a
      href="https://deepnote.com/project/54-Grasp-Candidate-Generation-AjoJNLmcT-qvywcR7WiUlg/%2Fgrasp_candidate.ipynb"
      target="_blank">this notebook</a>.</p>
    </exercise>    

    <exercise id="behaviorTrees"><h1>Behavior Trees</h1>

      <p>Let's reintroduce the terminology of behavior trees. Behavior trees provide a structure for switching between different tasks in a way that is both modular and reactive. Behavior trees are a directed rooted tree where the internal nodes manage the control flow and the leaf nodes are actions to be executed or conditions to be evaluated. For example, an action may be to "pick ball". This action can succeed or fail. A condition may be "Is my hand empty?" which can be true (thus the condition succeeds) or can be false (the condition fails). <br/><br/>

      We'll consider two categories of control flow nodes:
      <dl>
        <dt>Sequence Node: ($\rightarrow$)</dt> <dd>Sequence nodes execute each of the child behaviors one after another. The sequence <i>fails if any of the children fail</i>. One way to think about this operator is that a sequence node takes an "and" over all of the child behaviors.</dd>
        <dt>Fallback Node: (?)</dt> <dd>Fallback nodes also execute each of the child behaviors one after another. However, fallback <i>succeeds if any of the children succeed</i>. One way to think about this operator is that the fallback node takes an "or" over all of the children behaviors.</dd>
</dl>

    The symbols are visualized below. Sequence nodes are represented as an arrow in a box, fallback nodes are represented as a question mark in a box, actions are inside boxes and conditions are inside ovals.</p>
    <figure><img style="width:70%", src="figures/exercises/behaviorTree_symbols.png"/></figure>

    <p>Let's apply our understanding of behavior trees in the context of a simple task where a robot's goal is to: find a ball, pick the ball up and place the ball in a bin. We can describe task with the high-level behavior tree:</p>
    <figure><img style="width:80%", src="figures/exercises/behaviorTree_highLevel.png"/></figure>

    <p>Confirm to yourself that this small behavior tree captures the task! We can go one level deeper and expand out our "Pick Ball" behavior:</p>
    <figure><img style="width:80%", src="figures/exercises/behaviorTree_pick.png"/></figure>

    <p>The pick ball behavior can be considered as such: Let's start with the left branch. If the ball is already close, the condition "Ball Close?" returns true. If the ball is not close, we execute an action to "Approach ball", thus making the ball close. Thus either the ball is already close (i.e. the condition "Ball Close?" is true) or we approach the ball (i.e. the action "Approach ball" is successfully executed) to make the ball close. Given that the ball is now close, we consider the right branch of the pick action. If the ball is already grasped, the condition "Ball Grasped?" returns true. if the ball is not grasped, we execute the action, "Grasp Ball".<br/>

    We can expand our our behavior tree even further to give our full behavior:</p>
    <figure><img style="width:99%", src="figures/exercises/behaviorTree_detail.png" id="behaviorTree"/>
            <figcaption>Behavior Tree for Pick-Up Task</figcaption></figure>

    <p>Here we've added the behavior that if the robot cannot complete the task, it can ask a kind human for help. Take a few minutes to walk through the behavior tree and convince yourself it encodes the desired task behavior.</p>

    <ol type="a">
        <li> We claimed behavior trees enable reactive behavior. Let's explore that. Imagine we have a robot executing our <a href="#behaviorTree">behavior tree</a>. As the robot is executing the action "Approach Bin", a rude human knocks the ball out of the robot's hand. The ball rolls to a position that the robot can still see, but is quite far away. Following the logic of the behavior tree, what condition fails and what action is executed because of this?</li>

        <li>Another way to mathematically model computation is through finite state machines (FSM). Finite state machines are composed of states, transitions and events. We can encode the behavior of our behavior tree for our pick-up task through a finite state machine, as shown below:


    <figure><img style="width:99%", src="figures/exercises/fsm_blanked.png"/></figure>
    We have left a few states and transitions blank. Your task is to fill in those 4 values such that the finite state machine produces the same behavior as the behavior tree. </li>
    </ol>

    <p>From this simple task we can see that while finite state machines can often be easy to understand and intuitive, they can quickly get quite messy!</p>

    <ol type="a" start=3> 
        <li> Throughout this chapter and the lectures we've discussed a bin
        picking task. Our goal is to now capture the behavior of the
        bin-picking task using a behavior tree. We want to encode the following
        behavior: while Bin 1 is not empty, we are going to select a feasible
        grasp within the bin, execute our grasp and transport what's within our
        grasp to Bin 2. If there are still objects within Bin 1, but we cannot
        find a feasible grasp, assume we have some action that shakes the bin,
        hence randomizing the objects and (hopefully) making feasible grasps
        available.<br/><br/>

        We define the following conditions:
        <ul>
          <li>"Bin 1 empty?</li>
          <li>"Feasible Grasp Exists?"</li>
          <li>"Grasp selected?"</li>
          <li>"Grasped Object(s) above Bin 2?"</li>
          <li>"Object Grasped?"</li>
          <li>"Object(s) in Bin 2?"</li>
        </ul><br/>
        And the following actions:

        <ul>
          <li>"Drop grasped object(s) into Bin 2"</li>
          <li>"Grasp Object(s)"</li>
          <li>"Select Grasp"</li>
          <li>"Shake Bin"</li>
          <li>"Transport grasped object(s) to above Bin 2"</li>
        </ul><br/>
        Using these conditions, actions and our two control flow nodes (Sequence and Fallback), draw a behavior tree that captures the behavior of our bin picking task. 
        </li>
    </ol>

    <p>We claimed behavior trees are modular. Their modularity comes from the fact that we can write reusable snippets of code that define actions and conditions and then compose them using behavior trees. For example, we can imagine that the "Select Grasp" action from part (c) is implemented using the antipodal grasping mechanism we have built up throughout this chapter. And the "Transport grasped object(s) to above Bin 2" action could be implemented using the pick and place tools we developed in Chapter 3.</p>

    <p>Source: The scenario and figures for parts (a) and (b) of this exercise are inspired by material in <elib>Colledanchise18</elib></p>
    </exercise>   

  </section>


</chapter>
<!-- EVERYTHING BELOW THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->

<div id="references"><section><h1>References</h1>
<ol>

<li id=Izatt20>
<span class="author">Gregory Izatt and Russ Tedrake</span>, 
<span class="title">"Generative Modeling of Environments with Scene Grammars and Variational Inference"</span>, 
<span class="publisher">Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA)</span> , <span class="year">2020</span>.
[&nbsp;<a href="http://groups.csail.mit.edu/robotics-center/public_papers/Izatt20.pdf">link</a>&nbsp;]

</li><br>
<li id=Elandt19>
<span class="author">Ryan Elandt and Evan Drumwright and Michael Sherman and Andy Ruina</span>, 
<span class="title">"A pressure field model for fast, robust approximation of net contact force and moment between nominally rigid objects"</span>, 
, pp. 8238-8245, 11, <span class="year">2019</span>.

</li><br>
<li id=tenPas17>
<span class="author">Andreas ten Pas and Marcus Gualtieri and Kate Saenko and Robert Platt</span>, 
<span class="title">"Grasp pose detection in point clouds"</span>, 
<span class="publisher">The International Journal of Robotics Research</span>, vol. 36, no. 13-14, pp. 1455--1473, <span class="year">2017</span>.

</li><br>
<li id=Mahler17>
<span class="author">Jeffrey Mahler and Jacky Liang and Sherdil Niyaz and Michael Laskey and Richard Doan and Xinyu Liu and Juan Aparicio Ojea and Ken Goldberg</span>, 
<span class="title">"Dex-net 2.0: Deep learning to plan robust grasps with synthetic point clouds and analytic grasp metrics"</span>, 
<span class="publisher">arXiv preprint arXiv:1703.09312</span>, <span class="year">2017</span>.

</li><br>
<li id=Zeng18>
<span class="author">Andy Zeng and Shuran Song and Kuan-Ting Yu and Elliott Donlon and Francois R Hogan and Maria Bauza and Daolin Ma and Orion Taylor and Melody Liu and Eudald Romo and others</span>, 
<span class="title">"Robotic pick-and-place of novel objects in clutter with multi-affordance grasping and cross-domain image matching"</span>, 
<span class="publisher">2018 IEEE international conference on robotics and automation (ICRA)</span> , pp. 1--8, <span class="year">2018</span>.

</li><br>
<li id=Shakarji98>
<span class="author">Craig M Shakarji</span>, 
<span class="title">"Least-squares fitting algorithms of the NIST algorithm testing system"</span>, 
<span class="publisher">Journal of research of the National Institute of Standards and Technology</span>, vol. 103, no. 6, pp. 633, <span class="year">1998</span>.

</li><br>
<li id=Rusu11>
<span class="author">Radu Bogdan Rusu and Steve Cousins</span>, 
<span class="title">"{3D is here: Point Cloud Library (PCL)}"</span>, 
<span class="publisher">Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)</span> , May 9-13, <span class="year">2011</span>.

</li><br>
<li id=Rusu10>
<span class="author">Radu Bogdan Rusu</span>, 
<span class="title">"Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments"</span>, 
PhD thesis, Institut für Informatik der Technischen Universität München, <span class="year">2010</span>.

</li><br>
<li id=Siciliano16>
<span class="author">Bruno Siciliano and Oussama Khatib</span>, 
<span class="title">"Springer handbook of robotics"</span>, Springer
, <span class="year">2016</span>.

</li><br>
<li id=Prattichizzo08>
<span class="author">Domenico Prattichizzo and Jeffrey C Trinkle</span>, 
<span class="title">"Grasping"</span>, 
<span class="publisher">Springer Handbook of Robotics</span> , pp. 671-700, <span class="year">2008</span>.

</li><br>
<li id=Colledanchise18>
<span class="author">Michele Colledanchise and Petter Ogren</span>, 
<span class="title">"Behavior trees in robotics and AI: An introduction"</span>, CRC Press
, <span class="year">2018</span>.

</li><br>
</ol>
</section><p/>
</div>

<table style="width:100%;" pdf="no"><tr style="width:100%">
  <td style="width:33%;text-align:left;"><a class="previous_chapter" href=pose.html>Previous Chapter</a></td>
  <td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
  <td style="width:33%;text-align:right;"><a class="next_chapter" href=segmentation.html>Next Chapter</a></td>
</tr></table>

<div id="footer" pdf="no">
  <hr>
  <table style="width:100%;">
    <tr><td><a href="https://accessibility.mit.edu/">Accessibility</a></td><td style="text-align:right">&copy; Russ
      Tedrake, 2021</td></tr>
  </table>
</div>


</body>
</html>