forked from RussTedrake/underactuated
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathtrajopt.html
1787 lines (1484 loc) · 98.4 KB
/
trajopt.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html>
<head>
<title>Ch. 10 - Trajectory
Optimization</title>
<meta name="Ch. 10 - Trajectory
Optimization" content="text/html; charset=utf-8;" />
<link rel="canonical" href="http://underactuated.mit.edu/trajopt.html" />
<script src="https://hypothes.is/embed.js" async></script>
<script type="text/javascript" src="chapters.js"></script>
<script type="text/javascript" src="htmlbook/book.js"></script>
<script src="htmlbook/mathjax-config.js" defer></script>
<script type="text/javascript" id="MathJax-script" defer
src="htmlbook/MathJax/es5/tex-chtml.js">
</script>
<script>window.MathJax || document.write('<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-chtml.js" defer><\/script>')</script>
<link rel="stylesheet" href="htmlbook/highlight/styles/default.css">
<script src="htmlbook/highlight/highlight.pack.js"></script> <!-- http://highlightjs.readthedocs.io/en/latest/css-classes-reference.html#language-names-and-aliases -->
<script>hljs.initHighlightingOnLoad();</script>
<link rel="stylesheet" type="text/css" href="htmlbook/book.css" />
</head>
<body onload="loadChapter('underactuated');">
<div data-type="titlepage">
<header>
<h1><a href="index.html" style="text-decoration:none;">Underactuated Robotics</a></h1>
<p data-type="subtitle">Algorithms for Walking, Running, Swimming, Flying, and Manipulation</p>
<p style="font-size: 18px;"><a href="http://people.csail.mit.edu/russt/">Russ Tedrake</a></p>
<p style="font-size: 14px; text-align: right;">
© Russ Tedrake, 2023<br/>
Last modified <span id="last_modified"></span>.</br>
<script>
var d = new Date(document.lastModified);
document.getElementById("last_modified").innerHTML = d.getFullYear() + "-" + (d.getMonth()+1) + "-" + d.getDate();</script>
<a href="misc.html">How to cite these notes, use annotations, and give feedback.</a><br/>
</p>
</header>
</div>
<p><b>Note:</b> These are working notes used for <a
href="https://underactuated.csail.mit.edu/Spring2023/">a course being taught
at MIT</a>. They will be updated throughout the Spring 2023 semester. <a
href="https://www.youtube.com/channel/UChfUOAhz7ynELF-s_1LPpWg">Lecture videos are available on YouTube</a>.</p>
<table style="width:100%;"><tr style="width:100%">
<td style="width:33%;text-align:left;"><a class="previous_chapter" href=lyapunov.html>Previous Chapter</a></td>
<td style="width:33%;text-align:center;"><a href=index.html>Table of contents</a></td>
<td style="width:33%;text-align:right;"><a class="next_chapter" href=policy_search.html>Next Chapter</a></td>
</tr></table>
<script type="text/javascript">document.write(notebook_header('trajopt'))
</script>
<!-- EVERYTHING ABOVE THIS LINE IS OVERWRITTEN BY THE INSTALL SCRIPT -->
<chapter style="counter-reset: chapter 9"><h1>Trajectory
Optimization</h1>
<p>I've argued that optimal control is a powerful framework for specifying
complex behaviors with simple objective functions, letting the dynamics and
constraints on the system shape the resulting feedback controller (and vice
versa!). But the computational tools that we've provided so far have been
limited in some important ways. The numerical approaches to dynamic
programming which involve putting a mesh over the state space do not scale
well to systems with state dimension more than four or five. Linearization
around a nominal operating point (or trajectory) allowed us to solve for
locally optimal control policies (e.g. using LQR) for even very
high-dimensional systems, but the effectiveness of the resulting controllers
is limited to the region of state space where the linearization is a good
approximation of the nonlinear dynamics. The computational tools for
Lyapunov analysis from the last chapter can provide, among other things, an
effective way to compute estimates of those regions, and the sums-of-squares
approaches to dynamic programming are still in their early days. We have not
yet provided the mainstream computational tools for approximate optimal
control that work for high-dimensional systems beyond the linearization
around a goal. That is precisely the goal for this chapter.</p>
<p>In order to scale to high-dimensional systems, we are going to formulate a
simpler version of the optimization problem. Rather than trying to solve for
the optimal feedback controller for the entire state space, in this chapter
we will instead attempt to find an optimal control solution that is valid
from only a single initial condition. Instead of representing this as a
feedback control function, we can represent this solution as a
<em>trajectory</em>, $\bx(t), \bu(t)$, typically defined over a finite
interval.</p>
<section><h1>Problem Formulation</h1>
<p>Given an initial condition, $\bx_0$, and an input trajectory $\bu(t)$
defined over a finite interval, $t\in[t_0,t_f],$ we can compute the
long-term (finite-horizon) cost of executing that trajectory using the
standard additive-cost optimal control objective, \[ J_{\bu(\cdot)}(\bx_0)
= \ell_f (\bx(t_f)) + \int_{t_0}^{t_f} \ell(\bx(t),\bu(t)) dt. \] We will
write the trajectory optimization problem as \begin{align*}
\min_{\bu(\cdot)} \quad & \ell_f (\bx(t_f)) + \int_{t_0}^{t_f}
\ell(\bx(t),\bu(t)) dt \\ \subjto \quad & \dot{\bx}(t) = f(\bx(t),\bu(t)),
\quad \forall t\in[t_0, t_f] \\ & \bx(t_0) = \bx_0. \\ \end{align*} Many
trajectory optimization problems will also include additional constraints,
such as collision avoidance (e.g., where the constraint is that the signed
distance between the robot's geometry and the obstacles stays positive) or
input limits (e.g. $\bu_{min} \le \bu \le \bu_{max}$ ). Constraints can be
defined for all time or some subset of the trajectory.</p>
<p> As written, the optimization above is an optimization over continuous
trajectories. In order to formulate this as a numerical optimization, we
must parameterize it with a finite set of numbers. Perhaps not
surprisingly, there are many different ways to write down this
parameterization, with a variety of different properties in terms of speed,
robustness, and accuracy of the results. We will outline just a few of the
most popular below; I would recommend <elib>Betts98+Betts01</elib> for
additional details.</p>
<p>It is worth contrasting this parameterization problem with the one that
we faced in our continuous-dynamic programming algorithms. For trajectory
optimization, we need a finite-dimensional parameterization in only one
dimension (time), whereas in the mesh-based value iteration algorithms we
had to work in the dimension of the state space. Our mesh-based
discretizations scaled badly with this state dimension, and led to
numerical errors that were difficult to deal with. Discreting over just
one dimension (time), is fundamentally different. There is relatively much
more known about discretizing solutions to differential equations over
time, including work on error-controlled integration. And the number of
parameters required for trajectory parameterizations scales linearly with
the state dimension, instead of exponentially in mesh-based value
iteration.</p>
</section>
<section><h1>Convex Formulations for Linear Systems</h1>
<p>Let us first consider the case of linear systems. In fact, if we start
in discrete time, we can even defer the question of how to best discretize
the continuous-time problem. There are a few different ways that we might
"transcribe" this optimization problem into a concrete mathematical
program.</p>
<subsection><h1>Direct Transcription</h1>
<p>For instance, let us start by writing both $\bu[\cdot]$ and
$\bx[\cdot]$ as decision variables. Then we can write: \begin{align*}
\min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n=0}^{N-1}
\ell(\bx[n],\bu[n]) \\ \subjto \quad & \bx[n+1] = {\bf A}\bx[n] + {\bf
B}\bu[n], \quad \forall n\in[0, N-1] \\ & \bx[0] = \bx_0 \\ & +
\text{additional constraints}. \end{align*} We call this modeling choice
-- adding $\bx[\cdot]$ as decision variables and modeling the discrete
dynamics as explicit constraints -- the "<i>direct transcription</i>".
Importantly, for linear systems, the dynamics constraints are linear
constraints in these decision variables. As a result, if we can restrict
our additional constraints to linear inequality constraints and our
objective function to being linear/quadratic in $\bx$ and $\bu$, then the
resulting trajectory optimization is a convex optimization (specifically
a linear program or quadratic program depending on the objective). We
can reliably solve these problems to global optimality at quite large
scale; these days it is common to solve these optimization online inside
a high-rate feedback controller.</p>
<example><h1>Trajectory optimization for the Double Integrator</h1>
<p>We've looked at a few optimal control problems for the double
integrator using value iteration. For one of them -- the quadratic
objective with no constraints on $\bu$ -- we know now that we could have
solved the problem "exactly" using LQR. But we have not yet given
satisfying numerical solutions for the minimum-time problem, nor for the
constrained LQR problem.</p>
<p>In the trajectory formulation, we can solve these problems exactly
for the discrete-time double integrator, and with better accuracy for
the continuous-time double integrator. Take a moment to appreciate
that! The bang-bang policy and cost-to-go functions are fairly
nontrivial functions of state; it's quite satisfying that we can
evaluate them using convex optimization! The limitation, of course, is
that we are only solving them for one initial condition at a time.</p>
<script>document.write(notebook_link('trajopt', d=deepnote, link_text="", notebook="double_integrator"))</script>
</example>
<p>If you have not studied convex optimization before, you might be
surprised by the modeling power of even of this framework. Consider, for
instance, an objective of the form $$\ell(\bx,\bu) = |\bx| + |\bu|.$$
This can be formulated as a linear program. To do it, add additional
decision variables ${\bf s}_x[\cdot]$ and ${\bf s}_u[\cdot]$ -- these are
commonly referred to as <i>slack variables</i>
-- and write $$\min_{\bx,\bu,{\bf s}_x,{\bf s}_u} \sum_n^{N-1} {\bf
s}_x[n] + {\bf s}_u[n], \quad \text{s.t.} \quad {\bf s}_x[n] \ge x[n],
\quad {\bf s}_x[n] \ge -x[n], \quad ...$$ The field of convex
optimization is replete with tricks like this. Knowing and recognizing
them are skills of the (optimization) trade. But there are also many
relevant constraints which cannot be recast into convex constraints (in
the original coordinates) with any amount of skill. An important example
is obstacle avoidance. Imagine a vehicle that must decide if it should
go left or right around an obstacle. This represents a fundamentally
non-convex constraint in $\bx$; we'll discuss the implications of using
non-convex optimization for trajectory optimization below.</p>
</subsection>
<subsection id="direct_shooting"><h1>Direct Shooting</h1>
<p>The savvy reader might have noticed that adding $\bx[\cdot]$ as
decision variables was not strictly necessary. If we know $\bx[0]$ and
we know $\bu[\cdot]$, then we should be able to solve for $\bx[n]$ using
forward simulation. For our discrete-time linear systems, this is
particularly nice: \begin{align*}\bx[1] =& \bA\bx[0] + \bB\bu[0] \\
\bx[2] =& \bA(\bA\bx[0] + \bB\bu[0]) + \bB\bu[1] \\ \bx[n] =& \bA^n\bx[0]
+ \sum_{k=0}^{n-1} \bA^{n-1-k}\bB\bu[k].\end{align*} What's more, the
solution is still linear in $\bu[\cdot]$. This is amazing... we can get
rid of a bunch of decision variables, and turn a constrained optimization
problem into an unconstrained optimization problem (assuming we don't
have any other constraints). This approach -- using $\bu[\cdot]$ but
<i>not</i> $\bx[\cdot]$ as decision variables and using forward
simulation to obtain $\bx[n]$ -- is called the <i>direct shooting</i>
transcription. For linear systems with linear/quadratic objectives in
$\bx$, and $\bu$, it is still a convex optimization, and has less
decision variables and constraints than the direct transcription.</p>
</subsection>
<subsection id="computational_considerations"><h1>Computational
Considerations</h1></subsection>
<p>So is direct shooting uniformly better than the direct transcription
approach? I think it is not. There are a few potential reason that one
might prefer the direct transcription: <ul><li>Numerical conditioning.
Shooting involves calculating ${\bf A}^n$ for potentially large $n$,
which can lead to a large range of coefficient values in the constraints.
This problem is sometimes referred to as the "tail wagging the dog", or
in very deep neural networks as "vanishing gradients" and/or "exploding
gradients". In trajectory optimization, it is not simply an artifact of
our transcription, but is a real property of the problem we have
formulated: the control input $\bu[0]$ really does have more opportunity
to have a large impact on the total cost than control input $\bu[N-1]$.
The direct transcription approach combats the numerical issue by
spreading this effect out over a large number of well-balanced
constraints.</li><li>Adding state constraints. Having $\bx[n]$ as
explicit decision variables makes it very easy/natural to add additional
state constraints; and the solver effectively reuses the computation of
${\bf A}^n$ for each constraint. In shooting, one has to unroll those
terms for each new constraint.</li><li>Parallelization. For larger
problems, evaluating the constraints can be a substantial cost. In
direct transcription, one can evaluate the dynamics/constraints in
parallel (because each iteration begins with $\bx[n]$ already given),
whereas shooting is more fundamentally a serial operation.</li></ul></p>
<p>For linear convex problems, the solvers are mature enough that these
differences often don't amount to much. For nonlinear optimization
problems, the differences can be substantial. If you look at trajectory
optimization papers in mainstream robotics, you will see that both direct
transcription and direct shooting approaches are used. (It used to be
that you could guess which research lab wrote a paper simply by the
transcription they use!)</p>
<p>It is also worth noting that the problems generated by the direct
transcription have an important and exploitable "banded" sparsity pattern
-- most of the constraints touch only a small number of variables. This
is actually the same pattern that we exploit in the Riccati equations.
Thanks to the importance of these methods in real applications, numerous
specialized solvers have been written to explicitly exploit this sparsity
(e.g. <elib>Wang09a+Dellaert12</elib>).</p>
</subsection>
<subsection><h1>Continuous Time</h1>
<p>If we wish to solve the continuous-time version of the problem, then we
can discretize time and use the formulations above. The most important
decision is the discretization / numerical integration scheme. For linear
systems, if we assume that the control inputs are held constant for each
time step (aka <a
href="https://en.wikipedia.org/wiki/Zero-order_hold">zero-order hold</a>),
then we can integrate the dynamics perfectly: $$\bx[n+1] = \bx[n] +
\int_{t_n}^{t_n + h} \left[ {\bf A} \bx(t) + {\bf B}\bu \right]dt =
e^{{\bf A}h}\bx[n] + {\bf A}^{-1}(e^{{\bf A}h} - {\bf I}){\bf B}\bu[n],$$
is the simple case (when ${\bf A}$ is invertible). But in general, we can
use any finitely-parameterized representation of $\bu(t)$ and any
numerical integration scheme to obtain $\bx[n+1]={\bf f}(\bx[n],
\bu[n])$.</p>
</subsection>
</section>
<section><h1>Nonconvex Trajectory Optimization</h1>
<p>I strongly recommend that you study the convex trajectory optimization
case; it can lead you to mental clarity and sense of purpose. But in
practice trajectory optimization is often used to solve nonconvex problems.
Our formulation can become nonconvex for a number of reasons. For example,
if the dynamics are nonlinear, then the dynamic constraints become
nonconvex. You may also wish to have a nonconvex objective or nonconvex
additional constraint (e.g. collision avoidance). Typically we formulate
these problems using tools from <a
href="optimization.html#nonlinear">nonlinear programming</a>.</p>
<subsection><h1>Direct Transcription and Direct Shooting</h1>
<p>The formulations that we wrote for direct transcription and direct
shooting above are still valid when the dynamics are nonlinear, it's just
that the resulting problem is nonconvex. For instance, the direct
transcription for discrete-time systems becomes the more general:
\begin{align*} \min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) +
\sum_{n=0}^{N-1} \ell(\bx[n],\bu[n]) \\ \subjto \quad & \bx[n+1] = {\bf
f}(\bx[n], \bu[n]), \quad \forall n\in[0, N-1] \\ & \bx[0] = \bx_0 \\ & +
\text{additional constraints}. \end{align*} Direct shooting still works,
too, since on each iteration of the algorithm we can compute $\bx[n]$
given $\bx[0]$ and $\bu[\cdot]$ by forward simulation. But things get a
bit more interesting when we consider continuous-time systems.</p>
<p>For nonlinear dynamics, we have many choices for how to approximate
the discrete dynamics $$\bx[n+1] = \bx[n] + \int_{t[n]}^{t[n+1]}
f(\bx(t), \bu(t)) dt, \quad \bx(t[n]) = \bx[n].$$ For instance, in
<drake></drake> we have an entire <a
href="https://drake.mit.edu/doxygen_cxx/group__integrators.html">suite of
numerical integrators</a> that achieve different levels of simulation
speed and/or accuracy, both of which can be highly dependent on the
details of ${\bf f}(\bx,\bu)$.</p>
<todo>Finish supporting IntegratorBase in DirectTranscription and write an
example showing how to use it here.</todo>
<p>One very important idea in numerical integration of differential
equations is the use of variable-step integration as a means for
controlling integration error. <a
href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#Adaptive_Runge%E2%80%93Kutta_methods">Runge-Kutta-Fehlberg</a>,
also known as "RK45", is one of the most famous variable-step
integrators. We typically avoid using variable steps inside a constraint
(it can lead to discontinuous gradients), but it is possible to
accomplish something similar in trajectory optimization by allowing the
sample times, $t[\cdot]$, themselves to be decision variables. This
allows the optimizer to stretch or shrink the time intervals in order to
solve the problem, and is particularly useful if you do not know apriori
what the total duration of the trajectory should be. Adding some
constraints to these time variables is essential in order to avoid
trivial solutions (like collapsing to a trajectory of zero duration). One
could potentially even add constraints to bound the integration
error.</p>
</subsection> <!-- end dirtran -->
<subsection id="direct_collocation"><h1>Direct Collocation</h1>
<p>It is very satisfying to have a suite of numerical integration routines
available for our direct transcription. But numerical integrators are
designed to solve forward in time, and this represents a design constraint
that we don't actually have in our direct transcription formulation. If
our goal is to obtain an accurate solution to the differential equation
with a small number of function evaluations / decision variables /
constraints, then some new formulations are possible that take advantage
of the constrained optimization formulation. These include the
so-called <em>collocation methods</em>.</p>
<p> In direct collocation (c.f., <elib>Hargraves87</elib>), both the
input trajectory and the state trajectory are represented explicitly as
piecewise polynomial functions. In particular, the sweet spot for this
algorithm is taking $\bu(t)$ to be a first-order polynomial and $\bx(t)$
to be a cubic polynomial.</p>
<p>It turns out that in this sweet spot, the only decision variables we
need in our optimization are the sample values $\bu(t)$ and $\bx(t)$ at
the so called "break" points of the spline. You might think that you
would need the coefficients of the cubic spline parameters, but you do
not. For the first-order interpolation on $\bu(t)$, given $\bu(t_k)$ and
$\bu(t_{k+1})$, we can solve for every value $\bu(t)$ over the interval
$t \in [k, k+1]$. But we also have everything that we need for the cubic
spline: given $\bx(t_k)$ and $\bu(t_k),$ we can compute $\dot\bx(t_k) = f
(\bx(t_k), \bu(t_k))$; and the four values $\bx(t_k), \bx(t_{k+1}),
\dot\bx (t_k), \dot\bx(t_{k+1})$ completely define all of the parameters
of the cubic spline over the interval $t\in[t_k, t_{k+1}]$. This is very
convenient, because it is easy for us to add additional constraints to
$\bu$ and $\bx$ at the sample points (and would have been relatively
harder to have to convert every constraint into constraints on the spline
coefficients).</p>
<figure>
<img width="80%" src="data/collocation.svg">
<figcaption>Cubic spline parameters used in the direct collocation method.</figcaption>
</figure>
<p>Given values of the decision variables ($\bu(t_k)$ and $\bx(t_k)$ at
the break points), the spline is fully specified, but we need dim$(\bx)$
additional constraints to constrain that $\bx(t_{k+1})$ is
(approximately) $\bx(t_k) + \int_{t_k}^{t_{k+1}} f(\bx(t),\bu(t)) dt.$ In
direct collocation, we impose this with derivative constraints at the
so-called <i>collocation points</i>. In particular, if we choose the
collocation times to be the midpoints of the spline, then we have that
\begin{gather*} t_{c,k} = \frac{1}{2}\left(t_k + t_{k+1}\right), \qquad
h_k = t_{k+1} - t_k, \\ \bu(t_{c,k}) = \frac{1}{2}\left(\bu(t_k) +
\bu(t_{k+1})\right), \\ \bx(t_{c,k}) = \frac{1}{2}\left(\bx(t_k) +
\bx(t_{k+1})\right) + \frac{h}{8} \left(\dot\bx(t_k) -
\dot\bx(t_{k+1})\right), \\ \dot\bx(t_{c,k}) =
-\frac{3}{2h}\left(\bx(t_k) - \bx(t_{k+1})\right) - \frac{1}{4}
\left(\dot\bx(t_k) + \dot\bx(t_{k+1})\right). \end{gather*} These
equations come directly from the equations that fit the cubic spline to
the end points/derivatives then interpolate them at the midpoint. They
give us precisely what we need to add the dynamics constraint to our
optimization at the collocation times:\begin{align*}
\min_{\bx[\cdot],\bu[\cdot]} \quad & \ell_f(\bx[N]) + \sum_{n=0}^{N-1}
h_n \ell(\bx[n],\bu[n]) \\ \subjto \quad & \dot\bx(t_{c,n}) =
f(\bx(t_{c,n}), \bu(t_{c,n})), & \forall n \in [0,N-1] \\ & \bx[0] =
\bx_0 \\ & + \text{additional constraints}. \end{align*} I hope this
notation is clear -- I'm using $\bx[k] = \bx(t_k)$ as the decision
variables, and the collocation constraint at $t_{c,k}$ depends on the
decision variables: $\bx[k], \bx[k+1], \bu[k], \bu[k+1]$. The actual
equations of motion get evaluated at both the break points, $t_k$, and
the collocation times, $t_{c,k}$, because we need to evaluate the
dynamics at the break points in order to write $\dot{\bx}(t_{c,k}).$</p>
<p>Once again, direct collocation effectively integrates the equations of
motion by satisfying the constraints of the optimization -- this time
producing an integration of the dynamics that is accurate to third order
with effectively two evaluations of the plant dynamics per segment (since
we use $\dot\bx(t_k)$ for two intervals). <elib>Hargraves87</elib>
claims, without proof, that as the break points are brought closer
together, the trajectory will converge to a true solution of the
differential equation. Once again it is very natural to add additional
terms to the cost function or additional input/state constraints, and
very easy to calculate the gradients of the objective and constraints.
I personally find it very nice to explicitly account for the parametric
encoding of the trajectory in the solution technique.</p>
<example id="swingup"><h1>Direct Collocation for the Pendulum, Acrobot,
and Cart-Pole</h1>
<figure>
<img width="80%" src="data/pend_trajopt_swingup.svg">
<figcaption>A swing-up trajectory for the simple pendulum (with
severe torque limits) optimized using direct
collocation.</figcaption>
</figure>
<p>Direct collocation also easily solves the swing-up problem for
the pendulum, Acrobot, and cart-pole system. Try it for
yourself:</p>
<script>document.write(notebook_link('trajopt', d=deepnote, link_text="", notebook="dircol"))</script>
As always, make sure you take a look at the code!
</example>
</subsection> <!-- end dircol -->
<subsection id="pseudo-spectral"><h1>Pseudo-spectral Methods</h1>
<p>The direct collocation method of <elib>Hargraves87</elib> was our first
example of explicitly representing the solution of the optimal control
problem as a parameterized trajectory, and adding constraints to the
derivatives at a series of collocation times. In the algorithm above,
the representation of choice was <i>piecewise-polynomials</i>, e.g. cubic
spines, and the spline coefficients were the decision variables. A
closely related approach, often called "pseudo-spectral" optimal control,
uses the same collocation idea, but represents the trajectories instead
using a linear combination of <i>global, polynomial</i> basis functions.
These methods use typically much higher-degree polynomials, but can
leverage clever parameterizations to write sparse collocation objectives
and to select the collocation times <elib>Garg11+Ross12a</elib>.
Interestingly, The continuously-differentiable nature of the
representation of these methods has led to comparatively more theorems and
analysis than we have seen for other direct trajectory optimization
methods <elib>Ross12a</elib> -- but despite some of the language used in
these articles please remember they are still local optimization methods
trying to solve a nonconvex optimization problem. While the direct
collocation method above might be expected to converge to the true optimal
solution by adding more segments to the piecewise polynomial (and having
each segment represent a smaller interval of time), here we expect
convergence to happen as we increase the degree of the polynomials.</p>
<p>The pseudo-spectral methods are also sometimes knowns as "orthogonal
collocation" because the $N$ basis polynomials, $\phi_j(t)$, are chosen
so that at the $j$th collocation time $t_j$, we have $$\phi_i(t_j) =
\begin{cases} 1 & i=j, \\ 0 & \text{otherwise.}\end{cases}$$ This can be
accomplished by using the Lagrange polynomials, $$\phi_i(t) = \prod_{j=0,
j\ne i}^{N} \frac{t-t_j}{t_i - t_j}.$$ Note that for both numerical
reasons and for analysis, time is traditionally rescaled from the
interval $[t_0, t_f]$ to $[-1, 1]$. Collocation times are chosen based
on small variations of
<a href="https://en.wikipedia.org/wiki/Gaussian_quadrature"></a>Gaussian
quadrature</a>, known as the "Gauss-Lobatto" which includes collocation
times at $t=-1$ and $t=1$.</p>
<p>Interestingly, a number of papers have also considered
infinite-horizon pseudo-spectral optimization by the nonlinear rescaling
of the time interval $t\in[0, \infty)$ to the half-open interval
$\tau\in[-1, 1)$ via $\tau = \frac{t-1}{t+1}$
<elib>Garg11+Ross12a</elib>. In this case, one chooses the collocation
times so that they include $\tau = -1$ but do not include $\tau=1$, using
the so-called "Gauss-Radau" points
<elib>Garg11</elib>.</p>
<todo>Add an example. Acrobot swingup?</todo>
<subsubsection><h1>Dynamic constraints in implicit form</h1>
<p>There is another, seemingly subtle but potentially important,
opportunity that can be exploited in a few of these transcriptions, if
our main interest is in optimizing systems with significant multibody
dynamics. In some cases, we can actually write the dynamics constraints
directly in their implicit form. We've <a
href="lyapunov.html#implicit">introduced this idea already</a> in the
context of Lyapunov analysis. In many cases, it is nicer or more
efficient to obtain the equations of motion in an implicit form, ${\bf
g}(\bx,\bu,\dot\bx) = 0$, and to avoid ever having to solve for the
explicit form $\dot\bx = {\bf f}(\bx,\bu).$ This can become even more
important when we consider systems for which the explicit form doesn't
have a unique solution -- we will see examples of this when we study
trajectory optimization through contact because the Coulomb model for
friction actually results in a differential inclusion instead of a
differential equation.</p>
<p>The collocation methods, which operate on the dynamic constraints at
collocation points directly in their continuous form, can use the
implicit form directly. It is possible to write a time-stepping
(discrete-time approximation) for direct transcription using implicit
integrators -- again providing constraints in implicit form. The
implicit form is harder to exploit in the shooting methods.</p>
<todo>There should really be better versions of direct transcription and
the collocation methods that are specialized for second-order systems.
We should only have $\bq$ as a decision variable, and be able to impose
the dynamics only on the second derivatives (the first derivatives are
consistent by construction). This is one of the main talking points in
DMOC, even though the emphasis is on the discrete mechanics.</todo>
</subsubsection>
</subsection>
</section>
<section><h1>Solution techniques</h1>
<p>The different transcriptions presented above represent different ways to
map the (potentially continuous-time) optimal control problem into a finite
set of decision variables, objectives, and constraints. But even once that
choice is made, there are numerous approaches to solving this optimization
problem. Any general approach to <a
href="optimization.html#nonlinear">nonlinear programming</a>
can be applied here; in the python examples we've included so far, the
problems are handed directly to the sequential-quadratic programming (SQP)
solver SNOPT, or to the interior-point solver IPOPT.</p>
<p>There is also quite a bit of exploitable problem-specific structure in
these trajectory optimization problems due to the sequential nature of the
problem. As a result, there are some ideas that are fairly specific to the
trajectory optimization formulation of optimal control, and customized
solvers can often (and sometimes dramatically) outperform general purpose
solvers.</p>
<p>This trajectory-optimization structure is easiest to discuss, and
implement, in unconstrained formulations, so we will start there.</p>
<subsection><h1>Efficiently computing gradients</h1>
<p>Providing gradients of the objectives and constraints to the solver is
not strictly required -- most solvers will obtain them from finite
differences if they are not provided -- but I feel strongly that for
smooth problems the solvers are faster and more robust when exact
gradients are provided (systems that make and break contact might be a
different story<elib>Suh22b</elib>). Providing the gradients for the
direct transcription methods is very straight-forward -- we simply
provide the gradients for each constraint individually. But in the
direct shooting approach, where we have removed the $\bx$ decision
variables from the program but still write objectives and constraints in
terms of $\bx$, it would become very inefficient to compute the gradients
of each objective/constraint independently. We need to leverage the
chain rule.</p>
<p>To be concise (and slightly more general), let us define
$\bx[n+1]=f_d(\bx[n],\bu[n])$ as the discrete-time approximation of the
continuous dynamics; for example, the forward Euler integration scheme
used above would give $f_d(\bx[n],\bu[n]) = \bx[n]+f(\bx[n],\bu[n])dt.$
Then we have \[\pd{J}{\bu_k} = \pd{\ell_f(\bx[N])}{\bu_k} +
\sum_{n=0}^{N-1} \left(\pd{\ell(\bx[n],\bu[n])}{\bx[n]}
\pd{\bx[n]}{\bu_k} + \pd{\ell(\bx[n],\bu[n])}{\bu_k} \right), \] where
the gradient of the state with respect to the inputs can be computed
during the "forward simulation", \[ \pd{\bx[n+1]}{\bu_k} =
\pd{f_d(\bx[n],\bu[n])}{\bx[n]} \pd{\bx[n]}{\bu_k} +
\pd{f_d(\bx[n],\bu[n])}{\bu_k}. \] These simulation gradients can also
be used in the chain rule to provide the gradients of any constraints.
Note that there are a lot of terms to keep around here, on the order of
(state dim) $\times$ (control dim) $\times$ (number of time steps).
Ouch. Note also that many of these terms are zero; for instance with the
Euler integration scheme above $\pd{\bu[n]}{\bu_k} = 0$ if $k\ne n$. (If
this looks like I'm mixing two notations here, recall that I'm using
$\bu_k$ to represent the decision variable and $\bu[n]$ to represent the
input used in the $n$th step of the simulation.)</p>
</subsection> <!-- gradients -->
<subsection id="backprop_through_time">
<h1>The special case of direct shooting without state constraints</h1>
<p>By solving for $\bx(\cdot)$ ourselves, we've removed a large number of
constraints from the optimization. If no additional state constraints
are present, and the only gradients we need to compute are the gradients
of the objective, then a surprisingly efficient algorithm emerges. I'll
give the steps here without derivation, but will derive it in the
Pontryagin section below: <ol> <li>Simulate forward: $$\bx[n+1] =
f_d(\bx[n],\bu_n),$$ from $\bx[0] = \bx_0$.</li>
<li>Calculate backwards: $$\lambda[n-1] =
\pd{\ell(\bx[n],\bu[n])}{\bx[n]}^T + \pd{f(\bx[n],\bu[n])}{\bx[n]}^T
\lambda[n],$$ from $\lambda[N-1]=\pd{\ell_f(\bx[N])}{\bx[N]}$.</li>
<li>Extract the gradients: $$\pd{J}{\bu[n]} =
\pd{\ell(\bx[n],\bu[n])}{\bu[n]} + \lambda[n]^T
\pd{f(\bx[n],\bu[n])}{\bu[n]},$$ with $\pd{J}{\bu_k} = \sum_n
\pd{J}{\bu[n]}\pd{\bu[n]}{\bu_k}$.</li> </ol> <p>Here $\lambda[n]$ is a
vector the same size as $\bx[n]$ which has an interpretation as
$\lambda[n]=\pd{J}{\bx[n+1]}^T$. The equation governing $\lambda$ is
known as the <em>adjoint equation</em>, and it represents a dramatic
efficiency improvement over calculating the huge number of simulation
gradients described above. In case you are interested, yes the adjoint
equation is exactly the <em>backpropagation algorithm</em> that is famous
in the neural networks literature, or more generally a bespoke version of
reverse-mode <a
href="https://en.wikipedia.org/wiki/Automatic_differentiation">automatic
differentiation</a>. (The previous section would correspond to
forward-mode automatic differentiation.)</p>
</subsection>
<subsection><h1>Penalty methods and the Augmented Lagrangian</h1>
<p>In fact, in recent years we have seen a surge in popularity in
robotics for doing trajectory optimization using (often special-purpose)
solvers for unconstrained trajectory optimization, where the constrained
problems are transformed into unconstrained problem via penalty
methods.</p>
<p>Manually adding penalty terms to the cost function with user-specified
cost weighting can be nasty business. Often this generates a significant
amount of effort in "cost function tuning"; setting the weights too large
can cause the problems to become numerically ill-conditioned, but setting
them too small can allow unwanted constraint violations. The <a
href="https://en.wikipedia.org/wiki/Augmented_Lagrangian_method"><i>augmented
Lagrangian</i></a> approach offers a principled solution to this problem,
by automatically scheduling the weights in order to achieve "essentially
exact" satisfaction of the constraints with finite penalties
<elib>Bertsekas82</elib>, <elib part="Ch. 17">Nocedal06</elib>.</p>
<p>Methods based on the augmented Lagrangian are particularly popular for
trajectory optimization these days <elib>Lin91+Toussaint14</elib>.</p>
</subsection>
<subsection><h1>Zero-order optimization</h1>
<p>Cross-entropy method (CEM) <elib>Kobilarov12</elib> and
Model-predictive path-integral control (MPPI)
<elib>Williams17</elib>.</p>
<p>In the last few years, we've seen trajectory optimization playing a
major role in "model-based reinforcement learning", especially when the
model is represented using a deep neural network (e.g.
<elib>Li18+Li18a</elib>). Despite the fact that neural network toolboxes
can provide efficient gradients, the gradient-free trajectory
optimization methods described here normally the tool of choice. There
are a few possible explanations for this. The trajectory optimization
problem has local minima, and the zero-order optimizers do provide some
aspect of "global optimization" (at least they tend to explore multiple
possible minima before converging). Neural network models fit with
limited data may even have more local minima than if one used the
original physics model -- the standard regression loss could be small
even if the neural representation of the function is locally
bumpy/non-smooth. But I think the biggest reason for the popularity of
the zero-order methods is that they are trivially parallelizable, and
very nicely compatible with the existing GPU neural-network workflows.
When rolling out 10,000 samples in parallel costs approximately the same
as rolling out 1, then this certainly makes the zero-order methods more
appealing.</p>
</subsection>
<subsection><h1>Getting good solutions... in practice.</h1>
<p>As you begin to play with these algorithms on your own problems, you
might feel like you're on an emotional roller-coaster. You will have
moments of incredible happiness -- the solver may find very impressive
solutions to highly nontrivial problems. But you will also have moments
of frustration, where the solver returns an awful solution, or simply
refuses to return a solution (saying "infeasible"). The frustrating
thing is, you cannot distinguish between a problem that is actually
infeasible, vs. the case where the solver was simply stuck in a local
minima.</p>
<p>So the next phase of your journey is to start trying to "help" the
solver along. There are two common approaches.</p>
<p>The first is tuning your cost function -- some people spend a lot of
time adding new elements to the objective or adjusting the relative
weight of the different components of the objective. This is a slippery
slope, and I tend to try to avoid it (possibly to a fault; other groups
tend to put out more compelling videos!).</p>
<p>The second approach is to give a better initial guess to your solver
to put it in the vicinity of the "right" local minimal. I find this
approach more satisfying, because for most problems I think there really
is a "correct" formulation for the objective and constraints, and we
should just aim to find the optimal solution. Once again, we do see a
difference here between the direct shooting algorithms and the direct
transcription / collocation algorithms. For shooting, we can only
provide the solver with an initial guess for $\bu(\cdot)$, whereas the
other methods allow us to also specify an initial guess for $\bx(\cdot)$
directly. I find that this can help substantially, even with very simple
initializations. In the direct collocation examples for the swing-up
problem of the Acrobot and cart-pole, simply providing the initial guess
for $\bx(\cdot)$ as a straight line trajectory between the start and the
goal was enough to help the solver find a good solution; in fact it was
necessary.</p>
</subsection>
<!-- maybe: relaxing dynamic constraints (as in the formulation n Ross and
Karpenko, 2012 between eq 16 and 17) -->
</section> <!-- end three algorithms -->
<section><h1>Local Trajectory Feedback Design</h1>
<p>Once we have obtained a locally optimal trajectory from trajectory
optimization, we have found an open-loop trajectory that (at least locally)
minimizes our optimal control cost. Up to numerical tolerances, this pair
$\bu_0(t), \bx_0(t)$ represents a feasible solution trajectory of the
system. But we haven't done anything, yet, to ensure that this trajectory
is locally stable.</p>
<p>In fact, there are a few notable approximations that we've already made
to get to this point: the integration accuracy of our trajectory
optimization tends to be much less than the accuracy used during forward
simulation (we tend to take bigger time steps during optimization to avoid
adding too many decision variables), and the default convergence tolerance
from the optimization toolboxes tend to satisfy the dynamic constraints
only to around $10^{-6}$. As a result, if you were to simulate the
optimized control trajectory directly <i>even from the exact initial
conditions</i> used in / obtained from trajectory optimization, you might
find that the state trajectory diverges from your planned trajectory.</p>
<p>There are a number of things we can do about this. It is possible to
evaluate the local stability of the trajectory during the trajectory
optimization, and add a cost or constraint that rewards open-loop stability
(e.g. <elib>Mombaur05+Johnson16</elib>). This can be very effective
(though it does tend to be expensive). But open-loop stability is a quite
restrictive notion. A potentially more generally useful approach is to
design a feedback controller to regulate the system back to the planned
trajectory.</p>
<subsection><h1>Finite-horizon LQR</h1>
<p>We have already developed one approach for <a
href="lqr.html#finite_horizon_nonlinear">trajectory stabilization in the
LQR chapter</a>. This is one of my favorite approaches to trajectory
feedback, because it provides a (numerically) closed-form solution for
the controller, $\bK(t),$ and even comes with a time-varying quadratic
cost-to-go function, $S(t),$ that can be used for Lyapunov analysis.</p>
<p>The basic procedure is to create a time-varying linearization along
the trajectory in the error coordinates: $\bar\bx(t) = \bx(t) -
\bx_0(t)$, $\bar\bu(t) = \bu(t)-\bu_0(t)$, and $\dot{\bar{\bx}}(t) = {\bf A}(t)\bar\bx(t) + {\bf B}(t)\bar\bu(t).$ This linearization uses
all of the same gradients of the dynamics that we have been using in our
trajectory optimization algorithms. Once we have the time-varying
linearization, then we can apply finite-horizon LQR (see the <a
href="lqr.html#finite_horizon_nonlinear">LQR chapter</a>
for the details).</p>
<p>A major virtue of this approach is that we can proceed immediately to
verifying the performance of the closed-loop system under the LQR policy.
Specifically, we can apply the finite-time reachability analysis to
obtain "funnels" that certify a desired notion of invariance --
guaranteeing that trajectories which start near the planned trajectory
will stay near the planned trajectory. Please see the <a
href="lyapunov.html#finite-time">finite-time reachability analysis</a>
section for those details. We will put all of these ideas together in
the perching case-study below.</p>
</subsection>
<subsection><h1>Model-Predictive Control</h1>
<p>The maturity, robustness, and speed of solving trajectory optimization
using convex optimization leads to a beautiful idea: if we can optimize
trajectories quickly enough, then we can use our trajectory optimization
as a feedback policy. The recipe is simple: (1) measure the current
state, (2) optimize a trajectory from the current state, (3) execute the
first action from the optimized trajectory, (4) let the dynamics evolve
for one step and repeat. This recipe is known as <i>model-predictive
control</i> (MPC). </p>
<p>Despite the very computational nature of this controller (there is no
closed-form representation of this policy; it is represented only
implicitly as the solution of the optimization), there is a bounty of
theoretical and algorithmic results on MPC
<elib>Garcia89+Camacho13</elib>. The theoretical and practical aspects of Linear MPC are so well
understood today that it is considered the de-facto generalization of LQR
for controlling a linear system subject to (linear) constraints. And there are a few core ideas that
practitioners should really understand.</p>
<subsubsection><h1>Receding-horizon MPC</h1>
<p>One core idea is the concept of <i>receding-horizon</i> MPC. Since
our trajectory optimization problems are formulated over a
finite-horizon, we can think each optimization as reasoning about the
next $N$ time steps. If our true objective is to optimize the
performance over a horizon longer than $N$ (e.g. over the infinite
horizon), then it is standard to continue solving for an $N$ step
horizon on each evaluation of the controller. In this sense, the total
horizon under consideration continues to move forward in time (e.g. to
recede).</p>
</subsubsection>
<subsubsection><h1>Recursive feasibility</h1>
<p>Some care must be taken in receding-horizon formulations because on
each new step we are introducing new costs and constraints into the
problem (the ones that would have been associated with time $N+1$ on
the previous solve) -- it would be very bad to march forward in time
solving convex optimization problems only to suddenly encounter a
situation where the solver returns "infeasible!". One can design MPC
formulations that guarantee <i>recursive feasibility</i> -- e.g.
guarantee that if a feasible solution is found at time $n$, then the
solver will also find a feasible solution at time $n+1$.</p>
<p>Perhaps the simplest mechanism for guaranteeing recursive
feasibility in an optimization for stabilizing a fixed point, $(\bx^*,
\bu^*)$, is to add a final-value constraint to the receding horizon,
$\bx[N] = \bx^*$. This idea is simple but important. Considering the
trajectories/constraints in absolute time, then on step $k$ of the
algorithm, we are optimizing for $\bx[k], ... , \bx[k+N],$ and $\bu[k],
..., \bu[k+N-1]$; let us say that we have found a feasible solution for
this problem. The danger in receding-horizon control is that when we
shift to the next step ($k+1$) we introduce constraints on the system
at $\bx[k+N+1]$ for the first time. But if our feasible solution in
step $k$ had $\bx[k+N] = \bx^*$, then we know that setting $\bx[k+N+1]
= \bx^*, \bu[k+N] = \bu^*$ is guaranteed to provide a feasible solution
to the new optimization problem in step $k+1$. With feasibility
guaranteed, the solver is free to search for a lower-cost solution
(which may be available now because we've shifted the final-value
constraint further into the future). It is also possible to formulate
MPC problems that guarantee recursive feasibility even in the presence
of modeling errors and disturbances (c.f. <elib>Bemporad99</elib>).</p>
</subsubsection>
<subsubsection><h1>MPC and Lyapunov functions</h1>
<p>Even though the MPC policy is only defined implicitly as the result
of an optimization problem, it is still possible to make Lyapunov
stability arguments. If, in the receding-horizon MPC setting, in
addition to recursive feasibility (of the <i>constraints</i>) we have
the property that the <i>cost</i> added at step $k+N+1$ is less than or
equal to the cost removed corresponding to step $k$. Then the optimal
cost, $J(\bx,k)$, of the trajectory optimization is a Lyapunov
function: $$\forall \bx, k, \quad J^*({\bf f}(\bx, \bu^*), k+1) \le
J^*(\bx, k).$$ Once again, a particularly simple way to achieve this
is to have a final-value constraint at the $N$th step of every
optimization, which gets me to an equilibrium configuration with zero
cost. But, again, there are ways to extend this Lyapunov argument even
in the presence of modeling errors and disturbances (c.f.
<elib>Bemporad99</elib>).</p>
<p>Unlike the nice explicit time-varying quadratic Lyapunov functions
that we get from finite-horizon LQR, this Lyapunov function in general
does not admit a compact representation (this is the object of study in
<a href="#explicit_mpc">explicit MPC</a>), and is more difficult to use
with e.g. sums-of-squares optimization.</p>
</subsubsection>
</subsection>
</section> <!-- end feedback design -->
<section id="perching"><h1>Case Study: A glider that can land on a perch
like a bird</h1>
<p>From 2008 til 2014, my group conducted a series of increasingly
sophisticated investigations
<elib>Cory08+Roberts09+Cory10a+Moore11a+Moore12+Moore14b</elib> which asked
the question: can a fixed-wing UAV land on a perch like a bird?</p>
<figure>
<img width="80%" src="figures/perch-sequence.jpg">
<figcaption>Basic setup for the glider landing on a perch.</figcaption>
</figure>
<p>At the outset, this was a daunting task. When birds land on a perch,
they pitch up and expose their wings to an "angle-of-attack" that far
exceeds the typical flight envelope. Airplanes traditionally work hard to
avoid this regime because it leads to aerodynamic "stall" -- a sudden loss
of lift after the airflow separates from the wing. But this loss of lift is
also accompanied by a significant increase in drag, and birds exploit this
when they rapidly decelerate to land on a perch. Post-stall aerodynamics
are a challenge for control because (1) the aerodynamics are time-varying
(characterized by periodic vortex shedding) and nonlinear, (2) it is much
harder to build accurate models of this flight regime, at least in a wind
tunnel, and (3) stall implies a loss of attached flow on the wing and
therefore on the control surfaces, so a potentially large reduction in
control authority.</p>
<p>We picked the project initially thinking that it would be a nice example
for model-free control (like reinforcement learning -- since the models
were unknown). In the end, however, it turned out to be the project that
really taught me about the power of model-based trajectory optimization and
linear optimal control. By conducting dynamic system identification
experiments in a motion capture environment, we were able to fit both
surprisingly simple models (based on flat-plate theory) to the
dynamics<elib>Cory08</elib>, and also more accurate models using
"neural-network-like" terms to capture the residuals between the model and
the data <elib>Moore14b</elib>. This made model-based control viable, but
the dynamics were still complex -- while trajectory optimization should
work, I was quite dubious about the potential for regulating to those
trajectories with only linear feedback.</p>
<p>I was wrong. Over and over again, I watched time-varying linear quadratic
regulators take highly nontrivial corrective actions -- for instance,
dipping down early in the trajectory to gain kinetic energy or tipping up to
dump energy out of the system -- in order to land on the perch at the final
time. Although the quality of the linear approximation of the dynamics did
degrade the farther that we got from the nominal trajectory, the validity of
the controller dropped off much less quickly (even as the vector field
changed, the direction that the controller needed to push did not). This was
also the thinking that got me initially so interested in understanding the
regions of attraction of linear control on nonlinear systems.</p>
<p>In the end, the experiments were very successful. We started searching
for the "simplest" aircraft that we could build that would capture the
essential control dynamics, reduce complexity, and still accomplish the
task. We ended up building a series of flat-plate foam gliders (no
propellor) with only a single actuator to control the elevator. We added
dihedral to the wings to help the aircraft stay in the longitudinal plane.
The simplicity of these aircraft, plus the fact that they could be
understood through the lens of quite simple models makes them one of my
favorite canonical underactuated systems.</p>
<figure>
<iframe width="560" height="315"
src="https://www.youtube.com/embed/syJF8js9aEU" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<figcaption>The original perching experiments from <elib>Cory08</elib> in
a motion capture arena with a simple rope acting as the perch. The main
videos were shot with high-speed cameras; an entire perching trajectory
takes about .8 seconds.</figcaption>
</figure>
<subsection><h1>The Flat-Plate Glider Model</h1>
<figure>
<img width="50%" src="figures/glider.svg">
<figcaption>The flat-plate glider model. Note that traditionally
aircraft coordinates are chosen so that positive pitch brings the nose
up; but this means that positive z is down (to have a right-handed
system). For consistency, I've chosen to stick with the vehicle
coordinate system that we use throughout the text -- positive z is up,
but positive pitch is down. </figcaption>
</figure>
<p>In our experiments, we found the dynamics of our aircraft
<elib>Cory08</elib> were captured very well by the so-called "flat plate
model" <elib part="Ch.4, Sec.6">Hoerner85</elib>. In flat plate theory
lift and drag forces of a wing are summarized by a single force at the
center-of-pressure of the wing acting normal to the wing, with magnitude:
$$f_n(S, {\bf n}, \bv) = \rho S \sin\alpha |\bv|^2 = -\rho S ({\bf n}
\cdot \bv) |\bv|,$$ where $\rho$ is the density of the air, $S$ is the
area of the wing, $\alpha$ is the angle of attack of the surface, ${\bf
n}$ is the normal vector of the lifting surface, and $\bv$ is the
velocity of the center of pressure relative to the air. This corresponds
to having lift and drag coefficients $$c_{\text{lift}} =
2\sin\alpha\cos\alpha, \quad c_{\text{drag}} = 2\sin^2\alpha.$$ In our
glider model, we summarize all of the aerodynamic forces by contributions
of two lifting surfaces, the wing (including some contribution from the
horizontal fuselage) denoted by subscript $w$ and the elevator denoted by
subscript $e$, with centers at $\bp_w = [x_w, z_w]^T$ and $\bp_e = [x_e,
z_e]^T$ given by the kinematics: $$\bp_w = \bp - l_w\begin{bmatrix}
c_\theta \\ -s_\theta \end{bmatrix},\quad \bp_e = \bp - l_h
\begin{bmatrix} c_\theta \\ -s_\theta \end{bmatrix} - l_e \begin{bmatrix}
c_{\theta+\phi} \\ -s_{\theta+\phi} \end{bmatrix},$$ where the origin of
our vehicle coordinate system, $\bp = [x,z]^T$, is chosen to be the
center of mass. We assume still air, so $\bv = \dot{\bp}$ for both the
wing and the elevator. We assume that the elevator is massless, and the
actuator controls velocity directly (note that this breaks our "control
affine" structure, but is more realistic for the tiny hobby servos we
were dealing with). This gives us a total of 7 state variables $\bx =
[x, z, \theta, \phi, \dot{x}, \dot{z}, \dot\theta]^T$ and one control
input $\bu = \dot\phi.$ The resulting equations of motion are:
\begin{gather*} {\bf n}_w = \begin{bmatrix} s_\theta \\ c_\theta
\end{bmatrix}, \quad {\bf n}_e = \begin{bmatrix} s_{\theta+\phi} \\
c_{\theta+\phi} \end{bmatrix}, \\ f_w = f_n(S_w, {\bf n}_w, \dot\bp_w),
\quad f_e = f_n(S_e, {\bf n}_e, \dot\bp_e), \\ \ddot{x} = \frac{1}{m}
\left(f_w s_\theta + f_e s_{\theta+\phi} \right), \\ \ddot{z} =
\frac{1}{m} \left(f_w c_\theta + f_e c_{\theta+\phi} \right) - g, \\
\ddot\theta = \frac{1}{I} \left( l_w f_w + (l_h c_\phi + l_e) f_e
\right). \end{gather*}
<!-- Note: the inertia term looks suspicious, but is the result of a lot of cancellations in the cross product. See, for instance, eq. 3.9 in Moore14b -->
</p>
</subsection>