class: middle, center, title-slide
Lecture 11: Theory of deep learning
Prof. Gilles Louppe
[email protected]
???
R: move out the GP part into a new lecture.
R: cover neural tangents there https://rajatvd.github.io/NTK/
R: science of dl https://people.csail.mit.edu/madry/6.883/
mysteries of deep learning -> better generalization than they should (over-param) -> lottery ticket -> adversarial examples http://introtodeeplearning.com/materials/2019_6S191_L6.pdf
R: check generalization from https://m2dsupsdlclass.github.io/lectures-labs/slides/08_expressivity_optimization_generalization/index.html#87
.bold[Theorem.] (Cybenko 1989; Hornik et al, 1991) Let $\sigma(\cdot)$ be a bounded, non-constant continuous function. Let $I_p$ denote the $p$-dimensional hypercube, and $C(I_p)$ denote the space of continuous functions on $I_p$. Given any $f \in C(I_p)$ and $\epsilon > 0$, there exist $q > 0$ and $v_i, w_i, b_i$, $i=1, \ldots, q$, such that
$$F(x) = \sum_{i \leq q} v_i \sigma(w_i^T x + b_i)$$
satisfies $\sup_{x \in I_p} |f(x) - F(x)| < \epsilon$.
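A minimal numerical sketch of this construction: fix steep sigmoids with transition points spread over $[0,1]$ and solve for the output weights $v_i$ by least squares (all sizes and the target function below are illustrative choices).

```python
import numpy as np

# Sketch: approximate f(x) = sin(2*pi*x) on [0, 1] by
# F(x) = sum_i v_i * sigmoid(w_i * x + b_i), as in the theorem.
# The w_i, b_i are fixed (steep sigmoids on a grid); only v_i are fit.
q = 200                                     # number of hidden units
k = 100.0                                   # steepness of each sigmoid
x = np.linspace(0, 1, 1000)[:, None]        # evaluation grid, shape (1000, 1)
f = np.sin(2 * np.pi * x)                   # target function

c = np.linspace(0, 1, q)[None, :]           # transition points, shape (1, q)
w = np.full((1, q), k)                      # input weights w_i = k
b = -k * c                                  # biases b_i = -k * c_i

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
H = sigmoid(x * w + b)                      # hidden activations, shape (1000, q)
v, *_ = np.linalg.lstsq(H, f, rcond=None)   # output weights v_i

F = H @ v
print("max |f(x) - F(x)| =", np.abs(f - F).max())   # small for large enough q
```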
class: middle
The universal approximation theorem
- guarantees that even a single hidden-layer network can represent any classification problem in which the boundary is locally linear (smooth);
- does not inform about good/bad architectures, nor how they relate to the optimization procedure;
- generalizes to any non-polynomial (possibly unbounded) activation function, including the ReLU (Leshno et al, 1993).
class: middle
.bold[Theorem] (Barron, 1992) The mean integrated square error between the estimated network $\hat{F}$ and the target function $f$ is bounded by
$$O\left(\frac{C_f^2}{q}\right) + O\left(\frac{qp}{N}\log N\right),$$
where $N$ is the number of training points, $q$ is the number of neurons, $p$ is the input dimension, and $C_f$ measures the global smoothness of $f$.
- Combines approximation and estimation errors.
- Provided enough data, it guarantees that adding more neurons will result in a better approximation.
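For instance, balancing the two terms (ignoring constants) suggests taking
$$q \propto C_f \sqrt{\frac{N}{p \log N}},$$
for which the bound is of order $O\left(C_f \sqrt{\frac{p \log N}{N}}\right)$: the achievable error shrinks with the training set size, and the input dimension enters only through a $\sqrt{p}$ factor.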
class: middle
Let us consider the 1-layer MLP
$$f(x) = \sum_i w_i \text{ReLU}(x + b_i).$$
This model can approximate any smooth 1D function, provided enough hidden units.
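A minimal training sketch of this model (PyTorch assumed; the target function, width, and optimizer settings are arbitrary illustrative choices):

```python
import torch

# Sketch: fit f(x) = sum_i w_i * ReLU(x + b_i) to a smooth 1D target by gradient descent.
torch.manual_seed(0)
q = 64                                         # number of hidden units
x = torch.linspace(-3, 3, 512).unsqueeze(1)    # inputs, shape (512, 1)
y = torch.sin(x)                               # smooth target to approximate

w = torch.zeros(q, requires_grad=True)         # output weights w_i
b = torch.linspace(-3, 3, q).requires_grad_()  # offsets b_i, spread over the domain

optimizer = torch.optim.Adam([w, b], lr=1e-2)
for step in range(2000):
    f = torch.relu(x + b) @ w.unsqueeze(1)     # (512, q) @ (q, 1) -> (512, 1)
    loss = torch.mean((f - y) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(loss.item())                             # MSE shrinks as q grows
```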
.bold[Theorem] (Montúfar et al, 2014) A rectifier neural network with $p$ input units and $L$ hidden layers of width $q \geq p$ can compute functions that have
$$\Omega\left(\left(\frac{q}{p}\right)^{(L-1)p} q^p\right)$$
linear regions.
- That is, the number of linear regions of deep models grows exponentially in $L$ and polynomially in $q$.
- Even for small values of $L$ and $q$, deep rectifier models are able to produce substantially more linear regions than shallow rectifier models.
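As a rough empirical illustration (a sketch, not the theorem's proof): the linear regions crossed along a 1D slice of input space can be counted by tracking distinct ReLU activation patterns on a fine grid. Architecture sizes below are arbitrary choices.

```python
import torch

# Sketch: count linear regions of a random rectifier network along a 1D input line.
torch.manual_seed(0)
p, q, L = 2, 16, 3                                   # input dim, width, depth

layers = []
dims = [p] + [q] * L
for d_in, d_out in zip(dims[:-1], dims[1:]):
    layers += [torch.nn.Linear(d_in, d_out), torch.nn.ReLU()]
net = torch.nn.Sequential(*layers)

# Inputs along a random line t -> a + t * b, densely sampled.
a, b = torch.randn(p), torch.randn(p)
t = torch.linspace(-10, 10, 100_000).unsqueeze(1)
x = a + t * b                                        # shape (100000, p)

# Record the on/off pattern of every ReLU for every input.
patterns = []
h = x
for layer in net:
    h = layer(h)
    if isinstance(layer, torch.nn.ReLU):
        patterns.append(h > 0)
pattern = torch.cat(patterns, dim=1)                 # shape (100000, L * q)

# Each maximal run of identical activation patterns along the line is one linear region.
changes = (pattern[1:] != pattern[:-1]).any(dim=1).sum().item()
print("linear regions along the line:", changes + 1)
```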