As in #72

Use backprop to chart dependencies. Your deep learning code will often contain complicated, vectorized, and broadcasted operations. A relatively common bug I’ve come across a few times is that people get this wrong (e.g. they use a view instead of transpose/permute somewhere) and inadvertently mix information across the batch dimension. It is a depressing fact that your network will typically still train okay, because it will learn to ignore data from the other examples. One way to debug this (and other related problems) is to set the loss to be something trivial, like the sum of all outputs of example i, run the backward pass all the way to the input, and ensure that you get a non-zero gradient only on the i-th input. The same strategy can be used to e.g. ensure that your autoregressive model at time t only depends on 1..t-1. More generally, gradients give you information about what depends on what in your network, which can be useful for debugging.
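For reference, here is a minimal PyTorch sketch of both checks described above. The toy MLP, the tensor shapes, the example index, and the GRU standing in for an autoregressive model are all illustrative assumptions, not anything prescribed by the quoted text; the check itself carries over to any model with a batch (or time) dimension.

```python
import torch
import torch.nn as nn

# --- Batch-mixing check --------------------------------------------------
# Hypothetical toy model; the check applies to any (batch, ...) -> (batch, ...) map.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

batch_size, i = 4, 2  # probe the i-th example
x = torch.randn(batch_size, 16, requires_grad=True)

# Trivial loss: the sum of all outputs of example i only.
model(x)[i].sum().backward()

# The gradient must be non-zero on the i-th input and exactly zero on every
# other example; a non-zero gradient elsewhere means information is leaking
# across the batch dimension.
grad_per_example = x.grad.abs().sum(dim=1)
assert grad_per_example[i] > 0
assert (grad_per_example[torch.arange(batch_size) != i] == 0).all(), \
    "information is mixing across the batch dimension"

# --- Causality check for an autoregressive model -------------------------
# Same trick along the time axis: sum the output at a single step t and
# verify the gradient vanishes for every later input. A GRU (causal by
# construction) is used here purely as a stand-in.
rnn = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
t = 5
seq = torch.randn(4, 10, 16, requires_grad=True)

out, _ = rnn(seq)          # out: (batch, time, hidden)
out[:, t].sum().backward()

grad_per_step = seq.grad.abs().sum(dim=(0, 2))  # one value per time step
assert (grad_per_step[t + 1:] == 0).all(), "output at t depends on the future"
```

The zero checks are exact (not approximate) because a genuinely absent dependency means no path exists in the autograd graph, so the gradient is exactly 0.0 rather than merely small.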
A step or a network modifier. To be discussed