Implemented Householder reflections with modifications. #8
… our optimizers - including tests. Tests for Riemannian gradients still not completed.
Also started to implement retractions now. I first tried the retraction which can be found in file |
I implemented a custom retraction for the Stiefel manifold (true geodesic). A problem with the performance of the Householder retractions became apparent. Also: there may be a problem with the term |
… changed bits of the Householder algorithm. The main routines are now using LinearAlgebra.qr for the moment, however. This might become an issue since there is no Julia implementation of symplectic Householder.
…type LayerCache is new.
…specific random number generator at the initialization step. A new 'TrivialInitRNG' is used for initializing the optimizer caches.
…re now more or less in the format they should be in.
A "problem" that has emerged is that
in |
Also: I implemented a custom RNG |
…odels on CPU now.
…ge uses fewer allocations than Lux, but seems to be slower nonetheless.
… the manifold optimizers for now).
… named tuples to have more flexibility, especially when working with 'nested structures' like the transformer.
…he correct preprocessing now.
…changed the corresponding test.
…for all optimizers!
Householder reflections are needed because Gram-Schmidt (especially the symplectic version) is numerically very unstable. In addition, Householder reflections are much cheaper to compute and should in general scale with $\mathcal{O}(N^2)$ as opposed to $\mathcal{O}(N^3)$ (but this has to be investigated further).
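The stability point can be illustrated with a small experiment. The following is only a sketch for the ordinary (non-symplectic) case: it compares the loss of orthogonality of a textbook classical Gram-Schmidt against the Householder-based `qr` from `LinearAlgebra` on an ill-conditioned Hilbert matrix. The helper names `classical_gram_schmidt`, `householder_q`, `orthogonality_error` and `hilbert` are made up for this illustration and are not part of the PR.

```julia
using LinearAlgebra

# Classical Gram-Schmidt: orthogonalise the columns of A one after another.
function classical_gram_schmidt(A::AbstractMatrix{T}) where {T<:AbstractFloat}
    N, n = size(A)
    Q = zeros(T, N, n)
    for j in 1:n
        v = A[:, j]
        for i in 1:j-1
            # Classical variant: project the *original* column onto q_i.
            v -= dot(Q[:, i], A[:, j]) * Q[:, i]
        end
        Q[:, j] = v / norm(v)
    end
    return Q
end

# Householder-based QR from LinearAlgebra, with Q materialised as a dense matrix.
householder_q(A) = Matrix(qr(A).Q)

# Deviation from orthonormality, measured as the Frobenius norm of QᵀQ - I.
orthogonality_error(Q) = norm(Q' * Q - I)

# Hilbert matrix, used here as an ill-conditioned test case.
hilbert(n) = [1 / (i + j - 1) for i in 1:n, j in 1:n]

A = hilbert(10)
err_gs = orthogonality_error(classical_gram_schmidt(A))
err_hh = orthogonality_error(householder_q(A))

println("Gram-Schmidt orthogonality error: ", err_gs)
println("Householder  orthogonality error: ", err_hh)
```

On a matrix this ill-conditioned, the Gram-Schmidt columns are expected to drift far from orthonormal, while the Householder result should stay close to machine precision. The experiment says nothing about the symplectic variants or about runtime scaling.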
The most important new routines are in `src/optimizers/householder.jl`. `HouseDecom` is a struct that efficiently and implicitly stores the $Q$ and $R$ matrices (same as in the `qr` routine in `LinearAlgebra.jl`). Depending on whether `transpose` is set to `false` or `true`, `(HD::HouseDecom)(X)` will compute $QX$ or $Q^TX$.

Tests for these routines are in `test/householder.jl`. These routines (application of $Q$ and $Q^T$, not the factorization $A=QR$) are what implied the need to re-implement the Householder reflections!

`src/optimizers/lie_alg_lifts.jl` computes the lift for the Stiefel manifold and maps to the global tangent space representation, i.e. `StiefelLieAlgHorMatrix` in the file `src/arrays/stiefel_lie_alg_hor.jl`.

The function `apply_projection` is the canonical map that does … So far one test for this has been implemented in `tests/lie_alg_lifts.jl`.
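To make the idea of storing $Q$ implicitly and applying $Q$ or $Q^T$ on demand concrete, here is a minimal sketch. It is not the code from `src/optimizers/householder.jl`: the names `ReflectorStack`, `householder_decompose` and `apply_reflector!` are hypothetical, and the struct layout is an assumption. It stores one unit reflector vector $v_k$ per column, so that $H_k = I - 2 v_k v_k^T$ and $Q = H_1 \cdots H_n$, and never forms $Q$ as a dense matrix.

```julia
using LinearAlgebra

# Hypothetical stand-in for an implicit QR store (not the PR's `HouseDecom`):
# the reflector vectors v_k, the n×n upper-triangular R, and a flag that
# selects whether calling the object applies Qᵀ (true) or Q (false).
struct ReflectorStack{T}
    vs::Vector{Vector{T}}
    R::Matrix{T}
    transpose::Bool
end

# Householder QR of an N×n matrix with N ≥ n.
function householder_decompose(A::Matrix{T}; transpose::Bool=false) where {T<:AbstractFloat}
    N, n = size(A)
    @assert N ≥ n
    R = copy(A)
    vs = Vector{Vector{T}}()
    for k in 1:n
        x = R[k:N, k]
        v = copy(x)
        # Add ±‖x‖ to the first entry; the sign is chosen to avoid cancellation.
        v[1] += (x[1] ≥ 0 ? one(T) : -one(T)) * norm(x)
        nv = norm(v)
        nv > 0 && (v ./= nv)
        # Apply H_k to the trailing block: this zeroes column k below the diagonal.
        R[k:N, k:n] .-= 2 .* v * (v' * R[k:N, k:n])
        push!(vs, v)
    end
    return ReflectorStack{T}(vs, triu(R[1:n, 1:n]), transpose)
end

# Apply H_k = I - 2 v vᵀ to rows k:end of X, in place.
function apply_reflector!(X::AbstractMatrix, v::AbstractVector, k::Integer)
    block = view(X, k:size(X, 1), :)
    block .-= 2 .* v * (v' * block)
    return X
end

# Qᵀ X = H_n ⋯ H_1 X applies H_1 first; Q X = H_1 ⋯ H_n X applies H_n first.
function (S::ReflectorStack{T})(X::AbstractMatrix) where {T}
    Y = Matrix{T}(X)              # work on a copy, leave X untouched
    ks = 1:length(S.vs)
    for k in (S.transpose ? ks : reverse(ks))
        apply_reflector!(Y, S.vs[k], k)
    end
    return Y
end

A  = [2.0 -1.0; 1.0 3.0; 0.5 0.0; -1.0 1.5]      # 4×2 example matrix
QT = householder_decompose(A; transpose=true)    # applies Qᵀ
Qf = householder_decompose(A)                    # applies Q

QtA = QT(A)                              # top 2×2 block is R, the rest is ≈ 0
Qdense = Qf(Matrix{Float64}(I, 4, 4))    # dense Q, built only for checking
```

Each reflector application touches every entry of the block once, so applying all $n$ reflectors to an $N \times m$ matrix costs on the order of $N n m$ operations in this sketch. Whether this matches the scaling of the PR's routines would have to be checked against the actual implementation.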