This repository has been archived by the owner on Aug 15, 2019. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 951
"Low-Level" QR Implementation #1366
Open
DirkToewe
wants to merge
18
commits into
tensorflow:master
Choose a base branch
from
DirkToewe:linalg_qr
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The QR Decomposition implementation uses Givens Rotations for elimination. There's a few reasons, why the Givens Rotation was chosen over the Householder method: * During a quick trial, there seemed to be no practical performance difference in JS between Householder and Givens. JS itself seems to introduce enough overhead to make the fewe extra FLOPS irrelevant. * Givens Method is easier to implement numerically stable (e.g. there is no underflow-safe norm necessary). * Givens Method is easier to backpropagate. * Givens Method ensure det(Q) = 1 * Givens Rotations seem to be smoother when it comes to Pertubation, resulting in smoother gradients. * Givens Method is easier to parallelize As long as R is non-singular there is always a way to produce a canonical representation from a QR Decompostion (e.g. make the diagonal of R all positive). That also means that there is no compatibilty issue with whichever QR implementation Tensorflow for Python/C/C++ chooses. Both `bandPart` and `triangularSolve` were necessary to implement the symbolic backpropagation of the QR Decomposition.
…to linalg_qr I have no idea where why that branch was ahead of the local one...
…to linalg_qr Merging (Merge from upstream into origin) into local.
Okay so the PR is in a reviewable state (I hope). No more commits will be made until further changes are requested and/or suggested. This is a huge PR, so I understand that review will take some time. |
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
As discussed, this is the first split of PR#1356: A faster QR Decomposition using a direct implementation of the Givens method, including support of Backpropagation.
bandPart
andtriangularSolve
are a prerequesites for symbolic backpropagation of the QR Decomposition. Since both of the methods are frequently used in symbolic backpropagation, I believe they both deserve a backend implementation.matrixTriangularSolve
andmatrixBandParts
are kernels in the Python/C/C++ Tensorflow Implentation as well.bandPart
is currently implemented purely using TFJS methods. In a quick performance trial, bandPart was only 8x slower without a backend implementation. Since it is only anO(m*n)
operation in the first place, it's not too worrysome. The memory overhead however may be more of an issue, depending on how broadcasting is implemented.triangularSolve
will allow solving linear equations systems in TFJS, which is a requested feature.All linting errors are now fixed, but I'm afraid the code might have become (even) less readable. Suggestions as to how to improve this are welcome. In my defense: There was method to my code-formatting-madness. Low level linear algebra is always hard to read (at least to me). So in NDJS I tried my best to format the code in a way that things that belong together are aligned, reducing distractions and making bugs easier to spot.
As I said before, it should not be too hard to implement
qr
andtriangularSolve
in WebGL1. In order to do that, I will however need some guidance and introduction to the TFJS WebGL backend.The randomized gradients test fails by a small margin roughly every 1 in
10,000100,000 tests. With the old implementation (after fixing some reshape and disposal issues), it fails roughly1 time in 1,00076 times in 50,000, sometimes with a large margin (Possible explaination: The Householder implementation ignores sign changes in the input causing abrupt changes in the gradients). Sadly,tf.randomUniform
does not seem to have a seed parameter, which would make the tests reproducible.If there are any questions about implementation details, I'm more than happy to answer them.
Quick Overview
The
qr
implementation is two-fold: Whenever the resultingR
is a square matrix, the economic QR DecompositionqrEcoDecompKernel()
is computed. For the gradients, the same symbolic backpropagation as in Python/C/C++ Tensorflow is used.In all other cases, the full QR Decomposition is computed using
qrFullDecompKernel()
. Backpropagation is computed viaqrFullBackpropKernel()
using the Givens rotations'sin
andcos
values that were recorded byqrFullDecompKernel()
. Higher order derivatives are not (yet) supported/implemented.Why Givens Rotations?
The Householder method is the de-facto standard for QR Decomposition, so I feel like I have to explain why I chose Givens Rotations over it:
between Givens and Householder in JS. My guess is that the JS overhead outweighs the difference
in FLOPs
With Givens Rotations, I was able to implement an economic QR Decomposition that requires onlyO(m*n)
memory instead ofO(m²)
. For Householder, I could not find such an implementation.is required).
det(Q) = +1
, which is somewhat more canonical.c
to+‖c‖e
or-‖c‖e
). This should result in smoother gradients.O(m²n)
all the way down toO(m*n)
for upper triangular inputs.For repository owners only:
Please remember to apply all applicable tags to your pull request.
Tags: FEATURE, BREAKING, BUG, PERF, DEV, DOC, SECURITY
For more info see: https://github.com/tensorflow/tfjs/blob/master/DEVELOPMENT.md
This change is