-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interested in contributing #83
Comments
Thanks for reaching out! But unfortunately, I'm going to recommend you not do this project yet. You're right that the SVM support is very limited right now. The basic problem is that the linear algebra support in Haskell is very poor right now. It would be possible to do what you propose with the current linear algebra systems, but I would recommend against it. The resulting code will probably be pretty ugly, slow, hard to maintain, and need to be rewritten anyways when better linear algebra support comes along. So my recommendation would be to not use Haskell for this task. If that didn't scare you away, send me a link to the paper you're thinking about implementing. I can tell you what parts of the linear algebra systems would need reworking for your problem and maybe you could help get those up to speed first. |
Why is the linear-algebra-support pretty poor and slow in Haskell atm? Things WOULD be pretty simple as it is just basic LA. Just some scalar-products, vector-addition and an ocassional derivation of I did a bit on research beside my lecture notes and came across https://mitp-web2.mit.edu/sites/default/files/titles/content/9780262026253_sch_0001.pdf with 3 different approaches to the problem. Basically it is: find argmin(a_i), given sum_ij a_i a_j y_i y_j k(x_i, x_j) - where 0 <= a <= C, y_i labeles in (-1,1), x_i datapoints and k a valid kernel. My first intuition was to just plug that formula into a conjugate-gradient-descent (but ghc threw type errors without end at me.. -.- see: https://github.com/Drezil/HLearn/blob/broken_svm/src/HLearn/Classifiers/Linear.hs#L129-L205 - and yeah.. i already know some things wrong with that, but i had no time to work on that up to now.) I would now try to implement something like "Algorithm 1.3" linked in the paper above. All steps that are commented are explained in detail in the case-study on LibSVM. tho other two algorithms (1.1 & 1.2) could be implemented as well for comparison. Additionally one could think of abstracting over some things in this setting. Especially the difference between a C-SVM and a CVM is the algorithm to find the alpha_i-coefficients (C-SVM does this like LibSVM does that, CVM modifies the kernel to fit the CCMEB-dual problem, makes the CCMEB-primal problem and solves that with a core-set in O(n) and transfers that information the pipeline back resulting in an O(n)-algorithm instead of an O(n³)-Algorithm). Even CVM were proposed around 2009 i could not find a proper (non-matlab) implementation of that algorithm and it would really be helpful to have one of them - no matter the performance for now. You can hit me up on IRC or Jabber as well if you want to discuss details.. |
Subhask provides a decent story for linear algebra, but it's still not (at least what I consider) great. I tried implementing some kernel learners a while back. And while I got some things working, it was very slow. The main reason is that mutable vector operations are still very slow in subhask (as in O(n) time to update a single entry in a vector, so rediculously slow), and all the kernelized algorithms I tried required mutable operations. Fixing this is not a hard task, but it's not something I've had time for yet. If you want to give it a go, I'd be happy to help you. (I don't have too much free time, but I'll do what I can.) I just want to make sure you're fully aware that everything you're working with is super experimental, probably won't work quite like you expect, and will require some deep digging into subhask's internals. |
For a Vector x most kernels depend directly on a variation of the scalar-product with some operations in them. i.e. gaussian-kernels are just In the case of an SVM (as implemented in libsvm) you have n d-dimensional inputs and an n-dimensional weight. here you only update 2 components of the weight at the same time - but you still have to compute n kernels with the complexity O(d) - so in total you go something like O(n) for the traversal + 2 O(nd) for updates instead of O(1) + 2O(nd). Not ideal, but not game-breaking. On the other side the "slower" algorithm 1.1 in the paper above relies on changing everything in the weight-vector anyway. Back to the main question: Why are they so slow? For BVectors i would understand that problem (if you have to start chasing pointers), but for UVector and SVector that should just come with no additional costs. Or is it just that there is currently just no operation supporting that mode? That should be somewhere in IxContainers or so. I would suppose something like All in all i would like to give it a go, because i regularly will have to implement such papers in my further career in university and working with matlab/octave or python drives me nuts... so i would be willing to dump quite some time in this project if it makes my life easier in the long run (i.e. next 5 years). |
If I understand you correctly, you've forgot to include the number of iterations t. You calculate the kernels in a preprocessing step taking time O(n^2), then the iterations will take time O(nt) without mutability, which is a huge penalty. Anyways, I've also added quite a bit more support for mutability than existed when I was playing with kernel algorithms a while back. In subhask, the goal is to unify all mutable operations into a single system defined by https://github.com/mikeizbicki/subhask/blob/master/src/SubHask/Mutable.hs . This mechanism wraps around the many ways mutabilty is handled in standard Haskell ( All algebraic operations have both mutable and immutable versions. See for example Vectors currently fully support fast mutable operations. There's actually no extra overhead for Finally, subhask is in the process of being upgrade to GHC 8. There's a working branch over there, but hlearn hasn't been updated yet. I'm not going to get around to this for some time probably. If you're looking for a good place to start, getting hlearn to compile with the GHC 8 branch of subhask would be awesome!
That was very similar to my motivation :) I've found that it's taken a lot more work than I expected and has distracted me from other projects I wish I could have done. On the plus side, I've learned a lot about the internals of linear algebra libraries and have a very different perspective on theory than anyone else I've encountered. Overall, I still think it was a good decision personally for my longterm goals, but I definitely wouldn't recommend it to everyone. Hopefully things will be a bit easier for you at this point. Are you working on your phd right now? |
I am interested in contributing and i have decent Haskell-Skills .. but this stuff is still a bit above my head.
I would like to have an implementation of the CVM (Core-Vector-Machine - basically a O(n)-SVM with epsilon-error by dualizing the problem, converting it to the dualized CCMEB-problem (center-constrained minimum enclosing ball), converting it to the primary CCMEB and then using an approximate O(1) solver for the CCMEB via a core-set) and some examples for the use of SVM with different Kernels.
I would like to toy around a bit and try to use it on my current problem (including whitening, K-Means, GMLVQ or similar clustering for preprocessing and then an SVM or CVM with custom Kernel for training & classifying) and fix/develop stuff as i go along.
At first i would like to start to flesh out some examples on how to use a SVM with different Kernels and lay the groundwork for that.
The thing in https://github.com/mikeizbicki/HLearn/blob/master/src/HLearn/Classifiers/Linear.hs mentions SVM, but an SVM is no GLM (in the statistical sense).
I think it still fits into Linear, because it is just linear classification in some scalar-space given by the kernel (so just phi(w)^T phi(x) instead of w^T x).
I had lectures on that topic (SVM, CVM, ...) and know all the math and pseudo-code, but i am not sure if i can transfer that on such an abstract level to haskell and meybe need help doing that.
Are you interested in working together?
The text was updated successfully, but these errors were encountered: