Pure Browser Implementation #2
Comments
Another issue with Excel: For an enhanced version of the Embeddings lesson plan, I wanted to use SVD on a co-occurrence matrix to demonstrate primitive embeddings. But Excel can't run SVD.
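For readers following along, here is a minimal pure-Python sketch of what "SVD on a co-occurrence matrix" gives you. It is an illustration only: the toy counts, the word list, and the helper name `top_singular_triple` are assumptions, and it recovers just the leading singular component by power iteration rather than running a full SVD routine.

```python
import math

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def top_singular_triple(a, iterations=200):
    """Leading singular value and vectors of `a`, via power iteration on A^T A."""
    at = transpose(a)
    v = [1.0] * len(a[0])
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v

# Hypothetical word-by-word co-occurrence counts (not from the lesson plan).
words = ["cat", "dog", "car"]
cooc = [
    [0.0, 4.0, 1.0],
    [4.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]

sigma, u, v = top_singular_triple(cooc)
# One-dimensional "embedding": each word's coordinate along the top component.
embedding = {word: sigma * coord for word, coord in zip(words, u)}
```

In this toy matrix "cat" and "dog" have identical co-occurrence patterns, so they land on the same coordinate, which is the kind of effect the lesson would presumably want to show.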
Self-host OnlyOffice or similar? https://www.onlyoffice.com/spreadsheet-editor.aspx Should be Excel-compatible. …Or import to Google Sheets?
Google Sheets won't work. It's too big to fit. I tried multiple ways, even using clasp. An alternative in the meantime might be OpenOffice Calc for people who don't have Excel, but I haven't tried it yet.
Hi @ianand, if you think IronCalc mentioned above could be what you are looking for I would be happy to prioritize work to make it happen or tell you if there are major roadblocks.
@nhatcher-frequenz thanks for reaching out. IronCalc looks cool and promising. How can I help? It doesn't look like the version in the playground can read/write Excel files yet?
Hi @ianand, indeed you cannot upload an Excel workbook in the playground. Our best chance right now is to use a tailored script or use a TUI like https://github.com/ironcalc/TironCalc. I think the next steps are for me to identify what is possible and what is not. I will get back to you in the next couple of weeks.
Sounds good. Let me know how it goes.
@nhatcher IronCalc says it uses Rust, and WASM for web. Per my understanding, WASM has a hard cap at 4GB of RAM due to being a 32-bit format, and can even run into problems long before due to issues with freeing, address space, and fragmentation. JS reportedly has similar usage limits, visible in the Chromium console or Firefox The |
Hi @will-ca, my own experiments seem to confirm what you are saying. When I run the model on bare metal, my OS reports ~12 GB. I have not given up just yet, but running in the browser seems tough.
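A quick back-of-the-envelope check of the numbers in the last two comments, as a sketch. The only assumption is that the "~12 GB" figure is treated as 12 GiB for round numbers.

```python
WASM32_ADDRESS_BITS = 32
GIB = 2 ** 30

# A 32-bit address space can address at most 2**32 bytes.
wasm32_limit_bytes = 2 ** WASM32_ADDRESS_BITS

# "~12 GB" as reported by the OS above, treated as GiB (assumption).
reported_usage_bytes = 12 * GIB

# How many times over the 32-bit ceiling the model is.
overshoot = reported_usage_bytes / wasm32_limit_bytes

print(f"wasm32 ceiling: {wasm32_limit_bytes // GIB} GiB")
print(f"model needs about {overshoot:.0f}x that")
```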
@will-ca thanks for the heads up and @nhatcher thanks for smoke testing. There is evidently a wasm64 spec that's still experimental as of now but can be enabled via the Chrome flags WebAssembly/memory64#36
Don’t know why I didn’t think of this earlier, since I considered this approach for Google Sheets a while back. The model is very modular, so it should be easy to split into multiple separate wasm instances (in separate workers? Or separate tabs?) that should each be able to fit into 4 GB. Feeling like this should be surmountable. A first test in IronCalc would be to just extract one of the layer tabs from the Excel sheet, along with its associated weight matrices, and check if we can get a single layer of the transformer to run. That’s the first MVP/POC. But we’ll need those compatible formulas. @nhatcher thoughts?
Hey @ianand, that might work actually. If you are able to split the workbook into 4 different ones using external references (like |
That would be very easy to do. As an aside, maybe an interesting variant would be to have these separate tabs running on separate machines.
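To make the splitting idea concrete, here is a rough sketch of packing consecutive parts of the model into groups that each stay under a memory cap. Everything numeric here is hypothetical: the per-tab sizes, the `embeddings` tab name, and the 3500 MB cap (chosen to leave headroom under 4 GB) are made up for illustration and are not measurements of the real workbook.

```python
def split_into_groups(components, cap_mb):
    """Pack consecutive (name, size_mb) components into groups within cap_mb."""
    groups, current, total = [], [], 0
    for name, size_mb in components:
        if size_mb > cap_mb:
            raise ValueError(f"{name} alone exceeds the cap of {cap_mb} MB")
        # Start a new group when adding this component would exceed the cap.
        if current and total + size_mb > cap_mb:
            groups.append(current)
            current, total = [], 0
        current.append(name)
        total += size_mb
    if current:
        groups.append(current)
    return groups

# Hypothetical sizes: one embeddings tab plus twelve Block_N tabs.
components = [("embeddings", 1200)] + [(f"Block_{i}", 900) for i in range(12)]

groups = split_into_groups(components, cap_mb=3500)
for index, group in enumerate(groups):
    print(f"instance {index}: {', '.join(group)}")
```

Keeping the groups consecutive matters because each layer feeds the next, so every instance would only need to pass its output to one neighbour.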
I think there are a couple of tricks up my sleeve and I might get back to you in a couple of months with some realistic data, once those functions are implemented.
Good points @nhatcher. I wonder if I can make the job easier by "meeting in the middle", i.e. I modify the spreadsheet implementation to more closely match what's currently available in IronCalc. Specifically, I could try re-implementing a single layer of GPT2 (i.e. the Block_0 tab in the current sheet) without using Lambdas, etc. Not sure how hard that is, though. Can I assume https://github.com/ironcalc/IronCalc/blob/e9fc41541b6e60d66430db68802cf9bdecf378c0/base/src/functions/mod.rs#L70 is the list of currently implemented functions? BTW, does IronCalc support R1C1-style references?
Yes, the list of supported functions will also be at the wiki:
IronCalc does all its computations with the R1C1 style internally. The A1 style is only used for display.
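For anyone unfamiliar with the two notations, a small sketch of how an A1-style reference maps to absolute R1C1. This is a generic illustration with a hypothetical helper name, `a1_to_r1c1`, and is not taken from IronCalc's code; it handles only plain single-cell references with no `$`, sheet names, or ranges.

```python
import re

def a1_to_r1c1(ref):
    """Convert a plain A1 reference such as 'B3' to absolute R1C1 ('R3C2')."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a plain A1 reference: {ref!r}")
    letters, digits = match.groups()
    # Column letters are a base-26 number with digits A=1 .. Z=26.
    column = 0
    for ch in letters.upper():
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return f"R{int(digits)}C{column}"

print(a1_to_r1c1("B3"))
print(a1_to_r1c1("AA10"))
```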
I think we can have this conversation once IronCalc is a bit more developed. Once we hit MVP and have a page where you can try your formulas more easily by uploading and downloading Excel workbooks, I could ask you to simplify your model a bit. A couple of things I have learned this week: I managed to compile it to wasm64, and it doesn't seem the model will ever run in the browser. A 12 GB webpage seems to be too much even for modern browsers. But IronCalc will open it, make changes and evaluate it just fine. You will need to use either a not-yet-developed desktop version or the TUI, TironCalc. That by itself might be useful for your purposes: millions of people don't have access to an Excel license. The only question mark is how long IronCalc would take to evaluate the model once the formulas are implemented. It might be more time than you are comfortable with (over 5 minutes, maybe more?).
Thanks.
Great.
Ok.
Thanks for trying. What is the failure mode? Very slow? Out of memory? I still wonder if splitting across layers might help.
Good to know about the TUI. It's not accessible as something in a browser, but that's better than nothing, especially for those without an Excel license, as you point out. I'm looking forward to when you think we can do an MVP with a single layer of the model.
It's eventually killed on my laptop by the OOM killer. But it is difficult to say at this point whether the problem is solvable: it could be an error on the wasm64 side, or it could be that we just can't parse that data structure in a browser. Another difficulty is that I have to "mock" parts of the workbook to be able to parse it without error. I think it is essentially correct, but there are many "what-ifs". At this point, our best chance is to get IronCalc up to speed. Anyway, as soon as I get some indication that simplifying the workbook, or some other solution, might work, I will get back to you.
Hey folks, excited to announce that I recently released a pure browser version of Spreadsheets-are-all-you-need at https://spreadsheets-are-all-you-need.ai/gpt2/ This isn't quite a full spreadsheet, but more of a "spreadsheet-meets-Python-notebook" interface. Check it out and let me know what you think. Relatedly, I did manage to find a way to get a single layer of the model to fit into a Google Sheet, but given there are 11 more layers plus the token embedding matrix, there's no way to fit it into a single Google Sheet at this time.
Implement an in-browser spreadsheet to run GPT2. Would be great to have a pure browser implementation while maintaining the spreadsheet interface.
Why:
Additional notes: