Pure Browser Implementation #2

Closed
ianand opened this issue Mar 4, 2024 · 17 comments

Comments

@ianand
Owner

ianand commented Mar 4, 2024

Implement an in-browser spreadsheet to run GPT2. Would be great to have a pure browser implementation while maintaining the spreadsheet interface.

Why:

  • Excel works but this limits the accessibility of the lessons to those who have Excel.
  • Google sheets is free-ish but unable to handle a sheet this size.
  • Spreadsheets are a novel interface for running and toying with ML models, perhaps even beyond GPT2 (e.g. Mamba Small). Maybe even load ONNX files or model checkpoints.

Additional notes:

  • This wouldn't be just GPT2 running in the browser (i.e. a pure JS version of GPT2), but a spreadsheet running in the browser that is capable of running GPT2-size models.
@ianand ianand added the enhancement New feature or request label Mar 4, 2024
@ianand ianand self-assigned this Mar 4, 2024
@ianand
Owner Author

ianand commented Mar 4, 2024

Another issue with Excel: For an enhanced version of the Embeddings lesson plan, I wanted to use SVD on a co-occurrence matrix to demonstrate primitive embeddings. But Excel can't run SVD.
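
For readers unfamiliar with the technique mentioned here, the following is a minimal sketch of SVD-based embeddings, assuming NumPy is available. The toy corpus, the window size, and the function names are all hypothetical and chosen only for illustration; this is not the lesson plan's actual material.

```python
import numpy as np

# Toy corpus (hypothetical, for illustration only).
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a dog",
]

def cooccurrence_matrix(sentences, window=2):
    """Count how often each pair of words appears within `window` positions."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        words = s.split()
        for i, w in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[words[j]]] += 1
    return vocab, counts

def svd_embeddings(counts, dim=2):
    """Keep the top `dim` singular directions, scaled by their singular values."""
    u, s, _vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :dim] * s[:dim]

vocab, counts = cooccurrence_matrix(corpus)
embeddings = svd_embeddings(counts, dim=2)
for word, vec in zip(vocab, embeddings):
    print(f"{word:>7}: {np.round(vec, 2)}")
```

Each row of `embeddings` is a low-dimensional vector for one word. Words that occur in similar contexts tend to end up near each other, which is the "primitive embedding" effect the lesson would demonstrate.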

@will-ca

will-ca commented Mar 15, 2024

Self-host OnlyOffice or similar?

https://www.onlyoffice.com/spreadsheet-editor.aspx

Should be Excel-compatible.

…Or import to Google Sheets?

@ianand
Owner Author

ianand commented Mar 15, 2024

Google Sheets won't work. The model is too big to fit. I tried multiple ways, even using clasp. An alternative in the meantime, for people who don't have Excel, might be OpenOffice Calc, but I haven't tried it yet.

@nhatcher-frequenz

Hi @ianand, if you think IronCalc mentioned above could be what you are looking for I would be happy to prioritize work to make it happen or tell you if there are major roadblocks.

@ianand
Owner Author

ianand commented Mar 17, 2024

@nhatcher-frequenz thanks for reaching out. IronCalc looks cool and promising. How can I help?

It doesn't look like the version in the playground can read/write Excel files yet?

@nhatcher

Hi @ianand, indeed you cannot upload an Excel workbook in the playground. Our best chance right now is to use a tailored script or use a TUI like https://github.com/ironcalc/TironCalc.

I think the next steps are for me to identify what is possible and what is not. I will get back to you in the next couple of weeks.

@ianand
Owner Author

ianand commented Mar 18, 2024

Sounds good. Let me know how it goes.

@will-ca

will-ca commented Mar 19, 2024

@nhatcher IronCalc says it uses Rust, and WASM for web. Per my understanding, WASM has a hard cap at 4GB of RAM due to being a 32-bit format, and can even run into problems long before due to issues with freeing, address space, and fragmentation. JS reportedly has similar usage limits, visible in the Chromium console or Firefox about:config.

The .xlsb download on this repository is over 1GB. Presumably, loading and evaluating it takes several times that. Do you have a solution to this possible problem?

@nhatcher

Hi @will-ca, my own experiments seem to confirm what you are saying. When I run the model on bare metal, my OS reports ~12 GB. I have not given up just yet, but running in the browser seems tough.
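
The arithmetic behind the 4 GB cap can be sketched as follows. The page size and page count are the standard 32-bit WebAssembly linear-memory limits; reading the "~12 GB" figure above as 12 GiB is an assumption made only to keep the numbers round.

```python
import math

WASM_PAGE_BYTES = 64 * 1024   # WebAssembly linear memory grows in 64 KiB pages
WASM32_MAX_PAGES = 65536      # a 32-bit index can address at most 2**32 bytes
WASM32_MAX_BYTES = WASM_PAGE_BYTES * WASM32_MAX_PAGES

GIB = 1024 ** 3
# "~12 GB" as reported in this thread, treated as 12 GiB here (assumption).
observed_native_bytes = 12 * GIB

# Lower bound on how many wasm32 instances would be needed to hold that much
# data, ignoring fragmentation and per-instance overhead.
min_instances = math.ceil(observed_native_bytes / WASM32_MAX_BYTES)

print(WASM32_MAX_BYTES // GIB, "GiB addressable per wasm32 instance")
print(min_instances, "instances needed at minimum")
```

This is only a lower bound: as noted earlier in the thread, fragmentation and allocator behaviour can cause failures well before the hard limit is reached.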

@ianand
Owner Author

ianand commented Mar 20, 2024

@will-ca thanks for the heads up and @nhatcher thanks for smoke testing.

There is evidently a wasm64 spec that's still experimental as of now but can be enabled via Chrome flags: WebAssembly/memory64#36

@ianand
Owner Author

ianand commented Mar 21, 2024

Don’t know why I didn’t think of this earlier, since I considered this approach for Google Sheets a while back. The model is very modular, so it should be easy to split into multiple separate wasm instances (in separate workers? or separate tabs?) that should each fit into 4 GB. I feel like this should be surmountable. A first test in IronCalc would be to extract just one of the layer tabs from the Excel sheet, along with its associated weight matrices, and check whether we can get a single layer of the transformer to run. That’s the first MVP/POC. But we’ll need those compatible formulas. @nhatcher thoughts?
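
The splitting idea can be sketched as a simple packing problem: group consecutive layers so that each group stays under a per-instance memory budget. The per-layer sizes and the 3 GiB budget below are hypothetical placeholders, not measurements from the workbook, and `partition_layers` is an illustrative helper rather than anything that exists in IronCalc.

```python
def partition_layers(layer_bytes, budget_bytes):
    """Greedily pack consecutive layers into groups that each fit the budget."""
    groups, current, used = [], [], 0
    for name, size in layer_bytes:
        if size > budget_bytes:
            raise ValueError(f"{name} alone exceeds the budget")
        if used + size > budget_bytes:
            # Current group is full: close it and start a new one.
            groups.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        groups.append(current)
    return groups

GIB = 1024 ** 3
# Hypothetical: 12 transformer blocks at 1 GiB each once loaded.
layers = [(f"Block_{i}", 1 * GIB) for i in range(12)]
# Hypothetical budget that leaves headroom under the 4 GiB wasm32 limit.
groups = partition_layers(layers, budget_bytes=3 * GIB)

for n, group in enumerate(groups):
    print(f"instance {n}: {group}")
```

Keeping the layers in order matters because each transformer block consumes the previous block's output, so every instance only needs to hand a single activation matrix to the next one.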

@nhatcher

Hey @ianand, that might work actually. If you are able to split the workbook into 4 different ones using external references (like ='[Workbook2]Sheet3'!D3) then we might have a chance.
But we are not there yet; the required formulas might take months to implement. Some formulas are just a few hours' work, but the dynamic arrays and the Lambdas will need some time, and by then the wasm64 ecosystem might be in place.
Also, it is not clear to me that the workbook will compute in a reasonable amount of time. I thought I could mock some of those functions and get a rough idea of how long the computation would take, but I can't do it with confidence.
I think there are a couple of tricks up my sleeve, and I might get back to you in a couple of months with some realistic data, once those functions are implemented.
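
As a small illustration of the external-reference syntax mentioned above, the helper below formats such a reference from its parts. The function name and the workbook and sheet names in the usage lines are hypothetical; only the `='[Workbook2]Sheet3'!D3` shape comes from this thread.

```python
def external_ref(workbook, sheet, cell):
    """Format an Excel-style external reference such as ='[Workbook2]Sheet3'!D3."""
    # Inside a quoted sheet name, Excel escapes a single quote by doubling it.
    safe_sheet = sheet.replace("'", "''")
    return f"='[{workbook}]{safe_sheet}'!{cell}"

# The example from the comment above.
print(external_ref("Workbook2", "Sheet3", "D3"))

# Hypothetical wiring: a second workbook reads a cell that the first one computed.
print(external_ref("Layers_0_to_2", "Block_2", "A1"))
```

A split workbook would use formulas of this shape wherever one part needs the output cells of the part before it.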

@ianand
Owner Author

ianand commented Mar 23, 2024

If you are able to split the workbook into 4 different ones using external references (like ='[Workbook2]Sheet3'!D3) then we might have a chance.

That would be very easy to do. As an aside, maybe an interesting variant would be to have these separate tabs running on separate machines.

But we are not there yet; the required formulas might take months to implement. Some formulas are just a few hours' work, but the dynamic arrays and the Lambdas will need some time, and by then the wasm64 ecosystem might be in place. Also, it is not clear to me that the workbook will compute in a reasonable amount of time. I thought I could mock some of those functions and get a rough idea of how long the computation would take, but I can't do it with confidence.

I think there are a couple of tricks up my sleeve, and I might get back to you in a couple of months with some realistic data, once those functions are implemented.

Good points @nhatcher. I wonder if I can make the job easier by "meeting in the middle", i.e. I modify the spreadsheet implementation to more closely match what's currently available in IronCalc. Specifically, I could try re-implementing a single layer of GPT2 (i.e. the Block_0 tab in the current sheet) without using Lambdas, etc. Not sure how hard that is though.

Can I assume https://github.com/ironcalc/IronCalc/blob/e9fc41541b6e60d66430db68802cf9bdecf378c0/base/src/functions/mod.rs#L70 is the list of currently implemented functions?

BTW Does IronCalc support R1C1 style references?

@nhatcher

Can I assume https://github.com/ironcalc/IronCalc/blob/e9fc41541b6e60d66430db68802cf9bdecf378c0/base/src/functions/mod.rs#L70 is the list of currently implemented functions?

Yes, the list of supported functions will also be at the wiki:
https://github.com/ironcalc/IronCalc/wiki/

BTW Does IronCalc support R1C1 style references?

IronCalc does all its computations with the R1C1 style internally. The A1 style is only used for display.
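
For readers who have not met the two notations, here is a minimal sketch of converting between A1 and absolute R1C1 references. It is an illustration only and not IronCalc's actual code; the function names are hypothetical, and relative forms such as R[1]C[-1], ranges, and sheet prefixes are deliberately left out.

```python
import re

def column_to_number(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in letters.upper():
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def number_to_column(n):
    """Convert a 1-based column index back to letters (27 -> AA)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def a1_to_r1c1(ref):
    """Convert a single-cell A1 reference like D3 to absolute R1C1 (R3C4)."""
    m = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if m is None:
        raise ValueError(f"not a simple A1 reference: {ref!r}")
    col, row = m.groups()
    return f"R{int(row)}C{column_to_number(col)}"

def r1c1_to_a1(ref):
    """Convert an absolute R1C1 reference like R3C4 back to A1 (D3)."""
    m = re.fullmatch(r"R([1-9][0-9]*)C([1-9][0-9]*)", ref)
    if m is None:
        raise ValueError(f"not an absolute R1C1 reference: {ref!r}")
    row, col = m.groups()
    return f"{number_to_column(int(col))}{row}"

print(a1_to_r1c1("D3"))     # R3C4
print(r1c1_to_a1("R10C27")) # AA10
```

Row-first numeric addressing is convenient for a calculation engine because a reference is just a pair of integers, while the letter-based A1 form is easier for people to read.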

Good points @nhatcher. I wonder if I can make the job easier by "meeting in the middle", i.e. I modify the spreadsheet implementation to more closely match what's currently available in IronCalc. Specifically, I could try re-implementing a single layer of GPT2 (i.e. the Block_0 tab in the current sheet) without using Lambdas, etc. Not sure how hard that is though.

I think we can have this conversation once IronCalc is a bit more developed. Once we hit MVP and have a page where you can try your formulas more easily by uploading and downloading Excel workbooks, I could ask you to simplify your model a bit.

A couple of things I have learned this week. I managed to compile it to wasm64, and it doesn't seem the model will ever run in the browser. A 12 GB webpage seems to be too much even for modern browsers.

But IronCalc will open it, make changes and evaluate it just fine. You will need to use either a not-yet-developed desktop version or the TUI, TironCalc. That by itself might be useful for your purposes. Millions of people don't have access to an Excel licence. The only question mark is how long IronCalc would take to evaluate the model once the formulas are implemented. It might be more time than you are comfortable with (over 5 minutes, maybe more?).

@ianand
Owner Author

ianand commented Mar 25, 2024

Yes, the list of supported functions will also be at the wiki: https://github.com/ironcalc/IronCalc/wiki/

Thanks.

IronCalc does all its computations with the R1C1 style internally. The A1 style is only used for display.

Great.

I think we can have this conversation once IronCalc is a bit more developed. Once we hit MVP and have a page where you can try your formulas more easily by uploading and downloading Excel workbooks, I could ask you to simplify your model a bit.

Ok.

A couple of things I have learned this week. I managed to compile it to wasm64, and it doesn't seem the model will ever run in the browser. A 12 GB webpage seems to be too much even for modern browsers.

Thanks for trying. What is the failure mode? Very slow? Out of memory? I still wonder if splitting across layers might help.

But IronCalc will open it, make changes and evaluate it just fine. You will need to use either a not-yet-developed desktop version or the TUI, TironCalc. That by itself might be useful for your purposes. Millions of people don't have access to an Excel licence. The only question mark is how long IronCalc would take to evaluate the model once the formulas are implemented. It might be more time than you are comfortable with (over 5 minutes, maybe more?).

Good to know about the TUI. It's not as accessible as something in a browser, but that's better than nothing, especially for those without an Excel license, as you point out.

I'm looking forward to when you think we can do an MVP with a single layer of the model.

@nhatcher

Thanks for trying. What is the failure mode? Very slow? Out of memory? I still wonder if splitting across layers might help.

It's eventually killed on my laptop by the OOM killer. But it is difficult to say at this point whether the problem is solvable: it could be an error on the wasm64 side, or it could be that we just can't parse that data structure in a browser. Another difficulty is that I have to "mock" parts of the workbook to be able to parse it without error. I think it is essentially correct, but there are many "what-ifs". At this point, our best chance is to get IronCalc up to speed. Anyway, as soon as I get some indication that simplifying the workbook somehow, or some other solution, would work, I will get back to you.

@ianand
Owner Author

ianand commented Dec 18, 2024

Hey folks, excited to announce that I recently released a pure browser version of Spreadsheets-are-all-you-need (https://spreadsheets-are-all-you-need.ai/gpt2/). This isn't quite a full spreadsheet, but more of a "spreadsheet-meets-Python-notebook" interface. Check it out and let me know what you think.

Relatedly, I did manage to find a way to get a single layer of the model to fit into a Google Sheet, but given that there are 11 more layers plus the token embedding matrix, there's no way to fit it all into a single Google Sheet at this time.
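
A rough cell count shows why one layer can fit while the whole model cannot. The sketch below assumes the GPT-2 small configuration (768-wide embeddings, 12 layers, 50,257-token vocabulary), one weight per cell, and a Google Sheets limit of 10 million cells per spreadsheet; the cell limit in particular is an assumption and should be checked against Google's current documentation.

```python
N_EMBD = 768                     # GPT-2 small embedding width (assumed config)
N_LAYER = 12                     # 1 layer that fits + the "11 more layers"
VOCAB = 50257                    # GPT-2 vocabulary size
SHEETS_CELL_LIMIT = 10_000_000   # assumed Google Sheets per-spreadsheet cell limit

def block_params(d):
    """Weights and biases in one transformer block, one spreadsheet cell each."""
    qkv = d * 3 * d + 3 * d        # fused query/key/value projection + bias
    attn_out = d * d + d           # attention output projection + bias
    mlp_in = d * 4 * d + 4 * d     # MLP expansion + bias
    mlp_out = 4 * d * d + d        # MLP projection + bias
    layer_norms = 2 * 2 * d        # two layer norms, each with scale and shift
    return qkv + attn_out + mlp_in + mlp_out + layer_norms

per_layer = block_params(N_EMBD)
token_embedding = VOCAB * N_EMBD

print(f"one layer:       {per_layer:>12,} cells")                # 7,087,872
print(f"all layers:      {N_LAYER * per_layer:>12,} cells")      # 85,054,464
print(f"token embedding: {token_embedding:>12,} cells")          # 38,597,376
print("one layer fits:      ", per_layer <= SHEETS_CELL_LIMIT)       # True
print("embedding alone fits:", token_embedding <= SHEETS_CELL_LIMIT) # False
```

Under these assumptions a single block uses about 7.1 million cells, which is under the limit, while the token embedding matrix alone needs about 38.6 million. This count covers weights only and ignores the intermediate cells a working sheet would also need.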

@ianand ianand closed this as completed Dec 18, 2024