Own dataset #21

Open
PoojaAxis opened this issue May 5, 2023 · 5 comments

Comments

@PoojaAxis

How can we use our own data?

@ParisNeo
Owner

ParisNeo commented May 5, 2023

Yes, that's possible, but we haven't added this to our UI yet. You'll have to wait, or do it the old way using the console, YAML configuration files, and so on.

You can do all of that from the main gpt4all repo:
https://github.com/nomic-ai/gpt4all

There you have a train script to retrain the model with your own data.

@PoojaAxis
Author

PoojaAxis commented May 5, 2023 via email

@ParisNeo
Owner

ParisNeo commented May 5, 2023

Well, reading PDFs will be handled by an extension. You just need to wait a little bit.
For now, you can simply copy the text manually, paste it into the discussion, and then ask questions. Or you can even create a personality that contains the text in its conditioning.

The main issue here is context size. The context is not big enough to read long texts. That's a current limitation that may be fixed when Recurrent Transformers become mainstream.
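
To illustrate the copy-and-paste workaround under a limited context, here is a minimal sketch that splits a long text into overlapping pieces and wraps one piece in a question prompt. The function names, the prompt wording, and the default sizes are hypothetical and not part of this project; word count is used only as a rough stand-in for the model's real token count.

```python
def split_into_chunks(text: str, max_words: int = 300, overlap: int = 30) -> list[str]:
    """Split text into overlapping word windows.

    Assumption: words approximate tokens. A real tokenizer will count
    differently, so keep max_words well below the model's context size.
    """
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def build_prompt(chunk: str, question: str) -> str:
    """Wrap one chunk and a question in a simple (hypothetical) prompt template."""
    return (
        "Use the following text to answer the question.\n\n"
        f"Text:\n{chunk}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    long_text = " ".join(f"word{i}" for i in range(1000))
    pieces = split_into_chunks(long_text, max_words=300, overlap=30)
    print(len(pieces), "chunks")
    print(build_prompt(pieces[0], "What is this text about?")[:80])
```

Each piece can then be pasted into the discussion on its own. The overlap is there so that a sentence cut at a chunk boundary still appears whole in one of the two neighbouring pieces.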

@PoojaAxis
Author

PoojaAxis commented May 8, 2023 via email

@ParisNeo
Owner

ParisNeo commented May 8, 2023

It is possible to do LoRA fine-tuning of your model using a PDF database that you convert to text somehow. But keep in mind that you need to have a discussion format, or at least some examples that resemble a discussion about a text, so you don't make the model lose its conversational capabilities. Then you can play with the parameters to adjust how much of each kind of data you want to use. It's all an exercise in balance.
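
As a rough sketch of that balancing step, the snippet below mixes document-based question/answer examples with general conversation examples at a chosen ratio and serializes them as JSON Lines. The `prompt`/`response` field names, the `doc_fraction` parameter, and the helper names are assumptions made for illustration; the actual format expected by a given training script may differ.

```python
import json
import random


def mix_examples(doc_examples, chat_examples, doc_fraction=0.5, total=None, seed=0):
    """Return a shuffled list where about `doc_fraction` of items come from doc_examples.

    doc_examples: discussion-style examples about your own text.
    chat_examples: general conversation examples that help preserve chat ability.
    """
    if not 0.0 <= doc_fraction <= 1.0:
        raise ValueError("doc_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    if total is None:
        total = len(doc_examples) + len(chat_examples)
    # Never ask for more items than a pool actually contains.
    n_doc = min(round(total * doc_fraction), len(doc_examples))
    n_chat = min(total - n_doc, len(chat_examples))
    mixed = rng.sample(doc_examples, n_doc) + rng.sample(chat_examples, n_chat)
    rng.shuffle(mixed)
    return mixed


def to_jsonl(examples):
    """Serialize examples as JSON Lines, one example per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


if __name__ == "__main__":
    docs = [{"prompt": f"What does section {i} say?", "response": f"Summary {i}"} for i in range(10)]
    chats = [{"prompt": f"Hello {i}", "response": f"Hi there {i}"} for i in range(10)]
    print(to_jsonl(mix_examples(docs, chats, doc_fraction=0.3, total=10)))
```

Raising `doc_fraction` pushes the model toward your documents, while lowering it keeps more of the general conversational data in the mix.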
