Own dataset #21
Yes, that's possible. We haven't added this to our UI yet, so you'll either have to wait or do it the old way, using the console, YAML configuration files and so on. You can do all of that from the main gpt4all repo, which has a training script for retraining with your own data.
I have a folder of PDF files, and I would like the bot to read them and answer based on the user's input.
I don't know exactly what to modify in the app.py script.
Well, reading PDFs will be handled by an extension; you just need to wait a little. The main issue here is context size: the context window is not big enough to hold long texts. That's a current limitation that may be fixed once recurrent transformers become mainstream.
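Until something like that lands, a common generic workaround for the context-size limit is to split the extracted text into overlapping pieces that each fit the budget and feed them to the model one at a time. The sketch below is not the extension mentioned above, and `chunk_text` is a hypothetical helper. It assumes the PDF text has already been extracted to a string and uses whitespace-separated words as a rough stand-in for tokens.

```python
def chunk_text(text: str, max_words: int = 300, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows so each piece fits a fixed budget.

    Assumption: words approximate tokens; a real tokenizer gives exact counts.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for piece in chunk_text(sample, max_words=4, overlap=1):
        print(piece)
```

The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks, at the cost of spending some of each window's budget on repeated words.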
Good Day!
Well, what do you suggest if we were to upload all of the PDF files to the database? Or import the contents of all the PDF files into the database?
It is possible to do LoRA fine-tuning of your model on a PDF database that you first convert to text somehow. But keep in mind that the data needs to be in a discussion format, or at least include some examples that resemble a discussion about a text, so the model doesn't lose its conversational capabilities. Then you can play with the parameters to adjust how much of each kind of data you want to use. It's all an exercise in balance.
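The data-preparation side of that advice can be sketched as follows: wrap text pulled from the PDFs in a simple question/answer shape, then mix it with ordinary chat examples at a chosen ratio. This is a minimal sketch under several assumptions. The `prompt`/`response` field names, the JSONL output and the question template are all hypothetical, so check the training script in the gpt4all repo for the format it actually expects. The `doc_fraction` knob is an illustrative stand-in for "how much of each kind of data", not a real training parameter.

```python
import json
import random


def make_discussion_examples(chunks, source_name):
    """Wrap each text chunk in a placeholder question/answer exchange.

    Hypothetical template: hand-written questions about the text are better.
    """
    return [
        {"prompt": f"What does part {i + 1} of {source_name} say?", "response": chunk}
        for i, chunk in enumerate(chunks)
    ]


def mix_datasets(document_examples, chat_examples, doc_fraction, seed=0):
    """Keep every chat example and add enough document examples that roughly
    doc_fraction of the result comes from the documents (capped by availability)."""
    if not 0.0 < doc_fraction < 1.0:
        raise ValueError("doc_fraction must be strictly between 0 and 1")
    n_doc = round(len(chat_examples) * doc_fraction / (1.0 - doc_fraction))
    n_doc = min(n_doc, len(document_examples))
    rng = random.Random(seed)  # fixed seed makes the mix reproducible
    mixed = rng.sample(document_examples, n_doc) + list(chat_examples)
    rng.shuffle(mixed)
    return mixed


def to_jsonl(examples):
    """Serialize one example per line (assumed JSONL training-file format)."""
    return "\n".join(json.dumps(e) for e in examples)


if __name__ == "__main__":
    docs = make_discussion_examples(["First section text.", "Second section text."], "manual.pdf")
    chat = [{"prompt": "Hi!", "response": "Hello! How can I help?"}]
    print(to_jsonl(mix_datasets(docs, chat, doc_fraction=0.5)))
```

Raising `doc_fraction` pushes the model toward the document content; lowering it preserves more of the general conversational behaviour, which is the balance described above.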
How to use our own data?