Own dataset #21

PoojaAxis · 2023-05-05T08:53:23Z

How to use our own data?

ParisNeo · 2023-05-05T12:28:28Z

Yes, that's possible. We haven't added this to our UI yet, so you'll have to wait, or do it the old way using the console, YAML configuration files, and so on.

You can do all of that from the main gpt4all repo:
https://github.com/nomic-ai/gpt4all

It has a training script you can use to retrain the model on your own data.

PoojaAxis · 2023-05-05T12:55:49Z

I have a folder of PDF files, and I would like the bot to read them and answer based on the user's input. But I don't know exactly what to modify in the app.py script.

ParisNeo · 2023-05-05T13:25:12Z

Reading PDFs will be handled by an extension; you just need to wait a little bit.
For now, you can manually copy the text, paste it into the discussion, and then ask questions. You could even create a personality that contains the text in its conditioning.
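As a rough illustration of the "paste the text, then ask" workaround, the sketch below assembles one prompt with the conditioning first, then the document text, then the user's question. The `### Document:` / `### User:` / `### Assistant:` markers and the `build_prompt` helper are hypothetical; they are not the UI's actual personality format, just a generic layout under that assumption.

```python
def build_prompt(
    document_text: str,
    question: str,
    conditioning: str = "You are a helpful assistant. Answer using only the document below.",
) -> str:
    """Assemble one prompt: conditioning, then the pasted document, then the question.

    The section markers are a hypothetical layout, not the app's real format.
    """
    parts = [
        conditioning.strip(),
        "### Document:",
        document_text.strip(),
        "### User:",
        question.strip(),
        "### Assistant:",  # the model continues generating from here
    ]
    return "\n".join(parts)


print(build_prompt("Paris is the capital of France.", "What is the capital of France?"))
```

The whole document ends up inside the prompt, so this only works while the text fits in the model's context window.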

The main issue here is context size: the context window is not big enough to hold long texts. That's a current limitation that may be fixed when recurrent transformers become mainstream.
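One common way to work within a limited context is to split a long text into pieces that each fit, and paste only one piece at a time. The sketch below splits on words with an optional overlap so sentences at the boundaries are not lost. It assumes word count as a crude stand-in for token count; real tokenizers usually produce more tokens than words, so leave headroom when choosing `max_words`. The `chunk_words` helper is hypothetical, not part of gpt4all or its UI.

```python
def chunk_words(text: str, max_words: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most max_words words.

    Consecutive chunks share `overlap` words. Word count is only a rough
    proxy for the model's token count (an assumption, not exact).
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g", max_words=3, overlap=1))
```

Each chunk can then be placed in the discussion (or in a personality's conditioning) separately, at the cost of the model never seeing the whole document at once.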

PoojaAxis · 2023-05-08T06:25:28Z

Good day! What would you suggest if we were to upload all of the PDF files to the database, or import the contents of all the PDF files into the database?

ParisNeo · 2023-05-08T06:57:59Z

It is possible to do LoRA fine-tuning of your model using a PDF database that you convert to text somehow. But keep in mind that the data needs to be in discussion format, or at least include some examples that resemble a discussion about a text, so that the model doesn't lose its conversational capabilities. Then you can tune the parameters to adjust how much of each kind of data you want to use. It's all an exercise in balance.
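To make the "discussion format" and "balance" points concrete, here is a minimal sketch that turns hand-written (passage, question, answer) triples into prompt/response records and mixes them with ordinary conversational examples at a chosen ratio, writing the result as JSON Lines. The `{"prompt": ..., "response": ...}` schema, the `doc_fraction` knob, and all helper names are assumptions for illustration; check the gpt4all training script for the data format and parameters it actually expects.

```python
import json
import random


def make_doc_example(passage: str, question: str, answer: str) -> dict:
    """Wrap a passage and a human-written Q&A pair as one discussion-style record.

    The prompt/response schema is a hypothetical format, not gpt4all's.
    """
    return {
        "prompt": f"Read the following text:\n{passage}\n\nQuestion: {question}",
        "response": answer,
    }


def mix_examples(doc_examples: list, chat_examples: list,
                 doc_fraction: float, seed: int = 0) -> list:
    """Keep all document examples and sample enough chat examples that
    documents make up roughly doc_fraction of the result, then shuffle."""
    if not 0.0 < doc_fraction < 1.0:
        raise ValueError("doc_fraction must be strictly between 0 and 1")
    n_doc = len(doc_examples)
    # chat count needed so that n_doc / (n_doc + n_chat) is about doc_fraction
    wanted = round(n_doc * (1 - doc_fraction) / doc_fraction)
    n_chat = min(len(chat_examples), wanted)
    rng = random.Random(seed)
    mixed = list(doc_examples) + rng.sample(chat_examples, n_chat)
    rng.shuffle(mixed)
    return mixed


def write_jsonl(path: str, examples: list) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for ex in examples:
            f.write(json.dumps(ex, ensure_ascii=False) + "\n")
```

Lowering `doc_fraction` keeps more general conversation in the mix, which is one way to trade document knowledge against conversational ability.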
