KeyError: 'samples' #51
Comments
I got the same error when I tried to use Argilla. Is there anywhere I need to store my own sample data for this part? Thanks
Can you provide more details?
Also, please verify that you are not loading dumps (this error might be due to dump issues).
I'm running a classification pipeline. The config file is:
Sorry, where do I check whether I'm loading dumps or not?
It seems something goes wrong when the pipeline tries to generate the initial samples. I wonder whether we can use our own sample datasets. I changed initial_dataset in the config file to 'dump/validated_test_file_temp.csv', which contains one column with several samples, and then I got this error. Is there any other config I need to modify if I want to use my own sample data?
I think I solved this based on the dataset info you provided in example.md. I just want to check: what is the difference between the prediction column and the annotation column? As I understand it, we need to provide both of them in the input dataset, right? And what are the metadata and score columns for? Thanks for your help.
Oh, got it. So if I want to input my own sample dataset file, can I leave the prediction column entirely empty?
Yes.
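Based on the columns discussed in this thread, a dataset with an empty prediction column could be prepared roughly like this. This is a hypothetical sketch, not AutoPrompt's own code: the text, prediction, annotation, metadata, and score columns are named in the thread, while the `id` and `batch_id` columns and the helper name `to_initial_dataset` are assumptions.

```python
# Hypothetical sketch: build an initial dataset csv with an empty
# prediction column, assuming the column layout discussed above.
# "id" and "batch_id" are assumed columns, not confirmed in this thread.
import pandas as pd

def to_initial_dataset(texts, annotations):
    n = len(texts)
    return pd.DataFrame({
        "id": range(n),
        "text": texts,
        "prediction": [""] * n,    # may be left empty, per the answer above
        "annotation": annotations,  # your ground-truth labels
        "metadata": [""] * n,
        "score": [""] * n,
        "batch_id": [0] * n,
    })

df = to_initial_dataset(["great movie", "terrible plot"], ["Yes", "No"])
# df.to_csv("dump/my_dataset.csv", index=False)  # path is illustrative
```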
Hi, I get an AttributeError (at prompt_model\AutoPrompt\optimization_pipeline.py:57) if I remove the whole annotator part from the config file.
Yes, you are right: it should not be removed completely. Instead, you should set the method to an empty string:
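A minimal sketch of what that could look like in the config file, assuming a YAML layout with an `annotator` section (the exact surrounding keys are not shown in this thread):

```yaml
annotator:
  method: ''   # empty string instead of deleting the annotator section entirely
```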
Thanks.
It might be an issue with OpenAI functions (although the model you are using should support functions).
In this part of evaluator.py, it seems the pipeline labeled my entire prediction column as Discard and then deleted all my data, so self.dataset is empty and all metrics are empty as a result. I'm not sure what I changed, so I re-downloaded the code repo and only changed the dataset file and the two config files, but I still get this. Do you have any idea what is going on here?
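This is not AutoPrompt's actual evaluator code, just a minimal sketch of the kind of filtering being described: if every row ends up annotated as `Discard` (for example, because the labels in the csv don't match the configured label schema), the filtered dataset is empty and every metric computed from it is empty too.

```python
# Illustrative sketch only; function and column names are assumptions.
import pandas as pd

DISCARD = "Discard"

def drop_discarded(dataset: pd.DataFrame) -> pd.DataFrame:
    # Rows annotated as Discard are removed. If every row was
    # (mis)labeled Discard, the returned frame is empty.
    return dataset[dataset["annotation"] != DISCARD].reset_index(drop=True)

df = pd.DataFrame({"text": ["a", "b", "c"],
                   "annotation": ["Discard", "Yes", "Discard"]})
filtered = drop_discarded(df)
```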
It seems like a dataset structure issue.
Has the example of adding my own annotated dataset been updated? : )
If I have 30 samples, with text and annotation, is it possible to use these samples in the first few iterations, while using LLM-generated samples in the following iterations? How should I modify the config files?
Hi @danielliu99. In order to iterate on your own dataset you need to:
1. Transfer your data to the AutoPrompt dataset format.
2. Modify max_samples to the number of samples in your csv (from 50).
If you want to start with your dataset and then continue with synthetic data, follow the same steps as above and simply change max_samples to 30 + the number of synthetic samples you want to add.
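The steps above might look roughly like this in the config file. The keys `initial_dataset` and `max_samples` come from this thread; the surrounding layout and file path are illustrative assumptions:

```yaml
dataset:
  initial_dataset: 'dump/my_dataset.csv'  # your 30 annotated samples, in AutoPrompt format
  max_samples: 30   # was 50; set to 30 + N to continue with N synthetic samples
```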
Hi @Eladlev, thank you for the instructions on how to use AutoPrompt with a ground truth. I have tested the approach, and here are a few amendments to the instructions you shared above:
- Changes in dependencies
- Name of the ground truth dataset
Example:
Hi! I tried to run the pipeline using Azure OpenAI, with an LLM as the annotator, but got this error.