Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatic Column Selection #359

Merged
merged 17 commits into from
Sep 25, 2023
Merged

Automatic Column Selection #359

merged 17 commits into from
Sep 25, 2023

Conversation

ritugala
Copy link
Collaborator

@ritugala ritugala commented Sep 22, 2023

Description

A good starting point for reviewing this PR would be: https://github.com/neulab/prompt2model/pull/359/files#diff-952ef426d80e42d77fe849ad91b5ef92bcd907ea702707fb1d9a714d9e31820cR169

Changes:

-Added a file for creating the prompt for column selection, it is similar to the instr_parser_prompt file; The prompt includes incontext learning as well.
-Created a function in dataset_retriever for automatic column selection
-Also included the prompt spec to be passed in canonicalize_dataset_by_cli so that I can use the instruction in column selection prompt.
-Added tests for automatic column selection
-Modified the JSON parser file to handle responses where the JSON keys would have lists as the values (and not just strings)
-Set an explicit value for max_api in json parser incase it is not set - it was getting stuck in an infinite loop otherwise, and constantly calling the API

Issues/Discussions/Question

  • Small considerations: like whether I should be throwing an error here because otherwise if the response is not appropriate (like not a json or something) then it throws a max API recall limit reached, rather than the actual error. Should not be hard to resolve, but just wanted to know if I should fix it or let it be :
    logger.error("Maximum number of API calls reached.")

@ritugala ritugala requested a review from viswavi September 22, 2023 19:47
prompt2model/dataset_retriever/column_selection_prompt.py Outdated Show resolved Hide resolved
prompt2model/dataset_retriever/column_selection_prompt.py Outdated Show resolved Hide resolved
prompt2model/dataset_retriever/column_selection_prompt.py Outdated Show resolved Hide resolved
prompt2model/utils/parse_json_responses.py Outdated Show resolved Hide resolved
prompt2model/utils/parse_json_responses.py Outdated Show resolved Hide resolved
@ritugala ritugala marked this pull request as ready for review September 23, 2023 05:23
@ritugala ritugala requested review from viswavi and neubig September 23, 2023 05:32
Copy link
Collaborator

@neubig neubig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ritugala , this looks really exciting!

I had several comments, and wherever possible I tried to use the github suggestion functionality so you can just accept the comments through github if you want.

@ritugala ritugala requested a review from neubig September 24, 2023 04:36
Copy link
Collaborator

@zhaochenyang20 zhaochenyang20 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some minimal comments in the last minute. Sorry that preparing for my English test takes too much time.

ritugala and others added 2 commits September 24, 2023 19:26
Co-authored-by: Eren Chenyang Zhao <[email protected]>
Co-authored-by: Eren Chenyang Zhao <[email protected]>
Copy link
Collaborator

@viswavi viswavi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor comments - please address them, and then feel free to merge!

This is great work. This PR clearly required a lot of understanding of our system and external components, so this is a great milestone for your first major PR @ritugala!

prompt2model/dataset_retriever/column_selection_prompt.py Outdated Show resolved Hide resolved
prompt2model/utils/parse_json_responses.py Outdated Show resolved Hide resolved
tests/dataset_retriever_test.py Outdated Show resolved Hide resolved
ritugala and others added 4 commits September 25, 2023 21:14
@ritugala ritugala merged commit 7fa341c into main Sep 25, 2023
@ritugala ritugala deleted the ritu-column-selection branch September 25, 2023 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants