Ensure that SDG path data branch matches evaluation branch #35

Closed
Tracked by #30 ...
nathan-weinberg opened this issue Jun 28, 2024 · 7 comments
Labels: mmlu (Pertains to MMLU), mmlubranch (Pertains to MMLUBranch)

Comments

@nathan-weinberg
Member

We need to ensure the SDG path data we are receiving is being generated off the same branch we are passing for evaluation.

I have some sample data that indicates this is tracked in the SDG data in the form of origin_branch_name.

We need to meet with the SDG team and ensure that data will be there and can be consumed in a predictable way.
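A minimal sketch of the check described above, assuming the SDG output is a JSONL file with one JSON object per line and that each record carries the `origin_branch_name` field seen in the sample data. The function name `mismatched_branches` and its parameters are hypothetical, not part of any existing eval or SDG API.

```python
import json
from pathlib import Path


def mismatched_branches(sdg_path, eval_branch, field="origin_branch_name"):
    """Return (line_number, found_value) for every record whose branch
    field is missing or differs from eval_branch.

    Assumes each non-blank line of the file is a JSON object.
    """
    problems = []
    with Path(sdg_path).open(encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            found = record.get(field)  # None when the field is absent
            if found != eval_branch:
                problems.append((lineno, found))
    return problems
```

An empty result would mean every record was generated from the branch under evaluation. A record that lacks the field is reported with `None`, which makes it easy to tell "wrong branch" apart from "branch not recorded".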

@nathan-weinberg nathan-weinberg self-assigned this Jun 28, 2024
@nathan-weinberg nathan-weinberg added mmlu Pertains to MMLU mmlubranch Pertains to MMLUBranch labels Jun 28, 2024
@russellb
Member

Right now the data format is the same as it was before in the CLI. That data does not exist in the output right now.

@nathan-weinberg
Member Author

Ack @russellb - is the plan to have that information added to the output? This is an example of the sample data I'm basing the above assumptions off of: https://github.com/nathan-weinberg/eval/blob/test/tests/testdata/sdg/tonsil_data.jsonl

cc @danmcp

@russellb
Member

russellb commented Jul 1, 2024

You should be looking at data generated by the CLI. I don’t think that’s where that data came from.

File an issue against the sdg repo with any requested differences.

This was referenced Jul 1, 2024
@nathan-weinberg
Member Author

@russellb sounds good. @aakankshaduggal and I are going to talk about this more later today. cc @oindrillac

@nathan-weinberg
Member Author

@russellb @oindrillac I just spoke with @aakankshaduggal, and the data format y'all are planning for training in the issue above is the same format we are expecting for Eval, so we should be good on our side as soon as that's complete!

@russellb
Member

russellb commented Jul 1, 2024

Assuming someone is going to adjust the CLI training code for the new format as well, then?

@nathan-weinberg
Member Author

This issue really only pertains to Eval, so I can't speak to Training.


2 participants