-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
converts dataset format messages required for training #94
Conversation
Signed-off-by: Oindrilla Chatterjee <[email protected]>
Signed-off-by: Oindrilla Chatterjee <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @oindrillac, tested and it gives out desired output files!
The e2e is failing, is it because I changed the name of the output file? Will re-instating the name back fix it? |
yes, changing the filenames would break it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The e2e failure should be resolved before merging, thanks!
E2E (NVIDIA A10G x4 - full pipeline) workflow launched on this PR: View run |
e2e workflow succeeded on this PR: View run, congrats! |
just FYI -- this job runs the full pipeline, but does not run training, as we do not have the new training method (training library) working in CI yet. |
Fixed the file names for E2E to pass. Seems like the E2E is failing on a training step. |
so is the e2e expected to fail right now? |
Spoke to @RobotSail and we are not sure what the generated_*.jsonl form is being used for right now. Added a change to drop that file and keep train_ and test_ files intact to be compatible with the CLI. |
ah seems like that seems to fix it, the CI passes |
E2E (NVIDIA A10G x4 - full pipeline) workflow launched on this PR: View run |
e2e workflow failed on this PR: View run, please investigate. |
@russellb can you please take a quick look? |
don't worry about that last failure. The CI job broke because of instructlab/instructlab#1471 fix is here: instructlab/instructlab#1645 |
E2E (NVIDIA A10G x4 - full pipeline) workflow launched on this PR: View run |
e2e workflow failed on this PR: View run, please investigate. |
Ignore again. Still broken related to instructlab/instructlab#1471 fix v2 -- instructlab/instructlab#1650 |
E2E (NVIDIA A10G x4 - full pipeline) workflow launched on this PR: View run |
e2e workflow failed on this PR: View run, please investigate. |
sigh. continue to ignore these. |
I think this is ready for merge. @russellb is this good to go from your end |
yeah, you don't need to block on me. I'll run the full sdg CI workflow again on this repo when it's fixed. Thanks for checking! |
... though I just looked and it's still removing one of the files that used to be created? (the one with a generated_ prefix) but maybe nothing cares |
previously requested changes were addressed, except for the removal of the "generated_" file, but i'm not sure if there is code anywhere that cares
yeah, it is removing that one since it was a duplicate, does not seem to be needed on the CLI from when @RobotSail and I checked, hopefully its just a duplicate |
I see it used in the CLI repo in so if I'm reading this right, this will break all of the unit+functional test jobs on the CLI repo once we make an sdg library release with this change as-is |
Thanks for finding it. Maybe that should also use the train_.jsonl, but that should be tracked as a cli issue, I can keep it here then |
E2E (NVIDIA A10G x4 - full pipeline) workflow launched on this PR: View run |
e2e workflow succeeded on this PR: View run, congrats! |
both the simple and full SDG pipelines are now passing in CI based on this PR, so I'm good with it. thanks everyone! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…ab-app InstructLab macOS App
This PR adds a function that converts dataset format to messages and saves an additional jsonl along with each generate and does that for both train and test dataset.
This resolves #60