Adding Example Annotation with Showcase Notebook #550

Open
wants to merge 2 commits into main
Conversation

@eshwarprasadS (Contributor) commented Feb 11, 2025

Addresses #527
This PR:

  • Adds an example notebook showing how to leverage SDG to enable annotation use cases.

  • The example explicitly sets vLLM's guided_decoding_backend to make sure the annotation options (guided_choice) are respected during generation (see the sketch below).

  • The notebook showcases an end-to-end custom use case: using SDG to annotate a classification dataset with the composable components available in the SDG library (namely Pipeline).
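
A minimal sketch of the kind of request this amounts to, against a vLLM OpenAI-compatible server (the endpoint, model name, prompt, and label set are illustrative placeholders, not the notebook's exact values):

from openai import OpenAI

# Point the client at a locally running vLLM OpenAI-compatible server
# (placeholder URL; vLLM ignores the API key by default).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="my-annotation-model",  # placeholder model name
    messages=[
        {"role": "user", "content": "Label the sentiment of: 'I love this product!'"},
    ],
    max_tokens=20,
    temperature=0,
    # vLLM-specific parameters go through extra_body: guided_choice restricts
    # the output to one of the listed labels, and guided_decoding_backend
    # selects the backend that enforces the restriction.
    extra_body={
        "guided_choice": ["positive", "negative", "neutral"],
        "guided_decoding_backend": "outlines",
    },
)

# With guided_choice, the completion is constrained to be one of the labels.
print(response.choices[0].message.content)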

@mergify mergify bot added the documentation Improvements or additions to documentation label Feb 11, 2025
@bbrowning (Contributor)

This looks like a useful example to have in the repository. I rarely use Jupyter notebooks and am not really set up to validate or run this to ensure it works as expected, but perhaps that's something @aakankshaduggal or @khaledsulayman can help with to give this an approval?

Also, we should think about how we keep this updated so it doesn't go stale. The other examples all run as part of our unit test suite in test_examples.py. That may not be possible with a notebook in the same way, but it's something to think about so we know how we'll keep it working and up-to-date if we can't automatically test it.

" \"max_tokens\": 20,\n",
" \"temperature\": 0,\n",
" \"extra_body\": {\n",
" \"guided_decoding_backend\": \"outlines\", #use outlines backend for guided decoding, explicitly\n",
@eshwarprasadS (Contributor Author)
@bbrowning This is an explicit call to the guided_decoding_backend option available in vLLM (https://docs.vllm.ai/en/latest/features/structured_outputs.html).

I tested that this works and checked the vLLM logs; it receives both the guided_choice and guided_decoding_backend parameters as part of the completion requests.

@bbrowning (Contributor)

Great, thanks!

@eshwarprasadS (Contributor Author)

> Also, we should think about how we keep this updated so it doesn't go stale. The other examples all run as part of our unit test suite in test_examples.py. That may not be possible with a notebook in the same way, but it's something to think about so we know how we'll keep it working and up-to-date if we can't automatically test it.

@bbrowning Thanks for the comment. You're right that this would be a new type of artifact needing its own testing setup. There are a few ways to test notebooks in AI/ML libraries such as ours, but one option I came across that might fit our pattern is nbmake. It plugs into pytest, so it could run as part of our CI, activated like so:

- name: Test Jupyter Notebooks
  run: |
    pip install nbmake
    pytest --nbmake path/to/notebooks/
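
(nbmake executes each notebook top to bottom and fails the step if any cell raises an error; the notebook path above is a placeholder for wherever our example notebooks live.)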

I think it would make sense to open a new issue for this, if we have a strong desire to keep our example Jupyter notebooks tested and updated regularly.

@bbrowning (Contributor)

@eshwarprasadS Yes, it's fine to open a new issue to track whether and how we can keep the notebooks updated.
