add pending PRs from ibm vllm #7

@prashantgupta24 prashantgupta24 commented Jun 17, 2024

Pending PRs from https://github.com/IBM/vllm targeting grpc_server.py

@prashantgupta24 prashantgupta24 changed the title Add guided decoding to TGIS gRPC API add pending PRs from ibm vllm Jun 17, 2024
@prashantgupta24 prashantgupta24 force-pushed the ibm-vllm-changes branch 3 times, most recently from b169710 to 779710f Compare June 18, 2024 04:47
@prashantgupta24 prashantgupta24 requested a review from dtrifiro June 18, 2024 04:49
njhill commented Jun 18, 2024

Thanks @prashantgupta24! The changes from my commits look good to me

from vllm.model_executor.guided_decoding import outlines_decoding
from vllm.model_executor.guided_decoding.outlines_decoding import (
    GuidedDecodingMode,
    _get_cached_logits_processor,
@prashantgupta24 prashantgupta24 Jun 18, 2024

Uhh @njhill, it seems like _get_cached_logits_processor was removed recently. We did the same thing as they did and replaced it with _get_logits_processor. Is that okay?
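
For anyone who needs to support both names during a transition, here is a minimal sketch of an alternative to a direct rename: look up whichever helper exists at import time. This is not what the PR does (the PR simply switches to _get_logits_processor). The `resolve_first` helper and the `fake_outlines_decoding` stand-in are hypothetical, and the real helpers' signatures are not shown in this thread, so the lambda below is only a placeholder.

```python
from types import SimpleNamespace


def resolve_first(module, candidate_names):
    """Return the first callable attribute of `module` found among `candidate_names`."""
    for name in candidate_names:
        attr = getattr(module, name, None)
        if callable(attr):
            return attr
    raise AttributeError(f"none of {list(candidate_names)} found on {module!r}")


# Hypothetical stand-in for a module that only exposes the newer name.
# The real function's arguments are unknown here; this lambda is a placeholder.
fake_outlines_decoding = SimpleNamespace(
    _get_logits_processor=lambda guide: f"processor({guide})"
)

# Prefer the old name if it still exists, otherwise fall back to the new one.
get_processor = resolve_first(
    fake_outlines_decoding,
    ("_get_cached_logits_processor", "_get_logits_processor"),
)
```

With a real module, `fake_outlines_decoding` would be replaced by the imported `outlines_decoding` module. Both names are private, so either one may change again without notice.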

@dtrifiro

Looks good, would you mind rebasing to squash some of the commits together?

@prashantgupta24 prashantgupta24 changed the base branch from main to concurrent-http-and-grpc June 19, 2024 18:49
njhill and others added 7 commits June 19, 2024 11:56
  enum ResponseFormat {
    // Plain text, no constraints
    TEXT = 0;
    // Valid json
    JSON = 1;
  }

  message StringChoices {
    repeated string choices = 1;
  }

  // Mutually-exclusive guided decoding options
  oneof guided {
    // Output will be in the specified format
    ResponseFormat format = 3;
    // Output will follow the provided JSON schema
    string json_schema = 4;
    // Output will follow the provided regex pattern
    string regex = 5;
    // Output will be exactly one of the specified choices
    StringChoices choice = 6;
    // Output will follow the provided context free grammar
    string grammar = 7;
  }
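
To illustrate how a server might read the mutually exclusive `guided` options above, here is a minimal Python sketch. It uses a plain dataclass as a hypothetical stand-in for the generated message, so `DecodingParams`, `GUIDED_FIELDS`, and `select_guided` are illustrative names, not code from this PR. With real generated protobuf classes, the oneof itself enforces exclusivity and `WhichOneof("guided")` reports which field is set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class ResponseFormat(Enum):
    TEXT = 0  # plain text, no constraints
    JSON = 1  # valid JSON


@dataclass
class DecodingParams:
    """Hypothetical stand-in for the proto message; at most one field should be set."""
    format: Optional[ResponseFormat] = None
    json_schema: Optional[str] = None
    regex: Optional[str] = None
    choice: Optional[List[str]] = None
    grammar: Optional[str] = None


GUIDED_FIELDS = ("format", "json_schema", "regex", "choice", "grammar")


def select_guided(params: DecodingParams) -> Optional[Tuple[str, object]]:
    """Return (field_name, value) for the active guided option, or None if unconstrained."""
    set_fields = [
        (name, getattr(params, name))
        for name in GUIDED_FIELDS
        if getattr(params, name) is not None
    ]
    if len(set_fields) > 1:
        names = [name for name, _ in set_fields]
        raise ValueError(f"guided options are mutually exclusive, got {names}")
    if not set_fields:
        return None
    name, value = set_fields[0]
    if name == "format" and value is ResponseFormat.TEXT:
        return None  # TEXT means no constraints, so no guided decoding is needed
    return name, value
```

For example, `select_guided(DecodingParams(choice=["yes", "no"]))` returns `("choice", ["yes", "no"])`, which a caller could then dispatch to the matching logits processor.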

Signed-off-by: Nick Hill <[email protected]>
Adds one more metric from TGIS.

---------

Signed-off-by: Joe Runde <[email protected]>
Also don't set/return seed or other (random) sampling params when in greedy mode.

Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Prashant Gupta <[email protected]>
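
The greedy-mode rule in the commit message above (don't set or return the seed or other random sampling params) can be sketched as follows. This is a minimal illustration under assumed names: `SamplingRequest` and `random_sampling_params` are hypothetical and do not come from grpc_server.py.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SamplingRequest:
    """Hypothetical request shape; field names are assumptions for illustration."""
    greedy: bool = True
    seed: Optional[int] = None
    temperature: float = 1.0
    top_k: int = 0
    top_p: float = 1.0


def random_sampling_params(req: SamplingRequest) -> Dict[str, Any]:
    """Return only the parameters that matter for random sampling.

    In greedy mode nothing random is applied, so the seed and the other
    sampling parameters are neither set nor echoed back.
    """
    if req.greedy:
        return {}
    params: Dict[str, Any] = {
        "temperature": req.temperature,
        "top_k": req.top_k,
        "top_p": req.top_p,
    }
    if req.seed is not None:
        params["seed"] = req.seed
    return params
```

A greedy request that happens to carry a seed, such as `SamplingRequest(greedy=True, seed=42)`, yields an empty dict, so the response never reports a seed that had no effect.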
@dtrifiro dtrifiro force-pushed the concurrent-http-and-grpc branch from 52f4793 to f861021 Compare June 20, 2024 11:11
@dtrifiro dtrifiro force-pushed the concurrent-http-and-grpc branch 4 times, most recently from 7dec913 to d1479e8 Compare June 20, 2024 13:00
@dtrifiro dtrifiro deleted the branch opendatahub-io:concurrent-http-and-grpc June 20, 2024 13:20
@dtrifiro dtrifiro closed this Jun 20, 2024
@dtrifiro

Merged in #14

@prashantgupta24 prashantgupta24 deleted the ibm-vllm-changes branch June 24, 2024 22:35