Streaming with sglang frontend language #2799

is-ahmed · 2025-01-08T18:57:24Z

Hi, I'm running into an issue with sglang: it doesn't seem able to generate long responses, even when I prompt it to.

Here is the example I'm trying:

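import os

import sglang as sgl

# Read the API key from the environment (assumes OPENAI_API_KEY is set).
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
openai_backend = sgl.OpenAI("gpt-3.5-turbo-instruct", api_key=OPENAI_API_KEY)

@sgl.function
def text_qa(s, essay_prompt):
    s += "Q: " + essay_prompt + "\n"
    # Note: stop="\n" ends generation at the first newline.
    s += "A:" + sgl.gen("essay", stop="\n")

state = text_qa.run(
    essay_prompt="Write 500 words about France please",
    backend=openai_backend,
    stream=True
)
# Print the streamed text as it arrives.
for out in state.text_iter():
    print(out, end="", flush=True)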

And this is the terminal output I get, which contains far fewer than 500 words:

Q: Write 500 words about France please
A: France, known as the "Land of Love", is a country located in Western Europe. With a population of approximately 68 million people, it is the second most populous country in Europe and one of the world's top tourist destinations. From its rich history, culture, art, food, and fashion, France is a country that continues to captivate and charm people from all around the world.%

If I change the model to gpt-4o, I get a longer response, but for some reason it cuts off mid-sentence.

I don't have this issue if I use the official openai library directly for the same prompt. Is there something I'm doing wrong that's causing this discrepancy?
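
Two settings in the example might account for both symptoms, though this is a guess rather than a confirmed diagnosis. First, stop="\n" ends generation at the first newline, which would explain a single paragraph from gpt-3.5-turbo-instruct. Second, a default cap on generated tokens would explain gpt-4o stopping mid-sentence (I believe sgl.gen accepts a max_tokens argument, but that is an assumption to check against the sglang docs). The sketch below does not call sglang or OpenAI. It uses a hypothetical helper, truncate_stream, and a made-up list of chunks to show how each setting truncates a stream, treating each chunk as one token.

```python
def truncate_stream(chunks, stop=None, max_tokens=None):
    """Yield chunks until a stop string appears or the token budget runs out.

    Hypothetical helper for illustration only; each chunk counts as one token.
    """
    emitted = 0
    for chunk in chunks:
        # Token cap: stop once the budget is used up, even mid-sentence.
        if max_tokens is not None and emitted >= max_tokens:
            return
        # Stop string: keep any text before it, then end generation.
        if stop is not None and stop in chunk:
            head = chunk.split(stop, 1)[0]
            if head:
                yield head
            return
        yield chunk
        emitted += 1


# Made-up "model output": two paragraphs separated by a newline.
essay = ["France ", "is ", "in ", "Europe.", "\n", "Paris ", "is ", "its ", "capital."]

stopped = "".join(truncate_stream(essay, stop="\n"))    # ends at the first newline
capped = "".join(truncate_stream(essay, max_tokens=6))  # ends after six chunks

print(repr(stopped))
print(repr(capped))
```

If this matches what is happening, removing stop="\n" and raising the generation limit in the sgl.gen call would be the first things to try.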
