Allow multi SGLang engines to coordinate #2791

Closed
wants to merge 284 commits into from

284 commits
ebf19e3
more
fzyzcjy Jan 6, 2025
b9deb74
more
fzyzcjy Jan 6, 2025
79fb5f8
more
fzyzcjy Jan 6, 2025
be6e74d
more
fzyzcjy Jan 6, 2025
655a767
more
fzyzcjy Jan 6, 2025
aa5d90b
more
fzyzcjy Jan 6, 2025
89596db
more
fzyzcjy Jan 6, 2025
f7e21d0
more
fzyzcjy Jan 6, 2025
bc40c20
more
fzyzcjy Jan 6, 2025
3ca3c9b
more
fzyzcjy Jan 6, 2025
46734ea
mv
fzyzcjy Jan 6, 2025
fa523bb
more
fzyzcjy Jan 6, 2025
b2542cc
more
fzyzcjy Jan 6, 2025
2533f9e
more
fzyzcjy Jan 6, 2025
21de8cd
more
fzyzcjy Jan 6, 2025
3263557
more
fzyzcjy Jan 6, 2025
24b72b5
more
fzyzcjy Jan 6, 2025
ccc2c54
more
fzyzcjy Jan 6, 2025
b6f12ba
more
fzyzcjy Jan 6, 2025
cb76376
more
fzyzcjy Jan 6, 2025
8d31405
more
fzyzcjy Jan 6, 2025
a514d99
more
fzyzcjy Jan 6, 2025
7821d91
more
fzyzcjy Jan 6, 2025
d24598d
more
fzyzcjy Jan 6, 2025
8220c4a
more
fzyzcjy Jan 6, 2025
be82180
more
fzyzcjy Jan 6, 2025
5473ab7
more
fzyzcjy Jan 6, 2025
1ac623e
rename
fzyzcjy Jan 6, 2025
b1eaa2c
mv
fzyzcjy Jan 6, 2025
a2ad066
more
fzyzcjy Jan 6, 2025
e8bee84
more
fzyzcjy Jan 6, 2025
a0a531b
more
fzyzcjy Jan 6, 2025
6d6918f
more
fzyzcjy Jan 6, 2025
6e0b6cf
more
fzyzcjy Jan 6, 2025
15fe125
more
fzyzcjy Jan 6, 2025
8608ded
extract
fzyzcjy Jan 6, 2025
86e5522
private
fzyzcjy Jan 6, 2025
8cb34cf
more
fzyzcjy Jan 6, 2025
647c380
mv
fzyzcjy Jan 6, 2025
7e8d1c7
more
fzyzcjy Jan 6, 2025
9b2a2b4
more
fzyzcjy Jan 6, 2025
a838f5b
more
fzyzcjy Jan 6, 2025
6fbe975
more
fzyzcjy Jan 6, 2025
1425ab0
more
fzyzcjy Jan 6, 2025
3927045
more
fzyzcjy Jan 6, 2025
53a039b
more
fzyzcjy Jan 6, 2025
d9d5be1
mv back
fzyzcjy Jan 6, 2025
52de2a3
more
fzyzcjy Jan 6, 2025
cf4679b
more
fzyzcjy Jan 6, 2025
87d4e73
more
fzyzcjy Jan 6, 2025
1b6d5e9
more
fzyzcjy Jan 7, 2025
ba1310f
more
fzyzcjy Jan 7, 2025
8fd7569
more
fzyzcjy Jan 7, 2025
314040a
more
fzyzcjy Jan 7, 2025
14a7c01
more
fzyzcjy Jan 7, 2025
f67b6c7
more
fzyzcjy Jan 7, 2025
fc40f2a
rm
fzyzcjy Jan 7, 2025
9f9045d
more
fzyzcjy Jan 7, 2025
9e58f4a
more
fzyzcjy Jan 7, 2025
df4d74f
Merge branch 'main' into feat/refactor_many
fzyzcjy Jan 7, 2025
32f0eb8
more
fzyzcjy Jan 7, 2025
8019a18
Merge remote-tracking branch 'origin/feat/refactor_many' into feat/re…
fzyzcjy Jan 7, 2025
d2a685b
more
fzyzcjy Jan 7, 2025
f005d7c
more
fzyzcjy Jan 7, 2025
29f12b2
more
fzyzcjy Jan 7, 2025
f3cdeff
Merge branch 'main' into feat/refactor_many
fzyzcjy Jan 7, 2025
1d2520d
more
fzyzcjy Jan 7, 2025
bc2ad08
more
fzyzcjy Jan 7, 2025
2c155a9
more
fzyzcjy Jan 7, 2025
8729287
more
fzyzcjy Jan 7, 2025
46cabc6
more
fzyzcjy Jan 7, 2025
c02cda2
fmt
fzyzcjy Jan 7, 2025
c485b06
more
fzyzcjy Jan 7, 2025
166135d
empty
fzyzcjy Jan 8, 2025
65095b2
rename
fzyzcjy Jan 8, 2025
2391a7f
more
fzyzcjy Jan 8, 2025
4dee5ec
rebase
fzyzcjy Jan 8, 2025
883a4d6
mv
fzyzcjy Jan 8, 2025
e7bcaaf
more
fzyzcjy Jan 8, 2025
1ed7060
rebase more
fzyzcjy Jan 8, 2025
c084186
more
fzyzcjy Jan 8, 2025
c896fe6
more
fzyzcjy Jan 8, 2025
7086193
more
fzyzcjy Jan 8, 2025
d26fd3a
more
fzyzcjy Jan 8, 2025
6066c2e
more
fzyzcjy Jan 8, 2025
af77d69
rename
fzyzcjy Jan 8, 2025
5637774
mv
fzyzcjy Jan 8, 2025
9c8d622
more
fzyzcjy Jan 8, 2025
0971f2a
more
fzyzcjy Jan 8, 2025
afdf8e5
more
fzyzcjy Jan 8, 2025
7ca7fd6
more
fzyzcjy Jan 8, 2025
7a01f04
more
fzyzcjy Jan 8, 2025
e983ddd
more
fzyzcjy Jan 8, 2025
4e8f8f6
more
fzyzcjy Jan 8, 2025
36c4300
more
fzyzcjy Jan 8, 2025
80f65f4
more
fzyzcjy Jan 8, 2025
4ffbeb7
mv
fzyzcjy Jan 8, 2025
bf1a40a
more
fzyzcjy Jan 8, 2025
abdada7
make private
fzyzcjy Jan 8, 2025
e7be226
mv
fzyzcjy Jan 8, 2025
e5d212a
no more
fzyzcjy Jan 8, 2025
2a3d29a
extract
fzyzcjy Jan 8, 2025
20a9d25
more
fzyzcjy Jan 8, 2025
fbb325c
more
fzyzcjy Jan 8, 2025
17c9cf3
more
fzyzcjy Jan 8, 2025
c750dfe
more
fzyzcjy Jan 8, 2025
583c2b1
extract
fzyzcjy Jan 8, 2025
4f456fd
more
fzyzcjy Jan 8, 2025
78c4c02
Revert "more"
fzyzcjy Jan 8, 2025
adbcf6b
more
fzyzcjy Jan 8, 2025
3ccfb60
Revert "more"
fzyzcjy Jan 8, 2025
91d28d5
Revert "Revert "more""
fzyzcjy Jan 8, 2025
aa4f1b4
more
fzyzcjy Jan 8, 2025
6b9af72
more
fzyzcjy Jan 8, 2025
dbaf636
more
fzyzcjy Jan 8, 2025
b049e91
more
fzyzcjy Jan 8, 2025
b817f85
more
fzyzcjy Jan 8, 2025
0d639d9
more
fzyzcjy Jan 8, 2025
3db4329
more
fzyzcjy Jan 8, 2025
1024d28
more
fzyzcjy Jan 8, 2025
603d228
more
fzyzcjy Jan 8, 2025
c05aa7d
more
fzyzcjy Jan 8, 2025
d90e19c
simp
fzyzcjy Jan 8, 2025
706e17a
more
fzyzcjy Jan 8, 2025
f2d9d5e
rename
fzyzcjy Jan 8, 2025
b7b1e7b
more
fzyzcjy Jan 8, 2025
bc767e7
rename
fzyzcjy Jan 8, 2025
313d22f
rename
fzyzcjy Jan 8, 2025
224ccba
more
fzyzcjy Jan 8, 2025
e259e5e
more
fzyzcjy Jan 8, 2025
dd5759d
more
fzyzcjy Jan 8, 2025
af48be1
more
fzyzcjy Jan 8, 2025
0311111
more
fzyzcjy Jan 8, 2025
5c91608
more
fzyzcjy Jan 8, 2025
7c1da7e
more
fzyzcjy Jan 8, 2025
563dab1
more
fzyzcjy Jan 8, 2025
9c1ff09
more
fzyzcjy Jan 8, 2025
aff8155
mv
fzyzcjy Jan 8, 2025
fd80388
more
fzyzcjy Jan 8, 2025
76737a2
more
fzyzcjy Jan 8, 2025
69bab20
more
fzyzcjy Jan 8, 2025
d5c9283
more
fzyzcjy Jan 8, 2025
81bd2df
more
fzyzcjy Jan 8, 2025
67febfc
mv
fzyzcjy Jan 8, 2025
a4a7d0c
simp
fzyzcjy Jan 8, 2025
4423ce2
more
fzyzcjy Jan 8, 2025
2501770
more
fzyzcjy Jan 8, 2025
1954cf9
simp
fzyzcjy Jan 8, 2025
66d494c
more
fzyzcjy Jan 8, 2025
cce5b93
more
fzyzcjy Jan 8, 2025
de2054c
more
fzyzcjy Jan 8, 2025
b4ac5a7
mv
fzyzcjy Jan 8, 2025
6627477
more
fzyzcjy Jan 8, 2025
c756c25
more
fzyzcjy Jan 8, 2025
d368f37
rename
fzyzcjy Jan 8, 2025
a2b49e9
rename
fzyzcjy Jan 8, 2025
05f56fa
rm
fzyzcjy Jan 8, 2025
53de2fa
mv
fzyzcjy Jan 8, 2025
8d049ba
more
fzyzcjy Jan 8, 2025
992de1b
more
fzyzcjy Jan 8, 2025
3d39be5
more
fzyzcjy Jan 8, 2025
80aae80
more
fzyzcjy Jan 8, 2025
044bd12
more
fzyzcjy Jan 8, 2025
9dc6510
more
fzyzcjy Jan 8, 2025
791dabf
rm
fzyzcjy Jan 8, 2025
04f304c
more
fzyzcjy Jan 8, 2025
d9d6506
more
fzyzcjy Jan 8, 2025
1c63389
more
fzyzcjy Jan 8, 2025
6cae287
more
fzyzcjy Jan 8, 2025
f193f5e
more
fzyzcjy Jan 8, 2025
d414f10
more
fzyzcjy Jan 8, 2025
c0bdd2f
more
fzyzcjy Jan 8, 2025
d0e79a2
simp
fzyzcjy Jan 8, 2025
f63f38c
more
fzyzcjy Jan 8, 2025
b3d3e03
more
fzyzcjy Jan 8, 2025
3417438
more
fzyzcjy Jan 8, 2025
646e75c
more
fzyzcjy Jan 8, 2025
607e515
more
fzyzcjy Jan 8, 2025
dff878e
more
fzyzcjy Jan 8, 2025
d30ef53
more
fzyzcjy Jan 8, 2025
3ba771e
cp
fzyzcjy Jan 8, 2025
35d7989
more
fzyzcjy Jan 8, 2025
5ee2b62
more
fzyzcjy Jan 8, 2025
664eeaa
more
fzyzcjy Jan 8, 2025
4564450
more
fzyzcjy Jan 8, 2025
f8a31a7
more
fzyzcjy Jan 8, 2025
a73b3fb
cp
fzyzcjy Jan 8, 2025
f1faa4a
more
fzyzcjy Jan 8, 2025
f14b4fd
more
fzyzcjy Jan 8, 2025
4a53cba
more
fzyzcjy Jan 8, 2025
e08c5cc
more
fzyzcjy Jan 8, 2025
2c3b4cb
more
fzyzcjy Jan 8, 2025
e4d224d
more
fzyzcjy Jan 8, 2025
007dceb
more
fzyzcjy Jan 8, 2025
3048da5
more
fzyzcjy Jan 8, 2025
2568788
more
fzyzcjy Jan 8, 2025
f258286
more
fzyzcjy Jan 8, 2025
3ccc372
fix
fzyzcjy Jan 8, 2025
73af0d3
more
fzyzcjy Jan 8, 2025
967788d
Merge branch 'main' into feat/refactor_many
fzyzcjy Jan 8, 2025
58e4d92
update
fzyzcjy Jan 8, 2025
20a18c8
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 8, 2025
2bbb75c
more
fzyzcjy Jan 8, 2025
066515b
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 8, 2025
81b70b9
fmt
fzyzcjy Jan 8, 2025
e1dd31f
fmt
fzyzcjy Jan 8, 2025
61d972a
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 8, 2025
4ce7f7c
simp
fzyzcjy Jan 8, 2025
6baabdb
more
fzyzcjy Jan 8, 2025
8fc897e
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 8, 2025
e0a81cd
Merge branch 'main' into feat/refactor_many
fzyzcjy Jan 8, 2025
9919be9
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 8, 2025
d1faea4
more
fzyzcjy Jan 8, 2025
de472bc
fmt
fzyzcjy Jan 8, 2025
2ed857c
more
fzyzcjy Jan 8, 2025
3ece2bc
fmt
fzyzcjy Jan 8, 2025
7dd4371
fmt
fzyzcjy Jan 8, 2025
700f3d3
cp
fzyzcjy Jan 8, 2025
1c1a886
more
fzyzcjy Jan 8, 2025
179da2e
more
fzyzcjy Jan 8, 2025
df6fd3a
more
fzyzcjy Jan 8, 2025
370ed33
more
fzyzcjy Jan 8, 2025
b46abca
more
fzyzcjy Jan 8, 2025
732f258
more
fzyzcjy Jan 8, 2025
1a1e6f0
more
fzyzcjy Jan 8, 2025
6c6ecf0
more
fzyzcjy Jan 8, 2025
2de5826
more
fzyzcjy Jan 8, 2025
3c6e15c
fix
fzyzcjy Jan 8, 2025
698d771
fix
fzyzcjy Jan 8, 2025
12ff201
fmt
fzyzcjy Jan 8, 2025
435d5d6
Revert "fmt"
fzyzcjy Jan 8, 2025
b047d04
more
fzyzcjy Jan 8, 2025
707df61
more
fzyzcjy Jan 8, 2025
8853190
more
fzyzcjy Jan 8, 2025
b6699de
more
fzyzcjy Jan 8, 2025
67c702f
more
fzyzcjy Jan 8, 2025
76268bd
cp original
fzyzcjy Jan 8, 2025
ff5e752
rm adhoc
fzyzcjy Jan 8, 2025
6ba871e
Merge branch 'main' into feat/refactor_many
fzyzcjy Jan 10, 2025
c046621
merge
fzyzcjy Jan 10, 2025
e09e9ce
fmt
fzyzcjy Jan 10, 2025
881ef94
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 10, 2025
a980d86
merge
fzyzcjy Jan 11, 2025
883aef7
Merge branch 'main' into feat/refactor_many
fzyzcjy Jan 12, 2025
fd14677
merge
fzyzcjy Jan 12, 2025
19a15d8
Merge branch 'feat/refactor_many' into feat/refactor_layer
fzyzcjy Jan 12, 2025
7596441
merge
fzyzcjy Jan 12, 2025
4f04950
logging
fzyzcjy Jan 12, 2025
765213e
merge
fzyzcjy Jan 12, 2025
136fac9
lint
fzyzcjy Jan 12, 2025
79 changes: 79 additions & 0 deletions examples/runtime/engine/offline_batch_inference_torchrun.py
@@ -0,0 +1,79 @@
import datetime
import os
import sys

from sglang.srt.server.engine_fragment import EngineFragment


def run():
    """
    Example command:
    ```
    torchrun --nproc_per_node=4 offline_batch_inference_torchrun.py
    ```
    """

    local_rank = int(os.environ["LOCAL_RANK"])
    rank = int(os.environ["RANK"])
    world_size = int(os.environ["WORLD_SIZE"])

    def _log(text):
        t = datetime.datetime.now().strftime("%H:%M:%S")
        print(f"[{t}] [rank={rank}] {text}")

    _log(
        f'start {local_rank=} {rank=} {world_size=} {sys.argv=} {os.environ.get("CUDA_VISIBLE_DEVICES")}'
    )

    tp_size = world_size
    tp_rank = rank
    _log(f"{tp_rank=} {tp_size=}")

    model_name, mem_fraction_static = "meta-llama/Llama-3.2-1B-Instruct", 0.1
    # model_name, mem_fraction_static = "meta-llama/Llama-3.1-70B-Instruct", 0.9  # test large models

    for k in [
        "GROUP_RANK",
        "GROUP_WORLD_SIZE",
        "LOCAL_RANK",
        "LOCAL_WORLD_SIZE",
        "MASTER_ADDR",
        "MASTER_PORT",
        "NCCL_DEBUG",
        "OMP_NUM_THREADS",
        "RANK",
        "ROLE_NAME",
        "ROLE_RANK",
        "ROLE_WORLD_SIZE",
        "TORCHELASTIC_ERROR_FILE",
        "TORCHELASTIC_MAX_RESTARTS",
        "TORCHELASTIC_RESTART_COUNT",
        "TORCHELASTIC_RUN_ID",
        "TORCHELASTIC_USE_AGENT_STORE",
        "TORCH_NCCL_ASYNC_ERROR_HANDLING",
        "WORLD_SIZE",
    ]:
        del os.environ[k]

    fragment = EngineFragment(
        model_path=model_name,
        mem_fraction_static=mem_fraction_static,
        tp_size=tp_size,
        tp_rank=tp_rank,
        nccl_port=23456,
        gpu_id=tp_rank,
    )
    _log(f"{fragment=}")

    output = fragment.generate(
        prompt=["1+1=2, 1+2=3, 1+3=4, 1+4=", "9-1=8, 8-1=7, 7-1="],
        sampling_params=dict(max_new_tokens=16, temperature=0.0),
    )
    _log(f"{output=}")

    fragment.shutdown()
    _log("End script")


if __name__ == "__main__":
    run()
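The example above deletes each torchrun-injected variable with `del os.environ[k]`, which raises `KeyError` if any variable is absent. A small defensive variant is sketched below — this is an assumption-laden illustration, not part of the PR, and the helper name `scrub_torchrun_env` is hypothetical:

```python
import os

# A subset of the variables that torchrun/torchelastic injects into each worker.
TORCHRUN_VARS = [
    "GROUP_RANK", "GROUP_WORLD_SIZE", "LOCAL_RANK", "LOCAL_WORLD_SIZE",
    "MASTER_ADDR", "MASTER_PORT", "RANK", "WORLD_SIZE",
]


def scrub_torchrun_env(env=None):
    """Remove torchrun-injected variables, tolerating keys that are absent."""
    env = os.environ if env is None else env
    removed = {}
    for k in TORCHRUN_VARS:
        if k in env:
            removed[k] = env.pop(k)
    return removed  # returned so the caller could restore them later


# Simulated environment instead of os.environ, so this runs anywhere.
removed = scrub_torchrun_env({"RANK": "0", "WORLD_SIZE": "4", "PATH": "/usr/bin"})
print(sorted(removed))  # ['RANK', 'WORLD_SIZE']
```

Using `pop` with an explicit membership check keeps the scrub idempotent, which matters if the fragment setup is retried in the same process.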
2 changes: 1 addition & 1 deletion python/sglang/bench_one_batch.py
@@ -60,8 +60,8 @@
 from sglang.srt.managers.schedule_batch import Req, ScheduleBatch
 from sglang.srt.model_executor.forward_batch_info import ForwardBatch
 from sglang.srt.model_executor.model_runner import ModelRunner
+from sglang.srt.orchestration.std.launcher import _set_envs_and_config
 from sglang.srt.sampling.sampling_params import SamplingParams
-from sglang.srt.server import _set_envs_and_config
 from sglang.srt.server_args import PortArgs, ServerArgs
 from sglang.srt.speculative.spec_info import SpeculativeAlgorithm
 from sglang.srt.utils import configure_logger, kill_process_tree, suppress_other_loggers
3 changes: 2 additions & 1 deletion python/sglang/launch_server_llavavid.py
@@ -3,7 +3,8 @@
 import json
 import sys
 
-from sglang.srt.server import launch_server, prepare_server_args
+from sglang.srt.server import launch_server
+from sglang.srt.server_args import prepare_server_args
 
 if __name__ == "__main__":
     server_args = prepare_server_args(sys.argv[1:])
2 changes: 1 addition & 1 deletion python/sglang/srt/managers/data_parallel_controller.py
@@ -27,7 +27,7 @@
     TokenizedEmbeddingReqInput,
     TokenizedGenerateReqInput,
 )
-from sglang.srt.managers.scheduler import run_scheduler_process
+from sglang.srt.orchestration.std.scheduler import run_scheduler_process
 from sglang.srt.server_args import PortArgs, ServerArgs
 from sglang.srt.utils import bind_port, configure_logger, get_zmq_socket
 from sglang.utils import get_exception_traceback
246 changes: 102 additions & 144 deletions python/sglang/srt/managers/detokenizer_manager.py
@@ -15,23 +15,17 @@
 
 import dataclasses
 import logging
-import signal
 from collections import OrderedDict
 from typing import Dict, List, Union
 
-import psutil
-import setproctitle
-import zmq
-
 from sglang.srt.hf_transformers_utils import get_tokenizer
 from sglang.srt.managers.io_struct import (
     BatchEmbeddingOut,
     BatchStrOut,
     BatchTokenIDOut,
 )
-from sglang.srt.server_args import PortArgs, ServerArgs
-from sglang.srt.utils import configure_logger, get_zmq_socket
-from sglang.utils import find_printable_text, get_exception_traceback
+from sglang.srt.server_args import ServerArgs
+from sglang.utils import find_printable_text
 
 logger = logging.getLogger(__name__)
 
@@ -53,17 +47,7 @@ class DetokenizerManager:
     def __init__(
         self,
         server_args: ServerArgs,
-        port_args: PortArgs,
     ):
-        # Init inter-process communication
-        context = zmq.Context(2)
-        self.recv_from_scheduler = get_zmq_socket(
-            context, zmq.PULL, port_args.detokenizer_ipc_name
-        )
-        self.send_to_tokenizer = get_zmq_socket(
-            context, zmq.PUSH, port_args.tokenizer_ipc_name
-        )
-
         if server_args.skip_tokenizer_init:
             self.tokenizer = None
         else:
@@ -75,125 +59,116 @@ def __init__(
 
         self.decode_status = LimitedCapacityDict()
 
-    def trim_matched_stop(
-        self, output: Union[str, List[int]], finished_reason: Dict, no_stop_trim: bool
-    ):
-        if no_stop_trim or not finished_reason:
-            return output
-
-        matched = finished_reason.get("matched", None)
-        if not matched:
-            return output
-
-        # TODO(lmzheng): handle the case where multiple stop strs are hit
-
-        # Trim stop str.
-        if isinstance(matched, str) and isinstance(output, str):
-            pos = output.find(matched)
-            return output[:pos] if pos != -1 else output
-
-        # Trim stop token.
-        if isinstance(matched, int) and isinstance(output, list):
-            assert len(output) > 0
-            return output[:-1]
-        return output
-
-    def event_loop(self):
-        """The event loop that handles requests"""
-
-        while True:
-            recv_obj = self.recv_from_scheduler.recv_pyobj()
-
-            if isinstance(recv_obj, BatchEmbeddingOut):
-                # If it is embedding model, no detokenization is needed.
-                self.send_to_tokenizer.send_pyobj(recv_obj)
-                continue
-            else:
-                assert isinstance(recv_obj, BatchTokenIDOut)
-
-            bs = len(recv_obj.rids)
-
-            # Initialize decode status
-            read_ids, surr_ids = [], []
-            for i in range(bs):
-                rid = recv_obj.rids[i]
-                vid = recv_obj.vids[i]
-                if rid not in self.decode_status or self.decode_status[rid].vid != vid:
-                    s = DecodeStatus(
-                        vid=vid,
-                        decoded_text=recv_obj.decoded_texts[i],
-                        decode_ids=recv_obj.decode_ids[i],
-                        surr_offset=0,
-                        read_offset=recv_obj.read_offsets[i],
-                    )
-                    self.decode_status[rid] = s
-                else:
-                    s = self.decode_status[rid]
-                    s.decode_ids = recv_obj.decode_ids[i]
-
-                read_ids.append(
-                    self.trim_matched_stop(
-                        s.decode_ids[s.surr_offset :],
-                        recv_obj.finished_reasons[i],
-                        recv_obj.no_stop_trim[i],
-                    )
-                )
-                surr_ids.append(s.decode_ids[s.surr_offset : s.read_offset])
-
-            # TODO(lmzheng): handle skip_special_tokens/spaces_between_special_tokens per request
-            surr_texts = self.tokenizer.batch_decode(
-                surr_ids,
-                skip_special_tokens=recv_obj.skip_special_tokens[0],
-                spaces_between_special_tokens=recv_obj.spaces_between_special_tokens[0],
-            )
-            read_texts = self.tokenizer.batch_decode(
-                read_ids,
-                skip_special_tokens=recv_obj.skip_special_tokens[0],
-                spaces_between_special_tokens=recv_obj.spaces_between_special_tokens[0],
-            )
-
-            # Incremental decoding
-            output_strs = []
-            for i in range(bs):
-                s = self.decode_status[recv_obj.rids[i]]
-                new_text = read_texts[i][len(surr_texts[i]) :]
-                if recv_obj.finished_reasons[i] is None:
-                    # Streaming chunk: update the decode status
-                    if len(new_text) > 0 and not new_text.endswith("�"):
-                        s.decoded_text = s.decoded_text + new_text
-                        s.surr_offset = s.read_offset
-                        s.read_offset = len(s.decode_ids)
-                        new_text = ""
-                    else:
-                        new_text = find_printable_text(new_text)
-
-                output_strs.append(
-                    self.trim_matched_stop(
-                        s.decoded_text + new_text,
-                        recv_obj.finished_reasons[i],
-                        recv_obj.no_stop_trim[i],
-                    )
-                )
-
-            self.send_to_tokenizer.send_pyobj(
-                BatchStrOut(
-                    rids=recv_obj.rids,
-                    finished_reasons=recv_obj.finished_reasons,
-                    output_strs=output_strs,
-                    prompt_tokens=recv_obj.prompt_tokens,
-                    completion_tokens=recv_obj.completion_tokens,
-                    cached_tokens=recv_obj.cached_tokens,
-                    input_token_logprobs_val=recv_obj.input_token_logprobs_val,
-                    input_token_logprobs_idx=recv_obj.input_token_logprobs_idx,
-                    output_token_logprobs_val=recv_obj.output_token_logprobs_val,
-                    output_token_logprobs_idx=recv_obj.output_token_logprobs_idx,
-                    input_top_logprobs_val=recv_obj.input_top_logprobs_val,
-                    input_top_logprobs_idx=recv_obj.input_top_logprobs_idx,
-                    output_top_logprobs_val=recv_obj.output_top_logprobs_val,
-                    output_top_logprobs_idx=recv_obj.output_top_logprobs_idx,
-                    normalized_prompt_logprob=recv_obj.normalized_prompt_logprob,
-                )
-            )
+    def handle_batch_embedding_out(self, recv_obj: BatchEmbeddingOut):
+        # If it is embedding model, no detokenization is needed.
+        return recv_obj
+
+    def handle_batch_token_id_out(self, recv_obj: BatchTokenIDOut):
+        bs = len(recv_obj.rids)
+
+        # Initialize decode status
+        read_ids, surr_ids = [], []
+        for i in range(bs):
+            rid = recv_obj.rids[i]
+            vid = recv_obj.vids[i]
+            if rid not in self.decode_status or self.decode_status[rid].vid != vid:
+                s = DecodeStatus(
+                    vid=vid,
+                    decoded_text=recv_obj.decoded_texts[i],
+                    decode_ids=recv_obj.decode_ids[i],
+                    surr_offset=0,
+                    read_offset=recv_obj.read_offsets[i],
+                )
+                self.decode_status[rid] = s
+            else:
+                s = self.decode_status[rid]
+                s.decode_ids = recv_obj.decode_ids[i]
+
+            read_ids.append(
+                _trim_matched_stop(
+                    s.decode_ids[s.surr_offset :],
+                    recv_obj.finished_reasons[i],
+                    recv_obj.no_stop_trim[i],
+                )
+            )
+            surr_ids.append(s.decode_ids[s.surr_offset : s.read_offset])
+
+        # TODO(lmzheng): handle skip_special_tokens/spaces_between_special_tokens per request
+        surr_texts = self.tokenizer.batch_decode(
+            surr_ids,
+            skip_special_tokens=recv_obj.skip_special_tokens[0],
+            spaces_between_special_tokens=recv_obj.spaces_between_special_tokens[0],
+        )
+        read_texts = self.tokenizer.batch_decode(
+            read_ids,
+            skip_special_tokens=recv_obj.skip_special_tokens[0],
+            spaces_between_special_tokens=recv_obj.spaces_between_special_tokens[0],
+        )
+
+        # Incremental decoding
+        output_strs = []
+        for i in range(bs):
+            s = self.decode_status[recv_obj.rids[i]]
+            new_text = read_texts[i][len(surr_texts[i]) :]
+            if recv_obj.finished_reasons[i] is None:
+                # Streaming chunk: update the decode status
+                if len(new_text) > 0 and not new_text.endswith("�"):
+                    s.decoded_text = s.decoded_text + new_text
+                    s.surr_offset = s.read_offset
+                    s.read_offset = len(s.decode_ids)
+                    new_text = ""
+                else:
+                    new_text = find_printable_text(new_text)
+
+            output_strs.append(
+                _trim_matched_stop(
+                    s.decoded_text + new_text,
+                    recv_obj.finished_reasons[i],
+                    recv_obj.no_stop_trim[i],
+                )
+            )
+
+        return BatchStrOut(
+            rids=recv_obj.rids,
+            finished_reasons=recv_obj.finished_reasons,
+            output_strs=output_strs,
+            prompt_tokens=recv_obj.prompt_tokens,
+            completion_tokens=recv_obj.completion_tokens,
+            cached_tokens=recv_obj.cached_tokens,
+            input_token_logprobs_val=recv_obj.input_token_logprobs_val,
+            input_token_logprobs_idx=recv_obj.input_token_logprobs_idx,
+            output_token_logprobs_val=recv_obj.output_token_logprobs_val,
+            output_token_logprobs_idx=recv_obj.output_token_logprobs_idx,
+            input_top_logprobs_val=recv_obj.input_top_logprobs_val,
+            input_top_logprobs_idx=recv_obj.input_top_logprobs_idx,
+            output_top_logprobs_val=recv_obj.output_top_logprobs_val,
+            output_top_logprobs_idx=recv_obj.output_top_logprobs_idx,
+            normalized_prompt_logprob=recv_obj.normalized_prompt_logprob,
+        )
+
+
+def _trim_matched_stop(
+    output: Union[str, List[int]], finished_reason: Dict, no_stop_trim: bool
+):
+    if no_stop_trim or not finished_reason:
+        return output
+
+    matched = finished_reason.get("matched", None)
+    if not matched:
+        return output
+
+    # TODO(lmzheng): handle the case where multiple stop strs are hit
+
+    # Trim stop str.
+    if isinstance(matched, str) and isinstance(output, str):
+        pos = output.find(matched)
+        return output[:pos] if pos != -1 else output
+
+    # Trim stop token.
+    if isinstance(matched, int) and isinstance(output, list):
+        assert len(output) > 0
+        return output[:-1]
+    return output


 class LimitedCapacityDict(OrderedDict):
@@ -207,20 +182,3 @@ def __setitem__(self, key, value):
             self.popitem(last=False)
         # Set the new item
         super().__setitem__(key, value)
-
-
-def run_detokenizer_process(
-    server_args: ServerArgs,
-    port_args: PortArgs,
-):
-    setproctitle.setproctitle("sglang::detokenizer")
-    configure_logger(server_args)
-    parent_process = psutil.Process().parent()
-
-    try:
-        manager = DetokenizerManager(server_args, port_args)
-        manager.event_loop()
-    except Exception:
-        traceback = get_exception_traceback()
-        logger.error(f"DetokenizerManager hit an exception: {traceback}")
-        parent_process.send_signal(signal.SIGQUIT)
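The stop-trimming helper that this diff moves to module level is pure Python, so its two trim modes (substring trim for string outputs, last-token trim for token-ID lists) can be restated as a self-contained sketch mirroring the diffed code:

```python
from typing import Dict, List, Optional, Union


def trim_matched_stop(
    output: Union[str, List[int]],
    finished_reason: Optional[Dict],
    no_stop_trim: bool,
):
    """Sketch of the PR's _trim_matched_stop: cut output at the matched stop."""
    if no_stop_trim or not finished_reason:
        return output

    matched = finished_reason.get("matched", None)
    if not matched:
        return output

    # String output + string stop: cut everything from the stop string onward.
    if isinstance(matched, str) and isinstance(output, str):
        pos = output.find(matched)
        return output[:pos] if pos != -1 else output

    # Token-ID output + stop token: drop the trailing stop token.
    if isinstance(matched, int) and isinstance(output, list):
        assert len(output) > 0
        return output[:-1]
    return output


print(repr(trim_matched_stop("1+4=5\n###", {"matched": "###"}, False)))  # '1+4=5\n'
print(trim_matched_stop([10, 11, 2], {"matched": 2}, False))             # [10, 11]
```

Making this a module-level function (rather than a method) is what lets both the scheduler-side decode-status pass and the final output pass share it without holding a manager instance.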
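`LimitedCapacityDict`, which caps the per-request decode-status cache, is only partially visible in the collapsed hunk above. A minimal sketch of the same FIFO-eviction idea follows; the `capacity` constructor argument is an assumption, since `__init__` is hidden in the diff view:

```python
from collections import OrderedDict


class LimitedCapacityDict(OrderedDict):
    """Dict that evicts its oldest entry once a fixed capacity is reached."""

    def __init__(self, capacity: int = 3):
        super().__init__()
        self.capacity = capacity

    def __setitem__(self, key, value):
        if len(self) >= self.capacity:
            # popitem(last=False) removes the oldest-inserted entry (FIFO).
            self.popitem(last=False)
        # Set the new item
        super().__setitem__(key, value)


d = LimitedCapacityDict(capacity=2)
d["a"] = 1
d["b"] = 2
d["c"] = 3  # capacity reached: "a" is evicted
print(list(d))  # ['b', 'c']
```

Note this sketch evicts even when overwriting an existing key at capacity; the real cache's exact overwrite behavior depends on the hidden `__init__`/usage and is not shown in the diff.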