Red Hat
San Jose, California
Pinned
- vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
- IBM/text-generation-inference
  IBM development fork of https://github.com/huggingface/text-generation-inference
- netty/netty
  Netty project - an event-driven asynchronous network application framework
- IBM/kv-utils
  Abstracted helper classes providing consistent key-value store functionality, with zookeeper and etcd3 implementations
928 contributions in the last year
[Contribution calendar: daily activity heatmap for the past year, January through December, shaded from no contributions to high contributions]
Activity overview
Contribution activity
January 2025
Created 3 commits in 1 repository
Created a pull request in vllm-project/vllm that received 18 comments
[Frontend][V1] Online serving performance improvements
These help in particular with TTFT, ITL variance, and overall throughput. Break up output processing (detokenization) to avoid blocking the event …
+101 −45 lines changed • 18 comments
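The PR description above mentions breaking up output processing (detokenization) so it does not block the event loop. A minimal sketch of that general pattern, not the PR's actual code: the names process_outputs, detokenize_step, and slice_size are hypothetical.

```python
import asyncio

async def process_outputs(outputs, detokenize_step, slice_size=16):
    """Detokenize outputs in small slices, yielding between slices.

    Hypothetical illustration only: chunking CPU-bound work and awaiting
    asyncio.sleep(0) lets other coroutines (new requests, streaming
    responses) run in between, which helps TTFT and ITL variance.
    """
    results = []
    for i in range(0, len(outputs), slice_size):
        for out in outputs[i:i + slice_size]:
            results.append(detokenize_step(out))
        # Hand control back to the event loop between slices.
        await asyncio.sleep(0)
    return results
```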
Opened 2 other pull requests in 1 repository
vllm-project/vllm
2 merged
- [V1][Frontend] Coalesce bunched RequestOutputs
  This contribution was made on Jan 22
- [Benchmark] More accurate TPOT calc in benchmark_serving.py
  This contribution was made on Jan 22
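The TPOT item above refers to time per output token in benchmark_serving.py. A hedged sketch of the conventional calculation, which excludes the first token since its latency is already counted as TTFT; this is not necessarily the exact code the PR landed.

```python
def tpot(latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Time per output token (seconds), excluding the first token.

    The first token's latency is measured as TTFT (prefill), so only
    the remaining output_tokens - 1 decode steps are averaged here.
    """
    if output_tokens <= 1:
        return 0.0  # no decode steps beyond the first token
    return (latency_s - ttft_s) / (output_tokens - 1)
```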
Reviewed 7 pull requests in 1 repository
vllm-project/vllm: 7 pull requests
- [Misc] Enable proxy support in benchmark script
  This contribution was made on Jan 24
- Revert "[core] separate builder init and builder prepare for each batch"
  This contribution was made on Jan 24
- Set weights_only=True when using torch.load() (see the sketch after this list)
  This contribution was made on Jan 23
- [V1][Frontend] Coalesce bunched RequestOutputs
  This contribution was made on Jan 23
- [Frontend][V1] Online serving performance improvements
  This contribution was made on Jan 22
- [V1] PR 1/N for v1 sample and prompt logprobs support
  This contribution was made on Jan 17
- [V1][Perf] Reduce scheduling overhead in model runner after cuda sync
  This contribution was made on Jan 16
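The weights_only=True item in the list above refers to a real torch.load() flag: by default, torch.load() unpickles arbitrary Python objects, which can execute code from an untrusted checkpoint, while weights_only=True restricts deserialization to tensors and other safe types. A minimal usage sketch; the file path is a placeholder.

```python
import torch

# Load only tensor data, refusing arbitrary pickled objects.
# "checkpoint.pt" is a placeholder path, not a file from the PR.
state_dict = torch.load("checkpoint.pt", weights_only=True)
```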
Created an issue in vllm-project/production-stack that received 3 comments
With the session-based routing, what happens when a pod goes away?
Sorry if I missed something, but from a quick look through the code, it looks like all subsequent requests belonging to sessions assigned to that po…
3 comments
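The issue above asks what a session-affinity router does when a pod disappears. As a purely hypothetical sketch, not production-stack's actual code, the failure mode and one possible re-pinning fix could look like this:

```python
import random

# Hypothetical session-affinity table: session id -> pod name.
session_to_pod = {}
live_pods = {"pod-a", "pod-b", "pod-c"}

def route(session_id: str) -> str:
    """Return the pod for a session, re-pinning if its pod is gone.

    Without the re-assignment branch, requests for sessions pinned to
    a dead pod would have no valid target, which is the concern the
    issue raises.
    """
    pod = session_to_pod.get(session_id)
    if pod is None or pod not in live_pods:
        pod = random.choice(sorted(live_pods))  # pick any live pod
        session_to_pod[session_id] = pod
    return pod
```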