Red Hat
San Jose, California
Pinned
- vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
- IBM/text-generation-inference
  IBM development fork of https://github.com/huggingface/text-generation-inference
- netty/netty
  Netty project - an event-driven asynchronous network application framework
- IBM/kv-utils
  Abstracted helper classes providing consistent key-value store functionality, with zookeeper and etcd3 implementations
928 contributions in the last year
[Contribution calendar: daily activity heatmap for the past year, January through December, shaded from no contributions to high contributions]
Activity overview
Contribution activity
January 2025
Created 3 commits in 1 repository
Created a pull request in vllm-project/vllm that received 18 comments
[Frontend][V1] Online serving performance improvements
These help in particular with TTFT, ITL variance, and overall throughput. Break up output processing (detokenization) to avoid blocking the event …
+101 −45 lines changed • 18 comments
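The PR description above mentions breaking up output processing (detokenization) so it does not block the event loop. A minimal sketch of that general pattern, not the PR's actual code: the names process_outputs, detokenize_step, and slice_size are hypothetical.

```python
import asyncio

async def process_outputs(outputs, detokenize_step, slice_size=16):
    """Detokenize outputs in small slices, yielding between slices.

    Hypothetical illustration only: chunking CPU-bound work and awaiting
    asyncio.sleep(0) lets other coroutines (new requests, streaming
    responses) run in between, which helps TTFT and ITL variance.
    """
    results = []
    for i in range(0, len(outputs), slice_size):
        for out in outputs[i:i + slice_size]:
            results.append(detokenize_step(out))
        # Hand control back to the event loop between slices.
        await asyncio.sleep(0)
    return results
```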
Opened 2 other pull requests in 1 repository
vllm-project/vllm
2 merged
- [V1][Frontend] Coalesce bunched RequestOutputs
  This contribution was made on Jan 22
- [Benchmark] More accurate TPOT calc in benchmark_serving.py
  This contribution was made on Jan 22
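The TPOT item above refers to time per output token in benchmark_serving.py. A hedged sketch of the conventional calculation, which excludes the first token since its latency is already counted as TTFT; this is not necessarily the exact code the PR landed.

```python
def tpot(latency_s: float, ttft_s: float, output_tokens: int) -> float:
    """Time per output token (seconds), excluding the first token.

    The first token's latency is measured as TTFT (prefill), so only
    the remaining output_tokens - 1 decode steps are averaged here.
    """
    if output_tokens <= 1:
        return 0.0  # no decode steps beyond the first token
    return (latency_s - ttft_s) / (output_tokens - 1)
```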
Reviewed 7 pull requests in 1 repository
vllm-project/vllm: 7 pull requests
- [Misc] Enable proxy support in benchmark script
  This contribution was made on Jan 24
- Revert "[core] separate builder init and builder prepare for each batch"
  This contribution was made on Jan 24
- Set weights_only=True when using torch.load() (see the sketch after this list)
  This contribution was made on Jan 23
- [V1][Frontend] Coalesce bunched RequestOutputs
  This contribution was made on Jan 23
- [Frontend][V1] Online serving performance improvements
  This contribution was made on Jan 22
- [V1] PR 1/N for v1 sample and prompt logprobs support
  This contribution was made on Jan 17
- [V1][Perf] Reduce scheduling overhead in model runner after cuda sync
  This contribution was made on Jan 16
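The weights_only=True item in the list above refers to a real torch.load() flag: by default, torch.load() unpickles arbitrary Python objects, which can execute code from an untrusted checkpoint, while weights_only=True restricts deserialization to tensors and other safe types. A minimal usage sketch; the file path is a placeholder.

```python
import torch

# Load only tensor data, refusing arbitrary pickled objects.
# "checkpoint.pt" is a placeholder path, not a file from the PR.
state_dict = torch.load("checkpoint.pt", weights_only=True)
```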
Created an issue in vllm-project/production-stack that received 3 comments
With the session-based routing, what happens when a pod goes away?
Sorry if I missed something, but from a quick look through the code, it looks like all subsequent requests belonging to sessions assigned to that po…
3 comments
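The issue above asks what a session-affinity router does when a pod disappears. As a purely hypothetical sketch, not production-stack's actual code, the failure mode and one possible re-pinning fix could look like this:

```python
import random

# Hypothetical session-affinity table: session id -> pod name.
session_to_pod = {}
live_pods = {"pod-a", "pod-b", "pod-c"}

def route(session_id: str) -> str:
    """Return the pod for a session, re-pinning if its pod is gone.

    Without the re-assignment branch, requests for sessions pinned to
    a dead pod would have no valid target, which is the concern the
    issue raises.
    """
    pod = session_to_pod.get(session_id)
    if pod is None or pod not in live_pods:
        pod = random.choice(sorted(live_pods))  # pick any live pod
        session_to_pod[session_id] = pod
    return pod
```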