Pull requests: NVIDIA/TensorRT-LLM

Pull requests list

Fix Incorrect Batch Slot Usage in addCumLogProbs Kernel
#2787 opened Feb 14, 2025 by aotman
fix WeightOnlyQuantRowLinear
#2768 opened Feb 9, 2025 by liquanfeng (labels: triaged)
Only read cfg json once
#2747 opened Feb 5, 2025 by LetsGoFir (labels: triaged)
add chunked context/prefill runtime option to trtllm-serve
#2731 opened Jan 31, 2025 by tsnyder-sps (labels: triaged)
feat: deepseek_v1 gqa and correct normalization mode
#2715 opened Jan 23, 2025 by akhoroshev (labels: triaged)
Custom samplingconfig addition
#2633 opened Dec 27, 2024 by buddhapuneeth
Create c-cpp.yml
#2601 opened Dec 20, 2024 by TNGBBK
fix NV bench output len was garbage value
#2516 opened Nov 29, 2024 by ekagra-ranjan
[DO NOT MERGE] test CI
#2513 opened Nov 29, 2024 by niukuo
bugfix/incorrect lora out dims
#2484 opened Nov 22, 2024 by akhoroshev (labels: triaged)
Fix prompt_table_data empty tensor shape error
#2470 opened Nov 20, 2024 by BasicCoder (labels: triaged)
Create INT8 KV Cache on Qserve
#2446 opened Nov 14, 2024 by dleunji (labels: triaged)
th::optional -> std::optional
#2397 opened Oct 31, 2024 by r-barnes (labels: triaged)
attention mechanism toggle added
#2384 opened Oct 28, 2024 by Aaryanverma (labels: functionality issue, Investigating, triaged)
fix load_model_on_cpu on qwen/convert_checkpoint.py
#2382 opened Oct 27, 2024 by lkm2835 (labels: feature request, triaged)
Fix errors when using smoothquant to quantize Qwen2 model
#2370 opened Oct 24, 2024 by Missmiaom (labels: Low Precision, triaged)
README.md: Add 3rd Party Inference Speed Dashboard
#2244 opened Sep 22, 2024 by matichon-vultureprime (labels: Documentation, triaged)
Modify small-batched weight only quantization
#2213 opened Sep 10, 2024 by dasistwo (labels: Low Precision, triaged)
Create sync.yml
#2154 opened Aug 27, 2024 by inkimikoko
fix GemmFpAIntB MMa::IteratorB::Layout
#2070 opened Jul 31, 2024 by luliyucoordinate
fix wrong arg in Engine Building Command in docs/source/performance/perf-overview.md
#2057 opened Jul 30, 2024 by RuibaiXu (labels: Documentation)