Pull requests: NVIDIA/TensorRT-LLM
Pull requests list
[None][test] Add model-derived PyTorch attention backend test suite
#15536
opened Jun 23, 2026 by
yuxianq
Collaborator
[https://nvbugs/6150288][fix] Use persistent per-stream workspace in cublas_mm for CUDA-graph safety
#15534
opened Jun 23, 2026 by
pamelap-nvidia
Collaborator
2 of 4 tasks
[None][chore] Clean deprecated CppMambaCacheManager
#15533
opened Jun 23, 2026 by
bo-nv
Collaborator
1 task done
[None][feat] Qwen-Image: NVFP4 SVDQuant (NVFP4 residual + rank-r BF16 LoRA)
#15532
opened Jun 23, 2026 by
jingyu-ml
[#14874][feat] AutoDeploy : Perf optimization for gpt-oss-120b for low conc
AutoDeploy
<NV> AutoDeploy Backend
#15531
opened Jun 23, 2026 by
taylor-yb-lee
Collaborator
1 task done
[None][chore] Autodeploy disable the pipeline cache by default
#15530
opened Jun 22, 2026 by
nvchenghaoz
Collaborator
1 task
[None][CI] Waive flaky test_vbench_dimension_score_wan (nvbugs/6357628)
#15529
opened Jun 22, 2026 by
chang-l
Collaborator
[https://nvbugs/6276842][test] Loosen rtol/atol on encoder CUDA graph logits parity check
#15527
opened Jun 22, 2026 by
tingyangk
Collaborator
1 task done
[None][feat] Add prefix-aware scheduling config flag to support opt-out
#15526
opened Jun 22, 2026 by
SimengLiu-nv
Collaborator
1 task done
[TRTLLM-13543][feat] WideEP FT: add EPLB mask-only reconfigure (1b.1)
#15525
opened Jun 22, 2026 by
chienchunhung
Collaborator
[TRTLLM-12557][feat] WideEP FT: add AlltoAll watchdog (1a.4)
#15524
opened Jun 22, 2026 by
chienchunhung
Collaborator
[None][fix] Preserve Kimi 2.5 tool call IDs
#15523
opened Jun 22, 2026 by
hvagadia
Contributor
[#14882][fix] Make kv_cache_aware router robust to a missing KV-event stream
#15522
opened Jun 22, 2026 by
GodlyDonuts
[doc] Clarify dtype='auto' resolution for LLM and KvCacheConfig
#15520
opened Jun 22, 2026 by
ojas4414
[TRTLLM-11608][feat] Chunked KV cache transfer with early block release
#15519
opened Jun 22, 2026 by
athena-nv
Collaborator
1 task done
[TRTLLM-12714][feat] KV pool rebalance: gate for multi-GPU and coordinate attention-DP
#15518
opened Jun 22, 2026 by
thorjohnsen
Collaborator
[None][feat] DSA indexer Top-K cross-layer reuse (IndexCache)
#15513
opened Jun 22, 2026 by
murphymatt
4 tasks done
[None][bugfix] Fix executor test response timeout
#15502
opened Jun 19, 2026 by
fallintoplace
[None][bugfix] Fix Mamba preloaded HF model loading
#15501
opened Jun 19, 2026 by
fallintoplace
[None][fix] Make NIXL port-lock path configurable via TRTLLM_NIXL_PORT_LOCK_PATH
#15500
opened Jun 19, 2026 by
CodersAcademy006
4 tasks done
[https://nvbugs/6329227][fix] Use pkgutil.extend_path to merge the two flash_attn distributions before…
#15498
opened Jun 19, 2026 by
tensorrt-cicd
Collaborator
2 tasks done
[https://nvbugs/6337228][fix] In tests/unittest/tools/test_layer_wise_benchmarks.py, replace check_call with…
#15497
opened Jun 19, 2026 by
tensorrt-cicd
Collaborator
2 tasks done
[https://nvbugs/6316980][fix] Added a runtime guard in FlashInferTrtllmGenAttention.is_supported using the…
#15496
opened Jun 19, 2026 by
tensorrt-cicd
Collaborator
2 tasks done