Pull requests: NVIDIA/TensorRT-LLM
Pull requests list
DeepSeek MoE support
triaged
Issue has been triaged by maintainers
#1758
opened Jun 9, 2024 by
akhoroshev
Bump transformers from 4.36.2 to 4.38.0 in /examples/multimodal
bug
Something isn't working
dependencies
Pull requests that update a dependency file
triaged
Issue has been triaged by maintainers
waiting for feedback
#1689
opened May 28, 2024 by
dependabot
bot
add cached generation buffer
triaged
Issue has been triaged by maintainers
waiting for feedback
#1685
opened May 28, 2024 by
michael200892458
Optimize python benchmark logging
triaged
Issue has been triaged by maintainers
#1646
opened May 22, 2024 by
michaelnny
Fix CUDA OOM when creating Mixtral checkpoint
triaged
Issue has been triaged by maintainers
waiting for feedback
#1629
opened May 19, 2024 by
VivekBits2210
Add support for non-power-of-two heads with Alibi
triaged
Issue has been triaged by maintainers
#1611
opened May 15, 2024 by
vmarkovtsev
[feat]: Support weight only gemm with 2bit
triaged
Issue has been triaged by maintainers
waiting for feedback
#1568
opened May 9, 2024 by
gavinchen430
Support SDXL and its distributed inference
waiting for feedback
#1514
opened Apr 28, 2024 by
Zars19
fix: correct cudaSetDevice error when GPUs per node are fewer than their ranks in inter-node inference
#1495
opened Apr 24, 2024 by
littlefatfat
llama convert add rotary_scaling param in cli_args
waiting for feedback
#1385
opened Apr 1, 2024 by
activezhao
Relax python dependencies
triaged
Issue has been triaged by maintainers
#1346
opened Mar 24, 2024 by
tdeboissiere