Issues: NVIDIA/TensorRT-LLM
Pinned: #783 · [Issue Template] Short one-line summary of the issue #270 · opened Jan 1, 2024 by juney-nvidia
#1959 · Is MPI required even multi device is disabled? [bug] · opened Jul 16, 2024 by jlewi
#1957 · Model Performance Degraded when using BFLOAT16 LoRa Adapters [bug] · opened Jul 16, 2024 by TheCodeWrangler
#1956 · Question on how to perform cross-attention with FMHA kernel · opened Jul 16, 2024 by Ashwin-Ramesh2607
#1953 · [Feature] quantize_by_modelopt.py get_tokenizer is not suitable for CodeQwen1.5 7B Chat [bug] · opened Jul 16, 2024 by Yuchen-Cao
#1951 · [Question] accelerating groupwise weightOnlyGemm on v100 using double buffering? · opened Jul 16, 2024 by foricee
#1950 · LLAMA checkpoint ImportError: undefined symbol [bug] · opened Jul 16, 2024 by Pareek-Yash
#1949 · Does tensorrt-llm support blip2 with fp8 quantization? [question] · opened Jul 15, 2024 by SVT-Yang
#1948 · H20 runs python benchmark with enable_cuda_graph encounters cudaDeviceSynchronize runtime error [bug] · opened Jul 15, 2024 by zxs789
#1947 · [Feature]: FlashAttention 3 support [feature request] · opened Jul 15, 2024 by fan-niu
#1946 · How to use Medusa to support non llama models? [question] · opened Jul 15, 2024 by skyCreateXian
#1945 · How to quantize customed models, such as LVM? [question] · opened Jul 15, 2024 by XA23i
#1944 · [Feature Request]: support for vAttention style paging for attention [feature request] · opened Jul 13, 2024 by thecheekygeek
#1943 · [new] discord channel for tensorrt [question] · opened Jul 13, 2024 by geraldstanje
#1942 · Mixtral-8x7B repetitive answers [bug, Investigating] · opened Jul 12, 2024 by BugsBuggy
#1941 · tensorrt_llm.bindings.Request class is not usable for non-text inputs [feature request] · opened Jul 12, 2024 by MahmoudAshraf97
#1940 · Inquiry Regarding the Use of FP8 Type in GEMM Computations [question] · opened Jul 12, 2024 by unbelievable3513
#1938 · problem with tensorrt_llm performance [bug] · opened Jul 12, 2024 by Arnold1
#1934 · [Model Request] InternVL2.0 support [feature request] · opened Jul 11, 2024 by BasicCoder
#1933 · Cannot install tensorrt_llm [bug] · opened Jul 11, 2024 by Dawn-2-Winter
#1932 · GPU OOM Error When Quantizing Llama 3 8b [bug] · opened Jul 11, 2024 by ngockhanh5110
#1931 · [model request] PaliGemma support [feature request] · opened Jul 11, 2024 by kitterive
#1930 · failed to load whisper decoder engine with paged kv cache [bug] · opened Jul 10, 2024 by MahmoudAshraf97
#1929 · result is different from 0.9.0 and 0.10.0, and speed has decreased when update version [bug] · opened Jul 10, 2024 by sundayKK
#1925 · moe kernel Assertion failed when running qwen2-moe-57B-A14B with TP enabled · opened Jul 9, 2024 by handoku
#1922 · Support int type zero-points in weight-only GEMM [feature request] · opened Jul 9, 2024 by xiaonans