Pull requests: vllm-project/vllm
[Doc] Update README to highlight activation quantization (#6476, opened Jul 16, 2024 by robertgshaw2-neuralmagic)
[Misc][Speculative decoding] Typos and typing fixes (#6467, opened Jul 16, 2024 by ShangmingCai) [ready]
[Bugfix][Frontend] Fix missing /metrics endpoint (#6463, opened Jul 16, 2024 by DarkLight1337)
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457, opened Jul 16, 2024 by WoosukKwon) [tpu: Related to Google TPUs]
[Bugfix] Enable prefix caching for AsyncLLMEngine when requesting prompt_logprobs (#6456, opened Jul 15, 2024 by KrishnaM251)
[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization (#6455, opened Jul 15, 2024 by wushidonguc)
[Core] Use numpy to speed up padded token processing (#6442, opened Jul 15, 2024 by peng1999) [ready]
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431, opened Jul 15, 2024 by DarkLight1337)
[ci][distributed] Add pipeline parallel correctness test (#6410, opened Jul 13, 2024 by youkaichao) [ready]
[Misc][WIP] Disambiguate quantized types via a new ScalarType (#6396, opened Jul 12, 2024 by LucasWilkinson) [Draft, 3 tasks]
[Bugfix] Raise an error for no draft token case when draft_tp>1 (#6369, opened Jul 12, 2024 by wooyeonlee0) [ready]
[Misc] Support Act Order in Compressed Tensors (#6358, opened Jul 12, 2024 by robertgshaw2-neuralmagic) [Draft]
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step (#6338, opened Jul 11, 2024 by alexm-neuralmagic)