Pull requests: vllm-project/vllm
[Doc] Update README to highlight activation quantization (#6476, opened Jul 16, 2024 by robertgshaw2-neuralmagic)
[Misc][Speculative decoding] Typos and typing fixes (#6467, opened Jul 16, 2024 by ShangmingCai) [ready]
[Bugfix][Frontend] Fix missing /metrics endpoint (#6463, opened Jul 16, 2024 by DarkLight1337)
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457, opened Jul 16, 2024 by WoosukKwon) [tpu: Related to Google TPUs]
[Bugfix] Enable prefix caching for AsyncLLMEngine when requesting prompt_logprobs (#6456, opened Jul 15, 2024 by KrishnaM251)
[Distributed][Model] Rank-based Component Creation for Pipeline Parallelism Memory Optimization (#6455, opened Jul 15, 2024 by wushidonguc)
[Core] Use numpy to speed up padded token processing (#6442, opened Jul 15, 2024 by peng1999) [ready]
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431, opened Jul 15, 2024 by DarkLight1337)
[ci][distributed] Add pipeline parallel correctness test (#6410, opened Jul 13, 2024 by youkaichao) [ready]
[Misc][WIP] Disambiguate quantized types via a new ScalarType (#6396, opened Jul 12, 2024 by LucasWilkinson) [Draft, 3 tasks]
[Bugfix] Raise an error for no draft token case when draft_tp>1 (#6369, opened Jul 12, 2024 by wooyeonlee0) [ready]
[Misc] Support Act Order in Compressed Tensors (#6358, opened Jul 12, 2024 by robertgshaw2-neuralmagic) [Draft]
[Core] draft_model_runner: Implement prepare_inputs on GPU for advance_step (#6338, opened Jul 11, 2024 by alexm-neuralmagic)