[Feature Request]: Manually load/unload checkpoints into GPU #16129
Comments
Why?
In my use case, the machine is running multiple AI services (one of them being this webui), and there are several machines that do the same. So the checkpoints should be loaded at machine boot and unloaded when memory is needed for another AI service, etc.
Model loading is a mess in webui. To be honest, the two API endpoints (presumably /sdapi/v1/unload-checkpoint and /sdapi/v1/reload-checkpoint) work more like putting the webui to sleep: you can put it to sleep to save VRAM and wake it up before use. There is an issue with this, though.
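For reference, a minimal sketch of that sleep/wake pattern, assuming a default local webui instance at 127.0.0.1:7860 (the two routes named above are the ones exposed in api.py):

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local webui instance

# "Sleep": move the current checkpoint's weights out of VRAM.
requests.post(f"{BASE_URL}/sdapi/v1/unload-checkpoint", timeout=60)

# "Wake": bring the checkpoint back before the next generation.
requests.post(f"{BASE_URL}/sdapi/v1/reload-checkpoint", timeout=600)
```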
Changing the model (loading it) can be done by POSTing to /sdapi/v1/options with {
"sd_model_checkpoint": "YOUR model"
}, or by adding "override_settings": {
"sd_model_checkpoint": "YOUR model"
} to a generation request. You can get a list of all models from /sdapi/v1/sd-models (a concrete sketch of these calls follows below). It should be possible to improve all of this, but someone would need to want it enough to work on that feature.
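Concretely, a minimal sketch of those three calls, again assuming a local instance at the default address; the sd_model_checkpoint values come from the "title" field that /sdapi/v1/sd-models returns:

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local webui instance

# 1. List all available checkpoints.
models = requests.get(f"{BASE_URL}/sdapi/v1/sd-models", timeout=60).json()
print([m["title"] for m in models])

# 2. Switch the active checkpoint globally via the options endpoint.
requests.post(
    f"{BASE_URL}/sdapi/v1/options",
    json={"sd_model_checkpoint": models[0]["title"]},
    timeout=600,  # loading weights can take a while
)

# 3. Or switch per-request via override_settings on a generation call.
payload = {
    "prompt": "a photo of a cat",
    "override_settings": {"sd_model_checkpoint": models[0]["title"]},
}
requests.post(f"{BASE_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
```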
These can also help:
Ah, thanks! I knew bits and pieces from inspecting the codebase; this puts them all together. One more thing before I can go off on my own: how do I know whether a model is currently loaded or not? At the moment, I'm inferring this from …
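For what it's worth, one way to check from inside the webui process is to look at the model slot directly; a minimal sketch, assuming sd_models.model_data.sd_model is cleared when the weights are unloaded (behavior may differ across versions, so verify against the code in use):

```python
from modules import sd_models

def is_model_loaded() -> bool:
    # model_data holds the currently loaded checkpoint, if any.
    # Read the attribute directly: accessing shared.sd_model instead
    # would lazily load the model as a side effect in recent versions.
    return sd_models.model_data.sd_model is not None
```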
However, if you have improvements that you think can benefit everyone, then don't hesitate to contribute.
The Ollama framework has a really handy variable, accessible both from the environment and via the API: OLLAMA_KEEP_ALIVE=[# of seconds] | [Xm] | 0. I think it's mostly used by people who want the last loaded chat model to stay loaded longer, but I set it to zero to keep the GPU VRAM as empty as possible, as soon as possible. This is because I have many users who mostly use the GPU for chat and only occasionally for text-to-speech and SD image creation, which loads up the GPU VRAM. Unfortunately, SDWeb keeps its last model loaded indefinitely. It would be great if SDWeb had a similar keep-alive option to let us decide how long to keep the last model loaded.
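For comparison, Ollama also accepts keep_alive per request, not just via the environment variable; a minimal sketch of forcing an immediate unload after a generation (the model name here is just an example):

```python
import requests

# Ask Ollama to free the model from VRAM as soon as this request finishes
# (keep_alive=0); the server-wide default can be set with OLLAMA_KEEP_ALIVE.
requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",   # example model name
        "prompt": "Hello",
        "stream": False,
        "keep_alive": 0,     # unload immediately after responding
    },
    timeout=600,
)
```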
Is there an existing issue for this?
What would your feature do?
I want to achieve the following either programmatically or via API:
Proposed workflow
1. Get a list of the available checkpoints.
2. Load a given checkpoint into GPU memory.
3. Unload a given checkpoint from GPU memory.
The checkpoint parameter passed in 2 and 3 should be obtained from 1. For example, the object returned in 1 could contain a "uniqueid" key.
Additional information
I've made a fork, but it only loads and unloads the currently selected checkpoint. The relevant endpoints are unloadmodel, loadmodel, and get_model_status in api.py (see the usage sketch below).
https://github.com/AnthoneoJ/stable-diffusion-webui
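A rough sketch of how those fork endpoints might be driven, assuming they are mounted under /sdapi/v1/ with routes matching the handler names and that the status response carries a "loaded" flag (all hypothetical; check the fork's api.py for the actual routes and schema):

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local instance of the fork

# Hypothetical route named after the get_model_status handler.
status = requests.get(f"{BASE_URL}/sdapi/v1/get_model_status", timeout=60).json()

if status.get("loaded"):  # hypothetical response field
    # Free the VRAM held by the currently selected checkpoint.
    requests.post(f"{BASE_URL}/sdapi/v1/unloadmodel", timeout=600)
else:
    # Bring the currently selected checkpoint back into GPU memory.
    requests.post(f"{BASE_URL}/sdapi/v1/loadmodel", timeout=600)
```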