[Feature Request]: Manually load/unload checkpoints into GPU #16129
Comments
Why?
In my use case, the machine is running multiple AI services (one of them being this webui), and there are several machines that do the same. So the checkpoints should be loaded at machine boot and unloaded when memory is needed for another AI service, etc.
Model loading is a mess in webui. To be honest, the two API endpoints (presumably /sdapi/v1/unload-checkpoint and /sdapi/v1/reload-checkpoint) work more like putting the webui to sleep: you can put it to sleep to save VRAM and wake it up before use. There is an issue with this, though.
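For reference, a minimal sketch of that sleep/wake pattern, assuming a default local webui instance at 127.0.0.1:7860 (the two routes named above are the ones exposed in api.py):

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local webui instance

# "Sleep": move the current checkpoint's weights out of VRAM.
requests.post(f"{BASE_URL}/sdapi/v1/unload-checkpoint", timeout=60)

# "Wake": bring the checkpoint back before the next generation.
requests.post(f"{BASE_URL}/sdapi/v1/reload-checkpoint", timeout=600)
```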
Changing the model (loading it) can be done by POSTing to /sdapi/v1/options with {
"sd_model_checkpoint": "YOUR model"
}, or by adding "override_settings": {
"sd_model_checkpoint": "YOUR model"
} to a generation request. You can get a list of all models from /sdapi/v1/sd-models (a concrete sketch of these calls follows below). It should be possible to improve all of this, but someone would need to want it enough to work on that feature.
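Concretely, a minimal sketch of those three calls, again assuming a local instance at the default address; the sd_model_checkpoint values come from the "title" field that /sdapi/v1/sd-models returns:

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local webui instance

# 1. List all available checkpoints.
models = requests.get(f"{BASE_URL}/sdapi/v1/sd-models", timeout=60).json()
print([m["title"] for m in models])

# 2. Switch the active checkpoint globally via the options endpoint.
requests.post(
    f"{BASE_URL}/sdapi/v1/options",
    json={"sd_model_checkpoint": models[0]["title"]},
    timeout=600,  # loading weights can take a while
)

# 3. Or switch per-request via override_settings on a generation call.
payload = {
    "prompt": "a photo of a cat",
    "override_settings": {"sd_model_checkpoint": models[0]["title"]},
}
requests.post(f"{BASE_URL}/sdapi/v1/txt2img", json=payload, timeout=600)
```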
These can also help:
Ah, thanks! I knew bits and pieces from inspecting the codebase; this puts them all together. One more thing before I can go off on my own: how do I know whether a model is currently loaded or not? At the moment, I'm inferring this from …
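For what it's worth, one way to check from inside the webui process is to look at the model slot directly; a minimal sketch, assuming sd_models.model_data.sd_model is cleared when the weights are unloaded (behavior may differ across versions, so verify against the code in use):

```python
from modules import sd_models

def is_model_loaded() -> bool:
    # model_data holds the currently loaded checkpoint, if any.
    # Read the attribute directly: accessing shared.sd_model instead
    # would lazily load the model as a side effect in recent versions.
    return sd_models.model_data.sd_model is not None
```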
However, if you have improvements that you think can benefit everyone, then don't hesitate to contribute.
The Ollama framework has a really handy variable, accessible both from the environment and via the API: OLLAMA_KEEP_ALIVE=[# of seconds] | [Xm] | 0. I think it's mostly used by people who want the last loaded chat model to stay loaded longer, but I set it to zero to keep the GPU VRAM as empty as possible, as soon as possible. This is because I have many users who mostly use the GPU for chat and only occasionally for text-to-speech and SD image creation, which loads up the GPU VRAM. Unfortunately, SDWeb keeps its last model loaded indefinitely. It would be great if SDWeb had a similar keep-alive option to let us decide how long to keep the last model loaded.
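For comparison, Ollama also accepts keep_alive per request, not just via the environment variable; a minimal sketch of forcing an immediate unload after a generation (the model name here is just an example):

```python
import requests

# Ask Ollama to free the model from VRAM as soon as this request finishes
# (keep_alive=0); the server-wide default can be set with OLLAMA_KEEP_ALIVE.
requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",   # example model name
        "prompt": "Hello",
        "stream": False,
        "keep_alive": 0,     # unload immediately after responding
    },
    timeout=600,
)
```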
Is there an existing issue for this?
What would your feature do?
I want to achieve the following either programmatically or via API:
Proposed workflow
1. Get a list of the available checkpoints.
2. Load a given checkpoint into GPU memory.
3. Unload a given checkpoint from GPU memory.
The checkpoint parameter passed in 2 and 3 should be obtained from 1. For example, the object returned in 1 could contain a "uniqueid" key.
Additional information
I've made a fork, but it only loads and unloads the currently selected checkpoint. The relevant endpoints are unloadmodel, loadmodel, and get_model_status in api.py (see the usage sketch below).
https://github.com/AnthoneoJ/stable-diffusion-webui
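A rough sketch of how those fork endpoints might be driven, assuming they are mounted under /sdapi/v1/ with routes matching the handler names and that the status response carries a "loaded" flag (all hypothetical; check the fork's api.py for the actual routes and schema):

```python
import requests

BASE_URL = "http://127.0.0.1:7860"  # assumed local instance of the fork

# Hypothetical route named after the get_model_status handler.
status = requests.get(f"{BASE_URL}/sdapi/v1/get_model_status", timeout=60).json()

if status.get("loaded"):  # hypothetical response field
    # Free the VRAM held by the currently selected checkpoint.
    requests.post(f"{BASE_URL}/sdapi/v1/unloadmodel", timeout=600)
else:
    # Bring the currently selected checkpoint back into GPU memory.
    requests.post(f"{BASE_URL}/sdapi/v1/loadmodel", timeout=600)
```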