
Mental Diffusion

Fast Stable Diffusion CLI
Powered by Diffusers
Designed for Linux

MDX 0.9.0
Python 3.11 - 3.12
Torch 2.3.1 +cu121
Diffusers 0.29.2
+ Gradio 4.37.2

Features

  • SD, SDXL
  • Load VAE and LoRA weights
  • Txt2Img, Img2Img, Inpaint (auto-pipeline)
  • TAESD latents preview (image and animation)
  • Batch image generation, multiple images per prompt
  • Read/write PNG metadata, auto-rename files
  • CPU, GPU, Low VRAM mode (auto mode)
  • Lightweight and fast, rewritten in 300 lines
  • Proxy, offline mode, minimal downloads

SD3 is not currently supported (prototype only).

Addons


All addons are based on Gradio and are optional.
Addons are not as thoroughly tested as the mdx.py script.

Name        Description                                Screenshot
main        A tabbed interface for all addons          -
inference   The inference user interface               view
preview     Watch preview and gallery                  view
metadata    View and recreate data from PNG            view
outpaint    Create an image and mask for outpainting   view
upscale     Real-ESRGAN x2 and x4 plus                 view
Run an addon:
~/.venv/mdx/bin/python3 src/addons/addon-name.py

Installation

  • Compatible with most Diffusers-based Python venvs
  • 3GB of Python packages (5GB extracted)
  • 50MB Hugging Face cache (downloaded automatically, mostly for TAESD)
  • Make sure you have a swap partition or swap file
git clone https://github.com/nimadez/mental-diffusion
cd mental-diffusion

# Automatic installation for debian-based distributions:
apt install python3-pip python3-venv
sh install-venv.sh
sh install-bin.sh  # optional

# Manual installation:
python3 -m venv ~/.venv/mdx
source ~/.venv/mdx/bin/activate
pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu121
pip install -r ./requirements.txt
deactivate

Optional: install Gradio for the addons:

~/.venv/mdx/bin/python3 -m pip install gradio==4.37.2

Optional: install Real-ESRGAN for the upscale addon:

~/.venv/mdx/bin/python3 -m pip install realesrgan

Optional: install Zenity for the inference addon:

apt install zenity

Without Zenity, you can't select safetensors files with a file dialog; you have to enter the Checkpoint, VAE, and LoRA paths manually.

Arguments

~/.venv/mdx/bin/python3 mdx.py --help

--type        -t    str     sd, xl (def: custom)
--checkpoint  -c    str     /checkpoint.safetensors (def: custom)
--scheduler   -sc   str     ddim, ddpm, euler, eulera, lcm, lms, pndm (def: custom)
--prompt      -p    str     positive prompt
--negative    -n    str     negative prompt
--width       -w    int     divisible by 8 (def: custom)
--height      -h    int     divisible by 8 (def: custom)
--seed        -s    int     -1 randomize (def: -1)
--steps       -st   int     1 to 100+ (def: 24)
--guidance    -g    float   0 - 20.0+ (def: 8.0)
--strength    -sr   float   0 - 1.0 (def: 1.0)
--lorascale   -ls   float   0 - 1.0 (def: 1.0)
--image       -i    str     /image.png
--mask        -m    str     /mask.png
--vae         -v    str     /vae.safetensors
--lora        -l    str     /lora.safetensors
--filename    -f    str     filename prefix without .png extension, add {seed} to be replaced (def: img_{seed})
--output      -o    str     image and preview output directory (def: custom)
--number      -no   int     number of images to generate per prompt (def: 1)
--batch       -b    int     number of repeats to run in batch, --seed -1 to randomize
--preview     -pv           stepping is slower with preview enabled (def: no preview)
--lowvram     -lv           slower if you have enough VRAM, automatic on 4GB cards (def: no lowvram)
--metadata    -meta str     /image.png, extract metadata from png
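
The {seed} token in --filename is replaced with the generation seed when the file is saved. A minimal sketch of that substitution (illustrative, not the actual mdx code):

```python
def make_filename(template: str, seed: int) -> str:
    # Replace the literal "{seed}" token and append the .png extension
    return template.replace("{seed}", str(seed)) + ".png"

print(make_filename("img_{seed}", 1234))  # img_1234.png
```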

[automatic pipeline]
Txt2Img: no --image and no --mask
Img2Img: --image and no --mask
Inpaint: --image and --mask
ERROR:   no --image and --mask
Default:    mdx -p "prompt" -st 28 -g 7.5
SD:         mdx -t sd -c /checkpoint.safetensors -w 512 -h 512
SDXL:       mdx -t xl -c /checkpoint.safetensors -w 768 -h 768
Img2Img:    mdx -i /image.png -sr 0.5
Inpaint:    mdx -i /image.png -m ./mask.png
VAE:        mdx -v /vae.safetensors
LoRA:       mdx -l /lora.safetensors -ls 0.5
Filename:   mdx -f img_test_{seed}
Output:     mdx -o /home/user/.mdx
Number:     mdx -no 4
Batch:      mdx -b 10
Preview:    mdx -pv
Low VRAM:   mdx -lv
Metadata:   mdx -meta ./image.png
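
Generation parameters are stored in PNG text (tEXt) chunks, which --metadata reads back. Below is a stdlib-only sketch of writing and re-reading such a chunk, following the PNG chunk layout (length, type, data, CRC); the "parameters" keyword is an assumption, not necessarily the key mdx uses:

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    # length + type + data + CRC32 over type+data, per the PNG spec
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

def read_text_chunks(png: bytes) -> dict[str, str]:
    # Walk the chunk stream after the 8-byte PNG signature
    meta, pos = {}, 8
    while pos < len(png):
        length, ctype = struct.unpack(">I4s", png[pos:pos + 8])
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, value = data.partition(b"\x00")
            meta[key.decode("latin-1")] = value.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return meta

# A minimal (not renderable) PNG: signature, one tEXt chunk, IEND
sig = b"\x89PNG\r\n\x1a\n"
blob = sig + png_chunk(b"tEXt", b"parameters\x00seed: 42") + png_chunk(b"IEND", b"")
print(read_text_chunks(blob))  # {'parameters': 'seed: 42'}
```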

Direct Inference

Import the MDX class to run inference from JSON data:

import argparse
import json

from mdx import MDX

data = json.loads(data)  # 'data' is a JSON string of mdx argument key/value pairs
data["prompt"] = "new prompt"

# Parse an empty argv so sys.argv is ignored; the namespace carries the values
parser = argparse.ArgumentParser()
args = parser.parse_args([], namespace=argparse.Namespace(**data))

MDX().main(args)

Inference can be interrupted by creating a file named ".interrupt" in the --output directory.
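
For example, assuming --output is the default ~/.mdx directory (adjust the path to your own --output):

```shell
# Creating this sentinel file interrupts the running inference
mkdir -p ~/.mdx && touch ~/.mdx/.interrupt
```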

Tips & Tricks

  • Enable OFFLINE if you have already downloaded the Hugging Face cache
  • Enable SAVE_ANIM to save the preview animation to {output}/filename.webp
Preview, cancel, and repeat faster:
mdx -p "prompt" -g 8.0 -st 30 -pv
mdx -p "prompt" -g 8.0 -st 30 -s 827362763262387

Improve details with Img2Img pipeline:
mdx -p "prompt" -st 20 -f myimage
mdx -p "prompt" -st 30 -i ~/.mdx/image.png -sr 0.15

Content-aware upscaling: (ImageMagick, similar to A1111 hires-fix)
mdx -p "prompt" -st 20 -w 512 -h 512 -f image
magick convert ~/.mdx/image.png -resize 200% ~/.mdx/image_up.png
mdx -p "prompt" -st 20 -i ~/.mdx/image_up.png -sr 0.5

Generate 40 images (10 batches × 4 images per prompt) in less time:
mdx -p "prompt" -b 10 -no 4

Extract images from WebP animation: (ImageMagick)
magick convert image.webp frame.jpg  # writes frame-0.jpg, frame-1.jpg, ...

Create images across the LAN via SSH:
apt install openssh-server && ssh-keygen -t rsa -b 4096
ssh user@192.168.x.x
$ mdx -p "prompt"

Explore output directory in a browser across the LAN:
cd ~/.mdx && python3 -m http.server 8000
$ open http://192.168.x.x:8000

Store the Hugging Face cache in a specific path:
mkdir ~/.hfcache && ln -s ~/.hfcache ~/.cache/huggingface

Tests

v0.9.0 was tested with Txt2Img, Img2Img, Inpaint, VAE, LoRA, Batch, Preview, and Low VRAM on SD CPU, SD GPU, and SDXL GPU:
  • Debian Trixie (testing branch)
  • Kernel 6.9.8
  • Nvidia driver 535

Previous Experiments

History

↑ Gradio webui addons
↑ Rewritten in 300 lines
↑ Port to Linux
↑ Back to Diffusers
↑ Port to Code (webui)
↑ Change to ComfyUI API (webui)
↑ Created for personal use on Windows OS (diffusers)

"AI will bring us back to the age of terminals."

License

Code released under the MIT license.

Credits

Models
  • zavychromaxl_v80
  • OpenDalleV1.1
  • juggernaut_aftermath