
Stable Diffusion Updates


This is a series!🔗

Part one: Running Stable Diffusion on Windows with an AMD GPU
Part two: You're here!


This is a follow-up to my previous post, which explains how to get Stable Diffusion up and running on Windows for folks with an AMD GPU.

Now that I've been playing around with it myself for a few days, and had some opportunities to chat with others likewise, I'd like to share a handful of updates and discoveries.

Namely: an updated version of the Python script with proper CLI support, how to disable the safety checker, and how to use a different scheduler with the Onnx pipeline.

Updated Python Script🔗

Since the last post, I've cleaned up the Python script I've been using to invoke Stable Diffusion. The new version now allows for all the parameters to be passed in directly from the CLI--no need to edit the Python code by hand.

It adds one more requirement, a Pip package called "Click" that makes parsing CLI arguments very simple. Make sure you've got your virtual environment activated (virtualenv/Scripts/Activate.ps1 or virtualenv\Scripts\activate.bat), and run:

pip install click

... and you'll be ready to rock.

The updated Python script looks like this:

import click
from diffusers import StableDiffusionOnnxPipeline
import numpy as np

@click.command()
@click.option("-p", "--prompt", required=True, type=str)
@click.option("-w", "--width", required=False, type=int, default=512)
@click.option("-h", "--height", required=False, type=int, default=512)
@click.option("-st", "--steps", required=False, type=int, default=25)
@click.option("-g", "--guidance-scale", required=False, type=float, default=7.5)
@click.option("-s", "--seed", required=False, type=int, default=None)
def run(
    prompt: str, 
    width: int, 
    height: int, 
    steps: int, 
    guidance_scale: float, 
    seed: int):

    pipe = StableDiffusionOnnxPipeline.from_pretrained(
        "./stable_diffusion_onnx", 
        provider="DmlExecutionProvider"
    )        

    # Generate our own latents so that we can provide a seed.
    seed = np.random.randint(np.iinfo(np.int32).max) if seed is None else seed
    latents = get_latents_from_seed(seed, width, height)

    print(f"\nUsing a seed of {seed}")
    image = pipe(prompt, height=height, width=width, num_inference_steps=steps, guidance_scale=guidance_scale, latents=latents).images[0]
    image.save("output.png")

def get_latents_from_seed(seed: int, width: int, height:int) -> np.ndarray:
    # 1 is batch size
    latents_shape = (1, 4, height // 8, width // 8)
    # Gotta use numpy instead of torch, because torch's randn() doesn't support DML
    rng = np.random.default_rng(seed)
    image_latents = rng.standard_normal(latents_shape).astype(np.float32)
    return image_latents

if __name__ == '__main__':
    run()

You can also find it directly on GitHub if that's more your style.

It takes up to six parameters, only one of which is required:

  • -p or --prompt is required, and is the text prompt you'd like to generate an image from.
  • -w or --width is optional, defaults to 512, and must be divisible by 8.
  • -h or --height is optional, defaults to 512, and must be divisible by 8.
  • -st or --steps is optional, defaults to 25, and is the number of iterations that will be performed on your prompt. Generally speaking, the higher this number is, the better quality the output.
  • -g or --guidance-scale is optional, defaults to 7.5, and is how heavily the AI will weight your prompt versus being creative. 0 means that the AI will take a great deal of creative liberty. 20 or higher means that it will attempt to rigidly adhere to the prompt.
  • -s or --seed is optional, defaults to a randomly generated 32-bit integer, and is the value used as a seed for generating randomness. The same prompt with the same seed will produce the same output.

With these modifications, you can now invoke the script like so:

.\text2img.py -st 25 -p "A happy cat in a cyberpunk garden, cartoony style, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8k"

An AI-generated picture of a large cat sitting in the middle of a futuristic intersection in a cyberpunk city.

Voila!
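Since the script prints out the seed it used, you can also feed that seed back in with -s to reproduce a result exactly, or to tweak one parameter while keeping everything else fixed. (The seed below is just a placeholder; substitute whatever value your run printed.)

.\text2img.py -st 25 -s 1234567 -p "A happy cat in a cyberpunk garden, cartoony style, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8k"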

Disabling the Safety Checker🔗

You may have noticed that, sometimes, instead of generating something useful, your output image will just be a blank, black square. This isn't a bug or an error--it's because Stable Diffusion's built-in Safety Checker has detected content that is either NSFW or otherwise objectionable.

Now, if you find that this is a useful feature, you could just detect it and print out a message by doing something like this in the Python script:

result = pipe(prompt, height=height, width=width, num_inference_steps=steps, guidance_scale=guidance_scale, latents=latents)
image = result.images[0]
# nsfw_content_detected is a list of booleans, one entry per generated image.
is_nsfw = result.nsfw_content_detected[0]
if is_nsfw:
    print("Oh no! NSFW output detected!")

image.save("output.png")

...but for my use case, I'm only running this locally, and I don't really care if the AI occasionally generates some boobs. As an extra bonus, I've observed that if I disable the safety checker, I get a pretty significant speedup--somewhere between 20% and 40%, which usually shaves around a minute off my runtime. Not bad! So, if you'd like to disable the safety checker, all you have to do is add the following line after the declaration of pipe:

# .... etc
pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "./stable_diffusion_onnx", 
    provider="DmlExecutionProvider"
)    

# Add this line here!
pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))
# ... etc

This is a tiny bit of a hack--we're messing around with the internals of pipe, which aren't really meant to be used externally, but dynamic languages gonna dynamic language. We replace the safety_checker member of pipe with what is basically a dummy function that passes the images through untouched and unconditionally reports that none of them were flagged.
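If the lambda feels too terse, a named function does exactly the same job (this is just an equivalent spelling, nothing the pipeline itself requires):

def dummy_safety_checker(images, **kwargs):
    # Pass the images through untouched, and report that nothing was flagged.
    return images, [False] * len(images)

pipe.safety_checker = dummy_safety_checker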

Now, no more black squares! Just beware, you now have a high likelihood of generating stuff you probably don't want to open up at work.

Using Different Schedulers🔗

Stable Diffusion can use a number of different sampling methods, which the diffusers package refers to internally as "schedulers". The details of all of these are, frankly, not something I've investigated deeply. The short version is that the characteristics of what they output, particularly at lower numbers of steps, tend to vary. For that reason, it can sometimes be useful to use a different scheduler. To use a different one, you have to construct it manually, and then pass it into the call to from_pretrained(). For example:

# Up in your imports, add the DDIMScheduler from diffusers
import click
from diffusers import StableDiffusionOnnxPipeline, DDIMScheduler
import numpy as np

# Skipping a few lines for brevity...

# Constructing the DDIMScheduler scheduler manually:
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)

# And telling the created pipe to use it:
pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "./stable_diffusion_onnx",
    provider="DmlExecutionProvider",
    scheduler=scheduler
)

...however, if you run this as-is, it won't work. You'll get an arcane error along the lines of "expected np.int64, got np.int32".

Fixing this requires two things, the first of which is immensely hacky.

The first thing🔗

We need to go modify our local version of Stable Diffusion's Onnx pipeline. In order to find it, go look in virtualenv\Lib\site-packages\diffusers\pipelines\stable_diffusion\ in whatever folder you have your virtual environment set up in.

Once in there, find pipeline_stable_diffusion_onnx.py. That's our target here. Open it up, and head down to line 133. We're going to change it from:

# OLD
sample=latent_model_input, timestep=np.array([t]), encoder_hidden_states=text_embeddings

into...

# NEW
sample=latent_model_input, timestep=np.array([t], dtype=np.int64), encoder_hidden_states=text_embeddings

We're now specifying the dtype in our call to np.array().

Remember that this change won't survive if you recreate your virtual environment, reinstall the diffusers package, or update the diffusers package. I fully expect the need for this to go away in the next release of diffusers anyway.

The second thing🔗

Once the diffusers package has been modified, we need to make a tiny change to how we declare our scheduler. Let's reuse our DDIM scheduler example.

Instead of doing this:

# Wrong
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000)

...do this:

# Right
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, tensor_format="np")

(I believe the reason for this is that because we're using Onnx and not Torch, we need to tell the scheduler to use Numpy's tensor format rather than Torch's. I think. I'm no expert on this.)

Once you've done Thing One and Thing Two, you should now be able to use the other schedulers. Examples of constructing them can be found in HuggingFace's diffusers repository.
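For reference, constructing two of the others might look roughly like this. Treat it as a sketch: these arguments mirror the Stable Diffusion v1.4 defaults as I understand them, I haven't tested these two myself, and the tensor_format="np" argument from above is still required:

from diffusers import LMSDiscreteScheduler, PNDMScheduler

# The k-LMS scheduler.
lms_scheduler = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    tensor_format="np"
)

# The PNDM scheduler (the one the Onnx pipeline uses by default).
pndm_scheduler = PNDMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    skip_prk_steps=True,
    tensor_format="np"
)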

For an example of what all this looks like when put together, take a look at the version I have in GitHub.

Disclaimer: I've only tried the DDIM scheduler myself--my GPU is a touch under-powered, and I mostly just wanted to run something that would generate acceptable results in fewer steps. If you have any success in getting the others running, feel free to leave a comment!

Wrapping Up🔗

I think that's all I've got for this one. A CLI-ified version of the script from last time, disabling the safety checker to win some speed (and possibly some salacious output), and enabling other schedulers when using the Onnx pipeline. Not bad for a few days of tinkering.

Some additional thanks to ponut64 in the comments of the last post, and AzuriteCoin for confirming the Onnx scheduler fix.

One extra thought: one thing I might do in the future is enhance my little CLI script to allow the caller to choose which scheduler to use. I'll have to play around with that a bit more, but watch this space if you're interested in such a thing (and don't just hack it together yourself).
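If I do go that route, a very rough sketch might look something like the following. This is entirely hypothetical--none of it is in the current script, the option names are made up, and I haven't tested it:

import click
from diffusers import StableDiffusionOnnxPipeline, DDIMScheduler, LMSDiscreteScheduler

# Map CLI-friendly names to functions that construct a scheduler.
# None means "don't pass one", letting the pipeline fall back to its default (PNDM).
SCHEDULERS = {
    "default": None,
    "ddim": lambda: DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", num_train_timesteps=1000, tensor_format="np"),
    "lms": lambda: LMSDiscreteScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", tensor_format="np"),
}

@click.command()
# ... the existing options (-p, -w, -h, and so on) would stay here ...
@click.option("-sc", "--scheduler", "scheduler_name", required=False, type=click.Choice(list(SCHEDULERS)), default="default")
def run(scheduler_name: str):
    extra_kwargs = {}
    if SCHEDULERS[scheduler_name] is not None:
        extra_kwargs["scheduler"] = SCHEDULERS[scheduler_name]()

    pipe = StableDiffusionOnnxPipeline.from_pretrained(
        "./stable_diffusion_onnx",
        provider="DmlExecutionProvider",
        **extra_kwargs
    )
    # ... the rest of the script stays the same as before ...

if __name__ == '__main__':
    run()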

Thanks for reading! As ever, I can be found on GitHub as pingzing and Twitter as @pingzingy. Happy generating!

Creative Commons BY badge The text of this blog post is licensed under a Creative Commons Attribution 4.0 International License.

Comments

  1. Luxion
    Mon, Sep 19, 2022, 19:00:40
    On my RX560 4G it takes 30s for the script to load the scheduler and the pipe. So if I want to generate lets say 100 images with the exact same prompt and settings (rand seed) I'll end up spending 50m just for loading those.
    We can bypass that by running the rest of the code in a while loop for which the amount of loops should be an optional arg (default 1) passed through the command. Just something I thought it was worth mentioning potentially for your next guide update.
    Again, thanks for doing this and please keep an eye on the incoming img2img+inpaint onnx repo.
  2. Neil
    Tue, Sep 20, 2022, 11:41:33
    Huh, good observation. I've just been generating single one-offs, and hadn't put any thought to the high startup time of setting up the pipeline.
  3. Adam
    Tue, Sep 20, 2022, 19:16:39
    I haven't been able to get any output size other than 512 x 512 to work. Here's the terminal output:

    python '.\text2img.py' -p "a cat sitting on a rusted car landscape" -w 1024 -h 1024 -g 13
    2022-09-20 13:59:55.9690752 [W:onnxruntime:, inference_session.cc:490 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
    2022-09-20 13:59:56.2614650 [W:onnxruntime:, session_state.cc:1198 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
    2022-09-20 13:59:56.2653808 [W:onnxruntime:, session_state.cc:1200 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
    2022-09-20 13:59:58.0183008 [W:onnxruntime:, inference_session.cc:490 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
    2022-09-20 13:59:58.0897282 [W:onnxruntime:, session_state.cc:1198 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
    2022-09-20 13:59:58.0966059 [W:onnxruntime:, session_state.cc:1200 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
    2022-09-20 13:59:59.0341965 [W:onnxruntime:, inference_session.cc:490 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
    2022-09-20 13:59:59.2561741 [W:onnxruntime:, session_state.cc:1198 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
    2022-09-20 13:59:59.2635000 [W:onnxruntime:, session_state.cc:1200 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
    2022-09-20 14:00:00.4652889 [W:onnxruntime:, inference_session.cc:490 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
    2022-09-20 14:00:01.7090363 [W:onnxruntime:, session_state.cc:1198 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
    2022-09-20 14:00:01.7129493 [W:onnxruntime:, session_state.cc:1200 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
    ftfy or spacy is not installed using BERT BasicTokenizer instead of ftfy.

    Using a seed of 50033
    0%| | 0/26 [00:00
    run()
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
    File "C:\Users\Adam\Documents\stable-diffusion\text2img.py", line 30, in run
    image = pipe(prompt, height=height, width=width, num_inference_steps=steps, guidance_scale=guidance_scale, latents=latents).images[0]
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_stable_diffusion_onnx.py", line 132, in __call__
    noise_pred = self.unet(
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\diffusers\onnx_utils.py", line 51, in __call__
    return self.model.run(None, inputs)
    File "C:\Users\Adam\Documents\stable-diffusion\virtualenv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Einsum node. Name:'Einsum_558' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1866)\onnxruntime_pybind11_state.pyd!00007FFFDACFA60A: (caller: 00007FFFDACFBC0F) Exception(3) tid(16c4) 80070057 The parameter is incorrect.

    Hopefully you can pull more from that than me. Also hopefully the problem isn't only on my end :)

    Thanks for your work on this, been having a great time.
  4. Neil
    Tue, Sep 20, 2022, 20:27:00
    -Adam-
    I have vague memories of the Onnx pipeline being limited to 512x512 right now. Hopefully as Onnx support matures, that limitation will go away. Keep on eye on Hugging Face's diffusers repo, I suppose.
  5. Brian
    Tue, Sep 20, 2022, 22:06:54
    I can confirm -w 256 -h 256 produces garbage output.

    Also, here's a basic implementation of opening the pipe then running multiple iterations like luxion says

    import click
    from diffusers import StableDiffusionOnnxPipeline
    import numpy as np

    @click.command()
    @click.option("-p", "--prompt", required=True, type=str)
    @click.option("-w", "--width", required=False, type=int, default=512)
    @click.option("-h", "--height", required=False, type=int, default=512)
    @click.option("-st", "--steps", required=False, type=int, default=25)
    @click.option("-g", "--guidance-scale", required=False, type=float, default=7.5)
    @click.option("-s", "--seed", required=False, type=int, default=None)
    @click.option("-i", "--iter", required=False, type=int, default=1)
    def run(
        prompt: str,
        width: int,
        height: int,
        steps: int,
        guidance_scale: float,
        seed: int,
        iter: int):

        pipe = StableDiffusionOnnxPipeline.from_pretrained(
            "./stable_diffusion_onnx",
            provider="DmlExecutionProvider"
        )

        # Add this line here!
        pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))
        # ... etc

        first_iter = True
        for i in range(iter):
            # Generate our own latents so that we can provide a seed.
            if first_iter == True:
                seed = np.random.randint(np.iinfo(np.int32).max) if seed is None else seed
                first_iter = False
            else:
                seed = np.random.randint(np.iinfo(np.int32).max)
            latents = get_latents_from_seed(seed, width, height)

            print(f"\nUsing a seed of {seed}")
            image = pipe(prompt, height=height, width=width, num_inference_steps=steps, guidance_scale=guidance_scale, latents=latents).images[0]
            image.save("output_" + str(i) + ".png")

    def get_latents_from_seed(seed: int, width: int, height:int) -> np.ndarray:
        # 1 is batch size
        latents_shape = (1, 4, height // 8, width // 8)
        # Gotta use numpy instead of torch, because torch's randn() doesn't support DML
        rng = np.random.default_rng(seed)
        image_latents = rng.standard_normal(latents_shape).astype(np.float32)
        return image_latents

    if __name__ == '__main__':
        run()
  6. Eric
    Wed, Sep 21, 2022, 13:27:58
    Next is getting this to work with one of the UIs
  7. Wolfy
    Wed, Sep 21, 2022, 16:01:37
    Using the DIMM scheduler like you posted on github just seems to generate brown squares for me, even with 25+ steps
  8. Mike
    Fri, Sep 23, 2022, 00:46:37
    Hey, is it possible to make a simple window with tinker, where you can type in the prompts and the values you want? maybe a image outputview when done?
  9. quickwick
    Sun, Sep 25, 2022, 21:41:01
    Thanks for this, Neil!

    I've been playing with this for the past few days. I ended up hacking together a basic Tkinter-based GUI to make experimentation faster/easier. Hopefully other people find it useful. https://github.com/quickwick/stable-diffusion-win-amd-ui
  10. Rosh
    Fri, Sep 30, 2022, 00:30:13
    Really wanted to thank you for the great tutorials. I was using my CPU to generate images before hand and ity was taking upwards of 30-50 minutes, now It takes about 6.

    I do have a lot of output saying "Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider." and I'm not sure if that's normal; but there don't seem to be any issues.
  11. Anxst
    Sun, Oct 02, 2022, 19:38:01
    Fantastic tutorial. Got me up and runnign with the basics. I can confirm that sadly it won't generate anything above 512*512 images at the moment, it errors on all larger sizes. Also trying to figure out how to convert other .ckpt models is interesting.

    Thanks for all the great info!
  12. MindLeaker
    Sun, Oct 02, 2022, 22:37:16
    This was an amazing tutorial, thanks a bunch Neil, quick question though, do you have any idea how to implement stable-diffusion-webui's "negative prompt" feature onto your script? It seems to have quite the positive impact on image cohesiveness.
  13. Ray
    Tue, Oct 04, 2022, 17:20:55
    Awesome! Running on RX580 8GB!
    Around 30sec per iteration.

    Sadly, 512x512 is my limit too.
    Changing the pipe output size + changing the latent size gets it to "work" for me, but throws "Not enough memory" errors!
    I'm so close but so far away. :'(

    Also, found users using inpainting with this, and am excited to give that a try.

    Currently trying to figure out textual inversion.
  14. Ruben
    Tue, Oct 04, 2022, 22:50:13
    Amazing stuff Neil! Looking forward to more updates, be it on this or AI generation in general. If there was a donation button I would have clicked it, you were amazing help!
  15. Thanks for the guide I will add my gradio UI implementation
    Wed, Oct 05, 2022, 13:44:01
    I will add my Gradio UI local web interface here if someone want it. Put it inside dml_onyx.py on the end, you also need:
    import gradio as gr

    [code]
    #image GRID (currently not used because can't generate more than 1)
    def image_grid(imgs, rows, cols):
        assert len(imgs) == rows*cols

        w, h = imgs[0].size
        grid = Image.new('RGB', size=(cols*w, rows*h))
        grid_w, grid_h = grid.size

        for i, img in enumerate(imgs):
            grid.paste(img, box=(i%cols*w, i//cols*h))
        return grid

    #num_images = 3

    #gr.Interface.from_pipeline(pipe).launch()
    def predict(text,steps,height,width,scale,strenght,batch_count,batch_size):
        # text = [text] * num_images
        images = pipe(prompt=text, height=height, width=width, num_inference_steps=steps, guidance_scale=scale, denoising_strength=strenght, batch_count=batch_count, batch_size=batch_size, eta=0.0, execution_provider="DmlExecutionProvider")["sample"][0]
        # grid = image_grid(images, rows=1, cols=num_images)
        return(images)

    gr.Interface(
        predict,
        title = "Stable Diffusion ONNX",
        allow_flagging="never",
        inputs = [
            gr.Textbox(label = 'Prompt text', lines=2, elem_id="text"),
            gr.Slider(minimum=1, maximum=150, step=1, label="Sampling Steps", value=10, elem_id="steps"),
            gr.Slider(minimum=64, maximum=2048, step=64, label="Width", value=512, elem_id="height"),
            gr.Slider(minimum=64, maximum=2048, step=64, label="Height", value=512, elem_id="width"),
            gr.Slider(minimum=1, maximum=10, step=0.5, label='Guidance Scale', value=7.5, elem_id="scale"),
            gr.Slider(minimum=0.0, maximum=1.0, step=0.01, label='Denoising strength', value=0.7, elem_id="strenght"),
            gr.Slider(minimum=1, maximum=1, step=1, label='Batch count', value=1, elem_id="batch_count"),
            gr.Slider(minimum=1, maximum=1, step=1, label='Batch size', value=1, elem_id="batch_size"),
            # gr.Slider(minimum=1, maximum=4, step=1, label="How Many Images?", value=1, elem_id="num_images") #not working
            # gr.Image(source = 'canvas', type = 'pil') #not working
            #Batch count and size limited to max 1 (currently not working)
        ],
        outputs = [
            gr.Image()
        ]
    ).launch(debug=True)
    [/code]
  16. A-gratefull-noob
    Wed, Oct 05, 2022, 15:43:49
    First off, this is amazing, thank you!

    Second: it does work till this update.
    I updated the script, or to be more precise I do have now text2img.py 1 to 3. 3 being the newest. If I try to use it.. nothing happens. I suspect its because there isnt a diffuser folder to be found anywhere. I tracked back your installation and cant find a promt to create this so I assume it should have created itself.. but it didnt.. any Idea why?
  17. SerzaNT
    Fri, Oct 07, 2022, 14:58:48
    Hi, im glad i found this guide! I was stuggling to run this on my AMD gpu for some time, so thank you! I was wondering if you(or maybe some1 here) was try trying to get the img2img generation runing? And if so, maybe point me in the right direction here?
  18. Haeden
    Fri, Oct 07, 2022, 15:23:59
    Hello,
    Is there a way to get the additional models working with this method like GFPGAN, RealESRGAN, and LDSR?
    I have this working with my old RX580, I'm upgrading soon, I really want a 6800xt but if AMD limits me to these workarounds I may spring for an Nvidia card.
    Thanks for the guide and update!
  19. John6800XT
    Sat, Oct 08, 2022, 01:44:44
    For what it's worth, on my 6800XT, all default settings, I was getting 1.25s/it or so.

    But I noticed that it really wasn't pushing my GPU all that hard, so I went into Adrenaline and forced the minimum frequency to be around 2300MHz. This got me to about 1.3it/s (note the unit flip). So about a 63% increase! But it's clearly still leaving performance on the table because the temperatures hardly go up at all.

    For comparison, I was generating single images with similar settings on my laptop with a 1660-ti at a pretty similar pace. But the 6GB of vRAM was definitely a limitation.

    However, when I did a Linux variation back on my desktop again (using ROCm), I was able to generate 6-image batches with 50 steps in about 40 seconds if my memory serves. So naiively that's something like 12x faster than through DirectML on Windows. And it noticeably pushed the GPU. Makes me really wish AMD would get ROCm working on WSL2 or Windows natively ASAP. Some forum sleuthing suggests it may be discretely in the works, but there's no official word about any attempt to make ROCm work with Windows. *sigh*
  20. JohnP
    Sat, Oct 08, 2022, 22:01:27
    Neal, thank you for putting this tutorial together. I have been having a lot of fun generating pictures.

    I have been using painting programs like Blender and the Gimp for years but very new to AI art.

    Also have a couple questions:
    1) What makes this an AMD tutorial? I don't understand what I did that makes it work AMD and not on Nvidia or CPU.
    2) If I wanted to test against CPU or Nvidia, what would I need to change?
  21. JohnP
    Sat, Oct 08, 2022, 22:02:29
    Sorry, should have said 3D rendering programs like Blender and painting programs like the Gimp...
  22. VodkaMartini
    Mon, Oct 10, 2022, 09:58:21
    Img2Img and inpaint still are not implemented.
    Pull request is failed in check, and It's been left unattended for 22 days.
    Disappointed.
    https://github.com/huggingface/diffusers/pull/552
  23. Jojo
    Mon, Oct 10, 2022, 18:13:56
    Hey Neil, thanks so much for this tutorial and update! I was reading through the github thread linked in the first post you made about this and came across this comment https://gist.github.com/harishanand95/75f4515e6187a6aa3261af6ac6f61269?permalink_comment_id=4319542#gistcomment-4319542. I'm still very new to stable diffusion but it might be interesting to see how much of a difference using a direct onnx version vs a converter would be. Just an idea for a possible update post! Anyway, thanks again :)
  24. Garanault
    Tue, Oct 11, 2022, 01:56:20
    Thanks for the tutorial Neil! currently got waifu-diffusion set up on this at 768x512 with my RX 6950XT at 2.75s/i. Though at this resolution, it eats up all 16GB on the card. I don't exactly generate the cleanest looking pictures, since I like to generate illustrations over portraits, but hoping to get better results tinkering with schedulars and different models. Hopefully RoCm xWindows becomes a thing.
    Thanks again for this great resource, Neil!
  25. Neil
    Wed, Oct 12, 2022, 10:53:27
    -Jojo-
    Huh, that's interesting. May tinker with that a bit this weekend. Thanks for the heads-up.
  26. Paul B
    Sun, Oct 16, 2022, 00:26:21
    Hi, thanks for the excellent tutorials. I just wanted to list a few modifications that I've found useful in my text2img.py

    Setting a save directory for images with:
    imagepath = "C:\\Users\\yourusername\\path\\to\\imagedirectory\\" (double blackslashes required)
    image.save(imagepath + imagename)

    Appending the seed number to saved images with:
    imagename = "output-" + str(seed) + ".png"

    Using a batch file to open Windows Terminal, which changes to my stable-diffusion directory, and loads the powershell environment:
    wt PowerShell.exe -NoExit -Command "(cd C:\Users\yourusername\path\to\stable-diffusion),(.\virtualenv\Scripts\Activate.ps1)"
  27. Bob
    Mon, Oct 17, 2022, 17:14:48
    Does anyone else have an issue where your entire graphics card essentially just shuts off randomly during some renders? It does that for me and then sends my computer into an endless power cycle. It is rare but it does happen.
  28. Zerg
    Thu, Oct 20, 2022, 16:31:41
    Bob, I think you have some power supply issues or maybe graphics card, its looking like hardware problem.
  29. VodkaMartini
    Sun, Oct 23, 2022, 12:14:25
    Finally img2img can be used with onnx.
    https://github.com/huggingface/diffusers/releases/tag/v0.6.0
  30. Jean
    Wed, Oct 26, 2022, 07:14:29
    Has anyone managed to convert ckpt model into diffusers to finally onnx? Really stuck on this.

    convert_original_stable_diffusion_to_diffusers.py // it's not really working, anyone knows another way?
  31. hex
    Wed, Oct 26, 2022, 20:57:14
    Jean, I've done f111 and r34_e4. I could not get novelai to convert -- it screamed something about unexpected keys. Might try a different model.
  32. hex
    Wed, Oct 26, 2022, 21:06:52
    I've actually written a batch file to do it. Paste this into a file with a .bat extension, in the same directory as the two conversion python scripts (and your ckpt files) and run the .bat file. If there are any errors it'll stop so you're not wasting your time. Excuse the formatting; I note above that formatting here isn't the greatest, but it should be understandable.

    for %%f in (*.ckpt) do (
    python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path="./%%f" --dump_path="./%%~nf-diff"
    IF ERRORLEVEL 1 GOTO errorHandling
    python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="./%%~nf-diff" --output_path="./%%~nf-onnx"
    IF ERRORLEVEL 1 GOTO errorHandling
    )

    :errorHandling
  33. hex
    Wed, Oct 26, 2022, 21:36:45
    Got to helping Jean and forgot what I was here for. The patch to pipeline_stable_diffusion_onny.py doesn't seem to be working. It also wasn't a modification on line 133 - it was 168 but the code was the same. PNDM still works fine, but LMS and DDIM get an error of "unsupported type(s) for *: 'numpy.ndarray' and 'Tensor'". Versions: diffusers-0.5.0, onnx-1.12.0, onnxruntime-1.21.1. Not sure how to fix this so if anyone has any ideas, I'll give them a shot.
  34. Jean
    Wed, Oct 26, 2022, 22:58:24
    @hex For me it's keeping saying:

    "RuntimeError: Error(s) in loading state_dict for CLIPTextModel"
    Missing key(s) in state_dict: "text_model.embeddings.position_ids", "text_model.embeddings.token_embedding.weight........." and goes on."

    So i doesn't succeed to convert the novelAI ckpt to onnx. YET.

    But on another dir ( novelaileakpt2/prodmodels ) has a bunch of .pt files with, maybe we can get something there.
  35. hex
    Thu, Oct 27, 2022, 08:12:47
    I did get a version of NovelAI to convert. I ended up following the guide to install SD Web UI in CPU mode, then I used it to merge NovelAI with another ckpt (I actually made "Berry's Mix" - so NovelAI + f111 + sd 1.5 @ 1.0, then that mixture + r34_e4 @ 0.25). I took the resulting ckpt and converted it to onnx with a TON of warnings, but no errors. I have not tested the resulting ckpt file yet - it's 3am here - but I'll definitely give it a shot tomorrow.

    The issue mentioned above about the schedulers not working with the modification is being looked into on their github, so hopefully that'll be fixed soon. You CAN get DDIM working with the fix proposed in the ticket, but not LMS.
  36. Jean
    Thu, Oct 27, 2022, 21:08:00
    @hex Thanks for the tips, i didn't knew this Berry mix, really interesting. There's this guide if anyone wanna try: https://rentry.org/berrymix

    But unfortunately, the novelAI model.ckpt is the only one that i cannot convert utilizing "convert_original_stable_diffusion_to_diffusers.py", i tried with f111, sd1.4, sd1.5 all of them works fine.

    I merged novel AI (berry mix) like you did, but i still receives the same error:

    Traceback (most recent call last):
    File "convert_original_stable_diffusion_to_diffusers.py", line 673, in
    checkpoint = checkpoint["state_dict"]
    KeyError: 'state_dict'

    I did try the convert script from Luxion (20/10/2022 14:03:19) on his PasteBin, but no lucky too. I tried some other models from novel AI leak, always the same KeyError: 'state_dict' error for everyone.

    Now i got really stuck, who knows someone knows a solution. Thanks guys, awesome tutorial.

  37. hex
    Thu, Oct 27, 2022, 21:58:57
    @Jean
    It's working great here. If NovelAI wasn't proprietary I'd put my berry mix onnx conversion up on mega for you, but this isn't the place to share that. I can't believe the difference in results from all the other models to this Berry mix. It's quite amazing.
  38. TK
    Sun, Oct 30, 2022, 11:59:18
    @Ray

    Any chance you have any tips on how you're getting 30sec/image on an RX 580? at 512x512 with 50 steps I'm at 3mins 30sec per.

  39. JohnP
    Mon, Oct 31, 2022, 15:16:40
    What is the default sampling method / scheduler?
  40. hex
    Wed, Nov 02, 2022, 05:56:24
    So here's some info about 0.6.0 and getting it to work.
    - If you converted your models with 0.5.0 (or perhaps before), you will need to edit the config files (model_index.json) for them. You will need to add four lines:
    "vae_encoder": [
    "diffusers",
    "OnnxRuntimeModel"
    ]

    These were not put in the configs for whatever reason, and because of that you can not load them as pretrained models. Next, to get ALL of the schedulers working, you'll need to edit pipeline_onnx_stable_diffusion.py. At line 152, before the line starting with "latents = " add this:
    import torch
    latents = torch.tensor(latents)

    Around line 170 you should see a familiar line from the guide - you're going to do the same thing. Change
    FROM: sample=latent_model_input, timestep=np.array([t]), encoder_hidden_states=text_embeddings
    TO: sample=latent_model_input, timestep=np.array([t], dtype=np.int64), encoder_hidden_states=text_embeddings

    And finally, around line 180, you'll find this line:
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs).prev_sample
    BEFORE it, add
    latents = torch.tensor(latents)

    These modifications are because, as the guide indicates, you can no longer put tensor_format="np" in your scheduler calls, but the schedulers aren't fixed (see github issue #967). This fixes it so you don't need the tensor_format in your scheduler calls, and all three schedulers DO work.

    The only other issue using this guide is that at some point you need to do a pip install torch, which is not mentioned in the guide. I had no other issues getting diffusers 0.6.0 to work using this guide and doing a clean install (I simply didn't specify diffusers==0.5.0 in the pip install line). Happy generating everyone.
  41. Luxion
    Mon, Nov 07, 2022, 21:26:32
    I've made a very basic gradio UI that allows ONNX txt2img, img2img, inpaint as well as dynamically switching between models, VAEs and schedulers. It also supports negative prompt, wildcards and token attention by using () or [] just like people do in Automatic's. Currently must not switch schedulers because in diffusers v0.7 most schedulers are broken for ONNX but that issue is currently being looked at. The code is very amateurish because I'm not a programmer but so far its working great.

    @Neil if you want me to share the UI to give you some insights on your (potential) next guide update - just say the word.
  42. John6800XT
    Mon, Nov 14, 2022, 22:39:37
    @Luxion I'd love to see a UI based on the work here. Might try my hand at one just to refresh my Python skills.

    I'd also like to report the good news that the ort_nightly_directml-1.14.0.dev20221110006-cp38-cp38-win_amd64 package seems to have resulted in something like a 3X increase in performance on my 6800XT. Just generated a 512x512 image with 3.21it/s! Still behind what I achieved on Linux with Rocm, but this speedup definitely gives me hope for future optimizations even without Rocm.
  43. Luxion
    Wed, Nov 16, 2022, 05:33:18
    @John6800XT
    Hero you go: https://mega.nz/folder/XMxEmZZT#AGokxgNtyphqDHNCbkosGw
  44. Luxion
    Wed, Nov 16, 2022, 17:04:41
    @John6800XT
    Thank you for the tip about the latest nightly! I also got a significant inference speed boost although with my RX 560 4G it was only about 33% increase - still good! There's only so much they can improve on low VRAM cards after all.
    One thing I noticed is diffusers do not actually import the nightly version of directml and instead it tries to import from any of these packages: "onnxruntime", "onnxruntime-gpu", "onnxruntime-directml", "onnxruntime-openvino". So I'm saying this because there is a chance that you might not actually be taking advantage of the lastest version of onnx directml - if you had installed onnxruntime previously. Check the comment I just posted in the part 1 of Neil's guide.
    Also if you tried the UI please tell me your thoughts about it. There are some features I want to improve/add and might have to rewrite the whole thing using gradio.blocks() - I would like to avoid that if possible. Particularly I'm interested in being able to dynamically update the UI variables (including the outputs). If you find anything wrong and/or improve over it - please share.
  45. Query
    Sun, Nov 27, 2022, 18:58:57
    How do you use a different model with the text2img script? Is it just swapping

    pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "./stable_diffusion_onnx",
    provider="DmlExecutionProvider"
    )

    with

    pipe = StableDiffusionOnnxPipeline.from_pretrained(
    "./insert_other_model_here",
    provider="DmlExecutionProvider"
    )

    or is it something else?
  46. cstueckrath
    Thu, Dec 01, 2022, 18:47:26
    come over to https://github.com/nod-ai/SHARK for Stable Diffusion using Vulkan.
  47. Anonymous
    Fri, Dec 02, 2022, 22:15:20
    Please try this new approach, 20x faster on windows. Thanks!

    https://github.com/nod-ai/SHARK/blob/main/shark/examples/shark_inference/stable_diffusion/stable_diffusion_amd.md
  48. anonymous
    Sat, Dec 17, 2022, 10:41:17
    hello anonymous i have instaled the latest drivers of amd ad the speed is the same, this drivers are diferent? could you move games with it or only for stable diffusion
  49. Alexander
    Tue, Dec 20, 2022, 18:44:55
    How can we add a negative prompt?
  50. giorgio
    Sun, Jan 01, 2023, 11:22:53
    brutal script for this in https://github.com/m8ax/Programa-En-Python-Para-Manejar-Stable-Diffusion-Corriendo-En-GPU-AMD.-Incluidas-Las-Integradas

    brutal!
  51. m8ax
    Sun, Jan 01, 2023, 11:23:11
    its mine
  52. Beleriandian
    Sat, Jan 28, 2023, 02:29:22
    Can someone share an img2img.py example? I don't know how to do it. If you share one, I will use it as example.
  53. Luxion
    Sat, Apr 01, 2023, 18:55:07
    This guide has been outdated for several months and there has been a better alternative for a while:

    https://github.com/lshqqytiger/stable-diffusion-webui-directml

    Install that instead. Its essentially a ported version of famous Automatic1111 UI to work with DirectML which is compatible with most AMD cards - which means that every feature is working: Img2img/Inpaint, LoRAs, ControlNet, Upscalers, most extensions such as Dynamic Prompts, 3D open pose editor etc, etc...

    Read the instructions on the github page. You should add these args to webui-user.bat like so:

    set COMMANDLINE_ARGS=--precision full --no-half --no-half-vae --opt-sub-quad-attention --opt-split-attention-v1

    If your GPU has LESS than 10Gb VRAM you might want to add this arg as well: --medvram

    If your GPU has LESS than 6Gb VRAM add this instead: --lowvram

    If your GPU has at least 4Gb, you followed the instructions correctly and still get 'not enough memory' kind of errors - then try it without the '--opt-sub-quad-attention' arg.

    If you have troubles with SD failing to launch offline, add these: --skip-install --skip-version-check

    Good luck, and enjoy your local free image generator!