Cover image

Running Stable Diffusion on Windows with an AMD GPU


This is a series!

Part one: You're here!
Part two: Stable Diffusion Updates


(Want just the bare tl;dr bones? Go read this Gist by harishanand95. It covers everything this post does, but is aimed at a more experienced audience.)

Stable Diffusion has recently taken the techier (and art-techier) parts of the internet by storm. It's an open-source machine learning model capable of taking in a text prompt, and (with enough effort) generating some genuinely incredible output. See the cover image for this article? That was generated by a version of Stable Diffusion trained on lots and lots of My Little Pony art. The prompt I used for that image was kirin, pony, sumi-e, painting, traditional, ink on canvas, trending on artstation, high quality, art by sesshu.

Unfortunately, in its current state, it relies on Nvidia's CUDA framework, which means that it only works out of the box if you've got an Nvidia GPU.

Fear not, however. Because Stable Diffusion is both a) open source and b) good, it has seen an absolute flurry of activity, and some enterprising folks have done the legwork to make it usable for AMD GPUs, even for Windows users.

Requirements

Before you get started, you'll need the following:

  • A reasonably powerful AMD GPU with at least 6GB of video memory. I'm using an AMD Radeon RX 5700 XT, with 8GB, which is just barely powerful enough to outdo running this on my CPU.
  • A working Python installation. You'll need at least version 3.7. v3.7, v3.8, v3.9, and v3.10 should all work.
  • The fortitude to download around 6 gigabytes of machine learning model data.
  • A Hugging Face account. Go on, go sign up for one, it's free.
  • A working installation of Git, because the Hugging Face login process stores its credentials in Git's credential store, for some reason.

The Process

I'll assume you have little or no experience with Python. My only assumptions are that you have it installed, and that when you run python --version and pip --version from a command line, they respond appropriately.

Preparing the workspace

Before you begin, create a new folder somewhere. I named mine stable-diffusion. The name doesn't matter.

Once created, open a command line in your favorite shell (I'm a PowerShell fan myself) and navigate to your new folder. We're going to create a virtual environment to install some packages into.

When there, run the following:

python -m venv ./virtualenv

This will use the venv package to create a virtual environment named virtualenv. Now, you need to activate it. Run the following:

# For PowerShell
./virtualenv/Scripts/Activate.ps1
rem For cmd.exe
virtualenv\Scripts\activate.bat

Now, anything you install via pip or run via python will only be installed or run in the context of this environment we've named virtualenv. If you want to leave it, you can just run deactivate at any time.

Okay. All set up, let's start installing the things we need.

Installing Dependencies

We need a few Python packages, so we'll use pip to install them into the virtual environment, like so:

pip install diffusers==0.3.0
pip install transformers
pip install onnxruntime

Now, we need to go and download a build of Microsoft's DirectML Onnx runtime. Unfortunately, at the time of writing, none of their stable packages are up-to-date enough to do what we need. So instead, we need to either a) compile from source or b) use one of their precompiled nightly packages.

Because the toolchain to build the runtime is a bit more involved than this guide assumes, we'll go with option b). Head over to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-directml/overview/1.13.0.dev20220908001 (Or, if you're the suspicious sort, you could go to https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly and grab the latest under ort-nightly-directml yourself).

Either way, download the package that corresponds to your installed Python version: ort_nightly_directml-1.13.0.dev20220913011-cp37-cp37m-win_amd64.whl for Python 3.7, ort_nightly_directml-1.13.0.dev20220913011-cp38-cp38-win_amd64.whl for Python 3.8, you get the idea.

Once it's downloaded, use pip to install it.

pip install pathToYourDownloadedFile/ort_nightly_whatever_version_you_got.whl --force-reinstall

Take note of that --force-reinstall flag! The package will override some previously-installed dependencies, but if you don't allow it to do so, things won't work further down the line. Ask me how I know >.>

Getting and Converting the Stable Diffusion Model

First thing, we're going to download a little utility script that will automatically download the Stable Diffusion model, convert it to Onnx format, and put it somewhere useful. Go ahead and download https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_stable_diffusion_checkpoint_to_onnx.py (i.e. copy the contents, place them into a text file, and save it as convert_stable_diffusion_checkpoint_to_onnx.py) and place it next to your virtualenv folder.

Now is when that Hugging Face account comes into play. The Stable Diffusion model is hosted here, and you need an API key to download it. Once you sign up, you can find your API key by going to the website, clicking on your profile picture at the top right -> Settings -> Access Tokens.

Once you have your token, authenticate your shell with it by running the following:

huggingface-cli.exe login

And paste in your token when prompted.

Note: If you get an error with a stack trace that looks something like this at the bottom:

  File "C:\Python310\lib\subprocess.py", line 1438, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

...then that probably means that you don't have Git installed. The huggingface-cli tool uses Git to store login credentials.

Once that's done, we can run the utility script.

python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"

--model_path is the path on Hugging Face to go and find the model. --output_path is the path on your local filesystem to place the now-Onnx'ed model into.

Sit back and relax--this is where that 6GB download comes into play. Depending on your connection speed, this may take some time.

...done? Good. Now, you should have a folder named stable_diffusion_onnx which contains an Onnx-ified version of the Stable Diffusion model.

Your folder structure should now look something like this:

A picture of Windows Explorer displaying two folders and two files. (I named my virtual environment venv instead of virtualenv. Same same though.)

Almost there.

Running Stable Diffusion

Now, you just have to write a tiny bit of Python code. Let's create a new file, and call it text2img.py. Inside of it, write the following:

from diffusers import StableDiffusionOnnxPipeline
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")

prompt = "A happy celebrating robot on a mountaintop, happy, landscape, dramatic lighting, art by artgerm greg rutkowski alphonse mucha, 4k uhd'"

image = pipe(prompt).images[0] 
image.save("output.png")

Take note of the first argument we pass to StableDiffusionOnnxPipeline.from_pretrained(): "./stable_diffusion_onnx". That's the file path to the Onnx-ified model we just created. And provider needs to be "DmlExecutionProvider" in order to actually instruct Stable Diffusion to use DirectML instead of the CPU.

Once that's saved, you can run it with python .\text2img.py.

Once it's done, you'll have an image named output.png that's hopefully close to what you asked for in prompt!

A Stable Diffusion generated picture of a robot reclining on a mountainside.

Bells and Whistles

Now, that was a little bit bare-minimum, particularly if you want to customize more than just your prompt. I've written a small script with a bit more customization, and a few notes to myself that I imagine some folks might find helpful. It looks like this:

from diffusers import StableDiffusionOnnxPipeline
import numpy as np

def get_latents_from_seed(seed: int, width: int, height:int) -> np.ndarray:
    # 1 is batch size
    latents_shape = (1, 4, height // 8, width // 8)
    # Gotta use numpy instead of torch, because torch's randn() doesn't support DML
    rng = np.random.default_rng(seed)
    image_latents = rng.standard_normal(latents_shape).astype(np.float32)
    return image_latents

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
"""
prompt: Union[str, List[str]],
height: Optional[int] = 512,
width: Optional[int] = 512,
num_inference_steps: Optional[int] = 50,
guidance_scale: Optional[float] = 7.5, # This is also sometimes called the CFG value
eta: Optional[float] = 0.0,
latents: Optional[np.ndarray] = None,
output_type: Optional[str] = "pil",
"""

seed = 50033
# Generate our own latents so that we can provide a seed.
latents = get_latents_from_seed(seed, 512, 512)
prompt = "A happy celebrating robot on a mountaintop, happy, landscape, dramatic lighting, art by artgerm greg rutkowski alphonse mucha, 4k uhd"
image = pipe(prompt, num_inference_steps=25, guidance_scale=13, latents=latents).images[0]
image.save("output.png")

With this script, I can pass in an arbitrary seed value, easily customize the height and width, and in the triple-quote comments, I've added some notes about what arguments the pipe() function takes. My plan is to wrap all of this up into an argument parser, so that I can just pass all of these parameters into the script without having to modify the source file itself, but I'll do that later.
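
For reference, here's a rough, untested sketch of what that argument-parser version could look like. The flag names (--seed, --width, --steps, and so on) are just my own picks, not anything standard:

import argparse

import numpy as np
from diffusers import StableDiffusionOnnxPipeline

def get_latents_from_seed(seed: int, width: int, height: int) -> np.ndarray:
    # Same helper as above: seeded latents make the output reproducible.
    latents_shape = (1, 4, height // 8, width // 8)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(latents_shape).astype(np.float32)

parser = argparse.ArgumentParser(description="Generate an image with Stable Diffusion via Onnx + DirectML.")
parser.add_argument("prompt", type=str, help="The text prompt to generate an image from.")
parser.add_argument("--seed", type=int, default=50033)
parser.add_argument("--width", type=int, default=512)
parser.add_argument("--height", type=int, default=512)
parser.add_argument("--steps", type=int, default=25)
parser.add_argument("--guidance", type=float, default=7.5)
parser.add_argument("--output", type=str, default="output.png")
args = parser.parse_args()

pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
latents = get_latents_from_seed(args.seed, args.width, args.height)
image = pipe(
    args.prompt,
    height=args.height,
    width=args.width,
    num_inference_steps=args.steps,
    guidance_scale=args.guidance,
    latents=latents,
).images[0]
image.save(args.output)

You'd then run it with something like python .\text2img.py "a robot on a mountaintop" --seed 1234 --steps 30.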

Some Final Notes

  • As far as I can tell, this is still a fair bit slower than running on Nvidia hardware! I don't have any hard numbers to share, only anecdotal observations that this seems to be anywhere from 3x to 8x slower than it is for people on similarly-specced Nvidia hardware.
  • Currently, the Onnx pipeline doesn't support batching, so don't try to pass it multiple prompts, or it will be sad. If you want several images, just call the pipeline once per prompt; see the sketch after this list.
  • All of this is changing at breakneck pace, so I fully expect about half of this blog post to be outdated a few weeks from now. Expect to have to do some legwork of your own. Sorry!
  • There is a very good guide on how to use Stable Diffusion on Reddit that goes through the basics of what each of the parameters means, how it affects the output, and gives tips on what you can do to get better outputs.
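
Since batching isn't supported yet, the simplest workaround is to call the pipeline once per prompt in a loop. A minimal, untested sketch (it assumes pipe is already set up as in the scripts above):

prompts = [
    "a watercolor fox in a snowy forest",
    "a city skyline at dusk, oil painting",
]
for i, prompt in enumerate(prompts):
    # One prompt per call; the Onnx pipeline can't take a list of prompts yet.
    image = pipe(prompt, num_inference_steps=25, guidance_scale=7.5).images[0]
    image.save(f"output_{i}.png")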

Closing Thoughts

So hopefully, now you've got your AMD Windows machine generating some AI-powered images. As I said before, I expect much of this information to be out of date two weeks from now. I might try to keep this post updated if I find the time and inclination, but that depends a lot on how this develops, and my own free time. We'll see!

As ever, I can be found on GitHub as pingzing and Twitter as @pingzingy. Happy generating!

The text of this blog post is licensed under a Creative Commons Attribution 4.0 International License.

Comments

  1. Stephen
    Thu, Sep 15, 2022, 01:07:07
    Thanks for putting this together. Unfortunately I can't seem to get this to run - it seems to hang up on the pipe command. Do you have any suggestions?
    File "C:\stable-diffusion\stable-diffusion\text2img.py", line 12, in
    pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")
    RuntimeError: D:\a\_work\1\s\onnxruntime\core\providers\dml\dml_provider_factory.cc(124)\onnxruntime_pybind11_state.pyd!00007FFF3E877BF3: (caller: 00007FFF3E7C9C16) Exception(1) tid(d50) 80070057 The parameter is incorrect.
  2. MK
    Thu, Sep 15, 2022, 01:09:08
    While trying to login using the API token, I get the below error. I'm trying to understand what file is missing and why:

    Traceback (most recent call last):
    File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
    File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
    File "I:\stable-diffusion\virtualenv\Scripts\huggingface-cli.exe\__main__.py", line 7, in
    File "i:\stable-diffusion\virtualenv\lib\site-packages\huggingface_hub\commands\huggingface_cli.py", line 41, in main
    service.run()
    File "i:\stable-diffusion\virtualenv\lib\site-packages\huggingface_hub\commands\user.py", line 176, in run
    _login(self._api, token=token)
    File "i:\stable-diffusion\virtualenv\lib\site-packages\huggingface_hub\commands\user.py", line 344, in _login
    hf_api.set_access_token(token)
    File "i:\stable-diffusion\virtualenv\lib\site-packages\huggingface_hub\hf_api.py", line 705, in set_access_token
    write_to_credential_store(USERNAME_PLACEHOLDER, access_token)
    File "i:\stable-diffusion\virtualenv\lib\site-packages\huggingface_hub\hf_api.py", line 528, in write_to_credential_store
    with subprocess.Popen(
    File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
    File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.2800.0_x64__qbz5n2kfra8p0\lib\subprocess.py", line 1311, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
    FileNotFoundError: [WinError 2] The system cannot find the file specified
    (virtualenv) PS I:\stable-diffusion>
  3. Neil
    Thu, Sep 15, 2022, 08:27:18
    MK: Judging by the stack trace, when the huggingface-cli tries to call write_to_credential_store(), it can't find ANY credential store. Since it looks like you're using a version of Python installed from the Windows Store... maybe that's why? As a guess, you could try uninstalling that version of Python, and installing a non-Store version.

    Stephen: Yours is trickier. It looks like it's actually dying somewhere in DirectML's native C/C++ code itself, with a classically unhelpful "The parameter is incorrect." =/
    Without more information about your setup, it's hard to say. I can say that I haven't seen that before, though. Are you running on an unusual system, like an ARM version of Windows, or something?
  4. ponut64
    Thu, Sep 15, 2022, 11:48:00
    I got it to work.
    Note: Probably do not use a newer nightly version of DirectML. It may cause huggingface to misbehave. Or something.
    CPU is AMD R5 5600G.
    GPU is AMD RX 6600 (non-XT).
    CPU time for a sample 256x256 image and prompt is 54 seconds.
    GPU time for the same prompt and size is 34 seconds.
    It helps!
    As for the images it's producing, they all seem rather cursed. But such is the way of AI!
  5. Conor
    Thu, Sep 15, 2022, 12:15:37
    Thanks for making the step by step.
    Though I'm afraid it still seems to be throwing an error at me and I don't understand where i'm missing a step.
    The error i get is as follows:

    PS F:\Applications\AI\Stable-Diffusion> huggingface-cli.exe login
    huggingface-cli.exe : The term 'huggingface-cli.exe' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify
    that the path is correct and try again.
    At line:1 char:1
    + huggingface-cli.exe login
    + ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : ObjectNotFound: (huggingface-cli.exe:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

    If anyone can see what I'm doing wrong or has had the same issue I'd love to know!
  6. Neil
    Thu, Sep 15, 2022, 15:59:20
    Conor: The huggingface-cli.exe is something that gets brought in by one of the dependencies (not sure which--probably diffusers?). If you were in the virtual environment when you installed everything, it should have wound up in .\virtualenv\Scripts\.

    My guess would be that you didn't activate the virtual environment with .\virtualenv\Scripts\Activate.ps1. If you HAVE, you can probably just invoke it manually with .\virtualenv\Scripts\huggingface-cli.exe.
  7. A good man
    Thu, Sep 15, 2022, 16:24:45
    Thanks for putting this together. A few errrors and issues i had:
    -you need GIT or you won't be able to log into huggingface. You don't mention this I believe
    - '---force-reinstall' has an extra -. Needs removing or command won't be recognized
    -downloading the utility script may be confusing for newbies (was for me). Guessing what you meant by 'downloading' is selecting it all and copy-pasting into a notepad file and then changing to .py format. At least that's what worked for me
  8. Conor
    Thu, Sep 15, 2022, 18:08:57
    I realized I had a previous install of Python installed through windows store, and its directory was not added to PATH.
    Completely uninstalling Python, then reinstalling using the installer from the website, and checking the box to add to PATH during the install fixed my problem.
  9. Andrew
    Fri, Sep 16, 2022, 01:23:52
    When I run your first example python script to generate an image, it takes about 7-8 minutes. This is not as fast as I had hoped for an AMD RX 6600 considering ponut64 could do 256x256 images in less than a minute. How much is DirectML dependent on the CPU? That may be my bottleneck (Ivy Bridge processor 3570K, 16GB ram). When I modify the script to not use GPU by deleting the provider argument, it takes nearly 30 minutes to generate an image, so a relative improvement, but still long enough to try my patience.

    I cannot try generating a different image size (e.g. 256x256) since I get the following error:

    ValueError: Unexpected latents shape, got (1, 4, 32, 32), expected (1, 4, 64, 64)
  10. ponut64
    Fri, Sep 16, 2022, 02:52:10
    To andrew:
    Try this line
    image = pipe(prompt, height=320, width=320, guidance_scale=8, num_inference_steps=25).images[0]
    You can adjust the height, steps, and such from here, and many other parameters that not even I know about.
  11. Neil
    Fri, Sep 16, 2022, 07:09:17
    -A good man-
    - Huh, I didn't even realize the CLI used Git as its credential storage medium. Whack. Thanks for the heads-up.
    - Typo fixed, thanks.
    - Added a bit of clarification around downloading the script, good point.

    -Andrew-
    If you're using the second Python script in the post, like ponut64 pointed out, you need to pass height and width arguments to pipe() as well as get_latents_from_seed(). (The only thing get_latents_from_seed() does is generate the randomness the image generation process uses, which for Reasons needs to know what the dimensions of the output will be.)
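
    In other words, something like this, reusing the pieces from the script in the post (256x256 just as an example):

    width, height = 256, 256
    latents = get_latents_from_seed(seed, width, height)
    image = pipe(prompt, height=height, width=width, num_inference_steps=25, guidance_scale=7.5, latents=latents).images[0]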

    I'll probably put up a small follow-up post soon with an updated script and a few observations about the process. I've got a nicely cleaned-up script that just reads in arguments now, and is a bit neater.
  12. Rev Hellfire
    Fri, Sep 16, 2022, 13:27:56
    Thanks a million for putting this together. Worked first time for me, that's a first :-).

    One small typo though, "---force-reinstall" should be "--force-reinstall"
  13. A good man
    Fri, Sep 16, 2022, 14:13:49
    Been playing with this for a while and very often I get plain black image outputs. I read it's due to a NSFW filter. Any clue how to disable it? Tried a few tricks from different places and none of them work.
    This is very annoying as often I don't even try anything explicit nsfw. It just happens at random when u go by different art styles.
  14. Neil
    Fri, Sep 16, 2022, 14:16:37
    -Rev Hellfire-
    Thanks for the heads-up! Fixed!

    -A good man-
    Yep, that's the safety checker. I'm going to address that in the follow-up post (because for some reason, it also seems to slow things down greatly), but the easiest way to disable it is, after defining pipe, to add the following line:

    pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))

    That will replace pipe's safety checker with a dummy function that always returns "false" when it checks to see if the generated output would be considered NSFW.
  15. Eric
    Fri, Sep 16, 2022, 14:59:14
    Is there a way to change the sampling method? I'd like to the the ddim sampler as it seems to make good results with less steps.
  16. Alex
    Fri, Sep 16, 2022, 16:13:34
    Works for me on RX5700, thanks!
  17. Marz
    Fri, Sep 16, 2022, 17:03:37
    Hello! Thank you for this!
    I'm just having an issue I'm not quite sure how to fix.

    When running..
    python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"

    I get this error..
    C:\Users\tiny\AppData\Local\Programs\Python\Python310\python.exe: can't open file 'E:\\Desktop\\Stable Diffusion\\convert_stable_diffusion_checkpoint_to_onnx.py': [Errno 2] No such file or directory

    I've followed the steps word for word so I'm not too sure where I'm messing up. Any help would be great, thank you!
  18. ponut64
    Fri, Sep 16, 2022, 17:57:19
    Marz,

    You need to manually specify a directory on your system for it to put stable-diffusion.
    It must be a full directory name, for example, D:\Library\stable-diffusion\stable_diffusion_onnx
  19. Marz
    Fri, Sep 16, 2022, 18:19:03
    Hey ponut64, thanks for the reply :)

    I did do that, no matter what I get the same error. I do have it set up properly (as far I know), just didn't copy the exact prompt before. Here's what I have

    (virtualenv) PS E:\Desktop\Stable Diffusion> convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="E:\Desktop\Stable Diffusion"
    convert_stable_diffusion_checkpoint_to_onnx.py: The term 'convert_stable_diffusion_checkpoint_to_onnx.py' is not recognized as a name of a cmdlet, function, script file, or executable program.
    Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
  20. Marz
    Fri, Sep 16, 2022, 18:20:27
    UPDATE: I know that prompt is also wrong (I've been trying multiple things), I did fix it with "python" in front and it still shows the same error.
  21. Neil
    Fri, Sep 16, 2022, 18:26:06
    -Eric-
    I've played with trying to use the other schedulers, but haven't had any success yet. They usually die somewhere in the middle with arcane errors I don't know enough to debug.

    -Marz-
    It looks like the 'convert_stable_diffusion_checkpoint_to_onnx.py' script isn't in the 'E:\Desktop\Stable Diffusion' folder, judging by that error message. Try moving it into there, then running the command again?
  22. Eric
    Fri, Sep 16, 2022, 18:31:14
    How do you change the scheduler? I'd like to take a crack at figuring it out.
    For that matter, is there a complete list of arguments I can put into "pipe()" somewhere?
  23. Neil
    Fri, Sep 16, 2022, 18:36:18
    -Eric-
    The closest thing I've found to a comprehensive list of arguments taken by pipe() is the source code itself: https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_onnx.py#L45

    As for using a different scheduler, 'scheduler' is an arg that can be passed to .from_pretrained(), and the diffusers repo has a few examples here: https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_diffusion though I haven't had any luck with those, or tweaks thereof.
  24. Tony Topper
    Fri, Sep 16, 2022, 23:03:25
    Thanks for doing this. It's been interesting to play with. My first generation clocked in at around 48 seconds on a 6900xt and an AMD 3950x. I am exploring the possibility of using AI to create assets for video game productions. Speeding it up would be awesome. Any thoughts on how to achieve that?

    Also, both DALL-E 2 and Nitecafe offer generating multiple images at the same time. I would love to get this running with that feature.

    I installed the nightly via pip install using the info found here: https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/connect/pip Though I had to remove that config line from my pip.ini file after installing it or no other pip installs would work.

    Also, FWIW, I got an error about "memory pattern" being disabled when I run the Python script you supplied.

    Would love to keep up to date with how you improve the Python script.

    (P.S. I've also been getting black squares as the output on occasion. Wonder what that is.)
  25. Marz
    Fri, Sep 16, 2022, 23:24:05
    Thanks for the reply Neil :)
    It's most definitely there, that's why I'm stumped. Just gonna save my GPU the trouble and use the browser. Thank you though and take care!
  26. Luxion
    Sat, Sep 17, 2022, 02:43:25
    Excellent guide!
    You forgot to mention that you need to activate the environment before executing the script each time the console is closed - its obvious but maybe not for noobs.
    Now we just need the guys at diffusers to work on onnx img2img and inpainting pipelines.
  27. Eric
    Sat, Sep 17, 2022, 02:46:32
    -Tony Topper-
    The black squares output is because of the NSFW filter. From another comment above:
    "
    pipe.safety_checker = lambda images, **kwargs: (images, [False] * len(images))
    That will replace pipe's safety checker with a dummy function that always returns "false" when it checks to see if the generated output would be considered NSFW.
    "
    I'm also getting that memory pattern error, but it doesn't seem to affect anything?

    -Neil-
    I'm working on the scheduler problem. As near as I can figure, it's something related to an incompatibility with numpy and torch. Torch isn't getting the right type of float from numpy and when I try to cast it, it still doesn't work. That's where I'm investigating lately at least.
  28. ponut64
    Sat, Sep 17, 2022, 05:00:32
    For fixing the schedulers, I found a hint while getting help for another diffuser.
    From here:
    https://huggingface.co/hakurei/waifu-diffusion/discussions/4

    "it can be fixed by setting dtype=np.int64 in pipeline_stable_diffusion_onnx.py line 133:

    noise_pred = self.unet(
        sample=latent_model_input, timestep=np.array([t], dtype=np.int64), encoder_hidden_states=text_embeddings
    )
    "

    And specifically about the scheduler:

    "I see astype(np.int64) in scheduling_pndm.py line 168 but not in other schedulers that's why change to PNDMScheduler can fix it."

    So that guy know's what he's talking about, I don't.
  29. Simon
    Sat, Sep 17, 2022, 15:20:07
    Excellent guide! Finally got txt2img working thanks to your post. Thanks!
  30. Gianluca
    Sat, Sep 17, 2022, 19:13:24
    Thank you very much for this effective guide!
    I was able to use just the CPU so far and now I can finally try with the GPU, instead.
    It seems that the GPU is faster in my case.

    I have the following system configuration:
    - OS: Windows 10 Pro
    - MB: ASRock X570 Phantom Gaming 4
    - CPU: AMD Ryzen 7 3700X 8-Core 3600MHz
    - RAM: 32 GB (17 GB available)
    - GPU: MSI AMD Radeon RX 570 ARMOR 8 GB

    CPU average time for 512 x 512 was about 5~6 minutes.
    With GPU and the simple script above it was reduced to 4 minutes (just 1 run).
    And with your more optimized script here above my first run says 2 minutes and 10 seconds.

    With CPU it was using about 16GB of the available RAM and CPU usage was about 50%.
    With GPU it is using all of the 8GB available and 100% of GPU processing power.
  31. Dan
    Sat, Sep 17, 2022, 20:24:27
    Excellent Thank you!
    Had a bit of difficulty putting my Token on the command line. Not sure why Ctrl-V wouldn't work, but a single right click and enter worked.
  32. Luxion
    Sat, Sep 17, 2022, 23:23:28
    @Gianluca
    His second script is not 'more optimized', it simply makes some variables which are then used internally to adjust the settings. The reason you are generating twice as fast is because in his second script he specified the number of steps to 25 - which when not specified it defaults back to 50.
    I'm assuming you don't know what steps are nor their importance because you didn't worry about them at all before running the script. In which case I recommend you read/watch some SD tutorials and learn more.
  33. Gianluca
    Sun, Sep 18, 2022, 08:39:04
    Thanks for the advice! I am definitely new to Stable Diffusione and some more tutorials will help :)

    Of course you are right, I noticed the difference in steps just after I posted my comment here (but I could not edit), and I was a little bit sad to acknowledge that my graphic card is not so good at the end :)
    Basically for me it works as in your own experience: just slightly better than with CPU alone, perhaps just 0.3~0.5 s/it faster.
  34. ekkko
    Sun, Sep 18, 2022, 15:50:36
    I can't seem to paste or type in the token once the prompt appears in either shell. Ideas?
  35. Luxion
    Sun, Sep 18, 2022, 16:02:53
    Check out the diffusers repo, someone made a PR for ONNX img2img and inpaint pipelines!
    I still see some torch calls in there - not sure it really works but if it does I hope you could update this guide and teach us how to use them when they get merged.
    Thanks again for the guide! Its brilliant!


    @Gianluca
    I have the MSI RX560 4G so I know how you feel.
    But there's lots of good news:
    - GPU's pricing should drop a little in the near future
    - ONNX is still improving and its SD models and pipelines will become much better in the very near future
    - AMD is apparently working with StabilityAI to help compatibility issues
    - And finally within 2 years SD will become so optimized that it will be able to run on mobile - according to Emad - SD's creator!
  36. ekkko
    Sun, Sep 18, 2022, 16:02:55
    It finally worked via right-clicking the shell window and selecting edit>paste. Seems quite a few people have this problem - found the solution at: https://discuss.huggingface.co/t/how-to-login-to-huggingface-hub-with-access-token/22498
  37. Dan
    Sun, Sep 18, 2022, 20:44:18
    Is there a way to have the model run the same prompt more than once? I tried the technique from hugging face where it does 3 variations and puts them in a grid with a specified amount of rows and columns, not enough vram to run 3 variations at the same time. I think I'm more interested in running the process back to back for the same prompt.
  38. ponut64
    Mon, Sep 19, 2022, 00:03:10
    To Dan,

    Yes, you can easily do that with a Python "while" loop (which, here, is being used like a 'for' loop would in other languages).

    Here is an example (note the TABS!):

    num_images = 0
    while (num_images < 10):
        num_images = num_images + 1
        image = pipe(prompt, height=448, width=320, guidance_scale=12, num_inference_steps=60).images[0]
        image.save("output" + str(num_images) + ".png")

    The count of images you want out of the program is the number that "num_images" is to be less than.
  39. Robin
    Mon, Sep 19, 2022, 11:14:28
    Thank you so much for putting this together! I was able to follow the steps and got it up and running. Is there any way I can use a GUI with it? The text prompts are working, would just be amazing to have something like stable-diffusion-ui cooperating with this method :-) Appreciate all your help! Take care and best wishes
  40. Mads
    Mon, Sep 19, 2022, 16:12:59
    Thank you so much. It is working just fine.
    A question. How do I convert custom ckpt model instead the one provided from hugging face?
  41. Bellic
    Mon, Sep 19, 2022, 18:29:54
    Wow that is great! text2img really doing well. But I wonder how do I use image to image. The default isn't onnx but some nvidia standard.
    Again. Thank you so much.
  42. Allan
    Tue, Sep 20, 2022, 00:20:49
    Thank you for providing this information! Much gratitude.
  43. Magnus
    Tue, Sep 20, 2022, 09:09:16
    Hi, great guide i got it working on my windows 10 machine. 2 questions:

    Can you do reinforcement learning on windows 10 too? referring to the pony art, was that model trained using this setup or downloaded from elsewhere? i'm guessing the ladder...

    I also notice the generation uses all of my vram (8GB) does that mean it has to use some of the regular ram and slowing it down as a process?
    i will try to generate in smaller resoultion but are there other ways of reducing vram usage?

    Thanks
  44. anonymous
    Tue, Sep 20, 2022, 09:46:29
    I'm getting images output, however with this error:
    ...virtualenv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py:54: UserWarning: Specified provider 'DmlExecutionProvider' is not in available provider names.Available providers: 'CPUExecutionProvider'

    If anybody knows what this issue is, how to resolve it, or if you can tell me how I can go about debugging this, then I would be very grateful. I've been trying to print a list of available providers as my debug starting point to no avail.
  45. michael
    Tue, Sep 20, 2022, 10:44:00
    no, no it still doesnt work. need more steps.
    please explain in mroe detail the convert python part.
  46. hugh
    Tue, Sep 20, 2022, 10:44:04
    @ponut64

    Got any ideas on iterating the seed in your loop? Omitting latents=latents from pipe yields irreproducable results.

    I tried this:

    seed = 50033
    while (num_images < 25):
        num_images = num_images + 1
        seed = seed + 1
        image = pipe(prompt, height=512, width=512, guidance_scale=13, num_inference_steps=25).images[0]
        image.save("output"+ str(num_images) + "_seed-" + str(seed) + ".png")

    and received 25 different images, but running again without changes gives 25 new images.

    Replacing with:
    image = pipe(prompt, num_inference_steps=13, guidance_scale=13, latents=latents).images[0]

    gives reproducible results, but the seed is unchanging throughout.
  47. michael
    Tue, Sep 20, 2022, 10:57:45
    nevermind it works. now i just wish i can make the images bigger?
  48. Neil
    Tue, Sep 20, 2022, 11:40:04
    -Robin-
    Don't have a good out-of-the-box answer for you, but I know a few people have had some success using Gradio to throw together a rudimentary UI. Might be something you could experiment with.

    -Mads-
    Not sure! I imagine that the convert_stable_diffusion_checkpoint_to_onnx.py script has some clues. I haven't taken a close look at it myself, but you might be able to repurpose whatever it does to point at a local CKPT.

    -Magnus-
    Not sure! I'm more a casual user than an ML guru. The pony was generated using someone else's custom-built model based on Stable Diffusion that they haven't released yet, not generated locally.
    As to reducing VRAM usage, not as far as I know--I think SD uses everything that's available. There's probably some way to tune it, but I don't know what that might be.

    -anonymous-
    That indicates that SD can't find everything it needs to run DirectML, so it's falling back to executing on the CPU, and you're not getting any advantage from running on your GPU.
    When you installed the nightly Onnx Runtime package, did you make sure to pass it the --force-reinstall flag? I noticed I had similar failures until I did so.
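
    If you want a quick way to see what your installed onnxruntime can actually use, you can ask it directly from a Python prompt inside the virtualenv:

    import onnxruntime as ort
    # 'DmlExecutionProvider' should show up in this list if the DirectML nightly installed correctly.
    print(ort.get_available_providers())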
  49. Neil
    Tue, Sep 20, 2022, 11:49:50
    -hugh-
    "...yields irreproducible results..."
    Of course, you're not actually using the seed in your example! The latents are the source of randomness in the pipeline, and if you don't pass in your own, you give the pipeline free rein to generate them for you, which it will do randomly. If you want deterministic results, you need to generate your own latents, using a seed you control.
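
    So inside your loop, regenerate the latents from the current seed on every iteration. Roughly:

    seed = 50033
    num_images = 0
    while num_images < 25:
        num_images = num_images + 1
        seed = seed + 1
        latents = get_latents_from_seed(seed, 512, 512)
        image = pipe(prompt, num_inference_steps=25, guidance_scale=13, latents=latents).images[0]
        image.save("output" + str(num_images) + "_seed-" + str(seed) + ".png")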