converted models producing noise #446

Open · ssube opened this issue Dec 23, 2023 · 1 comment
Labels: scope/convert, status/progress (issues that are in progress and have a branch), type/bug (broken features)

ssube (Owner) commented Dec 23, 2023

Some recently-converted models are producing random noise, like:

[image: example output showing random noise]

This is:

  • happening with pytorch 2.x, including 2.0 and 2.1
    • even with low_cpu_mem_usage monkeypatches on most/all UNetCondition2DModel ctors (see the sketch after this list)
  • only happening when ONNX_WEB_CONVERT_EXTRACT=FALSE
    • which is necessary for converting some newer models
  • happening with both fp16 and fp32
  • happening with SD v1.5 models
    • does not appear to happen with SDXL
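
A minimal sketch of the kind of low_cpu_mem_usage monkeypatch mentioned above, assuming the goal is simply to force low_cpu_mem_usage=False whenever diffusers loads the UNet; the hook point and class name (diffusers' UNet2DConditionModel) are illustrative rather than the exact onnx-web patch:

```python
# Hypothetical monkeypatch: force low_cpu_mem_usage=False on UNet loads.
# The real onnx-web patch may hook different constructors or classes.
from diffusers import UNet2DConditionModel

_original_from_pretrained = UNet2DConditionModel.from_pretrained.__func__


def _patched_from_pretrained(cls, *args, **kwargs):
    # Override whatever the caller passed so weights are fully materialized.
    kwargs["low_cpu_mem_usage"] = False
    return _original_from_pretrained(cls, *args, **kwargs)


UNet2DConditionModel.from_pretrained = classmethod(_patched_from_pretrained)
```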

Running a diff between a good and a bad copy of the same model shows that most or all of the weights differ:

INFO:__main__:raw data differs for onnx::Mul_9546: -0.12585449
INFO:__main__:raw data differs for onnx::Add_9547: 0.10827637
INFO:__main__:raw data differs for onnx::MatMul_9548: 0.25146484
INFO:__main__:raw data differs for onnx::MatMul_9549: 0.34448242
INFO:__main__:raw data differs for onnx::MatMul_9550: 0.16455078
INFO:__main__:raw data differs for onnx::MatMul_9557: 0.19995117
INFO:__main__:raw data differs for onnx::MatMul_9558: 0.15942383
INFO:__main__:raw data differs for onnx::MatMul_9559: 0.21325684
INFO:__main__:raw data differs for onnx::MatMul_9560: 0.13708496
INFO:__main__:raw data differs for onnx::MatMul_9567: 0.23217773
INFO:__main__:raw data differs for onnx::MatMul_9568: 0.19250488
INFO:__main__:raw data differs for onnx::MatMul_9569: 0.18237305
INFO:__main__:raw data differs for onnx::Mul_9570: -0.03488159
INFO:__main__:raw data differs for onnx::Add_9571: 2.65625
WARNING:__main__:models have 686 differences

This is true for at least the UNet and VAEs.
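
For reference, a rough sketch of the kind of comparison script that produces the log above, assuming both copies are plain .onnx files with embedded weights and that only initializer tensors are compared; the actual script used here may differ:

```python
# Rough sketch of an ONNX weight diff: compare initializer tensors by name.
import logging
import sys

import numpy as np
import onnx
from onnx import numpy_helper

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def diff_models(good_path: str, bad_path: str) -> int:
    good = {i.name: numpy_helper.to_array(i) for i in onnx.load(good_path).graph.initializer}
    bad = {i.name: numpy_helper.to_array(i) for i in onnx.load(bad_path).graph.initializer}

    differences = 0
    for name, good_tensor in good.items():
        bad_tensor = bad.get(name)
        if bad_tensor is None or good_tensor.shape != bad_tensor.shape:
            logger.warning("missing or mismatched tensor: %s", name)
            differences += 1
        elif not np.allclose(good_tensor, bad_tensor):
            # report the largest signed difference, as in the log above
            delta = (good_tensor.astype(np.float32) - bad_tensor.astype(np.float32)).flatten()
            logger.info("raw data differs for %s: %s", name, delta[np.argmax(np.abs(delta))])
            differences += 1

    logger.warning("models have %s differences", differences)
    return differences


if __name__ == "__main__":
    diff_models(sys.argv[1], sys.argv[2])
```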

ssube added the status/confirmed, type/bug, and scope/convert labels on Dec 23, 2023
ssube added this to the v0.11 milestone on Dec 23, 2023
ssube self-assigned this on Dec 23, 2023

ssube (Owner, Author) commented Dec 24, 2023

When this occurs on Windows, it appears to cause a crash rather than noise:

[2023-12-23 20:16:06,568] ERROR: onnx-web worker: directml MainThread onnx_web.chain.pipeline: error while running stage pipeline, 1 retries left
Traceback (most recent call last):
  File "onnx_web\chain\pipeline.py", line 227, in __call__
  File "onnx_web\chain\source_txt2img.py", line 144, in run
  File "diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 433, in __call__
  File "diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 433, in <listcomp>
  File "onnx_web\diffusers\patches\vae.py", line 79, in __call__
  File "diffusers\pipelines\onnx_utils.py", line 60, in __call__
  File "onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'/decoder/mid_block/attentions.0/Add_1' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(2759)\onnxruntime_pybind11_state.pyd!00007FF863B5DDF2: (caller: 00007FF863B5DB05) Exception(4) tid(5d98) 80070057 The parameter is incorrect.

I believe this is the same issue, and the different error message is due to DirectML.

I've written a new SD converter that uses the same optimum.main_export call as the SDXL converter, which seems to work on most models (a sketch of that export call follows the list below). Currently testing on the models included in the pre-converted set:

  • Cetus
    • fails on both v4 and Whalefall
  • Dreamshaper
    • works on v8
  • Elegant Entropy
    • works on v1.4
  • Faetastic
    • fails on v2
  • Juggernaut
    • not set up yet, not tested
  • ReV Animated
    • works on v1.2.2-EOL
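
As referenced above, a minimal sketch of the main_export-based conversion, assuming a Stable Diffusion 1.5 checkpoint that diffusers can already load; the model path, output directory, and options shown are illustrative rather than the exact onnx-web converter code:

```python
# Illustrative use of optimum's ONNX exporter for an SD 1.5 pipeline;
# the real converter wraps this call with onnx-web's own options.
from optimum.exporters.onnx import main_export

main_export(
    model_name_or_path="runwayml/stable-diffusion-v1-5",  # or a local diffusers directory
    output="./models/stable-diffusion-v1-5-onnx",
    # task is auto-detected for diffusers models; it can also be passed
    # explicitly (the task name varies across optimum versions).
)
```
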
Examples:

[images: sample outputs from the converted models]

Since some models fail to convert with both methods, the problem may be in the models themselves or somewhere upstream. All of the failing models (Cetus and Faetastic) convert correctly when using pipeline: txt2img-legacy and ONNX_WEB_CONVERT_EXTRACT=TRUE (which is the default again).

ssube added the status/progress label and removed the status/confirmed label on Dec 24, 2023
ssube modified the milestones: v0.11 → v0.12 on Dec 24, 2023