
OpenVINO: Encountered unknown exception in Run() #20069

Open
mertalev opened this issue Mar 25, 2024 · 15 comments
Labels
ep:OpenVINO issues related to OpenVINO execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc.

Comments

@mertalev

mertalev commented Mar 25, 2024

Describe the issue

When using OpenVINO, the session can be created, but calling run leads to the error: RuntimeException: [ONNXRuntimeError] RUNTIME_EXCEPTION: Encountered unknown exception. Based on reports in this issue, there seems to be a pattern with the N100 CPU in particular.

This seems to be a regression, as the error only appears after upgrading to onnxruntime-openvino 1.17.1 with OpenVINO 2023.3.0. The model worked with 1.15.0 and OpenVINO 2023.1.0.

After enabling the following environment variables:

ORT_OPENVINO_ENABLE_CI_LOG=1
ORT_OPENVINO_ENABLE_DEBUG=1
OPENVINO_LOG_LEVEL=5
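These can be exported in the shell before launching the process, for example:

```shell
# Enable verbose logging in the OpenVINO EP before starting the app
export ORT_OPENVINO_ENABLE_CI_LOG=1
export ORT_OPENVINO_ENABLE_DEBUG=1
export OPENVINO_LOG_LEVEL=5
```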

There are a few additional logs, but none that seem pertinent:

In the OpenVINO EP
Model is fully supported on OpenVINO
CreateNgraphFunc

To reproduce

With onnxruntime-openvino 1.17.1 and OpenVINO 2023.3.0, create a session including the following providers:

['OpenVINOExecutionProvider', 'CPUExecutionProvider']

And the following provider options:

[{'device_type': 'GPU_FP32', 'cache_dir': '/tmp/facial-recognition/buffalo_l/openvino'}, {'arena_extend_strategy': 'kSameAsRequested'}] 

Then attempt to run inference with this model. It may or may not work depending on the CPU.
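Put together, the repro looks roughly like the sketch below. The model path and input name are placeholders, and the onnxruntime calls are commented out so the snippet stands alone:

```python
# Sketch of the failing setup; "model.onnx" and "input" are placeholders.
providers = ["OpenVINOExecutionProvider", "CPUExecutionProvider"]
provider_options = [
    {"device_type": "GPU_FP32", "cache_dir": "/tmp/facial-recognition/buffalo_l/openvino"},
    {"arena_extend_strategy": "kSameAsRequested"},
]

# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers,
#                                provider_options=provider_options)
# outputs = session.run(None, {"input": inputs})  # raises RUNTIME_EXCEPTION on affected CPUs
```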

You may use this image to have the exact software environment producing the issue: ghcr.io/immich-app/immich-machine-learning@sha256:01799596c7f40495887d4027df1c0f4c144c7cd6ab34937ef2cc14d246470095

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

OpenVINO

Execution Provider Library Version

2023.3.0

@github-actions github-actions bot added ep:OpenVINO issues related to OpenVINO execution provider model:transformer issues related to a transformer model: BERT, GPT2, Hugging Face, Longformer, T5, etc. labels Mar 25, 2024
@jywu-msft
Member

+@sfatimar, @preetha-intel

@Disty0

Disty0 commented Mar 29, 2024

Having the same error using this model: https://huggingface.co/SmilingWolf/wd-swinv2-tagger-v3
Other variants of this model (convnext and vit) run fine, but swinv2 fails with the following logs:

Using onnxruntime-openvino==1.17.1 on Arch Linux 6.8.2 with this script:
https://github.com/kohya-ss/sd-scripts/blob/dev/finetune/tag_images_by_wd14_tagger.py

Command:

ORT_OPENVINO_ENABLE_CI_LOG=1 ORT_OPENVINO_ENABLE_DEBUG=1 OPENVINO_LOG_LEVEL=5 python finetune/tag_images_by_wd14_tagger.py --model_dir "~/models/wd14_tagger_model" --repo_id "SmilingWolf/wd-swinv2-tagger-v3" --recursive --remove_underscore --append_tags --onnx --caption_separator ", " --batch_size 1 --caption_extension ".txt" point_to_a_folder_with_images/

On CPU with OpenVINO ({'device_type': 'CPU_FP32'}):

2024-03-29 20:38:42,374 - __main__ - INFO - loading onnx model: /mnt/DataSSD/AI/models/wd14_tagger_model/SmilingWolf_wd-swinv2-tagger-v3/model.onnx
In the OpenVINO EP
CreateNgraphFunc
CreateNgraphFunc
CreateNgraphFunc
CreateNgraphFunc
CreateNgraphFunc
2024-03-29 20:38:43.498619390 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-29 20:38:43.498633230 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-29 20:38:44,003 - __main__ - INFO - found 151 images.
  0%|                                                                                                                                                                                                                 | 0/151 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 448, in <module>
    main(args)
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 321, in main
    run_batch(b_imgs)
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 199, in run_batch
    probs = ort_sess.run(None, {input_name: imgs})[0]  # onnx output numpy
  File "/mnt/DataSSD/AI/Apps/ipex/kohya_ss/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Encountered unknown exception in Run()

On an Intel ARC A770 ({'device_type': 'GPU.0_FP32'} or {'device_type': 'GPU_FP32'}):

2024-03-29 20:30:00,068 - __main__ - INFO - loading onnx model: /mnt/DataSSD/AI/models/wd14_tagger_model/SmilingWolf_wd-swinv2-tagger-v3/model.onnx
In the OpenVINO EP
CreateNgraphFunc
CreateNgraphFunc
CreateNgraphFunc
CreateNgraphFunc
CreateNgraphFunc
2024-03-29 20:30:01.866096452 [W:onnxruntime:, session_state.cc:1166 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2024-03-29 20:30:01.866111062 [W:onnxruntime:, session_state.cc:1168 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2024-03-29 20:30:02,360 - __main__ - INFO - found 151 images.
  0%|                                                                                                                                                                                                                 | 0/151 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 448, in <module>
    main(args)
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 321, in main
    run_batch(b_imgs)
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 199, in run_batch
    probs = ort_sess.run(None, {input_name: imgs})[0]  # onnx output numpy
  File "/mnt/DataSSD/AI/Apps/ipex/kohya_ss/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Encountered unknown exception in Run()

On an AMD RX 7900 XTX ({'device_type': 'GPU.1_FP32'}):
(Different error and more useful logs)

2024-03-29 20:35:50,890 - __main__ - INFO - loading onnx model: /mnt/DataSSD/AI/models/wd14_tagger_model/SmilingWolf_wd-swinv2-tagger-v3/model.onnx
In the OpenVINO EP
CreateNgraphFunc
lld: error: undefined hidden symbol: _fc_bf_tiled_kernel_default_fully_connected_gpu_bf_tiled_12472420788504233070_0__sa
>>> referenced by /tmp/comgr-3b4438/input/linked.bc.o:(fully_connected_gpu_bf_tiled_12472420788504233070_0__sa)
>>> referenced by /tmp/comgr-3b4438/input/linked.bc.o:(fully_connected_gpu_bf_tiled_12472420788504233070_0__sa)
Error: Creating the executable from LLVM IRs failed.
....
lld: error: undefined hidden symbol: _fc_bf_tiled_kernel_default_fully_connected_gpu_bf_tiled_5508805082171153497_0__sa
>>> referenced by /tmp/comgr-9d5597/input/linked.bc.o:(fully_connected_gpu_bf_tiled_5508805082171153497_0__sa)
>>> referenced by /tmp/comgr-9d5597/input/linked.bc.o:(fully_connected_gpu_bf_tiled_5508805082171153497_0__sa)
Error: Creating the executable from LLVM IRs failed.
....
2024-03-29 20:35:53.575513812 [E:onnxruntime:, inference_session.cc:1985 Initialize] Encountered unknown exception in Initialize()
Traceback (most recent call last):
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 448, in <module>
    main(args)
  File "/mnt/DataSSD/AI/Apps/rocm/kohya_ss/sd-scripts/finetune/tag_images_by_wd14_tagger.py", line 148, in main
    ort_sess = ort.InferenceSession(
  File "/mnt/DataSSD/AI/Apps/ipex/kohya_ss/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/mnt/DataSSD/AI/Apps/ipex/kohya_ss/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Encountered unknown exception in Initialize()

Same script runs fine on CPUExecutionProvider and ROCmExecutionProvider (from https://pypi.lsh.sh/60/onnxruntime-training/).

@sfatimar
Contributor

sfatimar commented Apr 2, 2024

Yes, there is a regression. The binary DLL uploaded to github.com/intel/onnxruntime for onnxruntime-openvino 1.17.1 is compatible only with OpenVINO 2023.3.0.
There was a change in the exception API in OpenVINO that was not handled properly, so backward compatibility was broken. But it is possible to build the 1.17.1 code with OpenVINO 2023.1.0 and execute ...

@Disty0

Disty0 commented Apr 2, 2024

I tried building from the main branch (commit id: a2998e5) and it runs fine now.

Had to use OpenVINO 2023.3 since building with 2024.0 segfaulted on import.
2023.3 still fails when running tests, but it runs fine for my use case, so I added --skip_tests to build.sh.

Ran into the same issue as intel/neural-speed#188 and had to add --compile_no_warning_as_error to build.sh.

Build command:

./build.sh --config RelWithDebInfo --use_openvino GPU_FP32 --parallel --build_shared_lib --build_wheel --compile_no_warning_as_error --skip_tests

@mertalev
Author

Yes, there is a regression. The binary DLL uploaded to github.com/intel/onnxruntime for onnxruntime-openvino 1.17.1 is compatible only with OpenVINO 2023.3.0.

There was a change in the exception API in OpenVINO that was not handled properly, so backward compatibility was broken. But it is possible to build the 1.17.1 code with OpenVINO 2023.1.0 and execute ...

To clarify, the issue is occurring with 2023.3.0. The pattern I'm seeing is that it works on CPUs with Iris Xe graphics, but not on CPUs with UHD graphics.

@shummo

shummo commented May 15, 2024

Is it possible to downgrade to onnxruntime-openvino 1.15.0 with OpenVINO 2023.1.0? If yes, how do I do that with Docker Compose? (Sorry, I'm not an expert.) Thanks

@henryruhs

I can confirm that this is an existing issue that breaks OpenVINO using Intel Arc (A770) under Windows.

@shummo A downgrade to onnxruntime==1.15.0 and openvino==2023.1.0 solved it
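For the Docker Compose question above, a hypothetical Dockerfile fragment pinning that combination might look like the following. The base image and exact package pins are assumptions; adjust them to your setup:

```dockerfile
# Hypothetical: pin the last known-good combination from this thread
FROM python:3.10-slim
RUN pip install "onnxruntime-openvino==1.15.0" "openvino==2023.1.0"
```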

@ankitm3k
Contributor

ankitm3k commented Jun 2, 2024

Hi @mertalev, I have tested your model and it runs inference successfully on both Windows 11 and Ubuntu 22.04, for both CPU and GPU. I would recommend that you and the community use the latest OpenVINO Toolkit v2024.1 and OpenVINO EP v1.18.0, which will be available soon in the upcoming ONNX Runtime release. You can also build the OpenVINO EP from source in the meantime.

@mertalev
Author

mertalev commented Jun 2, 2024

Thanks for the testing and update! We'll upgrade to 1.18.0 and 2024.1.0 once the former is available.

When you mention that it works on GPU, can you clarify whether you tested with an iGPU like UHD Graphics, or a dGPU like Arc? (I believe Iris Xe also counts as a dGPU.) iGPUs struggle, but I haven't seen anyone with a dGPU have an issue with this model.

@ankitm3k
Contributor

ankitm3k commented Jun 2, 2024

Thanks for the testing and update! We'll upgrade to 1.18.0 and 2024.1.0 once the former is available.

When you mention that it works on GPU, can you clarify if you tested with an iGPU like UHD Graphics, or a dGPU like Arc (and I believe Iris Xe is also counted as a dGPU). iGPUs struggle, but I haven't seen anyone with a dGPU have an issue with this model.

It was tested on a Meteor Lake architecture processor (Intel Core Ultra 7 1003H) with an integrated Intel Arc GPU. I'd recommend trying your application on multiple platforms. As suggested above, you can also build the OpenVINO EP from the main branch of this repository to get the latest wheels for your work environment and reproduce the same.

@Snuupy

Snuupy commented Jun 2, 2024

Thanks for the update!

It's tested on a Meteor Lake architecture processor CPU (Intel Core Ultra 7 1003H) comprising of iGPU (Intel Arc Graphics).

Hi, could you please add Intel UHD Graphics to your list of test cases/testing setup? It is currently broken there (but not on Xe Graphics, so even if it works on your setup, this may still be broken on UHD).

@henryruhs

henryruhs commented Jun 3, 2024

@ankitm3k No, onnxruntime-openvino does not work with the latest OpenVINO 2024.1 ... just release 1.18.0 finally

@ankitm3k
Contributor

ankitm3k commented Jun 6, 2024

Thanks for the testing and update! We'll upgrade to 1.18.0 and 2024.1.0 once the former is available.

When you mention that it works on GPU, can you clarify if you tested with an iGPU like UHD Graphics, or a dGPU like Arc (and I believe Iris Xe is also counted as a dGPU). iGPUs struggle, but I haven't seen anyone with a dGPU have an issue with this model.

Hi @mertalev @Snuupy

I have tested your model with our C++ onnxruntime_perf_test app built from source, and it runs inference successfully with the following machine configurations:

Machine 1:
OS: Windows 11
CPU: Raptor Lake arch
iGPU: Intel UHD Graphics 770
dGPU: Intel Arc A380 Graphics

Machine 2:
OS: Windows 11
CPU: i7-1270P
iGPU: Intel Iris Xe Graphics

I'd recommend using either Intel's repo (https://github.com/intel/onnxruntime.git), master or rel-1.18.0, or Microsoft's repo (https://github.com/microsoft/onnxruntime.git), master or rel-1.18.0 (https://github.com/microsoft/onnxruntime/commits/rel-1.18.0/), for building the wheels using the command below -

(build command posted as a screenshot in the original comment)

Find the wheel at a path like the example below and install it using pip:
pip install ${CWD}\onnxruntime\build\Windows\Debug\Debug\dist\onnxruntime_openvino-1.19.0-cp310-cp310-win_amd64.whl

@Snuupy

Snuupy commented Jun 18, 2024

@ankitm3k

I have tested your model with our C++ onnxruntime_perf_test app built from source and it runs inference successfully with below machine configurations - Machine 1 - OS: Windows 11 CPU: Raptor Lake arch iGPU: Intel UHD Graphics 770 dGPU: Intel Arc A380 Graphics

Doesn't onnxruntime default to the dGPU if one is present? So it would run on the A380 (working) instead of the UHD 770 iGPU (not working).

I can try substituting 1.18 to see if that fixes anything regardless.

@ankitm3k
Contributor

ankitm3k commented Jun 24, 2024

@ankitm3k

I have tested your model with our C++ onnxruntime_perf_test app built from source and it runs inference successfully with below machine configurations - Machine 1 - OS: Windows 11 CPU: Raptor Lake arch iGPU: Intel UHD Graphics 770 dGPU: Intel Arc A380 Graphics

Doesn't onnxruntime default to the dgpu if one is provided? So it will run on the A380 (working) instead of the UHD 770 igpu (not working)

I can try substituting 1.18 to see if that fixes anything regardless.

The onnxruntime default device_type for GPU is the iGPU (GPU.0); if you want to explicitly use the dGPU (GPU.1), set device_type to GPU.1 in your inference provider options.

Please build your onnxruntime-openvino wheels from the main branch and install them in your Python virtual env to get the latest release changes. This should solve your issues.
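As a sketch of that device selection (the onnxruntime and OpenVINO calls are commented out so the snippet stands alone; GPU.0/GPU.1 numbering depends on the machine, and "model.onnx" is a placeholder):

```python
# Explicitly select the dGPU (GPU.1) instead of the default iGPU (GPU.0).
provider_options = [{"device_type": "GPU.1_FP32"}]

# from openvino.runtime import Core
# print(Core().available_devices)  # e.g. ['CPU', 'GPU.0', 'GPU.1']
# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx",
#                             providers=["OpenVINOExecutionProvider"],
#                             provider_options=provider_options)
```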
