Is it possible to use Triton for inference acceleration in ONNXRuntime? #19219
Unanswered · twoapples1 asked this question in Other Q&A · Replies: 0
Hello, I see that in the current version Triton is only used to support CUDA training on Linux. I would like to know whether it could also be used for inference acceleration in ONNXRuntime. Could you help me evaluate the feasibility of that? If it is feasible, I would like to try introducing Triton into the inference path of some models to improve inference speed. I would also like to ask whether there are any plans to add Triton support to ONNXRuntime inference in future versions. Thanks!
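For context, here is a minimal sketch of the kind of Triton kernel I have in mind, a simple fused vector add, the sort of custom GPU kernel I would hope inference could dispatch to. This is standalone Triton/PyTorch code, not an ONNXRuntime API; it assumes the `triton` and `torch` packages and a CUDA GPU are available.

```python
# Minimal Triton kernel sketch (standalone, not ONNXRuntime-specific).
# Assumes `triton`, `torch`, and a CUDA-capable GPU are installed.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch enough program instances to cover all elements.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    a = torch.randn(4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    torch.testing.assert_close(add(a, b), a + b)
```

The question is whether kernels like this could be invoked from an ONNXRuntime inference session to speed up selected operators, rather than only being used on the training side.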