Is it possible to use Triton for inference acceleration in ONNXRuntime? #19219
Unanswered · twoapples1 asked this question in Other Q&A · Replies: 0
Hello, I see that in the current version Triton is only used to support CUDA training on Linux. I would like to know whether it could also be used for inference acceleration in ONNXRuntime. Could you help me evaluate the feasibility of that? If it is feasible, I would like to try introducing Triton into the inference path of some models to improve inference speed. I would also like to ask whether there are any plans to add Triton support to ONNXRuntime inference in future versions. Thanks!
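For context, here is a minimal sketch of the kind of Triton kernel I have in mind, a simple fused vector add, the sort of custom GPU kernel I would hope inference could dispatch to. This is standalone Triton/PyTorch code, not an ONNXRuntime API; it assumes the `triton` and `torch` packages and a CUDA GPU are available.

```python
# Minimal Triton kernel sketch (standalone, not ONNXRuntime-specific).
# Assumes `triton`, `torch`, and a CUDA-capable GPU are installed.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch enough program instances to cover all elements.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    a = torch.randn(4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    torch.testing.assert_close(add(a, b), a + b)
```

The question is whether kernels like this could be invoked from an ONNXRuntime inference session to speed up selected operators, rather than only being used on the training side.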