
v0.0.3

@cliffordkleinsr cliffordkleinsr released this 05 Apr 11:51
· 42 commits to main since this release
81794b5

Performance Optimizations

  1. Moved the final scaling and uint8 quantization to the GPU, reducing CPU and main-memory bandwidth consumption. 2.5x speed-up.
  2. Instruct FFmpeg to use RGB frames instead of BGR, so there is no need to swap channels.
  3. Batched inference (controlled by the --batch & --batches parameter; the default is 4).
  4. Instruct torch to make tensors contiguous after the BCHW -> BHWC transform on the GPU, so there is no need to copy the buffer before writing to FFmpeg. Reduced output IO time by 10x.
  5. Use the NVENC pipeline, when available, to decode and encode the images when piping inputs.
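
As a rough illustration of items 2, 3 and 5, the sketch below shows one way the FFmpeg pipes and the frame batching could be wired up. It is not the project's actual code: the helper names (`batched`, `decode_args`, `encode_args`), the choice of `h264_nvenc` with a `libx264` fallback, and the `-hwaccel cuda` decode option are assumptions made for the example. Only standard FFmpeg options are used.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(frames: Iterable[T], batch_size: int = 4) -> Iterator[List[T]]:
    """Group frames into lists of up to batch_size items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(frames)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def decode_args(src: str, use_hw_decode: bool) -> List[str]:
    """FFmpeg command that decodes src to raw RGB frames on stdout."""
    args = ["ffmpeg"]
    if use_hw_decode:
        # Assumption: a CUDA-capable FFmpeg build is installed.
        args += ["-hwaccel", "cuda"]
    # rgb24 asks FFmpeg for RGB byte order, so no BGR -> RGB swap is needed later.
    args += ["-i", src, "-f", "rawvideo", "-pix_fmt", "rgb24", "-"]
    return args


def encode_args(dst: str, width: int, height: int, fps: int,
                use_nvenc: bool) -> List[str]:
    """FFmpeg command that encodes raw RGB frames read from stdin into dst."""
    # Assumption: fall back to the software x264 encoder without NVENC.
    codec = "h264_nvenc" if use_nvenc else "libx264"
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", "-",
        "-c:v", codec,
        "-pix_fmt", "yuv420p",
        dst,
    ]


if __name__ == "__main__":
    print(list(batched(range(10), 4)))
    print(" ".join(decode_args("in.mp4", use_hw_decode=True)))
    print(" ".join(encode_args("out.mp4", 1920, 1080, 30, use_nvenc=True)))
```

Each argument list could be passed to `subprocess.Popen` with `stdout=subprocess.PIPE` (decode) or `stdin=subprocess.PIPE` (encode); the batches produced by `batched` would then be stacked into a single tensor before inference.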