A General-purpose Task-parallel Programming System using Modern C++
Updated Jun 30, 2024 - C++
Sample codes for my CUDA programming book
CUDA C++ Core Libraries
🎉CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
Thin, unified, C++-flavored wrappers for the CUDA APIs
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
TinyChatEngine: On-Device LLM Inference Library
Safe Rust wrapper around the CUDA toolkit
A simple GPU hash table implemented in CUDA using lock-free techniques
🚀 TensorRT-YOLO: Supports YOLOv3, YOLOv5, YOLOv6, YOLOv7, YOLOv8, YOLOv9, YOLOv10, and PP-YOLOE using TensorRT acceleration with EfficientNMS, CUDA Kernels and CUDA Graphs!
This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010
CUDA from zero to hero: accelerating maths and machine learning on the GPU.
μ-Cuda: cover the last mile of CUDA. Features: IntelliSense-friendly, structured launch, automatic CUDA graph generation and updating.
An implementation of HIP that works on CPUs, across OSes.
CUDA kernel author's tools
A self-learning tutorial for CUDA high-performance programming.
Install CUDA on Windows 11 using WSL2
Speed up image preprocessing with CUDA for image handling or TensorRT inference