HTVM

Efficient Neural Network Deployment on Heterogeneous TinyML Platforms


HTVM is a deep learning compiler for deploying neural networks on heterogeneous embedded compute platforms with multiple scratchpad-managed accelerators. HTVM generates self-contained C code that runs neural network layers sequentially and dispatches each layer to either the platform's CPU or one of its accelerators.

To do this, HTVM mainly relies on:

  • Apache TVM to generate CPU kernels, and run different layers sequentially.
  • DORY to generate accelerator kernels with optimized scratchpad memory management.

Requirements

Main requirements:

  • TVM (contained in this repository) and tools to compile TVM.
  • DORY version 8a0fe7bcadb207c6d80820a4bd2c2f2c0e823248
  • Python 3.8

For DIANA, HTVM also requires:

For your convenience, we advise using our Docker container, which comes with all dependencies needed for building TVM preinstalled.

Installation in a container

We use podman commands here, but note that you can use docker as well if preferred.

Getting the docker image

Our GitHub CI publishes an up-to-date image that you can pull with:

podman pull ghcr.io/kuleuven-micas/htvm:main

Or you could build the container image locally with:

git clone --recursive https://github.com/KULeuven-MICAS/htvm
cd htvm
podman build . -f diana/docker/Dockerfile.tvm -t htvm:main

Note

See the Dockerfile in case you want to attempt installation without a container.

Building HTVM

If you haven't already cloned the repo, do:

git clone --recursive https://github.com/KULeuven-MICAS/htvm
cd htvm

Now create and start a container:

podman run -it -v "$(pwd)":/tvm-fork:z htvm:main

Inside the container shell run:

mkdir build
cp diana/config.cmake build
cd build
cmake ..
make -j$(nproc)
cd ..

Test if it works (also run from inside the container):

cd diana/byoc
python3 driver.py -h

Compiling an ONNX model

A number of ONNX example models, quantized by diana-quantlib, are provided in this repo through git LFS. For quantizing your own models, see diana-quantlib.

Download the model data with:

git lfs pull

Compile a model for DIANA with digital acceleration:

python3 driver.py --no-run --onnx test_data/export_resnet8/ResNet_QL_NOANNOTATION.onnx

The output C code and PULP binaries can be found in /tmp/digital_pulp_dory_fused_O3_None/pulp/.

Compile a model for running on the CPU of your local machine with:

python3 driver.py --no-run --device x86 --target c --onnx test_data/export_resnet8/ResNet_QL_NOANNOTATION.onnx

The output C code and x86 binaries can be found in /tmp/digital_x86_c_fused_O3_None/x86/.

Run it locally with:

/tmp/digital_x86_c_fused_O3_None/x86/demo

Running tests

In addition to the standard test suite provided by TVM, HTVM contains its own unit tests and end-to-end tests.

The unit tests can be run with:

cd /path/to/htvm
pytest -v tests/python/contrib/test_soma_dory

The end-to-end tests rely on example ONNX files tracked with Git LFS. Run git lfs pull in case you haven't done so already. Then run:

cd diana/byoc
pytest -v test.py

Project Status

HTVM currently supports deploying a number of tested neural networks on the DIANA heterogeneous SoC.

The front-end supports ingesting quantized neural networks in ONNX format from Quantlib.

License

HTVM is Apache 2.0 Licensed.

Acknowledgements

This repository started off as a fork of the Apache TVM project at commit 2af3ab1e36e0e78bac8448a0357abee317fabb1f and has since been rebased on upstream several times.