Skip to content

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

License

Notifications You must be signed in to change notification settings

VynOpenSource/yolov7

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Official YOLOv7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

PWC Hugging Face Spaces Open In Colab arxiv.org

Web Demo

Performance

MS COCO

Model Test Size APtest AP50test AP75test batch 1 fps batch 32 average time
YOLOv7 640 51.4% 69.7% 55.9% 161 fps 2.8 ms
YOLOv7-X 640 53.1% 71.2% 57.8% 114 fps 4.3 ms
YOLOv7-W6 1280 54.9% 72.6% 60.1% 84 fps 7.6 ms
YOLOv7-E6 1280 56.0% 73.5% 61.2% 56 fps 12.3 ms
YOLOv7-D6 1280 56.6% 74.0% 61.8% 44 fps 15.0 ms
YOLOv7-E6E 1280 56.8% 74.4% 62.1% 36 fps 18.7 ms

Installation

Docker environment (recommended)

Expand
# create the docker container, you can change the share memory size if you have more.
nvidia-docker run --name yolov7 -it -v your_coco_path/:/coco/ -v your_code_path/:/yolov7 --shm-size=64g nvcr.io/nvidia/pytorch:21.08-py3

# apt install required packages
apt update
apt install -y zip htop screen libgl1-mesa-glx

# pip install required packages
pip install seaborn thop

# go to code folder
cd /yolov7

Testing

yolov7.pt yolov7x.pt yolov7-w6.pt yolov7-e6.pt yolov7-d6.pt yolov7-e6e.pt

python test.py --data data/coco.yaml --img 640 --batch 32 --conf 0.001 --iou 0.65 --device 0 --weights yolov7.pt --name yolov7_640_val

You will get the results:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.51206
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.69730
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.55521
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.35247
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.55937
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.66693
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.38453
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.63765
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.68772
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.53766
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.73549
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.83868

To measure accuracy, download COCO-annotations for Pycocotools to the ./coco/annotations/instances_val2017.json

Training

Data preparation

bash scripts/get_coco.sh
  • Download MS COCO dataset images (train, val, test) and labels. If you have previously used a different version of YOLO, we strongly recommend that you delete train2017.cache and val2017.cache files, and redownload labels

Single GPU training

# train p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml

# train p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml

Multiple GPU training

# train p5 models
python -m torch.distributed.launch --nproc_per_node 4 --master_port 9527 train.py --workers 8 --device 0,1,2,3 --sync-bn --batch-size 128 --data data/coco.yaml --img 640 640 --cfg cfg/training/yolov7.yaml --weights '' --name yolov7 --hyp data/hyp.scratch.p5.yaml

# train p6 models
python -m torch.distributed.launch --nproc_per_node 8 --master_port 9527 train_aux.py --workers 8 --device 0,1,2,3,4,5,6,7 --sync-bn --batch-size 128 --data data/coco.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6.yaml --weights '' --name yolov7-w6 --hyp data/hyp.scratch.p6.yaml

This sections illustrates the changes added to main code in this fork.

There are three main additions:

  1. Custom augmentations: This permits the addition of new augmentations not present in the yolov7 pipeline.
  2. Excluding bounding boxes: This enables the training process to remove specific classes from the images.
  3. Biasing data using labels: This allows the user to select what classes are going to appear more freqently in the training.
  4. A Squeeze and Excitation Block has been added, which can be used between the neck and head.

The first three additions can be used in the code as follows:

from vyn_yolov7 import Setting
from train import run_train

# relative ratio of the classes. For instance, imagine we have two classes: class1 and class2.
# If we want the class2 to be picked twice as class1, then we do
probabilities = {
    'class1': 1.0,
    'class2': 2.0,
}

options = Setting()
options.shuffle_class = True
options.probabilities = probabilities
options.custom_augment_fun = custom_augmentations
options.data = 'PATH/TO/YAML_FILE.yaml'

run_train(options)

The custom augmentation is a function that receives the image and bounding boxes as numpy arrays and returns the modified image and bounding boxes

An illustration of an augmentation function is presented below:

def custom_augmentation(image, bounding_boxes):
    .
    .
    .
    return augmented_image, augmented_bounding_boxes

In order to perform the removal of bounding boxes from images the fields excluded_classes and mapping_classes from the yaml are used. The purpose of this addition is the following: Imagine that we have a small dataset of people. In this dataset, each individual person is labelled, but in the dataset there are also hands, heads and other parts of the human body, but very few of them. In order to get an initial model, it would be convenient not to use these objects, but we do want to keep using those images because complete persons may be in those images as well.

To solve this issue, this code allows the user to select a set of classes that are going to be removed, meaning the objects in the images will be replaced with random noise.

The bounding boxes in training and validation contain the classes that are being represented. These classes are going to be integers from 0 to nc_complete, in contrast to nc as in the original code. The nc represents the number of classes that are going to be detected whereas nc_complete is the total number of classes in the dataset.

An example of these variables is shown below:

The nc_complete must be passed to the yaml file together with the other two parameters. An example of this yaml file is:

excluded_classes:
- 3
mapping_classes:
  0: 0
  1: 1
  2: 2
names:
- fire_extinguisher_shell
- forklift
- ladder
nc: 3
nc_complete: 4
train: PATH/TO/training.txt
val: PATH/TO/validation.txt

This means that in the dataset there will be 4 classes, the first three are normal ones and they will work in the same way as the original yolov7 code, but the fourth class (index number 3) will be excluded, so the bounding box will not be used for training, but the section of the image inside that bounding box will be changed for noise.

Lastly, the shuffle_class and probabilities are used to bias the dataset when needed. It allows to pass the relative rate of each class. For instance, if we have 3 classes: barrier, manhole and water_barrier and we want the class manhole to be used twice as much as the other two since this class seems to be harder for the model to train it properly. Then,

probabilities = {'barrier': 1, 'manhole': 2, 'water_barrier': 1}

Notice the shuffle_class is used to radomly select data from a class or to use all the dataset as is. So, if shuffle_class is False (default behaviour and the only one in the original code) then all the dataset will be used, for instance, if there are 100 images (80 of class barrier and 20 of the rests) the 100 images will be used per epoch. When shuffle_class is True, the training will select each class with equal probability regardless of the number of images of each class. So, 'barrier', 'manhole', 'water_barrier' will be selected with the same probability even when 'barrier' is more common. If probabilities is provided then the classes will be selected following that rate instead of with equal probability.

Transfer learning

yolov7_training.pt yolov7x_training.pt yolov7-w6_training.pt yolov7-e6_training.pt yolov7-d6_training.pt yolov7-e6e_training.pt

Single GPU finetuning for custom dataset

# finetune p5 models
python train.py --workers 8 --device 0 --batch-size 32 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-custom.yaml --weights 'yolov7_training.pt' --name yolov7-custom --hyp data/hyp.scratch.custom.yaml

# finetune p6 models
python train_aux.py --workers 8 --device 0 --batch-size 16 --data data/custom.yaml --img 1280 1280 --cfg cfg/training/yolov7-w6-custom.yaml --weights 'yolov7-w6_training.pt' --name yolov7-w6-custom --hyp data/hyp.scratch.custom.yaml

Re-parameterization

See reparameterization.ipynb

Inference

On video:

python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source yourvideo.mp4

On image:

python detect.py --weights yolov7.pt --conf 0.25 --img-size 640 --source inference/images/horses.jpg

Export

Pytorch to CoreML (and inference on MacOS/iOS) Open In Colab

Pytorch to ONNX with NMS (and inference) Open In Colab

python export.py --weights yolov7-tiny.pt --grid --end2end --simplify \
        --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640

Pytorch to TensorRT with NMS (and inference) Open In Colab

wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights ./yolov7-tiny.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16

Pytorch to TensorRT another way Open In Colab

Expand

wget https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-tiny.pt
python export.py --weights yolov7-tiny.pt --grid --include-nms
git clone https://github.com/Linaom1214/tensorrt-python.git
python ./tensorrt-python/export.py -o yolov7-tiny.onnx -e yolov7-tiny-nms.trt -p fp16

# Or use trtexec to convert ONNX to TensorRT engine
/usr/src/tensorrt/bin/trtexec --onnx=yolov7-tiny.onnx --saveEngine=yolov7-tiny-nms.trt --fp16

Tested with: Python 3.7.13, Pytorch 1.12.0+cu113

Pose estimation

code yolov7-w6-pose.pt

See keypoint.ipynb.

Instance segmentation (with NTU)

code yolov7-mask.pt

See instance.ipynb.

Instance segmentation

code yolov7-seg.pt

YOLOv7 for instance segmentation (YOLOR + YOLOv5 + YOLACT)

Model Test Size APbox AP50box AP75box APmask AP50mask AP75mask
YOLOv7-seg 640 51.4% 69.4% 55.8% 41.5% 65.5% 43.7%

Anchor free detection head

code yolov7-u6.pt

YOLOv7 with decoupled TAL head (YOLOR + YOLOv5 + YOLOv6)

Model Test Size APval AP50val AP75val
YOLOv7-u6 640 52.6% 69.7% 57.3%

Citation

@inproceedings{wang2023yolov7,
  title={{YOLOv7}: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors},
  author={Wang, Chien-Yao and Bochkovskiy, Alexey and Liao, Hong-Yuan Mark},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2023}
}
@article{wang2023designing,
  title={Designing Network Design Strategies Through Gradient Path Analysis},
  author={Wang, Chien-Yao and Liao, Hong-Yuan Mark and Yeh, I-Hau},
  journal={Journal of Information Science and Engineering},
  year={2023}
}

Teaser

YOLOv7-semantic & YOLOv7-panoptic & YOLOv7-caption

YOLOv7-semantic & YOLOv7-detection & YOLOv7-depth (with NTUT)

YOLOv7-3d-detection & YOLOv7-lidar & YOLOv7-road (with NTUT)

Acknowledgements

Expand

About

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Languages

  • Jupyter Notebook 98.7%
  • Python 1.3%