diff --git a/docs/tutorials/recommendation_system/DeepFM.md b/docs/tutorials/recommendation_system/DeepFM.md
new file mode 100644
index 000000000..b80050da7
--- /dev/null
+++ b/docs/tutorials/recommendation_system/DeepFM.md
@@ -0,0 +1,82 @@
+# DeepFM模型
+
+## 1.模型简介
+
+CTR预估是目前推荐系统的核心技术,其目标是预估用户点击推荐内容的概率。DeepFM模型包含FM和DNN两部分,FM模型可以抽取low-order(低阶)特征,DNN可以抽取high-order(高阶)特征。低阶特征可以理解为线性的特征组合;高阶特征可以理解为经过多次线性-非线性组合操作之后形成的高度抽象特征。与Wide&Deep模型不同,DeepFM无需人工特征工程。由于输入仅为原始特征,而且FM和DNN共享输入向量特征,DeepFM模型训练速度很快。
+
+注解:Wide&Deep是一种融合浅层(wide)模型和深层(deep)模型进行联合训练的框架,综合利用浅层模型的记忆能力和深层模型的泛化能力,实现单模型对推荐系统准确性和扩展性的兼顾。
+
+## 2.DeepFM模型结构
+
+为了同时利用low-order和high-order特征,DeepFM包含FM和DNN两部分,两部分共享输入特征。对于特征i,标量wi是其1阶特征的权重,该特征和其他特征的交互影响用隐向量Vi来表示。Vi输入到FM模型获得特征的2阶交叉表示,输入到DNN模型得到high-order(高阶)特征,最终预测由两部分的输出相加后经sigmoid得到:
+
+
+$$
+\hat{y} = sigmoid(y_{FM} + y_{DNN})
+$$
+
+DeepFM模型结构如下图所示,完成对稀疏特征的嵌入后,由FM层和DNN层共享输入向量,经前向传播后输出预测结果。
+
+![](https://ai-studio-static-online.cdn.bcebos.com/8654648d844b4233b3a05e918dedc9b777cf786af2ba49af9a92fc00cd050ef3)
+
+## 3.FM
+
+FM(Factorization Machines,因子分解机)最早由Steffen Rendle于2010年在ICDM上提出,它是一种通用的预测方法,即使在数据非常稀疏的情况下,依然能估计出可靠的参数进行预测。与传统的简单线性模型不同的是,因子分解机考虑了特征间的交叉,对所有嵌套变量交互进行建模(类似于SVM中的核函数),因此在推荐系统和计算广告领域关注的点击率CTR(click-through rate)和转化率CVR(conversion rate)两项指标上有着良好的表现。
+
+FM模型不单可以建模1阶特征,还可以通过隐向量内积的方法高效地获得2阶特征表示,即使交叉特征在数据集中非常稀疏甚至从来没出现过,这也是FM的优势所在。
+
+
+$$
+y_{FM}= \langle w,x \rangle + \sum_{j_1=1}^{d}\sum_{j_2=j_1+1}^{d} \langle V_{j_1},V_{j_2} \rangle x_{j_1}\cdot x_{j_2}
+$$
+
+单独的FM层结构如下图所示:
+
+![](https://ai-studio-static-online.cdn.bcebos.com/bda8da10940b43ada3337c03332fe06ad1cd95f7780243888050023be33fc88c)
+
+## 4.DNN
+
+该部分和Wide&Deep模型类似,是简单的前馈网络。在输入特征部分,由于原始特征向量多是高维度、高度稀疏、连续和类别混合的分域特征,因此将原始的稀疏表示特征映射为稠密的特征向量。
+
+假设嵌入子网络的输出层为:
+
+
+$$
+a^{(0)}=[e_1,e_2,e_3,...,e_n]
+$$
+DNN网络第l层表示为:
+
+
+$$
+a^{(l+1)}=\sigma{(W^{(l)}a^{(l)}+b^{(l)})}
+$$
+再假设有H个隐藏层,DNN部分的预测输出可表示为:
+
+
+$$
+y_{DNN}= \sigma{(W^{|H|+1}\cdot a^H + b^{|H|+1})}
+$$
+DNN深度神经网络层结构如下图所示:
+
+![](https://ai-studio-static-online.cdn.bcebos.com/df8159e1d56646fe868e8a3ed71c6a46f03c716ad1d74f3fae88800231e2f6d8)
+
+## 5.Loss及Auc计算
+
+DeepFM模型的损失函数选择Binary_Cross_Entropy(二值交叉熵)函数:
+
+
+$$
+H_p(q)=-\frac{1}{N}\sum_{i=1}^N\left[y_i\cdot log(p(y_i))+(1-y_i) \cdot log(1-p(y_i))\right]
+$$
+对于公式的理解:y_i是第i个样本的标签(正样本为1,负样本为0),p(y_i)是该样本被预测为正样本的概率,log(p(y_i))可理解为其对数概率。
+
+AUC是Area Under Curve的首字母缩写,这里的Curve指的就是ROC曲线,AUC就是ROC曲线下面的面积,作为模型评价指标,它可以用来评价二分类模型。其中,ROC曲线全称为受试者工作特征曲线(receiver operating characteristic curve),它是根据一系列不同的二分类方式(分界值或决定阈),以真阳性率(敏感性)为纵坐标,假阳性率(1-特异性)为横坐标绘制的曲线。
+
+可使用paddle.metric.Auc()进行计算。
+
+## 6.参考文献
+
+[[IJCAI 2017] Guo, Huifeng, Tang, Ruiming, Ye, Yunming, Li, Zhenguo, He, Xiuqiang. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction](https://arxiv.org/pdf/1703.04247.pdf)
+
+
+
diff --git a/docs/tutorials/recommendation_system/index.rst b/docs/tutorials/recommendation_system/index.rst index 59eb8795e..ef207cb48 100644 --- a/docs/tutorials/recommendation_system/index.rst +++ b/docs/tutorials/recommendation_system/index.rst @@ -2,11 +2,10 @@ ======================== .. 
toctree:: - :maxdepth: 2 + :maxdepth: 3 :caption: 目录结构 推荐系统基础 推荐系统的评价指标 DSSM - - + DeepFM diff --git a/examples/ABClass/README.md b/examples/ABClass/README.md new file mode 100644 index 000000000..a2c3fa808 --- /dev/null +++ b/examples/ABClass/README.md @@ -0,0 +1,633 @@ +# **一、简要介绍** +  图像分类,根据各自在图像信息中所反映的不同特征,把不同类别的目标区分开来的图像处理方法。它利用计算机对图像进行定量分析,把图像或图像中的每个像元或区域划归为若干个类别中的某一种,以代替人的视觉判读。 + +  本示例简要介绍如何通过飞桨图像识别套件PaddleClas,在飞桨深度学习平台[AI Studio](https://aistudio.baidu.com/aistudio/index)上实现手语字母图像分类,项目连接:[【PaddleClas2.2】英文字母手语识别](https://aistudio.baidu.com/aistudio/projectdetail/2263110),有关PaddleClas的介绍请见:[PaddleClas](https://github.com/PaddlePaddle/PaddleClas)。 + +  在本示例中,使用ResNet50_vd作为骨干网络,开启预训练模型进行微调。 + +  ResNet系列模型是在2015年提出的,一举在ILSVRC2015比赛中取得冠军,top5错误率为3.57%。该网络创新性的提出了残差结构,通过堆叠多个残差结构从而构建了ResNet网络。实验表明使用残差块可以有效地提升收敛速度和精度。斯坦福大学的Joyce Xu将ResNet称为「真正重新定义了我们看待神经网络的方式」的三大架构之一。由于ResNet卓越的性能,越来越多的来自学术界和工业界学者和工程师对其结构进行了改进,比较出名的有Wide-ResNet, ResNet-vc ,ResNet-vd, Res2Net等。 + +  加深网络的深度是能让网络效果变的更好的重要因素,但随着网络的加深,梯度弥散问题会越来越严重,导致网络很难收敛,梯度弥散问题目前有很多的解决办法,包括网络初始标准化,数据标准化以及中间层的标准化(Batch Normalization)等。但除此之外,网络加深还会带来另外一个问题:随着网络加深,网络开始退化,出现训练集准确率下降的现象,如下图 + +![](https://ai-studio-static-online.cdn.bcebos.com/8edafef7e3cd4c9f9ce335417567abd59bf9df9ef57b47abb1c2165e33afc608) + +  为此,ResNet的作者引入了一种名为“残差学习”的思想: + +![](https://ai-studio-static-online.cdn.bcebos.com/f461f7655c874edaa62e4c11ee4e9bc4d4a367e9b0d24082989e9b350a357eec) + +  残差学习的block一共包含两个分支: +* identity mapping,指的是上图右方曲线,代表自身映射; +* residual mapping,指的是另一条分支,称为残差映射。 + +  针对不同深度的ResNet,作者提出了两种Residual Block: + +![](https://ai-studio-static-online.cdn.bcebos.com/436722fef42b42688b75d4658c1db8d61142ed782fbb4c01894eec2020b4beb7) + +  下图为VGG-19,Plain-34(没有使用residual结构)和ResNet-34网络结构对比: + +![](https://ai-studio-static-online.cdn.bcebos.com/208edb003c73406a8e9640498b7c64a86ceb4631b972473b8e676ec5f3c306e5) + +  论文一共提出5种ResNet网络,网络参数统计表如下: +![](https://ai-studio-static-online.cdn.bcebos.com/68a92b8647fc4699abb287a0e1256e31a05f11486d5f4214ae1e993554d14463) + + +# **二、环境设置** + +## 2.1 安装PaddleClas + + +```python +#安装PaddleClas +!git clone https://gitee.com/paddlepaddle/PaddleClas.git work/PaddleClas +``` + +## 2.2 更新前置 + + +```python +#更新前置(如果时间过长,可以尝试把work/PaddleClas/requirements.txt中的opencv-python==4.4.0.46删去) +!pip install --upgrade -r work/PaddleClas/requirements.txt -i https://mirror.baidu.com/pypi/simple +``` + +## 2.3 导入模块 + + +```python +#导入所需库 +import os +import random +from PIL import Image +import matplotlib.pyplot as plt +``` + +# **三、数据集** + +## 3.1 准备数据集 + +  [美国手语字母图像数据集](https://www.kaggle.com/grassknoted/asl-alphabet),训练数据集包含 87,000 张 200x200 像素的图像,有29个类,其中26个分别为字母A-Z,3个分别为SPACE、DELETE和NOTHING。 + +  本数据集已由[bnmvv5](https://aistudio.baidu.com/aistudio/personalcenter/thirdview/71231)上传至AI Studio中:[ASL Alphabet:手语字母表](https://aistudio.baidu.com/aistudio/datasetdetail/99209) + + +```python +#解压数据集 +!unzip -q data/data99209/ASL_Alphabet.zip -d data/ +``` + +## 3.2 数据集概览 + + +```python +imgtestroot = 'data/asl_alphabet_test/asl_alphabet_test' +imglist = os.listdir(imgtestroot) +imglist.sort() +plt.figure(figsize=(20,20)) +for num, imgname in enumerate(imglist): + imgpath = os.path.join(imgtestroot, imgname) + img = Image.open(imgpath) + plt.subplot(7,4,num+1) + plt.imshow(img) + plt.title(imgname) + plt.axis('off') +``` + + +![png](output_13_0.png) + + +## 3.3 标注文件生成 + +  有关标注文件的格式请参照:[数据说明](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/zh_CN/tutorials/data.md) + +  本项目将数据集按照0.9 : 0.1的比例划分成训练集和验证集,划分前进行乱序操作 + +``` +# 
每一行采用"空格"分隔图像路径与标注 +# 下面是Train.txt中的格式样例 +M/M895.jpg 12 +Z/Z382.jpg 25 +Z/Z1340.jpg 25 +E/E2814.jpg 4 +... +``` + + +```python +#生成数据集划分TXT(0.9 : 0.1) +os.makedirs('work/List') +AllClass = os.listdir("data/asl_alphabet_train/asl_alphabet_train") +AllClass.sort() +TrainLIst = [] +EvalList = [] +TrainTXT = open("work/List/Train.txt","w") +EvalTXT = open("work/List/Eval.txt","w") +IDMapTXT = open("work/List/IDMap.txt","w") +for Label, ABClass in enumerate(AllClass): + #训练集 + for number in range(1,2901): + TrainLIst.append(ABClass + '/' + ABClass + str(number) + '.jpg ' + str(Label)) + #验证集 + for number in range(2901,3001): + EvalList.append(ABClass + '/' + ABClass + str(number) + '.jpg ' + str(Label)) + #类别标记 + IDMapTXT.write(str(Label) + ' ' + ABClass + '\n') +random.shuffle(TrainLIst) +random.shuffle(EvalList) +TrainTXT.write('\n'.join(TrainLIst)) +EvalTXT.write('\n'.join(EvalList)) +TrainTXT.close() +EvalTXT.close() +IDMapTXT.close() +``` + +# **四、模型配置** + +有关配置文件请参考:[配置说明](https://github.com/PaddlePaddle/PaddleClas/blob/release/2.2/docs/zh_CN/tutorials/config_description.md) + + 记得修改配置文件路径及内容,这里需要修改一下epochs,如果打开了预训练模型进行微调,只需要设置成2-3,如果不使用预训练模型,需要将epochs设置成100多轮 + + 此外learning_rate、batch_size和正则化系数请根据网络实际收敛速度自行进行调整,下面给出的是在预训练模型打开的情况下效果尚可的参数 + + 因为一共29类故将class_num设置成29 + + image_root为之前标注文件中目录的根目录,在cls_label_path中引用标注文件 + +``` +# global configs +Global: + checkpoints: null + pretrained_model: null + output_dir: work/output/ #模型保存路径 + device: gpu + save_interval: 1 + eval_during_train: True + eval_interval: 1 + epochs: 3 + print_batch_step: 10 + use_visualdl: False + # used for static mode and model export + image_shape: [3, 200, 200] #图片输入大小 + save_inference_dir: work/inference + +# model architecture +Arch: + name: ResNet50_vd + class_num: 29 + +# loss function config for traing/eval process +Loss: + Train: + - CELoss: + weight: 1.0 + epsilon: 0.1 + Eval: + - CELoss: + weight: 1.0 + +Optimizer: + name: Momentum + momentum: 0.9 + lr: + name: Cosine + learning_rate: 0.01 + #last0.001 + regularizer: + name: 'L2' + coeff: 0.00001 + +# data loader for train and eval +DataLoader: + Train: + dataset: + name: ImageNetDataset + image_root: data/asl_alphabet_train/asl_alphabet_train + cls_label_path: work/List/Train.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: True + loader: + num_workers: 0 + use_shared_memory: True + Eval: + dataset: + name: ImageNetDataset + image_root: data/asl_alphabet_train/asl_alphabet_train + cls_label_path: work/List/Eval.txt + transform_ops: + - DecodeImage: + to_rgb: True + channel_first: False + - RandFlipImage: + flip_code: 1 + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + sampler: + name: DistributedBatchSampler + batch_size: 128 + drop_last: False + shuffle: False + loader: + num_workers: 0 + use_shared_memory: True + +Infer: + infer_imgs: data/asl_alphabet_test/asl_alphabet_test + batch_size: 28 + transforms: + - DecodeImage: + to_rgb: True + channel_first: False + - NormalizeImage: + scale: 1.0/255.0 + mean: [0.485, 0.456, 0.406] + std: [0.229, 0.224, 0.225] + order: '' + - ToCHWImage: + PostProcess: + name: Topk + topk: 1 + class_id_map_file: work/List/IDMap.txt + +Metric: + Train: + Eval: + - TopkAcc: + topk: [1, 5] +``` + +# **五、模型训练** + + 
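+正式启动训练前,可以先用下面的小脚本对 3.3 节生成的标注文件做一次简单自检(示例代码,仅检查"相对路径 空格 类别id"的格式以及图片文件是否存在,并非 PaddleClas 自带工具,路径沿用前文生成的 work/List 目录):
+
+```python
+#标注文件自检(可选)
+import os
+
+image_root = 'data/asl_alphabet_train/asl_alphabet_train'
+for txt in ['work/List/Train.txt', 'work/List/Eval.txt']:
+    with open(txt) as f:
+        lines = [line.strip() for line in f if line.strip()]
+    #每行应为"相对路径 类别id",且对应图片文件应真实存在
+    bad = [line for line in lines
+           if len(line.split(' ')) != 2
+           or not os.path.exists(os.path.join(image_root, line.split(' ')[0]))]
+    print(txt, '共', len(lines), '条,格式或路径异常', len(bad), '条')
+```
+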
+```python +#开始训练 +!export CUDA_VISIBLE_DEVICES=0 +!python work/PaddleClas/tools/train.py \ + -o Arch.pretrained=True \ + -c work/Config/ABClass_ResNet50_vd.yaml +``` + + +```python +#恢复训练 +!python work/PaddleClas/tools/train.py \ + -c work/Config/ABClass_ResNet50_vd.yaml \ + -o Global.checkpoints="work/output/ResNet50_vd/latest" +``` + +# **六、模型评估** + + +```python +#模型评估 +!python3 work/PaddleClas/tools/eval.py \ + -c work/Config/ABClass_ResNet50_vd.yaml \ + -o Global.pretrained_model=work/output/ResNet50_vd/best_model +``` + + /home/aistudio/work/PaddleClas/ppcls/arch/backbone/model_zoo/vision_transformer.py:15: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import Callable + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import MutableMapping + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import Iterable, Mapping + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import Sized + [2021/10/01 13:16:40] root INFO: + =========================================================== + == PaddleClas is powered by PaddlePaddle ! == + =========================================================== + == == + == For more info please go to the following website. 
== + == == + == https://github.com/PaddlePaddle/PaddleClas == + =========================================================== + + [2021/10/01 13:16:40] root INFO: Arch : + [2021/10/01 13:16:40] root INFO: class_num : 29 + [2021/10/01 13:16:40] root INFO: name : ResNet50_vd + [2021/10/01 13:16:40] root INFO: DataLoader : + [2021/10/01 13:16:40] root INFO: Eval : + [2021/10/01 13:16:40] root INFO: dataset : + [2021/10/01 13:16:40] root INFO: cls_label_path : work/List/Eval.txt + [2021/10/01 13:16:40] root INFO: image_root : data/asl_alphabet_train/asl_alphabet_train + [2021/10/01 13:16:40] root INFO: name : ImageNetDataset + [2021/10/01 13:16:40] root INFO: transform_ops : + [2021/10/01 13:16:40] root INFO: DecodeImage : + [2021/10/01 13:16:40] root INFO: channel_first : False + [2021/10/01 13:16:40] root INFO: to_rgb : True + [2021/10/01 13:16:40] root INFO: RandFlipImage : + [2021/10/01 13:16:40] root INFO: flip_code : 1 + [2021/10/01 13:16:40] root INFO: NormalizeImage : + [2021/10/01 13:16:40] root INFO: mean : [0.485, 0.456, 0.406] + [2021/10/01 13:16:40] root INFO: order : + [2021/10/01 13:16:40] root INFO: scale : 1.0/255.0 + [2021/10/01 13:16:40] root INFO: std : [0.229, 0.224, 0.225] + [2021/10/01 13:16:40] root INFO: loader : + [2021/10/01 13:16:40] root INFO: num_workers : 0 + [2021/10/01 13:16:40] root INFO: use_shared_memory : True + [2021/10/01 13:16:40] root INFO: sampler : + [2021/10/01 13:16:40] root INFO: batch_size : 128 + [2021/10/01 13:16:40] root INFO: drop_last : False + [2021/10/01 13:16:40] root INFO: name : DistributedBatchSampler + [2021/10/01 13:16:40] root INFO: shuffle : False + [2021/10/01 13:16:40] root INFO: Train : + [2021/10/01 13:16:40] root INFO: dataset : + [2021/10/01 13:16:40] root INFO: cls_label_path : work/List/Train.txt + [2021/10/01 13:16:40] root INFO: image_root : data/asl_alphabet_train/asl_alphabet_train + [2021/10/01 13:16:40] root INFO: name : ImageNetDataset + [2021/10/01 13:16:40] root INFO: transform_ops : + [2021/10/01 13:16:40] root INFO: DecodeImage : + [2021/10/01 13:16:40] root INFO: channel_first : False + [2021/10/01 13:16:40] root INFO: to_rgb : True + [2021/10/01 13:16:40] root INFO: RandFlipImage : + [2021/10/01 13:16:40] root INFO: flip_code : 1 + [2021/10/01 13:16:40] root INFO: NormalizeImage : + [2021/10/01 13:16:40] root INFO: mean : [0.485, 0.456, 0.406] + [2021/10/01 13:16:40] root INFO: order : + [2021/10/01 13:16:40] root INFO: scale : 1.0/255.0 + [2021/10/01 13:16:40] root INFO: std : [0.229, 0.224, 0.225] + [2021/10/01 13:16:40] root INFO: loader : + [2021/10/01 13:16:40] root INFO: num_workers : 0 + [2021/10/01 13:16:40] root INFO: use_shared_memory : True + [2021/10/01 13:16:40] root INFO: sampler : + [2021/10/01 13:16:40] root INFO: batch_size : 128 + [2021/10/01 13:16:40] root INFO: drop_last : False + [2021/10/01 13:16:40] root INFO: name : DistributedBatchSampler + [2021/10/01 13:16:40] root INFO: shuffle : True + [2021/10/01 13:16:40] root INFO: Global : + [2021/10/01 13:16:40] root INFO: checkpoints : None + [2021/10/01 13:16:40] root INFO: device : gpu + [2021/10/01 13:16:40] root INFO: epochs : 200 + [2021/10/01 13:16:40] root INFO: eval_during_train : True + [2021/10/01 13:16:40] root INFO: eval_interval : 1 + [2021/10/01 13:16:40] root INFO: image_shape : [3, 200, 200] + [2021/10/01 13:16:40] root INFO: output_dir : work/output/ + [2021/10/01 13:16:40] root INFO: pretrained_model : work/output/ResNet50_vd/best_model + [2021/10/01 13:16:40] root INFO: print_batch_step : 10 + [2021/10/01 13:16:40] root INFO: 
save_inference_dir : work/inference + [2021/10/01 13:16:40] root INFO: save_interval : 1 + [2021/10/01 13:16:40] root INFO: use_visualdl : False + [2021/10/01 13:16:40] root INFO: Infer : + [2021/10/01 13:16:40] root INFO: PostProcess : + [2021/10/01 13:16:40] root INFO: class_id_map_file : work/List/IDMap.txt + [2021/10/01 13:16:40] root INFO: name : Topk + [2021/10/01 13:16:40] root INFO: topk : 1 + [2021/10/01 13:16:40] root INFO: batch_size : 28 + [2021/10/01 13:16:40] root INFO: infer_imgs : data/asl_alphabet_test/asl_alphabet_test + [2021/10/01 13:16:40] root INFO: transforms : + [2021/10/01 13:16:40] root INFO: DecodeImage : + [2021/10/01 13:16:40] root INFO: channel_first : False + [2021/10/01 13:16:40] root INFO: to_rgb : True + [2021/10/01 13:16:40] root INFO: NormalizeImage : + [2021/10/01 13:16:40] root INFO: mean : [0.485, 0.456, 0.406] + [2021/10/01 13:16:40] root INFO: order : + [2021/10/01 13:16:40] root INFO: scale : 1.0/255.0 + [2021/10/01 13:16:40] root INFO: std : [0.229, 0.224, 0.225] + [2021/10/01 13:16:40] root INFO: ToCHWImage : None + [2021/10/01 13:16:40] root INFO: Loss : + [2021/10/01 13:16:40] root INFO: Eval : + [2021/10/01 13:16:40] root INFO: CELoss : + [2021/10/01 13:16:40] root INFO: weight : 1.0 + [2021/10/01 13:16:40] root INFO: Train : + [2021/10/01 13:16:40] root INFO: CELoss : + [2021/10/01 13:16:40] root INFO: epsilon : 0.1 + [2021/10/01 13:16:40] root INFO: weight : 1.0 + [2021/10/01 13:16:40] root INFO: Metric : + [2021/10/01 13:16:40] root INFO: Eval : + [2021/10/01 13:16:40] root INFO: TopkAcc : + [2021/10/01 13:16:40] root INFO: topk : [1, 5] + [2021/10/01 13:16:40] root INFO: Train : None + [2021/10/01 13:16:40] root INFO: Optimizer : + [2021/10/01 13:16:40] root INFO: lr : + [2021/10/01 13:16:40] root INFO: learning_rate : 0.01 + [2021/10/01 13:16:40] root INFO: name : Cosine + [2021/10/01 13:16:40] root INFO: momentum : 0.9 + [2021/10/01 13:16:40] root INFO: name : Momentum + [2021/10/01 13:16:40] root INFO: regularizer : + [2021/10/01 13:16:40] root INFO: coeff : 1e-05 + [2021/10/01 13:16:40] root INFO: name : L2 + W1001 13:16:40.327559 4786 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1 + W1001 13:16:40.331833 4786 device_context.cc:422] device: 0, cuDNN Version: 7.6. 
+ [2021/10/01 13:16:45] root INFO: train with paddle 2.1.2 and device CUDAPlace(0) + {'CELoss': {'weight': 1.0}} + [2021/10/01 13:16:45] root INFO: [Eval][Epoch 0][Iter: 0/23]CELoss: 0.10967, loss: 0.10967, top1: 1.00000, top5: 1.00000, batch_cost: 0.60887s, reader_cost: 0.48905, ips: 210.22458 images/sec + [2021/10/01 13:16:49] root INFO: [Eval][Epoch 0][Iter: 10/23]CELoss: 0.10959, loss: 0.10959, top1: 1.00000, top5: 1.00000, batch_cost: 0.34852s, reader_cost: 0.24066, ips: 367.27212 images/sec + [2021/10/01 13:16:52] root INFO: [Eval][Epoch 0][Iter: 20/23]CELoss: 0.13111, loss: 0.13111, top1: 0.98438, top5: 1.00000, batch_cost: 0.34876s, reader_cost: 0.24073, ips: 367.01294 images/sec + [2021/10/01 13:16:53] root INFO: [Eval][Epoch 0][Avg]CELoss: 0.11529, loss: 0.11529, top1: 0.99724, top5: 1.00000 + + +# **七、模型预测** + + +``` +# 把work/PaddleClas/ppcls/engine/trainer.py里第580行的print删掉并改成下面的样子,方便看结果 +# 别忘了添加(import matplotlib.pyplot as plt和from PIL import Image) + +plt.figure(figsize=(11,16)) +for num, x in enumerate(result): + print(x,end='\n') + imgpath = x['file_name'] + label_names = x['label_names'][0] + img = Image.open(imgpath) + imgname = imgpath.replace('data/asl_alphabet_test/asl_alphabet_test/','') + imgname = imgname.replace('_test.jpg','') + title = 'imgname: ' + imgname + '; predict: ' + label_names + plt.subplot(10,3,num+1) + plt.imshow(img) + plt.title(title) + plt.axis('off') +plt.savefig('work/output/testresults.jpg') +``` + + +```python +#预测效果 +!python work/PaddleClas/tools/infer.py \ + -c work/Config/ABClass_ResNet50_vd.yaml \ + -o Infer.infer_imgs=data/asl_alphabet_test/asl_alphabet_test \ + -o Global.pretrained_model=work/output/ResNet50_vd/best_model + +img = Image.open('work/output/testresults.jpg') +img.show() +``` + + /home/aistudio/work/PaddleClas/ppcls/arch/backbone/model_zoo/vision_transformer.py:15: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import Callable + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/__init__.py:107: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import MutableMapping + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/rcsetup.py:20: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import Iterable, Mapping + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/colors.py:53: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + from collections import Sized + [2021/10/01 14:32:17] root INFO: + =========================================================== + == PaddleClas is powered by PaddlePaddle ! == + =========================================================== + == == + == For more info please go to the following website. 
== + == == + == https://github.com/PaddlePaddle/PaddleClas == + =========================================================== + + [2021/10/01 14:32:17] root INFO: Arch : + [2021/10/01 14:32:17] root INFO: class_num : 29 + [2021/10/01 14:32:17] root INFO: name : ResNet50_vd + [2021/10/01 14:32:17] root INFO: DataLoader : + [2021/10/01 14:32:17] root INFO: Eval : + [2021/10/01 14:32:17] root INFO: dataset : + [2021/10/01 14:32:17] root INFO: cls_label_path : work/List/Eval.txt + [2021/10/01 14:32:17] root INFO: image_root : data/asl_alphabet_train/asl_alphabet_train + [2021/10/01 14:32:17] root INFO: name : ImageNetDataset + [2021/10/01 14:32:17] root INFO: transform_ops : + [2021/10/01 14:32:17] root INFO: DecodeImage : + [2021/10/01 14:32:17] root INFO: channel_first : False + [2021/10/01 14:32:17] root INFO: to_rgb : True + [2021/10/01 14:32:17] root INFO: RandFlipImage : + [2021/10/01 14:32:17] root INFO: flip_code : 1 + [2021/10/01 14:32:17] root INFO: NormalizeImage : + [2021/10/01 14:32:17] root INFO: mean : [0.485, 0.456, 0.406] + [2021/10/01 14:32:17] root INFO: order : + [2021/10/01 14:32:17] root INFO: scale : 1.0/255.0 + [2021/10/01 14:32:17] root INFO: std : [0.229, 0.224, 0.225] + [2021/10/01 14:32:17] root INFO: loader : + [2021/10/01 14:32:17] root INFO: num_workers : 0 + [2021/10/01 14:32:17] root INFO: use_shared_memory : True + [2021/10/01 14:32:17] root INFO: sampler : + [2021/10/01 14:32:17] root INFO: batch_size : 128 + [2021/10/01 14:32:17] root INFO: drop_last : False + [2021/10/01 14:32:17] root INFO: name : DistributedBatchSampler + [2021/10/01 14:32:17] root INFO: shuffle : False + [2021/10/01 14:32:17] root INFO: Train : + [2021/10/01 14:32:17] root INFO: dataset : + [2021/10/01 14:32:17] root INFO: cls_label_path : work/List/Train.txt + [2021/10/01 14:32:17] root INFO: image_root : data/asl_alphabet_train/asl_alphabet_train + [2021/10/01 14:32:17] root INFO: name : ImageNetDataset + [2021/10/01 14:32:17] root INFO: transform_ops : + [2021/10/01 14:32:17] root INFO: DecodeImage : + [2021/10/01 14:32:17] root INFO: channel_first : False + [2021/10/01 14:32:17] root INFO: to_rgb : True + [2021/10/01 14:32:17] root INFO: RandFlipImage : + [2021/10/01 14:32:17] root INFO: flip_code : 1 + [2021/10/01 14:32:17] root INFO: NormalizeImage : + [2021/10/01 14:32:17] root INFO: mean : [0.485, 0.456, 0.406] + [2021/10/01 14:32:17] root INFO: order : + [2021/10/01 14:32:17] root INFO: scale : 1.0/255.0 + [2021/10/01 14:32:17] root INFO: std : [0.229, 0.224, 0.225] + [2021/10/01 14:32:17] root INFO: loader : + [2021/10/01 14:32:17] root INFO: num_workers : 0 + [2021/10/01 14:32:17] root INFO: use_shared_memory : True + [2021/10/01 14:32:17] root INFO: sampler : + [2021/10/01 14:32:17] root INFO: batch_size : 128 + [2021/10/01 14:32:17] root INFO: drop_last : False + [2021/10/01 14:32:17] root INFO: name : DistributedBatchSampler + [2021/10/01 14:32:17] root INFO: shuffle : True + [2021/10/01 14:32:17] root INFO: Global : + [2021/10/01 14:32:17] root INFO: checkpoints : None + [2021/10/01 14:32:17] root INFO: device : gpu + [2021/10/01 14:32:17] root INFO: epochs : 200 + [2021/10/01 14:32:17] root INFO: eval_during_train : True + [2021/10/01 14:32:17] root INFO: eval_interval : 1 + [2021/10/01 14:32:17] root INFO: image_shape : [3, 200, 200] + [2021/10/01 14:32:17] root INFO: output_dir : work/output/ + [2021/10/01 14:32:17] root INFO: pretrained_model : work/output/ResNet50_vd/best_model + [2021/10/01 14:32:17] root INFO: print_batch_step : 10 + [2021/10/01 14:32:17] root INFO: 
save_inference_dir : work/inference + [2021/10/01 14:32:17] root INFO: save_interval : 1 + [2021/10/01 14:32:17] root INFO: use_visualdl : False + [2021/10/01 14:32:17] root INFO: Infer : + [2021/10/01 14:32:17] root INFO: PostProcess : + [2021/10/01 14:32:17] root INFO: class_id_map_file : work/List/IDMap.txt + [2021/10/01 14:32:17] root INFO: name : Topk + [2021/10/01 14:32:17] root INFO: topk : 1 + [2021/10/01 14:32:17] root INFO: batch_size : 28 + [2021/10/01 14:32:17] root INFO: infer_imgs : data/asl_alphabet_test/asl_alphabet_test + [2021/10/01 14:32:17] root INFO: transforms : + [2021/10/01 14:32:17] root INFO: DecodeImage : + [2021/10/01 14:32:17] root INFO: channel_first : False + [2021/10/01 14:32:17] root INFO: to_rgb : True + [2021/10/01 14:32:17] root INFO: NormalizeImage : + [2021/10/01 14:32:17] root INFO: mean : [0.485, 0.456, 0.406] + [2021/10/01 14:32:17] root INFO: order : + [2021/10/01 14:32:17] root INFO: scale : 1.0/255.0 + [2021/10/01 14:32:17] root INFO: std : [0.229, 0.224, 0.225] + [2021/10/01 14:32:17] root INFO: ToCHWImage : None + [2021/10/01 14:32:17] root INFO: Loss : + [2021/10/01 14:32:17] root INFO: Eval : + [2021/10/01 14:32:17] root INFO: CELoss : + [2021/10/01 14:32:17] root INFO: weight : 1.0 + [2021/10/01 14:32:17] root INFO: Train : + [2021/10/01 14:32:17] root INFO: CELoss : + [2021/10/01 14:32:17] root INFO: epsilon : 0.1 + [2021/10/01 14:32:17] root INFO: weight : 1.0 + [2021/10/01 14:32:17] root INFO: Metric : + [2021/10/01 14:32:17] root INFO: Eval : + [2021/10/01 14:32:17] root INFO: TopkAcc : + [2021/10/01 14:32:17] root INFO: topk : [1, 5] + [2021/10/01 14:32:17] root INFO: Train : None + [2021/10/01 14:32:17] root INFO: Optimizer : + [2021/10/01 14:32:17] root INFO: lr : + [2021/10/01 14:32:17] root INFO: learning_rate : 0.01 + [2021/10/01 14:32:17] root INFO: name : Cosine + [2021/10/01 14:32:17] root INFO: momentum : 0.9 + [2021/10/01 14:32:17] root INFO: name : Momentum + [2021/10/01 14:32:17] root INFO: regularizer : + [2021/10/01 14:32:17] root INFO: coeff : 1e-05 + [2021/10/01 14:32:17] root INFO: name : L2 + W1001 14:32:17.844880 10298 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1 + W1001 14:32:17.849160 10298 device_context.cc:422] device: 0, cuDNN Version: 7.6. + [2021/10/01 14:32:22] root INFO: train with paddle 2.1.2 and device CUDAPlace(0) + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/tensor/creation.py:125: DeprecationWarning: `np.object` is a deprecated alias for the builtin `object`. To silence this warning, use `object` by itself. Doing this will not modify any behavior and is safe. 
+ Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations + if data.dtype == np.object: + {'class_ids': [0], 'scores': [0.8845], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/A_test.jpg', 'label_names': ['A']} + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/cbook/__init__.py:2349: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + if isinstance(obj, collections.Iterator): + /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/cbook/__init__.py:2366: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working + return list(data) if isinstance(data, collections.MappingView) else data + {'class_ids': [1], 'scores': [0.89221], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/B_test.jpg', 'label_names': ['B']} + {'class_ids': [2], 'scores': [0.90011], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/C_test.jpg', 'label_names': ['C']} + {'class_ids': [3], 'scores': [0.90496], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/D_test.jpg', 'label_names': ['D']} + {'class_ids': [4], 'scores': [0.89183], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/E_test.jpg', 'label_names': ['E']} + {'class_ids': [5], 'scores': [0.89325], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/F_test.jpg', 'label_names': ['F']} + {'class_ids': [6], 'scores': [0.89952], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/G_test.jpg', 'label_names': ['G']} + {'class_ids': [7], 'scores': [0.90223], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/H_test.jpg', 'label_names': ['H']} + {'class_ids': [8], 'scores': [0.90036], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/I_test.jpg', 'label_names': ['I']} + {'class_ids': [9], 'scores': [0.88778], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/J_test.jpg', 'label_names': ['J']} + {'class_ids': [10], 'scores': [0.89786], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/K_test.jpg', 'label_names': ['K']} + {'class_ids': [11], 'scores': [0.90049], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/L_test.jpg', 'label_names': ['L']} + {'class_ids': [12], 'scores': [0.90034], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/M_test.jpg', 'label_names': ['M']} + {'class_ids': [13], 'scores': [0.90773], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/N_test.jpg', 'label_names': ['N']} + {'class_ids': [14], 'scores': [0.90272], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/O_test.jpg', 'label_names': ['O']} + {'class_ids': [15], 'scores': [0.89565], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/P_test.jpg', 'label_names': ['P']} + {'class_ids': [16], 'scores': [0.92907], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/Q_test.jpg', 'label_names': ['Q']} + {'class_ids': [17], 'scores': [0.9], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/R_test.jpg', 'label_names': ['R']} + {'class_ids': [18], 'scores': [0.88346], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/S_test.jpg', 'label_names': ['S']} + {'class_ids': [19], 'scores': [0.92621], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/T_test.jpg', 'label_names': ['T']} + {'class_ids': [20], 'scores': [0.90095], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/U_test.jpg', 'label_names': ['U']} 
+ {'class_ids': [21], 'scores': [0.88883], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/V_test.jpg', 'label_names': ['V']} + {'class_ids': [22], 'scores': [0.8957], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/W_test.jpg', 'label_names': ['W']} + {'class_ids': [23], 'scores': [0.88814], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/X_test.jpg', 'label_names': ['X']} + {'class_ids': [24], 'scores': [0.90919], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/Y_test.jpg', 'label_names': ['Y']} + {'class_ids': [25], 'scores': [0.90121], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/Z_test.jpg', 'label_names': ['Z']} + {'class_ids': [27], 'scores': [0.88753], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/nothing_test.jpg', 'label_names': ['nothing']} + {'class_ids': [28], 'scores': [0.89806], 'file_name': 'data/asl_alphabet_test/asl_alphabet_test/space_test.jpg', 'label_names': ['space']} + + + +![png](output_27_1.png) diff --git a/examples/ABClass/output_13_0.png b/examples/ABClass/output_13_0.png new file mode 100644 index 000000000..d4973ed49 Binary files /dev/null and b/examples/ABClass/output_13_0.png differ diff --git a/examples/ABClass/output_27_1.png b/examples/ABClass/output_27_1.png new file mode 100644 index 000000000..8ac2cc482 Binary files /dev/null and b/examples/ABClass/output_27_1.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/README.md b/examples/Pedestrian_Detection_and_Tracking/README.md new file mode 100644 index 000000000..206bab12f --- /dev/null +++ b/examples/Pedestrian_Detection_and_Tracking/README.md @@ -0,0 +1,207 @@ +# 人流量统计/人体检测 + +## 1. 项目说明 + +本案例面向人流量统计/人体检测等场景,提供基于PaddleDetection的解决方案,希望通过梳理优化模型精度和性能的思路帮助用户更高效的解决实际问题。 + +本项目AI Studio链接:https://aistudio.baidu.com/aistudio/projectdetail/2421822 + +应用场景:静态场景下的人员计数和动态场景下的人流量统计 + +![demo](./images/demo.png) + +业务难点: + +* 遮挡重识别问题。场景中行人可能比较密集,人与人之间存在遮挡问题。这可能会导致误检、漏检问题。同时,对遮挡后重新出现的行人进行准确的重识别也是一个比较复杂的问题。容易出现ID切换问题。 + +* 行人检测的实时性。在实际应用中,往往对行人检测的处理速度有一定要求。 + + + +## 2. 数据准备 + +### 训练数据集 + +请参照 [数据准备文档](https://github.com/PaddlePaddle/PaddleDetection/blob/release/2.2/docs/tutorials/PrepareMOTDataSet_cn.md) 去下载并准备好所有的数据集,包括 Caltech Pedestrian, CityPersons, CHUK-SYSU, PRW, ETHZ, MOT17和MOT16。训练时,我们采用前六个数据集,共 53694 张已标注好的数据集用于训练。MOT16作为评测数据集。所有的行人都有检测框标签,部分有ID标签。如果您想使用这些数据集,请遵循他们的License。对数据集的详细介绍参见:[数据集介绍](dataset.md) + +### 数据格式 + +上述数据集都遵循以下结构: + +``` +Caltech + |——————images + | └——————00001.jpg + | |—————— ... + | └——————0000N.jpg + └——————labels_with_ids + └——————00001.txt + |—————— ... 
+ └——————0000N.txt +MOT17 + |——————images + | └——————train + | └——————test + └——————labels_with_ids + └——————train +``` + +所有数据集的标注是以统一数据格式提供的。各个数据集中每张图片都有相应的标注文本。给定一个图像路径,可以通过将字符串`images`替换为 `labels_with_ids`并将 `.jpg`替换为`.txt`来生成标注文本路径。在标注文本中,每行都描述一个边界框,格式如下: + +``` +[class] [identity] [x_center] [y_center] [width] [height] +``` + +注意: + +* `class`为`0`,目前仅支持单类别多目标跟踪。 +* `identity`是从`1`到`num_identifies`的整数(`num_identifies`是数据集中不同物体实例的总数),如果此框没有`identity`标注,则为`-1`。 +* `[x_center] [y_center] [width] [height]`是中心点坐标和宽高,它们的值是基于图片的宽度/高度进行标准化的,因此值为从0到1的浮点数。 + +### 数据集目录 + +首先按照以下命令下载`image_lists.zip`并解压放在`dataset/mot`目录下: + +```bash +wget https://dataset.bj.bcebos.com/mot/image_lists.zip +``` + +然后依次下载各个数据集并解压,最终目录为: + +``` +dataset/mot + |——————image_lists + |——————caltech.10k.val + |——————caltech.all + |——————caltech.train + |——————caltech.val + |——————citypersons.train + |——————citypersons.val + |——————cuhksysu.train + |——————cuhksysu.val + |——————eth.train + |——————mot15.train + |——————mot16.train + |——————mot17.train + |——————mot20.train + |——————prw.train + |——————prw.val + |——————Caltech + |——————Cityscapes + |——————CUHKSYSU + |——————ETHZ + |——————MOT15 + |——————MOT16 + |——————MOT17 + |——————PRW +``` + + + +### 调优数据集 + +在进行调优时,我们采用 Caltech Pedestrian, CityPersons, CHUK-SYSU, PRW, ETHZ和MOT17中一半的数据集,使用MOT17另一半数据集作为评测数据集。调优时和训练时使用的数据集不同,主要是因为MOT官网的测试集榜单提交流程比较复杂,这种数据集的使用方式也是学术界慢慢摸索出的做消融实验的方法。调优时使用的训练数据共 51035 张。 + + + +## 3. 模型选择 + +PaddleDetection对于多目标追踪算法主要提供了三种模型,DeepSORT、JDE和FairMOT。 + +- [DeepSORT](https://arxiv.org/abs/1812.00442) (Deep Cosine Metric Learning SORT) 扩展了原有的 [SORT](https://arxiv.org/abs/1703.07402) (Simple Online and Realtime Tracking) 算法,增加了一个CNN模型用于在检测器限定的人体部分图像中提取特征,在深度外观描述的基础上整合外观信息,将检出的目标分配和更新到已有的对应轨迹上即进行一个ReID重识别任务。DeepSORT所需的检测框可以由任意一个检测器来生成,然后读入保存的检测结果和视频图片即可进行跟踪预测。ReID模型此处选择 [PaddleClas](https://github.com/PaddlePaddle/PaddleClas) 提供的`PCB+Pyramid ResNet101`模型。 +- [JDE](https://arxiv.org/abs/1909.12605) (Joint Detection and Embedding) 是在一个单一的共享神经网络中同时学习目标检测任务和embedding任务,并同时输出检测结果和对应的外观embedding匹配的算法。JDE原论文是基于Anchor Base的YOLOv3检测器新增加一个ReID分支学习embedding,训练过程被构建为一个多任务联合学习问题,兼顾精度和速度。 +- [FairMOT](https://arxiv.org/abs/2004.01888) 以Anchor Free的CenterNet检测器为基础,克服了Anchor-Based的检测框架中anchor和特征不对齐问题,深浅层特征融合使得检测和ReID任务各自获得所需要的特征,并且使用低维度ReID特征,提出了一种由两个同质分支组成的简单baseline来预测像素级目标得分和ReID特征,实现了两个任务之间的公平性,并获得了更高水平的实时多目标跟踪精度。 + +综合精度和速度,这里我们选择了FairMOT算法进行人流量统计/人体检测。 + + + +## 4. 模型训练 + +下载PaddleDetection + +```bash +git clone https://github.com/PaddlePaddle/PaddleDetection.git +``` + +**说明:** 本实验使用**PaddleDetection release/2.2**,如遇PaddleDetection更新训练效果出现变动,可尝试下载PaddleDetection 2.2版本进行实验。 + +在训练前先正确安装PaddleDetection所需依赖: + +```bash +cd PaddleDetection/ +pip install -r requirements.txt +``` + +运行如下代码开始训练模型: + +使用两个GPU开启训练 + +```bash +python -m paddle.distributed.launch --log_dir=./fairmot_dla34_30e_1088x608/ --gpus 0,1 tools/train.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml +``` + + + +## 5. 模型评估 + +FairMOT使用单张GPU通过如下命令一键式启动评估: + +```bash +CUDA_VISIBLE_DEVICES=0 python tools/eval_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=output/fairmot_dla34_30e_1088x608/model_final.pdparams +``` + +**注意:** 默认评估的是MOT-16 Train Set数据集,如需换评估数据集可参照以下代码修改`configs/datasets/mot.yml`,修改`data_root`: + +```bash +EvalMOTDataset: + !MOTImageFolder + dataset_dir: dataset/mot + data_root: MOT16/images/train + keep_ori_im: False # set True if save visualization images or video +``` + + + +## 6. 模型优化(进阶) + +具体内容参见[模型优化文档](./improvements.md)。 + + + +## 7. 
模型预测 + +使用单个GPU通过如下命令预测一个视频,并保存为视频 + +```bash +# 预测一个视频 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --video_file={your video name}.mp4 --frame_rate=20 --save_videos +``` + +使用单个GPU通过如下命令预测一个图片文件夹,并保存为视频 + +```bash +# 预测一个图片文件夹 +CUDA_VISIBLE_DEVICES=0 python tools/infer_mot.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams --image_dir={your infer images folder} --save_videos +``` + +**注意:** 请先确保已经安装了[ffmpeg](https://ffmpeg.org/ffmpeg.html), Linux(Ubuntu)平台可以直接用以下命令安装:`apt-get update && apt-get install -y ffmpeg`。`--frame_rate`表示视频的帧率,表示每秒抽取多少帧,可以自行设置,默认为-1表示会使用OpenCV读取的视频帧率。 + + + +## 8. 模型导出 + +```bash +CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c configs/mot/fairmot/fairmot_dla34_30e_1088x608.yml -o weights=https://paddledet.bj.bcebos.com/models/mot/fairmot_dla34_30e_1088x608.pdparams +``` + + + +## 9. 模型上线选择 + + + +## 引用 + + + diff --git a/examples/Pedestrian_Detection_and_Tracking/code/centernet_fpn_attention.py b/examples/Pedestrian_Detection_and_Tracking/code/centernet_fpn_attention.py new file mode 100644 index 000000000..c6eca57e4 --- /dev/null +++ b/examples/Pedestrian_Detection_and_Tracking/code/centernet_fpn_attention.py @@ -0,0 +1,303 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +import numpy as np +import math +import paddle +import paddle.nn as nn +from paddle.nn.initializer import KaimingUniform +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import ConvNormLayer +from ..shape_spec import ShapeSpec + + +import paddle.nn.functional as F +# attention + +# SGE attention +class BasicConv(nn.Layer): + def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True, bn=True, bias_attr=False): + super(BasicConv, self).__init__() + self.out_channels = out_planes + self.conv = nn.Conv2D(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups, bias_attr=bias_attr) + self.bn = nn.BatchNorm2D(out_planes, epsilon=1e-5, momentum=0.01, weight_attr=False, bias_attr=False) if bn else None + self.relu = nn.ReLU() if relu else None + + def forward(self, x): + x = self.conv(x) + if self.bn is not None: + x = self.bn(x) + if self.relu is not None: + x = self.relu(x) + return x + + +class ChannelPool(nn.Layer): + def forward(self, x): + return paddle.concat((paddle.max(x,1).unsqueeze(1), paddle.mean(x,1).unsqueeze(1)), axis=1) + + +class SpatialGate(nn.Layer): + def __init__(self): + super(SpatialGate, self).__init__() + kernel_size = 7 + self.compress = ChannelPool() + self.spatial = BasicConv(2, 1, kernel_size, stride=1, padding=(kernel_size-1) // 2, relu=False) + print(f'************************************ use SpatialGate ************************************') + + def forward(self, x): + x_compress = self.compress(x) + x_out = self.spatial(x_compress) + scale = F.sigmoid(x_out) # broadcasting + return x * scale + + +# used by SANN_Attention +def autopad(k, p=None): # kernel, padding + # Pad to 'same' + if p is None: + p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad + return p + + +class Conv(nn.Layer): + # Standard convolution + def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True): # ch_in, ch_out, kernel, stride, padding, groups + super(Conv, self).__init__() + self.conv = nn.Conv2D(c1, c2, k, s, autopad(k, p), groups=g, bias_attr=False) + self.bn = nn.BatchNorm2D(c2) + self.act = nn.LeakyReLU(0.1) if act else nn.Identity() + + def forward(self, x): + return self.act(self.bn(self.conv(x))) + + def fuseforward(self, x): + return self.act(self.conv(x)) + + +class SANN_Attention(nn.Layer): + def __init__(self, k_size = 3, ch = 64, s_state = False, c_state = False): + super(SANN_Attention, self).__init__() + print(f'************************************use SANN_Attention s_state => {s_state} -- c_state => {c_state}') + self.avg_pool = nn.AdaptiveAvgPool2D(1) + self.max_pool = nn.AdaptiveAvgPool2D(1) + self.sigmoid = nn.Sigmoid() + self.s_state = s_state + self.c_state = c_state + + if c_state: + self.c_attention = nn.Sequential(nn.Conv1D(1, 1, kernel_size=k_size, padding=(k_size - 1) // 2, bias_attr=False), + nn.LayerNorm([1, ch]), + nn.LeakyReLU(0.3), + nn.Linear(ch, ch, bias_attr=False)) + + if s_state: + self.conv_s = nn.Sequential(Conv(ch, ch // 4, k=1)) + self.s_attention = nn.Conv2D(2, 1, 7, padding=3, bias_attr=False) + + def forward(self, x): + # x: input features with shape [b, c, h, w] + b, c, h, w = x.shape + + # channel_attention + if self.c_state: + y_avg = self.avg_pool(x) + y_max = self.max_pool(x) + y_c = self.c_attention(y_avg.squeeze(-1).transpose((0,2,1))).transpose((0,2,1)).unsqueeze(-1)+\ + self.c_attention(y_max.squeeze(-1).transpose((0,2,1))).transpose((0,2,1)).unsqueeze(-1) + y_c = 
self.sigmoid(y_c) + + #spatial_attention + if self.s_state: + x_s = self.conv_s(x) + avg_out = paddle.mean(x_s, axis=1, keepdim=True) + max_out = paddle.max(x_s, axis=1, keepdim=True) + y_s = paddle.concat([avg_out, max_out], axis=1) + y_s = self.sigmoid(self.s_attention(y_s)) + + if self.c_state and self.s_state: + y = x * y_s * y_c + x + elif self.c_state: + y = x * y_c + x + elif self.s_state: + y = x * y_s + x + else: + y = x + return y + + +def fill_up_weights(up): + weight = up.weight + f = math.ceil(weight.shape[2] / 2) + c = (2 * f - 1 - f % 2) / (2. * f) + for i in range(weight.shape[2]): + for j in range(weight.shape[3]): + weight[0, 0, i, j] = \ + (1 - math.fabs(i / f - c)) * (1 - math.fabs(j / f - c)) + for c in range(1, weight.shape[0]): + weight[c, 0, :, :] = weight[0, 0, :, :] + + +class IDAUp(nn.Layer): + def __init__(self, ch_ins, ch_out, up_strides, dcn_v2=True): + super(IDAUp, self).__init__() + for i in range(1, len(ch_ins)): + ch_in = ch_ins[i] + up_s = int(up_strides[i]) + proj = nn.Sequential( + ConvNormLayer( + ch_in, + ch_out, + filter_size=3, + stride=1, + use_dcn=dcn_v2, + bias_on=dcn_v2, + norm_decay=None, + dcn_lr_scale=1., + dcn_regularizer=None), + nn.ReLU()) + node = nn.Sequential( + ConvNormLayer( + ch_out, + ch_out, + filter_size=3, + stride=1, + use_dcn=dcn_v2, + bias_on=dcn_v2, + norm_decay=None, + dcn_lr_scale=1., + dcn_regularizer=None), + nn.ReLU()) + + param_attr = paddle.ParamAttr(initializer=KaimingUniform()) + up = nn.Conv2DTranspose( + ch_out, + ch_out, + kernel_size=up_s * 2, + weight_attr=param_attr, + stride=up_s, + padding=up_s // 2, + groups=ch_out, + bias_attr=False) + # TODO: uncomment fill_up_weights + #fill_up_weights(up) + setattr(self, 'proj_' + str(i), proj) + setattr(self, 'up_' + str(i), up) + setattr(self, 'node_' + str(i), node) + + def forward(self, inputs, start_level, end_level): + for i in range(start_level + 1, end_level): + upsample = getattr(self, 'up_' + str(i - start_level)) + project = getattr(self, 'proj_' + str(i - start_level)) + + inputs[i] = project(inputs[i]) + inputs[i] = upsample(inputs[i]) + node = getattr(self, 'node_' + str(i - start_level)) + inputs[i] = node(paddle.add(inputs[i], inputs[i - 1])) + + +class DLAUp(nn.Layer): + def __init__(self, start_level, channels, scales, ch_in=None, dcn_v2=True): + super(DLAUp, self).__init__() + self.start_level = start_level + if ch_in is None: + ch_in = channels + self.channels = channels + channels = list(channels) + scales = np.array(scales, dtype=int) + for i in range(len(channels) - 1): + j = -i - 2 + setattr( + self, + 'ida_{}'.format(i), + IDAUp( + ch_in[j:], + channels[j], + scales[j:] // scales[j], + dcn_v2=dcn_v2)) + scales[j + 1:] = scales[j] + ch_in[j + 1:] = [channels[j] for _ in channels[j + 1:]] + + def forward(self, inputs): + out = [inputs[-1]] # start with 32 + for i in range(len(inputs) - self.start_level - 1): + ida = getattr(self, 'ida_{}'.format(i)) + ida(inputs, len(inputs) - i - 2, len(inputs)) + out.insert(0, inputs[-1]) + return out + + +@register +@serializable +class CenterNetDLAFPN(nn.Layer): + """ + Args: + in_channels (list): number of input feature channels from backbone. 
+ [16, 32, 64, 128, 256, 512] by default, means the channels of DLA-34 + down_ratio (int): the down ratio from images to heatmap, 4 by default + last_level (int): the last level of input feature fed into the upsamplng block + out_channel (int): the channel of the output feature, 0 by default means + the channel of the input feature whose down ratio is `down_ratio` + dcn_v2 (bool): whether use the DCNv2, true by default + + """ + + def __init__(self, + in_channels, + down_ratio=4, + last_level=5, + out_channel=0, + dcn_v2=True): + super(CenterNetDLAFPN, self).__init__() + self.first_level = int(np.log2(down_ratio)) + self.down_ratio = down_ratio + self.last_level = last_level + scales = [2**i for i in range(len(in_channels[self.first_level:]))] + self.dla_up = DLAUp( + self.first_level, + in_channels[self.first_level:], + scales, + dcn_v2=dcn_v2) + self.out_channel = out_channel + if out_channel == 0: + self.out_channel = in_channels[self.first_level] + self.ida_up = IDAUp( + in_channels[self.first_level:self.last_level], + self.out_channel, + [2**i for i in range(self.last_level - self.first_level)], + dcn_v2=dcn_v2) + + self.attention = SpatialGate() + #self.attention = SANN_Attention(c_state = False, s_state = True) # spatial_attention + + @classmethod + def from_config(cls, cfg, input_shape): + return {'in_channels': [i.channels for i in input_shape]} + + def forward(self, body_feats): + dla_up_feats = self.dla_up(body_feats) + + ida_up_feats = [] + for i in range(self.last_level - self.first_level): + ida_up_feats.append(dla_up_feats[i].clone()) + + self.ida_up(ida_up_feats, 0, len(ida_up_feats)) + + feat = ida_up_feats[-1] + feat = self.attention(feat) + return feat + + @property + def out_shape(self): + return [ShapeSpec(channels=self.out_channel, stride=self.down_ratio)] diff --git a/examples/Pedestrian_Detection_and_Tracking/code/centernet_head_iou_head.py b/examples/Pedestrian_Detection_and_Tracking/code/centernet_head_iou_head.py new file mode 100644 index 000000000..f67c84439 --- /dev/null +++ b/examples/Pedestrian_Detection_and_Tracking/code/centernet_head_iou_head.py @@ -0,0 +1,232 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
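+
+# 补充注释(新增说明,非原始文件内容):本文件是对 CenterNetHead 检测头的一个修改示例:
+# 在原有 heatmap / size / offset 分支之外新增 iou 分支,并在 get_loss 中利用 gt 框中心与
+# 预测的 ltrb 尺寸还原出预测框,用 GIoULoss 计算 iou_loss,按 weights['iou'] 加权并入
+# det_loss,同样属于该案例"模型优化"部分的改进尝试之一。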
+ +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from paddle.nn.initializer import KaimingUniform +from ppdet.core.workspace import register +from ppdet.modeling.losses import CTFocalLoss, GIoULoss, IouLoss + + +class ConvLayer(nn.Layer): + def __init__(self, + ch_in, + ch_out, + kernel_size, + stride=1, + padding=0, + dilation=1, + groups=1, + bias=False): + super(ConvLayer, self).__init__() + bias_attr = False + fan_in = ch_in * kernel_size**2 + bound = 1 / math.sqrt(fan_in) + param_attr = paddle.ParamAttr(initializer=KaimingUniform()) + if bias: + bias_attr = paddle.ParamAttr( + initializer=nn.initializer.Uniform(-bound, bound)) + self.conv = nn.Conv2D( + in_channels=ch_in, + out_channels=ch_out, + kernel_size=kernel_size, + stride=stride, + padding=padding, + dilation=dilation, + groups=groups, + weight_attr=param_attr, + bias_attr=bias_attr) + + def forward(self, inputs): + out = self.conv(inputs) + + return out + + +@register +class CenterNetHead(nn.Layer): + """ + Args: + in_channels (int): the channel number of input to CenterNetHead. + num_classes (int): the number of classes, 80 by default. + head_planes (int): the channel number in all head, 256 by default. + heatmap_weight (float): the weight of heatmap loss, 1 by default. + regress_ltrb (bool): whether to regress left/top/right/bottom or + width/height for a box, true by default + size_weight (float): the weight of box size loss, 0.1 by default. + offset_weight (float): the weight of center offset loss, 1 by default. + + """ + + __shared__ = ['num_classes'] + + def __init__(self, + in_channels, + num_classes=80, + head_planes=256, + heatmap_weight=1, + regress_ltrb=True, + size_weight=0.1, + offset_weight=1): + super(CenterNetHead, self).__init__() + self.weights = { + 'heatmap': heatmap_weight, + 'size': size_weight, + 'iou': size_weight, + 'offset': offset_weight + } + self.heatmap = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + num_classes, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + self.heatmap[2].conv.bias[:] = -2.19 + self.size = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + 4 if regress_ltrb else 2, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + self.iou = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, + 4 if regress_ltrb else 2, + kernel_size=1, + stride=1, + padding=0, + bias=True)) + self.offset = nn.Sequential( + ConvLayer( + in_channels, head_planes, kernel_size=3, padding=1, bias=True), + nn.ReLU(), + ConvLayer( + head_planes, 2, kernel_size=1, stride=1, padding=0, bias=True)) + self.focal_loss = CTFocalLoss() + self.iou_loss = GIoULoss(reduction='sum') + + @classmethod + def from_config(cls, cfg, input_shape): + if isinstance(input_shape, (list, tuple)): + input_shape = input_shape[0] + return {'in_channels': input_shape.channels} + + def forward(self, feat, inputs): + heatmap = self.heatmap(feat) + size = self.size(feat) + iou = self.iou(feat) + offset = self.offset(feat) + if self.training: + loss = self.get_loss(heatmap, size, iou, offset, self.weights, inputs) + return loss + else: + heatmap = F.sigmoid(heatmap) + return {'heatmap': heatmap, 'size': size, 'iou':iou, 'offset': offset} + + def get_loss(self, heatmap, size, iou, offset, weights, inputs): + heatmap_target = 
inputs['heatmap'] + size_target = inputs['size'] + offset_target = inputs['offset'] + index = inputs['index'] + mask = inputs['index_mask'] + heatmap = paddle.clip(F.sigmoid(heatmap), 1e-4, 1 - 1e-4) + heatmap_loss = self.focal_loss(heatmap, heatmap_target) + + size = paddle.transpose(size, perm=[0, 2, 3, 1]) + size_n, size_h, size_w, size_c = size.shape + size = paddle.reshape(size, shape=[size_n, -1, size_c]) + index = paddle.unsqueeze(index, 2) + batch_inds = list() + for i in range(size_n): + batch_ind = paddle.full( + shape=[1, index.shape[1], 1], fill_value=i, dtype='int64') + batch_inds.append(batch_ind) + batch_inds = paddle.concat(batch_inds, axis=0) + index = paddle.concat(x=[batch_inds, index], axis=2) + pos_size = paddle.gather_nd(size, index=index) + mask = paddle.unsqueeze(mask, axis=2) + size_mask = paddle.expand_as(mask, pos_size) + size_mask = paddle.cast(size_mask, dtype=pos_size.dtype) + pos_num = size_mask.sum() + size_mask.stop_gradient = True + size_target.stop_gradient = True + size_loss = F.l1_loss( + pos_size * size_mask, size_target * size_mask, reduction='sum') + size_loss = size_loss / (pos_num + 1e-4) + + ### giou loss + iou = paddle.transpose(iou, perm=[0, 2, 3, 1]) + iou_n, iou_h, iou_w, iou_c = iou.shape + iou = paddle.reshape(iou, shape=[iou_n, -1, iou_c]) + pos_iou = paddle.gather_nd(iou, index=index) + iou_mask = paddle.expand_as(mask, pos_iou) + iou_mask = paddle.cast(iou_mask, dtype=pos_iou.dtype) + pos_num = iou_mask.sum() + iou_mask.stop_gradient = True + gt_bbox_xys = inputs['bbox_xys'] + gt_bbox_xys.stop_gradient = True + centers_x = (gt_bbox_xys[:,:,0:1] + gt_bbox_xys[:,:,2:3]) / 2.0 + centers_y = (gt_bbox_xys[:,:,1:2] + gt_bbox_xys[:,:,3:4]) / 2.0 + x1 = centers_x - pos_size[:,:,0:1] + y1 = centers_y - pos_size[:,:,1:2] + x2 = centers_x + pos_size[:,:,2:3] + y2 = centers_y + pos_size[:,:,3:4] + pred_boxes = paddle.concat([x1, y1, x2, y2], axis=-1) + + iou_loss = self.iou_loss( + pred_boxes * iou_mask, + gt_bbox_xys * iou_mask, + iou_weight=iou_mask, + loc_reweight=None) + iou_loss = iou_loss / (pos_num + 1e-4) + + offset = paddle.transpose(offset, perm=[0, 2, 3, 1]) + offset_n, offset_h, offset_w, offset_c = offset.shape + offset = paddle.reshape(offset, shape=[offset_n, -1, offset_c]) + pos_offset = paddle.gather_nd(offset, index=index) + offset_mask = paddle.expand_as(mask, pos_offset) + offset_mask = paddle.cast(offset_mask, dtype=pos_offset.dtype) + pos_num = offset_mask.sum() + offset_mask.stop_gradient = True + offset_target.stop_gradient = True + offset_loss = F.l1_loss( + pos_offset * offset_mask, + offset_target * offset_mask, + reduction='sum') + offset_loss = offset_loss / (pos_num + 1e-4) + + det_loss = weights['heatmap'] * heatmap_loss + weights['size'] * size_loss + weights['offset'] * offset_loss + weights['iou'] * iou_loss + + return { + 'det_loss': det_loss, + 'heatmap_loss': heatmap_loss, + 'size_loss': size_loss, + 'iou_loss': iou_loss, + 'offset_loss': offset_loss + } diff --git a/examples/Pedestrian_Detection_and_Tracking/code/dla_backbones.py b/examples/Pedestrian_Detection_and_Tracking/code/dla_backbones.py new file mode 100644 index 000000000..00a238f95 --- /dev/null +++ b/examples/Pedestrian_Detection_and_Tracking/code/dla_backbones.py @@ -0,0 +1,295 @@ +# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import math +import paddle +import paddle.nn as nn +import paddle.nn.functional as F +from ppdet.core.workspace import register, serializable +from ppdet.modeling.layers import ConvNormLayer +from ..shape_spec import ShapeSpec + +DLA_cfg = { + 34: ([1, 1, 1, 2, 2, 1], [16, 32, 64, 128, 256, 512]), + 46: ([1, 1, 1, 2, 2, 1], [16, 32, 64, 64, 128, 256]), + 60: ([1, 1, 1, 2, 3, 1], [16, 32, 128, 256, 512, 1024]), + 102: ([1, 1, 1, 3, 4, 1], [16, 32, 128, 256, 512, 1024]) + } + + +class BasicBlock(nn.Layer): + def __init__(self, ch_in, ch_out, stride=1): + super(BasicBlock, self).__init__() + self.conv1 = ConvNormLayer( + ch_in, + ch_out, + filter_size=3, + stride=stride, + bias_on=False, + norm_decay=None) + self.conv2 = ConvNormLayer( + ch_out, + ch_out, + filter_size=3, + stride=1, + bias_on=False, + norm_decay=None) + + def forward(self, inputs, residual=None): + if residual is None: + residual = inputs + + out = self.conv1(inputs) + out = F.relu(out) + out = self.conv2(out) + out = paddle.add(x=out, y=residual) + out = F.relu(out) + + return out + +class Bottleneck(nn.Layer): + expansion = 2 + + def __init__(self, ch_in, ch_out, stride=1, base_width=64, cardinality=1): + super(Bottleneck, self).__init__() + self.stride = stride + mid_planes = int( + math.floor(ch_out * (base_width / 64)) * cardinality + ) + mid_planes = mid_planes // self.expansion + + self.conv1 = ConvNormLayer( + ch_in, + mid_planes, + filter_size=1, + stride=1, + bias_on=False, + norm_decay=None) + self.conv2 = ConvNormLayer( + mid_planes, + mid_planes, + filter_size=3, + stride=stride, + bias_on=False, + norm_decay=None) + self.conv3 = ConvNormLayer( + mid_planes, + ch_out, + filter_size=1, + stride=1, + bias_on=False, + norm_decay=None) + + def forward(self, inputs, residual=True): + if residual is None: + residual = inputs + out = self.conv1(inputs) + out = F.relu(out) + out = self.conv2(out) + out = F.relu(out) + out = self.conv3(out) + out += residual + out = F.relu(out) + + return out + + +class Root(nn.Layer): + def __init__(self, ch_in, ch_out, kernel_size, residual): + super(Root, self).__init__() + self.conv = ConvNormLayer( + ch_in, + ch_out, + filter_size=1, + stride=1, + bias_on=False, + norm_decay=None) + self.residual = residual + + def forward(self, inputs): + children = inputs + out = self.conv(paddle.concat(inputs, axis=1)) + if self.residual: + out = paddle.add(x=out, y=children[0]) + out = F.relu(out) + + return out + + +class Tree(nn.Layer): + def __init__(self, + level, + block, + ch_in, + ch_out, + stride=1, + level_root=False, + root_dim=0, + root_kernel_size=1, + root_residual=False): + super(Tree, self).__init__() + if root_dim == 0: + root_dim = 2 * ch_out + if level_root: + root_dim += ch_in + if level == 1: + self.tree1 = block(ch_in, ch_out, stride) + self.tree2 = block(ch_out, ch_out, 1) + else: + self.tree1 = Tree( + level - 1, + block, + ch_in, + ch_out, + stride, + root_dim=0, + root_kernel_size=root_kernel_size, + root_residual=root_residual) + self.tree2 = Tree( + level - 1, + block, + ch_out, + ch_out, + 1, + root_dim=root_dim + ch_out, + 
root_kernel_size=root_kernel_size, + root_residual=root_residual) + + if level == 1: + self.root = Root(root_dim, ch_out, root_kernel_size, root_residual) + self.level_root = level_root + self.root_dim = root_dim + self.downsample = None + self.project = None + self.level = level + if stride > 1: + self.downsample = nn.MaxPool2D(stride, stride=stride) + if ch_in != ch_out: + self.project = ConvNormLayer( + ch_in, + ch_out, + filter_size=1, + stride=1, + bias_on=False, + norm_decay=None) + + def forward(self, x, residual=None, children=None): + children = [] if children is None else children + bottom = self.downsample(x) if self.downsample else x + residual = self.project(bottom) if self.project else bottom + if self.level_root: + children.append(bottom) + x1 = self.tree1(x, residual) + if self.level == 1: + x2 = self.tree2(x1) + x = self.root([x2, x1] + children) + else: + children.append(x1) + x = self.tree2(x1, children=children) + return x + + +@register +@serializable +class DLA(nn.Layer): + """ + DLA, see https://arxiv.org/pdf/1707.06484.pdf + + Args: + depth (int): DLA depth, should be 34. + residual_root (bool): whether use a reidual layer in the root block + + """ + + def __init__(self, depth=34, residual_root=False): + super(DLA, self).__init__() + levels, channels = DLA_cfg[depth] + if depth == 34: + block = BasicBlock + if depth == 46 or depth == 60 or depth == 102: + block = Bottleneck + self.channels = channels + self.base_layer = nn.Sequential( + ConvNormLayer( + 3, + channels[0], + filter_size=7, + stride=1, + bias_on=False, + norm_decay=None), + nn.ReLU()) + self.level0 = self._make_conv_level(channels[0], channels[0], levels[0]) + self.level1 = self._make_conv_level( + channels[0], channels[1], levels[1], stride=2) + self.level2 = Tree( + levels[2], + block, + channels[1], + channels[2], + 2, + level_root=False, + root_residual=residual_root) + self.level3 = Tree( + levels[3], + block, + channels[2], + channels[3], + 2, + level_root=True, + root_residual=residual_root) + self.level4 = Tree( + levels[4], + block, + channels[3], + channels[4], + 2, + level_root=True, + root_residual=residual_root) + self.level5 = Tree( + levels[5], + block, + channels[4], + channels[5], + 2, + level_root=True, + root_residual=residual_root) + + def _make_conv_level(self, ch_in, ch_out, conv_num, stride=1): + modules = [] + for i in range(conv_num): + modules.extend([ + ConvNormLayer( + ch_in, + ch_out, + filter_size=3, + stride=stride if i == 0 else 1, + bias_on=False, + norm_decay=None), nn.ReLU() + ]) + ch_in = ch_out + return nn.Sequential(*modules) + + @property + def out_shape(self): + return [ShapeSpec(channels=self.channels[i]) for i in range(6)] + + def forward(self, inputs): + outs = [] + im = inputs['image'] + feats = self.base_layer(im) + for i in range(6): + feats = getattr(self, 'level{}'.format(i))(feats) + outs.append(feats) + + return outs diff --git a/examples/Pedestrian_Detection_and_Tracking/dataset.md b/examples/Pedestrian_Detection_and_Tracking/dataset.md new file mode 100644 index 000000000..01492e96a --- /dev/null +++ b/examples/Pedestrian_Detection_and_Tracking/dataset.md @@ -0,0 +1,50 @@ +# 数据集介绍 + +**Caltech Pedestrian** + +Caltech Pedestrain 数据集由加州理工提供、由固定在在城市环境中常规行驶的车辆上的摄像头采集得到。数据集包含约10小时的 640x480 30Hz 视频,其中标注了约250,000帧(约137分钟的片段)中的350,000个边界框和2300个行人。更多信息可参考:[Caltech Pedestrain Detection Benchmark](http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/) + +![caltech dataset](./images/dataset/caltech.png) + + + +**CityPersons** + +CityPersons 
数据集是基于CityScapes数据集在行人检测领域专门建立的数据集,它选取了CityScapes 中5000张精标图片,并对其中的行人进行边界框标注。其中训练集包含2975张图片,验证集包含500张,测试集包含1575张。图片中行人的平均数量为7人,标注提供全身标注和可视区域标注。更多信息可参考:[CityPersons](https://github.com/cvgroup-njust/CityPersons) + +![CityPersons](./images/dataset/citypersons.png) + +**CUHK-SYSU** + +CUHK-SYSU 是一个大规模的人员搜索基准数据集,包含18184张图像和8432个行人,以及99,809个标注好的边界框。根据图像来源,数据集可分为在街道场景下采集和影视剧中采集两部分。在街道场景下,图像通过手持摄像机采集,包含数百个场景,并尝试尽可能的包含不同的视角、光线、分辨率、遮挡和背景等。另一部分数据集采集自影视剧,因为它们可以提供更加多样化的场景和更具挑战性的视角。 + +该数据集为行人检测和人员重识别提供注释。每个查询人会出现在至少两个图像中,并且每个图像可包含多个查询人和更多的其他人员。数据集被划分为训练集和测试集。训练集包含11206张图片和5532个查询人,测试集包含6978张图片和2900个查询人。更多信息可参考:[End-to-End Deep Learning for Person Search](http://www.ee.cuhk.edu.hk/~xgwang/PS/dataset.html) + +![CUHK-SYSU](./images/dataset/cuhk_sysu.png) + +**PRW** + +PRW (Person Re-identification in the Wild) 是一个人员重识别数据集。该数据集采集于清华大学,通过六个摄像机,采集共10小时的视频。数据集被分为训练、验证和测试集。训练集包含5134帧和482个ID,验证集共570帧和482个ID,测试集则包含6112帧和450个ID。每帧中出现的所有行人都会被标注边界框,同时分配一个ID。更多信息可参考:[PRW](https://github.com/liangzheng06/PRW-baseline) + +![prw](./images/dataset/prw.png) + +**ETHZ** + +ETHZ 数据集由一对车载的AVT Marlins F033C摄像头拍摄采集,分辨率为 640x480,帧率为13-14 fps。数据集给出原始图像、标定信息和行人标注信息。更多信息可参考:[ETHZ](https://data.vision.ee.ethz.ch/cvl/aess/dataset/) + +![ETHZ](./images/dataset/ethz.png) + + + +**MOT16** + +MOT16数据集是在2016年提出的用于衡量多目标跟踪检测和跟踪方法标准的数据集,专门用于行人跟踪。其主要标注目标为移动或静止的行人与行进中的车辆。MOT16基于MOT15添加了更细化的标注和更多的边界框,它拥有更加丰富的画面、不同拍摄视角及不同的天气情况。MOT16数据集共有14个视频,其中7个为带有标注的训练集,7个为测试集。它因为提供了标注好的检测结果,因此可以免去目标检测部分,更加关注在目标跟踪部分。更多信息可参考:[MOT16](https://motchallenge.net/data/MOT16/) + +![mot16](./images/dataset/mot16.png) + + + +**MOT17** + +MOT17与MOT16数据集相同,但标注更为准确。更多信息可参考:[MOT17](https://motchallenge.net/data/MOT17/) + diff --git a/examples/Pedestrian_Detection_and_Tracking/images/dataset/caltech.png b/examples/Pedestrian_Detection_and_Tracking/images/dataset/caltech.png new file mode 100644 index 000000000..2680ff3ee Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/dataset/caltech.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/dataset/citypersons.png b/examples/Pedestrian_Detection_and_Tracking/images/dataset/citypersons.png new file mode 100644 index 000000000..2f1faf6ce Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/dataset/citypersons.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/dataset/cuhk_sysu.png b/examples/Pedestrian_Detection_and_Tracking/images/dataset/cuhk_sysu.png new file mode 100644 index 000000000..d3f051c75 Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/dataset/cuhk_sysu.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/dataset/ethz.png b/examples/Pedestrian_Detection_and_Tracking/images/dataset/ethz.png new file mode 100644 index 000000000..4214b400f Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/dataset/ethz.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/dataset/mot16.png b/examples/Pedestrian_Detection_and_Tracking/images/dataset/mot16.png new file mode 100644 index 000000000..677b69ec6 Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/dataset/mot16.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/dataset/prw.png b/examples/Pedestrian_Detection_and_Tracking/images/dataset/prw.png new file mode 100644 index 000000000..95676ee18 Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/dataset/prw.png differ diff --git 
a/examples/Pedestrian_Detection_and_Tracking/images/demo.png b/examples/Pedestrian_Detection_and_Tracking/images/demo.png new file mode 100644 index 000000000..6270f947a Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/demo.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/optimization/cutmix.png b/examples/Pedestrian_Detection_and_Tracking/images/optimization/cutmix.png new file mode 100644 index 000000000..7dd679779 Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/optimization/cutmix.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/optimization/dcn.png b/examples/Pedestrian_Detection_and_Tracking/images/optimization/dcn.png new file mode 100644 index 000000000..91056995a Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/optimization/dcn.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/images/optimization/dla.png b/examples/Pedestrian_Detection_and_Tracking/images/optimization/dla.png new file mode 100644 index 000000000..0bf77782f Binary files /dev/null and b/examples/Pedestrian_Detection_and_Tracking/images/optimization/dla.png differ diff --git a/examples/Pedestrian_Detection_and_Tracking/improvements.md b/examples/Pedestrian_Detection_and_Tracking/improvements.md new file mode 100644 index 000000000..3e4c4e68c --- /dev/null +++ b/examples/Pedestrian_Detection_and_Tracking/improvements.md @@ -0,0 +1,275 @@ +## 6. 模型优化(进阶) + +### 6.1 精度优化 + +本小节侧重展示在模型优化过程中,提升模型精度的思路。在这些思路中,有些会对精度有所提升,有些没有。在其他人流量统计/人体检测场景中,可以根据实际情况尝试如下策略,不同的场景下可能会有不同的效果。 + +#### (1) 基线模型选择 + +本案例采用FairMOT模型作为基线模型,其骨干网络选择是DLA34。基线模型共有三种: + +1)训练基于NVIDIA Tesla V100 32G 2GPU,batch size = 6,使用Adam优化器,模型使用CrowdHuman数据集进行预训练; + +2)训练基于NVIDIA Tesla V100 32G 4GPU,batch size = 8,使用Momentum优化器,模型使用CrowdHuman数据集进行预训练; + +3)训练基于NVIDIA Tesla V100 32G 4GPU,batch size = 8,使用Momentum优化器,模型使用ImageNet数据集进行预训练。 + +模型优化时使用的数据集,参见 `调优数据集`。 + +| 模型 | MOTA | 推理速度 | +| ------------------------------------------------------ | ---- | -------- | +| baseline (dla34 2gpu bs6 adam lr=0.0001) | 70.9 | 15.600 | +| baseline (dla34 4gpu bs8 momentum) | 67.5 | 15.291 | +| baseline (dla34 4gpu bs8 momentum + imagenet_pretrain) | 64.3 | 15.314 | + + + +#### (2) 数据增强 + +**增加cutmix** + +下图中展示了三种数据增强的方式: + +* Mixup: 将随机两幅图像以一定的全值叠加构成新的图像; +* Cutout:将图像中随机区域剪裁掉,用0像素值来填充; +* CutMix:将一张图像中的随机区域剪裁掉,并随机选取另一张图片,用其对应区域中的像素来填充剪裁掉的部分。 + +![cutmix](./images/optimization/cutmix.png) + +相比于Mixup和Cutout,CutMix在图像分类和目标检测任务上都有更好的效果。因为CutMix要求模型从局部识别对象,可以进一步增强模型定位能力。 + +实现上,可以通过修改 `configs/mot/fairmot/__base__/fairmot_reader_1088x608.yml`,加入如下代码,实现CutMix数据增强: + +```yaml +TrainReader: + inputs_def: + image_shape: [3, 608, 1088] + sample_transforms: + - Decode: {} + - RGBReverse: {} + - AugmentHSV: {} + - LetterBoxResize: {target_size: [608, 1088]} + - MOTRandomAffine: {reject_outside: False} + - RandomFlip: {} + + - Cutmix: {} + + - BboxXYXY2XYWH: {} + - NormalizeBox: {} + - NormalizeImage: {mean: [0, 0, 0], std: [1, 1, 1]} + - RGBReverse: {} + - Permute: {} +``` + +实验结果: + +| 模型 | MOTA | 推理速度 | +| -------------------------------- | ---- | -------- | +| dla34 4gpu bs8 momentum + cutmix | 67.7 | 15.528 | + +在baseline中加入cutmix,模型MOTA提升0.2%。 + + + +#### (3) 可变形卷积 + +可变形卷积(Deformable Convolution Network, DCN)顾名思义就是卷积的位置是可变形的,并非在传统的 $N \times N$ 网格上做卷积,这样的好处就是更准确地提取到我们想要的特征(传统的卷积仅仅只能提取到矩形框的特征),通过一张图我们可以更直观地了解: + +![dcn](./images/optimization/dcn.png) + +在上面这张图里面,左边传统的卷积显然没有提取到完整绵羊的特征,而右边的可变形卷积则提取到了完整的不规则绵羊的特征。本实验在 CenterNet 
head 中加入了DCN,具体实现方法为:使用 `code/centernet_head_dcn.py` 中的代码替换 `ppdet/modeling/heads/centernet_head.py` 中的代码。
+
+实验结果:
+
+| 模型                          | MOTA | 推理速度 |
+| ----------------------------- | ---- | -------- |
+| dla34 4gpu bs8 momentum + dcn | 67.2 | 16.695   |
+
+在baseline中加入dcn,模型MOTA降低0.3%。
+
+
+
+#### (4) syncbn+ema
+
+**syncbn**
+
+默认情况下,在使用多个GPU卡训练模型的时候,Batch Normalization都是非同步的 (unsynchronized)。每次迭代时,输入被分为多份,然后在不同的卡上进行前向、后向运算,每个卡上的模型都是单独运算的,相应的Batch Normalization也是在卡内完成,因此BN所归一化的样本数量只局限于卡内的样本数。开启跨卡同步Batch Normalization后,在前向运算时即可得到全局的均值和方差,后向运算时得到相应的全局梯度,卡与卡之间保持同步。
+
+**ema**
+
+在深度学习中,经常会使用EMA(指数移动平均)对模型的参数做平均,以求提高测试指标并增强模型的鲁棒性。指数移动平均(Exponential Moving Average)也叫权重移动平均(Weighted Moving Average),是一种给予近期数据更高权重的平均方法。在深度学习优化中,其基本假设为:模型权重在最后的n步内会在最优点附近震荡,因此取这n步权重的平均值,能使模型更加鲁棒。
+
+本实验中使用syncbn和ema,可以在 `configs/mot/fairmot/_base_/fairmot_dla34.yml` 中进行如下修改:
+
+```yaml
+architecture: FairMOT
+pretrain_weights: https://paddledet.bj.bcebos.com/models/pretrained/fairmot_dla34_crowdhuman_pretrained.pdparams
+norm_type: sync_bn
+use_ema: true
+ema_decay: 0.9998
+```
+
+实验结果:
+
+| 模型                                   | MOTA | 推理速度 |
+| -------------------------------------- | ---- | -------- |
+| dla34 4gpu bs8 momentum + syncbn + ema | 67.4 | 16.103   |
+
+在baseline上开启syncbn和ema,模型MOTA降低0.1%。
+
+
+
+#### (5) 优化策略
+
+Adam使用动量和自适应学习率来加快收敛速度,对梯度的一阶矩估计和二阶矩估计进行综合考虑,以此计算更新步长。本实验中可以在 `PaddleDetection/configs/mot/fairmot/_base_/optimizer_30e.yml` 中进行如下修改:
+
+```yaml
+LearningRate:
+  base_lr: 0.0002
+  schedulers:
+  - !PiecewiseDecay
+    gamma: 0.1
+    milestones: [20,]
+    use_warmup: False
+
+OptimizerBuilder:
+  optimizer:
+    type: Adam
+  regularizer: NULL
+```
+
+实验结果:
+
+| 模型                          | MOTA | 推理速度 |
+| ----------------------------- | ---- | -------- |
+| dla34 4gpu bs6 adam lr=0.0002 | 71.1 | 15.823   |
+
+
+
+#### (6) attention
+
+**Spatial Gate**
+
+用 `code/centernet_fpn_attention.py` 中的代码替换 `PaddleDetection/ppdet/modeling/necks/centernet_fpn.py`,并设置 `self.attention = SpatialGate()`。
+
+```python
+self.attention = SpatialGate()
+# self.attention = SANN_Attention(c_state = False, s_state = True) # spatial_attention
+```
+
+
+
+**SANN attention**
+
+用 `code/centernet_fpn_attention.py` 中的代码替换 `PaddleDetection/ppdet/modeling/necks/centernet_fpn.py`,并设置 `self.attention = SANN_Attention(c_state = False, s_state = True) # spatial_attention`。
+
+```python
+# self.attention = SpatialGate()
+self.attention = SANN_Attention(c_state = False, s_state = True) # spatial_attention
+```
+
+
+
+实验结果:
+
+| 模型                                                              | MOTA | 推理速度 |
+| ----------------------------------------------------------------- | ---- | -------- |
+| dla34 4gpu bs8 momentum + attention                               | 67.6 | -        |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + attention          | 71.6 | -        |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + sann               | 71.1 | -        |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + attention + cutmix | 71.3 | -        |
+
+在baseline上新增attention,模型MOTA提升0.1%;在baseline上组合使用Adam优化策略、syncbn+ema,并分别加入spatial gate / sann attention后,模型MOTA分别提升4.1%和3.6%。新增attention部分暂不支持开启TensorRT进行推理。
+
+
+
+#### (7) backbone
+
+![dla](./images/optimization/dla.png)
+
+在本实验中,我们尝试将baseline中CenterNet的backbone由DLA-34更换为其他更大的模型,如DLA-46-C、DLA-60及DLA-102。因为更换的backbone都只有在ImageNet上的预训练模型,而实验中使用的dla34 backbone是在CrowdHuman上做过预训练的,所以这一部分的实验结果要与 `baseline (dla34 4gpu bs8 momentum + imagenet_pretrain)` 进行比较。替换backbone时,可用 `code/dla_backbones.py` 中的代码替换 `PaddleDetection/ppdet/modeling/backbones/dla.py` 中的代码,并通过调整 `depth` 来选择backbone的结构,可选择dla34、46c、60和102。
+
+```python
+class DLA(nn.Layer):
+    """
+    DLA, see https://arxiv.org/pdf/1707.06484.pdf
+
+    Args:
+        depth
(int): DLA depth, should be 34.
+        residual_root (bool): whether to use a residual layer in the root block
+
+    """
+
+    def __init__(self, depth=34, residual_root=False):
+
+```
+
+
+
+| 模型                                         | MOTA | 推理速度 |
+| -------------------------------------------- | ---- | -------- |
+| dla46c 4gpu bs8 momentum + imagenet_pretrain | 61.2 | 16.863   |
+| dla60 4gpu bs8 momentum + imagenet_pretrain  | 58.8 | 12.531   |
+| dla102 4gpu bs8 momentum + imagenet_pretrain | 54.8 | 12.469   |
+
+
+
+#### (8) GIoU Loss
+
+GIoU解决了IoU Loss存在的两个问题:
+
+* 预测框如果和真实框没有重叠,则IoU始终为0,损失函数无法提供有效梯度,也无法反映两框相距的远近;
+* IoU无法分辨不同方式的对齐,IoU值相同时,预测框与真实框的相对位置可能完全不同。
+
+GIoU提出一种新的计算方式:对于两个框A和B,先计算出A、B的最小包围框C,然后根据如下公式计算出GIoU:
+$$
+GIoU = IoU - \frac{C-(A \cup B)}{C}
+$$
+GIoU Loss = 1 - GIoU。如想尝试增加GIoU Loss,可用 `code/centernet_head_iou_head.py` 替换 `ppdet/modeling/heads/centernet_head.py` 中的代码,并修改 `ppdet/modeling/architectures/fairmot.py` 文件,在第84行增加 `'iou_loss': det_outs['iou_loss'],`:
+
+```python
+det_loss = det_outs['det_loss']
+loss = self.loss(det_loss, reid_loss)
+loss.update({
+    'heatmap_loss': det_outs['heatmap_loss'],
+    'size_loss': det_outs['size_loss'],
+    'iou_loss': det_outs['iou_loss'],
+    'offset_loss': det_outs['offset_loss'],
+    'reid_loss': reid_loss
+})
+return loss
+```
+
+实验结果如下:
+
+| 模型                                                     | MOTA | 推理速度 |
+| -------------------------------------------------------- | ---- | -------- |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + iou head  | 71.6 | 15.723   |
+
+
+
+### 6.2 性能优化
+
+暂无
+
+全部实验结果:
+
+| 模型                                                              | MOTA | 推理速度(开启TensorRT) |
+| ------------------------------------------------------------------ | ---- | ------------------------ |
+| baseline (dla34 2gpu bs6 adam lr=0.0001)                           | 70.9 | 15.600                   |
+| baseline (dla34 4gpu bs8 momentum)                                 | 67.5 | 15.291                   |
+| baseline (dla34 4gpu bs8 momentum + imagenet_pretrain)             | 64.3 | 15.314                   |
+| dla34 4gpu bs8 momentum + dcn                                      | 67.2 | 16.695                   |
+| dla34 4gpu bs8 momentum + syncbn + ema                             | 67.4 | 16.103                   |
+| dla34 4gpu bs8 momentum + cutmix                                   | 67.7 | 15.528                   |
+| dla34 4gpu bs8 momentum + attention                                | 67.6 | -                        |
+| dla34 4gpu bs6 adam lr=0.0002                                      | 71.1 | 15.823                   |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema                       | 71.7 | 15.038                   |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + attention           | 71.6 | -                        |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + iou head            | 71.6 | 15.723                   |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + sann                | 71.1 | -                        |
+| dla34 4gpu bs6 adam lr=0.0002 + syncbn + ema + attention + cutmix  | 71.3 | -                        |
+| dla46c 4gpu bs8 momentum + imagenet_pretrain                       | 61.2 | 16.863                   |
+| dla60 4gpu bs8 momentum + imagenet_pretrain                        | 58.8 | 12.531                   |
+| dla102 4gpu bs8 momentum + imagenet_pretrain                       | 54.8 | 12.469                   |
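+
+注:上述各表中的 MOTA(Multiple Object Tracking Accuracy)是多目标跟踪的综合精度指标,同时惩罚漏检、误检与 ID 切换,数值越高越好。下面给出一个按 MOT Challenge 定义计算 MOTA 的最小示意代码(其中的函数名与数值均为说明用的假设,并非 PaddleDetection 的实际接口,实际评测请使用套件自带的评估工具):
+
+```python
+def compute_mota(fn_total: int, fp_total: int, idsw_total: int, gt_total: int) -> float:
+    """按 MOTA = 1 - (FN + FP + IDSW) / GT 计算,结果可能为负值。
+
+    fn_total:   所有帧中漏检(False Negative)的数量
+    fp_total:   所有帧中误检(False Positive)的数量
+    idsw_total: 所有帧中目标 ID 发生切换(ID Switch)的次数
+    gt_total:   所有帧中真值目标框的总数
+    """
+    return 1.0 - (fn_total + fp_total + idsw_total) / gt_total
+
+
+# 用法示例(数字为虚构):
+# print(compute_mota(fn_total=1200, fp_total=300, idsw_total=80, gt_total=11000))  # 约 0.856
+```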