Model trained with ch_PP-OCRv4_rec_svtr_large.yml and exported fails with (InvalidArgument) Broadcast dimension mismatch when predict_rec.py predicts wide images #12440

LMR2018 · 2024-05-25T09:03:04Z

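I trained an OCR recognition model with ch_PP-OCRv4_rec_svtr_large.yml, and training ran normally.
Evaluation with python tools/eval.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_svtr_large.yml also works.
Exporting with python tools/export_model.py -c configs/rec/PP-OCRv4/ch_PP-OCRv4_rec_svtr_large.yml succeeds as well.
With the model exported by export_model.py, python tools/infer/predict_rec.py predicts single-line images of moderate width correctly.
However, when predict_rec.py predicts a wide image, it fails with: (InvalidArgument) Broadcast dimension mismatch.
The same error occurs even for the wide images used in training and validation. How can this be fixed?
Training config: image_shape: [3, 48, 320], max_text_length: &max_text_length 50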

`[2024/05/25 14:48:36] ppocr INFO: Traceback (most recent call last):
File "tools/infer/predict_rec.py", line 728, in main
rec_res, _ = text_recognizer(img_list)
File "tools/infer/predict_rec.py", line 675, in call
self.predictor.run()
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 2148, 192] and the shape of Y = [1, 960, 192]. Received [2148] in X is not equal to [960] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ../paddle/phi/kernels/funcs/common_shape.h:86)
[operator < elementwise_add > error]

[2024/05/25 14:48:36] ppocr INFO: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 2148, 192] and the shape of Y = [1, 960, 192]. Received [2148] in X is not equal to [960] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ../paddle/phi/kernels/funcs/common_shape.h:86)
[operator < elementwise_add > error]`
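One possible reading of the shapes in this error, offered as a sketch and not a confirmed diagnosis: if the model adds a fixed-length tensor (for example a position embedding) sized for the training `image_shape`, and the patch embedding downsamples height and width by 4 (both are assumptions, not stated in this thread), then a 48x320 input yields 12 * 80 = 960 tokens, which matches Y. The 2148 tokens in X would then correspond to an input width of 716. The helper names below are hypothetical; the broadcast check mirrors the rule quoted in the hint.

```python
def num_tokens(height, width, stride=4):
    # Assumption: the patch embedding shrinks height and width by `stride` each.
    return (height // stride) * (width // stride)


def can_broadcast(x_shape, y_shape):
    # Rule quoted in the error hint: per dimension, sizes are equal or one is <= 1.
    return all(a == b or a <= 1 or b <= 1 for a, b in zip(x_shape, y_shape))


trained_tokens = num_tokens(48, 320)              # tokens for image_shape [3, 48, 320]
failing_tokens = 2148                             # taken from the error message
implied_width = failing_tokens // (48 // 4) * 4   # input width that gives 2148 tokens

print(trained_tokens, implied_width)              # 960 716
print(can_broadcast([1, failing_tokens, 192], [1, trained_tokens, 192]))  # False
```

Under these assumptions, any input whose resized width differs from 320 would produce a token count other than 960 and trigger the same `elementwise_add` failure.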

LMR2018 · 2024-05-25T09:06:17Z

GreatV · 2024-05-25T09:18:47Z

Does inference work if you resize the image directly to [3, 48, 320]?

LMR2018 · 2024-05-25T09:51:33Z

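> Does inference work if you resize the image directly to [3, 48, 320]?

I tried resizing directly to (320, 48) at the very start, and it does not work. The official code applies further processing afterwards, so the final image is not (320, 48, 3).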
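The "further processing" mentioned here is likely the dynamic-width step in the recognizer's preprocessing. As far as I understand it, the target width is stretched to the largest width/height ratio in the batch and never goes below the configured ratio. The function below is a hypothetical simplification of that idea, not the actual predict_rec.py code, and the image sizes are made up for illustration.

```python
def target_width(sizes, img_h=48, img_w=320):
    # Hypothetical simplification: start from the configured ratio and widen
    # it to the largest width/height ratio found in the batch.
    max_wh_ratio = img_w / img_h
    for h, w in sizes:
        max_wh_ratio = max(max_wh_ratio, w / h)
    return int(img_h * max_wh_ratio)


def matches_training_width(sizes, img_h=48, img_w=320):
    # True when the network would receive the same width it was exported with.
    return target_width(sizes, img_h, img_w) == img_w


narrow = [(32, 100)]   # (height, width) of a short text line, illustrative only
wide = [(32, 640)]     # a long text line, illustrative only

print(target_width(narrow), matches_training_width(narrow))
print(target_width(wide), matches_training_width(wide))
```

If this is what happens, resizing the source image to (320, 48) beforehand leaves its ratio at the configured value, so the width stays at 320; an image with a larger ratio is fed to the network wider than 320, whatever its original pixel size.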

LMR2018 · 2024-05-25T10:10:56Z

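I found something else: the original images 28032 and 904118 above fail, but 58587, 579102, 583271, and 359190 work fine, so I suspect the cause is not simply that the width is too large. A model trained with the ch_PP-OCRv4_rec.yml config does not have this problem.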

nissansz · 2024-06-02T01:31:27Z

ch_PP-OCRv4_rec.yml

Why can't I train with the ch_PP-OCRv4_rec.yml config? It runs out of GPU memory even with batch size = 1.
The ch_PP-OCRv4_rec.yml config is fine and can be trained.

GreatV added the bug Something isn't working label May 25, 2024

GreatV self-assigned this May 25, 2024

nissansz mentioned this issue Jun 2, 2024

Training with the svtr yml runs out of GPU memory, even with batch_size=1 #12517

Closed
