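Baseline for the Tianchi Street View Character Recognition Competition with PaddleOCR 2.4
Date: 2025-07-29
This article walks through an entry for the Tianchi street view character recognition competition. The competition data is sampled from the SVHN dataset and includes 30,000 training images and 10,000 validation images. Using PaddleOCR, we prepare the data, configure the parameters, and train a CRNN model with a MobileNetV3 backbone. We then cover evaluation, prediction and model export, and finally generate a submission file. The baseline scores 82 points.
I. The Tianchi Street View Character Recognition Competition
Competition page: https://tianchi.aliyun.com/competition/entrance/531795/information
1. Data source
The data comes from The Street View House Numbers Dataset (SVHN), a collection of house numbers in Google Street View images. The competition dataset was sampled from it.
2. Basic facts about the data
The data shows house numbers from real scenes. The training set contains 30,000 images and the validation set contains 10,000 images. Each sample is a color image annotated with the class and position of every character. To keep the competition fair, test set A contains 40,000 images and test set B contains 40,000 images.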
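3. Dataset samples
4. Field table
All annotations (training, validation and test sets) are stored in JSON format and indexed by file name. If an image contains several characters, each field holds a list with one value per character.
Note: the dataset is derived from SVHN (http://ufldl.stanford.edu/housenumbers/) and has been anonymized and had noise added. Please train on the dataset provided by the competition.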
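To make the annotation layout concrete, here is a minimal sketch of reading one entry. Only the `label` field is confirmed by this article (it is the field used in the label conversion step later on). The position field names `left`, `top`, `width` and `height`, the sample values, and the helper `labels_to_text` are assumptions for illustration.

```python
import json

# Hypothetical annotation excerpt. Only "label" is confirmed by this article;
# the position field names and all values here are illustrative assumptions.
sample = """
{
  "000000.png": {"height": [219, 219], "label": [1, 9],
                 "left": [246, 323], "top": [77, 81], "width": [81, 96]}
}
"""


def labels_to_text(annotations):
    """Map each file name to its digit string, e.g. [1, 9] -> "19"."""
    return {
        name: "".join(str(digit) for digit in entry["label"])
        for name, entry in annotations.items()
    }


annotations = json.loads(sample)
texts = labels_to_text(annotations)
print(texts)
```

Each field is a list with one value per character, so the lists in one entry all have the same length.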
II. Environment setup
PaddleOCR (https://github.com/paddlepaddle/PaddleOCR) is a practical OCR toolkit that works out of the box and runs fast.
In [?]
# Clone the PaddleOCR code from Gitee (the GitHub link works too)
!git clone https://gitee.com/paddlepaddle/PaddleOCR.git --depth=1
# Upgrade pip
!pip install -U pip
# Install the dependencies
%cd ~/PaddleOCR
%pip install -r requirements.txt
In [?]
%cd ~/PaddleOCR/
!tree -L 1

/home/aistudio/PaddleOCR
.
├── benchmark
├── configs
├── deploy
├── doc
├── __init__.py
├── LICENSE
├── MANIFEST.in
├── paddleocr.py
├── ppocr
├── PPOCRLabel
├── ppstructure
├── README_ch.md
├── README.md
├── requirements.txt
├── setup.py
├── StyleText
├── test_tipc
├── tools
└── train.sh

10 directories, 9 files
III. Data preparation
The dataset ships as separate archives for the training set (30,000 images), the validation set (10,000 images) and test set A (40,000 images). Unzip each archive and confirm the file counts.
1. Downloading and unzipping the data
In [?]
# Unzip the dataset
%cd ~
!unzip -qoa data/data124095/street_code_rec_data.zip -d ~/data/

/home/aistudio

In [?]
# Rename the folder
!mv data/街景编码识别 data/street_code_rec_data
In [?]
# Unzip the test set
!unzip -qoa data/street_code_rec_data/mchar_test_a.zip -d data/street_code_rec_data/
In [?]
# Unzip the validation set
!unzip -qoa data/street_code_rec_data/mchar_val.zip -d data/street_code_rec_data/
In [?]
# Unzip the training set
!unzip -qoa data/street_code_rec_data/mchar_train.zip -d data/street_code_rec_data/
In [?]
# Check that the training folder holds 30,000 files
!cd data/street_code_rec_data/mchar_train && ls -l | grep "^-" | wc -l

30000

In [?]
# Check that the test folder holds 40,000 files
!cd data/street_code_rec_data/mchar_test_a && ls -l | grep "^-" | wc -l

40000

In [?]
# Check that the validation folder holds 10,000 files
!cd data/street_code_rec_data/mchar_val && ls -l | grep "^-" | wc -l

10000

In [?]
# Delete the archives once everything is unzipped
%cd data/street_code_rec_data
!rm *.zip
%cd ~

/home/aistudio/data/street_code_rec_data
/home/aistudio
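The same count check can be done from Python, which is handy if you want to assert the numbers in a script. This is a minimal sketch; `count_files` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real dataset.

```python
import os
import tempfile


def count_files(folder):
    """Count the regular files directly inside folder (subfolders are skipped)."""
    with os.scandir(folder) as entries:
        return sum(1 for entry in entries if entry.is_file())


# Demo on a temporary directory: three fake images plus one subfolder.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        open(os.path.join(tmp, f"{i:06d}.png"), "wb").close()
    os.mkdir(os.path.join(tmp, "subdir"))
    demo_count = count_files(tmp)

print(demo_count)
```

On the real data you would call `count_files("data/street_code_rec_data/mchar_train")` and compare the result with 30000, and likewise with 40000 for `mchar_test_a` and 10000 for `mchar_val`.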
2. Label processing
In [?]
import json


def trans(path):
    """Convert <path>.json annotations into a tab-separated <path>.csv label file."""
    with open(path + '.json', 'r') as f:
        json_data = json.load(f)
    print(len(json_data))
    with open(path + '.csv', 'w') as ff:
        for item in json_data:
            # Join the per-character labels into one string, e.g. [1, 9] -> "19"
            label = json_data[item]['label']
            label = [str(x) for x in label]
            label = ''.join(label)
            # One line per image: file name, tab, digit string
            ff.write(item + '\t' + label + '\n')

In [?]
trans('data/street_code_rec_data/mchar_val')
trans('data/street_code_rec_data/mchar_train')

10000
30000
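Since the label file must contain one `file name<TAB>label` pair per line, it is worth checking the output before training. Below is a minimal sketch of such a check; `check_label_lines` is a hypothetical helper, and the sample lines are made up, with the last one showing what a file looks like when the escape sequences are mistyped as plain `t` and `n`.

```python
import io


def check_label_lines(lines):
    """Return (line_number, text) for every line that is not 'name<TAB>digits'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        parts = text.split("\t")
        if len(parts) != 2 or not parts[0] or not parts[1].isdigit():
            bad.append((number, text))
    return bad


# Two well-formed lines and one written with 't' and 'n' instead of '\t' and '\n'.
sample = io.StringIO("000000.png\t19\n000001.png\t23\n000002.pngt25n")
problems = check_label_lines(sample)
print(problems)
```

For the real files, pass an open file handle, for example `check_label_lines(open('data/street_code_rec_data/mchar_train.csv'))`, and expect an empty list.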
3. Inspecting the data
In [?]
!head data/street_code_rec_data/mchar_val.csv

000000.png	5
000001.png	210
000002.png	6
000003.png	1
000004.png	9
000005.png	1
000006.png	183
000007.png	65
000008.png	144
000009.png	16

In [?]
!head data/street_code_rec_data/mchar_train.csv

000000.png	19
000001.png	23
000002.png	25
000003.png	93
000004.png	31
000005.png	33
000006.png	28
000007.png	744
000008.png	128
000009.png	16

In [?]
from PIL import Image
img = Image.open('data/street_code_rec_data/mchar_train/000000.png')
print(img.size)
img

(741, 350)
<PIL.PngImagePlugin.PngImageFile image mode=RGB size=741x350 at 0x7F134A1CAB10>
IV. Configuring the training parameters
The configuration starts from PaddleOCR/configs/rec/ch_ppocr_v2.0/rec_chinese_lite_train_v2.0.yml.
1. Model network
The model uses the CRNN algorithm with a MobileNetV3 backbone and CTCLoss as the loss function.
Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001
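A CTC head outputs one class per time step, including a special blank class, and the final text is obtained by collapsing repeated classes and dropping blanks. The sketch below shows standard greedy CTC decoding in plain Python. It is not PaddleOCR's own decoding code, and the blank index of 0 and the digit-only character set are assumptions made for the example.

```python
def ctc_greedy_decode(step_ids, charset, blank=0):
    """Collapse repeated ids, then drop blanks.

    step_ids holds the argmax class id per time step. Id `blank` is the CTC
    blank; any other id i maps to charset[i - 1] (assumed layout).
    """
    chars = []
    previous = None
    for idx in step_ids:
        if idx != previous and idx != blank:
            chars.append(charset[idx - 1])
        previous = idx
    return "".join(chars)


digits = "0123456789"
# Per-step ids for an image reading "19": id 2 is '1', id 10 is '9'.
steps = [0, 2, 2, 0, 0, 10, 10, 10, 0]
decoded = ctc_greedy_decode(steps, digits)
# A repeated digit needs a blank between its two occurrences.
doubled = ctc_greedy_decode([2, 0, 2], digits)
print(decoded, doubled)
```

The blank is what lets the model emit the same digit twice in a row, which matters for house numbers such as 11 or 744.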
2. Data settings
Set Train.data_dir, Train.label_file_list, Eval.data_dir and Eval.label_file_list:
Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_train
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_train.csv"]
......
Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_val
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_val.csv"]
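3. GPU and evaluation settings
use_gpu switches GPU usage on or off, and cal_metric_during_train switches metric calculation during training on or off.
Global:
  use_gpu: false # true: use the GPU
  .....
  cal_metric_during_train: False # true: calculate metrics during training
4. Data loading workers
Train.loader.num_workers: 4
Eval.loader.num_workers: 4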
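These settings can also be overridden on the command line with the `-o key=value` syntax that the training command in this article already uses for Global.checkpoints. The sketch below only assembles such a command as an argument list; `build_train_command` is a hypothetical helper, and the chosen values are examples rather than recommendations.

```python
def build_train_command(config_path, overrides):
    """Return the argv list for tools/train.py with -o key=value overrides."""
    cmd = ["python", "tools/train.py", "-c", config_path]
    if overrides:
        cmd.append("-o")
        cmd.extend(f"{key}={value}" for key, value in overrides.items())
    return cmd


cmd = build_train_command(
    "./configs/rec/multi_language/rec_en_number_lite_train.yml",
    {
        "Global.use_gpu": True,
        "Global.cal_metric_during_train": True,
        "Train.loader.num_workers": 4,
        "Eval.loader.num_workers": 4,
    },
)
print(" ".join(cmd))
```

Overrides are convenient for quick experiments because the YAML file stays untouched.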
5. Full configuration
Global:
  use_gpu: True
  epoch_num: 500
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: ./output/rec_en_number_lite
  save_epoch_step: 3
  # evaluation is run every 100 iterations after the 1000th iteration
  eval_batch_step: [1000, 100]
  # if pretrained_model is saved in static mode, load_static_weights must set to True
  cal_metric_during_train: True
  pretrained_model: ./en_number_mobile_v2.0_rec_train/best_accuracy.pdparams
  checkpoints:
  save_inference_dir:
  use_visualdl: False
  infer_img:
  # for data or label process
  character_dict_path: ppocr/utils/en_dict.txt
  max_text_length: 25
  infer_mode: False
  use_space_char: True

Optimizer:
  name: Adam
  beta1: 0.9
  beta2: 0.999
  lr:
    name: Cosine
    learning_rate: 0.001
  regularizer:
    name: 'L2'
    factor: 0.00001

Architecture:
  model_type: rec
  algorithm: CRNN
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: small
    small_stride: [1, 2, 2, 2]
  Neck:
    name: SequenceEncoder
    encoder_type: rnn
    hidden_size: 48
  Head:
    name: CTCHead
    fc_decay: 0.00001

Loss:
  name: CTCLoss

PostProcess:
  name: CTCLabelDecode

Metric:
  name: RecMetric
  main_indicator: acc

Train:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_train
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_train.csv"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - RecAug:
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 256
    drop_last: True
    num_workers: 8

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /home/aistudio/data/street_code_rec_data/mchar_val
    label_file_list: ["/home/aistudio/data/street_code_rec_data/mchar_val.csv"]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CTCLabelEncode: # Class handling label
      - RecResizeImg:
          image_shape: [3, 32, 320]
      - KeepKeys:
          keep_keys: ['image', 'label', 'length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 256
    num_workers: 8
In [1]
# Overwrite the stock config with the prepared file (-f forces the overwrite)
!cp -f ~/rec_en_number_lite_train.yml ~/PaddleOCR/configs/rec/multi_language/rec_en_number_lite_train.yml
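The image_shape of [3, 32, 320] is much smaller than the raw images (the sample inspected earlier is 741 x 350), so it helps to see what the resize does to the width. The sketch below assumes the resize keeps the aspect ratio, scales the height to 32, caps the width at 320 and pads the remainder. This is my understanding of how recognition inputs are usually prepared, not a copy of the library code, and `resized_width` is a hypothetical helper.

```python
import math


def resized_width(img_w, img_h, target_h=32, max_w=320):
    """Width after scaling to target_h with the aspect ratio kept, capped at max_w."""
    ratio = img_w / float(img_h)
    return min(max_w, int(math.ceil(target_h * ratio)))


# The 741 x 350 training sample shown in the data inspection step.
width = resized_width(741, 350)
padding = 320 - width
print(width, padding)
```

Under these assumptions most of the 320-pixel input width is padding for this sample, because the images cover a whole scene rather than a tight text crop. This is one reason why using the position annotations to crop the digits first may help.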
6. Using a pretrained model
Starting from a pretrained model reportedly makes training faster.
The models PaddleOCR offers for download include inference models, trained models, pre-trained models and slim models.
The relationship between these models is shown in the diagram.
Text detection model
English recognition model
%cd ~/PaddleOCR/
# mobile model
!wget https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar
!tar -xf en_number_mobile_v2.0_rec_train.tar

/home/aistudio/PaddleOCR
--2022-01-02 00:10:41--  https://paddleocr.bj.bcebos.com/dygraph_v2.0/multilingual/en_number_mobile_v2.0_rec_train.tar
Resolving paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)... 182.61.200.229, 182.61.200.195, 2409:8c04:1001:1002:0:ff:b001:368a
Connecting to paddleocr.bj.bcebos.com (paddleocr.bj.bcebos.com)|182.61.200.229|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9123840 (8.7M) [application/x-tar]
Saving to: ‘en_number_mobile_v2.0_rec_train.tar’

en_number_mobile_v2 100%[===================>]   8.70M  8.63MB/s    in 1.0s

2022-01-02 00:10:42 (8.63 MB/s) - ‘en_number_mobile_v2.0_rec_train.tar’ saved [9123840/9123840]
V. Training
In [?]
%cd ~/PaddleOCR/
# mobile model
# Global.checkpoints resumes from the latest saved checkpoint. On a first run,
# when no checkpoint exists yet, leave out the -o override so that training
# starts from pretrained_model instead.
!python tools/train.py -c ./configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.checkpoints=./output/rec_en_number_lite/latest
1. Choosing a suitable batch size
2. Training log
[2022/01/02 01:28:23] root INFO: save model in ./output/rec_en_number_lite/latest
[2022/01/02 01:28:23] root INFO: Initialize indexs of datasets:['/home/aistudio/data/street_code_rec_data/mchar_train.csv']
[2022/01/02 01:28:54] root INFO: epoch: [27/500], iter: 180, lr: 0.000986, loss: 1.043328, acc: 0.765624, norm_edit_dis: 0.863509, reader_cost: 2.26051 s, batch_cost: 2.59590 s, samples: 7168, ips: 276.12724
[2022/01/02 01:29:18] root INFO: epoch: [27/500], iter: 190, lr: 0.000986, loss: 1.056450, acc: 0.765624, norm_edit_dis: 0.864510, reader_cost: 1.18228 s, batch_cost: 1.65932 s, samples: 10240, ips: 617.12064
[2022/01/02 01:29:34] root INFO: epoch: [27/500], iter: 200, lr: 0.000985, loss: 1.069025, acc: 0.759277, norm_edit_dis: 0.860254, reader_cost: 0.74316 s, batch_cost: 1.15521 s, samples: 10240, ips: 886.42030
eval model:: 100%|██████████████████████████████| 10/10 [00:07<00:00, 2.12it/s]
[2022/01/02 01:29:42] root INFO: cur metric, acc: 0.6261999373800062, norm_edit_dis: 0.7362716930394972, fps: 4054.7339744968563
[2022/01/02 01:29:42] root INFO: save best model is to ./output/rec_en_number_lite/best_accuracy
[2022/01/02 01:29:42] root INFO: best metric, acc: 0.6261999373800062, start_epoch: 21, norm_edit_dis: 0.7362716930394972, fps: 4054.7339744968563, best_epoch: 27
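If you want to plot the loss and accuracy yourself, the training lines are regular enough to parse with a regular expression. This is a minimal sketch based only on the log format shown above; `parse_train_log` is a hypothetical helper, and the pattern may need adjusting if the log format differs in other PaddleOCR versions.

```python
import re

# Matches the per-iteration training lines shown in the log above.
LINE_RE = re.compile(
    r"epoch: \[(?P<epoch>\d+)/\d+\], iter: (?P<iter>\d+), lr: (?P<lr>[\d.]+), "
    r"loss: (?P<loss>[\d.]+), acc: (?P<acc>[\d.]+)"
)


def parse_train_log(text):
    """Extract epoch, iter, lr, loss and acc from every training line in text."""
    rows = []
    for match in LINE_RE.finditer(text):
        rows.append({
            "epoch": int(match["epoch"]),
            "iter": int(match["iter"]),
            "lr": float(match["lr"]),
            "loss": float(match["loss"]),
            "acc": float(match["acc"]),
        })
    return rows


log = (
    "[2022/01/02 01:28:54] root INFO: epoch: [27/500], iter: 180, lr: 0.000986, "
    "loss: 1.043328, acc: 0.765624, norm_edit_dis: 0.863509\n"
    "[2022/01/02 01:29:34] root INFO: epoch: [27/500], iter: 200, lr: 0.000985, "
    "loss: 1.069025, acc: 0.759277, norm_edit_dis: 0.860254\n"
)
rows = parse_train_log(log)
print(rows)
```

Note that the acc in these lines is measured on training batches, while the "cur metric" lines report accuracy on the validation set.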
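3. Visualization with visualdl
- Install visualdl locally: pip install visualdl
- Download the logs to your machine (logs are only written when Global.use_visualdl is set to True during training)
- Start visualdl: visualdl --logdir ./
- Open the page in a browser: http://localhost:8040/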
VI. Model evaluation
In [?]
# Evaluate on the GPU; Global.checkpoints is the weights file to evaluate
%cd ~/PaddleOCR/
# mobile model
!python -m paddle.distributed.launch tools/eval.py -c ./configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.checkpoints=./output/rec_en_number_lite/best_accuracy.pdparams
/home/aistudio/PaddleOCR
-----------  Configuration Arguments -----------
backend: auto
elastic_server: None
force: False
gpus: None
heter_devices: 
heter_worker_num: None
heter_workers: 
host: None
http_port: None
ips: 127.0.0.1
job_id: None
log_dir: log
np: None
nproc_per_node: None
run_mode: None
scale: 0
server_num: None
servers: 
training_script: tools/eval.py
training_script_args: ['-c', './configs/rec/multi_language/rec_en_number_lite_train.yml', '-o', 'Global.checkpoints=./output/rec_en_number_lite/best_accuracy.pdparams']
worker_num: None
workers: 
------------------------------------------------
WARNING 2022-01-02 01:32:26,892 launch.py:423] Not found distinct arguments and compiled with cuda or xpu. Default use collective mode
launch train in GPU mode!
INFO 2022-01-02 01:32:26,894 launch_utils.py:528] Local start 1 processes. First process distributed environment info (Only For Debug): 
    +=======================================================================================+
    |                        Distributed Envs                      Value                    |
    +---------------------------------------------------------------------------------------+
    |                       PADDLE_TRAINER_ID                        0                      |
    |                 PADDLE_CURRENT_ENDPOINT                 127.0.0.1:33420               |
    |                     PADDLE_TRAINERS_NUM                        1                      |
    |                PADDLE_TRAINER_ENDPOINTS                 127.0.0.1:33420               |
    |                     PADDLE_RANK_IN_NODE                        0                      |
    |                 PADDLE_LOCAL_DEVICE_IDS                        0                      |
    |                 PADDLE_WORLD_DEVICE_IDS                        0                      |
    |                     FLAGS_selected_gpus                        0                      |
    |             FLAGS_selected_accelerators                        0                      |
    +=======================================================================================+
INFO 2022-01-02 01:32:26,894 launch_utils.py:532] details abouts PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
launch proc_id:1384 idx:0
[2022/01/02 01:32:28] root INFO: Architecture : 
[2022/01/02 01:32:28] root INFO:     Backbone : 
[2022/01/02 01:32:28] root INFO:         model_name : small
[2022/01/02 01:32:28] root INFO:         name : MobileNetV3
[2022/01/02 01:32:28] root INFO:         scale : 0.5
[2022/01/02 01:32:28] root INFO:         small_stride : [1, 2, 2, 2]
[2022/01/02 01:32:28] root INFO:     Head : 
[2022/01/02 01:32:28] root INFO:         fc_decay : 1e-05
[2022/01/02 01:32:28] root INFO:         name : CTCHead
[2022/01/02 01:32:28] root INFO:     Neck : 
[2022/01/02 01:32:28] root INFO:         encoder_type : rnn
[2022/01/02 01:32:28] root INFO:         hidden_size : 48
[2022/01/02 01:32:28] root INFO:         name : SequenceEncoder
[2022/01/02 01:32:28] root INFO:     Transform : None
[2022/01/02 01:32:28] root INFO:     algorithm : CRNN
[2022/01/02 01:32:28] root INFO:     model_type : rec
[2022/01/02 01:32:28] root INFO: Eval : 
[2022/01/02 01:32:28] root INFO:     dataset : 
[2022/01/02 01:32:28] root INFO:         data_dir : /home/aistudio/data/street_code_rec_data/mchar_val
[2022/01/02 01:32:28] root INFO:         label_file_list : ['/home/aistudio/data/street_code_rec_data/mchar_val.csv']
[2022/01/02 01:32:28] root INFO:         name : SimpleDataSet
[2022/01/02 01:32:28] root INFO:         transforms : 
[2022/01/02 01:32:28] root INFO:             DecodeImage : 
[2022/01/02 01:32:28] root INFO:                 channel_first : False
[2022/01/02 01:32:28] root INFO:                 img_mode : BGR
[2022/01/02 01:32:28] root INFO:             CTCLabelEncode : None
[2022/01/02 01:32:28] root INFO:             RecResizeImg : 
[2022/01/02 01:32:28] root INFO:                 image_shape : [3, 32, 320]
[2022/01/02 01:32:28] root INFO:             KeepKeys : 
[2022/01/02 01:32:28] root INFO:                 keep_keys : ['image', 'label', 'length']
[2022/01/02 01:32:28] root INFO:     loader : 
[2022/01/02 01:32:28] root INFO:         batch_size_per_card : 1024
[2022/01/02 01:32:28] root INFO:         drop_last : False
[2022/01/02 01:32:28] root INFO:         num_workers : 8
[2022/01/02 01:32:28] root INFO:         shuffle : False
[2022/01/02 01:32:28] root INFO: Global : 
[2022/01/02 01:32:28] root INFO:     cal_metric_during_train : True
[2022/01/02 01:32:28] root INFO:     character_dict_path : ppocr/utils/en_dict.txt
[2022/01/02 01:32:28] root INFO:     checkpoints : ./output/rec_en_number_lite/best_accuracy.pdparams
[2022/01/02 01:32:28] root INFO:     debug : False
[2022/01/02 01:32:28] root INFO:     distributed : False
[2022/01/02 01:32:28] root INFO:     epoch_num : 500
[2022/01/02 01:32:28] root INFO:     eval_batch_step : [100, 100]
[2022/01/02 01:32:28] root INFO:     infer_img : None
[2022/01/02 01:32:28] root INFO:     infer_mode : False
[2022/01/02 01:32:28] root INFO:     log_smooth_window : 20
[2022/01/02 01:32:28] root INFO:     max_text_length : 25
[2022/01/02 01:32:28] root INFO:     pretrained_model : ./en_number_mobile_v2.0_rec_train/best_accuracy.pdparams
[2022/01/02 01:32:28] root INFO:     print_batch_step : 10
[2022/01/02 01:32:28] root INFO:     save_epoch_step : 3
[2022/01/02 01:32:28] root INFO:     save_inference_dir : None
[2022/01/02 01:32:28] root INFO:     save_model_dir : ./output/rec_en_number_lite
[2022/01/02 01:32:28] root INFO:     use_gpu : True
[2022/01/02 01:32:28] root INFO:     use_space_char : True
[2022/01/02 01:32:28] root INFO:     use_visualdl : False
[2022/01/02 01:32:28] root INFO: Loss : 
[2022/01/02 01:32:28] root INFO:     name : CTCLoss
[2022/01/02 01:32:28] root INFO: Metric : 
[2022/01/02 01:32:28] root INFO:     main_indicator : acc
[2022/01/02 01:32:28] root INFO:     name : RecMetric
[2022/01/02 01:32:28] root INFO: Optimizer : 
[2022/01/02 01:32:28] root INFO:     beta1 : 0.9
[2022/01/02 01:32:28] root INFO:     beta2 : 0.999
[2022/01/02 01:32:28] root INFO:     lr : 
[2022/01/02 01:32:28] root INFO:         learning_rate : 0.001
[2022/01/02 01:32:28] root INFO:         name : Cosine
[2022/01/02 01:32:28] root INFO:     name : Adam
[2022/01/02 01:32:28] root INFO:     regularizer : 
[2022/01/02 01:32:28] root INFO:         factor : 1e-05
[2022/01/02 01:32:28] root INFO:         name : L2
[2022/01/02 01:32:28] root INFO: PostProcess : 
[2022/01/02 01:32:28] root INFO:     name : CTCLabelDecode
[2022/01/02 01:32:28] root INFO: Train : 
[2022/01/02 01:32:28] root INFO:     dataset : 
[2022/01/02 01:32:28] root INFO:         data_dir : /home/aistudio/data/street_code_rec_data/mchar_train
[2022/01/02 01:32:28] root INFO:         label_file_list : ['/home/aistudio/data/street_code_rec_data/mchar_train.csv']
[2022/01/02 01:32:28] root INFO:         name : SimpleDataSet
[2022/01/02 01:32:28] root INFO:         transforms : 
[2022/01/02 01:32:28] root INFO:             DecodeImage : 
[2022/01/02 01:32:28] root INFO:                 channel_first : False
[2022/01/02 01:32:28] root INFO:                 img_mode : BGR
[2022/01/02 01:32:28] root INFO:             RecAug : None
[2022/01/02 01:32:28] root INFO:             CTCLabelEncode : None
[2022/01/02 01:32:28] root INFO:             RecResizeImg : 
[2022/01/02 01:32:28] root INFO:                 image_shape : [3, 32, 320]
[2022/01/02 01:32:28] root INFO:             KeepKeys : 
[2022/01/02 01:32:28] root INFO:                 keep_keys : ['image', 'label', 'length']
[2022/01/02 01:32:28] root INFO:     loader : 
[2022/01/02 01:32:28] root INFO:         batch_size_per_card : 1024
[2022/01/02 01:32:28] root INFO:         drop_last : True
[2022/01/02 01:32:28] root INFO:         num_workers : 8
[2022/01/02 01:32:28] root INFO:         shuffle : True
[2022/01/02 01:32:28] root INFO: profiler_options : None
[2022/01/02 01:32:28] root INFO: train with paddle 2.2.1 and device CUDAPlace(0)
[2022/01/02 01:32:28] root INFO: Initialize indexs of datasets:['/home/aistudio/data/street_code_rec_data/mchar_val.csv']
W0102 01:32:28.580307 1384 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0102 01:32:28.584791 1384 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/01/02 01:32:33] root INFO: resume from ./output/rec_en_number_lite/best_accuracy
[2022/01/02 01:32:33] root INFO: metric in ckpt ***************
[2022/01/02 01:32:33] root INFO: acc:0.6261999373800062
[2022/01/02 01:32:33] root INFO: start_epoch:28
[2022/01/02 01:32:33] root INFO: norm_edit_dis:0.7362716930394972
[2022/01/02 01:32:33] root INFO: fps:4054.7339744968563
[2022/01/02 01:32:33] root INFO: best_epoch:27
eval model::   0%|          | 0/10 [00:00<?, ?it/s]
eval model::  10%|█         | 1/10 [00:03<00:32,  3.65s/it]
eval model::  20%|██        | 2/10 [00:04<00:21,  2.67s/it]
eval model::  30%|███       | 3/10 [00:04<00:14,  2.01s/it]
eval model::  40%|████      | 4/10 [00:04<00:09,  1.51s/it]
eval model::  50%|█████     | 5/10 [00:05<00:05,  1.16s/it]
eval model::  60%|██████    | 6/10 [00:05<00:03,  1.10it/s]
eval model::  70%|███████   | 7/10 [00:05<00:02,  1.35it/s]
eval model::  80%|████████  | 8/10 [00:06<00:01,  1.61it/s]
eval model::  90%|█████████ | 9/10 [00:06<00:00,  1.87it/s]
eval model:: 100%|██████████| 10/10 [00:06<00:00,  2.20it/s]
[2022/01/02 01:32:40] root INFO: metric eval ***************
[2022/01/02 01:32:40] root INFO: acc:0.6261999373800062
[2022/01/02 01:32:40] root INFO: norm_edit_dis:0.7362716930394972
[2022/01/02 01:32:40] root INFO: fps:4274.752272925087
INFO 2022-01-02 01:32:41,951 launch.py:311] Local processes completed.
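The evaluation reports two numbers, acc and norm_edit_dis. As a rough guide to what they mean, the sketch below computes exact-match accuracy and one minus the average normalized edit distance in plain Python. It assumes that this is how the two metrics are defined; PaddleOCR's RecMetric may differ in details, and `edit_distance` and `rec_metrics` are hypothetical helpers with made-up sample predictions.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute
            ))
        previous = current
    return previous[-1]


def rec_metrics(predictions, targets):
    """Exact-match accuracy and 1 - mean normalized edit distance."""
    pairs = list(zip(predictions, targets))
    correct = sum(pred == target for pred, target in pairs)
    distance = sum(
        edit_distance(pred, target) / max(len(pred), len(target), 1)
        for pred, target in pairs
    )
    return {"acc": correct / len(pairs), "norm_edit_dis": 1 - distance / len(pairs)}


metrics = rec_metrics(["19", "23", "745"], ["19", "28", "744"])
print(metrics)
```

A prediction with one wrong digit counts as a complete miss for acc but still earns partial credit in norm_edit_dis, which is why the second number in the log above is higher than the first.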
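VII. Prediction
The prediction script runs the trained model and saves the results as a text file that can be sent straight to the competition's submission page for scoring. By default the file is saved to output/rec/predicts_rec.txt.
1. Submission content and format
The competition requires participants to submit a model trained with the PaddlePaddle deep learning platform. Results must be submitted as a .txt text file in which each line holds an image name and the predicted text, separated by a tab ("\t").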
2. Changes to infer_rec.py
with open(save_res_path, "w") as fout:
    # Add the header row
    fout.write('file_name' + "," + 'file_code' + '\n')
    for file in get_image_file_list(config['Global']['infer_img']):
        logger.info("infer_img: {}".format(file))
        with open(file, 'rb') as f:
            img = f.read()
            data = {'image': img}
        batch = transform(data, ops)
        if config['Architecture']['algorithm'] == "SRN":
            encoder_word_pos_list = np.expand_dims(batch[1], axis=0)
            gsrm_word_pos_list = np.expand_dims(batch[2], axis=0)
            gsrm_slf_attn_bias1_list = np.expand_dims(batch[3], axis=0)
            gsrm_slf_attn_bias2_list = np.expand_dims(batch[4], axis=0)
            others = [
                paddle.to_tensor(encoder_word_pos_list),
                paddle.to_tensor(gsrm_word_pos_list),
                paddle.to_tensor(gsrm_slf_attn_bias1_list),
                paddle.to_tensor(gsrm_slf_attn_bias2_list)
            ]
        if config['Architecture']['algorithm'] == "SAR":
            valid_ratio = np.expand_dims(batch[-1], axis=0)
            img_metas = [paddle.to_tensor(valid_ratio)]

        images = np.expand_dims(batch[0], axis=0)
        images = paddle.to_tensor(images)
        if config['Architecture']['algorithm'] == "SRN":
            preds = model(images, others)
        elif config['Architecture']['algorithm'] == "SAR":
            preds = model(images, img_metas)
        else:
            preds = model(images)
        post_result = post_process_class(preds)
        info = None
        if isinstance(post_result, dict):
            rec_info = dict()
            for key in post_result:
                if len(post_result[key][0]) >= 2:
                    rec_info[key] = {
                        "label": post_result[key][0][0],
                        "score": float(post_result[key][0][1]),
                    }
            info = json.dumps(rec_info)
        else:
            if len(post_result[0]) >= 2:
                info = post_result[0][0] + "\t" + str(post_result[0][1])

        if info is not None:
            logger.info("\t result: {}".format(info))
            # Write one "file,code" row per image (CRNN returns a list here)
            fout.write(file + "," + post_result[0][0] + '\n')
logger.info("success!")
In [?]
%cd ~/PaddleOCR/
# mobile model
!python tools/infer_rec.py -c configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.infer_img="/home/aistudio/data/street_code_rec_data/mchar_test_a" Global.checkpoints=./output/rec_en_number_lite/best_accuracy.pdparams
Prediction log
[2022/01/02 02:01:08] root INFO: 	 result: 2123	0.9544541
[2022/01/02 02:01:08] root INFO: infer_img: /home/aistudio/data/street_code_rec_data/mchar_test_a/039996.png
[2022/01/02 02:01:08] root INFO: 	 result: 341	0.8990403
[2022/01/02 02:01:08] root INFO: infer_img: /home/aistudio/data/street_code_rec_data/mchar_test_a/039997.png
[2022/01/02 02:01:08] root INFO: 	 result: 167	0.95185596
[2022/01/02 02:01:08] root INFO: infer_img: /home/aistudio/data/street_code_rec_data/mchar_test_a/039998.png
[2022/01/02 02:01:08] root INFO: 	 result: 235	0.9978804
[2022/01/02 02:01:08] root INFO: infer_img: /home/aistudio/data/street_code_rec_data/mchar_test_a/039999.png
[2022/01/02 02:01:08] root INFO: 	 result: 910	0.93325263
[2022/01/02 02:01:08] root INFO: success!
......
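VIII. Prediction with the inference engine
1. Model size limit
Constraint 1: the total model size must not exceed 10MB (measured as the combined uncompressed disk usage of the .pdmodel and .pdiparams files).
2. Solution
The models saved during training are checkpoints. They store only the model parameters and are mostly used to resume training. The constraint above actually applies to the size of the inference model. An inference model is a frozen model saved after training is finished, with both the network structure and the parameters stored in files, and it is mostly used for prediction and deployment. Compared with a checkpoint, an inference model additionally stores the structure information. It performs well in deployment and accelerated inference, is flexible and convenient to integrate into real systems, and is also somewhat smaller.
In [?]
# Export the static (inference) model
%cd ~/PaddleOCR/
# mobile model
!python tools/export_model.py -c configs/rec/multi_language/rec_en_number_lite_train.yml -o Global.checkpoints=./output/rec_en_number_lite/best_accuracy.pdparams Global.save_inference_dir=./inference/rec_inference/

/home/aistudio/PaddleOCR
W0102 02:06:39.026404 4766 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0102 02:06:39.030951 4766 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[2022/01/02 02:06:43] root INFO: resume from ./output/rec_en_number_lite/best_accuracy
[2022/01/02 02:06:45] root INFO: inference model is saved to ./inference/rec_inference/inference

In [?]
%cd ~/PaddleOCR/
!du -sh ./inference/rec_inference/

/home/aistudio/PaddleOCR
2.8M	./inference/rec_inference/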
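Because the limit is defined on the .pdmodel and .pdiparams files only, a more precise check than du on the whole folder is to add up just those two files. Here is a minimal sketch; `inference_size` is a hypothetical helper, treating 10MB as 10 x 1024 x 1024 bytes is an assumption, and the demo uses dummy files in a temporary folder instead of the real export.

```python
import os
import tempfile

LIMIT_BYTES = 10 * 1024 * 1024  # assumption: "10MB" is read as 10 MiB


def inference_size(model_dir, suffixes=(".pdmodel", ".pdiparams")):
    """Sum the on-disk size of the files in model_dir that end with suffixes."""
    total = 0
    for name in os.listdir(model_dir):
        if name.endswith(suffixes):
            total += os.path.getsize(os.path.join(model_dir, name))
    return total


# Demo with dummy files; the .info file must not be counted.
with tempfile.TemporaryDirectory() as tmp:
    dummy_files = [
        ("inference.pdmodel", 1000),
        ("inference.pdiparams", 2000),
        ("inference.pdiparams.info", 50),
    ]
    for name, size in dummy_files:
        with open(os.path.join(tmp, name), "wb") as handle:
            handle.write(b"\0" * size)
    total = inference_size(tmp)

within_limit = total <= LIMIT_BYTES
print(total, within_limit)
```

For the real export, call `inference_size("./inference/rec_inference/")` from the PaddleOCR directory.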
- As shown above, the CRNN model used in this training run is only 2.8M after export to an inference model.
- The exported inference model can also be used for prediction, as in the following command.
# Predict with the exported static model
%cd ~/PaddleOCR/
!python3.7 tools/infer/predict_rec.py --rec_model_dir=./inference/rec_inference/ --image_dir="/home/aistudio/data/street_code_rec_data/mchar_test_a"
Prediction log
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012500.png:('疗绚娇', 0.71012855)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012501.png:('绚诚', 0.9246478)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012502.png:('溜', 0.93994504)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012503.png:('诚溜', 0.95832443)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012504.png:('溜溜', 0.87103844)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012505.png:('贿', 0.34199885)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012506.png:('题', 0.9996681)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012507.png:('绚绚', 0.9908391)
[2022/01/02 02:08:37] root INFO: Predicts of /home/aistudio/data/street_code_rec_data/mchar_test_a/012508.png:('绚', 0.58176464)
......
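The predictions in this log are Chinese characters even though the model was trained on digits. One plausible explanation, which is an assumption and not verified in this article, is that predict_rec.py decoded the output with its default Chinese dictionary instead of the ppocr/utils/en_dict.txt used for training. The sketch below only assembles a command line that passes the training dictionary through the --rec_char_dict_path option of the PaddleOCR inference tools; `build_predict_command` is a hypothetical helper.

```python
def build_predict_command(model_dir, image_dir, dict_path):
    """Return the argv list for predict_rec.py with an explicit dictionary."""
    return [
        "python",
        "tools/infer/predict_rec.py",
        f"--rec_model_dir={model_dir}",
        f"--image_dir={image_dir}",
        # Assumption: the dictionary should match Global.character_dict_path
        # from the training config.
        f"--rec_char_dict_path={dict_path}",
    ]


predict_cmd = build_predict_command(
    "./inference/rec_inference/",
    "/home/aistudio/data/street_code_rec_data/mchar_test_a",
    "ppocr/utils/en_dict.txt",
)
print(" ".join(predict_cmd))
```

If the output then shows digits, the dictionary mismatch was the cause. If not, the export step is the next place to look.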
IX. Submission
The prediction results are saved to output/rec/predicts_rec.txt, which can be submitted as is.
In [28]
%cd ~
!head PaddleOCR/output/rec/predicts_rec.txt

/home/aistudio
file_name,file_code
/home/aistudio/data/street_code_rec_data/mchar_test_a/000000.png,59
/home/aistudio/data/street_code_rec_data/mchar_test_a/000001.png,290
/home/aistudio/data/street_code_rec_data/mchar_test_a/000002.png,113
/home/aistudio/data/street_code_rec_data/mchar_test_a/000003.png,97
/home/aistudio/data/street_code_rec_data/mchar_test_a/000004.png,63
/home/aistudio/data/street_code_rec_data/mchar_test_a/000005.png,39
/home/aistudio/data/street_code_rec_data/mchar_test_a/000006.png,126
/home/aistudio/data/street_code_rec_data/mchar_test_a/000007.png,1475
/home/aistudio/data/street_code_rec_data/mchar_test_a/000008.png,48
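The file_name column above holds full paths. If the scoring system expects bare file names such as 000000.png, which is an assumption since this article submits the file as is, the paths can be trimmed with a few lines of Python. `to_submission` is a hypothetical helper, and the demo reads from an in-memory sample instead of the real file.

```python
import csv
import io
import os


def to_submission(rows):
    """Keep only the base name of each path; pass the predicted code through."""
    return [(os.path.basename(path), code) for path, code in rows]


raw = io.StringIO(
    "file_name,file_code\n"
    "/home/aistudio/data/street_code_rec_data/mchar_test_a/000000.png,59\n"
    "/home/aistudio/data/street_code_rec_data/mchar_test_a/000001.png,290\n"
)
reader = csv.reader(raw)
header = next(reader)  # skip the "file_name,file_code" header row
submission = to_submission(reader)
print(header, submission)
```

To write the result back, open a new file and use `csv.writer` to emit the header followed by the trimmed rows.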
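A casual run scores 82 points. You can process the data further, bring in the detection annotations as well, tune the setup and train for more epochs, and you should be able to get a better score.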