Scene Text Recognition With Deep Learning Methods In Farsi.
- Install Dependencies
$ pip install -r requirements.txt
- Download Pretrained Weights Here
Fig. 1: Model architecture.
- Project Structure
.
├── src
│ ├── nn
│ │ ├── feature_extractor.py
│ │ ├── layers.py
│ │ └── ocr_model.py
│ └── utils
│ ├── dataset.py
│ ├── labelConverter.py
│ ├── loss_calculator.py
│ ├── misc.py
│ ├── trainUtils.py
│ └── transforms.py
├── config.py
└── train.py
- Place the dataset paths in the config.py file.
ds_path = {
"train_ds" : "path/to/train/dataset",
"test_ds" : "path/to/test/dataset",
}
- Dataset Structure (each image must contain exactly one word)
.
├── Images
│ ├── img_1.jpg
│ ├── img_2.jpg
│ ├── img_3.jpg
│ ├── img_4.jpg
│ └── img_5.jpg
│ ...
└── labels.json
labels.json contents:
{"img_1": "بالا", "img_2": "و", "img_3": "بدانند", "img_4": "چندین", "img_5": "به", ...}
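Denote the training dataset by $TD = \{(X_i, Y_i)\}$, where $X_i$ is a training image and $Y_i$ is its word label. Training minimizes the negative log-likelihood of the word label given the image: $L = -\sum_{(X_i, Y_i) \in TD} \log p(Y_i \mid X_i)$.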
This function calculates a cost from an image and its word label, and the modules in the framework are trained in an end-to-end manner.
Fig. 2: Model training history.
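CTC takes a sequence $H = h_1, \dots, h_T$, where $T$ is the sequence length, and outputs the probability of a path $\pi$, defined as $p(\pi \mid H) = \prod_{t=1}^{T} y^t_{\pi_t}$, where $y^t_{\pi_t}$ is the probability of emitting character $\pi_t$ at time step $t$.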
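To make the CTC scoring concrete, here is a small pure-Python sketch of the standard CTC definitions; it is an illustration only, not the implementation in src/utils/loss_calculator.py. It assumes the blank symbol has index 0 and that `probs[t][k]` holds the per-time-step probability of class `k`; all function names are hypothetical. The label probability is computed by brute force over every path, which is only feasible for toy sizes (real training uses the dynamic-programming forward algorithm).

```python
import itertools
import math

BLANK = 0  # assumed index of the CTC blank symbol


def collapse(path):
    """Map a frame-level path to a label: merge repeats, then drop blanks."""
    out = []
    prev = None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return tuple(out)


def path_prob(path, probs):
    """Probability of one path: product of the chosen class probability per step."""
    return math.prod(probs[t][k] for t, k in enumerate(path))


def label_prob(label, probs):
    """Probability of a label: sum over all paths that collapse to it.

    Brute force over every possible path, so only usable for tiny inputs.
    """
    n_steps, n_classes = len(probs), len(probs[0])
    return sum(
        path_prob(path, probs)
        for path in itertools.product(range(n_classes), repeat=n_steps)
        if collapse(path) == tuple(label)
    )


def greedy_decode(probs):
    """Best-path decoding: take the argmax at each time step, then collapse."""
    best = [max(range(len(p)), key=p.__getitem__) for p in probs]
    return collapse(best)
```

The per-sample CTC cost is then `-math.log(label_prob(label, probs))`, and `greedy_decode` is the simplest way to turn the model output into a character sequence at inference time.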
| Model | Input Size | Recall | Precision | F1 | Params | Speed (img/s) |
|---|---|---|---|---|---|---|
- What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis
- An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
- Text recognition (optical character recognition) with deep learning methods, ICCV 2019
This project is distributed under the MIT License.