Traditional optical character recognition (OCR) systems have evolved from character-level segmentation to word-level approaches that leverage sequence-to-sequence models for tighter language model integration. However, this progression has shifted the accuracy bottleneck to word detection, which remains error-prone and computationally expensive.
This paper presents a novel line-level OCR paradigm that bypasses word detection entirely, enabling direct recognition of complete text lines. Our approach provides enhanced sentence-level context for language models while simultaneously improving both accuracy and computational efficiency. The method addresses fundamental limitations in existing word-based OCR pipelines by eliminating intermediate segmentation steps.
We contribute a comprehensive dataset of 251 English document pages with line-level annotations to support this research direction. Experimental evaluation demonstrates a 5.4% accuracy improvement and 4× efficiency gain compared to state-of-the-art word-based methods, establishing the effectiveness of line-level processing for document image analysis.
Comparison between existing and proposed pipelines for OCR
5.4% end-to-end accuracy improvement by eliminating word detection errors and leveraging larger sentence context
4× efficiency improvement by removing the word detection step from the pipeline
Curated dataset of 251 English pages with line-level annotations for training and benchmarking
Better exploitation of language models with larger sentence-level context for recognition
Challenging examples demonstrating the superiority of line-level OCR over traditional word-based pipelines
| OCR System | CRR (Character Recognition Rate) (%) | Flex Character Accuracy (%) |
|---|---|---|
| PP-OCR | 78.96 | 87.72 |
| Tesseract | 88.16 | 88.41 |
| DocTR | 83.52 | 97.14 |
| Ours (Kraken + PARSeqline) | 85.76 | 97.62 |
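The CRR column above measures the fraction of reference characters correctly recovered. A minimal sketch of this metric, assuming the common Levenshtein-based definition (the paper's exact formulation may differ):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between reference and hypothesis strings,
    computed with a single-row dynamic-programming table."""
    m, n = len(ref), len(hyp)
    row = list(range(n + 1))
    for i in range(1, m + 1):
        prev, row[0] = row[0], i
        for j in range(1, n + 1):
            cur = row[j]
            row[j] = min(row[j] + 1,          # deletion
                         row[j - 1] + 1,      # insertion
                         prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return row[n]

def crr(ref: str, hyp: str) -> float:
    """Character Recognition Rate in percent:
    100 * (1 - edit_distance / reference length)."""
    return 100.0 * (1.0 - edit_distance(ref, hyp) / max(len(ref), 1))
```

For example, `crr("hello world", "hallo world")` yields about 90.9, since one of eleven reference characters is wrong.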
| Word Detection | Word Recognition | CRR (%) | Flex Character Accuracy (%) | Inference Time (s) |
|---|---|---|---|---|
| DBNet | PARSeq | 81.94 | 89.66 | 5.64 |
| DBNet | ABINet | 80.72 | 88.48 | 5.91 |
| DBNet | MATRN | 79.66 | 85.63 | 6.39 |
| DBNet | CCD | 75.78 | 83.41 | 14.27 |
| DBNet | MAERec | 81.26 | 89.26 | 16.20 |
| DBNet | SIGA | 77.36 | 86.13 | 6.92 |
| DPText | PARSeq | 83.02 | 91.75 | 7.08 |
| DPText | ABINet | 82.43 | 90.88 | 7.35 |
| DPText | MATRN | 81.04 | 87.82 | 7.83 |
| DPText | CCD | 79.69 | 88.23 | 15.71 |
| DPText | MAERec | 82.75 | 91.48 | 17.64 |
| DPText | SIGA | 79.66 | 88.76 | 8.45 |
| TextFuseNet | PARSeq | 80.48 | 89.57 | 29.29 |
| TextFuseNet | ABINet | 80.02 | 89.05 | 29.57 |
| TextFuseNet | MATRN | 78.80 | 86.24 | 30.05 |
| TextFuseNet | CCD | 77.35 | 85.90 | 37.93 |
| TextFuseNet | MAERec | 77.97 | 87.18 | 39.86 |
| TextFuseNet | SIGA | 77.80 | 86.74 | 30.63 |
| CRAFT | PARSeq | 83.46 | 92.15 | 2.11 |
| CRAFT | ABINet | 82.68 | 91.30 | 2.39 |
| CRAFT | MATRN | 81.03 | 87.49 | 2.88 |
| CRAFT | CCD | 80.04 | 88.59 | 10.76 |
| CRAFT | MAERec | 83.01 | 91.83 | 12.69 |
| CRAFT | SIGA | 77.33 | 86.22 | 3.35 |
| MixNet | PARSeq | 82.52 | 91.25 | 10.38 |
| MixNet | ABINet | 81.86 | 90.34 | 10.65 |
| MixNet | MATRN | 80.41 | 87.08 | 11.13 |
| MixNet | CCD | 75.67 | 83.10 | 19.01 |
| MixNet | MAERec | 82.19 | 90.92 | 20.94 |
| MixNet | SIGA | 79.25 | 88.39 | 11.72 |
| - | PARSeqline (Ours) | 85.76 | 97.62 | 0.53 |
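The reported 4× efficiency gain can be read directly off the inference-time column: even the fastest word-based pipeline (CRAFT + PARSeq at 2.11 s) is roughly four times slower than the detection-free PARSeqline at 0.53 s. A quick check:

```python
# Inference times (s) taken from the table above.
craft_parseq = 2.11   # fastest word-based pipeline (CRAFT + PARSeq)
parseq_line = 0.53    # ours: line-level recognition, no word detection

speedup = craft_parseq / parseq_line
print(f"{speedup:.1f}x faster")  # prints "4.0x faster"
```

Against the slower word-based combinations (e.g. TextFuseNet + MAERec at 39.86 s), the gap is far larger; the 4× figure is the conservative comparison.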
We contribute a meticulously curated dataset of 251 English pages with line-level annotations. This dataset fills a crucial gap in the literature, as no public dataset previously existed for training and benchmarking line-level OCR systems.
Sample pages 1–4 from the annotated dataset
@article{vempati_anand2025lineocr,
title={Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR},
author={Vempati, Shashank and Anand, Nishit and Talebailkar, Gaurav and Garai, Arpan and Arora, Chetan},
journal={arXiv preprint},
year={2025},
institution={Indian Institute of Technology Delhi}
}