Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

Shashank Vempati*, Nishit Anand*, Gaurav Talebailkar, Arpan Garai, Chetan Arora

Indian Institute of Technology Delhi, India

*Joint first authors with equal contribution

Correspondence: aiy227509@iitd.ac.in, nishit.cstaff@iitd.ac.in

5.4% Accuracy Improvement
4× Efficiency Improvement
251-Page Annotated Dataset

Abstract

Traditional optical character recognition (OCR) systems have evolved from character-level segmentation to word-level approaches that utilize sequence-to-sequence models for improved language model integration. However, this progression has shifted the accuracy bottleneck to word detection, which remains error-prone and computationally intensive.

This paper presents a novel line-level OCR paradigm that bypasses word detection entirely, enabling direct recognition of complete text lines. Our approach provides enhanced sentence-level context for language models while simultaneously improving both accuracy and computational efficiency. The method addresses fundamental limitations in existing word-based OCR pipelines by eliminating intermediate segmentation steps.

We contribute a comprehensive dataset of 251 English document pages with line-level annotations to support this research direction. Experimental evaluation demonstrates a 5.4% accuracy improvement and 4× efficiency gain compared to state-of-the-art word-based methods, establishing the effectiveness of line-level processing for document image analysis.

Pipeline Comparison

Figure: Comparison between the existing word-level OCR pipeline and the proposed line-level pipeline.
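To make the contrast concrete, the sketch below outlines the two pipelines in Python. The detector and recognizer callables are hypothetical placeholders standing in for models such as those evaluated in the tables further down; this illustrates the control flow only and is not the paper's implementation.

```python
# Minimal sketch of word-level vs. line-level OCR pipelines.
# The detector/recognizer callables are hypothetical placeholders.
from typing import Callable, List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixel coordinates


def crop(page: np.ndarray, box: Box) -> np.ndarray:
    """Cut a rectangular region out of the page image."""
    x0, y0, x1, y1 = box
    return page[y0:y1, x0:x1]


def word_level_ocr(page: np.ndarray,
                   detect_words: Callable[[np.ndarray], List[Box]],
                   recognize: Callable[[np.ndarray], str]) -> str:
    """Existing pipeline: detect word boxes, recognize each word crop,
    then re-assemble the text; spacing and reading order must be re-inferred."""
    words = [recognize(crop(page, b)) for b in detect_words(page)]
    return " ".join(words)


def line_level_ocr(page: np.ndarray,
                   detect_lines: Callable[[np.ndarray], List[Box]],
                   recognize: Callable[[np.ndarray], str]) -> str:
    """Proposed pipeline: detect whole text lines and recognize each line
    directly, so the recognizer sees sentence-level context."""
    lines = [recognize(crop(page, b)) for b in detect_lines(page)]
    return "\n".join(lines)
```

Skipping word detection removes one error-prone stage and lets the recognizer exploit context across the whole line rather than one word at a time.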

Key Contributions

Improved Accuracy

5.4% end-to-end accuracy improvement by eliminating word detection errors and leveraging larger sentence context

Enhanced Efficiency

4× efficiency improvement by removing the word detection step from the pipeline

New Dataset

Curated dataset of 251 English pages with line-level annotations for training and benchmarking

Context Utilization

Better exploitation of language models with larger sentence-level context for recognition

Qualitative Results

Challenging examples demonstrating the superiority of line-level OCR over traditional word-based pipelines

Example 1: Challenging text with errors

Ground Truth: eliminating any need for a milling machine
CRAFT+ABINet: eliminating any need.fora.miling.mahhiee
PARSeqline (Ours): eliminating any need for a milling machine,

Example 2: Word spacing challenges

Ground Truth: we will watch over his children.
DBNet+PARSeq: wewillwatch overhis children
PARSeqline (Ours): we will watch over his children

Quantitative Results

System Comparison

OCR System                 | Character Recognition Rate (CRR, %) | Flex Character Accuracy (%)
PP-OCR                     | 78.96                               | 87.72
Tesseract                  | 88.16                               | 88.41
DocTR                      | 83.52                               | 97.14
Ours (Kraken + PARSeqline) | 85.76                               | 97.62
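For reference, the sketch below computes a character recognition rate in the common edit-distance-based way, using the word-spacing example from the qualitative results. The paper's exact metric definitions, in particular the flex character accuracy, may differ from this simplified version.

```python
# Sketch of an edit-distance-based character recognition rate (CRR).
# This is a common definition and may not match the paper's exact metric.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]


def crr(ground_truth: str, prediction: str) -> float:
    """CRR in percent: 100 * (1 - edit_distance / ground-truth length)."""
    n = max(len(ground_truth), 1)
    return 100.0 * (1.0 - levenshtein(ground_truth, prediction) / n)


if __name__ == "__main__":
    gt = "we will watch over his children."
    pred = "wewillwatch overhis children"
    print(f"CRR: {crr(gt, pred):.2f}%")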

Accuracy and Inference Time Comparison

Word Detection | Word Recognition  | CRR (%) | Flex Character Accuracy (%) | Inference Time (s)
DBNet          | PARSeq            | 81.94   | 89.66                       | 5.64
DBNet          | ABINet            | 80.72   | 88.48                       | 5.91
DBNet          | MATRN             | 79.66   | 85.63                       | 6.39
DBNet          | CCD               | 75.78   | 83.41                       | 14.27
DBNet          | MAERec            | 81.26   | 89.26                       | 16.20
DBNet          | SIGA              | 77.36   | 86.13                       | 6.92
DPText         | PARSeq            | 83.02   | 91.75                       | 7.08
DPText         | ABINet            | 82.43   | 90.88                       | 7.35
DPText         | MATRN             | 81.04   | 87.82                       | 7.83
DPText         | CCD               | 79.69   | 88.23                       | 15.71
DPText         | MAERec            | 82.75   | 91.48                       | 17.64
DPText         | SIGA              | 79.66   | 88.76                       | 8.45
TextFuseNet    | PARSeq            | 80.48   | 89.57                       | 29.29
TextFuseNet    | ABINet            | 80.02   | 89.05                       | 29.57
TextFuseNet    | MATRN             | 78.80   | 86.24                       | 30.05
TextFuseNet    | CCD               | 77.35   | 85.90                       | 37.93
TextFuseNet    | MAERec            | 77.97   | 87.18                       | 39.86
TextFuseNet    | SIGA              | 77.80   | 86.74                       | 30.63
CRAFT          | PARSeq            | 83.46   | 92.15                       | 2.11
CRAFT          | ABINet            | 82.68   | 91.30                       | 2.39
CRAFT          | MATRN             | 81.03   | 87.49                       | 2.88
CRAFT          | CCD               | 80.04   | 88.59                       | 10.76
CRAFT          | MAERec            | 83.01   | 91.83                       | 12.69
CRAFT          | SIGA              | 77.33   | 86.22                       | 3.35
MixNet         | PARSeq            | 82.52   | 91.25                       | 10.38
MixNet         | ABINet            | 81.86   | 90.34                       | 10.65
MixNet         | MATRN             | 80.41   | 87.08                       | 11.13
MixNet         | CCD               | 75.67   | 83.10                       | 19.01
MixNet         | MAERec            | 82.19   | 90.92                       | 20.94
MixNet         | SIGA              | 79.25   | 88.39                       | 11.72
- (none)       | PARSeqline (Ours) | 85.76   | 97.62                       | 0.53

Efficiency Improvements

Removing the word detection stage yields roughly 4× faster per-page inference (0.53 s versus 2.11 s for the fastest word-based pipeline, CRAFT + PARSeq) alongside the 5.4% accuracy boost.
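As a rough illustration of how per-page inference times like those in the table above could be measured, here is a minimal timing harness; `pipeline` and `pages` are placeholders for any detector-plus-recognizer combination and a set of page images, not part of the paper's released code.

```python
# Minimal timing harness sketch for per-page inference time.
# `pipeline` and `pages` are hypothetical placeholders.
import time
from statistics import mean
from typing import Callable, Iterable


def seconds_per_page(pipeline: Callable[[object], str],
                     pages: Iterable[object]) -> float:
    """Average wall-clock seconds the pipeline spends on one page."""
    timings = []
    for page in pages:
        start = time.perf_counter()
        pipeline(page)  # run detection + recognition end to end
        timings.append(time.perf_counter() - start)
    return mean(timings)
```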

Dataset

Line-Level OCR Dataset

We contribute a meticulously curated dataset of 251 English pages with line-level annotations. This dataset fills a crucial gap in the literature as no public dataset existed for training and benchmarking line-level OCR systems.

251 English document pages
Line-level annotations
Diverse document types
Publicly available
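As an illustration of how line-level annotations of this kind might be consumed, the sketch below assumes a hypothetical JSON layout with one record per page, each line carrying a bounding box and its transcription; the released dataset's actual format may differ.

```python
# Loader sketch for line-level annotations, assuming a hypothetical
# JSON layout; the released dataset's actual format may differ.
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LineAnnotation:
    box: Tuple[int, int, int, int]  # (x0, y0, x1, y1) of the text line
    text: str                       # ground-truth transcription of the line


@dataclass
class PageAnnotation:
    image_path: str
    lines: List[LineAnnotation]


def load_annotations(path: str) -> List[PageAnnotation]:
    """Read a JSON file of page records into typed annotation objects."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    return [
        PageAnnotation(
            image_path=rec["image"],
            lines=[LineAnnotation(tuple(line["box"]), line["text"])
                   for line in rec["lines"]],
        )
        for rec in records
    ]
```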

Sample Images from Dataset

Sample 1: document page with plain text
Sample 2: document with complex formatting
Sample 3: multi-column document
Sample 4: full-page OCR example

Citation

@article{vempati_anand2025lineocr,
  title={Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR},
  author={Vempati, Shashank and Anand, Nishit and Talebailkar, Gaurav and Garai, Arpan and Arora, Chetan},
  journal={arXiv preprint},
  year={2025},
  institution={Indian Institute of Technology Delhi}
}