Traditional optical character recognition (OCR) systems have evolved from character-level segmentation to word-level approaches that leverage sequence-to-sequence models for tighter language model integration. However, this progression has shifted the accuracy bottleneck to word detection, which remains error-prone and computationally expensive.
This paper presents a novel line-level OCR paradigm that bypasses word detection entirely, enabling direct recognition of complete text lines. Our approach provides enhanced sentence-level context for language models while simultaneously improving both accuracy and computational efficiency. The method addresses fundamental limitations in existing word-based OCR pipelines by eliminating intermediate segmentation steps.
We contribute a comprehensive dataset of 251 English document pages with line-level annotations to support this research direction. Experimental evaluation demonstrates a 5.4% accuracy improvement and 4× efficiency gain compared to state-of-the-art word-based methods, establishing the effectiveness of line-level processing for document image analysis.
Comparison between existing and proposed pipelines for OCR
5.4% end-to-end accuracy improvement by eliminating word detection errors and leveraging larger sentence context
4× efficiency improvement by removing the word detection step from the pipeline
Curated dataset of 251 English pages with line-level annotations for training and benchmarking
Better exploitation of language models with larger sentence-level context for recognition
Challenging examples demonstrating the superiority of line-level OCR over traditional word-based pipelines
| OCR System | CRR (Character Recognition Rate) (%) | Flex Character Accuracy (%) |
|---|---|---|
| PP-OCR | 78.96 | 87.72 |
| Tesseract | 88.16 | 88.41 |
| DocTR | 83.52 | 97.14 |
| Ours (Kraken + PARSeqline) | 85.76 | 97.62 |
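The CRR column above measures the fraction of reference characters correctly recovered. A minimal sketch of this metric, assuming the common Levenshtein-based definition (the paper's exact formulation may differ):

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between reference and hypothesis strings,
    computed with a single-row dynamic-programming table."""
    m, n = len(ref), len(hyp)
    row = list(range(n + 1))
    for i in range(1, m + 1):
        prev, row[0] = row[0], i
        for j in range(1, n + 1):
            cur = row[j]
            row[j] = min(row[j] + 1,          # deletion
                         row[j - 1] + 1,      # insertion
                         prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return row[n]

def crr(ref: str, hyp: str) -> float:
    """Character Recognition Rate in percent:
    100 * (1 - edit_distance / reference length)."""
    return 100.0 * (1.0 - edit_distance(ref, hyp) / max(len(ref), 1))
```

For example, `crr("hello world", "hallo world")` yields about 90.9, since one of eleven reference characters is wrong.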
| Word Detection | Word Recognition | CRR (%) | Flex Character Accuracy (%) | Inference Time (s) |
|---|---|---|---|---|
| DBNet | PARSeq | 81.94 | 89.66 | 5.64 |
| DBNet | ABINet | 80.72 | 88.48 | 5.91 |
| DBNet | MATRN | 79.66 | 85.63 | 6.39 |
| DBNet | CCD | 75.78 | 83.41 | 14.27 |
| DBNet | MAERec | 81.26 | 89.26 | 16.20 |
| DBNet | SIGA | 77.36 | 86.13 | 6.92 |
| DPText | PARSeq | 83.02 | 91.75 | 7.08 |
| DPText | ABINet | 82.43 | 90.88 | 7.35 |
| DPText | MATRN | 81.04 | 87.82 | 7.83 |
| DPText | CCD | 79.69 | 88.23 | 15.71 |
| DPText | MAERec | 82.75 | 91.48 | 17.64 |
| DPText | SIGA | 79.66 | 88.76 | 8.45 |
| TextFuseNet | PARSeq | 80.48 | 89.57 | 29.29 |
| TextFuseNet | ABINet | 80.02 | 89.05 | 29.57 |
| TextFuseNet | MATRN | 78.80 | 86.24 | 30.05 |
| TextFuseNet | CCD | 77.35 | 85.90 | 37.93 |
| TextFuseNet | MAERec | 77.97 | 87.18 | 39.86 |
| TextFuseNet | SIGA | 77.80 | 86.74 | 30.63 |
| CRAFT | PARSeq | 83.46 | 92.15 | 2.11 |
| CRAFT | ABINet | 82.68 | 91.30 | 2.39 |
| CRAFT | MATRN | 81.03 | 87.49 | 2.88 |
| CRAFT | CCD | 80.04 | 88.59 | 10.76 |
| CRAFT | MAERec | 83.01 | 91.83 | 12.69 |
| CRAFT | SIGA | 77.33 | 86.22 | 3.35 |
| MixNet | PARSeq | 82.52 | 91.25 | 10.38 |
| MixNet | ABINet | 81.86 | 90.34 | 10.65 |
| MixNet | MATRN | 80.41 | 87.08 | 11.13 |
| MixNet | CCD | 75.67 | 83.10 | 19.01 |
| MixNet | MAERec | 82.19 | 90.92 | 20.94 |
| MixNet | SIGA | 79.25 | 88.39 | 11.72 |
| - | PARSeqline (Ours) | 85.76 | 97.62 | 0.53 |
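The reported 4× efficiency gain can be read directly off the inference-time column: even the fastest word-based pipeline (CRAFT + PARSeq at 2.11 s) is roughly four times slower than the detection-free PARSeqline at 0.53 s. A quick check:

```python
# Inference times (s) taken from the table above.
craft_parseq = 2.11   # fastest word-based pipeline (CRAFT + PARSeq)
parseq_line = 0.53    # ours: line-level recognition, no word detection

speedup = craft_parseq / parseq_line
print(f"{speedup:.1f}x faster")  # prints "4.0x faster"
```

Against the slower word-based combinations (e.g. TextFuseNet + MAERec at 39.86 s), the gap is far larger; the 4× figure is the conservative comparison.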
We contribute a meticulously curated dataset of 251 English pages with line-level annotations. This dataset fills a crucial gap in the literature, as no public dataset previously existed for training and benchmarking line-level OCR systems.
Sample pages 1–4 from the annotated dataset
@article{vempati_anand2025lineocr,
title={Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR},
author={Vempati, Shashank and Anand, Nishit and Talebailkar, Gaurav and Garai, Arpan and Arora, Chetan},
journal={arXiv preprint},
year={2025},
institution={Indian Institute of Technology Delhi}
}