Published: 2025-12-01

Optimization of Tesseract OCR for Automatic Text Extraction on Indonesian ID Cards (KTP) Through Image Quality Enhancement Using Preprocessing Techniques

DOI: 10.35870/ijsecs.v5i3.5183

Abstract

Tesseract OCR ranks among the most widely adopted open-source tools for text extraction. Nevertheless, processing documents with degraded image quality (including blurry e-KTPs, low-contrast specimens, or those affected by uneven lighting) presents substantial challenges. We conducted experimental research to generate empirical data supporting the development of text detection systems for e-KTPs operating under non-ideal conditions. Our methodology involved testing 10 e-KTP images, each containing 15 text attributes, yielding 150 evaluated data points. Image preprocessing proceeded sequentially through grayscale conversion, denoising, contrast enhancement (CLAHE), and thresholding to improve image clarity prior to Tesseract OCR processing. We evaluated accuracy using confusion matrix analysis, emphasizing True Positive (TP), False Positive (FP), and False Negative (FN) metrics. Results demonstrate that the preprocessing stages substantially improved text readability. Baseline OCR accuracy of 39.55% increased incrementally: +22.68% following grayscale conversion, +47.70% after denoising, +60.99% post-CLAHE application, and +19.62% after thresholding, culminating in 64.97% accuracy upon completing all preprocessing stages. Average TP values rose from 4 to 8 out of 15 attributes per image, while precision remained stable at 100% (FP = 0). Despite variable CLAHE performance across samples, the preprocessing stages proved essential for OCR systems operating under degraded image conditions. Our work introduces a novel preprocessing pipeline tailored specifically to Indonesian e-KTP characteristics, providing quantitative benchmarks and systematic analysis that can inform the development of more adaptive digitalization and verification systems for population documents under real-world field conditions.
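
The four preprocessing stages and the TP/FP/FN evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes OpenCV (`cv2`) and `pytesseract` are installed, along with Tesseract's Indonesian language data (`ind`). The abstract names neither the denoising filter, the CLAHE parameters, nor the thresholding method, so non-local means denoising, a clip limit of 2.0 with an 8x8 tile grid, and Otsu's threshold are assumptions chosen for illustration. The names `preprocess_for_ocr`, `extract_text`, and `AttributeCounts` are hypothetical.

```python
from dataclasses import dataclass


def preprocess_for_ocr(image_bgr):
    """Grayscale -> denoise -> CLAHE -> threshold, in the order the abstract lists."""
    import cv2  # imported lazily so the metric helper below works without OpenCV

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Assumption: non-local means denoising; the abstract does not name the filter.
    denoised = cv2.fastNlMeansDenoising(gray, None, h=10)
    # Assumption: common default CLAHE settings, not values taken from the paper.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(denoised)
    # Assumption: Otsu's global threshold; the abstract only says "thresholding".
    _, binary = cv2.threshold(
        enhanced, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return binary


def extract_text(image_path):
    """Run Tesseract on the preprocessed image (hypothetical helper)."""
    import cv2
    import pytesseract

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Assumption: Indonesian language data ("ind") is installed for Tesseract.
    return pytesseract.image_to_string(preprocess_for_ocr(image), lang="ind")


@dataclass
class AttributeCounts:
    """Per-image counts of correctly read, wrongly read, and missed attributes."""
    tp: int
    fp: int
    fn: int

    def precision(self) -> float:
        total = self.tp + self.fp
        return self.tp / total if total else 0.0

    def recall(self) -> float:
        total = self.tp + self.fn
        return self.tp / total if total else 0.0


if __name__ == "__main__":
    # Rounded averages from the abstract: 8 of 15 attributes read, FP = 0.
    after = AttributeCounts(tp=8, fp=0, fn=7)
    print(f"precision={after.precision():.2%} recall={after.recall():.2%}")
```

The counts in the usage example are the rounded per-image averages quoted in the abstract, so the resulting recall will not match the reported accuracy figures exactly; the abstract does not state how its accuracy value is derived from TP, FP, and FN.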

Keywords

Optical Character Recognition ; Tesseract ; Pre-Processing ; Image Enhancement ; Confusion Matrix

Peer Review Process

This article has undergone a double-blind peer review process to ensure quality and impartiality.
