Published: 2025-12-01
Optimization of Tesseract OCR for Automatic Text Extraction on Indonesian ID Cards (KTP) Through Image Quality Enhancement Using Preprocessing Techniques
DOI: 10.35870/ijsecs.v5i3.5183
Gilang Ramadhan, Dadang Iskandar Mulyana, Sopan Adrianto
- Gilang Ramadhan: Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika
- Dadang Iskandar Mulyana: Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika
- Sopan Adrianto: Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika
Abstract
Tesseract OCR is among the most widely adopted open-source tools for text extraction. However, documents with degraded image quality (blurry e-KTPs, low-contrast specimens, or cards affected by uneven lighting) remain a substantial challenge. We conducted experimental research to generate empirical data supporting the development of text detection systems for e-KTPs under non-ideal conditions. We tested 10 e-KTP images, each containing 15 text attributes, for a total of 150 evaluated data points. Each image was preprocessed sequentially through grayscale conversion, denoising, contrast enhancement (CLAHE), and thresholding before being passed to Tesseract OCR. We evaluated accuracy using confusion matrix analysis, focusing on True Positive (TP), False Positive (FP), and False Negative (FN) counts. The results show that the preprocessing stages substantially improved text readability. Baseline OCR accuracy of 39.55% improved with preprocessing; the reported gains were +22.68% following grayscale conversion, +47.70% after denoising, +60.99% after CLAHE, and +19.62% after thresholding, with accuracy reaching 64.97% once all preprocessing stages were applied. Average TP values rose from 4 to 8 out of 15 attributes per image, while precision remained at 100% (FP = 0). Although CLAHE performance varied across samples, the preprocessing stages proved essential for OCR systems operating on degraded images. Our work introduces a novel preprocessing pipeline tailored specifically to Indonesian e-KTP characteristics, providing quantitative benchmarks and systematic analysis that can inform the development of more adaptive digitalization and verification systems for population documents under real-world field conditions.
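The TP, FP, and FN counts named in the abstract map onto standard per-image metrics. The following is a minimal sketch in Python, not the authors' evaluation code. It assumes that, with FP = 0, every attribute not extracted correctly counts as a false negative (FN = 15 - TP); the abstract does not state this, nor does it give the formula behind its accuracy figures. `FieldCounts`, `precision`, `recall`, and `f1` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldCounts:
    """Per-image counts over the 15 e-KTP text attributes (hypothetical type)."""
    tp: int  # attributes extracted correctly
    fp: int  # attributes extracted but incorrect
    fn: int  # attributes missed


def precision(c: FieldCounts) -> float:
    """TP / (TP + FP); returns 0.0 when nothing was extracted."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: FieldCounts) -> float:
    """TP / (TP + FN); returns 0.0 when there is nothing to find."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def f1(c: FieldCounts) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


# Average counts from the abstract; FN = 15 - TP is an assumption (see above).
before = FieldCounts(tp=4, fp=0, fn=11)
after = FieldCounts(tp=8, fp=0, fn=7)

for label, counts in (("before", before), ("after", after)):
    print(f"{label}: precision={precision(counts):.3f} "
          f"recall={recall(counts):.3f} f1={f1(counts):.3f}")
```

Under these assumptions precision stays at 1.0 in both cases, consistent with the reported FP = 0, while recall doubles from 4/15 to 8/15. These values are illustrative and are not the accuracy percentages quoted in the abstract.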
Keywords
Optical Character Recognition; Tesseract; Pre-Processing; Image Enhancement; Confusion Matrix
Article Metadata
Peer Review Process
This article has undergone a double-blind peer review process to ensure quality and impartiality.
Article Information
This article has been peer-reviewed and published in the International Journal Software Engineering and Computer Science (IJSECS). The content is available under the terms of the Creative Commons Attribution 4.0 International License.
- Issue: Vol. 5 No. 3 (2025)
- Section: Articles
- Published: 2025-12-01
- License: CC BY 4.0
- Copyright: © 2025 Authors
- DOI: 10.35870/ijsecs.v5i3.5183
Gilang Ramadhan
Informatics Engineering Study Program, Faculty of Computer Science, Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika, East Jakarta City, Special Capital Region of Jakarta, Indonesia
Dadang Iskandar Mulyana
Informatics Engineering Study Program, Faculty of Computer Science, Sekolah Tinggi Ilmu Komputer Cipta Karya Informatika, East Jakarta City, Special Capital Region of Jakarta, Indonesia

This work is licensed under a Creative Commons Attribution 4.0 International License.