Optimization of K Value in KNN Algorithm for Spam and HAM Classification in SMS Texts

Ferryma Arba Apriansyah; Arief Hermawan; Donny Avianto

doi:10.35870/ijsecs.v4i2.2681

Published: 2024-08-20


Affiliation Details

Ferryma Arba Apriansyah: Universitas Teknologi Yogyakarta, Indonesia.
Arief Hermawan: Universitas Teknologi Yogyakarta, Indonesia.
Donny Avianto: Universitas Teknologi Yogyakarta, Indonesia.


Abstract

Spam is the unsolicited, repetitive sending of messages to others via electronic devices. The activity is commonly known as spamming, and those who carry it out are referred to as spammers. SMS spam often originates from unknown sources and frequently contains advertisements, phishing attempts, scams, and even malware. Such messages can reach almost any mobile phone number and disrupt communication by delivering irrelevant content. Their persistence underscores the need for effective filtering mechanisms. This study investigates the application of the K-Nearest Neighbors (KNN) algorithm for classifying SMS messages as either spam or non-spam (ham). The findings show that KNN, when the value of K is chosen using several optimization methods, can achieve an average accuracy of 99.16% in classifying SMS spam. This level of accuracy indicates that KNN is a reliable method for spam detection.
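The idea of choosing K for a KNN spam classifier can be illustrated with a minimal sketch. The abstract does not state the feature representation, the distance measure, or the K-selection procedure, so everything below is an assumption made for illustration only: whitespace tokenization into term counts, cosine similarity, leave-one-out accuracy as the selection criterion, and a hypothetical toy set of messages (MESSAGES). It is not the authors' implementation and does not reproduce the reported accuracy.

```python
import math
from collections import Counter

# Hypothetical toy messages, invented for illustration (not the study's dataset).
MESSAGES = [
    ("win a free prize now", "spam"),
    ("free cash prize claim now", "spam"),
    ("claim your free reward now", "spam"),
    ("urgent win cash now", "spam"),
    ("are we meeting for lunch today", "ham"),
    ("see you at home tonight", "ham"),
    ("call me when you get home", "ham"),
    ("lunch at home today", "ham"),
]

def vectorize(text):
    """Turn a message into a term-count vector (assumed representation)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query, k):
    """Majority vote among the k training vectors most similar to query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each message using all the others."""
    hits = 0
    for i, (vec, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, vec, k) == label
    return hits / len(data)

def best_k(data, candidates):
    """Return the candidate K with the highest leave-one-out accuracy.

    Ties are broken in favor of the smaller K.
    """
    scores = {k: loo_accuracy(data, k) for k in candidates}
    chosen = max(scores, key=lambda k: (scores[k], -k))
    return chosen, scores

if __name__ == "__main__":
    data = [(vectorize(text), label) for text, label in MESSAGES]
    k, scores = best_k(data, [1, 3, 5])  # odd values avoid two-class vote ties
    print("leave-one-out accuracy per K:", scores)
    print("selected K:", k)
    print(knn_predict(data, vectorize("claim free prize now"), k))
```

Odd candidate values are used so that a two-class vote cannot tie. On a real corpus the same loop would typically run over a held-out validation split or k-fold cross-validation rather than leave-one-out, which becomes expensive as the dataset grows.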

Keywords

Classification; KNN; SMS Spam

Peer Review Process

This article has undergone a double-blind peer review process to ensure quality and impartiality.


How to Cite

Apriansyah, F. A., Hermawan, A., & Avianto, D. (2024). Optimization of K Value in KNN Algorithm for Spam and HAM Classification in SMS Texts. International Journal Software Engineering and Computer Science (IJSECS), 4(2), 767-779. https://doi.org/10.35870/ijsecs.v4i2.2681

Article Information

This article has been peer-reviewed and published in the International Journal Software Engineering and Computer Science (IJSECS). The content is available under the terms of the Creative Commons Attribution 4.0 International License.

Issue: Vol. 4 No. 2 (2024)
Section: Articles
Published: 2024-08-20

License: CC BY 4.0
Copyright: © 2024 Authors
DOI: 10.35870/ijsecs.v4i2.2681


Author Biographies

Ferryma Arba Apriansyah

Information Technology Study Program-Masters Program, Universitas Teknologi Yogyakarta, Special Region of Yogyakarta, Indonesia

Arief Hermawan

Information Technology Study Program-Masters Program, Universitas Teknologi Yogyakarta, Special Region of Yogyakarta, Indonesia

Donny Avianto

Information Technology Study Program-Masters Program, Universitas Teknologi Yogyakarta, Special Region of Yogyakarta, Indonesia


