Application of the TF-IDF and Cosine Similarity Algorithms for Search Queries on a Tourist Destination Dataset

Rio Al Rasyid
Dewi Handayani Untari Ningsih

Abstract

This research aims to improve the search for tourist destinations in a dataset of 50 documents by using search queries to retrieve relevant documents. The goal is to produce an accurate list of tourist destinations for a given query. To achieve this, the researchers used the TF-IDF and Cosine Similarity algorithms to retrieve and compare information, measuring a similarity score between each search query and each tourist destination in the dataset. The fifty text documents were first normalized through pre-processing stages, namely Case Folding, Stopword Removal, and Tokenization. The normalized documents were then weighted with TF-IDF, and the same TF-IDF weighting was applied to the search queries. Cosine Similarity between the TF-IDF vector of each document and the TF-IDF vector of the search query yields a similarity score for each document, and the tourist destinations are ranked by this score. Testing on 5 different queries gave an average precision of 83%.
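
The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stopword list, the three toy documents, the smoothed IDF formula, and the rule that every document with a non-zero score counts as retrieved are all assumptions made for illustration, and the function names (`preprocess`, `build_idf`, `tfidf_vector`, `rank`, `precision`) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's actual list is not given in the abstract.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "with"}


def preprocess(text):
    """Case folding, tokenization, then stopword removal."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_idf(docs_tokens):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1. The exact variant is an assumption."""
    n = len(docs_tokens)
    df = Counter(term for tokens in docs_tokens for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term frequency times IDF; terms unseen in the corpus are dropped."""
    tf = Counter(tokens)
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (name, score) pairs sorted by descending similarity to the query."""
    docs_tokens = [preprocess(text) for text in documents.values()]
    idf = build_idf(docs_tokens)
    query_vec = tfidf_vector(preprocess(query), idf)
    scores = {
        name: cosine_similarity(query_vec, tfidf_vector(tokens, idf))
        for name, tokens in zip(documents, docs_tokens)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for name in retrieved if name in relevant) / len(retrieved)


# Toy documents standing in for the 50-document dataset (illustrative only).
documents = {
    "beach": "A white sand beach with clear water and coral reefs",
    "temple": "An ancient temple on the hill with stone carvings",
    "museum": "The city museum of history and art",
}

ranking = rank("sand beach", documents)
# Assumed retrieval rule: keep every document with a non-zero score.
retrieved = [name for name, score in ranking if score > 0]
```

Here `precision(retrieved, relevant)` would be computed once per test query against a hand-labelled set of relevant destinations and then averaged over the queries. The abstract does not state how the relevant sets were labelled or where the retrieval cutoff was placed, so both remain open choices in this sketch.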

Article Details

How to Cite
Al Rasyid, R., & Ningsih, D. H. U. (2024). Penerapan Algoritma TF-IDF dan Cosine Similarity untuk Query Pencarian Pada Dataset Destinasi Wisata. Jurnal JTIK (Jurnal Teknologi Informasi Dan Komunikasi), 8(1), 170–178. https://doi.org/10.35870/jtik.v8i1.1416
Section
Computer & Communication Science
Author Biographies

Rio Al Rasyid, Universitas Stikubank

Informatics Engineering Study Program, Faculty of Information Technology and Industry, Universitas Stikubank, Semarang City, Central Java Province, Indonesia

Dewi Handayani Untari Ningsih, Universitas Stikubank

Informatics Engineering Study Program, Faculty of Information Technology and Industry, Universitas Stikubank, Semarang City, Central Java Province, Indonesia
