Implementation of the Mean Imputation and Single Center Imputation Chained Equation (SICE) Methods on Linear Regression Prediction Results for Numerical Data

Mario Rangga Baihaqi
Tesa Nur Padilah
Mohamad Jajuli

Abstract

Data and information play an important role in every field of science, so data must be processed carefully through data mining. Patterns can be extracted from data with machine learning algorithms such as linear regression. However, information extraction becomes less effective when the data contain missing values. This research implements two imputation techniques, mean imputation and single center imputation chained equation (SICE), and examines their effect on the predictions of a linear regression algorithm. The data used are numerical. Judged by the root mean squared error (RMSE), linear regression with mean imputation performs better than linear regression with SICE imputation.
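As a rough illustration of the pipeline the abstract describes, the sketch below fills missing values with the column mean, fits a one-feature least-squares line, and scores the fit with RMSE. It is not the authors' code: the toy data, the function names (`mean_impute`, `fit_simple_linear_regression`, `rmse`), and the single-feature setup are assumptions made for illustration, and SICE is not implemented here.

```python
import math
from statistics import fmean


def mean_impute(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = fmean(observed)
    return [fill if v is None else v for v in column]


def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(fmean(squared_errors))


# Hypothetical toy data: two feature values are missing (None).
x_raw = [1.0, 2.0, None, 4.0, None]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

x_imputed = mean_impute(x_raw)
intercept, slope = fit_simple_linear_regression(x_imputed, y)
predictions = [intercept + slope * xi for xi in x_imputed]
error = rmse(y, predictions)
print(f"imputed x: {x_imputed}")
print(f"RMSE after mean imputation: {error:.4f}")
```

Comparing two imputation techniques in this setup would mean swapping `mean_impute` for the other technique and comparing the resulting RMSE values, where a lower RMSE indicates predictions closer to the observed targets.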

Article Details

How to Cite
Baihaqi, M. R., Padilah, T. N., & Jajuli, M. (2023). Implementasi Metode Imputasi Mean dan Single Center Imputation Chained Equation (SICE) Terhadap Hasil Prediksi Linear Regression pada Data Numerik. Jurnal JTIK (Jurnal Teknologi Informasi Dan Komunikasi), 7(4), 661–671. https://doi.org/10.35870/jtik.v7i4.1169
Section
Computer & Communication Science
Author Biographies

Mario Rangga Baihaqi, Universitas Singaperbangsa Karawang

Faculty of Computer Science, Universitas Singaperbangsa Karawang, Karawang Regency, West Java Province, Indonesia

Tesa Nur Padilah, Universitas Singaperbangsa Karawang

Faculty of Computer Science, Universitas Singaperbangsa Karawang, Karawang Regency, West Java Province, Indonesia

Mohamad Jajuli, Universitas Singaperbangsa Karawang

Faculty of Computer Science, Universitas Singaperbangsa Karawang, Karawang Regency, West Java Province, Indonesia
