- Artificial Intelligence Theory and Applications
- Volume:2 Issue:2
Diabetes Risk Prediction with Machine Learning Models
Authors : Gözde ÖZSEZER, Gülengül MERMER
Pages : 1-9
Publication Date : 2022-10-01
Article Type : Research Paper
Abstract : Diabetes mellitus (DM) is one of the most common chronic diseases worldwide and a major public health problem. The aim of this study is to predict DM risk with machine learning (ML) models using available data. This analytical study used the “Diabetes Health Indicators Dataset”, which consists of 253,680 records and 21 variables collected annually by the CDC. The open-access dataset was retrieved from Kaggle on March 5, 2022. Data analysis was performed in Python 3.0 using the numpy, pandas, matplotlib, seaborn, scikit-learn and imblearn libraries. During data pre-processing, outliers and missing data were removed. Five ML algorithms were used for predictive modeling: KNN, logistic regression, decision tree, random forest and naive Bayes. The predictive performance of the algorithms was evaluated with accuracy, precision, recall and F1 score. No permission was required because the data were open access. KNN achieved an accuracy of 0.74, precision 0.31, recall 0.55 and F1 score 0.39; logistic regression achieved an accuracy of 0.72, precision 0.33, recall 0.74 and F1 score 0.46; decision tree achieved an accuracy of 0.84, precision 0.54, recall 0.15 and F1 score 0.24; random forest achieved an accuracy of 0.84, precision 0.56, recall 0.16 and F1 score 0.25; naive Bayes achieved an accuracy of 0.84, precision 0.52, recall 0.19 and F1 score 0.28. In this study, ML algorithms were used for DM risk estimation. According to the experimental results, when the dataset is randomly divided into training (80%) and testing (20%) sets, the accuracy values of the random forest and decision tree algorithms are very close to each other (RF: 0.848, DT: 0.847). Therefore, it can be said that the two best algorithms for diabetes risk estimation are random forest and decision tree.
Keywords : diabetes, risk, prediction, machine learning, artificial intelligence
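The four evaluation metrics named in the abstract are related through the confusion matrix. The sketch below is not the authors' code. It shows the standard definitions and recomputes each F1 score from the precision and recall values reported in the abstract. The function names (`f1_score`, `metrics_from_counts`) and the confusion-matrix counts in the last lines are hypothetical and chosen only for illustration. They are not taken from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# (precision, recall, reported F1) as listed in the abstract.
REPORTED = {
    "KNN": (0.31, 0.55, 0.39),
    "Logistic regression": (0.33, 0.74, 0.46),
    "Decision tree": (0.54, 0.15, 0.24),
    "Random forest": (0.56, 0.16, 0.25),
    "Naive Bayes": (0.52, 0.19, 0.28),
}

for model, (precision, recall, reported_f1) in REPORTED.items():
    recomputed = f1_score(precision, recall)
    # Small differences are expected because the inputs are already rounded.
    print(f"{model}: recomputed F1 = {recomputed:.3f}, reported = {reported_f1:.2f}")

# Hypothetical counts (NOT from the study): 1000 test cases, 140 positives.
example = metrics_from_counts(tp=21, fp=18, fn=119, tn=842)
print({name: round(value, 3) for name, value in example.items()})
```

The recomputed F1 values agree with the reported ones to within rounding. The hypothetical counts illustrate how a classifier can combine a high accuracy with a low recall when the positive class is a small fraction of the data, which is the pattern the abstract reports for decision tree, random forest and naive Bayes.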
