Prediction of healthcare insurance costs

Shoroog ALBALAWI; Lama ALSHAHRANI; Nouf ALBALAWI; Rawan ALHARBI; A'aeshah ALHAKAMY

Pages : 9-18

Publication Date : 2023-06-30

Article Type : Research Paper

Abstract : Machine learning (ML) is one of the computational intelligence fields that can offer diverse solutions. Predicting medical insurance costs with ML methods is still a problem that must be investigated and improved in the healthcare industry. Two approaches are presented in this study: the first uses computational intelligence to predict healthcare insurance costs with ML algorithms, and the second uses Spark, a big data tool. In the first approach, the algorithms are the well-known linear regression and polynomial regression, applied to the features of the input data. Linear regression models the relationship between a dependent variable and one or more independent variables as a linear function, whereas polynomial regression models this relationship with an nth-degree polynomial. In this work, we use a dataset from the Kaggle repository to analyze regression models that can predict the cost of medical insurance. The data are described by essential features such as age, sex, BMI, region, number of children, and smoking status, together with the insurance charges. The results show that the polynomial regression model performs much better than the linear regression model and fits the data closely with respect to the target. This is because the task is non-linear, which makes it hard for a linear model to predict the output as desired. In the second approach, the Spark workflow was built in a Jupyter notebook through interfacing tools, so the code is very similar to ordinary Python ML code; the Spark cell can also be closed and conventional ML coding resumed in the same notebook. For this approach, the results show that the gradient-boosted tree regression model (R² = 0.9067) performs much better than multivariate regression and random forest regression. This is attributed to its sequential technique, in which each new tree is fitted to the errors of the trees built before it.
Keywords : Apache Spark, Cost prediction, ML algorithms, Medical insurance
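
The abstract contrasts linear regression, where charges are modeled as a weighted sum of the features, with polynomial regression, where a feature enters as a degree-n polynomial (charges ≈ b0 + b1·x + b2·x² + … + bn·xⁿ). The following is a minimal NumPy sketch of that comparison, not the authors' code. It assumes a hypothetical synthetic dataset in place of the Kaggle data: a single age feature with a deliberately non-linear effect on charges. Both models are fitted by ordinary least squares and scored with R² on held-out rows. The helper names (poly_features, fit, predict, r2_score) are illustrative only.

```python
import numpy as np


def poly_features(x: np.ndarray, degree: int) -> np.ndarray:
    # Columns [1, x, x^2, ..., x^degree]; degree=1 is ordinary linear regression.
    return np.vander(x, N=degree + 1, increasing=True)


def fit(x: np.ndarray, y: np.ndarray, degree: int) -> np.ndarray:
    # Ordinary least squares on the polynomial design matrix.
    coef, *_ = np.linalg.lstsq(poly_features(x, degree), y, rcond=None)
    return coef


def predict(coef: np.ndarray, x: np.ndarray) -> np.ndarray:
    # The polynomial degree is implied by the number of coefficients.
    return poly_features(x, len(coef) - 1) @ coef


def r2_score(y: np.ndarray, y_hat: np.ndarray) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical synthetic data (NOT the Kaggle insurance set): one "age"
# feature whose effect on "charges" is deliberately non-linear (cubic).
rng = np.random.default_rng(42)
age = rng.uniform(18, 64, size=500)
charges = 2000 + 0.4 * (age - 18) ** 3 + rng.normal(0, 2000, size=age.size)

# Hold out the last 150 rows so the comparison is not purely in-sample.
x_tr, x_te = age[:350], age[350:]
y_tr, y_te = charges[:350], charges[350:]

r2_linear = r2_score(y_te, predict(fit(x_tr, y_tr, degree=1), x_te))
r2_poly = r2_score(y_te, predict(fit(x_tr, y_tr, degree=3), x_te))
print(f"linear R^2 = {r2_linear:.3f}, cubic polynomial R^2 = {r2_poly:.3f}")
```

Because the degree-1 design matrix is a subset of the degree-3 one, the polynomial model can only match or beat the linear model on the training rows. Scoring on held-out rows is therefore the fairer check that the extra terms capture real curvature rather than noise.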
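
The abstract attributes the advantage of the gradient-boosted tree model to its sequential technique. The sketch below illustrates that idea for squared-error loss. Each new tree is fitted to the residuals left by the ensemble so far, and its prediction is added with a small learning rate. It is written from scratch in NumPy with depth-1 trees (stumps) so that it runs without a Spark installation. It is neither Spark's GBTRegressor (in pyspark.ml.regression) nor the authors' pipeline. The data, the function names, and the settings n_rounds=200 and learning_rate=0.1 are hypothetical.

```python
import numpy as np


def fit_stump(X: np.ndarray, r: np.ndarray):
    """Depth-1 regression tree: the single split that best fits residuals r."""
    best = None  # (sse, feature, threshold, left_value, right_value)
    for j in range(X.shape[1]):
        # Candidate thresholds at a few quantiles keep the search cheap.
        for t in np.unique(np.quantile(X[:, j], np.linspace(0.05, 0.95, 19))):
            left = X[:, j] <= t
            if left.all() or not left.any():
                continue  # a split must put points on both sides
            lv, rv = r[left].mean(), r[~left].mean()
            sse = np.sum((r[left] - lv) ** 2) + np.sum((r[~left] - rv) ** 2)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]  # (feature, threshold, left_value, right_value)


def stump_predict(stump, X: np.ndarray) -> np.ndarray:
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)


def gradient_boost(X: np.ndarray, y: np.ndarray, n_rounds=200, learning_rate=0.1):
    """Least-squares gradient boosting: trees are added one after another."""
    base = float(y.mean())
    pred = np.full(y.shape, base)
    stumps, train_mse = [], []
    for _ in range(n_rounds):
        # Negative gradient of the loss 0.5 * (y - pred)**2 is the residual.
        residuals = y - pred
        stump = fit_stump(X, residuals)
        pred = pred + learning_rate * stump_predict(stump, X)
        stumps.append(stump)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return {"base": base, "stumps": stumps,
            "learning_rate": learning_rate, "train_mse": train_mse}


def boost_predict(model, X: np.ndarray) -> np.ndarray:
    # Sum of the base value and every shrunken stump prediction.
    return model["base"] + model["learning_rate"] * sum(
        stump_predict(s, X) for s in model["stumps"])


# Hypothetical synthetic data (NOT the Kaggle insurance set).
rng = np.random.default_rng(0)
n = 400
age = rng.uniform(18, 64, size=n)
smoker = rng.integers(0, 2, size=n).astype(float)
charges = 3000 + 250 * age + 20000 * smoker + rng.normal(0, 1500, size=n)
X = np.column_stack([age, smoker])

model = gradient_boost(X, charges, n_rounds=200, learning_rate=0.1)
pred = boost_predict(model, X)
r2 = 1 - np.sum((charges - pred) ** 2) / np.sum((charges - charges.mean()) ** 2)
print(f"training R^2 after {len(model['stumps'])} rounds: {r2:.3f}")
```

With squared-error loss, each added stump never increases the training error, which the recorded train_mse list makes visible. In practice, the number of rounds and the tree depth would be tuned on held-out data to avoid overfitting.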

ORIGINAL ARTICLE URL