Predicting Medical Insurance Claims Using Machine Learning: A Data-Driven Approach for Improved Risk Assessment

Abstract:
This study examines the prediction of health insurance claims from demographic and health-related features, with the aim of supporting more precise and consistent risk evaluation in the insurance sector. The dataset, obtained from Kaggle, contains the following variables: age, BMI, blood pressure, diabetes status, smoking status, gender, number of children, and area. Preprocessing handled missing values, duplicates, and categorical inconsistencies and put the features on a comparable scale, using median/mode imputation, categorical standardization, label and one-hot encoding, and robust scaling. Four regression models were built and assessed using both a train-test split and K-Fold Cross Validation: Linear Regression, Decision Tree Regression, Random Forest Regression, and K-Nearest Neighbors. Model performance was evaluated using R-squared (R2), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). The Random Forest model showed the highest predictive accuracy and consistency across all metrics, outperforming the other models in both evaluation settings. The results highlight the effectiveness of ensemble methods at capturing complex relationships in healthcare insurance datasets and indicate the importance of clean, well-processed data for predictive performance. Suggested improvements to dataset quality include oversampling or stratified sampling to address data imbalance, and more sophisticated imputation methods such as KNN Imputer and Iterative Imputer. This work shows how machine learning can support insurance companies in setting fair premiums and making informed policy decisions through a data-driven approach.
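
For readers less familiar with the four evaluation metrics named in the abstract, the following is a minimal sketch of how they can be computed from paired actual and predicted claim amounts. The function name `regression_metrics` and the sample values are illustrative assumptions and are not taken from the study; the formulas are the standard definitions of MSE, RMSE, MAE, and R2.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R^2 for two equal-length sequences of numbers.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # Mean Squared Error: average of the squared residuals.
    mse = sum(e * e for e in errors) / n
    # Mean Absolute Error: average of the absolute residuals.
    mae = sum(abs(e) for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is measured around the mean target.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = mse * n
    # R^2 is undefined when every target is identical; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}


# Made-up values purely for illustration (not data from the study).
claims = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
metrics = regression_metrics(claims, predicted)
print(metrics)
```

MSE and RMSE penalize large errors more heavily than MAE, while R2 expresses the share of variance in the claim amounts that the predictions account for, which is presumably why the study reports all four side by side.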