Diabetes Prediction
In this study, our machine learning model is applied to the Diabetes dataset to predict an individual's risk of the disease. The process is end-to-end: people enter their details in the web application and submit the data, the data is processed in real time, and the risk is predicted within a few seconds.
The web application is backed by a cloud-native real-time database. The trained parameters of the model are stored in this database, and prediction is done in real time.
Further, the user is also notified of the accuracy of the model. In addition, news articles from trusted sources are shared in the app in real time.
As in all disease prediction models, patient data is preprocessed first. The second step is to define the prediction model. Many parameters and hyperparameters must be set when defining the model. These settings have a significant effect on accuracy and can also prevent underfitting and overfitting of the prediction model. The third step is to fit the data to the model, and the fourth step is to verify the model's accuracy.
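The four steps above can be sketched in a few lines of plain Python. This is only an illustration, not the implementation behind the service: the scaling method, the gradient-descent fit, and the hyperparameter values (`learning_rate`, `epochs`) are assumptions, and the data is limited to the five sample rows of the dataset (pregnancies, blood pressure, BMI, age).

```python
import math

# Five sample rows from the dataset: pregnancies, blood pressure, BMI, age.
RAW_X = [
    [6, 72, 33.6, 50],
    [1, 66, 26.6, 31],
    [8, 64, 23.3, 32],
    [1, 66, 28.1, 21],
    [0, 40, 43.1, 33],
]
LABELS = [1, 0, 1, 0, 1]  # diabetes outcome for each row


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Step 1 (assumed preprocessing): scale each column to zero mean, unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fit_logistic(X, y, learning_rate=0.1, epochs=500):
    """Steps 2 and 3: set the hyperparameters, then fit by batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias) - yi
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(X, y, weights, bias):
    """Step 4: fraction of rows whose predicted class matches the label."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
        hits += int((1 if p >= 0.5 else 0) == yi)
    return hits / len(y)


X = standardize(RAW_X)
weights, bias = fit_logistic(X, LABELS, learning_rate=0.1, epochs=500)
score = accuracy(X, LABELS, weights, bias)
print(f"training accuracy: {score:.2f}")
```

The score here is measured on the same rows used for fitting, which only demonstrates the mechanics. A meaningful check of accuracy needs data held out from training.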
The main aspects of the service are as follows.
The first part is about preparing and preprocessing the data. This part discusses how the features relate to each other and how some features are eliminated from the process.
The dataset has 768 data points. Out of the features listed in the table, the features used include “pregnancies,” “blood pressure,” “BMI,” and “age.”
The aim of the case study is not only to build a prediction model using artificial intelligence but also to make it practical to use such models in real time.
There were some outliers, and they are not included. Features such as skin thickness and diabetes pedigree function cannot be determined by an ordinary person at home. For instance, the diabetes pedigree function is a complex function calculated from various factors, including parents, siblings, half-aunts, and half-uncles. The dataset contains demographic properties of the individual as well as habits.
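The Diabetes dataset used in this study has the features shown below.

| Pregnancies | Glucose | Blood pressure | Skin thickness | Insulin | BMI | Pedigree | Age | Outcome |
|---|---|---|---|---|---|---|---|---|
| 6 | 148 | 72 | 35 | 0 | 33.6 | 0.63 | 50 | 1 |
| 1 | 85 | 66 | 29 | 0 | 26.6 | 0.35 | 31 | 0 |
| 8 | 183 | 64 | 0 | 0 | 23.3 | 0.67 | 32 | 1 |
| 1 | 89 | 66 | 23 | 94 | 28.1 | 0.17 | 21 | 0 |
| 0 | 137 | 40 | 35 | 168 | 43.1 | 2.29 | 33 | 1 |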
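The preprocessing described earlier (dropping outliers and keeping only the features a person can report at home) might look like the following sketch on the sample rows above. The outlier rule is an assumption made for illustration, since the study does not state its criterion: a blood pressure or BMI of zero is treated as implausible. The names `COLUMNS`, `USED_FEATURES`, `is_plausible`, and `preprocess` are hypothetical.

```python
COLUMNS = ["pregnancies", "glucose", "blood_pressure", "skin_thickness",
           "insulin", "bmi", "pedigree", "age", "outcome"]

# The five sample rows of the dataset, in the column order given above.
SAMPLE_ROWS = [
    [6, 148, 72, 35, 0, 33.6, 0.63, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.35, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.67, 32, 1],
    [1, 89, 66, 23, 94, 28.1, 0.17, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 2.29, 33, 1],
]

# Features the study reports using for the prediction model.
USED_FEATURES = ["pregnancies", "blood_pressure", "bmi", "age"]


def is_plausible(record):
    # Assumed outlier rule: zero blood pressure or zero BMI cannot be a real reading.
    return record["blood_pressure"] > 0 and record["bmi"] > 0


def preprocess(rows):
    """Drop implausible rows, then keep only the used features and the label."""
    records = [dict(zip(COLUMNS, row)) for row in rows]
    kept = [r for r in records if is_plausible(r)]
    features = [[r[name] for name in USED_FEATURES] for r in kept]
    labels = [r["outcome"] for r in kept]
    return features, labels


features, labels = preprocess(SAMPLE_ROWS)
print(features[0], labels[0])
```

All five sample rows pass the assumed rule, so none are dropped here. On the full 768-row dataset the same filter would remove any row with a missing (zero) reading in those two columns.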
A heat map was drawn to determine the importance of each feature. According to the heat map, pregnancies, glucose, BMI, and age have the highest impact (greater than 0.2) on predicting diabetes. Of these, glucose is left out so that the model remains practical to use.
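As a rough sketch of how such a ranking can be computed, the code below assumes the heat map shows Pearson correlation between each feature and the outcome, which the study does not state explicitly. It applies the 0.2 threshold to the absolute value of the correlation, another assumption. Only three columns from the five sample rows are used, so the scores will not match those of the full dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, outcome, threshold=0.2):
    """Score each feature against the outcome and keep those above the threshold."""
    scores = {name: pearson(values, outcome) for name, values in columns.items()}
    selected = [name for name, r in scores.items() if abs(r) > threshold]
    return scores, selected


# Three columns taken from the five sample rows of the dataset.
columns = {
    "glucose": [148, 85, 183, 89, 137],
    "blood_pressure": [72, 66, 64, 66, 40],
    "age": [50, 31, 32, 21, 33],
}
outcome = [1, 0, 1, 0, 1]

scores, selected = select_features(columns, outcome)
for name, r in scores.items():
    print(f"{name}: {r:+.2f}")
print("selected:", selected)
```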
The features of the diabetes dataset are described below.

| Feature | Description |
|---|---|
| Pregnancies | Number of times pregnant |
| Glucose | Plasma glucose concentration at 2 hours |
| Blood pressure | Diastolic blood pressure (mm Hg) |
| Skin thickness | Triceps skin fold thickness (mm) |
| Insulin | 2-hour serum insulin (mu U/ml) |
| BMI | Body mass index (weight in kg/(height in m)^2) |
| Diabetes pedigree function | Diabetes pedigree function |
| Age | Age (years) |
| Outcome | Binary |
After cleaning and analyzing the dataset, machine learning models were applied. Logistic regression is the model used for this dataset. To make predictions, the coefficients and intercepts of all three logistic regression models are stored in a cloud-native real-time database.
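Storing only the coefficients and intercept is enough because a logistic regression prediction is a weighted sum passed through the sigmoid function. The sketch below shows that computation for a single model. The dictionary `STORED_MODEL` merely stands in for a database record, and its numbers are made up for illustration; they are not the trained parameters of the service.

```python
import math

# Stand-in for a record fetched from the database.
# The coefficient and intercept values are hypothetical, not the trained ones.
STORED_MODEL = {
    "features": ["pregnancies", "blood_pressure", "bmi", "age"],
    "coefficients": [0.12, -0.01, 0.09, 0.03],
    "intercept": -5.0,
}


def predict_risk(model, answers):
    """Return the predicted probability of diabetes for one set of answers."""
    z = model["intercept"] + sum(
        coef * answers[name]
        for coef, name in zip(model["coefficients"], model["features"])
    )
    # Sigmoid, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Answers a user might submit through the web form (example values).
answers = {"pregnancies": 1, "blood_pressure": 66, "bmi": 26.6, "age": 31}
risk = predict_risk(STORED_MODEL, answers)
print(f"predicted risk: {risk:.1%}")
```

Because the prediction is a handful of multiplications and one exponential, it can run on every form submission without loading a model file, which fits the real-time design described above.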
Our model outperforms the competing machine learning models in terms of accuracy and F-measure by 1.8274% and 1.7264, respectively, on the diabetes dataset.
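For reference, the two metrics can be computed from true and predicted labels as sketched below. The sketch assumes that F-measure here means the F1 score (the harmonic mean of precision and recall), and the label lists are made-up examples rather than results from the study.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = diabetic, 0 = not)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1


# Made-up labels for illustration: one diabetic case is missed.
y_true = [1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 0, 1]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"accuracy={acc:.2f}, F1={f1:.2f}")
```

In a screening setting the F-measure is worth reporting next to accuracy because it reflects missed positive cases, which accuracy alone can hide when most people in the data are not diabetic.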
This case study provides insights into using machine learning models to predict an individual's risk of diabetes from answers to a few questions about factors such as number of pregnancies, age, BMI, and blood pressure. Logistic regression is used for prediction.
The findings of this diagnosis service can be helpful in the early screening of potential diabetes patients, since the first screening can be performed in the comfort of home. If a high risk of disease is predicted for a patient, it can be followed by clinical tests for confirmation.