anaplatform Data Consultancy :: Big Data Consultancy

Diabetes Prediction

Case Study - Diabetes Prediction

In this study, our machine learning model is applied to a diabetes dataset to predict an individual's risk of the disease. The process is end-to-end: people enter their details in the web application and submit the data, the data is processed in real time, and the risk is predicted within a few seconds.

The web application is backed by a cloud-native real-time database. The trained parameters of the model are stored in this database, and prediction is done in real time.

The user is also shown the accuracy of the model. In addition, news articles from trusted sources are shared in the app in real time.

As in all disease prediction models, the patient data is preprocessed first. The second step is to define the prediction model. Many parameters and hyperparameters must be set when defining the model; these have a very significant effect on accuracy and can also prevent underfitting and overfitting. The third step is to fit the data to the model, and the fourth step is to verify the model's accuracy.
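The four steps above can be sketched in plain Python. This is a minimal illustration, not the study's implementation: the data is synthetic, the labelling rule is invented for the demo, and the hyperparameter values (`lr`, `epochs`, `l2`) and all function names are assumptions.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_scaler(rows):
    """Step 1 (preprocess): per-feature mean and standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds

def scale(rows, means, stds):
    """Standardize rows with previously fitted means and deviations."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit(X, y, lr=0.1, epochs=500, l2=0.01):
    """Steps 2 and 3: define the model through its hyperparameters
    (learning rate, epochs, L2 penalty) and fit it by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        # The L2 term shrinks the weights, which limits overfitting.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(X, y, w, b):
    """Step 4: fraction of correct predictions at a 0.5 threshold."""
    hits = sum((predict_proba(xi, w, b) >= 0.5) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(y)

# Synthetic stand-in data (NOT the study's dataset): pregnancies,
# blood pressure, BMI, age, with an invented labelling rule.
rng = random.Random(0)
rows, labels = [], []
for _ in range(200):
    preg, bp = rng.randint(0, 10), rng.gauss(70, 10)
    bmi, age = rng.gauss(30, 6), rng.uniform(21, 70)
    rows.append([preg, bp, bmi, age])
    labels.append(1 if (bmi - 30) / 6 + (age - 45) / 14 > 0 else 0)

train_X, test_X = rows[:150], rows[150:]
train_y, test_y = labels[:150], labels[150:]
means, stds = fit_scaler(train_X)          # step 1
w, b = fit(scale(train_X, means, stds), train_y)  # steps 2 and 3
test_acc = accuracy(scale(test_X, means, stds), test_y, w, b)  # step 4
print(f"held-out accuracy: {test_acc:.2f}")
```

The scaler is fitted on the training rows only and then reused on the held-out rows, so the accuracy check in step 4 is not influenced by the data it is measured on.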

How we predict diabetes

The main aspects of the service are as follows:

An efficient automated disease diagnosis model is designed using machine learning models.
A critical disease, diabetes, is selected.
In the proposed service, the data is entered into a web app. The analysis is then performed against a real-time database using a pretrained machine learning model that was trained on the same dataset and deployed in the cloud. Finally, the disease detection result is shown in the Android app.
Logistic regression is used to carry out the computation for prediction.

Data

The first part is about preparing and preprocessing the data. This part discusses how the features relate to each other and why some features are eliminated from the process.

Diabetes Data

The dataset has 768 data points. Of the features listed in the table, the ones used include “pregnancies,” “blood pressure,” “BMI,” and “age.”

The aim of the case study is not only to build a prediction model using artificial intelligence but also to make the model practical to use in real time.

There were some outliers, and they are not included. Some features, including skin thickness and the diabetes pedigree function, are not possible for an ordinary person to determine at home. For instance, the diabetes pedigree function is a complex function calculated from various factors including parents, siblings, half-aunts, and half-uncles.

The dataset contains demographic properties of the individual as well as habits. Its features and a few sample rows are shown below.

Pregnancies  Glucose  Blood pressure  Skin thickness  Insulin  BMI   Pedigree  Age  Outcome
-            148      -               -               -        33.6  0.63      -    -
-            -        -               -               -        26.6  0.35      -    -
-            183      -               -               -        23.3  0.67      -    -
-            -        -               -               -        28.1  0.17      -    -
-            137      -               -               168      43.1  2.29      -    -

A heat map has been drawn to determine the importance of each feature. According to the heat map, pregnancies, glucose, BMI, and age have the highest impact (greater than 0.2) on predicting diabetes. Of these, glucose is not used, so that the model remains useful in practice.
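The selection rule described above can be sketched as follows, assuming the heat map shows Pearson correlations between each feature and the outcome (the source does not name the measure). The column values are made-up toy numbers, and `pearson` and `select_features` are hypothetical helper names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, outcome, threshold=0.2, exclude=()):
    """Keep features whose |correlation with the outcome| exceeds the
    threshold, minus any that are excluded for practical reasons."""
    corr = {name: pearson(vals, outcome) for name, vals in columns.items()}
    kept = [name for name, r in corr.items()
            if abs(r) > threshold and name not in exclude]
    return corr, kept

# Toy values for illustration only; these are not rows from the dataset.
outcome = [0, 0, 1, 1]
columns = {
    "glucose": [90, 100, 150, 160],
    "bmi": [20, 25, 30, 35],
    "skin_thickness": [30, 20, 20, 30],
}

# Glucose passes the threshold but is excluded because it cannot
# easily be measured by the user.
corr, kept = select_features(columns, outcome, exclude=("glucose",))
for name, r in corr.items():
    print(f"{name}: {r:+.2f}")
print("kept:", kept)
```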

Features in Diabetes Dataset

The dataset collected for diabetes has the features as shown below.

Feature         Description
Pregnancies     Number of times pregnant
Glucose         Plasma glucose concentration 2 hours
Blood pressure  Diastolic blood pressure (mm Hg)
Skin thickness  Triceps skin fold thickness (mm)
Insulin         2-hour serum insulin (mu U/ml)
BMI             Body mass index (weight in kg/(height in m)^2)
Pedigree        Diabetes pedigree function
Age             Age (years)
Outcome         Binary

Implementation

Implementation for Forecasting

After cleaning and analyzing the dataset, machine learning models were applied. A logistic regression model is used for the dataset. To make predictions, the coefficients and intercepts of all three logistic regression models are stored in a cloud-native real-time database.
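Once the coefficients and intercept are stored, a prediction needs only a weighted sum and the logistic function. The sketch below shows that computation under stated assumptions: the record layout, the feature names, and every numeric value in `STORED_MODEL` are placeholders, not the trained parameters from the study, and the database lookup is replaced by a plain dictionary.

```python
import math

# Hypothetical record, standing in for what the real-time database
# might return. All values are placeholders for illustration.
STORED_MODEL = {
    "features": ["pregnancies", "blood_pressure", "bmi", "age"],
    "coefficients": [0.1, 0.0, 0.08, 0.03],
    "intercept": -5.0,
}

def predict_risk(model, answers):
    """Return the logistic-regression probability for one user.

    z = intercept + sum(coefficient_i * answer_i); risk = 1 / (1 + e^-z)
    """
    z = model["intercept"] + sum(
        coef * answers[name]
        for name, coef in zip(model["features"], model["coefficients"])
    )
    return 1.0 / (1.0 + math.exp(-z))

# Example answers as a user might submit them through the web form.
answers = {"pregnancies": 2, "blood_pressure": 70, "bmi": 30.0, "age": 40}
print(f"predicted risk: {predict_risk(STORED_MODEL, answers):.3f}")
```

Because only a handful of numbers are read and combined, the prediction itself is cheap, which is consistent with the few-second response time described earlier.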

Results

The figure below shows an example of a prediction in our web app. A comparative analysis found that the proposed model outperforms the competing models on various performance measures.

Accuracy analysis on diabetes dataset

Our model outperforms the competing machine learning models in terms of accuracy and F-measure by 1.8274% and 1.7264, respectively, on the diabetes dataset.
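For reference, the two measures quoted above can be computed from true and predicted labels as sketched here. The F-measure is assumed to be the standard F1 score (harmonic mean of precision and recall), which the source does not specify, and the label lists are made up for the example; they are not the study's results.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(y_true, y_pred):
    """F1 score: harmonic mean of precision and recall."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print("accuracy:", accuracy(y_true, y_pred))
print("F-measure:", f_measure(y_true, y_pred))
```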

Conclusion

This case study provides insights into using machine learning models to predict the risk of diabetes in an individual based on answers to a few questions about factors such as number of pregnancies, age, BMI, and blood pressure. Logistic regression is used for prediction.

The findings from this diagnosis service can be helpful in the early screening of potential diabetes patients, since the first screening can be performed in the comfort of home. If a high risk of disease is predicted for a patient, it can be followed by clinical tests for confirmation.
