  • An Overview of the Heart Disease Prediction Using Machine Learning and its Application

  • Computer Science and Engineering, GRD IMT Dehradun

Abstract

The Heart Disease Prediction System is an application designed to predict the presence of heart disease in individuals based on critical medical data. This web-based solution, implemented using Python and Streamlit, combines data science and machine learning techniques to offer an intuitive and interactive interface for healthcare professionals, researchers, and students. By leveraging supervised learning algorithms such as K-Nearest Neighbors (KNN), Decision Trees, Random Forests, and Support Vector Machines (SVM), the application facilitates accurate predictions while enabling exploratory data analysis. Key functionalities include detailed data visualization, advanced feature engineering, and model selection, all of which aim to improve the interpretability and predictive power of the system. The project highlights the potential of machine learning in addressing critical health challenges, providing an accessible and effective tool for disease prediction.

Keywords

Overview, Heart Disease Prediction, Machine Learning, its Application

Introduction

Cardiovascular diseases (CVDs) are among the most prevalent health conditions worldwide and a leading cause of morbidity and mortality. According to the World Health Organization (WHO), CVDs account for approximately 17.7 million deaths annually, representing 31% of all global deaths, and early detection and intervention are crucial in mitigating risks and improving patient outcomes. Advances in data science and machine learning have paved the way for predictive systems that can assist healthcare providers in diagnosing and managing such conditions more effectively.

The heart is a vital organ responsible for pumping blood throughout the body, which all other organs depend on to function. If the heart stops working correctly, the brain and other organs are deprived of blood and cease functioning, which can lead to death within minutes. Changes in lifestyle, work-related stress, and unhealthy dietary habits have contributed significantly to the global rise in heart-related diseases.

In India, CVDs account for an estimated 30% to 42% of deaths. The Global Burden of Disease study estimates the age-standardized CVD death rate in India at 272 per 100,000 people, higher than the global average of 235 per 100,000. Heart-related diseases have become the primary cause of death in India, with 1.7 million fatalities reported in 2016, as per the 2016 Global Burden of Disease Report. The economic impact is equally severe; between 2005 and 2015, India is estimated to have

LITERATURE SURVEY

Chala Beyene et al. [1] proposed a framework for the prediction and analysis of heart disease occurrence using data mining techniques. The primary goal of their methodology is early, automated diagnosis of heart disease with results delivered quickly, which is particularly beneficial for healthcare organizations with limited specialist expertise. The system uses medical attributes such as blood sugar level, heart rate, age, and sex to determine whether an individual is at risk of heart disease, with the aim of improving prediction accuracy and supporting timely medical intervention.

Senthil Kumar Mohan et al. [2] proposed a hybrid machine learning approach for predicting heart disease using the Cleveland dataset. Their method begins with a data pre-processing step in which tuples with missing values are removed and attributes such as age and sex are excluded, as the authors deemed them personal information that is irrelevant to prediction accuracy. The remaining 11 attributes, which hold significant clinical relevance, were retained for analysis. The authors introduced a Hybrid Random Forest Linear Method (HRFLM), which combines Random Forest (RF) and Linear Method (LM). The HRFLM framework comprises four main algorithms:

  1. Dataset Partitioning: A decision tree partitions the input dataset into feature-specific leaf nodes.
  2. Rule Application: Classification rules are applied to the partitioned dataset, resulting in labeled outputs.
  3. Feature Extraction: Using a Less Error Classifier, the

METHODOLOGY

The Heart Disease Prediction System is structured into five main modules: data structure analysis, data visualization, feature engineering, model building, and prediction. Each module is designed to address a specific aspect of the data science pipeline, ensuring a comprehensive approach to data exploration and model development.
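The model building module can be illustrated with a minimal sketch that trains the four classifier families named in the abstract and compares their test accuracy. This is not the authors' implementation: the data below is synthetic (generated with scikit-learn's `make_classification`), and the sample count, feature count, hyperparameters, and the helper name `evaluate_models` are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): NOT the real heart disease dataset.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=42
)

# Hold out 20% of the rows for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# KNN and SVM are distance-based, so they are paired with feature scaling.
# All hyperparameters here are illustrative, not tuned.
models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Decision Tree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}


def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


scores = evaluate_models(models, X_train, X_test, y_train, y_test)
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Accuracy on a single split is only a rough basis for model selection; on a dataset of a few hundred rows, cross-validation would usually give a more stable comparison.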

Data Loading and Exploration

The first step in the methodology is to load the heart disease dataset and perform an initial exploration to understand its structure, identify missing values, and recognize patterns. For this project, we use the Heart Disease dataset, which contains 303 instances, each representing a patient's medical record. Each record includes 14 attributes describing the patient's medical condition, such as age, sex, blood pressure, cholesterol level, and the target variable, which indicates the presence or absence of heart disease. The dataset is loaded using Pandas and cached to improve performance. Users can then explore its shape, column names, data types, and summary statistics, which provides a foundational understanding of the data.
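The loading and exploration step might look like the following sketch. The inline CSV, its column names, and the helper names `load_data` and `summarize` are hypothetical stand-ins so the example runs without the real data file. In a Streamlit app, a loader like this would typically be wrapped with `st.cache_data` to obtain the caching described above; that decorator is left out here so the sketch does not depend on Streamlit.

```python
import io

import pandas as pd

# Hypothetical miniature CSV (assumption): the column names and values are
# illustrative and do not reproduce the actual dataset.
SAMPLE_CSV = """age,sex,trestbps,chol,target
63,1,145,233,1
37,1,130,250,1
41,0,130,204,1
56,1,120,,0
57,0,140,192,0
"""


def load_data(source):
    """Read a CSV file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


def summarize(df):
    """Collect the basic structural facts shown in the exploration step."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),  # missing values per column
        "stats": df.describe(),                # summary statistics
    }


df = load_data(io.StringIO(SAMPLE_CSV))
summary = summarize(df)

print("Shape:", summary["shape"])
print("Columns:", summary["columns"])
print("Missing values:", summary["missing"])
print(summary["stats"])
```

With the real dataset, `load_data` would be called with the path to the CSV file instead of the in-memory buffer.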

a. Loading the Data

The dataset is loaded from a CSV file using Pandas, a Python library that provides an efficient way to read and manipulate structured data. The dataset is loaded into a DataFrame for easy

Reference

  1. World Health Organization. "Cardiovascular diseases (CVDs)." Retrieved from: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(cvds)
  2. Scikit-learn Documentation. Retrieved from: https://scikit-learn.org/stable/documentation.html
  3. Streamlit Official Documentation. Retrieved from: https://docs.streamlit.io/
  4. Kaggle Dataset Repository. Retrieved from: https://www.kaggle.com/datasets
  5. Matplotlib Documentation. Retrieved from: https://matplotlib.org/stable/contents.html
  6. Seaborn Visualization Guide. Retrieved from: https://seaborn.pydata.org/
  7. UCI Machine Learning Repository: Heart Disease Dataset. Retrieved from: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
  8. Python Official Documentation. Retrieved from: https://docs.python.org/3/
  9. NumPy Official Guide. Retrieved from: https://numpy.org/doc/
  10. Research Article: "Predicting Heart Disease Using Machine Learning Algorithms," Journal of Health Informatics.

Himanshu Kothari
Corresponding author

Computer Science and Engineering, GRD IMT Dehradun

Suman Rani
Co-author

Computer Science and Engineering, GRD IMT Dehradun

Himanshu Kothari*, Suman Rani, An Overview of the Heart Disease Prediction Using Machine Learning and its Application, Int. J. Sci. R. Tech., 2025, 2 (6), 560-565. https://doi.org/10.5281/zenodo.15715476
