Cardiovascular diseases (CVDs) are among the most prevalent health conditions worldwide and a leading cause of morbidity and mortality. The heart pumps blood throughout the body and so sustains every other organ; if it fails to operate correctly, the brain and other critical organs cease functioning, and death can follow within minutes. Changes in lifestyle, work-related stress, and unhealthy dietary habits have contributed significantly to the global rise of heart-related diseases. According to the World Health Organization (WHO), early detection and intervention are crucial in mitigating risks and improving patient outcomes. Advances in data science and machine learning have paved the way for predictive systems that can assist healthcare providers in diagnosing and managing such conditions more effectively.
According to the WHO, cardiovascular diseases account for approximately 17.7 million deaths annually, representing 31% of all global deaths. In India, CVDs account for a significant proportion of deaths, ranging from 30% to 42%. The Global Burden of Disease study estimates the age-standardized CVD death rate in India at 272 per 100,000 people, higher than the global average of 235 per 100,000. Heart-related diseases have become the primary cause of death in India, with 1.7 million fatalities reported in 2016, as per the 2016 Global Burden of Disease Report. The economic impact is equally severe; between 2005 and 2015, India is estimated to have
LITERATURE SURVEY
Chala Beyene et al. [1] proposed a framework for the prediction and analysis of heart disease occurrence using data mining techniques. The primary goal of their methodology is early, automated diagnosis of heart disease with fast turnaround of results, which is particularly beneficial for healthcare organizations with limited specialist expertise. The proposed system uses a variety of medical attributes, including blood sugar level, heart rate, age, and sex, to determine whether an individual is at risk of heart disease. By leveraging these attributes, the framework aims to improve prediction accuracy and support timely medical intervention.
Senthil Kumar Mohan et al. [2] proposed a hybrid machine learning approach for predicting heart disease using the Cleveland dataset. Their method begins with a data pre-processing step in which tuples with missing values are removed and the attributes age and sex are excluded, as the authors deemed them personal information that is irrelevant to prediction accuracy. The remaining 11 attributes, which hold significant clinical relevance, were retained for analysis. The authors introduced the Hybrid Random Forest Linear Method (HRFLM), which combines Random Forest (RF) and a Linear Method (LM). The HRFLM framework comprises four main algorithms:
- Dataset Partitioning: A decision tree partitions the input dataset into feature-specific leaf nodes.
- Rule Application: Classification rules are applied to the partitioned dataset, resulting in labeled outputs.
- Feature Extraction: Using a Less Error Classifier, the
METHODOLOGY
The Heart Disease Prediction System is structured into five main modules: data structure analysis, data visualization, feature engineering, model building, and prediction. Each module is designed to address a specific aspect of the data science pipeline, ensuring a comprehensive approach to data exploration and model development.
Data Loading and Exploration
The application begins by loading the Heart Disease dataset, which contains 303 instances, each representing a patient's medical record. Each record includes 14 attributes that describe the patient's medical condition, such as age, sex, blood pressure, and cholesterol level, together with the target variable, which indicates the presence or absence of heart disease. The dataset is loaded using Pandas and cached to improve performance. Users can then explore the dataset's structure, including its shape, column names, data types, and summary statistics. This initial exploration provides a foundational understanding of the data and helps identify missing values and recognize patterns.
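The loading and exploration step described above can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: the three-row inline sample, its column names (which follow the commonly used Cleveland layout), the helper names `load_data` and `explore`, and the use of `functools.lru_cache` for caching are all assumptions made for the example. The real application would read the full 303-row CSV file and may cache the data differently.

```python
import io
from functools import lru_cache

import pandas as pd

# Hypothetical three-row sample standing in for the real CSV file.
# Column names are an assumption based on the common Cleveland layout.
SAMPLE_CSV = """age,sex,trestbps,chol,target
63,1,145,233,1
37,1,130,,1
56,0,120,236,0
"""


@lru_cache(maxsize=1)
def load_data() -> pd.DataFrame:
    """Parse the CSV once; later calls return the same cached DataFrame."""
    return pd.read_csv(io.StringIO(SAMPLE_CSV))


def explore(df: pd.DataFrame) -> dict:
    """Collect the structural facts listed in the text."""
    return {
        "shape": df.shape,                       # (rows, columns)
        "columns": list(df.columns),             # column names
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),    # missing values per column
        "summary": df.describe(),                # summary statistics
    }


report = explore(load_data())
print(report["shape"])
print(report["missing"])
```

Because the cached function returns the same DataFrame object on every call, any code that modifies the data should work on a copy (`load_data().copy()`) so that the cached version stays intact.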
a. Loading the Data
The dataset is loaded from a CSV file using the Pandas library, which provides an efficient way to read and manipulate structured data. The dataset is loaded into a DataFrame for easy
Himanshu Kothari*
Suman Rani
10.5281/zenodo.15715476