  • Explainable Chronic Kidney Disease Prediction Using LightGBM with Shap and Fuzzy Rule-Based System

  • Department of Electronics Communication and Engineering, Sri Venkateshwara University College of Engineering, Tirupati

Abstract

Chronic Kidney Disease (CKD) is a progressive condition that often goes unnoticed until its late stages, so timely and clearly explained identification is important for intervention. This paper presents an explainable machine learning system for predicting CKD stage that combines the Light Gradient Boosting Machine (LightGBM), the Synthetic Minority Oversampling Technique (SMOTE), SHAP-based explainability, and a fuzzy rule-based reasoning system. The data comprise biochemical, clinical, lifestyle, and urinalysis features related to CKD severity. A systematic preprocessing pipeline was built to handle missing data, encode nominal variables, normalise numerical variables, and correct the large class imbalance using SMOTE. LightGBM was chosen because it is efficient and can capture complex non-linear relationships in clinical data. Probabilities were calibrated using Platt scaling to improve clinical reliability. SHAP was included to provide global and local interpretability, making the basis of each prediction transparent. A fuzzy reasoning layer, which converts model outputs into intuitive linguistic rules, was added to improve clinical understanding. The experiments yield a weighted F1-score of 0.87-0.92, indicating high predictive ability. SHAP analysis identified GFR, creatinine, and BUN as important biomarkers. A graphical user interface was created to provide real-time predictions, SHAP visualisations, and personalised recommendations. The hybrid framework shows that a decision-support tool for CKD staging can be clinically viable, transparent, and accurate.

Keywords

CKD prediction, LightGBM, SHAP, Fuzzy logic, Explainable AI, SMOTE, Clinical decision support

Introduction

Chronic Kidney Disease (CKD) is one of the greatest global health challenges, affecting over 850 million people worldwide [1]. Clinically, CKD is characterised by a sustained decline in the kidneys' filtering ability, commonly assessed through the estimated glomerular filtration rate (eGFR), serum creatinine, and proteinuria [1, 2]. Because CKD is usually silent, most patients progress to an advanced stage before receiving adequate care, which exposes them to cardiovascular complications, hospitalisation, and even death [3]. Rising rates of diabetes, high blood pressure, and other lifestyle-related illnesses have contributed substantially to CKD in developing nations [3]. Conventional diagnosis relies on manual analysis of biochemical indicators, which becomes difficult when datasets contain multivariate relationships that human evaluation cannot easily uncover. Machine learning (ML) models have shown significant potential for CKD prediction because they can process complex clinical data and identify hidden patterns [4, 5]. Random Forest (RF), Logistic Regression (LR), Support Vector Machines (SVM), and boosting models such as XGBoost have demonstrated encouraging performance [4, 6]. Even so, most research has addressed binary classification (CKD versus non-CKD), which limits its clinical relevance for treatment planning because the severity of CKD depends strongly on its stage [6, 13]. Another important challenge is interpretability. ML models, especially ensemble and boosting models, are often regarded as black boxes whose inner workings are hard to understand. This limits their adoption in health systems, which demand transparent, auditable, and clinically interpretable decisions to safeguard patient safety and trust [9, 14].
SHAP (SHapley Additive exPlanations) addresses this problem by attributing to each feature a mathematically consistent contribution to the final prediction [9]. Nevertheless, numerical SHAP values may still not be intuitive to clinicians. Fuzzy logic, which mirrors human reasoning, is a natural way to translate model decisions into readable rules [10, 12]. The literature on CKD prediction has several research gaps. First, stage-wise classification of CKD is still scarce, with most studies limited to binary classification [13, 16]. Second, real-world CKD datasets show severe class imbalance, especially for the under-represented stages, and many studies do not handle it adequately [8]. Third, many ML models used to predict CKD offer little or no explainability [14]. Fourth, hybrid systems that combine ML, SHAP, and fuzzy reasoning are uncommon. Finally, deployable tools such as GUI-based decision-support systems are not available, which restricts application in clinical and screening settings [13]. The proposed study bridges these gaps by presenting a complete explainable model for CKD stage prediction that combines LightGBM and SMOTE with SHAP, fuzzy reasoning, and a graphical interface. The main contributions of this study are the following:

  • Development of a complete CKD stage prediction tool that covers all six stages (0-5).
  • Effective handling of class imbalance with SMOTE to improve performance on the minority classes.
  • Global and local explainability using SHAP.
  • Integration of fuzzy rules as a mechanism for clinical interpretability.
  • Creation of a GUI for real-time prediction, visualisation, and personalised recommendations.

Literature Review

Machine learning has been widely studied for CKD prediction. Early models classified the presence of CKD using Random Forest (RF), Support Vector Machines (SVM), and Logistic Regression (LR) on structured clinical data [4, 5]. RF predicted well because it is robust to noise and can model non-linear relationships, whereas SVM handled high-dimensional medical data effectively. XGBoost later improved predictive performance through gradient boosting, but remained hard to interpret because of its complex internal structure [6].

One of the most significant weaknesses of CKD datasets is extreme class imbalance: most records belong to the early stages and few to advanced CKD (stages 4-5). Such imbalance can bias model training and lead to poor generalisation on minority classes. The Synthetic Minority Oversampling Technique (SMOTE) has been shown to address this imbalance by creating synthetic samples in the neighbourhood of existing minority-class samples, thereby reducing classifier bias and improving recall [8]. Studies that include SMOTE have consistently reported higher F1-scores and sensitivity on under-represented CKD groups.

Machine learning must be interpretable to be adopted in healthcare. SHAP, introduced by Lundberg and Lee, is one of the most mathematically sound schemes for explaining model decisions; it computes the marginal contribution of each feature to the output [9]. SHAP has proved promising for explaining predictive models in diabetes, cardiovascular disease, and oncology, yet it is rarely used to explain CKD staging. Arvind et al. noted that SHAP could be appropriate in the clinical setting, particularly because its explanations can align with physician reasoning and regulatory requirements [14].
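To make the idea of a feature's marginal contribution concrete, the following is a minimal sketch of exact Shapley values computed by brute force over all feature coalitions. The scoring function `toy_risk`, its coefficients, and the patient and baseline values are hypothetical and have no clinical meaning; they are not taken from the study. The sketch also assumes that a feature missing from a coalition is replaced by its baseline value, which is only one of several possible conventions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for instance x relative to a baseline instance.

    A feature that is absent from a coalition takes its baseline value.
    The cost grows exponentially with the number of features, so this is
    only practical for toy examples.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_risk(features):
    """Hypothetical risk score over (GFR, creatinine, BUN); illustration only."""
    gfr, creatinine, bun = features
    return 0.01 * (100.0 - gfr) + 0.2 * creatinine + 0.002 * creatinine * bun


patient = [25.0, 3.0, 60.0]     # hypothetical patient values
reference = [90.0, 1.0, 15.0]   # hypothetical baseline values
contributions = shapley_values(toy_risk, patient, reference)
for name, value in zip(["GFR", "creatinine", "BUN"], contributions):
    print(f"{name}: {value:+.3f}")
```

The contributions add up to the difference between the patient's score and the baseline score, which is the additivity property that makes such attributions consistent. For tree ensembles, explanation libraries use much faster algorithms than this enumeration.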
Beyond numerical interpretability, fuzzy logic supports human-like reasoning through linguistic terms such as low GFR, moderately high creatinine, or high BUN. Fuzzy logic, originally introduced by Zadeh [10] and extended by Kosko [11], is widely used in clinical diagnostic systems because it handles uncertainty flexibly and is easy to interpret. Son et al. showed that fuzzy rule-based reasoning increases clinician trust and improves the usability of decision-support systems [12]. Recent systematic reviews of CKD prediction models highlight several gaps that remain unaddressed [13, 16]: a scarcity of research on stage-wise classification, inadequate handling of imbalanced datasets, little use of explainability methods such as SHAP, and the absence of solutions that link ML with fuzzy reasoning and user interfaces. Moreover, most models remain at the stage of academic research and do not become real-world clinical solutions because deployable GUI-based tools are lacking [13]. These observations point to the need for a holistic CKD prediction framework that:

  • predicts CKD stage by stage,
  • handles class imbalance,
  • explains its predictions transparently,
  • incorporates fuzzy reasoning for clinical interpretability, and
  • provides a GUI for real-time decision support.

All of these research gaps are addressed in the current study, which makes it a meaningful contribution to the CKD prediction literature.
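The linguistic terms mentioned in this review, such as low GFR and high creatinine, can be illustrated with a small fuzzy-rule sketch. The membership breakpoints (30 and 60 for GFR, 1.2 and 2.0 for creatinine), the rule itself, and the function names are assumptions chosen only for illustration; they are not the rule base of the proposed system and are not clinical thresholds. The rule uses the minimum of the two memberships as the fuzzy AND.

```python
def falling(x, full, zero):
    """Membership that is 1 at or below `full` and falls linearly to 0 at `zero`."""
    if x <= full:
        return 1.0
    if x >= zero:
        return 0.0
    return (zero - x) / (zero - full)


def rising(x, zero, full):
    """Membership that is 0 at or below `zero` and rises linearly to 1 at `full`."""
    if x <= zero:
        return 0.0
    if x >= full:
        return 1.0
    return (x - zero) / (full - zero)


def fuzzy_rule(gfr, creatinine):
    """Evaluate: IF GFR is low AND creatinine is high THEN concern is high."""
    low_gfr = falling(gfr, 30.0, 60.0)               # hypothetical breakpoints
    high_creatinine = rising(creatinine, 1.2, 2.0)   # hypothetical breakpoints
    strength = min(low_gfr, high_creatinine)         # fuzzy AND as minimum
    text = (f"IF GFR is low ({low_gfr:.2f}) AND creatinine is high "
            f"({high_creatinine:.2f}) THEN concern is high ({strength:.2f})")
    return {"low_gfr": low_gfr,
            "high_creatinine": high_creatinine,
            "rule_strength": strength,
            "text": text}


print(fuzzy_rule(45.0, 1.6)["text"])
```

A full rule base would hold many such rules and combine their strengths into a single output, but the single rule already shows how a numeric input becomes a graded, readable statement.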

Methodology

The proposed CKD prediction system combines data preprocessing, class balancing, machine learning classification, probability calibration, SHAP-based explainability, fuzzy rule-based reasoning, and deployment in a GUI. This multistage pipeline is designed to deliver high predictive accuracy, transparency, and clinical usability.
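As an illustration of the class-balancing step, the following is a minimal sketch of the interpolation at the core of SMOTE: a synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_sample`, its parameters, and the toy data are assumptions for illustration; the source does not state which SMOTE implementation or settings were used, and a library implementation would normally be preferred in practice.

```python
import random


def smote_sample(minority, k=3, n_new=5, seed=0):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    `minority` is a list of numeric feature vectors from one minority class
    and must contain at least two points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of `base`, excluding `base` itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical, already normalised two-feature minority samples.
stage_5_samples = [[0.10, 0.90], [0.15, 0.85], [0.20, 0.95], [0.12, 0.80]]
for point in smote_sample(stage_5_samples, k=2, n_new=3):
    print([round(v, 3) for v in point])
```

Because every synthetic point lies between two real minority samples, oversampling of this kind should be applied only to the training split, so that evaluation data contain real records only.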

References

  1. Kidney Disease: Improving Global Outcomes (KDIGO). KDIGO 2012 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney International Supplements, 2013.
  2. Levey, A.S., et al. Definition and classification of chronic kidney disease: A position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney International, 67, pp. 2089–2100, 2005.
  3. Jha, V., Garcia-Garcia, G., Iseki, K., Li, Z., et al. Chronic kidney disease: Global dimension and perspectives. The Lancet, 382(9888), pp. 260–272, 2013.
  4. Kshirsagar, N.T., et al. Machine learning models for chronic kidney disease prediction: A comparative study. IEEE Access, 9, pp. 12338–12348, 2021.
  5. Breiman, L. Random forests. Machine Learning, 45(1), pp. 5–32, 2001.
  6. Chen, T., Guestrin, C. XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD Conference, pp. 785–794, 2016.
  7. Ke, G., et al. LightGBM: A highly efficient gradient boosting decision tree. Advances in Neural Information Processing Systems, pp. 3146–3154, 2017.
  8. Chawla, N.V., et al. SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, pp. 321–357, 2002.
  9. Lundberg, S.M., Lee, S.-I. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, pp. 4765–4774, 2017.
  10. Zadeh, L.A. Fuzzy sets. Information and Control, 8(3), pp. 338–353, 1965.
  11. Kosko, B. Fuzzy Engineering. Prentice Hall, New Jersey, 1997.
  12. Son, H., Seo, J., Kim, C. Data-driven fuzzy rule-based system for clinical decision-making. Expert Systems with Applications, 42(1), pp. 574–586, 2015.
  13. Gunarathne, S., Meegahapola, H., Wicramasinghe, A. Chronic kidney disease prediction using machine learning techniques. Procedia Computer Science, 232, pp. 802–811, 2024.
  14. Arvind, R., et al. Explainable AI models for healthcare: A review of SHAP and LIME. IEEE Reviews in Biomedical Engineering, 16, pp. 1–16, 2023.
  15. Razzak, M.I., Naz, S., Zaib, A. Deep learning for medical image processing. Neurocomputing, 300, pp. 48–64, 2018.
  16. Kuo, J.D., et al. Predicting CKD progression using machine learning – A systematic review. BMC Nephrology, 22(319), pp. 1–16, 2021.

Govardan Sai Palla
Corresponding author

Department of Electronics Communication and Engineering, Sri Venkateshwara University College of Engineering, Tirupati

Dr. I. Kullayamma
Co-author

Department of Electronics Communication and Engineering, Sri Venkateshwara University College of Engineering, Tirupati

Govardan Sai Palla*, Dr. I. Kullayamma, Explainable Chronic Kidney Disease Prediction Using LightGBM with Shap and Fuzzy Rule-Based System, Int. J. Sci. R. Tech., 2025, 2 (12), 174-184. https://doi.org/10.5281/zenodo.17918804
