View Article

  • Overfitting In Decision Trees: Remedies Through Ensemble Learning And Explainable AI

  • Department of AI and Data Science, Dr. Akhilesh Das Gupta Institute of Professional Studies, Delhi, India

Abstract

This study focuses on improving the performance of decision tree models by addressing key challenges such as overfitting and lack of interpretability. To overcome these issues, ensemble learning techniques including bagging and boosting are applied to enhance prediction accuracy and model stability. In addition, explainable artificial intelligence methods such as SHAP and LIME are used to improve model transparency and provide insights into feature importance. The proposed approach also incorporates data preprocessing techniques, including handling imbalanced datasets using SMOTE. The experimental results show that the proposed model outperforms traditional decision tree models in terms of accuracy and overall performance. However, the increased computational complexity highlights certain limitations. Overall, this study provides an effective approach for developing accurate and interpretable machine learning models for real-world applications.

Keywords

Decision Tree, Overfitting, Ensemble Learning, Random Forest, Gradient Boosting, Explainable Artificial Intelligence.

Introduction

Machine learning has become a key technology in modern applications enabling systems to learn from data and make decisions. It is widely used in domains such as healthcare, finance, and recommendation systems. Among various machine learning techniques decision tree algorithms are popular due to their simplicity, ease of implementation, and interpretability [1].

A decision tree is a supervised learning algorithm that can be used for both classification and regression tasks. It works by recursively partitioning the data into subsets based on feature values, making decisions at each node to maximize a specific criterion. On the other hand despite these advantages decision trees suffer from a significant limitation known as overfitting. Overfitting occurs when a model is trained too well on the training data and captures noise instead of meaningful patterns. As a result, model performs well on training data but fails to generalize effectively to unseen data. This issue is common in deep decision trees where excessive branching makes the model highly specific and less robust.  This leads to weak adaptability of new, unseen data undermining the model’s predictive or explanatory power [2], [3].

To address this challenge, ensemble learning techniques have been widely used. Ensemble learning is a technique that uses multiple models of different kinds to create one model to improve accuracy and stability. One of the most commonly used technique is Random forest which constructs multiple decision trees using different subsets of data and aggregates their predictions. This approach reduces variance and significantly improves generalization performance [4], [5]. Another powerful ensemble technique is Gradient Boosting which builds models sequentially by correcting the errors of previous models. Gradient boosting focuses on improving performance step by step. This technique gives high predictive accuracy but increase the complexity of the model. Ensemble methods help improve accuracy and reduce overfitting but they introduce a new challenge related to interpretability. Unlike a single decision tree, these models are harder to understand. It becomes difficult to understand how they make decisions. This lack of transparency can reduce trust in machine learning systems [6].

To overcome this limitation, two widely used explainable artificial intelligence (XAI) techniques such as Local Interpretable model-agnostic Explanations (LIME) and Shapely Additive Explanations (SHAP) have been developed. Building on these methods, this research proposes a novel interpretability metric that evaluates the consistency base on LIME and SHAP as a measure of robustness in model interpretability [7]. 

This paper aims to reduce overfitting in decision trees by applying ensemble learning techniques while maintaining interpretability using explainable AI techniques. The proposed approach seeks to achieve a balance between model performance and transparency.

LITERATURE REVIEW

Several studies have been conducted to improve the performance of decision trees and address the problem of overfitting. Various techniques like ensemble learning methods and explainable artificial intelligence approaches have been developed to enhance the model accuracy and interpretability. This section reviews the existing literature related to decision trees, overfitting, ensemble methods and interpretability techniques. Decision trees, although highly interpretable tend to overfit when the model becomes complex, especially in the presence of noisy and high dimensional data. This limitation has motivated the use of advanced techniques to improve model generalization [8].

The reviewed studies showed that ensemble learning techniques such as bagging and boosting significantly improves the predictive performance of the decision tree models by reducing overfitting. Additionally, the integration of explainable artificial intelligence (XAI), specifically SHAP and LIME has enhanced the model transparency and interpretability. These approaches have been successfully applied in various domains such as healthcare, finance to demonstrate there practical significance. A comparative analysis of existing approaches indicates that ensemble learning methods primarily focus on improving predictive accuracy whereas XAI techniques emphasize model interpretability. However, very few studies provide a unified framework that effectively balances both aspects [9], [10] [11], [12].

Despite these advancements, several challenges remain. Ensemble methods often increase computational complexity and reduce interpretability. While explainable AI techniques may produce inconsistent explanations and require significant computational resources. Furthermore, existing studies focused either on improving interpretability or accuracy rather than achieving a balance between them. Therefore, there is a need to develop an efficient and scalable models that can give high predictive performance while maintaining interpretability. This highlights a critical research gap where an integrated approach combining ensemble learning and explainable AI is required to achieve both accuracy and transparency without significantly increasing computational cost [13], [14], [15], [16].

PROBLEM STATEMENT

Despite significant advancements in machine learning decision tree models continue to suffer from overfitting which limits their ability to generalize unseen data. Ensemble learning techniques, such as bagging and boosting help improve prediction performance of the decision tree models by reducing overfitting but they often increase model complexity and reduce interpretability. On the other hand, explainable artificial intelligence (XAI) techniques such as

SHAP and LIME enhance model transparency and interpretability but may produce inconsistent explanations and require significant computational resources. Furthermore, existing studies focused either on improving interpretability or accuracy rather than achieving a balance between them. Therefore, there is a need to develop an efficient and scalable models that can give high predictive performance while maintaining interpretability for real-world applications [17], [18], [19].

OBJECTIVE

The main objectives of this study are:

  • To analyze the problem of overfitting in decision trees.
  • To improve prediction accuracy using ensemble learning techniques.
  • To enhance model interpretability using XAI methods such as SHAP and LIME.
  • To handle imbalanced data using techniques like SMOTE.
  • To develop a model that balance accuracy, interpretability and computational efficiency.

METHODOLOGY

This study proposes a structured approach to improve the performance of decision tree models while maintaining interpretability. The methodology integrates data preprocessing, ensemble learning techniques and explainable artificial intelligence methods to achieve a balance between model transparency and prediction accuracy. The overall process consists of data collection, preprocessing, model training, application of ensemble learning methods and model interpretation using SHAP and LIME.

Proposed methodology of the system :

Data Collection

Data Preprocessing

Handling Imbalanced data (SMOTE)

Model Training (Decision Tree)

Ensemble Learning (Random Forest / Boosting)

Model Evaluation

XAI (SHAP + LIME)

Result Analysis

Data collection: The dataset is collected from reliable sources related to application domains like healthcare or financial data. This dataset contains multiple features used for prediction.

Data preprocessing: This process includes handling missing and duplicate values, normalization and feature selection to improve model performance.

Handling imbalanced data: To address imbalanced data, SMOTE (Synthetic Minority Over- Sampling Technique) is applied to generate synthetic samples and balance the dataset [20].

Model development: A decision tree model is initially trained to analyze the baseline performance. Further, ensemble learning techniques such as Random Forest and boosting are applied to improve accuracy and reduce overfitting [21].

Model Evaluation: The model performance is evaluated using metrices such as Accuracy, Precision, Recall and F1-score.

Explainability (XAI): SHAP and LIME are applied to interpret the model predictions. These techniques provide feature importance and local explanations, improving transparency and trust in the model [22].

Result Analysis: The result are analyzed to compare model performance and interpretability, ensuring that the proposed model approach achieves a balance between both.

RESULT AND DISCUSSION

The performance of the proposed model is evaluated using publicly available dataset. The dataset was preprocessed by handling missing and duplicate values, normalizing features and balancing class distribution using SMOTE. Different models, including decision trees and ensemble learning techniques were trained and evaluated using performance metrices such as accuracy, precision, recall and F1- score.

F1-

 

 

 

 

Model

Accuracy

Precision

Recall

Score

Decision Tree

78%

75%

72%

73%

Random Forest

88%

85%

84%

84%

Boosting Model

91%

89%

87%

88%

Proposed Model

93%

91%

90%

90%

(Ensemble + XAI)

 

 

 

 

The results indicate that ensemble learning techniques significantly improve model performance compared to the basic decision tree. The decision tree model shows lower accuracy due to overfitting which affects it ability to generalize unseen data. In contrast, Random Forest reduces overfitting by combining multiple trees, resulting in improved accuracy. Boosting further enhances performance by focusing on misclassified instances, leading to better prediction results.

The proposed model achieves the highest accuracy as it combines ensemble learning with preprocessing techniques such as SMOTE, which helps in handling imbalanced data. Additionally, the use of SHAP and LIME improves model interpretability by providing insights into feature importance and prediction behaviour [22].

However, the improved performance comes with certain limitations. The use of ensemble methods and XAI techniques increases computational complexity and requires more processing time. This may limit the applicability of the model in real-time systems.

CONCLUSION

This research aims to address the limitations of decision tree models particularly the problem of overfitting and lack of interpretability. To overcome these limitations, ensemble learning techniques such as Bagging and Boosting were applied to improve prediction accuracy and model stability. In addition explainable artificial intelligence (XAI) methods such as SHAP and LIME were integrated to enhance model transparency and provide insights into feature importance.

The experimental results demonstrated that the proposed approach significantly outperforms the traditional decision tree model in terms of accuracy, precision, recall, and F1score. The use of ensemble learning techniques helped in reducing overfitting while explainable artificial methods improves the understanding of the model. However , the integration of these techniques increases computational  complexity and may affect real- time applicability.

Overall, the study highlights the importance of balancing accuracy and interpretability in machine learning models and provides a foundation for developing efficient and reliable predictive systems.

FUTURE WORK

Although the proposed approach demonstrates improved performance and interpretability, there are several areas for future enhancement. Future research can focus on reducing computational complexity to make the model more efficient for real-time applications. Additionally, more advanced and consistent explainable AI techniques can be explored to overcome the limitations of SHAP and LIME.

Further improvements can include the use of deep learning models combined with explainability techniques to handle more complex datasets. The model can also be tested on larger and more diverse datasets to validate its robustness and scalability. Moreover, the proposed approach can be applied to real-world domains such as healthcare, financial prediction, and fraud detection to evaluate its practical usability [23].

REFERENCES

  1. Ibmoiye Domor Mienye and Nobert Jere, A survey of decision trees : concepts, algorithms and applications, IEEE Access, 2024
  2. A.D. Mankar, S.D. Bholte, K.G. Kharade, K.A. Raskar, Metaanalysis of overfitting of decision trees, Journal of Nonlinear Analysis and Optimization, 2024
  3. Erblin Halabaku, Eliot Bytyci, Overfitting in Machine learning: A comparative analysis of decision trees and random forests, Intelligent Automation & Soft Computing, 2024
  4. Hasan Ahmed Salman, Ali Kalakech and Amani Steiti, Random forest algorithm overview, Babylonian journal of machine learning, 2024
  5. Anantha Babu Shanmugavel, Vijayan Ellappan, Anand Mehendran, Murali Subramanian, Ramanathan Lakshmanan and Manuel Mazzara, A novel ensemble based reduced overfitting model model with convolutional neural network for traffic sign recognition system, Electronics (MDPI), 2023
  6. V. S. Stency, N Mohanasundaram, Revathi Santhosh, Ensembled gradient boosting technique with decision tree for intrusion detection system, International Journal of Intelligent systems and applications in engineering, 2024
  7. Ahmed Salih, Zahra Raisi, Ilaria Boscolo Galazzo, Petia Radeva, A perspective on explainable artificial intelligence methods: SHAP and LIME, Advanced Intelligent Systems, 2024
  8. Mykola Zlobin, Volodymyr Bazylevych, A data driven approach for balancing overfitting and underfitting in decision tree models, Collection of Scientific Papers, 2025
  9. Hongke Zhao, Wenhui Liu, Yaxian Wang, Likang Wu, Comparative analysis of algorithmic approaches in ensemble learning: Bagging and Boosting, Scientific Reports, 2025
  10. Abdallah, Hagar F. Gouda & Fatma D.M., Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass-imbalanced data, Scientific Reports, 2025
  11. Evandro S. Ortigossa, Thales Goncalves, Luis Gustavo Nanato, Explainable artificial intelligence (XAI)- from theory to methods and applications, IEEE Access, 2024
  12. Bhawani Sankar Panigrahi, M. Vanitha , Mohd Ashraf, R.V.S Lalitha, D. Haritha, Ajith Sundaram, Explainable AI frameworks using SHAP and LIME enhance interpretable defect classification in additive manufacturing, Nondestructive Testing and Evaluation, 2026
  13. Abel Abusitta, Miles Q. Li, Benjamin C.M. Fung, Survey on explainable AI: Techniques, challenges and open issues, Expert Systems with Applications, 2024
  14. Trisna Ari Roshinta, Szucs Gabor, A comparative study of LIME and SHAP for enhancing trustworthiness and efficiency in explainable AI systems, IEEE International Conference on Computing (ICOCO),2024
  15. Joshua Pinem, Widi Astuti , Adiwijaya, Explainable Ensemble learning Framework with SMOTE, SHAP and LIME FOR PREDICTING 30- DAY readmission in Diabetic patients, Jurnal Resti ( Rekayasa Sistem dan Teknologi Informasi), 2025
  16. Ahmed Salih, Zahra Raisi, Ilaria Boscolo Galazzo, Petia Radeva, Steffen Erhard Petersen, Karim Lekadir, Gloria Manegez, A perspective on explainable artificial intelligence methods: SHAP and LIME, Advanced Intelligent Systems, 2024
  17. MD.    Mahmudal                 Hasan, Understanding model predictions: A comparative analysis of SHAP and LIME on various ML    algorithms,          Journal of Scientific     and Technological Research, 2024
  18. Hagar F. Gouda, Fatma D.M. Abadallah, Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass- imbalanced data, Scientific Reports, 2025
  19. Ashima Kukkar, Gagandeep Kaur, A novel adaptive ensemble          classifier with LIME     and       SHAP-Based interpretability for fake news detection, Expert Systems with Applications, 2025
  20. Essa E. Almazroei, Ensemble machine learning framework with SHAP and LIME for accurate early prediction of student success in online learning environments,Scientific Reports, 2026
  21. Mie Wang, Feixiang Ying, Jianing Yang & Dongming Zhu, An explainable (interpretable) stacking ensemble machine learning model for real- time and short- term significant sea wave height prediction, Sustainable Energy Technologies and Assessments, 2026
  22. Bhawani Sankar Panigrahi, M. Vanitha, Mohd Ashraf, R.V.S. Lalitha, D. Haritha & Ajith Sundaram, Explainable AI frameworks using SHAP and LIME enhance interpretable defect classification in additive manufacturing, Nondestructive Testing and Evaluation, 2026
  23. Guillermo A. Francia lii, Hossain Shahriar, Eman El- Sheikh, Md Abdur Rahman, Sheikh Iqbal Ahamed, An explainable artificial intelligence approach for improved dynamic analysis with SHAP and LIME, IEEE International Conference on Computing (ICOCO), 2026

Reference

  1. Ibmoiye Domor Mienye and Nobert Jere, A survey of decision trees : concepts, algorithms and applications, IEEE Access, 2024
  2. A.D. Mankar, S.D. Bholte, K.G. Kharade, K.A. Raskar, Metaanalysis of overfitting of decision trees, Journal of Nonlinear Analysis and Optimization, 2024
  3. Erblin Halabaku, Eliot Bytyci, Overfitting in Machine learning: A comparative analysis of decision trees and random forests, Intelligent Automation & Soft Computing, 2024
  4. Hasan Ahmed Salman, Ali Kalakech and Amani Steiti, Random forest algorithm overview, Babylonian journal of machine learning, 2024
  5. Anantha Babu Shanmugavel, Vijayan Ellappan, Anand Mehendran, Murali Subramanian, Ramanathan Lakshmanan and Manuel Mazzara, A novel ensemble based reduced overfitting model model with convolutional neural network for traffic sign recognition system, Electronics (MDPI), 2023
  6. V. S. Stency, N Mohanasundaram, Revathi Santhosh, Ensembled gradient boosting technique with decision tree for intrusion detection system, International Journal of Intelligent systems and applications in engineering, 2024
  7. Ahmed Salih, Zahra Raisi, Ilaria Boscolo Galazzo, Petia Radeva, A perspective on explainable artificial intelligence methods: SHAP and LIME, Advanced Intelligent Systems, 2024
  8. Mykola Zlobin, Volodymyr Bazylevych, A data driven approach for balancing overfitting and underfitting in decision tree models, Collection of Scientific Papers, 2025
  9. Hongke Zhao, Wenhui Liu, Yaxian Wang, Likang Wu, Comparative analysis of algorithmic approaches in ensemble learning: Bagging and Boosting, Scientific Reports, 2025
  10. Abdallah, Hagar F. Gouda & Fatma D.M., Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass-imbalanced data, Scientific Reports, 2025
  11. Evandro S. Ortigossa, Thales Goncalves, Luis Gustavo Nanato, Explainable artificial intelligence (XAI)- from theory to methods and applications, IEEE Access, 2024
  12. Bhawani Sankar Panigrahi, M. Vanitha , Mohd Ashraf, R.V.S Lalitha, D. Haritha, Ajith Sundaram, Explainable AI frameworks using SHAP and LIME enhance interpretable defect classification in additive manufacturing, Nondestructive Testing and Evaluation, 2026
  13. Abel Abusitta, Miles Q. Li, Benjamin C.M. Fung, Survey on explainable AI: Techniques, challenges and open issues, Expert Systems with Applications, 2024
  14. Trisna Ari Roshinta, Szucs Gabor, A comparative study of LIME and SHAP for enhancing trustworthiness and efficiency in explainable AI systems, IEEE International Conference on Computing (ICOCO),2024
  15. Joshua Pinem, Widi Astuti , Adiwijaya, Explainable Ensemble learning Framework with SMOTE, SHAP and LIME FOR PREDICTING 30- DAY readmission in Diabetic patients, Jurnal Resti ( Rekayasa Sistem dan Teknologi Informasi), 2025
  16. Ahmed Salih, Zahra Raisi, Ilaria Boscolo Galazzo, Petia Radeva, Steffen Erhard Petersen, Karim Lekadir, Gloria Manegez, A perspective on explainable artificial intelligence methods: SHAP and LIME, Advanced Intelligent Systems, 2024
  17. MD.    Mahmudal                 Hasan, Understanding model predictions: A comparative analysis of SHAP and LIME on various ML    algorithms,          Journal of Scientific     and Technological Research, 2024
  18. Hagar F. Gouda, Fatma D.M. Abadallah, Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass- imbalanced data, Scientific Reports, 2025
  19. Ashima Kukkar, Gagandeep Kaur, A novel adaptive ensemble          classifier with LIME     and       SHAP-Based interpretability for fake news detection, Expert Systems with Applications, 2025
  20. Essa E. Almazroei, Ensemble machine learning framework with SHAP and LIME for accurate early prediction of student success in online learning environments,Scientific Reports, 2026
  21. Mie Wang, Feixiang Ying, Jianing Yang & Dongming Zhu, An explainable (interpretable) stacking ensemble machine learning model for real- time and short- term significant sea wave height prediction, Sustainable Energy Technologies and Assessments, 2026
  22. Bhawani Sankar Panigrahi, M. Vanitha, Mohd Ashraf, R.V.S. Lalitha, D. Haritha & Ajith Sundaram, Explainable AI frameworks using SHAP and LIME enhance interpretable defect classification          in additive manufacturing, Nondestructive Testing and Evaluation, 2026
  23. Guillermo A. Francia lii, Hossain Shahriar, Eman El- Sheikh, Md Abdur Rahman, Sheikh Iqbal Ahamed, An explainable artificial intelligence approach for improved dynamic analysis with SHAP and LIME, IEEE International Conference on Computing (ICOCO), 2026

Photo
Mansi
Corresponding author

Department of AI and Data Science, Dr. Akhilesh Das Gupta Institute of Professional Studies, Delhi, India

Photo
Yatu Rani
Co-author

Department of AI and Data Science, Dr. Akhilesh Das Gupta Institute of Professional Studies, Delhi, India

Photo
Archana Kumar
Co-author

Department of AI and Data Science, Dr. Akhilesh Das Gupta Institute of Professional Studies, Delhi, India

Mansi*, Yatu Rani, Archana Kumar, Overfitting In Decision Trees: Remedies Through Ensemble Learning And Explainable AI, Int. J. Sci. R. Tech., 2026, 3 (5), 214-219. https://doi.org/10.5281/zenodo.20050883

More related articles
Power Electronics in Renewable Energy Systems...
Joel Nzanzu Kanduki, Ndjabu Dhedonga Elisée, Pascal Mushage Bond...
A Review on Health-Related Effects & Pharmacologic...
Vishal Madankar, Anil Panchal, Nikhil Dond, ...
From Localization to Connectomics: A Contemporary View of Human Brain Structure ...
Arnab Roy, Mahesh Kumar Yadav, Anchal Kumari , Karishma Kumari , Amit Kumar Prajapati , Vivek Prajap...
Computer-Aided Drug Design in Modern Pharmaceutical Research: An In-Silico Persp...
Deep Jyoti Shah, Mahesh Kumar Yadav, Astha Topno, Shivam Kashyap, Jiten Goray, Ajay Kumar, Amisha Ku...
Related Articles
A Comprehensive Review on Oral Disintegrating Tablets...
Bhagyashri Randhawan, Shruti Deshpande, Gayatri Gadve, Shubhangi Shete, Monika Waghmode, ...
Formulation, Development & Evaluation Of Polyherbal Sunscreen...
Markunde Onkar Ramu, Yede Sainath Nagorao, Jadhav Pramod Ramrao, Kale Ajwita Satishrao, Zanwar Vijay...
Ion Exchange Chromatography in the Analysis of Brain-Derived DNA: Unravelling th...
Arnab Roy, Mahesh Kumar Yadav, Ashish Kumar, Rishu Raj , ...
More related articles
Power Electronics in Renewable Energy Systems...
Joel Nzanzu Kanduki, Ndjabu Dhedonga Elisée, Pascal Mushage Bondo, ...
Power Electronics in Renewable Energy Systems...
Joel Nzanzu Kanduki, Ndjabu Dhedonga Elisée, Pascal Mushage Bondo, ...