Machine learning has become a key technology in modern applications enabling systems to learn from data and make decisions. It is widely used in domains such as healthcare, finance, and recommendation systems. Among various machine learning techniques decision tree algorithms are popular due to their simplicity, ease of implementation, and interpretability [1].
A decision tree is a supervised learning algorithm that can be used for both classification and regression tasks. It works by recursively partitioning the data into subsets based on feature values, making decisions at each node to maximize a specific criterion. On the other hand despite these advantages decision trees suffer from a significant limitation known as overfitting. Overfitting occurs when a model is trained too well on the training data and captures noise instead of meaningful patterns. As a result, model performs well on training data but fails to generalize effectively to unseen data. This issue is common in deep decision trees where excessive branching makes the model highly specific and less robust. This leads to weak adaptability of new, unseen data undermining the model’s predictive or explanatory power [2], [3].
To address this challenge, ensemble learning techniques have been widely used. Ensemble learning is a technique that uses multiple models of different kinds to create one model to improve accuracy and stability. One of the most commonly used technique is Random forest which constructs multiple decision trees using different subsets of data and aggregates their predictions. This approach reduces variance and significantly improves generalization performance [4], [5]. Another powerful ensemble technique is Gradient Boosting which builds models sequentially by correcting the errors of previous models. Gradient boosting focuses on improving performance step by step. This technique gives high predictive accuracy but increase the complexity of the model. Ensemble methods help improve accuracy and reduce overfitting but they introduce a new challenge related to interpretability. Unlike a single decision tree, these models are harder to understand. It becomes difficult to understand how they make decisions. This lack of transparency can reduce trust in machine learning systems [6].
To overcome this limitation, two widely used explainable artificial intelligence (XAI) techniques such as Local Interpretable model-agnostic Explanations (LIME) and Shapely Additive Explanations (SHAP) have been developed. Building on these methods, this research proposes a novel interpretability metric that evaluates the consistency base on LIME and SHAP as a measure of robustness in model interpretability [7].
This paper aims to reduce overfitting in decision trees by applying ensemble learning techniques while maintaining interpretability using explainable AI techniques. The proposed approach seeks to achieve a balance between model performance and transparency.
LITERATURE REVIEW
Several studies have been conducted to improve the performance of decision trees and address the problem of overfitting. Various techniques like ensemble learning methods and explainable artificial intelligence approaches have been developed to enhance the model accuracy and interpretability. This section reviews the existing literature related to decision trees, overfitting, ensemble methods and interpretability techniques. Decision trees, although highly interpretable tend to overfit when the model becomes complex, especially in the presence of noisy and high dimensional data. This limitation has motivated the use of advanced techniques to improve model generalization [8].
The reviewed studies showed that ensemble learning techniques such as bagging and boosting significantly improves the predictive performance of the decision tree models by reducing overfitting. Additionally, the integration of explainable artificial intelligence (XAI), specifically SHAP and LIME has enhanced the model transparency and interpretability. These approaches have been successfully applied in various domains such as healthcare, finance to demonstrate there practical significance. A comparative analysis of existing approaches indicates that ensemble learning methods primarily focus on improving predictive accuracy whereas XAI techniques emphasize model interpretability. However, very few studies provide a unified framework that effectively balances both aspects [9], [10] [11], [12].
Despite these advancements, several challenges remain. Ensemble methods often increase computational complexity and reduce interpretability. While explainable AI techniques may produce inconsistent explanations and require significant computational resources. Furthermore, existing studies focused either on improving interpretability or accuracy rather than achieving a balance between them. Therefore, there is a need to develop an efficient and scalable models that can give high predictive performance while maintaining interpretability. This highlights a critical research gap where an integrated approach combining ensemble learning and explainable AI is required to achieve both accuracy and transparency without significantly increasing computational cost [13], [14], [15], [16].
PROBLEM STATEMENT
Despite significant advancements in machine learning decision tree models continue to suffer from overfitting which limits their ability to generalize unseen data. Ensemble learning techniques, such as bagging and boosting help improve prediction performance of the decision tree models by reducing overfitting but they often increase model complexity and reduce interpretability. On the other hand, explainable artificial intelligence (XAI) techniques such as
SHAP and LIME enhance model transparency and interpretability but may produce inconsistent explanations and require significant computational resources. Furthermore, existing studies focused either on improving interpretability or accuracy rather than achieving a balance between them. Therefore, there is a need to develop an efficient and scalable models that can give high predictive performance while maintaining interpretability for real-world applications [17], [18], [19].
OBJECTIVE
The main objectives of this study are:
- To analyze the problem of overfitting in decision trees.
- To improve prediction accuracy using ensemble learning techniques.
- To enhance model interpretability using XAI methods such as SHAP and LIME.
- To handle imbalanced data using techniques like SMOTE.
- To develop a model that balance accuracy, interpretability and computational efficiency.
METHODOLOGY
This study proposes a structured approach to improve the performance of decision tree models while maintaining interpretability. The methodology integrates data preprocessing, ensemble learning techniques and explainable artificial intelligence methods to achieve a balance between model transparency and prediction accuracy. The overall process consists of data collection, preprocessing, model training, application of ensemble learning methods and model interpretation using SHAP and LIME.
Proposed methodology of the system :
Data Collection
↓
Data Preprocessing
↓
Handling Imbalanced data (SMOTE)
↓
Model Training (Decision Tree)
↓
Ensemble Learning (Random Forest / Boosting)
↓
Model Evaluation
↓
XAI (SHAP + LIME)
↓
Result Analysis
Data collection: The dataset is collected from reliable sources related to application domains like healthcare or financial data. This dataset contains multiple features used for prediction.
Data preprocessing: This process includes handling missing and duplicate values, normalization and feature selection to improve model performance.
Handling imbalanced data: To address imbalanced data, SMOTE (Synthetic Minority Over- Sampling Technique) is applied to generate synthetic samples and balance the dataset [20].
Model development: A decision tree model is initially trained to analyze the baseline performance. Further, ensemble learning techniques such as Random Forest and boosting are applied to improve accuracy and reduce overfitting [21].
Model Evaluation: The model performance is evaluated using metrices such as Accuracy, Precision, Recall and F1-score.
Explainability (XAI): SHAP and LIME are applied to interpret the model predictions. These techniques provide feature importance and local explanations, improving transparency and trust in the model [22].
Result Analysis: The result are analyzed to compare model performance and interpretability, ensuring that the proposed model approach achieves a balance between both.
RESULT AND DISCUSSION
The performance of the proposed model is evaluated using publicly available dataset. The dataset was preprocessed by handling missing and duplicate values, normalizing features and balancing class distribution using SMOTE. Different models, including decision trees and ensemble learning techniques were trained and evaluated using performance metrices such as accuracy, precision, recall and F1- score.
|
F1- |
|
|
|
|
|
Model |
Accuracy |
Precision |
Recall |
Score |
|
Decision Tree |
78% |
75% |
72% |
73% |
|
Random Forest |
88% |
85% |
84% |
84% |
|
Boosting Model |
91% |
89% |
87% |
88% |
|
Proposed Model |
93% |
91% |
90% |
90% |
|
(Ensemble + XAI) |
|
|
|
|
The results indicate that ensemble learning techniques significantly improve model performance compared to the basic decision tree. The decision tree model shows lower accuracy due to overfitting which affects it ability to generalize unseen data. In contrast, Random Forest reduces overfitting by combining multiple trees, resulting in improved accuracy. Boosting further enhances performance by focusing on misclassified instances, leading to better prediction results.
The proposed model achieves the highest accuracy as it combines ensemble learning with preprocessing techniques such as SMOTE, which helps in handling imbalanced data. Additionally, the use of SHAP and LIME improves model interpretability by providing insights into feature importance and prediction behaviour [22].
However, the improved performance comes with certain limitations. The use of ensemble methods and XAI techniques increases computational complexity and requires more processing time. This may limit the applicability of the model in real-time systems.
CONCLUSION
This research aims to address the limitations of decision tree models particularly the problem of overfitting and lack of interpretability. To overcome these limitations, ensemble learning techniques such as Bagging and Boosting were applied to improve prediction accuracy and model stability. In addition explainable artificial intelligence (XAI) methods such as SHAP and LIME were integrated to enhance model transparency and provide insights into feature importance.
The experimental results demonstrated that the proposed approach significantly outperforms the traditional decision tree model in terms of accuracy, precision, recall, and F1score. The use of ensemble learning techniques helped in reducing overfitting while explainable artificial methods improves the understanding of the model. However , the integration of these techniques increases computational complexity and may affect real- time applicability.
Overall, the study highlights the importance of balancing accuracy and interpretability in machine learning models and provides a foundation for developing efficient and reliable predictive systems.
FUTURE WORK
Although the proposed approach demonstrates improved performance and interpretability, there are several areas for future enhancement. Future research can focus on reducing computational complexity to make the model more efficient for real-time applications. Additionally, more advanced and consistent explainable AI techniques can be explored to overcome the limitations of SHAP and LIME.
Further improvements can include the use of deep learning models combined with explainability techniques to handle more complex datasets. The model can also be tested on larger and more diverse datasets to validate its robustness and scalability. Moreover, the proposed approach can be applied to real-world domains such as healthcare, financial prediction, and fraud detection to evaluate its practical usability [23].
REFERENCES
- Ibmoiye Domor Mienye and Nobert Jere, A survey of decision trees : concepts, algorithms and applications, IEEE Access, 2024
- A.D. Mankar, S.D. Bholte, K.G. Kharade, K.A. Raskar, Metaanalysis of overfitting of decision trees, Journal of Nonlinear Analysis and Optimization, 2024
- Erblin Halabaku, Eliot Bytyci, Overfitting in Machine learning: A comparative analysis of decision trees and random forests, Intelligent Automation & Soft Computing, 2024
- Hasan Ahmed Salman, Ali Kalakech and Amani Steiti, Random forest algorithm overview, Babylonian journal of machine learning, 2024
- Anantha Babu Shanmugavel, Vijayan Ellappan, Anand Mehendran, Murali Subramanian, Ramanathan Lakshmanan and Manuel Mazzara, A novel ensemble based reduced overfitting model model with convolutional neural network for traffic sign recognition system, Electronics (MDPI), 2023
- V. S. Stency, N Mohanasundaram, Revathi Santhosh, Ensembled gradient boosting technique with decision tree for intrusion detection system, International Journal of Intelligent systems and applications in engineering, 2024
- Ahmed Salih, Zahra Raisi, Ilaria Boscolo Galazzo, Petia Radeva, A perspective on explainable artificial intelligence methods: SHAP and LIME, Advanced Intelligent Systems, 2024
- Mykola Zlobin, Volodymyr Bazylevych, A data driven approach for balancing overfitting and underfitting in decision tree models, Collection of Scientific Papers, 2025
- Hongke Zhao, Wenhui Liu, Yaxian Wang, Likang Wu, Comparative analysis of algorithmic approaches in ensemble learning: Bagging and Boosting, Scientific Reports, 2025
- Abdallah, Hagar F. Gouda & Fatma D.M., Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass-imbalanced data, Scientific Reports, 2025
- Evandro S. Ortigossa, Thales Goncalves, Luis Gustavo Nanato, Explainable artificial intelligence (XAI)- from theory to methods and applications, IEEE Access, 2024
- Bhawani Sankar Panigrahi, M. Vanitha , Mohd Ashraf, R.V.S Lalitha, D. Haritha, Ajith Sundaram, Explainable AI frameworks using SHAP and LIME enhance interpretable defect classification in additive manufacturing, Nondestructive Testing and Evaluation, 2026
- Abel Abusitta, Miles Q. Li, Benjamin C.M. Fung, Survey on explainable AI: Techniques, challenges and open issues, Expert Systems with Applications, 2024
- Trisna Ari Roshinta, Szucs Gabor, A comparative study of LIME and SHAP for enhancing trustworthiness and efficiency in explainable AI systems, IEEE International Conference on Computing (ICOCO),2024
- Joshua Pinem, Widi Astuti , Adiwijaya, Explainable Ensemble learning Framework with SMOTE, SHAP and LIME FOR PREDICTING 30- DAY readmission in Diabetic patients, Jurnal Resti ( Rekayasa Sistem dan Teknologi Informasi), 2025
- Ahmed Salih, Zahra Raisi, Ilaria Boscolo Galazzo, Petia Radeva, Steffen Erhard Petersen, Karim Lekadir, Gloria Manegez, A perspective on explainable artificial intelligence methods: SHAP and LIME, Advanced Intelligent Systems, 2024
- MD. Mahmudal Hasan, Understanding model predictions: A comparative analysis of SHAP and LIME on various ML algorithms, Journal of Scientific and Technological Research, 2024
- Hagar F. Gouda, Fatma D.M. Abadallah, Comparative performance of bagging and boosting ensemble models for predicting lumpy skin disease with multiclass- imbalanced data, Scientific Reports, 2025
- Ashima Kukkar, Gagandeep Kaur, A novel adaptive ensemble classifier with LIME and SHAP-Based interpretability for fake news detection, Expert Systems with Applications, 2025
- Essa E. Almazroei, Ensemble machine learning framework with SHAP and LIME for accurate early prediction of student success in online learning environments,Scientific Reports, 2026
- Mie Wang, Feixiang Ying, Jianing Yang & Dongming Zhu, An explainable (interpretable) stacking ensemble machine learning model for real- time and short- term significant sea wave height prediction, Sustainable Energy Technologies and Assessments, 2026
- Bhawani Sankar Panigrahi, M. Vanitha, Mohd Ashraf, R.V.S. Lalitha, D. Haritha & Ajith Sundaram, Explainable AI frameworks using SHAP and LIME enhance interpretable defect classification in additive manufacturing, Nondestructive Testing and Evaluation, 2026
- Guillermo A. Francia lii, Hossain Shahriar, Eman El- Sheikh, Md Abdur Rahman, Sheikh Iqbal Ahamed, An explainable artificial intelligence approach for improved dynamic analysis with SHAP and LIME, IEEE International Conference on Computing (ICOCO), 2026
Mansi*
Yatu Rani
10.5281/zenodo.20050883