- Importance of Ransomware Detection
Ransomware is one of the most serious cybersecurity threats because it can encrypt valuable files and cause severe financial and data losses for both individuals and organizations. Conventional security systems struggle against modern ransomware because attackers continuously adapt their methods to evade signature-based detection. Consequently, there is strong demand for intelligent detection techniques that can identify ransomware even when it appears in new forms [1], [3].
Machine learning is widely recognized as an effective approach to ransomware detection because models can be trained to identify patterns in executable files and system behavior. By examining features such as file structure and execution behavior, machine learning models can distinguish ransomware from legitimate software with higher precision than conventional approaches [6]. Early detection is particularly important: identifying ransomware before or at the initial stage of execution minimizes damage and prevents large-scale encryption of data [7].
These studies underscore the need for effective, early-stage ransomware detection systems that keep hosts secure against rapidly emerging cyber threats.
- Limitations of Existing Systems
Although many current ransomware detection systems employ machine learning and deep learning algorithms, they face several practical limitations. First, many solutions rely on previously observed ransomware samples. As a result, such systems can struggle to detect new or unknown variants, particularly when an attacker makes small behavioral changes to evade detection. This weakness is documented in the literature, where models perform well on familiar datasets but lose accuracy on unknown or realistic ransomware samples [10], [12].
Second, machine learning and deep learning models often lack transparency. Most current systems operate as black boxes, so users and security analysts may be unable to tell why a specific file was identified as ransomware. This lowers confidence in automated detection and makes the model difficult to debug or refine. Several researchers have identified the lack of explainability as a significant barrier to the practical deployment of ransomware detection solutions [15]. Moreover, some current detection methods rely on dynamic or runtime analysis, which can be resource-intensive and add performance overhead on the host system. Continuously monitoring system behavior, memory usage, or disk activity can slow down normal operations and may not suit real-time deployment on all devices [6].
Fig.1. Ransomware Attack [6]
- Research Contributions
The proposed research aims to improve host-based ransomware detection by combining machine learning classification with static analysis of executable file content. Unlike traditional security tools, which rely primarily on signatures, the proposed work extracts meaningful features from Windows executable files and uses a trained machine learning model to differentiate between benign and ransomware samples. By using file structure and behavior-related attributes, the system aims to identify ransomware early, before it can cause significant damage. The design is motivated by the good results that other machine learning-based methods have achieved in ransomware classification [1], [3].
A second contribution of this work is the inclusion of explainable artificial intelligence methods to improve transparency. Many available systems are highly accurate yet give no clear explanation of their decisions. To address this, the proposed solution explains why a file is rated as malicious so that users can understand the reasoning. This increases confidence in the system and supports security analysis, consistent with recent research that emphasizes explainable models for ransomware detection [10], [15].
In addition, the system is designed as a real-time, host-based monitoring platform that continuously monitors executable files on the local system. This enables timely detection of ransomware and removes the reliance on network-based surveillance. This work builds on recent findings on the importance of host-level analysis for faster and more accurate ransomware detection [6], [11].
Fig.2. Random Forest Classifier [9]
LITERATURE COMPARISON
Ransomware has become one of the gravest cybersecurity threats, targeting both individual users and multinational organizations by encrypting crucial information and demanding ransom payments. Early ransomware detection mechanisms relied mainly on signature-based antivirus systems, which are effective only against known malware samples. As ransomware variants evolve rapidly, these conventional approaches struggle to detect new and previously unseen attacks. To address this shortcoming, several researchers have investigated machine learning-based detection methods that use executable file features and system behavioral patterns rather than signatures alone.
Kunku et al. [1] proposed a machine learning-driven system to detect and classify ransomware using static characteristics of executable files. Their work showed that supervised learning models can successfully distinguish ransomware from non-malicious software, and it served as the basis for numerous later studies in the field.
Nanda Rani and Dhavale [2] examined various machine learning approaches to ransomware detection and the role of features in improving detection accuracy. Their study demonstrated that carefully selected features can greatly increase model performance.
Alraizza and Algarni [3] examined several machine learning models for ransomware detection and tested their effectiveness on large datasets. Their findings supported the claim that ensemble and tree-based models are effective in ransomware classification tasks.
Khamma [6] introduced a ransomware detection method based on the Random Forest algorithm. The paper highlighted the strength of ensemble learning techniques in handling complex ransomware patterns.
Ghadhban Salman et al. [10] proposed an explainable deep learning model for detecting unknown ransomware variants. Their research addressed the black-box nature of deep learning models and highlighted the need for security applications whose decisions are understandable.
Zahoora et al. [12] proposed a deep learning-based ransomware detection model built on unsupervised feature extraction and ensemble classifiers. Their work focused on handling imbalanced datasets and improving detection performance on unknown ransomware samples.
Gulmez et al. [15] developed XRan, an explainable ransomware detection framework based on dynamic analysis. Their analysis supported the role of explainability and real-time detection in current ransomware defense systems.
Recent research has shown that Portable Executable (PE) features can be used in static analysis to identify ransomware characteristics. Researchers including Kunku et al. [1] and Alraizza and Algarni [3] have shown that machine learning classifiers such as Random Forest and Decision Trees are effective when trained on well-chosen features of executable files. Other publications underlined the relevance of ensemble learning techniques because they can handle high-dimensional and complex data. Explainability has also become a concern in ransomware detection studies, with Ghadhban Salman et al. [10] and Gulmez et al. [15] emphasizing that security systems should not only make predictions but also explain them. Despite these developments, many current systems lack real-time monitoring and do not integrate local analysis with external threat intelligence, which leaves room for improvement.
PROPOSED METHODOLOGY
The proposed ransomware detection system follows a layered architecture that combines data preprocessing, intelligent detection, and user-friendly presentation to provide accurate, real-time threat detection. In the data preprocessing phase, executable files are gathered from trusted datasets and from system monitoring. Data augmentation is used to balance benign and ransomware samples and to improve the robustness of the machine learning model. Feature selection follows, removing noisy attributes and retaining only the most relevant ones, which improves classification.
The core of the system is the detection and intelligence layer. The extracted features are analyzed by a Random Forest classifier, which classifies each file as benign or ransomware.
To improve reliability, a threshold-based confidence estimation mechanism is applied to the model's predictions. The system also uses VirusTotal API results to validate the machine learning output against third-party threat intelligence. In addition, model interpretation is provided to explain why a file is flagged as malicious.
Fig 3: Proposed Model Framework
Finally, the application and presentation layers provide real-time interaction and visualization. Executable files are monitored continuously on a system-wide basis, and manually uploaded files can be analyzed on demand. A FastAPI server handles communication between the backend and the frontend. A WebSocket-based client enables live updates, and a web dashboard displays detection results, confidence levels, and alerts to the user. This multistage approach supports early ransomware detection, better transparency, and effective user interaction, making the proposed system a viable and efficient solution for real-world deployment.
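The continuous monitoring of executable files described above can be illustrated with a simple polling sketch. The source does not state how monitoring is implemented, so the polling approach, the extension list, and the function names `snapshot` and `changed_files` are assumptions made for illustration; a real deployment might rely on operating-system file notifications instead.

```python
import os

# Assumption: which extensions count as "executable" is not stated in the source.
EXECUTABLE_EXTENSIONS = (".exe", ".dll")


def snapshot(directory):
    """Map each executable file under `directory` to its last-modified time."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.lower().endswith(EXECUTABLE_EXTENSIONS):
                path = os.path.join(root, name)
                try:
                    state[path] = os.stat(path).st_mtime_ns
                except OSError:
                    # The file vanished or is unreadable; skip it this round.
                    continue
    return state


def changed_files(previous, current):
    """Return paths that are new or whose modification time has changed."""
    return sorted(
        path for path, mtime in current.items() if previous.get(path) != mtime
    )
```

Calling `snapshot` at a fixed interval and passing consecutive snapshots to `changed_files` yields the files that would be handed to the classifier.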
- Algorithm:
Step 1:
The experiments in this work rely in part on publicly available ransomware datasets obtained from Kaggle. These datasets contain labeled samples of benign (safe) and ransomware executables, so they can be used for supervised machine learning. The data are used to train, validate, and assess the performance of the proposed ransomware detection model.
Step 2:
Datasets used in ransomware research are usually imbalanced, with benign samples outnumbering malicious ones. To reduce this problem and improve the learning ability of the classifier, the Synthetic Minority Oversampling Technique (SMOTE) is applied.
SMOTE creates synthetic examples of the minority class by interpolating between an existing minority data point and one of its nearest neighbors. A new synthetic sample is generated as follows:
$$x_{new} = x_i + \mathrm{rand}(0,1)\cdot(x_{nn} - x_i) \tag{1}$$
Where:
- $x_i$ represents a minority class sample,
- $x_{nn}$ denotes one of its k-nearest neighbors,
- $\mathrm{rand}(0,1)$ is a random value between 0 and 1.
This oversampling balances the dataset and reduces bias toward the majority class during model training.
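The interpolation in Eq. (1) can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the study: the function name `smote_samples`, the default `k=5`, and the Euclidean distance are assumptions.

```python
import math
import random


def smote_samples(minority, k=5, n_new=1, seed=None):
    """Create n_new synthetic rows: x_new = x_i + rand(0,1) * (x_nn - x_i)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x_i = minority[i]
        # k nearest neighbours of x_i among the other minority samples
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(x_i, x))
        x_nn = rng.choice(others[:k])
        gap = rng.random()  # rand(0,1) in Eq. (1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x_i, x_nn)])
    return synthetic
```

Each synthetic row lies on the line segment between a minority sample and one of its neighbours, so no value falls outside the range already spanned by the minority class.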
Step 3:
Significant and discriminative attributes are extracted from the data to improve the detection rate. These include file entropy values, opcode frequency patterns, and API call sequences, which are known to capture ransomware activity. All numerical features are normalized or scaled so that model training is stable and efficient and no feature with a larger magnitude dominates the training.
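As an illustration of two of the operations named above, the following sketch computes the Shannon entropy of a byte string and rescales a numeric feature. The source does not specify the entropy definition or the scaling method, so byte-level Shannon entropy and min-max scaling are assumptions, as are the function names.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for constant data, 8.0 at most)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]
```

Encrypted or packed content tends toward high byte entropy, which is why entropy is a commonly used static feature in this setting.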
Step 4:
The Random Forest algorithm is used in the detection framework because it is robust and achieves high classification performance. Random Forest is an ensemble learning method that combines the predictions of several decision trees.
Each tree makes its prediction independently, and the final result is obtained through majority voting:
$$\hat{y} = \mathrm{Mode}\{h_1(x), h_2(x), \ldots, h_K(x)\} \tag{2}$$
Where:
- $h_k(x)$ is the prediction of the $k$-th decision tree,
- $K$ is the total number of trees,
- $\mathrm{Mode}$ selects the most frequent class label.
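The voting rule in Eq. (2) can be sketched as follows. The three "trees" are hypothetical one-rule stumps with invented feature names and thresholds, used only to show the mechanics; they are not the trained model from this work.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most frequent class label among the per-tree predictions."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Apply every tree h_k to x and combine the outputs with the mode."""
    return majority_vote([tree(x) for tree in trees])


# Hypothetical stumps: 1 = ransomware, 0 = benign (label coding is an assumption).
toy_trees = [
    lambda x: int(x["entropy"] > 7.0),
    lambda x: int(x["suspicious_imports"] >= 3),
    lambda x: int(x["section_count"] > 8),
]
```

If the model is scikit-learn's `RandomForestClassifier` (an assumption; the source only mentions `predict_proba()`), note that it combines trees by averaging their class probabilities rather than by counting hard votes, which usually, but not always, gives the same label.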
Step 5:
To measure the reliability of predictions, probability-based confidence estimation is adopted. During inference, the trained model generates class-wise probabilities using the predict_proba() function.
The predictive confidence is defined as:
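$$\mathrm{Conf}(\hat{y}) = \max\big(P(y=0 \mid x),\, P(y=1 \mid x)\big) \tag{3}$$
where $\hat{y}$ is the predicted class label. The corresponding uncertainty is defined as:
$$\mathrm{Unc}(\hat{y}) = 1 - \mathrm{Conf}(\hat{y}) \tag{4}$$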
Higher confidence values indicate greater certainty in the prediction, while higher uncertainty values indicate more ambiguous classifications.
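A minimal sketch of Eqs. (3) and (4), applied to one row of class probabilities such as `predict_proba()` returns. The threshold value 0.8, the label coding (0 = benign, 1 = ransomware), and the `"accept"`/`"review"` outcomes are assumptions for illustration; the source does not state them.

```python
def confidence_and_uncertainty(probabilities):
    """Return (confidence, uncertainty) for one row of class probabilities."""
    confidence = max(probabilities)      # Eq. (3)
    return confidence, 1.0 - confidence  # Eq. (4)


def triage(probabilities, threshold=0.8):
    """Attach a label and a review flag to a single prediction.

    Assumed label coding: index 0 = benign, index 1 = ransomware.
    """
    confidence, uncertainty = confidence_and_uncertainty(probabilities)
    label = probabilities.index(confidence)
    status = "accept" if confidence >= threshold else "review"
    return label, confidence, uncertainty, status
```

For example, a row of `[0.45, 0.55]` is labelled ransomware but falls below the assumed threshold, so it would be routed for further examination rather than accepted outright.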
Step 6:
To improve the reliability of detection, external threat intelligence is integrated into the framework. A SHA-256 hash is computed for every executable file and submitted to the VirusTotal malware intelligence service.
The response reports how many security engines have already flagged the file as malicious. The machine learning output is combined with this intelligence to produce informed and reliable ransomware detection decisions.
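The hash lookup described above can be sketched as follows. The code computes the SHA-256 digest, prepares a request for the VirusTotal API v3 file-report endpoint, and reads the count of engines that flagged the file. The combination rule in `combine_verdict`, its thresholds, and all function names are hypothetical; the source does not specify how the two signals are merged. No network call is made here.

```python
import hashlib

VT_FILE_URL = "https://www.virustotal.com/api/v3/files/{sha256}"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large executables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_vt_request(sha256, api_key):
    """Return the URL and headers for a VirusTotal v3 file-report lookup."""
    return VT_FILE_URL.format(sha256=sha256), {"x-apikey": api_key}


def malicious_engine_count(report):
    """Read the number of engines that flagged the file from a v3 report."""
    attributes = report.get("data", {}).get("attributes", {})
    return int(attributes.get("last_analysis_stats", {}).get("malicious", 0))


def combine_verdict(ml_label, ml_confidence, vt_malicious,
                    conf_threshold=0.8, vt_threshold=3):
    """Hypothetical fusion rule; thresholds are illustrative assumptions."""
    if vt_malicious >= vt_threshold:
        return "malicious"
    if ml_label == 1 and ml_confidence >= conf_threshold:
        return "malicious"
    if ml_label == 1 or vt_malicious > 0:
        return "suspicious"
    return "benign"
```

Submitting only the hash avoids uploading the file itself, at the cost that files never seen by the service return no report.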
Step 7:
The proposed system incorporates SHAP (SHapley Additive exPlanations) for transparency and interpretability. SHAP estimates an importance value ($\Phi_i$) for each individual feature, reflecting how that feature contributes to the final prediction.
The SHAP value is computed as:
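$$\Phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\,\big[f(S \cup \{i\}) - f(S)\big] \tag{5}$$
Where:
- $\Phi_i$ is the SHAP value of feature $i$,
- $F$ denotes the complete feature set,
- $S$ represents a subset of features excluding $i$,
- $f(\cdot)$ is the model prediction function.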
This enables Explainable AI (XAI) by explicitly showing the impact of individual features on ransomware detection results.
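To make Eq. (5) concrete, the sketch below evaluates it by brute force over all feature subsets. This is feasible only for a handful of features; SHAP implementations use faster estimators and approximate $f(S)$ by averaging over the features missing from $S$. The value function `toy_value` and its feature names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley value of each feature, following Eq. (5) term by term."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(len(others) + 1):
            # |S|! (|F| - |S| - 1)! / |F|!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


# Hypothetical additive model: each present feature adds a fixed amount.
TOY_CONTRIBUTIONS = {"entropy": 0.4, "imports": 0.2, "section_count": 0.1}


def toy_value(subset):
    """Model output when only the features in `subset` are known."""
    return sum(TOY_CONTRIBUTIONS[f] for f in subset)
```

For an additive model such as `toy_value`, each Shapley value equals that feature's own contribution, and the values always sum to the difference between the full prediction and the empty-set baseline.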
RESULTS AND DISCUSSION
The proposed ransomware detection framework was tested on a labeled set of benign and ransomware samples. Analysis of the dataset showed a significant class imbalance, with benign files predominating.
Applying confidence estimation further strengthened the predictions by attaching probability-based certainty scores. High-confidence predictions agreed well with the true labels, while low-confidence cases indicated overlapping behavior between benign and ransomware samples. The uncertainty measure was effective in identifying borderline cases that may need additional examination.
Fig.4 File Upload Interface for Executable Analysis
The proposed system was tested to verify that it detects ransomware files in time. During testing, the system monitored executable files and analyzed them when they were uploaded or changed. The results show that ransomware files were identified within a short time, which helps prevent further harm to the system.
Fig 5 Ransomware Monitoring Dashboard
On the monitoring dashboard, ransomware files were correctly flagged as malicious and benign system files were labeled as normal. This demonstrates that the system can clearly distinguish malicious from non-malicious files. The detection results were accurate and consistent across numerous tests.
Fig 6 File Detection Result Chart
The chart indicates that the majority of the files in the dataset are legitimate and a smaller share are ransomware. This kind of data distribution is common in the real world. Despite the imbalance, the system produced good results because of appropriate data handling during training.
Fig 7 Explainable AI Output for Ransomware Detection
The decisions of the model were explained using explainable AI methods (SHAP and LIME). These approaches showed that file-structure-related and entropy-related features were significant in identifying ransomware. This makes the system easier to understand and trust.
| Paper | Methods | Features Used | Accuracy |
| [1] Kunku et al., 2023 | RF, XGBoost | Behavior logs, file access frequency, entropy | ~97.5% |
| [2] Nanda Rani & Dhavale, 2022 | XGBoost, LR | Sandbox-based behavioral features | ~98% |
| [4] Bandgar & Mote, 2025 | RF, GBM | File system & process behavior, registry changes | >99% |
| [6] Khamma, 2020 | RF with Gain Ratio selection | N-gram byte frequency | 97.74% |
Table 1: Comparison Results
The table summarizes current ransomware detection strategies as reported in the respective studies. The authors used different machine learning models, such as Random Forest, XGBoost, Gradient Boosting, and Logistic Regression, with a range of behavioral and static features. Typical features include file system activity, entropy values, opcode or byte-level patterns, sandbox-based behavioral logs, and registry and process behavior.
CONCLUSION
This study developed a robust machine learning model for ransomware detection that combines behavior-based features with threat intelligence. The research used publicly available datasets and applied data augmentation to address class imbalance, which improved the learning capacity of the model. Significant behavioral characteristics were selected and normalized to improve classification performance.
The Random Forest classifier proved effective in detecting ransomware and robust in differentiating between ransomware and benign files. Confidence estimation provided further insight into the reliability of predictions, and uncertainty analysis revealed ambiguous samples. In addition, external threat intelligence via hash-based verification improved the credibility and reliability of the detection results.
SHAP-based model interpretability made decisions transparent by revealing the contribution of each feature, which made the system more comprehensible and credible to security analysts. Overall, the experimental results indicate that the proposed method is accurate, robust, and applicable to practical ransomware detection.
In the future, the framework can be extended by adding deep learning-based models, expanding real-time tracking and monitoring, and incorporating more data to achieve better detection and adaptability to changing ransomware patterns.
REFERENCES
- Kunku, K., Zaman, A., & Roy, K. (2023). Ransomware Detection and Classification using Machine Learning. arXiv.
- Nanda Rani & Sunita Vikrant Dhavale (2022). Leveraging Machine Learning for Ransomware Detection, arXiv.
- Amjad Alraizza & Abdulmohsen Algarni (2023). Ransomware Detection Using Machine Learning. Big Data and Cognitive Computing, 7(3), 143.
- B.M.Bandgar & Abhijeet Mote (2025). Analysis of Ransomware Attack Detection Using Machine Learning Algorithms. Communications on Applied Nonlinear Analysis, 32(8S).
- Manabu Hirano & Ryotaro Kobayashi. (2022). Machine Learning-based Ransomware Detection Using Low-level Memory Access Patterns Obtained From Live-Forensic Hypervisor, arXiv.
- Ban Mohammed Khamma (2020). Ransomware Detection using Random Forest Technique. ICT Express, 6(4), 325–331.
- Mondal, B., Dukkipati, S. S. N. C., Rahman, M. T., & Taimun, M. T. Y. (2025). Using Machine Learning for Early Detection of Ransomware Threat Attacks in Enterprise Networks. Saudi Journal of Engineering and Technology, 10(4), 159–168.
- Lee, J., Kim, J., Jeong, H., & Lee, K. (2025). A Machine Learning-Based Ransomware Detection Method for Attackers’ Neutralization Techniques Using Format-Preserving Encryption. Sensors, 25(8), 2406.
- Rele, M., Samuel, J., Patil, D., & Krishnan, U. (2025). Exploring ransomware detection based on Artificial Intelligence and Machine Learning. Procedia Computer Science, 252, 548–556.
- Ghadhban Salman, H. A., Bidgoly, A. J., & Fallah, S. (2024). Towards an explainable deep learning model for unknown ransomware detection. Journal of Electrical Systems, 20(3), 5163–5172.
- K. Thummapudi, P. Lama, and R. V. Boppana, “Detection of ransomware attacks using processor and disk usage data,” IEEE Access, 2023.
- U. Zahoora, A. Khan, M. Rajarajan, S. H. Khan, M. Asam, and T. Jamal, “Ransomware detection using deep learning based unsupervised feature extraction and a cost sensitive pareto ensemble classifier,” Scientific Reports, vol. 12, no. 1, p. 15647, 2022.
- I. Ba’abbad and O. Batarfi, “Proactive ransomware detection using extremely fast decision tree (efdt) algorithm: A case study,” Computers, vol. 12, no. 6, p. 121, 2023.
- C. Woralert, C. Liu, and Z. Blasingame, “Hard-lite: A lightweight hardware anomaly realtime detection framework targeting ransomware,” IEEE Transactions on Circuits and Systems I: Regular Papers, 2023.
- Gulmez, S., A.G. Kakisim, and I. Sogukpinar, XRan: Explainable deep learning-based ransomware detection using dynamic analysis. Computers & Security, 2024. 139: p. 103703.
- D. Min, Y. Ko, R. Walker, J. Lee, and Y. Kim, “A content-based ransomware detection and backup solid-state drive for ransomware defense,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021.
Bachina Parimala*
Latha Gaddam
Hema Karpurapu
10.5281/zenodo.19698928