1Department of Computer Science, University of Charmo
2School of Humanitarian Sciences, University of Sulaimani, Sulaimani, Kurdistan Region-Iraq
3Department of Computer Science, University of Garmian, Sulaymaniyah, Iraq
4Department of Computer Science, Darbandikhan Technical Institute, Sulaimani Polytechnic University, Sulaimani, Iraq
Fraud detection in banking systems is crucial for financial stability, customer protection, and regulatory compliance. Machine learning plays a vital role in enhancing data analysis and real-time fraud detection. Feature selection is an essential phase in machine learning to improve credit card fraud detection. By eliminating the negative impact of redundant and irrelevant features and selecting effective ones, feature selection aids the classification phase in machine learning. This paper presents an effective method based on a hybrid Grey Wolf and Cheetah algorithm to enhance the accurate identification of fraudulent credit card transactions by recognizing relevant features. Additionally, in the machine learning classification phase, the Support Vector Machine (SVM) method is employed, which has been improved through parameter tuning using the hybrid Grey Wolf and Cheetah algorithm. The results demonstrate that the proposed method has achieved at least a 1% improvement in fraud detection on the Australian credit dataset compared to other methods.
E-commerce refers to the digital transactions of goods or services and has facilitated consumer access to purchasing products [1, 2]. E-commerce frameworks offer numerous advantages, including accelerating procurement processes, reducing costs, enhancing consumer convenience, facilitating product and price comparisons, swiftly adapting to market fluctuations, and accommodating various payment methods. These aspects have contributed to economic resilience, particularly during government-imposed restrictions due to pandemics and lockdown orders, such as those experienced during the COVID-19 pandemic [3, 4]. Before the financial crisis of 2008-2009, the annual growth of e-commerce revenues increased from 15% to 25% but declined to about 3% in 2009 due to the economic downturn [5]. This growth rate was significantly more stable than the decline experienced in traditional retail commerce during the same period. Post-2009, e-commerce growth rates resumed, increasing annually by more than 10%, a rate much higher than total retail sales. The COVID-19 pandemic in 2020, while negatively impacting digital sales, also led to a significant increase in retail e-commerce revenues, driven by consumer shifts towards online shopping, a trend that has persisted post-pandemic [5]. The ongoing pandemic has increased online demand for various products, including essentials. It has also boosted the use of electronic payment methods, resulting in a significant rise in fraudulent financial activities [6]. With the evolution of e-commerce, almost all businesses, whether small or large-scale, have adopted credit card payments. However, this increased use of credit cards, particularly through online transactions, has created new opportunities for criminals to exploit consumer credit card information. Financial fraud, with its extensive implications, poses a serious challenge in academic, commercial, and regulatory domains, adversely affecting service providers and their customers. 
This issue is critically important in the financial arena and permeates everyday economic interactions on a global scale. It involves the unauthorized acquisition of assets or capital for self-enrichment, undermining trust in financial institutions and increasing the cost of living. The classification of financial fraud is multifaceted, including falsification of financial statements, deception in insurance practices, bank embezzlement, remote fraud, and fraudulent activities in stock and commodity markets [7, 8]. The ripple effects of such fraudulent behavior have significantly transformed global financial infrastructures, accelerating the shift towards digital financial services and ushering in a new era of challenges. Reports indicate a sharp increase in fraud losses associated with credit and debit cards, which grew exponentially from 2000 to 2015 [9], highlighting that unauthorized purchases and counterfeit credit cards, although accounting for only 10 to 15 percent of total fraud cases, are responsible for 75 to 80 percent of financial fraud losses. Consequently, both private and public sectors are increasing their investments in research and development to create more sophisticated systems for fraud detection. Significant financial flows in e-commerce attract fraudulent activities, posing a risk of substantial financial losses. As of now, fraud is reported to increase from $17.5 billion in 2023 to $20 billion in 2024. These figures underscore the necessity of developing fraud detection and prevention mechanisms for e-commerce and financial institutions [10]. Fraud is an illicit act characterized by deception. Credit card fraud involves the illegal acquisition of cardholder information through phone calls, text messages, or cyber hacking to facilitate unauthorized financial transactions. Such fraudulent acts are often executed using software controlled by criminals. 
The credit card fraud detection process begins when a consumer initiates a transaction using their credit card, and this transaction must be verified to confirm its legitimacy [11, 12]. Research efforts are directed towards developing diagnostic systems that utilize methods such as machine learning, deep learning, and data mining. These systems analyze transaction data to differentiate between legitimate and fraudulent activities. The complexity of credit card fraud detection is increasing as the features of fraudulent transactions become more similar to legitimate ones. Consequently, credit card companies are compelled to implement more advanced fraud detection technologies. An efficient fraud detection system is essential for accurately identifying fraudulent activities in real-time transactions; such systems are broadly classified into anomaly detection and misuse detection systems [11]. Financial institutions that issue credit cards or manage online transactions must employ automated fraud detection systems, which reduce losses and enhance customer trust. With the advent of big data and artificial intelligence, new opportunities have emerged for utilizing advanced machine learning models for fraud detection [13]. Today's fraud detection systems, which rely on advanced machine learning methods, are highly effective. A binary classification model that can distinguish between normal and fraudulent transactions is built using a dataset containing labeled transactions (normal and fraudulent). The developed model determines whether incoming transactions are legitimate or fraudulent. Identifying fraudulent transactions using classification techniques presents numerous challenges [14, 15].
The emergence of big data and artificial intelligence has opened new avenues for employing sophisticated machine learning models for fraud detection. For instance, research [13] has demonstrated that the latest fraud detection systems using advanced machine learning techniques are highly effective in this domain. The present study examines credit card fraud, a significant subset of banking fraud. Credit cards, a dominant electronic payment method worldwide, have simplified online transactions and increased fraudulent activities by cybercriminals. Unauthorized use of credit card systems or information, often without the owner's knowledge, is a growing concern affecting many banks and financial institutions globally. Credit card fraud detection involves the automated analysis of transaction data stored in service providers' databases. Common types of fraud include application and transaction fraud, encompassing both physical and digital domains. Fraud categories include card-not-present scenarios, counterfeit cards, and identity theft.
Banks play a role in scrutinizing transactions for fraud prior to payment authorization, particularly by checking if the involved website is on a blocklist. Transactions associated with blocked sites are rejected as a fraud prevention measure. Once bank approval is obtained, the fraud prevention process focuses solely on fraud detection. Preventive measures for application fraud include validating personal details. Tools such as Address Verification Services, Card Verification Value, 3D Secure, and encryption are employed to counter transaction fraud. The number of features in a dataset is a critical issue, as too many features can disrupt the effectiveness of classification methods and potentially lead to overfitting or ambiguity in algorithms. Feature selection involves identifying effective features and eliminating redundant ones, which affects the performance of machine learning models. The goal of feature selection is to reduce the dimensionality of the feature set while maintaining classification accuracy. Various methods have been developed for feature selection, but metaheuristic methods have garnered significant attention due to their effectiveness in addressing a wide range of optimization problems. Metaheuristic techniques are optimization strategies designed to find approximate solutions to various optimization problems. These algorithms do not require derivative calculations and are able to escape local optima in the search for the global optimum. Metaheuristic methods operate stochastically and, unlike gradient search methods that require derivative calculations in the search space, start their optimization process from randomly generated solutions. Their simplicity, derived from basic concepts and ease of implementation, makes these algorithms highly flexible and easily adaptable to specific problems. A key feature of metaheuristic methods is their capacity to avoid premature convergence.
Their stochastic nature allows them to effectively function as a "black box," circumventing local optima and thoroughly exploring the search space. Utilizing metaheuristic methods to address feature selection issues can effectively overcome various challenges in data analysis. Feature selection is crucial for extracting relevant features from imbalanced datasets, particularly before classifying credit card fraud cases in extensive datasets. The primary benefits of feature selection include simplified data interpretation, reduced training time, and resolving high-dimensionality issues. Bio-inspired algorithms, adept at tackling complex and combinatorial problems, have been effectively used in credit card fraud detection. Notably, the Cheetah Optimization Algorithm, a renowned bio-inspired optimization method known for its dual capabilities of local and global search through exploration and exploitation techniques, is highlighted. This paper employs the Cheetah Optimization Algorithm in combination with the Grey Wolf Optimizer to enhance search capabilities for feature selection and improve the performance of Support Vector Machines.
LITERATURE REVIEW
Extensive research has been conducted in the field of financial fraud detection, resulting in a plethora of proposed methods cataloged in the literature (see Table 1). These methods predominantly comprise machine learning techniques. The aim of this study is to explore the inherent limitations of existing classification methods, with a particular focus on machine learning techniques.
Table 1. Overview of Machine Learning Methods in Fraud Detection
Reference | Year | Dataset | Method | Accuracy (%)
[28] | 2018 | European Card Readers | Nearest Neighbor, Bayesian | 97.92
[29] | 2019 | Kaggle | Neural Network, Bayesian, and Support Vector Machine | 99.93
[30] | 2019 | Kaggle | Random Forest | 99
[31] | 2019 | Metric Facilities Program Data | Feature Selection and Logistic Regression | 86.98
[32] | 2019 | Credit Card Data | Neural Network and Support Vector Machine | 96
[33] | 2021 | Brazil and Europe Data | LSTM Neural Network | 77
[34] | 2021 | European Card Readers | Decision Tree | 97.18
[35] | 2021 | Three UCI Datasets | Genetic Algorithm Feature Selection and Neural Network, Support Vector Machine | 81.97
[36] | 2021 | Australian Data | Self-Organizing Neural Network | 90
[37] | 2022 | European Card Readers | Support Vector Machine | 99
[7] | 2022 | Australian Credit | Firefly Algorithm Feature Selection and Support Vector Machine | 85.65
[38] | 2022 | European Card Readers | Feature Selection and Support Vector Machine | 98.60
[39] | 2022 | European Card Readers | Decision Tree, Nearest Neighbor, and Bayesian | 91.11
[15] | 2022 | European Card Readers | Decision Tree, Random Forest | 99.80
[40] | 2022 | Kaggle | Sample Augmentation and Random Forest Method | 94
[41] | 2023 | European Card Readers | Sample Augmentation and XGBoost Method | 99.95
[42] | 2023 | Medical Insurance Claim Data | Clustering and Logistic Regression Method | 88
[43] | 2023 | AAER Benchmark Data | Feature Selection and Classification Enhancement with Fisher Algorithm | 94
[44] | 2023 | Lending Club | Random Forest, Logistic Regression, and Support Vector Machine | 95
[45] | 2024 | Kaggle | Random Forest, Logistic Regression | 94
[46] | 2024 | Sparkov | Deep Learning Autoencoder | 97
[47] | 2024 | Real World Data | Multi-task Learning | 99
[48] | 2024 | Australian Credit | Brown Bear Algorithm Feature Selection and Support Vector Machine | 92
Based on the reviewed articles, it is evident that various datasets have been proposed for fraud detection, and machine learning methods with feature selection and classification phases have been developed in this field. In the latest research in this area, including [48], feature selection using the metaheuristic Brown Bear Algorithm and Support Vector Machine (SVM) has been utilized. In the present paper, a hybrid algorithm combining the Grey Wolf Optimizer and Cheetah Algorithm is employed for feature selection, as well as enhancements to the Support Vector Machine, in comparison to reference [48].
One of the main challenges in fraud detection is the massive volume of data and features present in each financial transaction. For instance, a transaction can include various information such as amount, geographical location, transaction time, and type of goods. However, not all this information is crucial for identifying fraud. Some features can be misleading and negatively impact the model's accuracy. This is where the hybrid optimization algorithm of the Grey Wolf Optimizer and Cheetah Algorithm comes into play. Inspired by the hunting behavior of wolves and cheetahs, this algorithm identifies important and relevant features. Wolves, with their hierarchical hunting strategies led by alpha wolves, and cheetahs, which meticulously scan the environment before deciding on the best prey (feature) and then sprint towards it at maximum speed, serve as models for this approach. Here, the hybrid Grey Wolf and Cheetah algorithm, with its two distinct search strategies, assists the system in identifying the features most critical for fraud detection while ignoring unnecessary or redundant ones. For example, if transaction data comprises 10 features, the proposed hybrid algorithm can determine that, out of these 10 features, perhaps 4 have the greatest impact on fraud detection. These 4 features are then forwarded as input to the next phase, which is classification. A challenge of the proposed method is how to achieve higher accuracy in fraud detection and how the enhanced Support Vector Machine, aided by the proposed hybrid algorithm, can achieve greater accuracy in fraud detection compared to the standard Support Vector Machine method [48] and its improved version with the Brown Bear Algorithm. The framework of the research method in the proposed approach consists of three parts, as shown in Figure 1.
Figure 1. Phases of the Research Method
As shown in Figure 1, the research method in the proposed model consists of three phases:
1. Data Collection Phase: In this phase, the Australian Credit dataset [49] is prepared. At this stage, the necessary data for training the model is extracted from a real dataset known as the Australian Credit dataset. This data includes various information from credit card transactions, such as the amount of money in each transaction, the type of goods or services purchased, the time of the transaction, and other features. The goal of this stage is to provide a reliable database of legitimate and fraudulent transactions. This data is divided into two parts: training data and testing data.
In essence, the model first learns from the training data how to differentiate between legitimate and fraudulent transactions. Then, using the testing data, the model's performance is assessed to determine how well it performs.
2. Feature Selection Phase: In this phase, the effective features in the dataset are selected. In each transaction, there is a lot of information, not all of which is useful for fraud detection. Some features are more important and help in identifying fraud, while others may be unnecessary or even reduce detection accuracy. Therefore, it is necessary to select only the relevant and important features for fraud detection. This task is performed by the proposed hybrid Grey Wolf and Cheetah algorithm.
3. Classification Phase: In this phase, data with the selected features is fed to the Support Vector Machine (SVM) for classification. After the important features have been selected, the transactions are classified. The SVM is a machine learning algorithm that divides transactions into two categories: legitimate and fraudulent.
The Support Vector Machine learns from the training data how to differentiate between these two categories. Then, when a new transaction is received, the SVM decides whether the transaction is legitimate or fraudulent. In the proposed method, the hybrid Grey Wolf and Cheetah algorithm is used not only for feature selection but also to assist the SVM in finding the best parameters for classification. This means the system can categorize transactions more accurately. For feature selection, for each possible solution (search agent, which is either a wolf or a cheetah), a vector is considered where each element corresponds to the feature number in the dataset.
feature 10 | feature 9 | feature 8 | feature 7 | feature 6 | feature 5 | feature 4 | feature 3 | feature 2 | feature 1
0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1
Figure 2. Structure of a Wolf or Cheetah (Possible Solution) in the Proposed Method during the Feature Selection Phase
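As an illustration of the encoding in Figure 2, the following minimal Python sketch reads a binary solution vector, lists the selected features, and computes the integer obtained when the bits are read from feature 1 to feature 10. The function name `decode_mask` and the bit ordering convention are assumptions made for this example, not part of the original method description.

```python
def decode_mask(mask):
    """Return the 1-based indices of selected features and the integer code.

    mask[0] corresponds to feature 1, mask[1] to feature 2, and so on.
    """
    selected = [i + 1 for i, bit in enumerate(mask) if bit == 1]
    # Read the bits with feature 1 as the most significant bit.
    code = int("".join(str(bit) for bit in mask), 2)
    return selected, code


# Figure 2 lists feature 10 first, so its row is reversed to start at feature 1.
figure_2_row = [0, 1, 1, 0, 0, 0, 1, 1, 0, 1]  # feature 10 ... feature 1
mask = figure_2_row[::-1]                       # feature 1 ... feature 10

selected, code = decode_mask(mask)
print(selected)  # [1, 3, 4, 8, 9]
print(code)      # 710
```

Only the features whose bit is 1 would be passed on to the classifier; the integer code is just a compact label for the subset.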
In Figure 2, for a dataset with 10 features, a possible solution (a wolf or a cheetah) is represented as an array with 10 elements. The solution shown in Figure 2 selects the first, third, fourth, eighth, and ninth features; read from feature 1 to feature 10, it corresponds to the binary number 1011000110, which is 710 in decimal. This is where the operators of the hybrid Grey Wolf and Cheetah algorithm come into play. In the proposed method, each element of the wolf or cheetah (possible solution) that has a value of 1 indicates that the corresponding feature is included in the selected feature set. To evaluate each wolf or cheetah (feature set), a fitness value must be assigned. For this purpose, the nearest neighbor method is used to evaluate the selected feature set, and the fitness is calculated using Equation (1).
$$\text{fitness}_i(\text{selectedFeatures}) = \alpha \cdot \text{Accuracy}(\text{selectedFeatures}) + (1-\alpha)\,\frac{N_t - N_s}{N_t} \tag{1}$$
In this context, $N_t$ and $N_s$ represent the total number of features and the number of selected features, respectively, and $\alpha$ is a coefficient between 0 and 1. The accuracy used in the fitness function is obtained from Equation (2):
$$\text{classificationAccuracy} = \frac{TN + TP}{TN + FN + TP + FP} \tag{2}$$
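Each element of the confusion matrix is described as follows: TP (true positives) and TN (true negatives) are the numbers of positive and negative samples that are classified correctly, while FP (false positives) and FN (false negatives) are the numbers of samples that are wrongly classified as positive and negative, respectively.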
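Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the example confusion counts, and the default `alpha=0.9` are assumptions chosen for the example, since the value of the coefficient used in the experiments is not stated here.

```python
def classification_accuracy(tp, tn, fp, fn):
    """Equation (2): share of correctly classified samples."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def fitness(accuracy, n_total, n_selected, alpha=0.9):
    """Equation (1): weighted sum of accuracy and feature reduction.

    alpha=0.9 is an illustrative default, not a value taken from the paper.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    reduction = (n_total - n_selected) / n_total
    return alpha * accuracy + (1.0 - alpha) * reduction


# Hypothetical confusion counts for a subset of 5 out of 10 features.
acc = classification_accuracy(tp=40, tn=45, fp=5, fn=10)  # 85 / 100
score = fitness(acc, n_total=10, n_selected=5, alpha=0.9)
print(acc, score)
```

A larger `alpha` favours accuracy, while a smaller one rewards solutions that keep fewer features.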
The overall steps of the proposed method for feature selection are illustrated in the flowchart in Figure 3:
Figure 3. Overall Steps of the Proposed Method for Feature Selection Using the Hybrid Grey Wolf and Cheetah Algorithm
In the classification module (SVM-CO), the Support Vector Machine (SVM) is trained. The data is divided into training and testing sets. The SVM is improved with the hybrid Grey Wolf and Cheetah algorithm as follows: each possible solution in the hybrid algorithm generates a random value for the variables (C) and (w) of the SVM in Equation 3. The parameter (C) is the regularization parameter that balances maximizing the margin against minimizing the classification error, and it is always greater than zero. The parameter (w) is the weight; since it is generated randomly and may not be at its optimal value, finding appropriate values for (C) and (w) is an optimization problem. The hybrid Grey Wolf and Cheetah algorithm determines the best values for them. In the SVM, the best (w) and (C) are found by minimizing Equation 3.
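$$\min \; \frac{1}{2}\lVert w \rVert^2 + C \sum_i \varepsilon_i \tag{3}$$
Where the optimization must consider the condition in Equation 4:
$$y_i\left(\langle w, x_i \rangle + b\right) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0 \;\; \forall i \tag{4}$$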
In Equation 4, the parameter $b$ represents the bias, $x_i$ denotes the feature vector of a data sample, and $y_i$ indicates its class. The hybrid Grey Wolf and Cheetah algorithm uses its operators to generate possible solutions (values for the parameters (C) and (w)) and ultimately arrive at the best values for these parameters. The fitness of each solution is calculated from the classification accuracy of the Support Vector Machine (SVM) using Equation 2.
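To make Equations 3 and 4 concrete, the sketch below evaluates the soft-margin objective for a given weight vector, bias, and regularization parameter. It assumes class labels coded as +1 and -1 and uses the smallest slack values that satisfy Equation 4; the function name and the toy data are hypothetical and serve only as an illustration.

```python
def soft_margin_objective(w, b, C, X, y):
    """Value of Equation (3) using the smallest slacks allowed by Equation (4).

    Labels in y are assumed to be +1 or -1.
    """
    if C <= 0:
        raise ValueError("C must be greater than zero")
    slacks = []
    for x_i, y_i in zip(X, y):
        margin = y_i * (sum(w_k * x_k for w_k, x_k in zip(w, x_i)) + b)
        # Smallest epsilon_i with margin >= 1 - epsilon_i and epsilon_i >= 0.
        slacks.append(max(0.0, 1.0 - margin))
    norm_sq = sum(w_k * w_k for w_k in w)
    return 0.5 * norm_sq + C * sum(slacks), slacks


# Toy data: the last sample lies on the wrong side of the separator.
X = [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
value, slacks = soft_margin_objective(w=[1.0, 1.0], b=0.0, C=0.45, X=X, y=y)
print(value, slacks)
```

A search agent that proposes a different `C` changes how strongly the misclassified sample is penalized relative to the margin term.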
In the problem of tuning the parameters of the Support Vector Machine, each possible solution in the optimization algorithm is a real-valued array representing the numerical values for the two parameters (W) and (C) in the SVM.
W | C
0.74 | 0.45
Figure 4: The Structure of a Wolf or Cheetah in Tuning the Parameters of a Support Vector Machine
In Figure 4, each wolf or cheetah represents the two main parameters in the Support Vector Machine. In the Grey Wolf Optimization algorithm, the optimization (hunting process) is carried out using alpha, beta, and delta wolves. To implement this mathematically, it is stated that grey wolves encircle their prey during the hunt, and the following mathematical equations are proposed for them:
$$D = \lvert C \cdot X_p(t) - X(t) \rvert \tag{5}$$
$$X(t+1) = X_p(t) - A \cdot D \tag{6}$$
In these equations, $t$ denotes the current iteration, $X_p$ is the position of the prey, $X$ is the position of a grey wolf, and $A$ and $C$ are coefficient vectors. With $a$ decreasing linearly from 2 to 0 over the iterations and $r_1$ and $r_2$ being random vectors in [0, 1], the coefficient vectors are computed as follows:
$$A = 2a \cdot r_1 - a \tag{7}$$
$$C = 2 r_2 \tag{8}$$
The hunt is usually led by the alpha wolf; the beta and delta wolves sometimes participate as well. The other search agents update their positions according to the positions of these best search agents. The movement of the wolves is described as follows:
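$$D_\alpha = \lvert C_1 \cdot X_\alpha - X \rvert \tag{9}$$
$$D_\beta = \lvert C_2 \cdot X_\beta - X \rvert \tag{10}$$
$$D_\delta = \lvert C_3 \cdot X_\delta - X \rvert \tag{11}$$
$$X_1 = X_\alpha - A_1 \cdot D_\alpha \tag{12}$$
$$X_2 = X_\beta - A_2 \cdot D_\beta \tag{13}$$
$$X_3 = X_\delta - A_3 \cdot D_\delta \tag{14}$$
$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{15}$$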
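The grey wolf update in Equations 7 to 15 can be sketched as a single function. This is a minimal illustration that assumes the standard Grey Wolf Optimizer form, in which the second and third candidate positions are computed from the beta and delta wolves and their own distances; the function name `gwo_update` and its arguments are hypothetical.

```python
import random


def gwo_update(x, alpha, beta, delta, a, rng=random):
    """One grey wolf position update (Equations 7-15), applied per dimension.

    x, alpha, beta, delta are lists of equal length; a is the control
    parameter, usually decreased from 2 to 0 over the iterations.
    """
    new_x = []
    for j in range(len(x)):
        candidates = []
        for leader in (alpha, beta, delta):
            A = 2.0 * a * rng.random() - a        # Equation (7)
            C = 2.0 * rng.random()                # Equation (8)
            D = abs(C * leader[j] - x[j])         # Equations (9)-(11)
            candidates.append(leader[j] - A * D)  # Equations (12)-(14)
        new_x.append(sum(candidates) / 3.0)       # Equation (15)
    return new_x


# With a = 0 every A is zero, so the wolf moves to the mean of the leaders.
moved = gwo_update([0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], a=0.0)
print(moved)  # [2.0, 4.0]
```

For binary feature selection, the real-valued result would still need to be mapped back to 0/1 values, for example by thresholding; that mapping is not specified in the text and is left out of the sketch.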
In the cheetah algorithm, the movement operators are different from those of the wolves. Cheetahs search for prey in two ways: they either scan the environment while sitting or standing, or they actively patrol the surrounding area. Equation 16 describes the cheetah search:
$$X_{i,j}^{t+1} = X_{i,j}^{t} + \hat{r}_{i,j}^{-1} \cdot \alpha_{i,j}^{t} \tag{16}$$
Here, $X_{i,j}^{t+1}$ is the next position of the cheetah and $X_{i,j}^{t}$ is its current position. The random parameter $\hat{r}_{i,j}^{-1}$ is drawn from the standard normal distribution, and $\alpha_{i,j}^{t}$ is the step length, which is greater than 0 and defaults to $0.001 \times t/T$, meaning that the cheetah searches slowly. The cheetah may also move quickly and change its direction when it faces other hunters or enemies. The step length $\alpha_{i,j}^{t}$ can also be taken as the distance between the cheetah and its neighbors or the leader, where the leader is the best solution found in each optimization iteration. During the search mode, the prey may come into the cheetah's field of view. In this situation, any movement of the cheetah may alert the prey to its presence and cause the prey to flee. Equation 17 is intended for this case:
$$X_{i,j}^{t+1} = X_{i,j}^{t} \tag{17}$$
In Equation 17, $X_{i,j}^{t+1}$ is the next position of the cheetah and $X_{i,j}^{t}$ is its current position; in effect, the position of the cheetah is not updated. Cheetahs rely on two important factors when attacking their prey: speed and flexibility. The attack strategy is given in Equation 18:
$$X_{i,j}^{t+1} = X_{B,j}^{t} + \check{r}_{i,j} \cdot \beta_{i,j}^{t} \tag{18}$$
where $X_{B,j}^{t}$ is the current position of the prey, which is the best current position in the algorithm, $\check{r}_{i,j}$ is the rotation factor, and $\beta_{i,j}^{t}$ is the cheetah interaction factor. $X_{B,j}^{t}$ is used so that the cheetah approaches the prey, which is the best answer to the problem, and $\beta_{i,j}^{t}$ reflects the interaction of the cheetah with other cheetahs or the leader. The rotation factor $\check{r}_{i,j}$ is a random value given by Equation 19, where $r_{i,j}$ is drawn from the standard normal distribution.
$$\check{r}_{i,j} = \lvert r_{i,j} \rvert^{\exp(r_{i,j}/2)} \sin(2\pi r_{i,j}) \tag{19}$$
For random moves, the algorithm uses the random parameters ($r$) as well as the quantity $H$, given by Equation 20, where $r_1$ is a uniform random number between 0 and 1.
$$H = e^{2\left(1 - \frac{t}{T}\right)} \left(2 r_1 - 1\right) \tag{20}$$
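The cheetah operators in Equations 16 to 20 can be sketched as small Python functions acting on one coordinate. The sketch assumes that t is the current iteration and T the maximum number of iterations, and it takes the random numbers as explicit arguments so that the behaviour is reproducible; all function names are hypothetical, and the rule that decides which operator a cheetah applies is not covered here.

```python
import math


def search_step(x, r_hat, t, T):
    """Equation (16): current position plus r_hat^-1 times the step length.

    The step length uses the default 0.001 * t / T; r_hat must be nonzero.
    """
    step_length = 0.001 * t / T
    return x + (1.0 / r_hat) * step_length


def sit_and_wait(x):
    """Equation (17): the position is left unchanged."""
    return x


def rotation_factor(r):
    """Equation (19): |r|^exp(r/2) * sin(2*pi*r)."""
    return abs(r) ** math.exp(r / 2.0) * math.sin(2.0 * math.pi * r)


def attack_step(x_best, r, beta):
    """Equation (18): best position plus rotation factor times interaction."""
    return x_best + rotation_factor(r) * beta


def h_factor(t, T, r1):
    """Equation (20): H = e^(2(1 - t/T)) * (2*r1 - 1)."""
    return math.exp(2.0 * (1.0 - t / T)) * (2.0 * r1 - 1.0)


print(search_step(1.0, r_hat=2.0, t=50, T=100))
print(attack_step(3.0, r=0.25, beta=5.0))
print(h_factor(t=100, T=100, r1=1.0))  # 1.0
```

Because the exponential term in `h_factor` shrinks as `t` approaches `T`, the magnitude of H is largest early in the run and decays towards the range from -1 to 1 at the end.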
The steps of the hybrid Grey Wolf and Cheetah algorithm for feature selection are as follows:
RESULTS
The Australian Credit dataset, sourced from the UCI Machine Learning Repository for this study, comprises 690 credit applications, each characterized by 15 features [49], which were also utilized in study [48]. Among these, six features are numerical, eight are categorical, and a binary class label indicates the credit application outcome (1 for approval and 0 for rejection), as shown in Table 3. The dataset consists of 307 approved and 383 rejected applications, forming a total of 690 samples. The dataset was divided into a training set with 75% of the samples, amounting to 518 samples, and a test set with 25% of the samples, amounting to 172 samples. A ten-fold cross-validation method was employed to report the average detection accuracy. The detection accuracy results, calculated using Equation 2, are presented for three methods, which include:
- Testing with a feature selection method based on the Brown Bear algorithm and an unimproved Support Vector Machine classifier (method from study [48]), utilizing various kernel choices including Gaussian, polynomial, and linear.
- Testing with a feature selection method based on the Brown Bear algorithm and an improved Support Vector Machine classifier using the Brown Bear algorithm, with various kernel choices including Gaussian, polynomial, and linear.
- Testing with a feature selection method based on the Hybrid Wolf and Cheetah (HWC) algorithm and an improved Support Vector Machine classifier using the HWC algorithm, with various kernel choices including Gaussian, polynomial, and linear.
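The evaluation protocol described above (a 75%/25% split of 690 samples and ten-fold cross-validation) can be illustrated with a short Python sketch. How the hold-out split and the cross-validation were combined in the experiments is not stated, so the two helpers below are shown independently; the function names are hypothetical, and a practical implementation would normally shuffle or stratify the samples first.

```python
import math


def holdout_sizes(n_samples, train_fraction=0.75):
    """Return (train size, test size) for a simple hold-out split."""
    n_train = math.ceil(n_samples * train_fraction)
    return n_train, n_samples - n_train


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


n_train, n_test = holdout_sizes(690)
folds = list(k_fold_indices(690, k=10))
print(n_train, n_test)                 # 518 172
print(len(folds), len(folds[0][1]))    # 10 69
```

The reported accuracy would then be the mean of the per-fold accuracies computed with Equation 2.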
Figure 5 illustrates the results of the three Support Vector Machine-based methods with different kernel variations.
Figure 5. Classification Accuracy Results with Different Support Vector Machine Kernels. BBO+SVM: Feature selection using the Brown Bear Optimization algorithm and classification with an unimproved Support Vector Machine [48]. BBO+SVM_BBO: Feature selection using the Brown Bear Optimization algorithm and classification with an improved Support Vector Machine utilizing the Brown Bear Optimization algorithm. HWC+SVM_HWC: Feature selection using the Hybrid Wolf and Cheetah algorithm and classification with an improved Support Vector Machine utilizing the Hybrid Wolf and Cheetah algorithm.
Figure 6. Improvement Results Comparison Between the Proposed Method and BBO+SVM_BBO
As shown in Figure 6, the highest improvement is 2.25% and the lowest is 0.32%. On average, the improvement results of the proposed method (HWC+SVM_HWC) and the BBO+SVM_BBO method are presented in Figure 7.
Figure 7. Average Improvement Rate of the Proposed Method (HWC+SVM_HWC) over the BBO+SVM_BBO Method with Different Kernels
As shown in Figure 7, the greatest improvement was observed with the Gaussian kernel. Overall, the proposed method outperformed the BBO+SVM_BBO method, which is an advanced version of the BBO+SVM method from paper [48]. In the BBO+SVM_BBO method, the support vector machine is enhanced using the brown bear optimization algorithm, which was not done in the BBO+SVM method in paper [48]. The results indicate that the support vector machine with the Gaussian kernel provided the best performance in the proposed method, achieving at least a 1.55% improvement compared to the method in paper [48] (BBO+SVM) and its advanced model, the BBO+SVM_BBO method. However, this level of improvement was not observed with the linear and polynomial kernels. In other words, when the Gaussian kernel is selected, the hybrid Grey Wolf and Cheetah algorithm improves the support vector machine more than the brown bear optimization algorithm does. This is because the decision boundary of the Gaussian kernel depends more strongly on the tuning of the parameters W and C, whereas with the linear kernel this effect is smaller because a linear decision boundary does not have the flexibility to bend between classes that the polynomial and Gaussian kernels provide.
SUMMARY AND CONCLUSION
This study presented a novel approach for feature selection and enhancement of support vector machines using a hybrid optimization algorithm combining the gray wolf and cheetah algorithms. This approach yielded promising results and significantly increased fraud detection capability, as it effectively selected the most influential features and identified suitable parameters for the support vector machine in classification. The model from study [48], which utilized feature selection with the brown bear algorithm and classification with an unimproved support vector machine, was also enhanced in a new model where the support vector machine was improved using the brown bear algorithm. Results indicated that the proposed method achieved higher fraud detection accuracy compared to the other two methods, with the best results obtained using the Gaussian kernel. In the feature selection method with the brown bear algorithm and parameter tuning of the support vector machine with the brown bear algorithm, the highest classification accuracy was 94.5%, while the proposed method achieved 95.5%.
For future work, the use of metaheuristic algorithms for tuning the number of support vector machine kernels can be considered. Results show that changing the support vector machine kernels alters the classification accuracy. Therefore, accurately determining the number of kernels is an optimization problem that can be improved with optimization algorithms. Additionally, if a dataset extracted from bank transactions is recorded, and fraud samples are registered or simulated within that dataset, the proposed model can be tested on data from domestic banks. Generally, the proposed model can be utilized in data mining for fraud detection across various collected financial data types because, in the feature selection phase, effective features are chosen, and in the classification phase, the model learns the pattern of new data to be practically used for fraud detection in the banking system.
REFERENCE
Shan Ali Abdula*, Hersh Fakhradin Aziz, Pavel Ali Abdula, Salam Aham Ali, Pehraw Salam Abdalqadir, Credit Card Fraud Detection Based on Feature Selection and Enhanced Support Vector Machine Using A Hybrid Grey Wolf and Cheetah Algorithm, Int. J. Sci. R. Tech., 2025, 2 (7), 384-398. https://doi.org/10.5281/zenodo.16401987