Computer Science and Engineering, Bharath Institute of Higher Education and Research, Tamil Nadu, India
This paper presents the development of an Intrusion Detection System (IDS) for smart vehicles utilizing advanced machine learning algorithms. The system is designed to detect and classify various types of cyberattacks, such as Distributed Denial of Service (DDoS), Fuzzy, and Impersonation attacks. The dataset used for model training and evaluation is the CAN-intrusion-dataset, which contains crucial vehicle communication features, including Message_ID, Byte-level signals, and Target labels. The study employs a range of machine learning models, including Random Forest, Gradient Boosting, AdaBoost, LSTM, and CatBoost classifiers, to identify and mitigate potential threats. By leveraging these algorithms, the system aims to provide robust, real-time detection of anomalous behaviour in vehicular networks, enhancing the security and reliability of smart vehicle systems. The ultimate goal is to develop an efficient and scalable IDS capable of protecting smart vehicles from evolving cyber threats.
The increasing integration of smart technologies in modern vehicles has significantly enhanced their functionality, safety, and convenience. However, this transformation has also introduced new cybersecurity risks, with vehicles becoming potential targets for various cyberattacks. The connected nature of smart vehicles, which rely on communication protocols like the Controller Area Network (CAN), makes them vulnerable to attacks that can disrupt vehicle operations, compromise safety, or even expose sensitive data. As these threats evolve, there is an urgent need for advanced systems to detect and mitigate such intrusions in real time. An Intrusion Detection System (IDS) serves as a critical defense mechanism to identify malicious activities and protect vehicle systems from cyber threats. Traditional IDS solutions have been tailored to general IT networks, but the unique characteristics of vehicular networks require specialized approaches to address the challenges posed by these environments. This paper proposes the development of an IDS specifically designed for smart vehicles, leveraging machine learning algorithms to detect and classify various types of cyberattacks.
PROBLEM STATEMENT:
As smart vehicles become increasingly integrated with advanced communication networks, they are exposed to a wide array of cybersecurity threats, including Distributed Denial of Service (DDoS), Impersonation, and Fuzzy attacks. These cyberattacks pose significant risks to the safety, privacy, and reliability of vehicular systems. Current intrusion detection mechanisms are often inadequate in detecting sophisticated, evolving threats in real time. Therefore, there is a pressing need for an efficient and scalable Intrusion Detection System (IDS) that can accurately identify and classify these cyberattacks in vehicular networks. This study aims to address this gap by leveraging machine learning algorithms, such as Random Forest, Gradient Boosting, and LSTM, to develop a robust IDS capable of enhancing the security of smart vehicles.
OBJECTIVE OF THE PROJECT:
The objective of this project is to develop a robust Intrusion Detection System (IDS) tailored for smart vehicles, leveraging advanced machine learning techniques to identify and mitigate cyber threats in vehicular networks. By utilizing the CAN-intrusion-dataset, the system will classify a range of cyberattacks, including DDoS, Fuzzy, and Impersonation attacks, as well as distinguish between normal and malicious traffic. The project aims to implement a variety of machine learning models, such as Random Forest, Gradient Boosting, AdaBoost, LSTM, and CatBoost classifiers, to ensure high accuracy and real-time detection of threats. The ultimate goal is to create an efficient, scalable, and reliable IDS that enhances the security and resilience of smart vehicle systems against emerging cyber risks.
SCOPE:
The scope of this study focuses on the development of an Intrusion Detection System (IDS) tailored for smart vehicles, leveraging advanced machine learning algorithms to identify and classify a variety of cyberattacks. The system aims to detect threats such as Distributed Denial of Service (DDoS), Fuzzy, and Impersonation attacks, as well as distinguish between normal and anomalous vehicle communication patterns. The research utilizes the CAN-intrusion-dataset, incorporating vehicle communication features like Message_ID, Byte-level signals, and Target labels, to train and evaluate various machine learning models. The primary objective is to create a robust, real-time IDS that enhances the security of vehicular networks, ensuring effective protection against dynamic cyber threats.
Architecture:
Feasibility Study
The feasibility of the project is analysed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. The three key considerations involved in the feasibility analysis are economic, technical, and social feasibility.
Economic feasibility:
This study is carried out to check the economic impact that the system will have on the organization. The funds that the company can put into the research and development of the system are limited, so the expenditures must be justified. The developed system was well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.
Technical feasibility:
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, since this would in turn place high demands on the client. The developed system must have modest requirements, as only minimal or no changes are required to implement it.
Social feasibility:
This aspect of the study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance depends solely on the methods employed to educate users about the system and to make them familiar with it. Their confidence must be raised so that they are also able to offer constructive criticism, which is welcomed, as they are the final users of the system.
METHODOLOGY:
Random Forest:
Random Forest is an ensemble learning algorithm that operates by constructing multiple decision trees during training and combining their outputs for improved accuracy and robustness. Internally, it employs the bagging (Bootstrap Aggregating) technique to create diverse subsets of the training data by sampling with replacement. Each decision tree is trained independently on these subsets, reducing overfitting. Random Forest introduces additional randomness by selecting a random subset of features at each split in a tree, promoting varied decision boundaries across trees. This feature selection mitigates the dominance of strongly predictive variables, fostering model diversity. During prediction, the algorithm aggregates individual tree outputs: for classification, it employs majority voting, while for regression, it averages the predictions. The ensemble effect enhances generalization, minimizes variance, and addresses biases inherent in individual trees. Its parallel architecture ensures scalability, making it effective for high-dimensional data and complex tasks.
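The bagging, random feature selection, and majority voting steps described above can be illustrated with a minimal sketch. It is not the implementation used in this study: one-level trees (stumps) stand in for full decision trees, so the per-split feature subset is drawn once per tree, and the feature values, labels, and function names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature_ids):
    """Fit a one-level tree: choose the (feature, threshold) pair with the
    fewest training errors, searching only the features in feature_ids."""
    best = None
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bagging: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random feature subset considered for this tree's single split.
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump(sample_rows, sample_labels, feature_ids))
    return forest


def predict_forest(forest, row):
    # Classification: majority vote over the individual tree outputs.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy rows: [message ID, byte 0, byte 1], not taken from the dataset.
rows = [[0, 0, 0], [0, 1, 0], [1, 0, 2], [2, 2, 1],
        [300, 120, 80], [512, 200, 64], [640, 90, 255], [1024, 77, 130]]
labels = ["attack"] * 4 + ["normal"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, [0, 0, 0]))
print(predict_forest(forest, [700, 150, 90]))
```

Because each tree sees a different bootstrap sample and feature subset, individual trees can disagree; the vote is what stabilizes the final prediction.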
AdaBoost:
AdaBoost (Adaptive Boosting) is an ensemble learning technique that combines multiple weak classifiers to create a strong classifier. The core idea is to iteratively train weak models, typically decision trees with a single split (stumps), each focusing on the mistakes made by the previous model. The algorithm begins by assigning equal weights to all training samples. In each iteration, a weak classifier is trained, and the model's error rate is calculated. The weights of the misclassified samples are then increased, while correctly classified samples are given less weight. This adjustment directs the next weak classifier to focus more on the harder-to-classify instances.
Each weak classifier's contribution to the final prediction is weighted based on its accuracy. The final strong classifier is a weighted combination of all individual weak classifiers. AdaBoost typically improves performance by reducing bias and variance, making it effective even when base models are simple and prone to underfitting.
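The reweighting loop described above can be sketched as follows. This is an illustrative version of the standard binary AdaBoost update with decision stumps, not the code used in this study; the one-dimensional toy data, the label convention (+1 and -1), and the function names are assumptions made for the example.

```python
import math


def stump_predict(x, threshold, polarity):
    # A stump predicts `polarity` when x <= threshold and `-polarity` otherwise.
    return polarity if x <= threshold else -polarity


def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the stump with the
    lowest weighted error under the current sample weights."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def fit_adaboost(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n            # start with equal sample weights
    ensemble = []
    for _ in range(n_rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # more accurate stumps get a larger say
        ensemble.append((alpha, threshold, polarity))
        # Increase the weights of misclassified samples, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict_adaboost(ensemble, x):
    # Final classifier: sign of the alpha-weighted sum of stump outputs.
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single stump can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]

ensemble = fit_adaboost(xs, ys, n_rounds=3)
predictions = [predict_adaboost(ensemble, x) for x in xs]
print(predictions)  # [1, 1, -1, -1, 1, 1]
```

On this toy set the best single stump misclassifies two of the six points, while the weighted combination of three stumps classifies all six correctly.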
CatBoost:
CatBoost is a gradient boosting algorithm designed to handle categorical features efficiently. It builds an ensemble of decision trees, each correcting errors made by the previous trees. Unlike traditional gradient boosting methods, CatBoost uses a technique called ordered boosting, which reduces overfitting and improves generalization by processing the training data in random permutations, so that the estimate for each observation is based only on the observations that precede it. CatBoost's strength lies in its ability to handle categorical variables directly, without one-hot encoding or extensive preprocessing. It applies target-based encoding (ordered target statistics), where categorical features are encoded from the target values of preceding observations, reducing the need for manual feature engineering. The model optimizes the objective function by gradient descent, minimizing the loss at each iteration. CatBoost builds symmetric trees, in which the same split condition is applied across an entire level of the tree, which speeds up prediction and makes the trees less prone to overfitting. It also supports parallelization to speed up training.
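The target-based encoding of categorical features mentioned above can be illustrated with a small sketch. It is a simplified, single-pass version of the idea and does not call the CatBoost library: the rows are assumed to be already in a random permutation order, the smoothing term `prior` is an assumed value, and treating Message_ID as a categorical feature is an assumption made only for this example.

```python
def ordered_target_stats(categories, targets, prior=0.5):
    """Encode each categorical value using only the targets of the rows
    that come before it, so a row never sees its own label."""
    sums = {}      # running sum of targets per category
    counts = {}    # running number of rows seen per category
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of the targets seen so far for this category.
        encoded.append((seen_sum + prior) / (seen_count + 1))
        # Only now add the current row to the running statistics.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


# Hypothetical message IDs and binary targets (1 = attack, 0 = normal).
message_ids = ["0x000", "0x2B0", "0x000", "0x000", "0x2B0"]
targets = [1, 0, 1, 0, 0]

encoded = ordered_target_stats(message_ids, targets)
print(encoded)
```

The first occurrence of each ID receives the prior alone (0.5 here); later occurrences move toward the observed attack rate for that ID, for example 0.75 for the second `0x000` row.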
SYSTEM DESIGN:
Input Design:
In an information system, input is the raw data that is processed to produce output. During input design, the developers must consider input devices such as PC, MICR, OMR, etc. The quality of system input determines the quality of system output. Well-designed input forms and screens have the following properties −
Objectives for Input Design:
The objectives of input design are −
Output Design:
The design of output is the most important task of any system. During output design, developers identify the type of outputs needed, and consider the necessary output controls and prototype report layouts.
Objectives of Output Design:
The objectives of output design are:
ER Diagram:
Test Cases:
Input | Output | Result
Input | Tested with the different models selected by the user. | Success
Random Forest Classifier | Tested with different user inputs on the models built using the different algorithms and data. | Success
Prediction | Prediction is performed using the model built from the algorithm. | Success
Test cases Model building:
S.No | Test case | Input/Output | Expected output | Actual output | Pass/Fail
1 | Read the datasets | Dataset path | Datasets should be read successfully | Datasets fetched successfully | Pass (fails if the data is not in .csv format)
2 | Feature engineering | Check the dataset for null and categorical values | Dataset preprocessed successfully | Data wrangled successfully | Pass (fails otherwise)
3 | Modelling | Input data with algorithms to obtain metrics | Algorithm accuracy reported as a percentage | Accuracy obtained for each model, one by one | Pass (fails otherwise)
4 | Prediction | Enter the input values | Output predicted from the user input | Result predicted successfully with the selected algorithm | Pass (fails otherwise)
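The first three test cases above could be automated along the following lines. This is a sketch under stated assumptions rather than the project's test code: the function names, the sample file, its column names (`Byte_0`, `Byte_1`), and its values are hypothetical and are written to a temporary directory only so the example is self-contained.

```python
import csv
import os
import tempfile


def read_dataset(path):
    """Test case 1: the dataset is read only if it is a .csv file."""
    if not path.lower().endswith(".csv"):
        raise ValueError(f"expected a .csv file, got: {path}")
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def missing_counts(rows):
    """Test case 2: count empty (null) cells per column."""
    counts = {column: 0 for column in rows[0]}
    for row in rows:
        for column in counts:
            value = row.get(column)
            if value is None or value.strip() == "":
                counts[column] += 1
    return counts


def accuracy_percent(y_true, y_pred):
    """Test case 3: report accuracy as a percentage."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "can_sample.csv")
    with open(path, "w", newline="") as handle:
        handle.write("Message_ID,Byte_0,Byte_1,Target\n"
                     "0000,00,00,DDoS\n"
                     "02B0,1F,,Normal\n"
                     "0130,A0,3C,Normal\n")
    rows = read_dataset(path)

missing = missing_counts(rows)
accuracy = accuracy_percent(["DDoS", "Normal", "Normal", "Fuzzy"],
                            ["DDoS", "Normal", "DDoS", "Fuzzy"])
print(len(rows), missing, accuracy)
```

A file with any other extension makes `read_dataset` raise `ValueError`, which corresponds to the fail condition listed for test case 1.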
CONCLUSION:
In conclusion, this paper demonstrates the successful implementation of an Intrusion Detection System (IDS) for smart vehicles using advanced machine learning algorithms. By leveraging models such as Random Forest, Gradient Boosting, AdaBoost, LSTM, and CatBoost, the IDS effectively detects and classifies various cyberattacks, including DDoS, Fuzzy, and Impersonation attacks.
REFERENCE
Lokesh Kumar, Siddabattula Maheswar, Sai Siddartha, Vasanth Reddy, K. SathiyaPriya, Intrusion Detection System for Smart Vehicles Using Machine Learning Classifiers, Int. J. Sci. R. Tech., 2025, 2 (4), 197-203. https://doi.org/10.5281/zenodo.15191644