
Abstract

Vehicular Ad hoc NETwork is a promising technology that provides important facilities for modern transportation systems. It has garnered much interest from researchers studying the mitigation of attacks, including distributed denial of service attacks. Machine learning techniques, whose performance depends largely on the quality of the datasets used, can detect many attacks with a high level of accuracy. We conducted a comprehensive literature review and found many limitations in the datasets available for distributed denial of service attacks on Vehicular Ad hoc NETwork, including the following: unavailability of online versions, absence of distributed denial of service traffic, failure to represent Vehicular Ad hoc NETwork, and lack of information regarding the network configurations. Therefore, in this article, we propose a novel simulation technique to generate a valid dataset, called the Vehicular Ad hoc NETwork distributed denial of service dataset, which is dedicated to Vehicular Ad hoc NETworks. The dataset holds information on distributed denial of service attack traffic, considering Vehicular Ad hoc NETwork architecture, traffic density, attack intensity, and node mobility. Well-known simulation tools such as SUMO, OMNeT++, Veins, and INET were used to ensure that all the properties of Vehicular Ad hoc NETwork were captured. We then compared the dataset with those of several studies to demonstrate its novelty and evaluated it using several machine learning models. The models studied achieved accuracy above 99.5% on this dataset, except the support-vector machine, which achieved 97.3%.

Keywords

Vehicular Ad hoc NETwork, ad hoc network, distributed denial of service, machine learning, OMNeT++, Veins, dataset

Introduction

Globally, car accidents account for a high proportion of road traffic deaths, as shown by the 2018 World Health Organization¹ report on road safety, which gives a road traffic death rate of 18.2 per 100,000 population worldwide and about 26.6 per 100,000 population in some regions. Thus, finding ways to use technology to improve traffic safety is driven by the pressing need to save people’s lives. In this context, several studies have proposed employing new technologies to reduce the traffic accident rate. Among these technologies are Vehicular Ad hoc NETworks (VANETs), a popular and important technology owing to its link to and impact on the safety of people and communities. The system combines several wireless and sensor technologies, including intelligent transport system (ITS), mobile ad hoc network (MANET), and Internet of Things (IoT) applications.²

VANET is comprised of nodes that use wireless networking technologies as a means of communication. The vehicles’ nodes communicate with each other and with the road side units (RSUs) through a communication unit in each vehicle called the on-board unit (OBU), which in turn is connected to the application unit (AU) to provide an application interface.³ VANET supports both safety and non-safety applications. The main goal of its safety applications is to minimize accidents and improve driving safety by alerting drivers regarding collision avoidance, road sign notifications, and alarms for incident management. By contrast, non-safety applications are divided into two subsections: traffic coordination and infotainment applications. Traffic coordination leverages vehicular communications to broadcast traffic information between vehicles on the road; this optimizes traffic flow and improves driver experience. Infotainment applications aim to provide drivers with contextual information such as pertinent advertisements and parking assistance, in addition to entertainment during their journey.⁴ However, safety and non-safety applications are not completely separated from each other; hence, all aspects should be considered when designing VANET applications.⁵

In VANET, routing is a challenging factor owing to the unique characteristics of the network, especially the rapid mobility of nodes that causes quick variations in topology. In addition, diverging density and speed of the vehicles on the road may lead to either overhead or poor connectivity due to sparse distribution. The current VANET routing protocols are categorized based on topology, position, cluster, broadcast, and geocast-based protocols.⁶ The unique characteristics of VANET expose it to many threats that may compromise and corrupt the whole VANET system.⁷ Threats may originate owing to vulnerabilities such as those related to communication protocols,⁸ energy flow and authentication,⁹ integration,¹⁰ and information privacy and integrity.¹¹ Among these attacks is the distributed denial of service (DDoS) attack, which aims to deny network availability by flooding either the vehicle, the infrastructure, or both, with spurious messages.¹²

There are two possible scenarios when targeting VANET with DDoS attacks: from vehicle to vehicle and from vehicle to infrastructure.¹³ The first scenario occurs when a number of attacker vehicles target a vehicle by flooding it with fake messages, so that the victim vehicle is unable to send or receive legitimate requests. The second scenario occurs when a number of attacker vehicles target the RSU, so that it becomes unavailable to legitimate nodes. Figure 1 illustrates a vehicle to infrastructure DDoS attack, where the red vehicles represent attackers targeting RSU2 by flooding it with fake messages, eventually disabling it.

Figure 1.

Vehicle to infrastructure (V2I) DDoS.

In this article, we explored several studies related to DDoS attack detection in VANET systems to highlight the limitations of the datasets used as the basis for training the prediction models. Based on the limitations and gaps found, we proposed a simulation framework and used it to generate a suitable dataset that fits the VANET architecture, considering a common routing protocol in VANET, namely, ad hoc on-demand distance vector (AODV). Moreover, the generated dataset is used to evaluate numerous machine learning (ML) models for the purpose of detecting DDoS attacks targeting VANET nodes.

The organization of this study is as follows: we introduce the literature review in section “Literature review” with a focus on the studies that involve the usage of datasets in evaluating their DDoS attack detection techniques. The simulation work is presented in section “Simulation work for generating the dataset.” The process of generating and adopting the dataset is presented in section “Dataset specifications.” Finally, section “Conclusion” summarizes the main points as a conclusion, and section “Future work” presents the future work.

Literature review

Several studies have been proposed to mitigate the effects of DDoS attacks targeting VANET or ad hoc networks. Some of the studies used statistical methods to detect the attack and others relied on ML techniques to detect and classify the attack traffic. Others still introduced frameworks on generating datasets for evaluating network attack detection techniques. Figure 2 shows the hierarchy and taxonomy of the explored studies in this article. We categorized the studies into six classes based on network environment (VANET or non-VANET), the inclusion of DDoS attack, the presence of a dataset (dataset-based or statistical method), and the evaluation method (ML-based or generation framework).

Figure 2.

Categories of studies investigated in the literature review.

The obtained classes as shown in Figure 2 are as follows: (1) Category 1: studies that used statistical methods rather than ML techniques for detecting attacks on networks.^13–20 (2) Category 2: studies that used ML techniques trained on datasets of VANET to evaluate detection of network attacks including DDoS attacks.^21–25 (3) Category 3: studies satisfying the filtering criteria for Category 2, but not including DDoS attacks.^26–34 (4) Category 4: studies that used ML techniques trained on datasets to evaluate detection of different attacks including DDoS, but not on a VANET environment architecture.^35–40 (5) Category 5: studies introducing frameworks for generating datasets for the VANET environment.^41,42 (6) Finally, Category 6: studies introducing the generation of datasets for detecting attacks but not considering the specifications of VANETs.^43,44
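The taxonomy above can be read as a small decision procedure. The sketch below is one possible encoding, not the tooling used in this study: the `Study` fields and the order of the checks are assumptions made for illustration, and any combination that the six categories do not cover returns `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Study:
    # Hypothetical field names that mirror the four criteria in the text.
    uses_dataset: bool    # False means a purely statistical method
    is_framework: bool    # True means the study proposes a dataset-generation framework
    targets_vanet: bool   # True means the network environment is VANET
    includes_ddos: bool   # True means DDoS attacks are covered


def categorize(study: Study) -> Optional[int]:
    """Return the category number (1 to 6), or None if no category applies."""
    if not study.uses_dataset:
        return 1  # statistical methods, no dataset
    if study.is_framework:
        # Dataset-generation frameworks: VANET-specific or not.
        return 5 if study.targets_vanet else 6
    if study.targets_vanet:
        # ML trained on VANET datasets, with or without DDoS.
        return 2 if study.includes_ddos else 3
    # ML with DDoS on a non-VANET environment; anything else is uncovered.
    return 4 if study.includes_ddos else None


# Example: an ML study on a VANET dataset that does not cover DDoS.
example = categorize(Study(uses_dataset=True, is_framework=False,
                           targets_vanet=True, includes_ddos=False))
```

Checking the dataset criterion first keeps Category 1 independent of the other attributes, which matches how the statistical studies are grouped regardless of the attacks they cover.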

Statistical-based studies for DDoS attack

The first category of these studies includes statistical-based techniques for detecting DDoS/denial of service (DoS) attacks targeting smart communication networks such as VANET.^13–15 However, these studies do not use datasets for validating the proposed techniques; they rely only on statistical methods.

In Kolandaisamy et al.,¹³ multivariant stream analysis (MVSA) was proposed as a method to detect and prevent DDoS attacks targeting VANET. Similarly, Haydari and Yilmaz¹⁴ used a statistical anomaly detection technique to detect the attack by applying an online discrepancy test (ODIT) at the detection phase for both low- and high-rate DDoS attacks and then blocking the traffic generated by the attackers based on their locations.

The study presented in Shabbir et al.¹⁵ proposed a threshold-based framework for detecting and preventing attacks. It uses communication time as a communication characteristic and compares it with a specific threshold to decide whether to alert the other nodes to avoid further communication with the attacker nodes. This type of study was excluded as it did not involve datasets for training the proposed models.
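To make the threshold idea concrete, the following is a minimal sketch of a threshold test on per-node communication time. It is not the framework of Shabbir et al.¹⁵: the function name, the node identifiers, the numeric values, and the direction of the comparison (flagging nodes whose time exceeds the threshold) are all assumptions chosen for illustration.

```python
def flag_suspicious_nodes(comm_times, threshold):
    """Return the ids of nodes whose communication time exceeds the threshold.

    comm_times: dict mapping a node id to its measured communication time.
    threshold:  cut-off value; the comparison direction is an assumption.
    """
    return sorted(node for node, t in comm_times.items() if t > threshold)


# Hypothetical measurements in seconds, invented for this example.
observed = {"v1": 0.8, "v2": 4.2, "v3": 1.1}
suspects = flag_suspicious_nodes(observed, threshold=3.0)
```

In a deployment, the nodes listed in `suspects` would be the ones about which the other nodes are alerted.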

Mirsadeghi et al.¹⁶ introduced a cryptography method based on certificates issued by a trusted authority. To have a trusted clustered vehicular network, they proposed estimating a trust degree for each node considering the trust between vehicles and RSUs. Then, based on the estimated trust degree and other mobility measures, the appropriate cluster head is selected which in turn checks the trust degree of abnormal nodes. Any abnormal nodes will be in the blacklist of the certification authority and thus unable to communicate with either other vehicles or control units.

Bhushan and Gupta¹⁷ discussed the features of software-defined networking (SDN) and proposed a novel flow table sharing technique to mitigate DDoS attacks that target the network by overloading the flow table, which usually has a limited size. For the proposed approach, they modeled the flow table space as an M/G/S/C queuing model and then applied several rule-based methods to detect and prevent DDoS on SDNs by utilizing the flow table status for all the switches and the blacklist database that holds Internet protocol (IP) addresses of attack sources.

Kolandaisamy et al.¹⁸ proposed an analysis model that is capable of detecting DDoS attacks on a VANET environment in less time than the other techniques discussed in their related work. The main idea is to calculate different measures through different stages as follows. Based on the clustering score of the incoming packets, a stream position analysis was used to calculate specific features for the nodes, including the volume of the communicated data, payload, and message rates. Then, these computed features were used to calculate the conflict field, conflict data, and attack signature sample rate, which are finally used in a statistically based model to decide on the legitimacy of a node.

Kolandaisamy et al.¹⁹ proposed using an analytical approach that utilizes the measures gathered by packet marking based on an adapted stream region scheme. The proposed approach involves extracting the neighbor log file, calculating the node’s value, deciding on the source region, identifying the routes for each region, and computing the circulation rate. Finally, the identification of a DDoS attack is based on the deviation of the circulation from the current rate. Similarly, Bensalah et al.²⁰ proposed a statistical method for detecting and controlling malicious nodes in a VANET using a variable control chart as a model to monitor the quality of the communication for each node. Then, based on the taken measurements, a node is considered a malicious node when its statistical quality violates the control limit.
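The control-chart idea can be sketched as follows. This is a generic Shewhart-style chart with limits at the baseline mean plus or minus k standard deviations, shown only to illustrate what "violating the control limit" means; the exact chart, the monitored quality measure, and the value k = 3 are assumptions, and the sample values are invented.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Compute (lower, upper) control limits from in-control baseline samples."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return center - k * spread, center + k * spread


def out_of_control(samples, limits):
    """Return the indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical per-interval message counts for one node.
baseline = [10, 11, 9, 10, 10, 11, 9, 10]
limits = control_limits(baseline)
violations = out_of_control([10, 12, 25, 9], limits)
```

A node would be treated as malicious once its monitored measure produces violations; how many violations are tolerated is a design choice the sketch leaves open.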

ML-based studies for DDoS attacks on VANET

ML techniques are promising techniques for providing accurate detection and prediction mechanisms used in many areas and domains including VANETs.⁴⁵ There are several ML studies applied to detect numerous malicious behaviors on VANETs. Since our focus in this study is on the dataset that fits the VANET environment, we explored several studies that implemented ML techniques to detect DDoS attacks on VANET systems.^21–25 Moreover, we have highlighted the datasets, simulation tools, and ML techniques used in such studies.

Singh et al.²¹ conducted an analysis of the impact of DDoS attacks on vehicle to infrastructure (V2I) communication under an SDN architecture. They simulated Software-Defined Vehicular Networking (SDVN) using Mininet-WiFi and used the scikit-learn library for the ML classifiers. Eight supervised classifiers were used, including gradient boost, random forest (RF), logistic regression (LR), nearest neighbors, decision tree (DT), Support-Vector Machine (SVM), naïve Bayes (NB), and neural network. The gradient boost classifier gave the best accuracy among the models used. However, the study did not provide details of the generated dataset or the simulation scenarios, such as the number of attacker nodes and the attack rate.

Aneja et al.²² introduced a hybrid Intrusion Detection System (IDS) to detect the RREQ Flooding attack in VANET environment where they used SUMO, MOVE, and NS-2 tools in their conducted simulation experiments. They combined Artificial Neural Networks (ANNs) with a Genetic Algorithm (GA) model as a detection model where ANN performs the classification and GA tunes the selected input features. The dataset generated in Aneja et al.²² has a detailed description of the simulation tools used along with the related steps and parameters. However, it has certain limitations: the dataset itself is not available for other researchers, the network configuration was not presented in the study, and there is no report on the dataset features.

The dataset generated in Karagiannis and Argyriou²³ is not available online, the study did not show a clear procedure on processing the data, and there is no report on either the simulation tools or the network configuration parameters. Similarly, the dataset introduced in Belenko et al.²⁴ is not available online and has no description of the configuration of the network and simulation environments. Thus, the generated datasets introduced in Singh et al.,²¹ Aneja et al.,²² Karagiannis and Argyriou,²³ and Belenko et al.²⁴ have been excluded from this study owing to the unavailability of both the dataset and the configuration of simulation work used to generate the datasets.

The study in Zeng et al.²⁵ proposed a deep learning technique as an IDS that can perform feature extraction and classification of different attacks on VANET including DDoS, wormhole, and Sybil attacks. The dataset used to train the detection models was generated using an NS-3 simulator considering only the raw packets and logs as the output from the simulator. In addition, the ISCX IDS dataset⁴⁶ was used to regenerate and extract samples for different types of attacks such as DDoS attacks. The main observed limitations of the generated dataset are related to the generalization of the configuration parameters related to VANETs as well as the lack of enough scenarios on the experimental models. Moreover, the ISCX IDS dataset is a traditional dataset for IDSs and is not designed to capture the structure and features of VANETs.

ML-based studies for malicious attacks on VANET

Many studies have presented different ML techniques trained on datasets for detecting different types of attacks; however, they have not included or considered DDoS attacks.^26–34

Ghaleb et al.²⁶ used an ANN model to detect malicious traffic in VANET. They trained the ANN on a next generation simulation (NGSIM) dataset using MATLAB tools, and the results showed an accuracy of 99%. They used real traffic along with injected dynamic noise to generate a dataset that contained many attacks. However, there were no DDoS attacks; moreover, the dataset did not give any details on the network configuration.

Another study in Li et al.²⁷ presented the usage of SVM to detect nodes with suspicious behavior in a VANET environment by considering several input parameters such as the movement speed and transmission range. The dataset was generated by the GloMoSim simulation framework. However, the study did not report the dataset specification, nor is the dataset available online.

Grover et al.²⁸ conducted an experimental work using the NCTUns-5.0 simulator to generate a dataset that was used to train and evaluate several ML techniques on detecting malicious nodes in VANET where Weka had been used to evaluate the specified classifiers. Although it presented the simulation work to generate the dataset, it did not explain the procedure. Moreover, it did not involve a DDoS attack in the generated dataset.

Ali Alheeti et al.²⁹ proposed a smart security framework to protect the outside communication system for autonomous and semi-autonomous vehicles by detecting gray hole and rushing attacks in real time. They simulated the environment using SUMO, MOVE, and NS-2 which generates a trace file to produce a set of features that can be used to differentiate legitimate from malicious behavior. Two ML classifier algorithms were applied—SVM and FeedForward Neural Networks (FFNNs), where results showed that the FFNN model had a lower false negative rate than SVM. However, SVM showed a high performance in terms of detection time and is faster than FFNN. To summarize, a detailed procedure for generating a dataset was presented, but without involving DDoS attack traffic. Moreover, there was insufficient information regarding network configuration parameters.

Aloqaily et al.³⁰ proposed a framework called D2H-IDS as an IDS in vehicle nodes connected through a cloud network. The effectiveness of this solution was validated through simulations where they generated normal traces using NS-3 and NSL-KDD dataset⁴⁷ for generating several attacks including a DoS attack. The features were selected by applying a Deep Belief Network (DBN), and a DT was used for the classification of attacks where results showed high accuracy and low false rates. However, generally, the datasets proposed and generated in Li et al.,²⁷ Grover et al.,²⁸ Ali Alheeti et al.,²⁹ and Aloqaily et al.³⁰ did not fulfill the required criteria due to the unavailability of the online dataset, not reporting the network configurations, and not considering DDoS attacks as a part of the generated dataset.

Singh et al.³¹ generated a synthetic dataset using an NS-3 simulator and mobility traces produced by an SUMO traffic simulator. In the network simulator, they used about 40 vehicles assuming movements of fixed speed during the simulation time. The simulation was designed to generate the traffic holding features of a wormhole attack. Then, the data were preprocessed and used as a dataset for training both K-nearest neighbor (KNN) and SVM models for detecting wormhole attacks. However, they presented no details on the procedure and methodology of simulating the environment and generating the dataset. Moreover, the dataset can only be used for wormhole attacks.

Singh et al.³² proposed using SVM and LR as ML techniques to detect false position data generated by malicious VANET nodes, known as a false position attack. The evaluation of the detection model was conducted on the VeReMi dataset,⁴¹ and the results showed a high accuracy of about 97%. However, the dataset does not involve traces for DDoS attacks.

The VeReMi dataset has been used in Gyawali and Qian³³ to validate different ML techniques (LR, K-nearest, DT, bagging, and RF) on detecting misbehavior attacks including both false alert generation and position falsification attacks. Similarly, a study presented in So et al.³⁴ proposed ML-based techniques (KNN and SVM) for detecting misbehavior attacks on VANETs using the VeReMi dataset for training the proposed models. However, the evaluation and prediction of DDoS attacks were not a part of their studies; moreover, the dataset does not involve traces for such attacks.

ML-based studies for DDoS attacks on non-VANET

Several studies have been proposed for detecting DDoS attacks using either existing datasets or simulations to generate their own datasets for the purpose of training and validating different ML classifiers.^35–40 The main limitation of these studies is that the datasets used do not capture the characteristics of VANETs or the environment.

Kim et al.³⁵ used the KDD CUP 1999 intrusion detection dataset to train SVM for detecting several attacks including DDoS attacks, and the results showed an effective classification of different attacks with an accuracy of about 85%. Although the KDD CUP 1999 used in Kim et al.³⁵ is available online and contains DDoS attacks, it was not designed for VANETs.

Yu et al.³⁶ proposed a framework to detect DDoS attacks on SDVN environments by implementing three different detection models including a trigger detection model for inbound packets, a flow table feature-based detection model utilizing the features of OpenFlow protocol, and an attack detection model based on SVM. They used a combination of real and generated network traffic to generate a dataset considering different types of DDoS attacks using the Scapy and hping3 tools, and simulation results showed an accuracy of greater than 97%. However, the dataset used is not available online, and the simulation was conducted on virtual machines, which do not reflect VANET characteristics as nodes are mobile in VANET and at different speeds resulting in quick changes to the topology.

Luong et al.³⁷ presented a simulation work to generate a training dataset to be used with KNN classifiers for detecting flooding attacks on MANETs by considering the frequency of route request packets. Similarly, a study presented in Reddy and Thilagam³⁸ conducted a simulation work using Network Simulator (NS-2) for ad hoc networks to evaluate a proposed DDoS attack’s mitigation technique that relies on the usage of NB classifier. Gao et al.³⁹ proposed an IDS for DDoS attacks using RF classifier utilizing big data technologies such as Spark and Hadoop distributed file system for implementing the proposed approach. They used both NSL-KDD⁴⁷ and UNSW-NB15⁴⁸ datasets for evaluating the proposed method for detecting DDoS attacks. However, the datasets used in Luong et al.³⁷ and Reddy and Thilagam³⁸ were generated by considering only general ad hoc network characteristics, and are not available online. Similarly, the NSL-KDD and UNSW-NB15 datasets used in Gao et al.³⁹ include traffic for DDoS attacks, but are not properly representative of VANET as they were designed for general network traffic and thus do not capture the characteristics of a VANET environment.
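As an illustration of classifying flooding behavior from route request frequency with KNN, the sketch below implements a one-dimensional majority-vote KNN in plain Python. It is not the implementation of Luong et al.³⁷: the single feature, the label names, the value k = 3, and the training values are assumptions invented for this example.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Predict a label for x by majority vote among the k nearest samples.

    train: list of (route_requests_per_second, label) pairs.
    """
    nearest = sorted(train, key=lambda sample: abs(sample[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training samples: (route requests per second, label).
training = [
    (2, "normal"), (3, "normal"), (4, "normal"),
    (40, "flood"), (55, "flood"), (60, "flood"),
]
low_rate = knn_predict(training, 5)
high_rate = knn_predict(training, 50)
```

A real detector would use more features and a held-out test set; the sketch only shows how the route request frequency drives the vote.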

Ali Alheeti and McDonald-Maier⁴⁰ proposed a hybrid IDS for detecting malicious intrusion attacks such as DDoS and network scanning attacks on autonomous vehicles. They used a multi-layer perceptron (MLP) with fuzzy logic techniques trained on the Kyoto dataset,⁴⁹ showing an accuracy of 99%. Although the Kyoto dataset holds traffic and features from real network communications, it does not contain traffic obtained from VANET communication systems, which use different types of protocols and a specific style of communication.

Framework-based studies for generating datasets

The last group of studies explored and evaluated in this study is of studies that presented frameworks for generating datasets that can be used for training and evaluating intrusion detection techniques against several types of attacks. Some of these studies were dedicated for VANET^41,42 and others were not.^43,44

Lyamin et al.⁴² proposed a heuristic approach derived from data mining methods for real-time detection of radio jamming DoS attacks in a VANET communication environment. To train the proposed detection model, they conducted a simulation experiment using MATLAB to generate a sort of dataset holding cooperative awareness message (CAM) transmissions in the IEEE 802.11p protocol. However, the training model focused only on the CAM transmission, leading to short training sequences of 100 s. Moreover, details on the configuration of the simulated environment and the obtained traces are not available online for further studies. A recently published dataset on VANET environments presented in Van der Heijden et al.⁴¹ held many misbehaviors and attacks but did not include DDoS attacks which is the focus in this study.

A study presented in Damasevicius et al.⁴³ proposed a dataset called LITNET-2020 generated using LITNET NetFlow topology and holding different attacking scenarios including DoS, DDoS, worms, land, and fragmentation attacks. They considered data flow for different protocols including IPv6, transmission control protocol (TCP), user datagram protocol (UDP), and Internet control message protocol (ICMP). The dataset is available online and the study provided a description about the dataset features and the network configuration parameters. However, the proposed dataset is not dedicated for VANETs as it does not capture the properties of VANETs nor the VANET protocols.

A framework was proposed in Al-Hadhrami and Hussain⁴⁴ for dataset generation that can be used to train and validate IDS models on IoT networks. The dataset is called IoT-DDoS and involves different types of traffic including normal traffic, flooding attacks, selective forwarding attacks, and blackhole attacks considering different protocols such as the RPL routing protocol, ICMPv6, IEEE 802.15.4, 6LoWPAN, and UDP. However, the framework does not take into account the specifications and protocols of VANETs.

In summary, we believe that existing solutions for detecting DDoS attacks in VANETs are limited. The limitations stem from the lack of real or synthetic DDoS datasets designed or generated for VANET environments. Furthermore, applying traditional network solutions to VANETs without considering their unique characteristics may lead to inaccurate results. Although some studies have generated datasets for the VANET environment, some did not describe the features available in their datasets and others did not describe the methods they followed to generate them. Thus, it is difficult for other researchers to use these datasets or to validate and compare their results with such solutions. Consequently, we believe that these datasets cannot be used for further studies owing to one or more of the following reasons: (1) the dataset is not available online, (2) the dataset does not contain a DDoS attack, (3) the dataset is not designed for VANET environments, and (4) information regarding the network configuration is unavailable. Table 1 compares the explored datasets and studies based on the fulfillment of these four criteria. Moreover, the table presents other aspects of the evaluated studies, including the types of attacks considered, the detection techniques used, and the accuracy achieved.

Table 1.

Comparison of different datasets used for attack detection in VANET.

Reference	Used dataset	Online availability	Involving DDoS attack	Dedicated for VANET	Network configuration availability	Involved attacks	Trained models	Performance ratio
Singh et al.²¹	Generated dataset	✗	✓	✓	✗	DDoS	RF, DT, SVM, boosting, others	Above 90%
Aneja et al.²²	Generated dataset	✗	✓	✓	✗	Flooding attack	ANN	99%
Karagiannis and Argyriou²³	Generated dataset	✗	✓	✓	✗	RF jamming attack	K-means	NA
Belenko et al.²⁴	Generated dataset	✗	✓	✓	✗	Including DDoS	None	NA
Zeng et al.²⁵	Generated dataset for VANET and ISCX IDS	✗	✓	✓	✗	DoS, DDoS, blackhole, wormhole, Sybil	CNN, LSTM	96.9%
Ghaleb et al.²⁶	NGSIM	✓	✗	✓	✗	Malicious nodes	ANN	99%
Li et al.²⁷	Generated dataset	✗	✗	✓	✗	Malicious nodes	SVM	Above 95%
Grover et al.²⁸	Generated dataset	✗	✗	✓	✗	Malicious nodes	NB, J48, RF	97%
Ali Alheeti et al.²⁹	Generated dataset	✗	✗	✓	✗	Gray hole	SVM, FFNN	90%
Aloqaily et al.³⁰	NSL-KDD for attack traffic	✗	✗	✓	✗	DoS, Probe, R2L, U2R	Deep belief, DT	99.43%
Singh et al.³¹	Generated dataset	✓	✗	✓	✗	Wormhole attack	KNN and SVM	99%
Singh et al.³²	VeReMi	✓	✗	✓	✓	Position falsification attack	LR and SVM	97%
Van der Heijden et al.⁴¹	Generated dataset (VeReMi)	✓	✗	✓	✓	Position falsification attack	Non-machine learning	Close to 1
Gyawali and Qian³³	VeReMi	✓	✗	✓	✓	False alert, position falsification	LR, K-N, DT, bagging, RF	97%
So et al.³⁴	VeReMi	✓	✗	✓	✓	Location spoofing	KNN and SVM	94%
Kim et al.³⁵	KDD CUP 1999	✓	✓	✗	✗	DoS attack	SVM	85%
Yu et al.³⁶	Generated dataset	✗	✓	✗	✗	DDoS	SVM	98.56%
Luong et al.³⁷	Generated dataset	✗	✓	✗	✗	Flooding attack	KNN	Above 99%
Reddy and Thilagam³⁸	Generated dataset	✗	✓	✗	✓	DDoS	NB	80%
Gao et al.³⁹	NSL-KDD and UNSW-NB15	✓	✓	✗	✓	DDoS	RF, SVM, NB	99.9% and 98.7%
Ali Alheeti and McDonald-Maier⁴⁰	Kyoto dataset	✓	✓	✗	✗	Malicious intrusion	MLP	99%
Lyamin et al.⁴²	Generated dataset	✗	✗	✓	✗	Radio jamming DoS	Heuristic approach	95%
Damasevicius et al.⁴³	LITNET-2020	✓	✓	✗	✓	DDoS, worms, land, and others	None	NA
Al-Hadhrami and Hussain⁴⁴	Generated dataset called IoT-DDoS	✗	✓	✗	✗	Flooding, forwarding, blackhole attacks	None	NA
VDDD	Generated dataset	✓	✓	✓	✓	DDoS	J48, SVM, ANN, KNN, RF, NB	99.7%

VANET: Vehicular Ad hoc NETworks; RF: random forest; DT: decision tree; SVM: support-vector machine; ANN: artificial neural network; NA: not applicable; IDS: intrusion detection system; CNN: convolutional neural network; LSTM: long short-term memory; NGSIM: next generation simulation; NB: naïve Bayes; FFNN: feedforward neural network; LR: logistic regression; K-N: K-nearest; KNN: K-nearest neighbor; MLP: multi-layer perceptron; VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset.
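The four criteria in Table 1 can be checked mechanically. The sketch below transcribes the four tick/cross columns of the table into four-character strings (1 for a tick, 0 for a cross) and lists the entries that satisfy all four criteria. The encoding, the bracketed reference keys, and the names `TABLE_1` and `CRITERIA` are conventions introduced here for illustration only.

```python
# Column order: online availability, involves DDoS,
# dedicated for VANET, network configuration availability.
CRITERIA = ("online", "ddos", "vanet", "config")

TABLE_1 = {
    "Singh et al. [21]": "0110",
    "Aneja et al. [22]": "0110",
    "Karagiannis and Argyriou [23]": "0110",
    "Belenko et al. [24]": "0110",
    "Zeng et al. [25]": "0110",
    "Ghaleb et al. [26]": "1010",
    "Li et al. [27]": "0010",
    "Grover et al. [28]": "0010",
    "Ali Alheeti et al. [29]": "0010",
    "Aloqaily et al. [30]": "0010",
    "Singh et al. [31]": "1010",
    "Singh et al. [32]": "1011",
    "Van der Heijden et al. [41]": "1011",
    "Gyawali and Qian [33]": "1011",
    "So et al. [34]": "1011",
    "Kim et al. [35]": "1100",
    "Yu et al. [36]": "0100",
    "Luong et al. [37]": "0100",
    "Reddy and Thilagam [38]": "0101",
    "Gao et al. [39]": "1101",
    "Ali Alheeti and McDonald-Maier [40]": "1100",
    "Lyamin et al. [42]": "0010",
    "Damasevicius et al. [43]": "1101",
    "Al-Hadhrami and Hussain [44]": "0100",
    "VDDD": "1111",
}


def missing_criteria(flags):
    """Return the names of the criteria that an entry does not satisfy."""
    return [name for name, flag in zip(CRITERIA, flags) if flag == "0"]


# Entries that fulfill all four criteria.
complete = [ref for ref, flags in TABLE_1.items() if not missing_criteria(flags)]
```

With the rows as transcribed, `complete` contains only the VDDD entry, and `missing_criteria` shows which criterion each of the other entries lacks.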

To address the limitations of the existing approaches discussed here, we aim to generate a novel dataset, called VDDD, for DDoS attack detection in VANET environments, which researchers working in this field can use. To make our dataset reproducible by others, we describe in detail the procedure we followed to generate it, from the selection of the simulation tools to the generation of the complete dataset in the VANET environment. Moreover, we analyze and evaluate the quality and performance of the generated dataset by applying commonly used ML techniques.

Simulation work for generating the dataset

When simulating a VANET environment with many vehicles conceivably broadcasting several messages per second, the selection of simulation tools becomes a crucial task. Important factors to consider include user-friendliness, scalability, and the ability to couple network communication simulators with road traffic simulators. We implemented our model using a combination of four frameworks: OMNeT++,⁵⁰ SUMO,⁵¹ INET,⁵² and Veins.⁵³ The simulation tools and versions used in this study are shown in Table 2. To create a realistic testbed, we simulated the traffic on King Fahad Highway, which is located in the Eastern Province of the Kingdom of Saudi Arabia and connects the cities of Dammam and Al-Khobar, as shown in Figure 3.

Table 2.

Simulation tools and versions used in this work.

Category	Tool	Version
Network interface	OMNeT++	V 5.1
Model library	INET	V 3.6
Network mobility framework	Veins	V 4.7
Traffic generator	SUMO	V 0.30.0
Machine learning evaluation	Weka	V 3.8.3
Operating system	Windows	Windows 7 64 bits

Figure 3.

Simulation map showing the King Fahad Highway.

OMNeT++ records simulation results as scalar values, vector values, and histograms. In addition, it supports exporting the results to different formats including Python files, SQLite files, and others. The INET Framework is an open-source model library for OMNeT++ that supports many transport layer protocols such as TCP, UDP, and stream control transmission protocol (SCTP). In addition, it supports wired and wireless interfaces such as Ethernet and IEEE 802.11, as well as many other protocols and components.

Veins (vehicles in network simulation) is an open-source framework for running vehicular network simulations. It allows dynamic interaction between OMNeT++ and SUMO by implementing TraCI (traffic control interface). Veins was selected for its support of realistic maps, realistic traffic, and different protocols, including the routing protocols provided by INET. Finally, SUMO is an open-source traffic generator that creates mobility scenarios on real road maps based on user-specified parameters. SUMO exposes TraCI as its application programming interface (API); TraCI allows values from the simulated SUMO scenarios to be retrieved and synchronized by managing TCP connections in a client/server architecture.

The selection of the aforementioned simulation tools was a result of evaluating various simulation tools in two stages. The first stage was exploring the simulation tools in the literature review and their features as well as how the researchers use these tools to simulate their network. In the second stage, we selected some tools based on the following criteria: model development, customization, ability to generate different types of network traffic, scalability, integration with other tools, and capability to record and analyze the generated events. The combination of the used frameworks satisfies all requirements that we need to simulate VANET including normal and UDP flooding attacks as well as recording all the traffic events within the configured environment.

Overview of proposed work

The main idea of the proposed work is to simulate a VANET environment considering both normal and DDoS traffic for the purpose of generating a synthetic dataset based on several simulated scenarios to be used as an input for ML methods. The proposed work involves three main stages as shown in Figure 4. Starting from the bottom, the first stage is to generate realistic network mobility traffic using SUMO. The second stage is to import the SUMO mobility traffic into OMNeT++ to generate the network traffic (normal and DDoS) utilizing both Veins and INET. The final stage is to collect and prepare the dataset that will be used for evaluating and studying the performance of several ML algorithms.

Figure 4.

Proposed work diagram.

We started by using SUMO to prepare the network and generate the traffic. The first step is to export the simulation area, the King Fahad Highway, from OpenStreetMap (OSM).⁵⁴ Then, the OSM file is processed with SUMO’s netconvert utility, which transforms the geo-coordinates of the OSM map into metric coordinates; these metric coordinates are used by SUMO in the next step. The following command does this task:

netconvert --osm-files *.osm -o *.net.xml

Besides the network file, we need to consider the obstacles found within the scenario such as buildings and parks. OSM files have the advantage of providing such information in addition to other information like streets, lanes, junctions, and the maximum speed for each street. We used the polyconvert utility to generate a poly file, which Veins can use to identify all the obstacles, using the following command:

polyconvert --net-file *.net.xml --osm-files *.osm --type-file *.xml -o *.poly.xml

After generating the obstacles file, the SUMO network is established and we proceed to generate the network traffic. There are two options to generate traffic for the vehicles in SUMO. The first one is to generate a random trip and the second one is to design a custom trip in a specific route. In this study, we selected the second choice and assumed that all vehicles considered in our scenarios cross the King Fahad Highway from Dammam to Al-Khobar. To simulate traffic in our network, we generated four files as follows:

1. Traffic Analysis Zone (TAZ) file which contains the edges for our route.

2. Origin/Destination (OD) matrix file that includes the origin point, the destination point, and the number of vehicles passed while taking the route.

3. Od2trips file that takes the TAZ and OD files as input. Before generating the fourth file, we combined these three files to generate od_file.odtrips.xml by running the following command:

od2trips -c PATH\od2trips.config.xml -n PATH\taz_file.taz.xml -d PATH\OD_file.od -o PATH\od_file.odtrips.xml

4. SUMO configuration file that takes both the network and the OD trip files as input and then generates the route file as an output. By applying the following command, we generate trips and route files. The trip file contains the trip for each vehicle and other information like departure time and speed. Conversely, route files look like trip files except that the route file contains all the intermediate edges from origin to destination.

duarouter -c PATH\duarcfg_file.trips2routes.duarcfg
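The four SUMO commands above can be collected into one script. The following is a minimal sketch, not the authors' tooling: the working directory, the base name `highway`, and the type-map file name `typemap.xml` are hypothetical placeholders, and the polyconvert type map is passed with the `--type-file` option. The commands are only assembled and printed; the call that would execute them is left commented out because it needs a SUMO installation and the input files.

```python
from pathlib import Path
from typing import List


def sumo_pipeline(workdir: str, name: str = "highway") -> List[List[str]]:
    """Assemble the four SUMO tool invocations in the order used in the text."""
    base = Path(workdir)
    osm = str(base / f"{name}.osm")        # map exported from OpenStreetMap
    net = str(base / f"{name}.net.xml")    # network file produced by netconvert
    return [
        ["netconvert", "--osm-files", osm, "-o", net],
        ["polyconvert", "--net-file", net, "--osm-files", osm,
         "--type-file", str(base / "typemap.xml"),   # hypothetical file name
         "-o", str(base / f"{name}.poly.xml")],
        ["od2trips", "-c", str(base / "od2trips.config.xml"),
         "-n", str(base / "taz_file.taz.xml"),
         "-d", str(base / "OD_file.od"),
         "-o", str(base / "od_file.odtrips.xml")],
        ["duarouter", "-c", str(base / "duarcfg_file.trips2routes.duarcfg")],
    ]


commands = sumo_pipeline("scenario")
for cmd in commands:
    print(" ".join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # needs SUMO and the input files
```

Keeping each command as an argument list avoids shell quoting problems when the paths contain spaces.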

In OMNeT++, we imported two frameworks—INET and Veins as shown in Figure 4. Veins uses protocols and applications provided by INET to simulate both normal and attack traffic. INET provides and supports different transport layer protocols like TCP, UDP, and SCTP. Moreover, it provides several routing protocols that can be used within the simulation. In this work, we used UDP as it is widely used in VANET owing to its ability to rapidly transport data compared to TCP.⁵⁵ Two types of UDP applications were used in this work: the first one is UDP Basic App and the second one is the UDP sink. UDP Basic App sends UDP packets to the given IP addresses in each time interval where the IP address could be a Wireless Local Area Network (WLAN) or a node IP. The UDP sink App binds a UDP socket to a given local port and prints the received packets’ information such as the source, destination, and length of the packet.

In the Veins subproject, we started editing and building the simulation in three steps to meet our scenario’s parameters as shown in Table 3. The first step is to replace square files (square.net.xml, square.poly.xml, and square.rou.xml) with our SUMO files previously generated by the simulation and that have the same extensions. Figure 5 shows the created contents as a result of the first conducted step. The second step is to edit the “scenario.ned” file to meet our scenario’s parameters and other required network configuration like the life cycle Controller which manages general operations such as shutdown, restart, suspend, and crash. Figure 6 shows the design of “scenario.ned.” Furthermore, we have added the AODV routing protocol to the vehicle node “car.ned” to be connected with the network layer as shown in Figure 7. The final step is to simulate both the normal and DDoS traffic through the usage of “omnetpp.ini.”

Table 3.

Simulation parameters.

Parameter	Value
Routing protocol	AODV
PHY model	IEEE 802.11p
Channel	Wireless
Mobility scenario	Highway (18 km)
Threat	DDoS
Transport protocol	UDP
Vehicle communication range	550 m
RSU communication range	600 m
Packet size	100 bytes
RSU	3
Number of vehicles	20, 60
Speed	Maximum of road speed
Number of attackers	2, 6
Attack duration	25 s
Attack rate	10 and 50 pps
Normal rate	1–5 pps
Run time	500 s

AODV: ad hoc on-demand distance vector; DDoS: distributed denial of service; UDP: user datagram protocol; RSU: road side unit.

Figure 5.

KingFahadHighway.launchd.xml.

Figure 6.

Scenario design.

Figure 7.

Internal node design.

OMNeT++ provides all the requirements to simulate different security attacks. Several researchers used OMNeT++ to simulate different types of DDoS attacks in traditional networks.^56–58 In this work, we generated normal and DDoS traffic for VANET scenarios. For the normal traffic, each node (vehicle or RSU) broadcasts UDP packets at the normal rate listed in Table 3, 1–5 packets per second (pps). Conversely, the DDoS traffic is based on two key attributes: attack intensity and the number of attacker nodes. The attack intensity is either 10 or 50 pps, and the number of attackers is either 2 or 6, according to the designed scenario.

Figures 8 and 9 show the configuration parameters for both UDP normal traffic and DDoS attack traffic, respectively. The parameters include the IP addresses, port numbers, the multicasting group, start time, end time, and the traffic rate.

Figure 8.

Configuration parameters for UDP normal traffic.

Figure 9.

Configuration parameters for DDoS traffic.

Scenarios

In this study, the implemented topology includes three RSUs and N vehicles along an 18 km highway, where we considered a low traffic density of 20 vehicles (N = 20) and a high traffic density of 60 vehicles (N = 60). For each density, we used two attack rates, 10 and 50 pps, resulting in four different scenarios. In addition, we configured one of the RSUs to be the victim unit targeted by the attack traffic.

Normal and attack traffic was generated using OMNeT++ where each node broadcasts requests to all reachable nodes. All nodes send normal packets in a random manner where the transmission interval is between 1 and 5 s. The attack traffic was generated by specific vehicles to target the victim RSU with two different rates (10 and 50 pps).

We designed these scenarios with their related parameters based on recent studies, which simulated a VANET environment to either study the impact of some attacks or to generate a dataset for VANET environments. Table 4 shows the simulation parameters used by several studies from which we have adapted our parameters shown in Table 5.

Table 4.

Simulation parameters used by recent studies.

Reference	Simulation time (s)	Number of attackers	Number of vehicles	Number of RSUs	Transmission rate	Speed
Li et al.²⁷	900	5, 10, 15, 20, 25, 30, 35, 40	50, 100, 200	–	–	5, 10, 20, 30 m/s
Ali Alheeti et al.²⁹	499	4	40	9	–	30 m/s
Aloqaily et al.³⁰	600	–	40	–	8 pps	20 m/s
Aneja et al.²²	200	2	20	–	–	30 m/s
Belenko et al.²⁴	100	2	30	–	250 Kbps	30 m/s
Haydari and Yilmaz¹⁴	200	–	250	–	1 pps	–
Siddiqui and Boukerche⁵⁹	120	1	20, 60	3	10, 50, 100 pps	15 m/s

RSU: road side unit.

Table 5.

Details of simulated scenarios.

Parameter	First scenario	Second scenario	Third scenario	Fourth scenario
Simulation time	500 s	500 s	500 s	500 s
Attacker	Node [1…2]	Node [4…5]	Node [1…6]	Node [8…13]
Victim	RSU2	RSU2	RSU2	RSU2
Attack duration	25 s	25 s	25 s	25 s
Attack time	180–205 s	180–205 s	180–205 s	180–205 s
Number of vehicles	20	20	60	60
Number of RSUs	3	3	3	3
Attack rate	Low rate	High rate	Low rate	High rate
Number of pps	10	50	10	50

RSU: road side unit.
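As a quick plausibility check on the scenario design, the number of attack packets each scenario should produce can be estimated as attackers × attack rate × attack duration. The sketch below uses the values from Table 5 and assumes that every attacker sends at the nominal rate for the full 25 s window, which is a simplification.

```python
# Scenario parameters from Table 5: (number of attackers, attack rate in pps, duration in s)
SCENARIOS = {
    "first": (2, 10, 25),
    "second": (2, 50, 25),
    "third": (6, 10, 25),
    "fourth": (6, 50, 25),
}


def expected_attack_packets(attackers: int, rate_pps: int, duration_s: int) -> int:
    """Nominal number of attack packets if every attacker sends at the full rate."""
    return attackers * rate_pps * duration_s


expected = {name: expected_attack_packets(*params) for name, params in SCENARIOS.items()}
for name, count in expected.items():
    print(f"{name} scenario: {count} attack packets expected")
```

These nominal counts (500, 2500, 1500, and 7500) can be compared with the attack instance counts reported for the four datasets in Table 6 (500, 2500, 1500, and 7260); only the fourth scenario falls slightly below the nominal value.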

Dataset specifications

Generally, evaluating an intrusion detection–based ML model depends on more than the classification accuracy result as many other dimensions should also be considered such as characteristics of the simulation area, the used dataset, and how the normal and attack traffic is being generated.

In this section, we describe the procedure used to create VDDD. The following subsections illustrate the steps to generate a synthetic dataset in a VANET environment. We start with the data collection and data preparation steps, then proceed to the data pre-processing step, and finally present the dataset’s feature selection step.

Data collection

In this stage, we collect our data from OMNeT++ for further analysis. Two files are mainly required to generate the dataset, which are the trace file (log file) and simulation results (vector file). Figure 10 illustrates the workflow we followed starting with collecting the raw data and proceeding until we obtained a complete and informative dataset.

Figure 10.

Data workflow.

The log file holds the message transmission events that take place among modules during the simulation. The information recorded in this file includes the event number, time, source and destination, packet name, source and destination port, and packet length. The vector file records data values as time series, that is, each value with a timestamp, which is necessary to calculate the features in the upcoming steps. These data values are recorded and captured based on several categories or features. Moreover, OMNeT++ provides several analysis and validation tools that can be used to validate the accuracy of such data vectors. For example, Figure 11 shows a vector plot of the transmission states recorded during the simulation.

Figure 11.

Vector plot for transmission state.

Data preparation

After collecting the raw data in the previous step, the data are ready to be prepared and processed in such a way that it can be used for evaluating ML techniques. As shown in Figure 12, the raw data goes through several stages until we get an informative dataset in a suitable format to be read and analyzed. These stages involve processing the log file obtained from the log viewer, processing the vector file generated by OMNeT++, merging the log and vector files using both Python functions and Jupyter Notebook, and labeling traffic instances using queries in SQLite. The purpose of processing log files is to keep only the important information in the log as well as to clean the data by removing redundant information, thus making it ready to be merged with the vector data. Figure 13 shows the final version of the log file.

Figure 12.

Data preparation flowchart.

Figure 13.

Log file.

The vector file was exported from OMNeT++ in SQLite format and opened in a SQLite database browser.⁶⁰ This exported vector file contains 12 tables, some of which contain general information such as the simulation run information. We focused on only three tables, as shown in Figure 14. The vector table contains information about all modules along with many statistics such as Min, Max, Sum, and others. Figure 15 shows a part of the vector table. The preparation step for this file involves correcting some errors in the vector data table, such as correcting the data types of some of the fields, and validating the data values exported from OMNeT++.

Figure 14.

Schema of exported vector file.

Figure 15.

Vector table.

To obtain an informative dataset, it is necessary to merge the log file with the vector file. To achieve that, we wrote Python functions and used Jupyter Notebook⁶¹ to merge these two files by exploring each event in the log file and then, for each event, calculating the current, previous, and next time for each node to obtain the values of the 16 selected features in the interval between the previous and next time. Figure 16 shows the workflow of the functions that extract the features’ values from both the log and vector files. The main idea is to run queries on the data files that accumulate the values related to each feature over the time interval between the previous and next events. As shown in Figure 16, we developed several functions and queries to handle different features based on their nature. The functions run queries to extract the instances related to a specific node based on the event’s timestamp, previous time, and next time, and then accumulate the features’ values.

Figure 16.

Workflow of merging and calculating the values for dataset features.
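The window-based accumulation described above can be illustrated with a small, self-contained sketch. It is not the authors' code: the table names `log_events` and `vector_samples`, their columns, and the sample rows are hypothetical simplifications of the exported files, and summation is assumed as the accumulation rule. For each log event of a node, the values of every feature are summed over the interval between the previous and the next event of that node.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE log_events (event INTEGER, node TEXT, t REAL);                 -- hypothetical schema
    CREATE TABLE vector_samples (node TEXT, feature TEXT, t REAL, value REAL);  -- hypothetical schema
""")
conn.executemany("INSERT INTO log_events VALUES (?, ?, ?)",
                 [(1, "node[1]", 1.0), (2, "node[1]", 3.0), (3, "node[1]", 6.0)])
conn.executemany("INSERT INTO vector_samples VALUES (?, ?, ?, ?)",
                 [("node[1]", "rcvdPk", 0.5, 100.0),
                  ("node[1]", "rcvdPk", 2.0, 100.0),
                  ("node[1]", "rcvdPk", 4.5, 100.0),
                  ("node[1]", "rcvdPk", 7.0, 100.0)])


def window_features(conn, node):
    """Sum each feature between the previous and next log event of a node."""
    events = conn.execute(
        "SELECT event, t FROM log_events WHERE node = ? ORDER BY t", (node,)
    ).fetchall()
    rows = []
    for i, (event, t) in enumerate(events):
        prev_t = events[i - 1][1] if i > 0 else 0.0               # first event: window starts at 0
        next_t = events[i + 1][1] if i + 1 < len(events) else t   # last event: window ends at t
        sums = dict(conn.execute(
            "SELECT feature, SUM(value) FROM vector_samples "
            "WHERE node = ? AND t >= ? AND t <= ? GROUP BY feature",
            (node, prev_t, next_t),
        ).fetchall())
        rows.append({"event": event, "time": t, "previous": prev_t, "next": next_t, **sums})
    return rows


rows = window_features(conn, "node[1]")
for row in rows:
    print(row)
```

Each output row corresponds to one dataset record, carrying the event time, the previous and next times, and the accumulated feature values.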

Labeling the dataset is an indispensable stage of data pre-processing. From the previous steps, we have full information about each traffic item/event. In this stage, we labeled all traffic as either normal or DDoS based on the attack details such as source IP, destination IP, attack times, and duration. Labeling was done by applying queries in the SQLite DB browser. Table 6 shows the number of instances and their class labels in each dataset.

Table 6.

Details of instances in each dataset.

	First dataset	Second dataset	Third dataset	Fourth dataset
Number of instances	4195	6186	11,556	17,375
Number of normal traffic	3695	3686	10,056	10,115
Number of attack traffic	500	2500	1500	7260
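The labeling step described above can be sketched as a single SQL update. This is an illustrative example under stated assumptions, not the queries actually used: the table `traffic`, its columns, and all IP addresses are hypothetical, while the attack window of 180–205 s is taken from Table 5. A record is labeled DDoS only if it comes from an attacker, targets the victim, and falls inside the attack window.

```python
import sqlite3

ATTACKER_IPS = ("10.0.0.1", "10.0.0.2")   # hypothetical attacker addresses
VICTIM_IP = "10.0.0.5"                    # hypothetical victim RSU address
ATTACK_START, ATTACK_END = 180.0, 205.0   # attack window from Table 5, in seconds

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE traffic (time REAL, source_ip TEXT, dest_ip TEXT, "
    "label TEXT DEFAULT 'Normal')"
)
conn.executemany(
    "INSERT INTO traffic (time, source_ip, dest_ip) VALUES (?, ?, ?)",
    [
        (100.0, "10.0.0.1", "10.0.0.5"),  # attacker node, but before the attack window
        (181.5, "10.0.0.1", "10.0.0.5"),  # attacker to victim inside the window
        (190.0, "10.0.0.9", "10.0.0.5"),  # benign node to victim inside the window
        (200.0, "10.0.0.2", "10.0.0.5"),  # attacker to victim inside the window
        (300.0, "10.0.0.2", "10.0.0.7"),  # attacker node after the window, other destination
    ],
)

placeholders = ",".join("?" for _ in ATTACKER_IPS)
conn.execute(
    f"UPDATE traffic SET label = 'DDoS' "
    f"WHERE source_ip IN ({placeholders}) AND dest_ip = ? AND time BETWEEN ? AND ?",
    (*ATTACKER_IPS, VICTIM_IP, ATTACK_START, ATTACK_END),
)

labels = [row[0] for row in conn.execute("SELECT label FROM traffic ORDER BY time")]
print(labels)
```

Defaulting every record to Normal and then updating only the rows that match all attack conditions keeps the rule explicit and easy to audit.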

Data pre-processing

Usually, in the ML field, raw data may contain wrong data or missing values. So, data pre-processing is required before applying any classifiers. The pre-processing stage in our proposed architecture involves three steps: data normalization, feature selection, and balancing. In this section, we leveraged Weka capability when pre-processing data and the following sections give a detailed explanation for each data pre-processing step.

Data normalization

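Data normalization is the process of rescaling the dataset attributes to lie in one particular range, for example, between 0 and 1 or between −1 and 1. For the range between 0 and 1, each attribute value x is rescaled using the minimum (Min) and maximum (Max) of that attribute, according to equation (1)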

$X = \frac{(x - Min)}{(Max - Min)}$ (1)

Normalizing data often makes the dataset ready for applying any classifier. In addition, to increase accuracy results, we applied normalization to our dataset using Weka.
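Equation (1) can be illustrated with a few lines of code. The sketch below is a plain-Python version of min-max scaling for a single attribute, not the Weka filter itself; the sample throughput values are made up for illustration, and mapping a constant attribute to 0 is an assumption made here to avoid division by zero.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the range [0, 1] following equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: assumed convention is to map every value to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


throughput = [116, 232, 58, 464]   # illustrative values only
normalized = min_max_normalize(throughput)
print(normalized)
```

The smallest value maps to 0 and the largest to 1, so attributes measured on very different scales become directly comparable.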

Feature selection

Feature selection is one of the data reduction methods where selecting features significantly influences the performance as it reduces the training time and improves the accuracy. Conversely, keeping irrelevant or partially relevant features can negatively affect performance. Various feature selection techniques are available today such as correlation-based feature selection (CFS), information gain (IG)–based feature selection, and gain ratio (GR) feature selection.⁴⁹ A brief description of each of these techniques is presented as follows.

CFS

CFS is a popular technique for estimating the correlation between a subset of attributes and their corresponding classes, as well as the inter-correlations among the features. It measures the relevance of a group of features: a high correlation between the features and the classes indicates that the group is more relevant, whereas a high inter-correlation among the features indicates that the group is less relevant.^62,63 The measure of CFS is presented in equation (2)

$M_S = \frac{k \bar{r}_{cf}}{\sqrt{k + k (k - 1) \bar{r}_{ff}}}$ (2)

where $M_S$ is the heuristic merit of a subset S containing k features, $\bar{r}_{cf}$ is the mean correlation between the features and the classes, and $\bar{r}_{ff}$ is the average correlation among the features. After calculating CFS, we selected only those attributes that have a high positive or negative correlation. In other words, these attributes must be close to −1 or 1. We discarded the low correlation attributes that were close to zero.
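The behavior of the CFS measure can be shown with a short numerical sketch. It assumes the standard formulation of the CFS merit, in which the denominator is placed under a square root; the correlation values are made up for illustration.

```python
import math


def cfs_merit(k, mean_r_cf, mean_r_ff):
    """CFS merit of a subset of k features (standard formulation with a square root)."""
    return (k * mean_r_cf) / math.sqrt(k + k * (k - 1) * mean_r_ff)


# Same feature-class correlation, different redundancy among the features
low_redundancy = cfs_merit(k=5, mean_r_cf=0.5, mean_r_ff=0.1)
high_redundancy = cfs_merit(k=5, mean_r_cf=0.5, mean_r_ff=0.9)
print(f"low redundancy:  {low_redundancy:.3f}")
print(f"high redundancy: {high_redundancy:.3f}")
```

With the feature-class correlation held fixed, the subset whose features are strongly inter-correlated receives the lower merit, which is the trade-off the measure is designed to capture.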

IG–based feature selection

IG is another popular feature selection technique, based on entropy, that measures the information each feature contributes about the class. The value varies from 0 to 1, where highly informative features get the highest values and 0 means that the feature has no information or impact on the classes.⁶⁴ The measure of IG is presented in equation (3)

$IG = H (Y) - H (Y \mid X) = H (X) - H (X \mid Y)$ (3)

X and Y in equation (3) represent the random variables, the entropy of a random variable X is written as H (X), and H (Y | X) is the conditional entropy of Y given X.

GR feature selection

The GR is a ratio of IG to the intrinsic information, which can be obtained by dividing IG over the entropy of X as shown in equation (4)

$GR = \frac{IG}{H (X)}$ (4)

When X completely predicts Y, GR = 1. However, when there is no relation between Y and X, GR = 0. The GR favors variables with fewer values, in contrast to IG.⁶⁴ Usually, with supervision information, feature significance is assessed via its correlation with the class labels.⁶⁵ Based on that, we used CFS, which is supported by Weka.
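Equations (3) and (4) can be checked on a toy example. The following sketch computes entropy, IG, and GR for discrete attributes in plain Python; the four-record samples are invented to show the two extreme cases, a feature that fully determines the class and a feature that is unrelated to it.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) in bits of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(y, x):
    """H(Y | X): entropy of y within each group of equal x, weighted by group size."""
    n = len(x)
    total = 0.0
    for x_value, count in Counter(x).items():
        subset = [yi for xi, yi in zip(x, y) if xi == x_value]
        total += (count / n) * entropy(subset)
    return total


def information_gain(y, x):
    """Equation (3): IG = H(Y) - H(Y | X)."""
    return entropy(y) - conditional_entropy(y, x)


def gain_ratio(y, x):
    """Equation (4): GR = IG / H(X); defined here as 0 when H(X) is 0."""
    h_x = entropy(x)
    return information_gain(y, x) / h_x if h_x > 0 else 0.0


labels = ["DDoS", "DDoS", "Normal", "Normal"]
informative = ["high", "high", "low", "low"]   # determines the class completely
uninformative = ["a", "b", "a", "b"]           # unrelated to the class

print(information_gain(labels, informative), gain_ratio(labels, informative))
print(information_gain(labels, uninformative), gain_ratio(labels, uninformative))
```

The informative feature reaches the maximum value for both measures, whereas the unrelated feature scores zero, matching the two boundary cases described in the text.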

Tables 7–10 show the attribute ranks for each dataset obtained by Weka. Based on the ranked attributes, we set our cutoff at 0.2. Thus, an attribute with a rank value equal to or greater than 0.2 is considered an important feature and is included in the evaluation process; otherwise, it is discarded.

Table 7.

Ranks of first dataset attributes.

Number	Ranked	Attribute name	Number	Ranked	Attribute name
1	0.53783	throughput	2	0.23248	rcvdPkFromHL
3	0.4611	passedUpPk	4	0.05142	droppedPkNotForUs
5	0.46028	rcvdPkSeqNo	6	0.00982	radioMode
7	0.46028	endToEndDelay	8	0.00875	sentPk
9	0.46028	passedUpPkCount	10	0.0055	queueingTime
11	0.45724	rcvdPk	12	0.0051	transmissionState
13	0.37168	rcvdPkFromLL	14	0.00485	sentDownPk
15	0.3252	receptionState	16	0.00359	queueLength

Table 8.

Ranks of second dataset attributes.

Number	Ranked	Attribute name	Number	Ranked	Attribute name
1	0.769838	throughput	2	0.404341	rcvdPkFromHL
3	0.680628	passedUpPk	4	0.112697	droppedPkNotForUs
5	0.67934	rcvdPkSeqNo	6	0.015552	radioMode
7	0.67934	endToEndDelay	8	0.005627	sentPk
9	0.67934	passedUpPkCount	10	0.004824	queueingTime
11	0.676787	rcvdPk	12	0.004812	transmissionState
13	0.307003	rcvdPkFromLL	14	0.000263	sentDownPk
15	0.249195	receptionState	16	0.009246	queueLength

Table 9.

Ranks of third dataset attributes.

Number	Ranked	Attribute name	Number	Ranked	Attribute name
1	0.33156	throughput	2	0.241	rcvdPkFromHL
3	0.3887	passedUpPk	4	0.04241	droppedPkNotForUs
5	0.38522	rcvdPkSeqNo	6	0.02043	radioMode
7	0.38522	endToEndDelay	8	0.04852	sentPk
9	0.38522	passedUpPkCount	10	0.01537	queueingTime
11	0.38579	rcvdPk	12	0.01526	transmissionState
13	0.36617	rcvdPkFromLL	14	0.00814	sentDownPk
15	0.35468	receptionState	16	0.03282	queueLength

Table 10.

Ranks of fourth dataset attributes.

Number	Ranked	Attribute name	Number	Ranked	Attribute name
1	0.54006	throughput	2	0.39829	rcvdPkFromHL
3	0.56035	passedUpPk	4	0.11365	droppedPkNotForUs
5	0.594	rcvdPkSeqNo	6	0.01521	radioMode
7	0.594	endToEndDelay	8	0.0364	sentPk
9	0.594	passedUpPkCount	10	0.00567	queueingTime
11	0.59013	rcvdPk	12	0.00556	transmissionState
13	0.40062	rcvdPkFromLL	14	0.00825	sentDownPk
15	0.33603	receptionState	16	0.02135	queueLength
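The cutoff rule can be applied mechanically to the ranks. The sketch below uses the values of Table 7 (first dataset) and keeps every attribute whose rank is at least 0.2; it only illustrates the thresholding step and does not reproduce how Weka computes the ranks.

```python
# Rank values copied from Table 7 (first dataset)
RANKS_FIRST_DATASET = {
    "throughput": 0.53783, "rcvdPkFromHL": 0.23248,
    "passedUpPk": 0.4611, "droppedPkNotForUs": 0.05142,
    "rcvdPkSeqNo": 0.46028, "radioMode": 0.00982,
    "endToEndDelay": 0.46028, "sentPk": 0.00875,
    "passedUpPkCount": 0.46028, "queueingTime": 0.0055,
    "rcvdPk": 0.45724, "transmissionState": 0.0051,
    "rcvdPkFromLL": 0.37168, "sentDownPk": 0.00485,
    "receptionState": 0.3252, "queueLength": 0.00359,
}
CUTOFF = 0.2

# Keep attributes at or above the cutoff, highest rank first
selected = sorted(
    (name for name, rank in RANKS_FIRST_DATASET.items() if rank >= CUTOFF),
    key=RANKS_FIRST_DATASET.get,
    reverse=True,
)
print(len(selected), selected)
```

For the first dataset, nine attributes pass the cutoff and seven are discarded.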

Balancing

In our scenarios, we simulated both normal and attack traffic, and according to a real-world environment, the majority of traffic is normal traffic rather than attack traffic. This leads to having an unbalanced dataset. Strictly speaking, we do not have 50% normal and 50% attack or 60% to 40% traffic in our datasets. To handle this problem, we applied the Synthetic Minority Oversampling Technique (SMOTE)⁶⁶ which is supported by Weka. SMOTE is a popular balancing technique as it creates synthetic examples between existing real minority instances. The main idea of SMOTE is to increase the samples of the minority class by generating new instances in a random fashion among minority class samples using the KNN method. For example, before the balancing technique, the first scenario dataset had 3695 normal samples and only 500 attack samples. However, after applying the SMOTE balancing and sampling technique, the dataset holds 3695 normal samples and 3500 attack samples.
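The interpolation idea behind SMOTE can be sketched in a few lines. This is a simplified illustration, not the Weka implementation: the two-dimensional attack samples, the neighbor count `k`, and the random seed are arbitrary choices. Each synthetic sample is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbors.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of the base sample (squared Euclidean distance)
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Four illustrative attack samples oversampled by 600%, the same ratio as 500 -> 3500
attack = [(0.1, 0.9), (0.2, 0.8), (0.15, 0.95), (0.3, 0.7)]
new_samples = smote(attack, n_synthetic=24, k=2)
print(len(attack) + len(new_samples), "attack samples after oversampling")
```

Because every synthetic point lies between two real minority samples, the oversampled class stays inside the region already occupied by the original attack data.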

To further validate the handling of the imbalance in the generated dataset, we applied other balancing techniques besides SMOTE. These techniques include ClassBalancer, CostSensitiveClassifier, and ThresholdSelector, which are available in the Weka tool.⁶⁷ However, these balancing techniques produced no significant increase in the accuracy of the evaluated models. Thus, we applied only the SMOTE sampling method, as it gives the best accuracy among the meta-classifiers used in this study.

Dataset description

In each record of the dataset, there are 29 different features including one class attribute, which is either DDoS or normal. The features marked ☑ in the State column of Table 11 are the ones chosen after applying CFS. Note that some non-qualified features, such as IP addresses, protocol type, and times (the first 12 features in Table 11), were excluded from the initial feature set to ensure that the classification model is not reliant on particular acquisition biases. Overall, 10 features were selected from the original 29 for the next stage. Table 11 shows all features alongside their descriptions and an example of each feature.

Table 11.

Dataset features.

Number	Feature	State	Description	Example value
1	No	□	Sequence number	1
2	Event	□	Id of event	56
3	Time	□	Event time	1.921908541
4	PreviousValue	□	Previous event time	0
5	NextValue	□	Next event time	4.112046962
6	SourceName	□	Name of source/sender node	Node [1]
7	Packet	□	Name of the packet	AODV-RREQ
8	PacketType	□	Type of packet (UDP, TCP)	Udp
9	SourceIP	□	Source IP address	10.0.0.169
10	SourcePort	□	Source port number	9003
11	DestIP	□	Destination IP address	10.0.0.5
12	DestPort	□	Destination port number	9002
13	transmissionState	□	Transmission state of the radio	1
14	throughput	☑	Throughput	116
15	sentPk	□	Number of sent bytes	208
16	sentDownPk	□	Packets sent to lower layer	128
17	receptionState	☑	Reception state of the radio	48
18	rcvdPkSeqNo	☑	Sequence number of received packets	1
19	rcvdPkFromLL	☑	Packets received from lower layer	256
20	rcvdPkFromHL	☑	Packets received from higher layer	256
21	rcvdPk	☑	Number of received bytes	336
22	radioMode	□	Requested radio operational mode	1
23	queueingTime	□	Queueing time	1
24	queueLength	□	Queue length	3
25	passedUpPk	☑	Packets bytes passed to higher layer	128
26	passedUpPkCount	☑	Packets count passed to higher layer	1
27	endToEndDelay	☑	End to end delay	1
28	droppedPkNotForUs	□	Drop packet not addressed to us	0
29	class	Label	Represent traffic type	Normal, DDoS

UDP: user datagram protocol; TCP: transmission control protocol; IP: Internet protocol; DDoS: distributed denial of service.

Dataset evaluation

One of the main objectives of this work is to generate VDDD from a VANET environment and to share it with other researchers. To evaluate the validity and quality of our generated dataset, VDDD, we followed the 11 criteria proposed by the Canadian Institute for Cybersecurity as a framework to evaluate datasets.⁶⁸ VDDD fulfilled nine of the eleven criteria; the two criteria that our dataset did not satisfy are heterogeneity and attack diversity. Nevertheless, VDDD contains different attack scenarios that vary in attack rate and attack source. Table 12 shows whether VDDD satisfies each criterion and why.

Table 12.

Evaluated VDDD.

Number	Criteria	Status	Reasons
1	Information of network configuration	✓	The simulation scenario that we used to generate VDDD contains all VANET components such as RSU, vehicles, and router.
2	Complete traffic	✓	VDDD is generated based on complete traffic captured from all the nodes.
3	Labeled dataset	✓	All the traffic in VDDD is labeled as normal or DDoS
4	Complete interaction	✓	VDDD captures the whole network interactions including V2V, V2I, and I2I
5	Complete capture	✓	VDDD has captured all the traffic without removing any set of normal or attack traffic
6	Anonymity	✓	IP is provided for each node in the VDDD
7	Available protocols	✓	VDDD has both normal and anomalous traffic considering several protocols involved in generating these traffics
8	Feature set	✓	We provided and presented all features along with the way we used to extract those features.
9	Metadata	✓	In this study, we provided all metadata about the generated dataset
10	Heterogeneity	✗	We only generated VDDD from one source, which is the simulation log
11	Attack diversity	✗	In VDDD, we focused only on UDP flood attack, which is a type of DDoS attacks

VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset; DDoS: distributed denial of service; VANET: Vehicular Ad hoc NETworks; RSU: road side unit; UDP: user datagram protocol; IP: Internet protocol.

One of the important points to be considered is the feature set. In this study, we provided all the available features from the simulation experiments and then we applied feature selection techniques to select the feature set that gives the best accuracy. Table 13 summarizes the features recently studied and used in detecting DDoS attacks on VANETs and other environments using ML techniques.

Table 13.

Used features by other studies.

Reference	Number of features	Features	Environment
Yu et al.³⁶	7	H (srcIP); H (flows); Average number of packets; Average number of bytes; Rate of flow table entries (Rf); Percentage of pair flows (Ppf); Ports generating speed (Pgs)	SDVN
Karagiannis and Argyriou²³	4	Received signal strength indicator (RSSI); Signal to noise and interference ratio (SINR); Packet delivery ratio (PDR); Relative speed variations (RSVs)	VANET
Singh et al.²¹	8	Source IP; Destination IP; Source port; Destination port; Protocol: the layer 4 protocol used such as TCP, UDP, ICMP; Byte count; Packet count; Time duration	SDVN
Aneja et al.²²	18	Not mentioned	VANET
VDDD	9	Throughput; Reception state of the radio; Sequence number of received packets; Packets received from lower layer; Packets received from higher layer; Number of received bytes; Bytes count; Packets count; Delay	VANET

SDVN: software-defined vehicular networking; VANET: Vehicular Ad hoc NETworks; TCP: transmission control protocol; UDP: user datagram protocol; ICMP: Internet control message protocol; VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset.

To evaluate the validity of VDDD, we examined the performance and accuracy of the selected features with different ML techniques including J48, SVM, RF, KNN, ANN, and NB, which are commonly used for DDoS attack detection as shown in Section “Literature review.” Here, we used the VDDD generated for the fourth scenario discussed earlier in section “Scenarios.” The experimental results were evaluated in terms of accuracy, precision, recall, and F1-score. The accuracy is the percentage of correctly classified instances in the test dataset. The precision is the ratio of samples correctly classified as positive to all samples predicted as positive. The recall is the ratio of samples correctly classified as positive to all samples that are actually positive. Finally, the F1-score, also called the F-score, combines recall and precision into a single measure. The experiments were conducted using the Weka tool with the classifier parameters shown in Table 14 for all the applied ML classifiers.

Table 14.

Main parameters of the classifiers.

Classifier	Parameter	Value
J48	Confidence factor	0.25
	Minimum number of objects	2
	Number of folds	3
SVM	Cache size	40
	Coef0	0.0
	Cost	1.0
	Degree	3
	Eps	0.001
	Gamma	0.0
	Loss	0.1
	Nu	0.5
RF	Bag size percent	100
	Maximum depth	0
	Number of execution slots	1
	Number of features	0
	Number of iterations	100
KNN	KNN	1
	Used distance algorithm	Euclidean distance
	Window size	0
ANN	Hidden layers	a
	Learning rate	0.3
	Momentum	0.2
	Training time	500
	Validation threshold	20
NB	Use kernel estimator	False
	Use supervised discretization	False
	Display model	False

KNN: K-nearest neighbor; SVM: support-vector machine; ANN: artificial neural network; RF: random forest; NB: naïve Bayes.

The results of the classification on VDDD presented in Table 15 show that all the applied classifiers achieved detection accuracies greater than 99.5%, except for SVM, which achieved an accuracy of 97.37%. The RF classifier achieved the highest accuracy, 99.75%, among the applied ML classifiers. Table 16 presents the confusion matrices for each classifier. Based on these statistics, Figure 17 presents the false rates for the classifiers, where SVM shows a higher false rate than the other classifiers.

Table 15.

Evaluating results of ML classifiers on VDDD fourth scenario.

Fourth scenario
Classifier	Accuracy	Precision	Recall	F1
J48	99.7189%	0.995	0.999	0.997
SVM	97.3667%	0.951	0.999	0.974
RF	99.7534%	0.996	0.999	0.998
KNN	99.7337%	0.996	0.999	0.997
ANN	99.6795%	0.994	0.999	0.997
NB	99.5266%	0.995	0.996	0.995

SVM: support-vector machine; RF: random forest; KNN: K-nearest neighbor; ANN: artificial neural network; NB: naïve Bayes.

Bold values represent the best values among the others.

Table 16.

Confusion matrix for fourth scenario.

J48		KNN
10,069	46	10,072	43
11	10,153	11	10,153
SVM		ANN
9587	528	10,058	57
6	10,158	8	10,156
RF		NB
10,077	38	10,064	51
12	10,152	45	10,119

KNN: K-nearest neighbor; SVM: support-vector machine; ANN: artificial neural network; RF: random forest; NB: naïve Bayes.
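As a consistency check, the values in Table 15 can be recomputed from the counts in Table 16. The sketch below assumes the Weka layout (rows are actual classes, columns are predicted classes) and treats the second class as the positive class. The source states neither convention, but this reading reproduces the reported figures. The names `CONFUSION` and `metrics` are illustrative.

```python
# Confusion matrices copied from Table 16, one 2x2 matrix per classifier.
# Assumption: rows = actual class, columns = predicted class,
# and the second class is the positive class.
CONFUSION = {
    "J48": ((10069, 46), (11, 10153)),
    "SVM": ((9587, 528), (6, 10158)),
    "RF": ((10077, 38), (12, 10152)),
    "KNN": ((10072, 43), (11, 10153)),
    "ANN": ((10058, 57), (8, 10156)),
    "NB": ((10064, 51), (45, 10119)),
}


def metrics(matrix):
    """Return (accuracy, precision, recall, F1) for a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


for name, matrix in CONFUSION.items():
    acc, prec, rec, f1 = metrics(matrix)
    print(f"{name}: accuracy={acc:.4%} precision={prec:.3f} "
          f"recall={rec:.3f} F1={f1:.3f}")
```

Under these assumptions, the computed values agree with Table 15 to the precision reported there.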

Figure 17.

False rates for applied ML models on VDDD.
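The source does not define the false rates plotted in Figure 17. Assuming they are the usual false-positive rate FP/(FP + TN) and false-negative rate FN/(FN + TP), and reading Table 16 with rows as actual classes, columns as predicted classes, and the second class as positive, they can be derived as follows. The names `COUNTS` and `false_rates` are illustrative.

```python
# Counts from Table 16 as (TN, FP, FN, TP).
# Assumption: rows = actual, columns = predicted, second class = positive.
COUNTS = {
    "J48": (10069, 46, 11, 10153),
    "SVM": (9587, 528, 6, 10158),
    "RF": (10077, 38, 12, 10152),
    "KNN": (10072, 43, 11, 10153),
    "ANN": (10058, 57, 8, 10156),
    "NB": (10064, 51, 45, 10119),
}


def false_rates(counts):
    """Return (false-positive rate, false-negative rate)."""
    tn, fp, fn, tp = counts
    return fp / (fp + tn), fn / (fn + tp)


for name, counts in COUNTS.items():
    fpr, fnr = false_rates(counts)
    print(f"{name}: FPR={fpr:.4%} FNR={fnr:.4%}")
```

With this definition, SVM has by far the largest false-positive rate, which matches the observation that it shows the highest false rate.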

To measure the computing time of our proposed approach, we excluded the time taken to generate the dataset and focused on the other computational aspects, namely, the classifier building time and the feature weight calculation time.⁷ We ran several experiments on VDDD, starting with the feature selection method and then executing all the studied ML methods, to obtain the average computation time for both ranking and selecting the features and building each classifier. Each average was taken over seven runs on a computer running Windows 10 with a 2.6 GHz CPU and 8 GB of RAM. Feature selection using the IG ranking method took about 0.23 s on average. Building the ANN model took the longest, at about 2.54 s, followed by SVM at 1.8 s; each of the remaining models took no more than 0.12 s.
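The averaging procedure described above can be sketched generically. The measurements in this article were taken in Weka; the helper below, `average_build_time`, is a hypothetical stand-in that times any callable over a fixed number of runs using only the Python standard library, with seven runs as in the text.

```python
import time
from statistics import mean


def average_build_time(build, runs=7):
    """Return the mean wall-clock time, in seconds, of calling build() `runs` times."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        build()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def dummy_build():
    # Placeholder workload standing in for training a classifier.
    sum(i * i for i in range(100_000))


print(f"average build time: {average_build_time(dummy_build):.4f} s")
```

Averaging over repeated runs reduces the influence of one-off delays such as caching or background processes on the reported time.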

Moreover, Figure 18 presents the receiver operating characteristic (ROC) curves⁶⁹ of the classifiers, which reflect the quality of their decisions. The ROC curve is a widely used measure for evaluating classification models across threshold settings, as it reflects a model’s ability to separate the classes. On VDDD, all models achieved an area under the ROC curve of around 99%, except for the SVM model, which achieved 97%. These results indicate that the models predict the classes effectively with very low false rates.
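For reference, the area under the ROC curve equals the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. The sketch below computes it in that pairwise form. The function name `roc_auc` and the toy labels and scores are illustrative and are not taken from VDDD.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 marks the positive class, scores are classifier confidences.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs are ranked correctly -> 0.75
```

The pairwise loop is quadratic in the number of instances and is written for clarity; a rank-based computation would be preferable for a dataset the size of VDDD.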

Figure 18.

AUC–ROC curve for applied ML models on VDDD.

Conclusion

An insecure VANET can lead to fatal accidents, physical disability, and even deaths. Accordingly, the security of VANETs requires the attention of researchers and developers, with special consideration of the unique characteristics of the network. In addition, a VANET must be capable of accurately detecting and preventing threats that might occur on the network. In this article, we explored several aspects of the VANET system, including its architecture and characteristics, and reviewed recent studies on securing VANETs against DDoS attacks. Due to the lack of available DDoS attack datasets that fit the VANET environment, we simulated a VANET environment involving a real highway scenario using several tools, including OMNeT++, INET, Veins, and SUMO. The simulated scenarios were used to build a dataset for detecting DDoS attacks in the VANET environment. The dataset records were processed following the principles of data preparation and pre-processing found in the literature. The proposed dataset, VDDD, fulfills the majority of the requirements of a valid dataset, as it overcomes the issues with existing VANET datasets, such as ignoring VANET characteristics, dissimilar network configurations, and unavailability to the public. Several ML models were trained on the generated dataset, and all showed high accuracy in detecting DDoS attack traffic.

Future work

As future work, the study can be extended by considering other aspects when simulating VANETs, such as context and weather conditions. Moreover, the generated dataset, VDDD, currently contains only UDP flooding attacks; it can be made more generic by adding more types of DDoS attacks, as well as other types of attacks. More ML techniques can be trained on the dataset as a further evaluation of VDDD. Generating more attack traffic to balance legitimate and malicious traffic can also be considered as an extension to this work.

Footnotes

Handling Editor: Ashish Kr Luhach

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Fahd A Alhaidari

References

1. World Health Organization (WHO). Global status report on road safety 2018. WHO, 2018, https://www.who.int/violence_injury_prevention/road_safety_status/2018/English-Summary-GSRRS2018.pdf (accessed 20 August 2019).

2. Jabbarpour, Nabaei, Zarrabi. Intelligent guardrails: an IoT application for vehicle traffic congestion reduction in smart city. In: Proceedings of the 2016 IEEE international conference on Internet of Things (iThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData), Chengdu, China, 15–18 December 2016, pp.7–13. New York: IEEE.

3. Jain, Saxena. VANET: security attacks, solution and simulation. In: Bhateja, Tavares JMR, Rani, et al. (eds) Proceedings of the second international conference on computational intelligence and informatics. Singapore: Springer, 2018, pp.457–466.

4. Ghebleh. A comparative classification of information dissemination approaches in vehicular ad hoc networks from distinctive viewpoints: a survey. Comput Netw 2018; 131: 15–37.

5. Gruebler, McDonald-Maier, Alheeti KMA. An intrusion detection system against black hole attacks on the communication network of self-driving cars. In: Proceedings of the 2015 6th international conference on emerging security technologies (EST), Braunschweig, 3–5 September 2015, pp.86–91. New York: IEEE.

6. Brendha, Prakash VSJ. A survey on routing protocols for vehicular Ad Hoc networks. In: Proceedings of the 2017 4th international conference on advanced computing and communication systems (ICACCS), Coimbatore, India, 6–7 January 2017, pp.1–7. New York: IEEE.

7. Arif, Wang, Geman, et al. SDN-based VANETs, security attacks, applications, and challenges. Appl Sci 2020; 10(9): 3217.

8. Arif, Wang, Balas VE. Secure VANETs: trusted communication scheme between vehicles and infrastructure based on fog computing. Stud Inform Control 2018; 27(2): 235–246.

9. Irshad, Usman, Chaudhry, et al. A provably secure and efficient authenticated key agreement scheme for energy Internet-based vehicle-to-grid technology framework. IEEE T Ind Appl 2020; 56(4): 4425–4435.

10. Hussain, Zeadally. Integration of VANET and 5G security: a review of design and implementation issues. Future Gener Comp Sy 2019; 101: 843–864.

11. Khelifi, Luo, Nour, et al. Security and privacy issues in vehicular named data networks: an overview. Mob Inf Syst 2018; 2018: 5672154.

12. Pathre, Agrawal, Jain. A novel defense scheme against DDOS attack in VANET. In: Proceedings of the 2013 10th international conference on wireless and optical communications networks (WOCN), Bhopal, India, 26–28 July 2013, pp.1–5. New York: IEEE.

13. Kolandaisamy, Noor, Ahmedy, et al. A multivariant stream analysis approach to detect and mitigate DDoS attacks in vehicular ad hoc networks. Wirel Commun Mob Com 2018; 2018: 2874509.

14. Haydari, Yilmaz. Real-time detection and mitigation of DDoS attacks in intelligent transportation systems. In: Proceedings of the 2018 21st international conference on intelligent transportation systems (ITSC), Maui, HI, 4–7 November 2018, pp.157–163. New York: IEEE.

15. Shabbir, Khan, et al. Detection and prevention of distributed denial of service attacks in VANETs. In: Proceedings of the 2016 international conference on computational science and computational intelligence (CSCI), Las Vegas, NV, 15–17 December 2016, pp.970–974. New York: IEEE.

16. Mirsadeghi, Rafsanjani, Gupta BB. A trust infrastructure based authentication method for clustered vehicular ad hoc networks. Peer Peer Netw Appl. Epub ahead of print 24 October 2020. DOI: 10.1007/s12083-020-01010-4.

17. Bhushan, Gupta BB. Distributed denial of service (DDoS) attack mitigation in software defined network (SDN)-based cloud computing environment. J Amb Intel Hum Comp 2019; 10(5): 1985–1997.

18. Kolandaisamy, Noor, Z’aba, et al. Adapted stream region for packet marking based on DDoS attack detection in vehicular ad hoc networks. J Supercomput 2020; 76: 5948–5970.

19. Kolandaisamy, Noor, Kolandaisamy, et al. A stream position performance analysis model based on DDoS attack detection for cluster-based routing in VANET. J Amb Intel Hum Comp. Epub ahead of print 3 July 2020. DOI: 10.1007/s12652-020-02279-2.

20. Bensalah, Elkamoun, Baddi. SDNStat-Sec: a statistical defense mechanism against DDoS attacks in SDN-based VANET. In: Saeed, Al-Hadhrami, Mohammed, et al. (eds) Advances on smart and soft computing, vol. 1188. Singapore: Springer, 2020, pp.527–540.

21. Singh, Jha, Nandi, et al. ML-based approach to detect DDoS attack in V2I communication under SDN architecture. In: Proceedings of the TENCON2018 - 2018 IEEE region 10 conference, Jeju, South Korea, 28–31 October 2019, pp.144–149. New York: IEEE.

22. Aneja MJS, Bhatia, Sharma, et al. Artificial intelligence based intrusion detection system to detect flooding attack in VANETs. In: Shrivistava, Kumar, Gupta, et al. (eds) Handbook of research on network forensics and analysis techniques. Hershey, PA: IGI Global, 2018, pp.87–100.

23. Karagiannis, Argyriou. Jamming attack detection in a pair of RF communicating vehicles using unsupervised machine learning. Veh Commun 2018; 13: 56–63.

24. Belenko, Krundyshev, Kalinin. Synthetic datasets generation for intrusion detection in VANET. In: Proceedings of the 11th international conference on security of information and networks (SIN’18), Cardiff, 10–12 September 2018, pp.1–6. New York: ACM.

25. Zeng, Qiu, Zhu, et al. DeepVCM: a deep learning based intrusion detection method in VANET. In: Proceedings of the IEEE 5th international conference on big data security on cloud, Washington, DC, 27–29 May 2019, pp.288–293. New York: IEEE.

26. Ghaleb, Zainal, Rassam, et al. An effective misbehavior detection model using artificial neural network for vehicular ad hoc network applications. In: Proceedings of the 2017 IEEE conference on application, information and network security (AINS), Miri, Malaysia, 13–14 November 2017, pp.13–18. New York: IEEE.

27. Joshi, Finin. SVM-CASE: an SVM-based context aware security framework for vehicular ad-hoc networks. In: Proceedings of the 2015 IEEE 82nd vehicular technology conference (VTC2015-Fall), Boston, MA, 6–9 September 2015, pp.1–5. New York: IEEE.

28. Grover, Prajapati, Laxmi, et al. Machine learning approach for multiple misbehavior detection in VANET. Comm Com Inf Sc 2011; 192: 644–653.

29. Ali Alheeti, Gruebler, McDonald-Maier. Intelligent intrusion detection of grey hole and rushing attacks in self-driving vehicular networks. Computers 2016; 5(3): 16.

30. Aloqaily, Otoum, Al Ridhawi, et al. An intrusion detection system for connected vehicles in smart cities. Ad Hoc Netw 2019; 90: 101842.

31. Singh, Gupta, Nandi, et al. Machine learning based approach to detect wormhole attack in VANETs. In: Proceedings of the workshops of the international conference on advanced information networking and applications, Matsue, Japan, 27–29 March 2019, pp.651–661. Cham: Springer.

32. Singh, Gupta, Vashistha, et al. Machine learning based approach to detect position falsification attack in VANETs. In: Proceedings of the international conference on security and privacy, Jaipur, India, 9–11 January 2019, pp.166–178. Singapore: Springer.

33. Gyawali, Qian. Misbehavior detection using machine learning in vehicular communication networks. In: Proceedings of the ICC 2019 - 2019 IEEE international conference on communications (ICC), Shanghai, China, 20–24 May 2019, pp.1–6. New York: IEEE.

34. Sharma, Petit. Integrating plausibility checks and machine learning for misbehavior detection in VANET. In: Proceedings of the 2018 17th IEEE international conference on machine learning and applications (ICMLA), Orlando, FL, 17–20 December 2018, pp.564–571. New York: IEEE.

35. Kim, Jang, Choo, et al. Collaborative security attack detection in software-defined vehicular networks. In: Proceedings of the 2017 19th Asia-Pacific network operations and management symposium (APNOMS), Seoul, South Korea, 27–29 September 2017, pp.19–24. New York: IEEE.

36. Guo, Liu, et al. An efficient SDN-based DDoS attack detection and rapid response platform in vehicular networks. IEEE Access 2018; 6: 44570–44579.

37. Luong, Hoang. FAPRP: a machine learning approach to flooding attacks prevention routing protocol in mobile ad hoc networks. Wirel Commun Mob Com 2019; 2019: 6869307.

38. Reddy, Thilagam PS. Naïve Bayes classifier to mitigate the DDoS attacks severity in ad-hoc networks. Int J Comm Network Inform Secur 2020; 12(2): 221–226.

39. Gao, Song, et al. A distributed network intrusion detection system for distributed denial of service attacks in vehicular ad hoc network. IEEE Access 2019; 7: 154560–154571.

40. Ali Alheeti, McDonald-Maier. Intelligent intrusion detection in external communication systems for autonomous vehicles. Syst Sci Control Eng 2018; 6(1): 48–56.

41. Van der Heijden, Lukaseder, Kargl. VeReMi: a dataset for comparable evaluation of misbehavior detection in VANETs. In: Beyah, Chang, et al. (eds) Security and privacy in communication networks (SecureComm 2018; Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering), vol. 254. Cham: Springer, 2018, pp.318–337.

42. Lyamin, Kleyko, Delooz, et al. Real-time jamming DoS detection in safety-critical V2V C-ITS using data mining. IEEE Commun Lett 2019; 23(3): 442–445.

43. Damasevicius, Venckauskas, Grigaliunas, et al. LITNET-2020: an annotated real-world network flow dataset for network intrusion detection. Electronics 2020; 9(5): 800.

44. Al-Hadhrami, Hussain FK. Real time dataset generation framework for intrusion detection systems in IoT. Future Gener Comp Sy 2020; 108: 414–423.

45. Alrehan, Alhaidari. Machine learning techniques to detect DDoS attacks on VANET system: a survey. In: Proceedings of the 2019 2nd international conference on computer applications and information security (ICCAIS), Riyadh, Saudi Arabia, 1–3 May 2019, pp.1–6. New York: IEEE.

46. Shiravi, Tavallaee, et al. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 2012; 31(3): 357–374.

47. Tavallaee, Bagheri, et al. A detailed analysis of the KDD CUP 99 dataset. In: Proceedings of the 2009 IEEE symposium on computational intelligence for security and defense applications, Ottawa, ON, Canada, 8–10 July 2009, pp.1–6. New York: IEEE.

48. Moustafa, Slay. UNSW-NB15: a comprehensive dataset for network intrusion detection systems (UNSW-NB15 network dataset). In: Proceedings of the 2015 military communications and information systems conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015, pp.1–6. New York: IEEE.

49. Kyoto dataset, 2016, https://www.takakura.com/Kyoto_data/BenchmarkData-Description-v5.pdf (accessed 16 November 2020).

50. Varga. OMNeT++. In: Wehrle, Güneş, Gross (eds) Modeling and tools for network simulation (1st ed.). Berlin, Heidelberg: Springer, 2010, pp.35–59.

51. Lopez, Behrisch, Bieker-Walz, et al. Microscopic traffic simulation using SUMO. In: 2018 21st International conference on intelligent transportation systems (ITSC), Maui, HI, USA, 4–7 November 2018, pp.2575–2582. New York: IEEE.

52. INET. INET framework, https://inet.omnetpp.org/ (accessed 14 July 2019).

53. Sommer, German, Dressler. Bidirectionally coupled network and road traffic simulation for improved IVC analysis. IEEE T Mobile Comput 2011; 10(1): 3–15.

54. Haklay, Weber. OpenStreetMap: user-generated street maps. IEEE Pervas Comput 2008; 7(4): 12–18.

55. Fathy, Firouzjaee, Raahemifar. Improving QoS in VANET using MPLS. Procedia Comput Sci 2012; 10: 1018–1025.

56. Kotenko, Ulanov. Agent-based simulation of DDOS attacks and defense mechanisms. Int J Comput 2014; 4(2): 113–123.

57. Kaur, Sangal, Kumar. Modeling and simulation of DDoS attack using Omnet++. In: Proceedings of the 2014 international conference on signal processing and integrated networks (SPIN), Noida, India, 20–21 February 2014, pp.220–225. New York: IEEE.

58. Alzahrani, Hong. Generation of DDoS attack dataset for effective IDS development and evaluation. J Inf Secur 2018; 9(4): 225–241.

59. Siddiqui, Boukerche. On the impact of DDoS attacks on software-defined Internet-of-vehicles control plane. In: Proceedings of the 2018 14th international wireless communications and mobile computing conference (IWCMC), Limassol, Cyprus, 25–29 June 2018, pp.1284–1289. New York: IEEE.

60. SQLite Browser. DB Browser for SQLite, https://sqlitebrowser.org/ (accessed 14 July 2019).

61. Project Jupyter. Project Jupyter home, https://jupyter.org/index.html (accessed 14 July 2019).

62. Karegowda, Manjunath, Jayaram MA. Comparative study of attribute selection using gain ratio and correlation based feature selection. Int J Inf Technol Knowl Manag 2010; 2(2): 271–277.

63. Hall MA. Correlation-based feature selection for machine learning. PhD Dissertation, Department of Computer Science, Waikato University, Hamilton, New Zealand, 1999.

64. Novaković, Strbac, Bulatović. Toward optimal feature selection using ranking methods and classification algorithms. Yugosl J Oper Res 2011; 21(1): 119–135.

65. Cheng, Wang, et al. Feature selection: a data perspective. ACM Comput Surv 2018; 50(6): 94.

66. Chawla, Bowyer, Hall, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321–357.

67. Jain, Kotsampasakou, Ecker GF. Comparing the performance of meta-classifiers-a case study on selected imbalanced datasets relevant for prediction of liver toxicity. J Comput Aid Mol Des 2018; 32(5): 583–590.

68. Sharafaldin, Gharib, Lashkari, et al. Towards a reliable intrusion detection benchmark dataset. Softw Netw 2017; 2017(1): 177–200.

69. Omar, Ivrissimtzis. Using theoretical ROC curves for analysing machine learning binary classifiers. Pattern Recogn Lett 2019; 128(6): 447–451.