
Abstract

This paper addresses cybersecurity in the face of escalating and evolving cyber threats, using data science techniques to strengthen digital defenses. It explores a range of data science methods suited to cybersecurity, with the core objective of building robust models that can detect and classify a spectrum of cyber-attacks. A consolidated dataset is curated covering malware, phishing, denial-of-service (DoS), distributed denial-of-service (DDoS), and structured query language (SQL) injection attacks. Its attributes include protocols, flags, packets, sender and receiver identifiers, IP addresses, ports, packet dimensions, and a target variable indicating the cyber-attack category; a feature-description table documents these attributes. The data are prepared for model training with label encoding, which converts categorical values into numerical form. A subset of relevant attributes is then selected to improve model performance, and scaling and normalization bring the attributes onto a uniform scale before training. Several machine-learning models, namely support vector machine (SVM), K-nearest neighbors (KNN), random forest (RF), decision tree (DT), gradient boosting classifier (GBC), naive Bayes (NB), and logistic regression (LR), are applied to the prepared data and evaluated on accuracy, precision, recall, and F1-score. This evaluation shows how effectively each model classifies cyber-attacks. Model parameters are then tuned with GridSearchCV.
After parameter optimization, the models are compared and a voting classifier is deployed as an ensemble that combines the predictions of multiple models. The ensemble model attains an accuracy of 97.33%. The agreement among high-precision models underscores the value of combining the strengths of distinct models. Visualizations of decision boundaries show how well the models discriminate between cyber-attack types, and confusion matrices present the detailed classification results and indicate where improvement is possible. Overall, the study underscores the importance of integrating data science methods into cybersecurity practice.
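
The abstract names the processing steps but gives no code. As a rough illustration only, the following standard-library Python sketch shows three of those steps: label encoding, standardization, and majority ("hard") voting across model predictions, followed by an accuracy check. All function names and the toy data are hypothetical, and the abstract does not state whether the authors' voting classifier used hard or soft voting; an odd number of models is used here so that no ties arise.

```python
from collections import Counter
from statistics import mean, pstdev


def label_encode(values):
    """Map each distinct category to an integer, assigned in sorted order."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def standardize(column):
    """Rescale a numeric column to zero mean and unit population variance."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def hard_vote(predictions_per_model):
    """Return, for each sample, the label predicted by most models."""
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data, invented for illustration (not taken from the paper's dataset).
    protocols = ["tcp", "udp", "tcp", "icmp"]
    packet_sizes = [2, 4, 4, 6]
    encoded, mapping = label_encode(protocols)
    print(mapping, encoded, standardize(packet_sizes))

    # Predictions from three hypothetical models for four samples.
    model_a = ["dos", "phishing", "malware", "dos"]
    model_b = ["dos", "malware", "malware", "ddos"]
    model_c = ["ddos", "phishing", "malware", "dos"]
    ensemble = hard_vote([model_a, model_b, model_c])
    y_true = ["dos", "phishing", "malware", "ddos"]
    print(ensemble, accuracy(y_true, ensemble))
```

In practice these steps would more likely be carried out with a machine-learning library (the mention of GridSearchCV suggests scikit-learn), which also provides the parameter search that this sketch omits.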

Keywords

Data science, cybersecurity, evolving threats, machine learning, anomaly detection, big data, cyber threat detection

