Abstract
This paper studies and analyzes the SEA dataset, grouping the data and digitizing its features to obtain corresponding labels. The K-Nearest Neighbors (KNN) algorithm is then applied to the dataset, and experiments show that the Manhattan distance is the best-performing distance metric for this problem. Frequent words used by individual users are explicitly selected as the features for computation, and grid search is employed to find the optimal hyperparameters, from which the final model is built. The Naive Bayes algorithm is then applied to the same dataset, comparing the strengths and weaknesses of its different variants as well as of various feature extraction methods; the results show that Bernoulli Naive Bayes achieves higher accuracy than Multinomial Naive Bayes.
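The pipeline described above can be sketched as follows. This is an illustrative example only, not the paper's actual code: scikit-learn is assumed, synthetic data stands in for the SEA per-user word-frequency features, and the hyperparameter grid (values of k, Manhattan vs. Euclidean distance) is a plausible guess rather than the grid reported in the paper.

```python
# Illustrative sketch (not the authors' code): grid search over KNN
# hyperparameters, including the Manhattan metric (p=1), followed by a
# comparison of Bernoulli vs. Multinomial Naive Bayes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Synthetic stand-in for digitized per-user word-frequency features.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X = np.abs(X)  # Multinomial NB requires non-negative feature values
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grid search over k and the distance metric: p=1 is Manhattan, p=2 Euclidean.
grid = GridSearchCV(KNeighborsClassifier(),
                    {"n_neighbors": [3, 5, 7], "p": [1, 2]}, cv=5)
grid.fit(X_tr, y_tr)
print("best KNN params:", grid.best_params_)
print("KNN test accuracy:", grid.score(X_te, y_te))

# Bernoulli NB binarizes each feature (word present/absent);
# Multinomial NB models the raw frequency counts.
for nb in (BernoulliNB(), MultinomialNB()):
    nb.fit(X_tr, y_tr)
    print(type(nb).__name__, "test accuracy:", nb.score(X_te, y_te))
```

Note the modeling difference that drives the paper's comparison: Bernoulli Naive Bayes uses only word presence/absence per user, while Multinomial Naive Bayes uses the frequency counts themselves.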
