Abstract
This investigation presents a model for distinguishing fake news from truthful news in textual data. For this purpose, using intelligent methods based on the principles of text analysis, a binary classification model was developed that divides textual data into deceptive and truthful classes. The baseline artificial intelligence (AI) algorithms used for modeling were AdaBoost (Ada), Support Vector Classifier (SVC), Random Forest (RF), Neural Network (NN), BERT, and Convolutional Neural Network (ConvNet). The methods used in this study include TF-IDF for vectorizing the textual data; Principal Component Analysis (PCA) for feature transformation; word2index and word embedding models for converting words into numbers; and the N-gram technique for creating sequences of words. Finally, through a case study and an examination of different evaluation indices, the aforementioned models were compared. The outcomes of this investigation showed that, despite the high similarity between the two classes (86.6% similarity in the training data and 79.8% in the test data), the BERT model achieved the best results. This model has high complexity and can better extract relationships between data. In the baseline article, the best value of the Accuracy index was 0.90, which was improved to 0.93 in this study.
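The classical (non-BERT) part of the pipeline described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the toy texts, labels, and hyperparameters are assumptions, and TruncatedSVD is used as the standard sparse-friendly analogue of PCA for TF-IDF matrices.

```python
# Hedged sketch: binary fake-news classification combining TF-IDF
# vectorization with N-grams, PCA-style dimensionality reduction,
# and a Random Forest classifier. All data and parameters are
# illustrative assumptions, not the paper's actual setup.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier

# Toy corpus (hypothetical): 1 = truthful, 0 = deceptive
texts = [
    "Officials certify local election results after audit",
    "Miracle pill cures all diseases overnight, doctors stunned",
    "New study published in peer-reviewed journal on climate",
    "Secret society controls world weather with hidden machines",
]
labels = [1, 0, 1, 0]

pipeline = Pipeline([
    # Unigrams and bigrams, weighted by TF-IDF
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
    # TruncatedSVD stands in for PCA on the sparse TF-IDF matrix
    ("svd", TruncatedSVD(n_components=2, random_state=0)),
    ("clf", RandomForestClassifier(random_state=0)),
])

pipeline.fit(texts, labels)
preds = pipeline.predict(texts)
print(preds)
```

In practice the same `Pipeline` object would be fit on a labeled training split and evaluated on held-out test data with metrics such as Accuracy, which is how the models in the study were compared.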
