Multilingual Phishing Detection in Cybersecurity: A Review of Machine Learning Approaches
Keywords:
Phishing Detection, Machine Learning, Natural Language Processing (NLP), URL Analysis
Abstract
As internet usage grows, phishing attacks have become more sophisticated, using multiple languages to deceive a diverse range of users. Traditional phishing detection methods, often limited to specific languages, struggle with the linguistic patterns found in multilingual phishing schemes. This paper presents the design and analysis of a machine learning (ML) model tailored for multilingual phishing detection. Using datasets that span several languages, we propose a model architecture that combines Natural Language Processing (NLP) techniques with classification algorithms to detect phishing content accurately across languages. Features such as character-level patterns and semantic cues are incorporated to improve the model's adaptability and resilience. We also evaluate the model's accuracy and overall performance in detecting phishing content in multiple languages through a comparative analysis of several ML algorithms. From this comparison we arrive at a multilingual ML model that is expected to offer a significant increase in accuracy, establishing a foundation for cross-lingual phishing defense.
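The abstract does not name the exact features or classifiers, so the following is only a minimal, standard-library sketch of the general idea, under stated assumptions: character n-grams stand in for the "character-level patterns", and two simple classifiers (multinomial Naive Bayes and a cosine nearest-centroid rule) stand in for the compared ML algorithms. The function and class names (`char_ngrams`, `NaiveBayes`, `NearestCentroid`, `accuracy`) and the four-language toy messages are hypothetical illustrations, not the authors' method or data.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max in lowercased text.

    The text is padded with one space on each side so that the start and end
    of the string also yield boundary n-grams. No tokenizer is needed, so the
    same feature extractor applies unchanged to every language and script.
    """
    padded = f" {text.lower()} "
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams[padded[i:i + n]] += 1
    return grams


class NaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (Laplace smoothing)."""

    def fit(self, texts, labels, alpha=1.0):
        self.alpha = alpha
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        # Vocabulary = every n-gram seen in any class during training.
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        n = len(labels)
        self.log_priors = {c: math.log(labels.count(c) / n) for c in self.classes}
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        scores = {}
        for c in self.classes:
            denom = self.totals[c] + self.alpha * self.vocab_size
            score = self.log_priors[c]
            for g, k in grams.items():
                # N-grams unseen in class c receive only the smoothing mass alpha / denom.
                score += k * math.log((self.counts[c][g] + self.alpha) / denom)
            scores[c] = score
        return max(self.classes, key=scores.get)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(v * b[k] for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NearestCentroid:
    """Assign the class whose summed n-gram vector is most cosine-similar."""

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            # Summing suffices: cosine ignores vector length, so the sum
            # points in the same direction as the mean.
            self.centroids.setdefault(label, Counter()).update(char_ngrams(text))
        return self

    def predict(self, text):
        grams = char_ngrams(text)
        return max(sorted(self.centroids), key=lambda c: cosine(grams, self.centroids[c]))


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the true label."""
    hits = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return hits / len(labels)


# Hypothetical toy messages in four languages; not the paper's dataset.
TOY_DATA = [
    ("Verify your account now or it will be suspended", "phish"),
    ("Verifique su cuenta ahora o será suspendida", "phish"),
    ("Bestätigen Sie jetzt Ihr Konto, sonst wird es gesperrt", "phish"),
    ("Vérifiez votre compte maintenant sinon il sera suspendu", "phish"),
    ("The meeting notes from Tuesday are attached", "legit"),
    ("Las notas de la reunión del martes están adjuntas", "legit"),
    ("Die Notizen vom Dienstag sind angehängt", "legit"),
    ("Les notes de la réunion de mardi sont jointes", "legit"),
]

if __name__ == "__main__":
    texts = [t for t, _ in TOY_DATA]
    labels = [y for _, y in TOY_DATA]
    for model in (NaiveBayes().fit(texts, labels), NearestCentroid().fit(texts, labels)):
        # Training-set accuracy only; a real comparison needs held-out data per language.
        print(type(model).__name__, accuracy(model, texts, labels))
        print("  unseen Spanish message ->", model.predict("Confirme su cuenta hoy"))
```

Character n-grams are chosen here because they sidestep language-specific tokenization and still capture cues such as "cuenta" or "compte" that recur across related languages. A fuller pipeline would more likely use a library vectorizer (for example scikit-learn's `TfidfVectorizer` with `analyzer="char_wb"` and an `ngram_range`), add semantic features, and compare classifiers on a held-out split for each language rather than on training data.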