Performance Analysis on Detection of Cyberbullying in Code-Mixed Language on Social Media
Keywords:
lower code-mixed languages, natural language processing, social networking, machine learning, cyberbullying
Abstract
Cyberbullying is bullying that takes place online. The rapid growth of social media has made many people, particularly young people, more vulnerable to it, and it has emerged as a major concern that affects individuals' mental and emotional well-being. Detecting and mitigating cyberbullying is therefore vital for creating safer digital spaces. Machine learning makes it possible to identify linguistic patterns in posts that involve cyberbullying and to build models that flag such content automatically. This survey paper reviews recent advances in the detection of cyberbullying on social media platforms, covering both traditional machine learning methods and natural language processing (NLP) techniques. It aims to explore novel methods for understanding and automatically detecting cyberbullying in different kinds of social media content, including tweets, comments, and messages.
The survey also examines the challenges specific to cyberbullying detection, such as handling diverse languages, code-mixed text, and the evolving nature of abusive language. Special attention is given to multilingual and code-mixed environments, where standard models may struggle to capture linguistic nuances.