JNACS ISSN: 2582-3817

Assessing Machine Learning Strategies for Effective Phishing Detection and Prevention in Cybersecurity

Abstract

Phishing attacks have evolved into advanced cyber threats that cause substantial financial losses and data breaches and erode trust in digital platforms. Conventional detection methods, such as rule-based systems and blacklists, are increasingly inadequate against constantly changing phishing tactics. This study investigates the effectiveness of machine learning (ML) algorithms in detecting and mitigating phishing attacks, using a balanced dataset of 10,000 instances (5,000 phishing and 5,000 legitimate websites) sourced from PhishTank and the UCI Machine Learning Repository. URL-based, content-based, and domain-based features were extracted and preprocessed to train and evaluate four ML algorithms: Decision Trees, Random Forests, Support Vector Machines (SVM), and Neural Networks (Multi-layer Perceptron). The models underwent hyperparameter tuning and were evaluated using 5-fold cross-validation, with accuracy, precision, recall, and F1-score as the performance metrics. Neural Networks achieved the highest accuracy (94%), followed by Random Forests (92%), SVM (88%), and Decision Trees (85%). Neural Networks were the most capable of capturing subtle, non-linear patterns, whereas Random Forests provided consistent performance and greater resilience to noisy data. Decision Trees were fast and interpretable but prone to overfitting, and SVM models required significant computational resources. This research emphasizes the potential of ML algorithms, particularly Neural Networks and Random Forests, in advancing phishing detection systems. It also highlights the need to combine ML with other cybersecurity strategies, such as user education and multi-factor authentication, to build a comprehensive defense against phishing threats. The study presents actionable recommendations for enhancing real-time phishing detection in a rapidly evolving cybersecurity landscape.
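The evaluation protocol described in the abstract (5-fold cross-validation scored with accuracy, precision, recall, and F1-score) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The function names `binary_metrics` and `stratified_folds` are hypothetical, treating phishing as the positive class (label 1) is an assumption, and so is the stratified split, since the abstract states only that 5-fold cross-validation was used. The labels are placeholders that mirror the 5,000/5,000 class balance, and the "flag everything" baseline stands in for a trained model.

```python
import random
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return Metrics(accuracy, precision, recall, f1)


def stratified_folds(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep the class ratio per fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets an equal share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for held_out in range(k):
        test_idx = sorted(folds[held_out])
        train_idx = sorted(
            i for f, fold in enumerate(folds) if f != held_out for i in fold
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Placeholder labels mirroring the balanced dataset: 1 = phishing, 0 = legitimate.
    labels = [1] * 5000 + [0] * 5000
    for fold, (train_idx, test_idx) in enumerate(stratified_folds(labels), 1):
        y_true = [labels[i] for i in test_idx]
        # Trivial baseline (assumption): flag every site as phishing.
        y_pred = [1] * len(y_true)
        print(fold, len(train_idx), len(test_idx), binary_metrics(y_true, y_pred))
```

In a real pipeline the baseline line would be replaced by a classifier fitted on `train_idx` and applied to `test_idx`, with the per-fold metrics averaged to give the reported score. Reporting precision and recall alongside accuracy matters even on a balanced dataset, because the cost of a missed phishing site differs from the cost of a false alarm.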
