International Journal of Allied Sciences (IJAS)

NETWORK INTRUSION DETECTION USING BIG DATA ANALYTICS: A PYSPARK AND HIVE APPROACH FOR UNSW-NB15

Authors

  • Daniel U. Okon, Faculty of Computing, Department of Cybersecurity, University of Port Harcourt, Port Harcourt, Rivers State, Nigeria

Abstract

Network intrusion detection remains a critical challenge as cyber threats continue to evolve in complexity and scale. This study investigates the application of big data analytics for intrusion detection using the UNSW-NB15 dataset. Apache Hive was used for large-scale querying and feature analysis, while PySpark was used for advanced analytics, including descriptive statistics, correlation analysis, hypothesis testing, and dimensionality reduction. A Random Forest (RF) classifier was developed and evaluated for both binary and multi-class intrusion detection tasks. The experimental results show 99.99% accuracy in binary classification and 98.62% in multi-class classification, highlighting the effectiveness of combining Hive and PySpark for scalable intrusion detection. These findings underscore the importance of big data frameworks in strengthening cybersecurity defence systems.

Keywords:

Cybersecurity, Network Intrusion Detection, Big Data Analytics, Apache Spark, Apache Hive, UNSW-NB15, Random Forest, Machine Learning

Published

2025-09-16

DOI:

https://doi.org/10.5281/zenodo.17135943

How to Cite

Okon, D. U. (2025). NETWORK INTRUSION DETECTION USING BIG DATA ANALYTICS: A PYSPARK AND HIVE APPROACH FOR UNSW-NB15. International Journal of Allied Sciences (IJAS), 16(9), 11–30. https://doi.org/10.5281/zenodo.17135943
