NAVIGATING ACADEMIC SUCCESS: A COMPARATIVE ANALYSIS OF IMBALANCE CLASSIFICATION METHODS
Abstract
Imbalanced classification is a critical challenge in data mining and machine learning, and it has attracted increasing attention from researchers in recent years. Conventional classifiers assume that samples are distributed roughly evenly across classes. When that assumption fails, performance on the minority class often suffers: because classifiers are designed to reduce overall classification error, they tend to exhibit higher error rates on imbalanced datasets, particularly for minority class examples (Barua & Murase, n.d.). In the age of big data, where datasets are vast and intricate, the difficulties of imbalanced learning have become more pronounced, and machine learning and data mining techniques have emerged as key tools for addressing them. This study examines imbalanced classification in the context of big data and emphasizes the importance of addressing class imbalance for effective predictive modeling. Detecting rare events is inherently a prediction task, and the scarcity of such events can severely impede prediction accuracy because the data are not balanced (Reference [2]). Class imbalance is prevalent in many real-world applications, including spam detection, software defect prediction, and fraud detection (Reference [3]).