IMPROVING OBJECT DETECTION EFFICIENCY: YOLO V4 AND BACKBONE SELECTION
Abstract
Object detection and classification are crucial tasks in various applications, and the You Only Look Once (YOLO) algorithm has emerged as a prominent solution, known for its real-time performance and lightweight model size. This paper investigates the impact of different backbones, namely CSPResNeXt-50, CSPDarkNet53, and EfficientNet-B0, on the performance of YOLO v4 as an object detection model. Microfossil analysis, a critical aspect of biostratigraphy for dating rocks using contained fossils, is used as the primary application domain for evaluation.
The YOLO algorithm comprises three main components: backbone, neck, and head, each serving distinct functions. The backbone acts as a feature extractor from input images, and the selected backbones exhibit promising performance, achieving up to 70% accuracy and detecting objects at 40 frames per second (FPS). Comparisons with other one-stage detectors like RetinaNet and Single Shot Multibox Detector (SSD) highlight YOLO's superiority in real-time scenarios.
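The three-part structure described above can be sketched as a simple composition in which only the backbone is swapped. This is a minimal illustration, not the YOLO v4 implementation: `build_detector` and the `toy_*` functions are hypothetical stand-ins for real networks, and the "image" is just a nested list.

```python
from typing import Any, Callable

# A stage is any callable that maps one representation to the next.
Stage = Callable[[Any], Any]


def build_detector(backbone: Stage, neck: Stage, head: Stage) -> Stage:
    """Compose backbone, neck, and head into one callable: image -> detections."""
    def detect(image: Any) -> Any:
        features = backbone(image)  # extract features from the input image
        fused = neck(features)      # aggregate the extracted features
        return head(fused)          # turn aggregated features into predictions
    return detect


# Toy stand-ins (hypothetical, not real networks) to show backbone swapping.
def toy_backbone_a(image):
    return [sum(row) for row in image]


def toy_backbone_b(image):
    return [max(row) for row in image]


def toy_neck(features):
    return sum(features)


def toy_head(fused):
    return {"score": fused}


image = [[1, 2], [3, 4]]
detector_a = build_detector(toy_backbone_a, toy_neck, toy_head)
detector_b = build_detector(toy_backbone_b, toy_neck, toy_head)
print(detector_a(image), detector_b(image))
```

Keeping the neck and head fixed while changing only the first argument mirrors the kind of comparison the paper performs across backbones.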
The study focuses on microfossil identification, which traditionally relies on specialized human expertise, leading to a decline in education and training opportunities. Leveraging advancements in machine learning, the research explores the potential for automated microfossil characterization using Convolutional Neural Networks (CNNs). Previous research reported strong results, with a ResNet50 model attaining 81.8% accuracy, 76.7% precision, and 71.4% recall.
This paper further examines the performance of YOLO v4 with different backbones. The authors compare YOLO v4 against RetinaNet, EfficientDet-D0, RFBNet, NAS-FPN, ATSS, RDSNet, CenterMask, LRF, Faster R-CNN, M2Det, SSD, and TridentNet, with RetinaNet and EfficientDet-D0 coming closest to YOLO v4. Notably, YOLO v4 with the CSPDarkNet53 backbone achieves 96 FPS and 41.2% Average Precision (AP), while EfficientDet-D0 reaches 62.5 FPS and 33.8% AP, and RetinaNet reaches 37 FPS and 37% AP.
The primary focus of this research is to assess the impact of the CSPResNeXt-50, CSPDarkNet53, and EfficientNet-B0 backbones on the YOLO v4 model's performance as an object detector. Evaluation metrics include mean average precision (mAP), average precision (AP) at various Intersection over Union (IoU) thresholds, F1-score, and frames per second (FPS). The results show how the selected backbones influence the YOLO v4 configuration, enabling a clear understanding of their effects.
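Three of the metrics named above can be sketched in a few lines. This is an illustration under stated assumptions rather than the evaluation code used in the paper: boxes are assumed to be `(x1, y1, x2, y2)` corner coordinates, and the function names `iou`, `f1_score`, and `frames_per_second` are chosen here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def frames_per_second(num_frames, elapsed_seconds):
    """Throughput: frames processed divided by wall-clock time in seconds."""
    return num_frames / elapsed_seconds


# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(f1_score(tp=8, fp=2, fn=2))
print(frames_per_second(96, 1.0))
```

A detection typically counts as a true positive only when its IoU with a ground-truth box meets a chosen threshold, which is why AP is reported at several IoU values.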