Drone Detection with YOLOv8 – A Learning Project

Tags: Computer Vision · YOLOv8 · Object Detection · Python · PyTorch · Deep Learning · Thesis

Project Overview

**Note:** This is a first draft and a rough AI translation of the original documentation. The content below may contain inconsistencies or unclear phrasing. I'll update it with a proper manual revision when I get the time. Thanks for your patience!

This started as a university assignment exploring object detection. The goal was to train a YOLOv8 model to detect drones and see what it takes to go from training to a working demo. I used two public datasets, experimented with augmentation strategies, and built a simple streaming setup with an ESP32-CAM. The project taught me a lot about dataset quality, the gap between test metrics and real-world performance, and the practical challenges of deploying ML models.

Version: v1.0
Time: ~80 hours
Cost: $0 (if you have a GPU)
Status: Completed

Materials

  • NVIDIA GPU (CUDA capable) × 1
  • ESP32-CAM module × 1
  • Laptop for inference × 1
  • Anti-UAV Dataset × ~200 videos
  • VisioDECT Dataset × ~20k images
  • Python 3.8+ × 1

Tools

  • PyTorch
  • YOLOv8 (Ultralytics)
  • OpenCV
  • Matplotlib

Contents

  1. How It Works
  2. Datasets
     2.1 Anti-UAV Dataset
     2.2 VisioDECT Dataset
     2.3 Augmentations
  3. Training
     3.1 Setup
     3.2 Model 1: Baseline
     3.3 Model 2: + Vertical Flip
     3.4 Model 3: + Horizontal Flip
     3.5 Model 4: Combined Datasets
  4. Results
     4.1 Model 1 (Baseline)
     4.2 Model 2 (+ Vertical Flip)
     4.3 Model 3 (+ Horizontal Flip)
     4.4 Model 4 (Combined Data)
     4.5 Comparison
  5. Demo Setup
  6. What I Learned
  7. Future Work
  8. Resources
1. How It Works

The project covers dataset preparation, model training with different configurations, evaluation, and a basic hardware demo. I trained four YOLOv8 variants to compare data strategies, then built a streaming setup to test detection on live video.

2. Datasets

I used two public datasets to get variety in conditions and drone appearances.

2.1 Anti-UAV Dataset

The primary dataset, containing ~200 training videos at 1080p. I sampled frames rather than keeping every one to avoid redundancy (a sampling sketch follows the list below). Most images show drones at long range against cloudy skies—good for distance detection but limited in lighting variety.

  • Resolution: 1920×1080
  • Conditions: Mostly cloudy, distant subjects
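
I didn't document the exact sampling interval here, but the idea is just to keep every Nth frame instead of all of them. A minimal OpenCV sketch (the stride and paths are illustrative):

```python
import cv2
from pathlib import Path

def sample_frames(video_path: str, out_dir: str, stride: int = 30) -> int:
    """Save every `stride`-th frame of a video as a JPEG; returns the number saved."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    saved = idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            cv2.imwrite(str(out / f"{Path(video_path).stem}_{idx:06d}.jpg"), frame)
            saved += 1
        idx += 1
    cap.release()
    return saved

# Roughly one frame per second from a 30 fps clip:
# sample_frames("anti_uav/clip_001.mp4", "dataset/images/train", stride=30)
```
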
2.2 VisioDECT Dataset

Added for variety. Contains ~20k images across sunny, cloudy, rainy, and night conditions. Lower resolution (852×480) but drones appear closer and more detailed.

  • Conditions: Sun, rain, night
  • Advantage: Better visibility of drone features
2.3 Augmentations

I tested geometric augmentations (vertical and horizontal flips) to see their effect on generalization. Results were mixed—more on this in the training section.
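
To make the flips concrete: in YOLO's label format (class, x-center, y-center, width, height, all normalized to 0–1), a vertical flip only remaps the y-center and a horizontal flip only remaps the x-center. A standalone sketch of that mapping—Ultralytics applies the same transforms internally via its flipud and fliplr training arguments, so this is for illustration only:

```python
import cv2
import numpy as np

def flip_with_labels(img: np.ndarray, labels: np.ndarray, mode: str):
    """Flip an image and its YOLO labels.

    labels: array of rows (class, cx, cy, w, h) with coordinates normalized to 0-1.
    mode:   "vertical" mirrors top-to-bottom, "horizontal" mirrors left-to-right.
    """
    out = labels.copy()
    if mode == "vertical":
        img = cv2.flip(img, 0)       # mirror top-to-bottom
        out[:, 2] = 1.0 - out[:, 2]  # only the y-center moves
    elif mode == "horizontal":
        img = cv2.flip(img, 1)       # mirror left-to-right
        out[:, 1] = 1.0 - out[:, 1]  # only the x-center moves
    return img, out
```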

3. Training

All training used PyTorch and Ultralytics YOLOv8. I trained four model variants with a continuous training approach, starting each new model from the previous one's weights rather than from scratch.

3.1 Setup

  • Epochs: 20 per variant
  • Batch Size: 16
  • Image Size: 640px
  • Approach: Continuous training (each model builds on the previous; see the sketch below)
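
Concretely, the continuous-training loop looked roughly like the sketch below. The model size, dataset YAML names, and run names are assumptions; what matters is that each variant starts from the previous run's best.pt and that the flips are toggled through the flipud/fliplr arguments.

```python
from ultralytics import YOLO

SETTINGS = dict(epochs=20, batch=16, imgsz=640)

# Model 1: Anti-UAV only, no flips (model size and file names are illustrative).
model = YOLO("yolov8n.pt")
model.train(data="anti_uav.yaml", flipud=0.0, fliplr=0.0, name="m1_baseline", **SETTINGS)

# Model 2: continue from Model 1's weights, add vertical flips.
model = YOLO("runs/detect/m1_baseline/weights/best.pt")
model.train(data="anti_uav.yaml", flipud=0.5, fliplr=0.0, name="m2_vflip", **SETTINGS)

# Models 3 and 4 follow the same pattern: add fliplr=0.5 for Model 3,
# then drop it again and switch to the combined data YAML for Model 4.
```
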
3.2 Model 1: Baseline

Anti-UAV dataset only, no augmentations. Established a baseline to compare against.

3.3 Model 2: + Vertical Flip

Added vertical flip augmentation. The idea was that drone silhouettes might be orientation-independent.

3.4 Model 3: + Horizontal Flip

Added horizontal flip augmentation. This turned out to hurt performance—likely because mirroring creates implausible lighting patterns (sun position becomes inconsistent).

3.5 Model 4: Combined Datasets

Combined Anti-UAV and VisioDECT datasets without the horizontal flip. The added variety in conditions and drone appearances improved generalization.
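
I won't reproduce the full data preparation here, but as far as I know the Ultralytics data YAML accepts a list of image directories for the train and val splits, which is one straightforward way to merge the two datasets once both are in YOLO format. The paths and the single "drone" class below are assumptions about the layout:

```python
import yaml

# Hypothetical directory layout: both datasets converted to YOLO format,
# sharing a single "drone" class.
combined = {
    "path": "datasets",
    "train": ["anti_uav/images/train", "visiodect/images/train"],
    "val": ["anti_uav/images/val", "visiodect/images/val"],
    "names": {0: "drone"},
}

with open("combined.yaml", "w") as f:
    yaml.safe_dump(combined, f, sort_keys=False)
```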

4. Results

Each model was evaluated on a held-out test set using standard metrics.
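
The metrics can be read off the Ultralytics validation routine; a minimal sketch of scoring a single variant (the weights path and split name are illustrative):

```python
from ultralytics import YOLO

# Score one trained variant on the held-out split defined in the data YAML.
model = YOLO("runs/detect/m4_combined/weights/best.pt")
metrics = model.val(data="combined.yaml", split="test")

print(f"mAP@50:    {metrics.box.map50:.2f}")
print(f"mAP@50-95: {metrics.box.map:.2f}")
print(f"Precision: {metrics.box.mp:.2f}")
print(f"Recall:    {metrics.box.mr:.2f}")
```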

4.1 Model 1 (Baseline)
  • mAP@50: 0.92
  • mAP@50-95: 0.58
  • Precision: 0.91
  • Recall: 0.88

Reasonable baseline. Some false positives but generally distinguished drones from background.

4.2 Model 2 (+ Vertical Flip)
  • mAP@50: 0.93
  • mAP@50-95: 0.60
  • Precision: 0.92
  • Recall: 0.89

Slight improvement. Vertical flip had limited but positive impact.

4.3 Model 3 (+ Horizontal Flip)
  • mAP@50: 0.87
  • mAP@50-95: 0.52
  • Precision: 0.85
  • Recall: 0.84

Performance dropped. Confidence scores also decreased noticeably. The augmentation likely introduced unrealistic training examples.

4.4 Model 4 (Combined Data)
  • mAP@50: 0.95
  • mAP@50-95: 0.64
  • Precision: 0.94
  • Recall: 0.91

Best overall. Adding diverse real data helped more than augmentation tricks.

4.5 Comparison

Model 4 generalized best, but Model 1 was competitive on the original dataset. More data helps generalization but adds complexity. Not all augmentations are beneficial—horizontal flipping hurt this specific task.
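
The comparison is easier to eyeball as a chart; this just re-plots the mAP values reported above with Matplotlib:

```python
import matplotlib.pyplot as plt
import numpy as np

models = ["M1\nbaseline", "M2\n+vflip", "M3\n+hflip", "M4\ncombined"]
map50 = [0.92, 0.93, 0.87, 0.95]      # mAP@50 from the sections above
map50_95 = [0.58, 0.60, 0.52, 0.64]   # mAP@50-95 from the sections above

x = np.arange(len(models))
plt.bar(x - 0.2, map50, width=0.4, label="mAP@50")
plt.bar(x + 0.2, map50_95, width=0.4, label="mAP@50-95")
plt.xticks(x, models)
plt.ylim(0, 1)
plt.ylabel("Score")
plt.title("YOLOv8 variants on the held-out test set")
plt.legend()
plt.show()
```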

5. Demo Setup

To test beyond offline metrics, I built a simple streaming demo.

Hardware:

  • ESP32-CAM module streaming video over WiFi
  • Laptop running inference (the ESP32 doesn't have the power to run YOLO)

Software:

  • Model weights loaded in PyTorch
  • OpenCV for frame capture and visualization
  • Simple web interface for the detection feed (a minimal inference-loop sketch follows below)
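
A minimal version of that loop, assuming the stock ESP32-CAM streaming firmware and an illustrative IP address and weights path:

```python
import cv2
from ultralytics import YOLO

STREAM_URL = "http://192.168.1.50:81/stream"             # assumed ESP32-CAM stream address
model = YOLO("runs/detect/m4_combined/weights/best.pt")  # illustrative weights path

cap = cv2.VideoCapture(STREAM_URL)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, conf=0.4, verbose=False)  # confidence threshold is a guess
    annotated = results[0].plot()                    # draw boxes and scores on the frame
    cv2.imshow("Drone detection", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```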

Takeaway: the gap between test metrics and real-world performance is very real. The ESP32-CAM's lower resolution and different lighting conditions challenged the model. The demo was useful for understanding practical deployment issues, even if it's not a production setup.

6. What I Learned

  • Data variety matters more than augmentation tricks. Adding VisioDECT helped more than any augmentation I tried.
  • Not all augmentations help. Horizontal flipping hurt performance for this task.
  • Test metrics don't tell the whole story. Real-world conditions (lighting, resolution, distance) introduced challenges the test set didn't capture.
  • Building a demo teaches you a lot. Actually running the model on live video revealed issues I wouldn't have found otherwise.

7. Future Work

If I revisit this project:

  • Try photometric augmentations (brightness, contrast, noise) instead of geometric ones
  • Experiment with newer architectures (YOLOv9, RT-DETR)
  • Collect custom data that better matches deployment conditions
  • Explore actual edge deployment with model quantization

8. Resources

  • Anti-UAV Dataset
  • VisioDECT Dataset
  • Ultralytics YOLOv8
  • PyTorch

Results Summary

Model 4 (combined datasets) performed best overall. Dataset diversity mattered more than augmentation strategies. Horizontal flipping hurt performance, likely due to lighting inconsistencies.

  Model                        mAP@50   mAP@50-95   Precision   Recall
  Model 1 (baseline)           0.92     0.58        0.91        0.88
  Model 2 (+ vertical flip)    0.93     0.60        0.92        0.89
  Model 3 (+ horizontal flip)  0.87     0.52        0.85        0.84
  Model 4 (combined data)      0.95     0.64        0.94        0.91

Safety Notes

Software project—no physical risks. Watch for disk space with large datasets and GPU temps during training.