

**Note:** This is a first draft and rough AI translation of the original documentation. The content below may contain inconsistencies or unclear phrasing. I'll update it with a proper manual revision when I get the time. Thanks for your patience!

---

This started as a university assignment exploring object detection. The goal was to train a YOLOv8 model to detect drones and see what it takes to go from training to a working demo. I used two public datasets, experimented with augmentation strategies, and built a simple streaming setup with an ESP32-CAM. The project taught me a lot about dataset quality, the gap between test metrics and real-world performance, and the practical challenges of deploying ML models.

The project covers dataset preparation, model training with different configurations, evaluation, and a basic hardware demo. I trained four YOLOv8 variants to compare data strategies, then built a streaming setup to test detection on live video.

I used two public datasets to get variety in conditions and drone appearances.

The primary dataset was Anti-UAV, which contains ~200 training videos at 1080p. I sampled frames rather than using every frame to avoid redundancy. Most images show drones at long range against cloudy skies, which is good for distance detection but offers limited lighting variety.
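As a rough illustration, frame sampling can be done with OpenCV by keeping only every N-th frame of each clip; the sampling interval, folder layout, and file naming below are placeholder assumptions, not the exact script used.

```python
import cv2
from pathlib import Path

def sample_frames(video_path: Path, out_dir: Path, every_n: int = 30) -> int:
    """Save every `every_n`-th frame of a video as a JPEG; return how many were saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(str(out_dir / f"{video_path.stem}_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical layout: a folder of .mp4 clips in, one flat folder of frames out.
for clip in Path("anti_uav/videos").glob("*.mp4"):
    sample_frames(clip, Path("anti_uav/frames"))
```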

The second dataset, VisioDECT, was added for variety. It contains ~20k images across sunny, cloudy, rainy, and night conditions. The resolution is lower (852×480), but the drones appear closer and more detailed.
I tested geometric augmentations (vertical and horizontal flips) to see their effect on generalization. Results were mixed—more on this in the training section.
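For reference, these flips map onto the `flipud` and `fliplr` training arguments in Ultralytics YOLOv8 (per-image probabilities). A minimal sketch, with the model size, dataset YAML, and epoch count as placeholder assumptions:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # placeholder: pretrained nano weights
model.train(
    data="anti_uav.yaml",    # placeholder dataset config
    epochs=50,
    imgsz=640,
    flipud=0.5,              # vertical flip applied to 50% of images
    fliplr=0.0,              # horizontal flip off (Ultralytics defaults this to 0.5)
)
```

Note that Ultralytics enables several other augmentations (mosaic, HSV jitter) by default, so a truly augmentation-free run requires disabling those explicitly as well.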

All training used PyTorch and Ultralytics YOLOv8. I trained four model variants using a continuous-training approach, starting each new variant from the previous run's weights rather than from scratch (a sketch of this chain follows the list below).

- **Model 1:** Anti-UAV dataset only, no augmentations. This established a baseline to compare against.
- **Model 2:** Added vertical flip augmentation, on the idea that drone silhouettes might be orientation-independent.
- **Model 3:** Added horizontal flip augmentation. This turned out to hurt performance, likely because mirroring creates implausible lighting patterns (the sun's position becomes inconsistent).
- **Model 4:** Combined the Anti-UAV and VisioDECT datasets without the horizontal flip. The added variety in conditions and drone appearances improved generalization.
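A minimal sketch of that chain, assuming the default Ultralytics output layout (`runs/detect/<name>/weights/best.pt`); the run names, schedule, model size, dataset YAMLs, and which checkpoint each stage resumes from are my assumptions, not confirmed details:

```python
from ultralytics import YOLO

common = dict(epochs=50, imgsz=640)  # placeholder schedule shared by all stages

# Model 1: baseline on Anti-UAV only, flips disabled.
YOLO("yolov8n.pt").train(data="anti_uav.yaml", flipud=0.0, fliplr=0.0, name="model1", **common)

# Model 2: resume from Model 1's best weights, enable vertical flips.
YOLO("runs/detect/model1/weights/best.pt").train(
    data="anti_uav.yaml", flipud=0.5, fliplr=0.0, name="model2", **common)

# Model 3: resume from Model 2, add horizontal flips (this configuration hurt performance).
YOLO("runs/detect/model2/weights/best.pt").train(
    data="anti_uav.yaml", flipud=0.5, fliplr=0.5, name="model3", **common)

# Model 4: resume on combined Anti-UAV + VisioDECT data, horizontal flip off
# (keeping the vertical flip here is an assumption on my part).
YOLO("runs/detect/model3/weights/best.pt").train(
    data="combined.yaml", flipud=0.5, fliplr=0.0, name="model4", **common)
```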

Each model was evaluated on a held-out test set using standard metrics.
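For reference, each checkpoint can be scored with `model.val()`. A sketch, reusing the hypothetical run names from the training sketch above and assuming a `test` split is defined in the dataset YAML; the metric names are what Ultralytics reports, not the exact numbers tracked here:

```python
from ultralytics import YOLO

for run in ["model1", "model2", "model3", "model4"]:  # hypothetical run names
    model = YOLO(f"runs/detect/{run}/weights/best.pt")
    metrics = model.val(data="combined.yaml", split="test")  # held-out test split
    print(run,
          f"precision={metrics.box.mp:.3f}",
          f"recall={metrics.box.mr:.3f}",
          f"mAP50={metrics.box.map50:.3f}",
          f"mAP50-95={metrics.box.map:.3f}")
```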

- **Model 1:** A reasonable baseline. Some false positives, but it generally distinguished drones from background.
- **Model 2:** A slight improvement; the vertical flip had a limited but positive impact.
- **Model 3:** Performance dropped, and confidence scores decreased noticeably. The augmentation likely introduced unrealistic training examples.
- **Model 4:** Best overall. Adding diverse real data helped more than augmentation tricks.

Model 4 generalized best, but Model 1 was competitive on the original dataset. More data helps generalization but adds complexity. Not all augmentations are beneficial—horizontal flipping hurt this specific task.


To test beyond offline metrics, I built a simple streaming demo.
Hardware: an ESP32-CAM module streaming live video over Wi-Fi to a host machine.
Software: the trained YOLOv8 model running on the host through Ultralytics/PyTorch, consuming the camera stream frame by frame.
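A rough sketch of the demo loop, assuming the ESP32-CAM serves an MJPEG stream over HTTP (as the common example firmware does); the stream URL, weights path, and confidence threshold are placeholders:

```python
import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/model4/weights/best.pt")        # hypothetical path to the best weights
cap = cv2.VideoCapture("http://192.168.1.50:81/stream")   # ESP32-CAM MJPEG endpoint (placeholder IP)

while True:
    ok, frame = cap.read()
    if not ok:
        break  # stream dropped; a real setup would try to reconnect here
    results = model(frame, conf=0.4, verbose=False)  # run detection on the raw BGR frame
    annotated = results[0].plot()                    # draw boxes and confidence scores
    cv2.imshow("drone-detect", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```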
Takeaway: the gap between test metrics and real-world performance is real. The ESP32-CAM's lower resolution and different lighting conditions noticeably challenged the model. The demo was useful for understanding practical deployment issues, even if it's not a production setup.
If I revisit this project:
Model 4 (combined datasets) performed best overall. Dataset diversity mattered more than augmentation strategies. Horizontal flipping hurt performance, likely due to lighting inconsistencies.
This is a software-only project with no physical risks. Watch disk space when working with large datasets and keep an eye on GPU temperatures during training.