

**Note:** This is a first draft and rough AI translation of the original documentation. The content below may contain inconsistencies or unclear phrasing. I'll update it with a proper manual revision when I get the time. Thanks for your patience!

---

This started as a university assignment exploring object detection. The goal was to train a YOLOv8 model to detect drones and see what it takes to go from training to a working demo. I used two public datasets, experimented with augmentation strategies, and built a simple streaming setup with an ESP32-CAM. The project taught me a lot about dataset quality, the gap between test metrics and real-world performance, and the practical challenges of deploying ML models.

The project covers dataset preparation, model training with different configurations, evaluation, and a basic hardware demo. I trained four YOLOv8 variants to compare data strategies, then built a streaming setup to test detection on live video.

I used two public datasets to get variety in conditions and drone appearances.

The primary dataset was Anti-UAV, which contains ~200 training videos at 1080p. I sampled frames rather than using every frame to avoid redundancy. Most images show drones at long range against cloudy skies, which is good for distance detection but offers limited lighting variety.
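As a rough illustration, frame sampling can be done with OpenCV by keeping only every N-th frame of each clip; the sampling interval, folder layout, and file naming below are placeholder assumptions, not the exact script used.

```python
import cv2
from pathlib import Path

def sample_frames(video_path: Path, out_dir: Path, every_n: int = 30) -> int:
    """Save every `every_n`-th frame of a video as a JPEG; return how many were saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % every_n == 0:
            cv2.imwrite(str(out_dir / f"{video_path.stem}_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# Hypothetical layout: a folder of .mp4 clips in, one flat folder of frames out.
for clip in Path("anti_uav/videos").glob("*.mp4"):
    sample_frames(clip, Path("anti_uav/frames"))
```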

The second dataset, VisioDECT, was added for variety. It contains ~20k images across sunny, cloudy, rainy, and night conditions. The resolution is lower (852×480), but the drones appear closer and more detailed.
I tested geometric augmentations (vertical and horizontal flips) to see their effect on generalization. Results were mixed—more on this in the training section.
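For reference, these flips map onto the `flipud` and `fliplr` training arguments in Ultralytics YOLOv8 (per-image probabilities). A minimal sketch, with the model size, dataset YAML, and epoch count as placeholder assumptions:

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")   # placeholder: pretrained nano weights
model.train(
    data="anti_uav.yaml",    # placeholder dataset config
    epochs=50,
    imgsz=640,
    flipud=0.5,              # vertical flip applied to 50% of images
    fliplr=0.0,              # horizontal flip off (Ultralytics defaults this to 0.5)
)
```

Note that Ultralytics enables several other augmentations (mosaic, HSV jitter) by default, so a truly augmentation-free run requires disabling those explicitly as well.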

All training used PyTorch and Ultralytics YOLOv8. I trained four model variants using a continuous-training approach, starting each new variant from the previous run's weights rather than from scratch (a sketch of this chain follows the list below).

- **Model 1:** Anti-UAV dataset only, no augmentations. This established a baseline to compare against.
- **Model 2:** Added vertical flip augmentation, on the idea that drone silhouettes might be orientation-independent.
- **Model 3:** Added horizontal flip augmentation. This turned out to hurt performance, likely because mirroring creates implausible lighting patterns (the sun's position becomes inconsistent).
- **Model 4:** Combined the Anti-UAV and VisioDECT datasets without the horizontal flip. The added variety in conditions and drone appearances improved generalization.
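A minimal sketch of that chain, assuming the default Ultralytics output layout (`runs/detect/<name>/weights/best.pt`); the run names, schedule, model size, dataset YAMLs, and which checkpoint each stage resumes from are my assumptions, not confirmed details:

```python
from ultralytics import YOLO

common = dict(epochs=50, imgsz=640)  # placeholder schedule shared by all stages

# Model 1: baseline on Anti-UAV only, flips disabled.
YOLO("yolov8n.pt").train(data="anti_uav.yaml", flipud=0.0, fliplr=0.0, name="model1", **common)

# Model 2: resume from Model 1's best weights, enable vertical flips.
YOLO("runs/detect/model1/weights/best.pt").train(
    data="anti_uav.yaml", flipud=0.5, fliplr=0.0, name="model2", **common)

# Model 3: resume from Model 2, add horizontal flips (this configuration hurt performance).
YOLO("runs/detect/model2/weights/best.pt").train(
    data="anti_uav.yaml", flipud=0.5, fliplr=0.5, name="model3", **common)

# Model 4: resume on combined Anti-UAV + VisioDECT data, horizontal flip off
# (keeping the vertical flip here is an assumption on my part).
YOLO("runs/detect/model3/weights/best.pt").train(
    data="combined.yaml", flipud=0.5, fliplr=0.0, name="model4", **common)
```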

Each model was evaluated on a held-out test set using standard metrics.
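For reference, each checkpoint can be scored with `model.val()`. A sketch, reusing the hypothetical run names from the training sketch above and assuming a `test` split is defined in the dataset YAML; the metric names are what Ultralytics reports, not the exact numbers tracked here:

```python
from ultralytics import YOLO

for run in ["model1", "model2", "model3", "model4"]:  # hypothetical run names
    model = YOLO(f"runs/detect/{run}/weights/best.pt")
    metrics = model.val(data="combined.yaml", split="test")  # held-out test split
    print(run,
          f"precision={metrics.box.mp:.3f}",
          f"recall={metrics.box.mr:.3f}",
          f"mAP50={metrics.box.map50:.3f}",
          f"mAP50-95={metrics.box.map:.3f}")
```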

- **Model 1:** A reasonable baseline. Some false positives, but it generally distinguished drones from background.
- **Model 2:** A slight improvement; the vertical flip had a limited but positive impact.
- **Model 3:** Performance dropped, and confidence scores decreased noticeably. The augmentation likely introduced unrealistic training examples.
- **Model 4:** Best overall. Adding diverse real data helped more than augmentation tricks.

Model 4 generalized best, but Model 1 was competitive on the original dataset. More data helps generalization but adds complexity. Not all augmentations are beneficial—horizontal flipping hurt this specific task.


To test beyond offline metrics, I built a simple streaming demo.
Hardware: an ESP32-CAM module streaming live video over Wi-Fi to a host machine.
Software: the trained YOLOv8 model running on the host through Ultralytics/PyTorch, consuming the camera stream frame by frame.
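A rough sketch of the demo loop, assuming the ESP32-CAM serves an MJPEG stream over HTTP (as the common example firmware does); the stream URL, weights path, and confidence threshold are placeholders:

```python
import cv2
from ultralytics import YOLO

model = YOLO("runs/detect/model4/weights/best.pt")        # hypothetical path to the best weights
cap = cv2.VideoCapture("http://192.168.1.50:81/stream")   # ESP32-CAM MJPEG endpoint (placeholder IP)

while True:
    ok, frame = cap.read()
    if not ok:
        break  # stream dropped; a real setup would try to reconnect here
    results = model(frame, conf=0.4, verbose=False)  # run detection on the raw BGR frame
    annotated = results[0].plot()                    # draw boxes and confidence scores
    cv2.imshow("drone-detect", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```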
Takeaway: the gap between test metrics and real-world performance is real. The ESP32-CAM's lower resolution and different lighting conditions noticeably challenged the model. The demo was useful for understanding practical deployment issues, even if it's not a production setup.
If I revisit this project:
Model 4 (combined datasets) performed best overall. Dataset diversity mattered more than augmentation strategies. Horizontal flipping hurt performance, likely due to lighting inconsistencies.
This is a software-only project with no physical risks. Watch disk space when working with large datasets and keep an eye on GPU temperatures during training.