Dataset Preparation Tutorial
Learn how to prepare datasets for training using WilData.
Overview
This tutorial covers importing, transforming, and exporting datasets for wildlife detection training.
Prerequisites
- WilData installed
- Annotated images (COCO, YOLO, or Label Studio format)
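Before importing, it can help to sanity-check a COCO annotation file. The sketch below is not part of WilData; `check_coco` is a hypothetical helper that relies only on the standard COCO detection layout (top-level `images`, `annotations`, and `categories`, with a four-value `bbox` per annotation).

```python
def check_coco(data):
    """Return a list of problems found in a COCO-style detection dict.

    Hypothetical helper for illustration; not a WilData function.
    """
    problems = [
        f"missing top-level key: {key}"
        for key in ("images", "annotations", "categories")
        if key not in data
    ]
    if problems:
        return problems

    image_ids = {img["id"] for img in data["images"]}
    category_ids = {cat["id"] for cat in data["categories"]}

    for index, ann in enumerate(data["annotations"]):
        if ann.get("image_id") not in image_ids:
            problems.append(f"annotation {index}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {index}: unknown category_id")
        if len(ann.get("bbox", [])) != 4:
            problems.append(f"annotation {index}: bbox must be [x, y, w, h]")
    return problems


sample = {
    "images": [{"id": 1, "file_name": "img_0001.jpg", "width": 2000, "height": 1440}],
    "categories": [{"id": 1, "name": "animal"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1, "bbox": [650, 100, 200, 100]}],
}
print(check_coco(sample))
```

An empty list means the file passed these basic checks. It does not guarantee that WilData will accept the file.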
Step 1: Import Dataset
Option A: Using Config File
Create config.yaml:
source_path: "D:/annotations/dataset.json"
source_format: "coco"
dataset_name: "wildlife_train"
root: "D:/data"
split_name: "train"
transformations:
  enable_tiling: true
  tiling:
    tile_size: 800
    stride: 640
    min_visibility: 0.7
Run import:
Option B: Direct CLI
wildata import-dataset annotations.json \
  --format coco \
  --name wildlife_train \
  --enable-tiling \
  --tile-size 800
Step 2: Apply Transformations
Tiling for Large Images
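With `tile_size: 800` and `stride: 640`, adjacent tiles overlap by 160 pixels. The sketch below shows one common way to lay out such a sliding-window grid. It is an illustration under stated assumptions, not WilData's implementation: `tile_origins` and `tile_grid` are hypothetical names, and adding a final edge-aligned tile is an assumed policy that WilData may not follow.

```python
def tile_origins(length, tile_size=800, stride=640):
    """Return the start offsets of tiles along one image axis."""
    if length <= tile_size:
        return [0]
    origins = list(range(0, length - tile_size + 1, stride))
    # Assumed policy: add a last tile flush with the edge so the
    # whole axis is covered even when the stride does not divide evenly.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins


def tile_grid(width, height, tile_size=800, stride=640):
    """Return (x, y, w, h) windows covering a width x height image."""
    return [
        (x, y, min(tile_size, width), min(tile_size, height))
        for y in tile_origins(height, tile_size, stride)
        for x in tile_origins(width, tile_size, stride)
    ]


for window in tile_grid(2000, 1440):
    print(window)
```

A smaller stride gives more overlap, which reduces the chance of an animal being cut at a tile border but produces more tiles per image.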
Bbox Clipping
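When an image is tiled, boxes that cross a tile border have to be clipped to the tile. The sketch below assumes that `min_visibility` means the fraction of the original box area that must remain inside the tile; WilData's exact definition may differ. `clip_bbox_to_tile` is a hypothetical helper, and boxes use the COCO `[x, y, width, height]` convention.

```python
def clip_bbox_to_tile(bbox, tile, min_visibility=0.7):
    """Clip a COCO-style [x, y, w, h] box to a tile given as (x, y, w, h).

    Returns the clipped box in tile-local coordinates, or None when the
    visible fraction of the box falls below min_visibility.
    """
    bx, by, bw, bh = bbox
    tx, ty, tw, th = tile
    if bw <= 0 or bh <= 0:
        return None

    # Intersection of the box and the tile in image coordinates
    x1, y1 = max(bx, tx), max(by, ty)
    x2, y2 = min(bx + bw, tx + tw), min(by + bh, ty + th)
    if x2 <= x1 or y2 <= y1:
        return None  # no overlap with this tile

    visibility = ((x2 - x1) * (y2 - y1)) / (bw * bh)
    if visibility < min_visibility:
        return None

    # Shift to tile-local coordinates
    return [x1 - tx, y1 - ty, x2 - x1, y2 - y1]


tile = (0, 0, 800, 800)
print(clip_bbox_to_tile([650, 100, 200, 100], tile))  # 75% visible, kept
print(clip_bbox_to_tile([700, 100, 200, 100], tile))  # 50% visible, dropped
```

Dropping low-visibility boxes avoids training on slivers of an object. With overlapping tiles, a box dropped from one tile is often fully visible in a neighboring tile.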
Step 3: Create ROI Dataset
For classification training:
cd wildata
scripts\create-roi-dataset.bat
# Or with CLI
wildata create-roi-dataset --config configs/roi-create-config.yaml
Step 4: Visualize
Step 5: Export for Training
# Export to YOLO format
wildata dataset export wildlife_train --format yolo --output exports/yolo
# Export to COCO
wildata dataset export wildlife_train --format coco --output exports/coco
Complete Example
from wildata import DataPipeline

# Initialize
pipeline = DataPipeline("data")

# Import with transformations
result = pipeline.import_dataset(
    source_path="annotations.json",
    source_format="coco",
    dataset_name="wildlife_train",
    transformations={
        "enable_tiling": True,
        "tiling": {
            "tile_size": 800,
            "stride": 640
        }
    }
)

# Export for training
pipeline.export_dataset("wildlife_train", "yolo")
Next Steps
- Model Training Tutorial
- WilData Scripts Reference