WilData Scripts Reference
This page documents all batch scripts available in the WilData package for data management operations.
Overview
All scripts are located in the wildata/scripts/ directory.
Quick Reference
| Script | Purpose | Config File |
|---|---|---|
| import-dataset-example.bat | Import single dataset | configs/import-config-example.yaml |
| bulk-import-dataset.bat | Bulk import datasets | configs/bulk-import-*.yaml |
| create-roi-dataset.bat | Create ROI dataset | configs/roi-create-config.yaml |
| bulk-roi-create-config.bat | Bulk create ROI datasets | configs/bulk-roi-create-config.yaml |
| update-gps-example.bat | Update GPS from CSV | configs/gps-update-config-example.yaml |
| visualize_data.bat | Visualize dataset | None |
| dvc-setup.bat | Setup DVC | None |
| launch_api.bat | Launch REST API | .env |
| running_tests.bat | Run tests | None |
Data Import Scripts
import-dataset-example.bat
Purpose: Import a single dataset from COCO, YOLO, or Label Studio format.
Location: wildata/scripts/import-dataset-example.bat
Command:
Configuration: wildata/configs/import-config-example.yaml
Key Parameters:
source_path: "path/to/annotations.json"
source_format: "coco" # coco, yolo, ls
dataset_name: "my_dataset"
root: "data"
split_name: "train" # train, val, test
transformations:
enable_bbox_clipping: true
enable_tiling: true
tiling:
tile_size: 800
stride: 640
roi_config:
roi_box_size: 384
random_roi_count: 2
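With tile_size: 800 and stride: 640, neighbouring tiles share 800 - 640 = 160 pixels. The sketch below shows one way tile start offsets along a single image axis could be derived from those two values. The function name tile_origins and the edge handling (adding a final tile flush with the image border) are assumptions for illustration, not WilData's confirmed behaviour.

```python
def tile_origins(length: int, tile_size: int = 800, stride: int = 640) -> list[int]:
    """Return tile start offsets along one image axis (hypothetical helper)."""
    if length <= tile_size:
        # A single tile already covers the whole axis.
        return [0]
    origins = list(range(0, length - tile_size + 1, stride))
    # Assumption: add a last tile aligned to the border so no pixels are dropped.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins


overlap = 800 - 640          # pixels shared by two neighbouring tiles
xs = tile_origins(2000)      # offsets for a 2000-pixel-wide image
print(overlap, xs)
```

A smaller stride increases the overlap, which lowers the chance of an object being cut at a tile border at the cost of more tiles per image.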
Example Usage:
cd wildata
# Edit config file first
notepad configs\import-config-example.yaml
# Run import
scripts\import-dataset-example.bat
Output:
- Master format dataset in data/datasets/
- Processed images (tiled if enabled)
- ROI dataset (if configured)
bulk-import-dataset.bat
Purpose: Import multiple datasets in batch mode.
Location: wildata/scripts/bulk-import-dataset.bat
Command:
Configuration: wildata/configs/bulk-import-train.yaml or bulk-import-val.yaml
Parameters:
- -n 2: Number of parallel workers (uses threading on Windows)
- --config: Path to bulk import config
Example Config:
# configs/bulk-import-train.yaml
source_paths:
- "D:/annotations/dataset1.json"
- "D:/annotations/dataset2.json"
- "D:/annotations/dataset3.json"
source_format: "coco"
root: "D:/data"
split_name: "train"
# Shared transformation settings
transformations:
enable_tiling: true
tiling:
tile_size: 800
stride: 640
Example Usage:
Features:
- Parallel processing (thread-based)
- Progress tracking
- Error handling per dataset
- Summary report
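The thread-based parallelism and per-dataset error handling listed above can be pictured with the following minimal sketch. It is not WilData's implementation: import_one is a hypothetical stand-in for a single dataset import, and the summary structure is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def import_one(source_path: str) -> str:
    """Hypothetical stand-in for importing a single dataset."""
    if not source_path.endswith(".json"):
        raise ValueError(f"unsupported annotation file: {source_path}")
    return f"imported {source_path}"


def bulk_import(source_paths: list[str], workers: int = 2) -> dict[str, list]:
    """Import datasets in a thread pool and collect a per-dataset summary."""
    summary: dict[str, list] = {"ok": [], "failed": []}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(import_one, path): path for path in source_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                future.result()
                summary["ok"].append(path)
            except Exception as exc:
                # One failing dataset is recorded and does not stop the others.
                summary["failed"].append((path, str(exc)))
    return summary


summary = bulk_import(["dataset1.json", "dataset2.txt", "dataset3.json"])
print(summary)
```

Here workers plays the role of the -n option: each dataset runs in its own task, and a failure is reported in the summary instead of aborting the batch.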
ROI Dataset Scripts
create-roi-dataset.bat
Purpose: Create a Region of Interest (ROI) classification dataset from detection annotations.
Location: wildata/scripts/create-roi-dataset.bat
Command:
Configuration: wildata/configs/roi-create-config.yaml
Key Parameters:
source_path: "annotations.json"
source_format: "coco"
dataset_name: "roi_dataset"
roi_config:
roi_box_size: 128 # Size of extracted ROI
min_roi_size: 32 # Min object size to extract
random_roi_count: 10 # Background samples per image
background_class: "background"
save_format: "jpg"
quality: 95
Use Cases:
- Hard sample mining
- Error analysis
- Training classification models
- Creating balanced datasets
Example Usage:
Output:
- ROI image crops
- Classification labels
- Class mapping JSON
- Statistics file
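To illustrate how roi_box_size and min_roi_size might interact, the sketch below computes a fixed-size crop window centred on a COCO-style bounding box. The helper roi_box is hypothetical, and two details are assumptions: the object is skipped when its larger side is below min_roi_size, and the window is shifted back inside the image instead of being padded.

```python
def roi_box(bbox, image_w, image_h, roi_box_size=128, min_roi_size=32):
    """Return (left, top, right, bottom) for a crop, or None if the object is too small.

    bbox is COCO-style [x, y, width, height]. Assumes the image is at least
    roi_box_size pixels in both dimensions.
    """
    x, y, w, h = bbox
    if max(w, h) < min_roi_size:
        return None  # assumption: tiny objects are not extracted
    center_x, center_y = x + w / 2, y + h / 2
    half = roi_box_size / 2
    left = int(round(center_x - half))
    top = int(round(center_y - half))
    # Shift the window so it stays fully inside the image.
    left = max(0, min(left, image_w - roi_box_size))
    top = max(0, min(top, image_h - roi_box_size))
    return (left, top, left + roi_box_size, top + roi_box_size)


print(roi_box([100, 100, 50, 40], image_w=1000, image_h=1000))
print(roi_box([10, 10, 8, 8], image_w=1000, image_h=1000))
```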
bulk-roi-create.bat
Purpose: Create multiple ROI datasets in batch.
Location: Not shown here; the script is referenced in the configs.
Configuration: wildata/configs/bulk-roi-create-config.yaml
Example Config:
source_paths:
- "dataset1.json"
- "dataset2.json"
source_format: "coco"
split_name: "val"
roi_config:
roi_box_size: 128
random_roi_count: 5
GPS Management Scripts
update-gps-example.bat
Purpose: Update image EXIF GPS data from a CSV file.
Location: wildata/scripts/update-gps-example.bat
Command:
Configuration: wildata/configs/gps-update-config-example.yaml
Key Parameters:
image_folder: "path/to/images"
csv_path: "gps_coordinates.csv"
output_dir: "output/images"
skip_rows: 0
filename_col: "filename"
lat_col: "latitude"
lon_col: "longitude"
alt_col: "altitude"
CSV Format:
filename,latitude,longitude,altitude
image1.jpg,40.7128,-74.0060,10.5
image2.jpg,40.7589,-73.9851,15.2
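EXIF stores latitude and longitude as degrees, minutes, and seconds with a separate hemisphere reference, so decimal coordinates from the CSV have to be converted before they can be written. The sketch below reads the CSV layout shown above using the configured column names and performs that conversion. The helper names are hypothetical, skip_rows is assumed to count lines before the header, and writing the values into the image files (which needs a third-party EXIF library) is left out.

```python
import csv
from fractions import Fraction

SAMPLE_CSV = """filename,latitude,longitude,altitude
image1.jpg,40.7128,-74.0060,10.5
image2.jpg,40.7589,-73.9851,15.2
"""


def to_dms(value: float):
    """Convert decimal degrees to (degrees, minutes, seconds), seconds as a Fraction."""
    # Work in hundredths of an arc second to avoid float rollover (e.g. 60.0 seconds).
    total = round(abs(value) * 360000)
    degrees, remainder = divmod(total, 360000)
    minutes, hundredths = divmod(remainder, 6000)
    return degrees, minutes, Fraction(hundredths, 100)


def read_gps_rows(csv_text, filename_col="filename", lat_col="latitude",
                  lon_col="longitude", alt_col="altitude", skip_rows=0):
    """Yield one dict per image with EXIF-style GPS values (hypothetical helper)."""
    lines = csv_text.splitlines()[skip_rows:]
    for row in csv.DictReader(lines):
        lat, lon = float(row[lat_col]), float(row[lon_col])
        yield {
            "filename": row[filename_col],
            "lat": to_dms(lat), "lat_ref": "N" if lat >= 0 else "S",
            "lon": to_dms(lon), "lon_ref": "E" if lon >= 0 else "W",
            "alt": float(row[alt_col]),
        }


rows = list(read_gps_rows(SAMPLE_CSV))
for row in rows:
    print(row)
```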
Example Usage:
cd wildata
# Prepare CSV with GPS data
# Edit config
notepad configs\gps-update-config-example.yaml
# Run update
scripts\update-gps-example.bat
Output:
- Images with updated EXIF GPS
- Summary report
- Error log (if any)
Visualization Scripts
visualize_data.bat
Purpose: Launch FiftyOne visualization for datasets.
Location: wildata/scripts/visualize_data.bat
Command:
Example Usage:
cd wildata
# Visualize training set
uv run wildata visualize-dataset --dataset my_dataset --split train
# Or use script
scripts\visualize_data.bat
Features:
- Interactive dataset viewer
- Annotation visualization
- Filtering and search
- Statistics display
DVC Scripts
dvc-setup.bat
Purpose: Initialize and configure DVC for data versioning.
Location: wildata/scripts/dvc-setup.bat
Command:
Storage Options:
Example Usage:
cd wildata
scripts\dvc-setup.bat
# Track data
dvc add data\datasets\my_dataset
# Commit DVC file
git add data\datasets\my_dataset.dvc
git commit -m "Add dataset"
# Push to remote
dvc push
DVC Workflow:
API Scripts
launch_api.bat
Purpose: Launch the WilData REST API server.
Location: wildata/scripts/launch_api.bat
Command:
Default Port: 8441
Example Usage:
Access:
- API: http://localhost:8441
- Docs: http://localhost:8441/docs
- Redoc: http://localhost:8441/redoc
API Endpoints:
Import Dataset
POST /api/v1/datasets/import
Content-Type: application/json
{
"source_path": "/path/to/data.json",
"source_format": "coco",
"dataset_name": "my_dataset",
"root": "data"
}
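As a rough illustration, the import request above could be built from Python with only the standard library. The endpoint path, port, and payload fields come from this page; the helper name build_import_request is hypothetical, and the shape of the server's response is not documented here, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_import_request(base_url: str = "http://localhost:8441") -> urllib.request.Request:
    """Build the POST request for the dataset import endpoint (hypothetical helper)."""
    payload = {
        "source_path": "/path/to/data.json",
        "source_format": "coco",
        "dataset_name": "my_dataset",
        "root": "data",
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/datasets/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_import_request()
print(request.get_method(), request.full_url)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```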
List Datasets
Create ROI Dataset
POST /api/v1/roi/create
Content-Type: application/json
{
"source_path": "/path/to/data.json",
"source_format": "coco",
"dataset_name": "roi_dataset",
"roi_config": {
"roi_box_size": 128,
"random_roi_count": 10
}
}
Job Status
Environment Variables:
Full API Documentation: See WilData API Reference
Testing Scripts
running_tests.bat
Purpose: Run the WilData test suite.
Location: wildata/scripts/running_tests.bat
Command:
Example Usage:
Test Categories:
- Format adapter tests
- Transformation tests
- Validation tests
- API tests
- Integration tests
Run Specific Tests:
# Test imports
uv run pytest tests/test_coco_import.py -v
# Test transformations
uv run pytest tests/test_transformations.py -v
# Test API
uv run pytest tests/api/ -v
# With coverage
uv run pytest --cov=wildata tests/
Common Workflows
Dataset Preparation Workflow
# 1. Import dataset
cd wildata
scripts\import-dataset-example.bat
# 2. Visualize
scripts\visualize_data.bat
# 3. Export for training
uv run wildata dataset export my_dataset --format yolo
ROI Extraction Workflow
# 1. Import detection dataset
scripts\import-dataset-example.bat
# 2. Create ROI dataset
scripts\create-roi-dataset.bat
# 3. Visualize ROI dataset
uv run wildata visualize-dataset --dataset roi_dataset
GPS Management Workflow
# 1. Extract GPS from images
# (using WildDetect extract_gps.bat)
# 2. Update GPS if needed
cd wildata
scripts\update-gps-example.bat
# 3. Verify GPS data
# Check EXIF data in images
DVC Workflow
# Setup (once)
cd wildata
scripts\dvc-setup.bat
# After each dataset import
dvc add data\datasets\new_dataset
git add data\datasets\new_dataset.dvc
git commit -m "Add new dataset"
dvc push
# On other machines
git pull
dvc pull
Configuration Examples
Complete Import Config
# configs/import-config-example.yaml
source_path: "D:/annotations/dataset.json"
source_format: "coco"
dataset_name: "wildlife_train"
root: "D:/data"
split_name: "train"
processing_mode: "batch"
# Label Studio integration
ls_xml_config: "configs/label_studio_config.xml"
ls_parse_config: false
# ROI extraction
disable_roi: false
roi_config:
random_roi_count: 2
roi_box_size: 384
min_roi_size: 32
background_class: "background"
sample_background: true
# Transformations
transformations:
enable_bbox_clipping: true
bbox_clipping:
tolerance: 5
skip_invalid: false
enable_tiling: true
tiling:
tile_size: 800
stride: 640
min_visibility: 0.7
max_negative_tiles_in_negative_image: 2
dark_threshold: 0.7
enable_augmentation: false
augmentation:
rotation_range: [-45, 45]
probability: 1.0
num_transforms: 2
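The min_visibility setting is not defined on this page. A common interpretation, assumed here, is the fraction of a bounding box's area that falls inside a tile: boxes below the threshold are dropped from that tile. The sketch below shows that computation with hypothetical helper names.

```python
def visible_fraction(bbox, tile) -> float:
    """Fraction of the bbox area inside the tile; both are (left, top, right, bottom)."""
    box_left, box_top, box_right, box_bottom = bbox
    tile_left, tile_top, tile_right, tile_bottom = tile
    inter_w = max(0, min(box_right, tile_right) - max(box_left, tile_left))
    inter_h = max(0, min(box_bottom, tile_bottom) - max(box_top, tile_top))
    area = (box_right - box_left) * (box_bottom - box_top)
    return (inter_w * inter_h) / area if area > 0 else 0.0


def keep_in_tile(bbox, tile, min_visibility: float = 0.7) -> bool:
    """Assumed rule: keep the annotation only if enough of it is visible in the tile."""
    return visible_fraction(bbox, tile) >= min_visibility


tile = (0, 0, 800, 800)
print(keep_in_tile((700, 100, 820, 200), tile))  # mostly inside the tile
print(keep_in_tile((700, 100, 900, 200), tile))  # half of the box is cut off
```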
Troubleshooting
Import Fails
Issue: Dataset import fails with validation errors
Solutions:
1. Check that the source file format is correct
2. Verify that all image paths are valid
3. Check that bbox coordinates are within image bounds
4. Use the --verbose flag for detailed error output
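For COCO sources, the bounding-box check in step 3 can be done before running the import. The sketch below is a standalone helper, not part of WilData: it lists the annotation ids whose [x, y, width, height] box leaves the image or has a non-positive size. The tolerance argument is an assumption, loosely mirroring the bbox_clipping tolerance setting.

```python
def find_out_of_bounds(coco: dict, tolerance: float = 0.0) -> list[int]:
    """Return ids of annotations whose bbox is degenerate or outside its image."""
    sizes = {img["id"]: (img["width"], img["height"]) for img in coco["images"]}
    bad = []
    for ann in coco["annotations"]:
        width, height = sizes[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        outside = (
            x < -tolerance
            or y < -tolerance
            or x + w > width + tolerance
            or y + h > height + tolerance
        )
        if outside or w <= 0 or h <= 0:
            bad.append(ann["id"])
    return bad


sample = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "bbox": [10, 10, 100, 100]},
        {"id": 11, "image_id": 1, "bbox": [600, 400, 100, 100]},
        {"id": 12, "image_id": 1, "bbox": [50, 50, 0, 20]},
    ],
}
print(find_out_of_bounds(sample))
```

For a real dataset, load the annotation file with json.load and pass the resulting dict to the helper.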
DVC Push Fails
Issue: Can't push data to remote
Solutions:
1. Verify the remote credentials
2. Check the network connection
3. Verify that the remote storage path exists
4. Run dvc remote list to check the remote configuration
API Won't Start
Issue: API server fails to start
Solutions:
1. Check that port 8441 is not in use
2. Verify the .env file configuration
3. Check that all dependencies are installed
4. Check the error logs for details
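Step 1 can be checked from Python with a short, standalone sketch like the one below. It simply tries to connect to the port on the local machine; the helper name port_in_use is illustrative and not part of WilData.

```python
import socket


def port_in_use(port: int = 8441, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if port_in_use(8441):
    print("Port 8441 is already in use; stop the other process or change the port.")
else:
    print("Port 8441 is free.")
```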
Out of Memory
Issue: Import fails with memory error
Solutions:
1. Use processing_mode: "streaming"
2. Reduce number of parallel workers
3. Process datasets one at a time
4. Disable transformations temporarily