WilData Configuration Reference
Documentation for all WilData configuration files used in data management operations.
Configuration Files
| File | Purpose |
|---|---|
| import-config-example.yaml | Dataset import configuration |
| bulk-import-*.yaml | Bulk import configurations |
| roi-create-config.yaml | ROI dataset creation |
| bulk-roi-create-config.yaml | Bulk ROI creation |
| gps-update-config-example.yaml | GPS metadata update |
| label_studio_config.xml | Label Studio interface |
import-config-example.yaml
Purpose: Configure dataset import with transformations
Location: wildata/configs/import-config-example.yaml
Complete Configuration
# Required: Source Information
source_path: "D:/annotations/dataset.json"
source_format: "coco" # coco, yolo, ls (Label Studio)
dataset_name: "my_dataset"
# Pipeline Configuration
root: "D:/data"
split_name: "train" # train, val, test
enable_dvc: false
processing_mode: "batch" # batch, streaming
track_with_dvc: false
bbox_tolerance: 5
# Label Studio Options (for ls format)
dotenv_path: ".env"
ls_xml_config: "configs/label_studio_config.xml"
ls_parse_config: false
# ROI Configuration
disable_roi: false
roi_config:
  random_roi_count: 2 # Background samples per image
  roi_box_size: 384 # ROI size (pixels)
  min_roi_size: 32 # Minimum object size
  dark_threshold: 0.7 # Dark image threshold
  background_class: "background"
  save_format: "jpg" # jpg, png
  quality: 95 # JPEG quality
  sample_background: true # Sample background regions
# Transformation Pipeline
transformations:
  # Bbox Clipping
  enable_bbox_clipping: true
  bbox_clipping:
    tolerance: 5 # Pixels outside image allowed
    skip_invalid: false # Skip invalid bboxes
  # Data Augmentation
  enable_augmentation: false
  augmentation:
    rotation_range: [-45, 45] # Rotation degrees
    probability: 1.0 # Augmentation probability
    brightness_range: [-0.2, 0.4]
    scale: [1.0, 2.0]
    translate: [-0.1, 0.2]
    shear: [-5, 5]
    contrast_range: [-0.2, 0.4]
    noise_std: [0.01, 0.1]
    seed: 41
    num_transforms: 2 # Augmentations per image
  # Image Tiling
  enable_tiling: true
  tiling:
    tile_size: 800
    stride: 640 # Tile stride
    min_visibility: 0.7 # Min object visibility
    max_negative_tiles_in_negative_image: 2
    negative_positive_ratio: 1.0
    dark_threshold: 0.7
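To make the interaction between tile_size and stride concrete, here is a minimal Python sketch of how tile origins could be laid out. With tile_size 800 and stride 640, neighbouring tiles overlap by 160 pixels. The function names `tile_origins` and `tile_grid` are hypothetical, and the edge handling (adding a final tile flush with the image border) is an assumption; WilData's own tiling may treat borders differently.

```python
from typing import List, Tuple


def tile_origins(length: int, tile_size: int = 800, stride: int = 640) -> List[int]:
    """Start offsets of tiles along one image axis (hypothetical helper)."""
    if length <= tile_size:
        # The image fits in a single tile along this axis.
        return [0]
    origins = list(range(0, length - tile_size + 1, stride))
    # Assumption: add a last tile flush with the border so no pixels are dropped.
    if origins[-1] + tile_size < length:
        origins.append(length - tile_size)
    return origins


def tile_grid(width: int, height: int,
              tile_size: int = 800, stride: int = 640) -> List[Tuple[int, int]]:
    """Top-left (x, y) corner of every tile, in row-major order."""
    return [(x, y)
            for y in tile_origins(height, tile_size, stride)
            for x in tile_origins(width, tile_size, stride)]


if __name__ == "__main__":
    print(tile_origins(2000))           # offsets along a 2000-pixel axis
    print(len(tile_grid(2000, 1440)))   # number of tiles for a 2000x1440 image
```

A smaller stride increases the overlap and the number of tiles, which raises the chance that an object near a tile border appears whole in at least one tile.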
bulk-import Configs
Purpose: Configure batch import of multiple datasets
Files:
- bulk-import-train.yaml
- bulk-import-val.yaml
- bulk-import-config-example.yaml
Configuration Format
source_paths:
- "D:/annotations/dataset1.json"
- "D:/annotations/dataset2.json"
- "D:/annotations/dataset3.json"
source_format: "coco"
root: "D:/data"
split_name: "train"
# Shared settings (same as import-config)
processing_mode: "batch"
bbox_tolerance: 5
transformations:
  enable_tiling: true
  tiling:
    tile_size: 800
    stride: 640
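One way to think about the bulk format is as a single-import config repeated once per entry in source_paths, with every other key shared. The Python sketch below illustrates that reading on a plain dictionary. The helper `expand_bulk_config` is hypothetical, and deriving dataset_name from the file stem is an assumption; it does not describe how WilData itself names bulk-imported datasets.

```python
import copy
from pathlib import PurePosixPath
from typing import Dict, List


def expand_bulk_config(bulk: Dict) -> List[Dict]:
    """Turn one bulk config into a list of single-import configs (illustrative)."""
    shared = {key: value for key, value in bulk.items() if key != "source_paths"}
    configs = []
    for path in bulk["source_paths"]:
        # Deep copy so nested sections such as transformations are not shared.
        config = copy.deepcopy(shared)
        config["source_path"] = path
        # Assumption: name each dataset after its annotation file.
        config["dataset_name"] = PurePosixPath(path).stem
        configs.append(config)
    return configs


if __name__ == "__main__":
    bulk = {
        "source_paths": [
            "D:/annotations/dataset1.json",
            "D:/annotations/dataset2.json",
        ],
        "source_format": "coco",
        "split_name": "train",
        "transformations": {"enable_tiling": True},
    }
    for config in expand_bulk_config(bulk):
        print(config["dataset_name"], config["source_path"])
```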
roi-create-config.yaml
Purpose: Configure ROI dataset creation
Location: wildata/configs/roi-create-config.yaml
Configuration
source_path: "annotations.json"
source_format: "coco"
dataset_name: "roi_dataset"
root: "data"
split_name: "val" # Usually val or test
bbox_tolerance: 5
roi_config:
  roi_box_size: 128 # ROI crop size
  min_roi_size: 32 # Min object size to extract
  random_roi_count: 10 # Background samples per image
  dark_threshold: 0.7
  background_class: "background"
  save_format: "jpg"
  quality: 95
  padding: 10 # Padding around object
  sample_background: true
# Advanced options
aspect_ratio_range: [0.5, 2.0] # Valid aspect ratios
min_object_area: 32 # Min area (pixels²)
ls_xml_config: null
ls_parse_config: false
draw_original_bboxes: false
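As a rough illustration of what roi_box_size and min_roi_size might control, the Python sketch below centres a fixed-size square crop on an object's bounding box and keeps it inside the image. This is one plausible reading only: the function `roi_crop_box` is hypothetical, the rule "skip objects whose shorter side is below min_roi_size" is an assumption, and the padding option is deliberately left out because its exact effect is not specified here.

```python
from typing import Optional, Sequence, Tuple


def roi_crop_box(bbox: Sequence[float],
                 image_size: Tuple[int, int],
                 roi_box_size: int = 128,
                 min_roi_size: int = 32) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) for a crop around a COCO [x, y, w, h] box."""
    x, y, w, h = bbox
    img_w, img_h = image_size
    # Assumption: objects smaller than min_roi_size on either side are skipped.
    if min(w, h) < min_roi_size:
        return None
    # Centre the crop on the object.
    left = int(round(x + w / 2 - roi_box_size / 2))
    top = int(round(y + h / 2 - roi_box_size / 2))
    # Shift the crop back inside the image where it would spill over a border.
    left = max(0, min(left, img_w - roi_box_size))
    top = max(0, min(top, img_h - roi_box_size))
    right = min(left + roi_box_size, img_w)
    bottom = min(top + roi_box_size, img_h)
    return (left, top, right, bottom)


if __name__ == "__main__":
    print(roi_crop_box([100, 100, 40, 40], (1000, 1000)))
    print(roi_crop_box([10, 10, 8, 8], (1000, 1000)))  # too small, skipped
```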
bulk-roi-create-config.yaml
Purpose: Bulk ROI dataset creation
Configuration
source_paths:
- "dataset1.json"
- "dataset2.json"
source_format: "coco"
split_name: "val"
roi_config:
  roi_box_size: 128
  random_roi_count: 5
  background_class: "background"
gps-update-config-example.yaml
Purpose: Update image GPS metadata from a CSV file
Location: wildata/configs/gps-update-config-example.yaml
Configuration
image_folder: "D:/images/"
csv_path: "gps_coordinates.csv"
output_dir: "D:/images_with_gps/"
# CSV Parsing
skip_rows: 0
filename_col: "filename"
lat_col: "latitude"
lon_col: "longitude"
alt_col: "altitude"
# Options
overwrite_existing: false # Overwrite existing GPS
create_backup: true # Backup original files
validate_coordinates: true # Validate GPS coordinates
CSV Format
filename,latitude,longitude,altitude
image001.jpg,40.7128,-74.0060,10.5
image002.jpg,40.7589,-73.9851,15.2
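Before running a GPS update it can help to check the CSV independently. The Python sketch below reads a file in the format above using the standard csv module, with column names matching the filename_col, lat_col, lon_col, and alt_col settings, and rejects rows whose coordinates fall outside the valid latitude and longitude ranges. The function `read_gps_rows` is hypothetical and is not the check that validate_coordinates performs inside WilData.

```python
import csv
from typing import Dict, List, Tuple


def read_gps_rows(csv_text: str,
                  filename_col: str = "filename",
                  lat_col: str = "latitude",
                  lon_col: str = "longitude",
                  alt_col: str = "altitude",
                  skip_rows: int = 0) -> Tuple[List[Dict], List[str]]:
    """Split CSV rows into valid GPS records and rejected filenames."""
    lines = csv_text.splitlines()[skip_rows:]
    reader = csv.DictReader(lines)
    missing = {filename_col, lat_col, lon_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing CSV columns: " + ", ".join(sorted(missing)))

    valid, rejected = [], []
    for row in reader:
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
            raw_alt = row.get(alt_col)
            alt = float(raw_alt) if raw_alt else None  # altitude is optional here
        except (TypeError, ValueError):
            rejected.append(row[filename_col])
            continue
        # Latitude must lie in [-90, 90] and longitude in [-180, 180].
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            rejected.append(row[filename_col])
            continue
        valid.append({"filename": row[filename_col], "lat": lat, "lon": lon, "alt": alt})
    return valid, rejected


if __name__ == "__main__":
    sample = (
        "filename,latitude,longitude,altitude\n"
        "image001.jpg,40.7128,-74.0060,10.5\n"
        "image002.jpg,40.7589,-73.9851,15.2\n"
    )
    records, bad = read_gps_rows(sample)
    print(len(records), "valid rows,", len(bad), "rejected")
```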
label_studio_config.xml
Purpose: Label Studio annotation interface configuration
Location: wildata/configs/label_studio_config.xml
Example Configuration
<View>
<Image name="image" value="$image"/>
<RectangleLabels name="label" toName="image">
<Label value="elephant" background="red"/>
<Label value="giraffe" background="blue"/>
<Label value="zebra" background="green"/>
<Label value="buffalo" background="yellow"/>
</RectangleLabels>
</View>
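To see which class names an interface config like the one above declares, the XML can be read with Python's standard xml.etree.ElementTree module. The sketch below collects each Label value and its background colour from the RectangleLabels element. The helper `rectangle_labels` is hypothetical and only illustrates the file's structure; it is not how ls_parse_config is implemented.

```python
import xml.etree.ElementTree as ET
from typing import Dict

CONFIG_XML = """
<View>
  <Image name="image" value="$image"/>
  <RectangleLabels name="label" toName="image">
    <Label value="elephant" background="red"/>
    <Label value="giraffe" background="blue"/>
    <Label value="zebra" background="green"/>
    <Label value="buffalo" background="yellow"/>
  </RectangleLabels>
</View>
"""


def rectangle_labels(xml_text: str) -> Dict[str, str]:
    """Map each rectangle label name to its background colour."""
    root = ET.fromstring(xml_text)
    labels = {}
    for group in root.iter("RectangleLabels"):
        for label in group.iter("Label"):
            labels[label.get("value")] = label.get("background")
    return labels


if __name__ == "__main__":
    for name, colour in rectangle_labels(CONFIG_XML).items():
        print(name, colour)
```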
Configuration Examples
Import COCO with Tiling
source_path: "D:/coco/annotations.json"
source_format: "coco"
dataset_name: "wildlife_tiled"
root: "D:/data"
split_name: "train"
transformations:
  enable_tiling: true
  tiling:
    tile_size: 800
    stride: 640
    min_visibility: 0.7
Import Label Studio
source_path: "D:/label_studio/export.json"
source_format: "ls"
dataset_name: "annotated_data"
ls_xml_config: "configs/label_studio_config.xml"
ls_parse_config: true
roi_config:
  roi_box_size: 128
  random_roi_count: 5
Import with Full Pipeline
source_path: "raw_annotations.json"
source_format: "coco"
dataset_name: "processed_dataset"
transformations:
  enable_bbox_clipping: true
  enable_tiling: true
  tiling:
    tile_size: 800
    stride: 640
  enable_augmentation: true
  augmentation:
    num_transforms: 2
    probability: 0.8
roi_config:
  roi_box_size: 384
  random_roi_count: 10
Best Practices
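- Use absolute paths so that a config resolves the same way from any working directory
- Set enable_bbox_clipping: true to clip boxes that extend slightly past the image boundary
- Tile large images (enable_tiling: true) so that training samples have a fixed, manageable size
- Sample background ROIs (sample_background: true) to balance object and background classes
- Keep configuration files under version control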
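The bbox clipping advice can be illustrated with a short Python sketch. It assumes COCO-style [x, y, width, height] boxes and reads tolerance as the largest number of pixels a box may extend past the image before it is considered invalid rather than clipped. The function `clip_bbox` is hypothetical, and what happens to an invalid box (dropped or reported, presumably governed by skip_invalid) is left to the caller, since that behaviour is an assumption here.

```python
from typing import List, Optional, Sequence, Tuple


def clip_bbox(bbox: Sequence[float],
              image_size: Tuple[int, int],
              tolerance: float = 5) -> Optional[List[float]]:
    """Clip a COCO [x, y, w, h] box to the image, or return None if it is too far out."""
    x, y, w, h = bbox
    img_w, img_h = image_size
    x2, y2 = x + w, y + h
    # Largest distance, in pixels, by which any side leaves the image.
    overshoot = max(-x, -y, x2 - img_w, y2 - img_h, 0)
    if overshoot > tolerance:
        # Assumption: beyond the tolerance the annotation is treated as invalid.
        return None
    left, top = max(x, 0), max(y, 0)
    right, bottom = min(x2, img_w), min(y2, img_h)
    return [left, top, right - left, bottom - top]


if __name__ == "__main__":
    print(clip_bbox([-3, 10, 50, 50], (640, 480)))   # small overshoot, clipped
    print(clip_bbox([-20, 10, 50, 50], (640, 480)))  # large overshoot, invalid
```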