Environment Setup

This guide covers setting up your environment for working with the WildDetect monorepo, including configuration files, environment variables, and external services.

Directory Structure

Create the following directory structure for your project:

your-project/
├── wildetect/          # Main package (cloned repo)
├── data/              # Data storage root
│   ├── raw/           # Original data
│   ├── processed/     # Processed datasets
│   └── exports/       # Exported datasets
├── models/            # Trained models
│   ├── detectors/
│   └── classifiers/
├── results/           # Detection results
│   ├── detections/
│   ├── census/
│   └── visualizations/
└── mlruns/            # MLflow experiment tracking
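
The tree above can be created in one step rather than by hand. The following is a minimal Python sketch, assuming the data, model, and result folders sit next to the cloned wildetect/ repository as shown; the name `create_layout` is illustrative and not part of any WildDetect package.

```python
from pathlib import Path

# Sub-directories taken from the tree above; wildetect/ is omitted
# because it comes from cloning the repository.
LAYOUT = [
    "data/raw",
    "data/processed",
    "data/exports",
    "models/detectors",
    "models/classifiers",
    "results/detections",
    "results/census",
    "results/visualizations",
    "mlruns",
]


def create_layout(root):
    """Create the project directories under root and return their paths."""
    root = Path(root)
    created = []
    for relative in LAYOUT:
        path = root / relative
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout("your-project")` from the parent folder builds the whole layout; running it again leaves existing folders untouched.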

Environment Variables

Create .env File

Create a .env file in the root directory of each package:

WildDetect `.env`

# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=wilddetect
MODEL_REGISTRY_PATH=models/

# Data Paths
DATA_ROOT=D:/data/
RESULTS_ROOT=D:/results/

# Label Studio (Optional)
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your_api_key_here
LABEL_STUDIO_PROJECT_ID=1

# FiftyOne (Optional)
FIFTYONE_DATABASE_DIR=D:/fiftyone/
FIFTYONE_DEFAULT_DATASET_DIR=D:/data/fiftyone/

# Inference Server
INFERENCE_SERVER_HOST=0.0.0.0
INFERENCE_SERVER_PORT=4141

# GPU Configuration
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/wildetect.log

WilData `.env`

# API Configuration
WILDATA_API_HOST=0.0.0.0
WILDATA_API_PORT=8441
WILDATA_API_DEBUG=false

# Data Storage
DATA_ROOT=D:/data/
DVC_REMOTE_URL=s3://my-bucket/datasets  # or local path

# DVC Configuration
DVC_CACHE_DIR=D:/.dvc/cache/

# Label Studio Integration
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your_api_key_here

# Processing
MAX_WORKERS=4
BATCH_SIZE=32

WildTrain `.env`

# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=wildtrain

# Training Paths
DATA_ROOT=D:/data/
MODEL_OUTPUT_DIR=D:/models/
CHECKPOINT_DIR=D:/checkpoints/

# Hyperparameter Tuning
OPTUNA_STORAGE=sqlite:///optuna.db
N_TRIALS=50

# Distributed Training (Optional)
MASTER_ADDR=localhost
MASTER_PORT=12355
WORLD_SIZE=1
RANK=0

# GPU Configuration
CUDA_VISIBLE_DEVICES=0,1  # Multiple GPUs

Loading Environment Variables

When you run a package through its scripts, the package's .env file is loaded automatically:

# Scripts automatically load .env
scripts\run_detection.bat
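
This guide does not show how the scripts load the file, so the following is only a sketch of what loading a .env file amounts to, written with the Python standard library. The helper names `parse_env` and `load_env` are hypothetical. It handles plain `KEY=VALUE` lines and trailing `# ...` comments like the ones in the examples above, but not quoted values.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "0,1  # Multiple GPUs".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def load_env(path=".env", override=False):
    """Copy values from a .env file into os.environ and return them."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env(handle.read())
    for key, value in values.items():
        # By default, variables already set in the shell take precedence.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Keeping `override=False` as the default means a value exported in the shell wins over the file, which is convenient for one-off overrides.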

External Services Setup

MLflow Tracking Server

MLflow provides experiment tracking and the model registry.

1. Start MLflow Server

# Launch using script
scripts\launch_mlflow.bat

# Or manually
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db

2. Access MLflow UI

Open browser to: http://localhost:5000

MLflow tracking is configured through environment variables in .env (MLFLOW_TRACKING_URI and MLFLOW_EXPERIMENT_NAME) or through the configuration files.
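
Before starting a run, it can help to confirm that something is actually listening at the configured tracking address. The sketch below uses only the standard library and assumes MLFLOW_TRACKING_URI is an http(s) URL as in the .env examples; `tracking_endpoint` and `port_is_open` are illustrative names, and a successful TCP connection does not prove the listener is MLflow.

```python
import os
import socket
from urllib.parse import urlparse


def tracking_endpoint(default="http://localhost:5000"):
    """Return (host, port) parsed from MLFLOW_TRACKING_URI."""
    uri = os.environ.get("MLFLOW_TRACKING_URI", default)
    parsed = urlparse(uri)
    # Fall back to the scheme's standard port when the URI omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = tracking_endpoint()
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```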

Label Studio (Optional)

For data annotation and labeling.

1. Install Label Studio

uv pip install label-studio

2. Start Server

# Launch using script
scripts\launch_labelstudio.bat

# Or manually
label-studio start --port 8080

3. Create Project

Navigate to http://localhost:8080
Create a new project
Upload images
Configure the labeling interface (use the provided XML configs)

4. Get API Key

Go to Account & Settings
Copy your API token
Add it to the .env file as LABEL_STUDIO_API_KEY

FiftyOne (Dataset Visualization)

Interactive dataset viewer and analyzer.

1. Install FiftyOne

uv pip install fiftyone

2. Launch Viewer

# Launch using script
scripts\launch_fiftyone.bat

# Or using CLI
wildetect fiftyone --action launch --dataset my_dataset

3. Configure Database

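# fiftyone config only prints settings; set them via environment variables (Windows cmd shown)
# Set database directory
set FIFTYONE_DATABASE_DIR=D:/fiftyone/db

# Set default dataset directory
set FIFTYONE_DEFAULT_DATASET_DIR=D:/data/fiftyone

# Verify the active value
fiftyone config database_dir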

DVC (Data Version Control)

For versioning large datasets.

1. Initialize DVC

cd wildata
scripts\dvc-setup.bat

# Or manually
dvc init
dvc remote add -d myremote s3://my-bucket/datasets

2. Configure Remote Storage

=== "Local Storage"

dvc remote add -d local D:/dvc-storage

=== "AWS S3"

dvc remote add -d s3remote s3://my-bucket/datasets

# Set credentials (--local keeps them out of the Git-tracked .dvc/config)
dvc remote modify --local s3remote access_key_id YOUR_KEY
dvc remote modify --local s3remote secret_access_key YOUR_SECRET

=== "Google Cloud Storage"

dvc remote add -d gcs gs://my-bucket/datasets

# Set credentials (on Windows cmd, use `set` instead of `export`)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json

=== "Azure Blob"

dvc remote add -d azure azure://container/path

# Set credentials (on Windows cmd, use `set` instead of `export`)
export AZURE_STORAGE_CONNECTION_STRING=your_connection_string

3. Track Data

# Add data to DVC
dvc add data/raw/

# Commit changes (dvc add writes the .gitignore next to the .dvc file)
git add data/raw.dvc data/.gitignore
git commit -m "Add raw data"

# Push to remote
dvc push

Configuration Files

WildDetect Configurations

Location: config/

detection.yaml

Main detection configuration. Edit based on your needs:

model:
  mlflow_model_name: "detector"
  mlflow_model_alias: "production"
  device: "cuda"

processing:
  batch_size: 32
  tile_size: 800
  overlap_ratio: 0.2

See Detection Config Reference for all options.

census.yaml

Census campaign configuration:

campaign:
  name: "Summer_2024"
  target_species: ["elephant", "giraffe", "zebra"]

flight_specs:
  flight_height: 120.0
  gsd: 2.38

See Census Config Reference.
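
The gsd value links pixel sizes to ground distances, which is a quick sanity check for the tile size. The sketch below assumes gsd is expressed in centimetres per pixel; the unit is not stated in this guide, so treat that as an assumption. It uses the 800 px tile_size from detection.yaml, and `ground_extent_m` is a hypothetical helper.

```python
def ground_extent_m(pixels, gsd_cm_per_px):
    """Ground distance in metres covered by a run of pixels."""
    return pixels * gsd_cm_per_px / 100.0


# Values from the configs above; the cm/pixel unit is an assumption.
tile_side_m = ground_extent_m(800, 2.38)
print(f"An 800 px tile spans about {tile_side_m:.2f} m on the ground")
```

Under that assumption a tile covers roughly 19 m per side, which should comfortably contain the largest target species.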

WilData Configurations

Location: wildata/configs/

import-config-example.yaml

Dataset import configuration:

source_path: "annotations.json"
source_format: "coco"
dataset_name: "my_dataset"

transformations:
  enable_tiling: true
  tiling:
    tile_size: 800
    stride: 640

See WilData Configs Reference for configuration details.
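
The tiling values here line up with detection.yaml: a tile_size of 800 with a stride of 640 leaves 160 px shared between neighbouring tiles, which is an overlap_ratio of 0.2. The sketch below illustrates that arithmetic, assuming overlap_ratio means the fraction of a tile shared with its neighbour. The function names are hypothetical and the edge handling (shifting the last tile back to fit) is one common choice, not necessarily what WilData or WildDetect do.

```python
def stride_from_overlap(tile_size, overlap_ratio):
    """Step between tile origins for a given fractional overlap."""
    return round(tile_size * (1.0 - overlap_ratio))


def tile_origins(length, tile_size, stride):
    """Start offsets of tiles covering one image dimension.

    The final tile is shifted back so it ends exactly at the image edge.
    """
    if length <= tile_size:
        return [0]
    origins = list(range(0, length - tile_size, stride))
    origins.append(length - tile_size)
    return origins


stride = stride_from_overlap(800, 0.2)
print("stride:", stride)
print("origins for a 2000 px side:", tile_origins(2000, 800, stride))
```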

WildTrain Configurations

Location: wildtrain/configs/

Training Config

For model training:

# configs/classification/classification_train.yaml
model:
  architecture: "resnet50"
  num_classes: 10

training:
  epochs: 100
  batch_size: 32
  learning_rate: 0.001

See WildTrain Configs Reference for configuration details.

GPU Configuration

CUDA Setup

Use the wildetect info command to check CUDA availability:

wildetect info

Set GPU Device

In .env (set one value):

CUDA_VISIBLE_DEVICES=0  # Use first GPU
CUDA_VISIBLE_DEVICES=0,1  # Use first two GPUs

In config files (choose one):

device: "cuda"  # Use default GPU
device: "cuda:0"  # Specific GPU
device: "cpu"  # Force CPU
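
The two settings interact: CUDA_VISIBLE_DEVICES filters which GPUs the process can see, and the index in "cuda:N" counts only the visible ones, so with CUDA_VISIBLE_DEVICES=1 the device "cuda:0" is the second physical GPU. The sketch below shows one way to fall back to the CPU when a config asks for CUDA on a machine without it. This fallback is an illustration under stated assumptions, not WildDetect's documented behaviour, and `resolve_device` is a hypothetical helper.

```python
def cuda_is_available():
    """Return True only if PyTorch is installed and reports a usable GPU."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(requested, cuda_available=None):
    """Return the requested device, or "cpu" when CUDA is requested but unavailable."""
    if cuda_available is None:
        cuda_available = cuda_is_available()
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


if __name__ == "__main__":
    print("device in use:", resolve_device("cuda:0"))
```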

Configure memory management in your .env file:

# In .env
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512

CPU-Only Setup

If you don't have a GPU:

Install CPU-only PyTorch (see Installation)
Set device: "cpu" in all config files
Reduce batch sizes for memory efficiency

Testing Your Setup

Run System Info

wildetect info

This will display:

Python version
Package versions
CUDA availability
GPU information
Memory available

Test Detection

# Test with a single image
wildetect detect test_image.jpg --model model.pt --output test_results/

Test Data Import

# Test data import
wildata import-dataset test_annotations.json --format coco --name test_dataset
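
If no annotation file is at hand, a tiny COCO file is enough for a smoke test. The sketch below writes one image entry, one bounding box, and one category using the standard COCO detection keys. The image size, box coordinates, and category are placeholder values, and whether WilData requires additional fields is not covered in this guide.

```python
import json


def minimal_coco():
    """Build a one-image, one-box COCO detection dictionary."""
    return {
        "images": [
            {"id": 1, "file_name": "test_image.jpg", "width": 800, "height": 800}
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,
                # COCO boxes are [x, y, width, height] in pixels.
                "bbox": [100.0, 150.0, 60.0, 40.0],
                "area": 2400.0,
                "iscrowd": 0,
            }
        ],
        "categories": [{"id": 1, "name": "elephant"}],
    }


def write_coco(path):
    """Write the minimal COCO dictionary to path as JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(minimal_coco(), handle, indent=2)
```

Calling `write_coco("test_annotations.json")` produces a file matching the name used in the import command above.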

Test Training

# Test training setup
cd wildtrain
wildtrain train classifier -c configs/classification/classification_train.yaml --dry-run

IDE Setup

VSCode Configuration

Create .vscode/settings.json:

{
  "python.defaultInterpreterPath": ".venv/Scripts/python.exe",
  "python.formatting.provider": "black",
  "python.linting.enabled": true,
  "python.linting.ruffEnabled": true,
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": [
    "tests",
    "-v"
  ],
  "files.exclude": {
    "**/__pycache__": true,
    "**/*.pyc": true
  }
}

Ruff Configuration

The project uses ruff for linting, configured in pyproject.toml. Run it with:

# Run ruff on all files
uv run ruff check src/ tests/

# Auto-fix issues
uv run ruff check --fix src/ tests/

Directory Permissions (Windows)

Ensure you have write permissions for:

Data directories
Model directories
Results directories
Log directories

Run PowerShell as Administrator if needed:

# Grant full control to current user (PowerShell syntax; in cmd use %USERNAME%:F)
icacls "D:\data" /grant "${env:USERNAME}:F" /t

Troubleshooting

Common Issues

??? question "MLflow server won't start"

Check if port 5000 is already in use:

netstat -ano | findstr :5000

Use a different port, then update MLFLOW_TRACKING_URI in .env to match:

mlflow server --port 5001

??? question "DVC push fails"

Verify remote credentials:

dvc remote list
dvc remote modify --local myremote access_key_id YOUR_KEY

??? question "Out of memory errors"

Reduce batch size and tile size:

processing:
  batch_size: 16  # Reduced
  tile_size: 640  # Reduced

??? question "Import errors"

Verify virtual environment is activated:

where python  # Windows: should list .venv\Scripts\python.exe first
which python  # Linux/macOS: should point to .venv
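
The same check can be made from inside Python, which is useful when the shell and the IDE disagree about the interpreter. A minimal sketch using only the standard library; it detects venv-style environments (including those created by uv) by comparing `sys.prefix` with `sys.base_prefix`.

```python
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


print("executable:", sys.executable)
print("virtual environment active:", in_virtualenv())
```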

Next Steps

Now that your environment is set up:

✅ Test your setup with the commands above
📚 Follow the Quick Start Guide
🎯 Try an End-to-End Detection tutorial

Environment ready? Head to the Quick Start Guide to run your first detection!
