Environment Setup
This guide covers setting up your environment for working with the WildDetect monorepo, including configuration files, environment variables, and external services.
Directory Structure
Create the following directory structure for your project:
your-project/
├── wildetect/ # Main package (cloned repo)
├── data/ # Data storage root
│ ├── raw/ # Original data
│ ├── processed/ # Processed datasets
│ └── exports/ # Exported datasets
├── models/ # Trained models
│ ├── detectors/
│ └── classifiers/
├── results/ # Detection results
│ ├── detections/
│ ├── census/
│ └── visualizations/
└── mlruns/ # MLflow experiment tracking
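If you prefer to script this, here is a minimal sketch using pathlib that creates the layout above (the wildetect/ folder comes from cloning the repo, so it is not created here; the "your-project" root name is only an example):
# Create the project layout shown above
from pathlib import Path

SUBDIRS = [
    "data/raw", "data/processed", "data/exports",
    "models/detectors", "models/classifiers",
    "results/detections", "results/census", "results/visualizations",
    "mlruns",
]

root = Path("your-project")
for sub in SUBDIRS:
    (root / sub).mkdir(parents=True, exist_ok=True)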
Environment Variables
Create .env File
Create a .env file in the root directory of each package:
WildDetect .env
# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=wilddetect
MODEL_REGISTRY_PATH=models/
# Data Paths
DATA_ROOT=D:/data/
RESULTS_ROOT=D:/results/
# Label Studio (Optional)
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your_api_key_here
LABEL_STUDIO_PROJECT_ID=1
# FiftyOne (Optional)
FIFTYONE_DATABASE_DIR=D:/fiftyone/
FIFTYONE_DEFAULT_DATASET_DIR=D:/data/fiftyone/
# Inference Server
INFERENCE_SERVER_HOST=0.0.0.0
INFERENCE_SERVER_PORT=4141
# GPU Configuration
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/wildetect.log
WilData .env
# API Configuration
WILDATA_API_HOST=0.0.0.0
WILDATA_API_PORT=8441
WILDATA_API_DEBUG=false
# Data Storage
DATA_ROOT=D:/data/
DVC_REMOTE_URL=s3://my-bucket/datasets # or local path
# DVC Configuration
DVC_CACHE_DIR=D:/.dvc/cache/
# Label Studio Integration
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your_api_key_here
# Processing
MAX_WORKERS=4
BATCH_SIZE=32
WildTrain .env
# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=wildtrain
# Training Paths
DATA_ROOT=D:/data/
MODEL_OUTPUT_DIR=D:/models/
CHECKPOINT_DIR=D:/checkpoints/
# Hyperparameter Tuning
OPTUNA_STORAGE=sqlite:///optuna.db
N_TRIALS=50
# Distributed Training (Optional)
MASTER_ADDR=localhost
MASTER_PORT=12355
WORLD_SIZE=1
RANK=0
# GPU Configuration
CUDA_VISIBLE_DEVICES=0,1 # Multiple GPUs
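The MASTER_ADDR, MASTER_PORT, WORLD_SIZE, and RANK variables above follow PyTorch's env:// rendezvous convention. A minimal sketch of how they are typically consumed (illustrative only, not necessarily how WildTrain initializes distributed training):
# torch.distributed picks up MASTER_ADDR/MASTER_PORT/WORLD_SIZE/RANK from the environment
import os
import torch.distributed as dist

if int(os.environ.get("WORLD_SIZE", "1")) > 1:
    dist.init_process_group(backend="nccl", init_method="env://")
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} initialized")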
Loading Environment Variables
The packages automatically load .env files when using scripts:
# Scripts automatically load .env
scripts\run_detection.bat
# Or manually in Python
from dotenv import load_dotenv
load_dotenv()
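Once loaded, the values behave like ordinary environment variables; a minimal sketch reading two of the variables defined above:
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory

tracking_uri = os.getenv("MLFLOW_TRACKING_URI", "http://localhost:5000")
data_root = os.getenv("DATA_ROOT", "data/")
print(tracking_uri, data_root)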
External Services Setup
MLflow Tracking Server
MLflow is used for experiment tracking and model registry.
1. Start MLflow Server
# Launch using script
scripts\launch_mlflow.bat
# Or manually
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db
2. Access MLflow UI
Open browser to: http://localhost:5000
3. Configure in Code
import mlflow
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("my_experiment")
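To verify the connection, you can log a throwaway run and watch it appear in the UI (the run name, parameter, and metric below are placeholders):
import mlflow  # tracking URI and experiment set as in the snippet above

with mlflow.start_run(run_name="sanity_check"):
    mlflow.log_param("batch_size", 32)
    mlflow.log_metric("dummy_metric", 1.0)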
Label Studio (Optional)
For data annotation and labeling.
1. Install Label Studio
2. Start Server
3. Create Project
- Navigate to http://localhost:8080
- Create a new project
- Upload images
- Configure the labeling interface (use the provided XML configs)
4. Get API Key
- Go to Account & Settings
- Copy your API token
- Add it to the .env file (a quick connectivity check is sketched below)
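To confirm the key works, here is a small sketch assuming the label-studio-sdk package is installed; the project's own scripts may wrap this differently:
import os
from label_studio_sdk import Client

ls = Client(
    url=os.getenv("LABEL_STUDIO_URL", "http://localhost:8080"),
    api_key=os.getenv("LABEL_STUDIO_API_KEY"),
)
print(ls.check_connection())  # reports status "UP" when the server is reachable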
FiftyOne (Dataset Visualization)
Interactive dataset viewer and analyzer.
1. Install FiftyOne
2. Launch Viewer
# Launch using script
scripts\launch_fiftyone.bat
# Or using CLI
wildetect fiftyone --action launch --dataset my_dataset
3. Configure Database
FiftyOne reads its storage locations from environment variables, so the FIFTYONE_DATABASE_DIR and FIFTYONE_DEFAULT_DATASET_DIR values in your .env control where the database and default dataset directory live (see the sketch below). To inspect the active configuration:
# Print the resolved FiftyOne configuration
fiftyone config
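Because FiftyOne reads these variables at import time, you can also set them programmatically before importing the library; a minimal sketch:
import os
os.environ["FIFTYONE_DATABASE_DIR"] = "D:/fiftyone/db"
os.environ["FIFTYONE_DEFAULT_DATASET_DIR"] = "D:/data/fiftyone"

import fiftyone as fo  # picks up the variables set above
print(fo.config.database_dir, fo.config.default_dataset_dir)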
DVC (Data Version Control)
For versioning large datasets.
1. Initialize DVC
cd wildata
scripts\dvc-setup.bat
# Or manually
dvc init
dvc remote add -d myremote s3://my-bucket/datasets
2. Configure Remote Storage
3. Track Data
# Add data to DVC
dvc add data/raw/
# Commit changes
git add data/raw.dvc .gitignore
git commit -m "Add raw data"
# Push to remote
dvc push
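Once data is tracked and pushed, it can also be read back from Python without a manual dvc pull; a sketch assuming DVC's Python API, with a hypothetical file path:
import dvc.api

# Stream a DVC-tracked file at a given Git revision (path is illustrative)
with dvc.api.open("data/raw/image_0001.jpg", repo=".", rev="HEAD", mode="rb") as f:
    header = f.read(16)
print(len(header))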
Configuration Files
WildDetect Configurations
Location: config/
detection.yaml
Main detection configuration. Edit based on your needs:
model:
mlflow_model_name: "detector"
mlflow_model_alias: "production"
device: "cuda"
processing:
batch_size: 32
tile_size: 800
overlap_ratio: 0.2
See Detection Config Reference for all options.
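The file is plain YAML, so it can be inspected programmatically; a minimal sketch using PyYAML with the keys shown above:
import yaml

with open("config/detection.yaml", "r") as f:
    cfg = yaml.safe_load(f)

print(cfg["model"]["device"])          # "cuda"
print(cfg["processing"]["tile_size"])  # 800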
census.yaml
Census campaign configuration:
campaign:
name: "Summer_2024"
target_species: ["elephant", "giraffe", "zebra"]
flight_specs:
flight_height: 120.0
gsd: 2.38
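As a sanity check on these values (assuming gsd is given in cm/pixel), the ground footprint of a detection tile can be estimated directly:
# Back-of-the-envelope ground coverage for one 800-px tile at gsd = 2.38 cm/px
gsd_cm_per_px = 2.38
tile_size_px = 800
ground_width_m = tile_size_px * gsd_cm_per_px / 100
print(f"{ground_width_m:.1f} m")  # ~19.0 m of ground per tile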
WilData Configurations
Location: wildata/configs/
import-config-example.yaml
Dataset import configuration:
source_path: "annotations.json"
source_format: "coco"
dataset_name: "my_dataset"
transformations:
enable_tiling: true
tiling:
tile_size: 800
stride: 640
See WilData Configs Reference for configuration details.
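To illustrate what tile_size: 800 with stride: 640 means (a 160-px overlap between neighbouring tiles), here is a small sketch of how tile origins could be computed; this is illustrative, not WilData's actual implementation:
def tile_origins(length: int, tile: int = 800, stride: int = 640):
    """Yield top-left offsets so the final tile still ends inside the image."""
    pos = 0
    while pos + tile < length:
        yield pos
        pos += stride
    yield max(length - tile, 0)

print(list(tile_origins(2000)))  # [0, 640, 1200]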
WildTrain Configurations
Location: wildtrain/configs/
Training Config
For model training:
# configs/classification/classification_train.yaml
model:
architecture: "resnet50"
num_classes: 10
training:
epochs: 100
batch_size: 32
learning_rate: 0.001
See WildTrain Configs Reference for configuration details.
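For orientation only, a model matching the architecture, num_classes, and learning_rate above could be built like this with torchvision; WildTrain's own trainer consumes the YAML config directly:
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=None)                       # architecture: "resnet50"
model.fc = nn.Linear(model.fc.in_features, 10)              # num_classes: 10
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # learning_rate: 0.001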
GPU Configuration
CUDA Setup
Check CUDA Availability
python -c "import torch; print(f'CUDA: {torch.cuda.is_available()}')"
python -c "import torch; print(f'Device: {torch.cuda.get_device_name(0)}')"
Set GPU Device
In .env, select the visible GPU(s), e.g. CUDA_VISIBLE_DEVICES=0 (or 0,1 for multiple GPUs).
In config files, set the device field, e.g. device: "cuda" or device: "cuda:0" for a specific GPU.
Memory Management
For large models or images:
# In .env
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# Or in Python
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'
CPU-Only Setup
If you don't have a GPU:
- Install CPU-only PyTorch (see Installation)
- Set device: "cpu" in all config files
- Reduce batch sizes for memory efficiency (see the sketch below for an automatic fallback)
Testing Your Setup
Run System Info
This will display:
- Python version
- Package versions
- CUDA availability
- GPU information
- Available memory
Test Detection
Test Data Import
Test Training
# Test training setup
cd wildtrain
wildtrain train classifier -c configs/classification/classification_train.yaml --dry-run
IDE Setup
VSCode Configuration
Create .vscode/settings.json:
{
"python.defaultInterpreterPath": ".venv/Scripts/python.exe",
"python.formatting.provider": "black",
"python.linting.enabled": true,
"python.linting.ruffEnabled": true,
"python.testing.pytestEnabled": true,
"python.testing.pytestArgs": [
"tests",
"-v"
],
"files.exclude": {
"**/__pycache__": true,
"**/*.pyc": true
}
}
Ruff Configuration
The project uses ruff for linting. Configuration is in pyproject.toml:
# Run ruff on all files
uv run ruff check src/ tests/
# Auto-fix issues
uv run ruff check --fix src/ tests/
Directory Permissions (Windows)
Ensure you have write permissions for:
- Data directories
- Model directories
- Results directories
- Log directories
Run PowerShell as Administrator if needed.
Troubleshooting
Common Issues
MLflow server won't start
Check whether port 5000 is already in use (on Windows: netstat -ano | findstr :5000), or start the server on a different port with the --port option.
DVC push fails
Verify the remote configuration and your credentials (dvc remote list shows the configured remotes).
Out of memory errors
Reduce batch_size and tile_size in your detection config, or set PYTORCH_CUDA_ALLOC_CONF as shown under GPU Configuration.
Next Steps
Now that your environment is set up:
- ✅ Test your setup with the commands above
- 📚 Follow the Quick Start Guide
- 🎯 Try an End-to-End Detection tutorial
Environment ready? Head to the Quick Start Guide to run your first detection!