Environment Setup
This guide covers setting up your environment for working with the WildDetect monorepo, including configuration files, environment variables, and external services.
Directory Structure
Create the following directory structure for your project:
your-project/
├── wildetect/            # Main package (cloned repo)
├── data/                 # Data storage root
│   ├── raw/              # Original data
│   ├── processed/        # Processed datasets
│   └── exports/          # Exported datasets
├── models/               # Trained models
│   ├── detectors/
│   └── classifiers/
├── results/              # Detection results
│   ├── detections/
│   ├── census/
│   └── visualizations/
└── mlruns/               # MLflow experiment tracking
Environment Variables
Create .env File
Create a .env file in the root directory of each package:
WildDetect .env
# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=wilddetect
MODEL_REGISTRY_PATH=models/
# Data Paths
DATA_ROOT=D:/data/
RESULTS_ROOT=D:/results/
# Label Studio (Optional)
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your_api_key_here
LABEL_STUDIO_PROJECT_ID=1
# FiftyOne (Optional)
FIFTYONE_DATABASE_DIR=D:/fiftyone/
FIFTYONE_DEFAULT_DATASET_DIR=D:/data/fiftyone/
# Inference Server
INFERENCE_SERVER_HOST=0.0.0.0
INFERENCE_SERVER_PORT=4141
# GPU Configuration
CUDA_VISIBLE_DEVICES=0
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
# Logging
LOG_LEVEL=INFO
LOG_FILE=logs/wildetect.log
WilData .env
# API Configuration
WILDATA_API_HOST=0.0.0.0
WILDATA_API_PORT=8441
WILDATA_API_DEBUG=false
# Data Storage
DATA_ROOT=D:/data/
DVC_REMOTE_URL=s3://my-bucket/datasets # or local path
# DVC Configuration
DVC_CACHE_DIR=D:/.dvc/cache/
# Label Studio Integration
LABEL_STUDIO_URL=http://localhost:8080
LABEL_STUDIO_API_KEY=your_api_key_here
# Processing
MAX_WORKERS=4
BATCH_SIZE=32
WildTrain .env
# MLflow Configuration
MLFLOW_TRACKING_URI=http://localhost:5000
MLFLOW_EXPERIMENT_NAME=wildtrain
# Training Paths
DATA_ROOT=D:/data/
MODEL_OUTPUT_DIR=D:/models/
CHECKPOINT_DIR=D:/checkpoints/
# Hyperparameter Tuning
OPTUNA_STORAGE=sqlite:///optuna.db
N_TRIALS=50
# Distributed Training (Optional)
MASTER_ADDR=localhost
MASTER_PORT=12355
WORLD_SIZE=1
RANK=0
# GPU Configuration
CUDA_VISIBLE_DEVICES=0,1 # Multiple GPUs
Loading Environment Variables
The packages automatically load .env files when using scripts:
# Scripts automatically load .env
scripts\run_detection.bat
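If you need to load a .env file outside the provided scripts (for example in a notebook or a custom entry point), here is a minimal stdlib-only sketch of a KEY=VALUE parser. It is a hypothetical helper, not part of the packages, and it ignores quoting, `export` syntax, and inline `# ...` comments:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and full-line comments; inline comments and
    quoted values are not handled.
    """
    loaded = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded
```

Libraries such as python-dotenv provide a more complete implementation of this behavior.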
External Services Setup
MLflow Tracking Server
MLflow is used for experiment tracking and model registry.
1. Start MLflow Server
# Launch using script
scripts\launch_mlflow.bat
# Or manually
mlflow server --host 0.0.0.0 --port 5000 --backend-store-uri sqlite:///mlflow.db
2. Access MLflow UI
Open browser to: http://localhost:5000
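If the server fails to start, port 5000 may already be taken (see Troubleshooting). A small stdlib sketch of a port check you could run before launching (hypothetical helper):

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something is already listening on host:port."""
    try:
        # A successful TCP connection means the port is occupied
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `port_in_use("localhost", 5000)` returning True suggests picking another port for `mlflow server --port ...`.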
Environment variables in .env or configuration files are used to configure MLflow tracking.
Label Studio (Optional)
For data annotation and labeling.
1. Install Label Studio
uv pip install label-studio
2. Start Server
# Launch using script
scripts\launch_labelstudio.bat
# Or manually
label-studio start --port 8080
3. Create Project
- Navigate to http://localhost:8080
- Create a new project
- Upload images
- Configure the labeling interface (use the provided XML configs)
4. Get API Key
- Go to Account & Settings
- Copy your API token
- Add it to your .env file
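With the API key in your .env, Label Studio's REST API can be called with a `Token` authorization header. A stdlib sketch that only builds the request (the `/api/projects` endpoint is an assumption from Label Studio's documented API; pass the result to `urlopen` against a running server):

```python
import urllib.request


def label_studio_request(base_url: str, api_key: str,
                         endpoint: str = "/api/projects") -> urllib.request.Request:
    """Build an authenticated request for the Label Studio REST API.

    Hand the result to urllib.request.urlopen() when the server is up.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        headers={"Authorization": f"Token {api_key}"},
    )
```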
FiftyOne (Dataset Visualization)
Interactive dataset viewer and analyzer.
1. Install FiftyOne
uv pip install fiftyone
2. Launch Viewer
# Launch using script
scripts\launch_fiftyone.bat
# Or using CLI
wildetect fiftyone --action launch --dataset my_dataset
3. Configure Database
# Set database directory
fiftyone config database_dir D:/fiftyone/db
# Set default dataset directory
fiftyone config default_dataset_dir D:/data/fiftyone
DVC (Data Version Control)
For versioning large datasets.
1. Initialize DVC
cd wildata
scripts\dvc-setup.bat
# Or manually
dvc init
dvc remote add -d myremote s3://my-bucket/datasets
2. Configure Remote Storage
=== "Local Storage"
dvc remote add -d local D:/dvc-storage
=== "AWS S3"
dvc remote add -d s3remote s3://my-bucket/datasets
# Set credentials
dvc remote modify s3remote access_key_id YOUR_KEY
dvc remote modify s3remote secret_access_key YOUR_SECRET
=== "Google Cloud Storage"
dvc remote add -d gcs gs://my-bucket/datasets
# Set credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
=== "Azure Blob"
dvc remote add -d azure azure://container/path
# Set credentials
export AZURE_STORAGE_CONNECTION_STRING=your_connection_string
3. Track Data
# Add data to DVC
dvc add data/raw/
# Commit changes
git add data/raw.dvc .gitignore
git commit -m "Add raw data"
# Push to remote
dvc push
Configuration Files
WildDetect Configurations
Location: config/
detection.yaml
Main detection configuration. Edit based on your needs:
model:
mlflow_model_name: "detector"
mlflow_model_alias: "production"
device: "cuda"
processing:
batch_size: 32
tile_size: 800
overlap_ratio: 0.2
See Detection Config Reference for all options.
census.yaml
Census campaign configuration:
campaign:
name: "Summer_2024"
target_species: ["elephant", "giraffe", "zebra"]
flight_specs:
flight_height: 120.0
gsd: 2.38
WilData Configurations
Location: wildata/configs/
import-config-example.yaml
Dataset import configuration:
source_path: "annotations.json"
source_format: "coco"
dataset_name: "my_dataset"
transformations:
enable_tiling: true
tiling:
tile_size: 800
stride: 640
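The `stride: 640` here is consistent with the `overlap_ratio: 0.2` used in detection.yaml: stride = tile_size × (1 − overlap_ratio) = 800 × 0.8 = 640, i.e. adjacent tiles overlap by 160 px. A short sketch of how tile origins would be laid out under these values (an illustrative helper, not the package's actual tiler):

```python
def tile_origins(image_size: int, tile_size: int = 800, stride: int = 640) -> list[int]:
    """Top-left offsets of tiles along one image axis, clamped to the edge."""
    if image_size <= tile_size:
        return [0]
    origins = list(range(0, image_size - tile_size + 1, stride))
    # Add a final edge-aligned tile if the stride did not land on the border
    if origins[-1] + tile_size < image_size:
        origins.append(image_size - tile_size)
    return origins
```

For a 2000 px axis this yields origins 0, 640, and 1200, so every pixel is covered and interior tiles share the 20% overlap.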
See WilData Configs Reference for configuration details.
WildTrain Configurations
Location: wildtrain/configs/
Training Config
For model training:
# configs/classification/classification_train.yaml
model:
architecture: "resnet50"
num_classes: 10
training:
epochs: 100
batch_size: 32
learning_rate: 0.001
See WildTrain Configs Reference for configuration details.
GPU Configuration
CUDA Setup
Use the wildetect info command to check CUDA availability:
wildetect info
Set GPU Device
In .env:
CUDA_VISIBLE_DEVICES=0 # Use first GPU
CUDA_VISIBLE_DEVICES=0,1 # Use first two GPUs
In config files:
device: "cuda" # Use default GPU
device: "cuda:0" # Specific GPU
device: "cpu" # Force CPU
Configure memory management in your .env file:
# In .env
PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512
CPU-Only Setup
If you don't have a GPU:
- Install CPU-only PyTorch (see Installation)
- Set device: "cpu" in all config files
- Reduce batch sizes for memory efficiency
Testing Your Setup
Run System Info
wildetect info
This will display:
- Python version
- Package versions
- CUDA availability
- GPU information
- Memory available
Test Detection
# Test with a single image
wildetect detect test_image.jpg --model model.pt --output test_results/
Test Data Import
# Test data import
wildata import-dataset test_annotations.json --format coco --name test_dataset
Test Training
# Test training setup
cd wildtrain
wildtrain train classifier -c configs/classification/classification_train.yaml --dry-run
IDE Setup
VSCode Configuration
Create .vscode/settings.json:
{
"python.defaultInterpreterPath": ".venv/Scripts/python.exe",
"python.formatting.provider": "black",
"python.linting.enabled": true,
"python.linting.ruffEnabled": true,
"python.testing.pytestEnabled": true,
"python.testing.pytestArgs": [
"tests",
"-v"
],
"files.exclude": {
"**/__pycache__": true,
"**/*.pyc": true
}
}
Ruff Configuration
The project uses ruff for linting. Configuration is in pyproject.toml:
# Run ruff on all files
uv run ruff check src/ tests/
# Auto-fix issues
uv run ruff check --fix src/ tests/
Directory Permissions (Windows)
Ensure you have write permissions for:
- Data directories
- Model directories
- Results directories
- Log directories
Run Command Prompt as Administrator if needed (the %USERNAME% expansion below is cmd syntax):
# Grant full control to current user
icacls "D:\data" /grant %USERNAME%:F /t
Troubleshooting
Common Issues
??? question "MLflow server won't start"
Check if port 5000 is already in use:
netstat -ano | findstr :5000
Use a different port:
mlflow server --port 5001
??? question "DVC push fails"
Verify remote credentials:
dvc remote list
dvc remote modify --local myremote access_key_id YOUR_KEY
??? question "Out of memory errors"
Reduce batch size and tile size:
processing:
batch_size: 16 # Reduced
tile_size: 640 # Reduced
??? question "Import errors"
Verify virtual environment is activated:
where python # Should point to .venv (use `which python` on Unix shells)
Next Steps
Now that your environment is set up:
- Test your setup with the commands above
- Follow the Quick Start Guide
- Try an End-to-End Detection tutorial
Environment ready? Head to the Quick Start Guide to run your first detection!