efast-phenocam-validation/README.md
2026-05-17 15:55:15 +02:00


Satellite Data Fusion Pipeline

Python pipeline for downloading Sentinel-2 and Sentinel-3 imagery and PhenoCam ground truth, applying NDVI-based cloud pre-selection, fusing sensors with the EFAST algorithm, and evaluating fused Green Chromatic Coordinate (GCC) time series against PhenoCam gcc_90.

Features

  • Acquisition — S2 L2A (AWS Element84 STAC), S3 OLCI L1B (Copernicus OpenEO), PhenoCam midday images and GCC CSV
  • Pre-selection — Aggressive and non-aggressive NDVI-based cloud screening (plus dark-scene rejection)
  • Preparation — Harmonised reflectance/GCC rasters, distance-to-cloud weights, S3 compositing and optional temporal smoothing
  • Fusion — EFAST under eight scenarios per site: Blend-then-Index (BtI) and Index-then-Blend (ItB) × two pre-selection strategies × σ ∈ {20, 30} days
  • Post-processing — Crop to valid-data window; NDVI and GCC timeseries at the site
  • Metrics — Temporal comparison vs PhenoCam (metrics.json); optional Tier-2 withheld-S2 gap validation
  • Web viewer — Static HTML dashboard over pipeline outputs (webapp/)
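
The two indices named above have standard definitions, sketched here for a single pixel. This is an illustration only: the function names and the sample reflectances are hypothetical, and the pipeline's own raster implementation may differ (for example in how it masks no-data pixels).

```python
def gcc(red: float, green: float, blue: float) -> float:
    """Green Chromatic Coordinate: green as a share of total visible signal."""
    total = red + green + blue
    if total == 0:
        return float("nan")  # undefined for a completely dark pixel
    return green / total


def ndvi(nir: float, red: float) -> float:
    """Normalised Difference Vegetation Index."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both bands are zero
    return (nir - red) / total


# Hypothetical reflectances in the band order [blue, green, red, nir]
blue, green, red, nir = 0.04, 0.08, 0.05, 0.40
print(f"GCC  = {gcc(red, green, blue):.3f}")
print(f"NDVI = {ndvi(nir, red):.3f}")
```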

Installation

pip install -r requirements.txt
pip install git+https://github.com/DHI-GRAS/efast.git   # not on PyPI

Create .env with Copernicus Data Space credentials:

  • CDSE_USER
  • CDSE_PASSWORD
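
A minimal sketch of checking the credentials before a download run. It assumes .env holds plain KEY=VALUE lines; `load_env` and `require_cdse_credentials` are hypothetical helpers, and the pipeline itself may load the file differently (for example with python-dotenv).

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return values
    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_cdse_credentials(path: str = ".env") -> tuple[str, str]:
    """Return (user, password); real environment variables win over .env."""
    merged = {**load_env(path), **os.environ}
    missing = [k for k in ("CDSE_USER", "CDSE_PASSWORD") if not merged.get(k)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return merged["CDSE_USER"], merged["CDSE_PASSWORD"]
```

Calling `require_cdse_credentials()` at the start of a run fails early with a clear message instead of partway through an acquisition.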

The Python version is pinned in .python-version; use a local virtual environment in .venv/.

Usage

from run import run_pipeline

run_pipeline(season=2024, site_position=(47.116171, 11.320308), site_name="innsbruck")

site_position is always (lat, lon). Study sites are listed at the bottom of run.py: innsbruck, forthgr, pitsalu, vindeln2, sunflowerjerez1, institutekarnobat.
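
Because GeoJSON stores point coordinates as [lon, lat], the order has to be flipped when a site is taken from a GeoJSON file. The sketch below shows that conversion with a basic range check; both helper names are hypothetical and not part of run.py. Note that the check only catches a swapped pair when the value in the latitude slot exceeds ±90, so it would not flag a swap for a site such as Innsbruck.

```python
def validate_site_position(site_position):
    """Check that a (lat, lon) pair lies within valid geographic ranges."""
    lat, lon = site_position
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range (swapped with lon?): {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return (float(lat), float(lon))


def site_position_from_geojson(coordinates):
    """Convert GeoJSON Point coordinates [lon, lat, ...] to (lat, lon)."""
    lon, lat = coordinates[0], coordinates[1]
    return validate_site_position((lat, lon))


# The Innsbruck position from the usage example, written in GeoJSON order
print(site_position_from_geojson([11.320308, 47.116171]))
```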

By default, most stages in run.py are commented out, so a run only computes metrics from existing outputs. For a full run, uncomment acquisition → pre-selection → preparation → fusion → post-processing.

Pipeline stages

  1. Download S2, S3, and PhenoCam
  2. Pre-selection (per-sensor NDVI screening → raw/preselection/)
  3. Prepare S2/S3 for each strategy (prepared_{aggressive|nonaggressive}/ and _itb/ variants)
  4. EFAST fusion (BtI reflectance and ItB GCC products)
  5. Post-process crops and timeseries (processed_*_sigma{20,30}/)
  6. Compute metrics vs PhenoCam → metrics.json
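
The eight scenarios per site can be enumerated as a simple product. This sketch assumes the processed directory names follow the pattern shown under Data layout (an `_itb` infix for ItB products); `processed_dir` is a hypothetical helper, not a function in the repository.

```python
from itertools import product

MODES = ("bti", "itb")
STRATEGIES = ("aggressive", "nonaggressive")
SIGMAS = (20, 30)


def processed_dir(strategy: str, mode: str, sigma: int) -> str:
    """Build the processed directory name for one scenario."""
    suffix = "_itb" if mode == "itb" else ""
    return f"processed_{strategy}{suffix}_sigma{sigma}"


# 2 modes x 2 strategies x 2 sigma values = 8 scenarios
scenarios = [
    (mode, strategy, sigma, processed_dir(strategy, mode, sigma))
    for mode, strategy, sigma in product(MODES, STRATEGIES, SIGMAS)
]

for mode, strategy, sigma, directory in scenarios:
    print(f"{mode:>3} {strategy:<13} sigma={sigma} -> {directory}/")
```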

Gap validation (optional)

With prepared data and EFAST installed:

# Phenology sidecars (TIMESAT 50 % amplitude)
python -m phenology_timesat --all

# Spatial NSE_S2 vs withheld S2 (unit test: Estonia peatland, 30 d, green-up)
python -m gap_validation.run --site pitsalu --season 2024 --lat 58.5633 --lon 24.3688 \
  --strategy aggressive --sigma 20 --mode bti --transition green_up --gap-days 30

# All six sites, best BtI scenario per site
python -m gap_validation.batch_spatial

# Full-season NSE_PC on gap-degraded stack (slow)
python -m gap_validation.temporal_pc --site pitsalu --season 2024 --lat 58.5633 --lon 24.3688
python -m gap_validation.batch_temporal

# TIMESAT day-offsets on gap fusion vs PhenoCam (needs temporal tier)
python -m gap_validation.phenology_offsets

These commands write gap_manifest.json, gap_withheld_images.json, gap_validation_summary.json (spatial), and optionally gap_metrics.json (temporal). Masked fusion outputs are stored under validation/fusion/gap_{N}_{transition}/. Run python -m gap_validation.run --help for all options.
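
For orientation, here is a sketch of the score behind NSE_S2 and NSE_PC, assuming NSE stands for the Nash-Sutcliffe efficiency (this README does not expand the abbreviation). The function name and sample values are hypothetical; the gap_validation modules may pair, mask, or aggregate samples differently.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SS_res / SS_tot.

    1.0 is a perfect match; 0.0 is no better than predicting the
    mean of the observations; negative values are worse than that.
    """
    # Keep only dates where both series have a value
    pairs = [
        (o, s) for o, s in zip(observed, simulated)
        if o is not None and s is not None
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two paired samples")
    mean_obs = sum(o for o, _ in pairs) / len(pairs)
    ss_res = sum((o - s) ** 2 for o, s in pairs)
    ss_tot = sum((o - mean_obs) ** 2 for o, _ in pairs)
    if ss_tot == 0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical GCC values: reference observations vs fused estimates
reference = [0.34, 0.36, 0.40, None, 0.42]
fused = [0.35, 0.36, 0.39, 0.41, 0.43]
print(f"NSE = {nash_sutcliffe(reference, fused):.3f}")
```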

Data layout

data/{site_name}/{season}/
  raw/
    s2/                    # {YYYYMMDD}_{n}.geotiff — B02, B03, B04, B8A
    s3/                    # {YYYYMMDD}_{n}.geotiff — Oa04, Oa06, Oa08, Oa17
    phenocam/              # JPEGs, GCC JSON, phenology sidecar
    preselection/          # {s2,s3}_preselection.{json,csv}
  prepared_{strategy}/
    s2/                    # REFL + DIST_CLOUD GeoTIFFs
    s3/                    # composite_{YYYYMMDD}.tif
    fusion/                # REFL_{YYYYMMDD}.tif (σ=20)
    fusion_sigma30/        # REFL (σ=30)
  prepared_{strategy}_itb/
    s2/  s3/  fusion/      # GCC products (Index-then-Blend)
  processed_{strategy}_sigma{20,30}/
    s2/  s3/  fusion/      # cropped {YYYYMMDD}_0.geotiff
    gcc/  ndvi/            # timeseries.json per source
  processed_{strategy}_itb_sigma{20,30}/
    s2/  s3/  fusion/  gcc/
  validation/            # gap experiment (when run)
  metrics.json

Site metadata: data/sites.geojson (six thesis sites). data/coweeta/ is local/legacy and not listed there.

File formats

Sentinel-2 — Multi-band GeoTIFF; bands [blue, green, red, nir]; VIEWING_ZENITH_ANGLE metadata; filename {YYYYMMDD}_{increment}.geotiff.

Sentinel-3 — Multi-band GeoTIFF; same band order; filename {YYYYMMDD}_{increment}.geotiff.

Prepared S2: S2A_MSIL2A_{YYYYMMDD}_REFL.tif plus *DIST_CLOUD.tif (cloud-distance weights for EFAST).
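
The raw filename convention {YYYYMMDD}_{increment}.geotiff can be parsed as sketched below. `parse_raw_name` is a hypothetical helper for illustration and is not taken from the repository.

```python
import re
from datetime import date, datetime

# Matches names such as 20240517_0.geotiff
RAW_NAME = re.compile(r"^(?P<day>\d{8})_(?P<increment>\d+)\.geotiff$")


def parse_raw_name(filename: str) -> tuple[date, int]:
    """Split a raw S2/S3 filename into acquisition date and increment."""
    match = RAW_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected raw filename: {filename!r}")
    # strptime also rejects impossible dates such as 20241340
    day = datetime.strptime(match["day"], "%Y%m%d").date()
    return day, int(match["increment"])


print(parse_raw_name("20240517_0.geotiff"))
```

Sorting by the returned tuple orders images by date first and by increment within a day.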

Web viewer

Static HTML/JS in webapp/ — no build step. Shared GeoTIFF helpers: webapp/common.js. CDN: Leaflet, geotiff.js, proj4. Symlink: webapp/data → ../data.

Serve from the repository root (not webapp/):

python3 -m http.server 8000
# http://localhost:8000/webapp/index.html

Or from the workspace root: make serve.

| Page | Purpose | Primary data paths |
|---|---|---|
| index.html | Post-processed maps, NDVI/GCC timeseries, PhenoCam | processed_{strategy}_sigma{n}/, raw/phenocam/ |
| preselection.html | Cloud-screening diagnostics | raw/preselection/{s2,s3}_preselection.json |
| prepared.html | Prepared REFL/GCC before crop | prepared_{strategy}/, prepared_{strategy}_itb/ |
| fusion.html | EFAST daily fusion rasters | prepared_*/fusion/, fusion_sigma30/ |
| postprocessed.html | Cropped processed stacks | processed_*_sigma*/ |
| metrics.html | Tabular metrics.json (thesis export source) | {site}/{season}/metrics.json under webapp/data/ |
| gap_validation.html | Withheld-S2 gap experiment | {site}/{season}/validation/gap_validation_summary.json |
| phenology.html | TIMESAT on PhenoCam GCC | raw/phenocam/phenocam_phenology.json |

Site/season dropdowns are populated from data/sites.geojson. Map pages switch between BtI and ItB, and between scenarios (aggressive / nonaggressive, σ 20 / 30). Keep the shared navigation consistent across all eight pages. The viewer is for QA only; thesis tables are exported from the workspace root (make export or ../scripts/export_thesis_tables.py).

Development

ruff check --fix . && ruff format .

Pre-commit hooks: .pre-commit-config.yaml.

License

GNU Affero General Public License v3.0 (AGPL-3.0). See LICENSE.