efast-phenocam-validation/AGENTS.md
2026-06-11 16:03:12 +02:00

AGENTS.md

Worldwide PhenoCam EFAST feasibility screening. Human summary: README.md. License: MIT (LICENSE)


Layout

| Path | Role |
| --- | --- |
| 1-phenocam.py | Step 1: download PhenoCam metadata + one_day_summary CSV |
| 2-phenocam-screening.py | Step 2: PhenoCam + SNR gates on cached CSVs |
| 3-sentinel-data.py | Step 3: S2 (Earth Search COG) + S3 (CDSE OpenEO) download + EFAST prep |
| 4-fusion.py | Step 4: GCC computation + EFAST BtI/ItB fusion loop |
| 5-metrics.py | Step 5: timeseries, covariates, metrics.json, webapp manifest |
| data/ | Manifests, per-site caches, screening outputs (large; mostly generated) |
| index.html, common.js | Static QA viewer (make serve from workspace root) |

Workspace orchestration: ../AGENTS.md.


Where to work

| Task | Location |
| --- | --- |
| PhenoCam bulk download | 1-phenocam.py |
| GCC/SNR screening on disk | 2-phenocam-screening.py |
| S2/S3 download + EFAST prep | 3-sentinel-data.py |
| GCC + fusion | 4-fusion.py |
| Metrics + webapp index | 5-metrics.py |
| Web QA | ../Makefile target serve → index.html |

Setup

Preferred (uv): from processing/:

uv sync                              # all deps from pyproject.toml (incl. efast)

Run any script as uv run python <script>.py …. Python version is pinned in .python-version (3.11.10).

  • CDSE_USER — Copernicus Data Space username
  • CDSE_PASSWORD — Copernicus Data Space password

Set both in ../.env at the workspace root (not under processing/). They are required for the step 3 S3 download (CDSE OpenEO); the step 3 S2 download uses AWS Earth Search and needs no authentication.
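A pre-flight check for these credentials can be sketched as follows. This is a minimal illustration, not the loader the step scripts use: the helper names `load_env_file` and `missing_cdse_vars` are hypothetical, and the parser assumes plain `KEY=VALUE` lines.

```python
import os
from pathlib import Path

# Variable names come from the list above; everything else is illustrative.
REQUIRED_VARS = ("CDSE_USER", "CDSE_PASSWORD")


def load_env_file(env_path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_cdse_vars(env_path: Path) -> list[str]:
    """Return required names set neither in the process environment nor in the file."""
    merged = {**load_env_file(env_path), **os.environ}
    return [name for name in REQUIRED_VARS if not merged.get(name)]
```

Calling `missing_cdse_vars(Path("../.env"))` from processing/ before step 3 would report which credentials still need to be set, so a missing password surfaces before any download starts.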


CLI convention

Every numbered step script shares two user-facing flags:

| Flag | Default | Role |
| --- | --- | --- |
| --evaluation-year | 2025 | Calendar year; input/output paths under data/ use {year} |
| --site | all eligible | Single sitename to limit scope (testing or single-site runs) |

All other tunable parameters (bands, resolution ratio, compositing window, etc.) are public constants at the top of each script. Paths are derived from the year — do not pass manifest paths on the CLI. Each script docstring lists Inputs and Outputs under data/.
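The convention above could look roughly like this in a step script. It is a sketch under assumptions: `build_parser`, `screening_manifest_path`, `site_sentinel_dir`, `DATA_DIR`, and `DEFAULT_EVALUATION_YEAR` are hypothetical names, and only the two flags and the data/ path shapes come from this document.

```python
import argparse
from pathlib import Path

DATA_DIR = Path("data")          # assumed to sit next to the step scripts
DEFAULT_EVALUATION_YEAR = 2025   # default documented in the flag table


def build_parser(description: str) -> argparse.ArgumentParser:
    """Create the two user-facing flags shared by every numbered step."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "--evaluation-year",
        type=int,
        default=DEFAULT_EVALUATION_YEAR,
        help="Calendar year; paths under data/ are derived from it.",
    )
    parser.add_argument(
        "--site",
        default=None,
        help="Single sitename; omit to process all eligible sites.",
    )
    return parser


def screening_manifest_path(year: int) -> Path:
    """Step-2 results: data/phenocam_screening/{year}.json."""
    return DATA_DIR / "phenocam_screening" / f"{year}.json"


def site_sentinel_dir(year: int, sitename: str) -> Path:
    """Step-3 per-site directory: data/sentinel_data/{year}/{sitename}/."""
    return DATA_DIR / "sentinel_data" / str(year) / sitename
```

Because every path is a function of the year (and optionally the site), no manifest path ever has to be passed on the command line.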

Resume behaviour: step 3 skips S3 sites when raw/s3/S3*.tif already exist; step 3 skips S2 scenes when *_REFL.tif already exists. Step 4 skips GCC/fusion files that already exist. Step 5 overwrites JSON sidecars for processed sites.
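The step 3 skip rules amount to simple file-existence checks. The sketch below illustrates them under assumptions: the function names are hypothetical, and `scene_stem` stands in for whatever prefix the real script puts before `_REFL.tif`.

```python
from pathlib import Path


def s3_already_downloaded(site_dir: Path) -> bool:
    """True when raw/s3/ under the site directory holds at least one S3*.tif."""
    raw_s3 = site_dir / "raw" / "s3"
    return raw_s3.is_dir() and any(raw_s3.glob("S3*.tif"))


def s2_scene_prepared(site_dir: Path, scene_stem: str) -> bool:
    """True when prepared/s2/{scene_stem}_REFL.tif exists.

    scene_stem is a hypothetical placeholder for the scene file prefix.
    """
    return (site_dir / "prepared" / "s2" / f"{scene_stem}_REFL.tif").is_file()
```

With checks like these, deleting a site's raw/s3/ directory or a single `*_REFL.tif` is enough to force that part of step 3 to run again.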

Example:

uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1

Workflow

Stepped pipeline (resumable)

uv run python 1-phenocam.py --evaluation-year 2025
uv run python 2-phenocam-screening.py --evaluation-year 2025
uv run python 3-sentinel-data.py --evaluation-year 2025
uv run python 4-fusion.py --evaluation-year 2025
uv run python 5-metrics.py --evaluation-year 2025

# single site
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1

S3 uses CDSE OpenEO collection SENTINEL3_SYN_L2_SYN (bands Oa04/Oa06/Oa08/Oa17). S2 uses AWS Earth Search COG range reads (no auth). No S2↔S3 radiometric harmonisation.


Screening gates

Step 2 (2-phenocam-screening.py)

| Gate | Rule |
| --- | --- |
| phenocam | ROI + one_day_summary CSV; ≥ MIN_GCC_POINTS (30) valid gcc_90 in evaluation year |
| snr | AIC-selected cubic spline SNR ≥ SNR_THRESHOLD (2.0) |
| cluster | SNR-passed sites within 500 m deduplicated; keep highest n_gcc_points (SNR tie-break) |
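The threshold checks and the cluster rule can be sketched as below. This is an illustration, not the code in 2-phenocam-screening.py: it takes the SNR value as already computed (the spline fit is not shown), the dict keys `snr`, `lat`, and `lon` and the constant `CLUSTER_RADIUS_M` are hypothetical, and the greedy best-first pass is one assumed way to deduplicate within 500 m.

```python
import math

MIN_GCC_POINTS = 30        # from the phenocam gate
SNR_THRESHOLD = 2.0        # from the snr gate
CLUSTER_RADIUS_M = 500.0   # hypothetical constant name for the 500 m rule
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def passes_gates(site: dict) -> bool:
    """Apply the point-count and SNR thresholds to one site record."""
    return site["n_gcc_points"] >= MIN_GCC_POINTS and site["snr"] >= SNR_THRESHOLD


def deduplicate(sites: list[dict]) -> list[dict]:
    """Greedy pass: visit the best sites first, drop any site near a kept one."""
    ranked = sorted(sites, key=lambda s: (s["n_gcc_points"], s["snr"]), reverse=True)
    kept: list[dict] = []
    for site in ranked:
        if all(
            haversine_m(site["lat"], site["lon"], k["lat"], k["lon"]) > CLUSTER_RADIUS_M
            for k in kept
        ):
            kept.append(site)
    return kept
```

Sorting on the tuple `(n_gcc_points, snr)` gives the documented ordering: the highest point count wins, and SNR only decides ties.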

Data layout

Naming: data/ paths follow the step script names (1-phenocam.py → phenocam/, 2-phenocam-screening.py → phenocam_screening/, 3-sentinel-data.py → sentinel_data/, 4-fusion.py → fusion/, 5-metrics.py → metrics/).

data/
  phenocam/
    {year}.json                           # step-1 manifest
    {year}/
      {sitename}.json                     # camera + ROI API payload
      {sitename}_1day.csv                 # raw PhenoCam summary CSV
  phenocam_screening/
    {year}.json                           # step-2 results
    {year}.csv
  sentinel_data/{year}/{sitename}/
    raw/s3/                               # step 3: S3 SYN L2 per-date GeoTIFFs
    prepared/s2/                          # step 3: *_REFL.tif, *_DIST_CLOUD.tif, *_GCC.tif
    prepared/s3/                          # step 3: composite_*.tif
    prepared/gcc_s3/                      # step 4: single-band GCC composites
    data.json                             # step-3 run summary
  fusion/{year}/{sitename}/
    bti/fusion/REFL_*.tif                 # step 4: BtI fused reflectance
    bti/gcc/GCC_*.tif                     # step 4: BtI GCC
    itb/s2/GCC_*.tif                      # step 4: S2 GCC (ItB stack)
    itb/s3/GCC_*.tif                      # step 4: S3 GCC (ItB stack)
    itb/fusion/GCC_*.tif                  # step 4: ItB fused GCC
  metrics/
    manifest.json                         # step 5: years + site metadata for webapp
    {year}/{sitename}/
      gcc_*.json, metrics.json, covariates.json, rasters_*.json, bands_*.json
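The naming rule behind this tree is mechanical enough to express in a few lines. The helpers below are hypothetical (the scripts may hard-code their directories instead); they show how a script name maps to its data/ subdirectory and to the per-site layout used by steps 3 to 5.

```python
from pathlib import Path


def data_subdir_for(script_name: str) -> str:
    """Map a step script name to its data/ subdirectory name.

    "2-phenocam-screening.py" becomes "phenocam_screening".
    """
    stem = Path(script_name).stem        # drop the .py suffix
    _, _, name = stem.partition("-")     # drop the leading step number
    return name.replace("-", "_")


def site_output_dir(
    script_name: str, year: int, sitename: str, data_root: Path = Path("data")
) -> Path:
    """Per-site directory in the {step}/{year}/{sitename}/ shape of steps 3 to 5."""
    return data_root / data_subdir_for(script_name) / str(year) / sitename
```

For example, `site_output_dir("4-fusion.py", 2025, "ICOSFR-Fon1")` yields data/fusion/2025/ICOSFR-Fon1, the directory that holds the bti/ and itb/ outputs.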

Module map

| File | Responsibility |
| --- | --- |
| 1-phenocam.py | Paginate PhenoCam API; cache JSON + CSV; write manifest |
| 2-phenocam-screening.py | Parse cached CSVs; PhenoCam + SNR gates |
| 3-sentinel-data.py | S2 COG range reads (Earth Search); S3 OpenEO download; EFAST REFL/DIST_CLOUD/composites |
| 4-fusion.py | GCC from S2 REFL + S3 composites; daily efast.fusion BtI + ItB |
| 5-metrics.py | PhenoCam-matched GCC series, baselines, fusion metrics, raster index, covariates |
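For orientation, the green chromatic coordinate that step 4 computes is GCC = G / (R + G + B). The per-pixel sketch below uses that standard definition; the function names are hypothetical, and the mapping of the S3 bands (Oa04 blue, Oa06 green, Oa08 red) is an assumption based on their nominal wavelengths, not something confirmed from 4-fusion.py, which works on rasters rather than single pixels.

```python
from typing import Optional


def gcc(red: float, green: float, blue: float) -> Optional[float]:
    """Green chromatic coordinate G / (R + G + B); None when the sum is not positive."""
    total = red + green + blue
    if total <= 0:
        return None
    return green / total


def gcc_from_s3_bands(pixel: dict) -> Optional[float]:
    """GCC for one S3 pixel given as a band-name to reflectance mapping.

    Assumed band roles: Oa04 = blue, Oa06 = green, Oa08 = red.
    """
    return gcc(red=pixel["Oa08"], green=pixel["Oa06"], blue=pixel["Oa04"])
```

Returning None for a non-positive band sum keeps nodata or fully masked pixels from producing a division error or a misleading value.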