AGENTS.md
Worldwide PhenoCam EFAST feasibility screening. Human summary: README.md. License: MIT (LICENSE)
Layout
| Path | Role |
|---|---|
| 1-phenocam.py | Step 1: download PhenoCam metadata + one_day_summary CSV |
| 2-phenocam-screening.py | Step 2: PhenoCam + SNR gates on cached CSVs |
| 3-sentinel-data.py | Step 3: S2 (Earth Search COG) + S3 (CDSE OpenEO) download + EFAST prep |
| 4-fusion.py | Step 4: GCC computation + EFAST BtI/ItB fusion loop |
| 5-metrics.py | Step 5: timeseries, covariates, metrics.json, webapp manifest |
| data/ | Manifests, per-site caches, screening outputs (large; mostly generated) |
| index.html, common.js | Static QA viewer (make serve from workspace root) |
Workspace orchestration: ../AGENTS.md.
Where to work
| Task | Location |
|---|---|
| PhenoCam bulk download | 1-phenocam.py |
| GCC/SNR screening on disk | 2-phenocam-screening.py |
| S2/S3 download + EFAST prep | 3-sentinel-data.py |
| GCC + fusion | 4-fusion.py |
| Metrics + webapp index | 5-metrics.py |
| Web QA | ../Makefile target serve → index.html |
Setup
Preferred (uv): from processing/:
uv sync # all deps from pyproject.toml (incl. efast)
Run any script as uv run python <script>.py …. Python version is pinned in .python-version (3.11.10).
- CDSE_USER: Copernicus Data Space username
- CDSE_PASSWORD: Copernicus Data Space password
Set in ../.env at the workspace root (not under processing/). Required for step 3 S3 download (CDSE OpenEO). Step 3 S2 download uses AWS Earth Search (no auth).
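A minimal sketch of a pre-flight credential check before a step 3 S3 download. The helper names `load_env_file` and `cdse_credentials` are hypothetical, and the hand-rolled KEY=VALUE parser is an assumption; the actual scripts may load ../.env differently (for example through a dotenv library).

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def cdse_credentials(env_path: Path = Path("../.env")) -> tuple[str, str]:
    """Return (user, password); real environment variables win over the file."""
    merged = {**load_env_file(env_path), **os.environ}
    missing = [k for k in ("CDSE_USER", "CDSE_PASSWORD") if not merged.get(k)]
    if missing:
        raise RuntimeError(f"Missing in {env_path} or environment: {', '.join(missing)}")
    return merged["CDSE_USER"], merged["CDSE_PASSWORD"]
```

Failing early here avoids starting the S2 download (which needs no auth) only to stop later at the S3 stage.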
CLI convention
Every numbered step script shares two user-facing flags:
| Flag | Default | Role |
|---|---|---|
| --evaluation-year | 2025 | Calendar year; input/output paths under data/ use {year} |
| --site | all eligible | Single sitename to limit scope (testing or single-site runs) |
All other tunable parameters (bands, resolution ratio, compositing window, etc.) are public constants at the top of each script. Paths are derived from the year — do not pass manifest paths on the CLI. Each script docstring lists Inputs and Outputs under data/.
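The convention above could look roughly like this in a step script. This is a sketch, not the scripts' actual code: `build_parser`, `screening_manifest` and the `DATA_DIR` constant are hypothetical names; only the two flags, their defaults and the data/ path pattern come from this file.

```python
import argparse
from pathlib import Path

DATA_DIR = Path("data")  # assumed root; the real scripts may resolve it differently


def build_parser(description: str) -> argparse.ArgumentParser:
    """Parser exposing only the two shared, user-facing flags."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument("--evaluation-year", type=int, default=2025,
                        help="Calendar year; paths under data/ use it")
    parser.add_argument("--site", default=None,
                        help="Single sitename; omit to process all eligible sites")
    return parser


def screening_manifest(year: int) -> Path:
    """Derive the step-2 results path from the year instead of a CLI argument."""
    return DATA_DIR / "phenocam_screening" / f"{year}.json"


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser("step sketch").parse_args(["--site", "ICOSFR-Fon1"])
print(screening_manifest(args.evaluation_year), args.site)
```

Deriving every path from the year keeps the five steps consistent with each other without passing manifests around.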
Resume behaviour: step 3 skips S3 sites when raw/s3/S3*.tif already exist; step 3 skips S2 scenes when *_REFL.tif already exists. Step 4 skips GCC/fusion files that already exist. Step 5 overwrites JSON sidecars for processed sites.
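The step 3 skip rules can be pictured as plain existence checks. A sketch under assumptions: the function names are hypothetical, `site_dir` stands for data/sentinel_data/{year}/{sitename}/, and the `{scene_id}_REFL.tif` file name is only one possible match for the *_REFL.tif pattern.

```python
from pathlib import Path


def s3_already_downloaded(site_dir: Path) -> bool:
    """True when raw/s3/ holds at least one S3*.tif, so the site is skipped."""
    raw_s3 = site_dir / "raw" / "s3"
    return raw_s3.is_dir() and any(raw_s3.glob("S3*.tif"))


def s2_scene_prepared(site_dir: Path, scene_id: str) -> bool:
    """True when the prepared reflectance file for this scene already exists."""
    # Assumed naming: <scene_id>_REFL.tif under prepared/s2/.
    return (site_dir / "prepared" / "s2" / f"{scene_id}_REFL.tif").exists()
```

Because the checks look only at file presence, a partially written file would also be treated as done; deleting the affected outputs forces a re-run.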
Example:
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
Workflow
Stepped pipeline (resumable)
uv run python 1-phenocam.py --evaluation-year 2025
uv run python 2-phenocam-screening.py --evaluation-year 2025
uv run python 3-sentinel-data.py --evaluation-year 2025
uv run python 4-fusion.py --evaluation-year 2025
uv run python 5-metrics.py --evaluation-year 2025
# single site
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
S3 uses CDSE OpenEO collection SENTINEL3_SYN_L2_SYN (bands Oa04/Oa06/Oa08/Oa17). S2 uses AWS Earth Search COG range reads (no auth). No S2↔S3 radiometric harmonisation.
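For orientation, a sketch of how a green chromatic coordinate (GCC = G / (R + G + B)) could be formed from the S3 bands listed above. The band roles are an assumption not stated in this file (Oa04 near 490 nm as blue, Oa06 near 560 nm as green, Oa08 near 665 nm as red), and the function names are hypothetical; 4-fusion.py remains the authority on what step 4 actually computes.

```python
import math

# Assumed OLCI band roles (not confirmed by this file).
BLUE, GREEN, RED = "Oa04", "Oa06", "Oa08"


def gcc(red: float, green: float, blue: float) -> float:
    """Green chromatic coordinate; NaN when the band sum is not usable."""
    total = red + green + blue
    if not math.isfinite(total) or total <= 0:
        return math.nan
    return green / total


def gcc_from_bands(pixel: dict[str, float]) -> float:
    """Compute GCC for one pixel given a band-name to reflectance mapping."""
    return gcc(pixel[RED], pixel[GREEN], pixel[BLUE])
```

Since no S2↔S3 radiometric harmonisation is applied, GCC values from the two sensors are not guaranteed to be on the same scale.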
Screening gates
Step 2 (2-phenocam-screening.py)
| Gate | Rule |
|---|---|
| phenocam | ROI + one_day_summary CSV; ≥ MIN_GCC_POINTS (30) valid gcc_90 in evaluation year |
| snr | AIC-selected cubic spline SNR ≥ SNR_THRESHOLD (2.0) |
| cluster | SNR-passed sites within 500 m deduplicated; keep highest n_gcc_points (SNR tie-break) |
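A sketch of the phenocam and cluster gates, leaving the spline-based SNR gate out. Assumptions: the CSV has `date` and `gcc_90` columns with `#` comment lines, site records carry hypothetical `lat`, `lon` and `snr` keys, and deduplication is done greedily in rank order; the real script may cluster differently.

```python
import csv
import math

MIN_GCC_POINTS = 30        # from the gate table
CLUSTER_RADIUS_M = 500.0   # from the gate table


def count_valid_gcc(csv_text: str, year: int) -> int:
    """Count rows dated in `year` whose gcc_90 parses to a finite number."""
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]
    count = 0
    for row in csv.DictReader(lines):
        if not (row.get("date") or "").startswith(f"{year}-"):
            continue
        try:
            value = float(row.get("gcc_90") or "nan")
        except ValueError:  # e.g. "NA" placeholders
            continue
        if math.isfinite(value):
            count += 1
    return count


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def deduplicate(sites: list[dict]) -> list[dict]:
    """Keep the best-ranked site of every group closer than CLUSTER_RADIUS_M."""
    ranked = sorted(sites, key=lambda s: (s["n_gcc_points"], s["snr"]), reverse=True)
    kept: list[dict] = []
    for site in ranked:
        if all(haversine_m(site["lat"], site["lon"], k["lat"], k["lon"]) > CLUSTER_RADIUS_M
               for k in kept):
            kept.append(site)
    return kept
```

A site would pass the phenocam gate when `count_valid_gcc(...) >= MIN_GCC_POINTS`; sorting by (n_gcc_points, snr) makes SNR matter only when point counts tie.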
Data layout
Naming: data/ paths follow step script names — 1-phenocam.py → phenocam/, 2-phenocam-screening.py → phenocam_screening/, 3-sentinel-data.py → sentinel_data/, 4-fusion.py → fusion/, 5-metrics.py → metrics/.
data/
  phenocam/
    {year}.json                 # step-1 manifest
    {year}/
      {sitename}.json           # camera + ROI API payload
      {sitename}_1day.csv       # raw PhenoCam summary CSV
  phenocam_screening/
    {year}.json                 # step-2 results
    {year}.csv
  sentinel_data/{year}/{sitename}/
    raw/s3/                     # step 3: S3 SYN L2 per-date GeoTIFFs
    prepared/s2/                # step 3: *_REFL.tif, *_DIST_CLOUD.tif, *_GCC.tif
    prepared/s3/                # step 3: composite_*.tif
    prepared/gcc_s3/            # step 4: single-band GCC composites
    data.json                   # step-3 run summary
  fusion/{year}/{sitename}/
    bti/fusion/REFL_*.tif       # step 4: BtI fused reflectance
    bti/gcc/GCC_*.tif           # step 4: BtI GCC
    itb/s2/GCC_*.tif            # step 4: S2 GCC (ItB stack)
    itb/s3/GCC_*.tif            # step 4: S3 GCC (ItB stack)
    itb/fusion/GCC_*.tif        # step 4: ItB fused GCC
  metrics/
    manifest.json               # step 5: years + site metadata for webapp
    {year}/{sitename}/
      gcc_*.json, metrics.json, covariates.json, rasters_*.json, bands_*.json
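The naming rule for data/ directories (script name without its numeric prefix, hyphens turned into underscores) can be expressed as a small helper. `data_dir_for` is a hypothetical name used for illustration; the scripts may simply hard-code their directory.

```python
import re


def data_dir_for(script_name: str) -> str:
    """Map a step script name to its data/ subdirectory name."""
    stem = script_name.removesuffix(".py")   # "3-sentinel-data"
    stem = re.sub(r"^\d+-", "", stem)        # drop the step prefix
    return stem.replace("-", "_")            # hyphens become underscores


print(data_dir_for("3-sentinel-data.py"))
```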
Module map
| File | Responsibility |
|---|---|
| 1-phenocam.py | Paginate PhenoCam API; cache JSON + CSV; write manifest |
| 2-phenocam-screening.py | Parse cached CSVs; PhenoCam + SNR gates |
| 3-sentinel-data.py | S2 COG range reads (Earth Search); S3 OpenEO download; EFAST REFL/DIST_CLOUD/composites |
| 4-fusion.py | GCC from S2 REFL + S3 composites; daily efast.fusion BtI + ItB |
| 5-metrics.py | PhenoCam-matched GCC series, baselines, fusion metrics, raster index, covariates |