2026-06-10 14:18:06 +02:00


# AGENTS.md
Worldwide PhenoCam EFAST feasibility screening. Human summary: [`README.md`](README.md).

---
## Layout
| Path | Role |
|------|------|
| `1-phenocam.py` | Step 1: download PhenoCam metadata + `one_day_summary` CSV |
| `2-phenocam-screening.py` | Step 2: PhenoCam + SNR gates on cached CSVs |
| `3-sentinel-data.py` | Step 3: S2 (Earth Search COG) + S3 (CDSE OpenEO) download + EFAST prep |
| `4-fusion.py` | Step 4: GCC computation + EFAST BtI/ItB fusion loop |
| `5-metrics.py` | Step 5: timeseries, covariates, `metrics.json`, webapp manifest |
| `data/` | Manifests, per-site caches, screening outputs (large; mostly generated) |
| `webapp/` | Static QA viewer (`make serve` from workspace root) |

Workspace orchestration: [`../AGENTS.md`](../AGENTS.md).

---
## Where to work
| Task | Location |
|------|----------|
| PhenoCam bulk download | `1-phenocam.py` |
| GCC/SNR screening on disk | `2-phenocam-screening.py` |
| S2/S3 download + EFAST prep | `3-sentinel-data.py` |
| GCC + fusion | `4-fusion.py` |
| Metrics + webapp index | `5-metrics.py` |
| Web QA | `../Makefile` target `serve` → `webapp/index.html` |
---
## Setup
**Preferred (uv):** from `processing/`:
```bash
uv sync # all deps from pyproject.toml (incl. efast)
```
Run any script as `uv run python <script>.py …`. The Python version is pinned in `.python-version` (3.11.10).

The step 3 S3 download (CDSE OpenEO) requires two environment variables; the step 3 S2 download uses AWS Earth Search and needs no authentication:

- `CDSE_USER`: Copernicus Data Space username
- `CDSE_PASSWORD`: Copernicus Data Space password
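
A small pre-flight check can catch missing credentials before a long step 3 run starts. This is a hypothetical helper, not code from `3-sentinel-data.py`: only the two variable names come from this document, and how the script actually reads them is an assumption.

```python
import os
from collections.abc import Mapping

# Hypothetical pre-flight helper (assumption: not taken from 3-sentinel-data.py).
# Only the variable names below come from AGENTS.md.
REQUIRED_CDSE_VARS = ("CDSE_USER", "CDSE_PASSWORD")


def missing_cdse_credentials(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the CDSE variables that are unset or empty in `environ`."""
    return [name for name in REQUIRED_CDSE_VARS if not environ.get(name)]


def require_cdse_credentials(environ: Mapping[str, str] = os.environ) -> None:
    """Exit early with the missing names instead of failing mid-download."""
    missing = missing_cdse_credentials(environ)
    if missing:
        raise SystemExit("Step 3 S3 download needs: " + ", ".join(missing))
```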
---
## CLI convention
Every numbered step script shares two user-facing flags:
| Flag | Default | Role |
|------|---------|------|
| `--evaluation-year` | `2025` | Calendar year; input/output paths under `data/` use `{year}` |
| `--site` | all eligible | Single sitename to limit scope (testing or single-site runs) |
All other tunable parameters (bands, resolution ratio, compositing window, etc.) are public constants at the top of each script. Paths are derived from the year, so do not pass manifest paths on the CLI. Each script's docstring lists its **Inputs** and **Outputs** under `data/`.

**Resume behaviour:** step 3 skips the S3 download for a site when `raw/s3/S3*.tif` files already exist, and skips an S2 scene when its `*_REFL.tif` already exists. Step 4 skips GCC and fusion files that already exist. Step 5 overwrites the JSON sidecars of every site it processes.

Example:
```bash
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
```
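
The flag and resume conventions above can be sketched with `argparse` and `pathlib`. The helper names (`build_parser`, `should_skip_s3`, `should_skip_s2_scene`) are hypothetical, and the sketch assumes the S3 check receives a site directory laid out as `sentinel_data/{year}/{sitename}/`; the real scripts may structure this differently.

```python
import argparse
from pathlib import Path

# Hypothetical sketch of the shared CLI and resume conventions; the names are
# illustrative assumptions, not taken from the step scripts.


def build_parser(description: str) -> argparse.ArgumentParser:
    """The two user-facing flags shared by every numbered step script."""
    parser = argparse.ArgumentParser(description=description)
    parser.add_argument(
        "--evaluation-year", type=int, default=2025,
        help="calendar year; data/ input and output paths use {year}",
    )
    parser.add_argument(
        "--site", default=None,
        help="single sitename to limit scope (default: all eligible sites)",
    )
    return parser


def should_skip_s3(site_dir: Path) -> bool:
    """Step 3 resume rule: skip the S3 download if raw/s3/S3*.tif files exist."""
    s3_dir = site_dir / "raw" / "s3"
    return s3_dir.is_dir() and any(s3_dir.glob("S3*.tif"))


def should_skip_s2_scene(refl_path: Path) -> bool:
    """Step 3 resume rule: skip an S2 scene whose *_REFL.tif already exists."""
    return refl_path.exists()
```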
---
## Workflow
### Stepped pipeline (resumable)
```bash
uv run python 1-phenocam.py --evaluation-year 2025
uv run python 2-phenocam-screening.py --evaluation-year 2025
uv run python 3-sentinel-data.py --evaluation-year 2025
uv run python 4-fusion.py --evaluation-year 2025
uv run python 5-metrics.py --evaluation-year 2025
# single site
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
```
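
Because every step is resumable, a thin driver that runs the commands above in order and stops at the first failure can simply be re-run after fixing the problem. The sketch is hypothetical (not part of the repo): it assumes it is launched from `processing/` and, like the single-site example, starts single-site runs at step 3. `runner` defaults to `subprocess.run` and can be swapped for a stub when testing.

```python
from __future__ import annotations

import subprocess
import sys

# Hypothetical driver (assumption: not part of the repo); run from processing/.
STEPS = (
    "1-phenocam.py",
    "2-phenocam-screening.py",
    "3-sentinel-data.py",
    "4-fusion.py",
    "5-metrics.py",
)


def pipeline_commands(year: int, site: str | None = None) -> list[list[str]]:
    """One `uv run python` command per step, in pipeline order.

    Single-site runs start at step 3, as in the single-site example.
    """
    scripts = STEPS if site is None else STEPS[2:]
    commands = []
    for script in scripts:
        cmd = ["uv", "run", "python", script, "--evaluation-year", str(year)]
        if site is not None:
            cmd += ["--site", site]
        commands.append(cmd)
    return commands


def run_pipeline(year: int, site: str | None = None, runner=subprocess.run) -> None:
    """Run each step in order; check=True raises on the first non-zero exit."""
    for cmd in pipeline_commands(year, site):
        print("+", " ".join(cmd), file=sys.stderr)
        runner(cmd, check=True)
```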
S3 uses the CDSE OpenEO collection `SENTINEL3_SYN_L2_SYN` (bands Oa04/Oa06/Oa08/Oa17). S2 uses AWS Earth Search COG range reads (no auth). No S2↔S3 radiometric harmonisation is applied.

---
## Screening gates
### Step 2 (`2-phenocam-screening.py`)
| Gate | Rule |
|------|------|
| `phenocam` | Site has an ROI and a `one_day_summary` CSV with ≥ `MIN_GCC_POINTS` (30) valid `gcc_90` values in the evaluation year |
| `snr` | SNR from an AIC-selected cubic spline fit ≥ `SNR_THRESHOLD` (2.0) |
| `cluster` | SNR-passing sites within 500 m of each other are deduplicated: keep the one with the highest `n_gcc_points`, breaking ties by SNR |
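
Two of these gates can be sketched in plain Python. The thresholds come from the table; everything else is assumed: that the `one_day_summary` CSV has `#`-prefixed metadata lines, a `date` column in `YYYY-MM-DD` form and `NA` for missing values; that site records carry hypothetical `lat`/`lon`/`snr`/`n_gcc_points` keys; and that deduplication is greedy (rank by `n_gcc_points`, then SNR, and drop any site within 500 m of one already kept). A connected-component clustering could group chains of nearby sites differently, so treat this as illustrative rather than as the script's logic. The SNR gate itself is not sketched.

```python
import csv
import math

# Hypothetical sketches of the `phenocam` count and `cluster` gates. Thresholds
# come from AGENTS.md; the CSV format and greedy clustering are assumptions.
MIN_GCC_POINTS = 30
CLUSTER_RADIUS_M = 500.0


def count_valid_gcc(csv_text: str, year: int) -> int:
    """Count one_day_summary rows in `year` whose gcc_90 is a finite number."""
    # Assumed format: '#' metadata lines, a YYYY-MM-DD `date` column, 'NA' gaps.
    lines = [ln for ln in csv_text.splitlines() if ln and not ln.startswith("#")]
    count = 0
    for row in csv.DictReader(lines):
        if not (row.get("date") or "").startswith(f"{year}-"):
            continue
        try:
            value = float(row.get("gcc_90") or "nan")
        except ValueError:
            continue  # e.g. 'NA'
        if math.isfinite(value):
            count += 1
    return count


def passes_phenocam_count(csv_text: str, year: int) -> bool:
    """The gcc_90 count half of the `phenocam` gate (the ROI check is not shown)."""
    return count_valid_gcc(csv_text, year) >= MIN_GCC_POINTS


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6_371_000.0 * math.asin(min(1.0, math.sqrt(a)))


def dedupe_sites(sites: list[dict]) -> list[dict]:
    """Greedy dedup: rank by n_gcc_points then SNR; drop sites near a kept one."""
    ranked = sorted(sites, key=lambda s: (s["n_gcc_points"], s["snr"]), reverse=True)
    kept: list[dict] = []
    for site in ranked:
        if all(
            haversine_m(site["lat"], site["lon"], k["lat"], k["lon"]) > CLUSTER_RADIUS_M
            for k in kept
        ):
            kept.append(site)
    return kept
```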
---
## Data layout
**Naming:** `data/` subdirectories follow the step script names: `1-phenocam.py` → `phenocam/`, `2-phenocam-screening.py` → `phenocam_screening/`, `3-sentinel-data.py` → `sentinel_data/`, `4-fusion.py` → `fusion/`, `5-metrics.py` → `metrics/`.
```
data/
phenocam/
{year}.json # step-1 manifest
{year}/
{sitename}.json # camera + ROI API payload
{sitename}_1day.csv # raw PhenoCam summary CSV
phenocam_screening/
{year}.json # step-2 results
{year}.csv
sentinel_data/{year}/{sitename}/
raw/s3/ # step 3: S3 SYN L2 per-date GeoTIFFs
prepared/s2/ # step 3: *_REFL.tif, *_DIST_CLOUD.tif, *_GCC.tif
prepared/s3/ # step 3: composite_*.tif
prepared/gcc_s3/ # step 4: single-band GCC composites
data.json # step-3 run summary
fusion/{year}/{sitename}/
bti/fusion/REFL_*.tif # step 4: BtI fused reflectance
bti/gcc/GCC_*.tif # step 4: BtI GCC
itb/s2/GCC_*.tif # step 4: S2 GCC (ItB stack)
itb/s3/GCC_*.tif # step 4: S3 GCC (ItB stack)
itb/fusion/GCC_*.tif # step 4: ItB fused GCC
metrics/
manifest.json # step 5: years + site metadata for webapp
{year}/{sitename}/
gcc_*.json, metrics.json, covariates.json, rasters_*.json, bands_*.json
```
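
A small helper that derives per-site paths from the year and sitename keeps code aligned with this layout and with the rule that paths are never passed on the CLI. `step_paths` and its keys are hypothetical, and resolving `data/` relative to `processing/` is an assumption.

```python
from pathlib import Path

# Hypothetical helper mirroring the data/ layout; assumption: data/ is
# resolved relative to processing/.
DATA = Path("data")


def step_paths(year: int, site: str) -> dict[str, Path]:
    """Per-year, per-site locations for each step's inputs and outputs."""
    y = str(year)
    return {
        "phenocam_manifest": DATA / "phenocam" / f"{y}.json",        # step 1
        "phenocam_api": DATA / "phenocam" / y / f"{site}.json",       # step 1
        "phenocam_csv": DATA / "phenocam" / y / f"{site}_1day.csv",   # step 1
        "screening": DATA / "phenocam_screening" / f"{y}.json",      # step 2
        "sentinel": DATA / "sentinel_data" / y / site,                 # step 3 (+ gcc_s3 from step 4)
        "fusion": DATA / "fusion" / y / site,                          # step 4
        "metrics": DATA / "metrics" / y / site,                        # step 5
        "webapp_manifest": DATA / "metrics" / "manifest.json",         # step 5
    }


# Example: list one site's BtI GCC rasters (empty if step 4 has not run yet).
gcc_dir = step_paths(2025, "ICOSFR-Fon1")["fusion"] / "bti" / "gcc"
bti_gcc = sorted(gcc_dir.glob("GCC_*.tif")) if gcc_dir.is_dir() else []
```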
---
## Module map
| File | Responsibility |
|------|----------------|
| `1-phenocam.py` | Paginate PhenoCam API; cache JSON + CSV; write manifest |
| `2-phenocam-screening.py` | Parse cached CSVs; PhenoCam + SNR gates |
| `3-sentinel-data.py` | S2 COG range reads (Earth Search); S3 OpenEO download; EFAST REFL/DIST_CLOUD/composites |
| `4-fusion.py` | GCC from S2 REFL + S3 composites; daily `efast.fusion` BtI + ItB |
| `5-metrics.py` | PhenoCam-matched GCC series, baselines, fusion metrics, raster index, covariates |
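
The BtI/ItB distinction can be illustrated with toy numbers by reading the `fusion/` layout: BtI fuses reflectance first (`bti/fusion/REFL_*.tif`) and derives GCC afterwards (`bti/gcc/`), while ItB derives GCC per sensor (`itb/s2/`, `itb/s3/`) and fuses the GCC stacks (`itb/fusion/`). The sketch assumes GCC is the usual green chromatic coordinate G / (R + G + B), with blue/green/red taken from S2 B02/B03/B04 and S3 Oa04/Oa06/Oa08 (490/560/665 nm). `blend` is a hypothetical fixed-weight average standing in for `efast.fusion`, whose real interface is not reproduced here.

```python
import numpy as np

# Hypothetical illustration only. Assumptions: GCC = G / (R + G + B), and
# `blend` is a placeholder for efast.fusion (not its real signature).


def gcc(blue, green, red):
    """Green chromatic coordinate; NaN where the band sum is not positive."""
    blue, green, red = (np.asarray(x, dtype=float) for x in (blue, green, red))
    total = blue + green + red
    out = np.full(total.shape, np.nan)
    np.divide(green, total, out=out, where=total > 0)
    return out


def blend(s2, s3, w_s2=0.5):
    """Placeholder fusion: fixed-weight average of two co-registered inputs."""
    return w_s2 * np.asarray(s2, dtype=float) + (1 - w_s2) * np.asarray(s3, dtype=float)


def bti(s2_bands, s3_bands, w_s2=0.5):
    """Fuse each reflectance band first, then derive GCC from the fused bands."""
    fused = [blend(a, b, w_s2) for a, b in zip(s2_bands, s3_bands)]
    return gcc(*fused)


def itb(s2_bands, s3_bands, w_s2=0.5):
    """Derive GCC per sensor first, then fuse the two GCC layers."""
    return blend(gcc(*s2_bands), gcc(*s3_bands), w_s2)
```

With S2 (blue, green, red) = (0.05, 0.10, 0.05) and S3 = (0.10, 0.10, 0.10), ItB gives 5/12 ≈ 0.417 while BtI gives 0.4. GCC is a ratio, so fusing before or after taking it is not interchangeable.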