# AGENTS.md

Worldwide PhenoCam EFAST feasibility screening. Human summary: [`README.md`](README.md). **License:** MIT ([`LICENSE`](LICENSE))

---

## Layout

| Path | Role |
|------|------|
| `1-phenocam.py` | Step 1: download PhenoCam metadata + `one_day_summary` CSV |
| `2-phenocam-screening.py` | Step 2: PhenoCam + SNR gates on cached CSVs |
| `3-sentinel-data.py` | Step 3: S2 (Earth Search COG) + S3 (CDSE OpenEO) download + EFAST prep |
| `4-fusion.py` | Step 4: GCC computation + EFAST BtI/ItB fusion loop |
| `5-metrics.py` | Step 5: timeseries, covariates, `metrics.json`, webapp manifest |
| `data/` | Manifests, per-site caches, screening outputs (large; mostly generated) |
| `index.html`, `common.js` | Static QA viewer (`make serve` from workspace root) |

---

## Where to work

| Task | Location |
|------|----------|
| PhenoCam bulk download | `1-phenocam.py` |
| GCC/SNR screening on disk | `2-phenocam-screening.py` |
| S2/S3 download + EFAST prep | `3-sentinel-data.py` |
| GCC + fusion | `4-fusion.py` |
| Metrics + webapp index | `5-metrics.py` |
| Web QA | `../Makefile` target `serve` → `index.html` |

---

## Setup

**Preferred (uv):** from `processing/`:

```bash
uv sync  # all deps from pyproject.toml (incl. efast)
```

Run any script as `uv run python <script>.py …`. Python version is pinned in `.python-version` (3.11.10).

**Credentials:** set in `../.env` at the workspace root (not under `processing/`):

- `CDSE_USER`: Copernicus Data Space username
- `CDSE_PASSWORD`: Copernicus Data Space password

Both are required for the step 3 S3 download (CDSE OpenEO). The step 3 S2 download uses AWS Earth Search and needs no authentication.

---

## CLI convention

Every numbered step script shares two user-facing flags:

| Flag | Default | Role |
|------|---------|------|
| `--evaluation-year` | `2025` | Calendar year; input/output paths under `data/` use `{year}` |
| `--site` | all eligible | Single sitename to limit scope (testing or single-site runs) |

All other tunable parameters (bands, resolution ratio, compositing window, etc.) are public constants at the top of each script. Paths are derived from the year; do not pass manifest paths on the CLI. Each script docstring lists **Inputs** and **Outputs** under `data/`.

Resume behaviour: step 3 skips S3 sites when `raw/s3/S3*.tif` files already exist and skips S2 scenes when the `*_REFL.tif` already exists. Step 4 skips GCC/fusion files that already exist. Step 5 overwrites JSON sidecars for processed sites.

Example:

```bash
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
```
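The two-flag convention and the year-derived paths described in this section can be sketched as follows. This is a minimal illustration, not the scripts' actual code: `build_parser`, `step1_manifest`, `site_dir`, and `DATA_DIR` are hypothetical names, and only the flag names, the default year, and the `data/` layout come from this document.

```python
import argparse
from pathlib import Path

# Assumption: the script runs from the directory that contains data/.
DATA_DIR = Path("data")


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing only the two shared flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Hypothetical step script")
    parser.add_argument("--evaluation-year", type=int, default=2025,
                        help="Calendar year; selects paths under data/")
    parser.add_argument("--site", default=None,
                        help="Single sitename; omit to process all eligible sites")
    return parser


def step1_manifest(year: int) -> Path:
    """Step-1 manifest path, derived from the year rather than passed on the CLI."""
    return DATA_DIR / "phenocam" / f"{year}.json"


def site_dir(year: int, sitename: str) -> Path:
    """Per-site Sentinel directory for one evaluation year."""
    return DATA_DIR / "sentinel_data" / str(year) / sitename


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--site", "ICOSFR-Fon1"])
manifest_path = step1_manifest(args.evaluation_year)
sentinel_path = site_dir(args.evaluation_year, args.site)
print(manifest_path.as_posix())
print(sentinel_path.as_posix())
```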
---

## Workflow

### Stepped pipeline (resumable)

```bash
uv run python 1-phenocam.py --evaluation-year 2025
uv run python 2-phenocam-screening.py --evaluation-year 2025
uv run python 3-sentinel-data.py --evaluation-year 2025
uv run python 4-fusion.py --evaluation-year 2025
uv run python 5-metrics.py --evaluation-year 2025

# single site
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
```
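The pipeline is resumable because steps 3 and 4 skip outputs that are already on disk (see the resume behaviour under CLI convention). A minimal sketch of that skip-if-exists pattern follows. The helper names `s3_already_downloaded` and `run_missing` are hypothetical, and the real scripts may organise the check differently; only the `raw/s3/S3*.tif` and `*_REFL.tif` patterns come from this document.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def s3_already_downloaded(site_dir: Path) -> bool:
    """True when raw/s3/ already holds at least one S3*.tif for this site."""
    s3_dir = site_dir / "raw" / "s3"
    return s3_dir.is_dir() and any(s3_dir.glob("S3*.tif"))


def run_missing(outputs: Iterable[Path], produce: Callable[[Path], None]) -> list[Path]:
    """Call produce() only for outputs not yet on disk; return the ones created."""
    created = []
    for out in outputs:
        if out.exists():
            continue  # resume: keep the file written by an earlier run
        out.parent.mkdir(parents=True, exist_ok=True)
        produce(out)
        created.append(out)
    return created


# Demo in a throwaway directory: the first file already exists, so only the
# second one is produced on this "re-run".
with tempfile.TemporaryDirectory() as tmp:
    s2_dir = Path(tmp) / "prepared" / "s2"
    s2_dir.mkdir(parents=True)
    targets = [s2_dir / "a_REFL.tif", s2_dir / "b_REFL.tif"]
    targets[0].write_bytes(b"old")
    created = run_missing(targets, lambda p: p.write_bytes(b"new"))
    created_names = [p.name for p in created]
    first_untouched = targets[0].read_bytes() == b"old"
    s3_present = s3_already_downloaded(Path(tmp))
```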
S3 uses the CDSE OpenEO collection `SENTINEL3_SYN_L2_SYN` (bands Oa04/Oa06/Oa08/Oa17). S2 uses AWS Earth Search COG range reads (no auth). No S2↔S3 radiometric harmonisation is applied.

---

## Screening gates

### Step 2 (`2-phenocam-screening.py`)

| Gate | Rule |
|------|------|
| `phenocam` | ROI + `one_day_summary` CSV; ≥ `MIN_GCC_POINTS` (30) valid `gcc_90` in evaluation year |
| `snr` | AIC-selected cubic spline SNR ≥ `SNR_THRESHOLD` (2.0) |
| `cluster` | SNR-passed sites within 500 m deduplicated; keep highest `n_gcc_points` (SNR tie-break) |
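The gates above can be sketched as a filter followed by a deduplication pass. This is an illustration under stated assumptions, not the script's implementation: the SNR is taken as a precomputed input (the AIC-selected spline fit is not reproduced), the greedy ranking and the haversine distance are assumed choices, and `Site`, `haversine_m`, `screen`, and `CLUSTER_RADIUS_M` are hypothetical names. The demo sites are invented.

```python
import math
from dataclasses import dataclass

MIN_GCC_POINTS = 30        # threshold from the gate table
SNR_THRESHOLD = 2.0        # threshold from the gate table
CLUSTER_RADIUS_M = 500.0   # radius from the gate table; constant name is hypothetical


@dataclass(frozen=True)
class Site:
    name: str
    lat: float
    lon: float
    n_gcc_points: int
    snr: float


def haversine_m(a: Site, b: Site) -> float:
    """Great-circle distance in metres on a sphere of radius 6371 km."""
    r = 6_371_000.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def screen(sites: list[Site]) -> list[Site]:
    """Apply the phenocam and snr gates, then drop near-duplicates greedily."""
    passed = [s for s in sites
              if s.n_gcc_points >= MIN_GCC_POINTS and s.snr >= SNR_THRESHOLD]
    # Highest n_gcc_points first; SNR breaks ties.
    ranked = sorted(passed, key=lambda s: (s.n_gcc_points, s.snr), reverse=True)
    kept: list[Site] = []
    for s in ranked:
        if all(haversine_m(s, k) > CLUSTER_RADIUS_M for k in kept):
            kept.append(s)
    return kept


demo_sites = [
    Site("site_a", 48.4764, 2.7801, 200, 5.0),
    Site("site_b", 48.4766, 2.7803, 150, 6.0),  # a few tens of metres from site_a
    Site("site_c", 45.0, 5.0, 20, 9.0),         # too few valid gcc_90 points
    Site("site_d", 45.0, 6.0, 100, 1.5),        # SNR below threshold
    Site("site_e", 60.0, 10.0, 80, 3.0),
]
kept_names = [s.name for s in screen(demo_sites)]
print(kept_names)
```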
---

## Data layout

**Naming:** `data/` paths follow step script names: `1-phenocam.py` → `phenocam/`, `2-phenocam-screening.py` → `phenocam_screening/`, `3-sentinel-data.py` → `sentinel_data/`, `4-fusion.py` → `fusion/`, `5-metrics.py` → `metrics/`.

```
data/
  phenocam/
    {year}.json                   # step-1 manifest
    {year}/
      {sitename}.json             # camera + ROI API payload
      {sitename}_1day.csv         # raw PhenoCam summary CSV
  phenocam_screening/
    {year}.json                   # step-2 results
    {year}.csv
  sentinel_data/{year}/{sitename}/
    raw/s3/                       # step 3: S3 SYN L2 per-date GeoTIFFs
    prepared/s2/                  # step 3: *_REFL.tif, *_DIST_CLOUD.tif, *_GCC.tif
    prepared/s3/                  # step 3: composite_*.tif
    prepared/gcc_s3/              # step 4: single-band GCC composites
    data.json                     # step-3 run summary
  fusion/{year}/{sitename}/
    bti/fusion/REFL_*.tif         # step 4: BtI fused reflectance
    bti/gcc/GCC_*.tif             # step 4: BtI GCC
    itb/s2/GCC_*.tif              # step 4: S2 GCC (ItB stack)
    itb/s3/GCC_*.tif              # step 4: S3 GCC (ItB stack)
    itb/fusion/GCC_*.tif          # step 4: ItB fused GCC
  metrics/
    manifest.json                 # step 5: years + site metadata for webapp
    {year}/{sitename}/
      gcc_*.json, metrics.json, covariates.json, rasters_*.json, bands_*.json
```
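The naming rule stated at the top of this section (script name to `data/` subdirectory) amounts to dropping the numeric step prefix and swapping hyphens for underscores. A small sketch, where `data_dir_for` is a hypothetical helper and not a function from the scripts:

```python
import re
from pathlib import Path


def data_dir_for(script: str) -> str:
    """Map a step script filename to its data/ subdirectory name."""
    stem = Path(script).stem               # e.g. "2-phenocam-screening"
    name = re.sub(r"^\d+-", "", stem)      # drop the step number: "phenocam-screening"
    return name.replace("-", "_")          # hyphens to underscores: "phenocam_screening"


for script in ("1-phenocam.py", "2-phenocam-screening.py", "3-sentinel-data.py",
               "4-fusion.py", "5-metrics.py"):
    print(script, "->", f"data/{data_dir_for(script)}/")
```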
---

## Module map

| File | Responsibility |
|------|----------------|
| `1-phenocam.py` | Paginate PhenoCam API; cache JSON + CSV; write manifest |
| `2-phenocam-screening.py` | Parse cached CSVs; PhenoCam + SNR gates |
| `3-sentinel-data.py` | S2 COG range reads (Earth Search); S3 OpenEO download; EFAST REFL/DIST_CLOUD/composites |
| `4-fusion.py` | GCC from S2 REFL + S3 composites; daily `efast.fusion` BtI + ItB |
| `5-metrics.py` | PhenoCam-matched GCC series, baselines, fusion metrics, raster index, covariates |