efast-phenocam-validation/README.md
2026-06-17 12:29:35 +02:00

# EFAST fusion with PhenoCam validation
End-to-end pipeline that selects sites from the global [PhenoCam Network](https://phenocam.nau.edu/), runs [EFAST](https://github.com/DHI-GRAS/efast) spatio-temporal fusion of Sentinel-2 and Sentinel-3 data, and validates green chromatic coordinate (GCC) time series across sensors. The numbered steps cover site selection, Sentinel data acquisition, two fusion orders, accuracy metrics, and sample-level statistics, all of which feed a static web QA viewer.
---
## Pipeline overview
| Step | Script | What it does |
|------|--------|--------------|
| 1 | `1-phenocam.py` | Download PhenoCam metadata and `one_day_summary` GCC CSVs |
| 2 | `2-phenocam-screening.py` | Apply PhenoCam count, SNR, and proximity gates to select feasible sites |
| 3 | `3-sentinel-data.py` | Acquire S2 (Earth Search COG) and S3 OLCI SYN L2 (CDSE OpenEO); prepare REFL, DIST_CLOUD, and composite GeoTIFFs |
| 4 | `4-fusion.py` | Run EFAST BtI (fuse reflectance → GCC) and ItB (fuse GCC directly) for each screened site |
| 5 | `5-metrics.py` | Extract PhenoCam-matched timeseries, compute NSE/RMSE/r baselines and fusion metrics, emit per-site JSON and webapp manifest |
| 6 | `6-statistics-fusion-order.py` | Paired ItB-vs-BtI significance test (Wilcoxon + t-test) across all sites |
| 7 | `7-gcc-suitability.py` | PhenoCam GCC suitability as a fusion-accuracy reference (representativeness + LOOCV concordance) |
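The quantities behind steps 4 and 5 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `4-fusion.py` or `5-metrics.py`: GCC is taken as the standard green chromatic coordinate G / (R + G + B) computed from red, green, and blue reflectance (as in the BtI route), PhenoCam and fused series are matched on calendar dates, and nRMSE is assumed to be RMSE divided by the range of the PhenoCam observations. All function names are hypothetical.

```python
import math
from statistics import mean


def gcc(red, green, blue):
    """Green chromatic coordinate G / (R + G + B); NaN if the sum is not positive."""
    total = red + green + blue
    return green / total if total > 0 else math.nan


def match_by_date(phenocam, fused):
    """Align two {date: GCC} dicts on the dates where both values are finite."""
    days = sorted(
        d for d in phenocam.keys() & fused.keys()
        if not (math.isnan(phenocam[d]) or math.isnan(fused[d]))
    )
    return [phenocam[d] for d in days], [fused[d] for d in days]


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    obs_mean = mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root-mean-square error between matched observations and predictions."""
    return math.sqrt(mean((o - s) ** 2 for o, s in zip(obs, sim)))


def nrmse(obs, sim):
    """RMSE normalised by the observed range (assumed convention)."""
    return rmse(obs, sim) / (max(obs) - min(obs))


def pearson_r(obs, sim):
    """Pearson correlation coefficient of two equal-length series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)


def summary(obs, sim):
    """Bundle the four scores for one matched series."""
    return {"nse": nse(obs, sim), "rmse": rmse(obs, sim),
            "nrmse": nrmse(obs, sim), "r": pearson_r(obs, sim)}


# e.g. summary(*match_by_date(phenocam_gcc, fused_gcc)) for two {date: GCC} dicts
```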
---
## Quick start
### Run pipeline wrapper (recommended)
```bash
uv sync
uv run python run-pipeline.py --evaluation-year 2025
```
Runs steps 1-5 in order. Steps 1 and 2 are skipped when their output already exists, and for each site, steps 3-5 are skipped when `data/metrics/{year}/{site}/metrics.json` is present. Any failure stops the run immediately; fix the issue and re-run, and completed work is not repeated.
```bash
# single site (steps 1 and 2 still skip if already done)
uv run python run-pipeline.py --evaluation-year 2025 --site ICOSFR-Fon1
```
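The skip-and-stop behavior of the wrapper can be pictured as a per-site existence check followed by the three per-site scripts. This is a minimal sketch, assuming the wrapper invokes the step scripts as subprocesses with the documented flags; `pending_sites`, `run_site`, and `run_all` are hypothetical names, not functions from `run-pipeline.py`.

```python
import subprocess
import sys
from pathlib import Path

# Per-site steps as listed in the pipeline table.
SITE_STEPS = ["3-sentinel-data.py", "4-fusion.py", "5-metrics.py"]


def pending_sites(sites, year, data_dir=Path("data")):
    """Keep only sites without data/metrics/{year}/{site}/metrics.json."""
    return [
        site for site in sites
        if not (data_dir / "metrics" / str(year) / site / "metrics.json").exists()
    ]


def run_site(site, year):
    """Run steps 3-5 for one site; check=True raises on the first failing step."""
    for script in SITE_STEPS:
        subprocess.run(
            [sys.executable, script, "--evaluation-year", str(year), "--site", site],
            check=True,
        )


def run_all(sites, year):
    """Process unfinished sites in order; any exception stops the whole run."""
    for site in pending_sites(sites, year):
        run_site(site, year)
```

Because `subprocess.run(..., check=True)` raises `CalledProcessError` on a non-zero exit status, the first failing step aborts the loop, which matches the fail-fast behavior described above. Using `sys.executable` keeps the child scripts in the same `uv` environment as the caller.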
### Step by step
```bash
uv sync
uv run python 1-phenocam.py --evaluation-year 2025
uv run python 2-phenocam-screening.py --evaluation-year 2025
uv run python 3-sentinel-data.py --evaluation-year 2025
uv run python 4-fusion.py --evaluation-year 2025
uv run python 5-metrics.py --evaluation-year 2025
uv run python 6-statistics-fusion-order.py --evaluation-year 2025
uv run python 7-gcc-suitability.py --evaluation-year 2025
```
Steps 1-5 accept `--evaluation-year` (default `2025`) and an optional `--site` for single-site runs. Steps 6 and 7 are full-sample aggregates: they take `--evaluation-year` and `--alpha` but not `--site`, and step 7 additionally accepts `--min-cloudfree-s2` (default `10`). Steps 3-5 are resumable: existing output files are skipped.
```bash
# single site
uv run python 3-sentinel-data.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 4-fusion.py --evaluation-year 2025 --site ICOSFR-Fon1
uv run python 5-metrics.py --evaluation-year 2025 --site ICOSFR-Fon1
```
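The flag split described above could be declared with `argparse` roughly as follows. This is a hypothetical sketch of the documented interface, not the scripts' actual parser; the `--alpha` default of `0.05` is an assumption because the README does not state it.

```python
import argparse


def build_parser(step):
    """Hypothetical CLI for pipeline step 1-7, mirroring the documented flags."""
    parser = argparse.ArgumentParser(description=f"Pipeline step {step}")
    parser.add_argument("--evaluation-year", type=int, default=2025)
    if step <= 5:
        # Per-site steps: optional single-site run.
        parser.add_argument("--site", default=None)
    else:
        # Full-sample aggregates (steps 6 and 7); 0.05 is an assumed default.
        parser.add_argument("--alpha", type=float, default=0.05)
    if step == 7:
        parser.add_argument("--min-cloudfree-s2", type=int, default=10)
    return parser
```

For instance, `build_parser(7).parse_args([])` yields `evaluation_year=2025`, `alpha=0.05`, and `min_cloudfree_s2=10`, while passing `--site` to step 6 or 7 is rejected as an unrecognized argument.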
### Credentials
Step 3 downloads S3 data through CDSE OpenEO (`SENTINEL3_SYN_L2_SYN`). Set `CDSE_USER` and `CDSE_PASSWORD` in `../.env` at the workspace root. S2 data are read with HTTP range requests from the AWS Earth Search COGs (no authentication required).
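To confirm that the credentials are visible before launching step 3, a stdlib-only reader for simple `KEY=VALUE` lines is enough. This is a minimal sketch: how `3-sentinel-data.py` itself loads `../.env` is not described here, and `read_env` and `missing_cdse_credentials` are hypothetical helpers that skip comments and strip surrounding quotes but do not handle shell syntax such as `export`.

```python
from pathlib import Path


def read_env(path):
    """Parse KEY=VALUE lines, ignoring blanks, comments, and surrounding quotes."""
    values = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_cdse_credentials(path):
    """Return the CDSE variables that are absent or empty in the given .env file."""
    env = read_env(path)
    return [k for k in ("CDSE_USER", "CDSE_PASSWORD") if not env.get(k)]


# e.g. missing_cdse_credentials(Path("..") / ".env") returns [] when both are set
```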
---
## Outputs (under `data/`)
| Artifact | Step | Role |
|----------|------|------|
| `phenocam/{year}.json` | 1 | Site list + `sites_dir` pointer |
| `phenocam/{year}/{site}.json`, `{site}_1day.csv` | 1 | Raw API payload and GCC CSV |
| `phenocam_screening/{year}.json` / `.csv` | 2 | Gate results (pass/fail per site) |
| `sentinel_data/{year}/{site}/prepared/s2/` | 3 | S2 REFL + DIST_CLOUD GeoTIFFs |
| `sentinel_data/{year}/{site}/prepared/s3/` | 3 | S3 composite GeoTIFFs |
| `fusion/{year}/{site}/bti/`, `.../itb/` | 4 | BtI fused reflectance + GCC; ItB fused GCC |
| `metrics/{year}/{site}/` | 5 | Per-site timeseries, metrics, covariates JSON |
| `metrics/manifest.json` | 5 | Webapp manifest (years + site metadata) |
| `statistics_fusion_order/{year}.json` | 6 | Paired ItB-vs-BtI test summary (NSE, RMSE, nRMSE, r) |
| `gcc_suitability/{year}.json` | 7 | PhenoCam GCC suitability summary (representativeness + LOOCV concordance) |
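Because every artifact is keyed by its step directory, the year, and (for per-site outputs) the site, its location can be derived from those values alone. A hypothetical helper mirroring the table above, assuming the default `data/` root; it is not part of the repository.

```python
from pathlib import Path


def site_paths(year, site, data_dir=Path("data")):
    """Locations of the per-site artifacts listed in the outputs table."""
    y = str(year)
    return {
        "phenocam_json": data_dir / "phenocam" / y / f"{site}.json",
        "phenocam_csv": data_dir / "phenocam" / y / f"{site}_1day.csv",
        "s2_prepared": data_dir / "sentinel_data" / y / site / "prepared" / "s2",
        "s3_prepared": data_dir / "sentinel_data" / y / site / "prepared" / "s3",
        "fusion_bti": data_dir / "fusion" / y / site / "bti",
        "fusion_itb": data_dir / "fusion" / y / site / "itb",
        "metrics": data_dir / "metrics" / y / site,
    }


def sample_paths(year, data_dir=Path("data")):
    """Locations of the per-year and global artifacts."""
    y = str(year)
    return {
        "phenocam_sites": data_dir / "phenocam" / f"{y}.json",
        "screening": data_dir / "phenocam_screening" / f"{y}.json",
        "fusion_order_stats": data_dir / "statistics_fusion_order" / f"{y}.json",
        "gcc_suitability": data_dir / "gcc_suitability" / f"{y}.json",
        "manifest": data_dir / "metrics" / "manifest.json",
    }
```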
---
## Web viewer
`python3 -m http.server 8080` serves the webapp at [http://localhost:8080/index.html](http://localhost:8080/index.html). The viewer requires the step 5 output (`data/metrics/manifest.json`); the GCC suitability tab of the Statistics overlay also reads the step 7 output (`data/gcc_suitability/{year}.json`).
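The viewer's inputs can be checked before serving, and the server can also be started from Python. This is a minimal sketch, assuming the server is run from the directory that holds `index.html` and `data/`; `viewer_inputs` and `serve` are hypothetical helpers, and `serve` is only a rough equivalent of the `http.server` command above.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def viewer_inputs(year, root=Path(".")):
    """Which viewer inputs exist under root/data for the given year."""
    return {
        # Required: written by step 5.
        "manifest": (root / "data" / "metrics" / "manifest.json").is_file(),
        # Only needed by the GCC suitability tab: written by step 7.
        "gcc_suitability": (root / "data" / "gcc_suitability" / f"{year}.json").is_file(),
    }


def serve(root=Path("."), port=8080):
    """Rough equivalent of `python3 -m http.server 8080`, rooted at `root`."""
    manifest = root / "data" / "metrics" / "manifest.json"
    if not manifest.is_file():
        raise FileNotFoundError(f"{manifest} is missing; run step 5 first")
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(root))
    with ThreadingHTTPServer(("", port), handler) as httpd:
        print(f"Serving on http://localhost:{port}/index.html")
        httpd.serve_forever()
```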
---
## License
[MIT](LICENSE)