Switching horses.
parent 25cbd97662
commit e3e14027fc
51 changed files with 5078 additions and 11678 deletions
159 README.md
@@ -1,146 +1,57 @@
# Satellite Data Fusion Pipeline
# Worldwide PhenoCam EFAST feasibility screening

Python pipeline for downloading Sentinel-2 and Sentinel-3 imagery and PhenoCam ground truth, applying NDVI-based cloud pre-selection, fusing sensors with the [EFAST](https://github.com/DHI-GRAS/efast) algorithm, and evaluating fused **Green Chromatic Coordinate (GCC)** time series against PhenoCam `gcc_90`.
Screen the global [PhenoCam Network](https://phenocam.nau.edu/) for sites where EFAST Sentinel-2 / Sentinel-3 fusion is likely to work: enough PhenoCam `gcc_90`, seasonal signal, and S2/S3 coverage for a calendar year.
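
For readers new to the index: the Green Chromatic Coordinate is conventionally defined as G / (R + G + B), and `gcc_90` is commonly described as a 90th percentile of GCC values within a time window. The sketch below illustrates both under those assumptions; the function names and the sample values are hypothetical and are not taken from this repository.

```python
from typing import Sequence


def gcc(red: float, green: float, blue: float) -> float:
    """Green Chromatic Coordinate: green / (red + green + blue)."""
    total = red + green + blue
    if total == 0:
        return float("nan")  # undefined for an all-black pixel
    return green / total


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile, with q in [0, 100]."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Hypothetical mean (R, G, B) digital numbers from several images of one day.
samples = [(80, 120, 60), (90, 110, 70), (85, 130, 65)]
daily_gcc = [gcc(r, g, b) for r, g, b in samples]
gcc_90 = percentile(daily_gcc, 90)
```

A percentile near the top of the distribution is less sensitive to dark or hazy frames than a plain mean, which is one reason a high percentile is a plausible choice for a daily summary.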

## Features
Agent-oriented detail: [`AGENTS.md`](AGENTS.md).

- **Acquisition** — S2 L2A (AWS Element84 STAC), S3 OLCI L1B (Copernicus OpenEO), PhenoCam midday images and GCC CSV
- **Pre-selection** — Aggressive and non-aggressive NDVI-based cloud screening (plus dark-scene rejection)
- **Preparation** — Harmonised reflectance/GCC rasters, distance-to-cloud weights, S3 compositing and optional temporal smoothing
- **Fusion** — EFAST under eight scenarios per site (BtI and ItB × two strategies × σ ∈ {20, 30} days)
- **Post-processing** — Crop to valid-data window; NDVI and GCC timeseries at the site
- **Metrics** — Temporal comparison vs PhenoCam (`metrics.json`); optional Tier-2 withheld-S2 gap validation
- **Web viewer** — Static HTML dashboard over pipeline outputs (`webapp/`)
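
The "eight scenarios per site" in the Fusion bullet is the Cartesian product of two modes, two strategies, and two σ values. A minimal sketch of that enumeration follows; the dictionary keys are illustrative, and the lowercase labels mirror the `--mode`, `--strategy`, and `--sigma` values that appear in the gap-validation commands rather than any confirmed internal data structure.

```python
from itertools import product

MODES = ("bti", "itb")                        # the two fusion modes named in the README
STRATEGIES = ("aggressive", "nonaggressive")  # cloud pre-selection strategies
SIGMAS_DAYS = (20, 30)                        # temporal sigma in days

# One dict per scenario; 2 x 2 x 2 = 8 combinations per site.
scenarios = [
    {"mode": mode, "strategy": strategy, "sigma": sigma}
    for mode, strategy, sigma in product(MODES, STRATEGIES, SIGMAS_DAYS)
]

for scenario in scenarios:
    print(scenario)
```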
---

## Installation
## Quick start

From `processing/`:

```bash
pip install -r requirements.txt
pip install git+https://github.com/DHI-GRAS/efast.git # not on PyPI
uv sync
uv run python 1-phenocam.py --evaluation-year 2025
```

Create `.env` with Copernicus Data Space credentials:
### Stepped pipeline (resumable)

- `CDSE_USER`
- `CDSE_PASSWORD`

Python version is pinned in `.python-version` (use `.venv/` locally).

## Usage

```python
from run import run_pipeline

run_pipeline(season=2024, site_position=(47.116171, 11.320308), site_name="innsbruck")
```

`site_position` is always **`(lat, lon)`**. Study sites are listed at the bottom of `run.py`: `innsbruck`, `forthgr`, `pitsalu`, `vindeln2`, `sunflowerjerez1`, `institutekarnobat`.
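
Because `site_position` is `(lat, lon)` while GeoJSON stores coordinates as `[lon, lat]`, swapping the two is an easy mistake when moving between `run.py` and `data/sites.geojson`. The helper below is a hypothetical sketch (it is not part of `run.py`) that validates the ranges and performs the conversion explicitly.

```python
from typing import Tuple


def to_geojson_point(site_position: Tuple[float, float]) -> dict:
    """Convert a (lat, lon) tuple into a GeoJSON Point, which stores [lon, lat]."""
    lat, lon = site_position
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {"type": "Point", "coordinates": [lon, lat]}


# The Innsbruck coordinates from the usage example above.
innsbruck = to_geojson_point((47.116171, 11.320308))
```

The range check only catches swaps where the longitude exceeds 90 in absolute value, so it is a safety net rather than a guarantee.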

By default, most stages in `run.py` are **commented out** (metrics-only). Uncomment acquisition → pre-selection → preparation → fusion → post-processing for a full run.

### Pipeline stages

1. Download S2, S3, and PhenoCam
2. Pre-selection (per-sensor NDVI screening → `raw/preselection/`)
3. Prepare S2/S3 for each strategy (`prepared_{aggressive|nonaggressive}/` and `_itb/` variants)
4. EFAST fusion (BtI reflectance and ItB GCC products)
5. Post-process crops and timeseries (`processed_*_sigma{20,30}/`)
6. Compute metrics vs PhenoCam → `metrics.json`

### Gap validation (optional)

With prepared data and EFAST installed:
All steps use `--evaluation-year` (default 2025) and optional `--site`. See each script docstring for inputs/outputs under `data/`.

```bash
# Phenology sidecars (TIMESAT 50 % amplitude)
python -m phenology_timesat --all
uv run python 1-phenocam.py --evaluation-year 2025
uv run python 2-phenocam-screening.py --evaluation-year 2025
uv run python 3-sentinel-data.py --evaluation-year 2025
uv run python 4-fusion.py --evaluation-year 2025
uv run python 5-metrics.py --evaluation-year 2025

# Spatial NSE_S2 vs withheld S2 (unit test: Estonia peatland, 30 d, green-up)
python -m gap_validation.run --site pitsalu --season 2024 --lat 58.5633 --lon 24.3688 \
    --strategy aggressive --sigma 20 --mode bti --transition green_up --gap-days 30

# All six sites, best BtI scenario per site
python -m gap_validation.batch_spatial

# Full-season NSE_PC on gap-degraded stack (slow)
python -m gap_validation.temporal_pc --site pitsalu --season 2024 --lat 58.5633 --lon 24.3688
python -m gap_validation.batch_temporal

# TIMESAT day-offsets on gap fusion vs PhenoCam (needs temporal tier)
python -m gap_validation.phenology_offsets
# single site
uv run python 3-sentinel-data.py --evaluation-year 2025 --site innsbruck
uv run python 4-fusion.py --evaluation-year 2025 --site innsbruck
uv run python 5-metrics.py --evaluation-year 2025 --site innsbruck
```
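
The five numbered `uv run` steps share the same flags, so they could be driven from a small wrapper. The sketch below only builds and prints the command lines; `build_commands` is a hypothetical helper, and it assumes every step accepts `--site`, as the description of the stepped pipeline states.

```python
import shlex
from typing import List, Optional

STEP_SCRIPTS = [
    "1-phenocam.py",
    "2-phenocam-screening.py",
    "3-sentinel-data.py",
    "4-fusion.py",
    "5-metrics.py",
]


def build_commands(evaluation_year: int = 2025, site: Optional[str] = None) -> List[List[str]]:
    """Return one argv list per pipeline step, in execution order."""
    commands = []
    for script in STEP_SCRIPTS:
        cmd = ["uv", "run", "python", script, "--evaluation-year", str(evaluation_year)]
        if site is not None:
            cmd += ["--site", site]
        commands.append(cmd)
    return commands


for cmd in build_commands(2025, site="innsbruck"):
    # Printing keeps the sketch side-effect free; passing each list to
    # subprocess.run(cmd, check=True) would execute the step instead.
    print(shlex.join(cmd))
```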

Writes `gap_manifest.json`, `gap_withheld_images.json`, `gap_validation_summary.json` (spatial), and optionally `gap_metrics.json` (temporal). Masked fusion under `validation/fusion/gap_{N}_{transition}/`. See `python -m gap_validation.run --help`.
Step 3 S3 uses CDSE OpenEO (`SENTINEL3_SYN_L2_SYN`); S2 uses AWS Earth Search COG range reads (no auth).

## Data layout
---

```
data/{site_name}/{season}/
  raw/
    s2/                  # {YYYYMMDD}_{n}.geotiff — B02, B03, B04, B8A
    s3/                  # {YYYYMMDD}_{n}.geotiff — Oa04, Oa06, Oa08, Oa17
    phenocam/            # JPEGs, GCC JSON, phenology sidecar
    preselection/        # {s2,s3}_preselection.{json,csv}
  prepared_{strategy}/
    s2/                  # REFL + DIST_CLOUD GeoTIFFs
    s3/                  # composite_{YYYYMMDD}.tif
    fusion/              # REFL_{YYYYMMDD}.tif (σ≈20)
    fusion_sigma30/      # REFL (σ=30)
  prepared_{strategy}_itb/
    s2/ s3/ fusion/      # GCC products (Index-then-Blend)
  processed_{strategy}_sigma{20,30}/
    s2/ s3/ fusion/      # cropped {YYYYMMDD}_0.geotiff
    gcc/ ndvi/           # timeseries.json per source
  processed_{strategy}_itb_sigma{20,30}/
    s2/ s3/ fusion/ gcc/
  validation/            # gap experiment (when run)
  metrics.json
```
## Outputs (under `data/`)

Site metadata: `data/sites.geojson` (six thesis sites). `data/coweeta/` is local/legacy and not listed there.
| Artifact | Step | Role |
|----------|------|------|
| `phenocam/{year}.json` | 1 | Site list + `sites_dir` pointer |
| `phenocam/{year}/{site}.json`, `{site}_1day.csv` | 1 | Raw API + GCC CSV |
| `phenocam_screening/{year}.json` / `.csv` | 2 | PhenoCam + SNR gate results |
| `sentinel_data/{year}/{site}/prepared/s2/` | 3 | S2 REFL + DIST_CLOUD GeoTIFFs |
| `sentinel_data/{year}/{site}/prepared/s3/` | 3 | S3 composite GeoTIFFs |
| `fusion/{year}/{site}/` | 4 | BtI/ItB fused rasters |
| `metrics/{year}/{site}/`, `metrics/manifest.json` | 5 | Timeseries JSON, covariates, webapp manifest |
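
To make the path templates in the table concrete, the sketch below expands them for one year and site with `pathlib`. `artifact_paths` and its dictionary keys are hypothetical names, and placing `{site}_1day.csv` next to `{site}.json` is an assumption based on the table row that lists them together.

```python
from pathlib import Path


def artifact_paths(year: int, site: str, data_root: Path = Path("data")) -> dict:
    """Expand the per-step artifact templates for a given year and site."""
    phenocam_dir = data_root / "phenocam" / str(year)
    prepared = data_root / "sentinel_data" / str(year) / site / "prepared"
    return {
        "phenocam_manifest": data_root / "phenocam" / f"{year}.json",        # step 1
        "phenocam_site_json": phenocam_dir / f"{site}.json",                 # step 1
        "phenocam_gcc_csv": phenocam_dir / f"{site}_1day.csv",               # step 1 (assumed location)
        "screening_json": data_root / "phenocam_screening" / f"{year}.json", # step 2
        "prepared_s2": prepared / "s2",                                      # step 3
        "prepared_s3": prepared / "s3",                                      # step 3
        "fusion": data_root / "fusion" / str(year) / site,                   # step 4
        "metrics": data_root / "metrics" / str(year) / site,                 # step 5
        "metrics_manifest": data_root / "metrics" / "manifest.json",         # step 5
    }


paths = artifact_paths(2025, "innsbruck")
for name, path in paths.items():
    print(f"{name}: {path.as_posix()}")
```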

### File formats
The 2025 manifest currently lists **739** cameras with archive overlap; most per-site CSV/JSON files are cached under `data/phenocam/2025/`.

**Sentinel-2** — Multi-band GeoTIFF; bands `[blue, green, red, nir]`; `VIEWING_ZENITH_ANGLE` metadata; filename `{YYYYMMDD}_{increment}.geotiff`.

**Sentinel-3** — Multi-band GeoTIFF; same band order; filename `{YYYYMMDD}_{increment}.geotiff`.

**Prepared S2** — `S2A_MSIL2A_{YYYYMMDD}_REFL.tif` plus `*DIST_CLOUD.tif` (cloud-distance weights for EFAST).
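
The raw `{YYYYMMDD}_{increment}.geotiff` naming described above can be parsed with a short regular expression. This is an illustrative sketch, not code from the repository; `parse_raw_name` is a hypothetical helper, and it assumes the increment is a non-negative integer.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Eight-digit acquisition date, underscore, integer increment, ".geotiff" suffix.
RAW_NAME = re.compile(r"^(?P<day>\d{8})_(?P<n>\d+)\.geotiff$")


def parse_raw_name(filename: str) -> Tuple[date, int]:
    """Split a raw S2/S3 filename into (acquisition date, increment)."""
    match = RAW_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected raw filename: {filename!r}")
    acquired = datetime.strptime(match["day"], "%Y%m%d").date()
    return acquired, int(match["n"])


acquired, increment = parse_raw_name("20240615_0.geotiff")
```

Using `strptime` rather than slicing the string means an impossible date such as `20241345` raises a `ValueError` instead of passing silently.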
---

## Web viewer

Static HTML/JS in `webapp/` — no build step. Shared GeoTIFF helpers: `webapp/common.js`. CDN: Leaflet, geotiff.js, proj4. Symlink: `webapp/data` → `../data`.

Serve from the **repository root** (not `webapp/`):

```bash
python3 -m http.server 8000
# http://localhost:8000/webapp/index.html
```

Or from the workspace root: `make serve`.

| Page | Purpose | Primary data paths |
|------|---------|-------------------|
| `index.html` | Post-processed maps, NDVI/GCC timeseries, PhenoCam | `processed_{strategy}_sigma{n}/`, `raw/phenocam/` |
| `preselection.html` | Cloud-screening diagnostics | `raw/preselection/{s2,s3}_preselection.json` |
| `prepared.html` | Prepared REFL/GCC before crop | `prepared_{strategy}/`, `prepared_{strategy}_itb/` |
| `fusion.html` | EFAST daily fusion rasters | `prepared_*/fusion/`, `fusion_sigma30/` |
| `postprocessed.html` | Cropped processed stacks | `processed_*_sigma*/` |
| `metrics.html` | Tabular `metrics.json` (thesis export source) | `{site}/{season}/metrics.json` under `webapp/data/` |
| `gap_validation.html` | Withheld-S2 gap experiment | `{site}/{season}/validation/gap_validation_summary.json` |
| `phenology.html` | TIMESAT on PhenoCam GCC | `raw/phenocam/phenocam_phenology.json` |

Site/season dropdowns use `data/sites.geojson`. Map pages: **BtI | ItB**; scenarios `aggressive` / `nonaggressive`, σ 20 / 30. Keep the shared nav consistent across all eight pages. QA only — thesis tables are exported from the workspace root (`make export` or `../scripts/export_thesis_tables.py`).

## Development

```bash
ruff check --fix . && ruff format .
```

Pre-commit hooks: `.pre-commit-config.yaml`.

## License

GNU Affero General Public License v3.0 (AGPL-3.0). See [LICENSE](LICENSE).
From the workspace root, `make serve` serves `processing/` at [http://localhost:8000/webapp/index.html](http://localhost:8000/webapp/index.html). Requires step 5 (`data/metrics/manifest.json`).