Add v2 snapshot schema validation and import_snapshots command

This commit is contained in:
Alfredo Di Stasio
2026-03-13 14:00:39 +01:00
parent 6aa66807e9
commit eacff3d25e
14 changed files with 844 additions and 16 deletions

View File

@ -8,10 +8,9 @@ Current v2 foundation scope in this branch:
- nginx reverse proxy
- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command
Out of scope in this step:
- domain model redesign
- snapshot importer implementation
- extractor implementation
## Runtime Architecture (v2)
@ -81,7 +80,7 @@ Core groups:
- Django runtime/security vars
- PostgreSQL connection vars
- image tag vars (`APP_IMAGE_TAG`, `NGINX_IMAGE_TAG`)
- snapshot directory vars (`SNAPSHOT_*`)
- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)
## Snapshot Storage Convention
@ -91,7 +90,84 @@ Snapshot files are expected under:
- archive: `/app/snapshots/archive`
- failed: `/app/snapshots/failed`
In this foundation step, directories are created and persisted but no importer/extractor is implemented yet.
Configured via environment:
- `STATIC_DATASET_INCOMING_DIR`
- `STATIC_DATASET_ARCHIVE_DIR`
- `STATIC_DATASET_FAILED_DIR`
## Snapshot JSON Schema (MVP)
Each file must be a JSON object:
```json
{
"source_name": "official_site_feed",
"snapshot_date": "2026-03-13",
"records": [
{
"competition_external_id": "comp-nba",
"competition_name": "NBA",
"season": "2025-2026",
"team_external_id": "team-lal",
"team_name": "Los Angeles Lakers",
"player_external_id": "player-23",
"full_name": "LeBron James",
"first_name": "LeBron",
"last_name": "James",
"birth_date": "1984-12-30",
"nationality": "US",
"height_cm": 206,
"weight_kg": 113,
"position": "SF",
"role": "Primary Creator",
"games_played": 60,
"minutes_per_game": 34.5,
"points_per_game": 25.4,
"rebounds_per_game": 7.2,
"assists_per_game": 8.1,
"steals_per_game": 1.3,
"blocks_per_game": 0.7,
"turnovers_per_game": 3.2,
"fg_pct": 51.1,
"three_pt_pct": 38.4,
"ft_pct": 79.8,
"source_metadata": {},
"raw_payload": {}
}
],
"source_metadata": {},
"raw_payload": {}
}
```
Validation is strict:
- unknown fields are rejected
- required fields must exist
- `snapshot_date` and `birth_date` must be `YYYY-MM-DD`
- numeric fields must be numeric
- invalid files are moved to failed directory
## Import Command
Run import:
```bash
docker compose exec web python manage.py import_snapshots
```
Command behavior:
- scans `STATIC_DATASET_INCOMING_DIR` for `.json` files
- validates strict schema
- computes SHA-256 checksum
- creates `ImportRun` + `ImportFile` records
- upserts relational entities (`Competition`, `Season`, `Team`, `Player`, `PlayerSeason`, `PlayerSeasonStats`)
- skips duplicate content using checksum
- moves valid files to archive
- moves invalid files to failed
Import history is visible in Django admin:
- `ImportRun`
- `ImportFile`
## Migration and Superuser Commands