Add v2 snapshot schema validation and import_snapshots command
84
README.md
@@ -8,10 +8,9 @@ Current v2 foundation scope in this branch:
- nginx reverse proxy
- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command

Out of scope in this step:
- domain model redesign
- snapshot importer implementation
- extractor implementation

## Runtime Architecture (v2)
@@ -81,7 +80,7 @@ Core groups:
- Django runtime/security vars
- PostgreSQL connection vars
- image tag vars (`APP_IMAGE_TAG`, `NGINX_IMAGE_TAG`)
- snapshot directory vars (`SNAPSHOT_*`)
- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)

## Snapshot Storage Convention
@@ -91,7 +90,84 @@ Snapshot files are expected under:
- archive: `/app/snapshots/archive`
- failed: `/app/snapshots/failed`

In this foundation step, directories are created and persisted but no importer/extractor is implemented yet.
Configured via environment:
- `STATIC_DATASET_INCOMING_DIR`
- `STATIC_DATASET_ARCHIVE_DIR`
- `STATIC_DATASET_FAILED_DIR`

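As a rough illustration of how the three variables above might be consumed, the following Python sketch resolves them into paths and fails fast when one is unset. The helper name `snapshot_dirs` and the fail-fast behaviour are assumptions for illustration; the README does not say how the project reads these settings.

```python
import os
from pathlib import Path

# Variable names come from the README; the mapping keys are illustrative.
SNAPSHOT_DIR_VARS = {
    "incoming": "STATIC_DATASET_INCOMING_DIR",
    "archive": "STATIC_DATASET_ARCHIVE_DIR",
    "failed": "STATIC_DATASET_FAILED_DIR",
}


def snapshot_dirs(env=os.environ):
    """Hypothetical helper: map each snapshot role to a Path from the environment."""
    missing = [var for var in SNAPSHOT_DIR_VARS.values() if not env.get(var)]
    if missing:
        # Assumption: refuse to run with a partial configuration.
        raise RuntimeError("missing snapshot directory vars: " + ", ".join(missing))
    return {role: Path(env[var]) for role, var in SNAPSHOT_DIR_VARS.items()}
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests.
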
## Snapshot JSON Schema (MVP)

Each file must be a JSON object:

```json
{
  "source_name": "official_site_feed",
  "snapshot_date": "2026-03-13",
  "records": [
    {
      "competition_external_id": "comp-nba",
      "competition_name": "NBA",
      "season": "2025-2026",
      "team_external_id": "team-lal",
      "team_name": "Los Angeles Lakers",
      "player_external_id": "player-23",
      "full_name": "LeBron James",
      "first_name": "LeBron",
      "last_name": "James",
      "birth_date": "1984-12-30",
      "nationality": "US",
      "height_cm": 206,
      "weight_kg": 113,
      "position": "SF",
      "role": "Primary Creator",
      "games_played": 60,
      "minutes_per_game": 34.5,
      "points_per_game": 25.4,
      "rebounds_per_game": 7.2,
      "assists_per_game": 8.1,
      "steals_per_game": 1.3,
      "blocks_per_game": 0.7,
      "turnovers_per_game": 3.2,
      "fg_pct": 51.1,
      "three_pt_pct": 38.4,
      "ft_pct": 79.8,
      "source_metadata": {},
      "raw_payload": {}
    }
  ],
  "source_metadata": {},
  "raw_payload": {}
}
```

Validation is strict:
- unknown fields are rejected
- required fields must exist
- `snapshot_date` and `birth_date` must be `YYYY-MM-DD`
- numeric fields must be numeric
- invalid files are moved to the failed directory

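The rules above can be sketched in plain Python. This is not the project's validator: the function names are hypothetical, the allowed field names are copied from the example object, and the choice of which record fields count as required is an assumption, since the README does not list them.

```python
import re
from datetime import date

TOP_REQUIRED = {"source_name", "snapshot_date", "records"}
TOP_ALLOWED = TOP_REQUIRED | {"source_metadata", "raw_payload"}
RECORD_NUMERIC = {
    "height_cm", "weight_kg", "games_played", "minutes_per_game",
    "points_per_game", "rebounds_per_game", "assists_per_game",
    "steals_per_game", "blocks_per_game", "turnovers_per_game",
    "fg_pct", "three_pt_pct", "ft_pct",
}
RECORD_TEXT = {
    "competition_external_id", "competition_name", "season",
    "team_external_id", "team_name", "player_external_id", "full_name",
    "first_name", "last_name", "birth_date", "nationality", "position", "role",
}
RECORD_ALLOWED = RECORD_NUMERIC | RECORD_TEXT | {"source_metadata", "raw_payload"}
# Assumption: the README does not say which record fields are required.
RECORD_REQUIRED = {
    "competition_external_id", "season", "team_external_id",
    "player_external_id", "full_name",
}


def _check_keys(obj, required, allowed, where):
    """Report missing required fields and any field outside the allowed set."""
    present = set(obj)
    errors = [f"{where}: missing required field '{n}'" for n in sorted(required - present)]
    errors += [f"{where}: unknown field '{n}'" for n in sorted(present - allowed)]
    return errors


def _is_iso_date(value):
    """True only for zero-padded YYYY-MM-DD strings that name a real calendar date."""
    if not isinstance(value, str) or not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_snapshot(payload):
    """Return a list of error strings; an empty list means the payload passed."""
    if not isinstance(payload, dict):
        return ["snapshot: top level must be a JSON object"]
    errors = _check_keys(payload, TOP_REQUIRED, TOP_ALLOWED, "snapshot")
    if "snapshot_date" in payload and not _is_iso_date(payload["snapshot_date"]):
        errors.append("snapshot: snapshot_date must be YYYY-MM-DD")
    records = payload.get("records", [])
    if not isinstance(records, list):
        return errors + ["snapshot: records must be a list"]
    for index, record in enumerate(records):
        where = f"records[{index}]"
        if not isinstance(record, dict):
            errors.append(f"{where}: must be a JSON object")
            continue
        errors += _check_keys(record, RECORD_REQUIRED, RECORD_ALLOWED, where)
        if "birth_date" in record and not _is_iso_date(record["birth_date"]):
            errors.append(f"{where}: birth_date must be YYYY-MM-DD")
        for name in sorted(RECORD_NUMERIC & set(record)):
            if not _is_number(record[name]):
                errors.append(f"{where}: {name} must be numeric")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller log every issue before moving the file to the failed directory. Whether the real schema accepts `null` for optional numeric fields is not stated; this sketch rejects it.
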
## Import Command

Run import:

```bash
docker compose exec web python manage.py import_snapshots
```

Command behavior:
- scans `STATIC_DATASET_INCOMING_DIR` for `.json` files
- validates each file against the strict schema
- computes a SHA-256 checksum
- creates `ImportRun` + `ImportFile` records
- upserts relational entities (`Competition`, `Season`, `Team`, `Player`, `PlayerSeason`, `PlayerSeasonStats`)
- skips duplicate content using the checksum
- moves valid files to archive
- moves invalid files to failed

Import history is visible in Django admin:
- `ImportRun`
- `ImportFile`

## Migration and Superuser Commands
