feat(v2): add LBA snapshot extractor and command

Alfredo Di Stasio
2026-03-13 14:28:35 +01:00
parent 850e4de71b
commit 97913c4a79
9 changed files with 358 additions and 0 deletions

@@ -184,6 +184,7 @@ Extractor pipeline:
Built-in extractor in this phase:
- `public_json_snapshot` (generic JSON feed extractor for MVP usage)
- `lba` (Lega Basket Serie A MVP extractor)
Run extractor:
@@ -203,6 +204,12 @@ Dry-run validation (no file write):
docker compose exec web python manage.py run_extractor public_json_snapshot --dry-run
```
Run only the LBA extractor:
```bash
docker compose exec web python manage.py run_lba_extractor
```
Extractor environment variables:
- `EXTRACTOR_USER_AGENT`
- `EXTRACTOR_HTTP_TIMEOUT_SECONDS`
@@ -212,12 +219,29 @@ Extractor environment variables:
- `EXTRACTOR_PUBLIC_JSON_URL`
- `EXTRACTOR_PUBLIC_SOURCE_NAME`
- `EXTRACTOR_INCLUDE_RAW_PAYLOAD`
- `EXTRACTOR_LBA_STATS_URL`
- `EXTRACTOR_LBA_SEASON_LABEL`
- `EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID`
- `EXTRACTOR_LBA_COMPETITION_NAME`
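
As a rough illustration of how the four `EXTRACTOR_LBA_*` variables above might be read, here is a minimal sketch. The `LbaExtractorConfig` dataclass and `load_lba_config` helper are hypothetical names, and treating every variable as required is an assumption; the actual extractor may define defaults instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class LbaExtractorConfig:
    """Hypothetical container for the LBA extractor settings."""
    stats_url: str
    season_label: str
    competition_external_id: str
    competition_name: str


# Maps each (hypothetical) config field to the documented variable name.
ENV_NAMES = {
    "stats_url": "EXTRACTOR_LBA_STATS_URL",
    "season_label": "EXTRACTOR_LBA_SEASON_LABEL",
    "competition_external_id": "EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID",
    "competition_name": "EXTRACTOR_LBA_COMPETITION_NAME",
}


def load_lba_config(env: Mapping[str, str] = os.environ) -> LbaExtractorConfig:
    """Read the LBA variables, failing loudly if any is missing or blank."""
    # Assumption: all four values are required in this sketch.
    missing = [var for var in ENV_NAMES.values() if not env.get(var, "").strip()]
    if missing:
        raise ValueError("missing extractor settings: " + ", ".join(missing))
    return LbaExtractorConfig(
        **{field: env[var].strip() for field, var in ENV_NAMES.items()}
    )
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise in tests without touching the real environment.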
Notes:
- extraction is intentionally low-frequency and retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by `import_snapshots`
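
One way to read "retries conservatively" is a small, bounded number of attempts with a growing pause between them. The sketch below assumes that interpretation; `fetch_with_retries`, its parameters, and the default values are hypothetical and not taken from the extractor code.

```python
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[], bytes],
    max_attempts: int = 3,
    base_delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() up to max_attempts times, doubling the pause after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except OSError:
            # Network-level errors such as urllib.error.URLError subclass OSError.
            if attempt == max_attempts:
                raise  # give up instead of hammering the public endpoint
            sleep(base_delay_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Injecting `fetch` and `sleep` keeps the helper testable without live HTTP calls or real delays.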
### LBA extractor assumptions and limitations (MVP)
- `source_name` is fixed to `lba`
- the extractor expects one stable public JSON payload that includes player/team/stat rows
- competition is configured through environment variables and emitted as:
  - `competition_external_id` from `EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID`
  - `competition_name` from `EXTRACTOR_LBA_COMPETITION_NAME`
- season is configured by `EXTRACTOR_LBA_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp`, `mpg`, `ppg`, `rpg`, `apg`, `spg`, `bpg`, `tov`)
- no BCL support in this task
- no live HTTP calls in tests; tests use fixtures/mocked responses only
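
The parsing and normalization assumptions above can be sketched roughly as follows. This is an illustration only: `extract_rows`, `normalize_row`, the inner `name` fields, and the flat output keys are hypothetical and do not reproduce the schema consumed by `import_snapshots`.

```python
from typing import Any

# Wrapper keys and stat aliases listed in the assumptions above.
ROW_KEYS = ("records", "data", "players", "items")
STAT_ALIASES = ("gp", "mpg", "ppg", "rpg", "apg", "spg", "bpg", "tov")


def extract_rows(payload: Any) -> list[dict]:
    """Return the rows found under the first supported key that holds a list."""
    if not isinstance(payload, dict):
        return []
    for key in ROW_KEYS:
        value = payload.get(key)
        if isinstance(value, list):
            # Skip anything that is not a JSON object.
            return [row for row in value if isinstance(row, dict)]
    return []


def normalize_row(row: dict) -> dict:
    """Flatten nested player/team objects and copy the stat aliases that are present."""
    player = row.get("player") if isinstance(row.get("player"), dict) else {}
    team = row.get("team") if isinstance(row.get("team"), dict) else {}
    normalized = {
        "source_name": "lba",  # fixed value, per the first assumption above
        "player_name": player.get("name"),  # hypothetical field name
        "team_name": team.get("name"),  # hypothetical field name
    }
    for alias in STAT_ALIASES:
        if alias in row:
            normalized[alias] = row[alias]
    return normalized
```

Because both helpers are pure functions over already-decoded JSON, they can be tested against fixture payloads with no HTTP involved.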
## Migration and Superuser Commands
```bash