# HoopScout v2 (Foundation Reset)

HoopScout v2 is a controlled greenfield rebuild inside the existing repository.

Current v2 foundation scope in this branch:
- Django + HTMX server-rendered app
- PostgreSQL as the only primary database
- nginx reverse proxy
- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command

Out of scope in this step:
- extractor implementation

## Runtime Architecture (v2)

Runtime services are intentionally small:
- `web` (Django/Gunicorn)
- `postgres` (primary DB)
- `nginx` (reverse proxy + static/media serving)

No Redis/Celery services are part of the v2 default runtime topology.
Legacy Celery/provider code remains in the codebase and repository history but is de-emphasized for v2.

## Image Strategy

Compose builds and tags images as:
- `registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}`
- `registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}`

Reserved for future optional scheduler use:
- `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`

## Entrypoint Strategy

- `web`: `entrypoint.sh`
  - waits for PostgreSQL
  - optionally runs migrations/collectstatic
  - ensures snapshot directories exist
- `nginx`: `nginx/entrypoint.sh`
  - simple runtime entrypoint wrapper
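
The "waits for PostgreSQL" step can be pictured as a bounded retry loop around a TCP connect. The sketch below is illustrative only and is not the contents of `entrypoint.sh`; the function name `wait_for_tcp` and its parameters are hypothetical, and a successful connect only shows that the port accepts connections, not that the database is ready for queries.

```python
import socket
import time


def wait_for_tcp(host, port, timeout_s=30.0, interval_s=1.0,
                 connect=socket.create_connection, sleep=time.sleep):
    """Return True once a TCP connect to host:port succeeds, False on timeout.

    `connect` and `sleep` are injectable so the loop can be exercised
    without a real network (hypothetical design choice for this sketch).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            conn = connect((host, port), timeout=interval_s)
        except OSError:
            # Connection refused, unreachable host, or timeout: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            sleep(interval_s)
        else:
            conn.close()
            return True
```

A call such as `wait_for_tcp("postgres", 5432)` would block until the `postgres` service accepts connections or 30 seconds pass; the host name and port here are assumptions based on the service list above.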

## Compose Files

- `docker-compose.yml`: production-minded baseline runtime (immutable image filesystem)
- `docker-compose.dev.yml`: development override with source bind mount for `web`
- `docker-compose.release.yml`: production settings override (`DJANGO_SETTINGS_MODULE=config.settings.production`)

### Start development runtime

```bash
cp .env.example .env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
```

### Start release-style runtime

```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build
```

## Named Volumes

v2 runtime uses named volumes for persistence:
- `postgres_data`
- `static_data`
- `media_data`
- `snapshots_incoming`
- `snapshots_archive`
- `snapshots_failed`

Development override uses separate dev-prefixed volumes to avoid ownership collisions.

## Environment Variables

Use `.env.example` as the source of truth.

Core groups:
- Django runtime/security vars
- PostgreSQL connection vars
- image tag vars (`APP_IMAGE_TAG`, `NGINX_IMAGE_TAG`)
- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)

## Snapshot Storage Convention

Snapshot files are expected under:
- incoming: `/app/snapshots/incoming`
- archive: `/app/snapshots/archive`
- failed: `/app/snapshots/failed`

Configured via environment:
- `STATIC_DATASET_INCOMING_DIR`
- `STATIC_DATASET_ARCHIVE_DIR`
- `STATIC_DATASET_FAILED_DIR`
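
As a rough illustration of how these variables could map to directories, the sketch below reads each `STATIC_DATASET_*` variable and creates the directory if it is missing. The helper names are hypothetical, and falling back to the container paths listed above when a variable is unset or empty is an assumption, not documented behavior.

```python
import os
from pathlib import Path

# Assumed fallbacks: the container paths listed in this section.
DEFAULT_DIRS = {
    "STATIC_DATASET_INCOMING_DIR": "/app/snapshots/incoming",
    "STATIC_DATASET_ARCHIVE_DIR": "/app/snapshots/archive",
    "STATIC_DATASET_FAILED_DIR": "/app/snapshots/failed",
}


def resolve_snapshot_dirs(env=None) -> dict[str, Path]:
    """Map each variable name to its configured path, or to the assumed default."""
    env = os.environ if env is None else env
    return {name: Path(env.get(name) or default) for name, default in DEFAULT_DIRS.items()}


def ensure_snapshot_dirs(dirs: dict[str, Path]) -> None:
    """Create any missing snapshot directory; safe to call repeatedly."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
```

Calling `ensure_snapshot_dirs(resolve_snapshot_dirs())` at startup would mirror the "ensures snapshot directories exist" step of the `web` entrypoint.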

## Snapshot JSON Schema (MVP)

Each file must be a JSON object:

```json
{
  "source_name": "official_site_feed",
  "snapshot_date": "2026-03-13",
  "records": [
    {
      "competition_external_id": "comp-nba",
      "competition_name": "NBA",
      "season": "2025-2026",
      "team_external_id": "team-lal",
      "team_name": "Los Angeles Lakers",
      "player_external_id": "player-23",
      "full_name": "LeBron James",
      "first_name": "LeBron",
      "last_name": "James",
      "birth_date": "1984-12-30",
      "nationality": "US",
      "height_cm": 206,
      "weight_kg": 113,
      "position": "SF",
      "role": "Primary Creator",
      "games_played": 60,
      "minutes_per_game": 34.5,
      "points_per_game": 25.4,
      "rebounds_per_game": 7.2,
      "assists_per_game": 8.1,
      "steals_per_game": 1.3,
      "blocks_per_game": 0.7,
      "turnovers_per_game": 3.2,
      "fg_pct": 51.1,
      "three_pt_pct": 38.4,
      "ft_pct": 79.8,
      "source_metadata": {},
      "raw_payload": {}
    }
  ],
  "source_metadata": {},
  "raw_payload": {}
}
```

Validation is strict:
- unknown fields are rejected
- required fields must exist
- `snapshot_date` and `birth_date` must be `YYYY-MM-DD`
- numeric fields must be numeric
- invalid files are moved to the failed directory
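
The rules above can be sketched as a small validator. This is not the project's validator: the function name `validate_snapshot` is hypothetical, and the split between required and optional fields is an assumption, since this README does not say which fields are required.

```python
import re
from datetime import date

TOP_LEVEL_FIELDS = {"source_name", "snapshot_date", "records", "source_metadata", "raw_payload"}
REQUIRED_TOP_LEVEL = {"source_name", "snapshot_date", "records"}  # assumed
# Assumed required set; the README does not list which record fields are mandatory.
REQUIRED_RECORD_FIELDS = {
    "competition_external_id", "season", "team_external_id", "player_external_id", "full_name",
}
NUMERIC_RECORD_FIELDS = {
    "height_cm", "weight_kg", "games_played", "minutes_per_game", "points_per_game",
    "rebounds_per_game", "assists_per_game", "steals_per_game", "blocks_per_game",
    "turnovers_per_game", "fg_pct", "three_pt_pct", "ft_pct",
}
ALLOWED_RECORD_FIELDS = REQUIRED_RECORD_FIELDS | NUMERIC_RECORD_FIELDS | {
    "competition_name", "team_name", "first_name", "last_name", "birth_date",
    "nationality", "position", "role", "source_metadata", "raw_payload",
}


def _is_iso_date(value) -> bool:
    """True only for zero-padded YYYY-MM-DD strings naming a real calendar date."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_snapshot(payload) -> list[str]:
    """Return a list of error messages; an empty list means the payload passed."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]
    errors = [f"unknown top-level field: {k}" for k in sorted(set(payload) - TOP_LEVEL_FIELDS)]
    errors += [f"missing top-level field: {k}" for k in sorted(REQUIRED_TOP_LEVEL - set(payload))]
    if "snapshot_date" in payload and not _is_iso_date(payload["snapshot_date"]):
        errors.append("snapshot_date must be YYYY-MM-DD")
    records = payload.get("records", [])
    if not isinstance(records, list):
        return errors + ["records must be a list"]
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"records[{i}] must be an object")
            continue
        errors += [f"records[{i}]: unknown field {k}" for k in sorted(set(rec) - ALLOWED_RECORD_FIELDS)]
        errors += [f"records[{i}]: missing field {k}" for k in sorted(REQUIRED_RECORD_FIELDS - set(rec))]
        if "birth_date" in rec and not _is_iso_date(rec["birth_date"]):
            errors.append(f"records[{i}]: birth_date must be YYYY-MM-DD")
        for k in sorted(NUMERIC_RECORD_FIELDS & set(rec)):
            if not _is_number(rec[k]):
                errors.append(f"records[{i}]: {k} must be numeric")
    return errors
```

Collecting every error instead of stopping at the first one makes a failed file easier to diagnose after it has been moved to the failed directory.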

## Import Command

Run import:

```bash
docker compose exec web python manage.py import_snapshots
```

Command behavior:
- scans `STATIC_DATASET_INCOMING_DIR` for `.json` files
- validates strict schema
- computes SHA-256 checksum
- creates `ImportRun` + `ImportFile` records
- upserts relational entities (`Competition`, `Season`, `Team`, `Player`, `PlayerSeason`, `PlayerSeasonStats`)
- skips duplicate content using checksum
- moves valid files to archive
- moves invalid files to failed
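
The file-handling part of this flow (scan, validate, checksum, skip duplicates, move) can be sketched without Django. The sketch below is an illustration, not the `import_snapshots` implementation: `process_incoming` is a hypothetical name, the `seen` set stands in for checksums stored on `ImportFile` records, the `is_valid` callable stands in for schema validation, and archiving duplicate files is an assumption. Database upserts are omitted.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def process_incoming(incoming: Path, archive: Path, failed: Path,
                     seen: set[str], is_valid) -> dict[str, str]:
    """Return {filename: outcome}; outcome is 'imported', 'duplicate' or 'failed'."""
    outcomes = {}
    for path in sorted(incoming.glob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
            ok = bool(is_valid(payload))
        except (json.JSONDecodeError, UnicodeDecodeError):
            ok = False  # unreadable or malformed JSON counts as invalid
        if not ok:
            outcome, target = "failed", failed
        else:
            checksum = sha256_of(path)
            if checksum in seen:
                # Same bytes were already imported: skip the upsert step.
                outcome, target = "duplicate", archive
            else:
                seen.add(checksum)
                outcome, target = "imported", archive
        shutil.move(str(path), str(target / path.name))
        outcomes[path.name] = outcome
    return outcomes
```

Because the checksum is computed over file bytes, two files with different names but identical content are treated as the same snapshot.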

Import history is visible in Django admin:
- `ImportRun`
- `ImportFile`

## Extractor Framework (v2)

v2 keeps extraction and import as two separate steps:

1. **Extractors** fetch public source content and emit normalized JSON snapshots.
2. **Importer** (`import_snapshots`) validates and upserts those snapshots into PostgreSQL.

Extractor pipeline:
- `fetch` (public endpoint/page requests with conservative HTTP behavior)
- `parse` (source-specific structure)
- `normalize` (map to HoopScout snapshot schema)
- `emit` (write JSON file to incoming directory or custom path)
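
The four stages above compose naturally as plain functions. The following sketch assumes a hypothetical feed that returns a JSON array of rows with `id` and `name` keys; the function names, the output file naming, and the reduced record shape are all illustrative and do not describe the `public_json_snapshot` extractor. `fetch` is passed in as a callable so the example makes no network requests.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def parse(raw: bytes) -> list[dict]:
    """Source-specific step: the hypothetical feed is a JSON array of player rows."""
    return json.loads(raw.decode("utf-8"))


def normalize(rows: list[dict], source_name: str, snapshot_date: str) -> dict:
    """Map hypothetical source keys onto snapshot field names (only two fields shown)."""
    records = [{"player_external_id": str(r["id"]), "full_name": r["name"]} for r in rows]
    return {"source_name": source_name, "snapshot_date": snapshot_date, "records": records}


def emit(snapshot: dict, out_dir: Path) -> Path:
    """Write the snapshot as JSON; the file naming scheme here is an assumption."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{snapshot['source_name']}-{snapshot['snapshot_date']}.json"
    path.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    return path


def run_pipeline(fetch: Callable[[], bytes], source_name: str, snapshot_date: str,
                 out_dir: Path, dry_run: bool = False) -> tuple[dict, Optional[Path]]:
    """fetch -> parse -> normalize -> emit; with dry_run=True nothing is written."""
    snapshot = normalize(parse(fetch()), source_name, snapshot_date)
    return snapshot, (None if dry_run else emit(snapshot, out_dir))
```

Keeping `emit` as the only stage with side effects is what makes a dry run cheap: the first three stages still execute, so parsing and normalization problems surface without touching the incoming directory.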

Built-in extractor in this phase:
- `public_json_snapshot` (generic JSON feed extractor for MVP usage)

Run extractor:

```bash
docker compose exec web python manage.py run_extractor public_json_snapshot
```

Run extractor with explicit output path (debugging):

```bash
docker compose exec web python manage.py run_extractor public_json_snapshot --output-path /app/snapshots/incoming
```

Dry-run validation (no file write):

```bash
docker compose exec web python manage.py run_extractor public_json_snapshot --dry-run
```

Extractor environment variables:
- `EXTRACTOR_USER_AGENT`
- `EXTRACTOR_HTTP_TIMEOUT_SECONDS`
- `EXTRACTOR_HTTP_RETRIES`
- `EXTRACTOR_RETRY_SLEEP_SECONDS`
- `EXTRACTOR_REQUEST_DELAY_SECONDS`
- `EXTRACTOR_PUBLIC_JSON_URL`
- `EXTRACTOR_PUBLIC_SOURCE_NAME`
- `EXTRACTOR_INCLUDE_RAW_PAYLOAD`
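
To show how the timeout, retry, and sleep variables might interact, here is a minimal sketch of a settings object and a retry wrapper. The class and function names are hypothetical, the fallback values are placeholders rather than the project's defaults, and treating `EXTRACTOR_HTTP_RETRIES` as the number of additional attempts after the first is an assumption.

```python
import os
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorHttpSettings:
    timeout_seconds: float
    retries: int
    retry_sleep_seconds: float

    @classmethod
    def from_env(cls, env=None):
        env = os.environ if env is None else env
        # Placeholder fallbacks; `.env.example` remains the source of truth.
        return cls(
            timeout_seconds=float(env.get("EXTRACTOR_HTTP_TIMEOUT_SECONDS", "10")),
            retries=int(env.get("EXTRACTOR_HTTP_RETRIES", "2")),
            retry_sleep_seconds=float(env.get("EXTRACTOR_RETRY_SLEEP_SECONDS", "1")),
        )


def call_with_retries(func, settings, sleep=time.sleep):
    """Call func(); on OSError retry up to settings.retries more times, sleeping in between."""
    for attempt in range(settings.retries + 1):
        try:
            return func()
        except OSError:
            # Network-level failures (timeouts, refused connections) are OSError subclasses.
            if attempt == settings.retries:
                raise
            sleep(settings.retry_sleep_seconds)
```

A fixed sleep between a small number of attempts keeps request volume against public sources low.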

Notes:
- extraction is intentionally low-frequency and uses retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by `import_snapshots`

## Migration and Superuser Commands

```bash
docker compose exec web python manage.py migrate
docker compose exec web python manage.py createsuperuser
```

## Health Endpoints

- app health: `/health/`
- nginx healthcheck proxies `/health/` to `web`

## Player Search (v2)

Public player search is server-rendered (Django templates) with HTMX partial updates.

Supported filters:
- free text name search
- nominal position, inferred role
- competition, season, team
- nationality
- age, height, weight ranges
- stats thresholds: games, MPG, PPG, RPG, APG, SPG, BPG, TOV, FG%, 3P%, FT%

Search correctness:
- combined team/competition/season/stat filters are applied to the same `PlayerSeason` context (no cross-row false positives)
- filtering happens at database level with Django ORM
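
The "same `PlayerSeason` context" rule is easiest to see with a tiny in-memory example. In Django's ORM, conditions on a multi-valued relation placed in a single `filter()` call must hold on the same related row, while chained `filter()` calls may each match a different row. The sketch below imitates both behaviors in plain Python; the row type and sample data are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerSeasonRow:
    player: str
    team: str
    season: str
    points_per_game: float


ROWS = [
    PlayerSeasonRow("A. Guard", "Lakers", "2024-2025", 8.0),
    PlayerSeasonRow("A. Guard", "Celtics", "2025-2026", 22.0),
    PlayerSeasonRow("B. Wing", "Lakers", "2025-2026", 21.0),
]


def same_row_match(rows, team, min_ppg):
    """Correct: both conditions must hold on one player-season row."""
    return sorted({r.player for r in rows if r.team == team and r.points_per_game >= min_ppg})


def cross_row_match(rows, team, min_ppg):
    """Buggy: each condition may be satisfied by a different row of the same player."""
    on_team = {r.player for r in rows if r.team == team}
    scorers = {r.player for r in rows if r.points_per_game >= min_ppg}
    return sorted(on_team & scorers)


print(same_row_match(ROWS, "Lakers", 20.0))   # ['B. Wing']
print(cross_row_match(ROWS, "Lakers", 20.0))  # ['A. Guard', 'B. Wing']
```

"A. Guard" is the cross-row false positive: the Lakers season and the 22.0 PPG season are different rows, so the player should not match a "Lakers and at least 20 PPG" search.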

Search metric semantics:
- result columns are labeled as **Best Eligible**
- each displayed metric is `MAX` over eligible player-season rows for that metric in the current filter context
- different metric columns for one player may come from different eligible seasons
- when no eligible value exists for a metric in the current context, the UI shows `-`
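
A small worked example of the **Best Eligible** semantics, using plain Python in place of the database aggregation. The function name, the metric keys (`ppg`, `rpg`, `bpg`), and the input shape are hypothetical; the rows passed in are assumed to have already passed the current filters.

```python
from collections import defaultdict


def best_eligible(eligible_rows, metrics):
    """eligible_rows: iterable of (player, {metric: value or None}).

    Returns {player: {metric: max value, or '-' when no row has a value}}.
    """
    best = defaultdict(dict)
    for player, stats in eligible_rows:
        player_best = best[player]  # ensures the player appears even with no values
        for metric in metrics:
            value = stats.get(metric)
            if value is None:
                continue
            current = player_best.get(metric)
            player_best[metric] = value if current is None else max(current, value)
    return {player: {m: vals.get(m, "-") for m in metrics} for player, vals in best.items()}


rows = [
    ("P. Forward", {"ppg": 25.4, "rpg": 6.0, "bpg": None}),  # season 1
    ("P. Forward", {"ppg": 20.0, "rpg": 7.2}),                # season 2
]
print(best_eligible(rows, ["ppg", "rpg", "bpg"]))
# {'P. Forward': {'ppg': 25.4, 'rpg': 7.2, 'bpg': '-'}}
```

Here the PPG value comes from the first season and the RPG value from the second, which is why the columns describe a per-metric best and not a single season line.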

Pagination and sorting:
- querystring is preserved
- HTMX navigation keeps URL state in sync with current filters/page/sort
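
Preserving the querystring when building a pagination or sort link amounts to replacing only the parameters that change. The sketch below uses the standard library to do that; the helper name and the parameter names (`q`, `position`, `page`, `sort`) are hypothetical and may not match the project's actual query parameters.

```python
from urllib.parse import parse_qsl, urlencode


def with_params(querystring: str, **updates) -> str:
    """Return the querystring with `updates` applied and every other parameter kept."""
    # Keep repeated keys and blank values; drop only the keys being replaced.
    pairs = [(k, v) for k, v in parse_qsl(querystring, keep_blank_values=True) if k not in updates]
    pairs += [(k, str(v)) for k, v in updates.items()]
    return urlencode(pairs)


print(with_params("q=james&position=SF&page=2", page=3))  # q=james&position=SF&page=3
print(with_params("q=le+bron&sort=ppg", page=2))          # q=le+bron&sort=ppg&page=2
```

On the HTMX side, the `hx-push-url` attribute is one way to push the resulting URL into the browser history so the address bar matches the rendered partial; whether this project uses it is not stated here.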

## GitFlow

Required branch model:
- `main`: production
- `develop`: integration
- `feature/*`, `release/*`, `hotfix/*`

This v2 work branch is:
- `feature/hoopscout-v2-static-architecture`

## Notes on Legacy Layers

Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation.
They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.