# HoopScout v2 (Foundation Reset)

HoopScout v2 is a controlled greenfield rebuild inside the existing repository.

Current v2 foundation scope in this branch:

- Django + HTMX server-rendered app
- PostgreSQL as the only primary database
- nginx reverse proxy
- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command

Out of scope in this step:

- extractor implementation

## Runtime Architecture (v2)

Runtime services are intentionally small:

- `web` (Django/Gunicorn)
- `postgres` (primary DB)
- `nginx` (reverse proxy + static/media serving)

No Redis/Celery services are part of the v2 default runtime topology.
Legacy Celery/provider code is still in repository history/codebase but de-emphasized for v2.

## Image Strategy

Compose builds and tags images as:

- `registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}`
- `registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}`

Reserved for future optional scheduler use:

- `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`

## Entrypoint Strategy

- `web`: `entrypoint.sh`
  - waits for PostgreSQL
  - optionally runs migrations/collectstatic
  - ensures snapshot directories exist
- `nginx`: `nginx/entrypoint.sh`
  - simple runtime entrypoint wrapper

## Compose Files

- `docker-compose.yml`: production-minded baseline runtime (immutable image filesystem)
- `docker-compose.dev.yml`: development override with source bind mount for `web`
- `docker-compose.release.yml`: production settings override (`DJANGO_SETTINGS_MODULE=config.settings.production`)

### Start development runtime

```bash
cp .env.example .env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
```

### Start release-style runtime

```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build
```

## Named Volumes

v2 runtime uses named volumes for persistence:

- `postgres_data`
- `static_data`
- `media_data`
- `snapshots_incoming`
- `snapshots_archive`
- `snapshots_failed`

Development override uses separate dev-prefixed volumes to avoid ownership collisions.

## Environment Variables

Use `.env.example` as the source of truth.

Core groups:

- Django runtime/security vars
- PostgreSQL connection vars
- image tag vars (`APP_IMAGE_TAG`, `NGINX_IMAGE_TAG`)
- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)

## Snapshot Storage Convention

Snapshot files are expected under:

- incoming: `/app/snapshots/incoming`
- archive: `/app/snapshots/archive`
- failed: `/app/snapshots/failed`

Configured via environment:

- `STATIC_DATASET_INCOMING_DIR`
- `STATIC_DATASET_ARCHIVE_DIR`
- `STATIC_DATASET_FAILED_DIR`

## Snapshot JSON Schema (MVP)

Each file must be a JSON object:

```json
{
  "source_name": "official_site_feed",
  "snapshot_date": "2026-03-13",
  "records": [
    {
      "competition_external_id": "comp-nba",
      "competition_name": "NBA",
      "season": "2025-2026",
      "team_external_id": "team-lal",
      "team_name": "Los Angeles Lakers",
      "player_external_id": "player-23",
      "full_name": "LeBron James",
      "first_name": "LeBron",
      "last_name": "James",
      "birth_date": "1984-12-30",
      "nationality": "US",
      "height_cm": 206,
      "weight_kg": 113,
      "position": "SF",
      "role": "Primary Creator",
      "games_played": 60,
      "minutes_per_game": 34.5,
      "points_per_game": 25.4,
      "rebounds_per_game": 7.2,
      "assists_per_game": 8.1,
      "steals_per_game": 1.3,
      "blocks_per_game": 0.7,
      "turnovers_per_game": 3.2,
      "fg_pct": 51.1,
      "three_pt_pct": 38.4,
      "ft_pct": 79.8,
      "source_metadata": {},
      "raw_payload": {}
    }
  ],
  "source_metadata": {},
  "raw_payload": {}
}
```

Validation is strict:

- unknown fields are rejected
- required fields must exist
- `snapshot_date` and `birth_date` must be `YYYY-MM-DD`
- numeric fields must be numeric
- invalid files are moved to the failed directory

## Import Command

Run import:

```bash
docker compose exec web python manage.py import_snapshots
```

Command behavior:

- scans `STATIC_DATASET_INCOMING_DIR` for `.json` files
- validates strict schema
- computes SHA-256 checksum
- creates `ImportRun` + `ImportFile` records
- upserts relational entities (`Competition`, `Season`, `Team`, `Player`, `PlayerSeason`, `PlayerSeasonStats`)
- skips duplicate content using checksum
- moves valid files to archive
- moves invalid files to failed

Import history is visible in Django admin:

- `ImportRun`
- `ImportFile`

## Extractor Framework (v2)

v2 keeps extraction and import as two separate steps:

1. **Extractors** fetch public source content and emit normalized JSON snapshots.
2. **Importer** (`import_snapshots`) validates and upserts those snapshots into PostgreSQL.

Extractor pipeline:

- `fetch` (public endpoint/page requests with conservative HTTP behavior)
- `parse` (source-specific structure)
- `normalize` (map to HoopScout snapshot schema)
- `emit` (write JSON file to incoming directory or custom path)

Built-in extractors in this phase:

- `public_json_snapshot` (generic JSON feed extractor for MVP usage)
- `lba` (Lega Basket Serie A MVP extractor)
- `bcl` (Basketball Champions League MVP extractor)

Run extractor:

```bash
docker compose exec web python manage.py run_extractor public_json_snapshot
```

Run extractor with explicit output path (debugging):

```bash
docker compose exec web python manage.py run_extractor public_json_snapshot --output-path /app/snapshots/incoming
```

Dry-run validation (no file write):

```bash
docker compose exec web python manage.py run_extractor public_json_snapshot --dry-run
```

Run only the LBA extractor:

```bash
docker compose exec web python manage.py run_lba_extractor
```

Run only the BCL extractor:

```bash
docker compose exec web python manage.py run_bcl_extractor
```

Extractor environment variables:

- `EXTRACTOR_USER_AGENT`
- `EXTRACTOR_HTTP_TIMEOUT_SECONDS`
- `EXTRACTOR_HTTP_RETRIES`
- `EXTRACTOR_RETRY_SLEEP_SECONDS`
- `EXTRACTOR_REQUEST_DELAY_SECONDS`
- `EXTRACTOR_PUBLIC_JSON_URL`
- `EXTRACTOR_PUBLIC_SOURCE_NAME`
- `EXTRACTOR_INCLUDE_RAW_PAYLOAD`
- `EXTRACTOR_LBA_STATS_URL`
- `EXTRACTOR_LBA_SEASON_LABEL`
- `EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID`
- `EXTRACTOR_LBA_COMPETITION_NAME`
- `EXTRACTOR_BCL_STATS_URL`
- `EXTRACTOR_BCL_SEASON_LABEL`
- `EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID`
- `EXTRACTOR_BCL_COMPETITION_NAME`

Notes:

- extraction is intentionally low-frequency and uses retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by `import_snapshots`

### LBA extractor assumptions and limitations (MVP)

- `source_name` is fixed to `lba`
- the extractor expects one stable public JSON payload that includes player/team/stat rows
- competition is configured by environment and emitted as:
  - `competition_external_id` from `EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID`
  - `competition_name` from `EXTRACTOR_LBA_COMPETITION_NAME`
- season is configured by `EXTRACTOR_LBA_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp/mpg/ppg/rpg/apg/spg/bpg/tov`)
- no live HTTP calls in tests; tests use fixtures/mocked responses only

### BCL extractor assumptions and limitations (MVP)

- `source_name` is fixed to `bcl`
- the extractor expects one stable public JSON payload that includes player/team/stat rows
- competition is configured by environment and emitted as:
  - `competition_external_id` from `EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID`
  - `competition_name` from `EXTRACTOR_BCL_COMPETITION_NAME`
- season is configured by `EXTRACTOR_BCL_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp/mpg/ppg/rpg/apg/spg/bpg/tov`)
- no live HTTP calls in tests; tests use fixtures/mocked responses only

## Migration and Superuser Commands

```bash
docker compose exec web python manage.py migrate
docker compose exec web python manage.py createsuperuser
```

## Health Endpoints

- app health: `/health/`
- nginx healthcheck proxies `/health/` to `web`

## Player Search (v2)

Public player search is server-rendered (Django templates) with HTMX partial updates.

Supported filters:

- free text name search
- nominal position, inferred role
- competition, season, team
- nationality
- age, height, weight ranges
- stats thresholds: games, MPG, PPG, RPG, APG, SPG, BPG, TOV, FG%, 3P%, FT%

Search correctness:

- combined team/competition/season/stat filters are applied to the same `PlayerSeason` context (no cross-row false positives)
- filtering happens at database level with Django ORM

Search metric semantics:

- result columns are labeled as **Best Eligible**
- each displayed metric is `MAX` over eligible player-season rows for that metric in the current filter context
- different metric columns for one player may come from different eligible seasons
- when no eligible value exists for a metric in the current context, the UI shows `-`

Pagination and sorting:

- querystring is preserved
- HTMX navigation keeps URL state in sync with current filters/page/sort

## GitFlow

Required branch model:

- `main`: production
- `develop`: integration
- `feature/*`, `release/*`, `hotfix/*`

This v2 work branch is:

- `feature/hoopscout-v2-static-architecture`

## Notes on Legacy Layers

Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation.
They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.
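
## Example: Best Eligible Metrics (Sketch)

The **Best Eligible** semantics described under Player Search (v2) can be easier to follow with a small worked example. The sketch below is plain Python over an in-memory list, not the project's Django ORM query. The row layout, the `best_eligible` and `display` helpers, and the sample values are all hypothetical and exist only for illustration. It shows that eligibility is decided per row first, that each metric is then the maximum over the eligible rows, and that a metric with no eligible value is shown as `-`.

```python
# Hypothetical stand-in for PlayerSeason rows of one player.
# Field names and values are illustrative assumptions, not the real model.
ROWS = [
    {"season": "2024-2025", "team": "X", "ppg": 18.0, "rpg": 9.5},
    {"season": "2025-2026", "team": "X", "ppg": 21.5, "rpg": 7.0},
    {"season": "2025-2026", "team": "Y", "ppg": 30.0, "rpg": None},
]


def best_eligible(rows, is_eligible, metrics):
    """Return the max of each metric over rows that pass the filter.

    All filter conditions are checked against the same row, so a value
    from a non-matching row can never leak into the result.
    """
    eligible = [row for row in rows if is_eligible(row)]
    result = {}
    for metric in metrics:
        values = [row.get(metric) for row in eligible if row.get(metric) is not None]
        result[metric] = max(values) if values else None
    return result


def display(value):
    """Render a metric for the results table; missing values become '-'."""
    return "-" if value is None else f"{value:.1f}"


# Filter context: only seasons played for team "X".
summary = best_eligible(ROWS, lambda row: row["team"] == "X", ["ppg", "rpg", "apg"])
for metric, value in summary.items():
    print(metric, display(value))
```

In this sample the `ppg` value comes from the 2025-2026 row and the `rpg` value from the 2024-2025 row, which is the "different eligible seasons" case noted above. The team `Y` row is excluded by the filter, so its higher `ppg` is not shown.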