
HoopScout v2 (Foundation Reset)

HoopScout v2 is a controlled greenfield rebuild inside the existing repository.

Current v2 foundation scope in this branch:

  • Django + HTMX server-rendered app
  • PostgreSQL as the only primary database
  • nginx reverse proxy
  • management-command-driven runtime operations
  • static snapshot directories persisted via Docker named volumes
  • strict JSON snapshot schema + import management command

Out of scope in this step:

  • extractor implementation

Runtime Architecture (v2)

Runtime services are intentionally small:

  • web (Django/Gunicorn)
  • postgres (primary DB)
  • nginx (reverse proxy + static/media serving)
  • optional scheduler profile service (runs daily extractor/import loop)

No Redis or Celery services are part of the default v2 runtime topology. Legacy Celery/provider code remains in the codebase and repository history but is de-emphasized for v2.

Image Strategy

Compose builds and tags images as:

  • registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}
  • registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}

Reserved for future optional scheduler use:

  • registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}

Entrypoint Strategy

  • web: entrypoint.sh
    • waits for PostgreSQL
    • optionally runs migrations/collectstatic
    • ensures snapshot directories exist
  • nginx: nginx/entrypoint.sh
    • simple runtime entrypoint wrapper

Compose Files

  • docker-compose.yml: production-minded baseline runtime (immutable image filesystem)
  • docker-compose.dev.yml: development override with source bind mount for web
  • docker-compose.release.yml: production settings override (DJANGO_SETTINGS_MODULE=config.settings.production)

Start development runtime

cp .env.example .env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

Start release-style runtime

docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build

Start scheduler profile (optional)

docker compose --profile scheduler up -d scheduler

For development override:

docker compose -f docker-compose.yml -f docker-compose.dev.yml --profile scheduler up -d scheduler

Named Volumes

v2 runtime uses named volumes for persistence:

  • postgres_data
  • static_data
  • media_data
  • snapshots_incoming
  • snapshots_archive
  • snapshots_failed

Development override uses separate dev-prefixed volumes to avoid ownership collisions.

Environment Variables

Use .env.example as the source of truth.

Core groups:

  • Django runtime/security vars
  • PostgreSQL connection vars
  • image tag vars (APP_IMAGE_TAG, NGINX_IMAGE_TAG)
  • snapshot directory vars (STATIC_DATASET_*)
  • optional future scheduler vars (SCHEDULER_*)
  • daily orchestration vars (DAILY_ORCHESTRATION_*)

Snapshot Storage Convention

Snapshot files are expected under:

  • incoming: /app/snapshots/incoming
  • archive: /app/snapshots/archive
  • failed: /app/snapshots/failed

Configured via environment:

  • STATIC_DATASET_INCOMING_DIR
  • STATIC_DATASET_ARCHIVE_DIR
  • STATIC_DATASET_FAILED_DIR
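
As a rough illustration of how the three directory variables could be resolved, here is a minimal sketch. The function name resolve_snapshot_dirs is hypothetical, and falling back to the /app/snapshots/* paths when a variable is unset is an assumption, not confirmed project behavior.

```python
import os
from pathlib import Path

# Assumed defaults: they mirror the paths listed above.
DEFAULT_DIRS = {
    "STATIC_DATASET_INCOMING_DIR": "/app/snapshots/incoming",
    "STATIC_DATASET_ARCHIVE_DIR": "/app/snapshots/archive",
    "STATIC_DATASET_FAILED_DIR": "/app/snapshots/failed",
}


def resolve_snapshot_dirs(env=None):
    """Return {"incoming": Path, "archive": Path, "failed": Path}."""
    env = os.environ if env is None else env
    resolved = {}
    for var, default in DEFAULT_DIRS.items():
        # STATIC_DATASET_INCOMING_DIR -> "incoming"
        key = var.removeprefix("STATIC_DATASET_").removesuffix("_DIR").lower()
        # An unset or empty variable falls back to the assumed default.
        resolved[key] = Path(env.get(var) or default)
    return resolved
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise in tests.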

Snapshot JSON Schema (MVP)

Each file must be a JSON object:

{
  "source_name": "official_site_feed",
  "snapshot_date": "2026-03-13",
  "records": [
    {
      "competition_external_id": "comp-nba",
      "competition_name": "NBA",
      "season": "2025-2026",
      "team_external_id": "team-lal",
      "team_name": "Los Angeles Lakers",
      "player_external_id": "player-23",
      "full_name": "LeBron James",
      "first_name": "LeBron",
      "last_name": "James",
      "birth_date": "1984-12-30",
      "nationality": "US",
      "height_cm": 206,
      "weight_kg": 113,
      "position": "SF",
      "role": "Primary Creator",
      "games_played": 60,
      "minutes_per_game": 34.5,
      "points_per_game": 25.4,
      "rebounds_per_game": 7.2,
      "assists_per_game": 8.1,
      "steals_per_game": 1.3,
      "blocks_per_game": 0.7,
      "turnovers_per_game": 3.2,
      "fg_pct": 51.1,
      "three_pt_pct": 38.4,
      "ft_pct": 79.8,
      "source_metadata": {},
      "raw_payload": {}
    }
  ],
  "source_metadata": {},
  "raw_payload": {}
}

Validation is strict:

  • unknown fields are rejected
  • required fields must exist
  • snapshot_date and birth_date must be YYYY-MM-DD
  • numeric fields must be numeric
  • invalid files are moved to failed directory
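
The strict rules above can be sketched as a small standalone validator. This is not the project's validator: the class and function names are hypothetical, and which fields count as required is an assumption, since this README does not list them. Only a subset of the schema fields is covered to keep the example short.

```python
from datetime import date
from numbers import Real

# Illustrative subsets of the schema shown above (assumed, not authoritative).
REQUIRED_TOP = {"source_name", "snapshot_date", "records"}
OPTIONAL_TOP = {"source_metadata", "raw_payload"}
REQUIRED_RECORD = {"player_external_id", "full_name", "team_external_id", "season"}
DATE_FIELDS = {"birth_date"}
NUMERIC_FIELDS = {"height_cm", "weight_kg", "games_played", "points_per_game"}


class SnapshotValidationError(ValueError):
    """Raised when a snapshot violates the strict schema."""


def _check_date(name, value):
    try:
        # Round-tripping rejects anything that is not exactly YYYY-MM-DD.
        ok = isinstance(value, str) and date.fromisoformat(value).isoformat() == value
    except ValueError:
        ok = False
    if not ok:
        raise SnapshotValidationError(f"{name} must be YYYY-MM-DD")


def _check_keys(label, obj, required, allowed):
    unknown = set(obj) - allowed
    if unknown:
        raise SnapshotValidationError(f"{label}: unknown fields {sorted(unknown)}")
    missing = required - set(obj)
    if missing:
        raise SnapshotValidationError(f"{label}: missing fields {sorted(missing)}")


def validate_snapshot(payload):
    if not isinstance(payload, dict):
        raise SnapshotValidationError("snapshot must be a JSON object")
    _check_keys("snapshot", payload, REQUIRED_TOP, REQUIRED_TOP | OPTIONAL_TOP)
    _check_date("snapshot_date", payload["snapshot_date"])
    if not isinstance(payload["records"], list):
        raise SnapshotValidationError("records must be a list")
    allowed = REQUIRED_RECORD | DATE_FIELDS | NUMERIC_FIELDS
    for index, record in enumerate(payload["records"]):
        if not isinstance(record, dict):
            raise SnapshotValidationError(f"records[{index}] must be an object")
        _check_keys(f"records[{index}]", record, REQUIRED_RECORD, allowed)
        for field in DATE_FIELDS & set(record):
            _check_date(field, record[field])
        for field in NUMERIC_FIELDS & set(record):
            value = record[field]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, Real):
                raise SnapshotValidationError(f"{field} must be numeric")
```

Raising a ValueError subclass lets a caller treat malformed JSON and schema violations the same way when deciding to move a file to the failed directory.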

Import Command

Run import:

docker compose exec web python manage.py import_snapshots

Run end-to-end daily orchestration manually (extractors -> import):

docker compose exec web python manage.py run_daily_orchestration

Command behavior:

  • scans STATIC_DATASET_INCOMING_DIR for .json files
  • validates strict schema
  • computes SHA-256 checksum
  • creates ImportRun + ImportFile records
  • upserts relational entities (Competition, Season, Team, Player, PlayerSeason, PlayerSeasonStats)
  • skips duplicate content using checksum
  • moves valid files to archive
  • moves invalid files to failed
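
The file-handling part of this flow (checksum, duplicate skip, archive or failed move) might look roughly like the sketch below. It leaves out the ImportRun/ImportFile records and the relational upserts. The names process_incoming and sha256_of are hypothetical, the validate callable is assumed to raise ValueError on bad input, and moving duplicates to the archive directory is an assumption, since the README only says duplicates are skipped.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path):
    """Stream the file through SHA-256 so large snapshots are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def process_incoming(incoming, archive, failed, seen_checksums, validate):
    """Return a list of (filename, status) tuples for every .json file found."""
    results = []
    for path in sorted(Path(incoming).glob("*.json")):
        checksum = sha256_of(path)
        if checksum in seen_checksums:
            # Assumption: duplicate content is archived without being re-imported.
            status, target = "skipped_duplicate", Path(archive)
        else:
            try:
                # json.JSONDecodeError is a ValueError subclass.
                validate(json.loads(path.read_text(encoding="utf-8")))
            except ValueError:
                status, target = "failed", Path(failed)
            else:
                seen_checksums.add(checksum)
                status, target = "imported", Path(archive)
        target.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target / path.name))
        results.append((path.name, status))
    return results
```

In the real command the set of seen checksums would presumably come from stored ImportFile rows rather than an in-memory set.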

Source Identity Namespacing

Raw external IDs are not globally unique across basketball data sources. HoopScout v2 uses a namespaced identity for imported entities:

  • Competition: unique key is (source_name, source_uid)
  • Team: unique key is (source_name, source_uid)
  • Player: unique key is (source_name, source_uid)

source_uid values from different sources (for example lba and bcl) can safely overlap without overwriting each other.
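
To make the composite key concrete, here is a self-contained sketch using Python's built-in sqlite3 module. The project itself uses PostgreSQL through Django models; SQLite, the table layout, and the upsert_player helper are stand-ins chosen only so the example runs without any setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE player (
        id INTEGER PRIMARY KEY,
        source_name TEXT NOT NULL,
        source_uid TEXT NOT NULL,
        full_name TEXT NOT NULL,
        UNIQUE (source_name, source_uid)
    )
    """
)


def upsert_player(source_name, source_uid, full_name):
    """Insert a player, or update it when the namespaced key already exists."""
    conn.execute(
        """
        INSERT INTO player (source_name, source_uid, full_name)
        VALUES (?, ?, ?)
        ON CONFLICT (source_name, source_uid)
        DO UPDATE SET full_name = excluded.full_name
        """,
        (source_name, source_uid, full_name),
    )


# The same raw ID from two sources yields two distinct rows.
upsert_player("lba", "player-23", "Player From LBA")
upsert_player("bcl", "player-23", "Player From BCL")
# Re-importing from the same source updates the existing row in place.
upsert_player("lba", "player-23", "Player From LBA (updated)")

rows = conn.execute(
    "SELECT source_name, source_uid, full_name FROM player ORDER BY source_name"
).fetchall()
```

After these three calls the table holds two rows, one per source, and the lba row carries the updated name.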

Import history is visible in Django admin:

  • ImportRun
  • ImportFile

Extractor Framework (v2)

v2 keeps extraction and import as two separate steps:

  1. Extractors fetch public source content and emit normalized JSON snapshots.
  2. Importer (import_snapshots) validates and upserts those snapshots into PostgreSQL.

Extractor pipeline:

  • fetch (public endpoint/page requests with conservative HTTP behavior)
  • parse (source-specific structure)
  • normalize (map to HoopScout snapshot schema)
  • emit (write JSON file to incoming directory or custom path)
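
The four stages above could be organized as a small base class, sketched below. BaseExtractor, CannedExtractor, the run method, and the output file naming are all hypothetical and do not describe the actual HoopScout classes. The demo subclass returns a canned payload rather than performing HTTP.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class BaseExtractor(ABC):
    """Hypothetical skeleton: fetch -> parse -> normalize -> emit."""

    source_name = "example"

    @abstractmethod
    def fetch(self):
        """Return raw source content (for example a response body)."""

    @abstractmethod
    def parse(self, raw):
        """Turn raw content into source-specific rows."""

    @abstractmethod
    def normalize(self, rows):
        """Map source rows to snapshot records."""

    def emit(self, snapshot, output_dir):
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        # Assumed naming scheme: <source>-<date>.json
        path = out / f"{self.source_name}-{snapshot['snapshot_date']}.json"
        path.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
        return path

    def run(self, output_dir, snapshot_date, dry_run=False):
        rows = self.parse(self.fetch())
        snapshot = {
            "source_name": self.source_name,
            "snapshot_date": snapshot_date,
            "records": self.normalize(rows),
        }
        # A dry run returns the snapshot without writing a file.
        return snapshot if dry_run else self.emit(snapshot, output_dir)


class CannedExtractor(BaseExtractor):
    """Demo subclass that 'fetches' a fixed payload instead of doing HTTP."""

    source_name = "demo"

    def fetch(self):
        return '{"players": [{"id": 7, "name": "Example Player"}]}'

    def parse(self, raw):
        return json.loads(raw)["players"]

    def normalize(self, rows):
        return [
            {"player_external_id": f"player-{row['id']}", "full_name": row["name"]}
            for row in rows
        ]
```

Keeping emit separate from normalize is what makes a dry-run mode cheap: the snapshot can be built and validated without touching the incoming directory.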

Built-in extractors in this phase:

  • public_json_snapshot (generic JSON feed extractor for MVP usage)
  • lba (Lega Basket Serie A MVP extractor)
  • bcl (Basketball Champions League MVP extractor)

Run extractor:

docker compose exec web python manage.py run_extractor public_json_snapshot

Run extractor with explicit output path (debugging):

docker compose exec web python manage.py run_extractor public_json_snapshot --output-path /app/snapshots/incoming

Dry-run validation (no file write):

docker compose exec web python manage.py run_extractor public_json_snapshot --dry-run

Run only the LBA extractor:

docker compose exec web python manage.py run_lba_extractor

Run only the BCL extractor:

docker compose exec web python manage.py run_bcl_extractor

Daily orchestration behavior

run_daily_orchestration performs:

  1. run configured extractors in order from DAILY_ORCHESTRATION_EXTRACTORS
  2. write snapshots to incoming dir
  3. run import_snapshots
  4. log extractor/import summary
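
The four steps above can be sketched as follows. This is an illustration under assumptions: DAILY_ORCHESTRATION_EXTRACTORS is taken to be a comma-separated list, a failing extractor is recorded and does not stop the run, and the registry and run_import arguments are hypothetical stand-ins for the real extractor lookup and the import_snapshots command.

```python
def parse_extractor_names(raw):
    """Split an assumed comma-separated value, dropping blanks and whitespace."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def run_daily_orchestration(raw_names, registry, run_import):
    """Run extractors in configured order, then the import, and return a summary."""
    summary = {"extractors": {}, "import": None}
    for name in parse_extractor_names(raw_names):
        try:
            # Each registry entry is a zero-argument callable that writes a snapshot.
            summary["extractors"][name] = registry[name]()
        except Exception as exc:
            # Assumption: one failing extractor does not block the others.
            summary["extractors"][name] = f"error: {exc}"
    # The import always runs last, after all extractors have been attempted.
    summary["import"] = run_import()
    return summary
```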

Extractor environment variables:

  • EXTRACTOR_USER_AGENT
  • EXTRACTOR_HTTP_TIMEOUT_SECONDS
  • EXTRACTOR_HTTP_RETRIES
  • EXTRACTOR_RETRY_SLEEP_SECONDS
  • EXTRACTOR_REQUEST_DELAY_SECONDS
  • EXTRACTOR_PUBLIC_JSON_URL
  • EXTRACTOR_PUBLIC_SOURCE_NAME
  • EXTRACTOR_INCLUDE_RAW_PAYLOAD
  • EXTRACTOR_LBA_STATS_URL
  • EXTRACTOR_LBA_SEASON_LABEL
  • EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID
  • EXTRACTOR_LBA_COMPETITION_NAME
  • EXTRACTOR_BCL_STATS_URL
  • EXTRACTOR_BCL_SEASON_LABEL
  • EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID
  • EXTRACTOR_BCL_COMPETITION_NAME
  • DAILY_ORCHESTRATION_EXTRACTORS
  • DAILY_ORCHESTRATION_INTERVAL_SECONDS
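
As a sketch of how EXTRACTOR_HTTP_RETRIES and EXTRACTOR_RETRY_SLEEP_SECONDS might drive a conservative fetch, consider the helper below. The name fetch_with_retries is hypothetical, and treating the retry count as "extra attempts after the first" is an assumption. The fetch callable is injected so the example performs no network access.

```python
import time


def fetch_with_retries(fetch, retries, retry_sleep_seconds, sleep=time.sleep):
    """Call fetch() up to 1 + retries times, sleeping between failed attempts."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch()
        except OSError as exc:
            # urllib.error.URLError and socket timeouts are OSError subclasses.
            last_error = exc
            if attempt < retries:
                sleep(retry_sleep_seconds)
    raise last_error
```

A fixed sleep keeps the request pattern predictable, which fits the low-frequency intent described in the notes that follow.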

Notes:

  • extraction is intentionally low-frequency and uses retries conservatively
  • only public pages/endpoints should be targeted
  • emitted snapshots must match the same schema consumed by import_snapshots
  • optional scheduler container runs scripts/scheduler.sh loop using:
    • image: registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}
    • command: /app/scripts/scheduler.sh
    • interval: DAILY_ORCHESTRATION_INTERVAL_SECONDS

Scheduler entrypoint/runtime expectations

  • scheduler uses the same app image and base entrypoint.sh as web
  • scheduler requires database connectivity and snapshot volumes
  • scheduler is disabled unless:
    • compose scheduler profile is started
    • SCHEDULER_ENABLED=1
  • this keeps default runtime simple while supporting daily automation
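
The gating and interval behavior can be illustrated in Python, although the actual loop lives in scripts/scheduler.sh and may differ. In this sketch the function names, the 86400-second default interval, and returning immediately when disabled are all assumptions.

```python
import os
import time


def scheduler_enabled(env=None):
    """True only when SCHEDULER_ENABLED is exactly "1"."""
    env = os.environ if env is None else env
    return env.get("SCHEDULER_ENABLED", "0") == "1"


def scheduler_loop(run_once, env=None, sleep=time.sleep, max_cycles=None):
    """Run run_once() every interval; return the number of completed cycles."""
    env = os.environ if env is None else env
    if not scheduler_enabled(env):
        return 0
    # Assumed default of one day when the interval variable is unset.
    interval = int(env.get("DAILY_ORCHESTRATION_INTERVAL_SECONDS", "86400"))
    cycles = 0
    # max_cycles exists only so the loop can be bounded in tests.
    while max_cycles is None or cycles < max_cycles:
        run_once()
        cycles += 1
        sleep(interval)
    return cycles
```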

LBA extractor assumptions and limitations (MVP)

  • source_name is fixed to lba
  • the extractor expects one stable public JSON payload that includes player/team/stat rows
  • competition is configured by environment and emitted as:
    • competition_external_id from EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID
    • competition_name from EXTRACTOR_LBA_COMPETITION_NAME
  • season is configured by EXTRACTOR_LBA_SEASON_LABEL
  • parser supports payload keys: records, data, players, items
  • normalization supports nested player and team objects with common stat aliases (gp/mpg/ppg/rpg/apg/spg/bpg/tov)
  • no live HTTP calls in tests; tests use fixtures/mocked responses only

BCL extractor assumptions and limitations (MVP)

  • source_name is fixed to bcl
  • the extractor expects one stable public JSON payload that includes player/team/stat rows
  • competition is configured by environment and emitted as:
    • competition_external_id from EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID
    • competition_name from EXTRACTOR_BCL_COMPETITION_NAME
  • season is configured by EXTRACTOR_BCL_SEASON_LABEL
  • parser supports payload keys: records, data, players, items
  • normalization supports nested player and team objects with common stat aliases (gp/mpg/ppg/rpg/apg/spg/bpg/tov)
  • no live HTTP calls in tests; tests use fixtures/mocked responses only
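
The payload-key lookup and stat-alias handling shared by the LBA and BCL extractors might be sketched like this. The helper names, the nested "id" and "name" keys, and the exact alias-to-field mapping are assumptions for illustration; only the four payload keys and the alias abbreviations come from the lists above.

```python
RECORD_KEYS = ("records", "data", "players", "items")

# Assumed mapping from common stat aliases to snapshot schema fields.
STAT_ALIASES = {
    "gp": "games_played",
    "mpg": "minutes_per_game",
    "ppg": "points_per_game",
    "rpg": "rebounds_per_game",
    "apg": "assists_per_game",
    "spg": "steals_per_game",
    "bpg": "blocks_per_game",
    "tov": "turnovers_per_game",
}


def extract_rows(payload):
    """Return the first list found under one of the supported payload keys."""
    for key in RECORD_KEYS:
        if isinstance(payload.get(key), list):
            return payload[key]
    raise ValueError(f"payload has none of the supported keys: {RECORD_KEYS}")


def normalize_row(row):
    """Flatten nested player/team objects and rename stat aliases."""
    player, team = row["player"], row["team"]
    record = {
        "player_external_id": str(player["id"]),
        "full_name": player["name"],
        "team_external_id": str(team["id"]),
        "team_name": team["name"],
    }
    for alias, canonical in STAT_ALIASES.items():
        # Prefer the canonical field name when the source already uses it.
        if canonical in row:
            record[canonical] = row[canonical]
        elif alias in row:
            record[canonical] = row[alias]
    return record
```

Tests for such helpers can run entirely on fixture dictionaries, consistent with the no-live-HTTP rule.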

Migration and Superuser Commands

docker compose exec web python manage.py migrate
docker compose exec web python manage.py createsuperuser

Health Endpoints

  • app health: /health/
  • nginx healthcheck proxies /health/ to web

Player Search (v2)

Public player search is server-rendered (Django templates) with HTMX partial updates.

Supported filters:

  • free text name search
  • nominal position, inferred role
  • competition, season, team
  • nationality
  • age, height, weight ranges
  • stats thresholds: games, MPG, PPG, RPG, APG, SPG, BPG, TOV, FG%, 3P%, FT%

Search correctness:

  • combined team/competition/season/stat filters are applied to the same PlayerSeason context (no cross-row false positives)
  • filtering happens at database level with Django ORM
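
The "same PlayerSeason context" rule is easiest to see with a small in-memory example. The rows and function names below are made up. In Django ORM terms, conditions placed in a single filter() call on a multi-valued relation must all hold for the same related row, whereas chained filter() calls may each match a different related row, which is the cross-row behavior shown second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerSeasonRow:
    player: str
    team: str
    season: str
    points_per_game: float


ROWS = [
    PlayerSeasonRow("Player A", "Team X", "2024-2025", 9.0),
    PlayerSeasonRow("Player A", "Team Y", "2025-2026", 21.0),
    PlayerSeasonRow("Player B", "Team X", "2025-2026", 18.5),
]


def same_row_match(rows, team, min_ppg):
    """Correct: both conditions must hold on one player-season row."""
    return sorted(
        {r.player for r in rows if r.team == team and r.points_per_game >= min_ppg}
    )


def cross_row_match(rows, team, min_ppg):
    """Incorrect: each condition may be satisfied by a different row."""
    on_team = {r.player for r in rows if r.team == team}
    scorers = {r.player for r in rows if r.points_per_game >= min_ppg}
    return sorted(on_team & scorers)
```

Searching for Team X with at least 15 PPG, same_row_match returns only Player B, while cross_row_match also returns Player A, who scored 21 PPG for a different team.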

Search metric semantics:

  • result columns are labeled as Best Eligible
  • each displayed metric is MAX over eligible player-season rows for that metric in the current filter context
  • different metric columns for one player may come from different eligible seasons
  • when no eligible value exists for a metric in the current context, the UI shows -
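
A minimal sketch of the Best Eligible semantics, assuming the eligible player-season rows for one player have already been selected by the filters. The function names are hypothetical, and the real implementation presumably aggregates in the database and not in Python.

```python
def best_eligible(rows, metrics):
    """Return the per-metric maximum over eligible rows, or None when absent."""
    best = {}
    for metric in metrics:
        values = [row[metric] for row in rows if row.get(metric) is not None]
        # Each metric is maximized independently, so values may come
        # from different seasons.
        best[metric] = max(values) if values else None
    return best


def display(value):
    """Render a metric the way the results table does: '-' when missing."""
    return "-" if value is None else f"{value:g}"
```

For two eligible seasons with (18.0 PPG, 7.5 APG) and (22.1 PPG, 4.0 APG), the result row would show 22.1 PPG and 7.5 APG, each taken from a different season.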

Pagination and sorting:

  • querystring is preserved
  • HTMX navigation keeps URL state in sync with current filters/page/sort
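
Preserving the querystring across pages amounts to rebuilding the URL with only the page parameter replaced. The helper below is a sketch; the name page_url, the /players/ path, and the parameter names in the usage note are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode


def page_url(path, current_query, page):
    """Build a link to another page while keeping every other parameter."""
    params = [
        (key, value)
        for key, value in parse_qsl(current_query, keep_blank_values=True)
        if key != "page"
    ]
    params.append(("page", str(page)))
    return f"{path}?{urlencode(params)}"
```

For example, page_url("/players/", "position=SF&min_ppg=15&sort=-ppg&page=2", 3) keeps the filter and sort parameters and changes only the page number.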

Saved Searches and Watchlist (v2)

Authenticated users can:

  • save current search filters from the player search page
  • re-run saved searches from scouting pages
  • rename/update/delete saved searches
  • update saved search filters via structured JSON in the edit screen
  • add/remove favorite players inline (HTMX-friendly) and browse watchlist

GitFlow

Required branch model:

  • main: production
  • develop: integration
  • feature/*, release/*, hotfix/*

This v2 work branch is:

  • feature/hoopscout-v2-static-architecture

Notes on Legacy Layers

Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation. They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.
