2026-03-13 14:24:54 +01:00
2026-03-09 15:56:35 +01:00

HoopScout v2 (Foundation Reset)

HoopScout v2 is a controlled greenfield rebuild inside the existing repository.

Current v2 foundation scope in this branch:

  • Django + HTMX server-rendered app
  • PostgreSQL as the only primary database
  • nginx reverse proxy
  • management-command-driven runtime operations
  • static snapshot directories persisted via Docker named volumes
  • strict JSON snapshot schema + import management command

Out of scope in this step:

  • extractor implementation

Runtime Architecture (v2)

Runtime services are intentionally small:

  • web (Django/Gunicorn)
  • postgres (primary DB)
  • nginx (reverse proxy + static/media serving)

No Redis/Celery services are part of the default v2 runtime topology. Legacy Celery/provider code remains in the codebase and repository history but is de-emphasized in v2.

Image Strategy

Compose builds and tags images as:

  • registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}
  • registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}

Reserved for future optional scheduler use:

  • registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}

Entrypoint Strategy

  • web: entrypoint.sh
    • waits for PostgreSQL
    • optionally runs migrations/collectstatic
    • ensures snapshot directories exist
  • nginx: nginx/entrypoint.sh
    • simple runtime entrypoint wrapper
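entrypoint.sh is a shell script whose exact readiness check is not shown here. Conceptually, "waits for PostgreSQL" is a poll loop. The following is a minimal Python sketch of that loop as a plain TCP check. The function name wait_for_tcp and its parameters are hypothetical, and the real script may use pg_isready or a Django database check instead.

```python
import socket
import time


def wait_for_tcp(host: str, port: int, timeout_s: float = 30.0, interval_s: float = 1.0) -> bool:
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Returns True once the port accepts connections, False on timeout.
    (Hypothetical helper for illustration; not HoopScout's actual entrypoint.)
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while the database is not yet listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, `wait_for_tcp("postgres", 5432, timeout_s=60)` would block until the compose postgres service accepts connections. An open port does not guarantee the server is ready for queries, which is why a real entrypoint may prefer pg_isready.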

Compose Files

  • docker-compose.yml: production-minded baseline runtime (immutable image filesystem)
  • docker-compose.dev.yml: development override with source bind mount for web
  • docker-compose.release.yml: production settings override (DJANGO_SETTINGS_MODULE=config.settings.production)

Start development runtime

cp .env.example .env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

Start release-style runtime

docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build

Named Volumes

v2 runtime uses named volumes for persistence:

  • postgres_data
  • static_data
  • media_data
  • snapshots_incoming
  • snapshots_archive
  • snapshots_failed

The development override uses separate, dev-prefixed volumes to avoid file-ownership collisions.

Environment Variables

Use .env.example as the source of truth.

Core groups:

  • Django runtime/security vars
  • PostgreSQL connection vars
  • image tag vars (APP_IMAGE_TAG, NGINX_IMAGE_TAG)
  • snapshot directory vars (STATIC_DATASET_*)
  • optional future scheduler vars (SCHEDULER_*)

Snapshot Storage Convention

Snapshot files are expected under:

  • incoming: /app/snapshots/incoming
  • archive: /app/snapshots/archive
  • failed: /app/snapshots/failed

Configured via environment:

  • STATIC_DATASET_INCOMING_DIR
  • STATIC_DATASET_ARCHIVE_DIR
  • STATIC_DATASET_FAILED_DIR
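The following sketch shows how the three documented STATIC_DATASET_* variables could be resolved and the directories created idempotently, in the same way the web entrypoint "ensures snapshot directories exist". Falling back to the documented /app/snapshots/* paths when a variable is unset is an assumption. The project's settings may instead require the variables to be set. The helper names are hypothetical.

```python
import os
from pathlib import Path

# Env var names are the documented ones; using the documented container paths
# as fallbacks is an assumption made for this sketch.
SNAPSHOT_DIR_DEFAULTS = {
    "incoming": ("STATIC_DATASET_INCOMING_DIR", "/app/snapshots/incoming"),
    "archive": ("STATIC_DATASET_ARCHIVE_DIR", "/app/snapshots/archive"),
    "failed": ("STATIC_DATASET_FAILED_DIR", "/app/snapshots/failed"),
}


def snapshot_dirs(environ=None) -> dict[str, Path]:
    """Resolve the three snapshot directories, preferring environment overrides."""
    env = os.environ if environ is None else environ
    return {
        key: Path(env.get(var, default))
        for key, (var, default) in SNAPSHOT_DIR_DEFAULTS.items()
    }


def ensure_snapshot_dirs(dirs: dict[str, Path]) -> None:
    """Create any missing directories; safe to call on every container start."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
```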

Snapshot JSON Schema (MVP)

Each file must be a JSON object:

{
  "source_name": "official_site_feed",
  "snapshot_date": "2026-03-13",
  "records": [
    {
      "competition_external_id": "comp-nba",
      "competition_name": "NBA",
      "season": "2025-2026",
      "team_external_id": "team-lal",
      "team_name": "Los Angeles Lakers",
      "player_external_id": "player-23",
      "full_name": "LeBron James",
      "first_name": "LeBron",
      "last_name": "James",
      "birth_date": "1984-12-30",
      "nationality": "US",
      "height_cm": 206,
      "weight_kg": 113,
      "position": "SF",
      "role": "Primary Creator",
      "games_played": 60,
      "minutes_per_game": 34.5,
      "points_per_game": 25.4,
      "rebounds_per_game": 7.2,
      "assists_per_game": 8.1,
      "steals_per_game": 1.3,
      "blocks_per_game": 0.7,
      "turnovers_per_game": 3.2,
      "fg_pct": 51.1,
      "three_pt_pct": 38.4,
      "ft_pct": 79.8,
      "source_metadata": {},
      "raw_payload": {}
    }
  ],
  "source_metadata": {},
  "raw_payload": {}
}

Validation is strict:

  • unknown fields are rejected
  • required fields must be present
  • snapshot_date and birth_date must use the YYYY-MM-DD format
  • numeric fields (for example height_cm or points_per_game) must contain numeric values
  • invalid files are moved to the failed directory
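The following is a minimal sketch of these validation rules, built from the field names in the example above. The README does not say which fields are required, so the *_REQUIRED sets are assumptions. Tolerating null for numeric fields is also an assumption. validate_snapshot is a hypothetical name, not the importer's actual API.

```python
import re
from datetime import date

# Allowed fields come from the documented example; the REQUIRED sets are assumed.
TOP_LEVEL_FIELDS = {"source_name", "snapshot_date", "records", "source_metadata", "raw_payload"}
TOP_LEVEL_REQUIRED = {"source_name", "snapshot_date", "records"}

RECORD_NUMERIC = {
    "height_cm", "weight_kg", "games_played", "minutes_per_game", "points_per_game",
    "rebounds_per_game", "assists_per_game", "steals_per_game", "blocks_per_game",
    "turnovers_per_game", "fg_pct", "three_pt_pct", "ft_pct",
}
RECORD_FIELDS = RECORD_NUMERIC | {
    "competition_external_id", "competition_name", "season", "team_external_id",
    "team_name", "player_external_id", "full_name", "first_name", "last_name",
    "birth_date", "nationality", "position", "role", "source_metadata", "raw_payload",
}
RECORD_REQUIRED = {
    "competition_external_id", "season", "team_external_id", "player_external_id", "full_name",
}
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _is_iso_date(value) -> bool:
    """True only for strings shaped exactly YYYY-MM-DD that are real calendar dates."""
    if not isinstance(value, str) or not _DATE_RE.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_snapshot(doc) -> list[str]:
    """Return a list of error messages; an empty list means the snapshot is valid."""
    if not isinstance(doc, dict):
        return ["snapshot must be a JSON object"]
    keys = set(doc)
    errors = [f"unknown field: {k}" for k in sorted(keys - TOP_LEVEL_FIELDS)]
    errors += [f"missing field: {k}" for k in sorted(TOP_LEVEL_REQUIRED - keys)]
    if "snapshot_date" in doc and not _is_iso_date(doc["snapshot_date"]):
        errors.append("snapshot_date must be YYYY-MM-DD")
    records = doc.get("records", [])
    if not isinstance(records, list):
        return errors + ["records must be a list"]
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"records[{i}] must be an object")
            continue
        rkeys = set(rec)
        errors += [f"records[{i}]: unknown field: {k}" for k in sorted(rkeys - RECORD_FIELDS)]
        errors += [f"records[{i}]: missing field: {k}" for k in sorted(RECORD_REQUIRED - rkeys)]
        if rec.get("birth_date") is not None and not _is_iso_date(rec["birth_date"]):
            errors.append(f"records[{i}]: birth_date must be YYYY-MM-DD")
        for k in sorted(rkeys & RECORD_NUMERIC):
            v = rec[k]
            # bool is a subclass of int in Python, so reject it explicitly; null is tolerated.
            if v is not None and (isinstance(v, bool) or not isinstance(v, (int, float))):
                errors.append(f"records[{i}]: {k} must be numeric")
    return errors
```

Collecting all errors instead of stopping at the first one makes the failure reason recorded for a rejected file more useful.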

Import Command

Run import:

docker compose exec web python manage.py import_snapshots

Command behavior:

  • scans STATIC_DATASET_INCOMING_DIR for .json files
  • validates strict schema
  • computes SHA-256 checksum
  • creates ImportRun + ImportFile records
  • upserts relational entities (Competition, Season, Team, Player, PlayerSeason, PlayerSeasonStats)
  • skips duplicate content using checksum
  • moves valid files to archive
  • moves invalid files to failed
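The command's file-handling loop can be sketched without Django as follows. This is a hypothetical illustration, not the actual import_snapshots code. In this sketch, seen_checksums stands in for looking up earlier ImportFile records, and upsert stands in for the relational upserts. Moving duplicate files to the archive directory is an assumption, because the README only says that duplicates are skipped.

```python
import hashlib
import json
import shutil
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large snapshots are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def import_incoming(
    incoming: Path,
    archive: Path,
    failed: Path,
    seen_checksums: set[str],              # stand-in for previously imported ImportFile checksums
    validate: Callable[[dict], list[str]],
    upsert: Callable[[dict], None],        # stand-in for the Competition/Season/Team/... upserts
) -> dict[str, list[str]]:
    """Process every *.json file in `incoming` and report the outcome per file."""
    report = {"imported": [], "skipped": [], "failed": []}
    for path in sorted(incoming.glob("*.json")):
        checksum = sha256_of(path)
        if checksum in seen_checksums:
            # Duplicate content; archiving it (rather than leaving it) is an assumption.
            shutil.move(str(path), str(archive / path.name))
            report["skipped"].append(path.name)
            continue
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
            errors = validate(doc)
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            errors = [f"unreadable JSON: {exc}"]
        if errors:
            shutil.move(str(path), str(failed / path.name))
            report["failed"].append(path.name)
            continue
        upsert(doc)
        seen_checksums.add(checksum)
        shutil.move(str(path), str(archive / path.name))
        report["imported"].append(path.name)
    return report
```

Checksumming the raw bytes means that two files with identical content but different names are treated as the same snapshot.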

Import history is visible in Django admin:

  • ImportRun
  • ImportFile

Extractor Framework (v2)

v2 keeps extraction and import as two separate steps:

  1. Extractors fetch public source content and emit normalized JSON snapshots.
  2. Importer (import_snapshots) validates and upserts those snapshots into PostgreSQL.

Extractor pipeline:

  • fetch (public endpoint/page requests with conservative HTTP behavior)
  • parse (source-specific structure)
  • normalize (map to HoopScout snapshot schema)
  • emit (write JSON file to incoming directory or custom path)
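The four pipeline stages could be structured as follows. This is a hypothetical sketch, not the project's extractor framework. Extractor, StaticFeedExtractor, the demo feed's field names (id, name, players), and the output filename pattern are all invented for illustration.

```python
import json
from abc import ABC, abstractmethod
from datetime import date
from pathlib import Path
from typing import Any


class Extractor(ABC):
    """Hypothetical base class: fetch -> parse -> normalize -> emit."""

    source_name: str = "example_source"

    @abstractmethod
    def fetch(self) -> bytes: ...

    @abstractmethod
    def parse(self, raw: bytes) -> Any: ...

    @abstractmethod
    def normalize(self, parsed: Any) -> list[dict]: ...

    def build_snapshot(self, snapshot_date: date) -> dict:
        """Run fetch/parse/normalize and wrap the records in the snapshot envelope."""
        records = self.normalize(self.parse(self.fetch()))
        return {
            "source_name": self.source_name,
            "snapshot_date": snapshot_date.isoformat(),
            "records": records,
        }

    def emit(self, snapshot: dict, output_dir: Path) -> Path:
        """Write the snapshot atomically so the importer never sees a half-written file."""
        output_dir.mkdir(parents=True, exist_ok=True)
        path = output_dir / f"{snapshot['source_name']}-{snapshot['snapshot_date']}.json"
        tmp = path.with_name(path.name + ".tmp")  # not matched by a *.json scan
        tmp.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
        tmp.replace(path)
        return path


class StaticFeedExtractor(Extractor):
    """Toy extractor over an in-memory payload; the feed's field names are invented."""

    source_name = "demo_feed"

    def __init__(self, payload: bytes):
        self.payload = payload

    def fetch(self) -> bytes:
        return self.payload  # a real extractor would issue an HTTP request here

    def parse(self, raw: bytes) -> Any:
        return json.loads(raw)

    def normalize(self, parsed: Any) -> list[dict]:
        return [
            {"player_external_id": f"player-{row['id']}", "full_name": row["name"]}
            for row in parsed["players"]
        ]
```

In this structure, a dry run corresponds to calling build_snapshot (and validating its result) without calling emit. Writing to a temporary name and then renaming keeps the incoming directory free of partial files.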

Built-in extractor in this phase:

  • public_json_snapshot (generic JSON feed extractor for MVP usage)

Run extractor:

docker compose exec web python manage.py run_extractor public_json_snapshot

Run extractor with explicit output path (debugging):

docker compose exec web python manage.py run_extractor public_json_snapshot --output-path /app/snapshots/incoming

Dry-run validation (no file write):

docker compose exec web python manage.py run_extractor public_json_snapshot --dry-run

Extractor environment variables:

  • EXTRACTOR_USER_AGENT
  • EXTRACTOR_HTTP_TIMEOUT_SECONDS
  • EXTRACTOR_HTTP_RETRIES
  • EXTRACTOR_RETRY_SLEEP_SECONDS
  • EXTRACTOR_REQUEST_DELAY_SECONDS
  • EXTRACTOR_PUBLIC_JSON_URL
  • EXTRACTOR_PUBLIC_SOURCE_NAME
  • EXTRACTOR_INCLUDE_RAW_PAYLOAD
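Several of these variables could drive a conservative fetch loop, as in the sketch below. Their exact semantics are assumptions: a polite delay before the first request, a fixed pause between retries, and EXTRACTOR_HTTP_RETRIES counted as extra attempts. The fallback values are placeholders, not the project's defaults. HttpSettings and fetch_with_retries are hypothetical names. Only standard-library urllib calls are used.

```python
import os
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class HttpSettings:
    """Extractor HTTP settings read from the documented EXTRACTOR_* variables."""

    user_agent: str
    timeout_s: float
    retries: int
    retry_sleep_s: float
    request_delay_s: float

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "HttpSettings":
        env = os.environ if env is None else env
        # Fallback values are placeholders for this sketch, not the project's defaults.
        return cls(
            user_agent=env.get("EXTRACTOR_USER_AGENT", "HoopScout-extractor"),
            timeout_s=float(env.get("EXTRACTOR_HTTP_TIMEOUT_SECONDS", "15")),
            retries=int(env.get("EXTRACTOR_HTTP_RETRIES", "2")),
            retry_sleep_s=float(env.get("EXTRACTOR_RETRY_SLEEP_SECONDS", "5")),
            request_delay_s=float(env.get("EXTRACTOR_REQUEST_DELAY_SECONDS", "1")),
        )


def fetch_with_retries(
    url: str,
    cfg: HttpSettings,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """GET `url` with a fixed pause between attempts; 4xx responses are not retried."""
    request = urllib.request.Request(url, headers={"User-Agent": cfg.user_agent})
    last_error: Optional[Exception] = None
    for attempt in range(cfg.retries + 1):
        # Polite delay before the first request, fixed back-off before each retry.
        sleep(cfg.request_delay_s if attempt == 0 else cfg.retry_sleep_s)
        try:
            with opener(request, timeout=cfg.timeout_s) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # client errors will not fix themselves; fail fast
            last_error = exc
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc
    raise RuntimeError(f"giving up on {url} after {cfg.retries + 1} attempts") from last_error
```

A call might look like `fetch_with_retries(os.environ["EXTRACTOR_PUBLIC_JSON_URL"], HttpSettings.from_env())`. The opener and sleep parameters exist only so the loop can be exercised without network access.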

Notes:

  • extraction is intentionally low-frequency and uses retries conservatively
  • only public pages/endpoints should be targeted
  • emitted snapshots must match the same schema consumed by import_snapshots

Migration and Superuser Commands

docker compose exec web python manage.py migrate
docker compose exec web python manage.py createsuperuser

Health Endpoints

  • app health: /health/
  • the nginx healthcheck proxies /health/ to the web service

Player Search (v2)

Public player search is server-rendered (Django templates) with HTMX partial updates.

Supported filters:

  • free text name search
  • nominal position, inferred role
  • competition, season, team
  • nationality
  • age, height, weight ranges
  • stats thresholds: games, MPG, PPG, RPG, APG, SPG, BPG, TOV, FG%, 3P%, FT%

Search correctness:

  • combined team/competition/season/stat filters are applied to the same PlayerSeason context (no cross-row false positives)
  • filtering happens at database level with Django ORM
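The following pure-Python illustration shows the cross-row false positive that the same-context rule prevents. The actual ORM queries are not shown in this README. In Django terms, the two behaviours correspond to placing all conditions on a multi-valued relationship in one filter() call versus chaining separate filter() calls, where each call may match a different related row. PlayerSeasonRow and the match_* functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerSeasonRow:
    player: str
    season: str
    team: str
    points_per_game: float


def match_same_row(rows, team, season, min_ppg):
    """Correct: a player matches only if ONE row satisfies every condition."""
    return {
        r.player for r in rows
        if r.team == team and r.season == season and r.points_per_game >= min_ppg
    }


def match_cross_row(rows, team, season, min_ppg):
    """Buggy alternative: each condition may be satisfied by a different row."""
    by_team = {r.player for r in rows if r.team == team}
    by_season = {r.player for r in rows if r.season == season}
    by_ppg = {r.player for r in rows if r.points_per_game >= min_ppg}
    return by_team & by_season & by_ppg
```

Consider a player who averaged 25 PPG for team X in 2024-2025 and 10 PPG for team Y in 2025-2026. A search for "team Y, 2025-2026, PPG >= 20" should not match that player. Only match_same_row excludes them.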

Search metric semantics:

  • result columns are labeled as Best Eligible
  • each displayed metric is the maximum (MAX) value across the eligible player-season rows in the current filter context
  • different metric columns for one player may come from different eligible seasons
  • when no eligible value exists for a metric in the current context, the UI shows "-"
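The "Best Eligible" rule can be sketched as a per-metric maximum over rows that have already passed the filters. The function name and dict-based rows are hypothetical. In the Django ORM this would plausibly be a Max aggregate per metric, but the project's actual query is not shown here.

```python
def best_eligible(rows: list[dict], metrics: list[str]) -> dict[str, object]:
    """Per-metric MAX over already-filtered (eligible) rows; "-" when no value exists.

    Each metric is maximised independently, so values for one player may come
    from different seasons, matching the semantics described above.
    """
    result: dict[str, object] = {}
    for metric in metrics:
        values = [row[metric] for row in rows if row.get(metric) is not None]
        result[metric] = max(values) if values else "-"
    return result
```

With one eligible season at 25.4 PPG / 6.0 APG and another at 20.1 PPG / 8.1 APG, the row shows 25.4 PPG and 8.1 APG, taken from different seasons.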

Pagination and sorting:

  • the query string is preserved
  • HTMX navigation keeps URL state in sync with current filters/page/sort
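Preserving the query string means that pagination and sort links change only their own parameter and keep every active filter. The following is a minimal sketch using urllib.parse. The parameter names page and sort are assumptions about the URL scheme. In a Django view, request.GET.copy() followed by QueryDict.urlencode() achieves the same result.

```python
from urllib.parse import parse_qsl, urlencode


def with_params(query: str, **updates) -> str:
    """Return `query` with `updates` applied, keeping every other filter intact.

    Passing None removes a parameter. Repeated keys (multi-select filters) are kept.
    """
    pairs = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k not in updates]
    pairs += [(k, str(v)) for k, v in updates.items() if v is not None]
    return urlencode(pairs)
```

For example, a "next page" link built with `with_params("q=james&position=SF&page=2", page=3)` keeps the q and position filters and replaces only page.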

GitFlow

Required branch model:

  • main: production
  • develop: integration
  • feature/*, release/*, hotfix/*

This v2 work branch is:

  • feature/hoopscout-v2-static-architecture

Notes on Legacy Layers

Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation. They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.
