HoopScout v2 (Foundation Reset)
HoopScout v2 is a controlled greenfield rebuild inside the existing repository.
Current v2 foundation scope in this branch:
- Django + HTMX server-rendered app
- PostgreSQL as the only primary database
- nginx reverse proxy
- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command
Out of scope in this step:
- extractor implementation
Runtime Architecture (v2)
Runtime services are intentionally small:
- web (Django/Gunicorn)
- postgres (primary DB)
- nginx (reverse proxy + static/media serving)
No Redis or Celery services are part of the default v2 runtime topology. Legacy Celery/provider code remains in the codebase and repository history, but it is de-emphasized for v2.
Image Strategy
Compose builds and tags images as:
- registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}
- registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}
Reserved for future optional scheduler use:
- registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}
Entrypoint Strategy
web: entrypoint.sh
- waits for PostgreSQL
- optionally runs migrations/collectstatic
- ensures snapshot directories exist
nginx: nginx/entrypoint.sh
- simple runtime entrypoint wrapper
Compose Files
- docker-compose.yml: production-minded baseline runtime (immutable image filesystem)
- docker-compose.dev.yml: development override with source bind mount for web
- docker-compose.release.yml: production settings override (DJANGO_SETTINGS_MODULE=config.settings.production)
Start development runtime
cp .env.example .env
docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
Start release-style runtime
docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build
Named Volumes
v2 runtime uses named volumes for persistence:
- postgres_data
- static_data
- media_data
- snapshots_incoming
- snapshots_archive
- snapshots_failed
Development override uses separate dev-prefixed volumes to avoid ownership collisions.
Environment Variables
Use .env.example as the source of truth.
Core groups:
- Django runtime/security vars
- PostgreSQL connection vars
- image tag vars (APP_IMAGE_TAG, NGINX_IMAGE_TAG)
- snapshot directory vars (STATIC_DATASET_*)
- optional future scheduler vars (SCHEDULER_*)
Snapshot Storage Convention
Snapshot files are expected under:
- incoming: /app/snapshots/incoming
- archive: /app/snapshots/archive
- failed: /app/snapshots/failed
Configured via environment:
- STATIC_DATASET_INCOMING_DIR
- STATIC_DATASET_ARCHIVE_DIR
- STATIC_DATASET_FAILED_DIR
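As a rough illustration of how these variables could be resolved at startup, the sketch below reads the three variables and creates the directories if they are missing. It assumes the paths listed above act as fallbacks when a variable is unset; the helper names resolve_snapshot_dirs and ensure_snapshot_dirs are hypothetical and not taken from the codebase.

```python
import os
from pathlib import Path

# Assumption: the documented container paths are used as fallbacks when unset.
DEFAULT_DIRS = {
    "STATIC_DATASET_INCOMING_DIR": "/app/snapshots/incoming",
    "STATIC_DATASET_ARCHIVE_DIR": "/app/snapshots/archive",
    "STATIC_DATASET_FAILED_DIR": "/app/snapshots/failed",
}


def resolve_snapshot_dirs(environ=None):
    """Return a {variable name: Path} mapping, using defaults for unset variables."""
    environ = os.environ if environ is None else environ
    return {
        name: Path(environ.get(name, default))
        for name, default in DEFAULT_DIRS.items()
    }


def ensure_snapshot_dirs(dirs):
    """Create every directory in the mapping if it does not exist yet (hypothetical helper)."""
    for path in dirs.values():
        path.mkdir(parents=True, exist_ok=True)
```

Passing an explicit mapping instead of os.environ keeps the helper easy to exercise in tests without touching the real environment.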
Snapshot JSON Schema (MVP)
Each file must be a JSON object:
{
  "source_name": "official_site_feed",
  "snapshot_date": "2026-03-13",
  "records": [
    {
      "competition_external_id": "comp-nba",
      "competition_name": "NBA",
      "season": "2025-2026",
      "team_external_id": "team-lal",
      "team_name": "Los Angeles Lakers",
      "player_external_id": "player-23",
      "full_name": "LeBron James",
      "first_name": "LeBron",
      "last_name": "James",
      "birth_date": "1984-12-30",
      "nationality": "US",
      "height_cm": 206,
      "weight_kg": 113,
      "position": "SF",
      "role": "Primary Creator",
      "games_played": 60,
      "minutes_per_game": 34.5,
      "points_per_game": 25.4,
      "rebounds_per_game": 7.2,
      "assists_per_game": 8.1,
      "steals_per_game": 1.3,
      "blocks_per_game": 0.7,
      "turnovers_per_game": 3.2,
      "fg_pct": 51.1,
      "three_pt_pct": 38.4,
      "ft_pct": 79.8,
      "source_metadata": {},
      "raw_payload": {}
    }
  ],
  "source_metadata": {},
  "raw_payload": {}
}
Validation is strict:
- unknown fields are rejected
- required fields must exist
- snapshot_date and birth_date must be YYYY-MM-DD
- numeric fields must be numeric
- invalid files are moved to failed directory
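The validation rules above can be sketched for a single record as follows. This is a minimal illustration, not the project's validator: the split between required and optional fields is an assumption made for the example (the README does not list which fields are required), and only a subset of the schema's fields is covered.

```python
from datetime import datetime

# Assumption: this required/optional split is illustrative only.
REQUIRED_RECORD_FIELDS = {
    "competition_external_id", "season", "team_external_id",
    "player_external_id", "full_name",
}
OPTIONAL_RECORD_FIELDS = {"birth_date", "height_cm", "weight_kg", "points_per_game"}
DATE_FIELDS = {"birth_date"}
NUMERIC_FIELDS = {"height_cm", "weight_kg", "points_per_game"}


def is_iso_date(value):
    """True only for zero-padded YYYY-MM-DD strings that name a real date."""
    if not isinstance(value, str):
        return False
    try:
        parsed = datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        return False
    # Round-trip so that loosely formatted input such as "2026-3-13" is rejected.
    return parsed.strftime("%Y-%m-%d") == value


def is_number(value):
    """True for int/float; bool is excluded because it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(record):
    """Return a list of error strings; an empty list means the record passed."""
    errors = []
    present = set(record)
    allowed = REQUIRED_RECORD_FIELDS | OPTIONAL_RECORD_FIELDS
    for name in sorted(present - allowed):
        errors.append(f"unknown field: {name}")
    for name in sorted(REQUIRED_RECORD_FIELDS - present):
        errors.append(f"missing required field: {name}")
    for name in sorted(DATE_FIELDS & present):
        if not is_iso_date(record[name]):
            errors.append(f"invalid date: {name}")
    for name in sorted(NUMERIC_FIELDS & present):
        if not is_number(record[name]):
            errors.append(f"not numeric: {name}")
    return errors
```

Collecting all errors instead of stopping at the first one makes it easier to see everything wrong with a file that ended up in the failed directory.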
Import Command
Run import:
docker compose exec web python manage.py import_snapshots
Command behavior:
- scans STATIC_DATASET_INCOMING_DIR for .json files
- validates strict schema
- computes SHA-256 checksum
- creates ImportRun + ImportFile records
- upserts relational entities (Competition, Season, Team, Player, PlayerSeason, PlayerSeasonStats)
- skips duplicate content using checksum
- moves valid files to archive
- moves invalid files to failed
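The checksum and file-routing steps above might look roughly like the sketch below. It is not the actual import_snapshots code: an in-memory set stands in for the checksums that the real command would presumably look up via ImportFile records, the function names are hypothetical, and it assumes that duplicate files are archived like other valid files.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash the file in chunks so large snapshots are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def route_file(path, is_valid, seen_checksums, archive_dir, failed_dir):
    """Move the file and return 'imported', 'duplicate', or 'failed'."""
    path = Path(path)
    if not is_valid:
        shutil.move(str(path), str(Path(failed_dir) / path.name))
        return "failed"
    checksum = sha256_of_file(path)
    # Assumption: a duplicate is skipped for import but still archived.
    status = "duplicate" if checksum in seen_checksums else "imported"
    seen_checksums.add(checksum)
    shutil.move(str(path), str(Path(archive_dir) / path.name))
    return status
```

Because the checksum is computed over file content, two files with different names but identical bytes are treated as the same snapshot.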
Import history is visible in Django admin:
- ImportRun
- ImportFile
Extractor Framework (v2)
v2 keeps extraction and import as two separate steps:
- Extractors fetch public source content and emit normalized JSON snapshots.
- Importer (import_snapshots) validates and upserts those snapshots into PostgreSQL.
Extractor pipeline:
- fetch (public endpoint/page requests with conservative HTTP behavior)
- parse (source-specific structure)
- normalize (map to HoopScout snapshot schema)
- emit (write JSON file to incoming directory or custom path)
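The four stages can be sketched as plain functions chained together. This is a simplified illustration under several assumptions: the upstream feed shape (a "players" list with "id" and "name" keys), the output filename pattern, and all function names are hypothetical, and the normalized records contain only two fields, so they would not pass the full snapshot schema as written.

```python
import json
from pathlib import Path


def parse(raw_text):
    """Parse stage: decode the source-specific payload (plain JSON here)."""
    return json.loads(raw_text)


def normalize(payload, source_name, snapshot_date):
    """Normalize stage: map the upstream shape onto snapshot keys."""
    # Assumption: the upstream feed exposes a "players" list with "id" and "name".
    records = [
        {"player_external_id": str(item["id"]), "full_name": item["name"]}
        for item in payload.get("players", [])
    ]
    return {"source_name": source_name, "snapshot_date": snapshot_date, "records": records}


def emit(snapshot, output_dir):
    """Emit stage: write the snapshot as a JSON file and return its path."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the filename pattern is illustrative only.
    target = output_dir / f"{snapshot['source_name']}-{snapshot['snapshot_date']}.json"
    target.write_text(json.dumps(snapshot, indent=2), encoding="utf-8")
    return target


def run_pipeline(fetch, source_name, snapshot_date, output_dir, dry_run=False):
    """Run fetch -> parse -> normalize -> emit; skip the write when dry_run is set."""
    snapshot = normalize(parse(fetch()), source_name, snapshot_date)
    path = None if dry_run else emit(snapshot, output_dir)
    return snapshot, path
```

Passing fetch in as a callable keeps the network step replaceable, so the remaining stages can be exercised with canned input.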
Built-in extractor in this phase:
- public_json_snapshot (generic JSON feed extractor for MVP usage)
Run extractor:
docker compose exec web python manage.py run_extractor public_json_snapshot
Run extractor with explicit output path (debugging):
docker compose exec web python manage.py run_extractor public_json_snapshot --output-path /app/snapshots/incoming
Dry-run validation (no file write):
docker compose exec web python manage.py run_extractor public_json_snapshot --dry-run
Extractor environment variables:
- EXTRACTOR_USER_AGENT
- EXTRACTOR_HTTP_TIMEOUT_SECONDS
- EXTRACTOR_HTTP_RETRIES
- EXTRACTOR_RETRY_SLEEP_SECONDS
- EXTRACTOR_REQUEST_DELAY_SECONDS
- EXTRACTOR_PUBLIC_JSON_URL
- EXTRACTOR_PUBLIC_SOURCE_NAME
- EXTRACTOR_INCLUDE_RAW_PAYLOAD
Notes:
- extraction is intentionally low-frequency and uses retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by import_snapshots
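Conservative retry behavior of the kind these notes describe could be sketched as below. The function name is hypothetical, and it assumes that the retry count means "additional attempts after the first" and that the sleep is a fixed pause between attempts; the actual meaning of EXTRACTOR_HTTP_RETRIES and EXTRACTOR_RETRY_SLEEP_SECONDS may differ.

```python
import time


def fetch_with_retries(fetch_once, retries=3, retry_sleep_seconds=1.0, sleep=time.sleep):
    """Call fetch_once() up to retries + 1 times, pausing between failed attempts."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch_once()
        except OSError as error:  # urllib.error.URLError is a subclass of OSError
            last_error = error
            if attempt < retries:
                # Fixed pause rather than a tight loop, to stay gentle on the source.
                sleep(retry_sleep_seconds)
    raise last_error
```

The sleep parameter is injectable so the pause can be replaced in tests without actually waiting.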
Migration and Superuser Commands
docker compose exec web python manage.py migrate
docker compose exec web python manage.py createsuperuser
Health Endpoints
- app health: /health/
- nginx healthcheck proxies /health/ to web
Player Search (v2)
Public player search is server-rendered (Django templates) with HTMX partial updates.
Supported filters:
- free text name search
- nominal position, inferred role
- competition, season, team
- nationality
- age, height, weight ranges
- stats thresholds: games, MPG, PPG, RPG, APG, SPG, BPG, TOV, FG%, 3P%, FT%
Search correctness:
- combined team/competition/season/stat filters are applied to the same PlayerSeason context (no cross-row false positives)
- filtering happens at database level with Django ORM
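To make the "same PlayerSeason context" point concrete, the plain-Python sketch below contrasts same-row matching with cross-row matching on made-up data. It illustrates the semantics only, not the project's query code; the rows, field names, and function names are hypothetical. In Django's ORM, the usual way to get same-row behavior on a multi-valued relation is to put all conditions in a single filter() call, since chained filter() calls may each match a different related row.

```python
# Made-up player-season rows for illustration.
rows = [
    {"player": "A", "team": "LAL", "season": "2024-2025", "ppg": 12.0},
    {"player": "A", "team": "BOS", "season": "2025-2026", "ppg": 25.0},
    {"player": "B", "team": "LAL", "season": "2025-2026", "ppg": 21.0},
]


def same_row_match(rows, team, min_ppg):
    """Both conditions must hold on one and the same player-season row."""
    return sorted({r["player"] for r in rows if r["team"] == team and r["ppg"] >= min_ppg})


def cross_row_match(rows, team, min_ppg):
    """Each condition may be satisfied by a different row (the false-positive case)."""
    on_team = {r["player"] for r in rows if r["team"] == team}
    scorers = {r["player"] for r in rows if r["ppg"] >= min_ppg}
    return sorted(on_team & scorers)
```

With team "LAL" and a 20 PPG threshold, player A appears only in the cross-row result: A played for LAL in one season and scored 25 PPG in another, but never both in the same season.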
Search metric semantics:
- result columns are labeled as Best Eligible
- each displayed metric is MAX over eligible player-season rows for that metric in the current filter context
- different metric columns for one player may come from different eligible seasons
- when no eligible value exists for a metric in the current context, the UI shows -
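The Best Eligible semantics can be illustrated in plain Python as a per-metric maximum over a player's eligible rows. This is a sketch of the described behavior rather than the project's implementation (which, per the notes above, filters at the database level); the function names and sample field names are hypothetical.

```python
def best_eligible(rows, metrics):
    """Return {player: {metric: max value or None}} over the given eligible rows."""
    result = {}
    for row in rows:
        per_player = result.setdefault(row["player"], {m: None for m in metrics})
        for metric in metrics:
            value = row.get(metric)
            # Each metric is maximized independently, so values may come from different seasons.
            if value is not None and (per_player[metric] is None or value > per_player[metric]):
                per_player[metric] = value
    return result


def display(value):
    """Render a metric cell, using '-' when no eligible value exists."""
    return "-" if value is None else str(value)
```

For example, a player whose best PPG came in one season and best APG in another would show both maxima on the same result row, which is why the columns are labeled Best Eligible rather than tied to a single season.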
Pagination and sorting:
- querystring is preserved
- HTMX navigation keeps URL state in sync with current filters/page/sort
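Preserving the querystring across pagination links can be sketched with the standard library as follows. The path and parameter names (q, position, sort, page) are hypothetical and not taken from the project; on the HTMX side, an attribute such as hx-push-url is one way to keep the browser URL in sync, though the README does not say which mechanism is used.

```python
from urllib.parse import parse_qsl, urlencode


def page_url(path, current_query, page):
    """Build a link to another page while keeping every other query parameter."""
    params = [
        (key, value)
        for key, value in parse_qsl(current_query, keep_blank_values=True)
        if key != "page"  # drop the old page number before appending the new one
    ]
    params.append(("page", str(page)))
    return f"{path}?{urlencode(params)}"
```

Working on a list of pairs rather than a dict keeps repeated parameters and their original order intact.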
GitFlow
Required branch model:
- main: production
- develop: integration
- feature/*, release/*, hotfix/*
This v2 work branch is:
feature/hoopscout-v2-static-architecture
Notes on Legacy Layers
Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation. They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.