Compare commits


10 Commits

31 changed files with 785 additions and 839 deletions

View File

@ -62,6 +62,12 @@ SCHEDULER_INTERVAL_SECONDS=900
# When scheduler is disabled but container is started, keep it idle (avoid restart loops)
SCHEDULER_DISABLED_SLEEP_SECONDS=300
# Legacy provider-sync stack (v1-style) is disabled by default in v2.
LEGACY_PROVIDER_STACK_ENABLED=0
# Optional legacy provider settings (only when LEGACY_PROVIDER_STACK_ENABLED=1):
# PROVIDER_BACKEND=demo
# PROVIDER_DEFAULT_NAMESPACE=mvp_demo
# API safeguards (read-only API is optional)
API_THROTTLE_ANON=100/hour
API_THROTTLE_USER=1000/hour

View File

@ -45,6 +45,13 @@ docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build
```
### Verify release topology assumptions
```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml config
./scripts/verify_release_topology.sh
```
## Day-to-Day Feature Workflow
1. Sync `develop`
@ -63,6 +70,15 @@ git checkout -b feature/your-feature-name
3. Implement with focused commits and tests.
4. Open PR: `feature/*` -> `develop`.
## Running Tests (v2)
Runtime images are intentionally lean and may not ship `pytest`.
Use the development compose stack and install dev dependencies before running tests:
```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml run --rm web sh -lc "export PYTHONUSERBASE=/tmp/pyuser && python -m pip install --user -r requirements/dev.txt && python -m pytest -q"
```
## PR Checklist
- [ ] Target branch is correct
@ -78,6 +94,8 @@ git checkout -b feature/your-feature-name
- Keep PostgreSQL as source of truth.
- Keep snapshot storage file-based and volume-backed.
- Do not introduce MongoDB or Elasticsearch as source of truth.
- Keep legacy provider/Celery sync code isolated behind `LEGACY_PROVIDER_STACK_ENABLED=1`.
- Keep runtime/docs consistency aligned with `docs/runtime-consistency-checklist.md`.
## Repository Bootstrap Commands

View File

@ -9,9 +9,8 @@ Current v2 foundation scope in this branch:
- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command
Out of scope in this step:
- extractor implementation
- extractor framework with LBA/BCL/public JSON adapters
- daily orchestration command and optional scheduler profile
## Runtime Architecture (v2)
@ -22,7 +21,8 @@ Runtime services are intentionally small:
- optional `scheduler` profile service (runs daily extractor/import loop)
No Redis/Celery services are part of the v2 default runtime topology.
Legacy Celery/provider code is still in repository history/codebase but de-emphasized for v2.
Legacy Celery/provider code remains in-repo but is isolated behind `LEGACY_PROVIDER_STACK_ENABLED=1`.
Default v2 runtime keeps that stack disabled.
## Image Strategy
@ -47,6 +47,7 @@ Reserved for future optional scheduler use:
- `docker-compose.yml`: production-minded baseline runtime (immutable image filesystem)
- `docker-compose.dev.yml`: development override with source bind mount for `web`
- `docker-compose.release.yml`: production settings override (`DJANGO_SETTINGS_MODULE=config.settings.production`)
- `scripts/verify_release_topology.sh`: validates merged release compose has no source-code bind mounts for runtime services
### Start development runtime
@ -73,6 +74,31 @@ For development override:
docker compose -f docker-compose.yml -f docker-compose.dev.yml --profile scheduler up -d scheduler
```
### Runtime Modes At A Glance
- development (`docker-compose.yml` + `docker-compose.dev.yml`):
- mutable source bind mounts for `web` and `scheduler`
- optimized for local iteration
- release-style (`docker-compose.yml` + `docker-compose.release.yml`):
- immutable app filesystem for runtime services
- production settings enabled for Django
- scheduler profile:
- only starts when `--profile scheduler` is used
- if started with `SCHEDULER_ENABLED=0`, the scheduler stays in an idle sleep loop instead of exiting (prevents container restart loops)
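The idle behavior above can be sketched as a small decision rule. Variable names come from `.env.example`; the helper is illustrative, and the actual logic lives in `scripts/scheduler.sh`:

```python
# Sketch of the scheduler loop's sleep decision (illustrative helper;
# env var names from .env.example, real implementation in scripts/scheduler.sh).
def scheduler_sleep_seconds(env: dict) -> int:
    """How long the loop sleeps before its next iteration."""
    if env.get("SCHEDULER_ENABLED", "0") == "0":
        # Disabled: keep idling instead of exiting, so the container
        # does not enter a restart loop.
        return int(env.get("SCHEDULER_DISABLED_SLEEP_SECONDS", "300"))
    return int(env.get("SCHEDULER_INTERVAL_SECONDS", "900"))
```

The point of the disabled branch is that the container stays alive and cheap, rather than exiting and tripping compose restart policies.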
### Release Topology Verification
Verify merged release config and immutability:
```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml config
./scripts/verify_release_topology.sh
```
Verification expectations:
- `web` and `scheduler` must not bind-mount repository source code in release mode.
- named volumes for DB/static/media/snapshots remain mounted.
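The bind-mount expectation can be sketched as a check over the merged compose config. This is a hypothetical helper operating on a parsed config dict; the real check is `scripts/verify_release_topology.sh` run against `docker compose ... config` output:

```python
# Hypothetical helper mirroring the release-topology rule: runtime services
# must use named volumes only, never host bind mounts of source code.
def service_has_bind_mount(config: dict, service: str) -> bool:
    volumes = config.get("services", {}).get(service, {}).get("volumes", [])
    for vol in volumes:
        if isinstance(vol, dict):
            # Long syntax: {"type": "bind", "source": "./", "target": "/app"}
            if vol.get("type") == "bind":
                return True
        elif isinstance(vol, str):
            # Short syntax: "host_path:container_path" -- a host path source
            # starting with "." or "/" indicates a bind mount.
            source = vol.split(":", 1)[0]
            if source.startswith((".", "/")):
                return True
    return False
```

Under this rule, a release-mode `web` service with only named volumes passes, while a dev-style `./:/app` mount fails.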
## Named Volumes
v2 runtime uses named volumes for persistence:
@ -85,6 +111,11 @@ v2 runtime uses named volumes for persistence:
Development override uses separate dev-prefixed volumes to avoid ownership collisions.
Snapshot volume intent:
- `snapshots_incoming`: extractor output waiting for import
- `snapshots_archive`: successfully imported files
- `snapshots_failed`: schema/processing failures for operator inspection
## Environment Variables
Use `.env.example` as the source of truth.
@ -96,6 +127,10 @@ Core groups:
- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)
- daily orchestration vars (`DAILY_ORCHESTRATION_*`)
- optional legacy provider-sync toggle (`LEGACY_PROVIDER_STACK_ENABLED`)
Operational reference:
- `docs/runtime-consistency-checklist.md`
## Snapshot Storage Convention
@ -156,11 +191,23 @@ Each file must be a JSON object:
Validation is strict:
- unknown fields are rejected
- required fields must exist
- `snapshot_date` and `birth_date` must be `YYYY-MM-DD`
- required fields must exist:
- `competition_external_id`, `competition_name`, `season`
- `team_external_id`, `team_name`
- `player_external_id`, `full_name`
- core stats (`games_played`, `minutes_per_game`, `points_per_game`, `rebounds_per_game`, `assists_per_game`, `steals_per_game`, `blocks_per_game`, `turnovers_per_game`, `fg_pct`, `three_pt_pct`, `ft_pct`)
- optional player bio/physical fields:
- `first_name`, `last_name`, `birth_date`, `nationality`, `height_cm`, `weight_kg`, `position`, `role`
- when `birth_date` is provided it must be `YYYY-MM-DD`
- numeric fields must be numeric
- invalid files are moved to failed directory
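To make the required-vs-optional contract concrete, here is an illustrative record checked against the field sets listed above (field names transcribed from this document; values are made up):

```python
# Field sets transcribed from the schema description above.
REQUIRED_RECORD_FIELDS = {
    "competition_external_id", "competition_name", "season",
    "team_external_id", "team_name",
    "player_external_id", "full_name",
    "games_played", "minutes_per_game", "points_per_game",
    "rebounds_per_game", "assists_per_game", "steals_per_game",
    "blocks_per_game", "turnovers_per_game",
    "fg_pct", "three_pt_pct", "ft_pct",
}
OPTIONAL_RECORD_FIELDS = {
    "first_name", "last_name", "birth_date", "nationality",
    "height_cm", "weight_kg", "position", "role",
}

# Illustrative record: all required stats present, optional bio fields omitted.
record = {
    "competition_external_id": "comp-1",
    "competition_name": "League One",
    "season": "2025-2026",
    "team_external_id": "team-1",
    "team_name": "Team One",
    "player_external_id": "player-1",
    "full_name": "Jane Doe",
    "games_played": 12,
    "minutes_per_game": 27.2,
    "points_per_game": 13.0,
    "rebounds_per_game": 4.4,
    "assists_per_game": 3.1,
    "steals_per_game": 1.0,
    "blocks_per_game": 0.3,
    "turnovers_per_game": 1.8,
    "fg_pct": 46.2,
    "three_pt_pct": 35.5,
    "ft_pct": 82.1,
}
missing = REQUIRED_RECORD_FIELDS - record.keys()
unknown = record.keys() - REQUIRED_RECORD_FIELDS - OPTIONAL_RECORD_FIELDS
```

A record like this validates (no missing required fields, no unknown fields); dropping any core stat would land the file in the failed directory.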
Importer enrichment note:
- `full_name` is source truth for identity display
- `first_name` / `last_name` are optional and may be absent in public snapshots
- when both are missing, importer may derive them from `full_name` as a best-effort enrichment step
- this enrichment is convenience-only and does not override source truth semantics
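The best-effort derivation can be sketched as follows (helper name is illustrative; it mirrors the enrichment behavior described above and never overrides name parts supplied by the source):

```python
# Illustrative best-effort split of full_name into (first_name, last_name);
# only used when the snapshot provides neither part.
def split_name_parts(full_name: str) -> tuple[str, str]:
    parts = full_name.strip().split(maxsplit=1)
    if not parts:
        return "", ""
    if len(parts) == 1:
        # Single-token names become first_name with an empty last_name.
        return parts[0], ""
    return parts[0], parts[1]
```

Multi-part surnames are left intact as the second element, which is why this stays a convenience step rather than a source-of-truth claim.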
## Import Command
Run import:
@ -185,6 +232,12 @@ Command behavior:
- moves valid files to archive
- moves invalid files to failed
Import lifecycle summary:
1. extractor writes normalized snapshots to `incoming`
2. `import_snapshots` validates + upserts to PostgreSQL
3. imported files move to `archive`
4. invalid files move to `failed` with error details in `ImportFile`
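Steps 3-4 amount to a file move keyed on the validation outcome. A minimal sketch (directory roles from the lifecycle above; the helper is illustrative, not the importer's actual code):

```python
import shutil
from pathlib import Path

# Illustrative routing step: archive on success, failed on validation error.
def route_snapshot(path: Path, *, valid: bool, archive_dir: Path, failed_dir: Path) -> Path:
    dest_dir = archive_dir if valid else failed_dir
    dest_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move handles cross-filesystem moves, which matters when the
    # snapshot directories are backed by separate volumes.
    return Path(shutil.move(str(path), str(dest_dir / path.name)))
```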
### Source Identity Namespacing
Raw external IDs are **not globally unique** across basketball data sources. HoopScout v2 uses a namespaced identity for imported entities:
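For example, the same raw ID arriving from two sources stays distinct under this scheme (source names are illustrative):

```python
# Raw external IDs may collide across sources; the namespaced
# (source_name, source_uid) key does not.
rows = [("lba", "23"), ("bcl", "23")]
raw_ids = {source_uid for _, source_uid in rows}
namespaced_ids = set(rows)
```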
@ -278,6 +331,7 @@ Notes:
- extraction is intentionally low-frequency and uses retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by `import_snapshots`
- `public_json_snapshot` uses the same required-vs-optional field contract as `SnapshotSchemaValidator` (it does not add extractor-only required bio/physical fields)
- optional scheduler container runs `scripts/scheduler.sh` loop using:
- image: `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`
- command: `/app/scripts/scheduler.sh`
@ -304,6 +358,7 @@ Notes:
- season is configured by `EXTRACTOR_LBA_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp/mpg/ppg/rpg/apg/spg/bpg/tov`)
- public-source player bio/physical fields are often incomplete; extractor allows them to be missing and emits `null` for optional fields
- no live HTTP calls in tests; tests use fixtures/mocked responses only
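The alias handling above can be sketched as a lookup table (aliases taken from the list above; the helper name is illustrative and the real normalization lives in the extractor modules):

```python
# Common public-stats aliases mapped to canonical snapshot field names.
STAT_ALIASES = {
    "gp": "games_played",
    "mpg": "minutes_per_game",
    "ppg": "points_per_game",
    "rpg": "rebounds_per_game",
    "apg": "assists_per_game",
    "spg": "steals_per_game",
    "bpg": "blocks_per_game",
    "tov": "turnovers_per_game",
}

def normalize_stats(row: dict) -> dict:
    """Rename aliased stat keys; keys already canonical pass through."""
    return {STAT_ALIASES.get(key, key): value for key, value in row.items()}
```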
### BCL extractor assumptions and limitations (MVP)
@ -316,8 +371,20 @@ Notes:
- season is configured by `EXTRACTOR_BCL_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp/mpg/ppg/rpg/apg/spg/bpg/tov`)
- public-source player bio/physical fields are often incomplete; extractor allows them to be missing and emits `null` for optional fields
- no live HTTP calls in tests; tests use fixtures/mocked responses only
## Testing
- runtime `web` image stays lean and may not include `pytest` tooling
- runtime containers (`web`/`nginx`/`scheduler`) are for serving/orchestration and do not ship preloaded test tooling
- run tests with the development compose stack (or a dedicated test image/profile) and install dev dependencies first
- local example (one-off):
```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml run --rm web sh -lc "export PYTHONUSERBASE=/tmp/pyuser && python -m pip install --user -r requirements/dev.txt && python -m pytest -q"
```
## Migration and Superuser Commands
```bash
@ -352,6 +419,20 @@ Search metric semantics:
- different metric columns for one player may come from different eligible seasons
- when no eligible value exists for a metric in the current context, the UI shows `-`
### API Search Metric Transparency
`GET /api/players/` now exposes sortable metric fields directly in each list row:
- `ppg_value`
- `mpg_value`
These fields use the same **best eligible** semantics as UI search. They are computed from eligible
player-season rows in the current filter context and may be `null` when no eligible data exists.
API list responses also include:
- `sort`: effective sort key applied
- `metric_sort_keys`: metric-based sort keys currently supported
- `metric_semantics`: plain-language metric contract used for sorting/interpretation
Pagination and sorting:
- querystring is preserved
- HTMX navigation keeps URL state in sync with current filters/page/sort
@ -379,3 +460,7 @@ This v2 work branch is:
Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation.
They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.
By default:
- `apps.providers` is not installed
- `/providers/` routes are not mounted
- legacy provider-specific settings are not required

View File

@ -45,6 +45,8 @@ class PlayerListSerializer(serializers.ModelSerializer):
inferred_role = serializers.CharField(source="inferred_role.name", allow_null=True)
origin_competition = serializers.CharField(source="origin_competition.name", allow_null=True)
origin_team = serializers.CharField(source="origin_team.name", allow_null=True)
ppg_value = serializers.SerializerMethodField()
mpg_value = serializers.SerializerMethodField()
class Meta:
model = Player
@ -59,10 +61,20 @@ class PlayerListSerializer(serializers.ModelSerializer):
"origin_team",
"height_cm",
"weight_kg",
"ppg_value",
"mpg_value",
"dominant_hand",
"is_active",
]
def get_ppg_value(self, obj):
value = getattr(obj, "ppg_value", None)
return str(value) if value is not None else None
def get_mpg_value(self, obj):
value = getattr(obj, "mpg_value", None)
return float(value) if value is not None else None
class PlayerAliasSerializer(serializers.Serializer):
alias = serializers.CharField()

View File

@ -9,6 +9,7 @@ from apps.players.forms import PlayerSearchForm
from apps.players.models import Player
from apps.players.services.search import (
METRIC_SORT_KEYS,
SEARCH_METRIC_SEMANTICS_TEXT,
annotate_player_metrics,
apply_sorting,
base_player_queryset,
@ -67,14 +68,17 @@ class PlayerSearchApiView(ReadOnlyBaseAPIView, generics.ListAPIView):
form = self.get_search_form()
if form.is_bound and not form.is_valid():
return self._validation_error_response()
return super().list(request, *args, **kwargs)
response = super().list(request, *args, **kwargs)
response.data["sort"] = form.cleaned_data.get("sort", "name_asc")
response.data["metric_semantics"] = SEARCH_METRIC_SEMANTICS_TEXT
response.data["metric_sort_keys"] = sorted(METRIC_SORT_KEYS)
return response
def get_queryset(self):
form = self.get_search_form()
queryset = base_player_queryset()
queryset = filter_players(queryset, form.cleaned_data)
sort_key = form.cleaned_data.get("sort", "name_asc")
if sort_key in METRIC_SORT_KEYS:
queryset = annotate_player_metrics(queryset, form.cleaned_data)
queryset = apply_sorting(queryset, sort_key)
return queryset

View File

@ -1,4 +1,5 @@
from django.contrib import admin
from django.conf import settings
from .models import ImportFile, ImportRun, IngestionError, IngestionRun
@ -91,15 +92,18 @@ class ImportFileAdmin(admin.ModelAdmin):
)
@admin.register(IngestionRun)
class LegacyIngestionRunAdmin(admin.ModelAdmin):
list_display = ("provider_namespace", "job_type", "status", "started_at", "finished_at")
list_filter = ("provider_namespace", "job_type", "status")
search_fields = ("provider_namespace", "error_summary")
@admin.register(IngestionError)
class LegacyIngestionErrorAdmin(admin.ModelAdmin):
list_display = ("provider_namespace", "entity_type", "external_id", "severity", "occurred_at")
list_filter = ("severity", "provider_namespace")
search_fields = ("entity_type", "external_id", "message")
if settings.LEGACY_PROVIDER_STACK_ENABLED:
admin.site.register(IngestionRun, LegacyIngestionRunAdmin)
admin.site.register(IngestionError, LegacyIngestionErrorAdmin)

View File

@ -16,6 +16,38 @@ def _first_non_empty(record: dict[str, Any], *keys: str) -> Any:
return None
def _first_non_empty_text(record: dict[str, Any], *keys: str) -> str | None:
for key in keys:
value = record.get(key)
if isinstance(value, str):
stripped = value.strip()
if stripped:
return stripped
return None
ESSENTIAL_FIELDS = {
"competition_external_id",
"competition_name",
"season",
"team_external_id",
"team_name",
"player_external_id",
"full_name",
"games_played",
"minutes_per_game",
"points_per_game",
"rebounds_per_game",
"assists_per_game",
"steals_per_game",
"blocks_per_game",
"turnovers_per_game",
"fg_pct",
"three_pt_pct",
"ft_pct",
}
class BCLSnapshotExtractor(BaseSnapshotExtractor):
"""
Basketball Champions League MVP extractor.
@ -86,7 +118,9 @@ class BCLSnapshotExtractor(BaseSnapshotExtractor):
team_external_id = _first_non_empty(source_record, "team_external_id", "team_id") or _first_non_empty(
team_obj, "id", "team_id"
)
team_name = _first_non_empty(source_record, "team_name", "team") or _first_non_empty(team_obj, "name")
team_name = _first_non_empty_text(source_record, "team_name", "team") or _first_non_empty_text(
team_obj, "name"
)
normalized = {
"competition_external_id": self.competition_external_id,
@ -122,7 +156,7 @@ class BCLSnapshotExtractor(BaseSnapshotExtractor):
"ft_pct": _first_non_empty(source_record, "ft_pct", "ft_percentage"),
}
missing = [key for key, value in normalized.items() if key != "role" and value in (None, "")]
missing = [key for key in ESSENTIAL_FIELDS if normalized.get(key) in (None, "")]
if missing:
raise ExtractorNormalizationError(f"bcl row missing required fields: {', '.join(sorted(missing))}")

View File

@ -16,6 +16,38 @@ def _first_non_empty(record: dict[str, Any], *keys: str) -> Any:
return None
def _first_non_empty_text(record: dict[str, Any], *keys: str) -> str | None:
for key in keys:
value = record.get(key)
if isinstance(value, str):
stripped = value.strip()
if stripped:
return stripped
return None
ESSENTIAL_FIELDS = {
"competition_external_id",
"competition_name",
"season",
"team_external_id",
"team_name",
"player_external_id",
"full_name",
"games_played",
"minutes_per_game",
"points_per_game",
"rebounds_per_game",
"assists_per_game",
"steals_per_game",
"blocks_per_game",
"turnovers_per_game",
"fg_pct",
"three_pt_pct",
"ft_pct",
}
class LBASnapshotExtractor(BaseSnapshotExtractor):
"""
LBA (Lega Basket Serie A) MVP extractor.
@ -86,7 +118,9 @@ class LBASnapshotExtractor(BaseSnapshotExtractor):
team_external_id = _first_non_empty(source_record, "team_external_id", "team_id") or _first_non_empty(
team_obj, "id", "team_id"
)
team_name = _first_non_empty(source_record, "team_name", "team") or _first_non_empty(team_obj, "name")
team_name = _first_non_empty_text(source_record, "team_name", "team") or _first_non_empty_text(
team_obj, "name"
)
normalized = {
"competition_external_id": self.competition_external_id,
@ -122,7 +156,7 @@ class LBASnapshotExtractor(BaseSnapshotExtractor):
"ft_pct": _first_non_empty(source_record, "ft_pct", "ft_percentage"),
}
missing = [key for key, value in normalized.items() if key != "role" and value in (None, "")]
missing = [key for key in ESSENTIAL_FIELDS if normalized.get(key) in (None, "")]
if missing:
raise ExtractorNormalizationError(f"lba row missing required fields: {', '.join(sorted(missing))}")

View File

@ -4,6 +4,8 @@ from typing import Any
from django.conf import settings
from apps.ingestion.snapshots.schema import REQUIRED_RECORD_FIELDS
from .base import (
BaseSnapshotExtractor,
ExtractorConfigError,
@ -113,7 +115,7 @@ class PublicJsonSnapshotExtractor(BaseSnapshotExtractor):
"ft_pct": _first_non_empty(source_record, "ft_pct"),
}
missing = [key for key, value in normalized.items() if key != "role" and value in (None, "")]
missing = [key for key in REQUIRED_RECORD_FIELDS if normalized.get(key) in (None, "")]
if missing:
raise ExtractorNormalizationError(
f"public_json_snapshot row missing required fields: {', '.join(sorted(missing))}"

View File

@ -1,9 +1,14 @@
from django.conf import settings
from .runs import finish_ingestion_run, log_ingestion_error, start_ingestion_run
from .sync import run_sync_job
__all__ = [
"start_ingestion_run",
"finish_ingestion_run",
"log_ingestion_error",
"run_sync_job",
]
if settings.LEGACY_PROVIDER_STACK_ENABLED:
from .sync import run_sync_job # pragma: no cover - legacy provider stack only.
__all__.append("run_sync_job")

View File

@ -62,6 +62,21 @@ def _parse_season_dates(label: str) -> tuple[date, date]:
return date(year, 9, 1), date(year + 1, 7, 31)
def _parse_optional_birth_date(value: str | None) -> date | None:
if value in (None, ""):
return None
return parse_date(value)
def _split_name_parts(full_name: str) -> tuple[str, str]:
parts = full_name.strip().split(maxsplit=1)
if not parts:
return "", ""
if len(parts) == 1:
return parts[0], ""
return parts[0], parts[1]
def _resolve_nationality(value: str | None) -> Nationality | None:
if not value:
return None
@ -152,9 +167,12 @@ def _upsert_record(record: dict[str, Any], *, source_name: str, snapshot_date: d
},
)
position_value = record.get("position")
position = None
if position_value:
position, _ = Position.objects.get_or_create(
code=_position_code(record["position"]),
defaults={"name": record["position"]},
code=_position_code(position_value),
defaults={"name": position_value},
)
role = None
if record.get("role"):
@ -163,19 +181,24 @@ def _upsert_record(record: dict[str, Any], *, source_name: str, snapshot_date: d
defaults={"name": record["role"]},
)
first_name = record.get("first_name") or ""
last_name = record.get("last_name") or ""
if not first_name and not last_name:
first_name, last_name = _split_name_parts(record["full_name"])
player, _ = Player.objects.update_or_create(
source_name=source_key,
source_uid=record["player_external_id"],
defaults={
"first_name": record["first_name"],
"last_name": record["last_name"],
"first_name": first_name,
"last_name": last_name,
"full_name": record["full_name"],
"birth_date": parse_date(record["birth_date"]),
"birth_date": _parse_optional_birth_date(record.get("birth_date")),
"nationality": _resolve_nationality(record.get("nationality")),
"nominal_position": position,
"inferred_role": role,
"height_cm": record["height_cm"],
"weight_kg": record["weight_kg"],
"height_cm": record.get("height_cm"),
"weight_kg": record.get("weight_kg"),
"is_active": True,
},
)

View File

@ -14,13 +14,6 @@ REQUIRED_RECORD_FIELDS = {
"team_name",
"player_external_id",
"full_name",
"first_name",
"last_name",
"birth_date",
"nationality",
"height_cm",
"weight_kg",
"position",
"games_played",
"minutes_per_game",
"points_per_game",
@ -34,6 +27,16 @@ REQUIRED_RECORD_FIELDS = {
"ft_pct",
}
OPTIONAL_RECORD_FIELDS = {
"first_name",
"last_name",
"birth_date",
"nationality",
"height_cm",
"weight_kg",
"position",
}
ALLOWED_TOP_LEVEL_FIELDS = {
"source_name",
"snapshot_date",
@ -42,7 +45,7 @@ ALLOWED_TOP_LEVEL_FIELDS = {
"raw_payload",
}
ALLOWED_RECORD_FIELDS = REQUIRED_RECORD_FIELDS | {
ALLOWED_RECORD_FIELDS = REQUIRED_RECORD_FIELDS | OPTIONAL_RECORD_FIELDS | {
"role",
"source_metadata",
"raw_payload",
@ -69,6 +72,15 @@ class SnapshotSchemaValidator:
raise SnapshotValidationError(f"{field} must be a non-empty string")
return value.strip()
@staticmethod
def _optional_string(value: Any, field: str) -> str | None:
if value in (None, ""):
return None
if not isinstance(value, str):
raise SnapshotValidationError(f"{field} must be a string when provided")
stripped = value.strip()
return stripped or None
@staticmethod
def _require_non_negative_int(value: Any, field: str) -> int:
if isinstance(value, bool):
@ -81,6 +93,12 @@ class SnapshotSchemaValidator:
raise SnapshotValidationError(f"{field} must be a non-negative integer")
return parsed
@classmethod
def _optional_non_negative_int(cls, value: Any, field: str) -> int | None:
if value in (None, ""):
return None
return cls._require_non_negative_int(value, field)
@staticmethod
def _require_float(value: Any, field: str) -> float:
try:
@ -112,23 +130,26 @@ class SnapshotSchemaValidator:
"team_name",
"player_external_id",
"full_name",
"first_name",
"last_name",
"nationality",
"position",
):
normalized[field] = cls._require_string(record.get(field), f"record[{index}].{field}")
for field in ("first_name", "last_name", "nationality", "position"):
normalized[field] = cls._optional_string(record.get(field), f"record[{index}].{field}")
if record.get("role") is not None:
normalized["role"] = cls._require_string(record.get("role"), f"record[{index}].role")
birth_date = parse_date(str(record.get("birth_date")))
birth_date_raw = record.get("birth_date")
if birth_date_raw in (None, ""):
normalized["birth_date"] = None
else:
birth_date = parse_date(str(birth_date_raw))
if not birth_date:
raise SnapshotValidationError(f"record[{index}].birth_date must be YYYY-MM-DD")
normalized["birth_date"] = birth_date.isoformat()
normalized["height_cm"] = cls._require_non_negative_int(record.get("height_cm"), f"record[{index}].height_cm")
normalized["weight_kg"] = cls._require_non_negative_int(record.get("weight_kg"), f"record[{index}].weight_kg")
normalized["height_cm"] = cls._optional_non_negative_int(record.get("height_cm"), f"record[{index}].height_cm")
normalized["weight_kg"] = cls._optional_non_negative_int(record.get("weight_kg"), f"record[{index}].weight_kg")
normalized["games_played"] = cls._require_non_negative_int(record.get("games_played"), f"record[{index}].games_played")
for field in (

View File

@ -72,10 +72,14 @@ INSTALLED_APPS = [
"apps.teams",
"apps.stats",
"apps.scouting",
"apps.providers",
"apps.ingestion",
]
# v2 default runtime is snapshot-first. Legacy provider stack is opt-in.
LEGACY_PROVIDER_STACK_ENABLED = env_bool("LEGACY_PROVIDER_STACK_ENABLED", False)
if LEGACY_PROVIDER_STACK_ENABLED:
INSTALLED_APPS.append("apps.providers")
MIDDLEWARE = [
"django.middleware.security.SecurityMiddleware",
"django.contrib.sessions.middleware.SessionMiddleware",
@ -195,6 +199,7 @@ SCHEDULER_INTERVAL_SECONDS = int(os.getenv("SCHEDULER_INTERVAL_SECONDS", "900"))
if SCHEDULER_INTERVAL_SECONDS < 30:
raise ImproperlyConfigured("SCHEDULER_INTERVAL_SECONDS must be >= 30.")
if LEGACY_PROVIDER_STACK_ENABLED:
PROVIDER_BACKEND = os.getenv("PROVIDER_BACKEND", "demo").strip().lower()
PROVIDER_NAMESPACE_DEMO = os.getenv("PROVIDER_NAMESPACE_DEMO", "mvp_demo")
PROVIDER_NAMESPACE_BALLDONTLIE = os.getenv("PROVIDER_NAMESPACE_BALLDONTLIE", "balldontlie")

View File

@ -1,4 +1,5 @@
from django.contrib import admin
from django.conf import settings
from django.urls import include, path
urlpatterns = [
@ -11,6 +12,8 @@ urlpatterns = [
path("teams/", include("apps.teams.urls")),
path("stats/", include("apps.stats.urls")),
path("scouting/", include("apps.scouting.urls")),
path("providers/", include("apps.providers.urls")),
path("ingestion/", include("apps.ingestion.urls")),
]
if settings.LEGACY_PROVIDER_STACK_ENABLED:
urlpatterns.append(path("providers/", include("apps.providers.urls")))

View File

@ -0,0 +1,58 @@
# Runtime Consistency Checklist (v2)
Use this checklist when runtime/docs changes are made.
## Compose and Runtime
- `docker-compose.yml` contains only v2 default runtime services:
- `web`, `nginx`, `postgres`
- optional `scheduler` profile service
- `docker-compose.dev.yml` is mutable (source bind mounts allowed for dev only).
- `docker-compose.release.yml` is settings-focused and keeps release runtime immutable.
## Image/Registry Strategy
- `web` image: `registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}`
- `nginx` image: `registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}`
- optional scheduler image: `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`
## Entrypoints
- `entrypoint.sh`:
- waits for PostgreSQL
- creates snapshot directories
- optionally runs `migrate` and `collectstatic` when booting gunicorn
- `scripts/scheduler.sh`:
- runs `run_daily_orchestration` loop
- idle-sleeps when `SCHEDULER_ENABLED=0`
## Snapshot Lifecycle
1. Extractor writes snapshots to `incoming`.
2. `import_snapshots` validates + upserts into PostgreSQL.
3. Success => file moved to `archive`.
4. Failure => file moved to `failed`.
## Source Identity Rule
Raw IDs are not global. Imported identities are namespaced by source:
- `Competition`: `(source_name, source_uid)`
- `Team`: `(source_name, source_uid)`
- `Player`: `(source_name, source_uid)`
## Legacy Isolation
- `LEGACY_PROVIDER_STACK_ENABLED=0` by default.
- With default setting:
- `apps.providers` is not installed
- `/providers/` routes are not mounted
- legacy provider settings are not required
## Verification Commands
```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml config
./scripts/verify_release_topology.sh
docker compose -f docker-compose.yml -f docker-compose.dev.yml run --rm web sh -lc "export PYTHONUSERBASE=/tmp/pyuser && python -m pip install --user -r requirements/dev.txt && python -m pytest -q"
```

View File

@ -30,7 +30,6 @@ check_service_bind_mount() {
}
check_service_bind_mount "web"
check_service_bind_mount "celery_worker"
check_service_bind_mount "celery_beat"
check_service_bind_mount "scheduler"
echo "Release topology verification passed."

View File

@ -0,0 +1,25 @@
{
"data": [
{
"player": {
"id": "bcl-player-99",
"name": "Alex Novak"
},
"team": {
"id": "bcl-team-tenerife",
"name": "Lenovo Tenerife"
},
"gp": 10,
"mpg": 27.2,
"ppg": 14.8,
"rpg": 4.1,
"apg": 3.3,
"spg": 1.2,
"bpg": 0.4,
"tov": 2.0,
"fg_pct": 47.3,
"three_pct": 38.0,
"ft_pct": 79.1
}
]
}

View File

@ -0,0 +1,25 @@
{
"data": [
{
"player": {
"id": "p-002",
"name": "Andrea Bianchi"
},
"team": {
"id": "team-olimpia-milano",
"name": "Olimpia Milano"
},
"gp": 18,
"mpg": 24.7,
"ppg": 12.3,
"rpg": 2.9,
"apg": 4.2,
"spg": 1.1,
"bpg": 0.1,
"tov": 1.8,
"fg_pct": 45.0,
"three_pct": 35.4,
"ft_pct": 82.7
}
]
}

View File

@ -30,6 +30,12 @@ def test_players_api_list_and_detail(client):
list_response = client.get(reverse("api:players"), data={"q": "rossi"})
assert list_response.status_code == 200
assert list_response.json()["count"] == 1
list_payload = list_response.json()
assert "sort" in list_payload
assert "metric_semantics" in list_payload
assert "metric_sort_keys" in list_payload
assert "ppg_value" in list_payload["results"][0]
assert "mpg_value" in list_payload["results"][0]
detail_response = client.get(reverse("api:player_detail", kwargs={"pk": player.pk}))
assert detail_response.status_code == 200
@ -173,8 +179,33 @@ def test_players_api_metric_sort_uses_best_eligible_values(client):
response = client.get(reverse("api:players"), data={"sort": "ppg_desc"})
assert response.status_code == 200
names = [row["full_name"] for row in response.json()["results"]]
payload = response.json()
names = [row["full_name"] for row in payload["results"]]
assert names.index("Dan High") < names.index("Ion Low")
assert payload["sort"] == "ppg_desc"
assert "best eligible values per player" in payload["metric_semantics"]
dan = next(row for row in payload["results"] if row["full_name"] == "Dan High")
ion = next(row for row in payload["results"] if row["full_name"] == "Ion Low")
assert float(dan["ppg_value"]) > float(ion["ppg_value"])
@pytest.mark.django_db
def test_players_api_metric_fields_are_exposed_and_nullable(client):
nationality = Nationality.objects.create(name="Sweden", iso2_code="SE", iso3_code="SWE")
Player.objects.create(
first_name="No",
last_name="Stats",
full_name="No Stats",
birth_date=date(2002, 1, 1),
nationality=nationality,
)
response = client.get(reverse("api:players"), data={"sort": "name_asc"})
assert response.status_code == 200
payload = response.json()
row = next(item for item in payload["results"] if item["full_name"] == "No Stats")
assert row["ppg_value"] is None
assert row["mpg_value"] is None
@pytest.mark.django_db

View File

@ -8,6 +8,7 @@ import pytest
from django.core.management import call_command
from apps.ingestion.extractors.bcl import BCLSnapshotExtractor
from apps.ingestion.extractors.base import ExtractorNormalizationError
from apps.ingestion.extractors.registry import create_extractor
@ -51,6 +52,56 @@ def test_bcl_extractor_normalizes_fixture_payload(tmp_path, settings):
assert row["three_pt_pct"] == 37.2
@pytest.mark.django_db
def test_bcl_extractor_accepts_partial_public_player_bio_fields(tmp_path, settings):
settings.EXTRACTOR_BCL_STATS_URL = "https://www.championsleague.basketball/public/stats.json"
settings.EXTRACTOR_BCL_SEASON_LABEL = "2025-2026"
settings.EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID = "bcl"
settings.EXTRACTOR_BCL_COMPETITION_NAME = "Basketball Champions League"
fixture_payload = _load_fixture("bcl/bcl_players_stats_partial_public.json")
class FakeClient:
def get_json(self, *_args, **_kwargs):
return fixture_payload
extractor = BCLSnapshotExtractor(http_client=FakeClient())
output_path = tmp_path / "bcl-partial.json"
result = extractor.run(output_path=output_path, snapshot_date=date(2026, 3, 13))
assert result.records_count == 1
payload = json.loads(output_path.read_text(encoding="utf-8"))
row = payload["records"][0]
assert row["full_name"] == "Alex Novak"
assert row["first_name"] is None
assert row["last_name"] is None
assert row["birth_date"] is None
assert row["nationality"] is None
assert row["height_cm"] is None
assert row["weight_kg"] is None
assert row["position"] is None
assert row["games_played"] == 10
@pytest.mark.django_db
def test_bcl_extractor_still_fails_when_required_stats_are_missing(settings):
settings.EXTRACTOR_BCL_STATS_URL = "https://www.championsleague.basketball/public/stats.json"
settings.EXTRACTOR_BCL_SEASON_LABEL = "2025-2026"
settings.EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID = "bcl"
settings.EXTRACTOR_BCL_COMPETITION_NAME = "Basketball Champions League"
fixture_payload = _load_fixture("bcl/bcl_players_stats_partial_public.json")
fixture_payload["data"][0].pop("ppg")
class FakeClient:
def get_json(self, *_args, **_kwargs):
return fixture_payload
extractor = BCLSnapshotExtractor(http_client=FakeClient())
with pytest.raises(ExtractorNormalizationError):
extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))
@pytest.mark.django_db
def test_bcl_extractor_registry_selection(settings):
settings.EXTRACTOR_BCL_STATS_URL = "https://www.championsleague.basketball/public/stats.json"

View File

@ -1,38 +0,0 @@
import os
import subprocess
import sys

import pytest


def _run_python_import(code: str, env_overrides: dict[str, str]) -> subprocess.CompletedProcess:
    env = os.environ.copy()
    env.update(env_overrides)
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        env=env,
        check=False,
    )


@pytest.mark.django_db
def test_invalid_cron_does_not_crash_config_import_path():
    result = _run_python_import(
        (
            "import config; "
            "from config.celery import app; "
            "print(f'beat_schedule_size={len(app.conf.beat_schedule or {})}')"
        ),
        {
            "DJANGO_SETTINGS_MODULE": "config.settings.development",
            "DJANGO_ENV": "development",
            "DJANGO_DEBUG": "1",
            "INGESTION_SCHEDULE_ENABLED": "1",
            "INGESTION_SCHEDULE_CRON": "bad cron value",
        },
    )
    assert result.returncode == 0
    assert "beat_schedule_size=0" in result.stdout

View File

@ -7,8 +7,10 @@ import pytest
from django.core.management import call_command

from apps.ingestion.extractors.base import BaseSnapshotExtractor
from apps.ingestion.extractors.base import ExtractorNormalizationError
from apps.ingestion.extractors.http import ResponsibleHttpClient
from apps.ingestion.extractors.public_json import PublicJsonSnapshotExtractor
from apps.ingestion.snapshots.schema import REQUIRED_RECORD_FIELDS


class DummyExtractor(BaseSnapshotExtractor):
@ -64,6 +66,29 @@ class _FakeResponse:
        return self._payload


def _minimal_public_json_record() -> dict:
    return {
        "competition_external_id": "comp-1",
        "competition_name": "League One",
        "season": "2025-2026",
        "team_external_id": "team-1",
        "team_name": "Team One",
        "player_external_id": "player-1",
        "full_name": "Jane Doe",
        "games_played": 12,
        "minutes_per_game": 27.2,
        "points_per_game": 13.0,
        "rebounds_per_game": 4.4,
        "assists_per_game": 3.1,
        "steals_per_game": 1.0,
        "blocks_per_game": 0.3,
        "turnovers_per_game": 1.8,
        "fg_pct": 46.2,
        "three_pt_pct": 35.5,
        "ft_pct": 82.1,
    }


@pytest.mark.django_db
def test_base_extractor_run_writes_snapshot_file(tmp_path, settings):
    settings.STATIC_DATASET_INCOMING_DIR = str(tmp_path / "incoming")
@ -135,6 +160,71 @@ def test_public_json_extractor_normalizes_common_field_aliases(tmp_path):
    assert row["three_pt_pct"] == 36.1


@pytest.mark.django_db
def test_public_json_extractor_accepts_missing_optional_bio_and_physical_fields(tmp_path):
    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return {"records": [_minimal_public_json_record()]}

    extractor = PublicJsonSnapshotExtractor(
        url="https://example.com/public-feed.json",
        source_name="test_public_feed",
        http_client=FakeClient(),
    )
    output_file = tmp_path / "public-optional.json"
    result = extractor.run(output_path=output_file, snapshot_date=date(2026, 3, 13))

    assert result.records_count == 1
    payload = json.loads(output_file.read_text(encoding="utf-8"))
    row = payload["records"][0]
    assert row["full_name"] == "Jane Doe"
    assert row["first_name"] is None
    assert row["last_name"] is None
    assert row["birth_date"] is None
    assert row["nationality"] is None
    assert row["height_cm"] is None
    assert row["weight_kg"] is None
    assert row["position"] is None
    assert row.get("role") is None


@pytest.mark.django_db
def test_public_json_extractor_fails_when_required_stat_missing():
    broken = _minimal_public_json_record()
    broken.pop("points_per_game")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return {"records": [broken]}

    extractor = PublicJsonSnapshotExtractor(
        url="https://example.com/public-feed.json",
        source_name="test_public_feed",
        http_client=FakeClient(),
    )
    with pytest.raises(ExtractorNormalizationError):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
@pytest.mark.parametrize("required_field", sorted(REQUIRED_RECORD_FIELDS))
def test_public_json_required_fields_follow_snapshot_schema(required_field):
    broken = _minimal_public_json_record()
    broken.pop(required_field)

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return {"records": [broken]}

    extractor = PublicJsonSnapshotExtractor(
        url="https://example.com/public-feed.json",
        source_name="test_public_feed",
        http_client=FakeClient(),
    )
    with pytest.raises(ExtractorNormalizationError, match="missing required fields"):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
def test_run_extractor_management_command_writes_snapshot(tmp_path, settings):
    settings.EXTRACTOR_PUBLIC_JSON_URL = "https://example.com/feed.json"

View File

@ -103,6 +103,116 @@ def test_valid_snapshot_import(tmp_path, settings):
    assert PlayerSeasonStats.objects.count() == 1


@pytest.mark.django_db
def test_snapshot_import_succeeds_with_optional_bio_and_physical_fields_missing(tmp_path, settings):
    incoming = tmp_path / "incoming"
    archive = tmp_path / "archive"
    failed = tmp_path / "failed"
    incoming.mkdir()
    archive.mkdir()
    failed.mkdir()

    payload = _valid_payload()
    for optional_field in ("first_name", "last_name", "birth_date", "nationality", "height_cm", "weight_kg", "position", "role"):
        payload["records"][0].pop(optional_field, None)
    file_path = incoming / "optional-missing.json"
    _write_json(file_path, payload)

    settings.STATIC_DATASET_INCOMING_DIR = str(incoming)
    settings.STATIC_DATASET_ARCHIVE_DIR = str(archive)
    settings.STATIC_DATASET_FAILED_DIR = str(failed)

    call_command("import_snapshots")

    run = ImportRun.objects.get()
    assert run.status == ImportRun.RunStatus.SUCCESS
    player = Player.objects.get(source_uid="player-23")
    assert player.first_name == "LeBron"
    assert player.last_name == "James"
    assert player.birth_date is None
    assert player.nationality is None
    assert player.nominal_position is None
    assert player.height_cm is None
    assert player.weight_kg is None
    assert PlayerSeasonStats.objects.count() == 1


@pytest.mark.django_db
def test_snapshot_import_preserves_single_name_part_without_forced_split(tmp_path, settings):
    incoming = tmp_path / "incoming"
    archive = tmp_path / "archive"
    failed = tmp_path / "failed"
    incoming.mkdir()
    archive.mkdir()
    failed.mkdir()

    payload = _valid_payload()
    row = payload["records"][0]
    row["first_name"] = "LeBron"
    row.pop("last_name")
    file_path = incoming / "single-name-part.json"
    _write_json(file_path, payload)

    settings.STATIC_DATASET_INCOMING_DIR = str(incoming)
    settings.STATIC_DATASET_ARCHIVE_DIR = str(archive)
    settings.STATIC_DATASET_FAILED_DIR = str(failed)

    call_command("import_snapshots")

    run = ImportRun.objects.get()
    assert run.status == ImportRun.RunStatus.SUCCESS
    player = Player.objects.get(source_uid="player-23")
    assert player.first_name == "LeBron"
    assert player.last_name == ""


@pytest.mark.django_db
@pytest.mark.parametrize(
    ("source_name", "competition_id", "competition_name"),
    [
        ("lba", "lba-serie-a", "Lega Basket Serie A"),
        ("bcl", "bcl", "Basketball Champions League"),
    ],
)
def test_partial_public_source_snapshot_imports_for_lba_and_bcl(
    tmp_path,
    settings,
    source_name,
    competition_id,
    competition_name,
):
    incoming = tmp_path / "incoming"
    archive = tmp_path / "archive"
    failed = tmp_path / "failed"
    incoming.mkdir()
    archive.mkdir()
    failed.mkdir()

    payload = _valid_payload()
    payload["source_name"] = source_name
    row = payload["records"][0]
    row["competition_external_id"] = competition_id
    row["competition_name"] = competition_name
    for optional_field in ("first_name", "last_name", "birth_date", "nationality", "height_cm", "weight_kg", "position", "role"):
        row.pop(optional_field, None)
    _write_json(incoming / f"{source_name}.json", payload)

    settings.STATIC_DATASET_INCOMING_DIR = str(incoming)
    settings.STATIC_DATASET_ARCHIVE_DIR = str(archive)
    settings.STATIC_DATASET_FAILED_DIR = str(failed)

    call_command("import_snapshots")

    run = ImportRun.objects.get()
    assert run.status == ImportRun.RunStatus.SUCCESS
    assert Competition.objects.filter(source_uid=competition_id, name=competition_name).exists()
    assert Player.objects.filter(source_uid="player-23").exists()
    assert PlayerSeasonStats.objects.count() == 1


@pytest.mark.django_db
def test_invalid_snapshot_rejected_and_moved_to_failed(tmp_path, settings):
    incoming = tmp_path / "incoming"

View File

@ -1,251 +0,0 @@
import os

import pytest

from apps.competitions.models import Competition, Season
from apps.ingestion.models import IngestionError, IngestionRun
from apps.ingestion.services.sync import run_sync_job
from apps.players.models import Nationality, Player
from apps.providers.exceptions import ProviderRateLimitError
from apps.providers.models import ExternalMapping
from apps.stats.models import PlayerSeason, PlayerSeasonStats
from apps.teams.models import Team


@pytest.mark.django_db
def test_run_full_sync_creates_domain_objects(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    run = run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)

    assert run.status == IngestionRun.RunStatus.SUCCESS
    assert Competition.objects.count() >= 1
    assert Team.objects.count() >= 1
    assert Season.objects.count() >= 1
    assert Player.objects.count() >= 1
    assert PlayerSeason.objects.count() >= 1
    assert PlayerSeasonStats.objects.count() >= 1
    assert Player.objects.filter(origin_competition__isnull=False).exists()
    assert run.context.get("completed_steps") == [
        "competitions",
        "teams",
        "seasons",
        "players",
        "player_stats",
        "player_careers",
    ]
    assert run.context.get("source_counts", {}).get("players", 0) >= 1


@pytest.mark.django_db
def test_full_sync_is_idempotent(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)
    counts_after_first = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
    }

    run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)
    counts_after_second = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
    }

    assert counts_after_first == counts_after_second


@pytest.mark.django_db
def test_incremental_sync_runs_successfully(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    run = run_sync_job(
        provider_namespace="mvp_demo",
        job_type=IngestionRun.JobType.INCREMENTAL,
        cursor="demo-cursor",
    )

    assert run.status == IngestionRun.RunStatus.SUCCESS
    assert run.records_processed > 0
    assert run.started_at is not None
    assert run.finished_at is not None
    assert run.finished_at >= run.started_at
    assert run.error_summary == ""


@pytest.mark.django_db
def test_run_sync_handles_rate_limit(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    os.environ["PROVIDER_MVP_FORCE_RATE_LIMIT"] = "1"
    with pytest.raises(ProviderRateLimitError):
        run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)

    run = IngestionRun.objects.order_by("-id").first()
    assert run is not None
    assert run.status == IngestionRun.RunStatus.FAILED
    assert run.started_at is not None
    assert run.finished_at is not None
    assert "Rate limit" in run.error_summary
    assert IngestionError.objects.filter(ingestion_run=run).exists()
    os.environ.pop("PROVIDER_MVP_FORCE_RATE_LIMIT", None)


@pytest.mark.django_db
def test_balldontlie_sync_idempotency_with_stable_payload(monkeypatch):
    class StableProvider:
        def sync_all(self):
            return {
                "competitions": [
                    {
                        "external_id": "competition-nba",
                        "name": "NBA",
                        "slug": "nba",
                        "competition_type": "league",
                        "gender": "men",
                        "level": 1,
                        "country": None,
                        "is_active": True,
                    }
                ],
                "teams": [
                    {
                        "external_id": "team-14",
                        "name": "Los Angeles Lakers",
                        "short_name": "LAL",
                        "slug": "los-angeles-lakers",
                        "country": None,
                        "is_national_team": False,
                    }
                ],
                "seasons": [
                    {
                        "external_id": "season-2024",
                        "label": "2024-2025",
                        "start_date": "2024-10-01",
                        "end_date": "2025-06-30",
                        "is_current": False,
                    }
                ],
                "players": [
                    {
                        "external_id": "player-237",
                        "first_name": "LeBron",
                        "last_name": "James",
                        "full_name": "LeBron James",
                        "birth_date": None,
                        "nationality": None,
                        "nominal_position": {"code": "SF", "name": "Small Forward"},
                        "inferred_role": {"code": "wing", "name": "Wing"},
                        "height_cm": None,
                        "weight_kg": None,
                        "dominant_hand": "unknown",
                        "is_active": True,
                        "aliases": [],
                    }
                ],
                "player_stats": [
                    {
                        "external_id": "ps-2024-237-14",
                        "player_external_id": "player-237",
                        "team_external_id": "team-14",
                        "competition_external_id": "competition-nba",
                        "season_external_id": "season-2024",
                        "games_played": 2,
                        "games_started": 0,
                        "minutes_played": 68,
                        "points": 25,
                        "rebounds": 9,
                        "assists": 8,
                        "steals": 1.5,
                        "blocks": 0.5,
                        "turnovers": 3.5,
                        "fg_pct": 55.0,
                        "three_pct": 45.0,
                        "ft_pct": 95.0,
                        "usage_rate": None,
                        "true_shooting_pct": None,
                        "player_efficiency_rating": None,
                    }
                ],
                "player_careers": [
                    {
                        "external_id": "career-2024-237-14",
                        "player_external_id": "player-237",
                        "team_external_id": "team-14",
                        "competition_external_id": "competition-nba",
                        "season_external_id": "season-2024",
                        "role_code": "",
                        "shirt_number": None,
                        "start_date": "2024-10-01",
                        "end_date": "2025-06-30",
                        "notes": "Imported from balldontlie aggregated box scores",
                    }
                ],
            }

        def sync_incremental(self, *, cursor: str | None = None):
            payload = self.sync_all()
            payload["cursor"] = cursor
            return payload

    monkeypatch.setattr("apps.ingestion.services.sync.get_provider", lambda namespace: StableProvider())

    run_sync_job(provider_namespace="balldontlie", job_type=IngestionRun.JobType.FULL_SYNC)
    lebron = Player.objects.get(full_name="LeBron James")
    assert lebron.nationality is None
    assert not Nationality.objects.filter(iso2_code="ZZ").exists()
    counts_first = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
        "mapping": ExternalMapping.objects.filter(provider_namespace="balldontlie").count(),
    }

    run_sync_job(provider_namespace="balldontlie", job_type=IngestionRun.JobType.FULL_SYNC)
    counts_second = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
        "mapping": ExternalMapping.objects.filter(provider_namespace="balldontlie").count(),
    }

    assert counts_first == counts_second


@pytest.mark.django_db
def test_batch_transactions_preserve_prior_step_progress_on_failure(settings, monkeypatch):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"

    def boom(*args, **kwargs):
        raise RuntimeError("teams-sync-failed")

    monkeypatch.setattr("apps.ingestion.services.sync._sync_teams", boom)

    with pytest.raises(RuntimeError):
        run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)

    run = IngestionRun.objects.order_by("-id").first()
    assert run is not None
    assert run.status == IngestionRun.RunStatus.FAILED
    assert Competition.objects.exists()
    assert Team.objects.count() == 0
    assert run.context.get("completed_steps") == ["competitions"]
    assert "Unhandled ingestion error" in run.error_summary

View File

@ -1,112 +0,0 @@
import pytest
from contextlib import contextmanager

from celery.schedules import crontab
import psycopg
from django.conf import settings

from apps.ingestion.models import IngestionRun
from apps.ingestion.services.runs import _build_ingestion_lock_key, release_ingestion_lock, try_acquire_ingestion_lock
from apps.ingestion.tasks import scheduled_provider_sync, trigger_incremental_sync
from config.celery import app as celery_app, build_periodic_schedule


@pytest.mark.django_db
def test_periodic_task_registered():
    assert "apps.ingestion.tasks.scheduled_provider_sync" in celery_app.tasks


@pytest.mark.django_db
def test_build_periodic_schedule_enabled(settings):
    settings.INGESTION_SCHEDULE_ENABLED = True
    settings.INGESTION_SCHEDULE_CRON = "15 * * * *"

    schedule = build_periodic_schedule()

    assert "ingestion.scheduled_provider_sync" in schedule
    entry = schedule["ingestion.scheduled_provider_sync"]
    assert entry["task"] == "apps.ingestion.tasks.scheduled_provider_sync"
    assert isinstance(entry["schedule"], crontab)
    assert entry["schedule"]._orig_minute == "15"


@pytest.mark.django_db
def test_build_periodic_schedule_disabled(settings):
    settings.INGESTION_SCHEDULE_ENABLED = False
    assert build_periodic_schedule() == {}


@pytest.mark.django_db
def test_build_periodic_schedule_invalid_cron_disables_task_and_logs(settings, caplog):
    settings.INGESTION_SCHEDULE_ENABLED = True
    settings.INGESTION_SCHEDULE_CRON = "invalid-cron"

    with caplog.at_level("ERROR"):
        schedule = build_periodic_schedule()

    assert schedule == {}
    assert any("Invalid periodic ingestion schedule config. Task disabled." in message for message in caplog.messages)


@pytest.mark.django_db
def test_trigger_incremental_sync_skips_when_advisory_lock_not_acquired(settings, monkeypatch):
    settings.INGESTION_PREVENT_OVERLAP = True

    @contextmanager
    def fake_lock(**kwargs):
        yield False

    monkeypatch.setattr("apps.ingestion.tasks.ingestion_advisory_lock", fake_lock)

    run_id = trigger_incremental_sync.apply(
        kwargs={"provider_namespace": "mvp_demo"},
    ).get()

    skipped_run = IngestionRun.objects.get(id=run_id)
    assert skipped_run.status == IngestionRun.RunStatus.CANCELED
    assert "advisory lock" in skipped_run.error_summary


@pytest.mark.django_db
def test_advisory_lock_prevents_concurrent_acquisition():
    provider_namespace = "mvp_demo"
    job_type = IngestionRun.JobType.INCREMENTAL
    lock_key = _build_ingestion_lock_key(provider_namespace=provider_namespace, job_type=job_type)
    conninfo = (
        f"dbname={settings.DATABASES['default']['NAME']} "
        f"user={settings.DATABASES['default']['USER']} "
        f"password={settings.DATABASES['default']['PASSWORD']} "
        f"host={settings.DATABASES['default']['HOST']} "
        f"port={settings.DATABASES['default']['PORT']}"
    )
    with psycopg.connect(conninfo) as external_conn:
        with external_conn.cursor() as cursor:
            cursor.execute("SELECT pg_advisory_lock(%s);", [lock_key])
            acquired, _ = try_acquire_ingestion_lock(
                provider_namespace=provider_namespace,
                job_type=job_type,
            )
            assert acquired is False
            cursor.execute("SELECT pg_advisory_unlock(%s);", [lock_key])

    acquired, django_key = try_acquire_ingestion_lock(
        provider_namespace=provider_namespace,
        job_type=job_type,
    )
    assert acquired is True
    release_ingestion_lock(lock_key=django_key)


@pytest.mark.django_db
def test_scheduled_provider_sync_uses_configured_job_type(settings, monkeypatch):
    settings.INGESTION_SCHEDULE_JOB_TYPE = IngestionRun.JobType.FULL_SYNC
    settings.INGESTION_SCHEDULE_PROVIDER_NAMESPACE = "mvp_demo"
    captured = {}

    def fake_runner(**kwargs):
        captured.update(kwargs)
        return 99

    monkeypatch.setattr("apps.ingestion.tasks._run_sync_with_overlap_guard", fake_runner)

    result = scheduled_provider_sync.apply().get()

    assert result == 99
    assert captured["provider_namespace"] == "mvp_demo"
    assert captured["job_type"] == IngestionRun.JobType.FULL_SYNC

View File

@ -4,8 +4,6 @@ import pytest
from django.contrib.auth.models import User
from django.urls import reverse
from apps.ingestion.models import IngestionRun
from apps.ingestion.services.sync import run_sync_job
from apps.players.models import Nationality, Player, Position, Role
from apps.scouting.models import SavedSearch
@ -49,25 +47,3 @@ def test_saved_search_run_filters_player_results(client):
    assert response.status_code == 200
    assert "Marco Rossi" in response.content.decode()
    assert "Luca Bianchi" not in response.content.decode()


@pytest.mark.django_db
def test_ingestion_output_is_searchable_in_ui_and_api(settings, client):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    run = run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)
    assert run.status == IngestionRun.RunStatus.SUCCESS

    player = Player.objects.filter(origin_competition__isnull=False).order_by("id").first()
    assert player is not None
    assert player.origin_competition_id is not None

    params = {"origin_competition": player.origin_competition_id}
    ui_response = client.get(reverse("players:index"), data=params)
    api_response = client.get(reverse("api:players"), data=params)

    assert ui_response.status_code == 200
    assert api_response.status_code == 200
    ui_ids = {item.id for item in ui_response.context["players"]}
    api_ids = {item["id"] for item in api_response.json()["results"]}
    assert player.id in ui_ids
    assert player.id in api_ids

View File

@ -8,6 +8,7 @@ import pytest
from django.core.management import call_command
from apps.ingestion.extractors.lba import LBASnapshotExtractor
from apps.ingestion.extractors.base import ExtractorNormalizationError
from apps.ingestion.extractors.registry import create_extractor
@ -51,6 +52,56 @@ def test_lba_extractor_normalizes_fixture_payload(tmp_path, settings):
    assert row["three_pt_pct"] == 36.5


@pytest.mark.django_db
def test_lba_extractor_accepts_partial_public_player_bio_fields(tmp_path, settings):
    settings.EXTRACTOR_LBA_STATS_URL = "https://www.legabasket.it/public/stats.json"
    settings.EXTRACTOR_LBA_SEASON_LABEL = "2025-2026"
    settings.EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID = "lba-serie-a"
    settings.EXTRACTOR_LBA_COMPETITION_NAME = "Lega Basket Serie A"
    fixture_payload = _load_fixture("lba/lba_players_stats_partial_public.json")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return fixture_payload

    extractor = LBASnapshotExtractor(http_client=FakeClient())
    output_path = tmp_path / "lba-partial.json"
    result = extractor.run(output_path=output_path, snapshot_date=date(2026, 3, 13))

    assert result.records_count == 1
    payload = json.loads(output_path.read_text(encoding="utf-8"))
    row = payload["records"][0]
    assert row["full_name"] == "Andrea Bianchi"
    assert row["first_name"] is None
    assert row["last_name"] is None
    assert row["birth_date"] is None
    assert row["nationality"] is None
    assert row["height_cm"] is None
    assert row["weight_kg"] is None
    assert row["position"] is None
    assert row["games_played"] == 18


@pytest.mark.django_db
def test_lba_extractor_still_fails_when_required_stats_are_missing(settings):
    settings.EXTRACTOR_LBA_STATS_URL = "https://www.legabasket.it/public/stats.json"
    settings.EXTRACTOR_LBA_SEASON_LABEL = "2025-2026"
    settings.EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID = "lba-serie-a"
    settings.EXTRACTOR_LBA_COMPETITION_NAME = "Lega Basket Serie A"
    fixture_payload = _load_fixture("lba/lba_players_stats_partial_public.json")
    fixture_payload["data"][0].pop("ppg")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return fixture_payload

    extractor = LBASnapshotExtractor(http_client=FakeClient())
    with pytest.raises(ExtractorNormalizationError):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
def test_lba_extractor_registry_selection(settings):
    settings.EXTRACTOR_LBA_STATS_URL = "https://www.legabasket.it/public/stats.json"

View File

@ -4,7 +4,7 @@ import pytest
from django.urls import reverse
from apps.competitions.models import Competition, Season
from apps.players.models import Nationality, Player, Position, Role
from apps.players.models import Nationality, Player, PlayerAlias, Position, Role
from apps.stats.models import PlayerSeason, PlayerSeasonStats
from apps.teams.models import Team

View File

@ -1,77 +0,0 @@
import os

import pytest

from apps.providers.adapters.mvp_provider import MvpDemoProviderAdapter
from apps.providers.exceptions import ProviderNotFoundError, ProviderRateLimitError
from apps.providers.registry import get_provider


@pytest.mark.django_db
def test_mvp_provider_fetch_and_search_players():
    adapter = MvpDemoProviderAdapter()
    players = adapter.fetch_players()
    assert len(players) >= 2

    results = adapter.search_players(query="luca")
    assert any("Luca" in item["full_name"] for item in results)

    detail = adapter.fetch_player(external_player_id="player-001")
    assert detail is not None
    assert detail["full_name"] == "Luca Rinaldi"


@pytest.mark.django_db
def test_mvp_provider_rate_limit_signal():
    os.environ["PROVIDER_MVP_FORCE_RATE_LIMIT"] = "1"
    adapter = MvpDemoProviderAdapter()
    with pytest.raises(ProviderRateLimitError):
        adapter.fetch_players()
    os.environ.pop("PROVIDER_MVP_FORCE_RATE_LIMIT", None)


@pytest.mark.django_db
def test_provider_registry_resolution(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    provider = get_provider()
    assert isinstance(provider, MvpDemoProviderAdapter)
    with pytest.raises(ProviderNotFoundError):
        get_provider("does-not-exist")


@pytest.mark.django_db
def test_demo_provider_sync_payload_uses_normalized_shape():
    adapter = MvpDemoProviderAdapter()
    payload = adapter.sync_all()
    assert set(payload.keys()) == {
        "players",
        "competitions",
        "teams",
        "seasons",
        "player_stats",
        "player_careers",
        "cursor",
    }
    assert payload["cursor"] is None
    player = payload["players"][0]
    assert set(player.keys()) == {
        "external_id",
        "first_name",
        "last_name",
        "full_name",
        "birth_date",
        "nationality",
        "nominal_position",
        "inferred_role",
        "height_cm",
        "weight_kg",
        "dominant_hand",
        "is_active",
        "aliases",
    }

View File

@ -1,263 +0,0 @@
from __future__ import annotations

import time
from typing import Any

import pytest
import requests

from apps.providers.adapters.balldontlie_provider import BalldontlieProviderAdapter
from apps.providers.adapters.mvp_provider import MvpDemoProviderAdapter
from apps.providers.clients.balldontlie import BalldontlieClient
from apps.providers.exceptions import ProviderRateLimitError, ProviderTransientError, ProviderUnauthorizedError
from apps.providers.registry import get_default_provider_namespace, get_provider
from apps.providers.services.balldontlie_mappings import map_seasons


class _FakeResponse:
    def __init__(self, *, status_code: int, payload: dict[str, Any] | None = None, headers: dict[str, str] | None = None, text: str = ""):
        self.status_code = status_code
        self._payload = payload or {}
        self.headers = headers or {}
        self.text = text

    def json(self):
        return self._payload


class _FakeSession:
    def __init__(self, responses: list[Any]):
        self._responses = responses
        self.calls: list[dict[str, Any]] = []

    def get(self, *args, **kwargs):
        self.calls.append(kwargs)
        item = self._responses.pop(0)
        if isinstance(item, Exception):
            raise item
        return item


class _FakeBalldontlieClient:
    def get_json(self, path: str, *, params: dict[str, Any] | None = None) -> dict[str, Any]:
        if path == "/nba/v1/teams":
            return {
                "data": [
                    {
                        "id": 14,
                        "full_name": "Los Angeles Lakers",
                        "abbreviation": "LAL",
                    }
                ]
            }
        return {"data": []}

    def list_paginated(
        self,
        path: str,
        *,
        params: dict[str, Any] | None = None,
        per_page: int = 100,
        page_limit: int = 1,
    ) -> list[dict[str, Any]]:
        if path == "/nba/v1/players":
            return [
                {
                    "id": 237,
                    "first_name": "LeBron",
                    "last_name": "James",
                    "position": "F",
                    "team": {"id": 14},
                }
            ]
        if path == "/nba/v1/stats":
            return [
                {
                    "pts": 20,
                    "reb": 8,
                    "ast": 7,
                    "stl": 1,
                    "blk": 1,
                    "turnover": 3,
                    "fg_pct": 0.5,
                    "fg3_pct": 0.4,
                    "ft_pct": 0.9,
                    "min": "35:12",
                    "player": {"id": 237},
                    "team": {"id": 14},
                    "game": {"season": 2024},
                },
                {
                    "pts": 30,
                    "reb": 10,
                    "ast": 9,
                    "stl": 2,
                    "blk": 0,
                    "turnover": 4,
                    "fg_pct": 0.6,
                    "fg3_pct": 0.5,
                    "ft_pct": 1.0,
                    "min": "33:00",
                    "player": {"id": 237},
                    "team": {"id": 14},
                    "game": {"season": 2024},
                },
            ]
        return []


@pytest.mark.django_db
def test_provider_registry_backend_selection(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = ""
    settings.PROVIDER_BACKEND = "demo"
    assert get_default_provider_namespace() == "mvp_demo"
    assert isinstance(get_provider(), MvpDemoProviderAdapter)

    settings.PROVIDER_BACKEND = "balldontlie"
    assert get_default_provider_namespace() == "balldontlie"
    assert isinstance(get_provider(), BalldontlieProviderAdapter)

    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    assert get_default_provider_namespace() == "mvp_demo"


@pytest.mark.django_db
def test_balldontlie_adapter_maps_payloads(settings):
    settings.PROVIDER_BALLDONTLIE_SEASONS = [2024]
    adapter = BalldontlieProviderAdapter(client=_FakeBalldontlieClient())

    payload = adapter.sync_all()

    assert payload["competitions"][0]["external_id"] == "competition-nba"
    assert payload["teams"][0]["external_id"] == "team-14"
    assert payload["players"][0]["external_id"] == "player-237"
    assert payload["seasons"][0]["external_id"] == "season-2024"
    assert payload["player_stats"][0]["games_played"] == 2
    assert payload["player_stats"][0]["points"] == 25.0
    assert payload["player_stats"][0]["fg_pct"] == 55.0

    player = payload["players"][0]
    assert player["nationality"] is None
    assert "current_team_external_id" not in player
    expected_keys = {
        "external_id",
        "first_name",
        "last_name",
        "full_name",
        "birth_date",
        "nationality",
        "nominal_position",
        "inferred_role",
        "height_cm",
        "weight_kg",
        "dominant_hand",
        "is_active",
        "aliases",
    }
    assert set(player.keys()) == expected_keys


@pytest.mark.django_db
def test_balldontlie_map_seasons_marks_latest_as_current():
    seasons = map_seasons([2022, 2024, 2023, 2024])
    current_rows = [row for row in seasons if row["is_current"]]
    assert len(current_rows) == 1
    assert current_rows[0]["external_id"] == "season-2024"
    assert [row["external_id"] for row in seasons] == ["season-2022", "season-2023", "season-2024"]


@pytest.mark.django_db
def test_balldontlie_adapter_degrades_when_stats_unauthorized(settings):
    class _UnauthorizedStatsClient(_FakeBalldontlieClient):
        def list_paginated(self, path: str, *, params=None, per_page=100, page_limit=1):
            if path == "/nba/v1/stats":
                raise ProviderUnauthorizedError(
                    provider="balldontlie",
                    path="stats",
                    status_code=401,
                    detail="Unauthorized",
                )
            return super().list_paginated(path, params=params, per_page=per_page, page_limit=page_limit)

    settings.PROVIDER_BALLDONTLIE_SEASONS = [2024]
    settings.PROVIDER_BALLDONTLIE_STATS_STRICT = False
    adapter = BalldontlieProviderAdapter(client=_UnauthorizedStatsClient())

    payload = adapter.sync_all()

    assert payload["players"]
    assert payload["teams"]
    assert payload["player_stats"] == []
    assert payload["player_careers"] == []


@pytest.mark.django_db
def test_balldontlie_client_retries_after_rate_limit(monkeypatch, settings):
    monkeypatch.setattr(time, "sleep", lambda _: None)
    settings.PROVIDER_REQUEST_RETRIES = 2
    settings.PROVIDER_REQUEST_RETRY_SLEEP = 0
    session = _FakeSession(
        responses=[
            _FakeResponse(status_code=429, headers={"Retry-After": "0"}),
            _FakeResponse(status_code=200, payload={"data": []}),
        ]
    )
    client = BalldontlieClient(session=session)

    payload = client.get_json("players")

    assert payload == {"data": []}


@pytest.mark.django_db
def test_balldontlie_client_timeout_retries_then_fails(monkeypatch, settings):
    monkeypatch.setattr(time, "sleep", lambda _: None)
    settings.PROVIDER_REQUEST_RETRIES = 2
    settings.PROVIDER_REQUEST_RETRY_SLEEP = 0
    session = _FakeSession(responses=[requests.Timeout("slow"), requests.Timeout("slow")])
    client = BalldontlieClient(session=session)

    with pytest.raises(ProviderTransientError):
        client.get_json("players")


@pytest.mark.django_db
def test_balldontlie_client_raises_rate_limit_after_max_retries(monkeypatch, settings):
    monkeypatch.setattr(time, "sleep", lambda _: None)
    settings.PROVIDER_REQUEST_RETRIES = 2
    settings.PROVIDER_REQUEST_RETRY_SLEEP = 0
    session = _FakeSession(
        responses=[
            _FakeResponse(status_code=429, headers={"Retry-After": "1"}),
            _FakeResponse(status_code=429, headers={"Retry-After": "1"}),
        ]
    )
    client = BalldontlieClient(session=session)

    with pytest.raises(ProviderRateLimitError):
        client.get_json("players")


@pytest.mark.django_db
def test_balldontlie_client_cursor_pagination(settings):
    session = _FakeSession(
        responses=[
            _FakeResponse(
                status_code=200,
                payload={"data": [{"id": 1}], "meta": {"next_cursor": 101}},
            ),
            _FakeResponse(
                status_code=200,
                payload={"data": [{"id": 2}], "meta": {"next_cursor": None}},
            ),
        ]
    )
    client = BalldontlieClient(session=session)

    rows = client.list_paginated("players", per_page=1, page_limit=5)

    assert rows == [{"id": 1}, {"id": 2}]
    assert "page" not in session.calls[0]["params"]
    assert "cursor" not in session.calls[0]["params"]
    assert session.calls[1]["params"]["cursor"] == 101

View File

@ -0,0 +1,15 @@
import pytest
from django.conf import settings


@pytest.mark.django_db
def test_legacy_provider_stack_disabled_by_default():
    assert settings.LEGACY_PROVIDER_STACK_ENABLED is False
    assert "apps.providers" not in settings.INSTALLED_APPS


@pytest.mark.django_db
def test_providers_route_not_mounted_by_default(client):
    response = client.get("/providers/")
    assert response.status_code == 404