Compare commits
10 commits: 1aad6945c7 ... 24aa827811

| SHA1 |
|---|
| 24aa827811 |
| 90f83091ce |
| f2d5e20701 |
| 887da3cd06 |
| eb6e0bf594 |
| b6b6753931 |
| 5a19587376 |
| 3f811827de |
| 48a82e812a |
| 6066d2a0bb |
@@ -62,6 +62,12 @@ SCHEDULER_INTERVAL_SECONDS=900

# When scheduler is disabled but container is started, keep it idle (avoid restart loops)
SCHEDULER_DISABLED_SLEEP_SECONDS=300

# Legacy provider-sync stack (v1-style) is disabled by default in v2.
LEGACY_PROVIDER_STACK_ENABLED=0
# Optional legacy provider settings (only when LEGACY_PROVIDER_STACK_ENABLED=1):
# PROVIDER_BACKEND=demo
# PROVIDER_DEFAULT_NAMESPACE=mvp_demo

# API safeguards (read-only API is optional)
API_THROTTLE_ANON=100/hour
API_THROTTLE_USER=1000/hour
@@ -45,6 +45,13 @@ docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build

docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build
```

### Verify release topology assumptions

```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml config
./scripts/verify_release_topology.sh
```

## Day-to-Day Feature Workflow

1. Sync `develop`
@@ -63,6 +70,15 @@ git checkout -b feature/your-feature-name

3. Implement with focused commits and tests.
4. Open PR: `feature/*` -> `develop`.

## Running Tests (v2)

Runtime images are intentionally lean and may not ship `pytest`.
Use the development compose stack and install dev dependencies before running tests:

```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml run --rm web sh -lc "export PYTHONUSERBASE=/tmp/pyuser && python -m pip install --user -r requirements/dev.txt && python -m pytest -q"
```

## PR Checklist

- [ ] Target branch is correct
@@ -78,6 +94,8 @@ git checkout -b feature/your-feature-name

- Keep PostgreSQL as source of truth.
- Keep snapshot storage file-based and volume-backed.
- Do not introduce MongoDB or Elasticsearch as source of truth.
- Keep legacy provider/Celery sync code isolated behind `LEGACY_PROVIDER_STACK_ENABLED=1`.
- Keep runtime/docs consistency aligned with `docs/runtime-consistency-checklist.md`.

## Repository Bootstrap Commands
97
README.md
@@ -9,9 +9,8 @@ Current v2 foundation scope in this branch:

- management-command-driven runtime operations
- static snapshot directories persisted via Docker named volumes
- strict JSON snapshot schema + import management command

Out of scope in this step:
- extractor implementation
- extractor framework with LBA/BCL/public JSON adapters
- daily orchestration command and optional scheduler profile

## Runtime Architecture (v2)
@@ -22,7 +21,8 @@ Runtime services are intentionally small:

- optional `scheduler` profile service (runs daily extractor/import loop)

No Redis/Celery services are part of the v2 default runtime topology.
Legacy Celery/provider code remains in-repo but is isolated behind `LEGACY_PROVIDER_STACK_ENABLED=1`.
Default v2 runtime keeps that stack disabled.

## Image Strategy
@@ -47,6 +47,7 @@ Reserved for future optional scheduler use:

- `docker-compose.yml`: production-minded baseline runtime (immutable image filesystem)
- `docker-compose.dev.yml`: development override with source bind mount for `web`
- `docker-compose.release.yml`: production settings override (`DJANGO_SETTINGS_MODULE=config.settings.production`)
- `scripts/verify_release_topology.sh`: validates merged release compose has no source-code bind mounts for runtime services

### Start development runtime
@@ -73,6 +74,31 @@ For development override:

docker compose -f docker-compose.yml -f docker-compose.dev.yml --profile scheduler up -d scheduler
```

### Runtime Modes At A Glance

- development (`docker-compose.yml` + `docker-compose.dev.yml`):
  - mutable source bind mounts for `web` and `scheduler`
  - optimized for local iteration
- release-style (`docker-compose.yml` + `docker-compose.release.yml`):
  - immutable app filesystem for runtime services
  - production settings enabled for Django
- scheduler profile:
  - only starts when `--profile scheduler` is used
  - if started with `SCHEDULER_ENABLED=0`, the scheduler stays in idle sleep mode (no restart-loop exit behavior)

### Release Topology Verification

Verify the merged release config and immutability:

```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml config
./scripts/verify_release_topology.sh
```

Verification expectations:
- `web` and `scheduler` must not bind-mount repository source code in release mode.
- named volumes for DB/static/media/snapshots remain mounted.

## Named Volumes

v2 runtime uses named volumes for persistence:
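The scheduler's idle-sleep contract can be sketched as a single decision function (illustrative only: the real loop lives in `scripts/scheduler.sh`; env var names are taken from `.env.example`, and the function name here is invented for the sketch):

```python
def scheduler_step(env: dict) -> tuple:
    """Decide what one scheduler loop iteration does.

    Returns ("idle", seconds) when the scheduler is disabled, otherwise
    ("run", interval_seconds). Defaults mirror .env.example.
    """
    if env.get("SCHEDULER_ENABLED", "1") == "0":
        # Container started with the scheduler disabled: sleep instead of
        # exiting, so compose does not enter a restart loop.
        return ("idle", int(env.get("SCHEDULER_DISABLED_SLEEP_SECONDS", "300")))
    return ("run", int(env.get("SCHEDULER_INTERVAL_SECONDS", "900")))
```

The key design point is that "disabled" maps to sleeping, never to process exit.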
@@ -85,6 +111,11 @@ v2 runtime uses named volumes for persistence:

Development override uses separate dev-prefixed volumes to avoid ownership collisions.

Snapshot volume intent:
- `snapshots_incoming`: extractor output waiting for import
- `snapshots_archive`: successfully imported files
- `snapshots_failed`: schema/processing failures for operator inspection

## Environment Variables

Use `.env.example` as the source of truth.
@@ -96,6 +127,10 @@ Core groups:

- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)
- daily orchestration vars (`DAILY_ORCHESTRATION_*`)
- optional legacy provider-sync toggle (`LEGACY_PROVIDER_STACK_ENABLED`)

Operational reference:
- `docs/runtime-consistency-checklist.md`

## Snapshot Storage Convention
@@ -156,11 +191,23 @@ Each file must be a JSON object:

Validation is strict:
- unknown fields are rejected
- `snapshot_date` must be `YYYY-MM-DD`
- required fields must exist:
  - `competition_external_id`, `competition_name`, `season`
  - `team_external_id`, `team_name`
  - `player_external_id`, `full_name`
  - core stats (`games_played`, `minutes_per_game`, `points_per_game`, `rebounds_per_game`, `assists_per_game`, `steals_per_game`, `blocks_per_game`, `turnovers_per_game`, `fg_pct`, `three_pt_pct`, `ft_pct`)
- optional player bio/physical fields:
  - `first_name`, `last_name`, `birth_date`, `nationality`, `height_cm`, `weight_kg`, `position`, `role`
  - when `birth_date` is provided it must be `YYYY-MM-DD`
- numeric fields must be numeric
- invalid files are moved to the failed directory

Importer enrichment note:
- `full_name` is the source of truth for identity display
- `first_name` / `last_name` are optional and may be absent in public snapshots
- when both are missing, the importer may derive them from `full_name` as a best-effort enrichment step
- this enrichment is convenience-only and does not override source-truth semantics
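The required-vs-optional contract above can be sketched in a few lines (illustrative, not the real `SnapshotSchemaValidator`; the core-stat required fields are omitted from `REQUIRED` for brevity):

```python
from datetime import datetime

REQUIRED = {
    "competition_external_id", "competition_name", "season",
    "team_external_id", "team_name", "player_external_id", "full_name",
}
OPTIONAL = {
    "first_name", "last_name", "birth_date", "nationality",
    "height_cm", "weight_kg", "position", "role",
}

def check_record(record: dict) -> list[str]:
    """Collect validation errors for one snapshot record."""
    errors = []
    unknown = sorted(set(record) - REQUIRED - OPTIONAL)
    if unknown:
        errors.append(f"unknown fields rejected: {unknown}")
    missing = sorted(f for f in REQUIRED if record.get(f) in (None, ""))
    if missing:
        errors.append(f"required fields missing: {missing}")
    birth_date = record.get("birth_date")
    if birth_date not in (None, ""):
        try:
            # Strict YYYY-MM-DD check; optional fields may simply be absent.
            datetime.strptime(birth_date, "%Y-%m-%d")
        except (TypeError, ValueError):
            errors.append("birth_date must be YYYY-MM-DD")
    return errors
```

A record that omits every bio/physical field passes; a record with an unknown key or a malformed `birth_date` is rejected.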
## Import Command

Run import:
@@ -185,6 +232,12 @@ Command behavior:

- moves valid files to archive
- moves invalid files to failed

Import lifecycle summary:
1. extractor writes normalized snapshots to `incoming`
2. `import_snapshots` validates + upserts to PostgreSQL
3. imported files move to `archive`
4. invalid files move to `failed` with error details in `ImportFile`

### Source Identity Namespacing

Raw external IDs are **not globally unique** across basketball data sources. HoopScout v2 uses a namespaced identity for imported entities:
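The namespacing rule can be illustrated with a plain dict keyed by the `(source_name, source_uid)` pair (a toy model, not the actual ORM code; the sample names are invented):

```python
players: dict[tuple[str, str], dict] = {}

def upsert_player(source_name: str, source_uid: str, attrs: dict) -> None:
    # Identity is the (source, raw id) pair, so the same raw id "42"
    # coming from two different sources never collides.
    players[(source_name, source_uid)] = attrs

upsert_player("lba", "42", {"full_name": "Mario Rossi"})
upsert_player("bcl", "42", {"full_name": "Alex Novak"})
```

Both rows survive as distinct players, which is exactly what a raw-ID key would break.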
@@ -278,6 +331,7 @@ Notes:

- extraction is intentionally low-frequency and uses retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by `import_snapshots`
- `public_json_snapshot` uses the same required-vs-optional field contract as `SnapshotSchemaValidator` (no stricter extractor-only required bio/physical fields)
- optional scheduler container runs `scripts/scheduler.sh` loop using:
  - image: `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`
  - command: `/app/scripts/scheduler.sh`
@@ -304,6 +358,7 @@ Notes:

- season is configured by `EXTRACTOR_LBA_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp/mpg/ppg/rpg/apg/spg/bpg/tov`)
- public-source player bio/physical fields are often incomplete; extractor allows them to be missing and emits `null` for optional fields
- no live HTTP calls in tests; tests use fixtures/mocked responses only
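The nested-object and alias normalization those bullets describe can be sketched like this (an illustrative reduction; the real extractors use `_first_non_empty`-style helpers and handle more key variants):

```python
STAT_ALIASES = {
    "gp": "games_played", "mpg": "minutes_per_game", "ppg": "points_per_game",
    "rpg": "rebounds_per_game", "apg": "assists_per_game", "spg": "steals_per_game",
    "bpg": "blocks_per_game", "tov": "turnovers_per_game",
}

def normalize_row(row: dict) -> dict:
    """Flatten nested player/team objects and map public stat aliases."""
    player = row.get("player") or {}
    team = row.get("team") or {}
    out = {
        "player_external_id": row.get("player_external_id") or player.get("id"),
        "full_name": row.get("full_name") or player.get("name"),
        "team_external_id": row.get("team_external_id") or team.get("id"),
        "team_name": row.get("team_name") or team.get("name"),
    }
    for alias, canonical in STAT_ALIASES.items():
        # Prefer the canonical key; fall back to the public alias; else None.
        out[canonical] = row.get(canonical, row.get(alias))
    return out
```

Feeding it a row shaped like the partial-public fixtures yields canonical keys with `None` for anything the source omitted.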
### BCL extractor assumptions and limitations (MVP)
@@ -316,8 +371,20 @@ Notes:

- season is configured by `EXTRACTOR_BCL_SEASON_LABEL`
- parser supports payload keys: `records`, `data`, `players`, `items`
- normalization supports nested `player` and `team` objects with common stat aliases (`gp/mpg/ppg/rpg/apg/spg/bpg/tov`)
- public-source player bio/physical fields are often incomplete; extractor allows them to be missing and emits `null` for optional fields
- no live HTTP calls in tests; tests use fixtures/mocked responses only

## Testing

- runtime `web` image stays lean and may not include `pytest` tooling
- runtime containers (`web`/`nginx`/`scheduler`) are for serving/orchestration, not preloaded test tooling
- run tests with the development compose stack (or a dedicated test image/profile) and install dev dependencies first
- local example (one-off):

```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml run --rm web sh -lc "export PYTHONUSERBASE=/tmp/pyuser && python -m pip install --user -r requirements/dev.txt && python -m pytest -q"
```

## Migration and Superuser Commands
```bash
@@ -352,6 +419,20 @@ Search metric semantics:

- different metric columns for one player may come from different eligible seasons
- when no eligible value exists for a metric in the current context, the UI shows `-`

### API Search Metric Transparency

`GET /api/players/` now exposes sortable metric fields directly in each list row:
- `ppg_value`
- `mpg_value`

These fields use the same **best eligible** semantics as UI search. They are computed from eligible
player-season rows in the current filter context and may be `null` when no eligible data exists.

API list responses also include:
- `sort`: effective sort key applied
- `metric_sort_keys`: metric-based sort keys currently supported
- `metric_semantics`: plain-language metric contract used for sorting/interpretation

Pagination and sorting:
- querystring is preserved
- HTMX navigation keeps URL state in sync with current filters/page/sort
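A hedged sketch of the **best eligible** rule described above (the function and row shape are invented for illustration, and "best" is assumed to mean the highest eligible value; the real implementation is `annotate_player_metrics` over player-season rows):

```python
def best_eligible(rows: list[dict], metric: str):
    """Pick the best value for one metric across eligible player-season rows.

    Each metric is resolved independently, so two columns for the same player
    may come from different seasons; None means "no eligible data" (UI shows "-").
    """
    values = [r[metric] for r in rows if r.get("eligible") and r.get(metric) is not None]
    return max(values) if values else None

seasons = [
    {"season": "2022-23", "eligible": True, "ppg": 11.2, "mpg": None},
    {"season": "2023-24", "eligible": True, "ppg": 9.8, "mpg": 24.7},
]
```

Note how `ppg` resolves from 2022-23 while `mpg` resolves from 2023-24, and a metric with no eligible value stays `None`.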
@@ -379,3 +460,7 @@ This v2 work branch is:

Legacy provider/Celery ingestion layers are not the default runtime path for v2 foundation.
They are intentionally isolated until replaced by v2 snapshot ingestion commands in later tasks.
By default:
- `apps.providers` is not installed
- `/providers/` routes are not mounted
- legacy provider-specific settings are not required
@@ -45,6 +45,8 @@ class PlayerListSerializer(serializers.ModelSerializer):

    inferred_role = serializers.CharField(source="inferred_role.name", allow_null=True)
    origin_competition = serializers.CharField(source="origin_competition.name", allow_null=True)
    origin_team = serializers.CharField(source="origin_team.name", allow_null=True)
    ppg_value = serializers.SerializerMethodField()
    mpg_value = serializers.SerializerMethodField()

    class Meta:
        model = Player
@@ -59,10 +61,20 @@ class PlayerListSerializer(serializers.ModelSerializer):
            "origin_team",
            "height_cm",
            "weight_kg",
            "ppg_value",
            "mpg_value",
            "dominant_hand",
            "is_active",
        ]

    def get_ppg_value(self, obj):
        value = getattr(obj, "ppg_value", None)
        return str(value) if value is not None else None

    def get_mpg_value(self, obj):
        value = getattr(obj, "mpg_value", None)
        return float(value) if value is not None else None


class PlayerAliasSerializer(serializers.Serializer):
    alias = serializers.CharField()
@@ -9,6 +9,7 @@ from apps.players.forms import PlayerSearchForm

from apps.players.models import Player
from apps.players.services.search import (
    METRIC_SORT_KEYS,
    SEARCH_METRIC_SEMANTICS_TEXT,
    annotate_player_metrics,
    apply_sorting,
    base_player_queryset,
@@ -67,14 +68,17 @@ class PlayerSearchApiView(ReadOnlyBaseAPIView, generics.ListAPIView):
        form = self.get_search_form()
        if form.is_bound and not form.is_valid():
            return self._validation_error_response()
        response = super().list(request, *args, **kwargs)
        response.data["sort"] = form.cleaned_data.get("sort", "name_asc")
        response.data["metric_semantics"] = SEARCH_METRIC_SEMANTICS_TEXT
        response.data["metric_sort_keys"] = sorted(METRIC_SORT_KEYS)
        return response

    def get_queryset(self):
        form = self.get_search_form()
        queryset = base_player_queryset()
        queryset = filter_players(queryset, form.cleaned_data)
        sort_key = form.cleaned_data.get("sort", "name_asc")
        if sort_key in METRIC_SORT_KEYS:
            queryset = annotate_player_metrics(queryset, form.cleaned_data)
        queryset = apply_sorting(queryset, sort_key)
        return queryset
@@ -1,4 +1,5 @@
from django.contrib import admin
from django.conf import settings

from .models import ImportFile, ImportRun, IngestionError, IngestionRun

@@ -91,15 +92,18 @@ class ImportFileAdmin(admin.ModelAdmin):
    )


class LegacyIngestionRunAdmin(admin.ModelAdmin):
    list_display = ("provider_namespace", "job_type", "status", "started_at", "finished_at")
    list_filter = ("provider_namespace", "job_type", "status")
    search_fields = ("provider_namespace", "error_summary")


class LegacyIngestionErrorAdmin(admin.ModelAdmin):
    list_display = ("provider_namespace", "entity_type", "external_id", "severity", "occurred_at")
    list_filter = ("severity", "provider_namespace")
    search_fields = ("entity_type", "external_id", "message")


if settings.LEGACY_PROVIDER_STACK_ENABLED:
    admin.site.register(IngestionRun, LegacyIngestionRunAdmin)
    admin.site.register(IngestionError, LegacyIngestionErrorAdmin)
@@ -16,6 +16,38 @@ def _first_non_empty(record: dict[str, Any], *keys: str) -> Any:
    return None


def _first_non_empty_text(record: dict[str, Any], *keys: str) -> str | None:
    for key in keys:
        value = record.get(key)
        if isinstance(value, str):
            stripped = value.strip()
            if stripped:
                return stripped
    return None


ESSENTIAL_FIELDS = {
    "competition_external_id",
    "competition_name",
    "season",
    "team_external_id",
    "team_name",
    "player_external_id",
    "full_name",
    "games_played",
    "minutes_per_game",
    "points_per_game",
    "rebounds_per_game",
    "assists_per_game",
    "steals_per_game",
    "blocks_per_game",
    "turnovers_per_game",
    "fg_pct",
    "three_pt_pct",
    "ft_pct",
}


class BCLSnapshotExtractor(BaseSnapshotExtractor):
    """
    Basketball Champions League MVP extractor.
@@ -86,7 +118,9 @@ class BCLSnapshotExtractor(BaseSnapshotExtractor):
        team_external_id = _first_non_empty(source_record, "team_external_id", "team_id") or _first_non_empty(
            team_obj, "id", "team_id"
        )
        team_name = _first_non_empty_text(source_record, "team_name", "team") or _first_non_empty_text(
            team_obj, "name"
        )

        normalized = {
            "competition_external_id": self.competition_external_id,
@@ -122,7 +156,7 @@ class BCLSnapshotExtractor(BaseSnapshotExtractor):
            "ft_pct": _first_non_empty(source_record, "ft_pct", "ft_percentage"),
        }

        missing = [key for key in ESSENTIAL_FIELDS if normalized.get(key) in (None, "")]
        if missing:
            raise ExtractorNormalizationError(f"bcl row missing required fields: {', '.join(sorted(missing))}")
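Why this change matters: the old check treated every non-`role` key as required, so partial public rows with absent bio fields were rejected; the new check only requires the essential set. A toy comparison with invented data (the field list is trimmed for the example):

```python
ESSENTIAL = {"full_name", "team_name", "games_played"}  # trimmed for the example

normalized = {
    "full_name": "Alex Novak",
    "team_name": "Lenovo Tenerife",
    "games_played": 10,
    "birth_date": None,   # optional bio field, absent in public data
    "height_cm": None,
    "role": None,
}

# Old check: any empty non-role value fails, so optional bio fields reject the row.
old_missing = [k for k, v in normalized.items() if k != "role" and v in (None, "")]

# New check: only fields in the essential set are required.
new_missing = [k for k in ESSENTIAL if normalized.get(k) in (None, "")]
```

The old rule flags `birth_date` and `height_cm` as "missing"; the new rule accepts the row.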
@@ -16,6 +16,38 @@ def _first_non_empty(record: dict[str, Any], *keys: str) -> Any:
    return None


def _first_non_empty_text(record: dict[str, Any], *keys: str) -> str | None:
    for key in keys:
        value = record.get(key)
        if isinstance(value, str):
            stripped = value.strip()
            if stripped:
                return stripped
    return None


ESSENTIAL_FIELDS = {
    "competition_external_id",
    "competition_name",
    "season",
    "team_external_id",
    "team_name",
    "player_external_id",
    "full_name",
    "games_played",
    "minutes_per_game",
    "points_per_game",
    "rebounds_per_game",
    "assists_per_game",
    "steals_per_game",
    "blocks_per_game",
    "turnovers_per_game",
    "fg_pct",
    "three_pt_pct",
    "ft_pct",
}


class LBASnapshotExtractor(BaseSnapshotExtractor):
    """
    LBA (Lega Basket Serie A) MVP extractor.
@@ -86,7 +118,9 @@ class LBASnapshotExtractor(BaseSnapshotExtractor):
        team_external_id = _first_non_empty(source_record, "team_external_id", "team_id") or _first_non_empty(
            team_obj, "id", "team_id"
        )
        team_name = _first_non_empty_text(source_record, "team_name", "team") or _first_non_empty_text(
            team_obj, "name"
        )

        normalized = {
            "competition_external_id": self.competition_external_id,
@@ -122,7 +156,7 @@ class LBASnapshotExtractor(BaseSnapshotExtractor):
            "ft_pct": _first_non_empty(source_record, "ft_pct", "ft_percentage"),
        }

        missing = [key for key in ESSENTIAL_FIELDS if normalized.get(key) in (None, "")]
        if missing:
            raise ExtractorNormalizationError(f"lba row missing required fields: {', '.join(sorted(missing))}")
@@ -4,6 +4,8 @@ from typing import Any

from django.conf import settings

from apps.ingestion.snapshots.schema import REQUIRED_RECORD_FIELDS

from .base import (
    BaseSnapshotExtractor,
    ExtractorConfigError,
@@ -113,7 +115,7 @@ class PublicJsonSnapshotExtractor(BaseSnapshotExtractor):
            "ft_pct": _first_non_empty(source_record, "ft_pct"),
        }

        missing = [key for key in REQUIRED_RECORD_FIELDS if normalized.get(key) in (None, "")]
        if missing:
            raise ExtractorNormalizationError(
                f"public_json_snapshot row missing required fields: {', '.join(sorted(missing))}"
@@ -1,9 +1,14 @@
from django.conf import settings

from .runs import finish_ingestion_run, log_ingestion_error, start_ingestion_run

__all__ = [
    "start_ingestion_run",
    "finish_ingestion_run",
    "log_ingestion_error",
]

if settings.LEGACY_PROVIDER_STACK_ENABLED:
    from .sync import run_sync_job  # pragma: no cover - legacy provider stack only.

    __all__.append("run_sync_job")
@@ -62,6 +62,21 @@ def _parse_season_dates(label: str) -> tuple[date, date]:
    return date(year, 9, 1), date(year + 1, 7, 31)


def _parse_optional_birth_date(value: str | None) -> date | None:
    if value in (None, ""):
        return None
    return parse_date(value)


def _split_name_parts(full_name: str) -> tuple[str, str]:
    parts = full_name.strip().split(maxsplit=1)
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]


def _resolve_nationality(value: str | None) -> Nationality | None:
    if not value:
        return None
@@ -152,9 +167,12 @@ def _upsert_record(record: dict[str, Any], *, source_name: str, snapshot_date: d
        },
    )

    position_value = record.get("position")
    position = None
    if position_value:
        position, _ = Position.objects.get_or_create(
            code=_position_code(position_value),
            defaults={"name": position_value},
        )
    role = None
    if record.get("role"):
@@ -163,19 +181,24 @@ def _upsert_record(record: dict[str, Any], *, source_name: str, snapshot_date: d
            defaults={"name": record["role"]},
        )

    first_name = record.get("first_name") or ""
    last_name = record.get("last_name") or ""
    if not first_name and not last_name:
        first_name, last_name = _split_name_parts(record["full_name"])

    player, _ = Player.objects.update_or_create(
        source_name=source_key,
        source_uid=record["player_external_id"],
        defaults={
            "first_name": first_name,
            "last_name": last_name,
            "full_name": record["full_name"],
            "birth_date": _parse_optional_birth_date(record.get("birth_date")),
            "nationality": _resolve_nationality(record.get("nationality")),
            "nominal_position": position,
            "inferred_role": role,
            "height_cm": record.get("height_cm"),
            "weight_kg": record.get("weight_kg"),
            "is_active": True,
        },
    )
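The name-enrichment fallback can be exercised standalone; this re-implements `_split_name_parts` verbatim for illustration:

```python
def split_name_parts(full_name: str) -> tuple[str, str]:
    """Best-effort (first_name, last_name) derivation from a display name.

    Used only when both first_name and last_name are absent in the snapshot;
    full_name itself remains the source of truth for identity display.
    """
    parts = full_name.strip().split(maxsplit=1)
    if not parts:
        return "", ""
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]
```

Everything after the first whitespace run lands in the last-name slot, and single-token or blank names degrade gracefully.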
@@ -14,13 +14,6 @@ REQUIRED_RECORD_FIELDS = {
    "team_name",
    "player_external_id",
    "full_name",
    "games_played",
    "minutes_per_game",
    "points_per_game",
@@ -34,6 +27,16 @@ REQUIRED_RECORD_FIELDS = {
    "ft_pct",
}

OPTIONAL_RECORD_FIELDS = {
    "first_name",
    "last_name",
    "birth_date",
    "nationality",
    "height_cm",
    "weight_kg",
    "position",
}

ALLOWED_TOP_LEVEL_FIELDS = {
    "source_name",
    "snapshot_date",
@@ -42,7 +45,7 @@ ALLOWED_TOP_LEVEL_FIELDS = {
    "raw_payload",
}

ALLOWED_RECORD_FIELDS = REQUIRED_RECORD_FIELDS | OPTIONAL_RECORD_FIELDS | {
    "role",
    "source_metadata",
    "raw_payload",
@@ -69,6 +72,15 @@ class SnapshotSchemaValidator:
            raise SnapshotValidationError(f"{field} must be a non-empty string")
        return value.strip()

    @staticmethod
    def _optional_string(value: Any, field: str) -> str | None:
        if value in (None, ""):
            return None
        if not isinstance(value, str):
            raise SnapshotValidationError(f"{field} must be a string when provided")
        stripped = value.strip()
        return stripped or None

    @staticmethod
    def _require_non_negative_int(value: Any, field: str) -> int:
        if isinstance(value, bool):
@@ -81,6 +93,12 @@ class SnapshotSchemaValidator:
            raise SnapshotValidationError(f"{field} must be a non-negative integer")
        return parsed

    @classmethod
    def _optional_non_negative_int(cls, value: Any, field: str) -> int | None:
        if value in (None, ""):
            return None
        return cls._require_non_negative_int(value, field)

    @staticmethod
    def _require_float(value: Any, field: str) -> float:
        try:
@@ -112,23 +130,26 @@ class SnapshotSchemaValidator:
            "team_name",
            "player_external_id",
            "full_name",
        ):
            normalized[field] = cls._require_string(record.get(field), f"record[{index}].{field}")

        for field in ("first_name", "last_name", "nationality", "position"):
            normalized[field] = cls._optional_string(record.get(field), f"record[{index}].{field}")

        if record.get("role") is not None:
            normalized["role"] = cls._require_string(record.get("role"), f"record[{index}].role")

        birth_date_raw = record.get("birth_date")
        if birth_date_raw in (None, ""):
            normalized["birth_date"] = None
        else:
            birth_date = parse_date(str(birth_date_raw))
            if not birth_date:
                raise SnapshotValidationError(f"record[{index}].birth_date must be YYYY-MM-DD")
            normalized["birth_date"] = birth_date.isoformat()

        normalized["height_cm"] = cls._optional_non_negative_int(record.get("height_cm"), f"record[{index}].height_cm")
        normalized["weight_kg"] = cls._optional_non_negative_int(record.get("weight_kg"), f"record[{index}].weight_kg")
        normalized["games_played"] = cls._require_non_negative_int(record.get("games_played"), f"record[{index}].games_played")

        for field in (
@@ -72,10 +72,14 @@ INSTALLED_APPS = [
    "apps.teams",
    "apps.stats",
    "apps.scouting",
    "apps.ingestion",
]

# v2 default runtime is snapshot-first. Legacy provider stack is opt-in.
LEGACY_PROVIDER_STACK_ENABLED = env_bool("LEGACY_PROVIDER_STACK_ENABLED", False)
if LEGACY_PROVIDER_STACK_ENABLED:
    INSTALLED_APPS.append("apps.providers")

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
@@ -195,6 +199,7 @@ SCHEDULER_INTERVAL_SECONDS = int(os.getenv("SCHEDULER_INTERVAL_SECONDS", "900"))
if SCHEDULER_INTERVAL_SECONDS < 30:
    raise ImproperlyConfigured("SCHEDULER_INTERVAL_SECONDS must be >= 30.")

if LEGACY_PROVIDER_STACK_ENABLED:
    PROVIDER_BACKEND = os.getenv("PROVIDER_BACKEND", "demo").strip().lower()
    PROVIDER_NAMESPACE_DEMO = os.getenv("PROVIDER_NAMESPACE_DEMO", "mvp_demo")
    PROVIDER_NAMESPACE_BALLDONTLIE = os.getenv("PROVIDER_NAMESPACE_BALLDONTLIE", "balldontlie")
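`env_bool` is referenced by the settings hunk but not shown in this diff; a plausible helper looks like the following (an assumption about the project's implementation, not a verbatim copy):

```python
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse a boolean env var; accepts common truthy spellings."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

With this shape, `LEGACY_PROVIDER_STACK_ENABLED` unset or `0` keeps the legacy stack off, which matches the `.env.example` default.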
@@ -1,4 +1,5 @@
from django.contrib import admin
from django.conf import settings
from django.urls import include, path

urlpatterns = [
@@ -11,6 +12,8 @@ urlpatterns = [
    path("teams/", include("apps.teams.urls")),
    path("stats/", include("apps.stats.urls")),
    path("scouting/", include("apps.scouting.urls")),
    path("ingestion/", include("apps.ingestion.urls")),
]

if settings.LEGACY_PROVIDER_STACK_ENABLED:
    urlpatterns.append(path("providers/", include("apps.providers.urls")))
58
docs/runtime-consistency-checklist.md
Normal file
@@ -0,0 +1,58 @@
# Runtime Consistency Checklist (v2)

Use this checklist when runtime/docs changes are made.

## Compose and Runtime

- `docker-compose.yml` contains only v2 default runtime services:
  - `web`, `nginx`, `postgres`
  - optional `scheduler` profile service
- `docker-compose.dev.yml` is mutable (source bind mounts allowed for dev only).
- `docker-compose.release.yml` is settings-focused and keeps release runtime immutable.

## Image/Registry Strategy

- `web` image: `registry.younerd.org/hoopscout/web:${APP_IMAGE_TAG:-latest}`
- `nginx` image: `registry.younerd.org/hoopscout/nginx:${NGINX_IMAGE_TAG:-latest}`
- optional scheduler image: `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`

## Entrypoints

- `entrypoint.sh`:
  - waits for PostgreSQL
  - creates snapshot directories
  - optionally runs `migrate` and `collectstatic` when booting gunicorn
- `scripts/scheduler.sh`:
  - runs `run_daily_orchestration` loop
  - idle-sleeps when `SCHEDULER_ENABLED=0`

## Snapshot Lifecycle

1. Extractor writes snapshots to `incoming`.
2. `import_snapshots` validates + upserts into PostgreSQL.
3. Success => file moved to `archive`.
4. Failure => file moved to `failed`.

## Source Identity Rule

Raw IDs are not global. Imported identities are namespaced by source:

- `Competition`: `(source_name, source_uid)`
- `Team`: `(source_name, source_uid)`
- `Player`: `(source_name, source_uid)`

## Legacy Isolation

- `LEGACY_PROVIDER_STACK_ENABLED=0` by default.
- With default setting:
  - `apps.providers` is not installed
  - `/providers/` routes are not mounted
  - legacy provider settings are not required

## Verification Commands

```bash
docker compose -f docker-compose.yml -f docker-compose.release.yml config
./scripts/verify_release_topology.sh
docker compose -f docker-compose.yml -f docker-compose.dev.yml run --rm web sh -lc "export PYTHONUSERBASE=/tmp/pyuser && python -m pip install --user -r requirements/dev.txt && python -m pytest -q"
```
@@ -30,7 +30,6 @@ check_service_bind_mount() {
}

check_service_bind_mount "web"
check_service_bind_mount "scheduler"

echo "Release topology verification passed."
tests/fixtures/bcl/bcl_players_stats_partial_public.json (vendored, new file, 25 lines)
@@ -0,0 +1,25 @@
{
  "data": [
    {
      "player": {
        "id": "bcl-player-99",
        "name": "Alex Novak"
      },
      "team": {
        "id": "bcl-team-tenerife",
        "name": "Lenovo Tenerife"
      },
      "gp": 10,
      "mpg": 27.2,
      "ppg": 14.8,
      "rpg": 4.1,
      "apg": 3.3,
      "spg": 1.2,
      "bpg": 0.4,
      "tov": 2.0,
      "fg_pct": 47.3,
      "three_pct": 38.0,
      "ft_pct": 79.1
    }
  ]
}
tests/fixtures/lba/lba_players_stats_partial_public.json (vendored, new file, 25 lines)
@@ -0,0 +1,25 @@
{
  "data": [
    {
      "player": {
        "id": "p-002",
        "name": "Andrea Bianchi"
      },
      "team": {
        "id": "team-olimpia-milano",
        "name": "Olimpia Milano"
      },
      "gp": 18,
      "mpg": 24.7,
      "ppg": 12.3,
      "rpg": 2.9,
      "apg": 4.2,
      "spg": 1.1,
      "bpg": 0.1,
      "tov": 1.8,
      "fg_pct": 45.0,
      "three_pct": 35.4,
      "ft_pct": 82.7
    }
  ]
}
@@ -30,6 +30,12 @@ def test_players_api_list_and_detail(client):
    list_response = client.get(reverse("api:players"), data={"q": "rossi"})
    assert list_response.status_code == 200
    assert list_response.json()["count"] == 1
    list_payload = list_response.json()
    assert "sort" in list_payload
    assert "metric_semantics" in list_payload
    assert "metric_sort_keys" in list_payload
    assert "ppg_value" in list_payload["results"][0]
    assert "mpg_value" in list_payload["results"][0]

    detail_response = client.get(reverse("api:player_detail", kwargs={"pk": player.pk}))
    assert detail_response.status_code == 200
@@ -173,8 +179,33 @@ def test_players_api_metric_sort_uses_best_eligible_values(client):

    response = client.get(reverse("api:players"), data={"sort": "ppg_desc"})
    assert response.status_code == 200
    names = [row["full_name"] for row in response.json()["results"]]
    payload = response.json()
    names = [row["full_name"] for row in payload["results"]]
    assert names.index("Dan High") < names.index("Ion Low")
    assert payload["sort"] == "ppg_desc"
    assert "best eligible values per player" in payload["metric_semantics"]
    dan = next(row for row in payload["results"] if row["full_name"] == "Dan High")
    ion = next(row for row in payload["results"] if row["full_name"] == "Ion Low")
    assert float(dan["ppg_value"]) > float(ion["ppg_value"])


@pytest.mark.django_db
def test_players_api_metric_fields_are_exposed_and_nullable(client):
    nationality = Nationality.objects.create(name="Sweden", iso2_code="SE", iso3_code="SWE")
    Player.objects.create(
        first_name="No",
        last_name="Stats",
        full_name="No Stats",
        birth_date=date(2002, 1, 1),
        nationality=nationality,
    )

    response = client.get(reverse("api:players"), data={"sort": "name_asc"})
    assert response.status_code == 200
    payload = response.json()
    row = next(item for item in payload["results"] if item["full_name"] == "No Stats")
    assert row["ppg_value"] is None
    assert row["mpg_value"] is None


@pytest.mark.django_db
@@ -8,6 +8,7 @@ import pytest
from django.core.management import call_command

from apps.ingestion.extractors.bcl import BCLSnapshotExtractor
from apps.ingestion.extractors.base import ExtractorNormalizationError
from apps.ingestion.extractors.registry import create_extractor


@@ -51,6 +52,56 @@ def test_bcl_extractor_normalizes_fixture_payload(tmp_path, settings):
    assert row["three_pt_pct"] == 37.2


@pytest.mark.django_db
def test_bcl_extractor_accepts_partial_public_player_bio_fields(tmp_path, settings):
    settings.EXTRACTOR_BCL_STATS_URL = "https://www.championsleague.basketball/public/stats.json"
    settings.EXTRACTOR_BCL_SEASON_LABEL = "2025-2026"
    settings.EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID = "bcl"
    settings.EXTRACTOR_BCL_COMPETITION_NAME = "Basketball Champions League"

    fixture_payload = _load_fixture("bcl/bcl_players_stats_partial_public.json")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return fixture_payload

    extractor = BCLSnapshotExtractor(http_client=FakeClient())
    output_path = tmp_path / "bcl-partial.json"
    result = extractor.run(output_path=output_path, snapshot_date=date(2026, 3, 13))

    assert result.records_count == 1
    payload = json.loads(output_path.read_text(encoding="utf-8"))
    row = payload["records"][0]
    assert row["full_name"] == "Alex Novak"
    assert row["first_name"] is None
    assert row["last_name"] is None
    assert row["birth_date"] is None
    assert row["nationality"] is None
    assert row["height_cm"] is None
    assert row["weight_kg"] is None
    assert row["position"] is None
    assert row["games_played"] == 10


@pytest.mark.django_db
def test_bcl_extractor_still_fails_when_required_stats_are_missing(settings):
    settings.EXTRACTOR_BCL_STATS_URL = "https://www.championsleague.basketball/public/stats.json"
    settings.EXTRACTOR_BCL_SEASON_LABEL = "2025-2026"
    settings.EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID = "bcl"
    settings.EXTRACTOR_BCL_COMPETITION_NAME = "Basketball Champions League"

    fixture_payload = _load_fixture("bcl/bcl_players_stats_partial_public.json")
    fixture_payload["data"][0].pop("ppg")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return fixture_payload

    extractor = BCLSnapshotExtractor(http_client=FakeClient())
    with pytest.raises(ExtractorNormalizationError):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
def test_bcl_extractor_registry_selection(settings):
    settings.EXTRACTOR_BCL_STATS_URL = "https://www.championsleague.basketball/public/stats.json"
@@ -1,38 +0,0 @@
import os
import subprocess
import sys

import pytest


def _run_python_import(code: str, env_overrides: dict[str, str]) -> subprocess.CompletedProcess:
    env = os.environ.copy()
    env.update(env_overrides)
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        env=env,
        check=False,
    )


@pytest.mark.django_db
def test_invalid_cron_does_not_crash_config_import_path():
    result = _run_python_import(
        (
            "import config; "
            "from config.celery import app; "
            "print(f'beat_schedule_size={len(app.conf.beat_schedule or {})}')"
        ),
        {
            "DJANGO_SETTINGS_MODULE": "config.settings.development",
            "DJANGO_ENV": "development",
            "DJANGO_DEBUG": "1",
            "INGESTION_SCHEDULE_ENABLED": "1",
            "INGESTION_SCHEDULE_CRON": "bad cron value",
        },
    )

    assert result.returncode == 0
    assert "beat_schedule_size=0" in result.stdout
@@ -7,8 +7,10 @@ import pytest
from django.core.management import call_command

from apps.ingestion.extractors.base import BaseSnapshotExtractor
from apps.ingestion.extractors.base import ExtractorNormalizationError
from apps.ingestion.extractors.http import ResponsibleHttpClient
from apps.ingestion.extractors.public_json import PublicJsonSnapshotExtractor
from apps.ingestion.snapshots.schema import REQUIRED_RECORD_FIELDS


class DummyExtractor(BaseSnapshotExtractor):
@@ -64,6 +66,29 @@ class _FakeResponse:
        return self._payload


def _minimal_public_json_record() -> dict:
    return {
        "competition_external_id": "comp-1",
        "competition_name": "League One",
        "season": "2025-2026",
        "team_external_id": "team-1",
        "team_name": "Team One",
        "player_external_id": "player-1",
        "full_name": "Jane Doe",
        "games_played": 12,
        "minutes_per_game": 27.2,
        "points_per_game": 13.0,
        "rebounds_per_game": 4.4,
        "assists_per_game": 3.1,
        "steals_per_game": 1.0,
        "blocks_per_game": 0.3,
        "turnovers_per_game": 1.8,
        "fg_pct": 46.2,
        "three_pt_pct": 35.5,
        "ft_pct": 82.1,
    }


@pytest.mark.django_db
def test_base_extractor_run_writes_snapshot_file(tmp_path, settings):
    settings.STATIC_DATASET_INCOMING_DIR = str(tmp_path / "incoming")
@@ -135,6 +160,71 @@ def test_public_json_extractor_normalizes_common_field_aliases(tmp_path):
    assert row["three_pt_pct"] == 36.1


@pytest.mark.django_db
def test_public_json_extractor_accepts_missing_optional_bio_and_physical_fields(tmp_path):
    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return {"records": [_minimal_public_json_record()]}

    extractor = PublicJsonSnapshotExtractor(
        url="https://example.com/public-feed.json",
        source_name="test_public_feed",
        http_client=FakeClient(),
    )
    output_file = tmp_path / "public-optional.json"
    result = extractor.run(output_path=output_file, snapshot_date=date(2026, 3, 13))

    assert result.records_count == 1
    payload = json.loads(output_file.read_text(encoding="utf-8"))
    row = payload["records"][0]
    assert row["full_name"] == "Jane Doe"
    assert row["first_name"] is None
    assert row["last_name"] is None
    assert row["birth_date"] is None
    assert row["nationality"] is None
    assert row["height_cm"] is None
    assert row["weight_kg"] is None
    assert row["position"] is None
    assert row.get("role") is None


@pytest.mark.django_db
def test_public_json_extractor_fails_when_required_stat_missing():
    broken = _minimal_public_json_record()
    broken.pop("points_per_game")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return {"records": [broken]}

    extractor = PublicJsonSnapshotExtractor(
        url="https://example.com/public-feed.json",
        source_name="test_public_feed",
        http_client=FakeClient(),
    )
    with pytest.raises(ExtractorNormalizationError):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
@pytest.mark.parametrize("required_field", sorted(REQUIRED_RECORD_FIELDS))
def test_public_json_required_fields_follow_snapshot_schema(required_field):
    broken = _minimal_public_json_record()
    broken.pop(required_field)

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return {"records": [broken]}

    extractor = PublicJsonSnapshotExtractor(
        url="https://example.com/public-feed.json",
        source_name="test_public_feed",
        http_client=FakeClient(),
    )
    with pytest.raises(ExtractorNormalizationError, match="missing required fields"):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
def test_run_extractor_management_command_writes_snapshot(tmp_path, settings):
    settings.EXTRACTOR_PUBLIC_JSON_URL = "https://example.com/feed.json"
@@ -103,6 +103,116 @@ def test_valid_snapshot_import(tmp_path, settings):
    assert PlayerSeasonStats.objects.count() == 1


@pytest.mark.django_db
def test_snapshot_import_succeeds_with_optional_bio_and_physical_fields_missing(tmp_path, settings):
    incoming = tmp_path / "incoming"
    archive = tmp_path / "archive"
    failed = tmp_path / "failed"
    incoming.mkdir()
    archive.mkdir()
    failed.mkdir()

    payload = _valid_payload()
    for optional_field in ("first_name", "last_name", "birth_date", "nationality", "height_cm", "weight_kg", "position", "role"):
        payload["records"][0].pop(optional_field, None)

    file_path = incoming / "optional-missing.json"
    _write_json(file_path, payload)

    settings.STATIC_DATASET_INCOMING_DIR = str(incoming)
    settings.STATIC_DATASET_ARCHIVE_DIR = str(archive)
    settings.STATIC_DATASET_FAILED_DIR = str(failed)

    call_command("import_snapshots")

    run = ImportRun.objects.get()
    assert run.status == ImportRun.RunStatus.SUCCESS
    player = Player.objects.get(source_uid="player-23")
    assert player.first_name == "LeBron"
    assert player.last_name == "James"
    assert player.birth_date is None
    assert player.nationality is None
    assert player.nominal_position is None
    assert player.height_cm is None
    assert player.weight_kg is None
    assert PlayerSeasonStats.objects.count() == 1


@pytest.mark.django_db
def test_snapshot_import_preserves_single_name_part_without_forced_split(tmp_path, settings):
    incoming = tmp_path / "incoming"
    archive = tmp_path / "archive"
    failed = tmp_path / "failed"
    incoming.mkdir()
    archive.mkdir()
    failed.mkdir()

    payload = _valid_payload()
    row = payload["records"][0]
    row["first_name"] = "LeBron"
    row.pop("last_name")

    file_path = incoming / "single-name-part.json"
    _write_json(file_path, payload)

    settings.STATIC_DATASET_INCOMING_DIR = str(incoming)
    settings.STATIC_DATASET_ARCHIVE_DIR = str(archive)
    settings.STATIC_DATASET_FAILED_DIR = str(failed)

    call_command("import_snapshots")

    run = ImportRun.objects.get()
    assert run.status == ImportRun.RunStatus.SUCCESS
    player = Player.objects.get(source_uid="player-23")
    assert player.first_name == "LeBron"
    assert player.last_name == ""


@pytest.mark.django_db
@pytest.mark.parametrize(
    ("source_name", "competition_id", "competition_name"),
    [
        ("lba", "lba-serie-a", "Lega Basket Serie A"),
        ("bcl", "bcl", "Basketball Champions League"),
    ],
)
def test_partial_public_source_snapshot_imports_for_lba_and_bcl(
    tmp_path,
    settings,
    source_name,
    competition_id,
    competition_name,
):
    incoming = tmp_path / "incoming"
    archive = tmp_path / "archive"
    failed = tmp_path / "failed"
    incoming.mkdir()
    archive.mkdir()
    failed.mkdir()

    payload = _valid_payload()
    payload["source_name"] = source_name
    row = payload["records"][0]
    row["competition_external_id"] = competition_id
    row["competition_name"] = competition_name
    for optional_field in ("first_name", "last_name", "birth_date", "nationality", "height_cm", "weight_kg", "position", "role"):
        row.pop(optional_field, None)

    _write_json(incoming / f"{source_name}.json", payload)

    settings.STATIC_DATASET_INCOMING_DIR = str(incoming)
    settings.STATIC_DATASET_ARCHIVE_DIR = str(archive)
    settings.STATIC_DATASET_FAILED_DIR = str(failed)

    call_command("import_snapshots")

    run = ImportRun.objects.get()
    assert run.status == ImportRun.RunStatus.SUCCESS
    assert Competition.objects.filter(source_uid=competition_id, name=competition_name).exists()
    assert Player.objects.filter(source_uid="player-23").exists()
    assert PlayerSeasonStats.objects.count() == 1


@pytest.mark.django_db
def test_invalid_snapshot_rejected_and_moved_to_failed(tmp_path, settings):
    incoming = tmp_path / "incoming"
@@ -1,251 +0,0 @@
import os

import pytest

from apps.competitions.models import Competition, Season
from apps.ingestion.models import IngestionError, IngestionRun
from apps.ingestion.services.sync import run_sync_job
from apps.players.models import Nationality, Player
from apps.providers.exceptions import ProviderRateLimitError
from apps.providers.models import ExternalMapping
from apps.stats.models import PlayerSeason, PlayerSeasonStats
from apps.teams.models import Team


@pytest.mark.django_db
def test_run_full_sync_creates_domain_objects(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"

    run = run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)

    assert run.status == IngestionRun.RunStatus.SUCCESS
    assert Competition.objects.count() >= 1
    assert Team.objects.count() >= 1
    assert Season.objects.count() >= 1
    assert Player.objects.count() >= 1
    assert PlayerSeason.objects.count() >= 1
    assert PlayerSeasonStats.objects.count() >= 1
    assert Player.objects.filter(origin_competition__isnull=False).exists()
    assert run.context.get("completed_steps") == [
        "competitions",
        "teams",
        "seasons",
        "players",
        "player_stats",
        "player_careers",
    ]
    assert run.context.get("source_counts", {}).get("players", 0) >= 1


@pytest.mark.django_db
def test_full_sync_is_idempotent(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"

    run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)
    counts_after_first = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
    }

    run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)
    counts_after_second = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
    }

    assert counts_after_first == counts_after_second


@pytest.mark.django_db
def test_incremental_sync_runs_successfully(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"

    run = run_sync_job(
        provider_namespace="mvp_demo",
        job_type=IngestionRun.JobType.INCREMENTAL,
        cursor="demo-cursor",
    )

    assert run.status == IngestionRun.RunStatus.SUCCESS
    assert run.records_processed > 0
    assert run.started_at is not None
    assert run.finished_at is not None
    assert run.finished_at >= run.started_at
    assert run.error_summary == ""


@pytest.mark.django_db
def test_run_sync_handles_rate_limit(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    os.environ["PROVIDER_MVP_FORCE_RATE_LIMIT"] = "1"

    with pytest.raises(ProviderRateLimitError):
        run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)

    run = IngestionRun.objects.order_by("-id").first()
    assert run is not None
    assert run.status == IngestionRun.RunStatus.FAILED
    assert run.started_at is not None
    assert run.finished_at is not None
    assert "Rate limit" in run.error_summary
    assert IngestionError.objects.filter(ingestion_run=run).exists()

    os.environ.pop("PROVIDER_MVP_FORCE_RATE_LIMIT", None)


@pytest.mark.django_db
def test_balldontlie_sync_idempotency_with_stable_payload(monkeypatch):
    class StableProvider:
        def sync_all(self):
            return {
                "competitions": [
                    {
                        "external_id": "competition-nba",
                        "name": "NBA",
                        "slug": "nba",
                        "competition_type": "league",
                        "gender": "men",
                        "level": 1,
                        "country": None,
                        "is_active": True,
                    }
                ],
                "teams": [
                    {
                        "external_id": "team-14",
                        "name": "Los Angeles Lakers",
                        "short_name": "LAL",
                        "slug": "los-angeles-lakers",
                        "country": None,
                        "is_national_team": False,
                    }
                ],
                "seasons": [
                    {
                        "external_id": "season-2024",
                        "label": "2024-2025",
                        "start_date": "2024-10-01",
                        "end_date": "2025-06-30",
                        "is_current": False,
                    }
                ],
                "players": [
                    {
                        "external_id": "player-237",
                        "first_name": "LeBron",
                        "last_name": "James",
                        "full_name": "LeBron James",
                        "birth_date": None,
                        "nationality": None,
                        "nominal_position": {"code": "SF", "name": "Small Forward"},
                        "inferred_role": {"code": "wing", "name": "Wing"},
                        "height_cm": None,
                        "weight_kg": None,
                        "dominant_hand": "unknown",
                        "is_active": True,
                        "aliases": [],
                    }
                ],
                "player_stats": [
                    {
                        "external_id": "ps-2024-237-14",
                        "player_external_id": "player-237",
                        "team_external_id": "team-14",
                        "competition_external_id": "competition-nba",
                        "season_external_id": "season-2024",
                        "games_played": 2,
                        "games_started": 0,
                        "minutes_played": 68,
                        "points": 25,
                        "rebounds": 9,
                        "assists": 8,
                        "steals": 1.5,
                        "blocks": 0.5,
                        "turnovers": 3.5,
                        "fg_pct": 55.0,
                        "three_pct": 45.0,
                        "ft_pct": 95.0,
                        "usage_rate": None,
                        "true_shooting_pct": None,
                        "player_efficiency_rating": None,
                    }
                ],
                "player_careers": [
                    {
                        "external_id": "career-2024-237-14",
                        "player_external_id": "player-237",
                        "team_external_id": "team-14",
                        "competition_external_id": "competition-nba",
                        "season_external_id": "season-2024",
                        "role_code": "",
                        "shirt_number": None,
                        "start_date": "2024-10-01",
                        "end_date": "2025-06-30",
                        "notes": "Imported from balldontlie aggregated box scores",
                    }
                ],
            }

        def sync_incremental(self, *, cursor: str | None = None):
            payload = self.sync_all()
            payload["cursor"] = cursor
            return payload

    monkeypatch.setattr("apps.ingestion.services.sync.get_provider", lambda namespace: StableProvider())

    run_sync_job(provider_namespace="balldontlie", job_type=IngestionRun.JobType.FULL_SYNC)
    lebron = Player.objects.get(full_name="LeBron James")
    assert lebron.nationality is None
    assert not Nationality.objects.filter(iso2_code="ZZ").exists()

    counts_first = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
        "mapping": ExternalMapping.objects.filter(provider_namespace="balldontlie").count(),
    }

    run_sync_job(provider_namespace="balldontlie", job_type=IngestionRun.JobType.FULL_SYNC)
    counts_second = {
        "competition": Competition.objects.count(),
        "team": Team.objects.count(),
        "season": Season.objects.count(),
        "player": Player.objects.count(),
        "player_season": PlayerSeason.objects.count(),
        "player_stats": PlayerSeasonStats.objects.count(),
        "mapping": ExternalMapping.objects.filter(provider_namespace="balldontlie").count(),
    }

    assert counts_first == counts_second


@pytest.mark.django_db
def test_batch_transactions_preserve_prior_step_progress_on_failure(settings, monkeypatch):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"

    def boom(*args, **kwargs):
        raise RuntimeError("teams-sync-failed")

    monkeypatch.setattr("apps.ingestion.services.sync._sync_teams", boom)

    with pytest.raises(RuntimeError):
        run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)

    run = IngestionRun.objects.order_by("-id").first()
    assert run is not None
    assert run.status == IngestionRun.RunStatus.FAILED
    assert Competition.objects.exists()
    assert Team.objects.count() == 0
    assert run.context.get("completed_steps") == ["competitions"]
    assert "Unhandled ingestion error" in run.error_summary
@@ -1,112 +0,0 @@
import pytest
from contextlib import contextmanager
from celery.schedules import crontab
import psycopg
from django.conf import settings

from apps.ingestion.models import IngestionRun
from apps.ingestion.services.runs import _build_ingestion_lock_key, release_ingestion_lock, try_acquire_ingestion_lock
from apps.ingestion.tasks import scheduled_provider_sync, trigger_incremental_sync
from config.celery import app as celery_app, build_periodic_schedule


@pytest.mark.django_db
def test_periodic_task_registered():
    assert "apps.ingestion.tasks.scheduled_provider_sync" in celery_app.tasks


@pytest.mark.django_db
def test_build_periodic_schedule_enabled(settings):
    settings.INGESTION_SCHEDULE_ENABLED = True
    settings.INGESTION_SCHEDULE_CRON = "15 * * * *"

    schedule = build_periodic_schedule()
    assert "ingestion.scheduled_provider_sync" in schedule
    entry = schedule["ingestion.scheduled_provider_sync"]
    assert entry["task"] == "apps.ingestion.tasks.scheduled_provider_sync"
    assert isinstance(entry["schedule"], crontab)
    assert entry["schedule"]._orig_minute == "15"


@pytest.mark.django_db
def test_build_periodic_schedule_disabled(settings):
    settings.INGESTION_SCHEDULE_ENABLED = False
    assert build_periodic_schedule() == {}


@pytest.mark.django_db
def test_build_periodic_schedule_invalid_cron_disables_task_and_logs(settings, caplog):
    settings.INGESTION_SCHEDULE_ENABLED = True
    settings.INGESTION_SCHEDULE_CRON = "invalid-cron"

    with caplog.at_level("ERROR"):
        schedule = build_periodic_schedule()

    assert schedule == {}
    assert any("Invalid periodic ingestion schedule config. Task disabled." in message for message in caplog.messages)


@pytest.mark.django_db
def test_trigger_incremental_sync_skips_when_advisory_lock_not_acquired(settings, monkeypatch):
    settings.INGESTION_PREVENT_OVERLAP = True

    @contextmanager
    def fake_lock(**kwargs):
        yield False

    monkeypatch.setattr("apps.ingestion.tasks.ingestion_advisory_lock", fake_lock)
    run_id = trigger_incremental_sync.apply(
        kwargs={"provider_namespace": "mvp_demo"},
    ).get()
    skipped_run = IngestionRun.objects.get(id=run_id)
    assert skipped_run.status == IngestionRun.RunStatus.CANCELED
    assert "advisory lock" in skipped_run.error_summary


@pytest.mark.django_db
def test_advisory_lock_prevents_concurrent_acquisition():
    provider_namespace = "mvp_demo"
    job_type = IngestionRun.JobType.INCREMENTAL
    lock_key = _build_ingestion_lock_key(provider_namespace=provider_namespace, job_type=job_type)

    conninfo = (
        f"dbname={settings.DATABASES['default']['NAME']} "
        f"user={settings.DATABASES['default']['USER']} "
        f"password={settings.DATABASES['default']['PASSWORD']} "
        f"host={settings.DATABASES['default']['HOST']} "
        f"port={settings.DATABASES['default']['PORT']}"
    )
    with psycopg.connect(conninfo) as external_conn:
        with external_conn.cursor() as cursor:
            cursor.execute("SELECT pg_advisory_lock(%s);", [lock_key])
            acquired, _ = try_acquire_ingestion_lock(
                provider_namespace=provider_namespace,
                job_type=job_type,
            )
            assert acquired is False
            cursor.execute("SELECT pg_advisory_unlock(%s);", [lock_key])

    acquired, django_key = try_acquire_ingestion_lock(
        provider_namespace=provider_namespace,
        job_type=job_type,
    )
    assert acquired is True
    release_ingestion_lock(lock_key=django_key)


@pytest.mark.django_db
def test_scheduled_provider_sync_uses_configured_job_type(settings, monkeypatch):
    settings.INGESTION_SCHEDULE_JOB_TYPE = IngestionRun.JobType.FULL_SYNC
    settings.INGESTION_SCHEDULE_PROVIDER_NAMESPACE = "mvp_demo"
    captured = {}

    def fake_runner(**kwargs):
        captured.update(kwargs)
        return 99

    monkeypatch.setattr("apps.ingestion.tasks._run_sync_with_overlap_guard", fake_runner)

    result = scheduled_provider_sync.apply().get()
    assert result == 99
    assert captured["provider_namespace"] == "mvp_demo"
    assert captured["job_type"] == IngestionRun.JobType.FULL_SYNC
@@ -4,8 +4,6 @@ import pytest
from django.contrib.auth.models import User
from django.urls import reverse

from apps.ingestion.models import IngestionRun
from apps.ingestion.services.sync import run_sync_job
from apps.players.models import Nationality, Player, Position, Role
from apps.scouting.models import SavedSearch

@@ -49,25 +47,3 @@ def test_saved_search_run_filters_player_results(client):
    assert response.status_code == 200
    assert "Marco Rossi" in response.content.decode()
    assert "Luca Bianchi" not in response.content.decode()


@pytest.mark.django_db
def test_ingestion_output_is_searchable_in_ui_and_api(settings, client):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    run = run_sync_job(provider_namespace="mvp_demo", job_type=IngestionRun.JobType.FULL_SYNC)
    assert run.status == IngestionRun.RunStatus.SUCCESS

    player = Player.objects.filter(origin_competition__isnull=False).order_by("id").first()
    assert player is not None
    assert player.origin_competition_id is not None

    params = {"origin_competition": player.origin_competition_id}
    ui_response = client.get(reverse("players:index"), data=params)
    api_response = client.get(reverse("api:players"), data=params)

    assert ui_response.status_code == 200
    assert api_response.status_code == 200
    ui_ids = {item.id for item in ui_response.context["players"]}
    api_ids = {item["id"] for item in api_response.json()["results"]}
    assert player.id in ui_ids
    assert player.id in api_ids
@@ -8,6 +8,7 @@ import pytest
from django.core.management import call_command

from apps.ingestion.extractors.lba import LBASnapshotExtractor
from apps.ingestion.extractors.base import ExtractorNormalizationError
from apps.ingestion.extractors.registry import create_extractor

@@ -51,6 +52,56 @@ def test_lba_extractor_normalizes_fixture_payload(tmp_path, settings):
    assert row["three_pt_pct"] == 36.5


@pytest.mark.django_db
def test_lba_extractor_accepts_partial_public_player_bio_fields(tmp_path, settings):
    settings.EXTRACTOR_LBA_STATS_URL = "https://www.legabasket.it/public/stats.json"
    settings.EXTRACTOR_LBA_SEASON_LABEL = "2025-2026"
    settings.EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID = "lba-serie-a"
    settings.EXTRACTOR_LBA_COMPETITION_NAME = "Lega Basket Serie A"

    fixture_payload = _load_fixture("lba/lba_players_stats_partial_public.json")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return fixture_payload

    extractor = LBASnapshotExtractor(http_client=FakeClient())
    output_path = tmp_path / "lba-partial.json"
    result = extractor.run(output_path=output_path, snapshot_date=date(2026, 3, 13))

    assert result.records_count == 1
    payload = json.loads(output_path.read_text(encoding="utf-8"))
    row = payload["records"][0]
    assert row["full_name"] == "Andrea Bianchi"
    assert row["first_name"] is None
    assert row["last_name"] is None
    assert row["birth_date"] is None
    assert row["nationality"] is None
    assert row["height_cm"] is None
    assert row["weight_kg"] is None
    assert row["position"] is None
    assert row["games_played"] == 18


@pytest.mark.django_db
def test_lba_extractor_still_fails_when_required_stats_are_missing(settings):
    settings.EXTRACTOR_LBA_STATS_URL = "https://www.legabasket.it/public/stats.json"
    settings.EXTRACTOR_LBA_SEASON_LABEL = "2025-2026"
    settings.EXTRACTOR_LBA_COMPETITION_EXTERNAL_ID = "lba-serie-a"
    settings.EXTRACTOR_LBA_COMPETITION_NAME = "Lega Basket Serie A"

    fixture_payload = _load_fixture("lba/lba_players_stats_partial_public.json")
    fixture_payload["data"][0].pop("ppg")

    class FakeClient:
        def get_json(self, *_args, **_kwargs):
            return fixture_payload

    extractor = LBASnapshotExtractor(http_client=FakeClient())
    with pytest.raises(ExtractorNormalizationError):
        extractor.run(write_output=False, snapshot_date=date(2026, 3, 13))


@pytest.mark.django_db
def test_lba_extractor_registry_selection(settings):
    settings.EXTRACTOR_LBA_STATS_URL = "https://www.legabasket.it/public/stats.json"

@@ -4,7 +4,7 @@ import pytest
from django.urls import reverse

from apps.competitions.models import Competition, Season
from apps.players.models import Nationality, Player, Position, Role
from apps.players.models import Nationality, Player, PlayerAlias, Position, Role
from apps.stats.models import PlayerSeason, PlayerSeasonStats
from apps.teams.models import Team

@@ -1,77 +0,0 @@
import os

import pytest

from apps.providers.adapters.mvp_provider import MvpDemoProviderAdapter
from apps.providers.exceptions import ProviderNotFoundError, ProviderRateLimitError
from apps.providers.registry import get_provider


@pytest.mark.django_db
def test_mvp_provider_fetch_and_search_players():
    adapter = MvpDemoProviderAdapter()

    players = adapter.fetch_players()
    assert len(players) >= 2

    results = adapter.search_players(query="luca")
    assert any("Luca" in item["full_name"] for item in results)

    detail = adapter.fetch_player(external_player_id="player-001")
    assert detail is not None
    assert detail["full_name"] == "Luca Rinaldi"


@pytest.mark.django_db
def test_mvp_provider_rate_limit_signal():
    os.environ["PROVIDER_MVP_FORCE_RATE_LIMIT"] = "1"
    adapter = MvpDemoProviderAdapter()

    with pytest.raises(ProviderRateLimitError):
        adapter.fetch_players()

    os.environ.pop("PROVIDER_MVP_FORCE_RATE_LIMIT", None)


@pytest.mark.django_db
def test_provider_registry_resolution(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    provider = get_provider()
    assert isinstance(provider, MvpDemoProviderAdapter)

    with pytest.raises(ProviderNotFoundError):
        get_provider("does-not-exist")


@pytest.mark.django_db
def test_demo_provider_sync_payload_uses_normalized_shape():
    adapter = MvpDemoProviderAdapter()
    payload = adapter.sync_all()

    assert set(payload.keys()) == {
        "players",
        "competitions",
        "teams",
        "seasons",
        "player_stats",
        "player_careers",
        "cursor",
    }
    assert payload["cursor"] is None

    player = payload["players"][0]
    assert set(player.keys()) == {
        "external_id",
        "first_name",
        "last_name",
        "full_name",
        "birth_date",
        "nationality",
        "nominal_position",
        "inferred_role",
        "height_cm",
        "weight_kg",
        "dominant_hand",
        "is_active",
        "aliases",
    }
@@ -1,263 +0,0 @@
from __future__ import annotations

import time
from typing import Any

import pytest
import requests

from apps.providers.adapters.balldontlie_provider import BalldontlieProviderAdapter
from apps.providers.adapters.mvp_provider import MvpDemoProviderAdapter
from apps.providers.clients.balldontlie import BalldontlieClient
from apps.providers.exceptions import ProviderRateLimitError, ProviderTransientError, ProviderUnauthorizedError
from apps.providers.registry import get_default_provider_namespace, get_provider
from apps.providers.services.balldontlie_mappings import map_seasons


class _FakeResponse:
    def __init__(self, *, status_code: int, payload: dict[str, Any] | None = None, headers: dict[str, str] | None = None, text: str = ""):
        self.status_code = status_code
        self._payload = payload or {}
        self.headers = headers or {}
        self.text = text

    def json(self):
        return self._payload


class _FakeSession:
    def __init__(self, responses: list[Any]):
        self._responses = responses
        self.calls: list[dict[str, Any]] = []

    def get(self, *args, **kwargs):
        self.calls.append(kwargs)
        item = self._responses.pop(0)
        if isinstance(item, Exception):
            raise item
        return item


class _FakeBalldontlieClient:
    def get_json(self, path: str, *, params: dict[str, Any] | None = None) -> dict[str, Any]:
        if path == "/nba/v1/teams":
            return {
                "data": [
                    {
                        "id": 14,
                        "full_name": "Los Angeles Lakers",
                        "abbreviation": "LAL",
                    }
                ]
            }
        return {"data": []}

    def list_paginated(
        self,
        path: str,
        *,
        params: dict[str, Any] | None = None,
        per_page: int = 100,
        page_limit: int = 1,
    ) -> list[dict[str, Any]]:
        if path == "/nba/v1/players":
            return [
                {
                    "id": 237,
                    "first_name": "LeBron",
                    "last_name": "James",
                    "position": "F",
                    "team": {"id": 14},
                }
            ]
        if path == "/nba/v1/stats":
            return [
                {
                    "pts": 20,
                    "reb": 8,
                    "ast": 7,
                    "stl": 1,
                    "blk": 1,
                    "turnover": 3,
                    "fg_pct": 0.5,
                    "fg3_pct": 0.4,
                    "ft_pct": 0.9,
                    "min": "35:12",
                    "player": {"id": 237},
                    "team": {"id": 14},
                    "game": {"season": 2024},
                },
                {
                    "pts": 30,
                    "reb": 10,
                    "ast": 9,
                    "stl": 2,
                    "blk": 0,
                    "turnover": 4,
                    "fg_pct": 0.6,
                    "fg3_pct": 0.5,
                    "ft_pct": 1.0,
                    "min": "33:00",
                    "player": {"id": 237},
                    "team": {"id": 14},
                    "game": {"season": 2024},
                },
            ]
        return []


@pytest.mark.django_db
def test_provider_registry_backend_selection(settings):
    settings.PROVIDER_DEFAULT_NAMESPACE = ""
    settings.PROVIDER_BACKEND = "demo"
    assert get_default_provider_namespace() == "mvp_demo"
    assert isinstance(get_provider(), MvpDemoProviderAdapter)

    settings.PROVIDER_BACKEND = "balldontlie"
    assert get_default_provider_namespace() == "balldontlie"
    assert isinstance(get_provider(), BalldontlieProviderAdapter)

    settings.PROVIDER_DEFAULT_NAMESPACE = "mvp_demo"
    assert get_default_provider_namespace() == "mvp_demo"


@pytest.mark.django_db
def test_balldontlie_adapter_maps_payloads(settings):
    settings.PROVIDER_BALLDONTLIE_SEASONS = [2024]
    adapter = BalldontlieProviderAdapter(client=_FakeBalldontlieClient())

    payload = adapter.sync_all()

    assert payload["competitions"][0]["external_id"] == "competition-nba"
    assert payload["teams"][0]["external_id"] == "team-14"
    assert payload["players"][0]["external_id"] == "player-237"
    assert payload["seasons"][0]["external_id"] == "season-2024"
    assert payload["player_stats"][0]["games_played"] == 2
    assert payload["player_stats"][0]["points"] == 25.0
    assert payload["player_stats"][0]["fg_pct"] == 55.0

    player = payload["players"][0]
    assert player["nationality"] is None
    assert "current_team_external_id" not in player

    expected_keys = {
        "external_id",
        "first_name",
        "last_name",
        "full_name",
        "birth_date",
        "nationality",
        "nominal_position",
        "inferred_role",
        "height_cm",
        "weight_kg",
        "dominant_hand",
        "is_active",
        "aliases",
    }
    assert set(player.keys()) == expected_keys


@pytest.mark.django_db
def test_balldontlie_map_seasons_marks_latest_as_current():
    seasons = map_seasons([2022, 2024, 2023, 2024])
    current_rows = [row for row in seasons if row["is_current"]]
    assert len(current_rows) == 1
    assert current_rows[0]["external_id"] == "season-2024"
    assert [row["external_id"] for row in seasons] == ["season-2022", "season-2023", "season-2024"]


@pytest.mark.django_db
def test_balldontlie_adapter_degrades_when_stats_unauthorized(settings):
    class _UnauthorizedStatsClient(_FakeBalldontlieClient):
        def list_paginated(self, path: str, *, params=None, per_page=100, page_limit=1):
            if path == "/nba/v1/stats":
                raise ProviderUnauthorizedError(
                    provider="balldontlie",
                    path="stats",
                    status_code=401,
                    detail="Unauthorized",
                )
            return super().list_paginated(path, params=params, per_page=per_page, page_limit=page_limit)

    settings.PROVIDER_BALLDONTLIE_SEASONS = [2024]
    settings.PROVIDER_BALLDONTLIE_STATS_STRICT = False
    adapter = BalldontlieProviderAdapter(client=_UnauthorizedStatsClient())

    payload = adapter.sync_all()
    assert payload["players"]
    assert payload["teams"]
    assert payload["player_stats"] == []
    assert payload["player_careers"] == []


@pytest.mark.django_db
def test_balldontlie_client_retries_after_rate_limit(monkeypatch, settings):
    monkeypatch.setattr(time, "sleep", lambda _: None)
    settings.PROVIDER_REQUEST_RETRIES = 2
    settings.PROVIDER_REQUEST_RETRY_SLEEP = 0

    session = _FakeSession(
        responses=[
            _FakeResponse(status_code=429, headers={"Retry-After": "0"}),
            _FakeResponse(status_code=200, payload={"data": []}),
        ]
    )
    client = BalldontlieClient(session=session)

    payload = client.get_json("players")
    assert payload == {"data": []}


@pytest.mark.django_db
def test_balldontlie_client_timeout_retries_then_fails(monkeypatch, settings):
    monkeypatch.setattr(time, "sleep", lambda _: None)
    settings.PROVIDER_REQUEST_RETRIES = 2
    settings.PROVIDER_REQUEST_RETRY_SLEEP = 0

    session = _FakeSession(responses=[requests.Timeout("slow"), requests.Timeout("slow")])
    client = BalldontlieClient(session=session)

    with pytest.raises(ProviderTransientError):
        client.get_json("players")


@pytest.mark.django_db
def test_balldontlie_client_raises_rate_limit_after_max_retries(monkeypatch, settings):
    monkeypatch.setattr(time, "sleep", lambda _: None)
    settings.PROVIDER_REQUEST_RETRIES = 2
    settings.PROVIDER_REQUEST_RETRY_SLEEP = 0

    session = _FakeSession(
        responses=[
            _FakeResponse(status_code=429, headers={"Retry-After": "1"}),
            _FakeResponse(status_code=429, headers={"Retry-After": "1"}),
        ]
    )
    client = BalldontlieClient(session=session)

    with pytest.raises(ProviderRateLimitError):
        client.get_json("players")


@pytest.mark.django_db
def test_balldontlie_client_cursor_pagination(settings):
    session = _FakeSession(
        responses=[
            _FakeResponse(
                status_code=200,
                payload={"data": [{"id": 1}], "meta": {"next_cursor": 101}},
            ),
            _FakeResponse(
                status_code=200,
                payload={"data": [{"id": 2}], "meta": {"next_cursor": None}},
            ),
        ]
    )
    client = BalldontlieClient(session=session)
    rows = client.list_paginated("players", per_page=1, page_limit=5)

    assert rows == [{"id": 1}, {"id": 2}]
    assert "page" not in session.calls[0]["params"]
    assert "cursor" not in session.calls[0]["params"]
    assert session.calls[1]["params"]["cursor"] == 101
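The cursor-pagination test above encodes one convention: the first request carries no `cursor`, and each follow-up echoes back `meta.next_cursor` until the server returns `None`. A minimal sketch of that loop under the same assumption (the `FakePages` helper and function names here are illustrative, not the project's real client):

```python
from typing import Any


class FakePages:
    """Stand-in for an HTTP client: returns canned JSON pages in order."""

    def __init__(self, pages: list[dict[str, Any]]):
        self._pages = pages

    def get_json(self, path: str, *, params: dict[str, Any]) -> dict[str, Any]:
        # A real client would send `params` over HTTP; here we just pop pages.
        return self._pages.pop(0)


def list_paginated(client: FakePages, path: str, *, per_page: int = 100, page_limit: int = 10) -> list[dict[str, Any]]:
    rows: list[dict[str, Any]] = []
    params: dict[str, Any] = {"per_page": per_page}  # no cursor on the first call
    for _ in range(page_limit):
        payload = client.get_json(path, params=dict(params))
        rows.extend(payload.get("data", []))
        cursor = (payload.get("meta") or {}).get("next_cursor")
        if cursor is None:
            break  # last page reached
        params["cursor"] = cursor  # echo the server's cursor on follow-ups
    return rows


pages = [
    {"data": [{"id": 1}], "meta": {"next_cursor": 101}},
    {"data": [{"id": 2}], "meta": {"next_cursor": None}},
]
rows = list_paginated(FakePages(pages), "players", per_page=1)
# rows now holds both pages in order: [{"id": 1}, {"id": 2}]
```

The `page_limit` guard mirrors the test's `page_limit=5`: it bounds the loop even if a buggy server keeps emitting cursors.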
tests/test_v2_runtime_boundaries.py (new file, 15 lines)
@@ -0,0 +1,15 @@
import pytest
from django.conf import settings


@pytest.mark.django_db
def test_legacy_provider_stack_disabled_by_default():
    assert settings.LEGACY_PROVIDER_STACK_ENABLED is False
    assert "apps.providers" not in settings.INSTALLED_APPS


@pytest.mark.django_db
def test_providers_route_not_mounted_by_default(client):
    response = client.get("/providers/")
    assert response.status_code == 404
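These boundary tests assume that the app list (and, with it, the `/providers/` URLconf) is derived from the single `LEGACY_PROVIDER_STACK_ENABLED` flag. A sketch of that gating, assuming an env-flag helper (illustrative only, not the project's actual settings module):

```python
import os


def env_bool(name: str, default: str = "0") -> bool:
    """Parse a 0/1-style environment flag; "1", "true", "yes" enable it."""
    return os.environ.get(name, default).strip().lower() in {"1", "true", "yes"}


def build_installed_apps(legacy_enabled: bool) -> list[str]:
    """Append the legacy provider stack only when explicitly enabled."""
    apps = [
        "django.contrib.admin",
        "django.contrib.auth",
        # ... remaining v2 apps ...
    ]
    if legacy_enabled:
        apps.append("apps.providers")
    return apps


LEGACY_PROVIDER_STACK_ENABLED = env_bool("LEGACY_PROVIDER_STACK_ENABLED")
INSTALLED_APPS = build_installed_apps(LEGACY_PROVIDER_STACK_ENABLED)
```

With the flag left at its documented default of `0`, `apps.providers` never lands in `INSTALLED_APPS`, its URLs are never mounted, and the 404 assertion above holds.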