feat(v2): add simple daily extraction-import orchestration
43
README.md
@@ -19,6 +19,7 @@ Runtime services are intentionally small:
- `web` (Django/Gunicorn)
- `postgres` (primary DB)
- `nginx` (reverse proxy + static/media serving)
- optional `scheduler` profile service (runs daily extractor/import loop)

No Redis/Celery services are part of the v2 default runtime topology.
Legacy Celery/provider code is still in repository history/codebase but de-emphasized for v2.
@@ -60,6 +61,18 @@ docker compose -f docker-compose.yml -f docker-compose.dev.yml up --build
docker compose -f docker-compose.yml -f docker-compose.release.yml up -d --build
```

### Start scheduler profile (optional)

```bash
docker compose --profile scheduler up -d scheduler
```

For development override:

```bash
docker compose -f docker-compose.yml -f docker-compose.dev.yml --profile scheduler up -d scheduler
```

## Named Volumes

v2 runtime uses named volumes for persistence:
@@ -82,6 +95,7 @@ Core groups:
- image tag vars (`APP_IMAGE_TAG`, `NGINX_IMAGE_TAG`)
- snapshot directory vars (`STATIC_DATASET_*`)
- optional future scheduler vars (`SCHEDULER_*`)
- daily orchestration vars (`DAILY_ORCHESTRATION_*`)

## Snapshot Storage Convention

@@ -155,6 +169,12 @@ Run import:
docker compose exec web python manage.py import_snapshots
```

Run end-to-end daily orchestration manually (extractors -> import):

```bash
docker compose exec web python manage.py run_daily_orchestration
```

Command behavior:
- scans `STATIC_DATASET_INCOMING_DIR` for `.json` files
- validates strict schema
@@ -217,6 +237,14 @@ Run only the BCL extractor:
docker compose exec web python manage.py run_bcl_extractor
```

### Daily orchestration behavior

`run_daily_orchestration` performs:
1. run configured extractors in order from `DAILY_ORCHESTRATION_EXTRACTORS`
2. write snapshots to incoming dir
3. run `import_snapshots`
4. log extractor/import summary
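
The steps above can be approximated by hand with the individual management commands. The following is a minimal sketch, not the project's implementation. It assumes `DAILY_ORCHESTRATION_EXTRACTORS` holds a comma-separated list of extractor names and that a name `x` maps to a `run_x_extractor` command; only `run_bcl_extractor` is confirmed by this README, and `plan_daily_orchestration` is a hypothetical helper. It prints the commands instead of running them:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the manage.py commands that a manual
# equivalent of run_daily_orchestration would execute, in order.
# Assumption: extractor names are comma-separated, and a name "x" maps to
# the management command "run_x_extractor".
plan_daily_orchestration() {
  local extractors="${1:-${DAILY_ORCHESTRATION_EXTRACTORS:-}}"
  local name
  local -a names
  IFS=',' read -r -a names <<< "$extractors"
  for name in "${names[@]}"; do
    # Trim surrounding whitespace so a value like "a, b" also works.
    name="${name#"${name%%[![:space:]]*}"}"
    name="${name%"${name##*[![:space:]]}"}"
    [ -n "$name" ] || continue
    echo "python manage.py run_${name}_extractor"
  done
  # The import runs once, after all extractors have written snapshots.
  echo "python manage.py import_snapshots"
}
```

Calling `plan_daily_orchestration "bcl"` prints the BCL extractor command followed by the import command. The real command additionally logs an extractor/import summary, which this sketch leaves out.
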

Extractor environment variables:
- `EXTRACTOR_USER_AGENT`
- `EXTRACTOR_HTTP_TIMEOUT_SECONDS`
@@ -234,11 +262,26 @@ Extractor environment variables:
- `EXTRACTOR_BCL_SEASON_LABEL`
- `EXTRACTOR_BCL_COMPETITION_EXTERNAL_ID`
- `EXTRACTOR_BCL_COMPETITION_NAME`
- `DAILY_ORCHESTRATION_EXTRACTORS`
- `DAILY_ORCHESTRATION_INTERVAL_SECONDS`

Notes:
- extraction is intentionally low-frequency and uses retries conservatively
- only public pages/endpoints should be targeted
- emitted snapshots must match the same schema consumed by `import_snapshots`
- optional scheduler container runs `scripts/scheduler.sh` loop using:
  - image: `registry.younerd.org/hoopscout/scheduler:${APP_IMAGE_TAG:-latest}`
  - command: `/app/scripts/scheduler.sh`
  - interval: `DAILY_ORCHESTRATION_INTERVAL_SECONDS`
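
The contents of `scripts/scheduler.sh` are not shown in this diff. As a rough illustration of an interval-driven loop of this kind, the sketch below assumes a default interval of 86400 seconds and introduces two hypothetical variables, `SCHEDULER_MAX_RUNS` and `SCHEDULER_RUN_CMD`, purely so the loop can be bounded and dry-run:

```bash
#!/usr/bin/env bash
# Sketch only; not the project's scripts/scheduler.sh.
# Assumptions: 86400 s default interval; SCHEDULER_MAX_RUNS (hypothetical,
# 0 means loop forever) and SCHEDULER_RUN_CMD (hypothetical override) exist
# only for illustration and dry runs.
scheduler_loop() {
  local interval="${DAILY_ORCHESTRATION_INTERVAL_SECONDS:-86400}"
  local max_runs="${SCHEDULER_MAX_RUNS:-0}"
  local run_cmd="${SCHEDULER_RUN_CMD:-python manage.py run_daily_orchestration}"
  local runs=0
  while :; do
    # A failed run is reported on stderr and the loop continues.
    $run_cmd || echo "orchestration run failed; retrying next interval" >&2
    runs=$((runs + 1))
    if [ "$max_runs" -gt 0 ] && [ "$runs" -ge "$max_runs" ]; then
      break
    fi
    sleep "$interval"
  done
}
```

For a quick dry run, `DAILY_ORCHESTRATION_INTERVAL_SECONDS=0 SCHEDULER_MAX_RUNS=2 SCHEDULER_RUN_CMD="echo tick" scheduler_loop` executes the stand-in command twice and returns.
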

### Scheduler entrypoint/runtime expectations

- scheduler uses the same app image and base `entrypoint.sh` as web
- scheduler requires database connectivity and snapshot volumes
- scheduler is disabled unless:
  - compose `scheduler` profile is started
  - `SCHEDULER_ENABLED=1`
- this keeps default runtime simple while supporting daily automation
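
One way a scheduler script could honor the `SCHEDULER_ENABLED` gate is a small check before entering its loop. This is a sketch under assumptions: the README confirms `SCHEDULER_ENABLED=1` as the enabling value but does not say how other values are treated, so the function below (hypothetical name) treats anything else as disabled:

```bash
#!/usr/bin/env bash
# Hypothetical gate. Assumption: only the exact value "1" enables the
# scheduler; unset or any other value leaves it disabled.
scheduler_enabled() {
  [ "${SCHEDULER_ENABLED:-0}" = "1" ]
}

# Possible use at the top of a loop script:
#   scheduler_enabled || { echo "scheduler disabled" >&2; exit 0; }
```

Keeping the check in a function makes the default-off behavior easy to verify without starting the compose profile.
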

### LBA extractor assumptions and limitations (MVP)