The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as these sibling tubes re-ingest under new titles. Wire the asset-id+duration merge into the scheduler (every 12h, GOON_SCHED_THUMB_DEDUP_HOURS, 0=off) so it stays clean. Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot script now imports it. Same tight signature as the cleanup: family hosts only + identical duration (the bare asset-id number is reused across unrelated CDNs, so cross-host/diff- duration grouping is excluded). Reports 205b17d9 / 5a2944cb. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| api | ||
| connectors | ||
| extractors | ||
| models | ||
| normalize | ||
| resolve | ||
| scheduler | ||
| templates | ||
| __init__.py | ||
| auth.py | ||
| config.py | ||
| db.py | ||
| ingest.py | ||
| main.py | ||