The one-off cleanup merged ~13.5k same-video, different-title duplicates, but they regrow as the sibling tubes re-ingest videos under new titles. Wire the asset-id + duration merge into the scheduler (every 12h; the interval is set by GOON_SCHED_THUMB_DEDUP_HOURS, 0 = off) so it stays clean.

The shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot script now imports it. It uses the same tight signature as the cleanup: family hosts only, plus identical duration. The bare asset-id number is reused across unrelated CDNs, so cross-host and different-duration grouping is excluded.

Reports 205b17d9 / 5a2944cb.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
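The grouping rule and the on/off interval can be sketched as below. This is a minimal illustration, not the real `run_thumb_asset_dedup`, whose code isn't shown here.

Everything in the sketch is hypothetical:

- the `Thumb` row shape
- `plan_asset_dedup` and `dedup_interval_hours`
- the host names
- the "lowest id survives" merge target

It also assumes the family hosts share one asset-id namespace. Under that assumption the key is `(asset_id, duration)` among family-host rows only. If the real signature also keys on the individual host, add `host` to the tuple. The sketch only plans merges and never touches storage.

```python
import os
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Mapping, Optional, Tuple

DEFAULT_DEDUP_HOURS = 12.0  # assumed default, matching "every 12h"


def dedup_interval_hours(env: Mapping[str, str] = os.environ) -> Optional[float]:
    """Return the job interval in hours, or None when the job is disabled.

    An unset variable falls back to the default; 0 (or a negative value,
    an assumption of this sketch) turns the job off.
    """
    raw = env.get("GOON_SCHED_THUMB_DEDUP_HOURS", "").strip()
    hours = float(raw) if raw else DEFAULT_DEDUP_HOURS
    return hours if hours > 0 else None


@dataclass(frozen=True)
class Thumb:
    """Hypothetical row shape used only for this sketch."""
    id: int
    host: str
    asset_id: str
    duration_s: int
    title: str


def plan_asset_dedup(
    thumbs: Iterable[Thumb], family_hosts: FrozenSet[str]
) -> Dict[int, int]:
    """Map each duplicate row id to the id of the row it should merge into."""
    groups: Dict[Tuple[str, int], List[Thumb]] = defaultdict(list)
    for t in thumbs:
        # Family hosts only: the bare asset-id number is reused by unrelated CDNs.
        if t.host not in family_hosts:
            continue
        # Duration is part of the key, so the same asset id with a different
        # length never lands in the same group. Titles are ignored on purpose.
        groups[(t.asset_id, t.duration_s)].append(t)

    merges: Dict[int, int] = {}
    for members in groups.values():
        if len(members) < 2:
            continue
        keeper = min(members, key=lambda m: m.id)  # assumption: oldest row survives
        for t in members:
            if t.id != keeper.id:
                merges[t.id] = keeper.id
    return merges


if __name__ == "__main__":
    family = frozenset({"tube-a.example", "tube-b.example"})  # hypothetical hosts
    rows = [
        Thumb(1, "tube-a.example", "48213", 612, "Original title"),
        Thumb(2, "tube-b.example", "48213", 612, "Re-ingested title"),
        Thumb(3, "tube-b.example", "48213", 95, "Other clip, same number"),
        Thumb(4, "cdn.unrelated.example", "48213", 612, "Unrelated CDN"),
    ]
    print(plan_asset_dedup(rows, family))  # {2: 1}
    print(dedup_interval_hours({"GOON_SCHED_THUMB_DEDUP_HOURS": "0"}))  # None
```

In the example, row 3 shares the asset id but not the duration, so it stays separate. Row 4 is dropped because its host is outside the family. Keying on duration as well as asset id is what keeps the signature tight: a reused number alone is never enough to merge.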
| File |
|---|
| `__init__.py` |
| `browse_latest.py` |
| `bulk_dedup.py` |
| `deep_crawl.py` |
| `jobs.py` |
| `performer_driven.py` |
| `taxonomy_counts.py` |
| `thumb_dedup.py` |
| `worker.py` |