goon/app/scheduler
jtrzupek 8b4783771f feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx)
The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as
these sibling tubes re-ingest under new titles. Wire the asset-id + duration merge into
the scheduler (every 12h by default; GOON_SCHED_THUMB_DEDUP_HOURS sets the interval,
0 = off) so the dupes stay merged.
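
A minimal sketch of how the interval setting described above could be read. Only the variable name, the 12h default, and the 0 = off convention come from the commit message; the helper name `thumb_dedup_interval_seconds`, the fractional-hours support, and the fallback for malformed values are assumptions, not the code in jobs.py or worker.py.

```python
import math
import os
from typing import Mapping, Optional

DEFAULT_HOURS = 12.0  # "every 12h" per the commit message


def thumb_dedup_interval_seconds(
    env: Mapping[str, str] = os.environ,
) -> Optional[float]:
    """Return the dedup interval in seconds, or None when the job is disabled.

    Hypothetical helper: the real scheduler may parse the setting differently.
    """
    raw = env.get("GOON_SCHED_THUMB_DEDUP_HOURS", "").strip()
    if not raw:
        hours = DEFAULT_HOURS
    else:
        try:
            hours = float(raw)
        except ValueError:
            # Assumption: a malformed value falls back to the default.
            hours = DEFAULT_HOURS
    # 0 = off per the commit message; negative or non-finite values are
    # also treated as off here (an assumption).
    if not math.isfinite(hours) or hours <= 0:
        return None
    return hours * 3600.0
```

A scheduler loop would skip registering the job when the helper returns None and otherwise re-run it after the returned number of seconds.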

Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot
script now imports it. It keeps the same tight match signature as the cleanup: family
hosts only, plus identical duration. The bare asset-id number is reused across unrelated
CDNs, so cross-host and different-duration grouping is excluded.
Reports 205b17d9 / 5a2944cb.
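
The match signature above can be sketched as a grouping key. This is an illustration under stated assumptions, not the body of run_thumb_asset_dedup: the `VideoRow` type, its field names, and the `duplicate_groups` function are hypothetical, the asset id is taken as an already-extracted field, and rows from different hosts inside the family are assumed to share a bucket.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Collection, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class VideoRow:
    """Hypothetical record shape; the real schema is not shown in the commit."""
    video_id: int
    host: str
    asset_id: int
    duration_s: int
    title: str


def duplicate_groups(
    rows: Iterable[VideoRow],
    family_hosts: Collection[str],
) -> List[List[VideoRow]]:
    """Group rows that share (asset_id, duration) within the family hosts."""
    buckets: Dict[Tuple[int, int], List[VideoRow]] = defaultdict(list)
    for row in rows:
        # Family hosts only: the bare asset id is reused by unrelated CDNs.
        if row.host not in family_hosts:
            continue
        # Identical duration is part of the key, so same-id rows with a
        # different duration land in a different bucket.
        buckets[(row.asset_id, row.duration_s)].append(row)
    # Only buckets with more than one row are merge candidates.
    return [
        sorted(group, key=lambda r: r.video_id)
        for group in buckets.values()
        if len(group) > 1
    ]
```

Title is deliberately left out of the key, since the duplicates in question are the same video re-ingested under new titles.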

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:56:45 +02:00
__init__.py Initial commit 2026-05-20 10:10:22 +02:00
browse_latest.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
bulk_dedup.py fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V) 2026-06-08 10:03:33 +02:00
deep_crawl.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
jobs.py feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx) 2026-06-14 14:56:45 +02:00
performer_driven.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
taxonomy_counts.py fix(scenes): propagate playback duration to Scene + duration-consistent counts 2026-06-01 21:31:01 +02:00
thumb_dedup.py feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx) 2026-06-14 14:56:45 +02:00
worker.py feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx) 2026-06-14 14:56:45 +02:00