goon/app/scheduler
jtrzupek f014a901de feat(scheduler): periodic title+duration dedup (missing-merge tube dupes)
Catches missing-merge duplicates (same performer + identical normalized title + identical duration to the second) that bulk_dedup misses: tube re-scrapes and cross-tube re-ingests, e.g. porn00 pulling a video already present from xnxx (reports 28fe8181/32df33b1). Extracted the proven merge_exact_title_duration logic into app/scheduler/title_duration_dedup.py (the script is now a thin wrapper) and wired a 12h scheduler job (GOON_SCHED_TITLE_DEDUP_HOURS; playback-only = what users actually see). The signal is near-certain: two different videos don't share a byte-identical title AND an exact duration. Videos with no shared performer are not merged (over-match guard). Verified: the job registers (jobs=14), and the backlog is currently 0 after the one-shot global merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:20:48 +02:00
__init__.py Initial commit 2026-05-20 10:10:22 +02:00
browse_latest.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
bulk_dedup.py fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V) 2026-06-08 10:03:33 +02:00
deep_crawl.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
hetzner_monitor.py feat(scheduler): hetzner bandwidth monitor + search-tube watchdog coverage 2026-06-18 09:18:59 +02:00
ingest_watchdog.py feat(scheduler): hetzner bandwidth monitor + search-tube watchdog coverage 2026-06-18 09:18:59 +02:00
jobs.py feat(scheduler): periodic title+duration dedup (missing-merge tube dupes) 2026-06-19 11:20:48 +02:00
performer_driven.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
taxonomy_counts.py fix(scenes): propagate playback duration to Scene + duration-consistent counts 2026-06-01 21:31:01 +02:00
thumb_dedup.py feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx) 2026-06-14 14:56:45 +02:00
title_duration_dedup.py feat(scheduler): periodic title+duration dedup (missing-merge tube dupes) 2026-06-19 11:20:48 +02:00
worker.py feat(scheduler): periodic title+duration dedup (missing-merge tube dupes) 2026-06-19 11:20:48 +02:00
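
The matching rule described in the commit message above (identical normalized title, identical duration to the second, at least one shared performer) can be sketched roughly as follows. This is a minimal illustration, not the code in title_duration_dedup.py: the `Video` record, `normalize_title`, and `find_merge_groups` are hypothetical names, and the normalization shown (NFKC, casefold, whitespace collapse) is an assumption, since the commit does not say how titles are normalized.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    """Hypothetical record; the real model's fields are not shown in the source."""
    id: int
    title: str
    duration_s: int
    performers: frozenset[str]


def normalize_title(title: str) -> str:
    # Assumed normalization: Unicode NFKC, casefold, collapse whitespace.
    text = unicodedata.normalize("NFKC", title).casefold()
    return re.sub(r"\s+", " ", text).strip()


def find_merge_groups(videos: list[Video]) -> list[list[int]]:
    """Return groups of video ids that look like the same video.

    Candidates must share (normalized title, exact duration). Within such a
    bucket, videos are only grouped when they are linked by a shared
    performer (the over-match guard).
    """
    buckets: dict[tuple[str, int], list[Video]] = defaultdict(list)
    for video in videos:
        buckets[(normalize_title(video.title), video.duration_s)].append(video)

    groups: list[list[int]] = []
    for bucket in buckets.values():
        if len(bucket) < 2:
            continue
        # Clusters have pairwise disjoint performer sets by construction.
        clusters: list[tuple[set[str], list[int]]] = []
        for video in bucket:
            if not video.performers:
                continue  # no performer: cannot satisfy the guard
            performers, ids, rest = set(video.performers), [video.id], []
            for c_performers, c_ids in clusters:
                if c_performers & video.performers:
                    # Shares a performer: fold that cluster into this one.
                    performers |= c_performers
                    ids += c_ids
                else:
                    rest.append((c_performers, c_ids))
            rest.append((performers, ids))
            clusters = rest
        groups.extend(sorted(ids) for _, ids in clusters if len(ids) > 1)
    return sorted(groups)


sample = [
    Video(1, "Summer  Road Trip", 605, frozenset({"alice"})),
    Video(2, "summer road trip", 605, frozenset({"alice", "bob"})),
    Video(3, "Summer Road Trip", 605, frozenset({"carol"})),  # no shared performer
    Video(4, "Summer Road Trip", 606, frozenset({"alice"})),  # duration differs by 1s
]
merge_groups = find_merge_groups(sample)
```

In this sample only videos 1 and 2 end up in a group: video 3 has the same title and duration but no performer in common, and video 4 differs in duration by one second. Which record survives a merge, and how playback-only scoping is applied, are left out because the commit does not describe them.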