3 commits

Author SHA1 Message Date
jtrzupek
f014a901de feat(scheduler): periodic title+duration dedup (missing-merge tube dupes)
Catches missing-merge duplicates that bulk_dedup misses: scenes with the same performer, an identical normalized title, and an identical duration to the second. These come from tube re-scrapes and cross-tube re-ingests, e.g. porn00 pulling a video already present from xnxx (reports 28fe8181/32df33b1).
Extracted the proven merge_exact_title_duration logic into app/scheduler/title_duration_dedup.py (the script is now a thin wrapper) and wired a 12h scheduler job (GOON_SCHED_TITLE_DEDUP_HOURS). The job runs playback-only, i.e. only on what users actually see.
The signal is near-certain: two different videos are very unlikely to share an identical normalized title AND an exact duration. Over-match guard: scenes with no shared performer are not merged.
Verified: job registers (jobs=14); backlog is currently 0 after the one-shot global merge.
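
The matching rule described above can be sketched roughly as follows. This is an illustration only, not the code in app/scheduler/title_duration_dedup.py: the Scene class, its field names, the normalization rules, and the "lowest id survives" choice are all assumptions made for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Scene:
    # Hypothetical stand-in for a scene row; field names are assumptions.
    id: int
    title: str
    duration_sec: int
    performers: frozenset


def normalize_title(title: str) -> str:
    # Assumed normalization: lowercase, drop punctuation, collapse whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def find_merge_pairs(scenes):
    """Return (keep_id, absorb_id) pairs for scenes that share a normalized
    title, an exact duration, and at least one performer."""
    buckets = defaultdict(list)
    for scene in scenes:
        key = (normalize_title(scene.title), scene.duration_sec)
        buckets[key].append(scene)

    pairs = []
    for group in buckets.values():
        keepers = []
        for scene in sorted(group, key=lambda s: s.id):
            # Over-match guard: only merge into a keeper that shares a performer.
            match = next(
                (k for k in keepers if k.performers & scene.performers), None
            )
            if match is None:
                keepers.append(scene)  # assumption: the lowest id survives
            else:
                pairs.append((match.id, scene.id))
    return pairs
```

Scenes with an empty performer set never intersect with anything, so in this sketch they are never merged.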

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:20:48 +02:00
jtrzupek
a9545a7ab2 feat(scripts): merge_exact_title_duration --playback-only + progress logging
--playback-only restricts the run to scenes with live playback (app-visible dupes only).
Prints progress every 500 merges during long global runs.
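
A minimal sketch of how such a flag and progress counter might look. Only the --playback-only flag name and the 500-merge interval come from this commit; build_parser, merge_all, merge_fn, and the log message format are hypothetical names chosen for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="merge_exact_title_duration")
    parser.add_argument(
        "--playback-only",
        action="store_true",
        help="only consider scenes with live playback",
    )
    return parser


def merge_all(pairs, merge_fn, progress_every=500, log=print):
    """Apply merge_fn to each (keep_id, absorb_id) pair, logging periodically."""
    merged = 0
    for keep_id, absorb_id in pairs:
        merge_fn(keep_id, absorb_id)
        merged += 1
        if merged % progress_every == 0:
            log(f"merged {merged} pairs so far")
    return merged
```

Passing the logger as a parameter keeps the loop easy to test without capturing stdout.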

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:02:19 +02:00
jtrzupek
e23e2d1f17 fix(merge): move playback_sources on scene merge + exact-title+duration dedup
merge_scenes never reassigned playback_sources, so ON DELETE CASCADE dropped them
with the absorbed scene. Cross-source (canonical) merges rarely had tube playback,
which hid the bug, but tube-dup merges silently LOST playback links. Add _move_playback_sources
(the global unique (origin,page_url) constraint guarantees no collision on reassign).
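
The failure mode and the fix can be reproduced in a small, self-contained sketch. It uses an in-memory SQLite database as a stand-in; the real database engine, the table columns beyond those named above, and the function names move_playback_sources and merge_scenes_sketch are assumptions for illustration, not the project's code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE scenes (id INTEGER PRIMARY KEY);
        CREATE TABLE playback_sources (
            id INTEGER PRIMARY KEY,
            scene_id INTEGER NOT NULL
                REFERENCES scenes(id) ON DELETE CASCADE,
            origin TEXT NOT NULL,
            page_url TEXT NOT NULL,
            UNIQUE (origin, page_url)
        );
        """
    )
    return conn


def move_playback_sources(conn, keep_id, absorb_id):
    # Reassign rows to the surviving scene. (origin, page_url) is globally
    # unique, so changing scene_id alone cannot create a collision.
    conn.execute(
        "UPDATE playback_sources SET scene_id = ? WHERE scene_id = ?",
        (keep_id, absorb_id),
    )


def merge_scenes_sketch(conn, keep_id, absorb_id):
    with conn:  # one transaction: move first, then delete
        move_playback_sources(conn, keep_id, absorb_id)
        conn.execute("DELETE FROM scenes WHERE id = ?", (absorb_id,))
```

Deleting the absorbed scene without the move step would cascade and remove its playback rows, which is the loss described above.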

+ merge_exact_title_duration.py: catches missing-merge dupes that bulk_dedup misses
(same performer + identical normalized title + identical duration_sec, no phash).
Bad Bella had 25 such pairs (bug report ef92809d: "duplicate, same thumbnails").

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:56:50 +02:00