Commit graph

2 commits

Author SHA1 Message Date
jtrzupek
8b4783771f feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx)
The one-off cleanup merged ~13.5k same-video, different-title dupes, but they regrow as
these sibling tubes re-ingest videos under new titles. Wire the asset-id + duration merge
into the scheduler so the duplicates are cleaned up continuously: it runs every 12h by
default, the interval is set via GOON_SCHED_THUMB_DEDUP_HOURS, and 0 disables it.
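A minimal sketch of how the interval setting could be read and the job looped, assuming a
plain background thread. Only the env var name, the 12h cadence and "0=off" come from this
commit; thumb_dedup_interval_seconds, run_periodically and treating negative values as off
are hypothetical.

```python
import os
import threading

# Hypothetical sketch: only GOON_SCHED_THUMB_DEDUP_HOURS, the 12h cadence and "0=off"
# come from the commit message; helper names and the thread-based loop are assumptions.
DEFAULT_HOURS = 12.0


def thumb_dedup_interval_seconds(env=None):
    """Return the dedup interval in seconds, or None when the job is disabled."""
    env = os.environ if env is None else env
    raw = env.get("GOON_SCHED_THUMB_DEDUP_HOURS", "").strip()
    hours = float(raw) if raw else DEFAULT_HOURS
    if hours <= 0:
        return None  # 0 means off; negatives are also treated as off (assumption)
    return hours * 3600


def run_periodically(job, interval_s, stop):
    """Call job() immediately, then every interval_s seconds until stop is set."""
    while not stop.is_set():
        job()
        # Event.wait returns as soon as stop is set, so shutdown never waits a full interval.
        stop.wait(interval_s)
```

Whether the real scheduler fires once at startup or only after the first interval is not
stated in the commit; this sketch runs the job immediately.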

Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot
script now imports it. The match is as strict as in the cleanup: family hosts only plus
identical duration (the bare asset-id number is reused across unrelated CDNs, so grouping
across hosts or across different durations is excluded). Reports 205b17d9 / 5a2944cb.
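A minimal sketch of the grouping rule described above, assuming each video row already
carries its thumbnail host, parsed asset id and duration in seconds. The Video fields,
FAMILY_DOMAINS and group_dupes are hypothetical and are not the signature of
run_thumb_asset_dedup; treating any subdomain of the two family domains as a family host
is also an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumption: any host equal to or under these domains counts as a "family host".
FAMILY_DOMAINS = ("hdporn.gg", "fullmovies.xxx")


@dataclass(frozen=True)
class Video:  # hypothetical row shape
    id: int
    title: str
    host: str        # thumbnail host
    asset_id: str    # numeric id parsed from the thumbnail path
    duration_s: int


def is_family_host(host):
    host = host.lower()
    return any(host == d or host.endswith("." + d) for d in FAMILY_DOMAINS)


def group_dupes(videos):
    """Group family-host videos by (asset_id, duration); keep only groups of 2+."""
    groups = defaultdict(list)
    for v in videos:
        if not is_family_host(v.host):
            continue  # the same bare number on another CDN is an unrelated video
        groups[(v.asset_id, v.duration_s)].append(v)
    # Titles are ignored on purpose: the dupes differ only by title.
    return [sorted(g, key=lambda v: v.id) for g in groups.values() if len(g) > 1]
```

Keying on the (asset_id, duration) pair rather than asset_id alone is what implements the
duration guard: a matching id with a different length never lands in the same group.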

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:56:45 +02:00
jtrzupek
b5d9473898 feat(scripts): merge tube dupes by thumbnail asset-id (hdporn.gg/fullmovies.xxx family)
These sibling platforms share one video-id space and ingest the same video under
different titles, which bulk_dedup misses (different titles, no phash). Match by the
asset-id in the thumbnail path (/<bucket>000/<id>/) on img.hdporn.gg|fullmovies.xxx plus
identical duration, and merge. Hard host restriction + duration guard: the bare number
is reused for unrelated videos on other CDNs (verified via dry-run), so cross-host or
different-duration grouping is excluded. Run scoped (studio id) or global; dry-run by
default. Reports 205b17d9 / 5a2944cb. Ran on Parasited: 43 pairs merged.
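A minimal sketch of the asset-id extraction and the dry-run-by-default merge step. It
assumes "/<bucket>000/<id>/" means a numeric path segment ending in 000 followed by the
numeric asset-id segment, and that the first video of each group is kept. The regex,
extract_asset_id, plan_merges and the merge callback are hypothetical, and the example URLs
in the checks are made up.

```python
import re
from urllib.parse import urlsplit

# Assumption: any host equal to or under these domains counts as a family host.
FAMILY_DOMAINS = ("hdporn.gg", "fullmovies.xxx")
# Assumed reading of "/<bucket>000/<id>/": digits ending in 000, then the numeric id.
ASSET_PATH = re.compile(r"/(\d+)000/(\d+)/")


def extract_asset_id(thumb_url):
    """Return the asset id for a family-host thumbnail URL, else None."""
    parts = urlsplit(thumb_url)
    host = (parts.hostname or "").lower()
    if not any(host == d or host.endswith("." + d) for d in FAMILY_DOMAINS):
        return None  # hard host restriction: other CDNs reuse the same bare numbers
    m = ASSET_PATH.search(parts.path)
    return m.group(2) if m else None


def plan_merges(groups, merge, dry_run=True):
    """Pair each dupe with its group's keeper; call merge() only when not a dry run."""
    pairs = []
    for keeper, *dupes in groups:  # assumption: first item in a group is the keeper
        for dupe in dupes:
            pairs.append((keeper, dupe))
            if not dry_run:
                merge(keeper, dupe)
    return pairs
```

Returning the planned pairs in both modes lets a dry run report the same count a real run
would merge.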

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:18:44 +02:00