goon/scripts/merge_dupe_thumb_asset.py
jtrzupek 8b4783771f feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx)
The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as
these sibling tubes re-ingest under new titles. Wire the asset-id+duration merge into
the scheduler (every 12h, GOON_SCHED_THUMB_DEDUP_HOURS, 0=off) so it stays clean.
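A minimal sketch of how the interval setting described above could be read, assuming the scheduler takes the value from the environment. The helper name `thumb_dedup_interval_hours` and the fallback to 12 hours on unparsable input are illustrative assumptions, not taken from the repository.

```python
import os


def thumb_dedup_interval_hours(env=os.environ, default=12.0):
    """Return the dedup interval in hours, or None when the job is switched off.

    Hypothetical helper: only the variable name, the 12h default and the
    "0 = off" convention come from the commit message.
    """
    raw = env.get("GOON_SCHED_THUMB_DEDUP_HOURS", "").strip()
    if not raw:
        return default  # variable unset: use the documented 12h default
    try:
        hours = float(raw)
    except ValueError:
        return default  # assumption: fall back to the default on bad input
    # 0 (or a negative value) disables the periodic job.
    return hours if hours > 0 else None
```

A caller would skip registering the job when the helper returns `None` and otherwise schedule it with the returned number of hours.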

Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot
script now imports it. Same tight signature as the cleanup: family hosts only + identical
duration (the bare asset-id number is reused across unrelated CDNs, so cross-host/diff-
duration grouping is excluded). Reports 205b17d9 / 5a2944cb.
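The signature described above can be sketched roughly as follows. This is one possible reading, not the code in `app/scheduler/thumb_dedup.py`: the input tuple shape, the `group_dupes` name and the regex that pulls a trailing numeric asset id out of the thumbnail path are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Family hosts named in the commit subject.
FAMILY_HOSTS = {"hdporn.gg", "fullmovies.xxx"}
# Hypothetical pattern: a numeric id as the last path segment of the thumb URL.
ASSET_ID_RE = re.compile(r"/(\d+)(?:\.\w+)?$")


def group_dupes(videos):
    """Group video ids that share (asset id, duration) within the family hosts.

    `videos` is an iterable of (video_id, host, thumb_url, duration_seconds);
    this shape is an assumption for the sketch.
    """
    buckets = defaultdict(list)
    for video_id, host, thumb_url, duration in videos:
        if host not in FAMILY_HOSTS:
            continue  # asset ids repeat on unrelated CDNs, so other hosts are skipped
        match = ASSET_ID_RE.search(urlparse(thumb_url).path)
        if match is None or duration is None:
            continue
        # Identical duration is part of the key, so a different length never merges.
        buckets[(match.group(1), duration)].append(video_id)
    # Only groups with at least two members describe a merge.
    return [ids for ids in buckets.values() if len(ids) > 1]
```

Keying on both the asset id and the duration trades recall for safety: a re-encoded copy with a slightly different length is left alone, but two unrelated videos that happen to share an asset number are never merged.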

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:56:45 +02:00


"""One-shot merge of tube dupes by thumbnail asset-id (hdporn.gg / fullmovies.xxx family).

The logic is shared with the scheduler: app/scheduler/thumb_dedup.py (the
`_job_thumb_asset_dedup` job calls the same code periodically). See that module for
the full description of the signature and guards.

Usage (worker container):
    python scripts/merge_dupe_thumb_asset.py [STUDIO_ID] [--commit]

Without STUDIO_ID = global. Without --commit = dry-run (lists the pairs, merges nothing).
"""
from __future__ import annotations

import sys

from app.scheduler.thumb_dedup import _groups, run_thumb_asset_dedup


def main() -> None:
    commit = "--commit" in sys.argv
    # The first non-flag argument of 32+ characters is taken as the studio id.
    studio = next((a for a in sys.argv[1:] if not a.startswith("--") and len(a) >= 32), None)
    if not commit:
        # Dry run: list the keep/drop pairs without merging anything.
        groups = _groups(studio)
        pairs = sum(len(g) - 1 for g in groups)
        print(f"studio={studio or 'ALL'} groups={len(groups)} merges={pairs} commit=False", flush=True)
        for g in groups:
            # The first id in each group is kept; the rest would be merged into it.
            for drop in g[1:]:
                print(f" [dry] keep {g[0][:8]} <- drop {drop[:8]}")
        return
    res = run_thumb_asset_dedup(studio_id=studio, commit=True)
    print(f"studio={studio or 'ALL'} {res}", flush=True)


if __name__ == "__main__":
    main()