The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as these sibling tubes re-ingest under new titles. Wire the asset-id+duration merge into the scheduler (every 12h, GOON_SCHED_THUMB_DEDUP_HOURS, 0=off) so it stays clean.

Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot script now imports it.

Same tight signature as the cleanup: family hosts only + identical duration (the bare asset-id number is reused across unrelated CDNs, so cross-host/diff-duration grouping is excluded).

Reports 205b17d9 / 5a2944cb.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
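As a rough illustration of the "every 12h, 0=off" setting described above, the interval could be read from the environment along these lines. Only the variable name GOON_SCHED_THUMB_DEDUP_HOURS and the 12h default come from the commit message; the helper name `dedup_interval_hours` and the fallback on unparsable values are assumptions made for this sketch, not the scheduler's actual code.

```python
import os

# Default taken from the commit message ("every 12h").
DEFAULT_HOURS = 12.0


def dedup_interval_hours(env=None):
    """Return the dedup interval in hours, or None when the job is disabled.

    Hypothetical helper: the real scheduler may parse this differently.
    """
    env = os.environ if env is None else env
    raw = env.get("GOON_SCHED_THUMB_DEDUP_HOURS", "").strip()
    if not raw:
        return DEFAULT_HOURS
    try:
        hours = float(raw)
    except ValueError:
        # Assumption: an unparsable value falls back to the default.
        return DEFAULT_HOURS
    # 0 (or a negative value) means "off".
    return hours if hours > 0 else None
```

A scheduler could then skip registering the job whenever this returns None.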
36 lines
1.2 KiB
Python
"""One-shot merge of tube dupes by thumbnail asset-id (hdporn.gg / fullmovies.xxx family).

Logic is shared with the scheduler: app/scheduler/thumb_dedup.py (the
`_job_thumb_asset_dedup` job calls the same code periodically). The signature and guards
are described in full in that module.

Usage (worker container):
    python scripts/merge_dupe_thumb_asset.py [STUDIO_ID] [--commit]
Without STUDIO_ID = global. Without --commit = dry-run (lists pairs, merges nothing).
"""
from __future__ import annotations

import sys

from app.scheduler.thumb_dedup import _groups, run_thumb_asset_dedup


def main() -> None:
    commit = "--commit" in sys.argv
    # First non-flag argument of at least 32 characters is taken as the studio ID.
    studio = next((a for a in sys.argv[1:] if not a.startswith("--") and len(a) >= 32), None)

    if not commit:
        # Dry-run: list what would be merged; the first ID in each group is kept.
        groups = _groups(studio)
        pairs = sum(len(g) - 1 for g in groups)
        print(f"studio={studio or 'ALL'} groups={len(groups)} merges={pairs} commit=False", flush=True)
        for g in groups:
            for drop in g[1:]:
                print(f"  [dry] keep {g[0][:8]} <- drop {drop[:8]}")
        return

    res = run_thumb_asset_dedup(studio_id=studio, commit=True)
    print(f"studio={studio or 'ALL'} {res}", flush=True)


if __name__ == "__main__":
    main()
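The docstring defers the grouping signature to app/scheduler/thumb_dedup.py, which is not shown here. The following is only a minimal sketch of what "family hosts only + identical duration" grouping might look like. The host names, the `ASSET_RE` pattern, the record layout, and the `group_dupes` function are all hypothetical. In particular, the sketch treats all family hosts as interchangeable, which the real module may or may not do.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical thumbnail hosts standing in for the real "family" CDNs.
FAMILY_HOSTS = {"thumbs.family-a.example", "thumbs.family-b.example"}

# Assumption: the asset id is the first run of 4+ digits in the URL path.
ASSET_RE = re.compile(r"(\d{4,})")


def group_dupes(videos):
    """Group video ids sharing a family-host asset id and an identical duration.

    `videos` is an iterable of dicts with "id", "thumb_url" and "duration".
    Returns lists of ids; each list has at least two members.
    """
    buckets = defaultdict(list)
    for v in videos:
        parsed = urlparse(v["thumb_url"])
        host = (parsed.hostname or "").lower()
        if host not in FAMILY_HOSTS:
            continue  # bare asset ids are reused across unrelated CDNs
        match = ASSET_RE.search(parsed.path)
        if not match or not v.get("duration"):
            continue  # no usable asset id or duration, so never group
        buckets[(match.group(1), v["duration"])].append(v["id"])
    return [ids for ids in buckets.values() if len(ids) > 1]
```

Keying on the (asset id, duration) pair keeps the match tight: a shared asset number alone is not enough to merge two records.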