fix(scheduler): bulk_dedup performers cross_source_only + hard-timeout (OOM)

_job_bulk_dedup_performers called run_bulk_dedup(strategy="performers")
without the cross_source_only guard whose docstring exists precisely to
prevent this OOM. At current catalog scale the unguarded path materializes
N²/2 pairs per prolific performer into a list → worker hit 6GB RSS and was
OOM-killed every 12h (05:00/17:00), taking down concurrent tpdb/stashdb/movie
ingests as killed_by_restart (0 new movies).

Verified in prod: 05:00 run now completes (885k pairs scored, no OOM) and
ingests succeed (stashdb +241, tpdb +175).

Also wrap in _run_with_timeout like tpdb/stashdb (job had no hard-timeout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent fad72e9cd6
commit 9d0cb7f26e
1 changed file with 10 additions and 2 deletions
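
To make the N²/2 figure from the commit message concrete, here is a minimal sketch of how the pair count grows and how restricting to cross-source pairs shrinks it. The record layout (`(source, record_id)` tuples) and the helper names `all_pairs` and `cross_source_pairs` are hypothetical, chosen only for illustration; the real `run_bulk_dedup` is not shown in this commit and may work quite differently.

```python
from collections import defaultdict
from itertools import combinations, product
from typing import Iterator

Record = tuple[str, int]  # (source, record_id): hypothetical layout


def all_pairs(records: list[Record]) -> Iterator[tuple[Record, Record]]:
    """Lazily yield every unordered pair: n*(n-1)/2 pairs for n records."""
    return combinations(records, 2)


def cross_source_pairs(records: list[Record]) -> Iterator[tuple[Record, Record]]:
    """Lazily yield only pairs whose two records come from different sources."""
    by_source: dict[str, list[Record]] = defaultdict(list)
    for rec in records:
        by_source[rec[0]].append(rec)
    # Pair up each two distinct sources, never two records of the same source.
    for src_a, src_b in combinations(sorted(by_source), 2):
        yield from product(by_source[src_a], by_source[src_b])


# Toy input: 4 records from "tpdb" and 2 from "stashdb".
records = [("tpdb", i) for i in range(4)] + [("stashdb", i) for i in range(2)]

# Count by iterating instead of building a list, so memory stays flat.
n_all = sum(1 for _ in all_pairs(records))
n_cross = sum(1 for _ in cross_source_pairs(records))
```

In this toy case the unguarded path produces 15 pairs and the cross-source path 8. The gap widens quadratically with the number of same-source records, and consuming the generator instead of calling `list(...)` on it avoids holding every pair in memory at once.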
@@ -210,8 +210,16 @@ def _job_bulk_dedup_performers() -> None:
     log.info("[scheduler] bulk_dedup performers starting")
     try:
         from app.scheduler.bulk_dedup import run_bulk_dedup
-        bc = run_bulk_dedup(strategy="performers", dry_run=False)
-        log.info("[scheduler] bulk_dedup performers done: %s", bc)
+        # cross_source_only=True: without this flag, pairwise generates N²/2 pairs per prolific
+        # performer, materialized into a list → worker OOM-killed every 12h (6GB RSS on a
+        # 7.6GB box, 2026-06-06), killing the concurrent tpdb/stashdb/ingests along the way.
+        # The flag narrows this to cross-source candidates (TPDB↔StashDB) with the candidate pre-filter.
+        # Timeout-wrapped like tpdb/stashdb: the job has no hard timeout of its own.
+        _run_with_timeout(
+            lambda: run_bulk_dedup(strategy="performers", dry_run=False, cross_source_only=True),
+            label="bulk-dedup-performers",
+        )
+        log.info("[scheduler] bulk_dedup performers done")
     except Exception:
         log.exception("[scheduler] bulk_dedup performers failed")
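
The diff calls `_run_with_timeout(fn, label=...)` but does not show its body. As a rough sketch of what such a wrapper could look like, assuming a thread-based approach: the name `run_with_timeout_sketch`, the `timeout_s` parameter, and its default are all hypothetical and not taken from this repository.

```python
import concurrent.futures
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("scheduler")


def run_with_timeout_sketch(
    fn: Callable[[], T], *, label: str, timeout_s: float = 3600.0
) -> Optional[T]:
    """Run fn in a worker thread; stop waiting for it after timeout_s seconds.

    Hypothetical stand-in for _run_with_timeout, whose real body is not shown.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        log.error("[scheduler] %s exceeded %.2fs timeout", label, timeout_s)
        return None
    finally:
        # Return to the caller without blocking on a stuck worker thread.
        executor.shutdown(wait=False)


fast = run_with_timeout_sketch(lambda: 42, label="fast-job", timeout_s=1.0)
slow = run_with_timeout_sketch(
    lambda: (time.sleep(0.5), "finished")[1], label="slow-job", timeout_s=0.05
)
```

Note the limitation of this design: Python cannot forcibly kill a thread, so the timed-out job keeps running in the background and keeps its memory. A wrapper meant to bound RSS as well as wall-clock time would more likely run the job in a subprocess that can be terminated.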