_candidate used OR logic (studio OR date±7d OR dur±30s), producing 938,950 pairs; Etap-2 scoring at ~110/s never finished within 1800s (938,950 pairs at that rate need roughly 8,500s), so bulk_dedup_performers HUNG on every run and the orphan thread leaked until restart. Require AND instead: same studio plus (date±2d OR dur±30s). 939k→16k pairs, full run 213s. A real cross-source dup of one master shares studio plus a near date or duration; the rare studio_id-mismatch pairs are skipped on purpose: a job that COMPLETES beats one that times out merging nothing. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|---|---|---|
| __init__.py | ||
| browse_latest.py | ||
| bulk_dedup.py | ||
| deep_crawl.py | ||
| jobs.py | ||
| performer_driven.py | ||
| taxonomy_counts.py | ||
| worker.py | ||
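
The candidate-rule change described in the commit message can be sketched roughly as below. This is only an illustration, not the code in bulk_dedup.py: the `Scene` record, its field names, and the helper functions are all hypothetical. Only the thresholds (±7d and ±2d on date, ±30s on duration) and the OR-versus-AND structure come from the commit message.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Scene:
    # Hypothetical record shape; the real model's fields are not shown in the source.
    studio_id: Optional[int]
    released: Optional[date]
    duration_s: Optional[int]


def _same_studio(a: Scene, b: Scene) -> bool:
    # A missing studio_id never counts as a match.
    return a.studio_id is not None and a.studio_id == b.studio_id


def _near_date(a: Scene, b: Scene, max_days: int) -> bool:
    if a.released is None or b.released is None:
        return False
    return abs((a.released - b.released).days) <= max_days


def _near_duration(a: Scene, b: Scene, max_seconds: int = 30) -> bool:
    if a.duration_s is None or b.duration_s is None:
        return False
    return abs(a.duration_s - b.duration_s) <= max_seconds


def is_candidate_or(a: Scene, b: Scene) -> bool:
    """Old rule: any single signal makes the pair a candidate."""
    return _same_studio(a, b) or _near_date(a, b, 7) or _near_duration(a, b)


def is_candidate_and(a: Scene, b: Scene) -> bool:
    """New rule: same studio AND (date within 2 days OR duration within 30 s)."""
    return _same_studio(a, b) and (_near_date(a, b, 2) or _near_duration(a, b))
```

Under the old rule, two unrelated items that merely fall within the same week become a pair, which is what inflates the candidate set. Under the new rule the studio check acts as a gate, so the date and duration comparisons only apply to pairs that already share a studio.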