_candidate used OR logic (studio OR date ±7 d OR duration ±30 s), which produced 938,950 candidate pairs. Etap-2 (stage-2) scoring runs at ~110 pairs/s, roughly 8,500 s of work, so it never finished within the 1,800 s limit. As a result, bulk_dedup_performers hung on every run, and the orphaned thread leaked until restart.

Require AND instead: same studio plus (date ±2 d OR duration ±30 s). This cuts candidates from ~939k to ~16k pairs, and a full run now takes 213 s. A real cross-source duplicate of one master shares the studio and has a nearby date or duration. The rare pairs with mismatched studio_id are skipped on purpose: a job that completes beats one that times out without merging anything.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
| Name |
|---|
| api |
| connectors |
| extractors |
| models |
| normalize |
| resolve |
| scheduler |
| templates |
| __init__.py |
| auth.py |
| config.py |
| db.py |
| ingest.py |
| main.py |
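
The candidate rule described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the `Record` type, its field names (`studio_id`, `release_date`, `duration_s`), and the helpers `candidate_pairs` and `_window_pairs` are hypothetical. The sketch blocks on studio first. Inside each studio it takes the union of two sorted sliding-window scans (date within ±2 days, duration within ±30 s), so no cross-studio pair is ever generated.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape; the real model's field names may differ.
    id: int
    studio_id: Optional[int]
    release_date: Optional[date]
    duration_s: Optional[int]


DATE_WINDOW = timedelta(days=2)  # "date ±2d"
DURATION_WINDOW_S = 30           # "dur ±30s"


def _window_pairs(items, key, width):
    """Yield (low_id, high_id) for items whose key values differ by <= width.

    Sorts once, then scans forward from each item only while the gap stays
    inside the window, so cost is O(n log n + emitted pairs).
    """
    ordered = sorted(items, key=key)
    for i, left in enumerate(ordered):
        for j in range(i + 1, len(ordered)):
            right = ordered[j]
            if key(right) - key(left) > width:
                break  # sorted ascending: every later item is even farther away
            yield (min(left.id, right.id), max(left.id, right.id))


def candidate_pairs(records: Iterable[Record]) -> set[tuple[int, int]]:
    """Same studio AND (date within ±2 days OR duration within ±30 s)."""
    by_studio = defaultdict(list)
    for r in records:
        # Assumption: records without a studio are never paired.
        if r.studio_id is not None:
            by_studio[r.studio_id].append(r)

    pairs: set[tuple[int, int]] = set()
    for group in by_studio.values():
        dated = [r for r in group if r.release_date is not None]
        timed = [r for r in group if r.duration_s is not None]
        # The OR lives inside the studio block: union of the two window scans.
        pairs.update(_window_pairs(dated, lambda r: r.release_date, DATE_WINDOW))
        pairs.update(_window_pairs(timed, lambda r: r.duration_s, DURATION_WINDOW_S))
    return pairs


if __name__ == "__main__":
    demo = [
        Record(1, 10, date(2024, 1, 1), 1800),
        Record(2, 10, date(2024, 1, 3), 2400),  # same studio, date 2 days from #1
        Record(3, 10, date(2024, 3, 1), 1820),  # same studio, duration 20 s from #1
        Record(4, 20, date(2024, 1, 1), 1800),  # other studio: never paired
    ]
    print(sorted(candidate_pairs(demo)))  # [(1, 2), (1, 3)]
```

Applying the studio block before the OR of the two windows is what keeps the pair count small. Under the old rule (studio OR date OR duration), a studio match alone qualified a pair, so a studio with n records contributed all n(n−1)/2 of its pairs regardless of date or duration.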