goon/app/scheduler
jtrzupek 2b602beea5 fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V)
_candidate used OR logic (studio OR date±7d OR dur±30s) → 938,950 pairs;
Stage-2 scoring at ~110/s never finished within 1800s → bulk_dedup_performers
HUNG on every run, and the orphaned thread leaked until restart. Require AND: same
studio plus (date±2d OR dur±30s). 939k→16k pairs, full run 213s. A real cross-source
dup of one master shares studio + near date/duration; rare studio_id-mismatch pairs
are skipped on purpose: a job that COMPLETES beats one that times out merging nothing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:03:33 +02:00
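The prefilter change above can be sketched as follows. This is a hypothetical illustration, not the code in bulk_dedup.py. The `Record` fields, `is_candidate`, `candidate_pairs`, and `stage2_seconds` are invented names, and the "different source" check is an assumption based on the `cross_source_only` flag mentioned in the jobs.py commit. Only the AND rule (same studio plus date±2d OR dur±30s) and the ~110/s rate come from the commit message.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from itertools import combinations
from typing import Iterable, Iterator, Optional, Tuple

DATE_WINDOW_DAYS = 2      # date±2d (was ±7d under the old OR rule)
DURATION_WINDOW_S = 30    # dur±30s
STAGE2_RATE_PER_S = 110   # observed Stage-2 scoring rate, per the commit message


@dataclass(frozen=True)
class Record:
    # Hypothetical shape of one item being deduplicated.
    source: str
    studio_id: Optional[int]
    release_date: Optional[date]
    duration_s: Optional[int]


def _near_date(a: Record, b: Record) -> bool:
    if a.release_date is None or b.release_date is None:
        return False
    return abs((a.release_date - b.release_date).days) <= DATE_WINDOW_DAYS


def _near_duration(a: Record, b: Record) -> bool:
    if a.duration_s is None or b.duration_s is None:
        return False
    return abs(a.duration_s - b.duration_s) <= DURATION_WINDOW_S


def is_candidate(a: Record, b: Record) -> bool:
    """AND prefilter: same studio AND (near date OR near duration)."""
    if a.source == b.source:  # assumed cross-source-only behaviour
        return False
    # A missing or mismatched studio_id is skipped on purpose.
    if a.studio_id is None or a.studio_id != b.studio_id:
        return False
    return _near_date(a, b) or _near_duration(a, b)


def candidate_pairs(records: Iterable[Record]) -> Iterator[Tuple[Record, Record]]:
    """Bucket by studio so cross-studio pairs are never enumerated at all."""
    by_studio = defaultdict(list)
    for r in records:
        if r.studio_id is not None:
            by_studio[r.studio_id].append(r)
    for group in by_studio.values():
        for a, b in combinations(group, 2):
            if is_candidate(a, b):
                yield a, b


def stage2_seconds(n_pairs: int, rate: float = STAGE2_RATE_PER_S) -> float:
    """Rough Stage-2 wall time: pairs divided by scoring throughput."""
    return n_pairs / rate
```

Because "same studio" is now mandatory, the sketch groups records by studio before pairing. That blocking step is an illustrative design choice and is not confirmed by the commit. The arithmetic matches the symptoms: `stage2_seconds(938_950)` is about 8,536 s, far past the 1800 s limit, while `stage2_seconds(16_000)` is about 145 s.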
__init__.py Initial commit 2026-05-20 10:10:22 +02:00
browse_latest.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
bulk_dedup.py fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V) 2026-06-08 10:03:33 +02:00
deep_crawl.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
jobs.py fix(scheduler): bulk_dedup performers cross_source_only + hard-timeout (OOM) 2026-06-07 11:00:19 +02:00
performer_driven.py refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
taxonomy_counts.py fix(scenes): propagate playback duration to Scene + duration-consistent counts 2026-06-01 21:31:01 +02:00
worker.py feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) 2026-06-03 09:26:44 +02:00