bug-report 2026-06-03 ("same duration, same thumbnail, why don't they merge"):
duplicate scenes are not merged at ingest. Exact phash alone is noisy here: 95% of
exact matches are collisions on shared thumbnails/intro frames between different
scenes, and the bulk_dedup scorer correctly gives 0 auto-merges for them. The safe
subset is exact-phash AND same duration (±3s) AND shared performer/title, which is
near-certainly the same scene. Same duration is the key condition: it excludes the
false-merge pattern, because a short clip and its full scene have DIFFERING durations.
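
A minimal sketch of the safe-subset rule, for illustration only. The `Scene`
type, its field names, and `is_safe_exact_dupe` are hypothetical and not taken
from the codebase; "shared performer/title" is read here as "at least one shared
performer OR an identical title", which is an assumption.

```python
from dataclasses import dataclass

DURATION_TOLERANCE_S = 3.0  # the ±3s window from the description above


@dataclass(frozen=True)
class Scene:
    # Hypothetical shape; the real model may differ.
    phash: str
    duration_s: float
    title: str
    performers: frozenset = frozenset()


def is_safe_exact_dupe(a: Scene, b: Scene, tol: float = DURATION_TOLERANCE_S) -> bool:
    """True only if all three conditions of the safe subset hold."""
    # 1. Exact perceptual-hash match (empty hashes never match).
    if not a.phash or a.phash != b.phash:
        return False
    # 2. Same duration within the tolerance; this rejects clip-vs-full pairs.
    if abs(a.duration_s - b.duration_s) > tol:
        return False
    # 3. Shared performer or identical (case-insensitive, non-empty) title.
    shared_performer = bool(a.performers & b.performers)
    title_a = a.title.strip().casefold()
    title_b = b.title.strip().casefold()
    same_title = title_a != "" and title_a == title_b
    return shared_performer or same_title


full = Scene("abc", 1800, "Title A", frozenset({"x"}))
dup = Scene("abc", 1802, "title a", frozenset({"y"}))
clip = Scene("abc", 300, "Title A", frozenset({"x"}))
other = Scene("abc", 1801, "Other", frozenset({"z"}))
```

Here `dup` qualifies against `full` (same hash, 2s apart, same title), `clip` is
rejected on duration, and `other` is rejected for sharing neither performer nor
title.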
- scripts/merge_phash_exact_dupes.py: one-off, dry-run by default; re-fetches each
pair before merging (handles clusters). Applied: 30 merged.
- bulk_dedup: add `_pairs_exact_phash` (SQL O(N log N), not the O(N²) Hamming scan)
+ strategy "phash_exact", gated by the normal scorer (surfaces review candidates,
no risky auto-merge) and schedulable for ongoing exact-collision review.
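
A rough sketch of how exact-phash pairs can be found with an indexed SQL
self-join instead of a pairwise Hamming scan. This is not the actual
`_pairs_exact_phash`; the `scenes` table, its columns, and the use of SQLite are
assumptions made for the example.

```python
import sqlite3


def pairs_exact_phash(conn: sqlite3.Connection) -> list:
    """Return (id_a, id_b) pairs with identical phash, each pair once."""
    # The equality join can use the phash index; a.id < b.id drops
    # self-pairs and mirrored duplicates.
    return conn.execute(
        """
        SELECT a.id, b.id
        FROM scenes AS a
        JOIN scenes AS b
          ON a.phash = b.phash AND a.id < b.id
        WHERE a.phash IS NOT NULL
        ORDER BY a.id, b.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scenes (id INTEGER PRIMARY KEY, phash TEXT)")
conn.execute("CREATE INDEX idx_scenes_phash ON scenes (phash)")
conn.executemany(
    "INSERT INTO scenes (id, phash) VALUES (?, ?)",
    [(1, "aa"), (2, "aa"), (3, "bb"), (4, "aa"), (5, None)],
)
pairs = pairs_exact_phash(conn)
print(pairs)
```

Scenes 1, 2 and 4 share a hash, so the cluster yields three pairs; every pair
would still go through the normal scorer before any merge. Note that the number
of returned pairs grows quadratically within a single large collision group, so
the lookup cost is only part of the total.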
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Goon — self-hosted aggregator for adult-content scene metadata.
Indexes scenes from TPDB, StashDB, and 30+ public adult tube sites.
Cross-source deduplication via perceptual hash + Levenshtein distance.
FastAPI backend + APScheduler worker + React Native (Expo) mobile client.
FOSS, ad-free, donation-funded. See README for details.