goon/app
jtrzupek 4922646011 feat(dedup): merge exact-phash + same-duration + shared-performer duplicates
bug-report 2026-06-03 ("same duration, same thumbnail, why aren't they merging"):
duplicate scenes were not merged at ingest. Exact phash alone is noisy here: 95% of
exact matches are collisions on shared thumbnails/intro frames (different scenes),
and the bulk_dedup scorer correctly gives 0 auto-merges. The safe subset is exact
phash AND same duration (±3s) AND shared performer/title: near-certain same scene.
Same duration is the key condition: it excludes the false-merge pattern (short clip
vs. full scene, which have DIFFERING durations).
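
The safe-subset rule above can be sketched as a single predicate. This is a minimal illustration, not the code in this commit: the `Scene` record, its field names, and the function name are hypothetical, and "shared performer/title" is assumed to mean "at least one performer in common, or an identical non-empty title".

```python
from dataclasses import dataclass

DURATION_TOLERANCE_S = 3  # the ±3s window quoted in the commit message


@dataclass(frozen=True)
class Scene:
    # Hypothetical minimal record; the real model lives elsewhere in the app.
    id: int
    phash: str
    duration_s: float
    title: str = ""
    performers: frozenset = frozenset()


def is_safe_exact_duplicate(a: Scene, b: Scene) -> bool:
    """Return True only when all three conditions hold for the pair."""
    # 1. Exact perceptual-hash equality (no Hamming distance involved).
    if a.phash != b.phash:
        return False
    # 2. Same duration within tolerance; rejects short-clip-vs-full pairs.
    if abs(a.duration_s - b.duration_s) > DURATION_TOLERANCE_S:
        return False
    # 3. Assumed reading of "shared performer/title": either one suffices.
    shared_performer = bool(a.performers & b.performers)
    title_a = a.title.strip().casefold()
    title_b = b.title.strip().casefold()
    same_title = bool(title_a) and title_a == title_b
    return shared_performer or same_title
```

A pair that shares a phash and a performer but differs by minutes in duration fails at step 2, which is the false-merge case the message describes.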

- scripts/merge_phash_exact_dupes.py: one-off, dry-run by default, per-pair re-fetch
  (handles clusters). Applied: 30 merged.
- bulk_dedup: add `_pairs_exact_phash` (SQL O(N log N), not the O(N²) Hamming scan)
  + strategy "phash_exact" — gated by the normal scorer (surfaces review candidates,
  no risky auto-merge), schedulable for ongoing exact-collision review.
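
To illustrate why exact-phash pairing can be left to SQL instead of a pairwise Hamming scan, here is a hedged sketch using SQLite. The `scenes` table, its columns, the index name, and the `pairs_exact_phash` function are assumptions for the example; the actual `_pairs_exact_phash` and schema are not shown in this listing. With an index on `phash`, the equality self-join looks up candidates per row rather than comparing every scene against every other.

```python
import sqlite3


def pairs_exact_phash(conn: sqlite3.Connection) -> list:
    """Return (id_a, id_b) pairs with id_a < id_b and identical phash."""
    rows = conn.execute(
        """
        SELECT a.id, b.id
        FROM scenes AS a
        JOIN scenes AS b
          ON a.phash = b.phash   -- equality join; can use the phash index
         AND a.id < b.id         -- emit each unordered pair once
        ORDER BY a.id, b.id
        """
    ).fetchall()
    return [(row[0], row[1]) for row in rows]


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema, only for this example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scenes (id INTEGER PRIMARY KEY, phash TEXT)")
    conn.execute("CREATE INDEX idx_scenes_phash ON scenes (phash)")
    conn.executemany(
        "INSERT INTO scenes (id, phash) VALUES (?, ?)",
        [(1, "aa"), (2, "bb"), (3, "aa"), (4, None), (5, "aa")],
    )
    return conn


if __name__ == "__main__":
    print(pairs_exact_phash(make_demo_db()))
```

Rows with a NULL phash never pair, because `NULL = NULL` is not true in SQL. A cluster of three scenes with the same hash yields three pairs, so candidates from such clusters still need the scorer (or a per-pair re-fetch, as the one-off script does) before any merge.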

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:08:06 +02:00
..
api fix(playback): mark deleted sxyprn posts dead + rank native sources first 2026-06-07 14:09:01 +02:00
connectors refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" 2026-06-07 16:54:55 +02:00
extractors fix(playback): mark deleted sxyprn posts dead + rank native sources first 2026-06-07 14:09:01 +02:00
models perf(taxonomy): denormalize scene_count for tags/performers/studios 2026-05-31 17:53:48 +02:00
normalize feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog 2026-06-01 15:07:35 +02:00
resolve fix(tags): merge <base>2 numbered-duplicate tags + prevent regeneration 2026-06-06 23:18:44 +02:00
scheduler feat(dedup): merge exact-phash + same-duration + shared-performer duplicates 2026-06-07 20:08:06 +02:00
templates feat(seo): public HTML SEO router + templates; add CLAUDE.md; ignore .nimbalyst 2026-05-31 16:29:59 +02:00
__init__.py Initial commit 2026-05-20 10:10:22 +02:00
auth.py Initial commit 2026-05-20 10:10:22 +02:00
config.py feat(ingest): skip <180s tube scenes (trailers) + purge porndoe trailer orphans 2026-06-03 10:11:25 +02:00
db.py Initial commit 2026-05-20 10:10:22 +02:00
ingest.py feat(ingest): skip <180s tube scenes (trailers) + purge porndoe trailer orphans 2026-06-03 10:11:25 +02:00
main.py fix(apk 0.2.1): in-app installer "nothing happens" + oo launcher icon 2026-05-31 13:15:37 +02:00