Commit graph

28 commits

Author SHA1 Message Date
jtrzupek
f014a901de feat(scheduler): periodic title+duration dedup (missing-merge tube dupes)
Missing-merge duplicates (same performer + identical normalized title + identical duration-to-the-second) that bulk_dedup misses — tube re-scrapes and cross-tube re-ingests like porn00 pulling a video already present from xnxx (reports 28fe8181/32df33b1). Extracted the proven merge_exact_title_duration logic into app/scheduler/title_duration_dedup.py (script now a thin wrapper), wired a 12h scheduler job (playback-only = what users actually see, GOON_SCHED_TITLE_DEDUP_HOURS). Signal is near-certain (two different videos don't share byte-identical title AND exact duration); no shared performer = not merged (over-match guard). Verified: job registers (jobs=14), backlog currently 0 after the one-shot global merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:20:48 +02:00
jtrzupek
8b4783771f feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx)
The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as
these sibling tubes re-ingest under new titles. Wire the asset-id+duration merge into
the scheduler (every 12h, GOON_SCHED_THUMB_DEDUP_HOURS, 0=off) so it stays clean.

Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot
script now imports it. Same tight signature as the cleanup: family hosts only + identical
duration (the bare asset-id number is reused across unrelated CDNs, so cross-host/diff-
duration grouping is excluded). Reports 205b17d9 / 5a2944cb.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:56:45 +02:00
jtrzupek
b5d9473898 feat(scripts): merge tube dupes by thumbnail asset-id (hdporn.gg/fullmovies.xxx family)
These sibling platforms share one video-id space and ingest the same video under
different titles, which bulk_dedup misses (different titles, no phash). Match by the
asset-id in the thumbnail path (/<bucket>000/<id>/) on img.hdporn.gg|fullmovies.xxx plus
identical duration, and merge. Hard host restriction + duration guard: the bare number
is reused for unrelated videos on other CDNs (verified via dry-run), so cross-host or
different-duration grouping is excluded. Run scoped (studio id) or global; dry-run by
default. Reports 205b17d9 / 5a2944cb. Ran on Parasited: 43 pairs merged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:18:44 +02:00
jtrzupek
a9f0f94321 feat(sxyprn): mark dead posts during thumbnail refresh sweep
resolve_post() now distinguishes "Post Not Found" (mark dead_at — the
link wouldn't play anyway) from a live page with no fresh poster (leave
untouched), on top of the existing thumbnail refresh. Batched into
refresh_batch() with refreshed/dead/untouched counters.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:20:28 +02:00
jtrzupek
fef28ae56b feat(sxyprn): refresh rotting thumbnails from live post pages + scheduled job
CORRECTION to earlier "unrecoverable" call: the /post/<id> page is alive (200) and
DOES expose the scene's own fresh-signed poster via og:image / <video poster>
(post-id embedded, current timestamp) — only the STORED thumbnail URL had rotted.
Search/listings don't re-surface old posts (0 overlap), but per-post fetch works.

scripts/refresh_sxyprn_thumbs.py: iterate live sxyprn sources, fetch post page,
extract fresh og:image, UPDATE thumbnail_url (verified: refreshed URLs return 200).
_job_refresh_sxyprn_thumbs: every 12h refresh the 1200 least-recently-updated sources
(cycles the ~19k catalog within the expiry window). Pairs with the scene_resolver
overwrite fix so refreshed thumbnails stick.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:36:30 +02:00
jtrzupek
576a424615 fix(scripts): force UTF-8 stdout in publish_update — stop false exit-1
Final Polish-char print crashed with UnicodeEncodeError on Windows cp1252 stdout
AFTER a successful publish, making exit code 1 misleading. Reconfigure stdout/stderr
to UTF-8 up front.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:58:43 +02:00
jtrzupek
a9545a7ab2 feat(scripts): merge_exact_title_duration --playback-only + progress logging
--playback-only restricts to scenes with live playback (app-visible dupes only).
Progress print every 500 merges for long global runs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:02:19 +02:00
jtrzupek
e23e2d1f17 fix(merge): move playback_sources on scene merge + exact-title+duration dedup
merge_scenes never reassigned playback_sources → ON DELETE CASCADE dropped them
with the absorbed scene. Cross-source (canonical) merges rarely had tube playback
so it hid, but tube-dup merges silently LOST playback links. Add _move_playback_sources
(global unique (origin,page_url) guarantees no collision on reassign).

+ merge_exact_title_duration.py: catches missing-merge dupes bulk_dedup misses
(same performer + identical normalized title + identical duration_sec, no phash).
Bad Bella had 25 such pairs (bug-report ef92809d "duplikat, te same miniatury").

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:56:50 +02:00
jtrzupek
d4b89f16e3 fix(scripts): backfill arg parser consumed --workers value as LIMIT
'--workers 3' set limit=3 because the bare '3' also hit the isdigit() branch.
Skip flag-value positions when scanning for a positional LIMIT.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:15:09 +02:00
jtrzupek
7bf1fd6716 fix(xvideos): parse model name from nested span.name — recover 0-performer scenes
xvideos renders the scene's models as `<a href="/models/slug">...<span class="name">
Display Name</span>...`. The old _MODEL_RE wanted text immediately after the anchor
`>` and never matched current markup → browse-scraped scenes landed with 0 performers
(bug-report 2026-06-07: "no actors, but Rebecca Johnson is on the page"). New regex
captures slug + nested span.name, bounded within the anchor. + backfill script for the
~11.9k existing zero-performer xvideos scenes (54% have a real /models/ link; resolver
merges names to canonical by name_normalized).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:13:21 +02:00
jtrzupek
9f46e8dea9 feat(scripts): dedup_n2_canonical — resolve n=2 false-merges via canonical duration
audit_false_merges only auto-fixes n>=3 (majority disambiguates the outlier); n=2
was "needs human review" — but the merge-review UI is gone, nobody triages 500+.
Measured: of 535 n=2 duration-divergent scenes, ALL have a canonical scene.duration_sec
(TPDB/StashDB) and 531 have exactly one source matching canonical (±20%) + one >2x off
→ unambiguous false-merge. Kill the off source (works both directions since canonical is
corroborated by the matching keeper, unlike the Omar-case the n>=3 audit guards against).

Applied: 529 sources marked dead (4 ambiguous skipped). Reversible (dead_at).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:25:10 +02:00
jtrzupek
4922646011 feat(dedup): merge exact-phash + same-duration + shared-performer duplicates
bug-report 2026-06-03 ("ten sam czas, ta sama miniaturka, czemu się nie mergują"):
duplicate scenes not merged at ingest. Exact phash alone is noisy here (95% are
collisions on shared thumbnails/intro frames — different scenes; bulk_dedup scorer
correctly gives 0 auto-merge). The safe subset is exact-phash AND same duration
(±3s) AND shared performer/title — near-certain same scene. Same-duration is key:
it excludes the false-merge pattern (short-clip-vs-full has DIFFERING durations).

- scripts/merge_phash_exact_dupes.py: one-off, dry-run by default, per-pair re-fetch
  (handles clusters). Applied: 30 merged.
- bulk_dedup: add `_pairs_exact_phash` (SQL O(N log N), not the O(N²) Hamming scan)
  + strategy "phash_exact" — gated by the normal scorer (surfaces review candidates,
  no risky auto-merge), schedulable for ongoing exact-collision review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:08:06 +02:00
jtrzupek
d5409d01ce feat(scripts): audit_teaser_only — hide scenes whose only source is a teaser
bug-report 2026-06-01 (48d6cc6b): scene shows canonical duration from TPDB
(real 22min studio scene) but the only live playback_source is a short tube
teaser (xnxx 21s) → "shows 22m, plays <1m". When ALL live sources are a tiny
fraction (<15%) of a known canonical (>300s), the scene has no real playback;
mark those sources dead → scene becomes orphan → hidden (has_playback=false),
consistent with the orphan-hiding policy. Reversible (dead_at), conservative
(skips scenes with any unknown-duration or full-length live source).

Applied on prod: 182 sources dead across 174 scenes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:52:44 +02:00
jtrzupek
a196fcbcdb refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper"
The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest,
performer-driven) was "pornapp" — a misleading leftover from the removed external
porn-app API. It read like a dependency on a third-party "pornapp" service; it is
not — these are our own scrapers hitting 25+ tubes directly (kind=scraper,
origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME
constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs +
external_records history stays linked. No behavior change — external_id keying
(sitetag:url) and dedup are unaffected.

NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy
format (resolve_playback parses it) and is intentionally left untouched.

Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:54:55 +02:00
jtrzupek
fad72e9cd6 fix(tags): merge <base>2 numbered-duplicate tags + prevent regeneration
TPDB taxonomy emits numbered-duplicate tags (name "Bubble Butt2"); slugify
yields "bubble-butt2" (no separator before digit), so resolve_tag created a
separate tag alongside "bubble-butt". Tube scenes inherited the dup via
scene-merge → 75 pairs, ~10k scene_tags on the wrong tag.

- resolve_tag: canonicalize "<base>2" -> "<base>" when base exists (handles
  current + future; trailing-"2"+alpha guard leaves milf-30/teen18 intact)
- scripts/merge_dup2_tags.py: one-off bulk merge (scene_tags + movie_tags +
  blacklist) and taxonomy-count refresh

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:18:44 +02:00
jtrzupek
4f0fb1636c chore(scripts): tube SSR-richness survey probe
Ad-hoc research tool: for a list of candidate tubes, fetch a listing page, grab a scene
URL, and classify the detail — reachable / JSON-LD VideoObject / duration / performers /
tags. Used 2026-06-03 to evaluate deep-crawl candidates (redtube + drtuber look strong;
pornhub/spankbang/porntrex/hqporner/youporn rejected; nuvid/motherless bare).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 11:23:49 +02:00
jtrzupek
7e46e5ac48 feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)
We ingested only ~3% of each browse tube's catalog (porndoe >62k scenes; we had 1959)
because tubes were hit only by performer-search + top-N browse. Pilot (porndoe pages
64-110): 1119 new scenes, 100% playable + 100% tagged, 0% canonical overlap (purely
additive — content not in TPDB/StashDB).

- app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor
  in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor,
  idempotent (resolver skips known by raw_hash), mark 'exhausted' at catalog end then
  reset cursors for an incremental re-sweep.
- _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h
  hard-timeout; registered in build_scheduler (jobs=10).
- config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path.
- scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:26:44 +02:00
jtrzupek
5896b58688 fix(ota): make publish_update.py work one-shot on Windows git-bash
Publishing the OTA from Windows git-bash failed at the scp step (2026-06-02):
- git-bash (MSYS) rewrote the /root/... env path to 'C:/Program Files/Git/root/...'
  before Python saw it → upload targeted a bogus remote dir.
- scp local source 'C:\...\dist' is parsed as host 'C' (drive letter = host).

Fixes: default runtime 1.0→1.1 (active channel, app.json runtimeVersion=1.1); scp
source passed as '.' with cwd=DIST (no drive letter); MSYS_NO_PATHCONV=1 in subprocess
env; defensive un-mangle of a git-bash-converted VPS_BASE.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 09:56:34 +02:00
jtrzupek
817b50fbf8 fix(scenes): propagate playback duration to Scene + duration-consistent counts
Scene.duration_sec was NULL for ~74% of playable scenes (tube duration lives on
playback_source, never propagated to Scene), so the mobile min_duration_sec=60 filter
(Scene.duration_sec >= 60; NULL fails) silently hid them — surfaced as '119 in favorites,
14 after entering the performer' (Safira Yakkuza).

- resolver: _effective_duration() falls back to max live playback_source duration when the
  connector provides no scene-level duration (forward fix, used in create + update).
- scripts/backfill_scene_duration_from_playback.py: one-off idempotent backfill (recovered
  204,014 scenes).
- taxonomy_counts: scene_count now counts playable AND duration_sec >= 60, matching the
  always-60s-filtered scene lists, so favorites/performer/studio/tag badges agree with what
  the scene screen actually shows (Safira: 39 == 39).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 21:31:01 +02:00
jtrzupek
cd12348782 fix(movies): paradisehill delta date-granularity + browse cadence docs
- paradisehill.fetch_movies compared release_date coerced to midnight against the
  `since` timestamp, so the chronological crawl stopped at the first upload dated
  the same calendar day as `since` and silently dropped most new movies (0-2 seen
  per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; idempotent
  external_records upsert dedups the re-fetched recent window.
- scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the
  backlog missed during the bug (idempotent, resumable).
- docs: correct stale 'raz dziennie/24h' browse-latest comments to 6h (4x/day), the
  actual configured cadence (config.py sched_browse_latest_hours=6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 17:00:10 +02:00
jtrzupek
da7fcda132 feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog
Resolver/perf:
- find_by_phash_within: nearest match via Postgres bit_count over bit(64) XOR
  instead of Python scan of all phash fingerprints (~20x faster per scene;
  unblocks long delta runs that were killed mid-run before since advanced).

Scheduler/reliability:
- reap ingest_runs stuck in 'running' on worker startup (killed_by_restart).
- smoke_test: per-source ingest health, stuck-run and browse-freshness checks
  -> Sentry; exclude killed_by_restart from the failed-run alarm.

Tags (ingest with tags + fill blanks):
- wire infer_tag_slugs into normalize_scene so tube scenes get title-inferred
  tags (was dead code); union with connector tags.
- scripts/backfill_inferred_tags.py: keyset/batched/idempotent backfill for
  existing tagless scenes (playable tag coverage 16% -> ~52%).

Clip-store:
- skip ManyVids/IWantClips/Clips4Sale/... from canonical sources at ingest
  (GOON_SKIP_CLIP_STORE, default on) — permanent orphans, ~56% of canonical
  ingest, never have a free-tube playback source.

Browse tubes:
- enable fullmovies + hdporn.gg: studio parsed from title prefix instead of
  the /networks/ sidebar (which always yielded the first listed network);
  drop phash compute (pilot: 0% canonical hit within Hamming 5 — auto-screenshots),
  matching relies on title/performer/duration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 15:07:35 +02:00
jtrzupek
ee83ae5e97 scripts: add gated --fix to false-merge audit (short-clip outliers)
Opt-in remediation for the duration-inconsistent scenes found by the audit.
Scope is deliberately narrow and reversible:

- only scenes with >=3 duration-bearing sources AND max/min ratio > 3x
- anchored on scene.duration_sec (the canonical value), never the median of
  sources (a median is wrong when several bogus short clips outvote the real
  full-length source)
- marks dead ONLY sources that are >2x SHORTER than the canonical — a falsely
  merged source is almost always a short SEO clip/preview. Sources longer than
  the canonical are left alone, since an over-long outlier more often means the
  canonical duration itself is too low (so killing the long source would drop
  the real video); those stay for manual review.
- guards that at least one live source remains
- dry-run by default; --yes to apply; sets dead_at (reversible), not delete

First run marked 514 short-clip sources dead across 228 scenes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 11:30:23 +02:00
jtrzupek
ee1d0c7610 scripts: add false-merge audit (duration-inconsistent scenes)
Read-only data-quality audit for scene merges made before the 2026-05-12
scoring hardening (which now caps weak-signal aggregator matches at 0.85 and
tightened the duration bump to <=3s). The auto-merge candidate log does not
record which external_ref was attached, so a merge cannot be reversed from the
log alone. Instead this detects false merges by their effect: a scene that
absorbed a different video ends up with playback_sources of inconsistent
durations (e.g. a 60s clip alongside a 2h source).

Reports counts + severity buckets by max/min duration ratio, can list the worst
offenders with a per-source breakdown, and can export suspects to JSON. Mutates
nothing — remediation (detach/mark-dead the outlier source) is left as an
explicit, separately-decided step because short durations can be legitimate
(previews) and n=2 scenes are ambiguous about which source is canonical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 11:23:10 +02:00
jtrzupek
64506690df fix(ota+mobile): strip expo-font from bundle, runtime back to 1.0
DIAGNOZA NA EMULATORZE (emulator-5554, goon-v0.1.9.apk):
Dwa błędne założenia z poprzednich sesji obalone empirycznie:

1. RUNTIME: APK ma EXPO_RUNTIME_VERSION="1.0" (NIE 0.1.9 — pomyliłem versionName
   z runtime). App akceptuje TYLKO manifest runtime 1.0. Mój wcześniejszy
   "fix" na 0.1.9 (c19da51) był wstecz — app go ignorował. Cofnięte: app.json
   + publish_update RUNTIME_DEFAULT z powrotem na "1.0".

2. CRASH: prawdziwa przyczyna "nic się nie pojawia" — OTA bundle z expo-font
   crashował: "Cannot find native module 'ExpoFontLoader'" → expo-updates
   ErrorRecovery rollback. APK (build 22-maja) nie ma natywnego ExpoFontLoader
   (expo-font dodany 30-maja, PO buildzie APK). OTA NIE MOŻE dostarczyć native
   modułu. Potwierdzone: embedded bundle + served bundle grep = 0 ExpoFontLoader;
   stary font-bundle crashował, font-stripped NIE.

FIX: usunięto useFonts z App.tsx + expo-font import; theme.fonts → undefined
(system font); SceneTile/MoviePosterCard/navigation/GoonWordmark fontFamily →
fontWeight. Wszystko inne (2-col grid, oxblood, logo SVG-RNSVG-jest-w-APK)
zostaje. Custom fonty wrócą przy rebuildzie APK z expo-font (option B).

ZWERYFIKOWANE: bundle d5b87e5c (runtime 1.0, 0 ttf) — emulator launch:
`ReactNativeJS: Running "main"`, zero JS errors, brak ExpoFontLoader crash.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:41:29 +02:00
jtrzupek
c19da51aff fix(ota): publish under runtimeVersion 0.1.9 to match installed APK
ROOT CAUSE wszystkich "znikajacych" OTA (2026-05-29..30, ~6 publishow w prozni):
zainstalowany APK ma EXPO_RUNTIME_VERSION=0.1.9 (AndroidManifest), ale app.json
mialo runtimeVersion "1.0" i publish_update.py defaultowal --runtime 1.0.
Updaty ladowaly w /expo-updates/1.0/, a app z headerem expo-runtime-version:0.1.9
dostawal HTTP 204 (no update) i nigdy nic nie aplikowal mimo "OK live".

Fix:
- app.json runtimeVersion "1.0" -> "0.1.9" (== APK)
- publish_update.py RUNTIME_DEFAULT "1.0" -> "0.1.9"
- Republished caly skumulowany bundle pod 0.1.9 (ce275235) — zweryfikowane:
  manifest dla expo-runtime-version:0.1.9 zwraca 200 + runtimeVersion:0.1.9 +
  bundle 4.76MB serwuje 200.

Stary /expo-updates/1.0/ (~40 nieaplikowanych updateow) do usuniecia osobno.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 10:59:07 +02:00
https://github.com/goon-foss/goon
7979d5fa61 session work: bug-report fixes + WIP cleanup
User-facing bugs resolved (per bug_reports table 2026-05-25):
- 40cd28aa (short-scene filter): mobile api.ts default min_duration_sec=60
  hides 6519 sub-60s scenes across all list endpoints (Performer/Site/Tag/
  Browse). Caller may override with explicit 0.
- 5e89ef7e (porndoe needs cookies/play click): INJECTED_JS in PlayerScreen
  now auto-clicks player-poster overlay (player-poster-play, big-play-button,
  vjs-big-play-button, jw-icon-display, btn-big-play, mejs__overlay-button,
  play-button, btn-play, videoPlayButton). Triggered same interval as
  consent-dismiss + ad-iframe removal.
- b1b5e1a2 (Mixdrop czarny ekran): re-enable mixdrop direct stream via VPS
  curl_cffi proxy (was: skip → WebView fallback → blank screen). Backend
  pipeline (mixdrop.py extract + stream_proxy._curl_cffi_stream with JA3 +
  auto-refetch on token expire) was already complete; just removed the skip
  in app/api/playback.py.

Plus ongoing WIP (paradisehill multi-part extraction, stream_proxy refetch
logic, gesture race fix for long-press 2x speed, anti-adblock INJECTED_JS
defenses, scripts for freshporno backfill, new sources API).
2026-05-25 22:02:52 +02:00
https://github.com/goon-foss/goon
642f1ab8b8 Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector
Mobile / OTA:
- Enable Expo Updates (app.json + AndroidManifest) → api.goon-foss.org
- Bump 0.1.6 → 0.1.9 (build.gradle, app.json, appVersion.ts, main.py /version)
- backend.ts: default public backend auto-connect (no manual login)

WebView fallback fix (PlayerScreen INJECTED_JS):
- Auto-dismiss cookie/consent gates (hqporner et al. blocked kt_player init)
- Context-scoped: only clicks consent buttons inside cookie/gdpr containers
- Retry window for <source>.src polling raised 5→15 ticks (post-dismiss init)

Resolver:
- Series-position + modifier mismatch detector (Episode 2≠4, BTS/unedited)
  → composite_score hard-reject / cap; wired into scene_score + bulk_dedup
- aggregator-mode candidate query: LIMIT 500 + title-match ordering

Connectors:
- porndoe.com browse scraper (JSON-LD VideoObject) — theporndude audit pilot

landing: APK links → goon-v0.1.9.apk

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 11:20:57 +02:00
goon-foss
ad0284585b Initial commit
Goon — self-hosted aggregator for adult-content scene metadata.

Indexes scenes from TPDB, StashDB, and 30+ public adult tube sites.
Cross-source deduplication via perceptual hash + Levenshtein distance.
FastAPI backend + APScheduler worker + React Native (Expo) mobile client.

FOSS, ad-free, donation-funded. See README for details.
2026-05-20 10:10:22 +02:00