174 commits

Author SHA1 Message Date
jtrzupek
1654d78d59 fix(ingest): strip NUL bytes from raw payloads before Postgres write
A source (TPDB) returned a performer alias containing a literal U+0000 ("Ramon..").
Postgres cannot store U+0000 in JSONB or text, so the external_records JSONB insert in
_upsert_external_record failed with UntranslatableCharacter and the scene never ingested
(GOON-Z). Recursively strip NUL from the raw payload (-> external_records.raw) and, when
present, also re-validate the RawScene/RawMovie so normalize -> typed text columns get
clean data too. Gated by a cheap _has_nul scan so clean records (the overwhelming
majority) pay no extra cost.
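
A minimal sketch of the gated recursive strip described above, assuming the raw payload is plain JSON-like data (dicts, lists, strings, scalars). The names `has_nul`, `strip_nul` and `clean_payload` are illustrative and are not the project's actual helpers.

```python
from typing import Any

NUL = "\x00"


def has_nul(value: Any) -> bool:
    """Cheap scan: True if any string key or value contains U+0000."""
    if isinstance(value, str):
        return NUL in value
    if isinstance(value, dict):
        return any(has_nul(k) or has_nul(v) for k, v in value.items())
    if isinstance(value, list):
        return any(has_nul(v) for v in value)
    return False


def strip_nul(value: Any) -> Any:
    """Return a copy of value with U+0000 removed from every string."""
    if isinstance(value, str):
        return value.replace(NUL, "")
    if isinstance(value, dict):
        return {strip_nul(k): strip_nul(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_nul(v) for v in value]
    return value


def clean_payload(raw: dict) -> dict:
    # Gate: a clean payload is returned as-is, without copying.
    return strip_nul(raw) if has_nul(raw) else raw
```

With this shape, a clean record costs one read-only scan and no allocation; only a dirty record is rebuilt.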

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:48:22 +02:00
jtrzupek
16eb633bde feat(mobile): phone-side resolvers for IP-bound tubes (sxyprn, eporner, voe)
These CDNs bind their signed video URL to the IP that fetched the page, so a
server-side resolve hands the phone a URL bound to the server IP -- the device then
gets a placeholder/403 and falls back through the proxy, streaming the whole video
through the server. Resolve on the device instead (token binds to the phone IP) so
playback goes direct with zero proxy bandwidth.

Ports of the existing backend extractors:
- sxyprnResolver.ts: data-vnfo + boo/ssut51 transform
- epornerResolver.ts: vid+hash -> /xhr/video mp4 sources
- voeResolver.ts: mirror redirect + 7-step payload decoder

Wired into SceneDetailScreen.onPress (sxyprn/eporner) and MovieDetailScreen.playVoe (voe).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 16:14:25 +02:00
jtrzupek
aa05ce2647 feat(playback): direct-HLS manifest passthrough + proxy stream drop handling
Time-bound HLS hosters whose manifest URL lacks a .m3u8 extension (e.g. pornhat's
"...mp4,?..." path) were mis-detected by ExoPlayer as progressive MP4 and failed,
forcing a full proxy fallback that streamed the whole video through the server. Serve
such manifests via /proxy/hls/<token>/play.m3u8 with child URLs left absolute on the
CDN, so the device fetches variant+segments directly and only the ~1KB manifest is
proxied. Routed only for mobile_direct_ok (time-bound) HLS without a .m3u8 path.
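
One way the "child URLs left absolute" step could look, sketched under the assumption that the upstream manifest may contain relative URI lines. `absolutize_manifest` and the example URLs are hypothetical; URIs inside tag attributes (for example `URI="..."` on `#EXT-X-KEY`) are deliberately not rewritten here.

```python
from urllib.parse import urljoin


def absolutize_manifest(manifest_text: str, manifest_url: str) -> str:
    """Resolve every URI line of an HLS playlist against the CDN manifest URL.

    Tag and comment lines (starting with '#') and blank lines pass through
    unchanged, so the proxied playlist points straight back at the CDN.
    """
    out = []
    for line in manifest_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # urljoin leaves already-absolute URLs untouched.
            line = urljoin(manifest_url, stripped)
        out.append(line)
    return "\n".join(out) + "\n"
```

Only the rewritten playlist text would pass through the server; the player then requests variants and segments from the absolute URLs.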

Also swallow httpx.TransportError in the stream proxy body generator: an upstream CDN
closing the connection mid-stream is benign (client just retries a range) and should
not surface as an unhandled error.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 16:14:25 +02:00
jtrzupek
072f2608b3 chore: gitignore marketing-shots/ and one-off _*.py scripts
Keep local-only marketing material and throwaway backfill scripts out of
the public repo (same rationale as the existing screenshots/ entry).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:28:22 +02:00
jtrzupek
a9f0f94321 feat(sxyprn): mark dead posts during thumbnail refresh sweep
resolve_post() now distinguishes "Post Not Found" (mark dead_at — the
link wouldn't play anyway) from a live page with no fresh poster (leave
untouched), on top of the existing thumbnail refresh. Batched into
refresh_batch() with refreshed/dead/untouched counters.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:20:28 +02:00
jtrzupek
956a0feb22 docs: correct Bright Data proxy type (ISP, flat-rate not per-GB)
It is an ISP proxy (static ISP IPs, flat billing), not residential —
so HTML-ingest bandwidth is free and the full deep-crawl is fine.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 19:18:40 +02:00
jtrzupek
21bc8bf1fe feat(superporn): browse scraper via Bright Data residential proxy
superporn hard-blocks the VPS IP with Cloudflare 403 on every TLS
impersonation, so HTML ingest routes through Bright Data residential
(BRIGHTDATA_PROXY_URL, parsed in config). First scraper to use a proxy:
optional _proxy on the browse base, threaded into browser_get.

JSON-LD VideoObject (title/desc/uploadDate/thumb/duration) + pornstar
and category chips; superporn double-encodes HTML entities so titles
are unescaped twice. Thumbnails fetch fine from the VPS (no proxy).
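
The double unescape can be illustrated with the standard library; `unescape_twice` is an illustrative name, not the scraper's actual function.

```python
import html


def unescape_twice(text: str) -> str:
    """Decode HTML entities two levels deep (e.g. '&amp;amp;' -> '&')."""
    return html.unescape(html.unescape(text))
```

A caveat of this approach: a title that legitimately contains a literal entity string after one decode would be decoded once too often.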

Playback stays off-proxy: the <source> mp4 token is IP-bound to the
fetcher, so resolve is phone-side via WebView (extractor superporncom
-> _vps_blocked_fallback), same as porndoe.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:47:45 +02:00
jtrzupek
80fd83cb4e feat(tubes): add 4k69 + neporn browse scrapers, shared PlayTube base
4k69.com (~65k scenes): same PlayTube CMS as hqfap - common logic moved
to _playtube.py (sitemap catalog, JSON-LD, pills). Studio classified by
matching category pills against the studios index page. Streams are
get_file (fullmovies family) returned unresolved with mobile_direct,
2160p skipped.

neporn.com: KVS engine, latest-updates listing, JSON-LD + video:duration
meta, performers from models links with flashvars video_tags fallback
for fresh uploads. Resolve via _kvs; final URL portable cross-IP.

superporn.com rejected: Cloudflare 403 from VPS on all TLS impersonations.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:15:13 +02:00
jtrzupek
6de986b9a7 feat(hqfap): browse scraper + native mp4 extractor (~120k scenes)
PlayTube CMS. Sitemap-based pagination (listing has no GET paging),
JSON-LD VideoObject metadata, pornstar/category pills, " Clips"
categories mapped to studio. Direct mp4 (cdnde.com/okcdn.ru), tokens
time-bound and portable cross-IP, so mobile plays direct.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 17:51:04 +02:00
jtrzupek
08079787da feat(sxyprn): on-demand thumbnail resolver (live posters, ~1h-TTL workaround)
trafficdeposit poster tokens live ~1h (hour-bucketed), so stored URLs can't persist.
New GET /proxy/sxyprn-thumb/{post_id}: resolves the current og:image from the live
/post/<id> page (cache resolved poster URL ~40min), streams bytes with Referer +
long client Cache-Control (URL is stable per post_id → client disk-caches the image,
backend fetches each post ~once). Deleted posts ("Post Not Found") → 404.

Scene grid now emits /proxy/sxyprn-thumb/<id> for sxyprn sources (derived from
page_url) instead of the dead stored trafficdeposit URL. Verified: live post → 200
image, deleted → 404, grid emits resolver URL. Backend-only, no OTA.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 15:02:49 +02:00
jtrzupek
f7670963df fix(sxyprn): disable thumbnail refresh job — trafficdeposit token has ~1h TTL
CORRECTION: trafficdeposit thumbnail tokens are hour-bucketed and valid only ~1h
(verified 2026-06-10: stored ts=11:00 dead at 12:27, current ts=13:00 loads). Earlier
"~weekly rot" read was wrong. Storing/periodically-refreshing sxyprn thumbnail URLs
is futile — they expire within the hour. Default the refresh job OFF (kept in code).
The dead-marking sweep (Post Not Found → dead_at) it performed was still valid. Live
sxyprn thumbnails need on-demand resolution at serve time (future work).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 14:29:24 +02:00
jtrzupek
fef28ae56b feat(sxyprn): refresh rotting thumbnails from live post pages + scheduled job
CORRECTION to earlier "unrecoverable" call: the /post/<id> page is alive (200) and
DOES expose the scene's own fresh-signed poster via og:image / <video poster>
(post-id embedded, current timestamp) — only the STORED thumbnail URL had rotted.
Search/listings don't re-surface old posts (0 overlap), but per-post fetch works.

scripts/refresh_sxyprn_thumbs.py: iterate live sxyprn sources, fetch post page,
extract fresh og:image, UPDATE thumbnail_url (verified: refreshed URLs return 200).
_job_refresh_sxyprn_thumbs: every 12h refresh the 1200 least-recently-updated sources
(cycles the ~19k catalog within the expiry window). Pairs with the scene_resolver
overwrite fix so refreshed thumbnails stick.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:36:30 +02:00
jtrzupek
bb9e1afc31 fix(resolver): refresh thumbnails on re-scrape instead of fill-only-if-null
_upsert_playback_sources only set thumbnail_url when the existing value was NULL,
so signed CDN thumbnails that ROT (sxyprn/trafficdeposit tokens expire ~weekly →
404) were never replaced even when a fresh re-scrape captured a valid URL — making
the rot permanent (bug 2026-06-10). Always overwrite thumbnail_url/animated_thumbnail_url
with the freshly-scraped value when present; other fields keep fill-if-null. Lets
the regular performer-driven ingest self-heal thumbnails for re-crawled scenes.
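
The field policy above, sketched on plain dicts rather than the real ORM rows. `merge_source_fields` and the non-thumbnail field names are assumptions for illustration.

```python
ALWAYS_OVERWRITE = {"thumbnail_url", "animated_thumbnail_url"}


def merge_source_fields(existing: dict, scraped: dict) -> dict:
    """Apply freshly scraped values onto an existing playback source row.

    Thumbnail fields are overwritten whenever a fresh value is present;
    every other field is only filled when the stored value is None.
    """
    merged = dict(existing)
    for field, new_value in scraped.items():
        if new_value is None:
            continue  # nothing fresh to apply
        if field in ALWAYS_OVERWRITE or merged.get(field) is None:
            merged[field] = new_value
    return merged
```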

(Note: old sxyprn backlog can't be bulk-refreshed — search/listings don't re-surface
those posts, verified 0 overlap — so it's forward-looking; old sxyprn-only scenes
fall back to the clean placeholder.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:28:18 +02:00
jtrzupek
32c18a6d0f fix(mobile): English long-press action labels + clean thumb error placeholder
bug-report c25e9b55: long-press scene actions were in Polish — translate menu,
banner and confirm dialogs to English. Thumb 'error' state (e.g. expired sxyprn
thumbnail 404) now shows the same 🎬 placeholder as 'empty' instead of a ⚠ broken
glyph (bug 2026-06-10).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:11:10 +02:00
jtrzupek
adbdce1c75 fix(api): de-prioritize rotting sxyprn/trafficdeposit thumbnails
sxyprn thumbnails are time-signed on trafficdeposit CDN and ROT — the signed asset
404s after ~weeks and can't be re-signed/refreshed server-side (bug 2026-06-10,
~15k sxyprn-only scenes showed broken thumbs). In the light-list slim-thumbnail pick,
prefer a thumbnail from any non-trafficdeposit source; fall back to sxyprn only when
it's the scene's sole thumbnail (recent ones still load; dead ones now render a clean
placeholder client-side instead of a broken image).
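
A sketch of the thumbnail pick, assuming the rotting sources can be recognized by a "trafficdeposit" substring in the URL host. `pick_thumbnail` and the example hosts are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

ROTTING_HOST_MARKER = "trafficdeposit"


def pick_thumbnail(urls: list[str]) -> Optional[str]:
    """Prefer the first thumbnail not hosted on the rotting CDN.

    Falls back to the first available URL when the rotting CDN is the
    scene's only thumbnail source; returns None when there is none at all.
    """
    candidates = [u for u in urls if u]
    for url in candidates:
        host = urlparse(url).hostname or ""
        if ROTTING_HOST_MARKER not in host:
            return url
    return candidates[0] if candidates else None
```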

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 10:11:10 +02:00
jtrzupek
200db33d78 feat(mobile): send X-Device-Id, one-time adopt-legacy
GoonClient attaches a stable per-install device id (SecureStore, lazy UUID) on
every request so server-side user state is scoped per device. On first launch
after update, call /me/adopt-legacy once (SecureStore flag) to claim the previous
shared state onto this device — the instance owner should relaunch first.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 08:58:02 +02:00
jtrzupek
c8baa11604 feat(api): device-scope user state (favorites/progress/blacklists)
Public instance has no accounts, so all user state was GLOBAL in DB — new users
saw/overwrote each other's (and Jan's) favorites, watched badges and blacklists
(bug 2026-06-10). Add device_id (VARCHAR 64) to 9 state tables with composite PK
(device_id, entity_id); app sends X-Device-Id header (get_device_id dep). All
favorites/scene-favorites/blacklist/watch + scene&movie list/detail (is_favorite,
watched, blacklist-hide) now filter by device. Existing rows backfilled to
'legacy-shared'; POST /me/adopt-legacy reassigns them to the caller once. Old
clients (no header) map to legacy-shared so they keep working until OTA updates.

Migration 0022: add col, backfill, composite PK. Verified on prod: 967 progress
rows preserved, device isolation holds (new device sees none of legacy state).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 08:58:01 +02:00
jtrzupek
953068f0db docs(claude): add resolve/playback findings + local debugging guide
Capture the durable Goon facts (phone-side resolve for IP-bound/Turnstile hosters,
DoodStream/playmogo pass_md5, _embed_iframe vs _vps_blocked_fallback, stable image
proxy tokens, paradisehill multipart, dedup/merge) and a local-debugging section
(prod psql/worker patterns, Windows real-Python gotcha, Android emulator AVD `goon`
+ FLAG_SECURE-off screencap + 2x OTA apply, Chrome DevTools port 9223 + CDP-blanking
+ hoster network signatures). No secrets/IPs/usernames — env-var forms only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 21:51:29 +02:00
jtrzupek
904f8984c8 feat(mobile): tile long-press actions (hide / mark-duplicate), drop dead preview
bug-report 5a6844db: the hold-to-preview animated gesture did nothing useful.
Replace it with a long-press action menu on scene tiles:
  - "Ukryj scenę" (Hide scene) → POST /scenes/{id}/hide
  - "Oznacz jako duplikat" (Mark as duplicate) → enter selection mode; tapping another tile merges the
    long-pressed scene INTO the tapped one (POST /scenes/{keep}/merge/{drop}).
SceneActionsProvider holds the selection state + a bottom banner, so it works across
all 5 scene-list screens via the shared SceneTile (no per-screen wiring). Selecting
mode highlights tappable tiles and badges the pending duplicate. Animated thumbnails
kept only as a still-fallback image; has_animated_thumbnail filter removed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:52:15 +02:00
jtrzupek
e1c7efb947 chore(api): drop unused has_animated_thumbnail scene filter
The hold-to-preview gesture is being removed (did nothing useful), and no client
sends this filter. Remove the Query param, its EXISTS filter, and the pure-default
count guard reference.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:52:15 +02:00
jtrzupek
e98ef6577e feat(api): scene hide + merge-duplicate endpoints for long-press actions
POST /scenes/{id}/hide — marks all playback_sources dead so the scene drops out
of has_playback lists (reversible via dead_at; row kept for dedup/refs).
POST /scenes/{keep_id}/merge/{drop_id} — merges drop into keep via scene_merge
(moves refs/performers/tags/fingerprints/playback). Backs the new tile long-press
menu (hide / mark-duplicate) replacing the dead animated-preview gesture.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:47:16 +02:00
jtrzupek
abddd27856 fix(proxy): stable image-proxy URLs so expo-image actually caches thumbnails
make_token embedded the current timestamp in the expiry, so every /scenes fetch
produced a DIFFERENT proxied URL for the same thumbnail → expo-image (keyed by URI)
cache-missed and re-downloaded every list load / app launch. Add stable_bucket_sec:
quantize the expiry base to a window so the URL is identical across requests.
_wrap_image_proxy uses a 7-day bucket → thumbnails disk-cache for a week instead of
re-fetching constantly. Answers the question "are thumbnails cached?": now yes.
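
A minimal sketch of expiry quantization, not the actual make_token: the signature layout, the 16-character truncation and the choice to anchor the expiry at the end of the current window are all assumptions made for illustration.

```python
import hashlib
import hmac


def make_token(url: str, secret: bytes, now: int, ttl_sec: int,
               stable_bucket_sec: int = 0) -> str:
    """Sign url with an expiry; optionally quantize the expiry base.

    With stable_bucket_sec > 0 the base snaps to the end of the current
    window, so every call inside one window yields the same token (and
    therefore the same proxied URL, which a URI-keyed cache can reuse).
    """
    base = now
    if stable_bucket_sec > 0:
        base = now - now % stable_bucket_sec + stable_bucket_sec
    expiry = base + ttl_sec
    message = f"{url}|{expiry}".encode()
    sig = hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]
    return f"{expiry}.{sig}"
```

Anchoring at the window end keeps the token valid for at least ttl_sec from any moment inside the window, at the cost of a longer maximum lifetime.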

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:45:22 +02:00
jtrzupek
3e8a221981 feat(extractors): native HLS for xhamster; hqporner flyflv player
xhamster: move from WebView fallback to server-side native HLS. The scene page
is fetchable server-side and the xhcdn master m3u8 (variants + segments) is
time-bound, not IP-bound (verified cross-IP), so mobile plays the HLS direct
with zero proxy bandwidth. New tubes/xhamster.py pulls the master m3u8 from
SSR HTML and returns type='m3u8' mobile_direct; registry remaps xhamstercom
off _vps_blocked_fallback.

hqporner: add flyflv to the player-iframe host whitelist. hqporner rotated
some players to flyflv.com; the CDN host was already whitelisted but the iframe
host was not, so those scenes returned no stream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:35:58 +02:00
jtrzupek
7f36865b5a fix(performer): tag chips → in-place horizontal filter selector
Follow-up to 1a4bf258 feedback (a627637b + 0264a3ff): the flexWrap chip list ate
too much vertical space and tapping navigated away to TagScenes. Rework: single-row
horizontal scroll of toggle-chips that filter the performer's scenes IN-PLACE
(performer_ids + tags in one listScenes query, no navigation). Selected chip is
highlighted with a ✕ affordance; tap again clears. One line tall instead of N rows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:25:02 +02:00
jtrzupek
576a424615 fix(scripts): force UTF-8 stdout in publish_update — stop false exit-1
Final Polish-char print crashed with UnicodeEncodeError on Windows cp1252 stdout
AFTER a successful publish, making exit code 1 misleading. Reconfigure stdout/stderr
to UTF-8 up front.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:58:43 +02:00
jtrzupek
ffb80c7b60 feat(performer): replace dev Re-scrape button with top-tag chips
bug-report 1a4bf258: "Re-scrape could go away; tags/categories could be there instead".
Re-scrape was a dev-only bulk thumbnail/tag enrich — noise on the performer page
(per-scene enrich already happens on SceneDetail). Removed it; kept Search.

New GET /performers/{id}/tags aggregates scene_tags across the performer's
live-playback scenes (top N). PerformerScenes renders them as chips → tap navigates
to TagScenes. Search button widened to full row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:56:26 +02:00
jtrzupek
f8b1e801ef fix(api): collapse same-origin playback sources on scene detail
A merged scene often aggregates several uploads from ONE tube (re-encodes / 4K
dups). bug-report aa79a995 "why 2 links, both porntrex?" = same scene std + 4K
(porntrex 2591377 + 2593449 "...in 4K"). In the UI these are indistinguishable
links to one hoster (same extractor). Keep one best per origin: prefer duration
matching the scene → any duration → first (origin-asc stable). Dead already filtered.
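
The per-origin preference order can be sketched as a rank function. `collapse_by_origin`, the dict keys and the 5-second tolerance for "matching" are assumptions; the commit does not state the tolerance.

```python
from typing import Optional


def collapse_by_origin(sources: list[dict], scene_duration: Optional[int],
                       tolerance_sec: int = 5) -> list[dict]:
    """Keep one best playback source per origin.

    Rank 0: duration matches the scene; rank 1: any known duration;
    rank 2: unknown duration. Ties keep the earliest source.
    """
    def rank(src: dict) -> int:
        dur = src.get("duration_sec")
        if dur is None:
            return 2
        if scene_duration is not None and abs(dur - scene_duration) <= tolerance_sec:
            return 0
        return 1

    best: dict[str, dict] = {}
    # sorted() is stable, so equal origins keep their input order.
    for src in sorted(sources, key=lambda s: s["origin"]):
        current = best.get(src["origin"])
        if current is None or rank(src) < rank(current):
            best[src["origin"]] = src
    return list(best.values())
```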

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:50:45 +02:00
jtrzupek
65b9df073a fix(extractors): route sxylandcom through _embed_iframe, not webview fallback
Chrome-DevTools investigation of bug-report 827a50a1 (sxyland "long loading,
then webview, no autoplay") showed sxyland embeds playmogo.com/e/<id> — a
DoodStream clone (doodcdn.io infra, pass_md5 protocol, get_slides) behind an
INVISIBLE Cloudflare Turnstile (not an interactive CAPTCHA; auto-passes in a
real browser/WebView from a residential IP). The sxyland page itself is NOT
Turnstile-gated — VPS curl pulls the playmogo iframe URL straight from the HTML.

sxylandcom was wired to _vps_blocked_fallback → phone loaded the entire sxyland
page in WebView (ads, click-to-play, no autoplay = the reported symptom), and the
playmogo embed never reached the phone's dood resolver. _embed_iframe (which
already lists sxyland in its docstring) extracts the playmogo embed and emits it
as type='hoster' → PlayerScreen routes playmogo URLs to doodstream.ts (resolveDoodStream),
which resolves phone-side (phone IP passes invisible Turnstile) → direct mp4 → autoplay.

Mobile unchanged (hoster→dood path already exists for xmoviesforyou/siska). Backend-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:41:38 +02:00
jtrzupek
a9545a7ab2 feat(scripts): merge_exact_title_duration --playback-only + progress logging
--playback-only restricts to scenes with live playback (app-visible dupes only).
Progress print every 500 merges for long global runs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:02:19 +02:00
jtrzupek
e23e2d1f17 fix(merge): move playback_sources on scene merge + exact-title+duration dedup
merge_scenes never reassigned playback_sources → ON DELETE CASCADE dropped them
with the absorbed scene. Cross-source (canonical) merges rarely had tube playback,
so the bug stayed hidden, but tube-dup merges silently LOST playback links. Add _move_playback_sources
(global unique (origin,page_url) guarantees no collision on reassign).
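
The failure mode and the fix can be reproduced in miniature with SQLite standing in for Postgres. The two-table schema below is a simplified assumption, and this `merge_scenes` only shows the reassign-before-delete ordering, not the real merge.

```python
import sqlite3


def merge_scenes(conn: sqlite3.Connection, keep_id: int, drop_id: int) -> None:
    """Move playback sources to the keeper, then delete the absorbed scene."""
    # Reassign first: deleting first would let ON DELETE CASCADE remove them.
    conn.execute(
        "UPDATE playback_sources SET scene_id = ? WHERE scene_id = ?",
        (keep_id, drop_id),
    )
    conn.execute("DELETE FROM scenes WHERE id = ?", (drop_id,))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE scenes (id INTEGER PRIMARY KEY);
CREATE TABLE playback_sources (
    id INTEGER PRIMARY KEY,
    scene_id INTEGER NOT NULL REFERENCES scenes(id) ON DELETE CASCADE,
    origin TEXT NOT NULL,
    page_url TEXT NOT NULL,
    UNIQUE (origin, page_url)
);
INSERT INTO scenes (id) VALUES (1), (2);
INSERT INTO playback_sources (scene_id, origin, page_url)
VALUES (1, 'tube:a', 'u1'), (2, 'tube:a', 'u2');
""")
merge_scenes(conn, keep_id=1, drop_id=2)
```

Because (origin, page_url) is unique across the whole table rather than per scene, the UPDATE cannot collide with a row the keeper already owns.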

+ merge_exact_title_duration.py: catches missing-merge dupes bulk_dedup misses
(same performer + identical normalized title + identical duration_sec, no phash).
Bad Bella had 25 such pairs (bug-report ef92809d "duplicate, same thumbnails").

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:56:50 +02:00
jtrzupek
8f34a3e2f1 fix(mobile): movie part picker as scrollable modal — Android showed only 3 of N
paradisehill multipart movies passed all N parts to Alert.alert, but Android's
native AlertDialog renders at most 3 buttons → a 35-part movie showed 3 (bug-report
2ebd0690 2026-06-07). Backend correctly returns all 35; the cap was client-side.
Reuse PlaybackQualityModal (now scrollable + title + preserveOrder props, hides
bogus "1p" for non-resolution labels). Also add the missing `raw` field to the
StreamLink type (backend sends it; part_label lives there).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:25:03 +02:00
jtrzupek
940d4872e3 fix(mobile): removeClippedSubviews=false on grids — stop thumbnails vanishing on scroll
Android FlatList defaults removeClippedSubviews=true, which detaches off-viewport
subviews; expo-image frequently fails to re-render them when they scroll back in →
blank thumbnails (bug-report f181d382 2026-06-07, recurring). Disable on all heavy
image grids: scene grids (Scenes/Site/Studio/Tag/Performer) + movie poster grids.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:18:48 +02:00
jtrzupek
d4b89f16e3 fix(scripts): backfill arg parser consumed --workers value as LIMIT
'--workers 3' set limit=3 because the bare '3' also hit the isdigit() branch.
Skip flag-value positions when scanning for a positional LIMIT.
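
A sketch of the corrected scan, assuming a hand-rolled parser where `--workers` is the only value-taking flag. `parse_limit` and `VALUE_FLAGS` are illustrative names.

```python
from typing import Optional

VALUE_FLAGS = {"--workers"}  # flags whose next argument is their value


def parse_limit(argv: list[str]) -> Optional[int]:
    """Return the first bare integer that is not the value of a flag."""
    skip_next = False
    for arg in argv:
        if skip_next:
            skip_next = False  # this position belongs to the previous flag
            continue
        if arg in VALUE_FLAGS:
            skip_next = True
            continue
        if arg.isdigit():
            return int(arg)
    return None
```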

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:15:09 +02:00
jtrzupek
7bf1fd6716 fix(xvideos): parse model name from nested span.name — recover 0-performer scenes
xvideos renders the scene's models as `<a href="/models/slug">...<span class="name">
Display Name</span>...`. The old _MODEL_RE wanted text immediately after the anchor
`>` and never matched current markup → browse-scraped scenes landed with 0 performers
(bug-report 2026-06-07: "no actors, but Rebecca Johnson is on the page"). New regex
captures slug + nested span.name, bounded within the anchor. + backfill script for the
~11.9k existing zero-performer xvideos scenes (54% have a real /models/ link; resolver
merges names to canonical by name_normalized).
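
A regex in the spirit of the fix, written against the markup quoted above. This is not the project's actual _MODEL_RE; `MODEL_RE`, `extract_models` and the sample names are hypothetical, and regex parsing of HTML remains brittle against markup changes.

```python
import re

MODEL_RE = re.compile(
    r'<a\b[^>]*\bhref="/models/(?P<slug>[^"/?#]+)"[^>]*>'  # model anchor
    r'(?:(?!</a>).)*?'                                       # stay inside it
    r'<span\b[^>]*\bclass="name"[^>]*>\s*(?P<name>[^<]+?)\s*</span>',
    re.DOTALL | re.IGNORECASE,
)


def extract_models(page_html: str) -> list[tuple[str, str]]:
    """Return (slug, display name) for every /models/ anchor with a name span."""
    return [(m.group("slug"), m.group("name")) for m in MODEL_RE.finditer(page_html)]
```

The negative lookahead on `</a>` is what bounds the match within the anchor, so a name span belonging to a later, unrelated link is never attributed to a model slug.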

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:13:21 +02:00
jtrzupek
edbffc0fa7 fix(mobile): boot diagnostic as breadcrumb, not event — silence GOON-Q noise
captureMessage('mobile boot OK', info) fired an event every launch → 171 events
/13 users polluting the Sentry issue list. Diagnostic served its purpose (SDK
confirmed sending). addBreadcrumb keeps boot context attached to real errors
without creating standalone issues.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:04:21 +02:00
jtrzupek
2b602beea5 fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V)
_candidate used OR logic (studio OR date±7d OR dur±30s) → 938,950 pairs;
Stage-2 scoring at ~110/s never finished in 1800s → bulk_dedup_performers HUNG
every run, orphan thread leaked until restart. Require AND: same studio plus
(date±2d OR dur±30s). 939k→16k pairs, full run 213s. Real cross-source dup of
one master shares studio + near date/duration; rare studio_id-mismatch pairs
skipped on purpose — a job that COMPLETES beats one that times out merging nothing.
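
The tightened predicate, sketched on a small dataclass. `Scene` and `is_candidate` are illustrative; the real prefilter presumably runs in SQL over the scenes table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Scene:
    studio_id: Optional[int]
    released: Optional[date]
    duration_sec: Optional[int]


def is_candidate(a: Scene, b: Scene, date_window_days: int = 2,
                 dur_window_sec: int = 30) -> bool:
    """Same studio AND (release date within ±2d OR duration within ±30s)."""
    if a.studio_id is None or a.studio_id != b.studio_id:
        return False
    near_date = (
        a.released is not None and b.released is not None
        and abs((a.released - b.released).days) <= date_window_days
    )
    near_dur = (
        a.duration_sec is not None and b.duration_sec is not None
        and abs(a.duration_sec - b.duration_sec) <= dur_window_sec
    )
    return near_date or near_dur
```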

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:03:33 +02:00
jtrzupek
cd257740be fix(hqporner): require ALL query tokens in slug — stop performer over-attribution
hqporner search post-filter kept a scene if its slug contained ANY query token
(>=3 chars). For multi-word performer names this matched on a single common token
(e.g. "anna","mia"), so the performer-driven ingest attributed the scene to EVERY
performer sharing that token — scenes accumulated up to 503 wrong performers
(hqporner = 5659 of 5897 scenes with >30 performers; bug-reports 2026-06-07).

Switch ANY->ALL: the slug must contain every query token, requiring a full name
match before attribution. Single-word names still work. Precision over recall —
144 wrong performers is far worse than missing a few loose matches.
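
The ANY-to-ALL switch in miniature. `slug_matches_query` is a hypothetical name, the sample names are placeholders, and ignoring tokens shorter than 3 characters is carried over from the old filter as an assumption.

```python
import re


def slug_matches_query(slug: str, query: str, min_len: int = 3) -> bool:
    """True only if the slug contains EVERY query token of at least min_len chars."""
    slug_lower = slug.lower()
    tokens = [t for t in re.split(r"[^a-z0-9]+", query.lower()) if len(t) >= min_len]
    # all() instead of any(): one shared first name is no longer enough.
    return bool(tokens) and all(t in slug_lower for t in tokens)
```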

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 09:28:18 +02:00
jtrzupek
bc72515227 fix(player): drop "Tap for sound" pill — speaker toggle is enough
User feedback (2026-06-07, report 4bdca61e) on the prior mute change: the always-
visible "Tap for sound" pill is redundant — the 🔇/🔊 toggle in the top controls
is enough. Removed the pill (+ its styles); video still starts muted and the
speaker toggle unmutes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 09:20:42 +02:00
jtrzupek
c5abdc1186 migration(0021): raise scene_tags.tag_id statistics target to 1000
Completes the literal-tag_id perf fix — the planner's MCV stats on tag_id are what
let it pick the index-walk for common tags. Default target (100) covers only the
top ~100 tags; 1000 extends correct cardinality estimates to mid-tier tags.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:12:22 +02:00
jtrzupek
43f7e1f7b2 perf(scenes): literal tag_id in filter — 4-12s tag lists -> ~20ms
Tag-filtered scene lists (e.g. blowjob + has_playback) took 4-12s. Root cause:
the filter joined scene_tags->tags on slug, so the actual tag_id was opaque to
the planner at plan time. It fell back to average per-tag cardinality
(8.4M/11541 ≈ 726) instead of the real 273k, chose to materialize ALL matching
scene_tags + check playback per row, then top-N sort.

Fix: resolve slug->tag_id in the app and filter on a LITERAL tag_id (no slug
join). With a constant, the planner uses MCV stats, knows the tag is huge, and
walks ix_scenes_created_at_desc probing scene_tags/playback per scene, stopping
at the page limit. Verified: blowjob list 3300ms -> 18ms (EXPLAIN), HTTP 4-12s ->
47ms. Unknown slug short-circuits to empty. (Pairs with the raised tag_id
statistics target so mid-tier tags also get correct estimates.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:10:31 +02:00
jtrzupek
d52641774d perf(scenes): light list payload — drop tags/refs, slim playback to thumbnail
Scene list returned the full SceneOut per item (nested tags/external_refs + all
playback_sources with page_url/embed/stream/quality) though SceneTile only reads
the thumbnail + title/duration/performer/studio, and SceneDetail re-fetches the
full scene via /scenes/{id}. Added light=True to _build_scenes_out_batch: skip the
tags + external_refs queries entirely and collapse playback_sources to one slim
entry (thumbnail_url + animated_thumbnail_url only).

Result: default list payload 78KB->48KB (-38%), ~28ms cached, less DB work per
list. Verified on emulator: grid thumbnails/durations/titles render unchanged.
No mobile change (tile reads the same fields); server-side, no OTA.

NOTE: the separate slow path — common-tag-filtered lists (4-12s; query expands all
matching scene_tags before sort/limit) — is structural (needs a denormalized
(tag_id, created_at) index) and deferred. VACUUM ANALYZE + raised tag_id stats
applied but the planner still can't avoid the materialization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:03:26 +02:00
jtrzupek
9f46e8dea9 feat(scripts): dedup_n2_canonical — resolve n=2 false-merges via canonical duration
audit_false_merges only auto-fixes n>=3 (majority disambiguates the outlier); n=2
was "needs human review" — but the merge-review UI is gone, nobody triages 500+.
Measured: of 535 n=2 duration-divergent scenes, ALL have a canonical scene.duration_sec
(TPDB/StashDB) and 531 have exactly one source matching canonical (±20%) + one >2x off
→ unambiguous false-merge. Kill the off source (works both directions since canonical is
corroborated by the matching keeper, unlike the Omar-case the n>=3 audit guards against).
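
The n=2 decision rule as a pure function, assuming ">2x off" means a ratio above 2 in either direction. `pick_false_merge` and its return convention are illustrative.

```python
from typing import Optional


def pick_false_merge(canonical: int, dur_a: int, dur_b: int,
                     tol: float = 0.20, off_factor: float = 2.0) -> Optional[str]:
    """Return 'a' or 'b' for the source to mark dead, or None if ambiguous.

    Unambiguous means exactly this pattern: one source within ±tol of the
    canonical duration and the other more than off_factor times off.
    """
    def matches(d: int) -> bool:
        return abs(d - canonical) <= tol * canonical

    def far_off(d: int) -> bool:
        return d > off_factor * canonical or d * off_factor < canonical

    if matches(dur_a) and far_off(dur_b):
        return "b"
    if matches(dur_b) and far_off(dur_a):
        return "a"
    return None
```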

Applied: 529 sources marked dead (4 ambiguous skipped). Reversible (dead_at).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:25:10 +02:00
jtrzupek
4922646011 feat(dedup): merge exact-phash + same-duration + shared-performer duplicates
bug-report 2026-06-03 ("same duration, same thumbnail, why don't they merge"):
duplicate scenes not merged at ingest. Exact phash alone is noisy here (95% are
collisions on shared thumbnails/intro frames — different scenes; bulk_dedup scorer
correctly gives 0 auto-merge). The safe subset is exact-phash AND same duration
(±3s) AND shared performer/title — near-certain same scene. Same-duration is key:
it excludes the false-merge pattern (short-clip-vs-full has DIFFERING durations).

- scripts/merge_phash_exact_dupes.py: one-off, dry-run by default, per-pair re-fetch
  (handles clusters). Applied: 30 merged.
- bulk_dedup: add `_pairs_exact_phash` (SQL O(N log N), not the O(N²) Hamming scan)
  + strategy "phash_exact" — gated by the normal scorer (surfaces review candidates,
  no risky auto-merge), schedulable for ongoing exact-collision review.
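
An in-memory analogue of the exact-phash pairing: grouping by hash avoids comparing every scene with every other. The dict layout and `exact_phash_pairs` are assumptions, and only the shared-performer condition is shown (the shared-title alternative is omitted).

```python
from collections import defaultdict
from itertools import combinations


def exact_phash_pairs(scenes: list[dict], dur_tol: int = 3) -> list[tuple[int, int]]:
    """Pairs with identical phash, duration within ±dur_tol, shared performer."""
    by_hash: dict[str, list[dict]] = defaultdict(list)
    for scene in scenes:
        if scene.get("phash"):
            by_hash[scene["phash"]].append(scene)

    pairs = []
    for group in by_hash.values():
        for a, b in combinations(group, 2):
            if a["duration_sec"] is None or b["duration_sec"] is None:
                continue
            same_duration = abs(a["duration_sec"] - b["duration_sec"]) <= dur_tol
            shared = bool(set(a["performers"]) & set(b["performers"]))
            if same_duration and shared:
                pairs.append((a["id"], b["id"]))
    return pairs
```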

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:08:06 +02:00
jtrzupek
d5409d01ce feat(scripts): audit_teaser_only — hide scenes whose only source is a teaser
bug-report 2026-06-01 (48d6cc6b): scene shows canonical duration from TPDB
(real 22min studio scene) but the only live playback_source is a short tube
teaser (xnxx 21s) → "shows 22m, plays <1m". When ALL live sources are a tiny
fraction (<15%) of a known canonical (>300s), the scene has no real playback;
mark those sources dead → scene becomes orphan → hidden (has_playback=false),
consistent with the orphan-hiding policy. Reversible (dead_at), conservative
(skips scenes with any unknown-duration or full-length live source).
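
The audit condition as a predicate; `teaser_only` is a hypothetical name and the thresholds are the ones quoted above.

```python
from typing import Optional


def teaser_only(canonical_sec: Optional[int], live_durations: list[Optional[int]],
                min_canonical: int = 300, max_fraction: float = 0.15) -> bool:
    """True when every live source is a tiny fraction of a known canonical duration."""
    if canonical_sec is None or canonical_sec <= min_canonical:
        return False
    if not live_durations:
        return False
    # Conservative: any unknown duration disqualifies the scene.
    if any(d is None for d in live_durations):
        return False
    return all(d < max_fraction * canonical_sec for d in live_durations)
```

A full-length live source fails the final all() check on its own, which is what makes the sweep skip scenes that still have real playback.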

Applied on prod: 182 sources dead across 174 scenes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:52:44 +02:00
jtrzupek
63880feeb1 fix(mobile/api): handle 204 in request() — "Mark as invalid" false failure
The generic request<T>() always called res.json(), which throws on a 204 No
Content body. mark-dead endpoints (scene + movie "Mark as invalid"/broken)
return 204, so the call threw AFTER the backend had already marked the source
dead → user saw a "Failed" alert and the list didn't refresh, even though the
mark succeeded server-side (bug-reports 2026-05-28 Voe, 2026-06-03 scene
1e8dc190). Return undefined for 204 before parsing JSON.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:24:29 +02:00
jtrzupek
a196fcbcdb refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper"
The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest,
performer-driven) was "pornapp" — a misleading leftover from the removed external
porn-app API. It read like a dependency on a third-party "pornapp" service; it is
not — these are our own scrapers hitting 25+ tubes directly (kind=scraper,
origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME
constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs +
external_records history stays linked. No behavior change — external_id keying
(sitetag:url) and dedup are unaffected.

NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy
format (resolve_playback parses it) and is intentionally left untouched.

Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:54:55 +02:00
jtrzupek
3339d3cd14 fix(playback): recognize luluvids.top/cdnstream/cdnvids as P.A.C.K.E.R. hosters
mypornerleak embeds luluvids.top (+ cdnstream.top/cdnvids.top) which are
luluvid/streamwish forks on new TLDs, all confirmed P.A.C.K.E.R.-JWPlayer. They
were missing from PACKER_HOSTS, so isPackerHoster() returned false → the phone-
side packer resolver never ran → WebView fallback landed on luluvids.top's
"disable Adblock and enable popup" wall (bug-report 2026-06-07, scene 75aa3316).
filemoon variant (bysezoxexe.com) was already covered.

Verified on emulator (live OTA): mypornerleak source → luluvids.top resolves
phone-side → native ExoPlayer PLAYING (position advancing), no adblock wall.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:23:22 +02:00
jtrzupek
b18f07d90e feat(playback): native pornxp.ph via phone-side resolver (kills black screen)
pornxp.ph serves direct <source> mp4 (360/720/1080p) on st.pornxp.sh whose path
token is IP-bound to whoever fetched the PAGE (verified 2026-06-07: VPS-resolved
URL → 403 cross-IP). Backend resolve was therefore impossible, so pornxpph fell
to the WebView fallback which black-screened (bug-report fd06cd86).

Fix: resolve on-device (same pattern as getfileResolver/doodstream) — the phone
fetches the page, so tokens bind to the phone IP and play natively. New
pornxpResolver.ts extracts the <source> mp4s into multi-quality StreamLinks;
SceneDetail short-circuits tube:pornxpph to it before backend resolve, feeding
the existing quality-picker + native player.

Verified on emulator (live OTA): pornxpph scene → quality picker (1080/720/360)
→ native playback PLAYING (no WebView, no ads, no black screen).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:58:40 +02:00
jtrzupek
8c0edbdf7b fix(playback): mark deleted sxyprn posts dead + rank native sources first
Two bug-report fixes (2026-06-07):
- sxyprn returns HTTP 200 "Post Not Found" for deleted posts (soft-404), so the
  extractor returned None → resolve treated it as transient and never marked the
  source dead, leaving a dead link offered forever. Now raise HosterDead on the
  marker so resolve marks it dead.
- Scene playback sources were ordered alphabetically by origin, so a WebView-
  fallback hoster (fpoxxx, IP-bound + ad-heavy) ranked above a working native
  source (freshporno) on the same scene. Add is_vps_blocked_fallback() and sort
  native-resolve origins ahead of WebView-fallback ones.
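
The reordering from the second fix can be sketched with a composite sort key. The set contents and function names below are illustrative stand-ins for the real extractor registry.

```python
# Hypothetical registry of sitetags that only play via the WebView fallback.
VPS_BLOCKED_FALLBACK = {"fpoxxx"}


def is_fallback(origin: str) -> bool:
    """origin looks like 'tube:<sitetag>'; check the sitetag against the registry."""
    sitetag = origin.split(":", 1)[-1]
    return sitetag in VPS_BLOCKED_FALLBACK


def order_sources(origins: list[str]) -> list[str]:
    # False sorts before True, so native origins come first;
    # the origin string keeps the order alphabetical within each group.
    return sorted(origins, key=lambda o: (is_fallback(o), o))
```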

Verified on prod: sxyprn dead URL → HosterDead; scene sources reorder
freshpornoorg before fpoxxx.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:09:01 +02:00
jtrzupek
4d14f3946b feat(player): start muted, unmute via button (autoplay-friendly)
Scenes/movies now start with sound OFF; user enables audio via a control
(UX request). NativeVideoPlayer: useVideoPlayer starts muted=true + speaker
toggle in top controls + always-visible "Tap for sound" pill while muted.
WebView path: injected autoplay sets muted=true (also makes muted autoplay
reliable per browser policy → faster CDN extraction); host player controls
handle unmute when the WebView is the actual surface.

Verified on emulator against the live runtime-1.1 OTA bundle: video starts
muted (pill shown), tap unmutes (pill clears).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:03:52 +02:00