Commit graph

152 commits

Author SHA1 Message Date
jtrzupek
3e8a221981 feat(extractors): native HLS for xhamster; hqporner flyflv player
xhamster: move from WebView fallback to server-side native HLS. The scene page
is fetchable server-side and the xhcdn master m3u8 (variants + segments) is
time-bound, not IP-bound (verified cross-IP), so mobile plays the HLS direct
with zero proxy bandwidth. New tubes/xhamster.py pulls the master m3u8 from
SSR HTML and returns type='m3u8' mobile_direct; registry remaps xhamstercom
off _vps_blocked_fallback.

hqporner: add flyflv to the player-iframe host whitelist. hqporner rotated
some players to flyflv.com; the CDN host was already whitelisted but the iframe
host was not, so those scenes returned no stream.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:35:58 +02:00
jtrzupek
7f36865b5a fix(performer): tag chips → in-place horizontal filter selector
Follow-up to 1a4bf258 feedback (a627637b + 0264a3ff): the flexWrap chip list ate
too much vertical space and tapping navigated away to TagScenes. Rework: single-row
horizontal scroll of toggle-chips that filter the performer's scenes IN-PLACE
(performer_ids + tags in one listScenes query, no navigation). Selected chip is
highlighted with a ✕ affordance; tap again clears. One line tall instead of N rows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 09:25:02 +02:00
jtrzupek
576a424615 fix(scripts): force UTF-8 stdout in publish_update — stop false exit-1
Final Polish-char print crashed with UnicodeEncodeError on Windows cp1252 stdout
AFTER a successful publish, making exit code 1 misleading. Reconfigure stdout/stderr
to UTF-8 up front.
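
A minimal sketch of the up-front reconfigure, assuming Python 3.7+ where text streams expose reconfigure(). The helper name force_utf8_stdio is hypothetical and not taken from publish_update; errors="replace" is likewise an assumption.

```python
import sys


def force_utf8_stdio() -> list:
    """Switch stdout/stderr to UTF-8 where the stream supports it.

    Returns the names of the streams that were reconfigured.
    """
    changed = []
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name)
        # TextIOWrapper.reconfigure exists since Python 3.7; streams that
        # were replaced (or are None) may not provide it, so check first.
        if hasattr(stream, "reconfigure"):
            stream.reconfigure(encoding="utf-8", errors="replace")
            changed.append(name)
    return changed
```

Calling this once at script start means a later print of non-cp1252 characters cannot raise UnicodeEncodeError after the real work has already succeeded.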

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:58:43 +02:00
jtrzupek
ffb80c7b60 feat(performer): replace dev Re-scrape button with top-tag chips
bug-report 1a4bf258: "Re-scrape could go away; tags/categories could be there instead".
Re-scrape was a dev-only bulk thumbnail/tag enrich — noise on the performer page
(per-scene enrich already happens on SceneDetail). Removed it; kept Search.

New GET /performers/{id}/tags aggregates scene_tags across the performer's
live-playback scenes (top N). PerformerScenes renders them as chips → tap navigates
to TagScenes. Search button widened to full row.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:56:26 +02:00
jtrzupek
f8b1e801ef fix(api): collapse same-origin playback sources on scene detail
A merged scene often aggregates several uploads from ONE tube (re-encodes / 4K
dups). bug-report aa79a995 "why 2 links, both porntrex?" = same scene std + 4K
(porntrex 2591377 + 2593449 "...in 4K"). In the UI these are indistinguishable
links to one hoster (same extractor). Keep one best per origin: prefer duration
matching the scene → any duration → first (origin-asc stable). Dead already filtered.
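
The per-origin preference order can be sketched as follows. This is an illustration only: the Source dataclass, the function name collapse_same_origin and the 5-second duration tolerance are assumptions, not the API code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    origin: str
    page_url: str
    duration_sec: Optional[int] = None


def collapse_same_origin(sources: List[Source],
                         scene_duration: Optional[int],
                         tolerance: int = 5) -> List[Source]:
    """Keep one source per origin; output is ordered by origin ascending."""

    def rank(src: Source) -> int:
        # 0 = duration matches the scene, 1 = any known duration, 2 = unknown
        if src.duration_sec is None:
            return 2
        if (scene_duration is not None
                and abs(src.duration_sec - scene_duration) <= tolerance):
            return 0
        return 1

    best = {}
    for src in sources:
        kept = best.get(src.origin)
        # Strict "<" keeps the first source on ties, so input order is stable.
        if kept is None or rank(src) < rank(kept):
            best[src.origin] = src
    return [best[origin] for origin in sorted(best)]
```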

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:50:45 +02:00
jtrzupek
65b9df073a fix(extractors): route sxylandcom through _embed_iframe, not webview fallback
Chrome-DevTools investigation of bug-report 827a50a1 (sxyland "long loading,
then webview, no autoplay") showed sxyland embeds playmogo.com/e/<id> — a
DoodStream clone (doodcdn.io infra, pass_md5 protocol, get_slides) behind an
INVISIBLE Cloudflare Turnstile (not an interactive CAPTCHA; auto-passes in a
real browser/WebView from a residential IP). The sxyland page itself is NOT
Turnstile-gated — VPS curl pulls the playmogo iframe URL straight from the HTML.

sxylandcom was wired to _vps_blocked_fallback → phone loaded the entire sxyland
page in WebView (ads, click-to-play, no autoplay = the reported symptom), and the
playmogo embed never reached the phone's dood resolver. _embed_iframe (which
already lists sxyland in its docstring) extracts the playmogo embed and emits it
as type='hoster' → PlayerScreen routes playmogo URLs to doodstream.ts (resolveDoodStream),
which resolves phone-side (phone IP passes invisible Turnstile) → direct mp4 → autoplay.

Mobile unchanged (hoster→dood path already exists for xmoviesforyou/siska). Backend-only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:41:38 +02:00
jtrzupek
a9545a7ab2 feat(scripts): merge_exact_title_duration --playback-only + progress logging
--playback-only restricts to scenes with live playback (app-visible dupes only).
Progress print every 500 merges for long global runs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 11:02:19 +02:00
jtrzupek
e23e2d1f17 fix(merge): move playback_sources on scene merge + exact-title+duration dedup
merge_scenes never reassigned playback_sources → ON DELETE CASCADE dropped them
with the absorbed scene. Cross-source (canonical) merges rarely had tube playback,
so the bug stayed hidden, but tube-dup merges silently LOST playback links. Add _move_playback_sources
(global unique (origin,page_url) guarantees no collision on reassign).

+ merge_exact_title_duration.py: catches missing-merge dupes bulk_dedup misses
(same performer + identical normalized title + identical duration_sec, no phash).
Bad Bella had 25 such pairs (bug-report ef92809d "duplicate, same thumbnails").
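
A rough sketch of the exact-title+duration grouping, under stated assumptions: the tuple layout, the normalize_title rule (lowercase, non-alphanumerics collapsed to spaces) and the function names are hypothetical and may differ from merge_exact_title_duration.py.

```python
import re
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_dupe_groups(scenes):
    """Group scene ids sharing performer, normalized title and duration.

    `scenes` is an iterable of (scene_id, performer_id, title, duration_sec).
    Scenes without a duration are skipped: the key needs an exact match.
    """
    groups = defaultdict(list)
    for scene_id, performer_id, title, duration_sec in scenes:
        if duration_sec is None:
            continue
        key = (performer_id, normalize_title(title), duration_sec)
        groups[key].append(scene_id)
    # Only keys with more than one scene are merge candidates.
    return [ids for ids in groups.values() if len(ids) > 1]
```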

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:56:50 +02:00
jtrzupek
8f34a3e2f1 fix(mobile): movie part picker as scrollable modal — Android showed only 3 of N
paradisehill multipart movies passed all N parts to Alert.alert, but Android's
native AlertDialog renders at most 3 buttons → a 35-part movie showed 3 (bug-report
2ebd0690 2026-06-07). Backend correctly returns all 35; the cap was client-side.
Reuse PlaybackQualityModal (now scrollable + title + preserveOrder props, hides
bogus "1p" for non-resolution labels). Also add the missing `raw` field to the
StreamLink type (backend sends it; part_label lives there).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:25:03 +02:00
jtrzupek
940d4872e3 fix(mobile): removeClippedSubviews=false on grids — stop thumbnails vanishing on scroll
Android FlatList defaults removeClippedSubviews=true, which detaches off-viewport
subviews; expo-image frequently fails to re-render them when they scroll back in →
blank thumbnails (bug-report f181d382 2026-06-07, recurring). Disable on all heavy
image grids: scene grids (Scenes/Site/Studio/Tag/Performer) + movie poster grids.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:18:48 +02:00
jtrzupek
d4b89f16e3 fix(scripts): backfill arg parser consumed --workers value as LIMIT
'--workers 3' set limit=3 because the bare '3' also hit the isdigit() branch.
Skip flag-value positions when scanning for a positional LIMIT.
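
The fix can be illustrated with a small scanner. This is a sketch under assumptions: VALUE_FLAGS and parse_limit are hypothetical names, and the real backfill script may track flag positions differently.

```python
# Flags that consume the following argv entry as their value (assumed set).
VALUE_FLAGS = {"--workers"}


def parse_limit(argv):
    """Return the first bare integer that is NOT a flag's value, else None."""
    skip_next = False
    for arg in argv:
        if skip_next:
            # This position belongs to the preceding flag; never treat it
            # as the positional LIMIT.
            skip_next = False
            continue
        if arg in VALUE_FLAGS:
            skip_next = True
            continue
        if arg.isdigit():
            return int(arg)
    return None
```

With this scan, "--workers 3" alone yields no limit, while "--workers 3 200" yields 200.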

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:15:09 +02:00
jtrzupek
7bf1fd6716 fix(xvideos): parse model name from nested span.name — recover 0-performer scenes
xvideos renders the scene's models as `<a href="/models/slug">...<span class="name">
Display Name</span>...`. The old _MODEL_RE wanted text immediately after the anchor
`>` and never matched current markup → browse-scraped scenes landed with 0 performers
(bug-report 2026-06-07: "no actors, but Rebecca Johnson is on the page"). New regex
captures slug + nested span.name, bounded within the anchor. + backfill script for the
~11.9k existing zero-performer xvideos scenes (54% have a real /models/ link; resolver
merges names to canonical by name_normalized).
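
One way to write an anchor-bounded pattern of this shape is sketched below. It is not the actual _MODEL_RE: the pattern, the name MODEL_RE and the sample markup in the usage note are assumptions based only on the markup described above.

```python
import re

# Capture the /models/ slug and the text of the nested span.name. The
# tempered dot "(?:(?!</a>).)*?" stops the search at the closing </a>, so a
# span.name from a later anchor cannot be paired with an earlier slug.
MODEL_RE = re.compile(
    r'<a\b[^>]*href="/models/(?P<slug>[^"/]+)"[^>]*>'
    r'(?:(?!</a>).)*?'
    r'<span\b[^>]*class="name"[^>]*>\s*(?P<name>[^<]+?)\s*</span>',
    re.DOTALL,
)


def parse_models(html: str):
    """Return (slug, display_name) pairs found in the page HTML."""
    return [(m["slug"], m["name"]) for m in MODEL_RE.finditer(html)]
```

For example, an anchor such as `<a href="/models/jane-doe"><span class="name">Jane Doe</span></a>` would produce the pair ("jane-doe", "Jane Doe").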

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:13:21 +02:00
jtrzupek
edbffc0fa7 fix(mobile): boot diagnostic as breadcrumb, not event — silence GOON-Q noise
captureMessage('mobile boot OK', info) fired an event every launch → 171 events
/ 13 users polluting the Sentry issue list. Diagnostic served its purpose (SDK
confirmed sending). addBreadcrumb keeps boot context attached to real errors
without creating standalone issues.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:04:21 +02:00
jtrzupek
2b602beea5 fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V)
_candidate used OR logic (studio OR date±7d OR dur±30s) → 938,950 pairs;
Stage-2 scoring at ~110/s never finished in 1800s → bulk_dedup_performers HUNG
every run, orphan thread leaked until restart. Require AND: same studio plus
(date±2d OR dur±30s). 939k→16k pairs, full run 213s. Real cross-source dup of
one master shares studio + near date/duration; rare studio_id-mismatch pairs
skipped on purpose — a job that COMPLETES beats one that times out merging nothing.
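
The difference between the old OR prefilter and the new AND prefilter, sketched with the thresholds quoted above. SceneKey and both function names are hypothetical; the real _candidate presumably runs as a query rather than a Python predicate.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SceneKey:
    studio_id: Optional[int]
    released: Optional[date]
    duration_sec: Optional[int]


def _same_studio(a: SceneKey, b: SceneKey) -> bool:
    return a.studio_id is not None and a.studio_id == b.studio_id


def _near_date(a: SceneKey, b: SceneKey, days: int) -> bool:
    if a.released is None or b.released is None:
        return False
    return abs((a.released - b.released).days) <= days


def _near_duration(a: SceneKey, b: SceneKey, secs: int) -> bool:
    if a.duration_sec is None or b.duration_sec is None:
        return False
    return abs(a.duration_sec - b.duration_sec) <= secs


def is_candidate_old(a: SceneKey, b: SceneKey) -> bool:
    # Old rule: any single signal admits the pair, so the pair count explodes.
    return _same_studio(a, b) or _near_date(a, b, 7) or _near_duration(a, b, 30)


def is_candidate_new(a: SceneKey, b: SceneKey) -> bool:
    # New rule: same studio is mandatory, plus one near-match signal.
    return _same_studio(a, b) and (_near_date(a, b, 2) or _near_duration(a, b, 30))
```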

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:03:33 +02:00
jtrzupek
cd257740be fix(hqporner): require ALL query tokens in slug — stop performer over-attribution
hqporner search post-filter kept a scene if its slug contained ANY query token
(>=3 chars). For multi-word performer names this matched on a single common token
(e.g. "anna","mia"), so the performer-driven ingest attributed the scene to EVERY
performer sharing that token — scenes accumulated up to 503 wrong performers
(hqporner = 5659 of 5897 scenes with >30 performers; bug-reports 2026-06-07).

Switch ANY->ALL: the slug must contain every query token, requiring a full name
match before attribution. Single-word names still work. Precision over recall —
hundreds of wrong performers are far worse than missing a few loose matches.
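
The ANY->ALL switch, sketched under assumptions: query_tokens and slug_matches are hypothetical names, and plain substring containment is assumed for "slug contains token"; the names in the usage note are made up.

```python
import re


def query_tokens(query: str, min_len: int = 3):
    """Split a search query into lowercase tokens of at least min_len chars."""
    return [t for t in re.split(r"[^a-z0-9]+", query.lower()) if len(t) >= min_len]


def slug_matches(slug: str, query: str) -> bool:
    """Keep a scene only if its slug contains EVERY query token."""
    tokens = query_tokens(query)
    slug = slug.lower()
    # The old behaviour was any(...); all(...) demands the full name.
    return bool(tokens) and all(token in slug for token in tokens)
```

A query "Anna Jones" no longer matches the slug "anna-smith-some-scene", while the single-word query "Anna" still does.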

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 09:28:18 +02:00
jtrzupek
bc72515227 fix(player): drop "Tap for sound" pill — speaker toggle is enough
User feedback (2026-06-07, report 4bdca61e) on the prior mute change: the always-
visible "Tap for sound" pill is redundant — the 🔇/🔊 toggle in the top controls
is enough. Removed the pill (+ its styles); video still starts muted and the
speaker toggle unmutes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 09:20:42 +02:00
jtrzupek
c5abdc1186 migration(0021): raise scene_tags.tag_id statistics target to 1000
Completes the literal-tag_id perf fix — the planner's MCV stats on tag_id are what
let it pick the index-walk for common tags. Default target (100) covers only the
top ~100 tags; 1000 extends correct cardinality estimates to mid-tier tags.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:12:22 +02:00
jtrzupek
43f7e1f7b2 perf(scenes): literal tag_id in filter — 4-12s tag lists -> ~20ms
Tag-filtered scene lists (e.g. blowjob + has_playback) took 4-12s. Root cause:
the filter joined scene_tags->tags on slug, so the actual tag_id was opaque to
the planner at plan time. It fell back to average per-tag cardinality
(8.4M/11541 ≈ 726) instead of the real 273k, chose to materialize ALL matching
scene_tags + check playback per row, then top-N sort.

Fix: resolve slug->tag_id in the app and filter on a LITERAL tag_id (no slug
join). With a constant, the planner uses MCV stats, knows the tag is huge, and
walks ix_scenes_created_at_desc probing scene_tags/playback per scene, stopping
at the page limit. Verified: blowjob list 3300ms -> 18ms (EXPLAIN), HTTP 4-12s ->
47ms. Unknown slug short-circuits to empty. (Pairs with the raised tag_id
statistics target so mid-tier tags also get correct estimates.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:10:31 +02:00
jtrzupek
d52641774d perf(scenes): light list payload — drop tags/refs, slim playback to thumbnail
Scene list returned the full SceneOut per item (nested tags/external_refs + all
playback_sources with page_url/embed/stream/quality) though SceneTile only reads
the thumbnail + title/duration/performer/studio, and SceneDetail re-fetches the
full scene via /scenes/{id}. Added light=True to _build_scenes_out_batch: skip the
tags + external_refs queries entirely and collapse playback_sources to one slim
entry (thumbnail_url + animated_thumbnail_url only).

Result: default list payload 78KB->48KB (-38%), ~28ms cached, less DB work per
list. Verified on emulator: grid thumbnails/durations/titles render unchanged.
No mobile change (tile reads the same fields); server-side, no OTA.

NOTE: the separate slow path — common-tag-filtered lists (4-12s; query expands all
matching scene_tags before sort/limit) — is structural (needs a denormalized
(tag_id, created_at) index) and deferred. VACUUM ANALYZE + raised tag_id stats
applied but the planner still can't avoid the materialization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:03:26 +02:00
jtrzupek
9f46e8dea9 feat(scripts): dedup_n2_canonical — resolve n=2 false-merges via canonical duration
audit_false_merges only auto-fixes n>=3 (majority disambiguates the outlier); n=2
was "needs human review" — but the merge-review UI is gone, nobody triages 500+.
Measured: of 535 n=2 duration-divergent scenes, ALL have a canonical scene.duration_sec
(TPDB/StashDB) and 531 have exactly one source matching canonical (±20%) + one >2x off
→ unambiguous false-merge. Kill the off source (works both directions since canonical is
corroborated by the matching keeper, unlike the Omar-case the n>=3 audit guards against).
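
The n=2 decision rule can be sketched as a pure function. The name pick_false_merge and the reading of ">2x off" as "more than double or less than half of canonical" are assumptions, not taken from dedup_n2_canonical.

```python
def pick_false_merge(canonical_sec, durations, tol=0.20, off_factor=2.0):
    """Return the index (0 or 1) of the source to mark dead, or None.

    A source is the keeper when it is within +/-tol of the canonical
    duration; the other is killed only when it is more than off_factor
    away in either direction. Anything else is ambiguous and is skipped.
    """
    if not canonical_sec or canonical_sec <= 0 or len(durations) != 2:
        return None

    def matches(d):
        return abs(d - canonical_sec) <= tol * canonical_sec

    def far_off(d):
        return d > off_factor * canonical_sec or d * off_factor < canonical_sec

    first, second = durations
    if matches(first) and far_off(second):
        return 1
    if matches(second) and far_off(first):
        return 0
    return None
```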

Applied: 531 sources marked dead (4 ambiguous skipped). Reversible (dead_at).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:25:10 +02:00
jtrzupek
4922646011 feat(dedup): merge exact-phash + same-duration + shared-performer duplicates
bug-report 2026-06-03 ("same duration, same thumbnail, why don't they merge"):
duplicate scenes not merged at ingest. Exact phash alone is noisy here (95% are
collisions on shared thumbnails/intro frames — different scenes; bulk_dedup scorer
correctly gives 0 auto-merge). The safe subset is exact-phash AND same duration
(±3s) AND shared performer/title — near-certain same scene. Same-duration is key:
it excludes the false-merge pattern (short-clip-vs-full has DIFFERING durations).

- scripts/merge_phash_exact_dupes.py: one-off, dry-run by default, per-pair re-fetch
  (handles clusters). Applied: 30 merged.
- bulk_dedup: add `_pairs_exact_phash` (SQL O(N log N), not the O(N²) Hamming scan)
  + strategy "phash_exact" — gated by the normal scorer (surfaces review candidates,
  no risky auto-merge), schedulable for ongoing exact-collision review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:08:06 +02:00
jtrzupek
d5409d01ce feat(scripts): audit_teaser_only — hide scenes whose only source is a teaser
bug-report 2026-06-01 (48d6cc6b): scene shows canonical duration from TPDB
(real 22min studio scene) but the only live playback_source is a short tube
teaser (xnxx 21s) → "shows 22m, plays <1m". When ALL live sources are a tiny
fraction (<15%) of a known canonical (>300s), the scene has no real playback;
mark those sources dead → scene becomes orphan → hidden (has_playback=false),
consistent with the orphan-hiding policy. Reversible (dead_at), conservative
(skips scenes with any unknown-duration or full-length live source).
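
The teaser-only test, sketched with the thresholds quoted above (<15% of a canonical duration >300s). The function name is_teaser_only and its signature are hypothetical.

```python
def is_teaser_only(canonical_sec, live_durations,
                   min_canonical=300, max_fraction=0.15):
    """True when every live source is a tiny fraction of the canonical length.

    Conservative by design: an unknown duration (None) or any source that
    is not a tiny fraction of canonical makes the scene ineligible.
    """
    if canonical_sec is None or canonical_sec <= min_canonical:
        return False
    if not live_durations:
        return False
    if any(d is None for d in live_durations):
        return False
    return all(d < max_fraction * canonical_sec for d in live_durations)
```

For the reported case (canonical 22 min, a single 21 s source) this returns True, so that source would be marked dead and the scene hidden.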

Applied on prod: 182 sources dead across 174 scenes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:52:44 +02:00
jtrzupek
63880feeb1 fix(mobile/api): handle 204 in request() — "Mark as invalid" false failure
The generic request<T>() always called res.json(), which throws on a 204 No
Content body. mark-dead endpoints (scene + movie "Mark as invalid"/broken)
return 204, so the call threw AFTER the backend had already marked the source
dead → user saw a "Failed" alert and the list didn't refresh, even though the
mark succeeded server-side (bug-reports 2026-05-28 Voe, 2026-06-03 scene
1e8dc190). Return undefined for 204 before parsing JSON.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 19:24:29 +02:00
jtrzupek
a196fcbcdb refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper"
The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest,
performer-driven) was "pornapp" — a misleading leftover from the removed external
porn-app API. It read like a dependency on a third-party "pornapp" service; it is
not — these are our own scrapers hitting 25+ tubes directly (kind=scraper,
origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME
constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs +
external_records history stays linked. No behavior change — external_id keying
(sitetag:url) and dedup are unaffected.

NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy
format (resolve_playback parses it) and is intentionally left untouched.

Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:54:55 +02:00
jtrzupek
3339d3cd14 fix(playback): recognize luluvids.top/cdnstream/cdnvids as P.A.C.K.E.R. hosters
mypornerleak embeds luluvids.top (+ cdnstream.top/cdnvids.top) which are
luluvid/streamwish forks on new TLDs, all confirmed P.A.C.K.E.R.-JWPlayer. They
were missing from PACKER_HOSTS, so isPackerHoster() returned false → the phone-
side packer resolver never ran → WebView fallback landed on luluvids.top's
"disable Adblock and enable popup" wall (bug-report 2026-06-07, scene 75aa3316).
filemoon variant (bysezoxexe.com) was already covered.

Verified on emulator (live OTA): mypornerleak source → luluvids.top resolves
phone-side → native ExoPlayer PLAYING (position advancing), no adblock wall.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:23:22 +02:00
jtrzupek
b18f07d90e feat(playback): native pornxp.ph via phone-side resolver (kills black screen)
pornxp.ph serves direct <source> mp4 (360/720/1080p) on st.pornxp.sh whose path
token is IP-bound to whoever fetched the PAGE (verified 2026-06-07: VPS-resolved
URL → 403 cross-IP). Backend resolve was therefore impossible, so pornxpph fell
to the WebView fallback which black-screened (bug-report fd06cd86).

Fix: resolve on-device (same pattern as getfileResolver/doodstream) — the phone
fetches the page, so tokens bind to the phone IP and play natively. New
pornxpResolver.ts extracts the <source> mp4s into multi-quality StreamLinks;
SceneDetail short-circuits tube:pornxpph to it before backend resolve, feeding
the existing quality-picker + native player.

Verified on emulator (live OTA): pornxpph scene → quality picker (1080/720/360)
→ native playback PLAYING (no WebView, no ads, no black screen).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:58:40 +02:00
jtrzupek
8c0edbdf7b fix(playback): mark deleted sxyprn posts dead + rank native sources first
Two bug-report fixes (2026-06-07):
- sxyprn returns HTTP 200 "Post Not Found" for deleted posts (soft-404), so the
  extractor returned None → resolve treated it as transient and never marked the
  source dead, leaving a dead link offered forever. Now raise HosterDead on the
  marker so resolve marks it dead.
- Scene playback sources were ordered alphabetically by origin, so a WebView-
  fallback hoster (fpoxxx, IP-bound + ad-heavy) ranked above a working native
  source (freshporno) on the same scene. Add is_vps_blocked_fallback() and sort
  native-resolve origins ahead of WebView-fallback ones.
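
The new ordering can be sketched as a two-part sort key. The set contents, the "tube:<sitetag>" parsing and order_sources are assumptions; only the name is_vps_blocked_fallback comes from the commit, and its body here is illustrative.

```python
# Hypothetical subset of sitetags that are routed to the WebView fallback.
VPS_BLOCKED_FALLBACK = {"fpoxxx"}


def is_vps_blocked_fallback(origin: str) -> bool:
    """Assume origins look like 'tube:<sitetag>' and check the sitetag."""
    return origin.split(":")[-1] in VPS_BLOCKED_FALLBACK


def order_sources(origins):
    # False sorts before True, so native-resolve origins come first;
    # the origin string keeps the previous alphabetical order within each group.
    return sorted(origins, key=lambda o: (is_vps_blocked_fallback(o), o))
```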

Verified on prod: sxyprn dead URL → HosterDead; scene sources reorder
freshpornoorg before fpoxxx.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:09:01 +02:00
jtrzupek
4d14f3946b feat(player): start muted, unmute via button (autoplay-friendly)
Scenes/movies now start with sound OFF; user enables audio via a control
(UX request). NativeVideoPlayer: useVideoPlayer starts muted=true + speaker
toggle in top controls + always-visible "Tap for sound" pill while muted.
WebView path: injected autoplay sets muted=true (also makes muted autoplay
reliable per browser policy → faster CDN extraction); host player controls
handle unmute when the WebView is the actual surface.

Verified on emulator against the live runtime-1.1 OTA bundle: video starts
muted (pill shown), tap unmutes (pill clears).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:03:52 +02:00
jtrzupek
9d0cb7f26e fix(scheduler): bulk_dedup performers cross_source_only + hard-timeout (OOM)
_job_bulk_dedup_performers called run_bulk_dedup(strategy="performers") without
the cross_source_only guard whose docstring exists precisely to prevent this OOM.
At current catalog scale the unguarded path materializes N²/2 pairs per prolific
performer into a list → worker hit 6GB RSS and was OOM-killed every 12h (05:00/
17:00), taking down concurrent tpdb/stashdb/movie ingests as killed_by_restart
(0 new movies). Verified in prod: 05:00 run now completes (885k pairs scored, no
OOM) and ingests succeed (stashdb +241, tpdb +175).

Also wrap in _run_with_timeout like tpdb/stashdb (job had no hard-timeout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:00:19 +02:00
jtrzupek
fad72e9cd6 fix(tags): merge <base>2 numbered-duplicate tags + prevent regeneration
TPDB taxonomy emits numbered-duplicate tags (name "Bubble Butt2"); slugify
yields "bubble-butt2" (no separator before digit), so resolve_tag created a
separate tag alongside "bubble-butt". Tube scenes inherited the dup via
scene-merge → 75 pairs, ~10k scene_tags on the wrong tag.

- resolve_tag: canonicalize "<base>2" -> "<base>" when base exists (handles
  current + future; trailing-"2"+alpha guard leaves milf-30/teen18 intact)
- scripts/merge_dup2_tags.py: one-off bulk merge (scene_tags + movie_tags +
  blacklist) and taxonomy-count refresh
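
A sketch of the canonicalization rule, assuming the guard means "the character before the trailing 2 must be a letter". canonical_tag_slug and the existing-slugs set are hypothetical; resolve_tag itself is not shown in the log.

```python
def canonical_tag_slug(slug: str, existing: set) -> str:
    """Map '<base>2' to '<base>' when the base tag already exists.

    The letter-before-'2' check leaves numeric suffixes alone, so slugs
    such as 'milf-30' or 'teen18' are never rewritten.
    """
    if len(slug) >= 2 and slug.endswith("2") and slug[-2].isalpha():
        base = slug[:-1]
        if base in existing:
            return base
    return slug
```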

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:18:44 +02:00
jtrzupek
3cbfb1d490 fix(db): set shm_size 1g — parallel queries overflow default 64MB /dev/shm
Postgres parallel workers (e.g. sitemap_index) need >64MB shared memory;
Docker's default /dev/shm cap raised DiskFull ("No space left on device").

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 22:56:49 +02:00
jtrzupek
210aec0536 feat(scrapers): extract tags + description from porndish scene pages
porndish-only scenes had no tags and no description — the scraper only derived a
title from the URL slug. The scene page (g1/bimber WP theme) carries both: a
<p class="entry-tags"> list of /video2/<slug>/ links (the "#" tags the user sees,
categories + co-performers) and a prose description <p> in .entry-content.

Override _fetch_scene_metadata in PornDishScraper to pull both from one page
fetch. Extend the base hook to accept an optional 4th return element
(description) and thread it into RawScene.description — backward compatible with
the existing 3-tuple (pornhat). Strips leading embed-button labels
("Video Player N", "Server N") from the prose. Verified on live scenes: clean
tag lists + real descriptions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:32:10 +02:00
jtrzupek
77323d23e6 fix(playback): retry DoodStream/playmogo resolve, handle "RELOAD" token response
porndish scenes resolve only to playmogo.com embeds, which are DoodStream clones
(doodcdn.io + pass_md5 + Cloudflare Turnstile). The mobile resolver already
supported playmogo, but DoodStream is flaky from a single shot: the embed is
sometimes Turnstile-gated (no pass_md5), and the pass_md5 endpoint intermittently
returns the literal string "RELOAD" (stale/consumed token) instead of a base URL.
The old code built "RELOAD<suffix>?token=..." -> ExoPlayer "no extractors" ->
WebView -> loading forever (bug 62e78c9a).

Wrap resolveDoodStream in a 3-attempt retry that re-fetches the embed (fresh
token) on retryable failures (gate / RELOAD / empty / stale token), and reject a
non-http pass_md5 body as retryable instead of building a garbage URL. Verified
cross-IP that the pass_md5 -> base -> final flow yields 206 video/mp4 when not
gated; real carrier IPs are gated far less than the test proxy. Strict
improvement: worst case is the existing WebView fallback, best case native play.
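
The retry shape, sketched in Python for consistency with the other examples (the real resolveDoodStream lives in the mobile TypeScript code). All names here are hypothetical, and fetch_base stands in for "re-fetch the embed and call pass_md5".

```python
class RetryableResolveError(Exception):
    """Gate page, 'RELOAD', empty body or stale token: worth another try."""


def build_stream_url(pass_md5_body: str, suffix: str) -> str:
    body = pass_md5_body.strip()
    # A non-http body (e.g. the literal "RELOAD") must never be glued into
    # a URL; signal a retry instead of returning a garbage link.
    if not body.startswith(("http://", "https://")):
        raise RetryableResolveError(f"unexpected pass_md5 body: {body[:20]!r}")
    return body + suffix


def resolve_with_retry(fetch_base, suffix: str, attempts: int = 3):
    """Call fetch_base() up to `attempts` times; None means fall back."""
    for _ in range(attempts):
        try:
            # fetch_base is expected to start from a fresh embed fetch,
            # so every attempt works with a fresh token.
            return build_stream_url(fetch_base(), suffix)
        except RetryableResolveError:
            continue
    return None
```

Returning None after the last attempt leaves the caller on its existing WebView fallback, which matches the "worst case" described above.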

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:14:26 +02:00
jtrzupek
83918e9a8d perf(movies+scenes): direct-play #hash movie hosters; skip empty blacklist filters
Movies: the seekplayer-engine family (easyvidplayer/player4me/seekplayer/
embedseek/upns, ~322k sources) returns a time-bound master.m3u8 on a CDN with a
valid IP-SAN cert that plays cross-IP. Mark it mobile_direct in resolve, and make
MovieDetailScreen prefer direct_url with a proxy fallback (mirrors the scene
path) — previously every movie streamed through the VPS proxy. Paradisehill
multipart parts now go direct too. Device-verified: ExoPlayer plays the raw CDN
direct, zero proxy traffic, no flicker.

Scenes: the three blacklist NOT EXISTS clauses were appended to every filtered
list and evaluated per-row even when all blacklist tables are empty (~3.4s tax on
a deep mega-tag walk). Skip them when the tables are empty (cached check) —
mega-tag list 6.7s -> 3.3s, and every filtered list benefits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:44:41 +02:00
jtrzupek
a5ec6ca991 mobile: page-side get_file resolve for hdporngg/fullmovies (native, no proxy/flicker)
Device logs (not assumptions) pinned the real cause of the hdporngg/fullmovies
flicker: the backend returns a get_file URL, but get_file is bound to the IP that
loaded the *page*. The backend (VPS) loads the page, so the get_file is VPS-bound;
the phone fetching that get_file gets HTTP 410 -> ExoPlayer errors -> falls back to
the proxy via nav.replace (the "flicker"), and ends up streaming through the proxy.
(My earlier "stateless/portable" test was from the VPS — same IP as the page load —
so it wrongly showed 206.)

Fix: when the direct_url is a get_file, the phone re-fetches the *page* itself
(resolveGetFilePage on source.page_url) so the get_file is bound to the phone IP,
picks the requested quality skipping 4K (dead on fpvcdn), follows to the CDN, and
hands ExoPlayer a working URL. On failure it keeps the original (proxy fallback).

Verified on device: [getfile] page-resolve -> get_file 206 -> ExoPlayer PLAYING,
position advancing, no error/proxy/flicker, real video frame rendered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 00:16:19 +02:00
jtrzupek
e5b6e8968c mobile: resolve get_file redirect client-side (kills hdporngg flicker)
hdporn.gg/fullmovies.xxx return an unresolved get_file direct_url that 302-redirects
to fpvcdn.com with the requester IP baked in. The backend can't resolve it (would
bind fpvcdn to the VPS IP -> mobile 403), so the phone must follow the redirect. But
ExoPlayer errors on that cross-domain get_file->fpvcdn redirect (drops Referer / won't
complete it) -> the native player falls back to the proxy via nav.replace, which the
user sees as a screen-reload "flicker" before playback (and means it's actually playing
through the VPS proxy, not direct).

Fix: resolve the get_file 302 in JS on the phone (so fpvcdn binds to the phone IP)
before navigating to the player, and hand ExoPlayer the final fpvcdn URL directly —
no redirect, no error, no flicker, no proxy. Uses the same redirect:'manual' +
Location-header pattern as the doodstream resolver (works on RN Android). On resolve
failure it keeps the original get_file URL (current behaviour with proxy fallback).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 23:49:40 +02:00
jtrzupek
e780e1ae6f fix(hdporngg+fullmovies): native get_file, skip broken 4K — "loading forever"
User: "hdporngg loading forever". DevTools + cross-IP investigation (not guessing):
- site is alive (sample scenes 200; the one earlier 404 was a single removed video,
  not the site — my earlier "site dead" was a hasty generalization).
- both are the same platform (<source src=.../get_file/8512/...mp4>), no function/0.
- the get_file 302 is fast (~100ms) but the 2160p/4K source on fpvcdn.com TIMES OUT
  (~30s); 720p/480p resolve in ~1s. The player loading 4K first = the "loading forever".
- the final fpvcdn URL embeds the requester IP (ip=<fetcher>) -> IP-bound to whoever
  resolves it; BUT the get_file itself is stateless (fresh session works) and valid >=90s,
  and binds fpvcdn to the fetcher. So a VPS resolve would bind to the VPS IP (mobile 403),
  but returning the get_file URL UNRESOLVED lets the phone follow the 302 itself ->
  fpvcdn binds to the phone IP -> plays.

Fix: new _source_getfile resolver returns get_file URLs as mobile_direct (skip 4K),
phone resolves the 302 in-session. Native, multi-quality, no WebView, no proxy.
Replaces fullmovies' old force_proxy+4K extractor and the WebView fallback for both.
Backend-verified: resolve -> 720/480 mobile_direct, get_file fresh fetch -> 206. Pending
on-device confirmation (emulator unstable; same mechanism as porn00/freshporno which work).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 22:48:55 +02:00
jtrzupek
c05bafb4c7 fix(porn00): backend KVS resolve (portable CDN, no proxy) — corrects #20
Same proper re-investigation as freshporno (DevTools + Bright Data residential
cross-IP + curl_cffi browser TLS). porn00's final CDN fe.porn00.org/...?token=&expires=
is PORTABLE cross-IP (token resolved from one residential IP replays 206 from a
different Bright Data residential IP) and only rejects non-browser TLS (plain curl
403, curl_cffi chrome 206). In #20 I tested the final URL with a standalone plain
curl, got 403, wrongly concluded "IP-bound" and left it on WebView (and before that
it used force_proxy, which violated the no-proxy stance).

porn00 flashvars are plain get_file (already decoded, no function/0 prefix), so
extend _kvs._URL_RE to match both forms — real_url passes plain URLs through
unchanged, _resolve_get_file follows the 302 in-session. porn00.py becomes a thin
_kvs wrapper. Verified no regression for the function/0 tubes (yespornvip/pornditt/
freshporno still resolve 3x mp4). Result: porn00 native multi-quality, mobile_direct,
zero proxy/WebView.

fpoxxx and pornxp were re-tested the same way and ARE genuinely IP-bound (403 from a
different residential IP — their token binds to the resolver IP), so they correctly
stay on the WebView fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 21:15:19 +02:00
jtrzupek
6e3ad870a7 fix(freshporno): backend KVS resolve (portable CDN) — corrects #20
Re-investigated with the proper method (Chrome DevTools network capture + cross-IP
test via Bright Data residential proxy + curl_cffi browser-TLS) instead of guessing.
freshporno's real flow is get_file -> 302 -> cdn4.freshporno.org/remote_control.php
-> 206 video/mp4. The CDN URL is PORTABLE cross-IP (a token generated from one
residential IP replays fine from the VPS and from a different Bright Data residential
IP), it only rejects non-browser TLS fingerprints (plain curl -> 000, curl_cffi
chrome / ExoPlayer -> 206).

In #20 I tested the final URL with a standalone plain curl, got 000, and wrongly
concluded "unreachable from residential" -> kept it on the WebView fallback, which
barely worked (ad-heavy page, flaky). That false negative is the regression the user
reported. freshporno is function/0 KVS, so _kvs.resolve_kvs (which uses curl_cffi
chrome) already decodes + resolves it to a portable mp4 — switch to backend resolve
like yespornvip/pornditt: native, multi-quality, no proxy, no WebView.

Verified: backend resolve returns 3x mp4 (1080/720/480, mobile_direct) + cdn 206;
user confirmed native playback on device.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 21:12:17 +02:00
jtrzupek
c18ed24330 extractors: register fullmoviesxxx + hdporngg (WebView fallback)
Bug 19866e9e ("problem with both hosters"): a scene whose only two sources were
fullmovies.xxx and hdporn.gg wouldn't play at all — neither had an entry in the
extractor registry, so try_extract returned None ("no stream"). fullmovies.xxx
serves a <source ...get_file...mp4> but the get_file CDN times out from the VPS
(unreachable, like freshporno), so backend resolve isn't viable; hdporn.gg sample
pages 404. Route both through the WebView fallback so the phone (residential IP)
loads the page and either plays it or lets the injected-JS scrape grab the URL, which
is strictly better than no playback path. Surfaced by the hoster sweep + this bug report.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 22:16:05 +02:00
jtrzupek
4f0fb1636c chore(scripts): tube SSR-richness survey probe
Ad-hoc research tool: for a list of candidate tubes, fetch a listing page, grab a scene
URL, and classify the detail — reachable / JSON-LD VideoObject / duration / performers /
tags. Used 2026-06-03 to evaluate deep-crawl candidates (redtube + drtuber look strong;
pornhub/spankbang/porntrex/hqporner/youporn rejected; nuvid/motherless bare).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 11:23:49 +02:00
jtrzupek
e42217773f feat(deep-crawl): xvideos browse source (capped) + per-tube page cap
xvideos server-side renders a JSON-LD VideoObject (duration/title/uploadDate) plus on-page
/models/ (performers) and /tags/ links. Sample: median ~10.5min, 93% >=3min. Pilot (2 pages):
29 new, 100% playable + visible + tagged (performers sparse: xvideos 'new' is amateur-heavy,
and /models/ links appear mostly on studio rips).

- XVideosBrowseScraper (JSON-LD + page-parse models/tags), in ALL_BROWSE_SCRAPERS.
- deep_crawl._PAGE_CAP: per-sitetag depth cap; xvideoscom=1800 (~newest 50k). At the cap
  the tube is marked exhausted (reset -> incremental re-sweep) so a mega-tube cannot
  monopolize the round-robin or balloon the DB.
- ported yesporn.py into the public repo (was prod-only, like hdporngg) ending the
  __init__ public/prod divergence.
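
The per-tube page cap described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in deep_crawl.py: the function name `advance_cursor`, its return shape, and the choice to reset the cursor to page 1 at the cap are assumptions; only the `xvideoscom=1800` value comes from the commit message.

```python
from typing import Optional

# Per-sitetag depth cap; the xvideoscom value is the one quoted in the commit.
PAGE_CAP: dict[str, int] = {"xvideoscom": 1800}


def advance_cursor(
    sitetag: str,
    page: int,
    page_cap: Optional[dict[str, int]] = None,
) -> tuple[int, bool]:
    """Return (next_page, exhausted) after `page` was crawled successfully.

    Hypothetical helper: tubes without a cap simply advance; a capped tube
    that reaches its cap is reported exhausted and its cursor is reset so the
    next sweep starts again from the newest pages.
    """
    caps = PAGE_CAP if page_cap is None else page_cap
    cap = caps.get(sitetag)
    if cap is not None and page >= cap:
        return 1, True
    return page + 1, False
```

Keeping the cap in a plain mapping means a new mega-tube only needs one extra entry, and uncapped tubes pay nothing.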

youporn rejected: JSON-LD lacks actor/keywords, and its /pornstar/ and /category/ links
are A-Z nav, not scene-specific. xhamster: 429/Cloudflare from the VPS IP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 11:16:44 +02:00
jtrzupek
ee4915770f feat(deep-crawl): eporner via JSON API as SSR-rich source (Phase 2b alternative)
porntrex/hqporner rejected for deep-crawl: KVS sites with no SSR metadata (77% of
existing porntrex scenes have no duration -> invisible under the app's >=60 filter). eporner
instead exposes a public JSON API (api/v2/video/search) returning title + length_sec
+ keywords + added per video; ~100k videos, ~100/page, no per-scene detail fetch.

- BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None
  (transient fail) / [] (catalog end) / [scenes]. API subclasses override it.
- deep_crawl drives via crawl_page (supports HTML-listing AND API sources).
- EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+
  date+thumb+playback; registered in ALL_BROWSE_SCRAPERS.
- Pilot (2 API pages): 192 new, 100% playable + tagged + visible (>=60); the <180s
  trailer filter dropped 6 short clips.
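
The three-way crawl_page contract above (None / [] / [scenes]) might look roughly like this. It is a sketch under assumptions: the `RawScene` fields, the `JsonApiScraper` class, the injected `fetch_json` callable, the `videos` payload key and the comma-separated `keywords` format are all hypothetical stand-ins, not the repo's actual classes or the real API schema.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class RawScene:
    # Hypothetical subset of the fields a scraper would emit.
    title: str
    duration_sec: Optional[int] = None
    tags: list[str] = field(default_factory=list)


class BaseBrowseScraper:
    def crawl_page(self, page: int) -> Optional[list[RawScene]]:
        """None = transient failure, [] = catalog end, list = scenes found."""
        raise NotImplementedError

    def latest_scenes(self, max_pages: int = 1) -> list[RawScene]:
        # Built on crawl_page: stop at the first failure or at catalog end.
        scenes: list[RawScene] = []
        for page in range(1, max_pages + 1):
            batch = self.crawl_page(page)
            if not batch:
                break
            scenes.extend(batch)
        return scenes


class JsonApiScraper(BaseBrowseScraper):
    """API-backed source: overrides crawl_page instead of parsing HTML."""

    def __init__(self, fetch_json: Callable[[int], Optional[dict]]):
        # fetch_json is an assumed helper: page -> decoded JSON, or None on error.
        self._fetch_json = fetch_json

    def crawl_page(self, page: int) -> Optional[list[RawScene]]:
        payload = self._fetch_json(page)
        if payload is None:
            return None  # transient failure: the caller should retry this page
        return [
            RawScene(
                title=v["title"],
                duration_sec=v.get("length_sec"),
                tags=[k.strip() for k in v.get("keywords", "").split(",") if k.strip()],
            )
            for v in payload.get("videos", [])
        ]
```

Distinguishing None from [] lets a driver retry a failed page without mistaking a network hiccup for the end of the catalog.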

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:37:20 +02:00
jtrzupek
0f19a61789 feat(ingest): skip <180s tube scenes (trailers) + purge porndoe trailer orphans
Deep-crawling tube catalogs pulls in lots of <3min trailers/teasers (porndoe). Add
min_ingest_duration_sec (default 180): _process_scene skips scraper-source scenes whose
known duration is below the floor (unknown duration kept; canonical TPDB/StashDB
untouched). Deleted 67 existing porndoe-only orphan trailers (<180s, no canonical, no
non-porndoe live playback) via cascade.
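
The skip rule described above reduces to a small predicate. This is an illustrative sketch only: `should_skip_scene` and its parameters are hypothetical names, not the actual `_process_scene` code; the 180-second default is the one stated in the commit message.

```python
from typing import Optional

MIN_INGEST_DURATION_SEC = 180  # default floor from the commit message


def should_skip_scene(
    duration_sec: Optional[int],
    is_scraper_source: bool,
    min_duration_sec: int = MIN_INGEST_DURATION_SEC,
) -> bool:
    """Return True when a scene should be dropped as a trailer/teaser."""
    if not is_scraper_source:
        return False  # canonical (TPDB/StashDB) scenes are never filtered
    if duration_sec is None:
        return False  # unknown duration is kept, not assumed to be short
    return duration_sec < min_duration_sec
```

For example, a 179-second scraper scene is skipped, while a 180-second one, a scene with unknown duration, and any canonical scene are kept.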

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:11:25 +02:00
jtrzupek
7e46e5ac48 feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)
We ingested only ~3% of each browse tube's catalog (porndoe >62k scenes; we had 1959)
because tubes were hit only by performer-search + top-N browse. Pilot (porndoe pages
64-110): 1119 new scenes, 100% playable + 100% tagged, 0% canonical overlap (purely
additive — content not in TPDB/StashDB).

- app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor
  in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor,
  idempotent (resolver skips known by raw_hash), mark 'exhausted' at catalog end then
  reset cursors for an incremental re-sweep.
- _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h
  hard-timeout; registered in build_scheduler (jobs=10).
- config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path.
- scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach.
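
A rough sketch of the round-robin with per-tube cursors kept in a JSON file, assuming each tube is represented by a callable that returns None (transient failure), [] (catalog end) or a list of scenes. The function names, the state layout (`page` / `exhausted`) and the reset-when-all-exhausted policy are assumptions made for illustration; the real deep_crawl.py may differ.

```python
import json
from pathlib import Path
from typing import Callable, Optional

# Assumed contract: page -> None (transient fail) / [] (catalog end) / [scenes]
CrawlPage = Callable[[int], Optional[list]]


def load_state(path: Path) -> dict:
    """Read the cursor file; a missing file means a fresh crawl."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {}


def save_state(path: Path, state: dict) -> None:
    """Persist cursors (the parent directory must already exist)."""
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def deep_crawl_run(tubes: dict[str, CrawlPage], state: dict, pages_per_run: int) -> list:
    """Spend a page budget round-robin across tubes, advancing each cursor."""
    scenes: list = []
    budget = pages_per_run
    while budget > 0:
        active = [t for t in tubes if not state.get(t, {}).get("exhausted")]
        if not active:
            # Every tube hit catalog end: reset cursors for an incremental re-sweep.
            for t in tubes:
                state[t] = {"page": 1, "exhausted": False}
            break
        for tube in active:
            if budget == 0:
                break
            cursor = state.setdefault(tube, {"page": 1, "exhausted": False})
            budget -= 1
            result = tubes[tube](cursor["page"])
            if result is None:
                continue  # transient failure: same page is retried later
            if not result:
                cursor["exhausted"] = True
                continue
            scenes.extend(result)
            cursor["page"] += 1
    return scenes
```

Because every attempt costs one unit of budget, a tube that keeps failing cannot stall the run, and the cursor only moves forward after a page actually returned scenes.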

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:26:44 +02:00
jtrzupek
5e74195878 mobile: temporarily disable FLAG_SECURE (debug toggle)
Gated the expo-screen-capture preventScreenCaptureAsync call behind
SCREEN_CAPTURE_PROTECTION (currently false) so screenshots / screen recording
work during emulator debugging — FLAG_SECURE makes every screencap black, which
blocks on-device playback verification. Single-user phase; flip back to true
before wider distribution.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:45:13 +02:00
jtrzupek
58b355b6b5 fix(pornhub): WebView fallback — yt-dlp gets 403 from VPS
Hoster sweep (2026-06-02) found pornhub resolving to 0 sources: yt-dlp (current,
2026.03.17) gets HTTP 403 fetching the watch page from the Hetzner VPS, while the
other yt-dlp tubes (xvideos/xnxx/youporn/redtube) still work — so it's a
Pornhub-specific block of the server IP, not a yt-dlp regression. Route pornhub
through the WebView fallback so it plays from the phone's residential IP, same as
xhamster. 7.3k scenes affected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:41:38 +02:00
jtrzupek
d4c4b79e92 fix(kvs): cap get_file timeout + early-break on dead scenes
Bug 6ec1960e: yespornvip "resolving forever". yesporn.vip moved to a
cdn4/remote_control.php CDN (still portable cross-IP — verified 206 from a
residential IP, so backend resolve stays correct). But when a video is removed
from the CDN the page still exists and each get_file 302-follow STALLS to the
full timeout. With the resolve timeout (60s) applied per quality variant, a dead
scene hung 3x60 = 180s and returned nothing -> the mobile resolve spinner never
ended.

Fix: a dedicated low get_file timeout (10s, separate from the page-fetch
timeout) and an early-break once 2 variants fail with no result so far (the
scene is dead on the CDN — no point waiting for the third). Dead scene now
resolves to None in ~20s instead of 180s; a live scene is unaffected (~0.8s,
3 sources). Applies to all KVS tubes (yespornvip + pornditt).
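
The fix above can be illustrated with a short sketch. It assumes a hypothetical `follow` callable that follows one get_file redirect under the given timeout and returns the final CDN URL or None; the real _kvs.resolve_kvs is not shown in this log and may be structured differently. The 10s timeout and the two-failure threshold are the values from the commit message.

```python
from typing import Callable, Optional

GET_FILE_TIMEOUT_S = 10   # dedicated get_file cap, separate from page fetch
MAX_FAILS_NO_RESULT = 2   # give up once this many variants fail with no result


def resolve_variants(
    variant_urls: list[str],
    follow: Callable[[str, float], Optional[str]],
) -> Optional[list[str]]:
    """Resolve each quality variant, bailing out early on a dead scene.

    `follow(url, timeout)` is an assumed helper, not a real library call.
    """
    resolved: list[str] = []
    failures = 0
    for url in variant_urls:
        final = follow(url, GET_FILE_TIMEOUT_S)
        if final is not None:
            resolved.append(final)
            continue
        failures += 1
        # Two failures and nothing resolved yet: the scene is gone from the
        # CDN, so waiting on the remaining variants only burns time.
        if failures >= MAX_FAILS_NO_RESULT and not resolved:
            break
    return resolved or None
```

With three variants on a dead scene the worst case is two stalled requests, 2 x 10s = 20s, which matches the ~20s figure above; once any variant has resolved, later failures no longer trigger the early break.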

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:33:05 +02:00
jtrzupek
08f901712c fix(scheduler): hard-timeout heavy jobs + periodic stuck-run reaper
At the shared 05:00 anchor all heavy jobs fire together; tpdb/stashdb/performer-driven
had no timeout, so a hung connector blocked the whole job and — with max_instances=1 —
blocked every future fire of that job until a worker restart (incident 2026-06-02: 6 runs
hung 8.7h, movie mirrors 47h stale, tube ingest stalled).

- _run_with_timeout wraps tpdb/stashdb/performer-driven in a 30-min hard cap (same
  ThreadPoolExecutor pattern movie-ingest already uses): on timeout the job returns and
  frees the scheduler slot; the orphaned thread lives until restart.
- _job_reap_stuck: hourly reaper for rows stuck in 'running' for >2h, registered in the
  scheduler; the startup-only reaper missed hangs while the worker stayed up for hours.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 16:17:50 +02:00
jtrzupek
24fc790691 mobile: skeleton grid while scene lists load (perceived perf)
Scene-list screens showed a small spinner while waiting on the API, so a slow
list read felt like a blank stall. Replace the initial-load spinner on
ScenesScreen and TagScenesScreen with a SceneGridSkeleton — a 2-col grid of
pulsing placeholder tiles laid out 1:1 with SceneTile (16:9 thumb + title + meta
lines). It paints instantly with zero data, so the screen feels responsive even
when the query takes a moment, and the skeleton->content swap doesn't reflow.

Pairs with the backend list-count fix (most filtered lists are now ~0.1s); the
skeleton also masks the residual slow path (enormous tags) so it no longer reads
as a freeze.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:03:33 +02:00