The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest,
performer-driven) was "pornapp" — a misleading leftover from the removed external
porn-app API. It read like a dependency on a third-party "pornapp" service; it is
not — these are our own scrapers hitting 25+ tubes directly (kind=scraper,
origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME
constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs +
external_records history stays linked. No behavior change — external_id keying
(sitetag:url) and dedup are unaffected.
NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy
format (resolve_playback parses it) and is intentionally left untouched.
Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
mypornerleak embeds luluvids.top (+ cdnstream.top/cdnvids.top) which are
luluvid/streamwish forks on new TLDs, all confirmed P.A.C.K.E.R.-JWPlayer. They
were missing from PACKER_HOSTS, so isPackerHoster() returned false → the phone-
side packer resolver never ran → WebView fallback landed on luluvids.top's
"disable Adblock and enable popup" wall (bug-report 2026-06-07, scene 75aa3316).
filemoon variant (bysezoxexe.com) was already covered.
Verified on emulator (live OTA): mypornerleak source → luluvids.top resolves
phone-side → native ExoPlayer PLAYING (position advancing), no adblock wall.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pornxp.ph serves direct <source> mp4 (360/720/1080p) on st.pornxp.sh whose path
token is IP-bound to whoever fetched the PAGE (verified 2026-06-07: VPS-resolved
URL → 403 cross-IP). Backend resolve was therefore impossible, so pornxpph fell
to the WebView fallback which black-screened (bug-report fd06cd86).
Fix: resolve on-device (same pattern as getfileResolver/doodstream) — the phone
fetches the page, so tokens bind to the phone IP and play natively. New
pornxpResolver.ts extracts the <source> mp4s into multi-quality StreamLinks;
SceneDetail short-circuits tube:pornxpph to it before backend resolve, feeding
the existing quality-picker + native player.
Verified on emulator (live OTA): pornxpph scene → quality picker (1080/720/360)
→ native playback PLAYING (no WebView, no ads, no black screen).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two bug-report fixes (2026-06-07):
- sxyprn returns HTTP 200 "Post Not Found" for deleted posts (soft-404), so the
extractor returned None → resolve treated it as transient and never marked the
source dead, leaving a dead link offered forever. Now raise HosterDead on the
marker so resolve marks it dead.
- Scene playback sources were ordered alphabetically by origin, so a WebView-
fallback hoster (fpoxxx, IP-bound + ad-heavy) ranked above a working native
source (freshporno) on the same scene. Add is_vps_blocked_fallback() and sort
native-resolve origins ahead of WebView-fallback ones.
Verified on prod: sxyprn dead URL → HosterDead; scene sources reorder
freshpornoorg before fpoxxx.
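The source reordering described above can be sketched as a sort with a boolean key. This is a minimal illustration under assumptions, not the repository's code: the `WEBVIEW_FALLBACK_ORIGINS` set, the `tube:<sitetag>` strings in it and the body of `is_vps_blocked_fallback` are hypothetical; only the function name and the fpoxxx/freshpornoorg pair come from the message.

```python
# Hypothetical set of origins that can only play through the WebView fallback.
WEBVIEW_FALLBACK_ORIGINS = {"tube:fpoxxx"}


def is_vps_blocked_fallback(origin: str) -> bool:
    """Assumed implementation: a membership test on a static set."""
    return origin in WEBVIEW_FALLBACK_ORIGINS


def order_sources(origins: list[str]) -> list[str]:
    # False sorts before True, so native-resolve origins come first.
    # The origin string keeps the old alphabetical order as a tie-breaker.
    return sorted(origins, key=lambda o: (is_vps_blocked_fallback(o), o))
```

With this key, alphabetical order still applies inside each group, so the change only moves fallback hosters behind native ones.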
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Scenes/movies now start with sound OFF; the user enables audio via a control
(UX request). NativeVideoPlayer: useVideoPlayer starts with muted=true, with a speaker
toggle in the top controls and an always-visible "Tap for sound" pill while muted.
WebView path: injected autoplay sets muted=true (also makes muted autoplay
reliable per browser policy → faster CDN extraction); host player controls
handle unmute when the WebView is the actual surface.
Verified on emulator against the live runtime-1.1 OTA bundle: video starts
muted (pill shown), tap unmutes (pill clears).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
_job_bulk_dedup_performers called run_bulk_dedup(strategy="performers") without
the cross_source_only guard, which per its docstring exists precisely to prevent this OOM.
At current catalog scale the unguarded path materializes N²/2 pairs per prolific
performer into a list → worker hit 6GB RSS and was OOM-killed every 12h (05:00/
17:00), taking down concurrent tpdb/stashdb/movie ingests as killed_by_restart
(0 new movies). Verified in prod: 05:00 run now completes (885k pairs scored, no
OOM) and ingests succeed (stashdb +241, tpdb +175).
Also wrap in _run_with_timeout like tpdb/stashdb (job had no hard-timeout).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
TPDB taxonomy emits numbered-duplicate tags (name "Bubble Butt2"); slugify
yields "bubble-butt2" (no separator before digit), so resolve_tag created a
separate tag alongside "bubble-butt". Tube scenes inherited the dup via
scene-merge → 75 pairs, ~10k scene_tags on the wrong tag.
- resolve_tag: canonicalize "<base>2" -> "<base>" when base exists (handles
current + future; trailing-"2"+alpha guard leaves milf-30/teen18 intact)
- scripts/merge_dup2_tags.py: one-off bulk merge (scene_tags + movie_tags +
blacklist) and taxonomy-count refresh
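A minimal sketch of the "<base>2" canonicalization, assuming slugs are plain strings and the set of existing slugs is available in memory. The function name `canonical_tag_slug` and the exact form of the guard are assumptions for illustration; the real resolve_tag may differ.

```python
def canonical_tag_slug(slug: str, existing: set[str]) -> str:
    """Map '<base>2' to '<base>' when '<base>' is already a known tag."""
    # Guard (assumed form): only a trailing "2" that directly follows a
    # letter qualifies, so "milf-30" and "teen18" are never rewritten.
    if len(slug) >= 2 and slug.endswith("2") and slug[-2].isalpha():
        base = slug[:-1]
        if base in existing:
            return base
    return slug
```

Requiring the base to exist keeps legitimate tags that happen to end in "2" from being merged into something that was never a tag.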
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Postgres parallel workers (e.g. sitemap_index) need >64MB shared memory;
Docker's default /dev/shm cap raised DiskFull ("No space left on device").
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
porndish-only scenes had no tags and no description — the scraper only derived a
title from the URL slug. The scene page (g1/bimber WP theme) carries both: a
<p class="entry-tags"> list of /video2/<slug>/ links (the "#" tags the user sees,
categories + co-performers) and a prose description <p> in .entry-content.
Override _fetch_scene_metadata in PornDishScraper to pull both from one page
fetch. Extend the base hook to accept an optional 4th return element
(description) and thread it into RawScene.description — backward compatible with
the existing 3-tuple (pornhat). Strips leading embed-button labels
("Video Player N", "Server N") from the prose. Verified on live scenes: clean
tag lists + real descriptions.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
porndish scenes resolve only to playmogo.com embeds, which are DoodStream clones
(doodcdn.io + pass_md5 + Cloudflare Turnstile). The mobile resolver already
supported playmogo, but DoodStream is flaky from a single shot: the embed is
sometimes Turnstile-gated (no pass_md5), and the pass_md5 endpoint intermittently
returns the literal string "RELOAD" (stale/consumed token) instead of a base URL.
The old code built "RELOAD<suffix>?token=..." -> ExoPlayer "no extractors" ->
WebView -> loading forever (bug 62e78c9a).
Wrap resolveDoodStream in a 3-attempt retry that re-fetches the embed (fresh
token) on retryable failures (gate / RELOAD / empty / stale token), and reject a
non-http pass_md5 body as retryable instead of building a garbage URL. Verified
cross-IP that the pass_md5 -> base -> final flow yields 206 video/mp4 when not
gated; real carrier IPs are gated far less than the test proxy. Strict
improvement: worst case is the existing WebView fallback, best case native play.
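The retry logic can be sketched as follows. The real resolver lives in the mobile client; this Python version only illustrates the control flow, and `fetch_embed` / `fetch_pass_md5` are hypothetical callables standing in for the two network steps.

```python
from typing import Callable, Optional


def resolve_with_retry(
    fetch_embed: Callable[[], Optional[str]],
    fetch_pass_md5: Callable[[str], Optional[str]],
    attempts: int = 3,
) -> Optional[str]:
    """Return a base URL, or None so the caller can use the WebView fallback."""
    for _ in range(attempts):
        # Re-fetch the embed on every attempt to obtain a fresh token.
        token_url = fetch_embed()
        if token_url is None:
            continue  # gated embed (no pass_md5 link): retryable
        body = fetch_pass_md5(token_url)
        if body and body.startswith("http"):
            return body
        # "RELOAD", an empty body or any non-http body is retryable;
        # never build a URL from it.
    return None
```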
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Movies: the seekplayer-engine family (easyvidplayer/player4me/seekplayer/
embedseek/upns, ~322k sources) returns a time-bound master.m3u8 on a CDN with a
valid IP-SAN cert that plays cross-IP. Mark it mobile_direct in resolve, and make
MovieDetailScreen prefer direct_url with a proxy fallback (mirrors the scene
path) — previously every movie streamed through the VPS proxy. Paradisehill
multipart parts now go direct too. Device-verified: ExoPlayer plays the raw CDN
direct, zero proxy traffic, no flicker.
Scenes: the three blacklist NOT EXISTS clauses were appended to every filtered
list and evaluated per-row even when all blacklist tables are empty (~3.4s tax on
a deep mega-tag walk). Skip them when the tables are empty (cached check) —
mega-tag list 6.7s -> 3.3s, and every filtered list benefits.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Device logs (not assumptions) pinned the real cause of the hdporngg/fullmovies
flicker: the backend returns a get_file URL, but get_file is bound to the IP that
loaded the *page*. The backend (VPS) loads the page, so the get_file is VPS-bound;
the phone fetching that get_file gets HTTP 410 -> ExoPlayer errors -> falls back to
the proxy via nav.replace (the "flicker"), and ends up streaming through the proxy.
(My earlier "stateless/portable" test was from the VPS — same IP as the page load —
so it wrongly showed 206.)
Fix: when the direct_url is a get_file, the phone re-fetches the *page* itself
(resolveGetFilePage on source.page_url) so the get_file is bound to the phone IP,
picks the requested quality skipping 4K (dead on fpvcdn), follows to the CDN, and
hands ExoPlayer a working URL. On failure it keeps the original (proxy fallback).
Verified on device: [getfile] page-resolve -> get_file 206 -> ExoPlayer PLAYING,
position advancing, no error/proxy/flicker, real video frame rendered.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hdporn.gg/fullmovies.xxx return an unresolved get_file direct_url that 302-redirects
to fpvcdn.com with the requester IP baked in. The backend can't resolve it (would
bind fpvcdn to the VPS IP -> mobile 403), so the phone must follow the redirect. But
ExoPlayer errors on that cross-domain get_file->fpvcdn redirect (drops Referer / won't
complete it) -> the native player falls back to the proxy via nav.replace, which the
user sees as a screen-reload "flicker" before playback (and means it's actually playing
through the VPS proxy, not direct).
Fix: resolve the get_file 302 in JS on the phone (so fpvcdn binds to the phone IP)
before navigating to the player, and hand ExoPlayer the final fpvcdn URL directly —
no redirect, no error, no flicker, no proxy. Uses the same redirect:'manual' +
Location-header pattern as the doodstream resolver (works on RN Android). On resolve
failure it keeps the original get_file URL (current behaviour with proxy fallback).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
User: "hdporngg loading forever". DevTools + cross-IP investigation (not guessing):
- site is alive (sample scenes 200; the one earlier 404 was a single removed video,
not the site — my earlier "site dead" was a hasty generalization).
- both are the same platform (<source src=.../get_file/8512/...mp4>), no function/0.
- the get_file 302 is fast (~100ms) but the 2160p/4K source on fpvcdn.com TIMES OUT
(~30s); 720p/480p resolve in ~1s. The player loading 4K first = the "loading forever".
- the final fpvcdn URL embeds the requester IP (ip=<fetcher>) -> IP-bound to whoever
resolves it; BUT the get_file itself is stateless (fresh session works) and valid >=90s,
and binds fpvcdn to the fetcher. So a VPS resolve would bind to the VPS IP (mobile 403),
but returning the get_file URL UNRESOLVED lets the phone follow the 302 itself ->
fpvcdn binds to the phone IP -> plays.
Fix: new _source_getfile resolver returns get_file URLs as mobile_direct (skip 4K),
phone resolves the 302 in-session. Native, multi-quality, no WebView, no proxy.
Replaces fullmovies' old force_proxy+4K extractor and the WebView fallback for both.
Backend-verified: resolve -> 720/480 mobile_direct, get_file fresh fetch -> 206. Pending
on-device confirmation (emulator unstable; same mechanism as porn00/freshporno which work).
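The "skip 4K" selection amounts to a height filter over the parsed variants. A minimal sketch, assuming the variants are available as a height-to-URL mapping; the function name and the 1080 cutoff are assumptions for illustration.

```python
def playable_variants(
    variants: dict[int, str], max_height: int = 1080
) -> list[tuple[int, str]]:
    """Drop variants above max_height and return the rest, best quality first."""
    kept = [(height, url) for height, url in variants.items() if height <= max_height]
    return sorted(kept, reverse=True)
```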
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Same proper re-investigation as freshporno (DevTools + Bright Data residential
cross-IP + curl_cffi browser TLS). porn00's final CDN fe.porn00.org/...?token=&expires=
is PORTABLE cross-IP (token resolved from one residential IP replays 206 from a
different Bright Data residential IP) and only rejects non-browser TLS (plain curl
403, curl_cffi chrome 206). In #20 I tested the final URL with a standalone plain
curl, got 403, wrongly concluded "IP-bound" and left it on WebView (and before that
it used force_proxy, which violated the no-proxy stance).
porn00 flashvars are plain get_file (already decoded, no function/0 prefix), so
extend _kvs._URL_RE to match both forms — real_url passes plain URLs through
unchanged, _resolve_get_file follows the 302 in-session. porn00.py becomes a thin
_kvs wrapper. Verified no regression for the function/0 tubes (yespornvip/pornditt/
freshporno still resolve 3x mp4). Result: porn00 native multi-quality, mobile_direct,
zero proxy/WebView.
fpoxxx and pornxp were re-tested the same way and ARE genuinely IP-bound (403 from a
different residential IP — their token binds to the resolver IP), so they correctly
stay on the WebView fallback.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-investigated with the proper method (Chrome DevTools network capture + cross-IP
test via Bright Data residential proxy + curl_cffi browser-TLS) instead of guessing.
freshporno's real flow is get_file -> 302 -> cdn4.freshporno.org/remote_control.php
-> 206 video/mp4. The CDN URL is PORTABLE cross-IP (a token generated from one
residential IP replays fine from the VPS and from a different Bright Data residential
IP), it only rejects non-browser TLS fingerprints (plain curl -> 000, curl_cffi
chrome / ExoPlayer -> 206).
In #20 I tested the final URL with a standalone plain curl, got 000, and wrongly
concluded "unreachable from residential" -> kept it on the WebView fallback, which
barely worked (ad-heavy page, flaky). That false negative is the regression the user
reported. freshporno is function/0 KVS, so _kvs.resolve_kvs (which uses curl_cffi
chrome) already decodes + resolves it to a portable mp4 — switch to backend resolve
like yespornvip/pornditt: native, multi-quality, no proxy, no WebView.
Verified: backend resolve returns 3x mp4 (1080/720/480, mobile_direct) + cdn 206;
user confirmed native playback on device.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug 19866e9e ("problem with both hosters"): a scene whose only two sources were
fullmovies.xxx and hdporn.gg wouldn't play at all — neither had an entry in the
extractor registry, so try_extract returned None ("no stream"). fullmovies.xxx
serves a <source ...get_file...mp4> but the get_file CDN times out from the VPS
(unreachable, like freshporno), so backend resolve isn't viable; hdporn.gg sample
pages 404. Route both through the WebView fallback so the phone (residential IP)
loads the page and plays / the injected-JS scrape can grab the URL — strictly
better than no playback path. Surfaced by the hoster sweep + this bug report.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Ad-hoc research tool: for a list of candidate tubes, fetch a listing page, grab a scene
URL, and classify the detail — reachable / JSON-LD VideoObject / duration / performers /
tags. Used 2026-06-03 to evaluate deep-crawl candidates (redtube + drtuber look strong;
pornhub/spankbang/porntrex/hqporner/youporn rejected; nuvid/motherless bare).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
xvideos server-side-renders a JSON-LD VideoObject (duration/title/uploadDate) plus
on-page /models/ (performers) and /tags/ links. Sample: median ~10.5min, 93% >=3min.
Pilot (2 pages): 29 new, 100% playable +
visible + tagged (performers sparse — xvideos 'new' is amateur-heavy; /models/ tagged
mostly on studio rips).
- XVideosBrowseScraper (JSON-LD + page-parse models/tags), in ALL_BROWSE_SCRAPERS.
- deep_crawl._PAGE_CAP: per-sitetag depth cap; xvideoscom=1800 (~newest 50k). At the cap
the tube is marked exhausted (reset -> incremental re-sweep) so a mega-tube cannot
monopolize the round-robin or balloon the DB.
- ported yesporn.py into the public repo (was prod-only, like hdporngg) ending the
__init__ public/prod divergence.
youporn rejected: JSON-LD lacks actor/keywords, and its /pornstar/ and /category/ links
are A-Z navigation, not scene-specific. xhamster: 429/Cloudflare from the VPS IP.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
porntrex/hqporner rejected for deep-crawl: KVS sites with no SSR metadata (77% of
existing porntrex has no duration -> invisible under the app's >=60 filter). eporner
instead exposes a public JSON API (api/v2/video/search) returning title + length_sec
+ keywords + added per video; ~100k videos, ~100/page, no per-scene detail fetch.
- BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None
(transient fail) / [] (catalog end) / [scenes]. API subclasses override it.
- deep_crawl drives via crawl_page (supports HTML-listing AND API sources).
- EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+
date+thumb+playback; registered in ALL_BROWSE_SCRAPERS.
- Pilot (2 API pages): 192 new, 100% playable + tagged + visible (>=60); the <180s
trailer filter dropped 6 short clips.
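The three-way crawl_page contract (None / [] / [scenes]) lets a driver tell a transient failure from the end of the catalog. A minimal sketch of such a driver loop; `crawl_from` and its return shape are hypothetical, only the meaning of the three return values comes from the message.

```python
from typing import Callable, Optional


def crawl_from(
    cursor: int,
    max_pages: int,
    crawl_page: Callable[[int], Optional[list]],
) -> tuple[int, bool, list]:
    """Return (next_cursor, exhausted, scenes) after at most max_pages pages."""
    scenes: list = []
    exhausted = False
    page = cursor
    for _ in range(max_pages):
        result = crawl_page(page)
        if result is None:
            break  # transient failure: keep the cursor, retry this page next run
        if not result:
            exhausted = True  # empty list: end of catalog
            break
        scenes.extend(result)
        page += 1
    return page, exhausted, scenes
```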
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Deep-crawling tube catalogs pulls in lots of <3min trailers/teasers (porndoe). Add
min_ingest_duration_sec (default 180): _process_scene skips scraper-source scenes whose
known duration is below the floor (unknown duration kept; canonical TPDB/StashDB
untouched). Deleted 67 existing porndoe-only orphan trailers (<180s, no canonical, no
non-porndoe live playback) via cascade.
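The skip rule reads as a small predicate. A minimal sketch; the function name and the `is_scraper_source` parameter are assumptions, while the 180s default and the "unknown duration is kept" rule come from the message.

```python
from typing import Optional

MIN_INGEST_DURATION_SEC = 180  # default floor named in the message


def should_skip_scene(
    duration_sec: Optional[int],
    is_scraper_source: bool,
    floor: int = MIN_INGEST_DURATION_SEC,
) -> bool:
    """True only for scraper-source scenes with a known duration below the floor."""
    if not is_scraper_source:
        return False  # canonical sources are untouched
    if duration_sec is None:
        return False  # unknown duration is kept
    return duration_sec < floor
```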
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
We ingested only ~3% of each browse tube's catalog (porndoe >62k scenes; we had 1959)
because tubes were hit only by performer-search + top-N browse. Pilot (porndoe pages
64-110): 1119 new scenes, 100% playable + 100% tagged, 0% canonical overlap (purely
additive — content not in TPDB/StashDB).
- app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor
in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor,
idempotent (resolver skips known by raw_hash), mark 'exhausted' at catalog end then
reset cursors for an incremental re-sweep.
- _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h
hard-timeout; registered in build_scheduler (jobs=10).
- config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path.
- scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Gated the expo-screen-capture preventScreenCaptureAsync call behind
SCREEN_CAPTURE_PROTECTION (currently false) so screenshots / screen recording
work during emulator debugging — FLAG_SECURE makes every screencap black, which
blocks on-device playback verification. Single-user phase; flip back to true
before wider distribution.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Hoster sweep (2026-06-02) found pornhub resolving to 0 sources: yt-dlp (current,
2026.03.17) gets HTTP 403 fetching the watch page from the Hetzner VPS, while the
other yt-dlp tubes (xvideos/xnxx/youporn/redtube) still work — so it's a
Pornhub-specific block of the server IP, not a yt-dlp regression. Route pornhub
through the WebView fallback so it plays from the phone's residential IP, same as
xhamster. 7.3k scenes affected.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug 6ec1960e: yespornvip "resolving forever". yesporn.vip moved to a
cdn4/remote_control.php CDN (still portable cross-IP — verified 206 from a
residential IP, so backend resolve stays correct). But when a video is removed
from the CDN the page still exists and each get_file 302-follow STALLS to the
full timeout. With the resolve timeout (60s) applied per quality variant, a dead
scene hung 3x60 = 180s and returned nothing -> the mobile resolve spinner never
ended.
Fix: a dedicated low get_file timeout (10s, separate from the page-fetch
timeout) and an early-break once 2 variants fail with no result so far (the
scene is dead on the CDN — no point waiting for the third). Dead scene now
resolves to None in ~20s instead of 180s; a live scene is unaffected (~0.8s,
3 sources). Applies to all KVS tubes (yespornvip + pornditt).
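The early-break can be sketched as a loop over quality variants. This is an illustration under assumptions: `follow` is a hypothetical callable that follows one get_file redirect with the given timeout and returns the final URL or None.

```python
from typing import Callable, Optional


def resolve_variants(
    urls: list[str],
    follow: Callable[[str, float], Optional[str]],
    get_file_timeout: float = 10.0,
    max_failures: int = 2,
) -> Optional[list[str]]:
    """Resolve each variant; give up early when the scene looks dead."""
    resolved: list[str] = []
    failures = 0
    for url in urls:
        final = follow(url, get_file_timeout)
        if final is not None:
            resolved.append(final)
            continue
        failures += 1
        # Two failures with nothing resolved so far: stop waiting.
        if failures >= max_failures and not resolved:
            break
    return resolved or None
```

With a 10s timeout and a break after two failures, the worst case for a dead scene is about 20s, which matches the figure above.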
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
At the shared 05:00 anchor all heavy jobs fire together; tpdb/stashdb/performer-driven
had no timeout, so a hung connector blocked the whole job and — with max_instances=1 —
blocked every future fire of that job until a worker restart (incident 2026-06-02: 6 runs
hung 8.7h, movie mirrors 47h stale, tube ingest stalled).
- _run_with_timeout wraps tpdb/stashdb/performer-driven in a 30-min hard cap (same
ThreadPoolExecutor pattern movie-ingest already uses): on timeout the job returns and
frees the scheduler slot; the orphaned thread lives until restart.
- _job_reap_stuck: hourly reaper of 'running' >2h rows, registered in the scheduler —
the startup-only reaper missed hangs while the worker stayed up for hours.
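A minimal sketch of the ThreadPoolExecutor hard-cap pattern, assuming the job is a plain callable. The function name `run_with_timeout` and its boolean return are illustrative and not necessarily the signature of _run_with_timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_with_timeout(fn, timeout_sec: float) -> bool:
    """Run fn in a worker thread; return False if it exceeds timeout_sec."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        future.result(timeout=timeout_sec)  # re-raises exceptions from fn
        return True
    except FutureTimeout:
        return False
    finally:
        # Do not wait for a hung thread: the caller returns and frees its
        # scheduler slot, while the orphaned thread keeps running.
        pool.shutdown(wait=False)
```

Python threads cannot be killed from outside, which is why the orphaned thread survives until the process restarts.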
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scene-list screens showed a small spinner while waiting on the API, so a slow
list read felt like a blank stall. Replace the initial-load spinner on
ScenesScreen and TagScenesScreen with a SceneGridSkeleton — a 2-col grid of
pulsing placeholder tiles laid out 1:1 with SceneTile (16:9 thumb + title + meta
lines). It paints instantly with zero data, so the screen feels responsive even
when the query takes a moment, and the skeleton->content swap doesn't reflow.
Pairs with the backend list-count fix (most filtered lists are now ~0.1s); the
skeleton also masks the residual slow path (enormous tags) so it no longer reads
as a freeze.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The filtered scene-list endpoints (default feed sends min_duration_sec=60, plus
has_playback / tag / q filters) took ~4.5s even on an idle server. Profiling showed
the entire cost was the bounded COUNT subquery over the EXISTS filters: Postgres
would not reliably early-terminate at the cap under psycopg bound params, scanning
the whole matching set (~858k for has_playback). Counting over the PK and using a
literal LIMIT helped some cases but the plan stayed unstable.
Fix: stop computing an exact count for filtered lists entirely. The mobile client
paginates by has_more (per_page+1 fetch), never by total — total is only the "N+"
UI counter. Derive total as a lower bound from the page + has_more after the fetch.
This removes the count query from every filtered request.
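Deriving the lower bound needs only the page that was already fetched. A minimal sketch, assuming the query ran with LIMIT per_page+1 at the right offset; the exact formula and the response keys are assumptions.

```python
def build_page(rows: list, page: int, per_page: int) -> dict:
    """rows: result of a query fetched with OFFSET (page-1)*per_page, LIMIT per_page+1."""
    has_more = len(rows) > per_page
    items = rows[:per_page]
    # Rows on earlier pages, plus this page, plus the one extra row seen.
    total_lower_bound = (page - 1) * per_page + len(items) + (1 if has_more else 0)
    return {"items": items, "has_more": has_more, "total": total_lower_bound}
```

The client can render the total as "N+" whenever has_more is true.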
Result (end-to-end, authenticated): default feed 4.5s -> ~0.1s, has_playback
4.4s -> ~0.1s, q/studio/normal-tag filters all <0.3s. Also added index
scene_tags(tag_id, scene_id) (PK led with scene_id, so tag->scenes did a seq scan).
Remaining: a single enormous tag (e.g. "anal", ~163k scenes) ordered by recency
still gathers-all-then-sorts in the fetch (~5s); normal tags are <0.5s. Tracked
in #22 for a denormalized recency-ordered approach.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The bounded count for filtered scene lists ran `SELECT count(*) FROM (SELECT
scenes.* ... LIMIT 1001)` because the base query selects the full Scene entity.
Counting over all columns made the planner pick a far worse plan via psycopg
bound params (~4s for has_playback) than the same logic over the PK (~30-400ms).
Count semantics are unchanged — we only need rows to exist — so count over
`base.with_only_columns(Scene.id)`.
Partial: this fixes the count leg. The main ordered fetch on filtered lists
(has_playback / tags) can still pick a gather-all-then-sort plan under bound
params (fast with literal binds, slow parameterized) — tracked separately.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Publishing the OTA from Windows git-bash failed at the scp step (2026-06-02):
- git-bash (MSYS) rewrote the /root/... env path to 'C:/Program Files/Git/root/...'
before Python saw it → upload targeted a bogus remote dir.
- scp local source 'C:\...\dist' is parsed as host 'C' (drive letter = host).
Fixes: default runtime 1.0→1.1 (active channel, app.json runtimeVersion=1.1); scp
source passed as '.' with cwd=DIST (no drive letter); MSYS_NO_PATHCONV=1 in subprocess
env; defensive un-mangle of a git-bash-converted VPS_BASE.
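The two scp fixes can be sketched as a helper that builds the command and environment. This is an illustration only: the helper name, the `-r` flag and the remote target format are assumptions, not the publish script's actual invocation.

```python
import os
from typing import Optional


def scp_invocation(remote_target: str, base_env: Optional[dict] = None):
    """Build scp args and env that are safe to run from Windows git-bash."""
    env = dict(os.environ if base_env is None else base_env)
    # Stop MSYS from rewriting /root/... style arguments into C:/... paths.
    env["MSYS_NO_PATHCONV"] = "1"
    # Source is "." so no drive letter can be parsed as a host name;
    # the caller sets cwd to the dist directory.
    args = ["scp", "-r", ".", remote_target]
    return args, env
```

A caller would then run something like `subprocess.run(args, cwd=dist_dir, env=env, check=True)`, where `dist_dir` is the local build output.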
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pairing is automatic (App.tsx auto-connects to the public instance when no creds are
stored); the login screen only appears after an explicit Sign out. It defaulted to
localhost + empty key, forcing manual entry that no longer reflects how pairing works.
Now it prefills the public backend + shipped key (one-tap 'Connect to public instance')
and tucks the URL/API-key fields under an 'Advanced · self-hosted backend' toggle for
power users.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Header showed the 'goon' text wordmark while the login screen leads with the GoonMark
symbol — switch the header to GoonMark so the logo is consistent across login + main.
- Scenes/Movies/Sites could overlap the header action icons on narrow phones: the mark is
narrower than the wordmark, row gap reduced 16->10, and the 'Sign out' text replaced with
a compact icon — frees ~80px so the left (logo+tabs) and right (actions) fit down to ~320px.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scene.duration_sec was NULL for ~74% of playable scenes (tube duration lives on
playback_source, never propagated to Scene), so the mobile min_duration_sec=60 filter
(Scene.duration_sec >= 60; NULL fails) silently hid them — surfaced as '119 in favorites,
14 after entering the performer' (Safira Yakkuza).
- resolver: _effective_duration() falls back to max live playback_source duration when the
connector provides no scene-level duration (forward fix, used in create + update).
- scripts/backfill_scene_duration_from_playback.py: one-off idempotent backfill (recovered
204,014 scenes).
- taxonomy_counts: scene_count now counts playable AND duration_sec >= 60, matching the
always-60s-filtered scene lists, so favorites/performer/studio/tag badges agree with what
the scene screen actually shows (Safira: 39 == 39).
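The fallback in _effective_duration can be sketched as below. The dict-shaped sources and the `duration_sec` / `dead_at` keys are assumptions made for a self-contained example; the real code works on ORM rows.

```python
from typing import Optional


def effective_duration(
    scene_duration: Optional[int], playback_sources: list[dict]
) -> Optional[int]:
    """Prefer the connector's duration; else the max over live playback sources."""
    if scene_duration is not None:
        return scene_duration
    live = [
        s["duration_sec"]
        for s in playback_sources
        if s.get("dead_at") is None and s.get("duration_sec") is not None
    ]
    return max(live, default=None)
```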
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- paradisehill.fetch_movies compared release_date coerced to midnight against the
`since` timestamp, so the chronological crawl stopped at the first upload dated
the same calendar day as `since` and silently dropped most new movies (0-2 seen
per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; idempotent
external_records upsert dedups the re-fetched recent window.
- scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the
backlog missed during the bug (idempotent, resumable).
- docs: correct stale 'raz dziennie/24h' ("once a day/24h") browse-latest comments to
6h (4x/day), the actual configured cadence (config.py sched_browse_latest_hours=6).
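The paradisehill date fix compares calendar dates instead of timestamps. A minimal sketch; the function name is hypothetical and the 1-day grace is taken from the message.

```python
from datetime import date, datetime, timedelta


def within_delta(release_date: date, since: datetime, grace_days: int = 1) -> bool:
    """Compare by DATE with a grace window instead of midnight vs. timestamp."""
    cutoff = since.date() - timedelta(days=grace_days)
    return release_date >= cutoff
```

A movie released on the same calendar day as `since` now passes, whereas midnight of that day compared against an afternoon timestamp would have failed.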
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolver/perf:
- find_by_phash_within: nearest match via Postgres bit_count over bit(64) XOR
instead of Python scan of all phash fingerprints (~20x faster per scene;
unblocks long delta runs that were killed mid-run before since advanced).
Scheduler/reliability:
- reap ingest_runs stuck in 'running' on worker startup (killed_by_restart).
- smoke_test: per-source ingest health, stuck-run and browse-freshness checks
-> Sentry; exclude killed_by_restart from the failed-run alarm.
Tags (ingest with tags + fill blanks):
- wire infer_tag_slugs into normalize_scene so tube scenes get title-inferred
tags (was dead code); union with connector tags.
- scripts/backfill_inferred_tags.py: keyset/batched/idempotent backfill for
existing tagless scenes (playable tag coverage 16% -> ~52%).
Clip-store:
- skip ManyVids/IWantClips/Clips4Sale/... from canonical sources at ingest
(GOON_SKIP_CLIP_STORE, default on) — permanent orphans, ~56% of canonical
ingest, never have a free-tube playback source.
Browse tubes:
- enable fullmovies + hdporn.gg: studio parsed from title prefix instead of
the /networks/ sidebar (which always yielded the first listed network);
drop phash compute (pilot: 0% canonical hit within Hamming 5 — auto-screenshots),
matching relies on title/performer/duration.
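For reference, the distance that find_by_phash_within now computes in Postgres (bit_count over the XOR of two bit(64) values) is the Hamming distance. The Python below only illustrates the metric and the "nearest within a threshold" selection; the function names and the dict-shaped candidates are assumptions.

```python
MASK64 = (1 << 64) - 1


def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit perceptual hashes."""
    return bin((a ^ b) & MASK64).count("1")


def nearest_within(target: int, candidates: dict[str, int], max_distance: int):
    """Return (scene_id, distance) of the closest hash within max_distance, else None."""
    best = None
    for scene_id, phash in candidates.items():
        d = hamming64(target, phash)
        if d <= max_distance and (best is None or d < best[1]):
            best = (scene_id, d)
    return best
```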
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Opt-in remediation for the duration-inconsistent scenes found by the audit.
Scope is deliberately narrow and reversible:
- only scenes with >=3 duration-bearing sources AND max/min ratio > 3x
- anchored on scene.duration_sec (the canonical value), never the median of
sources (a median is wrong when several bogus short clips outvote the real
full-length source)
- marks dead ONLY sources that are >2x SHORTER than the canonical — a falsely
merged source is almost always a short SEO clip/preview. Sources longer than
the canonical are left alone, since an over-long outlier more often means the
canonical duration itself is too low (so killing the long source would drop
the real video); those stay for manual review.
- guards that at least one live source remains
- dry-run by default; --yes to apply; sets dead_at (reversible), not delete
First run marked 514 short-clip sources dead across 228 scenes.
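The selection rules above can be sketched as one function. This is an illustration under assumptions: sources are passed as an id-to-duration mapping of live sources only, and the function name is hypothetical.

```python
from typing import Optional


def sources_to_mark_dead(
    canonical_sec: Optional[int], live_durations: dict[int, Optional[int]]
) -> list[int]:
    """Return ids of live sources that are more than 2x shorter than canonical."""
    known = {sid: d for sid, d in live_durations.items() if d}
    if not canonical_sec or len(known) < 3:
        return []  # need a canonical anchor and >=3 duration-bearing sources
    if max(known.values()) / min(known.values()) <= 3:
        return []  # durations are consistent enough
    doomed = [sid for sid, d in known.items() if d * 2 < canonical_sec]
    if len(doomed) >= len(live_durations):
        return []  # guard: at least one live source must remain
    return doomed
```

Sources longer than the canonical never enter `doomed`, which matches the rule that over-long outliers stay for manual review.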
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Read-only data-quality audit for scene merges made before the 2026-05-12
scoring hardening (which now caps weak-signal aggregator matches at 0.85 and
tightened the duration bump to <=3s). The auto-merge candidate log does not
record which external_ref was attached, so a merge cannot be reversed from the
log alone. Instead this detects false merges by their effect: a scene that
absorbed a different video ends up with playback_sources of inconsistent
durations (e.g. a 60s clip alongside a 2h source).
Reports counts + severity buckets by max/min duration ratio, can list the worst
offenders with a per-source breakdown, and can export suspects to JSON. Mutates
nothing — remediation (detach/mark-dead the outlier source) is left as an
explicit, separately-decided step because short durations can be legitimate
(previews) and n=2 scenes are ambiguous about which source is canonical.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bug f6c86847/b1b5e1a2: doply/playmogo plays fine but seeking throws
"source error, invalid NAL length" in ExoPlayer. Investigation (cross-IP,
2026-06-01) showed the stream is well-formed — faststart MP4 (moov before
mdat) on cloudatacdn.com which fully supports HTTP range (206, correct
content-range, repeatable token, no redirect). So it is an ExoPlayer-internal
seek failure, not an HTTP/container problem, and expo-video exposes no
extractor/MIME hint to influence it.
Mitigation: when the native player errors *after* it had already loaded
(i.e. a mid-playback/seek failure, not an initial-load failure) and the error
is not a 404/410, recreate the source via player.replace() and resume at the
last known position — this opens a fresh connection and re-parses moov, which
typically clears the transient decode error. Hard-capped at 2 attempts per
mount to avoid any auto-reload loop; if it still fails it falls through to the
existing proxy/WebView fallback and error UI. Initial-load errors are
untouched, so the resolver and the ~59k working doply sources are unaffected.
Also thread playbackId/entityKind through the resolved-hoster and proxy/WebView
nav.replace calls so those paths get the 404 "Mark broken" affordance too, and
complete the local RouteParams type with headers/fallbackProxyUrl.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When a host returns 404/410 at playback time (CDN gone, video removed) the
player previously showed only a raw error and a Back button — the user could
not tell it was a dead source or report it without going back to the detail
screen (bug a78cc3b6: "fpo and sxyprn are a 404 that the app cannot
identify").
- Thread playback_source.id into Player route params (scenes + movies).
- Native player error overlay: detect 404/410 in the ExoPlayer error, show
"Source no longer available" and a "Mark broken" button that marks the
source dead and returns. 403 is excluded (proxy/WebView fallback may save it).
- WebView player: add onHttpError; on a main-document 404/410 show the same
overlay (Mark broken / Try anyway / Back) instead of the host's 404 page.
Guarded to the loaded document (host+path) so same-host ad/subresource 404s
don't false-trigger.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Re-checked whether these four KVS tubes could move to server-side resolve
like yespornvip/pornditt/porntrex. All four are reachable from the backend,
but cross-IP testing showed their final CDN URLs are IP-bound to the
resolving host (403 / connection refused from a different IP; fpo.xxx even
embeds the resolver IP in its acctoken). Unlike the portable cdn5/twa CDNs,
backend resolve cannot produce a mobile-playable URL here without a proxy,
which is out of scope for the public app.
- porn00: was using force_proxy resolve (violated the no-proxy stance);
switched to the WebView fallback like its siblings. The ad exposure that
originally motivated the proxy path is mitigated by the recent ad-filter
work (AD_HOSTS + cover overlay + injected-JS ad-CDN skipping).
- freshporno/pornxp/fpoxxx already on WebView fallback; comments updated
with the cross-IP findings so this isn't re-investigated.
- Dropped the now-unused tube extractor imports (F401).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pornditt is the same kt_player KVS engine as yespornvip: flashvars carry
function/0/-obfuscated get_file urls + license_code, and the VPS reaches it
(HTTP 200). It was on _vps_blocked_fallback (WebView), where the scrape grabbed
the VAST preroll ad (trafostatic) instead of content (user bug: "pornditt grabs
the ad instead of the video").
Extracted the verified yespornvip logic into app/extractors/tubes/_kvs.py
(resolve_kvs: fetch page → decode function/0 get_file via kt_player algo → follow
302 in-session → portable CDN, multi-quality). yespornvip.py and new pornditt.py
are now thin wrappers. Registry: porndittcom _vps_blocked_fallback → pornditt.extract.
Verified on prod: pornditt → 720p/480p on twa.tgprn.com (portable, fresh-session
206 video/mp4); yespornvip still → 1080/720/480p on cdn5 (refactor intact).
Backend-only, no OTA — mobile plays mp4+mobile_direct_ok natively with quality
picker, zero WebView/ads.
Note: a real-browser residential load shows MEDIA_ERR on the content (the page's
own player flow / ad gating); server-side decode+follow sidesteps the player
entirely, which is why it resolves cleanly. The original bug scene (40f118e1) has
its video deleted on pornditt — verified on a live scene (156091).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
WebView-fallback hosts (pornditt, xhamster, 0dayxx, sxyland, fpoxxx, porndoe)
inject a VAST preroll ad video (trafostatic.com / bkcdn.net / gripi.online / ...)
that loads before the real content. The INJECTED_JS performance scrape grabbed
that ad mp4 and handed it to ExoPlayer, so the native player showed the 30s ad
instead of the video (user bug: "pornditt grabs the ad instead of the video").
report() now calls isAdHost() and skips ad-network video URLs; extended AD_HOSTS
with the video-ad CDNs. Content CDNs (sacdnssedge etc.) still pass through.
Shipped via OTA runtime 1.1 (update ea4b9901).
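A rough model of the ad-host check, in Python for illustration only; the shipped isAdHost() is part of the mobile JS. The three hosts are the ones named in this message (the real AD_HOSTS list is longer), and matching on exact host or subdomain suffix is an assumption, not confirmed behaviour.

```python
from urllib.parse import urlsplit

# Hosts named in the commit text; the real AD_HOSTS list contains more entries.
AD_HOSTS = ("trafostatic.com", "bkcdn.net", "gripi.online")


def is_ad_host(url: str) -> bool:
    """True if the URL's host is an ad host or a subdomain of one (assumed rule)."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == ad or host.endswith("." + ad) for ad in AD_HOSTS)


def should_report(url: str) -> bool:
    """Scraped media URLs are reported to the native player unless ad-hosted."""
    return not is_ad_host(url)
```

Matching on a dot-prefixed suffix keeps unrelated hosts that merely end in the same letters from being filtered.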
NOTE: this fixes ad-scraping for the WebView class generally, but pornditt itself
is separately broken — its content get_file fails to load even in a real desktop
browser from a residential IP (MEDIA_ERR code 4; only the ad mp4 loads) and its
player config is dynamic/obfuscated (no inline flashvars to resolve server-side).
pornditt is effectively unplayable for now (see task): deprioritize it or fall
back to other sources. yespornvip (clean backend resolve) is unaffected by this.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yespornvip was on the WebView fallback, which loaded the ad-heavy host page; the
INJECTED_JS scrape grabbed the preroll ad video (bkcdn.net, ~30s) instead of the
content, so the native player showed a 30s ad. The get_file content url is also
session/cookie-bound (410 for a cookieless ExoPlayer request).
Key finding: the VPS now reaches yesporn.vip (HTTP 200 — unblocked, same as
porntrex got 2026-05-22), so we can resolve server-side like porntrex instead of
relying on the browser. KVS flashvars carry function/0/-obfuscated get_file urls +
license_code; decode the hash with the kt_player algorithm (yt-dlp KVS algo,
verified to reproduce kt_player's output), then follow each quality's get_file 302
in the same curl_cffi session → final cdn5 url. That url is time-bound signed but
NOT IP/cookie-bound — verified portable cross-IP (VPS-resolved url fetched from a
different IP → 206 video/mp4).
New app/extractors/tubes/yespornvip.py returns 480p/720p/1080p portable CDN urls;
registry switched from _vps_blocked_fallback → yespornvip.extract. Mobile plays
direct natively with a working quality picker — zero WebView, zero ads, zero proxy.
Verified on prod (3 cdn5 sources) and emulator (quality picker → 1080p native
decode at 1920px, no WebView, no ad). Backend-only; no OTA needed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
User bug: opening a WebView-fallback scene (yespornvip etc.) shows the host's
ad-heavy page while INJECTED_JS auto-plays + scrapes the stream url in the
background. The user sees ads instead of a loading state.
Render an opaque cover (theme.bg + spinner "Loading video…") over the WebView
while !extractedUrl. The WebView is still laid out and painted underneath, so
media keeps playing (autoplay via mediaPlaybackRequiresUserAction=false) and the
performance-scan picks up the CDN url — but the user only ever sees a loading
screen, then the native player. Applies to every WebView-fallback host.
Safety: if no stream is scraped within 15s (host needs a real tap to start),
reveal the WebView so the user can interact manually — no worse than before.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
User bug: porntrex plays slowly, no quality picker, reload flicker — suspected
VPS proxy. Root cause: porntrex KVS get_file tokens are cookie/session-bound, not
just time-bound as previously assumed. The extractor handed mobile the raw
get_file url; ExoPlayer's cookieless request → 410 → mobile fell back to the VPS
proxy (slow + nav.replace flicker).
Verified: following get_file in the same curl_cffi session that fetched the page
→ 200 (streams video); a fresh session → 410. The final CDN url after the 302
(cdn.pcdn.cloudswitches.com/...?expires=&md5=) is portable — fresh session → 206.
Fix: extract() now uses one curl_cffi Session for page + get_file, follows each
quality's 302 (stream + Range, no body download) and returns the resolved CDN url.
Mobile plays direct, multi-quality picker works, zero proxy bandwidth. Falls back
to the raw get_file url if a resolve fails. Verified on prod: both 720p/480p now
resolve to cloudswitches CDN.
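A sketch of the same-session resolve step, assuming a requests-style session object (curl_cffi's Session is assumed to accept these keyword arguments). resolve_cdn_url is a hypothetical helper name, not the extractor's actual function.

```python
from typing import Any


def resolve_cdn_url(session: Any, get_file_url: str, timeout: float = 15.0) -> str:
    """Follow get_file's 302 in the page-fetching session and return the final URL.

    Falls back to the raw get_file URL if the resolve fails.
    """
    try:
        resp = session.get(
            get_file_url,
            headers={"Range": "bytes=0-0"},  # ask for one byte only
            stream=True,                     # do not download the body
            allow_redirects=True,
            timeout=timeout,
        )
    except Exception:
        return get_file_url
    try:
        if resp.status_code in (200, 206):
            return str(resp.url)  # URL after redirects, i.e. the CDN URL
        return get_file_url
    finally:
        resp.close()
```

The session passed in must be the one that fetched the page, since the get_file token is bound to its cookies.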
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yespornvip (and other KVS / kt_player tubes) play via the WebView fallback:
INJECTED_JS scrapes <video>.src and hands it to ExoPlayer. For KVS, <video>.src
is a get_file/N/<hash>/... intermediate that 302-redirects to the CDN, but that
redirect is bound to the WebView's cookies/session (and is effectively one-shot).
ExoPlayer's separate request gets "Source error: response code 410" (user bug
2026-05-31, scenes Delicious Dulce / Alexis Fawx).
The actual playable CDN url (e.g. tsvideo.sacdnssedge.com/video/ol_<hash>.mp4) is
portable (206 with no cookies/referer) but never appears in <video>.src or
XHR/fetch — only in Performance resource timing (the native media loader fetches
it after the 302). Verified live in Chromium on the exact broken scene.
INJECTED_JS now:
- skips get_file intermediates (INTERMEDIATE_RE) so they're never sent to ExoPlayer
- skips scrubber preview/heatmap/sprite mp4s (PREVIEW_RE)
- scans performance.getEntriesByType('resource') each tick and reports the real
CDN media url — cross-origin entries expose .name even without Timing-Allow-Origin
Pure JS → shipped via OTA runtime 1.1 (update d4708fed).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Filtered /scenes (tag/origin/q/studio/performer) ran exhaustive COUNT with
stub-filter EXISTS over 1.7M rows: TAG 5.1s, ORIGIN 4.9s, SEARCH 3.1s.
Mobile relied on `loaded < total` for infinite-scroll, making exact count
mandatory and ruling out approximate shortcuts.
Backend:
- SceneListOut gains has_more (bool) and total_capped (bool), both optional
for backward compat with old mobile
- Filtered count uses a LIMIT _COUNT_CAP+1 subquery (cap = 1000), so cost is
O(min(matches, cap)) instead of O(all). Measured: TAG 5.1s→664ms,
SEARCH 3.1s→138ms, ORIGIN 4.9s→1.07s (also fixes SiteScenes showing
global count ~1M instead of per-site count)
- has_more from fetching per_page+1 rows (essentially free); extra row
stripped before serialisation
- Pure-default list (no filters at all) keeps TTL-cached full count
Mobile:
- getNextPageParam uses has_more, falling back to loaded < total when absent
- Display shows "{total}+" when total_capped=true (5 screens)
Verified on emulator: tag "Big Tits" → "1000 scenes" loaded, no 500s,
backward compat confirmed (old APK works against new backend).
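A minimal in-memory sketch of the two tricks above, assuming a cap of 1000. In SQL the capped count would be roughly a count(*) over a subquery limited to cap+1 rows. capped_count and page_with_has_more are hypothetical names.

```python
from itertools import islice
from typing import Iterable, List, Sequence, Tuple

COUNT_CAP = 1000  # assumed value of _COUNT_CAP


def capped_count(matches: Iterable[object], cap: int = COUNT_CAP) -> Tuple[int, bool]:
    """Look at no more than cap+1 matches; return (total, total_capped)."""
    seen = sum(1 for _ in islice(matches, cap + 1))
    return min(seen, cap), seen > cap


def page_with_has_more(
    rows: Sequence[object], offset: int, per_page: int
) -> Tuple[List[object], bool]:
    """Fetch per_page+1 rows; the extra row only signals that more exist."""
    window = list(rows[offset : offset + per_page + 1])
    has_more = len(window) > per_page
    return window[:per_page], has_more  # extra row stripped before returning
```

With 5000 matches, capped_count stops after 1001 items and reports a capped total, which the client renders as "{total}+".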
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Counts for /tags, /performers, /studios and /favorites were computed live
per-request by aggregating scene_tags / scene_performers with an EXISTS to
playback_sources. As the catalog grew to ~1.7M scenes (6.3M scene_tags) this
ran ~4.3s for /tags?order=popular (x2 incl. the total count) and ~950ms for
the default /scenes count, making those screens load in several seconds.
- migration 0019: add scene_count (+ DESC index) to tags/performers/studios
- background job _job_refresh_taxonomy_counts (every 3h) recomputes the counts
in one UPDATE..FROM each (IS DISTINCT FROM to skip unchanged rows)
- /tags, /performers, /studios scenes paths now read the column + ORDER BY the
indexed scene_count; for_movies paths keep live aggregation (small tables)
- favorites read denormalized scene_count instead of a grouped EXISTS aggregate
- /scenes default count: 10-min in-process TTL cache (header is approximate)
Measured: /tags?order=popular&per_page=500 ~8s -> 66ms incl. serialization.
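A sketch of an in-process TTL cache like the one used for the default /scenes count, assuming a 600-second TTL and a single process. TTLCachedValue and its injectable clock are illustrative, not the backend's actual code.

```python
import time
from typing import Callable, Optional


class TTLCachedValue:
    """Cache one computed value in-process and recompute it after ttl_seconds."""

    def __init__(
        self,
        compute: Callable[[], int],
        ttl_seconds: float = 600.0,  # 10 minutes, as in the commit text
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[int] = None
        self._expires_at: Optional[float] = None

    def get(self) -> int:
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Expired or never computed: run the expensive count once.
            self._value = self._compute()
            self._expires_at = now + self._ttl
        assert self._value is not None
        return self._value
```

Because the value can be up to one TTL stale, the header count it feeds is approximate, as noted above.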
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>