Commit graph

179 commits

Author SHA1 Message Date
jtrzupek
05a35955ad fix(api): cap list_scenes filter sizes to prevent DB OOM (Fixes GOON-1M)
A single request with 194 studio_slugs + 23 tag filters (each tag = a correlated
EXISTS) plus an ILIKE search built a query heavy enough that the OOM killer killed the
Postgres backend, triggering a full crash-recovery (~1s prod-wide outage, all in-flight
connections dropped). Any user could do this with a big enough filter. Cap studios to
50, tags to 15, performers to 15 (far above any real UI usage) and return 422 instead
of executing — bounding query complexity regardless of the planner's choice.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 16:25:29 +02:00
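The cap in 05a35955ad can be illustrated with a minimal sketch. The three limits come from the commit message; the function name, the exception class and the parameter names are hypothetical, and the real endpoint presumably maps the error to HTTP 422 in its own framework.

```python
# Caps quoted in the commit message; all names below are illustrative.
MAX_STUDIOS = 50
MAX_TAGS = 15
MAX_PERFORMERS = 15


class FilterTooLarge(ValueError):
    """Hypothetical error the API layer would translate into an HTTP 422."""

    status_code = 422

    def __init__(self, field, count, limit):
        super().__init__(f"{field}: {count} values exceeds the cap of {limit}")
        self.field = field
        self.count = count
        self.limit = limit


def check_filter_sizes(studio_slugs, tag_ids, performer_ids):
    """Raise before any SQL is built if a filter list exceeds its cap."""
    limits = (
        ("studio_slugs", studio_slugs, MAX_STUDIOS),
        ("tag_ids", tag_ids, MAX_TAGS),
        ("performer_ids", performer_ids, MAX_PERFORMERS),
    )
    for field, values, limit in limits:
        if len(values) > limit:
            raise FilterTooLarge(field, len(values), limit)
```

Rejecting up front keeps query cost bounded no matter which plan the database would have picked.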
jtrzupek
813bf741b9 fix(mobile): re-resolve IP-bound tubes on playback error (sxyprn/eporner/fpoxxx)
sxyprn's video token is bound to the IP that fetched the post page; on mobile the
phone resolver succeeds ~74% of the time, but ~26% of loads fail when the egress IP shifts (CGNAT / network
switch) or the token goes stale → native player hung on a dead URL (18 reports, 26%
error rate in telemetry). Now on an initial-load error for these phone-resolved
tubes, the player re-fetches the page fresh (new token bound to the current IP) and
swaps the source before falling through to the proxy/WebView chain. Zero VPS
bandwidth. Gated by resolvePageUrl so other tubes are completely unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 11:11:21 +02:00
jtrzupek
585e5d59f5 chore(ingest): hard-remove hqfap + 4k69 (entire CDN library gone)
Re-check 2026-06-25 across the full id range confirmed both PlayTube tubes
serve only the fixed `/upload/videos/video_down.mp4` "server down" stub, never
a real file: hqfap 0/80 real (79 stub, 1 none), 4k69 0/40 real (38 stub, 2
none). Both were disabled 2026-06-22; CDN never came back, so removing entirely
(mirrors the pornhub/redtube/0dayxx/pornditt/pornhat removals).

Removed the extractor registry entries (hqfapcom, 4k69com) + module files and
the browse scrapers + imports. Prod DB data deleted separately (28,398
solo-orphan scenes + 46,196 playback_sources). `_playtube.py` kept: superporn
and neporn still use its JSON-LD helpers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 11:07:47 +02:00
jtrzupek
9a789a8551 fix(extract): perverzija xtremestream → hoster/WebView (was bogus mp4, hung player)
_embed_iframe returned xtremestream's player endpoint (player/xs1.php?data=) labeled
type=mp4, but it's an IP-bound JS player page (403 cross-IP), not a real file — the
native player loaded it forever (user reports: "perverzija doesn't work" / "loading forever").
Added xtremestream.* to _IP_BOUND_CDN_RE so Stage 1 skips it and falls through to the
hoster fallback: the phone WebView loads the index.php player with its residential IP
and the stream plays in-session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 10:57:33 +02:00
jtrzupek
1ca503b7be feat(ingest): add xnxx browse scraper (JSON-LD only, alongside search)
Browse over /best/<YYYY-MM>/<page> (SSR; xnxx has no clean /new/ and its homepage is
JS-rendered) for a latest-feed freshness signal next to the performer-driven search
scraper. JSON-LD VideoObject only — xnxx detail (unlike its xvideos twin) doesn't
expose /models/ or /tags/ in SSR, so performers/tags come via canonical merge + the
search scraper. Title is html.unescaped (JSON-LD ships &comma;/&excl; entities).

xhamster and sxyprn intentionally left search-only: xhamster Cloudflare-blocks the
VPS on listing pages (1KB challenge), sxyprn has no clean SSR listing (IP-bound) —
a flaky browse scraper would be worse than the working search + 168h watchdog.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 15:52:32 +02:00
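A rough sketch of the JSON-LD title handling described in 1ca503b7be, assuming the page embeds a single `VideoObject` in an `application/ld+json` script tag. The function name and the regex are illustrative, not the scraper's actual code; `html.unescape` is the standard-library call and resolves named entities such as `&comma;` and `&excl;`.

```python
import html
import json
import re

# Illustrative pattern; a real scraper may well use an HTML parser instead.
_JSONLD_RE = re.compile(
    r'<script[^>]+type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def video_object_title(page_html):
    """Return the unescaped `name` of the first JSON-LD VideoObject, or None."""
    for raw in _JSONLD_RE.findall(page_html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
        if isinstance(data, dict) and data.get("@type") == "VideoObject":
            name = data.get("name")
            return html.unescape(name) if isinstance(name, str) else None
    return None
```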
jtrzupek
2051fc1ded feat(ingest): add youporn browse scraper (JSON-LD only, alongside search)
Browse over /browse/time/?page=<n> (SSR) for guaranteed latest-feed freshness next to
the existing performer-driven search scraper. JSON-LD VideoObject only (title /
duration / uploadDate / thumbnail) — deliberately NOT scraping performers/tags from
the detail page: JSON-LD has no actor field and the /pornstar/ and /category/ links are
sidebar-polluted with no scene-scoped container, so a naive regex attached the same
2 pornstars to every scene. Performers/tags come via canonical merge + the search
scraper instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 15:47:58 +02:00
jtrzupek
55612e262b feat(ingest): add browse scrapers for porntrex + mypornerleak (alongside search)
Both were search-only — fresh only as long as the performer queue cycles and the
site search keeps working. Added browse scrapers next to the existing search ones
(xvideos/eporner pattern: search keeps performer back-catalog coverage, browse
guarantees latest-feed freshness → watchdog 48h instead of 168h):
- porntrex: KVS /latest-updates/<n>/ (title + thumb + phash)
- mypornerleak: WP REST /wp-json/wp/v2/posts?_embed=1 (title + date + studio from
  category + performers from the actors taxonomy)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 15:41:22 +02:00
jtrzupek
a10c51aebf feat(ingest): revive porndish — search→WP REST API browse
Watchdog flagged porndish as frozen (search ?s= stopped yielding new scenes
2026-05-07, 1151h). It's WordPress and the VPS can reach it, so converted to a browse
scraper over the WP REST API (/wp-json/wp/v2/posts?_embed=1), same pattern as
perverzija: title, date, featured thumbnail, studio (category — FreeUseFantasy /
I Have A Wife / … paysite content) and tags. Performers via canonical merge. Playback
unchanged (embed iframe → phone-side). 60 fresh scenes on first crawl.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 15:09:27 +02:00
jtrzupek
b3ecf7141a feat(ingest): revive perverzija — search→WP REST API browse
Search (?s=) started returning 429 and the homepage is JS-rendered (no post links in
raw HTML), so the old search scraper got 0 (frozen since 2026-05-07). perverzija is
WordPress and the VPS can reach it (200, not CF-blocked), so converted to a browse
scraper over the WP REST API (/wp-json/wp/v2/posts?_embed=1): one structured call per
page gives title, date, featured thumbnail, studio (category — DadCrush/FamilyStrokes/
… TeamSkeet-family paysite re-ups) and genre tags. Performers via canonical merge
(stars taxonomy isn't REST-exposed; title carries names). Playback unchanged (embed
iframe → phone-side). 15 fresh + 45 refreshed on first crawl.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 13:10:16 +02:00
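The WP REST conversions (b3ecf7141a, a10c51aebf) read one structured post object per scene. Below is a minimal sketch of mapping such an object to scene fields, assuming the standard `_embed=1` response shape (`title.rendered`, `_embedded["wp:featuredmedia"]`, `_embedded["wp:term"]`). The function name, the output keys and the "first category is the studio" rule are assumptions for illustration.

```python
import html


def parse_wp_post(post):
    """Map one /wp-json/wp/v2/posts?_embed=1 item to a flat scene dict."""
    embedded = post.get("_embedded", {})
    media = embedded.get("wp:featuredmedia") or []
    # wp:term is a list of term lists, one per taxonomy attached to the post.
    terms = [t for group in embedded.get("wp:term", []) for t in group]
    categories = [t["name"] for t in terms if t.get("taxonomy") == "category"]
    tags = [t["name"] for t in terms if t.get("taxonomy") == "post_tag"]
    return {
        "url": post["link"],
        "title": html.unescape(post["title"]["rendered"]),
        "date": post["date"],
        "thumbnail": media[0].get("source_url") if media else None,
        "studio": categories[0] if categories else None,  # assumption
        "tags": tags,
    }
```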
jtrzupek
cbb2390a2a feat(sources): remove 0dayxx + pornditt + pornhat entirely
Three orphan-factory tubes (0–0.2% canonical match — auto-screenshot thumbs and
slug titles that never match TPDB/StashDB) — to be replaced by better sources.
Removed scrapers (files + imports), extractors (registry + modules), the pornhat
entry from tag-enrichment priority lists and the 0dayxx display override, and purged
the DB (19,003 playback_sources + 9,904 solo-orphan scenes; shared mirror scenes keep
their other sources). The pornhat-based enrich_studio endpoint stays as a graceful
no-op (no pornhat sources → returns no studio).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 12:23:29 +02:00
jtrzupek
2f3e57c0ac feat(ingest): revive fpoxxx — search→browse (KVS /new-N/)
fpo.xxx is a KVS site, not WordPress, so the old `?s=` search scraper matched
nothing (frozen since 2026-05-07). Converted to a browse scraper reading /new-<n>/
(title + duration + thumbnail + phash from the listing tile; performers via canonical
merge). Playback was already phone-side (KVS). 32 fresh scenes on first crawl.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 12:04:05 +02:00
jtrzupek
90e391e255 feat(sources): remove pornhub + redtube entirely
Both scrapers had been disabled since 2026-05-12 (~0.4% canonical match: mostly short
amateur clips that never match studio content); their data sat frozen. Removed for
good: deleted the extractor registry entries, scraper files and imports, dropped them
from the tag-enrichment priority lists, and purged the DB (17,906 playback_sources +
122 scenes that had no other source; mirror scenes shared with other tubes just lost
the ph/rt link).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 11:55:08 +02:00
jtrzupek
1875604c6d fix(mobile): onboarding pager — measure page width so last slide shows "Start browsing"
scrollTo/onScroll used the full screen width, but the ScrollView viewport is narrower
(card margins + padding), so the computed index desynced from the visible slide — the
last slide kept showing "Next"/"Skip" instead of "Start browsing". Measure the real
viewport width via onLayout and use it for paging, scrollTo and index. Caught on the
emulator (uiautomator dump — FLAG_SECURE blocks screenshots).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 10:21:19 +02:00
jtrzupek
db23b63e46 feat(mobile): first-launch tutorial (pages, features, long-presses, player gestures)
A 7-slide carousel shown once on first launch:
- the three tabs (Scenes/Movies/Sites)
- search, filters, saved searches, Performers/Tags/Favorites
- long-press actions (hide/duplicate a scene, remove a wrong performer, link diagnostics)
- player gestures (tap controls, double-tap ±15s, swipe to scrub, unmute)
- favorites, Hidden content, PIN lock, the ? report button, Sites ★ ratings

Gated by a SecureStore flag; replayable from Settings ⚙ → Replay tutorial (via a
tiny onboarding bus). Suppresses the What's-new popup for brand-new users (the tour
covers it) and marks the changelog seen on finish.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 10:08:56 +02:00
jtrzupek
c154deab37 feat(sources): 0-5★ ranking on Sites (freshness/metadata/plays) + playback telemetry
Rates each source on three axes the user asked for:
- freshness: how recently/often new content arrives (newest age + 7d volume)
- richness: metadata coverage (thumbnail/tags/performers/description/studio/duration)
- plays: does it actually play — from real playback telemetry when available,
  else a proxy from the resolve mechanism. 0★ = offline (gates the overall stars,
  so a fresh+rich source that doesn't play still ranks bottom — the hqfap/4k69 case)

Backend:
- playback_events: fire-and-forget telemetry POST from the app per playback attempt
  (origin + success/error + time-to-first-frame), append-only, 30d retention
- source_stats: per-origin computed scores, refreshed by a scheduler job (6h);
  /sources joins it and sorts by stars
- models + local migration 0025; new GOON_SCHED_SOURCE_STATS_HOURS setting

Mobile:
- Sites rows show ★ rating; tap the stars for a breakdown (axes + metadata %, plus
  whether "plays" is measured or estimated)
- PlayerScreen reports playback success/failure per source (native path only —
  symmetric, conservative); origin threaded through Scene/Movie play callsites

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 10:00:59 +02:00
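The commit c154deab37 does not state how the three axes combine, only that a 0★ "plays" score gates the overall rating. One way that gating could look, with the averaging step being purely an assumption:

```python
def overall_stars(freshness, richness, plays):
    """Combine three 0-5 axis scores; a source that does not play scores 0.

    The gate mirrors the commit text. The plain mean used for the
    non-gated case is an illustrative assumption, not the real formula.
    """
    if plays <= 0:
        return 0.0
    return round((freshness + richness + plays) / 3, 1)
```

With this shape a fresh, metadata-rich source that is offline still sorts to the bottom of the Sites list.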
jtrzupek
f34a75f4c6 feat(ingest): disable hqfap/4k69 (broken playback), latestpornvideo → browse
- hqfap + 4k69: both ingested fresh but playback is dead (hqfap serves a fixed
  ~3MB "server down" stub for every scene; 4k69 resolves no playable URL).
  Removed from ALL_BROWSE_SCRAPERS so no new dead sources get ingested; existing
  live playback_sources marked dead in prod (scenes drop out of has_playback /
  Sites). Extractors kept in registry for easy re-enable if the hosts recover.
- latestpornvideo: was a performer-search scraper, so it never picked up the
  site's "latest" feed — users saw a stale set. Converted to a browse scraper
  reading /page/N/ (studio+date from title/thumb, category tags; performers via
  canonical merge). Moved DIRECT → BROWSE list.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-22 09:34:47 +02:00
jtrzupek
4afebacad8 feat(mobile): movies — performer filter + 3-column grid
Two Movies-list reports.
(1) 1044cd34 'do movies have a metadata base for performers/categories/studio/year':
yes: 90% have year, 92% studio, 81% performers, 93% tags, and the filter already
covered studio/genre/year. Added the missing dimension: a performer search-and-select
in MovieFiltersSheet (backend listMovies + api.ts already accepted performer_ids; only
the UI was missing).
(2) 0200956f 'use the space better': Movies grid goes 2 -> 3 columns (poster card is
flex:1, scales fine) so ~50% more films per screen.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 23:21:22 +02:00
jtrzupek
960bc75be4 fix(hqfap): reject 3MB video_down.mp4 stub (placeholder, not real video)
hqfap migrated its JSON-LD contentUrl (and the *.workers.dev mirror) to /upload/videos/video_down.mp4, which serves a FIXED ~3.04MB file for EVERY scene regardless of declared length (verified 5/5 scenes at 14-47min all = 3.04MB, 2026-06-21). It is a placeholder/'server down' clip, not the content — the browser's own player streamed the same stub via MediaSource. We were handing users that 3MB stub (reports c382d441/ef10b946). Now reject the video_down.mp4 contentUrl and return no source, so scenes fall through to other sources or show no playback instead of a fake clip. Real older scenes (cdnde.com / okcdn.ru direct mp4) still resolve. This also makes the proxy-fallback question moot — there is no source to proxy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 23:01:19 +02:00
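The stub rejection in 960bc75be4 amounts to a path check on the JSON-LD `contentUrl`. A minimal sketch, where only the stub path comes from the commit message and the function name is hypothetical:

```python
from urllib.parse import urlsplit

# Fixed placeholder path named in the commit message.
STUB_PATH = "/upload/videos/video_down.mp4"


def usable_content_url(content_url):
    """Return content_url, or None when it points at the 'server down' stub."""
    if not content_url:
        return None
    # Compare the path only, so mirrors on other hosts are rejected too.
    if urlsplit(content_url).path.endswith(STUB_PATH):
        return None
    return content_url
```

Returning `None` lets the caller fall through to other sources instead of handing out the placeholder clip.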
jtrzupek
249ad49430 fix(mobile): double-tap seek no longer pops the center pause control
Report dc4e91fb: double-tapping to skip ±15s also called setControlsVisible(true), throwing the full controls (big center pause button) on screen for 3.5s. Seek already has its own ±15s hint overlay, so the controls pop was redundant — removed it. Single-tap still toggles controls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 22:27:08 +02:00
jtrzupek
b643b2cb77 fix(movies): dedup playback sources by target (cross-mirror dupes)
Movie detail showed ~100 playback links (report 41ca1fa4) because the 3 dooplay mirrors (mangoporn/pandamovies/streamporn) each record the SAME hoster embed as a separate row (e.g. luluvid/e/X from all three). Dedup by real target (embed_url/stream_url/page_url) after the priority sort, keeping the highest-priority copy — one verified movie drops 101 -> 58 unique.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 22:26:50 +02:00
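The dedup in b643b2cb77 can be sketched as "sort by priority, then keep the first row per target". The dict keys follow the field names in the commit message; treating a lower `priority` number as better is an assumption.

```python
def dedup_by_target(sources):
    """Keep the highest-priority row for each distinct playback target."""
    seen = set()
    unique = []
    # Assumption: a smaller priority value means a better source.
    for src in sorted(sources, key=lambda s: s["priority"]):
        target = src.get("embed_url") or src.get("stream_url") or src.get("page_url")
        if target is not None:
            if target in seen:
                continue  # same hoster embed already kept from a better mirror
            seen.add(target)
        unique.append(src)
    return unique
```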
jtrzupek
78d26c4bc6 feat(mobile): strip .com/.org clutter from site names
User-report 18105d14: drop the TLD suffix from Sites list + SiteScenes header (hqporner.com -> hqporner, fpo.xxx -> fpo). Logos skipped (needs a per-site logo source) — TLD strip is the quick win.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 16:38:45 +02:00
jtrzupek
ac84da92a4 feat(siska): convert to browse scraper, re-enable (search broken site-side)
siska's ?s= search ignores the query (returns latest regardless), so the performer-driven search scraper always yielded 0 and was disabled. Rewrote SiskaScraper as a latest-browse scraper (BaseBrowseScraper, /page/<n>/) and moved it to ALL_BROWSE_SCRAPERS. The listing tile carries everything (no detail fetch): title, duration (MM:SS span), thumbnail (img data-src), performer + studio (img alt 'Performer - Title - Studio'), category (thumbnail path). Playback unchanged: fresh videos embed playmogo + luluvid, resolved phone-side via _embed_iframe. Verified ingest: 26 seen / 11 new / 15 updated / 0 errors — and 15 updated means siska scenes match existing canonical scenes, adding playback coverage rather than orphans. Now covered by the browse ingest-watchdog (48h) and the 6h browse-latest + deep-crawl jobs. Old self-player videos (player.siska.video -> cfglobalcdn, ~2018) are dead and age out.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 16:25:11 +02:00
jtrzupek
0b6f663528 investigate(siska): keep disabled — site search is broken (ignores query)
Revisited siska re-enable (user fa4083a2). Findings:
(1) fresh siska videos (videoID 227xxx) embed playmogo + luluvid and ARE
phone-resolvable; updated siska.py scene regex + extractor path to the current
video.php?videoID= format (old /<slug>/ format is gone).
(2) BUT siska's ?s=<query> search is broken site-side: it returns the latest videos
regardless of query (angela white == riley reid == homepage), so as a performer-driven
BaseSearchScraper it always yields 0 (title token filter rejects everything).
Reviving siska would require converting it to a browse/latest scraper (changes ingest
character), left as a decision. Old self-player videos (player.siska.video ->
cfglobalcdn) are dead. Scraper stays disabled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 16:15:02 +02:00
jtrzupek
8b216018a2 feat(mobile): Hidden content screen — blacklist tags/performers/studios
User-report 86a9ec72 ('remove all gay scenes from randomly popping up'): there was no UI to hide a tag, nor to view/undo the blacklist — even though the 'Hide performer' alert promised 'undo from Settings -> Blacklist' (a screen that never existed). New BlacklistScreen: search-and-add any tag to hide (e.g. a category), plus manage/unhide all blacklisted tags/performers/studios. Reached via Settings -> Content -> Hidden content. Backend already drops blacklisted-entity scenes from every /scenes (device-scoped); this just exposes it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 15:53:09 +02:00
jtrzupek
b0e15935c6 fix(mobile): stop full scene-list refetch on back-navigation (perf)
Returning to the Scenes list from a scene caused a full reload + phone load spike (report 5df48551). Cause: invalidateQueries(['scenes']) in SceneDetail/Player/Performer/Studio handlers — including the silent auto-enrich-thumbnail that fires on opening any thumbnail-less scene — forces react-query to refetch EVERY loaded page of the infinite list. Added refetchType:'none' to all ['scenes'] invalidations: marks stale without refetching the active list, which refreshes on pull-to-refresh / filter change instead. Scene detail (['scene', id]) still updates immediately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-20 14:14:07 +02:00
jtrzupek
c524b43fa3 fix(mobile): drop background ANR noise from Sentry (beforeSend)
Android Background ANRs captured via AppExitInfo (GOON-1D) are OS-side noise: the OS freezes a backgrounded app and reports it as not-responding, with zero JS/app frames and nothing to fix. beforeSend now drops events that are ANRs (ApplicationNotResponding) AND backgrounded (contexts.app.in_foreground === false). Foreground ANRs are kept (those can be real jank).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 20:07:32 +02:00
jtrzupek
f014a901de feat(scheduler): periodic title+duration dedup (missing-merge tube dupes)
Missing-merge duplicates (same performer + identical normalized title + identical duration-to-the-second) that bulk_dedup misses — tube re-scrapes and cross-tube re-ingests like porn00 pulling a video already present from xnxx (reports 28fe8181/32df33b1). Extracted the proven merge_exact_title_duration logic into app/scheduler/title_duration_dedup.py (script now a thin wrapper), wired a 12h scheduler job (playback-only = what users actually see, GOON_SCHED_TITLE_DEDUP_HOURS). Signal is near-certain (two different videos don't share byte-identical title AND exact duration); no shared performer = not merged (over-match guard). Verified: job registers (jobs=14), backlog currently 0 after the one-shot global merge.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:20:48 +02:00
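A simplified sketch of the matching rule in f014a901de: identical normalized title plus identical duration, merged only when the two scenes share a performer. The normalization shown and the dict layout are assumptions; the real logic lives in `merge_exact_title_duration`.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace (illustrative rule)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_merge_pairs(scenes):
    """Return (keeper_id, duplicate_id) pairs for exact title+duration matches."""
    groups = defaultdict(list)
    for scene in scenes:
        key = (normalize_title(scene["title"]), scene["duration_s"])
        groups[key].append(scene)
    pairs = []
    for group in groups.values():
        keeper = group[0]
        for other in group[1:]:
            # Over-match guard: no shared performer means no merge.
            if set(keeper["performers"]) & set(other["performers"]):
                pairs.append((keeper["id"], other["id"]))
    return pairs
```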
jtrzupek
476cbb8d16 fix(ingest): race-safe scene_tags insert (ON CONFLICT) — GOON-M
scene_resolver._sync_tags used check-then-insert (select existing -> add if None), which races under concurrent ingest of the same scene: two runs both see existing=None, both add, flush -> IntegrityError pk_scene_tags (Sentry GOON-M, 4 events). Switched to pg_insert(...).on_conflict_do_nothing(index_elements=[scene_id, tag_id]) + in-batch dedup, identical to movie_resolver._sync_tags.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:09:06 +02:00
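The race fix in 476cbb8d16 replaces check-then-insert with a conflict-tolerant insert. The commit uses SQLAlchemy's `pg_insert(...).on_conflict_do_nothing(...)` against Postgres; the sketch below swaps in the standard-library `sqlite3` module purely so it runs standalone, and the schema is an assumed minimal version of `scene_tags`.

```python
import sqlite3

# Assumed minimal schema: the composite primary key is what the
# ON CONFLICT clause relies on.
SCHEMA = """
CREATE TABLE scene_tags (
    scene_id INTEGER NOT NULL,
    tag_id   INTEGER NOT NULL,
    PRIMARY KEY (scene_id, tag_id)
)
"""


def sync_tags(conn, scene_id, tag_ids):
    """Insert scene/tag links; rows that already exist are silently skipped."""
    # In-batch dedup that keeps the first occurrence of each tag id.
    rows = [(scene_id, tag_id) for tag_id in dict.fromkeys(tag_ids)]
    conn.executemany(
        "INSERT INTO scene_tags (scene_id, tag_id) VALUES (?, ?) "
        "ON CONFLICT (scene_id, tag_id) DO NOTHING",
        rows,
    )
    conn.commit()
```

Because the database resolves the conflict, two concurrent ingests of the same scene no longer need to agree on what "already exists" means.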
jtrzupek
567a8fb3b5 fix(mobile): scene-list scroll perf + native phone-side fpoxxx resolver
(1) Scroll jank/device load on long scene lists (report 5b7ca1e1): SceneTile is now
React.memo'd so typing in search no longer re-renders every mounted tile, and
sceneGridProps bounds the render window (windowSize 7 etc.), required because
removeClippedSubviews stays false to avoid thumbnail blanking. Applies to all scene
grids.
(2) fpoxxx played an ad instead of the video via the WebView fallback (reports
f79beefb/cfa207c7). fpoxxx is KVS with an IP-bound + session-bound get_file token
(cross-IP 403 confirmed), so it must resolve phone-side: new fpoxxxResolver fetches the
page + follows get_file on the device (KVS real_url port for the function/0 case),
wired into SceneDetailScreen like sxyprn/eporner. Verified from a residential IP:
get_file -> CDN returns 206 video/mp4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-19 11:02:21 +02:00
jtrzupek
e4cb94bc59 feat(scheduler): hetzner bandwidth monitor + search-tube watchdog coverage
Two observability additions to the worker scheduler (intertwined in the same files):
(1) ingest-watchdog now also covers performer-driven search scrapers
(ALL_DIRECT_SCRAPERS) with a separate 7d threshold, not just browse tubes at 48h;
several search tubes (perverzija, fpoxxx, porndish, ...) had frozen silently for weeks.
(2) New Hetzner Cloud bandwidth monitor (app/scheduler/hetzner_monitor.py): polls
outgoing_traffic vs included_traffic and fires a Sentry message at info/warning/error %
thresholds with a per-level fingerprint. The config fields existed for ages but the
monitor was never implemented. No-op until HETZNER_API_TOKEN + HETZNER_SERVER_ID are
set in .env (verified: returns {enabled: False}, job registers as 'hetzner-monitor
every 6h', jobs=13).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-18 09:18:59 +02:00
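The threshold step of the bandwidth monitor in e4cb94bc59 might look roughly like this. The commit names the `outgoing_traffic` and `included_traffic` fields and the three levels, but not the percentages, so the default thresholds here are placeholders and the function name is hypothetical.

```python
# Placeholder percentages; the commit does not state the real values.
DEFAULT_THRESHOLDS = ((90.0, "error"), (75.0, "warning"), (50.0, "info"))


def traffic_level(outgoing_traffic, included_traffic, thresholds=DEFAULT_THRESHOLDS):
    """Return (level, percent_used) for the highest threshold crossed, else None."""
    if included_traffic <= 0:
        return None  # nothing sensible to compare against
    pct = 100.0 * outgoing_traffic / included_traffic
    for limit, level in sorted(thresholds, reverse=True):
        if pct >= limit:
            return level, pct
    return None
```

Using the level as part of the alert fingerprint would keep one grouped issue per severity, as the commit describes.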
jtrzupek
b1a530611f fix(latestpornvideo): revive search via /actor/ listing + metadata
Old regex matched junk (/wp-json etc.), not scenes (scenes are /<post_id>/).
Frozen since 06-13. Rewrote search() to scrape the /actor/<slug>/ listing
and parse <article> cards: scene URL, title, performers + tags from the
class (actors-*/tag-*/category-*, dropping performer-name fragment tags),
thumbnail. Studio + release date parsed from the "<Studio>-YYYY-MM-DD"
thumbnail filename, with a title-prefix "<Studio> YY MM DD" fallback.
Multi-performer works; no duration in listing; playback unchanged (hoster).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 23:20:02 +02:00
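The card parsing in b1a530611f can be sketched in two helpers: one splitting the `<article>` class attribute by prefix, one reading studio and date from a `<Studio>-YYYY-MM-DD` thumbnail filename. Both function names and the exact regex are illustrative, and the performer-fragment tag filtering mentioned in the commit is left out.

```python
import re
from datetime import date

# Illustrative pattern for "<Studio>-YYYY-MM-DD" inside a thumbnail filename.
_THUMB_RE = re.compile(r"(?P<studio>[A-Za-z0-9]+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})")


def parse_article_classes(class_attr):
    """Split an <article> class string into performer and tag slugs."""
    out = {"performers": [], "tags": []}
    for token in class_attr.split():
        if token.startswith("actors-"):
            out["performers"].append(token[len("actors-"):])
        elif token.startswith(("tag-", "category-")):
            out["tags"].append(token.split("-", 1)[1])
    return out


def parse_thumb_filename(filename):
    """Return (studio, release_date), or (None, None) when nothing matches."""
    match = _THUMB_RE.search(filename)
    if not match:
        return None, None
    try:
        released = date(int(match["y"]), int(match["m"]), int(match["d"]))
    except ValueError:
        return None, None  # digits matched but do not form a valid date
    return match["studio"], released
```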
jtrzupek
e77deef667 fix(mypornerleak): revive search via /actor/ listing + metadata
Content moved to the w8.mypornerleak.com (wN) load-balancer subdomain, so
the old bare-domain scene regex matched nothing (frozen since 05-07).
Rewrote search() to scrape the canonical /actor/<slug>/ listing: scene
URL (wN host normalized to canonical for stable dedup), title, duration,
performers and category-tags from the <article> class (actors-*/category-*),
thumbnail. No studio (OnlyFans/amateur leaks have none). Multi-performer
works; playback unchanged (hoster, phone-side).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 23:16:02 +02:00
jtrzupek
5b67aeeeaf fix(sxyland): revive search via /actor/ pages + rich metadata
sxyland dropped the /<numeric_id>/<slug>/ scene URL format for /<slug>/,
so the old regex matched nothing (frozen since 06-07). Rewrote search()
to use the performer page /actor/<slug>/ and fetch each scene for full
metadata: all performers (with co-stars, from /actor/ links), tags
(scoped to the scene's tags-list, not the sidebar), duration + upload
date (itemprop), studio from the title prefix (BraZZers/MilfCoach/... ,
guarded so a performer-name prefix isn't mistaken for a studio). Junk
nav pages (Terms of Use etc.) are dropped via a no-duration-and-no-tags
guard. Verified: clean studio/performers/tags in DB, 0 errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 23:11:44 +02:00
jtrzupek
e0e69189a8 fix(sxyprn): revive search via performer pages + rich metadata
sxyprn ingest was frozen since 05-07: the old ?type=videos&query= endpoint
returns trending (not performer-filtered), so the strict token filter
correctly dropped everything -> 0 ingest. Real "search" is the performer
page /<First-Last>.html. Rewrote search() to scrape those cards: clean
performer (the query, avoids sxyprn's Dallas/Rae name fragmentation),
studio (channel subcat), tags (#hashtags), duration, thumbnail. Token
filter now runs on the card title so only genuine matches attach the
performer. Verified: Lana Rhoades/Riley Reid/Angela White return results,
metadata persists in DB (studio e.g. Vixen, 10-31 tags/scene), playback
mp4 206.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 22:58:52 +02:00
jtrzupek
00f4779abe feat(mobile): column toggle, duration filter, saved searches, screen protection (mobilism feedback)
Batch from user feedback:
(1) Grid columns 1/2/3 setting (PreferencesContext, persisted) across all scene grids;
the default of 2 was too small on phones.
(2) Min-duration filter chips (5/10/20/30+ min) to hide ad-clips.
(3) Saved-search chips + Save button (backed by /saved-searches).
(4) Re-enabled screen-capture protection (Recents hide + screenshot block) for
distributed users; verified active on emulator (screencap returns 0 bytes).
(5) 'Checking for updates' gate before the PIN screen so a background OTA restart no
longer causes a double PIN prompt.
Changelog entry added. Published OTA runtime 1.1 (a9620b12).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 13:52:27 +02:00
jtrzupek
bcee5851e9 feat(api): per-device saved searches (keyword favorites)
User-report (mobilism): scenes are often poorly titled, so saved keyword queries are a useful extra retrieval strategy. New saved_searches table (device-scoped via X-Device-Id, unique per device+query, 50/device cap) + GET/POST/DELETE /saved-searches. Migration 0024. Verified CRUD on prod: add trims+dedups idempotently, empty rejected 422, delete idempotent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-16 13:52:18 +02:00
jtrzupek
0424cb9138 feat(scheduler): per-origin ingest freshness watchdog -> Sentry
The global source monitor can't catch a single stalled tube because every tube scraper shares one Source row (tube-scraper), so an aggregate run still reports success while one origin freezes (freshporno browsing the rotating KVS homepage root, report 14f3a655). New watchdog checks max(created_at) per active browse-scraper origin (tube:<sitetag>); if a tube with history hasn't produced a new scene in > max_age_hours it fires a Sentry message with a stable per-origin fingerprint (age in extras, not the title, so it stays one grouped issue). Runs every 6h, 48h threshold, both env-tunable (GOON_SCHED_INGEST_WATCHDOG_HOURS / GOON_INGEST_WATCHDOG_MAX_AGE_HOURS). Verified: 0 stale at 48h post-fix, detects neporn at a strict 12h threshold.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 10:26:25 +02:00
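The freshness check in 0424cb9138 reduces to comparing each origin's newest `created_at` against a cutoff. A minimal sketch, assuming the per-origin maxima have already been queried into a dict; the function name and input shape are hypothetical, while the 48h default comes from the commit.

```python
from datetime import datetime, timedelta, timezone


def stale_origins(latest_by_origin, now, max_age_hours=48):
    """Return origins whose newest scene is older than max_age_hours.

    latest_by_origin maps an origin tag (e.g. "tube:<sitetag>") to the
    max(created_at) for that origin, or None if it has no history yet.
    """
    cutoff = now - timedelta(hours=max_age_hours)
    return sorted(
        origin
        for origin, newest in latest_by_origin.items()
        # Origins without history are skipped, as in the commit text.
        if newest is not None and newest < cutoff
    )
```

Each returned origin would then get its own alert with a stable fingerprint, so repeated runs group into one issue per tube.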
jtrzupek
4b71689a95 fix(scrapers): freshporno browse from /latest-updates/ not homepage root
The homepage root / is a KVS page with cache-control: no-store and a fresh PHPSESSID per request; the server rotates its featured block and on a cold session can serve an old set instead of the newest scenes. Result: browse-latest skipped everything for 3 days (root served 20 May content), no new freshporno scenes since 12 Jun (user report). Switch _listing_url to the explicit date-sorted /latest-updates/ feed (pagination /latest-updates/N/), which is not subject to that rotation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 09:59:40 +02:00
jtrzupek
3714afa22f fix(mobile): capture site/origin text params in bug-report auto-context
SiteScenes passes the tube as origin/name (strings), not UUIDs, so the existing UUID-only auto-context loop dropped them. Reports like 'ingest of this site has been stuck 2 days' (14f3a655) arrived without any site identifier. Add a second loop for known string identity params (origin/name/sitetag/tag/q), length-capped, so per-site/per-performer reports become actionable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 09:35:58 +02:00
jtrzupek
8b4783771f feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx)
The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as
these sibling tubes re-ingest under new titles. Wire the asset-id+duration merge into
the scheduler (every 12h, GOON_SCHED_THUMB_DEDUP_HOURS, 0=off) so it stays clean.

Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot
script now imports it. Same tight signature as the cleanup: family hosts only + identical
duration (the bare asset-id number is reused across unrelated CDNs, so cross-host/diff-
duration grouping is excluded). Reports 205b17d9 / 5a2944cb.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:56:45 +02:00
jtrzupek
b5d9473898 feat(scripts): merge tube dupes by thumbnail asset-id (hdporn.gg/fullmovies.xxx family)
These sibling platforms share one video-id space and ingest the same video under
different titles, which bulk_dedup misses (different titles, no phash). Match by the
asset-id in the thumbnail path (/<bucket>000/<id>/) on img.hdporn.gg|fullmovies.xxx plus
identical duration, and merge. Hard host restriction + duration guard: the bare number
is reused for unrelated videos on other CDNs (verified via dry-run), so cross-host or
different-duration grouping is excluded. Run scoped (studio id) or global; dry-run by
default. Reports 205b17d9 / 5a2944cb. Ran on Parasited: 43 pairs merged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:18:44 +02:00
jtrzupek
b66dd99eba fix(mobile): show Refresh thumbnail when the hero image actually fails to load
The button keyed on thumbnail_url presence, but a URL can be present yet broken (hqfap
404 → blank hero, no button — report ef0c6a5a). Tie it to the hero Image load state
(onLoad ok / onError broken / no url none) and show Refresh only when the image is
broken or missing. Reconciles 26c114ed (hidden for good previews) with ef0c6a5a.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 14:18:44 +02:00
jtrzupek
81d617efc2 fix(extractors): 4k69 direct okcdn extraction (replaces WebView fallback)
Reverse-engineered the migrated 4k69 player: jwplayer now serves OK.ru CDN (okcdn.ru)
mp4s. The static page (SSR behind Cloudflare, fetched via proxy) carries "file"+"label"
pairs for every quality. okcdn's srcIp param is NOT enforced (cross-IP test 2026-06-14:
206 video/mp4 from a residential IP != srcIp), so the URL plays from any IP. Parse the
okcdn sources server-side and return them mobile_direct_ok — the phone plays the direct
video, no WebView, no VAST preroll, no age-gate, zero VPS proxy. Skips 4K/2K. Reverts
the brief _vps_blocked_fallback routing (WebView grabbed the preroll ad, not content).
Verified on emulator: native player streams the actual scene (report 5de3fbc5).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 11:39:36 +02:00
jtrzupek
2a9445fe4a feat(mobile): auto-accept age-gate modal in WebView fallback
4k69 (and similar) show an "Are you 18 or above? Yes/No" modal that blocks the jwplayer
from initialising, so the WebView fallback never extracts a stream. Click the age-gate
accept button by id (#pop_up_18_yes and id*=18_yes/age_yes variants) on the same loop as
the consent/play-poster auto-clickers. Verified on emulator: 4k69 age-gate clears and the
player initialises (ExoPlayer hands off). A VAST preroll is still grabbed instead of the
okcdn content for 4k69 specifically (report 5de3fbc5 stays open) - separate ad-filter work.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 11:31:12 +02:00
jtrzupek
08410fddd1 fix(mobile): show Refresh thumbnail only when preview missing or broken
The Refresh thumbnail button appeared on every scene, which is noise for the majority
that already have a good preview (report 26c114ed). Show it only when no source has a
usable thumbnail or the only thumbnails are rotting (sxyprn/trafficdeposit), which is
exactly when a manual refresh helps (the original d3376a71 case).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 11:17:18 +02:00
jtrzupek
29da1fbaa6 fix(extractors): route 4k69 to WebView fallback after player migration
4k69 swapped its player from get_file (4kporno.xxx) to jwplayer + okcdn.ru, whose token
carries srcIp= (IP-bound); the site is also behind Cloudflare (VPS fetch only via proxy).
The native get_file extractor matched nothing and returned None, surfacing as a "host
problem" error even though the video plays fine (report 5de3fbc5). Switch 4k69com to
_vps_blocked_fallback: the on-device WebView (residential IP) clears Cloudflare, the
okcdn token binds to the phone IP, and INJECTED_JS hands the jwplayer source to ExoPlayer.
fourk69.extract stays in the module in case the site reverts to get_file.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 11:17:18 +02:00
jtrzupek
9269b02a4c feat(mobile): source-code link in Settings + Refresh thumbnail button
- AppLockSettings: a "Source code" row linking the public OSS repo (report 4c5066b8) -
  a trust signal for a sideloaded FOSS app (audit / self-host / contribute).
- SceneDetail: a "Refresh thumbnail" button (force) for scenes whose preview is broken
  or stale (report d3376a71).
- changelog: new What's New entry for this batch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:04:11 +02:00
jtrzupek
e512665d26 feat(scenes): force-refresh thumbnail via enrich-thumbnail ?force
enrich-thumbnail was fill-only (skipped scenes that already had a thumbnail), so a
broken or stale preview (rotting sxyprn/trafficdeposit) could not be refreshed. Add a
force flag that re-fetches the source page and overwrites the existing thumbnail.
Backs the new "Refresh thumbnail" button (report d3376a71).
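
The fill-only versus force behaviour might look roughly like this. `enrich_thumbnail`, the dict-shaped scene, and the injected `fetch_thumbnail` callable are illustrative stand-ins, and keeping the old thumbnail when the re-fetch finds nothing is an assumption, not something the message states.

```python
def enrich_thumbnail(scene, fetch_thumbnail, force=False):
    """Fill a missing thumbnail; with force=True, also overwrite an existing one.

    Returns True if the scene's thumbnail was changed.
    """
    if scene.get("thumbnail_url") and not force:
        return False  # fill-only: an existing thumbnail is left alone
    new_url = fetch_thumbnail(scene["source_url"])  # re-fetch the source page
    if not new_url:
        return False  # assumption: keep the old value if nothing was found
    scene["thumbnail_url"] = new_url
    return True
```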

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:04:10 +02:00
jtrzupek
32919d6a6c feat(extractors): detect deleted porntrex videos and mark dead
Porntrex soft-deletes: a removed video returns HTTP 200 with a "this video was deleted"
message instead of a player, so extract returned [] (transient) and the source was never
marked dead, leaving users on a permanently broken link (report 75dbf53e). Match the
deletion message and raise HosterDead so resolve marks the source dead.
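
As a sketch of the distinction between "dead" and "transient": the `HosterDead` class below is a local stand-in for the project's exception, `extract_sources` is a hypothetical name, and the marker is matched case-insensitively as an assumption about how the page words it.

```python
class HosterDead(Exception):
    """Stand-in for the project's exception signalling a permanently dead source."""


# Wording taken from the commit message; the exact page text is an assumption.
DELETED_MARKER = "this video was deleted"


def extract_sources(html):
    """Return stream sources, or raise HosterDead for a soft-deleted video."""
    if DELETED_MARKER in html.lower():
        # The page answered HTTP 200 but the video is gone for good.
        raise HosterDead("video deleted on host")
    # Player parsing is omitted; an empty list still means "transient".
    return []
```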

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:04:10 +02:00
jtrzupek
9d4384cef3 fix(ingest): cap code/director to column length (GOON-J)
Some sources (sexlikereal) build a giant `code`/`director` from a multi-performer
compilation title, overflowing scenes.code varchar(128) -> StringDataRightTruncation,
and the scene was silently dropped from ingest. Cap both at the column limit in
_create_canonical and the fill path; code/director are stored metadata, not match keys,
so truncation is safe.
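
A minimal sketch of the capping step. The 128 limit for `code` comes from the message; the limit for `director` is an assumption, and `cap_scene_fields` is a hypothetical helper rather than the code in `_create_canonical`.

```python
# code is varchar(128) per the message; the director limit is assumed equal.
COLUMN_LIMITS = {"code": 128, "director": 128}


def cap_scene_fields(fields):
    """Return a copy of fields with over-long strings cut to the column limit."""
    capped = dict(fields)
    for name, limit in COLUMN_LIMITS.items():
        value = capped.get(name)
        if isinstance(value, str) and len(value) > limit:
            # Slicing counts characters, as varchar(n) does in PostgreSQL.
            capped[name] = value[:limit]
    return capped
```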

Fixes GOON-J

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 19:04:10 +02:00