Commit graph

50 commits

Author SHA1 Message Date
jtrzupek
2b602beea5 fix(dedup): tighten cross-source candidate prefilter — kill 1800s hang (GOON-V)
_candidate used OR logic (studio OR date±7d OR dur±30s) → 938,950 pairs;
Etap-2 scoring at ~110/s never finished in 1800s → bulk_dedup_performers HUNG
every run, orphan thread leaked until restart. Require AND: same studio plus
(date±2d OR dur±30s). 939k→16k pairs, full run 213s. Real cross-source dup of
one master shares studio + near date/duration; rare studio_id-mismatch pairs
skipped on purpose — a job that COMPLETES beats one that times out merging nothing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 10:03:33 +02:00
jtrzupek
cd257740be fix(hqporner): require ALL query tokens in slug — stop performer over-attribution
hqporner search post-filter kept a scene if its slug contained ANY query token
(>=3 chars). For multi-word performer names this matched on a single common token
(e.g. "anna","mia"), so the performer-driven ingest attributed the scene to EVERY
performer sharing that token — scenes accumulated up to 503 wrong performers
(hqporner = 5659 of 5897 scenes with >30 performers; bug-reports 2026-06-07).

Switch ANY->ALL: the slug must contain every query token, requiring a full name
match before attribution. Single-word names still work. Precision over recall —
144 wrong performers is far worse than missing a few loose matches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 09:28:18 +02:00
jtrzupek
43f7e1f7b2 perf(scenes): literal tag_id in filter — 4-12s tag lists -> ~20ms
Tag-filtered scene lists (e.g. blowjob + has_playback) took 4-12s. Root cause:
the filter joined scene_tags->tags on slug, so the actual tag_id was opaque to
the planner at plan time. It fell back to average per-tag cardinality
(8.4M/11541 ≈ 726) instead of the real 273k, chose to materialize ALL matching
scene_tags + check playback per row, then top-N sort.

Fix: resolve slug->tag_id in the app and filter on a LITERAL tag_id (no slug
join). With a constant, the planner uses MCV stats, knows the tag is huge, and
walks ix_scenes_created_at_desc probing scene_tags/playback per scene, stopping
at the page limit. Verified: blowjob list 3300ms -> 18ms (EXPLAIN), HTTP 4-12s ->
47ms. Unknown slug short-circuits to empty. (Pairs with the raised tag_id
statistics target so mid-tier tags also get correct estimates.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:10:31 +02:00
jtrzupek
d52641774d perf(scenes): light list payload — drop tags/refs, slim playback to thumbnail
Scene list returned the full SceneOut per item (nested tags/external_refs + all
playback_sources with page_url/embed/stream/quality) though SceneTile only reads
the thumbnail + title/duration/performer/studio, and SceneDetail re-fetches the
full scene via /scenes/{id}. Added light=True to _build_scenes_out_batch: skip the
tags + external_refs queries entirely and collapse playback_sources to one slim
entry (thumbnail_url + animated_thumbnail_url only).

Result: default list payload 78KB->48KB (-38%), ~28ms cached, less DB work per
list. Verified on emulator: grid thumbnails/durations/titles render unchanged.
No mobile change (tile reads the same fields); server-side, no OTA.

NOTE: the separate slow path — common-tag-filtered lists (4-12s; query expands all
matching scene_tags before sort/limit) — is structural (needs a denormalized
(tag_id, created_at) index) and deferred. VACUUM ANALYZE + raised tag_id stats
applied but the planner still can't avoid the materialization.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:03:26 +02:00
jtrzupek
4922646011 feat(dedup): merge exact-phash + same-duration + shared-performer duplicates
bug-report 2026-06-03 ("ten sam czas, ta sama miniaturka, czemu się nie mergują"):
duplicate scenes not merged at ingest. Exact phash alone is noisy here (95% are
collisions on shared thumbnails/intro frames — different scenes; bulk_dedup scorer
correctly gives 0 auto-merge). The safe subset is exact-phash AND same duration
(±3s) AND shared performer/title — near-certain same scene. Same-duration is key:
it excludes the false-merge pattern (short-clip-vs-full has DIFFERING durations).

- scripts/merge_phash_exact_dupes.py: one-off, dry-run by default, per-pair re-fetch
  (handles clusters). Applied: 30 merged.
- bulk_dedup: add `_pairs_exact_phash` (SQL O(N log N), not the O(N²) Hamming scan)
  + strategy "phash_exact" — gated by the normal scorer (surfaces review candidates,
  no risky auto-merge), schedulable for ongoing exact-collision review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 20:08:06 +02:00
jtrzupek
a196fcbcdb refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper"
The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest,
performer-driven) was "pornapp" — a misleading leftover from the removed external
porn-app API. It read like a dependency on a third-party "pornapp" service; it is
not — these are our own scrapers hitting 25+ tubes directly (kind=scraper,
origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME
constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs +
external_records history stays linked. No behavior change — external_id keying
(sitetag:url) and dedup are unaffected.

NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy
format (resolve_playback parses it) and is intentionally left untouched.

Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:54:55 +02:00
jtrzupek
8c0edbdf7b fix(playback): mark deleted sxyprn posts dead + rank native sources first
Two bug-report fixes (2026-06-07):
- sxyprn returns HTTP 200 "Post Not Found" for deleted posts (soft-404), so the
  extractor returned None → resolve treated it as transient and never marked the
  source dead, leaving a dead link offered forever. Now raise HosterDead on the
  marker so resolve marks it dead.
- Scene playback sources were ordered alphabetically by origin, so a WebView-
  fallback hoster (fpoxxx, IP-bound + ad-heavy) ranked above a working native
  source (freshporno) on the same scene. Add is_vps_blocked_fallback() and sort
  native-resolve origins ahead of WebView-fallback ones.

Verified on prod: sxyprn dead URL → HosterDead; scene sources reorder
freshpornoorg before fpoxxx.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 14:09:01 +02:00
jtrzupek
9d0cb7f26e fix(scheduler): bulk_dedup performers cross_source_only + hard-timeout (OOM)
_job_bulk_dedup_performers called run_bulk_dedup(strategy="performers") without
the cross_source_only guard whose docstring exists precisely to prevent this OOM.
At current catalog scale the unguarded path materializes N²/2 pairs per prolific
performer into a list → worker hit 6GB RSS and was OOM-killed every 12h (05:00/
17:00), taking down concurrent tpdb/stashdb/movie ingests as killed_by_restart
(0 new movies). Verified in prod: 05:00 run now completes (885k pairs scored, no
OOM) and ingests succeed (stashdb +241, tpdb +175).

Also wrap in _run_with_timeout like tpdb/stashdb (job had no hard-timeout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 11:00:19 +02:00
jtrzupek
fad72e9cd6 fix(tags): merge <base>2 numbered-duplicate tags + prevent regeneration
TPDB taxonomy emits numbered-duplicate tags (name "Bubble Butt2"); slugify
yields "bubble-butt2" (no separator before digit), so resolve_tag created a
separate tag alongside "bubble-butt". Tube scenes inherited the dup via
scene-merge → 75 pairs, ~10k scene_tags on the wrong tag.

- resolve_tag: canonicalize "<base>2" -> "<base>" when base exists (handles
  current + future; trailing-"2"+alpha guard leaves milf-30/teen18 intact)
- scripts/merge_dup2_tags.py: one-off bulk merge (scene_tags + movie_tags +
  blacklist) and taxonomy-count refresh

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-06 23:18:44 +02:00
jtrzupek
210aec0536 feat(scrapers): extract tags + description from porndish scene pages
porndish-only scenes had no tags and no description — the scraper only derived a
title from the URL slug. The scene page (g1/bimber WP theme) carries both: a
<p class="entry-tags"> list of /video2/<slug>/ links (the "#" tags the user sees,
categories + co-performers) and a prose description <p> in .entry-content.

Override _fetch_scene_metadata in PornDishScraper to pull both from one page
fetch. Extend the base hook to accept an optional 4th return element
(description) and thread it into RawScene.description — backward compatible with
the existing 3-tuple (pornhat). Strips leading embed-button labels
("Video Player N", "Server N") from the prose. Verified on live scenes: clean
tag lists + real descriptions.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 21:32:10 +02:00
jtrzupek
83918e9a8d perf(movies+scenes): direct-play #hash movie hosters; skip empty blacklist filters
Movies: the seekplayer-engine family (easyvidplayer/player4me/seekplayer/
embedseek/upns, ~322k sources) returns a time-bound master.m3u8 on a CDN with a
valid IP-SAN cert that plays cross-IP. Mark it mobile_direct in resolve, and make
MovieDetailScreen prefer direct_url with a proxy fallback (mirrors the scene
path) — previously every movie streamed through the VPS proxy. Paradisehill
multipart parts now go direct too. Device-verified: ExoPlayer plays the raw CDN
direct, zero proxy traffic, no flicker.

Scenes: the three blacklist NOT EXISTS clauses were appended to every filtered
list and evaluated per-row even when all blacklist tables are empty (~3.4s tax on
a deep mega-tag walk). Skip them when the tables are empty (cached check) —
mega-tag list 6.7s -> 3.3s, and every filtered list benefits.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 19:44:41 +02:00
jtrzupek
e780e1ae6f fix(hdporngg+fullmovies): native get_file, skip broken 4K — "loading forever"
User: "hdporngg loading forever". DevTools + cross-IP investigation (not guessing):
- site is alive (sample scenes 200; the one earlier 404 was a single removed video,
  not the site — my earlier "site dead" was a hasty generalization).
- both are the same platform (<source src=.../get_file/8512/...mp4>), no function/0.
- the get_file 302 is fast (~100ms) but the 2160p/4K source on fpvcdn.com TIMES OUT
  (~30s); 720p/480p resolve in ~1s. The player loading 4K first = the "loading forever".
- the final fpvcdn URL embeds the requester IP (ip=<fetcher>) -> IP-bound to whoever
  resolves it; BUT the get_file itself is stateless (fresh session works) and valid >=90s,
  and binds fpvcdn to the fetcher. So a VPS resolve would bind to the VPS IP (mobile 403),
  but returning the get_file URL UNRESOLVED lets the phone follow the 302 itself ->
  fpvcdn binds to the phone IP -> plays.

Fix: new _source_getfile resolver returns get_file URLs as mobile_direct (skip 4K),
phone resolves the 302 in-session. Native, multi-quality, no WebView, no proxy.
Replaces fullmovies' old force_proxy+4K extractor and the WebView fallback for both.
Backend-verified: resolve -> 720/480 mobile_direct, get_file fresh fetch -> 206. Pending
on-device confirmation (emulator unstable; same mechanism as porn00/freshporno which work).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 22:48:55 +02:00
jtrzupek
c05bafb4c7 fix(porn00): backend KVS resolve (portable CDN, no proxy) — corrects #20
Same proper re-investigation as freshporno (DevTools + Bright Data residential
cross-IP + curl_cffi browser TLS). porn00's final CDN fe.porn00.org/...?token=&expires=
is PORTABLE cross-IP (token resolved from one residential IP replays 206 from a
different Bright Data residential IP) and only rejects non-browser TLS (plain curl
403, curl_cffi chrome 206). In #20 I tested the final URL with a standalone plain
curl, got 403, wrongly concluded "IP-bound" and left it on WebView (and before that
it used force_proxy, which violated the no-proxy stance).

porn00 flashvars are plain get_file (already decoded, no function/0 prefix), so
extend _kvs._URL_RE to match both forms — real_url passes plain URLs through
unchanged, _resolve_get_file follows the 302 in-session. porn00.py becomes a thin
_kvs wrapper. Verified no regression for the function/0 tubes (yespornvip/pornditt/
freshporno still resolve 3x mp4). Result: porn00 native multi-quality, mobile_direct,
zero proxy/WebView.

fpoxxx and pornxp were re-tested the same way and ARE genuinely IP-bound (403 from a
different residential IP — their token binds to the resolver IP), so they correctly
stay on the WebView fallback.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 21:15:19 +02:00
jtrzupek
6e3ad870a7 fix(freshporno): backend KVS resolve (portable CDN) — corrects #20
Re-investigated with the proper method (Chrome DevTools network capture + cross-IP
test via Bright Data residential proxy + curl_cffi browser-TLS) instead of guessing.
freshporno's real flow is get_file -> 302 -> cdn4.freshporno.org/remote_control.php
-> 206 video/mp4. The CDN URL is PORTABLE cross-IP (a token generated from one
residential IP replays fine from the VPS and from a different Bright Data residential
IP), it only rejects non-browser TLS fingerprints (plain curl -> 000, curl_cffi
chrome / ExoPlayer -> 206).

In #20 I tested the final URL with a standalone plain curl, got 000, and wrongly
concluded "unreachable from residential" -> kept it on the WebView fallback, which
barely worked (ad-heavy page, flaky). That false negative is the regression the user
reported. freshporno is function/0 KVS, so _kvs.resolve_kvs (which uses curl_cffi
chrome) already decodes + resolves it to a portable mp4 — switch to backend resolve
like yespornvip/pornditt: native, multi-quality, no proxy, no WebView.

Verified: backend resolve returns 3x mp4 (1080/720/480, mobile_direct) + cdn 206;
user confirmed native playback on device.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 21:12:17 +02:00
jtrzupek
c18ed24330 extractors: register fullmoviesxxx + hdporngg (WebView fallback)
Bug 19866e9e ("problem z oboma hosterami"): a scene whose only two sources were
fullmovies.xxx and hdporn.gg wouldn't play at all — neither had an entry in the
extractor registry, so try_extract returned None ("no stream"). fullmovies.xxx
serves a <source ...get_file...mp4> but the get_file CDN times out from the VPS
(unreachable, like freshporno), so backend resolve isn't viable; hdporn.gg sample
pages 404. Route both through the WebView fallback so the phone (residential IP)
loads the page and plays / the injected-JS scrape can grab the URL — strictly
better than no playback path. Surfaced by the hoster sweep + this bug report.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 22:16:05 +02:00
jtrzupek
e42217773f feat(deep-crawl): xvideos browse source (capped) + per-tube page cap
xvideos SSR's JSON-LD VideoObject (duration/title/uploadDate) + on-page /models/ (perf)
+ /tags/. Sample: median ~10.5min, 93% >=3min. Pilot (2 pages): 29 new, 100% playable +
visible + tagged (performers sparse — xvideos 'new' is amateur-heavy; /models/ tagged
mostly on studio rips).

- XVideosBrowseScraper (JSON-LD + page-parse models/tags), in ALL_BROWSE_SCRAPERS.
- deep_crawl._PAGE_CAP: per-sitetag depth cap; xvideoscom=1800 (~newest 50k). At the cap
  the tube is marked exhausted (reset -> incremental re-sweep) so a mega-tube cannot
  monopolize the round-robin or balloon the DB.
- ported yesporn.py into the public repo (was prod-only, like hdporngg) ending the
  __init__ public/prod divergence.

youporn rejected: JSON-LD lacks actor/keywords, its /pornstar//category/ links are A-Z
nav not scene-specific. xhamster: 429/Cloudflare from the VPS IP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 11:16:44 +02:00
jtrzupek
ee4915770f feat(deep-crawl): eporner via JSON API as SSR-rich source (Phase 2b alternative)
porntrex/hqporner rejected for deep-crawl: KVS sites with no SSR metadata (77% of
existing porntrex has no duration -> invisible under the app's >=60 filter). eporner
instead exposes a public JSON API (api/v2/video/search) returning title + length_sec
+ keywords + added per video; ~100k videos, ~100/page, no per-scene detail fetch.

- BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None
  (transient fail) / [] (catalog end) / [scenes]. API subclasses override it.
- deep_crawl drives via crawl_page (supports HTML-listing AND API sources).
- EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+
  date+thumb+playback; registered in ALL_BROWSE_SCRAPERS.
- Pilot (2 API pages): 192 new, 100% playable + tagged + visible (>=60); the <180s
  trailer filter dropped 6 short clips.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:37:20 +02:00
jtrzupek
0f19a61789 feat(ingest): skip <180s tube scenes (trailers) + purge porndoe trailer orphans
Deep-crawling tube catalogs pulls in lots of <3min trailers/teasers (porndoe). Add
min_ingest_duration_sec (default 180): _process_scene skips scraper-source scenes whose
known duration is below the floor (unknown duration kept; canonical TPDB/StashDB
untouched). Deleted 67 existing porndoe-only orphan trailers (<180s, no canonical, no
non-porndoe live playback) via cascade.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:11:25 +02:00
jtrzupek
7e46e5ac48 feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)
We ingested only ~3% of each browse tube's catalog (porndoe >62k scenes; we had 1959)
because tubes were hit only by performer-search + top-N browse. Pilot (porndoe pages
64-110): 1119 new scenes, 100% playable + 100% tagged, 0% canonical overlap (purely
additive — content not in TPDB/StashDB).

- app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor
  in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor,
  idempotent (resolver skips known by raw_hash), mark 'exhausted' at catalog end then
  reset cursors for an incremental re-sweep.
- _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h
  hard-timeout; registered in build_scheduler (jobs=10).
- config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path.
- scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:26:44 +02:00
jtrzupek
58b355b6b5 fix(pornhub): WebView fallback — yt-dlp gets 403 from VPS
Hoster sweep (2026-06-02) found pornhub resolving to 0 sources: yt-dlp (current,
2026.03.17) gets HTTP 403 fetching the watch page from the Hetzner VPS, while the
other yt-dlp tubes (xvideos/xnxx/youporn/redtube) still work — so it's a
Pornhub-specific block of the server IP, not a yt-dlp regression. Route pornhub
through the WebView fallback so it plays from the phone's residential IP, same as
xhamster. 7.3k scenes affected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:41:38 +02:00
jtrzupek
d4c4b79e92 fix(kvs): cap get_file timeout + early-break on dead scenes
Bug 6ec1960e: yespornvip "resolving forever". yesporn.vip moved to a
cdn4/remote_control.php CDN (still portable cross-IP — verified 206 from a
residential IP, so backend resolve stays correct). But when a video is removed
from the CDN the page still exists and each get_file 302-follow STALLS to the
full timeout. With the resolve timeout (60s) applied per quality variant, a dead
scene hung 3x60 = 180s and returned nothing -> the mobile resolve spinner never
ended.

Fix: a dedicated low get_file timeout (10s, separate from the page-fetch
timeout) and an early-break once 2 variants fail with no result so far (the
scene is dead on the CDN — no point waiting for the third). Dead scene now
resolves to None in ~20s instead of 180s; a live scene is unaffected (~0.8s,
3 sources). Applies to all KVS tubes (yespornvip + pornditt).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 21:33:05 +02:00
jtrzupek
08f901712c fix(scheduler): hard-timeout heavy jobs + periodic stuck-run reaper
At the shared 05:00 anchor all heavy jobs fire together; tpdb/stashdb/performer-driven
had no timeout, so a hung connector blocked the whole job and — with max_instances=1 —
blocked every future fire of that job until a worker restart (incident 2026-06-02: 6 runs
hung 8.7h, movie mirrors 47h stale, tube ingest stalled).

- _run_with_timeout wraps tpdb/stashdb/performer-driven in a 30-min hard cap (same
  ThreadPoolExecutor pattern movie-ingest already uses): on timeout the job returns and
  frees the scheduler slot; the orphaned thread lives until restart.
- _job_reap_stuck: hourly reaper of 'running' >2h rows, registered in the scheduler —
  the startup-only reaper missed hangs while the worker stayed up for hours.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 16:17:50 +02:00
jtrzupek
983bf62416 perf(scenes): drop exact count on filtered lists; index scene_tags(tag_id)
The filtered scene-list endpoints (default feed sends min_duration_sec=60, plus
has_playback / tag / q filters) took ~4.5s — and an idle server. Profiling showed
the entire cost was the bounded COUNT subquery over the EXISTS filters: Postgres
would not reliably early-terminate at the cap under psycopg bound params, scanning
the whole matching set (~858k for has_playback). Counting over the PK and using a
literal LIMIT helped some cases but the plan stayed unstable.

Fix: stop computing an exact count for filtered lists entirely. The mobile client
paginates by has_more (per_page+1 fetch), never by total — total is only the "N+"
UI counter. Derive total as a lower bound from the page + has_more after the fetch.
This removes the count query from every filtered request.

Result (end-to-end, authenticated): default feed 4.5s -> ~0.1s, has_playback
4.4s -> ~0.1s, q/studio/normal-tag filters all <0.3s. Also added index
scene_tags(tag_id, scene_id) (PK led with scene_id, so tag->scenes did a seq scan).

Remaining: a single enormous tag (e.g. "anal", ~163k scenes) ordered by recency
still gathers-all-then-sorts in the fetch (~5s); normal tags are <0.5s. Tracked
in #22 for a denormalized recency-ordered approach.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 12:00:36 +02:00
jtrzupek
20a8dc8e27 perf(scenes): count over PK, not whole entity, in filtered list
The bounded count for filtered scene lists ran `SELECT count(*) FROM (SELECT
scenes.* ... LIMIT 1001)` because the base query selects the full Scene entity.
Counting over all columns made the planner pick a far worse plan via psycopg
bound params (~4s for has_playback) than the same logic over the PK (~30-400ms).
Count semantics are unchanged — we only need rows to exist — so count over
`base.with_only_columns(Scene.id)`.

Partial: this fixes the count leg. The main ordered fetch on filtered lists
(has_playback / tags) can still pick a gather-all-then-sort plan under bound
params (fast with literal binds, slow parameterized) — tracked separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-02 11:14:38 +02:00
jtrzupek
817b50fbf8 fix(scenes): propagate playback duration to Scene + duration-consistent counts
Scene.duration_sec was NULL for ~74% of playable scenes (tube duration lives on
playback_source, never propagated to Scene), so the mobile min_duration_sec=60 filter
(Scene.duration_sec >= 60; NULL fails) silently hid them — surfaced as '119 in favorites,
14 after entering the performer' (Safira Yakkuza).

- resolver: _effective_duration() falls back to max live playback_source duration when the
  connector provides no scene-level duration (forward fix, used in create + update).
- scripts/backfill_scene_duration_from_playback.py: one-off idempotent backfill (recovered
  204,014 scenes).
- taxonomy_counts: scene_count now counts playable AND duration_sec >= 60, matching the
  always-60s-filtered scene lists, so favorites/performer/studio/tag badges agree with what
  the scene screen actually shows (Safira: 39 == 39).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 21:31:01 +02:00
jtrzupek
cd12348782 fix(movies): paradisehill delta date-granularity + browse cadence docs
- paradisehill.fetch_movies compared release_date coerced to midnight against the
  `since` timestamp, so the chronological crawl stopped at the first upload dated
  the same calendar day as `since` and silently dropped most new movies (0-2 seen
  per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; idempotent
  external_records upsert dedups the re-fetched recent window.
- scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the
  backlog missed during the bug (idempotent, resumable).
- docs: correct stale 'raz dziennie/24h' browse-latest comments to 6h (4x/day), the
  actual configured cadence (config.py sched_browse_latest_hours=6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 17:00:10 +02:00
jtrzupek
da7fcda132 feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog
Resolver/perf:
- find_by_phash_within: nearest match via Postgres bit_count over bit(64) XOR
  instead of Python scan of all phash fingerprints (~20x faster per scene;
  unblocks long delta runs that were killed mid-run before since advanced).

Scheduler/reliability:
- reap ingest_runs stuck in 'running' on worker startup (killed_by_restart).
- smoke_test: per-source ingest health, stuck-run and browse-freshness checks
  -> Sentry; exclude killed_by_restart from the failed-run alarm.

Tags (ingest with tags + fill blanks):
- wire infer_tag_slugs into normalize_scene so tube scenes get title-inferred
  tags (was dead code); union with connector tags.
- scripts/backfill_inferred_tags.py: keyset/batched/idempotent backfill for
  existing tagless scenes (playable tag coverage 16% -> ~52%).

Clip-store:
- skip ManyVids/IWantClips/Clips4Sale/... from canonical sources at ingest
  (GOON_SKIP_CLIP_STORE, default on) — permanent orphans, ~56% of canonical
  ingest, never have a free-tube playback source.

Browse tubes:
- enable fullmovies + hdporn.gg: studio parsed from title prefix instead of
  the /networks/ sidebar (which always yielded the first listed network);
  drop phash compute (pilot: 0% canonical hit within Hamming 5 — auto-screenshots),
  matching relies on title/performer/duration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 15:07:35 +02:00
jtrzupek
86c9bd438b extractors: keep freshporno/porn00/pornxp/fpoxxx on WebView (IP-bound CDN)
Re-checked whether these four KVS tubes could move to server-side resolve
like yespornvip/pornditt/porntrex. All four are reachable from the backend,
but cross-IP testing showed their final CDN URLs are IP-bound to the
resolving host (403 / connection refused from a different IP; fpo.xxx even
embeds the resolver IP in its acctoken). Unlike the portable cdn5/twa CDNs,
backend resolve cannot produce a mobile-playable URL here without a proxy,
which is out of scope for the public app.

- porn00: was using force_proxy resolve (violated the no-proxy stance);
  switched to the WebView fallback like its siblings. The ad exposure that
  originally motivated the proxy path is mitigated by the recent ad-filter
  work (AD_HOSTS + cover overlay + injected-JS ad-CDN skipping).
- freshporno/pornxp/fpoxxx already on WebView fallback; comments updated
  with the cross-IP findings so this isn't re-investigated.
- Dropped the now-unused tube extractor imports (F401).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 10:55:44 +02:00
jtrzupek
920740b76f fix(pornditt): server-side KVS resolve; extract shared _kvs helper
pornditt is the same kt_player KVS engine as yespornvip: flashvars carry
function/0/-obfuscated get_file urls + license_code, and the VPS reaches it
(HTTP 200). It was on _vps_blocked_fallback (WebView), where the scrape grabbed
the VAST preroll ad (trafostatic) instead of content (user bug "pornditt łapie
reklamę zamiast video").

Extracted the verified yespornvip logic into app/extractors/tubes/_kvs.py
(resolve_kvs: fetch page → decode function/0 get_file via kt_player algo → follow
302 in-session → portable CDN, multi-quality). yespornvip.py and new pornditt.py
are now thin wrappers. Registry: porndittcom _vps_blocked_fallback → pornditt.extract.

Verified on prod: pornditt → 720p/480p on twa.tgprn.com (portable, fresh-session
206 video/mp4); yespornvip still → 1080/720/480p on cdn5 (refactor intact).
Backend-only, no OTA — mobile plays mp4+mobile_direct_ok natively with quality
picker, zero WebView/ads.

Note: a real-browser residential load shows MEDIA_ERR on the content (the page's
own player flow / ad gating); server-side decode+follow sidesteps the player
entirely, which is why it resolves cleanly. The original bug scene (40f118e1) has
its video deleted on pornditt — verified on a live scene (156091).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 10:36:33 +02:00
jtrzupek
fbfe2b2bf4 fix(yespornvip): server-side KVS resolve to portable CDN, drop WebView fallback
yespornvip was on the WebView fallback, which loaded the ad-heavy host page; the
INJECTED_JS scrape grabbed the preroll ad video (bkcdn.net, ~30s) instead of the
content, so the native player showed a 30s ad. The get_file content url is also
session/cookie-bound (410 for a cookieless ExoPlayer request).

Key finding: the VPS now reaches yesporn.vip (HTTP 200 — unblocked, same as
porntrex got 2026-05-22), so we can resolve server-side like porntrex instead of
relying on the browser. KVS flashvars carry function/0/-obfuscated get_file urls +
license_code; decode the hash with the kt_player algorithm (yt-dlp KVS algo,
verified to reproduce kt_player's output), then follow each quality's get_file 302
in the same curl_cffi session → final cdn5 url. That url is time-bound signed but
NOT IP/cookie-bound — verified portable cross-IP (VPS-resolved url fetched from a
different IP → 206 video/mp4).

New app/extractors/tubes/yespornvip.py returns 480p/720p/1080p portable CDN urls;
registry switched from _vps_blocked_fallback → yespornvip.extract. Mobile plays
direct natively with a working quality picker — zero WebView, zero ads, zero proxy.
Verified on prod (3 cdn5 sources) and emulator (quality picker → 1080p native
decode at 1920px, no WebView, no ad). Backend-only; no OTA needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 22:33:10 +02:00
jtrzupek
bb4a17ae79 fix(porntrex): resolve get_file 302 backend-side, return portable CDN url
User bug: porntrex plays slowly, no quality picker, reload flicker — suspected
VPS proxy. Root cause: porntrex KVS get_file tokens are cookie/session-bound, not
just time-bound as previously assumed. The extractor handed mobile the raw
get_file url; ExoPlayer's cookieless request → 410 → mobile fell back to the VPS
proxy (slow + nav.replace flicker).

Verified: following get_file in the same curl_cffi session that fetched the page
→ 200 (streams video); a fresh session → 410. The final CDN url after the 302
(cdn.pcdn.cloudswitches.com/...?expires=&md5=) is portable — fresh session → 206.

Fix: extract() now uses one curl_cffi Session for page + get_file, follows each
quality's 302 (stream + Range, no body download) and returns the resolved CDN url.
Mobile plays direct, multi-quality picker works, zero proxy bandwidth. Falls back
to the raw get_file url if a resolve fails. Verified on prod: both 720p/480p now
resolve to cloudswitches CDN.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 21:47:26 +02:00
jtrzupek
5ae5dbb201 perf(scenes): bounded count + has_more for filtered scene lists
Filtered /scenes (tag/origin/q/studio/performer) ran exhaustive COUNT with
stub-filter EXISTS over 1.7M rows: TAG 5.1s, ORIGIN 4.9s, SEARCH 3.1s.
Mobile relied on `loaded < total` for infinite-scroll, making exact count
mandatory and ruling out approximate shortcuts.

Backend:
- SceneListOut gains has_more (bool) and total_capped (bool), both optional
  for backward compat with old mobile
- Filtered count uses LIMIT _COUNT_CAP+1 (1000) subquery — cost is
  O(min(matches, cap)) instead of O(all). Measured: TAG 5.1s→664ms,
  SEARCH 3.1s→138ms, ORIGIN 4.9s→1.07s (also fixes SiteScenes showing
  global count ~1M instead of per-site count)
- has_more from fetching per_page+1 rows (essentially free); extra row
  stripped before serialisation
- Pure-default list (no filters at all) keeps TTL-cached full count

Mobile:
- getNextPageParam uses has_more ?? fallback to loaded<total
- Display shows "{total}+" when total_capped=true (5 screens)

Verified on emulator: tag "Big Tits" → "1000 scenes" loaded, no 500s,
backward compat confirmed (old APK works against new backend).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-31 19:24:26 +02:00
jtrzupek
2163fee245 perf(taxonomy): denormalize scene_count for tags/performers/studios
Counts for /tags, /performers, /studios and /favorites were computed live
per-request by aggregating scene_tags / scene_performers with an EXISTS to
playback_sources. As the catalog grew to ~1.7M scenes (6.3M scene_tags) this
ran ~4.3s for /tags?order=popular (x2 incl. the total count) and ~950ms for
the default /scenes count, making those screens load in several seconds.

- migration 0019: add scene_count (+ DESC index) to tags/performers/studios
- background job _job_refresh_taxonomy_counts (every 3h) recomputes the counts
  in one UPDATE..FROM each (IS DISTINCT FROM to skip unchanged rows)
- /tags, /performers, /studios scenes path now read the column + ORDER BY the
  indexed scene_count; for_movies paths keep live aggregation (small tables)
- favorites read denormalized scene_count instead of a grouped EXISTS aggregate
- /scenes default count: 10-min in-process TTL cache (header is approximate)

Measured: /tags?order=popular&per_page=500 ~8s -> 66ms incl. serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 17:53:48 +02:00
jtrzupek
9c49a69a66 feat(seo): public HTML SEO router + templates; add CLAUDE.md; ignore .nimbalyst
- app/api/seo.py (+ app/templates/seo/*): publiczny HTML SEO router (programmatic
  entity long-tail: performer/studio/scene/landing/2257), bez api-key. Importowany
  przez main.py — wymagany do uruchomienia, dotąd untracked. Opsec-clean (brak
  VPS IP/sekretów).
- CLAUDE.md: instrukcje projektu (dotąd untracked).
- .gitignore: .nimbalyst/ (lokalne tracker-tooling, nie dla OSS repo).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 16:29:59 +02:00
jtrzupek
55503136e6 fix(apk 0.2.1): in-app installer "nic się nie dzieje" + oo launcher icon
INSTALL BUG ("klikam Install i nic się nie dzieje"): ApkInstallerModule
commitował PackageInstaller.Session z PendingIntent → MainActivity, ale Android
po commit odsyła STATUS_PENDING_USER_ACTION + EXTRA_INTENT który MUSI być
startActivity()-owany żeby pokazać systemowy dialog "Install update?". Nic tego
nie obsługiwało → download OK, sesja commit OK, ale dialog NIGDY się nie
pokazywał. Fix: getBroadcast + runtime BroadcastReceiver → na PENDING_USER_ACTION
launchuje EXTRA_INTENT (FLAG_ACTIVITY_NEW_TASK), unregister na terminal status.
(Native — działa dla 0.2.1→przyszłe; do 0.2.1 user sideload z goon-foss.org.)

LAUNCHER ICON: regenerowane mipmapy (oo logo) z assets/icon.png przez PIL —
ic_launcher / ic_launcher_round / ic_launcher_foreground we wszystkich 5
densities (webp). iconBackground #0E1018 (stary navy) → #15110D (warm charcoal).

version 0.2.1 / versionCode 11. Build verified: podpis SHA-256 == ALLOWED,
Running "main" bez crasha. Deployed: /version=0.2.1, /static + webroot + landing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 13:15:37 +02:00
jtrzupek
0281e449fe build(apk): 0.2.0 — expo-font native, runtime 1.1, fonts re-enabled
Option B (rebuild APK) — odblokowuje custom fonty na stałe + sprawia że
przyszłe font-OTA nie crashują.

- runtime 1.0 → 1.1 (app.json + AndroidManifest EXPO_RUNTIME_VERSION): nowy APK
  ma native ExpoFontLoader, więc MUSI mieć inny runtime niż stare instalacje 1.0
  (inaczej font-OTA crashnęłoby stare). 1.0 channel zostaje na d5b87e5c
  (font-stripped) dla starych, 1.1 = nowy APK z fontami.
- version 0.2.0 / versionCode 10 (build.gradle) — in-app updater (/version=0.2.0)
  zaoferuje install starym 0.1.9.
- Fonty przywrócone (useFonts, theme.fonts realne, SceneTile/MoviePosterCard/
  navigation/GoonWordmark fontFamily) — działają bo native jest w APK.
- Build: gradlew assembleRelease (autolinking expo-font, BEZ prebuild — zachowane
  custom native AntiTamper/ApkInstaller), Sentry source-map upload wyłączony
  (SENTRY_DISABLE_AUTO_UPLOAD, brak org/auth — krok poboczny).
- app/main.py /version 0.1.9 → 0.2.0.

ZWERYFIKOWANE na emulatorze: podpis SHA-256 == ALLOWED_APP_SIG_HASH (anti-tamper
OK), ExpoFontLoader w classes3.dex, `ReactNativeJS: Running "main"` bez crasha.
APK live: /static/app-release.apk + goon-v0.2.0.apk + landing webroot.

UWAGA: launcher-icon (native mipmaps) NIE zmienione w tym buildzie — nadal stara
ikona. Nowy oo-icon wymaga regeneracji res/mipmap-* + rebuild (follow-up).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 12:51:32 +02:00
jtrzupek
05c0f6ef93 fix(scheduler): per-connector hard timeout + reorder mangoporn-first
Bug-report 2026-05-30 "ingest znów się zawiesił". streamporn/pandamovies
wieszały się intermittentnie mid-run (zależnie od live-contentu danego dnia),
blokując sekwencyjny _job_movie_ingest → mangoporn (jedyny mirror z realnym
new-content: 72 nowych 05-28) nigdy nie startował. try/except chronił przed
wyjątkiem, NIE przed hangiem.

Fix:
- _job_movie_ingest: każdy connector w ThreadPoolExecutor z future.result
  (timeout=360s). Hang jednego źródła → log + shutdown(wait=False) + kolejka
  leci dalej. Healthy run ~50s, cap 6min = zapas.
- get_movie_connectors: reorder paradisehill, MANGOPORN, streamporn, pandamovies
  — mangoporn zaraz po canonical primary, przed wolniejszymi/wieszającymi się.

Zweryfikowane: pełny _job_movie_ingest przeszedł wszystkie 4 success w nowej
kolejności (mangoporn 2nd, 23s). 33 osierocone "running" rows (worker ubity
mid-run przy deployach) wyczyszczone osobno.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 11:19:13 +02:00
jtrzupek
a0481060f3 fix(extractor/yespornvip): switch to WebView fallback - both URL paths fail
Previous attempts to extract direct stream URL from yesporn.vip flashvars both
turned out broken (verified 2026-05-30):
- video_url ('/get_file/7/'): requires PHPSESSID cookie from embed page session;
  standalone mobile fetch returns 404
- event_reporting2 ('/get_file/1/'): returns HTTP 200 but Content-Type: image/gif
  (1x1 analytics tracker pixel, not video)

Switch yespornvip -> _vps_blocked_fallback.extract. Mobile loads embed in WebView
with phone IP; kt_player JS decodes URL inside browser context (cookies + session
set properly); INJECTED_JS scrapes <video>.src and posts to ExoPlayer. UX flicker
(page renders before video) is the trade-off but aligns with no-video-proxy policy
(public-app bandwidth/anonymity priority).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-30 12:28:17 +02:00
jtrzupek
172642d55c fix(extractor/yespornvip): use event_reporting2 URL (server 1) — server 7 needs session cookies
Verify 2026-05-29: extractor zwracal video_url server `/get_file/7/...?embed=true`
ktore 404-uje na direct fetch nawet ze swiezym tokenem - URL wymaga PHPSESSID
z embed page session (cookies jar mobile-side nie matchuje VPS-side ekstraktora).

Switch na `event_reporting2` ktore wskazuje na `/get_file/1/...` - standalone
time-bound signed URL, 200 OK direct fetch z UA+Referer. Quality label
zachowany z `video_url_text`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 09:48:12 +02:00
jtrzupek
6eb7cdd320 feat(movies): watched/continue-watching tracking end-to-end
Bug-report b207ff17 2026-05-26 ("przydaloby sie oznaczenie filmow juz
obejrzanych" - sceny mialy watched badge + dim, filmom brakowalo).

Backend:
- alembic 0018_movie_play_progress: nowa tabela (mirror scene_play_progress)
- MoviePlayProgress SQLAlchemy model
- MovieOut schema dolane finished/position_sec/last_played_at
- POST+DELETE /movies/{id}/progress endpointy (upsert via pg ON CONFLICT)
- _movie_to_out wstrzykuje progress z DB

Mobile:
- RouteParams.entityKind: 'scene'|'movie' (default scene dla back-compat)
- PlayerScreen NativeVideoPlayer + EmbedWebViewPlayer dispatchuja
  upsertProgress vs upsertMovieProgress po entityKind
- MovieDetailScreen przekazuje entityKind='movie' do nav
- MoviePosterCard renderuje dim + check badge + progress bar
  (parity ze ScenesScreen pattern)

Wczesniej MovieDetail przekazywal movieId jako sceneId -> backend
/scenes/<movieId>/progress zwracal 404 (silently caught). Po dodaniu
dedykowanego movie endpoint proper routing dziala.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 23:24:06 +02:00
jtrzupek
6ee0516e62 fix(connectors/dooplay): max_pages cap to unblock movie ingest queue
Bug-report 2026-05-28 ("od wczoraj nie ma nowych filmow"). DooplayConnector
.fetch_movies mial `while True` po stronach bez bound; streamporn (>2k filmow)
wisial godzinami az do dailowego killa schedulera, blokujac kolejke mangoporn
+ pandamovies. Watermark zamrozony, dziennie 0 nowych filmow.

Fix: cap _MAX_PAGES_DELTA=3 (since-driven runs, ~144 najnowszych pozycji)
i _MAX_PAGES_FULL=50 (full backfill gdy since=None). Wczesniejsza proba
filtrowania przez release_date odrzucona - release_date to data wydania filmu
(np. 2013), nie data uploadu na strone, wiec sortowanie listing nie matchuje.

Po deployu manualne re-run: streamporn 144/46s, pandamovies 120/47s,
mangoporn 108 z 72 NEW filmow w 58s. Scheduler queue unblocked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 23:23:50 +02:00
jtrzupek
81090ca8d2 fix(extractors): mixdrop hardening, yespornvip extractor, freshporno revert
Mixdrop (bug #3/#10 czarny ekran): wymagane UA+Accept headers (bez nich shell
bez P.A.C.K.E.R.). Detect dead-video page -> raise HosterDead zamiast None
(mobile dostaje skip-to-next sygnal). Dispatch regex obejmuje nowy canonical
domain `miixdrop` (double-i).

Yespornvip (bug #1): nowy KVS engine extractor. Origin `tube:yespornvip`
istnial w playback_sources ale brak handlera w _REGISTRY -> try_extract None.
Flashvars `video_url: 'function/0/<get_file_url>'`, function/0 to passthrough.
480p mp4 z mobile_direct_ok=True.

Freshporno (bug #9 revert): wrocony na _vps_blocked_fallback (WebView path).
Krotko-zywy switch na native extract z force_proxy=True cofniety bo app idzie
publicznie - VPS bandwidth/anonimowosc priorytet nad UX flicker.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-28 23:23:37 +02:00
jtrzupek
49bb65d707 fix(scenes): use ON CONFLICT for tag slug upsert in enrich_tags_from_tube
Replace SAVEPOINT + IntegrityError fallback in resolve_tag with
postgres INSERT ... ON CONFLICT (slug) DO NOTHING + re-SELECT.
Postgres serializes on the unique index, so concurrent inserts of
the same slug no longer race on lookup→insert and the second caller
no longer raises uq_tags_slug. Mirrors the on_conflict pattern
already used for SceneTag/MovieTag inserts.
2026-05-27 15:38:43 +02:00
jtrzupek
aac6b10d77 fix(extractor/hqporner): wire dedicated extractor + reject ad iframes/CDNs
Registry mapowanie `hqpornercom` -> `_vps_blocked_fallback.extract` zwracało
scene page URL do mobile WebView. Page ma 3 ad-iframes (adtng/goaserv/
mavrtracktor) + pop-under triggery -> user widział reklame zamiast video.

Powrot do `hqporner.extract` (multi-quality bigcdn.cc mp4 + force_proxy=True).
Plus hardening: iframe regex bound do `<div id="playerWrapper">...</div>`,
whitelist hostow embed (mydaddy.cc/hqwo.cc) i CDN mp4 (bigcdn/hqwo/flyflv).
2026-05-27 15:10:47 +02:00
https://github.com/goon-foss/goon
7979d5fa61 session work: bug-report fixes + WIP cleanup
User-facing bugs resolved (per bug_reports table 2026-05-25):
- 40cd28aa (short-scene filter): mobile api.ts default min_duration_sec=60
  hides 6519 sub-60s scenes across all list endpoints (Performer/Site/Tag/
  Browse). Caller may override with explicit 0.
- 5e89ef7e (porndoe needs cookies/play click): INJECTED_JS in PlayerScreen
  now auto-clicks player-poster overlay (player-poster-play, big-play-button,
  vjs-big-play-button, jw-icon-display, btn-big-play, mejs__overlay-button,
  play-button, btn-play, videoPlayButton). Triggered same interval as
  consent-dismiss + ad-iframe removal.
- b1b5e1a2 (Mixdrop czarny ekran): re-enable mixdrop direct stream via VPS
  curl_cffi proxy (was: skip → WebView fallback → blank screen). Backend
  pipeline (mixdrop.py extract + stream_proxy._curl_cffi_stream with JA3 +
  auto-refetch on token expire) was already complete; just removed the skip
  in app/api/playback.py.

Plus ongoing WIP (paradisehill multi-part extraction, stream_proxy refetch
logic, gesture race fix for long-press 2x speed, anti-adblock INJECTED_JS
defenses, scripts for freshporno backfill, new sources API).
2026-05-25 22:02:52 +02:00
https://github.com/goon-foss/goon
2fad46f934 filemoon: resurrect via mobile-side resolver (Byse SPA RE)
filemoon (+ mirrory kerapoxy/lvturbo/emturbovid/bysezoxexe/bysezejataos)
nie umarł — ~2026-05 zrobił rebrand na Vite SPA "Byse Frontend". Stary
P.A.C.K.E.R.-JWPlayer embed zniknął, więc backend uznał go za martwego i
wpisał na DEAD_HOSTER_RE. RE bundla index-ChwZgmXV.js (2026-05-22):

  POST /api/videos/<code>/embed/playback  body {"fingerprint":{}}
  → {"playback":{"key_parts":[..],"iv":..,"payload":..}}
  → key=concat(b64url(key_parts)); AES-256-GCM(key,iv,payload) → JSON
  → sources[*].url = HLS master.m3u8

Browser-attestation jest opcjonalny — pusty fingerprint wystarcza.
Stream URL jest IP-bound (token wiąże się z IP requestera), więc resolve
musi iść z urządzenia użytkownika (jak doodstream.ts / packerHoster.ts).

- mobile/src/lib/aesGcm.ts — pure-JS AES-256-GCM decrypt (RN/Hermes nie
  ma Web Crypto); S-box liczony z GF(2^8), GHASH weryfikuje tag.
  Zweryfikowane przeciw cryptography (Python) na 2 payloadach.
- mobile/src/lib/filemoonHoster.ts — resolver: POST playback → decrypt →
  pick best source. E2E test: filemoon.to/e + /d + bysezoxexe.com mirror.
- PlayerScreen: filemoon w resolve useEffect obok doodstream/packer.
- backend: filemoon poza DEAD_HOSTER_RE; hoster.py early-return → przelot
  jako type='hoster' do mobile resolvera (server-side resolve bezcelowy,
  bo URL IP-bound do VPS).
- direct_scrapers: poprawiony błędny komentarz "filemoon shutdown".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 13:18:26 +02:00
https://github.com/goon-foss/goon
b6e3b1cbb5 Origin/hoster filter w /scenes + Filter modal
Dotąd nie dało się docelować sceny konkretnego hostera — search faworyzuje
xnxx/xvideos (dominują bazę), brak filtra po źródle. Diagnostyka per-hoster
(test cookie-fix, luluvid, porntrex) wymagała trafienia sceny danego tube'a.

- /scenes?origin=<substr> — exists() na PlaybackSource.origin ilike, np.
  'hqporner' łapie tube:hqpornercom
- ScenesFilterModal: sekcja "Source / hoster" (TextInput) w FilterState.origin
- ScenesScreen: filter.origin → listScenes; liczone do activeCount

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 12:12:50 +02:00
https://github.com/goon-foss/goon
28273eda02 Dedykowane resolvery: xtremestream + porntrex KVS
xtremestream (perverzija):
- extract_stream_from_hoster special-case: embed /player/index.php?data=<H>
  → m3u8 master = /player/xs1.php?data=<H> (z inline JS m3u8_loader_url)
- Wcześniej brak packera/file w videojs HTML → WebView fallback

porntrex (KVS) — VPS znów ma dostęp 2026-05-22:
- Nowy app/extractors/tubes/porntrex.py — flashvars video_url/_alt_url
  → get_file URLs (480/720/1080p)
- get_file 302 → CDN time-bound signed (expires+md5, NIE IP-bound)
  → mobile_direct_ok=True, mobile gra direct, zero VPS bandwidth
- _REGISTRY: porntrexcom _vps_blocked_fallback → porntrex.extract

bysezoxexe (latestpornvideo 2nd embed) — filemoon-rebrand Vite SPA,
wymaga osobnego RE; latestpornvideo i tak działa przez luluvid.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 11:54:47 +02:00
https://github.com/goon-foss/goon
642f1ab8b8 Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector
Mobile / OTA:
- Enable Expo Updates (app.json + AndroidManifest) → api.goon-foss.org
- Bump 0.1.6 → 0.1.9 (build.gradle, app.json, appVersion.ts, main.py /version)
- backend.ts: default public backend auto-connect (no manual login)

WebView fallback fix (PlayerScreen INJECTED_JS):
- Auto-dismiss cookie/consent gates (hqporner et al. blocked kt_player init)
- Context-scoped: only clicks consent buttons inside cookie/gdpr containers
- Retry window for <source>.src polling raised 5→15 ticks (post-dismiss init)

Resolver:
- Series-position + modifier mismatch detector (Episode 2≠4, BTS/unedited)
  → composite_score hard-reject / cap; wired into scene_score + bulk_dedup
- aggregator-mode candidate query: LIMIT 500 + title-match ordering

Connectors:
- porndoe.com browse scraper (JSON-LD VideoObject) — theporndude audit pilot

landing: APK links → goon-v0.1.9.apk

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 11:20:57 +02:00
goon-foss
ad0284585b Initial commit
Goon — self-hosted aggregator for adult-content scene metadata.

Indexes scenes from TPDB, StashDB, and 30+ public adult tube sites.
Cross-source deduplication via perceptual hash + Levenshtein distance.
FastAPI backend + APScheduler worker + React Native (Expo) mobile client.

FOSS, ad-free, donation-funded. See README for details.
2026-05-20 10:10:22 +02:00