goon-foss/goon - Forgejo: Beyond coding. We forge.

Author	SHA1	Message	Date
jtrzupek	6de986b9a7	feat(hqfap): browse scraper + native mp4 extractor (~120k scenes) PlayTube CMS. Sitemap-based pagination (listing has no GET paging), JSON-LD VideoObject metadata, pornstar/category pills, " Clips" categories mapped to studio. Direct mp4 (cdnde.com/okcdn.ru), tokens time-bound and portable cross-IP, so mobile plays direct. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 17:51:04 +02:00
jtrzupek	7bf1fd6716	fix(xvideos): parse model name from nested span.name — recover 0-performer scenes xvideos renders the scene's models as `<a href="/models/slug">...<span class="name"> Display Name</span>...`. The old _MODEL_RE wanted text immediately after the anchor `>` and never matched current markup → browse-scraped scenes landed with 0 performers (bug-report 2026-06-07: "no actors, but Rebecca Johnson is on the page"). New regex captures slug + nested span.name, bounded within the anchor. + backfill script for the ~11.9k existing zero-performer xvideos scenes (54% have a real /models/ link; resolver merges names to canonical by name_normalized). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 10:13:21 +02:00
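The markup described in 7bf1fd6716 can be matched roughly as follows. This is a minimal sketch, not the repository's actual regex: `MODEL_RE` and `parse_models` are hypothetical names, and the pattern assumes the `href` value is exactly `/models/<slug>` with the display name in a nested `<span class="name">`.

```python
import re
from html import unescape

# Hypothetical pattern (not the project's real _MODEL_RE). The tempered
# `(?:(?!</a>).)*?` keeps the match inside one anchor, so an anchor without
# a span.name cannot borrow the name from a later anchor.
MODEL_RE = re.compile(
    r'<a\s[^>]*href="/models/(?P<slug>[^"/]+)"[^>]*>'
    r'(?:(?!</a>).)*?<span\s+class="name">\s*(?P<name>[^<]+?)\s*</span>',
    re.DOTALL,
)


def parse_models(html: str) -> list[tuple[str, str]]:
    """Return (slug, display name) pairs for every model anchor in the page."""
    return [(m["slug"], unescape(m["name"])) for m in MODEL_RE.finditer(html)]
```

Bounding the match within the anchor is what prevents one model's slug from being paired with another model's name.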
jtrzupek	cd257740be	fix(hqporner): require ALL query tokens in slug — stop performer over-attribution hqporner search post-filter kept a scene if its slug contained ANY query token (>=3 chars). For multi-word performer names this matched on a single common token (e.g. "anna","mia"), so the performer-driven ingest attributed the scene to EVERY performer sharing that token — scenes accumulated up to 503 wrong performers (hqporner = 5659 of 5897 scenes with >30 performers; bug-reports 2026-06-07). Switch ANY->ALL: the slug must contain every query token, requiring a full name match before attribution. Single-word names still work. Precision over recall — 144 wrong performers is far worse than missing a few loose matches. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-08 09:28:18 +02:00
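The ANY->ALL switch in cd257740be can be illustrated with a short sketch. The function names are hypothetical, and it assumes tokens are compared against the slug's separator-delimited parts; the real filter may use substring matching instead.

```python
import re


def query_tokens(query: str, min_len: int = 3) -> list[str]:
    """Lowercase the query and keep tokens of at least `min_len` characters."""
    return [t for t in re.split(r"[^a-z0-9]+", query.lower()) if len(t) >= min_len]


def slug_matches(slug: str, query: str) -> bool:
    """True only if EVERY query token appears among the slug's parts.

    Assumption: the slug is split on non-alphanumeric separators.
    """
    tokens = query_tokens(query)
    slug_parts = set(re.split(r"[^a-z0-9]+", slug.lower()))
    # all() over an empty token list would be True, so require at least one token.
    return bool(tokens) and all(t in slug_parts for t in tokens)
```

With `any()` in place of `all()`, a slug containing only "anna" would match every query that includes that token, which is the over-attribution the commit describes.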
jtrzupek	a196fcbcdb	refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper" The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest, performer-driven) was "pornapp" — a misleading leftover from the removed external porn-app API. It read like a dependency on a third-party "pornapp" service; it is not — these are our own scrapers hitting 25+ tubes directly (kind=scraper, origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs + external_records history stays linked. No behavior change — external_id keying (sitetag:url) and dedup are unaffected. NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy format (resolve_playback parses it) and is intentionally left untouched. Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 16:54:55 +02:00
jtrzupek	210aec0536	feat(scrapers): extract tags + description from porndish scene pages porndish-only scenes had no tags and no description — the scraper only derived a title from the URL slug. The scene page (g1/bimber WP theme) carries both: a <p class="entry-tags"> list of /video2/<slug>/ links (the "#" tags the user sees, categories + co-performers) and a prose description <p> in .entry-content. Override _fetch_scene_metadata in PornDishScraper to pull both from one page fetch. Extend the base hook to accept an optional 4th return element (description) and thread it into RawScene.description — backward compatible with the existing 3-tuple (pornhat). Strips leading embed-button labels ("Video Player N", "Server N") from the prose. Verified on live scenes: clean tag lists + real descriptions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-06 21:32:10 +02:00
jtrzupek	e42217773f	feat(deep-crawl): xvideos browse source (capped) + per-tube page cap xvideos SSR's JSON-LD VideoObject (duration/title/uploadDate) + on-page /models/ (perf) + /tags/. Sample: median ~10.5min, 93% >=3min. Pilot (2 pages): 29 new, 100% playable + visible + tagged (performers sparse — xvideos 'new' is amateur-heavy; /models/ tagged mostly on studio rips). - XVideosBrowseScraper (JSON-LD + page-parse models/tags), in ALL_BROWSE_SCRAPERS. - deep_crawl._PAGE_CAP: per-sitetag depth cap; xvideoscom=1800 (~newest 50k). At the cap the tube is marked exhausted (reset -> incremental re-sweep) so a mega-tube cannot monopolize the round-robin or balloon the DB. - ported yesporn.py into the public repo (was prod-only, like hdporngg) ending the __init__ public/prod divergence. youporn rejected: JSON-LD lacks actor/keywords, its /pornstar//category/ links are A-Z nav not scene-specific. xhamster: 429/Cloudflare from the VPS IP. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 11:16:44 +02:00
jtrzupek	ee4915770f	feat(deep-crawl): eporner via JSON API as SSR-rich source (Phase 2b alternative) porntrex/hqporner rejected for deep-crawl: KVS sites with no SSR metadata (77% of existing porntrex has no duration -> invisible under the app's >=60 filter). eporner instead exposes a public JSON API (api/v2/video/search) returning title + length_sec + keywords + added per video; ~100k videos, ~100/page, no per-scene detail fetch. - BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None (transient fail) / [] (catalog end) / [scenes]. API subclasses override it. - deep_crawl drives via crawl_page (supports HTML-listing AND API sources). - EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+ date+thumb+playback; registered in ALL_BROWSE_SCRAPERS. - Pilot (2 API pages): 192 new, 100% playable + tagged + visible (>=60); the <180s trailer filter dropped 6 short clips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 10:37:20 +02:00
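The three-way `crawl_page` contract from ee4915770f (None / [] / [scenes]) suggests a driver loop along these lines. This is a sketch under assumptions: `drive`, its parameters, and its return shape are hypothetical and do not reflect the actual deep_crawl code.

```python
from typing import Callable, Optional


def drive(
    crawl_page: Callable[[int], Optional[list]],
    start_page: int = 1,
    max_pages: int = 10,
) -> tuple[list, int, bool]:
    """Crawl up to `max_pages` pages; return (scenes, next_page, exhausted)."""
    scenes: list = []
    page = start_page
    for _ in range(max_pages):
        result = crawl_page(page)
        if result is None:
            # Transient failure: keep the cursor on this page for the next run.
            return scenes, page, False
        if not result:
            # Empty list: the catalog has ended.
            return scenes, page, True
        scenes.extend(result)
        page += 1
    return scenes, page, False
```

Distinguishing `None` from `[]` lets the caller retry a failed page without mistaking a network error for the end of the catalog.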
jtrzupek	cd12348782	fix(movies): paradisehill delta date-granularity + browse cadence docs - paradisehill.fetch_movies compared release_date coerced to midnight against the `since` timestamp, so the chronological crawl stopped at the first upload dated the same calendar day as `since` and silently dropped most new movies (0-2 seen per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; idempotent external_records upsert dedups the re-fetched recent window. - scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the backlog missed during the bug (idempotent, resumable). - docs: correct stale 'raz dziennie/24h' ("once a day/24h") browse-latest comments to 6h (4x/day), the actual configured cadence (config.py sched_browse_latest_hours=6). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 17:00:10 +02:00
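The date-granularity fix in cd12348782 comes down to comparing calendar dates with a grace window. A minimal sketch, assuming a hypothetical helper `is_new_since` and a `release_date` that carries no time component:

```python
from datetime import date, datetime, timedelta
from typing import Optional


def is_new_since(release_date: date, since: Optional[datetime], grace_days: int = 1) -> bool:
    """Compare by calendar DATE, allowing `grace_days` of overlap.

    Coercing release_date to midnight and comparing it with the full `since`
    timestamp rejects anything uploaded on the same day as `since`.
    """
    if since is None:
        return True  # no watermark yet: everything counts as new
    return release_date >= since.date() - timedelta(days=grace_days)
```

The grace window re-fetches a small recent range on every run, which is harmless here because the commit relies on an idempotent upsert to deduplicate it.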
jtrzupek	da7fcda132	feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog Resolver/perf: - find_by_phash_within: nearest match via Postgres bit_count over bit(64) XOR instead of Python scan of all phash fingerprints (~20x faster per scene; unblocks long delta runs that were killed mid-run before since advanced). Scheduler/reliability: - reap ingest_runs stuck in 'running' on worker startup (killed_by_restart). - smoke_test: per-source ingest health, stuck-run and browse-freshness checks -> Sentry; exclude killed_by_restart from the failed-run alarm. Tags (ingest with tags + fill blanks): - wire infer_tag_slugs into normalize_scene so tube scenes get title-inferred tags (was dead code); union with connector tags. - scripts/backfill_inferred_tags.py: keyset/batched/idempotent backfill for existing tagless scenes (playable tag coverage 16% -> ~52%). Clip-store: - skip ManyVids/IWantClips/Clips4Sale/... from canonical sources at ingest (GOON_SKIP_CLIP_STORE, default on) — permanent orphans, ~56% of canonical ingest, never have a free-tube playback source. Browse tubes: - enable fullmovies + hdporn.gg: studio parsed from title prefix instead of the /networks/ sidebar (which always yielded the first listed network); drop phash compute (pilot: 0% canonical hit within Hamming 5 — auto-screenshots), matching relies on title/performer/duration. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 15:07:35 +02:00
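For reference, the distance that da7fcda132 moves into Postgres (bit_count over a 64-bit XOR) is the Hamming distance between two perceptual hashes. The Python sketch below only illustrates that metric; it is the kind of in-process scan the commit replaces, not the new SQL. The names `hamming64` and `nearest_within` (as written here) and the default threshold of 5 are assumptions for illustration.

```python
from typing import Optional

MASK64 = 0xFFFFFFFFFFFFFFFF


def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes.

    Masking makes this safe for hashes stored as signed 64-bit integers.
    """
    return bin((a ^ b) & MASK64).count("1")


def nearest_within(
    target: int, candidates: dict[str, int], max_distance: int = 5
) -> Optional[tuple[str, int]]:
    """Return (scene_id, distance) of the closest hash within the threshold."""
    best: Optional[tuple[str, int]] = None
    for scene_id, phash in candidates.items():
        d = hamming64(target, phash)
        if d <= max_distance and (best is None or d < best[1]):
            best = (scene_id, d)
    return best
```

Running this loop in Python means loading every fingerprint into the worker; pushing the XOR and bit count into the database avoids that transfer.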
jtrzupek	05c0f6ef93	fix(scheduler): per-connector hard timeout + reorder mangoporn-first Bug report 2026-05-30: "ingest hung again". streamporn/pandamovies hung intermittently mid-run (depending on that day's live content), blocking the sequential _job_movie_ingest, so mangoporn (the only mirror with real new content: 72 new on 05-28) never started. try/except protected against an exception, NOT against a hang. Fix: - _job_movie_ingest: each connector runs in a ThreadPoolExecutor with future.result(timeout=360s). A hang in one source → log + shutdown(wait=False) + the queue moves on. A healthy run takes ~50s, so the 6-minute cap leaves headroom. - get_movie_connectors: reorder to paradisehill, MANGOPORN, streamporn, pandamovies, putting mangoporn right after the canonical primary and ahead of the slower, hang-prone connectors. Verified: a full _job_movie_ingest completed all 4 connectors with success in the new order (mangoporn 2nd, 23s). 33 orphaned "running" rows (worker killed mid-run during deploys) were cleaned up separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:19:13 +02:00
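The per-connector timeout in 05c0f6ef93 follows a common pattern with `concurrent.futures`. This is a minimal sketch, not the scheduler's code: `run_with_timeout`, the logger name, and the status strings are hypothetical.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

log = logging.getLogger("movie_ingest")  # hypothetical logger name


def run_with_timeout(name, fn, timeout_s=360.0):
    """Run one connector in its own worker thread with a hard wait limit.

    Returns (status, result). try/except alone cannot interrupt a hang;
    the timeout on future.result() is what lets the caller move on.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return "success", future.result(timeout=timeout_s)
    except FutureTimeout:
        log.warning("connector %s exceeded %.0fs, skipping", name, timeout_s)
        return "timeout", None
    except Exception:
        log.exception("connector %s failed", name)
        return "failed", None
    finally:
        # Do not block on a hung worker; the thread itself keeps running.
        pool.shutdown(wait=False)
```

One caveat: `shutdown(wait=False)` only stops the caller from waiting. The hung thread is not killed, so a connector that never returns still holds its thread until the process exits.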
jtrzupek	6ee0516e62	fix(connectors/dooplay): max_pages cap to unblock movie ingest queue Bug report 2026-05-28 ("no new movies since yesterday"). DooplayConnector.fetch_movies had an unbounded `while True` loop over pages; streamporn (>2k movies) hung for hours until the scheduler's daily kill, blocking the mangoporn + pandamovies queue. The watermark was frozen, with 0 new movies per day. Fix: cap _MAX_PAGES_DELTA=3 (since-driven runs, ~144 newest items) and _MAX_PAGES_FULL=50 (full backfill when since=None). An earlier attempt to filter by release_date was rejected: release_date is the movie's release date (e.g. 2013), not the date it was uploaded to the site, so the listing order does not match it. Manual re-run after deploy: streamporn 144/46s, pandamovies 120/47s, mangoporn 108, 72 of them NEW, in 58s. Scheduler queue unblocked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-28 23:23:50 +02:00
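The page cap from 6ee0516e62 replaces an unbounded loop with a bounded one. A sketch, assuming a hypothetical `fetch_page(page)` callable that returns an empty list past the last page; the two constants mirror the values named in the commit message.

```python
from datetime import datetime
from typing import Callable, Optional

MAX_PAGES_DELTA = 3   # since-driven runs
MAX_PAGES_FULL = 50   # full backfill when since is None


def fetch_all(fetch_page: Callable[[int], list], since: Optional[datetime] = None) -> list:
    """Walk listing pages up to a hard cap instead of `while True`."""
    cap = MAX_PAGES_DELTA if since is not None else MAX_PAGES_FULL
    items: list = []
    for page in range(1, cap + 1):
        batch = fetch_page(page)
        if not batch:
            break  # listing ended before the cap
        items.extend(batch)
    return items
```

A fixed cap bounds the worst-case run time even when the listing never ends or never reaches already-seen items.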
https://github.com/goon-foss/goon	7979d5fa61	session work: bug-report fixes + WIP cleanup User-facing bugs resolved (per bug_reports table 2026-05-25): - 40cd28aa (short-scene filter): mobile api.ts default min_duration_sec=60 hides 6519 sub-60s scenes across all list endpoints (Performer/Site/Tag/ Browse). Caller may override with explicit 0. - 5e89ef7e (porndoe needs cookies/play click): INJECTED_JS in PlayerScreen now auto-clicks player-poster overlay (player-poster-play, big-play-button, vjs-big-play-button, jw-icon-display, btn-big-play, mejs__overlay-button, play-button, btn-play, videoPlayButton). Triggered on the same interval as consent-dismiss + ad-iframe removal. - b1b5e1a2 (Mixdrop black screen): re-enable mixdrop direct stream via VPS curl_cffi proxy (was: skip → WebView fallback → blank screen). Backend pipeline (mixdrop.py extract + stream_proxy._curl_cffi_stream with JA3 + auto-refetch on token expire) was already complete; just removed the skip in app/api/playback.py. Plus ongoing WIP (paradisehill multi-part extraction, stream_proxy refetch logic, gesture race fix for long-press 2x speed, anti-adblock INJECTED_JS defenses, scripts for freshporno backfill, new sources API).	2026-05-25 22:02:52 +02:00
https://github.com/goon-foss/goon	2fad46f934	filemoon: resurrect via mobile-side resolver (Byse SPA RE) filemoon (+ mirrors kerapoxy/lvturbo/emturbovid/bysezoxexe/bysezejataos) did not die: around 2026-05 it rebranded to a Vite SPA, "Byse Frontend". The old P.A.C.K.E.R.-JWPlayer embed disappeared, so the backend considered it dead and added it to DEAD_HOSTER_RE. RE of the index-ChwZgmXV.js bundle (2026-05-22): POST /api/videos/<code>/embed/playback body {"fingerprint":{}} → {"playback":{"key_parts":[..],"iv":..,"payload":..}} → key=concat(b64url(key_parts)); AES-256-GCM(key,iv,payload) → JSON → sources[*].url = HLS master.m3u8 Browser attestation is optional; an empty fingerprint is enough. The stream URL is IP-bound (the token is tied to the requester's IP), so the resolve must run on the user's device (like doodstream.ts / packerHoster.ts). - mobile/src/lib/aesGcm.ts: pure-JS AES-256-GCM decrypt (RN/Hermes has no Web Crypto); S-box computed from GF(2^8), GHASH verifies the tag. Verified against cryptography (Python) on 2 payloads. - mobile/src/lib/filemoonHoster.ts: resolver: POST playback → decrypt → pick best source. E2E test: filemoon.to/e + /d + bysezoxexe.com mirror. - PlayerScreen: filemoon added to the resolve useEffect alongside doodstream/packer. - backend: filemoon removed from DEAD_HOSTER_RE; hoster.py early-return → passed through as type='hoster' to the mobile resolver (server-side resolve is pointless because the URL would be IP-bound to the VPS). - direct_scrapers: fixed the incorrect "filemoon shutdown" comment. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 13:18:26 +02:00
https://github.com/goon-foss/goon	642f1ab8b8	Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector Mobile / OTA: - Enable Expo Updates (app.json + AndroidManifest) → api.goon-foss.org - Bump 0.1.6 → 0.1.9 (build.gradle, app.json, appVersion.ts, main.py /version) - backend.ts: default public backend auto-connect (no manual login) WebView fallback fix (PlayerScreen INJECTED_JS): - Auto-dismiss cookie/consent gates (hqporner et al. blocked kt_player init) - Context-scoped: only clicks consent buttons inside cookie/gdpr containers - Retry window for <source>.src polling raised 5→15 ticks (post-dismiss init) Resolver: - Series-position + modifier mismatch detector (Episode 2≠4, BTS/unedited) → composite_score hard-reject / cap; wired into scene_score + bulk_dedup - aggregator-mode candidate query: LIMIT 500 + title-match ordering Connectors: - porndoe.com browse scraper (JSON-LD VideoObject) — theporndude audit pilot landing: APK links → goon-v0.1.9.apk Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 11:20:57 +02:00
goon-foss	ad0284585b	Initial commit Goon — self-hosted aggregator for adult-content scene metadata. Indexes scenes from TPDB, StashDB, and 30+ public adult tube sites. Cross-source deduplication via perceptual hash + Levenshtein distance. FastAPI backend + APScheduler worker + React Native (Expo) mobile client. FOSS, ad-free, donation-funded. See README for details.	2026-05-20 10:10:22 +02:00

15 commits