goon-foss/goon - Forgejo

Author	SHA1	Message	Date
jtrzupek	21bc8bf1fe	feat(superporn): browse scraper via Bright Data residential proxy. superporn hard-blocks the VPS IP with a Cloudflare 403 on every TLS impersonation attempt, so HTML ingest is routed through a Bright Data residential proxy (BRIGHTDATA_PROXY_URL, parsed in config). This is the first scraper to use a proxy: an optional _proxy on the browse base, threaded into browser_get. Metadata comes from the JSON-LD VideoObject (title/desc/uploadDate/thumb/duration) plus the pornstar and category chips; superporn double-encodes HTML entities, so titles are unescaped twice. Thumbnails fetch fine from the VPS without the proxy. Playback stays off-proxy: the <source> mp4 token is bound to the fetcher's IP, so resolution happens phone-side via WebView (extractor superporncom -> _vps_blocked_fallback), same as porndoe. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-10 18:47:45 +02:00
jtrzupek	ee4915770f	feat(deep-crawl): eporner via JSON API as SSR-rich source (Phase 2b alternative). porntrex/hqporner were rejected for deep-crawl: they are KVS sites with no SSR metadata (77% of existing porntrex scenes have no duration, which makes them invisible under the app's >=60 filter). eporner instead exposes a public JSON API (api/v2/video/search) that returns title + length_sec + keywords + added per video; ~100k videos, ~100 per page, no per-scene detail fetch. - BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None (transient failure) / [] (catalog end) / [scenes]. API subclasses override it. - deep_crawl drives via crawl_page (supports both HTML-listing and API sources). - EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+date+thumb+playback; registered in ALL_BROWSE_SCRAPERS. - Pilot (2 API pages): 192 new scenes, 100% playable + tagged + visible (>=60); the <180s trailer filter dropped 6 short clips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 10:37:20 +02:00
jtrzupek	cd12348782	fix(movies): paradisehill delta date-granularity + browse cadence docs. - paradisehill.fetch_movies compared release_date, coerced to midnight, against the `since` timestamp, so the chronological crawl stopped at the first upload dated the same calendar day as `since` and silently dropped most new movies (0-2 seen per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; the idempotent external_records upsert dedups the re-fetched recent window. - scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the backlog missed during the bug (idempotent, resumable). - docs: correct stale 'raz dziennie/24h' ("once a day/24h") browse-latest comments to 6h (4x/day), the actual configured cadence (config.py sched_browse_latest_hours=6). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 17:00:10 +02:00
goon-foss	ad0284585b	Initial commit. Goon: self-hosted aggregator for adult-content scene metadata. Indexes scenes from TPDB, StashDB, and 30+ public adult tube sites. Cross-source deduplication via perceptual hash + Levenshtein distance. FastAPI backend + APScheduler worker + React Native (Expo) mobile client. FOSS, ad-free, donation-funded. See README for details.	2026-05-20 10:10:22 +02:00

4 commits
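The superporn commit (21bc8bf1fe) reads scene metadata from the page's JSON-LD VideoObject and unescapes titles twice because the site double-encodes HTML entities. Below is a minimal standard-library sketch of that idea. It is not the project's code, and it rests on three assumptions: the double encoding appears inside the JSON-LD strings themselves, `duration` uses the simple ISO 8601 `PT#H#M#S` form, and `JsonLdCollector`, `extract_video_object` and `iso_duration_seconds` are hypothetical names.

```python
import html
import json
import re
from html.parser import HTMLParser
from typing import Optional

# Assumption: durations look like "PT1H2M3S" (no days, no fractional seconds).
_ISO_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script bodies may arrive in several chunks


def unescape_twice(text: str) -> str:
    # "&amp;amp;" -> "&amp;" -> "&": one pass per layer of encoding.
    return html.unescape(html.unescape(text))


def iso_duration_seconds(value: Optional[str]) -> Optional[int]:
    """Parse a simple ISO 8601 duration such as "PT12M34S" into seconds."""
    match = _ISO_DURATION.match(value or "")
    if not match or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def extract_video_object(page_html: str) -> Optional[dict]:
    """Return the first VideoObject found in the page's JSON-LD, or None."""
    parser = JsonLdCollector()
    parser.feed(page_html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed JSON-LD blocks
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "VideoObject":
                return {
                    "title": unescape_twice(item.get("name", "")),
                    "description": unescape_twice(item.get("description", "")),
                    "upload_date": item.get("uploadDate"),
                    "thumbnail": item.get("thumbnailUrl"),
                    "duration": iso_duration_seconds(item.get("duration")),
                }
    return None
```

A single unescape would leave `&amp;` in the title; the second pass turns it into `&`. On a source that encodes only once, the extra pass would over-decode a literal `&amp;` that was meant to be displayed. The double pass therefore belongs only on sites known to double-encode.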
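Commit ee4915770f gives browse scrapers a three-way `crawl_page(page)` result and lets `deep_crawl` drive both HTML-listing and API sources through it. The sketch below shows that contract, and several parts of it are assumptions rather than details from the log:

- a page payload with a `videos` list whose items carry `title`, `length_sec`, comma-separated `keywords` and `added`. The field names come from the commit; the payload shape and the comma separator are assumed.
- the retry policy.
- the names `JsonApiScraper` and `fetch_json`, which are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional

TRAILER_MAX_SEC = 180  # clips shorter than this count as trailers (per the pilot note)


@dataclass
class RawScene:
    title: str
    duration: Optional[int]
    tags: list[str] = field(default_factory=list)
    added: Optional[str] = None


class BaseBrowseScraper:
    def crawl_page(self, page: int) -> Optional[list[RawScene]]:
        """None = transient failure, [] = end of catalog, otherwise the page's scenes."""
        raise NotImplementedError


class JsonApiScraper(BaseBrowseScraper):
    """Hypothetical API-backed subclass: one request returns a whole page of metadata."""

    def __init__(self, fetch_json: Callable[[int], Optional[dict]]) -> None:
        # Assumed contract: returns the decoded JSON for a page, or None on a network error.
        self.fetch_json = fetch_json

    def crawl_page(self, page: int) -> Optional[list[RawScene]]:
        payload = self.fetch_json(page)
        if payload is None:
            return None  # transient: the caller retries this page
        scenes = []
        for rec in payload.get("videos") or []:
            keywords = rec.get("keywords") or ""
            tags = [t.strip() for t in keywords.split(",") if t.strip()]
            scenes.append(RawScene(rec.get("title", ""), rec.get("length_sec"), tags, rec.get("added")))
        return scenes  # an empty list means the catalog is exhausted


def deep_crawl(scraper: BaseBrowseScraper, start_page: int = 1,
               max_retries: int = 3) -> Iterator[RawScene]:
    """Page through any scraper that implements the crawl_page contract."""
    page, failures = start_page, 0
    while True:
        scenes = scraper.crawl_page(page)
        if scenes is None:
            failures += 1
            if failures > max_retries:
                return  # give up for now; a later run can resume from `page`
            continue
        if not scenes:
            return  # end of catalog
        failures = 0
        yield from scenes
        page += 1


def is_trailer(scene: RawScene) -> bool:
    return scene.duration is not None and scene.duration < TRAILER_MAX_SEC
```

In this sketch the trailer filter runs after the crawl rather than inside `crawl_page`. If `crawl_page` dropped short clips itself, a page made up entirely of trailers would come back as `[]` and be mistaken for the end of the catalog.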
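Commit cd12348782 replaces a midnight-vs-timestamp comparison with a date comparison plus a one-day grace window. The sketch below contrasts the two predicates on a newest-first listing. It is not the actual `paradisehill.fetch_movies` code: the function names, the `>=` boundary and the `(title, release_date)` tuple shape of the listing are assumptions.

```python
from datetime import date, datetime, timedelta
from typing import Callable, Iterable, Iterator

GRACE = timedelta(days=1)  # re-fetch one extra day; an idempotent upsert absorbs the overlap


def is_new_by_date(release_date: date, since: datetime) -> bool:
    """Fixed check: compare calendar dates, widened by a one-day grace window."""
    return release_date >= since.date() - GRACE


def is_new_by_midnight(release_date: date, since: datetime) -> bool:
    """Buggy check: the release date becomes 00:00, so same-day uploads look older than `since`."""
    return datetime.combine(release_date, datetime.min.time()) >= since


def take_new(movies: Iterable[tuple[str, date]], since: datetime,
             is_new: Callable[[date, datetime], bool] = is_new_by_date) -> Iterator[str]:
    """Walk a newest-first listing and stop at the first entry judged not new."""
    for title, released in movies:
        if not is_new(released, since):
            break  # chronological listing: everything after this is older
        yield title
```

Take `since` at 14:00 on 2026-06-01. The midnight version rejects the first movie dated 2026-06-01 and ends the crawl. The date version keeps that day and the day before it. The overlap is re-fetched on every run, which is why the fix depends on the idempotent upsert to deduplicate it.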
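The initial commit describes cross-source deduplication via perceptual hash + Levenshtein distance. The sketch below shows one way to combine the two signals. It assumes 64-bit integer perceptual hashes and requires both a small Hamming distance and a high normalized title similarity. The thresholds and the AND combination are illustrative guesses, not Goon's actual rules.

```python
def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two non-negative perceptual hashes."""
    return bin(a ^ b).count("1")


def levenshtein(s: str, t: str) -> int:
    """Classic dynamic-programming edit distance, keeping only one previous row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """1.0 for identical titles (case-insensitive), falling toward 0.0 as edits grow."""
    a, b = a.casefold().strip(), b.casefold().strip()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def is_duplicate(hash_a: int, hash_b: int, title_a: str, title_b: str,
                 max_hash_distance: int = 10, min_title_similarity: float = 0.85) -> bool:
    # Hypothetical thresholds: both the thumbnails and the titles must be close.
    return (hamming64(hash_a, hash_b) <= max_hash_distance
            and title_similarity(title_a, title_b) >= min_title_similarity)
```

Normalizing the edit distance by the longer title keeps one threshold meaningful for both short and long titles. The log does not say whether the project combines the signals with AND, OR, or a weighted score.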