goon-foss/goon - Forgejo: Beyond coding. We forge.

Author	SHA1	Message	Date
jtrzupek	f014a901de	feat(scheduler): periodic title+duration dedup (missing-merge tube dupes) Missing-merge duplicates (same performer + identical normalized title + identical duration-to-the-second) that bulk_dedup misses — tube re-scrapes and cross-tube re-ingests like porn00 pulling a video already present from xnxx (reports 28fe8181/32df33b1). Extracted the proven merge_exact_title_duration logic into app/scheduler/title_duration_dedup.py (script now a thin wrapper), wired a 12h scheduler job (playback-only = what users actually see, GOON_SCHED_TITLE_DEDUP_HOURS). Signal is near-certain (two different videos don't share byte-identical title AND exact duration); no shared performer = not merged (over-match guard). Verified: job registers (jobs=14), backlog currently 0 after the one-shot global merge. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-19 11:20:48 +02:00
jtrzupek	e4cb94bc59	feat(scheduler): hetzner bandwidth monitor + search-tube watchdog coverage Two observability additions to the worker scheduler (intertwined in the same files): (1) ingest-watchdog now also covers performer-driven search scrapers (ALL_DIRECT_SCRAPERS) with a separate 7d threshold, not just browse tubes at 48h — several search tubes (perverzija, fpoxxx, porndish, ...) had frozen silently for weeks. (2) New Hetzner Cloud bandwidth monitor (app/scheduler/hetzner_monitor.py): polls outgoing_traffic vs included_traffic and fires a Sentry message at info/warning/error % thresholds with a per-level fingerprint. The config fields existed for ages but the monitor was never implemented. No-op until HETZNER_API_TOKEN + HETZNER_SERVER_ID are set in .env (verified: returns {enabled: False}, job registers as 'hetzner-monitor every 6h', jobs=13). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-18 09:18:59 +02:00
jtrzupek	0424cb9138	feat(scheduler): per-origin ingest freshness watchdog -> Sentry The global source monitor can't catch a single stalled tube because every tube scraper shares one Source row (tube-scraper), so an aggregate run still reports success while one origin freezes (freshporno browsing the rotating KVS homepage root, report 14f3a655). New watchdog checks max(created_at) per active browse-scraper origin (tube:<sitetag>); if a tube with history hasn't produced a new scene in > max_age_hours it fires a Sentry message with a stable per-origin fingerprint (age in extras, not the title, so it stays one grouped issue). Runs every 6h, 48h threshold, both env-tunable (GOON_SCHED_INGEST_WATCHDOG_HOURS / GOON_INGEST_WATCHDOG_MAX_AGE_HOURS). Verified: 0 stale at 48h post-fix, detects neporn at a strict 12h threshold. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-15 10:26:25 +02:00
jtrzupek	8b4783771f	feat(scheduler): periodic thumb-asset dedup (hdporn.gg/fullmovies.xxx) The one-off cleanup merged ~13.5k same-video-different-title dupes, but they regrow as these sibling tubes re-ingest under new titles. Wire the asset-id+duration merge into the scheduler (every 12h, GOON_SCHED_THUMB_DEDUP_HOURS, 0=off) so it stays clean. Shared logic lives in app/scheduler/thumb_dedup.py (run_thumb_asset_dedup); the one-shot script now imports it. Same tight signature as the cleanup: family hosts only + identical duration (the bare asset-id number is reused across unrelated CDNs, so cross-host/diff-duration grouping is excluded). Reports 205b17d9 / 5a2944cb. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-14 14:56:45 +02:00
jtrzupek	f7670963df	fix(sxyprn): disable thumbnail refresh job — trafficdeposit token has ~1h TTL CORRECTION: trafficdeposit thumbnail tokens are hour-bucketed and valid only ~1h (verified 2026-06-10: stored ts=11:00 dead at 12:27, current ts=13:00 loads). Earlier "~weekly rot" read was wrong. Storing/periodically-refreshing sxyprn thumbnail URLs is futile — they expire within the hour. Default the refresh job OFF (kept in code). The dead-marking sweep (Post Not Found → dead_at) it performed was still valid. Live sxyprn thumbnails need on-demand resolution at serve time (future work). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 14:29:24 +02:00
jtrzupek	fef28ae56b	feat(sxyprn): refresh rotting thumbnails from live post pages + scheduled job CORRECTION to earlier "unrecoverable" call: the /post/<id> page is alive (200) and DOES expose the scene's own fresh-signed poster via og:image / <video poster> (post-id embedded, current timestamp) — only the STORED thumbnail URL had rotted. Search/listings don't re-surface old posts (0 overlap), but per-post fetch works. scripts/refresh_sxyprn_thumbs.py: iterate live sxyprn sources, fetch post page, extract fresh og:image, UPDATE thumbnail_url (verified: refreshed URLs return 200). _job_refresh_sxyprn_thumbs: every 12h refresh the 1200 least-recently-updated sources (cycles the ~19k catalog within the expiry window). Pairs with the scene_resolver overwrite fix so refreshed thumbnails stick. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-10 10:36:30 +02:00
jtrzupek	9d0cb7f26e	fix(scheduler): bulk_dedup performers cross_source_only + hard-timeout (OOM) _job_bulk_dedup_performers called run_bulk_dedup(strategy="performers") without the cross_source_only guard whose docstring exists precisely to prevent this OOM. At current catalog scale the unguarded path materializes N²/2 pairs per prolific performer into a list → worker hit 6GB RSS and was OOM-killed every 12h (05:00/17:00), taking down concurrent tpdb/stashdb/movie ingests as killed_by_restart (0 new movies). Verified in prod: 05:00 run now completes (885k pairs scored, no OOM) and ingests succeed (stashdb +241, tpdb +175). Also wrap in _run_with_timeout like tpdb/stashdb (job had no hard-timeout). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-07 11:00:19 +02:00
jtrzupek	7e46e5ac48	feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) We ingested only ~3% of each browse tube's catalog (porndoe >62k scenes; we had 1959) because tubes were hit only by performer-search + top-N browse. Pilot (porndoe pages 64-110): 1119 new scenes, 100% playable + 100% tagged, 0% canonical overlap (purely additive — content not in TPDB/StashDB). - app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor, idempotent (resolver skips known by raw_hash), mark 'exhausted' at catalog end then reset cursors for an incremental re-sweep. - _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h hard-timeout; registered in build_scheduler (jobs=10). - config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path. - scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 09:26:44 +02:00
jtrzupek	08f901712c	fix(scheduler): hard-timeout heavy jobs + periodic stuck-run reaper At the shared 05:00 anchor all heavy jobs fire together; tpdb/stashdb/performer-driven had no timeout, so a hung connector blocked the whole job and — with max_instances=1 — blocked every future fire of that job until a worker restart (incident 2026-06-02: 6 runs hung 8.7h, movie mirrors 47h stale, tube ingest stalled). - _run_with_timeout wraps tpdb/stashdb/performer-driven in a 30-min hard cap (same ThreadPoolExecutor pattern movie-ingest already uses): on timeout the job returns and frees the scheduler slot; the orphaned thread lives until restart. - _job_reap_stuck: hourly reaper of 'running' >2h rows, registered in the scheduler — the startup-only reaper missed hangs while the worker stayed up for hours. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 16:17:50 +02:00
jtrzupek	cd12348782	fix(movies): paradisehill delta date-granularity + browse cadence docs - paradisehill.fetch_movies compared release_date coerced to midnight against the `since` timestamp, so the chronological crawl stopped at the first upload dated the same calendar day as `since` and silently dropped most new movies (0-2 seen per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; idempotent external_records upsert dedups the re-fetched recent window. - scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the backlog missed during the bug (idempotent, resumable). - docs: correct stale 'raz dziennie/24h' ("once a day/24h") browse-latest comments to 6h (4x/day), the actual configured cadence (config.py sched_browse_latest_hours=6). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-01 17:00:10 +02:00
jtrzupek	2163fee245	perf(taxonomy): denormalize scene_count for tags/performers/studios Counts for /tags, /performers, /studios and /favorites were computed live per-request by aggregating scene_tags / scene_performers with an EXISTS to playback_sources. As the catalog grew to ~1.7M scenes (6.3M scene_tags) this ran ~4.3s for /tags?order=popular (x2 incl. the total count) and ~950ms for the default /scenes count, making those screens load in several seconds. - migration 0019: add scene_count (+ DESC index) to tags/performers/studios - background job _job_refresh_taxonomy_counts (every 3h) recomputes the counts in one UPDATE..FROM each (IS DISTINCT FROM to skip unchanged rows) - /tags, /performers, /studios scenes path now read the column + ORDER BY the indexed scene_count; for_movies paths keep live aggregation (small tables) - favorites read denormalized scene_count instead of a grouped EXISTS aggregate - /scenes default count: 10-min in-process TTL cache (header is approximate) Measured: /tags?order=popular&per_page=500 ~8s -> 66ms incl. serialization. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 17:53:48 +02:00
jtrzupek	05c0f6ef93	fix(scheduler): per-connector hard timeout + reorder mangoporn-first Bug report 2026-05-30 "ingest hung again". streamporn/pandamovies were hanging intermittently mid-run (depending on that day's live content), blocking the sequential _job_movie_ingest → mangoporn (the only mirror with real new content: 72 new on 05-28) never started. try/except protected against an exception, NOT against a hang. Fix: - _job_movie_ingest: each connector runs in a ThreadPoolExecutor with future.result(timeout=360s). A hang in one source → log + shutdown(wait=False) + the queue moves on. Healthy run ~50s, 6-min cap = headroom. - get_movie_connectors: reorder to paradisehill, MANGOPORN, streamporn, pandamovies, so mangoporn runs right after the canonical primary and before the slower/hang-prone ones. Verified: a full _job_movie_ingest completed all 4 with success in the new order (mangoporn 2nd, 23s). 33 orphaned "running" rows (worker killed mid-run during deploys) were cleaned up separately. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 11:19:13 +02:00
https://github.com/goon-foss/goon	7979d5fa61	session work: bug-report fixes + WIP cleanup User-facing bugs resolved (per bug_reports table 2026-05-25): - 40cd28aa (short-scene filter): mobile api.ts default min_duration_sec=60 hides 6519 sub-60s scenes across all list endpoints (Performer/Site/Tag/Browse). Caller may override with explicit 0. - 5e89ef7e (porndoe needs cookies/play click): INJECTED_JS in PlayerScreen now auto-clicks player-poster overlay (player-poster-play, big-play-button, vjs-big-play-button, jw-icon-display, btn-big-play, mejs__overlay-button, play-button, btn-play, videoPlayButton). Triggered same interval as consent-dismiss + ad-iframe removal. - b1b5e1a2 (Mixdrop black screen): re-enable mixdrop direct stream via VPS curl_cffi proxy (was: skip → WebView fallback → blank screen). Backend pipeline (mixdrop.py extract + stream_proxy._curl_cffi_stream with JA3 + auto-refetch on token expire) was already complete; just removed the skip in app/api/playback.py. Plus ongoing WIP (paradisehill multi-part extraction, stream_proxy refetch logic, gesture race fix for long-press 2x speed, anti-adblock INJECTED_JS defenses, scripts for freshporno backfill, new sources API).	2026-05-25 22:02:52 +02:00
https://github.com/goon-foss/goon	642f1ab8b8	Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector Mobile / OTA: - Enable Expo Updates (app.json + AndroidManifest) → api.goon-foss.org - Bump 0.1.6 → 0.1.9 (build.gradle, app.json, appVersion.ts, main.py /version) - backend.ts: default public backend auto-connect (no manual login) WebView fallback fix (PlayerScreen INJECTED_JS): - Auto-dismiss cookie/consent gates (hqporner et al. blocked kt_player init) - Context-scoped: only clicks consent buttons inside cookie/gdpr containers - Retry window for <source>.src polling raised 5→15 ticks (post-dismiss init) Resolver: - Series-position + modifier mismatch detector (Episode 2≠4, BTS/unedited) → composite_score hard-reject / cap; wired into scene_score + bulk_dedup - aggregator-mode candidate query: LIMIT 500 + title-match ordering Connectors: - porndoe.com browse scraper (JSON-LD VideoObject) — theporndude audit pilot landing: APK links → goon-v0.1.9.apk Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 11:20:57 +02:00
goon-foss	ad0284585b	Initial commit Goon — self-hosted aggregator for adult-content scene metadata. Indexes scenes from TPDB, StashDB, and 30+ public adult tube sites. Cross-source deduplication via perceptual hash + Levenshtein distance. FastAPI backend + APScheduler worker + React Native (Expo) mobile client. FOSS, ad-free, donation-funded. See README for details.	2026-05-20 10:10:22 +02:00
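
Note on the hard-timeout pattern (commits 08f901712c and 05c0f6ef93): both messages describe running a job in a ThreadPoolExecutor and bounding future.result with a timeout, so that a hung connector frees the scheduler slot instead of blocking it. A minimal sketch of that idea follows. It is not the repository's _run_with_timeout; the function name run_with_timeout, its signature, the returned dict, and slow_connector are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def run_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn in a worker thread and stop waiting after timeout_s seconds.

    Hypothetical helper: the name and the return shape are assumptions.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return {"status": "success", "result": future.result(timeout=timeout_s)}
    except FutureTimeout:
        # The caller gets control back; the worker thread itself cannot be
        # killed and keeps running until fn returns or the process exits.
        return {"status": "timeout", "result": None}
    except Exception as exc:
        # try/except covers exceptions; the timeout above covers hangs.
        return {"status": "error", "result": repr(exc)}
    finally:
        # wait=False so a hung worker does not block the caller here.
        pool.shutdown(wait=False)


def slow_connector():
    """Stand-in for a source that stalls (hypothetical)."""
    time.sleep(0.5)
    return "done"
```

Calling run_with_timeout(slow_connector, 0.05) returns a "timeout" status after roughly 0.05 s while the worker thread finishes on its own, which matches the trade-off noted in 08f901712c: the slot is freed, but the orphaned thread lives on.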

15 commits
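
Note on the title+duration dedup signal (commit f014a901de): the message describes merging scenes that share a performer, an identical normalized title, and an identical duration to the second, and refusing to merge when no performer is shared. The sketch below illustrates that rule on in-memory dicts. It is an assumption-laden illustration, not the code in app/scheduler/title_duration_dedup.py: the normalization rule, the field names (id, title, duration_sec, performers), and both function names are hypothetical.

```python
import re
from collections import defaultdict


def normalize_title(title):
    """Lowercase and collapse non-alphanumeric runs (assumed normalization)."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_title_duration_dupes(scenes):
    """Return (keep_id, duplicate_id) pairs.

    scenes: iterable of dicts with hypothetical keys
    id, title, duration_sec, performers (a set of names).
    """
    groups = defaultdict(list)
    for scene in scenes:
        key_title = normalize_title(scene["title"])
        if not key_title:
            continue  # an empty title carries no signal; skip it
        groups[(key_title, scene["duration_sec"])].append(scene)

    merges = []
    for members in groups.values():
        keepers = []  # scenes retained so far within this group
        for scene in sorted(members, key=lambda s: s["id"]):
            # Over-match guard: merge only into a keeper sharing a performer.
            target = next(
                (k for k in keepers if k["performers"] & scene["performers"]),
                None,
            )
            if target is None:
                keepers.append(scene)
            else:
                merges.append((target["id"], scene["id"]))
    return merges
```

As a simplification, a keeper's performer set is not widened when a duplicate is merged into it, so chains linked only through a merged duplicate are not followed.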