goon/app at 7e46e5ac4858f52c66cfe0d9446dd36b37024271 - goon-foss/goon

jtrzupek 7e46e5ac48 feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)	2026-06-03 09:26:44 +02:00

We ingested only ~3% of each browse tube's catalog (porndoe >62k scenes; we had 1959) because tubes were hit only by performer-search + top-N browse. Pilot (porndoe pages 64-110): 1119 new scenes, 100% playable + 100% tagged, 0% canonical overlap (purely additive — content not in TPDB/StashDB).

- app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor, idempotent (resolver skips known by raw_hash), mark 'exhausted' at catalog end then reset cursors for an incremental re-sweep.
- _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h hard-timeout; registered in build_scheduler (jobs=10).
- config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path.
- scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
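
The cursor scheme described in the commit message above (round-robin over the scrapers, a per-tube page cursor persisted as JSON, skipping already-known hashes, marking a tube 'exhausted' and resetting cursors once every catalog has ended) can be sketched roughly as follows. This is not the code in app/scheduler/deep_crawl.py. The function names, the `FetchPage` callable, the state-file layout and the in-memory `known_hashes` set are all assumptions made for illustration, and the real resolver presumably checks raw_hash against the database.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Set

# Hypothetical scraper interface: takes a 1-based page number and returns the
# raw hashes of the scenes on that page; an empty list means "end of catalog".
FetchPage = Callable[[int], List[str]]


def load_state(path: Path) -> dict:
    """Read the cursor file, or start fresh if it does not exist yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"cursors": {}, "exhausted": [], "turn": 0}


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file first so a crash cannot leave a half-written file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(path)


def deep_crawl_run(
    scrapers: Dict[str, FetchPage],
    known_hashes: Set[str],
    state_path: Path,
    pages_per_run: int,
) -> int:
    """Fetch up to pages_per_run pages, rotating over tubes; return new-scene count."""
    state = load_state(state_path)
    names = sorted(scrapers)
    new_scenes = 0
    for _ in range(pages_per_run):
        active = [n for n in names if n not in state["exhausted"]]
        if not active:
            # Every catalog reached its end: reset so the next run re-sweeps.
            state = {"cursors": {}, "exhausted": [], "turn": 0}
            break
        name = active[state["turn"] % len(active)]
        state["turn"] += 1
        page = state["cursors"].get(name, 1)
        hashes = scrapers[name](page)
        if not hashes:
            state["exhausted"].append(name)
            continue
        for h in hashes:
            if h not in known_hashes:  # idempotent: known scenes are skipped
                known_hashes.add(h)
                new_scenes += 1
        state["cursors"][name] = page + 1
    save_state(state_path, state)
    return new_scenes
```

In this sketch a scheduler job would call `deep_crawl_run` once per interval with `pages_per_run` taken from config (the commit mentions deep_crawl_pages_per_run=60). Because the cursors live in the state file, each run resumes where the previous one stopped.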
api	perf(scenes): drop exact count on filtered lists; index scene_tags(tag_id)	2026-06-02 12:00:36 +02:00
connectors	fix(movies): paradisehill delta date-granularity + browse cadence docs	2026-06-01 17:00:10 +02:00
extractors	fix(pornhub): WebView fallback — yt-dlp gets 403 from VPS	2026-06-02 21:41:38 +02:00
models	perf(taxonomy): denormalize scene_count for tags/performers/studios	2026-05-31 17:53:48 +02:00
normalize	feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog	2026-06-01 15:07:35 +02:00
resolve	fix(scenes): propagate playback duration to Scene + duration-consistent counts	2026-06-01 21:31:01 +02:00
scheduler	feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)	2026-06-03 09:26:44 +02:00
templates	feat(seo): public HTML SEO router + templates; add CLAUDE.md; ignore .nimbalyst	2026-05-31 16:29:59 +02:00
__init__.py	Initial commit	2026-05-20 10:10:22 +02:00
auth.py	Initial commit	2026-05-20 10:10:22 +02:00
config.py	feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)	2026-06-03 09:26:44 +02:00
db.py	Initial commit	2026-05-20 10:10:22 +02:00
ingest.py	feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog	2026-06-01 15:07:35 +02:00
main.py	fix(apk 0.2.1): in-app installer "nothing happens" + oo launcher icon	2026-05-31 13:15:37 +02:00