goon/app/scheduler
jtrzupek ee4915770f feat(deep-crawl): eporner via JSON API as SSR-rich source (Phase 2b alternative)
porntrex/hqporner rejected for deep-crawl: KVS sites with no SSR metadata (77% of
existing porntrex has no duration -> invisible under the app's >=60 filter). eporner
instead exposes a public JSON API (api/v2/video/search) returning title + length_sec
+ keywords + added per video; ~100k videos, ~100/page, no per-scene detail fetch.

- BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None
  (transient fail) / [] (catalog end) / [scenes]. API subclasses override it.
- deep_crawl drives via crawl_page (supports HTML-listing AND API sources).
- EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+
  date+thumb+playback; registered in ALL_BROWSE_SCRAPERS.
- Pilot (2 API pages): 192 new, 100% playable + tagged + visible (>=60); the <180s
  trailer filter dropped 6 short clips.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 10:37:20 +02:00
..
__init__.py Initial commit 2026-05-20 10:10:22 +02:00
browse_latest.py fix(movies): paradisehill delta date-granularity + browse cadence docs 2026-06-01 17:00:10 +02:00
bulk_dedup.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
deep_crawl.py feat(deep-crawl): eporner via JSON API as SSR-rich source (Phase 2b alternative) 2026-06-03 10:37:20 +02:00
jobs.py feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) 2026-06-03 09:26:44 +02:00
performer_driven.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
taxonomy_counts.py fix(scenes): propagate playback duration to Scene + duration-consistent counts 2026-06-01 21:31:01 +02:00
worker.py feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) 2026-06-03 09:26:44 +02:00