porntrex/hqporner rejected for deep-crawl: KVS sites with no SSR metadata (77% of existing porntrex has no duration -> invisible under the app's >=60 filter). eporner instead exposes a public JSON API (api/v2/video/search) returning title + length_sec + keywords + added per video; ~100k videos, ~100/page, no per-scene detail fetch. - BaseBrowseScraper.crawl_page(page): factored out of latest_scenes; returns None (transient fail) / [] (catalog end) / [scenes]. API subclasses override it. - deep_crawl drives via crawl_page (supports HTML-listing AND API sources). - EpornerApiScraper: crawl_page hits the eporner API -> RawScene with duration+tags+ date+thumb+playback; registered in ALL_BROWSE_SCRAPERS. - Pilot (2 API pages): 192 new, 100% playable + tagged + visible (>=60); the <180s trailer filter dropped 6 short clips. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
|---|---|---|
| .. | ||
| __init__.py | ||
| browse_latest.py | ||
| bulk_dedup.py | ||
| deep_crawl.py | ||
| jobs.py | ||
| performer_driven.py | ||
| taxonomy_counts.py | ||
| worker.py | ||