goon/app/scheduler
jtrzupek e42217773f feat(deep-crawl): xvideos browse source (capped) + per-tube page cap
xvideos server-side renders a JSON-LD VideoObject (duration/title/uploadDate) plus on-page
/models/ (performer) and /tags/ links. Sample: median ~10.5 min, 93% >= 3 min. Pilot
(2 pages): 29 new, 100% playable, visible, and tagged (performers are sparse: xvideos 'new'
is amateur-heavy, and /models/ is tagged mostly on studio rips).
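
For illustration, reading the JSON-LD VideoObject fields named above (duration/title/uploadDate) might look roughly like the sketch below. It is not the XVideosBrowseScraper code: the class and function names are hypothetical, only the standard library is used, and it assumes the duration is a time-only ISO 8601 value such as PT0H10M30S.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


# Assumption: only hour/minute/second components appear (no days, no fractions).
_DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")


def iso8601_duration_to_seconds(value):
    """Convert a duration such as 'PT0H10M30S' to seconds, or return None."""
    if not isinstance(value, str):
        return None
    m = _DURATION_RE.match(value)
    if not m or not any(m.groups()):
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return hours * 3600 + minutes * 60 + seconds


def extract_video_object(html):
    """Return title/duration/upload date of the first VideoObject, or None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "VideoObject":
                return {
                    "title": item.get("name"),
                    "duration_s": iso8601_duration_to_seconds(item.get("duration")),
                    "upload_date": item.get("uploadDate"),
                }
    return None
```

A duration in seconds is what a ">= 3 min" filter like the one in the sample statistics would compare against 180.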

- XVideosBrowseScraper (JSON-LD + page-parse models/tags), in ALL_BROWSE_SCRAPERS.
- deep_crawl._PAGE_CAP: per-sitetag depth cap; xvideoscom=1800 (~newest 50k). At the cap
  the tube is marked exhausted (reset -> incremental re-sweep) so a mega-tube cannot
  monopolize the round-robin or balloon the DB.
- ported yesporn.py into the public repo (it was prod-only, like hdporngg), ending the
  __init__ public/prod divergence.
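
The per-sitetag page cap described in the list above could be sketched as follows. This is a guess at the shape, not the contents of deep_crawl.py: TubeCursor, advance, and reset are hypothetical names, and only the xvideoscom=1800 value comes from the commit message.

```python
from dataclasses import dataclass

# Hypothetical cap table keyed by site tag; a missing tag means "uncapped".
PAGE_CAP = {"xvideoscom": 1800}


@dataclass
class TubeCursor:
    """Crawl position for one tube in the round-robin (illustrative only)."""
    site_tag: str
    next_page: int = 1
    exhausted: bool = False


def advance(cursor, pages_this_turn, page_cap=PAGE_CAP):
    """Return the page numbers to crawl this turn and update the cursor."""
    if cursor.exhausted:
        return []  # an exhausted tube gives up its turn immediately
    cap = page_cap.get(cursor.site_tag)
    last = cursor.next_page + pages_this_turn - 1
    if cap is not None:
        last = min(last, cap)  # never read past the cap
    pages = list(range(cursor.next_page, last + 1))
    cursor.next_page = last + 1
    if cap is not None and cursor.next_page > cap:
        cursor.exhausted = True  # cap reached: stop until reset
    return pages


def reset(cursor):
    """Start an incremental re-sweep from page 1."""
    cursor.next_page = 1
    cursor.exhausted = False
```

Because the cap bounds how many pages a tube can consume between resets, a very large catalog takes the same kind of turn as a small one and then drops out of the rotation.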

youporn rejected: its JSON-LD lacks actor/keywords, and its /pornstar/ and /category/ links
are A-Z navigation, not scene-specific. xhamster: 429/Cloudflare responses from the VPS IP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 11:16:44 +02:00
__init__.py Initial commit 2026-05-20 10:10:22 +02:00
browse_latest.py fix(movies): paradisehill delta date-granularity + browse cadence docs 2026-06-01 17:00:10 +02:00
bulk_dedup.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
deep_crawl.py feat(deep-crawl): xvideos browse source (capped) + per-tube page cap 2026-06-03 11:16:44 +02:00
jobs.py feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) 2026-06-03 09:26:44 +02:00
performer_driven.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
taxonomy_counts.py fix(scenes): propagate playback duration to Scene + duration-consistent counts 2026-06-01 21:31:01 +02:00
worker.py feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) 2026-06-03 09:26:44 +02:00