goon/app/connectors
jtrzupek e42217773f feat(deep-crawl): xvideos browse source (capped) + per-tube page cap
xvideos SSR's JSON-LD VideoObject (duration/title/uploadDate) + on-page /models/ (perf)
+ /tags/. Sample: median ~10.5min, 93% >=3min. Pilot (2 pages): 29 new, 100% playable +
visible + tagged (performers sparse — xvideos 'new' is amateur-heavy; /models/ tagged
mostly on studio rips).

- XVideosBrowseScraper (JSON-LD + page-parse models/tags), in ALL_BROWSE_SCRAPERS.
- deep_crawl._PAGE_CAP: per-sitetag depth cap; xvideoscom=1800 (~newest 50k). At the cap
  the tube is marked exhausted (reset -> incremental re-sweep) so a mega-tube cannot
  monopolize the round-robin or balloon the DB.
- ported yesporn.py into the public repo (was prod-only, like hdporngg) ending the
  __init__ public/prod divergence.

youporn rejected: JSON-LD lacks actor/keywords, its /pornstar//category/ links are A-Z
nav not scene-specific. xhamster: 429/Cloudflare from the VPS IP.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 11:16:44 +02:00
..
direct_scrapers feat(deep-crawl): xvideos browse source (capped) + per-tube page cap 2026-06-03 11:16:44 +02:00
__init__.py fix(scheduler): per-connector hard timeout + reorder mangoporn-first 2026-05-31 11:19:13 +02:00
base.py Initial commit 2026-05-20 10:10:22 +02:00
dooplay.py fix(connectors/dooplay): max_pages cap to unblock movie ingest queue 2026-05-28 23:23:50 +02:00
paradisehill.py fix(movies): paradisehill delta date-granularity + browse cadence docs 2026-06-01 17:00:10 +02:00
stashdb.py Initial commit 2026-05-20 10:10:22 +02:00
tpdb.py Initial commit 2026-05-20 10:10:22 +02:00