goon/app/connectors/direct_scrapers/fourk69.py
jtrzupek 80fd83cb4e feat(tubes): add 4k69 + neporn browse scrapers, shared PlayTube base
4k69.com (~65k scenes): same PlayTube CMS as hqfap; common logic moved
to _playtube.py (sitemap catalog, JSON-LD, pills). Studio is classified by
matching category pills against the studios index page. Streams are
get_file URLs (fullmovies family), returned unresolved with mobile_direct;
2160p is skipped.

neporn.com: KVS engine, latest-updates listing, JSON-LD + video:duration
meta, performers from models links with a flashvars video_tags fallback
for fresh uploads. Resolved via _kvs; the final URL is portable across IPs.

superporn.com rejected: Cloudflare 403 from VPS on all TLS impersonations.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-10 18:15:13 +02:00
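The studio classification described in the commit message (match category pills against the names linked from `/studios`, compared after alphanumeric normalization) can be exercised without network access. This is a minimal, self-contained sketch that mirrors `_STUDIO_LINK_RE`, `_norm` and `_pick_studio` from the file below; the function names `studio_set_from_html` / `pick_studio` and the sample HTML are hypothetical, and the real `/studios` markup may differ.

```python
from __future__ import annotations

import re

# Same pattern as _STUDIO_LINK_RE: capture the slug after /videos/studio/.
STUDIO_LINK_RE = re.compile(r"href=['\"][^'\"]*/videos/studio/([^'\"]+)['\"]", re.IGNORECASE)


def norm(name: str) -> str:
    """Lowercase and drop every non-alphanumeric character."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def studio_set_from_html(html: str) -> set[str]:
    """Normalized studio slugs found in a /studios-style page (hypothetical markup)."""
    return {n for slug in STUDIO_LINK_RE.findall(html) if (n := norm(slug))}


def pick_studio(category_names: list[str], studios: set[str]) -> str | None:
    """Return the first category pill whose normalized name is a known studio."""
    for name in category_names:
        if norm(name) in studios:
            return name
    return None


# Hypothetical excerpt of a studios index page (one relative, one absolute link).
SAMPLE_HTML = """
<a href="/videos/studio/AdultTime">Adult Time</a>
<a href='https://4k69.com/videos/studio/21-sextury/'>21 Sextury</a>
"""

studios = studio_set_from_html(SAMPLE_HTML)
print(sorted(studios))                                     # ['21sextury', 'adulttime']
print(pick_studio(["Anal", "4K", "Adult Time"], studios))  # Adult Time
print(pick_studio(["Anal", "4K"], studios))                # None
```

Normalization makes the trailing slash and the hyphen in `21-sextury/` irrelevant, which is why the slug and the pill label "21 Sextury" compare equal.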


"""4k69.com: latest-vids browse scraper (PlayTube CMS, see _playtube.py).
Added 2026-06-10 (user request; the 2026-06-01 probe had rejected the site because the
home page looked "JS-rendered", which was wrong: scene pages have full SSR + JSON-LD).
7 video sitemaps ≈ 65k scenes; content is largely studio material (paysite re-uploads, 4K).
Difference from the base: a scene has NO dedicated studio field; studio names appear
as categories ("21 Sextury", "Adult Time") alongside regular ones ("Anal", "4K").
Classification: the list of all studios from `/studios` (fetched once per instance),
matched by normalized alphanumeric name (pill "Adult Time" vs slug "AdultTime").
The studio sometimes also appears as a title prefix, but the category is more reliable.
Playback: JSON-LD contentUrl + two extra get_file URLs in the HTML (2160m/720m/480m,
www.4kporno.xxx). Same platform as fullmovies/hdporngg: get_file binds the CDN URL
to the fetcher's IP, so the URLs are returned UNRESOLVED (mobile_direct) and the phone
follows the 302 from its own IP. The `4k69com` extractor skips 2160p (CDN time-outs,
as with fpvcdn).
"""
from __future__ import annotations

import logging
import re

from app.connectors.direct_scrapers._playtube import BasePlayTubeScraper
from app.extractors import browser_get

log = logging.getLogger(__name__)

# Studio slug from links like href=".../videos/studio/<slug>" on the /studios page.
_STUDIO_LINK_RE = re.compile(r"href=['\"][^'\"]*/videos/studio/([^'\"]+)['\"]", re.IGNORECASE)


def _norm(name: str) -> str:
    """`Adult Time` / `AdultTime` → `adulttime` (compares a pill against a studio slug)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


class FourK69Scraper(BasePlayTubeScraper):
    sitetag = "4k69com"
    base_url = "https://4k69.com"

    def __init__(self) -> None:
        super().__init__()
        # Loaded lazily on first use; None means "not fetched yet".
        self._studio_set: set[str] | None = None

    def _load_studio_set(self) -> set[str]:
        """Normalized names of all studios listed on /studios, fetched once per instance.

        An empty set means the fetch failed. This is graceful: scenes simply go
        without a studio (the composite still has performer + title + duration).
        The empty set is cached as well, so a failed fetch is not retried.
        """
        if self._studio_set is not None:
            return self._studio_set
        try:
            r = browser_get(f"{self.base_url}/studios", timeout=self._timeout)
            r.raise_for_status()
            # Normalize each slug once; drop slugs that normalize to "".
            self._studio_set = {n for m in _STUDIO_LINK_RE.findall(r.text) if (n := _norm(m))}
            log.info("4k69: studio list loaded (%d studios)", len(self._studio_set))
        except Exception as e:
            log.warning("4k69: studios page fetch failed: %s", e)
            self._studio_set = set()
        return self._studio_set

    def _pick_studio(self, category_names: list[str]) -> str | None:
        """Return the first category pill that names a known studio, else None."""
        studios = self._load_studio_set()
        if not studios:
            return None
        for name in category_names:
            if _norm(name) in studios:
                return name
        return None
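The playback note in the module docstring (get_file URLs handed to the client unresolved as mobile_direct, 2160p skipped) can be illustrated with a small filter. This is a hypothetical sketch only: the real `4k69com` extractor lives elsewhere, and the URL shape (`..._720m.mp4`), the `StreamCandidate` type, its fields and `plan_streams` are assumptions made for illustration, not this codebase's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical: assume the quality label appears in the get_file path as "_<N>m.mp4".
_QUALITY_RE = re.compile(r"_(\d{3,4})m\.mp4")

# 2160p is skipped (CDN time-outs, per the module docstring).
SKIPPED_HEIGHTS = {2160}


@dataclass
class StreamCandidate:
    """Hypothetical stream record; not the project's real type."""
    url: str
    height: int | None
    mobile_direct: bool  # True: the client follows the 302 itself, from its own IP


def plan_streams(get_file_urls: list[str]) -> list[StreamCandidate]:
    """Keep get_file URLs unresolved and drop skipped qualities.

    The server never requests these URLs: get_file binds the CDN URL to the
    requesting IP, so resolving here would yield a link the phone cannot use.
    """
    out: list[StreamCandidate] = []
    for url in get_file_urls:
        m = _QUALITY_RE.search(url)
        height = int(m.group(1)) if m else None
        if height in SKIPPED_HEIGHTS:
            continue
        out.append(StreamCandidate(url=url, height=height, mobile_direct=True))
    # Highest known quality first; unknown heights sort last.
    out.sort(key=lambda s: s.height or 0, reverse=True)
    return out


# Placeholder URLs (example.invalid); the real host and path layout may differ.
urls = [
    "https://cdn.example.invalid/get_file/1/abc/123/123_2160m.mp4/",
    "https://cdn.example.invalid/get_file/1/abc/123/123_480m.mp4/",
    "https://cdn.example.invalid/get_file/1/abc/123/123_720m.mp4/",
]
plan = plan_streams(urls)
print([s.height for s in plan])  # [720, 480]
```

The design point is that nothing in the sketch performs an HTTP request: the redirect is left for the device, so the CDN binds the final URL to the phone's IP rather than the scraper's.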