4k69.com (~65k scenes): same PlayTube CMS as hqfap - common logic moved to _playtube.py (sitemap catalog, JSON-LD, pills). Studio classified by matching category pills against the studios index page. Streams are get_file (fullmovies family) returned unresolved with mobile_direct, 2160p skipped.

neporn.com: KVS engine, latest-updates listing, JSON-LD + video:duration meta, performers from models links with flashvars video_tags fallback for fresh uploads. Resolve via _kvs; final URL portable cross-IP.

superporn.com rejected: Cloudflare 403 from VPS on all TLS impersonations.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
66 lines
2.7 KiB
Python
"""4k69.com: latest-vids browse scraper (PlayTube CMS, see _playtube.py).

Added 2026-06-10 (user request; the 2026-06-01 probe rejected the site as
"JS-rendered" after looking only at the home page, which was wrong: scene pages
have full SSR + JSON-LD). 7 video sitemaps ≈ 65k scenes; content is largely
studio-produced (paysite re-uploads, 4K).

Differences from the base class: a scene has NO dedicated studio field. Studio
names appear as categories ("21 Sextury", "Adult Time") alongside regular ones
("Anal", "4K"). Classification: the list of all studios comes from `/studios`
(fetched once per instance) and is matched on the normalized alphanumeric name
(pill "Adult Time" vs slug "AdultTime"). The studio sometimes also appears as a
title prefix, but the category is more reliable.

Playback: JSON-LD contentUrl plus two additional get_file links in the HTML
(2160m/720m/480m, www.4kporno.xxx). Same platform as fullmovies/hdporngg:
get_file binds the CDN URL to the fetcher's IP, so we return the URLs UNRESOLVED
(mobile_direct) and the phone follows the 302 from its own IP. The `4k69com`
extractor skips 2160p (CDN timeout, as with fpvcdn).
"""

from __future__ import annotations

import logging
import re

from app.connectors.direct_scrapers._playtube import BasePlayTubeScraper
from app.extractors import browser_get

log = logging.getLogger(__name__)

_STUDIO_LINK_RE = re.compile(r"href=['\"][^'\"]*/videos/studio/([^'\"]+)['\"]", re.IGNORECASE)


def _norm(name: str) -> str:
    """`Adult Time` / `AdultTime` → `adulttime` (for comparing a pill against a studio slug)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


class FourK69Scraper(BasePlayTubeScraper):
    sitetag = "4k69com"
    base_url = "https://4k69.com"

    def __init__(self) -> None:
        super().__init__()
        # Lazily populated cache: None = not fetched yet, empty set = fetch failed.
        self._studio_set: set[str] | None = None

    def _load_studio_set(self) -> set[str]:
        """Normalized names of all studios listed on /studios.

        An empty set means the fetch failed (graceful: scenes go out without a
        studio; the composite still has performer+title+dur).
        """
        if self._studio_set is not None:
            return self._studio_set
        try:
            r = browser_get(f"{self.base_url}/studios", timeout=self._timeout)
            r.raise_for_status()
            self._studio_set = {_norm(m) for m in _STUDIO_LINK_RE.findall(r.text) if _norm(m)}
            log.info("4k69: studio list loaded — %d studios", len(self._studio_set))
        except Exception as e:
            log.warning("4k69: studios page fetch failed: %s", e)
            self._studio_set = set()
        return self._studio_set

    def _pick_studio(self, category_names: list[str]) -> str | None:
        """Return the first category name that matches a known studio, else None."""
        studios = self._load_studio_set()
        if not studios:
            return None
        for name in category_names:
            if _norm(name) in studios:
                return name
        return None