goon/app/extractors/tubes/hqfap.py
jtrzupek 960bc75be4 fix(hqfap): reject 3MB video_down.mp4 stub (placeholder, not real video)
hqfap migrated its JSON-LD contentUrl (and the *.workers.dev mirror) to /upload/videos/video_down.mp4, which serves a FIXED ~3.04MB file for EVERY scene regardless of declared length (verified 5/5 scenes at 14-47min all = 3.04MB, 2026-06-21). It is a placeholder/'server down' clip, not the content — the browser's own player streamed the same stub via MediaSource. We were handing users that 3MB stub (reports c382d441/ef10b946). Now reject the video_down.mp4 contentUrl and return no source, so scenes fall through to other sources or show no playback instead of a fake clip. Real older scenes (cdnde.com / okcdn.ru direct mp4) still resolve. This also makes the proxy-fallback question moot — there is no source to proxy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-21 23:01:19 +02:00
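
The rejection rule from the commit message can be sketched in isolation. This is a minimal illustration, not the code in the file: the helper name `is_placeholder` and the example URLs are hypothetical, and it compares the parsed URL path, whereas the extractor itself uses a plain substring test on the whole `contentUrl`.

```python
from urllib.parse import urlsplit

# Placeholder path named in the commit message.
STUB_PATH = "/upload/videos/video_down.mp4"


def is_placeholder(content_url: str) -> bool:
    """Return True when the URL path ends with the fixed placeholder file."""
    return urlsplit(content_url).path.endswith(STUB_PATH)


# Hypothetical URLs, for illustration only.
stub_main = "https://hqfap.com/upload/videos/video_down.mp4"
stub_mirror = "https://mirror.example.workers.dev/upload/videos/video_down.mp4?t=1"
real_clip = "https://v4.cdnde.com/clip_720p.mp4?video=abc"

for url in (stub_main, stub_mirror, real_clip):
    print(url, "->", "reject" if is_placeholder(url) else "keep")
```

Matching on the path rather than the full string ignores query parameters, which may or may not be what the real mirror URLs need; the substring test in the extractor is the more permissive choice.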


"""hqfap.com — direct stream extractor.
The scene page (SSR, behind Cloudflare → curl_cffi via fetch_tube_html) carries a
JSON-LD VideoObject whose `contentUrl` is a direct mp4. Two hosting generations
exist in the catalog:
- newer scenes: `v4.cdnde.com/...?video=<b64>&time=<epoch>&ip=<addr>`; the `ip`
  param is NOT enforced (cross-IP test 2026-06-10: local ISP and a Hetzner VPS
  both got 206), and the token is time-bound → resolving on demand yields a fresh URL,
- older scenes: `vd*.okcdn.ru/?expires=...&srcIp=...&sig=...` (ok.ru), also
  portable cross-IP (206 from an IP other than the fetcher's).
Mobile plays direct (mobile_direct auto-detect in playback.py), zero proxy/WebView.
"""
from __future__ import annotations
import json
import logging
import re
from app.extractors._fetch import fetch_tube_html
from app.extractors._models import StreamSource
log = logging.getLogger(__name__)
_JSONLD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
# Fallback for when the JSON-LD does not parse as JSON (trailing comma etc.).
_CONTENT_URL_RE = re.compile(r'"contentUrl"\s*:\s*"([^"]+)"')
_QUALITY_RE = re.compile(r"_(\d{3,4})p\.mp4", re.IGNORECASE)
def extract(page_url: str, *, timeout: float = 60.0) -> list[StreamSource] | None:
    html = fetch_tube_html(page_url, timeout=timeout)
    content_url: str | None = None
    # Prefer the contentUrl of the first VideoObject found in any JSON-LD block.
    for m in _JSONLD_RE.finditer(html):
        raw = m.group(1).strip()
        if not raw:
            continue
        try:
            data = json.loads(raw)
        except (json.JSONDecodeError, ValueError):
            continue
        items = data if isinstance(data, list) else [data]
        for obj in items:
            if isinstance(obj, dict) and obj.get("@type") == "VideoObject":
                content_url = (obj.get("contentUrl") or "").strip() or None
                break
        if content_url:
            break
    # Nothing usable from parsed JSON-LD: fall back to a raw regex match.
    if not content_url:
        rm = _CONTENT_URL_RE.search(html)
        content_url = rm.group(1).strip() if rm else None
    if not content_url or not content_url.startswith("http"):
        log.warning("hqfap: no contentUrl in JSON-LD for %s", page_url)
        return None
    # hqfap migrated: `/upload/videos/video_down.mp4` (+ the *.workers.dev mirror) serves
    # a FIXED ~3MB placeholder for EVERY scene, regardless of declared length
    # (5/5 scenes = 3.04MB at 14-47min, verified 2026-06-21; the browser's MediaSource
    # played the same stub; "server down" user reports c382d441/ef10b946). This is NOT
    # real video → treat it as no source (better none than a 3MB "server down" clip).
    # Real older scenes (cdnde.com / okcdn.ru direct mp4) pass through normally.
    if "/upload/videos/video_down.mp4" in content_url:
        log.info("hqfap: stub video_down.mp4 (placeholder, no real video) on %s", page_url)
        return None
    qm = _QUALITY_RE.search(content_url)
    quality = f"{qm.group(1)}p" if qm else None
    return [
        StreamSource(
            link=content_url,
            quality=quality,
            type="mp4",
            referer="https://hqfap.com/",
        )
    ]
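
The two-stage lookup above (parsed JSON-LD first, raw regex second) and the quality label can be tried without the network. The following is a rough standalone sketch under assumptions: `find_content_url`, `quality_label`, and both sample pages are invented for illustration, and unlike `extract` it keeps scanning when a VideoObject has an empty `contentUrl`.

```python
from __future__ import annotations

import json
import re

JSONLD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
CONTENT_URL_RE = re.compile(r'"contentUrl"\s*:\s*"([^"]+)"')
QUALITY_RE = re.compile(r"_(\d{3,4})p\.mp4", re.IGNORECASE)


def find_content_url(html: str) -> str | None:
    """Prefer a parsed VideoObject; fall back to a raw regex match."""
    for m in JSONLD_RE.finditer(html):
        try:
            data = json.loads(m.group(1))
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            continue
        for obj in data if isinstance(data, list) else [data]:
            if isinstance(obj, dict) and obj.get("@type") == "VideoObject":
                url = (obj.get("contentUrl") or "").strip()
                if url:
                    return url
    rm = CONTENT_URL_RE.search(html)
    return rm.group(1).strip() if rm else None


def quality_label(url: str) -> str | None:
    """Derive a label such as '720p' from a '_720p.mp4' suffix, if present."""
    qm = QUALITY_RE.search(url)
    return f"{qm.group(1)}p" if qm else None


# Made-up pages: one valid JSON-LD block, one broken by a trailing comma.
GOOD = ('<script type="application/ld+json">{"@type": "VideoObject", '
        '"contentUrl": "https://cdn.example/scene_720p.mp4"}</script>')
BROKEN = ('<script type="application/ld+json">{"@type": "VideoObject", '
          '"contentUrl": "https://cdn.example/scene_1080p.mp4",}</script>')

for page in (GOOD, BROKEN):
    url = find_content_url(page)
    print(url, quality_label(url) if url else None)
```

The regex fallback returns the string exactly as it appears in the page, so a URL written with JSON escapes (for example `\/`) would come back unescaped only on the parsed path, not on the fallback path.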