goon/app/extractors/tubes/hqporner.py
https://github.com/goon-foss/goon 642f1ab8b8 Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector
Mobile / OTA:
- Enable Expo Updates (app.json + AndroidManifest) → api.goon-foss.org
- Bump 0.1.6 → 0.1.9 (build.gradle, app.json, appVersion.ts, main.py /version)
- backend.ts: default public backend auto-connect (no manual login)

WebView fallback fix (PlayerScreen INJECTED_JS):
- Auto-dismiss cookie/consent gates (hqporner et al. blocked kt_player init)
- Context-scoped: only clicks consent buttons inside cookie/gdpr containers
- Retry window for <source>.src polling raised 5→15 ticks (post-dismiss init)

Resolver:
- Series-position + modifier mismatch detector (Episode 2≠4, BTS/unedited)
  → composite_score hard-reject / cap; wired into scene_score + bulk_dedup
- aggregator-mode candidate query: LIMIT 500 + title-match ordering

Connectors:
- porndoe.com browse scraper (JSON-LD VideoObject) — theporndude audit pilot

landing: APK links → goon-v0.1.9.apk

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 11:20:57 +02:00


r"""hqporner.com: direct stream extractor.
Page → iframe (mydaddy.cc or hqwo.cc; the hosting changes over time) → extract
the mp4 URLs from `<source>` tags or from other places in the player HTML/JS.
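
A minimal sketch of the page → iframe step, using only the standard library.
It reuses the shape of `_IFRAME_RE` and swaps in `urllib.parse.urljoin` for the
manual `//` and `/` prefixing done in `extract()`. `find_iframe` and the sample
markup are hypothetical, and the sketch assumes the player iframe is the first
one inside `#playerWrapper`.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Same pattern as _IFRAME_RE: the first <iframe src=...> inside the #playerWrapper div.
IFRAME_RE = re.compile(
    r'<div[^>]+id=["\']?playerWrapper["\']?[^>]*>.*?<iframe[^>]+src=["\']([^"\']+)',
    re.IGNORECASE | re.DOTALL,
)


def find_iframe(page_html: str, base: str = "https://hqporner.com/") -> Optional[str]:
    m = IFRAME_RE.search(page_html)
    if not m:
        return None
    # urljoin turns "//host/..." into "https://host/..." and "/path" into
    # "https://hqporner.com/path", like the manual prefixing in extract().
    return urljoin(base, m.group(1).strip())


page = '<div id="playerWrapper"><iframe src="//mydaddy.cc/video/abc123/"></iframe></div>'
print(find_iframe(page))  # https://mydaddy.cc/video/abc123/
```

One difference worth knowing: `urljoin` also resolves a bare relative `src`
such as `embed/x`, which `extract()` leaves unchanged.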

Two hoster generations (both active, for different scenes):
1. **Old: mydaddy.cc/video/<hash>/**: a FluidPlayer wrapper with the `<source>`
tags directly in the iframe HTML:
`<source src="//s12.bigcdn.cc/.../360.mp4" title="360p">` + 720p + 1080p.
2. **New: hqwo.cc/player/<hash>?img=<base64>**: the `<source>` tags sit inside a
JavaScript string literal (`$("#jw").html("<video>...<source src=\"...\">")`).
The quotes are escaped (`\"`), so a plain regex for `<source[^>]+src="..."`
does not match; the HTML has to be unescaped before the regex match.
URL pattern: `https://hqwo.cc/pubs/<pub_id>/<quality>.mp4`, where pub_id differs
from the player_hash in the iframe URL and is generated server-side per request.
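
A sketch of how one pattern can cover both generations, assuming the markup
shapes described above (the sample strings, the `/v/` path and the `abc`
pub_id are invented). `SOURCE_RE` mirrors `_SOURCE_RE` with `[.]` in place of an
escaped dot, `unescape_js_quotes` applies the same replace chain as `extract()`,
and `parse_sources` is a hypothetical helper that also normalizes
protocol-relative URLs and drops duplicates.

```python
import re
from typing import List, Optional, Tuple

# Same shape as _SOURCE_RE: a protocol-relative or absolute .mp4 src, then an
# optional title attribute with the quality label. "[.]" matches a literal dot.
SOURCE_RE = re.compile(
    r'<source[^>]+src=["\']((?://|https?://)[^"\']+[.]mp4[^"\']*)["\']'
    r'(?:[^>]+title=["\']([^"\']+))?',
    re.IGNORECASE,
)


def unescape_js_quotes(text: str) -> str:
    # Same replace chain as extract(): unescape \" and \' first, then \\.
    return text.replace('\\"', '"').replace("\\'", "'").replace("\\\\", "\\")


def parse_sources(html: str) -> List[Tuple[str, Optional[str]]]:
    seen = set()
    out = []
    for m in SOURCE_RE.finditer(unescape_js_quotes(html)):
        url = m.group(1).strip()
        if url.startswith("//"):
            url = "https:" + url  # protocol-relative -> https
        if url in seen:
            continue  # the same <source> may be emitted more than once
        seen.add(url)
        out.append((url, (m.group(2) or "").strip() or None))
    return out


# Invented samples: old-style raw HTML, and new-style HTML inside a JS literal.
old = '<source src="//s12.bigcdn.cc/v/360.mp4" title="360p">'
new = '$("#jw").html("<video><source src=\\"https://hqwo.cc/pubs/abc/720.mp4\\" title=\\"720p\\"></video>")'
print(SOURCE_RE.findall(new))  # []: the escaped quotes defeat the pattern
print(parse_sources(old))
print(parse_sources(new))
```

As with `_SOURCE_RE`, a `title` attribute that comes before `src` is not
captured, so such a tag yields a `None` quality. The unescape chain only touches
quotes and backslashes; other JS escapes (for example `\u003c` for `<`) would
need extra handling if the player ever used them.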
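
Fallbacks when neither pattern yields sources: first the generic packed-JS
hoster extractor (`extract_stream_from_hoster`), then a `hoster`-type source,
which the mobile app opens in a WebView (the FluidPlayer JS resolves the URL
after a user click).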
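
The fallback order can be pictured as a chain of strategies tried in turn, with
the WebView handoff as the unconditional last step. This is an illustrative
restructuring, not how `extract()` is written (it inlines each step); `Stream`,
`Strategy` and `resolve` are hypothetical stand-ins, with `Stream` playing the
role of `StreamSource`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stream:  # hypothetical stand-in for StreamSource
    link: str
    type: str
    quality: Optional[str] = None


# A strategy takes the iframe URL and returns streams, or None/[] if it found nothing.
Strategy = Callable[[str], Optional[List[Stream]]]


def resolve(iframe_url: str, strategies: List[Strategy]) -> List[Stream]:
    for strategy in strategies:
        found = strategy(iframe_url)
        if found:  # first non-empty result wins
            return found
    # Terminal case: hand the iframe itself to the client as a "hoster" source,
    # to be opened in a WebView.
    return [Stream(link=iframe_url, type="hoster")]


no_direct = lambda url: None  # e.g. no <source> tags in the iframe HTML
packed_js = lambda url: [Stream(link="https://cdn.example/v.m3u8", type="m3u8")]
print(resolve("https://mydaddy.cc/video/abc/", [no_direct, packed_js]))
print(resolve("https://mydaddy.cc/video/abc/", [no_direct]))
```

Making the `hoster` entry the terminal case means that, once an iframe URL is
known, a caller always gets at least one entry back, while final resolution is
pushed onto the client.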
"""

from __future__ import annotations

import logging
import re

from app.extractors._fetch import _DEFAULT_UA, browser_get, fetch_tube_html
from app.extractors._models import StreamSource
from app.extractors.hoster import extract_stream_from_hoster

log = logging.getLogger(__name__)

_IFRAME_RE = re.compile(
    r'<div[^>]+id=["\']?playerWrapper["\']?[^>]*>.*?<iframe[^>]+src=["\']([^"\']+)',
    re.IGNORECASE | re.DOTALL,
)

# Matches `<source src="...mp4" title="...">` with an optional title. After
# unescaping (`\"` → `"`), this regex matches both raw HTML (mydaddy.cc) and
# JS-embedded HTML (hqwo.cc).
_SOURCE_RE = re.compile(
    r'<source[^>]+src=["\']((?://|https?://)[^"\']+\.mp4[^"\']*)["\'](?:[^>]+title=["\']([^"\']+))?',
    re.IGNORECASE,
)


def extract(page_url: str, *, timeout: float = 60.0) -> list[StreamSource] | None:
    page_html = fetch_tube_html(page_url, timeout=timeout)
    m = _IFRAME_RE.search(page_html)
    if not m:
        log.warning("hqporner: no iframe in %s", page_url)
        return None
    iframe_src = m.group(1).strip()
    # Normalize protocol-relative (`//host/...`) and root-relative (`/path`) iframe URLs.
    if iframe_src.startswith("//"):
        iframe_src = "https:" + iframe_src
    elif iframe_src.startswith("/"):
        iframe_src = f"https://hqporner.com{iframe_src}"

    headers = {
        "User-Agent": _DEFAULT_UA,
        "Accept": "text/html,application/xhtml+xml",
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://hqporner.com/",
    }
    try:
        r = browser_get(iframe_src, headers=headers, timeout=timeout, follow_redirects=True)
        r.raise_for_status()
    except Exception as e:
        log.warning("hqporner iframe fetch %s failed: %s", iframe_src, e)
        return None

    # hqwo.cc embeds `<source>` tags inside `$.html("<video>...<source src=\"...\">")`
    # JS string literals, so the quotes are escaped; mydaddy.cc serves plain HTML with
    # raw quotes. Unescape the commonly escaped sequences so the same regex handles both.
    iframe_html = (
        r.text.replace('\\"', '"').replace("\\'", "'").replace("\\\\", "\\")
    )
    # The CDNs (bigcdn.cc, hqwo.cc) bind the URL to the Referer of the embed iframe
    # (host hqwo.cc / mydaddy.cc), not hqporner.com, so the proxy is given the
    # iframe host as referer.
    from urllib.parse import urlparse as _urlparse
    iframe_host = _urlparse(iframe_src).hostname or ""
    iframe_referer = f"https://{iframe_host}/" if iframe_host else iframe_src

    # De-dup by URL: hqwo.cc emits `<source>` tags twice (adblock + non-adblock branches).
    seen_urls: set[str] = set()
    sources: list[StreamSource] = []
    for sm in _SOURCE_RE.finditer(iframe_html):
        url = sm.group(1).strip()
        if url.startswith("//"):
            url = "https:" + url
        if url in seen_urls:
            continue
        seen_urls.add(url)
        title = (sm.group(2) or "").strip()
        # `force_proxy=True` (2026-05-20): the bigcdn.cc/flyflv CDNs are IP-bound, and
        # flyflv carries `ip=46.62.219.154` in the URL path. Direct playback on mobile
        # gets 404/403, and the subsequent fallback to the proxy causes flicker;
        # force_proxy makes the mobile app use the proxied URL from the start.
        # Bug report e8ddd8d4: "clicking opens an ad" when _vps_blocked_fallback
        # kicked in (hqporner page ads). force_proxy + native mp4 = quality picker
        # + native player.
        sources.append(StreamSource(
            link=url, quality=title or None, type="mp4", referer=iframe_referer,
            raw={"force_proxy": True},
        ))

    if sources:
        return sources
    # Fallback 1: some mydaddy.cc iframes use packed JS (JWPlayer).
    stream_url = extract_stream_from_hoster(
        iframe_src, referer="https://hqporner.com/", timeout=timeout,
    )
    if stream_url:
        type_hint = "m3u8" if ".m3u8" in stream_url.lower() else "mp4"
        return [StreamSource(link=stream_url, type=type_hint, referer=iframe_referer)]
    # Fallback 2: return the iframe URL as a `hoster` source; the mobile app opens it
    # in a WebView, where the FluidPlayer JS resolves the stream URL itself after a
    # user click or after passing the adblock check.
    log.info("hqporner: using hoster fallback for %s", iframe_src)
    return [StreamSource(link=iframe_src, type="hoster")]