goon/app/extractors/tubes/_source_getfile.py
jtrzupek e780e1ae6f fix(hdporngg+fullmovies): native get_file, skip broken 4K — "loading forever"
User: "hdporngg loading forever". DevTools + cross-IP investigation (not guessing):
- site is alive (sample scenes 200; the one earlier 404 was a single removed video,
  not the site — my earlier "site dead" was a hasty generalization).
- both are the same platform (<source src=.../get_file/8512/...mp4>), no function/0.
- the get_file 302 is fast (~100ms) but the 2160p/4K source on fpvcdn.com TIMES OUT
  (~30s); 720p/480p resolve in ~1s. The player loading 4K first = the "loading forever".
- the final fpvcdn URL embeds the requester IP (ip=<fetcher>), so it is IP-bound to
  whoever resolves it; BUT the get_file URL itself is stateless (a fresh session works)
  and stays valid >=90s. So a VPS resolve would bind to the VPS IP (mobile 403), while
  returning the get_file URL UNRESOLVED lets the phone follow the 302 itself ->
  fpvcdn binds to the phone IP -> plays.
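
The IP-binding finding above can be illustrated with a small check. This is a minimal sketch, assuming the resolved CDN URL carries the address as an `ip` query parameter; the helper names `bound_ip` and `playable_from`, the sample URL, and the documentation-range addresses are all hypothetical and not taken from the site.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlsplit


def bound_ip(resolved_url: str) -> str | None:
    """Return the value of the `ip` query parameter, or None if it is absent."""
    values = parse_qs(urlsplit(resolved_url).query).get("ip")
    return values[0] if values else None


def playable_from(resolved_url: str, client_ip: str) -> bool:
    """True when the URL is not IP-bound, or is bound to `client_ip`."""
    ip = bound_ip(resolved_url)
    return ip is None or ip == client_ip


# Hypothetical URL as a VPS at 203.0.113.7 might receive it after the 302.
vps_resolved = "https://cdn.example/video.mp4?ip=203.0.113.7&t=1"

print(playable_from(vps_resolved, "203.0.113.7"))   # the VPS itself
print(playable_from(vps_resolved, "198.51.100.9"))  # a phone on another IP
```

Under that assumption, a URL resolved on the VPS is only usable from the VPS, which is why the unresolved get_file URL is handed to the phone instead.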

Fix: new _source_getfile resolver returns get_file URLs as mobile_direct (skip 4K),
phone resolves the 302 in-session. Native, multi-quality, no WebView, no proxy.
Replaces fullmovies' old force_proxy+4K extractor and the WebView fallback for both.
Backend-verified: resolve -> 720/480 mobile_direct, get_file fresh fetch -> 206. Pending
on-device confirmation (emulator unstable; same mechanism as porn00/freshporno which work).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 22:48:55 +02:00

"""Shared resolver for tube sites with `<source src=.../get_file/...mp4>` and an IP-bound CDN
(hdporn.gg, fullmovies.xxx: the same platform, `/get_file/8512/`).

2026-06-04 (DevTools + cross-IP investigation; fixes "hdporngg loading forever"):
get_file 302-redirects to `fpvcdn.com` with the **fetcher's IP embedded in the URL**
(`ip=<whoever-fetched>`), so the final CDN URL is IP-bound to whoever resolves it. That is
why we return the get_file URL **UNRESOLVED** (mobile_direct): ExoPlayer on the phone follows
the 302 itself, fpvcdn binds to the phone's IP, and playback works. (Resolving on the VPS
would bind to the VPS IP, and the phone would get a 403.)

Verified: get_file is STATELESS (a fresh session works) and valid for ≥90s, so the phone has
enough time (resolve → picker → tap takes seconds). The **2160p/4K source consistently times
out on fpvcdn (~30s)**, so we SKIP it (this was the cause of "loading forever": the player
loaded 4K first); 720p/480p/1080p resolve in ~1s.
"""
from __future__ import annotations

import logging
import re

from app.extractors._fetch import _DEFAULT_IMPERSONATE, _DEFAULT_UA, _HAS_CURL_CFFI
from app.extractors._models import StreamSource

log = logging.getLogger(__name__)

# Group 1: the get_file URL; group 2: the quality label (title= or label= attribute).
_SOURCE_RE = re.compile(
    r"<source\s+src=['\"]([^'\"]+/get_file/[^'\"]+\.mp4[^'\"]*)['\"]"
    r"[^>]*?(?:title|label)=['\"]?([^'\">]*)",
    re.IGNORECASE,
)

# fpvcdn does not serve 4K (30s timeout); skip it so the player does not hang on it.
_SKIP_QUALITY_RE = re.compile(r"2160|1440|4k", re.IGNORECASE)


def resolve(page_url: str, base_url: str, *, timeout: float = 30.0) -> list[StreamSource] | None:
    """Return unresolved get_file sources, highest quality first, or None if none are playable."""
    if not _HAS_CURL_CFFI:
        log.info("source_getfile: curl_cffi unavailable - %s", page_url)
        return None
    from curl_cffi import requests as cf

    try:
        html = cf.get(
            page_url, impersonate=_DEFAULT_IMPERSONATE,
            headers={"User-Agent": _DEFAULT_UA, "Accept": "text/html,application/xhtml+xml"},
            timeout=timeout,
        ).text
    except Exception as e:
        log.info("source_getfile: page fetch failed %s: %s", page_url, e)
        return None

    seen: set[str] = set()
    out: list[StreamSource] = []
    for m in _SOURCE_RE.finditer(html):
        url = m.group(1).strip()
        quality = (m.group(2) or "").strip()
        # Protocol-relative URLs need an explicit scheme.
        if url.startswith("//"):
            url = "https:" + url
        if url in seen:
            continue
        seen.add(url)
        if _SKIP_QUALITY_RE.search(quality):
            log.info("source_getfile: skip broken-CDN quality %r on %s", quality, page_url)
            continue
        # The link stays unresolved so the phone follows the 302 from its own IP.
        out.append(StreamSource(
            link=url, type="mp4", quality=quality or None,
            referer=base_url + "/", raw={"mobile_direct_ok": True},
        ))

    if not out:
        log.info("source_getfile: no playable <source>/get_file on %s", page_url)
        return None

    def _rank(s: StreamSource) -> int:
        # Sort by the first 3-4 digit number in the label; unlabeled sources go last.
        mm = re.search(r"(\d{3,4})", s.quality or "")
        return int(mm.group(1)) if mm else -1

    out.sort(key=_rank, reverse=True)
    return out
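
The extraction, skip, and ranking steps can be tried offline without a network fetch. The following is a minimal sketch: the two patterns are copied from the module above, while `pick_sources`, `SAMPLE_HTML`, and the `site.example` URLs are hypothetical stand-ins, and plain tuples replace `StreamSource` so the block has no project imports.

```python
from __future__ import annotations

import re

# Patterns copied from the module above.
_SOURCE_RE = re.compile(
    r"<source\s+src=['\"]([^'\"]+/get_file/[^'\"]+\.mp4[^'\"]*)['\"]"
    r"[^>]*?(?:title|label)=['\"]?([^'\">]*)",
    re.IGNORECASE,
)
_SKIP_QUALITY_RE = re.compile(r"2160|1440|4k", re.IGNORECASE)

# Hypothetical page fragment; real markup may differ.
SAMPLE_HTML = """
<video>
  <source src="//site.example/get_file/8512/abc/480.mp4/" type="video/mp4" label="480p">
  <source src="//site.example/get_file/8512/abc/2160.mp4/" type="video/mp4" label="2160p">
  <source src="//site.example/get_file/8512/abc/720.mp4/" type="video/mp4" label="720p">
</video>
"""


def pick_sources(html: str) -> list[tuple[str, str]]:
    """Return (quality, url) pairs, skipping 4K/1440p, highest quality first."""
    picked: list[tuple[str, str]] = []
    seen: set[str] = set()
    for m in _SOURCE_RE.finditer(html):
        url, quality = m.group(1).strip(), m.group(2).strip()
        if url.startswith("//"):
            url = "https:" + url  # protocol-relative URL
        if url in seen or _SKIP_QUALITY_RE.search(quality):
            continue
        seen.add(url)
        picked.append((quality, url))

    def rank(item: tuple[str, str]) -> int:
        mm = re.search(r"(\d{3,4})", item[0])
        return int(mm.group(1)) if mm else -1

    picked.sort(key=rank, reverse=True)
    return picked


for quality, url in pick_sources(SAMPLE_HTML):
    print(quality, url)
```

With this sample input the 2160p entry is dropped and 720p is listed before 480p, so a player that starts with the first entry no longer begins with the source that times out.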