CORRECTION to the earlier "unrecoverable" call: the /post/<id> page is alive (200) and DOES expose the scene's own freshly signed poster via og:image / <video poster> (post id embedded in the path, current timestamp). Only the STORED thumbnail URL had rotted. Search and listings don't re-surface old posts (0 overlap), but per-post fetches work.

scripts/refresh_sxyprn_thumbs.py: iterate live sxyprn sources, fetch each post page, extract the fresh og:image, and UPDATE thumbnail_url (verified: refreshed URLs return 200).

_job_refresh_sxyprn_thumbs: every 12h, refresh the 1200 least-recently-updated sources (cycling the ~19k catalog within the expiry window).

Pairs with the scene_resolver overwrite fix so refreshed thumbnails stick.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
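The claim that 1200 sources every 12h cycles the catalog within the expiry window can be checked with a back-of-the-envelope calculation. This is a minimal sketch: `refresh_cycle_days` is a hypothetical helper, and 19_000 is an assumed round figure standing in for "~19k".

```python
import math


def refresh_cycle_days(catalog_size: int, batch_size: int, interval_hours: float) -> float:
    """Days needed for a fixed-size periodic job to touch every source once."""
    runs = math.ceil(catalog_size / batch_size)  # job runs needed for one full pass
    return runs * interval_hours / 24


# Figures from the commit message; 19_000 is an assumed round value for "~19k".
days = refresh_cycle_days(19_000, 1_200, 12)
print(f"full cycle ~ {days:.1f} days")
```

With these numbers a full pass takes 16 runs, about 8 days. That fits the "~weeks" token lifetime described in the script's docstring only if tokens actually live longer than that, so the batch size is worth revisiting if the catalog grows.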
94 lines
3.2 KiB
Python
"""Refresh expired sxyprn thumbnails from their (still live) post pages (bug 2026-06-10).

sxyprn/trafficdeposit thumbnails are time-signed and ROTATE (the token expires after
~weeks → 404), BUT the /post/<id>.html page itself is ALIVE (200) and contains a fresh
scene poster in `og:image` / `<video poster=>` (post id in the path, current timestamp).
Search/listings do NOT re-surface old posts (0 overlap), so the only way in is to fetch
each post page individually.

Strategy: iterate live sxyprn playback_sources, fetch the post page, extract og:image
(a freshly signed thumbnail), UPDATE thumbnail_url. Resumable (--offset), idempotent.
Thumbnails keep rotating → run periodically (scheduled job _job_refresh_sxyprn_thumbs).

Usage (worker container):
    python scripts/refresh_sxyprn_thumbs.py [--limit N] [--offset M]
"""
from __future__ import annotations

import html
import re
import sys
import time

from sqlalchemy import text

from app.db import session_scope
from app.extractors import browser_get

# og:image with `property` before `content` (the usual attribute order).
_OG_RE = re.compile(r"og:image[\"'][^>]*content=[\"']([^\"']+)", re.IGNORECASE)
# og:image with `content` before `property`.
_OG_RE2 = re.compile(r"content=[\"']([^\"']+)[\"'][^>]*property=[\"']og:image", re.IGNORECASE)
# Fallback: the player's <video poster="..."> attribute.
_POSTER_RE = re.compile(r"<video[^>]*poster=[\"']([^\"']+)", re.IGNORECASE)


def _fresh_thumb(page_url: str) -> str | None:
    """Fetch the post page → fresh poster (og:image / video poster). None when the page
    is dead (Post Not Found) or has no poster."""
    try:
        page = browser_get(page_url, timeout=25).text
    except Exception:
        # Network/HTTP failure: treat like "no thumbnail" and move on.
        return None
    if "Post Not Found" in page:
        return None
    m = _OG_RE.search(page) or _OG_RE2.search(page) or _POSTER_RE.search(page)
    if not m:
        return None
    # Attribute values may carry HTML entities (e.g. `&amp;` in a signed query string).
    u = html.unescape(m.group(1).strip())
    if u.startswith("//"):
        # Protocol-relative URL → force https.
        u = "https:" + u
    # Only accept thumbnails hosted by sxyprn or its CDN.
    if "trafficdeposit.com" not in u and "sxyprn" not in u:
        return None
    return u


def main() -> None:
    limit = 1_000_000
    offset = 0
    # Minimal argv parsing: --limit N, --offset M.
    for i, a in enumerate(sys.argv):
        if a == "--limit" and i + 1 < len(sys.argv):
            limit = int(sys.argv[i + 1])
        if a == "--offset" and i + 1 < len(sys.argv):
            offset = int(sys.argv[i + 1])

    with session_scope() as s:
        # Stable ORDER BY id so --offset can resume an interrupted run.
        rows = s.execute(
            text(
                "SELECT id, page_url FROM playback_sources "
                "WHERE origin='tube:sxyprncom' AND dead_at IS NULL "
                "ORDER BY id OFFSET :off LIMIT :lim"
            ).bindparams(off=offset, lim=limit)
        ).all()

    print(f"sxyprn sources to refresh: {len(rows)} (offset={offset})", flush=True)
    updated = 0
    nothumb = 0  # dead post, fetch error, or no usable poster
    for idx, (pbid, page_url) in enumerate(rows):
        thumb = _fresh_thumb(page_url)
        if thumb:
            # One short transaction per row so progress survives an interruption.
            with session_scope() as s:
                s.execute(
                    text("UPDATE playback_sources SET thumbnail_url=:t WHERE id=:i").bindparams(
                        t=thumb, i=pbid
                    )
                )
                s.commit()
            updated += 1
        else:
            nothumb += 1
        if (idx + 1) % 200 == 0:
            print(f" {idx+1}/{len(rows)} updated={updated} no_thumb={nothumb}", flush=True)
        # Throttle: at most ~4 requests/s against sxyprn.
        time.sleep(0.25)
    print(f"DONE refreshed={updated}/{len(rows)} no_thumb={nothumb}", flush=True)


if __name__ == "__main__":
    main()
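The parsing half of `_fresh_thumb` can be exercised offline, without `browser_get` or the database. The sketch below is a hypothetical twin, `extract_poster`, that takes raw HTML instead of a URL and reuses the script's three patterns (og:image in either attribute order, then `<video poster>`). It also runs `html.unescape` on the captured value, on the assumption that signed query strings may arrive entity-encoded (`&amp;`). The sample snippets and the `b.trafficdeposit.com` host are invented for illustration.

```python
from __future__ import annotations

import html
import re

# Same patterns as the script: og:image (property first / content first), then <video poster>.
_OG_RE = re.compile(r"og:image[\"'][^>]*content=[\"']([^\"']+)", re.IGNORECASE)
_OG_RE2 = re.compile(r"content=[\"']([^\"']+)[\"'][^>]*property=[\"']og:image", re.IGNORECASE)
_POSTER_RE = re.compile(r"<video[^>]*poster=[\"']([^\"']+)", re.IGNORECASE)


def extract_poster(page: str) -> str | None:
    """Hypothetical offline version of _fresh_thumb's parsing step (no network, no DB)."""
    if "Post Not Found" in page:
        return None
    m = _OG_RE.search(page) or _OG_RE2.search(page) or _POSTER_RE.search(page)
    if not m:
        return None
    url = html.unescape(m.group(1).strip())  # decode entities such as &amp;
    if url.startswith("//"):
        url = "https:" + url  # protocol-relative → https
    if "trafficdeposit.com" not in url and "sxyprn" not in url:
        return None  # reject posters hosted elsewhere
    return url


# Invented sample pages covering each branch.
SAMPLES = {
    "property_first": '<meta property="og:image" content="//b.trafficdeposit.com/thumb/123.jpg?t=1&amp;s=x">',
    "content_first": '<meta content="https://b.trafficdeposit.com/p/9.jpg" property="og:image">',
    "poster_only": '<video class="player" poster="https://b.trafficdeposit.com/v/7.jpg">',
    "dead": "<h1>Post Not Found</h1>",
    "foreign_host": '<meta property="og:image" content="https://example.org/x.jpg">',
}

for name, snippet in SAMPLES.items():
    print(f"{name}: {extract_poster(snippet)}")
```

Keeping the HTML-to-URL step as a pure function like this makes the regex fallbacks easy to pin down with fixtures, while the fetch and the UPDATE stay in the network- and DB-bound code.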