fix(scrapers): freshporno browse from /latest-updates/ not homepage root

The homepage root / is a KVS page served with cache-control: no-store and a fresh PHPSESSID per request. The server rotates its featured block and, on a cold session, can serve an old set instead of the newest scenes. Result: browse-latest skipped everything for 3 days because the root served content from 20 May, and no new freshporno scenes have appeared since 12 Jun (user report). Switch _listing_url to the explicit date-sorted /latest-updates/ feed (pagination: /latest-updates/N/), which is not subject to that rotation.
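
For illustration, the page-to-URL mapping after this change can be sketched standalone as below. The function name `listing_url` and the `_BASE` value are placeholders assumed for the example; the real method is `FreshpornoScraper._listing_url`, and the real `_BASE` is not shown in this commit.

```python
# Placeholder base URL (assumption); the scraper's actual _BASE is not part of this diff.
_BASE = "https://example.invalid"


def listing_url(page: int) -> str:
    """Map a 1-based page number to the date-sorted listing URL."""
    # Page 1 (and anything below it) is the bare feed URL.
    if page <= 1:
        return f"{_BASE}/latest-updates/"
    # Later pages append the page number as a path segment.
    return f"{_BASE}/latest-updates/{page}/"


if __name__ == "__main__":
    for p in (1, 2, 3):
        print(listing_url(p))
```

Pages at or below 1 all resolve to the bare feed URL, matching the `page <= 1` guard in the patched method.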

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
jtrzupek 2026-06-15 09:59:40 +02:00
parent 3714afa22f
commit 4b71689a95


@@ -5,8 +5,15 @@ Pilot #2 (after the shyfap failure). Hypothesis: freshporno preserves the original studio titl
 to canonical will work. Bonus: channel = studio 1:1 (Pure Taboo, Brazzers, etc.).
 URL patterns:
-- Listing: `/` (page 1), `/2/`, `/3/`, ... (last `/391/` at the time of writing)
+- Listing: `/latest-updates/` (page 1), `/latest-updates/2/`, ... (chronological feed)
 - Scene: `/videos/<slug>/`
+Listing: deliberately `/latest-updates/` rather than the root `/`. The root is a KVS
+homepage with `cache-control: no-store` and a fresh PHPSESSID per request; the server
+rotates the "featured" block there and on a cold session can serve an old set instead
+of the newest scenes (observed 2026-06-15: browse-latest skipped everything for 3 days
+because the root served scenes from 20 May; freshporno.org report). `/latest-updates/`
+is an explicit date-sorted feed, not affected by the rotation. Pagination: `/latest-updates/N/`.
 - Channels: `/channels/<slug>/` (= studio)
 - Models: `/models/<slug>/` (= performer)
 - Tags: `/tags/<slug>/` (= category)
@@ -61,8 +68,8 @@ class FreshpornoScraper(BaseBrowseScraper):
     def _listing_url(self, page: int) -> str:
         if page <= 1:
-            return f"{_BASE}/"
-        return f"{_BASE}/{page}/"
+            return f"{_BASE}/latest-updates/"
+        return f"{_BASE}/latest-updates/{page}/"
     def _extract_scene_urls(self, listing_html: str) -> list[str]:
         seen: set[str] = set()