The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest, performer-driven) was "pornapp", a misleading leftover from the removed external porn-app API. It read like a dependency on a third-party "pornapp" service, which it is not: these are our own scrapers hitting 25+ tubes directly (kind=scraper, origin tube:<sitetag>).

Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME constant. The DB row was renamed in place (UPDATE name, same id), so all ingest_runs + external_records history stays linked. No behavior change: external_id keying (sitetag:url) and dedup are unaffected.

NOTE: the playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy format (resolve_playback parses it) and is intentionally left untouched.

Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
28 lines
932 B
Python
"""BaseDirectTubeScraper — kontrakt dla bezpośrednich scraperów tube'ów."""
from __future__ import annotations

import abc
from collections.abc import Iterator

from app.connectors.base import RawScene


class BaseDirectTubeScraper(abc.ABC):
    """Direct scraper contract. All scrapers feed into
    `Source(name=SCRAPER_SOURCE_NAME)` ("tube-scraper", renamed from "pornapp" on 2026-06-07)
    so that they inherit the resolver logic and the idempotent merge per external_id."""

    sitetag: str
    """Stable tube ID, used in external_id `f"{sitetag}:{url}"`. Matches the
    porn-app sitetag (hqpornercom, sxylandcom, etc.)."""

    @abc.abstractmethod
    def search(
        self,
        query: str,
        *,
        page: int = 1,
        limit: int | None = None,
    ) -> Iterator[RawScene]:
        """Search the tube by query (usually a performer name). Yield one RawScene per result."""
        raise NotImplementedError
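A minimal sketch of how a concrete scraper might satisfy this contract. It is illustrative only: `RawScene` here is a hypothetical stand-in (the real fields of `app.connectors.base.RawScene` are not shown in this file), the base class is a trimmed copy so the snippet runs on its own, and `ExampleTubeScraper`, its `exampletube` sitetag, and the canned catalog are invented for the example. The `page` argument is accepted but ignored in this sketch.

```python
from __future__ import annotations

import abc
from collections.abc import Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class RawScene:
    """Hypothetical stand-in for app.connectors.base.RawScene (real fields unknown)."""

    external_id: str
    title: str
    url: str


class BaseDirectTubeScraper(abc.ABC):
    """Trimmed copy of the contract so this sketch is self-contained."""

    sitetag: str

    @abc.abstractmethod
    def search(
        self, query: str, *, page: int = 1, limit: int | None = None
    ) -> Iterator[RawScene]:
        raise NotImplementedError


class ExampleTubeScraper(BaseDirectTubeScraper):
    """Hypothetical scraper that serves canned results instead of doing HTTP."""

    sitetag = "exampletube"

    # Assumed sample data: (title, url) pairs standing in for parsed search results.
    _CATALOG = [
        ("Alpha clip", "https://example.invalid/v/1"),
        ("Beta clip", "https://example.invalid/v/2"),
        ("Alpha again", "https://example.invalid/v/3"),
    ]

    def search(
        self, query: str, *, page: int = 1, limit: int | None = None
    ) -> Iterator[RawScene]:
        needle = query.lower()
        emitted = 0
        for title, url in self._CATALOG:
            if needle not in title.lower():
                continue
            if limit is not None and emitted >= limit:
                return
            emitted += 1
            # external_id follows the f"{sitetag}:{url}" keying from the docstring.
            yield RawScene(external_id=f"{self.sitetag}:{url}", title=title, url=url)


scenes = list(ExampleTubeScraper().search("alpha"))
```

Because `search` is declared with `@abc.abstractmethod`, instantiating `BaseDirectTubeScraper` directly raises `TypeError`; only subclasses that implement `search` can be constructed. Keying `external_id` on `sitetag:url` is what lets repeated runs of the same scraper produce the same IDs, and so merge idempotently.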