goon/app/connectors/direct_scrapers/base.py
jtrzupek a196fcbcdb refactor(ingest): rename scraper Source name "pornapp" -> "tube-scraper"
The umbrella Source.name for all direct tube scrapers (deep-crawl, browse-latest,
performer-driven) was "pornapp" — a misleading leftover from the removed external
porn-app API. It read like a dependency on a third-party "pornapp" service; it is
not — these are our own scrapers hitting 25+ tubes directly (kind=scraper,
origin tube:<sitetag>). Renamed to "tube-scraper" via a single SCRAPER_SOURCE_NAME
constant; DB row renamed in place (UPDATE name, same id) so all ingest_runs +
external_records history stays linked. No behavior change — external_id keying
(sitetag:url) and dedup are unaffected.

NOTE: playback_sources.origin "pornapp:<sitetag>" prefix is a separate legacy
format (resolve_playback parses it) and is intentionally left untouched.

Verified on prod: row renamed (0 stray "pornapp"), new runs land on "tube-scraper".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 16:54:55 +02:00

28 lines
932 B
Python

"""BaseDirectTubeScraper: contract for direct tube scrapers."""
from __future__ import annotations

import abc
from collections.abc import Iterator

from app.connectors.base import RawScene


class BaseDirectTubeScraper(abc.ABC):
    """Contract for a direct scraper. All scrapers feed into
    `Source(name=SCRAPER_SOURCE_NAME)` ("tube-scraper", renamed from "pornapp" 2026-06-07)
    so they inherit the resolver logic and the idempotent merge per external_id."""

    sitetag: str
    """Stable tube ID, used in external_id `f"{sitetag}:{url}"`. Matches the
    porn-app sitetag (hqpornercom, sxylandcom, etc.)."""

    @abc.abstractmethod
    def search(
        self,
        query: str,
        *,
        page: int = 1,
        limit: int | None = None,
    ) -> Iterator[RawScene]:
        """Search the tube by query (usually a performer name). Yield one RawScene per result."""
        raise NotImplementedError
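
A minimal sketch of how a concrete scraper might satisfy this contract, and of why keying on `f"{sitetag}:{url}"` makes a merge idempotent. Everything here is an assumption for illustration: the `RawScene` stand-in and its fields, `ExampleTubeScraper`, the in-memory catalogue, the `exampletube` sitetag, and `merge_by_external_id` are hypothetical and do not come from the repository. The base class is copied locally only so the block runs on its own, and `page` is accepted but ignored.

```python
import abc
from collections.abc import Iterator
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RawScene:
    """Hypothetical stand-in for app.connectors.base.RawScene (real fields unknown)."""
    external_id: str
    title: str
    url: str


class BaseDirectTubeScraper(abc.ABC):
    """Local copy of the contract so this sketch is self-contained."""
    sitetag: str

    @abc.abstractmethod
    def search(
        self, query: str, *, page: int = 1, limit: Optional[int] = None
    ) -> Iterator[RawScene]:
        raise NotImplementedError


class ExampleTubeScraper(BaseDirectTubeScraper):
    """Hypothetical scraper backed by an in-memory catalogue instead of HTTP."""
    sitetag = "exampletube"

    _CATALOGUE = [
        ("Alpha clip one", "https://exampletube.invalid/v/1"),
        ("Beta clip", "https://exampletube.invalid/v/2"),
        ("Alpha clip two", "https://exampletube.invalid/v/3"),
    ]

    def search(
        self, query: str, *, page: int = 1, limit: Optional[int] = None
    ) -> Iterator[RawScene]:
        needle = query.lower()
        emitted = 0
        for title, url in self._CATALOGUE:
            if needle not in title.lower():
                continue
            if limit is not None and emitted >= limit:
                return
            emitted += 1
            # external_id follows the f"{sitetag}:{url}" keying from the contract.
            yield RawScene(external_id=f"{self.sitetag}:{url}", title=title, url=url)


def merge_by_external_id(scenes: list[RawScene]) -> dict[str, RawScene]:
    """Hypothetical merge: the last scene seen per external_id wins."""
    merged: dict[str, RawScene] = {}
    for scene in scenes:
        merged[scene.external_id] = scene
    return merged


scraper = ExampleTubeScraper()
results = list(scraper.search("alpha"))
# Feeding the same results twice leaves the merged set unchanged.
merged = merge_by_external_id(results + results)
```

Because the key is derived only from the sitetag and the URL, re-running the same search yields the same keys, so repeated ingestion would overwrite existing entries rather than create duplicates.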