fix(hqporner): require ALL query tokens in slug — stop performer over-attribution

The hqporner search post-filter kept a scene if its slug contained ANY query token (>=3 chars). For multi-word performer names this matched on a single common token (e.g. "anna", "mia"), so the performer-driven ingest attributed the scene to EVERY performer sharing that token. Scenes accumulated up to 503 wrong performers (hqporner accounts for 5659 of the 5897 scenes with >30 performers; bug reports 2026-06-07).

Switch ANY -> ALL: the slug must contain every query token, so attribution requires a full name match. Single-word names still work.

Precision over recall: 144 wrong performers is far worse than missing a few loose matches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
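A minimal sketch of the over-attribution, assuming (as the message describes) that the ingest runs one search per performer and attaches every result that survives the post-filter to that performer, and assuming the site's fuzzy search returns the same slug for each of these queries. The roster, the slug, and the helpers `matches_any`, `matches_all` and `attributed_performers` are hypothetical; only the token rule (lower-cased words of at least 3 characters, substring test against the slug) is taken from the diff.

```python
from collections.abc import Callable

# Hypothetical illustration: roster and slug are invented placeholders,
# not data from the hqporner ingest.


def query_tokens(query: str) -> set[str]:
    """Lower-cased query words of at least 3 characters (rule from the diff)."""
    return {tok for tok in query.lower().split() if len(tok) >= 3}


def matches_any(query: str, slug: str) -> bool:
    """Old behaviour: keep the slug if it contains ANY query token."""
    tokens = query_tokens(query)
    slug_lower = slug.lower()
    return not tokens or any(tok in slug_lower for tok in tokens)


def matches_all(query: str, slug: str) -> bool:
    """New behaviour: keep the slug only if it contains EVERY query token."""
    tokens = query_tokens(query)
    slug_lower = slug.lower()
    return all(tok in slug_lower for tok in tokens)


def attributed_performers(
    roster: list[str], slug: str, matcher: Callable[[str, str], bool]
) -> list[str]:
    """Performers a search-driven ingest would attach this scene to
    (hypothetical helper; assumes every query returned this slug)."""
    return [name for name in roster if matcher(name, slug)]


roster = ["Anna Alpha", "Anna Beta", "Anna Gamma", "Mia Delta", "Belladonna"]
slug = "anna-alpha-and-mia-delta-poolside"

print(attributed_performers(roster, slug, matches_any))
# ['Anna Alpha', 'Anna Beta', 'Anna Gamma', 'Mia Delta']
print(attributed_performers(roster, slug, matches_all))
# ['Anna Alpha', 'Mia Delta']
```

With ANY, the shared token "anna" drags "Anna Beta" and "Anna Gamma" onto a scene they are not in; with ALL, only performers whose full name appears in the slug are kept.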
parent bc72515227
commit cd257740be
1 changed file with 7 additions and 3 deletions
@@ -47,8 +47,12 @@ class HQPornerScraper(BaseDirectTubeScraper):
 log.debug("hqporner search %s status=%d", url, r.status_code)
 return
 
-# Filter: slug must contain at least one of the query words (case-insensitive)
-# Eliminates completely unrelated results when the fuzzy search is noisy.
+# Filter: slug must contain ALL query words (≥3 chars), case-insensitive.
+# Previously `any` (≥1 token) → with two-word names, a match on one common
+# token (e.g. "anna"/"mia") attributed the scene to EVERY performer sharing that
+# token → scenes with hundreds of wrong performers (up to 503; hqporner = 5659/5897 such scenes,
+# bug report 2026-06-07). `all` requires a full name match → precision.
+# Single-word names ("Belladonna") still work (their one token must be present).
 query_tokens = {tok for tok in query.lower().split() if len(tok) >= 3}
 
 seen_urls: set[str] = set()
@@ -63,7 +67,7 @@ class HQPornerScraper(BaseDirectTubeScraper):
 
 # Title-token filter
 slug_lower = slug_part.lower()
-if query_tokens and not any(tok in slug_lower for tok in query_tokens):
+if query_tokens and not all(tok in slug_lower for tok in query_tokens):
 continue
 
 title = slug_part.replace("_", " ").replace("-", " ").strip()
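The loop around the changed line can be sketched as a standalone function. This is a minimal sketch, not the scraper's actual code: it assumes results arrive as hypothetical `(url, slug_part)` pairs, and the dedup via `seen_urls` is an assumption, since the diff only shows that set being created. The token rule, the `all` check and the title derivation are copied from the diff; `filter_search_results` is a hypothetical name.

```python
# Hypothetical standalone version of the post-filter touched by this commit.
# Assumption: results are (url, slug_part) pairs; how the real scraper
# extracts slug_part from the search page is not shown in the diff.


def filter_search_results(
    query: str, results: list[tuple[str, str]]
) -> list[tuple[str, str]]:
    """Return (url, title) pairs whose slug contains every query token."""
    query_tokens = {tok for tok in query.lower().split() if len(tok) >= 3}
    seen_urls: set[str] = set()
    kept: list[tuple[str, str]] = []
    for url, slug_part in results:
        # Assumed dedup step: the diff declares seen_urls but not its use.
        if url in seen_urls:
            continue
        seen_urls.add(url)
        # Title-token filter (the changed line): require ALL tokens.
        slug_lower = slug_part.lower()
        if query_tokens and not all(tok in slug_lower for tok in query_tokens):
            continue
        title = slug_part.replace("_", " ").replace("-", " ").strip()
        kept.append((url, title))
    return kept


results = [
    ("u1", "anna-alpha_poolside"),
    ("u2", "anna-beta_garden"),
    ("u1", "anna-alpha_poolside"),
]
print(filter_search_results("Anna Alpha", results))
# [('u1', 'anna alpha poolside')]
```

Two properties follow directly from the code. `all()` over an empty set is `True`, so the `query_tokens and` guard, which was needed with `any`, no longer changes the outcome. And because tokens shorter than 3 characters are dropped, a hypothetical name such as "Mia Li" reduces to `{"mia"}`, so for such names ALL behaves exactly like the old ANY.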