perf(scenes): literal tag_id in filter — 4-12s tag lists -> ~20ms

Tag-filtered scene lists (e.g. blowjob + has_playback) took 4-12s. Root cause:
the filter joined scene_tags->tags on slug, so the actual tag_id was opaque to
the planner at plan time. It fell back to average per-tag cardinality
(8.4M/11541 ≈ 726) instead of the real 273k, chose to materialize ALL matching
scene_tags + check playback per row, then top-N sort.

Fix: resolve slug->tag_id in the app and filter on a LITERAL tag_id (no slug
join). With a constant, the planner uses MCV stats, knows the tag is huge, and
walks ix_scenes_created_at_desc probing scene_tags/playback per scene, stopping
at the page limit. Verified: blowjob list 3300ms -> 18ms (EXPLAIN), HTTP 4-12s ->
47ms. Unknown slug short-circuits to empty. (Pairs with the raised tag_id
statistics target so mid-tier tags also get correct estimates.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jtrzupek 2026-06-07 21:10:31 +02:00
parent d52641774d
commit 43f7e1f7b2
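
The size of the misestimate described in the commit message can be checked with a few lines of arithmetic. This is only a sketch: it assumes 8.4M is the (rounded) total row count of scene_tags and 11541 is the number of distinct tags, which is how the message reads. The variable names are illustrative. With the rounded 8.4M figure the average comes out near 728 rather than the 726 quoted, so the message was probably computed from the exact row count.

```python
# Figures quoted in the commit message; the 8.4M row count is rounded there.
scene_tags_rows = 8_400_000     # assumed: total rows in scene_tags
distinct_tags = 11_541          # assumed: number of distinct tags
actual_rows_for_tag = 273_000   # real row count for the popular tag

# With an opaque tag_id the planner can only assume an "average" tag.
avg_rows_per_tag = scene_tags_rows / distinct_tags

# How far that average undershoots the popular tag.
underestimate_factor = actual_rows_for_tag / avg_rows_per_tag

print(f"average rows per tag: {avg_rows_per_tag:.0f}")
print(f"underestimate factor: {underestimate_factor:.0f}x")
```

An estimate that is low by more than two orders of magnitude is enough to make "fetch every matching scene_tags row, then sort" look cheaper than walking the created_at index and stopping at the page limit.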


@@ -8,7 +8,7 @@ from typing import Annotated
 from fastapi import APIRouter, Depends, HTTPException, Query, status
 from pydantic import BaseModel
-from sqlalchemy import distinct, exists, func, literal_column, select
+from sqlalchemy import distinct, exists, false, func, literal_column, select
 from sqlalchemy.exc import IntegrityError
 from sqlalchemy.orm import Session
@@ -182,13 +182,29 @@ def list_scenes(
     tag_slug_list = _split_csv(tags)
     # AND across tags: a scene must have ALL selected tags. Each slug gets its own
     # exists(), so selecting more filters narrows the results, as users expect.
+    #
+    # PERF (2026-06-07): resolve slug→tag_id in the app and filter on a LITERAL tag_id
+    # (NOT a JOIN on Tag.slug). With a literal, the planner knows the tag's cardinality
+    # from statistics (MCV), so for popular tags (blowjob ~273k scenes) it picks an index
+    # walk over ix_scenes_created_at_desc instead of materializing all scene_tags. The slug
+    # JOIN hid tag_id from the planner, so it used the average (8.4M/11541≈726) and chose
+    # a bad plan (4-12s). With a literal: ~20ms. See also _build... light mode.
+    if tag_slug_list:
+        id_by_slug = dict(
+            session.execute(
+                select(Tag.slug, Tag.id).where(Tag.slug.in_(tag_slug_list))
+            ).all()
+        )
     for slug in tag_slug_list:
+        tag_id = id_by_slug.get(slug)
+        if tag_id is None:
+            base = base.where(false())  # unknown slug → no results
+            break
         base = base.where(
             exists(
                 select(1)
                 .select_from(SceneTag)
-                .join(Tag, Tag.id == SceneTag.tag_id)
-                .where(SceneTag.scene_id == Scene.id, Tag.slug == slug)
+                .where(SceneTag.scene_id == Scene.id, SceneTag.tag_id == tag_id)
             )
         )
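
The resolve-then-filter pattern in this hunk can be tried in isolation. The sketch below is not the project's code: it uses the standard-library sqlite3 module in place of SQLAlchemy and PostgreSQL, and the schema, the sample tag slugs, and the helper name list_scene_ids are all made up for illustration. It demonstrates only the query shape (one lookup for slug→id, then one EXISTS per tag_id with no join on tags, and an empty result for an unknown slug). SQLite will not reproduce the PostgreSQL planner behavior or the timings from the commit message.

```python
import sqlite3


def list_scene_ids(conn, tag_slugs, limit=20):
    """Return the newest scene ids that carry ALL of the given tag slugs."""
    if not tag_slugs:
        rows = conn.execute(
            "SELECT id FROM scenes ORDER BY created_at DESC LIMIT ?", (limit,)
        )
        return [r[0] for r in rows]

    # Step 1: resolve every slug to its id in a single query.
    placeholders = ",".join("?" * len(tag_slugs))
    id_by_slug = dict(
        conn.execute(
            f"SELECT slug, id FROM tags WHERE slug IN ({placeholders})", tag_slugs
        )
    )

    # Step 2: an unknown slug can never match, so short-circuit to empty.
    tag_ids = []
    for slug in tag_slugs:
        if slug not in id_by_slug:
            return []
        tag_ids.append(id_by_slug[slug])

    # Step 3: one EXISTS per tag id, filtering scene_tags directly by tag_id.
    exists_sql = " AND ".join(
        "EXISTS (SELECT 1 FROM scene_tags st"
        " WHERE st.scene_id = s.id AND st.tag_id = ?)"
        for _ in tag_ids
    )
    sql = (
        f"SELECT s.id FROM scenes s WHERE {exists_sql}"
        " ORDER BY s.created_at DESC LIMIT ?"
    )
    return [r[0] for r in conn.execute(sql, (*tag_ids, limit))]


# Hypothetical in-memory data set for the demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scenes (id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, slug TEXT UNIQUE);
    CREATE TABLE scene_tags (
        scene_id INTEGER, tag_id INTEGER, PRIMARY KEY (scene_id, tag_id)
    );
    INSERT INTO scenes VALUES (1, '2026-01-01'), (2, '2026-01-02'), (3, '2026-01-03');
    INSERT INTO tags VALUES (10, 'outdoor'), (11, 'interview');
    INSERT INTO scene_tags VALUES (1, 10), (2, 10), (2, 11), (3, 11);
    """
)

print(list_scene_ids(conn, ["outdoor"]))
print(list_scene_ids(conn, ["outdoor", "interview"]))
print(list_scene_ids(conn, ["outdoor", "missing-tag"]))
```

In this sketch the tag ids travel as bound parameters. Whether PostgreSQL sees such a value as a constant at plan time depends on the driver and on how statements are prepared, so the effect on the plan is worth confirming with EXPLAIN against the real database.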