goon/alembic/versions/20260531_0019_taxonomy_scene_counts.py
jtrzupek 2163fee245 perf(taxonomy): denormalize scene_count for tags/performers/studios
Counts for /tags, /performers, /studios and /favorites were computed live
per request by aggregating scene_tags / scene_performers with an EXISTS on
playback_sources. As the catalog grew to ~1.7M scenes (6.3M scene_tags), this
took ~4.3s for /tags?order=popular (x2, once for the total count and once for
the items) and ~950ms for the default /scenes count, so those screens took
several seconds to load.

- migration 0019: add scene_count (+ DESC index) to tags/performers/studios
- background job _job_refresh_taxonomy_counts (every 3h) recomputes the counts
  with one UPDATE...FROM per table (IS DISTINCT FROM skips unchanged rows)
- /tags, /performers, /studios scenes paths now read the column + ORDER BY the
  indexed scene_count; for_movies paths keep live aggregation (small tables)
- favorites read denormalized scene_count instead of a grouped EXISTS aggregate
- /scenes default count: 10-min in-process TTL cache (header is approximate)
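
A minimal sketch of the in-process TTL cache from the last bullet. It assumes a single cached integer per process and a 600 s (10-min) TTL. `TTLValue`, `scenes_total` and the `compute` callback are hypothetical names, not the project's actual helper. The injectable `clock` exists only so the sketch can be tested with fake time.

```python
import threading
import time
from collections.abc import Callable


class TTLValue:
    """Hypothetical helper: cache one computed value in-process for `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock  # injectable so tests can fake time
        self._lock = threading.Lock()
        self._value: int | None = None
        self._expires_at = 0.0

    def get(self, compute: Callable[[], int]) -> int:
        """Return the cached value, recomputing it once the TTL has elapsed."""
        with self._lock:  # concurrent misses wait for a single recomputation
            now = self._clock()
            if self._value is None or now >= self._expires_at:
                self._value = compute()
                self._expires_at = now + self._ttl
            return self._value


# 10-minute TTL, matching the commit message (hypothetical module-level instance).
scenes_total = TTLValue(ttl=600)
```

A request handler would call something like `scenes_total.get(count_default_scenes)`, where `count_default_scenes` is a hypothetical function running the expensive COUNT. Holding the lock while computing means that requests arriving after expiry wait for one recomputation instead of each issuing the same COUNT. The value can lag the table by up to the TTL, and every worker process keeps its own copy. Both are consistent with the note that the header is only approximate.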

Measured: /tags?order=popular&per_page=500 ~8s -> 66ms incl. serialization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-31 17:53:48 +02:00

"""taxonomy scene_count denormalization — tags / performers / studios

Revision ID: 0019_taxonomy_scene_counts
Revises: 0018_movie_play_progress
Create Date: 2026-05-31

Perf fix (user report 2026-05-31, "slow loading of scenes/favorites/tags"): the
database grew to 1.69M scenes / 6.3M scene_tags, and /tags?order=popular computed
scene_count for EVERY tag live (aggregating 6.3M scene_tags + an EXISTS on playback,
with a 22MB external-merge sort). That took ~4.3s, and it ran twice (total + items).
The same applied to performers/studios and favorites.

We denormalize `scene_count` onto tags/performers/studios. The worker recomputes it
in the background (`_job_refresh_taxonomy_counts`, every
`GOON_SCHED_TAXONOMY_COUNTS_HOURS`=3h, with one UPDATE...FROM per table). Endpoints
read the precomputed column + ORDER BY on the DESC index → <20ms.

scene_count = the number of scenes with the given tag/performer/studio that have
≥1 LIVE playback_source (dead_at IS NULL). This is exactly the same definition as
the previous live aggregates (has_live_playback filter in taxonomies.py /
favorites.py). Counts can be up to ~3h stale, which does not matter for the "(123)"
shown next to a filter or for the "popular" sort.
"""

from collections.abc import Sequence

import sqlalchemy as sa
from alembic import op

revision: str = "0019_taxonomy_scene_counts"
down_revision: str | None = "0018_movie_play_progress"
branch_labels: str | Sequence[str] | None = None
depends_on: str | Sequence[str] | None = None

_TABLES = ("tags", "performers", "studios")


def upgrade() -> None:
    for tbl in _TABLES:
        op.add_column(
            tbl,
            sa.Column(
                "scene_count", sa.Integer(), nullable=False, server_default="0"
            ),
        )
        # DESC index for ORDER BY scene_count DESC (the "popular" sort).
        op.create_index(
            f"ix_{tbl}_scene_count",
            tbl,
            [sa.text("scene_count DESC")],
        )


def downgrade() -> None:
    for tbl in _TABLES:
        op.drop_index(f"ix_{tbl}_scene_count", table_name=tbl)
        op.drop_column(tbl, "scene_count")
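
The following runnable sketch illustrates the scene_count definition and the skip-unchanged refresh described in the docstring. It uses stdlib `sqlite3` as a stand-in for Postgres and rests on several assumptions: a stripped-down schema with only the columns named above, only the tags table, and a correlated-subquery UPDATE in place of Postgres `UPDATE ... FROM`, which older SQLite builds lack. SQLite's null-safe `IS NOT` plays the role of `IS DISTINCT FROM`. `live_tag_counts`, `refresh_tag_counts` and `popular_tags` are hypothetical names, not the actual `_job_refresh_taxonomy_counts` code.

```python
import sqlite3

# Hypothetical, stripped-down schema reusing the names from the migration and
# docstring (tags.scene_count, scene_tags, playback_sources.dead_at).
SCHEMA = """
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    scene_count INTEGER NOT NULL DEFAULT 0
);
CREATE INDEX ix_tags_scene_count ON tags (scene_count DESC);
CREATE TABLE scene_tags (
    scene_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (scene_id, tag_id)
);
CREATE TABLE playback_sources (
    id INTEGER PRIMARY KEY,
    scene_id INTEGER NOT NULL,
    dead_at TEXT  -- NULL means the source is still live
);
-- Per tag: scenes with at least one live playback source. EXISTS ensures a
-- scene with several live sources is counted once.
CREATE VIEW live_tag_counts AS
SELECT st.tag_id AS tag_id, COUNT(*) AS n
FROM scene_tags AS st
WHERE EXISTS (
    SELECT 1 FROM playback_sources AS ps
    WHERE ps.scene_id = st.scene_id AND ps.dead_at IS NULL
)
GROUP BY st.tag_id;
"""

# Tags with no live scene are missing from the view, hence COALESCE(..., 0).
_NEW_COUNT = (
    "COALESCE((SELECT c.n FROM live_tag_counts AS c WHERE c.tag_id = tags.id), 0)"
)

# One statement for the whole table; the WHERE clause skips rows whose count
# is unchanged (null-safe IS NOT standing in for Postgres IS DISTINCT FROM).
REFRESH_TAG_COUNTS = (
    f"UPDATE tags SET scene_count = {_NEW_COUNT} "
    f"WHERE scene_count IS NOT {_NEW_COUNT}"
)


def refresh_tag_counts(conn: sqlite3.Connection) -> int:
    """Recompute tags.scene_count and return how many rows actually changed."""
    with conn:  # commits on success, rolls back on error
        return conn.execute(REFRESH_TAG_COUNTS).rowcount


def popular_tags(conn: sqlite3.Connection, limit: int) -> list[str]:
    """Read the denormalized column in "popular" order (no aggregation)."""
    rows = conn.execute(
        "SELECT name FROM tags ORDER BY scene_count DESC LIMIT ?", (limit,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO tags (id, name) VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
    )
    # Scene 10: live + dead source; 11: dead only; 12: two live sources.
    conn.executemany(
        "INSERT INTO scene_tags (scene_id, tag_id) VALUES (?, ?)",
        [(10, 1), (11, 1), (12, 1), (12, 2), (11, 3)],
    )
    conn.executemany(
        "INSERT INTO playback_sources (scene_id, dead_at) VALUES (?, ?)",
        [(10, None), (10, "2026-05-01"), (11, "2026-05-01"), (12, None), (12, None)],
    )
    print(refresh_tag_counts(conn))  # 2: a (0 -> 2) and b (0 -> 1); c stays 0
    print(refresh_tag_counts(conn))  # 0: nothing changed
    print(popular_tags(conn, 3))  # ['a', 'b', 'c']
```

The second refresh touches no rows, which is the point of the distinctness check: on a table that rarely changes, each periodic run writes only the rows whose counts moved. `popular_tags` mirrors the `ORDER BY scene_count DESC` query shape that the new `ix_<table>_scene_count` indexes are meant to serve.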