2 commits

Author SHA1 Message Date
jtrzupek
ee83ae5e97 scripts: add gated --fix to false-merge audit (short-clip outliers)
Opt-in remediation for the duration-inconsistent scenes found by the audit.
Scope is deliberately narrow and reversible:

- only scenes with >=3 duration-bearing sources AND max/min ratio > 3x
- anchored on scene.duration_sec (the canonical value), never the median of
  sources (a median is wrong when several bogus short clips outvote the real
  full-length source)
- marks dead ONLY sources whose duration is less than half the canonical (more
  than 2x SHORTER), since a falsely merged source is almost always a short SEO
  clip/preview. Sources longer than the canonical are left alone: an over-long
  outlier more often means the canonical duration itself is too low, so killing
  the long source would drop the real video. Those stay for manual review.
- guards that at least one live source remains
- dry-run by default; --yes to apply; sets dead_at (reversible), not delete

First run marked 514 short-clip sources dead across 228 scenes.
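
The selection rules listed above can be sketched roughly as follows. This is a minimal illustration, not the script itself: the `Source` class, the `short_clip_outliers` function, and its parameter names are hypothetical, and counting only live sources toward the >=3 threshold is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    # Hypothetical stand-in for one playback source of a scene.
    id: int
    duration_sec: Optional[float]
    dead: bool = False


def short_clip_outliers(
    canonical_sec: Optional[float],
    sources: List[Source],
    min_sources: int = 3,       # ">=3 duration-bearing sources"
    spread: float = 3.0,        # "max/min ratio > 3x"
    short_factor: float = 2.0,  # ">2x shorter than the canonical"
) -> List[Source]:
    """Return the sources that would be marked dead, or [] if out of scope."""
    live = [s for s in sources if not s.dead]
    timed = [s for s in live if s.duration_sec is not None and s.duration_sec > 0]
    if not canonical_sec or len(timed) < min_sources:
        return []

    durations = [s.duration_sec for s in timed]
    if max(durations) / min(durations) <= spread:
        return []

    # Anchor on the canonical duration, never the median of the sources.
    # Only sources much SHORTER than the canonical are candidates; longer
    # ones are left for manual review.
    doomed = [s for s in timed if s.duration_sec * short_factor < canonical_sec]

    # Guard: at least one live source must remain after the fix.
    if len(doomed) >= len(live):
        return []
    return doomed


if __name__ == "__main__":
    scene_sources = [Source(1, 60), Source(2, 90), Source(3, 7190), Source(4, 30000)]
    for s in short_clip_outliers(7200, scene_sources):
        print("would mark dead:", s.id)
```

In this sketch the 30000s source survives even though it is an outlier, which mirrors the "longer sources stay for manual review" rule.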

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 11:30:23 +02:00
jtrzupek
ee1d0c7610 scripts: add false-merge audit (duration-inconsistent scenes)
Read-only data-quality audit for scene merges made before the 2026-05-12
scoring hardening (which now caps weak-signal aggregator matches at 0.85 and
tightens the duration bump to <=3s). The auto-merge candidate log does not
record which external_ref was attached, so a merge cannot be reversed from the
log alone. Instead this detects false merges by their effect: a scene that
absorbed a different video ends up with playback_sources of inconsistent
durations (e.g. a 60s clip alongside a 2h source).
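
As a rough illustration of detecting a false merge by its effect, the inconsistency can be expressed as the max/min ratio over a scene's source durations. The function name `duration_ratio` is hypothetical, and skipping missing or non-positive durations is an assumption about how incomplete rows are handled.

```python
from typing import Iterable, Optional


def duration_ratio(durations: Iterable[Optional[float]]) -> Optional[float]:
    """Max/min ratio over the usable durations, or None if fewer than two."""
    timed = [d for d in durations if d is not None and d > 0]
    if len(timed) < 2:
        return None  # nothing to compare against
    return max(timed) / min(timed)


if __name__ == "__main__":
    # A 60s clip alongside a 2h source.
    print(duration_ratio([60, 7200]))  # 120.0
```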

Reports counts + severity buckets by max/min duration ratio, can list the worst
offenders with a per-source breakdown, and can export suspects to JSON. Mutates
nothing: remediation (detach/mark-dead the outlier source) is left as an
explicit, separately decided step because short durations can be legitimate
(previews) and n=2 scenes are ambiguous about which source is canonical.
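
A report of this shape could be assembled along the following lines. This is only a sketch: the bucket thresholds and labels, the `summarize` function, and the suspect record fields are all hypothetical, since the commit message does not state the actual boundaries or the JSON layout.

```python
import json
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical severity boundaries on the max/min duration ratio.
BUCKETS = [(10.0, "severe"), (3.0, "moderate"), (1.5, "mild")]


def bucket(ratio: float) -> str:
    for threshold, label in BUCKETS:
        if ratio > threshold:
            return label
    return "consistent"


def summarize(
    scene_durations: Dict[str, List[Optional[float]]],
) -> Tuple[Counter, List[dict]]:
    """Return (counts per severity bucket, suspects sorted worst first)."""
    suspects = []
    for scene_id, durations in scene_durations.items():
        timed = sorted(d for d in durations if d is not None and d > 0)
        if len(timed) < 2:
            continue
        ratio = timed[-1] / timed[0]
        label = bucket(ratio)
        if label != "consistent":
            suspects.append(
                {"scene_id": scene_id, "ratio": ratio,
                 "severity": label, "durations": timed}
            )
    suspects.sort(key=lambda s: s["ratio"], reverse=True)
    return Counter(s["severity"] for s in suspects), suspects


if __name__ == "__main__":
    counts, suspects = summarize(
        {"a": [60, 7200], "b": [100, 105], "c": [1000, 4000, None]}
    )
    print(dict(counts))
    print(json.dumps(suspects, indent=2))  # read-only export, nothing is mutated
```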

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 11:23:10 +02:00