- paradisehill.fetch_movies compared release_date coerced to midnight against the `since` timestamp, so the chronological crawl stopped at the first upload dated the same calendar day as `since` and silently dropped most new movies (0-2 seen per run; Movies tab stalled). Compare by DATE with a 1-day grace instead; the idempotent external_records upsert dedups the re-fetched recent window.
- scripts/backfill_paradisehill_movies.py: one-off no-delta deep crawl to recover the backlog missed during the bug (idempotent, resumable).
- docs: correct stale 'raz dziennie/24h' ("once a day/24h") browse-latest comments to 6h (4x/day), the actual configured cadence (config.py sched_browse_latest_hours=6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
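
The comparison change described above can be illustrated with a minimal sketch. The helper names `is_new_buggy` and `is_new_by_date` and the `grace_days` parameter are hypothetical and are not taken from the connector. The sketch only assumes that `release_date` is a calendar date and `since` is a naive timestamp, as the message implies.

```python
from __future__ import annotations

from datetime import date, datetime, time, timedelta


def is_new_buggy(release_date: date, since: datetime) -> bool:
    # Behaviour as described in the message: the date is coerced to midnight,
    # so an upload dated the same calendar day as `since` compares as older.
    return datetime.combine(release_date, time.min) > since


def is_new_by_date(release_date: date, since: datetime, grace_days: int = 1) -> bool:
    # Hypothetical fix: compare calendar dates only and keep a grace window,
    # so same-day and previous-day uploads are fetched again.
    return release_date >= since.date() - timedelta(days=grace_days)


since = datetime(2026, 6, 1, 14, 30)
same_day = date(2026, 6, 1)

print(is_new_buggy(same_day, since))    # False: the crawl would stop here
print(is_new_by_date(same_day, since))  # True: the upload is still considered
```

The grace window deliberately re-fetches some already-known movies. Per the message, that is safe because the external_records upsert is idempotent.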
39 lines
1.3 KiB
Python
"""One-off: deep crawl of paradisehill (no-delta) to recover the backlog of movies
missed during the delta bug (release_date DATE vs. `since` timestamp, fix
2026-06-01). Idempotent: known movies are skipped via the external_records hash, so
it can be run repeatedly, or interrupted and resumed, without duplicates.

Usage:
    python scripts/backfill_paradisehill_movies.py --limit 1500
"""
from __future__ import annotations

import argparse
import logging
import sys

from app.connectors import get_movie_connectors
from app.ingest import ingest_movies_from_connector

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("backfill_ph_movies")


def main() -> int:
    ap = argparse.ArgumentParser()
    ap.add_argument("--limit", type=int, default=1500, help="How many of the newest movies to scan (default 1500)")
    args = ap.parse_args()

    reg = dict(get_movie_connectors())
    connector = reg.get("paradisehill")
    if connector is None:
        log.error("paradisehill connector not registered (available: %s)", list(reg))
        return 1

    counters = ingest_movies_from_connector(connector(), use_delta=False, limit=args.limit)
    log.info("DONE backfill paradisehill: %s", counters)
    return 0


if __name__ == "__main__":
    sys.exit(main())