goon/scripts
jtrzupek 7e46e5ac48 feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all)
We had ingested only ~3% of each browse tube's catalog (porndoe lists >62k scenes; we had 1959)
because tubes were reached only via performer search and top-N browse. Pilot (porndoe pages
64-110): 1119 new scenes, 100% playable and 100% tagged, 0% canonical overlap (purely
additive: content not in TPDB/StashDB).

- app/scheduler/deep_crawl.py: round-robin over ALL_BROWSE_SCRAPERS, per-tube page cursor
  in app/_state/deepcrawl_state.json (no DB migration), deep-paginate from the cursor,
  idempotent (the resolver skips scenes already known by raw_hash), mark a tube 'exhausted'
  at catalog end, then reset cursors for an incremental re-sweep.
- _job_deep_crawl: hourly, 60 pages/run (~1860 scenes, ~22 min), wrapped in the 1h
  hard-timeout; registered in build_scheduler (jobs=10).
- config: sched_deep_crawl_hours=1, deep_crawl_pages_per_run=60, deepcrawl_state_path.
- scripts/pilot_porndoe_deepcrawl.py: one-off pilot used to validate the approach.
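
The cursor logic described in the first bullet can be sketched roughly as below. This is a minimal illustration, not the contents of app/scheduler/deep_crawl.py: `run_deep_crawl`, `load_state`, `save_state`, the `fetch_page` callable and the JSON layout (`{"page": ..., "exhausted": ...}`) are all hypothetical names chosen for the example. It also assumes that cursors are reset only once every tube has been marked exhausted. Scene resolution and the raw_hash dedup are left out.

```python
import json
from pathlib import Path

PAGES_PER_RUN = 60  # mirrors deep_crawl_pages_per_run from the config bullet


def load_state(path: Path) -> dict:
    """Return the saved per-tube cursors, or an empty dict on the first run."""
    if path.exists():
        return json.loads(path.read_text())
    return {}


def save_state(path: Path, state: dict) -> None:
    """Persist cursors as JSON so no DB migration is needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2))


def run_deep_crawl(scrapers: dict, state_path: Path, fetch_page,
                   budget: int = PAGES_PER_RUN) -> int:
    """Spend up to `budget` pages across tubes, one page per tube per turn.

    `scrapers` maps a tube name to a scraper object. `fetch_page(scraper, page)`
    is a hypothetical callable returning a list of scenes, empty at catalog end.
    Returns the number of pages fetched in this run.
    """
    state = load_state(state_path)
    for name in scrapers:
        state.setdefault(name, {"page": 1, "exhausted": False})

    pages_done = 0
    while pages_done < budget:
        active = [n for n in scrapers if not state[n]["exhausted"]]
        if not active:
            break
        for name in active:
            if pages_done >= budget:
                break
            scenes = fetch_page(scrapers[name], state[name]["page"])
            pages_done += 1
            if scenes:
                # Handing `scenes` to the resolver would happen here.
                state[name]["page"] += 1
            else:
                state[name]["exhausted"] = True

    # Assumption: once every catalog has hit its end, start a fresh re-sweep.
    if all(state[n]["exhausted"] for n in scrapers):
        for n in scrapers:
            state[n] = {"page": 1, "exhausted": False}

    save_state(state_path, state)
    return pages_done
```

Because the cursor is saved at the end of every run, a run that stops at the page budget resumes from the same page on the next scheduled run. Re-fetching a page is harmless as long as ingestion stays idempotent.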

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-03 09:26:44 +02:00
sql Initial commit 2026-05-20 10:10:22 +02:00
_extract_apk_sig_hash.py session work: bug-report fixes + WIP cleanup 2026-05-25 22:02:52 +02:00
_patch_manifest.py session work: bug-report fixes + WIP cleanup 2026-05-25 22:02:52 +02:00
add_performer_tpdb_ref.py Initial commit 2026-05-20 10:10:22 +02:00
audit_false_merges.py scripts: add gated --fix to false-merge audit (short-clip outliers) 2026-06-01 11:30:23 +02:00
auto_merge_freshporno_to_canonical.py session work: bug-report fixes + WIP cleanup 2026-05-25 22:02:52 +02:00
backfill_durations.py Initial commit 2026-05-20 10:10:22 +02:00
backfill_freshporno_dates.py session work: bug-report fixes + WIP cleanup 2026-05-25 22:02:52 +02:00
backfill_freshporno_titles.py session work: bug-report fixes + WIP cleanup 2026-05-25 22:02:52 +02:00
backfill_inferred_tags.py feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog 2026-06-01 15:07:35 +02:00
backfill_paradisehill_movies.py fix(movies): paradisehill delta date-granularity + browse cadence docs 2026-06-01 17:00:10 +02:00
backfill_paradisehill_tags.py session work: bug-report fixes + WIP cleanup 2026-05-25 22:02:52 +02:00
backfill_phash_tube.py Initial commit 2026-05-20 10:10:22 +02:00
backfill_scene_duration_from_playback.py fix(scenes): propagate playback duration to Scene + duration-consistent counts 2026-06-01 21:31:01 +02:00
backfill_scene_thumbnails.py Initial commit 2026-05-20 10:10:22 +02:00
bulk_auto_merge.py Initial commit 2026-05-20 10:10:22 +02:00
bulk_rescrape_hqporner.py Initial commit 2026-05-20 10:10:22 +02:00
bulk_resolve_merges.py Initial commit 2026-05-20 10:10:22 +02:00
check_all_hosters.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
check_hetzner_traffic.py Initial commit 2026-05-20 10:10:22 +02:00
check_series_detector.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
compare_performer_canon.py Initial commit 2026-05-20 10:10:22 +02:00
debug_pornxp_listing.py Initial commit 2026-05-20 10:10:22 +02:00
debug_tpdb_performer.py Initial commit 2026-05-20 10:10:22 +02:00
dedup_favorite_performers.py Initial commit 2026-05-20 10:10:22 +02:00
dump_pornapp_sites.ps1 Initial commit 2026-05-20 10:10:22 +02:00
fill_tpdb_refs_batch.py Initial commit 2026-05-20 10:10:22 +02:00
find_underfilled_performers.py Initial commit 2026-05-20 10:10:22 +02:00
generate_icons.py Initial commit 2026-05-20 10:10:22 +02:00
goon_debug_proxy.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
killall_bulk_rescrape.py Initial commit 2026-05-20 10:10:22 +02:00
merge_tags.py Initial commit 2026-05-20 10:10:22 +02:00
migrate_paradisehill_to_movies.py Initial commit 2026-05-20 10:10:22 +02:00
phash_benchmark.py Initial commit 2026-05-20 10:10:22 +02:00
phash_dedup_scenes.py Initial commit 2026-05-20 10:10:22 +02:00
pilot_browse_scrapers.py Initial commit 2026-05-20 10:10:22 +02:00
pilot_porndoe_deepcrawl.py feat(scheduler): deep-crawl full tube catalogs (Phase 2a — ingest-all) 2026-06-03 09:26:44 +02:00
probe_all_extract.py Initial commit 2026-05-20 10:10:22 +02:00
probe_browse_scraper.py Initial commit 2026-05-20 10:10:22 +02:00
probe_lpv_extract.py Initial commit 2026-05-20 10:10:22 +02:00
probe_mangoporn_hosters.py Initial commit 2026-05-20 10:10:22 +02:00
probe_perverzija.py Initial commit 2026-05-20 10:10:22 +02:00
probe_pv_extract.py Initial commit 2026-05-20 10:10:22 +02:00
probe_scene.py Initial commit 2026-05-20 10:10:22 +02:00
publish_update.py fix(ota): make publish_update.py work one-shot on Windows git-bash 2026-06-02 09:56:34 +02:00
reingest_pandamovies_hosts.py Initial commit 2026-05-20 10:10:22 +02:00
repair_dooplay_movies.py Initial commit 2026-05-20 10:10:22 +02:00
repair_truncated_titles.py Initial commit 2026-05-20 10:10:22 +02:00
reresolve_freshporno_orphans.py Initial commit 2026-05-20 10:10:22 +02:00
restore_canonical_titles.py Initial commit 2026-05-20 10:10:22 +02:00
smoke_test.py feat(ingest): SQL phash match, tag inference + backfill, clip-store skip, browse tubes, watchdog 2026-06-01 15:07:35 +02:00
stashdb_studio_backfill.py Initial commit 2026-05-20 10:10:22 +02:00
status_tubes.py Initial commit 2026-05-20 10:10:22 +02:00
studio_retrofix.py Initial commit 2026-05-20 10:10:22 +02:00
test_cross_ip.py Initial commit 2026-05-20 10:10:22 +02:00
test_porndoe_scraper.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
test_resolve_endpoint.py Initial commit 2026-05-20 10:10:22 +02:00
theporndude_coverage_check.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
theporndude_coverage_match.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
theporndude_curl_triage.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
theporndude_movies_pipeline.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
theporndude_resolve_domains.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
theporndude_scorecard.py Mobile 0.1.9: OTA enable, WebView cookie-dismiss fix, porndoe connector 2026-05-22 11:20:57 +02:00
title_levenshtein_benchmark.py Initial commit 2026-05-20 10:10:22 +02:00
tpdb_backfill.py Initial commit 2026-05-20 10:10:22 +02:00
tpdb_backfill_status.py Initial commit 2026-05-20 10:10:22 +02:00
tpdb_studio_backfill.py Initial commit 2026-05-20 10:10:22 +02:00