Modal VLM Doc Parser
OCR-first document parsing with selective VLM fallback
A two-stage document parsing service on Modal. Pages are rasterized, classified, and routed through PaddleOCR PP-StructureV3 on an A10G GPU; pages that fall below the OCR confidence threshold fall back to a Qwen3-VL-8B vLLM worker on an L4 GPU. Results are published in two stages, completed_fast and completed_final, so callers can stream progressive output. A separate SGLang server handles entity extraction in parallel.
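The routing described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: `PageResult`, `parse_document`, the `run_ocr` and `run_vlm` callables, and the 0.85 threshold are all hypothetical names and values chosen for the example. It also assumes that completed_fast carries the OCR output for every page, including low-confidence ones, and that completed_final replaces only the pages that were re-run through the VLM. Entity extraction is left out.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterator

# Hypothetical value; the real threshold is not stated in this README.
OCR_CONFIDENCE_THRESHOLD = 0.85


@dataclass
class PageResult:
    page: int
    text: str
    confidence: float
    engine: str  # "ocr" or "vlm"


def parse_document(
    pages: list[bytes],
    run_ocr: Callable[[int, bytes], PageResult],
    run_vlm: Callable[[int, bytes], PageResult],
    threshold: float = OCR_CONFIDENCE_THRESHOLD,
) -> Iterator[tuple[str, list[PageResult]]]:
    """Yield ("completed_fast", results), then ("completed_final", results)."""
    # Stage 1: run OCR on every rasterized page and publish immediately.
    results = [run_ocr(i, image) for i, image in enumerate(pages)]
    yield "completed_fast", list(results)  # copy, so stage 2 does not alter it

    # Stage 2: re-run only the pages whose OCR confidence is below threshold.
    for i, result in enumerate(results):
        if result.confidence < threshold:
            results[i] = run_vlm(i, pages[i])
    yield "completed_final", results
```

In a deployment, `run_ocr` and `run_vlm` would presumably wrap remote calls to the PaddleOCR and vLLM workers; here they are plain callables so the control flow can be exercised locally with stubs.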