MinerU · fully local parsing pipeline

Messy documents → clean structure: parsed into clean Markdown & JSON

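On the left is the original page of a real-world document that cannot be used as-is; on the right is what MinerU parsed from it locally with a single command. All three samples come from real runs and have not been retouched by hand.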

01 Formula recovery · LaTeX from a 1953 scan
raw scan · NACA-1135 · 1953
Original image of the formula in the scanned paper
parsed · LaTeX · render
$$ \nu = -\int_{p_*}^{p} \frac{\sin 2\mu}{2\gamma p}\,dp\ \ [\mathrm{isen,\ therm\ perf}] $$
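One way to sanity-check a recovered formula like this is to evaluate it numerically. The sketch below assumes the usual reading of the symbols, which the page itself does not spell out: $\mu$ is the Mach angle ($\sin\mu = 1/M$), $\gamma$ is the ratio of specific heats, $p_*$ is the sonic pressure, and the flow is isentropic and thermally perfect, as the bracketed tag suggests. Under those assumptions the integral should reproduce the closed-form Prandtl-Meyer angle. The function names and the value $\gamma = 1.4$ are illustrative choices, not part of MinerU's output.

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats (air); not stated on the page


def pressure_ratio_from_mach(mach, gamma=GAMMA):
    """Isentropic p/p_t for a given Mach number."""
    return (1.0 + 0.5 * (gamma - 1.0) * mach ** 2) ** (-gamma / (gamma - 1.0))


def mach_from_pressure_ratio(p_over_pt, gamma=GAMMA):
    """Inverse of the isentropic relation above."""
    return math.sqrt(2.0 / (gamma - 1.0) * (p_over_pt ** (-(gamma - 1.0) / gamma) - 1.0))


def nu_by_integration(mach, gamma=GAMMA, steps=100_000):
    """Midpoint-rule value of nu = -int_{p*}^{p} sin(2 mu) / (2 gamma p) dp, in radians.

    Pressures are normalised by total pressure; the integrand only involves
    dp/p, so the normalisation does not change the result.
    """
    p_star = pressure_ratio_from_mach(1.0, gamma)  # sonic pressure ratio
    p_end = pressure_ratio_from_mach(mach, gamma)
    dp = (p_end - p_star) / steps                  # negative: pressure falls as M rises
    total = 0.0
    for i in range(steps):
        p = p_star + (i + 0.5) * dp
        mu = math.asin(1.0 / mach_from_pressure_ratio(p, gamma))
        total += math.sin(2.0 * mu) / (2.0 * gamma * p) * dp
    return -total


def nu_closed_form(mach, gamma=GAMMA):
    """Textbook closed-form Prandtl-Meyer angle, in radians."""
    k = math.sqrt((gamma + 1.0) / (gamma - 1.0))
    beta = math.sqrt(mach ** 2 - 1.0)
    return k * math.atan(beta / k) - math.atan(beta)


if __name__ == "__main__":
    print(math.degrees(nu_by_integration(2.0)), math.degrees(nu_closed_form(2.0)))
```

If the two values agree, the parsed LaTeX has at least kept the structure of the integrand (the factor of 2, the $\gamma p$ denominator, the sign, and the limits) intact.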
02 Table recovery · structured data table
source pdf · NASA CR-2008
Original image of the data table in the PDF
parsed · table · HTML/MD
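| Airfoil | t/c | AR | CD2max (measured) | CD2max (AERODAS) | CD2max (StC) | CD2max (Flat-plate) |
|---|---|---|---|---|---|---|
| NACA 0012 | 0.12 | Inf. | 2.090 | 2.053 | 1.902 | 1.980 |
| NACA 0012 | 0.12 | Inf. | 2.090 | 2.053 | 1.902 | 1.980 |
| NACA 0015 | 0.15 | Inf. | 1.700 | 2.007 | 1.878 | 1.980 |
| NACA 4409 | 0.09 | Inf. | 2.100 | 2.100 | 1.985 | 1.980 |
| NACA 4412 | 0.12 | Inf. | 2.060 | 2.053 | 1.959 | 1.980 |
| NACA 4415 | 0.15 | Inf. | 2.068 | 2.007 | 1.933 | 1.980 |
| NACA 4418 | 0.18 | Inf. | 2.060 | 1.964 | 1.906 | 1.980 |
| NACA 0012 | 0.12 | Inf. | 2.050 | 2.053 | 1.902 | 1.980 |
| NACA 23012 | 0.12 | Inf. | 2.082 | 2.053 | 1.948 | 1.980 |
| NACA 23017 | 0.17 | Inf. | 2.078 | 1.978 | 1.902 | 1.980 |
| FX-84-W-127 | 0.13 | Inf. | 2.000 | 2.042 | 1.964 | 1.980 |
| FX-84-W-218 | 0.22 | Inf. | 2.040 | 1.911 | 1.939 | 1.980 |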
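Real cells and values, matching the image on the left cell for cell · excerpt of the first 12 rows (the full 28 rows are in the .md / .json downloads below)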
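Once a table is available as Markdown, it can be loaded into plain Python structures for further checks. The following is a minimal sketch, assuming the table is a standard pipe table with a single header row; the helper name `parse_markdown_table` and the flattened column names (such as `CD2max (AERODAS)`) are illustrative assumptions, not MinerU's actual output schema. The three sample rows are copied from the excerpt above.

```python
def parse_markdown_table(text):
    """Parse a simple Markdown pipe table into a list of dicts keyed by header."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# Hypothetical column names; values taken from the excerpt above.
SAMPLE = """
| Airfoil | t/c | AR | CD2max (measured) | CD2max (AERODAS) |
|---|---|---|---|---|
| NACA 0012 | 0.12 | Inf. | 2.090 | 2.053 |
| NACA 0015 | 0.15 | Inf. | 1.700 | 2.007 |
| NACA 4409 | 0.09 | Inf. | 2.100 | 2.100 |
"""

if __name__ == "__main__":
    for row in parse_markdown_table(SAMPLE):
        # Difference between the measured value and the AERODAS value for each airfoil
        delta = float(row["CD2max (measured)"]) - float(row["CD2max (AERODAS)"])
        print(row["Airfoil"], round(delta, 3))
```

Cells are kept as strings so that entries like `Inf.` survive unchanged; numeric columns are converted only where they are used.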
03 Figure extraction · extracted figure
raw scan · NACA-1135 · chart
Original scanned page containing the chart
parsed · image · cropped
The clean chart after extraction
Variation of surface pressure coefficient with cone semivertex angle for various upstream Mach numbers.