Comparing changes

base repository: opendatalab/MinerU
base: master
...
head repository: Zhruoshui/Embedded_MinerU
compare: master
  • 5 commits
  • 231 files changed
  • 1 contributor

Commits on May 8, 2026

  1. feat: integrate online multimodal vision models to enhance semantic image descriptions

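    Adds an image-description-config section that lets an online vision model (Qwen3-VL-32B-Thinking or
    another OpenAI-compatible API) enrich the semantic content of image/chart blocks in a document, replacing
    the local VLM's output and significantly improving image description accuracy.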
    
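    Key points:
    - Config-driven: mineru.template.json gains an image-description-config section with the fields
      enable/api_key/base_url/model/enable_thinking/max_tokens/temperature
    - Unified general-purpose prompt: the model identifies the image type (photo, chart, flowchart, table, UI screenshot, etc.)
      and picks the best output format (natural-language description, markdown table, mermaid, or structured text)
    - Post-processing enhancement: after VLM layout analysis finishes and images are written to disk, the
      content field of each image/chart block is rewritten by the online model, before finalize_middle_json runs
    - Image preprocessing: longest edge scaled to 2048px, JPEG quality 85, sent base64-encoded
    - Fallback: if the API call fails, the original local VLM content is kept and a warning is logged
    - Thinking mode support: <think>...</think> tags are parsed automatically and only the actual answer is kept
    - Concurrency: a ThreadPoolExecutor caps the number of concurrent calls at 3
    - Covers all VLM and Hybrid backends (sync + async, 6 call sites in total)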
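
The fallback, thinking-tag, and concurrency points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in image_enhance.py: the function names, the `describe` callable (a stand-in for the online API call), and the choice to treat an empty answer as a failure are all hypothetical.

```python
import logging
import re
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)

# Matches a complete <think>...</think> section, including newlines inside it.
_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Drop <think>...</think> sections and keep only the final answer."""
    return _THINK_RE.sub("", text).strip()


def enhance_one(image_path: str, original: str, describe) -> str:
    """Return the online description, or the original content if the call fails.

    `describe` is a hypothetical callable: image path in, raw model text out.
    """
    try:
        answer = strip_thinking(describe(image_path))
    except Exception as exc:  # broad on purpose in this sketch; real code may narrow it
        logger.warning("image description failed, keeping local content: %s", exc)
        return original
    # Assumption of this sketch: an empty answer also falls back to the original.
    return answer or original


def enhance_all(items: list[tuple[str, str]], describe, max_workers: int = 3) -> list[str]:
    """Enhance (image_path, original_content) pairs with at most max_workers calls in flight."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with items.
        return list(pool.map(lambda item: enhance_one(item[0], item[1], describe), items))
```

Passing the API call in as a parameter keeps the fallback and ordering logic testable without network access.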
    
    New files:
    - mineru/backend/vlm/image_enhance.py    core logic module

    Modified files:
    - mineru/utils/config_reader.py          adds get_image_description_config()
    - mineru/backend/vlm/vlm_analyze.py      sync/async pipeline integration
    - mineru/backend/vlm/model_output_to_middle_json.py  standalone entry point integration
    - mineru/backend/hybrid/hybrid_analyze.py            sync/async pipeline integration
    - mineru/backend/hybrid/hybrid_model_output_to_middle_json.py  standalone entry point integration
    - mineru/utils/llm_client.py             fixes pylint W0718 (broad-exception-caught)
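
As a rough illustration of how a reader for the image-description-config section might behave, here is a sketch. The field names come from the commit message; the function name `load_image_description_config`, every default value, and the rule that missing credentials disable the feature are assumptions, and this is not the actual get_image_description_config() in config_reader.py.

```python
import json

# Field names are taken from the commit message; all default values are assumptions.
_DEFAULTS = {
    "enable": False,
    "api_key": "",
    "base_url": "",
    "model": "",
    "enable_thinking": False,
    "max_tokens": 1024,
    "temperature": 0.0,
}


def load_image_description_config(raw_json: str):
    """Return the merged config dict, or None when the feature is off or unusable."""
    section = json.loads(raw_json).get("image-description-config") or {}
    config = {**_DEFAULTS, **section}
    if not config["enable"]:
        return None
    # Without a key, endpoint, and model name there is nothing to call.
    if not (config["api_key"] and config["base_url"] and config["model"]):
        return None
    return config
```

Returning None for a disabled or incomplete section lets callers skip the enhancement step with a single check.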
    Zhruoshui committed May 8, 2026
    c21219f
  2. fix: fix two issues in image description enhancement

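    1. _collect_image_spans did not handle the two-level nested structure: image/chart blocks generated by MagicModel
       are nested as image → blocks → image_body → lines → spans; after the fix they are traversed correctly
    2. Incompatible enable_thinking parameter: Qwen3-VL-32B-Thinking on the SiliconFlow API already has
       thinking mode built in, so the parameter is no longer passed via extra_body
    3. The enhancement count now compares the original content with the enhanced content, so only images that
       were actually replaced are counted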
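
The traversal and counting fixes can be sketched like this. It is a minimal sketch, not the repository's _collect_image_spans: the dictionary keys ("type", "blocks", "lines", "spans"), the "chart_body" sub-block name, and both function names are assumptions inferred from the nesting described above.

```python
def collect_image_spans(blocks: list[dict]) -> list[dict]:
    """Gather spans nested as image -> blocks -> image_body -> lines -> spans."""
    spans = []
    for block in blocks:
        if block.get("type") not in ("image", "chart"):
            continue
        # First nesting level: sub-blocks of the image/chart block.
        for sub in block.get("blocks", []):
            # "chart_body" is an assumed name; only "image_body" appears in the commit message.
            if sub.get("type") not in ("image_body", "chart_body"):
                continue
            # Second nesting level: lines, each holding a list of spans.
            for line in sub.get("lines", []):
                spans.extend(line.get("spans", []))
    return spans


def count_replaced(originals: list[str], enhanced: list[str]) -> int:
    """Count only the items whose content actually changed."""
    return sum(1 for old, new in zip(originals, enhanced) if old != new)
```

Comparing old and new content means an item that fell back to its original description is not counted as enhanced.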
    Zhruoshui committed May 8, 2026
    290fb52
  3. chore: remove non-core resources (web UI, docker, CI, docs, demo) and add system overview explore doc
    Zhruoshui committed May 8, 2026
    6089b9c
  4. b3c4037
  5. docs: add project README and CodeStable workflow files

    - Add README introducing the fork, embedded domain focus, and quick start
    - Add .codestable/ with architecture, requirements, features, issues docs
    - Document image semantic enhancement feature (design + acceptance)
    - Document image enhancement bug fixes (report + fix-note)
    - Document scope decision to strip non-core distribution artifacts
    - Add compound knowledge: explore, trick, and learning docs
    - Track .codestable/ in git (remove from .gitignore)
    Zhruoshui committed May 8, 2026
    b6247ee