Stars
[rulelift: a Python package for evaluating and optimizing rule-based strategies. Stars and PRs are welcome.] Rulelift is a Python toolkit designed for strategy rule effectiveness analysis and automatic rule mining.
WebMainBench is a high-precision benchmark for evaluating web main content extraction.
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
A Python package for MinerU document processing and RAG (Retrieval-Augmented Generation) knowledge base construction.
Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.
An intelligent paper-reading assistant built on MinerU, providing PDF document parsing, OCR, table extraction, and related features.
data-find-questions
SHUzhangshuo / dingo
Forked from MigoXLab/dingo
Dingo: A Comprehensive AI Data Quality Evaluation Tool
SHUzhangshuo / MinerU
Forked from opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
SHUzhangshuo / label-studio
Forked from HumanSignal/label-studio
Label Studio is a multi-type data labeling and annotation tool with standardized output format
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool

