| Create, repair, and recalculate spreadsheet workbooks without breaking formulas | 116.9k | — |
| MarkItDown Document-to-Markdown Converter by Microsoft | 93.2k | — |
| PostgreSQL MCP Server | 85.3k | — |
| SQLite MCP Server | 85.3k | — |
| Build document-grounded agent context workflows with RAGFlow | 79.8k | — |
| Use RAGFlow as a retrieval and context layer for agent workflows | 79.8k | — |
| Elasticsearch MCP | 76.5k | — |
| PaddleOCR Multilingual Document OCR and Structured Data Toolkit | 73.7k | — |
| Tesseract OCR Data Extractor | 73.6k | — |
| Tesseract OCR Document Extractor | 73.6k | — |
| Apache Superset Dashboard and SQL Exploration Skill | 72.3k | — |
| Protocol Buffer Schema Generator | 71.2k | — |
| Scrapy Spider Data Pipeline | 61.3k | — |
| MinerU PDF-to-Markdown Document Parser | 57.8k | — |
| Docling Document Parsing and Conversion | 57.8k | — |
| Docling Document Conversion and Extraction Toolkit | 57.6k | — |
| Docling Document Parsing and Conversion Toolkit | 57.6k | — |
| Docling AI Document Intelligence Pipeline | 56.9k | — |
| Pandas DataFrame Pipeline Builder | 48.5k | — |
| Pandas DataFrame Pipeline Orchestrator | 48.5k | — |
| Pandas DataFrame Schema Enforcer | 48.5k | — |
| Pandas DataFrame Schema Validator | 48.5k | — |
| Pandas Profiling Report Generator | 48.5k | — |
| ClickHouse Query Agent | 46.9k | — |
| Metabase Open Source Business Intelligence and Embedded Analytics | 46.8k | 15/wk |
| Apache Airflow MCP | 45k | — |
| Apache Spark Job Manager | 43.1k | — |
| Apache Spark DataFrame ETL Pipeline | 43.1k | — |
| Paperless-ngx Document OCR and Archive Management System | 38.1k | — |
| Polars Blazing-Fast DataFrame Query Engine | 37.9k | — |
| DuckDB SQL Analytics Agent | 37.1k | — |
| LangExtract LLM-Powered Structured Text Extraction | 35k | — |
| jq JSON Stream Transformer | 34.5k | — |
| jq Pipeline Builder Agent | 34.5k | — |
| Marker PDF-to-Markdown Converter | 33.2k | — |
| LightRAG Graph-Based Retrieval-Augmented Generation Framework | 33.2k | — |
| Apache Kafka Schema Extractor | 32.5k | — |
| Apache Kafka Stream Transformer | 32.4k | — |
| Apache Kafka Stream Processor | 32.4k | — |
| Cheerio DOM Extraction Pipeline | 30.3k | 19.6M/wk |
| Cheerio HTML and XML Parsing Library for Node.js Extraction Workflows | 30.3k | 19.6M/wk |
| Turn mixed local folders into a queryable knowledge graph with Graphify | 25.7k | — |
| Typesense Typo-Tolerant Search Engine | 25.5k | — |
| Airbyte Connector Config Generator | 21.1k | — |
| Teable No-Code Postgres Database Platform and Airtable Alternative | 21.1k | — |
| fx Terminal JSON Viewer and Processor | 20.4k | 206k/wk |
| GraphQL Data Federation Agent | 20.3k | 34.2M/wk |
| GraphQL Schema Introspection Mapper | 20.3k | 34.2M/wk |
| Surya Document OCR with Layout Analysis and Table Recognition | 19.5k | — |
| Extract structured markdown, JSON, and tagged-PDF-ready outputs from PDFs with OpenDataLoader PDF | 19.1k | — |
| gallery-dl Image Gallery and Collection Downloader | 17.5k | — |
| Convert dense PDFs into LLM-ready text and page-aligned markdown with olmOCR | 17.1k | — |
| Maxun No-Code Web Data Extraction Platform | 15.3k | — |
| Dagster Data Pipeline Orchestrator | 15.3k | — |
| yq YAML and Structured Data Processor | 15.1k | — |
| CSV Schema Validator & Auto-Fixer | 14.7k | 291.1M/wk |
| Unstructured Document ETL Toolkit | 14.5k | — |
| Unstructured Document ETL for LLM Pipelines | 14.4k | — |
| gron Greppable JSON Flattener | 14.4k | — |
| Unstructured Document Partitioning and ETL Library for LLM Pipelines | 14.4k | — |
| Gitingest Repository-to-Prompt Codebase Extraction Tool | 14.3k | — |
| Generate LLM fine-tuning, RAG, and eval datasets from source material with easy-dataset | 14k | — |
| Instructor Structured Data Extraction from LLMs | 12.7k | — |
| dbt MCP Server | 12.6k | — |
| dbt Cloud MCP | 12.6k | — |
| dbt Data Transform Orchestrator | 12.6k | — |
| dbt Data Transformation Orchestrator | 12.6k | — |
| dbt Model Dependency Analyzer | 12.6k | — |
| dbt Model Dependency Resolver | 12.6k | — |
| dbt Model Lineage & Test Coverage Checker | 12.6k | — |
| dbt Model Lineage Analyzer | 12.6k | — |
| dbt Model Lineage Extractor | 12.6k | — |
| dbt Model Lineage Mapper | 12.6k | — |
| dbt Model Transformation Architect | 12.6k | — |
| Datasette Data Exploration and Publishing Tool | 10.9k | — |
| Grist Self-Hosted Relational Spreadsheet and Database Platform | 10.8k | — |
| xsv High-Performance CSV Toolkit | 10.8k | — |
| Jina Reader URL-to-Markdown Converter and Web Search API | 10.6k | — |
| Orama Embeddable Search Engine and RAG Pipeline for JavaScript | 10.3k | — |
| pdfplumber Python PDF Text and Table Extraction Library | 10.1k | — |
| Miller CSV TSV JSON Data Processor | 9.8k | — |
| Gorse AI-Powered Open Source Recommender System Engine | 9.6k | — |
| Translate and validate SQL across dialects with SQLGlot | 9.1k | — |
| Profile and triage messy tabular files from the terminal with VisiData | 9k | — |
| WeasyPrint HTML and CSS to PDF Document Generator | 8.8k | — |
| Redpanda Connect Declarative Stream Processor | 8.6k | — |
| Normalize raw CLI output into JSON for reliable downstream parsing and automation | 8.6k | — |
| Dasel Multi-Format Data Selector and Modifier | 7.9k | — |
| Steampipe Zero-ETL SQL Cloud API Query Engine | 7.7k | — |
| Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg | 7.6k | — |
| htmlq Command-Line HTML Content Extractor with CSS Selectors | 7.5k | — |
| Migrate MySQL, SQLite, or CSV data into PostgreSQL with repeatable load files before cutover with pgloader | 6.4k | — |
| Sync cloud and SaaS inventory into SQL tables for audits with CloudQuery | 6.4k | — |
| csvkit Python CSV Utility Suite | 6.4k | — |
| Apache Camel Route Data Mapper | 6.2k | — |
| Convert DOCX documents into clean HTML for publishing workflows with Mammoth | 6.2k | — |
| Evidence BI-as-Code SQL and Markdown Analytics Framework | 6.1k | — |
| jnv Interactive JSON Navigator and jq Filter Editor | 6k | — |
| dlt Python Data Load Tool | 5.2k | — |
| ExifTool Metadata Reader and Writer for Images and Files | 4.6k | — |
| franc Natural Language Detection Library and CLI | 4.4k | — |
| Stripe Revenue Analytics Dashboard Builder | 4.4k | 9.3M/wk |
| Apache Kafka Schema Registry Extractor | 4k | 2.5M/wk |
| Apache Kafka Schema Registry Validator | 4k | 2.5M/wk |
| xan SIMD-Powered CSV Processing and Analysis CLI | 3.9k | — |
| Newsboat Terminal RSS and Atom Feed Reader | 3.8k | — |
| Inspect large CSV files interactively before cleanup, mapping, or downstream transforms with csvlens | 3.7k | 56.9k/wk |
| Turn messy document collections into structured rows with DocETL | 3.7k | — |
| Apache Tika Content Extraction Hub | 3.7k | — |
| Apache Tika Document Parser | 3.7k | — |
| Apache Tika Document Parser Agent | 3.7k | — |
| Apache Tika Document Extractor | 3.7k | — |
| Camelot Advanced PDF Table Intelligence | 3.7k | — |
| Camelot PDF Stream Parser | 3.7k | — |
| PDF Table Extraction with Camelot | 3.7k | — |
| Profile and clean large CSV datasets from the terminal with qsv | 3.6k | — |
| qsv Blazing-Fast CSV Data Wrangling Toolkit | 3.6k | — |
| Ingestr Cross-Database Data Copier | 3.4k | — |
| Apache Avro Schema Evolution Agent | 3.3k | — |
| JSON-to-Avro Schema Transformer | 3.3k | — |
| Plan and preview warehouse SQL model changes before rollout with SQLMesh | 3k | — |
| Postgres MCP Pro | 2.7k | — |
| Diff nested JSON, API responses, and config snapshots before approving changes | 2.5k | — |
| Meltano Declarative ELT Data Integration Engine | 2.4k | — |
| Enrich Paperless-ngx documents with AI-generated titles tags and correspondents using paperless-gpt | 2.3k | — |
| rehype Plugin-Based HTML Processor by the Unified Collective | 2.2k | — |
| trdsql SQL Query Engine for CSV JSON and YAML Files | 2.2k | — |
| Extract invoice fields from vendor PDFs into structured records | 2.1k | — |
| markdownify Python HTML to Markdown Conversion Library | 2.1k | — |
| sqlite-utils Python CLI for SQLite Database Manipulation | 2k | — |
| Tabula PDF Table Extraction Agent | 2k | — |
| Tabula PDF Table Extractor | 2k | — |
| Query and rewrite Markdown structure with mdq | 1.7k | — |
| Anyquery Universal SQL Engine with MCP Integration | 1.7k | — |
| Repair, split, merge, and normalize PDFs with qpdf before downstream processing | 1.5k | — |
| Documind AI-Powered Structured Data Extraction from Documents | 1.5k | 14/wk |
| Salesforce Bulk API Data Loader | 1.5k | 936.6k/wk |
| Infer And Normalize Broken CSV Dialects Before Import With Clevercsv | 1.3k | — |
| Export Obsidian vaults into clean Markdown trees for publishing or downstream processing | 1.3k | — |
| xq Command-Line XML and HTML Beautifier and Content Extractor | 1.1k | — |
| Extract structured fields from HTML XML and JSON endpoints with Xidel selectors | 835 | — |
| Give agents governed semantic data context with Wren Engine | 661 | — |
| dbt MCP Server for Data Pipeline Context | 526 | — |
| Compare dbt models and warehouse relations before trusting migration parity with dbt-audit-helper | 402 | — |
| Parquet Column Mapper | 387 | 170.7k/wk |
| Parquet Column Pruning Optimizer | 387 | 170.7k/wk |
| Parquet Column Statistics Profiler | 387 | 170.7k/wk |
| Parquet Schema Extractor for S3 | 387 | 170.7k/wk |
| Operate Airflow and warehouse workflows through agent-safe data engineering skills with Astronomer Agents | 337 | — |
| Compare recurring CSV, TSV, or JSON exports and emit row-level change sets before syncs | 330 | — |
| Weaviate MCP Server | 161 | — |
| Turn documents into validated knowledge graphs with Docling Graph | 134 | — |
| Crawl4AI MCP Server | 84 | — |
| Turn captured WARC pages into clean text and language-tagged records with warc2text | 23 | — |
| Search large PDFs and read only the relevant pages before answering | 17 | — |
| Process, redact, OCR, and sign documents with Nutrient Agent Skill | 5 | — |
| Convert HTML emails and web fragments into clean plain text for downstream agents | — | 8.2M/wk |
| Metabase Dashboard Snapshot & Alerting | — | — |
| Parquet to PostgreSQL Loader | — | — |
| QuickBooks Online Invoice Reconciliation Agent | — | — |
| Reddit Subreddit Sentiment Tracker | — | — |
| Snowflake MCP | — | — |
| Snowflake MCP Server | — | — |
| Snowflake Query History Extractor | — | — |
| Snowflake Query Optimizer Agent | — | — |
| Snowflake Query Profiler | — | — |
| Weights & Biases Run Monitor | — | — |
| XML XSLT Transform Pipeline | — | — |