⚖️ Legal Ops & Compliance

Contract workflows, forms, document review, archive search, and evidence-oriented legal and compliance support.

Who this is for

  • Legal operations, compliance, and records teams that need document intake, review, redaction, and archive workflows.
  • Teams preparing evidence packets where provenance and review boundaries matter.

Jobs covered

  • Convert scanned PDFs and office files into searchable text.
  • Extract clauses, tables, attachments, and metadata from mixed records.
  • Run cited research and matter knowledge retrieval with source boundaries.
  • Build diligence review tables and route higher-risk agent actions through approval gates.
  • Redact sensitive data before sharing or indexing.
  • Search large archives before manual review.
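As a rough illustration of the "redact before sharing or indexing" job, the sketch below replaces a few identifier shapes with placeholders and reports how many of each were found. It uses only the Python standard library and is not the API of scrubadub or any other pick in this collection; the `PATTERNS` table, the `redact` function, and the `{{LABEL}}` placeholder format are assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration only. Real PII detection needs a
# vetted tool plus human review; these regexes will miss many cases.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact(text):
    """Replace each match with a {{LABEL}} placeholder and count replacements."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the rewritten text and the number of substitutions.
        text, n = pattern.subn("{{" + label + "}}", text)
        counts[label] = n
    return text, counts


if __name__ == "__main__":
    sample = "Contact jane.doe@example.com or 555-123-4567; SSN 123-45-6789."
    clean, counts = redact(sample)
    print(clean)
    print(counts)
```

Returning the counts alongside the text gives a reviewer a quick signal of how much was removed from each document, which can be stored with the review notes.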

Workflow Stacks

  • Document review packet: OCR → extract text and tables → redact PII → search archive → export review notes
  • Signing and forms: Prepare PDF forms → route signature → store final packet → index metadata
  • Research and diligence support: Search cited sources → ingest matter documents → extract review-table fields → gate external actions → preserve decision evidence
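The last two steps of the research and diligence stack ("gate external actions → preserve decision evidence") can be sketched in a few lines of standard-library Python. This is a minimal illustration and not how DashClaw or any listed tool works: the `DecisionLog` class, the `gate` function, and the `EXTERNAL_ACTIONS` names are all hypothetical. Each log entry stores the hash of the previous entry, so later edits to a recorded decision become detectable.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class DecisionLog:
    """Append-only log where each entry carries the hash of the previous one."""
    entries: list = field(default_factory=list)

    @staticmethod
    def _digest(prev_hash, record):
        body = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        entry = {"record": record, "prev": prev_hash,
                 "hash": self._digest(prev_hash, record)}
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; any edited record breaks it."""
        prev_hash = ""
        for entry in self.entries:
            expected = self._digest(prev_hash, entry["record"])
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


# Hypothetical action names treated as higher risk for this example.
EXTERNAL_ACTIONS = {"send_email", "file_submission", "share_externally"}


def gate(action, approved_by, log):
    """Allow internal actions; allow external ones only with a named approver."""
    needs_approval = action in EXTERNAL_ACTIONS
    allowed = (not needs_approval) or bool(approved_by)
    log.append({"action": action, "needs_approval": needs_approval,
                "approved_by": approved_by, "allowed": allowed})
    return allowed
```

Every call is logged whether or not it is allowed, so the evidence trail also records refused actions. A production log would likely add timestamps and durable storage; both are omitted here to keep the sketch deterministic.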

Recommended Picks

| Skill | What it does here | Persona | Install | Stars |
|---|---|---|---|---|
| Documenso Open Source Document Signing Platform | Adds an auditable signing path for contract and approval packets. | Legal ops / contract admin | High | 12.6k |
| DocuSeal Open Source Document Signing and PDF Form Platform | Combines PDF form preparation and signatures for document-heavy approval flows. | Legal ops / forms administrator | Medium | 11.7k |
| OCRmyPDF Searchable PDF OCR Pipeline | Turns scanned evidence and records into searchable PDFs before review. | Records manager / compliance analyst | Medium | 33.2k |
| Apache Tika Document Extractor | Provides broad-format document extraction when matter files include Office docs, PDFs, and attachments. | eDiscovery engineer / records ops | High | 3.7k |
| Apache Tika Document Parser | Extracts metadata and embedded objects from heterogeneous files for archive triage. | Compliance engineer / archive specialist | High | 3.7k |
| Extract structured text, metadata, tables, and images from mixed documents through an MCP server with Kreuzberg | Adds an MCP-accessible extraction layer for PDFs, Office files, images, HTML, and other mixed matter inputs before review or indexing. | Matter knowledge engineer / eDiscovery ops | High | 7.6k |
| pdfplumber Python PDF Text and Table Extraction Library | Pulls tables, text, and layout clues from contract exhibits and regulatory PDFs. | Legal analyst / data wrangler | Medium | 10.1k |
| Parse local PDFs into agent-ready text, JSON, and screenshots with LiteParse | Creates text, spatial JSON, and screenshots so reviewers can inspect what an agent saw. | Document review lead / AI ops | Medium | 5.1k |
| Search PDFs, Office files, ebooks, and archives with one query before manual review | Finds relevant records across mixed archives before humans spend time opening files one by one. | Investigator / records analyst | Low | 9.6k |
| Paperless-ngx Document OCR and Archive Management System | Provides a durable archive system for scanned paperwork, tags, correspondents, and retrieval. | Compliance ops / records manager | High | 38.1k |
| LangExtract LLM-Powered Structured Text Extraction | Extracts named entities, obligations, dates, and clauses into auditable structured outputs. | Legal analyst / compliance reviewer | Medium | 35k |
| Turn messy document collections into structured rows with DocETL | Turns large contract, diligence, or evidence sets into repeatable structured rows with failure review across the corpus. | Diligence lead / legal data analyst | High | 3.7k |
| Redact PII from text before sharing or indexing with scrubadub | Redacts sensitive identifiers before content enters search, summarization, or external review. | Privacy analyst / compliance ops | Low | 421 |
| Search large PDFs and read only the relevant pages before answering | Limits review to relevant pages of long PDFs instead of pushing full documents through an agent. | Legal researcher / review analyst | Medium | 17 |
| Run local deep research workflows with Local Deep Research | Runs private cited research across web, academic, and local document sources while preserving source links and a controlled knowledge base. | Legal researcher / knowledge manager | High | 7.9k |
| Process, redact, OCR, and sign documents with Nutrient Agent Skill | Bundles OCR, redaction, form filling, conversion, and signing for governed document operations. | Document automation lead | High | 5 |
| Convert dense PDFs into LLM-ready text and page-aligned markdown with olmOCR | Converts dense scanned or layout-heavy PDFs into page-aligned text for cited review. | eDiscovery analyst / knowledge engineer | High | 17.1k |
| Turn documents into validated knowledge graphs with Docling Graph | Extracts schema-checked entities and relationships when matters need structured fact maps. | Knowledge engineer / compliance analyst | High | 134 |
| Use RAGFlow as a retrieval and context layer for agent workflows | Provides a supervised RAG layer for matter document knowledge bases with traceable source support before agent answers are reviewed. | Matter knowledge manager / legal AI ops | High | 79.8k |
| Extract structured markdown, JSON, and tagged-PDF-ready outputs from PDFs with OpenDataLoader PDF | Produces markdown, coordinate-aware JSON, and accessibility-oriented outputs from PDF packets. | Document processing engineer | High | 19.1k |
| Enrich Paperless-ngx documents with AI-generated titles tags and correspondents using paperless-gpt | Improves archive metadata after ingestion so humans can search and route records faster. | Records manager / knowledge ops | High | 2.3k |
| Capture a live webpage as a clean PDF or readable archive for offline review with Percollate | Preserves web evidence as readable offline artifacts for citation and handoff. | Investigator / compliance analyst | Low | 4.6k |
| Extract structured data and attachments from raw email with MailParser | Normalizes raw email evidence and attachments before archive search or review. | Legal ops / mailbox reviewer | Medium | 1.7k |
| Strip quoted email history and signatures before summarizing inbound replies | Separates the newest human reply from long threads so summaries do not duplicate history. | Case manager / legal assistant | Low | 78 |
| Load .mbox mail archives into SQLite for offline search, audits, and dataset joins | Turns mailbox archives into queryable SQLite evidence stores for offline audit work. | Investigator / data analyst | Medium | 39 |
| MarkItDown Document-to-Markdown Converter by Microsoft | Converts Office files, PDFs, email-like documents, and other matter inputs into Markdown for review packets and audit summaries. | Legal ops analyst / compliance reviewer | Medium | 93.2k |
| MinerU PDF-to-Markdown Document Parser | Handles complex PDFs with layout-aware Markdown and JSON output for contract packets, exhibits, and long-form compliance evidence. | eDiscovery engineer / records analyst | High | 57.8k |
| Put approval gates and audit-ready policy checks between agents and external actions with DashClaw | Adds approval gates and replayable decision evidence when legal AI workflows need human review before external actions. | Legal AI governance lead / compliance ops | High | 241 |
| Extract OCR-ready Markdown from documents with Zerox | Turns scanned contracts, exhibits, and evidence PDFs into reviewable Markdown before legal search or redaction workflows. | Legal ops / document review lead | Medium | 12.2k |
| Build managed document parsing pipelines with LlamaCloud Services | Provides a managed parsing path for high-volume contract and policy packets where local parsers are too brittle. | Legal ops automation engineer | Medium | 4.2k |
| Prepare agent-ready PDF and document extraction with PyMuPDF | Gives legal ops a fast source-backed PDF extraction step before contract review, diligence packets, or clause analysis. | Legal ops analyst / contract reviewer | Medium | 10.1k |
| Convert complex PDFs and document images into agent-ready Markdown with OCRFlux | Turns scanned or layout-heavy legal documents into Markdown that downstream review agents can cite and inspect. | Legal document intake specialist | High | 2.5k |
| Parse agent-ready PDFs and document images with MonkeyOCR | Adds an OCR path for mixed legal PDFs and document images before matter summaries or compliance checks. | Paralegal / compliance intake operator | High | 6.6k |
| Evaluate document parsers for agent ingestion with ParseBench | Lets teams compare extraction quality before standardizing a parser for legal review workflows. | Legal automation lead / document QA | Medium | 474 |
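
To make the "mbox archive into SQLite" idea from the picks above concrete, here is a minimal sketch that uses only Python's standard `mailbox` and `sqlite3` modules. It is not the implementation of the listed skill; the `load_mbox` function, the `messages` table, and its column names are assumptions chosen for this example, and attachments and non-plain-text parts are ignored.

```python
import mailbox
import sqlite3


def _header(msg, name):
    """Return a header as plain text, or None when it is absent."""
    value = msg[name]
    return None if value is None else str(value)


def load_mbox(mbox_path, db_path):
    """Copy headers and the first plain-text body of each message into SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "message_id TEXT, sender TEXT, recipient TEXT, "
        "subject TEXT, sent TEXT, body TEXT)"
    )
    count = 0
    # create=False raises an error instead of silently creating an empty file.
    for msg in mailbox.mbox(mbox_path, create=False):
        body = ""
        for part in msg.walk():
            if part.get_content_type() == "text/plain":
                payload = part.get_payload(decode=True) or b""
                charset = part.get_content_charset() or "utf-8"
                body = payload.decode(charset, errors="replace")
                break
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
            (_header(msg, "Message-ID"), _header(msg, "From"),
             _header(msg, "To"), _header(msg, "Subject"),
             _header(msg, "Date"), body),
        )
        count += 1
    conn.commit()
    conn.close()
    return count
```

Once loaded, the archive can be filtered with ordinary SQL, for example `SELECT sender, subject FROM messages WHERE subject LIKE '%exhibit%'`, and joined against other review tables without opening a mail client.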

Editorial Notes

  • The collection avoids legal-advice framing; these are intake, evidence, and operations tools.
  • Document-centric entries are favored over general security scanners unless they support compliance evidence work directly.
  • Research and RAG picks are framed as source-grounded support for legal operations and human review, not automated legal advice.
  • Do not let infra-policy scanners take over this collection. Keep v1 document-centric.

Adjacent Collections

