privacy-filter-service

Local HTTP service that runs the OpenAI Privacy Filter model to detect PII in text and replace it with category-specific placeholders before you send it to an LLM.

This is a self-hostable, pure-Rust inference engine for the model weights published by OpenAI at openai/privacy-filter on Hugging Face. Inference runs locally: no API keys or external services are needed once the model files have been downloaded.

POST /anonymize  {"texts": ["Email me at jo@acme.com"]}
  -> {"results": [{"redacted": "Email me at [PRIVATE_EMAIL]",
                   "spans": [{"category":"private_email","start":12,"end":23,
                              "text":"jo@acme.com","placeholder":"[PRIVATE_EMAIL]"}]}]}
GET  /health -> ok

Quick start

Docker (recommended)

docker run -p 10123:10123 -v "$PWD/model:/model" \
  ghcr.io/potential-ly/privacy-filter-service:latest

Mount /model to persistent storage. The model files are ~2.7 GB on disk. If you omit the -v flag, they are downloaded into the container's ephemeral overlay filesystem and lost whenever the container is removed or recreated (image update, pod reschedule, Kubernetes rolling update), which forces a fresh download each time.

The first time you run it with an empty ./model directory, the service will automatically download the required model files from the openai/privacy-filter repository. Subsequent starts reuse the cached files.

Build from source

cargo build --release
PF_MODEL_DIR=./model ./target/release/privacy-filter-service

Request examples

POST /anonymize accepts a JSON body with a required texts array and two optional fields, output_mode and discard_overlapping_spans:

Field	Required	Default	Description
`texts`	yes	—	Array of strings to redact
`output_mode`	no	`"typed"`	`"typed"` (category-specific placeholders) or `"redacted"` (all spans collapsed to `[REDACTED]`)
`discard_overlapping_spans`	no	`false`	Drop overlapping spans independently within each label

output_mode: "typed" (default)

Replaces each detected span with a category-specific placeholder like [PRIVATE_PERSON]. The spans array in the response tells you exactly what was found and where.

curl -s localhost:10123/anonymize \
  -H 'Content-Type: application/json' \
  -d '{"texts": ["Contact Alice at alice@example.com"]}'

{
  "results": [{
    "redacted": "Contact [PRIVATE_PERSON] at [PRIVATE_EMAIL]",
    "spans": [
      {"category":"private_person","start":8,"end":13,
       "text":"Alice","placeholder":"[PRIVATE_PERSON]"},
      {"category":"private_email","start":17,"end":34,
       "text":"alice@example.com","placeholder":"[PRIVATE_EMAIL]"}
    ]
  }]
}
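
To make the relationship between spans and the redacted string concrete, here is a minimal Python sketch that rebuilds the redacted text from the original text and the returned spans. It reuses the single-span example from the top of this README. The helper name apply_spans is hypothetical, and the sketch assumes that start and end are half-open offsets (end exclusive) that index the string the same way Python does, which holds for ASCII input but has not been checked here for multi-byte characters. It also assumes spans do not overlap.

```python
def apply_spans(text: str, spans: list[dict]) -> str:
    """Rebuild the redacted string by substituting each span's placeholder.

    Assumption: offsets are half-open [start, end) and spans do not overlap.
    """
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        pieces.append(text[cursor:span["start"]])  # untouched text before the span
        pieces.append(span["placeholder"])         # replacement for the span itself
        cursor = span["end"]                       # end is exclusive
    pieces.append(text[cursor:])                   # trailing text after the last span
    return "".join(pieces)


text = "Email me at jo@acme.com"
spans = [
    {"category": "private_email", "start": 12, "end": 23,
     "text": "jo@acme.com", "placeholder": "[PRIVATE_EMAIL]"},
]

# The offsets slice out exactly the detected text.
assert text[12:23] == "jo@acme.com"
print(apply_spans(text, spans))  # Email me at [PRIVATE_EMAIL]
```

A check like this is a cheap way to confirm on your own inputs that the offsets mean what you expect before you rely on them for highlighting or audit logs.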

output_mode: "redacted"

Collapses every detected span to the same [REDACTED] placeholder. Useful when you only care that something was hidden, not what category it was.

curl -s localhost:10123/anonymize \
  -H 'Content-Type: application/json' \
  -d '{"texts": ["Contact Alice at alice@example.com"], "output_mode": "redacted"}'

{
  "results": [{
    "redacted": "Contact [REDACTED] at [REDACTED]",
    "spans": [
      {"category":"private_person","start":8,"end":13,
       "text":"Alice","placeholder":"[REDACTED]"},
      {"category":"private_email","start":17,"end":34,
       "text":"alice@example.com","placeholder":"[REDACTED]"}
    ]
  }]
}
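
The same requests can be sent from Python using only the standard library. The following is a sketch, not an official client: the function names build_request and anonymize are hypothetical, and the base URL assumes the default port shown in this README. Only the texts and output_mode fields documented above are sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:10123"  # assumption: default port from this README


def build_request(texts, output_mode="typed", base_url=BASE_URL):
    """Build the POST /anonymize request without sending it."""
    if output_mode not in ("typed", "redacted"):
        raise ValueError(f"unsupported output_mode: {output_mode!r}")
    body = json.dumps({"texts": list(texts), "output_mode": output_mode})
    return urllib.request.Request(
        f"{base_url}/anonymize",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def anonymize(texts, output_mode="typed", base_url=BASE_URL, timeout=30.0):
    """Send the texts to the service and return the redacted strings in order."""
    req = build_request(texts, output_mode, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return [result["redacted"] for result in payload["results"]]
```

With the service running, anonymize(["Contact Alice at alice@example.com"], output_mode="redacted") should return a one-element list containing the redacted string. Splitting request construction from sending keeps the payload logic testable without a running server.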

Configuration

Environment variables:

Variable	Default	Description
`PF_MODEL_DIR`	`./model`	Directory for model files
`PF_BIND`	`0.0.0.0:10123`	HTTP listen address
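
Because the first start may spend a while downloading model files, a client or deploy script may want to wait for GET /health before sending traffic. The sketch below is one way to do that in Python under stated assumptions: bind_to_base_url and wait_until_healthy are hypothetical helper names, the PF_BIND value is assumed to be an IPv4 host:port pair, and whether /health responds before the model has finished loading is not specified in this README.

```python
import time
import urllib.error
import urllib.request


def bind_to_base_url(bind: str) -> str:
    """Turn a PF_BIND-style 'host:port' value into a URL a local client can use."""
    host, _, port = bind.rpartition(":")
    if host in ("", "0.0.0.0"):
        host = "127.0.0.1"  # the wildcard address is not a connect target
    return f"http://{host}:{port}"


def wait_until_healthy(base_url: str, attempts: int = 30, delay: float = 1.0) -> bool:
    """Poll GET /health until it returns HTTP 200 or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # service not reachable yet; retry after the delay
        time.sleep(delay)
    return False
```

For example, wait_until_healthy(bind_to_base_url("0.0.0.0:10123")) polls http://127.0.0.1:10123/health once per second for up to 30 attempts.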

Model files

The service expects these files in PF_MODEL_DIR:

File	Purpose
`config.json`	Label taxonomy and model dimensions
`tokenizer.json`	HuggingFace tokenizer
`tokenizer_config.json`	Tokenizer metadata
`model.safetensors`	Model weights (~2.7 GB on disk, ~5.4 GB in memory)
`viterbi_calibration.json`	Optional CRF transition biases

Anything missing on startup is fetched from https://huggingface.co/openai/privacy-filter/resolve/main/.
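
If you want to know ahead of time what a start would download (for example, before deploying to a host without internet access), a small script can compare a directory against the table above. This is a sketch: missing_files and download_urls are hypothetical names, and it assumes each file is fetched from the base URL above with the file name appended. Note that it lists viterbi_calibration.json as missing even though the table marks it optional.

```python
from pathlib import Path

# Base URL and file names taken from this README.
HF_BASE = "https://huggingface.co/openai/privacy-filter/resolve/main/"
MODEL_FILES = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "model.safetensors",
    "viterbi_calibration.json",  # optional according to the table above
]


def missing_files(model_dir) -> list[str]:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in MODEL_FILES if not (root / name).is_file()]


def download_urls(model_dir) -> dict[str, str]:
    """Map each missing file to the URL it would presumably be fetched from."""
    return {name: HF_BASE + name for name in missing_files(model_dir)}
```

Running download_urls("./model") on an empty directory yields one URL per file, which you can fetch on a connected machine and copy into PF_MODEL_DIR.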

Tests

# Unit tests (no model weights required)
cargo test --release --lib

# Python-parity integration tests (require model weights)
PF_TEST_MODEL_DIR=./model cargo test --release --test python_parity

# Batched-vs-single parity
PF_TEST_MODEL_DIR=./model cargo test --release --test batched_parity

# Criterion end-to-end benchmark
PF_TEST_MODEL_DIR=./model cargo bench

Tests that need weights skip automatically if the model files are missing, so a fresh checkout without weights still passes cargo test.

With Nix

# Local dev
nix develop
cargo run --release

# Release build
nix build
./result/bin/privacy-filter-service

# Container (layered)
nix build .#dockerImage
docker load < result
docker run -p 10123:10123 -v "$PWD/model:/model" privacy-filter-service:latest

Caveats inherited from the OpenAI Privacy Filter model

This is a redaction / data-minimization aid, not an anonymization or compliance guarantee. Quasi-identifiers ("the only nurse in our village clinic") are not caught by any NER model.
It tends to over-redact very short strings and may miss unusual or internal identifiers.

Architecture details

For implementation details — model structure, numerical parity, inference engine internals, and the full list of tuning knobs — see ARCHITECTURE.md.
