Implement local compatibility APIs by CobraSoftware · Pull Request #59 · jjang-ai/vmlx

CobraSoftware · 2026-04-09T17:21:40Z

Overview

This is designed to add more API endpoints for broader compatibility.
It was heavily assisted by AI.

Summary

This PR expands the local compatibility surface for the MLX-backed server while keeping inference local-only.

Main additions:

OpenAI compatibility improvements, including broader response/resource coverage and realtime session support
Anthropic and Ollama compatibility validation
LM Studio native API support under /lmstudio/v1/*
Deepgram-compatible local APIs under /deepgram/vl/*
Model-family detection for Qwen3 Omni and Voxtral realtime variants
Additional local implementations for practical OpenAI-style resource APIs such as files and persisted responses retrieval

What Changed

Added LM Studio endpoints:
- /lmstudio/v1/models
- /lmstudio/v1/models/load
- /lmstudio/v1/models/unload
- /lmstudio/v1/models/download
- /lmstudio/v1/models/download/status
- /lmstudio/v1/chat
Added Deepgram endpoints:
- /deepgram/vl/listen
- /deepgram/vl/speak
- /deepgram/vl/read
- /deepgram/vl/models
- /deepgram/vl/models/{model_id}
Expanded OpenAI-compatible surface:
- Realtime session endpoints
- Realtime client secrets
- Realtime transcription sessions
- Local files API
- Local responses/{id} retrieval, delete, and input item listing
- Broader spec alias coverage
- Local-only 501 not_implemented_local fallback responses for remaining unsupported cloud paths
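As an illustration of the local-only fallback described above, here is a minimal sketch of how a 501 response could be built. Only the 501 status and the `not_implemented_local` identifier come from this PR; the function name, the OpenAI-style `error` envelope, and the example path are assumptions made for illustration and may not match the actual handler in vmlx_engine/server.py.

```python
from typing import Any, Dict, Tuple


def not_implemented_local(path: str) -> Tuple[int, Dict[str, Any]]:
    """Return (status, body) for a cloud-only path the local server does not serve.

    Hypothetical helper: the body layout mirrors the common OpenAI-style
    {"error": {...}} envelope, which is an assumption, not the PR's code.
    """
    body = {
        "error": {
            "message": f"{path} is not implemented by this local server.",
            "type": "not_implemented_local",
            "code": "not_implemented_local",
        }
    }
    return 501, body


# Example path chosen only for illustration.
status, body = not_implemented_local("/v1/fine_tuning/jobs")
```

Returning an explicit 501 with a stable error type lets a client distinguish "this route exists in the cloud API but is not served locally" from a plain 404.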
Improved auth compatibility:
- Authorization: Bearer <key>
- Authorization: Token <key>
- x-api-key
- api-key
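The four accepted header forms can be handled by one small lookup. The sketch below is not the PR's implementation: the function name `extract_api_key` is hypothetical, and the precedence (Authorization first, then x-api-key, then api-key) is an assumption, since the PR does not state which header wins when several are present.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from any accepted header form, or None if absent."""
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # "Authorization: Bearer <key>" and "Authorization: Token <key>".
    authorization = lowered.get("authorization", "")
    scheme, _, credential = authorization.partition(" ")
    if scheme.lower() in ("bearer", "token") and credential.strip():
        return credential.strip()

    # Key-only headers, checked in an assumed order of precedence.
    for name in ("x-api-key", "api-key"):
        if lowered.get(name):
            return lowered[name]
    return None
```

For example, `extract_api_key({"Authorization": "Token abc"})` and `extract_api_key({"X-API-Key": "abc"})` both return `"abc"`, while an unsupported scheme such as `Basic` yields `None`.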

Testing

I tested compilation with python3 -m py_compile vmlx_engine/server.py tests/test_deepgram_api.py tests/test_lmstudio_api.py tests/test_openai_spec_surface.py tests/test_realtime_compat.py and did get all tests to pass, but I lack the hardware to do larger, in-depth real-world testing with bigger models.
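The same compile check can be scripted with the standard-library `py_compile` module, which is what the command above invokes. This is a minimal sketch; the helper name `check_compiles` is hypothetical and not part of the PR.

```python
import py_compile
from typing import Dict, Iterable, Optional


def check_compiles(paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Byte-compile each file; map path -> None on success, else the error text."""
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(path, doraise=True)
            results[path] = None
        except py_compile.PyCompileError as exc:
            results[path] = exc.msg
    return results
```

A missing file raises `FileNotFoundError`, which this sketch deliberately does not catch. Note that a clean compile only confirms the files parse; it says nothing about runtime behavior, which is why the test suite and real-model runs still matter.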

jjang-ai · 2026-05-21T10:46:11Z

Thanks for the compatibility API work. I am deferring this from the current release-hardening lane.

It is a large API surface PR, currently conflicts with main, and needs a separate current-source design/test pass before merge. Right now I am prioritizing targeted fixes and correctness regressions.

jjang-ai · 2026-05-22T03:17:35Z

Reviewed for the current release-hardening pass. I am not direct-merging this PR into the immediate release branch because the diff is very broad (server.py + multiple compatibility APIs) and includes LM Studio, Deepgram, realtime, local files, and persisted response/resource endpoints that are outside the current release-fix scope.

What current source already covers and I verified today:
- OpenAI Chat Completions sampling/default propagation;
- OpenAI Responses sampling/default propagation and output-cap behavior;
- Anthropic adapter bundle-default behavior;
- Ollama gateway request translation / streaming done behavior;
- panel request-builder sampling and output overrides;
- API/cache interaction gates for DSV4 native cache, ZAYA typed CCA, hybrid SSM, TQ KV, disk roundtrip, and DSV4/DSML tool parsing.

I also found and fixed a current release-gate harness regression while reviewing this PR: tests/cross_matrix/run_api_surface_contract.py and run_cache_architecture_contract.py referenced a missing tests/cross_matrix/run_noheavy_api_cache_contract.py. Restored that runner and pushed commit 4df8b3b (test: restore API cache contract runner).

Verification after the runner fix:
- run_noheavy_api_cache_contract.py -> status=pass (15 API, 5 scheduler-cache, 32 TQ/MLLM-cache, 21 DSV4/DSML tool tests)
- run_api_surface_contract.py -> status=pass
- run_cache_architecture_contract.py -> status=pass
- API parity/history focused pytest -> 38 passed

Leaving this PR open as a future broader compatibility-API feature lane. Credit to @CobraSoftware for the local compatibility API direction; pieces should be split/landed with isolated endpoint specs and tests rather than merged wholesale into this release.

CobraSoftware · 2026-06-01T03:19:43Z

Thanks. I am happy to continue working on this. If you have a priority list, let me know; otherwise I will open a PR for one change at a time.

CobraSoftware added 3 commits April 9, 2026 10:19

Implement local compatibility APIs and review artifacts

095c690

Stop tracking tmp artifacts

f48f5b4

Fix test regressions in server and cache compatibility

a46e799

Implement local compatibility APIs #59
CobraSoftware wants to merge 3 commits into jjang-ai:main from CobraSoftware:api-work
