Implement local compatibility APIs #59

Open
CobraSoftware wants to merge 3 commits into jjang-ai:main from CobraSoftware:api-work

@CobraSoftware commented Apr 9, 2026

Overview

This PR is designed to add more API endpoints for broader compatibility.
It was heavily assisted by AI.

Summary

This PR expands the local compatibility surface for the MLX-backed server while keeping inference local-only.

Main additions:

  • OpenAI compatibility improvements, including broader response/resource coverage and realtime session support
  • Anthropic and Ollama compatibility validation
  • LM Studio native API support under /lmstudio/v1/*
  • Deepgram-compatible local APIs under /deepgram/vl/*
  • Model-family detection for Qwen3 Omni and Voxtral realtime variants
  • Additional local implementations for practical OpenAI-style resource APIs such as files and persisted responses retrieval

What Changed

  • Added LM Studio endpoints:

    • /lmstudio/v1/models
    • /lmstudio/v1/models/load
    • /lmstudio/v1/models/unload
    • /lmstudio/v1/models/download
    • /lmstudio/v1/models/download/status
    • /lmstudio/v1/chat
  • Added Deepgram endpoints:

    • /deepgram/vl/listen
    • /deepgram/vl/speak
    • /deepgram/vl/read
    • /deepgram/vl/models
    • /deepgram/vl/models/{model_id}
  • Expanded OpenAI-compatible surface:

    • Realtime session endpoints
    • Realtime client secrets
    • Realtime transcription sessions
    • Local files API
    • Local responses/{id} retrieval, delete, and input item listing
    • Broader spec alias coverage
    • Local-only 501 not_implemented_local fallback responses for remaining unsupported cloud paths
  • Improved auth compatibility:

    • Authorization: Bearer <key>
    • Authorization: Token <key>
    • x-api-key
    • api-key
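
The four accepted credential forms listed above could be resolved by a single helper along these lines. This is a minimal sketch, not the code in this PR: the function name `extract_api_key` and the precedence order (the `Authorization` header first, then `x-api-key`, then `api-key`) are assumptions made for illustration.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the API key from any of the four accepted header forms.

    Hypothetical helper: the name and the precedence order are assumptions,
    not taken from the PR's server.py.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # Authorization: Bearer <key>  or  Authorization: Token <key>
    scheme, _, credential = lowered.get("authorization", "").partition(" ")
    if scheme.lower() in ("bearer", "token") and credential.strip():
        return credential.strip()

    # Header-only forms: x-api-key, then api-key.
    for name in ("x-api-key", "api-key"):
        if lowered.get(name):
            return lowered[name]

    return None
```

Normalizing header names once keeps the lookup independent of how a given client capitalizes them; an unrecognized scheme such as `Basic` falls through to the header-only forms and yields `None` if neither is present.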

Testing

I tested compilation with python3 -m py_compile vmlx_engine/server.py tests/test_deepgram_api.py tests/test_lmstudio_api.py tests/test_openai_spec_surface.py tests/test_realtime_compat.py and got all tests to pass, but I lack the hardware to do larger, in-depth real-world testing with bigger models.
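
For reviewers who want to repeat the compile check from a script, the same step can be sketched with the standard-library `py_compile` module. The helper name `compile_check` is hypothetical; the file list in the usage comment is the one given in the command above. Note that a compile check only catches syntax errors, not failing tests.

```python
import py_compile
from typing import Iterable, List, Tuple


def compile_check(paths: Iterable[str]) -> List[Tuple[str, str]]:
    """Byte-compile each file and return (path, error message) for failures.

    Hypothetical helper. As with `python3 -m py_compile`, successful
    compilation writes bytecode into a __pycache__ directory.
    """
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(path, doraise=True)
        except (py_compile.PyCompileError, OSError) as exc:
            # OSError covers missing or unreadable files.
            failures.append((path, str(exc)))
    return failures


# Usage, run from the repository root:
# compile_check([
#     "vmlx_engine/server.py",
#     "tests/test_deepgram_api.py",
#     "tests/test_lmstudio_api.py",
#     "tests/test_openai_spec_surface.py",
#     "tests/test_realtime_compat.py",
# ])
```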

@jjang-ai

Owner

Thanks for the compatibility API work. I am deferring this from the current release-hardening lane.

It is a large API surface PR, currently conflicts with main, and needs a separate current-source design/test pass before merge. Right now I am prioritizing targeted fixes and correctness regressions.

@jjang-ai

Owner

Reviewed for the current release-hardening pass. I am not direct-merging this PR into the immediate release branch because the diff is very broad (server.py + multiple compatibility APIs) and includes LM Studio, Deepgram, realtime, local files, and persisted response/resource endpoints that are outside the current release-fix scope.

What current source already covers and I verified today:

  • OpenAI Chat Completions sampling/default propagation
  • OpenAI Responses sampling/default propagation and output-cap behavior
  • Anthropic adapter bundle-default behavior
  • Ollama gateway request translation / streaming done behavior
  • panel request-builder sampling and output overrides
  • API/cache interaction gates for DSV4 native cache, ZAYA typed CCA, hybrid SSM, TQ KV, disk roundtrip, and DSV4/DSML tool parsing

I also found and fixed a current release-gate harness regression while reviewing this PR: tests/cross_matrix/run_api_surface_contract.py and run_cache_architecture_contract.py referenced a missing tests/cross_matrix/run_noheavy_api_cache_contract.py. Restored that runner and pushed commit 4df8b3b (test: restore API cache contract runner).

Verification after the runner fix:

  • run_noheavy_api_cache_contract.py -> status=pass (15 API, 5 scheduler-cache, 32 TQ/MLLM-cache, 21 DSV4/DSML tool tests)
  • run_api_surface_contract.py -> status=pass
  • run_cache_architecture_contract.py -> status=pass
  • API parity/history focused pytest -> 38 passed

Leaving this PR open as a future broader compatibility-API feature lane. Credit to @CobraSoftware for the local compatibility API direction; pieces should be split/landed with isolated endpoint specs and tests rather than merged wholesale into this release.

@CobraSoftware

Author

Thanks. I am happy to continue working on this. If you have a priority list, let me know; otherwise I will open a PR for one change at a time.
