Implement local compatibility APIs #59
Conversation
Thanks for the compatibility API work. I am deferring this from the current release-hardening lane. It is a large API surface PR, currently conflicts with
Reviewed for the current release-hardening pass. I am not direct-merging this PR into the immediate release branch because the diff is very broad (server.py + multiple compatibility APIs) and includes LM Studio, Deepgram, realtime, local files, and persisted response/resource endpoints that are outside the current release-fix scope.

What current source already covers and I verified today:
- OpenAI Chat Completions sampling/default propagation;
- OpenAI Responses sampling/default propagation and output-cap behavior;
- Anthropic adapter bundle-default behavior;
- Ollama gateway request translation / streaming done behavior;
- panel request-builder sampling and output overrides;
- API/cache interaction gates for DSV4 native cache, ZAYA typed CCA, hybrid SSM, TQ KV, disk roundtrip, and DSV4/DSML tool parsing.

I also found and fixed a current release-gate harness regression while reviewing this PR: tests/cross_matrix/run_api_surface_contract.py and run_cache_architecture_contract.py referenced a missing tests/cross_matrix/run_noheavy_api_cache_contract.py. Restored that runner and pushed commit 4df8b3b (test: restore API cache contract runner).

Verification after the runner fix:
- run_noheavy_api_cache_contract.py -> status=pass (15 API, 5 scheduler-cache, 32 TQ/MLLM-cache, 21 DSV4/DSML tool tests)
- run_api_surface_contract.py -> status=pass
- run_cache_architecture_contract.py -> status=pass
- API parity/history focused pytest -> 38 passed

Leaving this PR open as a future broader compatibility-API feature lane. Credit to @CobraSoftware for the local compatibility API direction; pieces should be split/landed with isolated endpoint specs and tests rather than merged wholesale into this release.
Thanks. I am happy to continue working on this. If you have a priority list, let me know; otherwise I will open a PR for one change at a time.
Overview
This is designed to add more API endpoints for broader compatibility.
It was heavily assisted by AI.
Summary
This PR expands the local compatibility surface for the MLX-backed server while keeping inference local-only.
Main additions:
- /lmstudio/v1/*
- /deepgram/vl/*
- files and persisted responses retrieval

What Changed
Added LM Studio endpoints:
- /lmstudio/v1/models
- /lmstudio/v1/models/load
- /lmstudio/v1/models/unload
- /lmstudio/v1/models/download
- /lmstudio/v1/models/download/status
- /lmstudio/v1/chat

Added Deepgram endpoints:
- /deepgram/vl/listen
- /deepgram/vl/speak
- /deepgram/vl/read
- /deepgram/vl/models
- /deepgram/vl/models/{model_id}

Expanded OpenAI-compatible surface:
- files API
- responses/{id} retrieval, delete, and input item listing
- 501 not_implemented_local fallback responses for remaining unsupported cloud paths

Improved auth compatibility:
- Authorization: Bearer <key>
- Authorization: Token <key>
- x-api-key
- api-key

Testing
I tested compilation with:
python3 -m py_compile vmlx_engine/server.py tests/test_deepgram_api.py tests/test_lmstudio_api.py tests/test_openai_spec_surface.py tests/test_realtime_compat.py
I also got all tests to pass, but I lack the hardware to do larger, in-depth real-world testing with bigger models.
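
For reviewers, the auth compatibility item under What Changed (accepting the key as Authorization: Bearer, Authorization: Token, x-api-key, or api-key) can be pictured roughly as below. This is only an illustrative sketch, not the code in vmlx_engine/server.py: the function name extract_api_key and the precedence order (Authorization first, then x-api-key, then api-key) are assumptions made for the example.

```python
from typing import Mapping, Optional


def extract_api_key(headers: Mapping[str, str]) -> Optional[str]:
    """Return the client key from any of the four accepted header forms.

    Hypothetical helper for illustration; the precedence order is an assumption.
    """
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    # "Authorization: Bearer <key>" and "Authorization: Token <key>".
    scheme, _, credential = lowered.get("authorization", "").partition(" ")
    if scheme.lower() in ("bearer", "token") and credential.strip():
        return credential.strip()

    # Plain key headers carry the key as the whole header value.
    for name in ("x-api-key", "api-key"):
        if lowered.get(name):
            return lowered[name]

    # No recognised credential was supplied.
    return None
```

A server would then compare the returned value against its configured key and reject the request when the result is None or does not match.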