update #199
Closed
limweb wants to merge 330 commits into
- Mark 9 enum-only models as deprecated (gpt-4o-mini, deepseek-v3/r1, grok-3-mini, qwen-3, etc.); upstream LS no longer routes them
- /v1/models hides deprecated models
- FREE_TIER_BASE updated to gemini-2.5-flash only
- Connect-RPC tested on VPS: InitializePanelState OK but StartCascade returns empty; LS only partially supports the connect protocol. Default stays grpc.
- PROBE_CANARIES reduced to gemini-only; Claude canary probes were burning Trial account quotas (2-3 req/hr per model)
- Version bump to 2.0.0
…ross dashboard
- add tooltip keys: probeTitle, copyTitle, refreshTitle, editModels
- add time unit keys: day, hour, minute, second
- add tier filter label and empty state messages (noRequestData, noAccountRequestData, noData)
- add batch import and proxy configuration feedback messages
- add login result section keys (successTitle, addedToPool, fields.*)
- expand cascadeReuse.desc with detailed explanation of cache behavior and
…robe/credits operations
- check-i18n.js: add I18n.t() call extraction and validation, check both string literals and variable expressions, deduplicate and verify all keys exist in locale files
- i18n: add button keys
updateCapability() ran inferTier() unconditionally after every request, which overwrote the authoritative tier set by GetUserStatus / refreshCredits. When a Pro or 14-day Trial account called a non-premium model such as gemini-2.5-flash or gpt-4o-mini, inferTier() saw only that ok=true and returned 'free', so the dashboard showed the account as FREE right after a successful request.

probeAccount() already worked around this by restoring status.tierName at the end, but the chat handler path (handlers/chat.js) had no such restore, so every real API call silently demoted the tier.

Skip the inferTier() fallback when an authoritative source is present (userStatusLastFetched > 0, or tierManual=true). The fallback still runs for brand-new accounts that have never been probed.
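The guard described above could look roughly like the sketch below. Only the names updateCapability, inferTier, userStatusLastFetched and tierManual come from the commit message; the account shape, the `premium` flag and the body of inferTier are assumptions made for illustration, not the project's actual code.

```javascript
// Hypothetical stand-in for the real heuristic: it only sees request outcomes,
// which is why a Pro account calling a non-premium model looks like 'free'.
function inferTier(capabilities) {
  const premiumOk = Object.values(capabilities).some((cap) => cap.ok && cap.premium);
  return premiumOk ? 'pro' : 'free';
}

// True when the tier came from an authoritative source and must not be guessed.
function hasAuthoritativeTier(account) {
  return account.userStatusLastFetched > 0 || account.tierManual === true;
}

// Record the request outcome; fall back to inference only for accounts
// that have never had their tier fetched or set manually.
function updateCapability(account, model, result) {
  account.capabilities[model] = { ok: result.ok, premium: result.premium };
  if (!hasAuthoritativeTier(account)) {
    account.tier = inferTier(account.capabilities);
  }
  return account;
}
```

With this shape, a successful gemini-2.5-flash request on an account whose userStatusLastFetched is set leaves `tier` untouched, while a never-probed account still gets an inferred tier.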
…d add surface background to toggle container
…a-i18n attribute only
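Fixes a tier-downgrade bug that affected real users. updateCapability called inferTier after every chat, which demoted Pro/Trial accounts to free; accounts that have a userStatus now skip the inference. Thanks to @aict666 for pinpointing the cause.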
fix: prevent cascade reuse from replaying old context
Upd/english translation
- fix(dwgx#47): remove if(emulateTools) guards; tool calls were silently dropped when the condition was false. They are now always preserved. Fixes Claude Code tool calls appearing as raw text.
- feat: real structured output: system message + user hint + json_schema injection + response fence stripping for response_format support
- feat: zero-dep PDF text extraction (src/pdf.js): FlateDecode inflate + Tj/TJ operator parsing. Handles Anthropic type:"document" blocks. Scanned PDFs get a clear error message.
… too) + add tool-calling reliability matrix + opus-4-7-max weekly quota FAQ
…calling matrix + opus-4-7-max FAQ
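0a00 found the root cause: 31 trial accounts showed quota at 100%, yet all of them hit a 26-29 minute per-(account, model) sliding-window rate limit on claude-opus-4-7-max. Upstream rate-limits high reasoning-effort variants such as -max / -xhigh separately, on a dimension independent of the daily/weekly quota.

Add pickRateLimitFallback, which derives the next tier down on the same base from the model-name suffix (ladder low → medium → high → xhigh → max; 1m → bare; -thinking is skipped to avoid silently degrading user-visible behavior).

429/503 error responses on both the non-stream and stream paths now carry fallback_model + remediation fields telling the client which sibling it can switch to.

Tests 860 → 871 (+11).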
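A minimal sketch of the suffix ladder described above, not the project's implementation. The function name pickRateLimitFallback and the ladder order are from the commit message; the `catalog` parameter, the exact suffix matching, and returning `null` when no sibling exists are assumptions.

```javascript
// Reasoning-effort suffixes, lowest to highest (order taken from the commit message).
const EFFORT_LADDER = ['low', 'medium', 'high', 'xhigh', 'max'];

// Returns the next-lower sibling of `model`, or null when none applies.
// `catalog` is a hypothetical Set of model names known to the proxy.
function pickRateLimitFallback(model, catalog) {
  // -thinking variants are skipped: dropping thinking would change visible behavior.
  if (model.endsWith('-thinking')) return null;

  let candidate = null;
  if (model.endsWith('-1m')) {
    // 1m context variant falls back to the bare model name.
    candidate = model.slice(0, -'-1m'.length);
  } else {
    const cut = model.lastIndexOf('-');
    const rung = EFFORT_LADDER.indexOf(model.slice(cut + 1));
    // rung 0 ('low') has nothing below it; rung -1 means no effort suffix.
    if (cut > 0 && rung > 0) {
      candidate = `${model.slice(0, cut)}-${EFFORT_LADDER[rung - 1]}`;
    }
  }
  // Only suggest siblings that actually exist in the catalog.
  return candidate !== null && catalog.has(candidate) ? candidate : null;
}
```

An error payload could then set `fallback_model` to the returned name, or omit the field when the result is `null`.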
…t-4.5 / claude-opus-4.5) — README + recent replies use dotted form, catalog has dashed only
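…rphan cleanup

dwgx#126/dwgx#128: automatically retry with the fallback model on rate limit. handleChatCompletions gets an outer wrapper: when the request is non-stream, the error is rate_limit, a fallback_model exists and the env switch is not off, it rewrites body.model and resends once. The response's body.model is restored to the original name, with served_model + fallback_reason added as side fields. The stream path is not wired up (chunks may already have been emitted). Disable with WINDSURFAPI_VARIANT_FALLBACK_ON_RATE_LIMIT=0.

dwgx#127: the dashboard's one-click update left orphan LS processes behind. Two-layer fix: self-update calls stopLanguageServer for a graceful shutdown before exiting; early in startup, cleanupOrphanLanguageServers scans ps and, as a backstop, kills langserver_linux_x64 processes that are not in its own pool. Disable with WINDSURFAPI_SKIP_LS_CLEANUP=1.

Tests 871 → 882 (+11).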
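Side effect of the default-ON auto-fallback in v2.0.85: the cascade reuse fingerprint includes modelKey, so changing the model stores the new cascade under the new model. The client's next request with the original model then misses reuse, and clients that rely on cascade reuse (those that do not resend history) see the model lose its memory.

The shouldAutoFallback env gate is inverted: default OFF, enabled only explicitly with WINDSURFAPI_VARIANT_FALLBACK_ON_RATE_LIMIT=1.

The fallback_model + remediation fields (v2.0.84) are kept.

v2.0.87+ plans to write cascade pool aliases so that fallback does not break reuse; default ON will be reconsidered then.

Tests: 882 passing, 0 failing.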
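The v2.0.86 hotfix (default OFF) was a temporary step back; this version delivers the real fix promised in v2.0.86.

A. conversation-pool checkin accepts string | string[] and writes several fingerprint slots at once
B. the inner chat.js path reads context.__aliasModelKey, computes an alternate fingerprintAfter and adds it to the checkin list
C. on fallback the outer wrapper sets __aliasModelKey: originalModel, so the cascade produced by xhigh is registered under both the max and the xhigh fingerprint
D. shouldAutoFallback defaults back to ON (env=0 still disables it)

Effect: the client's max request hits the rate limit → fallback runs on xhigh → the cascade is indexed twice → the client's next turn checks out with max and hits → the model continues the history without losing its memory.

Tests 882 → 889 (+7, including a realistic auto-fallback scenario test).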
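The double-indexing idea in A to C can be sketched as follows. This is an illustration under assumptions: the `ConversationPool` class, its `Map` storage, the non-destructive `checkout`, and the toy `fingerprint` helper are all hypothetical; only "checkin accepts string | string[]" and "the fingerprint includes modelKey" come from the commit messages.

```javascript
// Hypothetical fingerprint: the real one covers more fields; this only shows
// that modelKey is part of the key, so a model change produces a different slot.
function fingerprint(modelKey, messages) {
  return `${modelKey}::${JSON.stringify(messages)}`;
}

class ConversationPool {
  constructor() {
    this.slots = new Map(); // fingerprint -> cascade entry
  }

  // Accepts one fingerprint or a list; every slot points at the same entry.
  checkin(fingerprints, entry) {
    for (const fp of new Set([].concat(fingerprints))) {
      this.slots.set(fp, entry);
    }
  }

  // Returns the entry stored under `fp`, or null on a reuse miss.
  checkout(fp) {
    return this.slots.get(fp) ?? null;
  }
}

// The request asked for -max, was served by -xhigh, and is indexed under both.
const history = [{ role: 'user', content: 'hello' }];
const pool = new ConversationPool();
pool.checkin(
  [fingerprint('claude-opus-4-7-xhigh', history), fingerprint('claude-opus-4-7-max', history)],
  { cascadeId: 'cascade-1' },
);
```

A later turn that fingerprints with the original `claude-opus-4-7-max` key finds `cascade-1`, which is the behavior the commit describes as the model not losing its memory.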
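H-1: the outer wrapper computes originalRoutingKey = resolveModel(mergeReasoningEffortIntoModel(...)) and uses it as __aliasModelKey instead of the raw body.model (the v2.0.87 fix did not work for clients such as codex CLI that send reasoning_effort separately from the model name)
H-2: invalidateFor scans twice: the first pass collects cascadeIds, the second deletes every sibling slot that points at them (alias slots no longer dangle when the LS restarts)
H-3: the outer wrapper computes originalCkey and passes it through so that the inner cacheSet also writes the alias key (within the rate-limit window, the next identical prompt hits the cache instead of burning quota)
H-4: new stopLanguageServerAndWait waits for the child to exit (with a SIGKILL fallback); the self-update callback awaits it (avoids a race where an orphan LS still holds the port)
M-1: stats.stores is counted once per checkin rather than once per slot; adds aliasWrites
M-5: served_model / fallback_reason moved into usage.cascade_breakdown (avoids ValidationError in strict pydantic clients)
L-1: cleanup matches argv0 by strict equality instead of substring (so that processes such as grep are not sent SIGTERM)

Tests 889 → 898 (+9 tests covering the H-1, H-2, H-4 and M-1 fixes)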
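The two-pass scan in H-2 can be sketched like this. It is a simplified illustration: the function name invalidateFor is from the commit message, but the signature, the `(fingerprint, entry)` predicate and the `Map` layout are assumptions.

```javascript
// slots: Map<fingerprint, { cascadeId: string, ... }>
// matches: hypothetical predicate deciding which slots are directly invalidated.
function invalidateFor(slots, matches) {
  // Pass 1: collect the cascade ids of every directly matching slot.
  const doomed = new Set();
  for (const [fp, entry] of slots) {
    if (matches(fp, entry)) doomed.add(entry.cascadeId);
  }

  // Pass 2: delete every slot that points at one of those cascades,
  // including alias slots the predicate itself did not match.
  let removed = 0;
  for (const [fp, entry] of slots) {
    if (doomed.has(entry.cascadeId)) {
      slots.delete(fp); // deleting during Map iteration is safe in JavaScript
      removed += 1;
    }
  }
  return removed;
}
```

A single pass that deleted only matching slots would leave the alias fingerprint pointing at a cascade that no longer exists, which is the dangling case the commit mentions.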
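codex did a second review of the four HIGH fixes in v2.0.88 themselves. H-2/3/4 are correct. H-1 leaves a latent hole: the alias fpAfter uses fpOpts.toolPreamble, which is computed for the fallback dialect, so a cross-provider fallback would make the next fpBefore recompute a different toolPreamble → fingerprint mismatch → reuse miss → the model loses its memory.

Today pickRateLimitFallback always stays on a same-provider ladder, so this is not triggered, but any catalog extension would break it silently. Adds an _isSameProviderFallback hard guard to lock in the invariant.

Tests 898 → 904 (+6)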
lnqdev reported ERR_TOKEN_FETCH_FAILED again on v2.0.89. End-to-end testing of 3 accounts × 4 OTT hosts gave 12/12 401 invalid_token. The first half of the chain, Auth1 + PostAuth, returns 200 OK with a sessionToken, but every sessionToken is rejected by every OTT host. The three fallback layers from v2.0.61 / v2.0.75 / v2.0.79 cannot help: the upstream GetOneTimeAuthToken endpoint itself is dead.

Reverse engineering windsurf-assistant v17.42.20 (still active as of 2026-04-27) shows that it goes Devin-only: Auth1 → PostAuth → sessionToken used directly as the IDE auth credential, skipping OTT + RegisterUser. In testing, the Cascade gRPC backend (server.codeium.com / server.self-serve.windsurf.com) accepts the bare sessionToken as metadata.apiKey: 4/4 200 OK, returning planName=Trial.

Fix: the windsurfLoginViaAuth1 chain collapses from Auth1 → PostAuth → OTT → registerWithCodeium → apiKey (sk-ws-01-) to Auth1 → PostAuth → apiKey = sessionToken. Deletes 60 lines, adds 10, net -20 lines.

Firebase signInWithPassword is now blocked by Google App Check and does not work server-side either, so the dispatcher in practice falls through to the Auth1 path and email/password login is fully restored.

Tests 904 → 910 (+6 source-level invariants in v2090-ott-bypass.test.js); full suite 0 failures / 0 regressions.

Upgrade: docker compose pull && docker compose up -d --force-recreate
…l-audit v2.0.88-90
- client.js: cascadeHistoryBudget default 200k→400k, add truncation note for trimmed history so the model doesn't ask the user to repeat
- handlers/chat.js: add IP-rate-limit circuit breaker for non-stream and stream paths, record policy blocked + rate limited events
- handlers/messages.js: defensive startMessage() in finish() prevents event ordering violation when a message stops before it starts
- dashboard/stats.js: track policyBlockedCount and rateLimitedCount, persist to stats.json for dashboard visibility
…ling
- Frontend saveGlobalProxy/editAccountProxy now checks the API error response before showing the success toast (fixes silent failure on ERR_PROXY_PRIVATE_HOST)
- parseProxyUrl normalizes whitespace and supports space-separated format like "socks5 127.0.0.1 1089" in addition to canonical URL form
- setGlobalProxy/setAccountProxy auto-trim proxy host values
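As a rough illustration of the parseProxyUrl normalization above, the sketch below accepts both the canonical URL form and the space-separated form. The function name is from the commit message; the regular expression, the returned object shape and returning `null` on invalid input are assumptions, not the project's actual behavior.

```javascript
// scheme://[user:pass@]host:port with an optional trailing slash.
const PROXY_RE = /^([a-z][a-z0-9]*):\/\/(?:([^:@\s]+):([^@\s]*)@)?([^:\/\s]+):(\d{1,5})\/?$/i;

function parseProxyUrl(input) {
  // Trim and collapse runs of whitespace so "socks5   host  1089 " is tolerated.
  const text = String(input ?? '').trim().replace(/\s+/g, ' ');
  const parts = text.split(' ');

  // Exactly three tokens are read as "type host port" and rewritten canonically.
  const canonical = parts.length === 3 ? `${parts[0]}://${parts[1]}:${parts[2]}` : text;

  const m = PROXY_RE.exec(canonical);
  if (!m) return null;

  const port = Number(m[5]);
  if (port < 1 || port > 65535) return null;

  return {
    type: m[1].toLowerCase(),
    host: m[4],
    port,
    username: m[2] ?? null,
    password: m[3] ?? null,
  };
}
```

Both `parseProxyUrl('socks5 127.0.0.1 1089')` and `parseProxyUrl('socks5://127.0.0.1:1089')` describe the same proxy under this sketch.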
The kimi_k2 vLLM section-token dialect stopped working for kimi-k2 and kimi-k2-thinking — Cascade returns ~16 token empty responses instead of tool calls. Switching to openai_json_xml which works for all Moonshot models. Verified on VPS: kimi-k2 with openai_json_xml correctly emits <tool_call> XML tool calls.
Verified on VPS (2026-05-07): kimi-k2 through Cascade returns idle_empty (12-24 tokens, empty content) even for plain chat without tools. This is an upstream Windsurf model outage, not a proxy bug. The proxy cannot fix Cascade silently returning empty for a model.
streamResponse was referencing bare `context` for the IP-level rate-limit circuit breaker, but `context` was never passed to the function. Now passes `context` through the `deps` parameter and accesses it as `deps.context`, matching the pattern used by other shared state (cachePolicy, tools, etc.). Fixes: all Claude models crashing with "Handler error: ReferenceError: context is not defined at chat.js:3223"
Windsurf PostAuth endpoint now requires the auth1 token as both a header (X-Devin-Auth1-Token) and in the request body. Previously only sent in body, causing "missing required header: X-Devin-Auth1-Token" error on the legacy server.self-serve.windsurf.com endpoint. Also made postAuthDualPath accept optional extraHeaders parameter for future-proofing.
Cherry-picked from PR dwgx#142 by @suhaihui-git — adds isCacheEnabled() guard to cacheGet/cacheSet and exposes enabled field in cacheStats(). Defaults to enabled (backward-compatible).
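A minimal sketch of such a guard, assuming the switch is read from the RESPONSE_CACHE_ENABLED environment variable mentioned elsewhere in this thread. The function names match the commit message; the in-memory `Map`, the optional `env` parameter (added here to make the sketch testable) and the set of values treated as "off" are assumptions.

```javascript
const store = new Map();

// Enabled unless the variable is explicitly set to an "off" value,
// which keeps the default backward-compatible.
function isCacheEnabled(env = process.env) {
  const raw = String(env.RESPONSE_CACHE_ENABLED ?? '').trim().toLowerCase();
  return !['0', 'false', 'off', 'no'].includes(raw);
}

// A disabled cache behaves like a permanent miss.
function cacheGet(key, env = process.env) {
  if (!isCacheEnabled(env)) return undefined;
  return store.get(key);
}

// A disabled cache silently drops writes.
function cacheSet(key, value, env = process.env) {
  if (!isCacheEnabled(env)) return false;
  store.set(key, value);
  return true;
}

function cacheStats(env = process.env) {
  return { enabled: isCacheEnabled(env), size: store.size };
}
```

Guarding both the read and the write path means flipping the switch takes effect without having to clear existing entries first.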
…o body + dwgx#137 proxy parse + cache switch
- P0: ReferenceError context is not defined in streamResponse (dwgx#135)
- PostAuth empty proto body + X-Devin-Auth1-Token + Referer (dwgx#134 via @Await-d PR dwgx#144)
- parseProxyUrl whitespace + frontend error check (dwgx#137)
- RESPONSE_CACHE_ENABLED env (PR dwgx#142 by @suhaihui-git)
- IP rate-limit circuit breaker (#132)
- CLAUDE.md for agent rules
…600KB
- NLU retry now defaults ON for zhipu/glm/moonshot/kimi models (they narrate instead of calling tools on first pass). Still requires narrative detection to trigger. Set WINDSURFAPI_NLU_RETRY=0 to disable. (dwgx#147, dwgx#125, dwgx#120)
- CASCADE_MAX_HISTORY_BYTES default 400KB -> 600KB to reduce silent context amputation in long tool-call conversations (dwgx#133)
- kimi-k2 idle_empty now returns a clear 502 with model suggestions instead of silently passing null/empty content to the caller (dwgx#125/dwgx#120)
- expanded neutralization patterns for Windsurf content policy filter (prompt-injection, jailbreak, bypass/safety override keywords) (dwgx#130)
- Devin session token + brand name normalization already in place
When Cascade returns idle_empty for kimi-k2 (null/empty content), return clear error with suggested alternative models instead of silently passing empty output to the caller.
Moved upstream_model_unavailable check to right before final message assembly where allText/allThinking values are definitive. Returns clear 502 error when kimi-k2 produces null/empty content.
POST /auth/login now accepts optional proxy field for both single
and batch account creation. Proxy is parsed and bound to the newly
created account as per-account proxy.
Single: {"token":"...", "proxy":"socks5://1.2.3.4:1080"}
Batch: {"accounts":[{"token":"t1","proxy":"http://p:8080"},...]}
Verified on VPS — account created with proxy bound in proxy.json.
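The two request shapes above can be normalized into one list before account creation. The helper below is purely illustrative: `normalizeLoginBody` is a hypothetical name, and only the `token`, `proxy` and `accounts` fields come from the examples in the commit message.

```javascript
// Turns either {"token", "proxy"} or {"accounts": [...]} into a uniform list.
// A missing proxy becomes null, meaning "no per-account proxy".
function normalizeLoginBody(body) {
  const items = Array.isArray(body.accounts) ? body.accounts : [body];
  return items.map(({ token, proxy }) => ({ token, proxy: proxy ?? null }));
}
```

Each resulting entry can then be passed through the same proxy parsing and binding step, so the single and batch paths do not diverge.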
- VPS credentials and SSH_ASKPASS trick
- issue reply rules: test first, no version numbers, short Chinese
- updated defaults: NLU retry auto ON, history 600KB, maxWait 600s
- release process and common issue triage
What changed
Why
Testing
Checklist