refactor: update transcription response formats and validation
- Removed 'diarized_json' from the allowed response formats in both the server functions and API schemas.
- Updated the TranscriptionResponseFormat type to reflect the removal of 'diarized_json'.
- Enhanced error handling in the OpenAI transcription adapter to ensure that known speaker names and references are provided together.
- Added tests to validate the new requirements for speaker diarization in the transcription process.
8times4 committed Jun 25, 2026
commit 23b15bc38804153c18296f85f6735d6daab89544
2 changes: 1 addition & 1 deletion examples/ts-react-chat/src/lib/server-fns.ts
@@ -82,7 +82,7 @@ const TRANSCRIPTION_PROVIDER_SCHEMA = z
.optional()

const TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA = z
- .enum(['json', 'text', 'srt', 'verbose_json', 'vtt', 'diarized_json'])
+ .enum(['json', 'text', 'srt', 'verbose_json', 'vtt'])
.optional()

const AUDIO_PROVIDER_SCHEMA = z
2 changes: 1 addition & 1 deletion examples/ts-react-chat/src/routes/api.transcribe.ts
@@ -12,7 +12,7 @@ const TRANSCRIPTION_PROVIDER_SCHEMA = z
.optional()

const TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA = z
- .enum(['json', 'text', 'srt', 'verbose_json', 'vtt', 'diarized_json'])
+ .enum(['json', 'text', 'srt', 'verbose_json', 'vtt'])
.optional()

const TRANSCRIBE_BODY_SCHEMA = z.object({
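Both schemas now accept the same five formats. Below is a dependency-free sketch of what the narrowed enum enforces; the names `TRANSCRIPTION_RESPONSE_FORMATS`, `AllowedResponseFormat`, `isAllowedResponseFormat`, and `parseResponseFormat` are hypothetical and not part of the repository, which keeps using zod. Deriving the type from a single `as const` tuple is one option for keeping the allowed list and its type in sync, since this commit had to edit the same list in several files.

```typescript
// Hypothetical dependency-free mirror of TRANSCRIPTION_RESPONSE_FORMAT_SCHEMA.
const TRANSCRIPTION_RESPONSE_FORMATS = [
  'json',
  'text',
  'srt',
  'verbose_json',
  'vtt',
] as const

// The union type is derived from the tuple, so the two cannot disagree.
type AllowedResponseFormat = (typeof TRANSCRIPTION_RESPONSE_FORMATS)[number]

function isAllowedResponseFormat(value: unknown): value is AllowedResponseFormat {
  return (
    typeof value === 'string' &&
    (TRANSCRIPTION_RESPONSE_FORMATS as readonly string[]).includes(value)
  )
}

// Like `.optional()`: undefined passes through, any other value must be listed.
function parseResponseFormat(value: unknown): AllowedResponseFormat | undefined {
  if (value === undefined) return undefined
  if (!isAllowedResponseFormat(value)) {
    throw new Error(`Unsupported transcription response format: ${String(value)}`)
  }
  return value
}
```

With this sketch, `parseResponseFormat('verbose_json')` returns `'verbose_json'`, while `parseResponseFormat('diarized_json')` throws, matching the behavior of the updated zod enums.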
13 changes: 10 additions & 3 deletions packages/ai-openai/src/adapters/transcription.ts
@@ -250,9 +250,7 @@ export class OpenAITranscriptionAdapter<
options
const file = this.prepareAudioFile(audio)
const isDiarizeTranscriptionModel = isDiarizeModel(model)
- const topLevelResponseFormat = responseFormat as
- | OpenAITranscriptionResponseFormat
- | undefined
+ const topLevelResponseFormat = responseFormat
const effectiveResponseFormat =
topLevelResponseFormat ?? modelOptions?.response_format

@@ -436,6 +434,15 @@ export class OpenAITranscriptionAdapter<
)
}

+ if (
+ (modelOptions?.known_speaker_names === undefined) !==
+ (modelOptions?.known_speaker_references === undefined)
+ ) {
+ throw new Error(
+ 'OpenAI diarization known_speaker_names and known_speaker_references must both be provided together.',
+ )
+ }
+
if (modelOptions?.known_speaker_names !== undefined) {
const knownSpeakerCount = modelOptions.known_speaker_names.length
if (knownSpeakerCount > 4) {
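The new guard compares two booleans with `!==`, which acts as an exclusive OR: it throws when exactly one of `known_speaker_names` and `known_speaker_references` is set, and passes when both or neither are. The sketch below isolates that rule together with the existing four-speaker limit. `KnownSpeakerOptions` and `validateKnownSpeakers` are hypothetical names, and the wording of the limit error is assumed (the tests only require it to contain `at most 4`).

```typescript
// Hypothetical subset of the OpenAI diarization model options.
interface KnownSpeakerOptions {
  known_speaker_names?: string[]
  known_speaker_references?: string[]
}

// Sketch of the adapter's checks. The limit of 4 comes from the existing
// `knownSpeakerCount > 4` branch; the exact error text is an assumption.
function validateKnownSpeakers(modelOptions?: KnownSpeakerOptions): void {
  const hasNames = modelOptions?.known_speaker_names !== undefined
  const hasReferences = modelOptions?.known_speaker_references !== undefined

  // `!==` on two booleans is an XOR: true when exactly one field is set.
  if (hasNames !== hasReferences) {
    throw new Error(
      'OpenAI diarization known_speaker_names and known_speaker_references must both be provided together.',
    )
  }

  const count = modelOptions?.known_speaker_names?.length ?? 0
  if (count > 4) {
    throw new Error(
      `OpenAI diarization supports at most 4 known speakers, got ${count}.`,
    )
  }
}
```

Calling it with no options, or with one name and one reference, returns normally; supplying only one of the two arrays throws before the speaker count is checked.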
29 changes: 29 additions & 0 deletions packages/ai-openai/tests/transcription-adapter.test.ts
@@ -386,11 +386,40 @@ describe('OpenAI transcription adapter', () => {
audio: new File([], 'audio.wav', { type: 'audio/wav' }),
modelOptions: {
known_speaker_names: ['a', 'b', 'c', 'd', 'e'],
+ known_speaker_references: [
+ 'data:audio/wav;base64,AAA=',
+ 'data:audio/wav;base64,BBB=',
+ 'data:audio/wav;base64,CCC=',
+ 'data:audio/wav;base64,DDD=',
+ 'data:audio/wav;base64,EEE=',
+ ],
},
logger: testLogger,
}),
).rejects.toThrow('at most 4')
+
+ await expect(
+ adapter.transcribe({
+ model: 'gpt-4o-transcribe-diarize',
+ audio: new File([], 'audio.wav', { type: 'audio/wav' }),
+ modelOptions: {
+ known_speaker_names: ['agent'],
+ },
+ logger: testLogger,
+ }),
+ ).rejects.toThrow('must both be provided together')
+
+ await expect(
+ adapter.transcribe({
+ model: 'gpt-4o-transcribe-diarize',
+ audio: new File([], 'audio.wav', { type: 'audio/wav' }),
+ modelOptions: {
+ known_speaker_references: ['data:audio/wav;base64,AAA='],
+ },
+ logger: testLogger,
+ }),
+ ).rejects.toThrow('must both be provided together')

await expect(
adapter.transcribe({
model: 'gpt-4o-transcribe-diarize',
1 change: 0 additions & 1 deletion packages/ai/src/types.ts
@@ -1715,7 +1715,6 @@ export type TranscriptionResponseFormat =
| 'srt'
| 'verbose_json'
| 'vtt'
- | 'diarized_json'

export interface TranscriptionOptions<
TProviderOptions extends object = object,
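With `'diarized_json'` gone from `TranscriptionResponseFormat`, any exhaustive `switch` over the union must drop that case, and a `never` check lets the compiler enforce it. The sketch assumes the members above the hunk are `'json'` and `'text'`, matching the zod enums earlier in this commit; `isJsonResponseFormat` is a hypothetical helper, not part of `packages/ai`.

```typescript
// Local copy of the union after this commit (assumed full member list).
type TranscriptionResponseFormat =
  | 'json'
  | 'text'
  | 'srt'
  | 'verbose_json'
  | 'vtt'

// Hypothetical helper. In OpenAI's API, `json` and `verbose_json` return
// JSON bodies, while `text`, `srt`, and `vtt` return plain text.
function isJsonResponseFormat(format: TranscriptionResponseFormat): boolean {
  switch (format) {
    case 'json':
    case 'verbose_json':
      return true
    case 'text':
    case 'srt':
    case 'vtt':
      return false
    default: {
      // Exhaustiveness check: this stops compiling if a member is added to
      // the union (for example, 'diarized_json' reintroduced) without a case.
      const unreachable: never = format
      throw new Error(`Unhandled response format: ${String(unreachable)}`)
    }
  }
}
```

The `default` branch is unreachable at runtime; its only job is to turn a forgotten union member into a type error.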
2 changes: 0 additions & 2 deletions testing/e2e/src/lib/media-providers.ts
@@ -42,9 +42,7 @@ function testHeaders(testId?: string): Record<string, string> | undefined {
function getOpenaiTranscriptionModel(options: TranscriptionAdapterOptions) {
const modelOptions = options.modelOptions
const isDiarizationRequest =
- options.responseFormat === 'diarized_json' ||
- modelOptions?.response_format === 'diarized_json' ||
modelOptions?.diarize === true ||
modelOptions?.chunking_strategy !== undefined ||
modelOptions?.known_speaker_names !== undefined ||
modelOptions?.known_speaker_references !== undefined
Comment on lines +42 to +48

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "1) Find where diarize is populated in E2E request construction"
rg -n --type=ts -C2 '\bdiarize\b|response_format|chunking_strategy|known_speaker_' testing/e2e/src

echo
echo "2) Confirm modelOptions is forwarded from API routes"
rg -n --type=ts -C3 'generateTranscription\(|modelOptions,' \
  testing/e2e/src/routes/api.transcription.ts \
  testing/e2e/src/routes/api.transcription.stream.ts

echo
echo "3) Confirm OpenAI adapter request spread behavior"
rg -n --type=ts -C5 '\.\.\.modelOptions|TranscriptionCreateParamsNonStreaming|request\.response_format' \
  packages/ai-openai/src/adapters/transcription.ts

Repository: TanStack/ai

Length of output: 5925


Prevent internal modelOptions.diarize from reaching the OpenAI SDK

  • Current E2E payloads don’t set modelOptions.diarize (they use response_format: 'diarized_json', chunking_strategy, and known_speaker_*), and modelOptions is forwarded unchanged by both transcription routes.
  • The OpenAI adapter still spreads ...modelOptions into the SDK request (request: { model, file, ...modelOptions }), so if any caller ever adds modelOptions.diarize, it would be sent upstream as an unsupported parameter. Omit diarize before building the request.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@testing/e2e/src/lib/media-providers.ts` around lines 42 - 50, The code
currently lets an internal flag modelOptions.diarize flow into the OpenAI SDK;
update the transcription request construction to strip the diarize property
before spreading modelOptions into the SDK call—e.g., in the OpenAI
transcription adapter where the request is built, clone modelOptions and delete
or omit the diarize key (while still using getOpenaiTranscriptionModel(...) for
detection), then spread the sanitized object (e.g., sanitizedModelOptions) into
request: { model, file, ...sanitizedModelOptions } so diarize is never sent
upstream.
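A minimal sketch of the sanitization the review asks for: detection keeps reading `diarize`, but the flag is destructured away before the options are spread into the request object. `OpenAITranscriptionModelOptions`, `sanitizeModelOptions`, and `buildTranscriptionRequest` are hypothetical names, the option fields are trimmed to the ones discussed here, and the sketch assumes, as the review does, that `diarize` is not an OpenAI request parameter.

```typescript
// Hypothetical, trimmed-down option shape for this sketch.
interface OpenAITranscriptionModelOptions {
  diarize?: boolean // internal flag, never meant for the SDK
  chunking_strategy?: unknown
  known_speaker_names?: string[]
  known_speaker_references?: string[]
  language?: string
}

type SdkModelOptions = Omit<OpenAITranscriptionModelOptions, 'diarize'>

// Detection still reads `diarize`, mirroring isDiarizationRequest after this commit.
function isDiarizationRequest(
  modelOptions?: OpenAITranscriptionModelOptions,
): boolean {
  return (
    modelOptions?.diarize === true ||
    modelOptions?.chunking_strategy !== undefined ||
    modelOptions?.known_speaker_names !== undefined ||
    modelOptions?.known_speaker_references !== undefined
  )
}

// Rest destructuring copies every field except `diarize`; the input is not mutated.
function sanitizeModelOptions(
  modelOptions?: OpenAITranscriptionModelOptions,
): SdkModelOptions {
  const source: OpenAITranscriptionModelOptions = modelOptions ?? {}
  const { diarize: _diarize, ...sdkOptions } = source
  return sdkOptions
}

// `file` is left generic so the sketch does not depend on DOM or Node types.
function buildTranscriptionRequest<F>(
  model: string,
  file: F,
  modelOptions?: OpenAITranscriptionModelOptions,
) {
  return { model, file, ...sanitizeModelOptions(modelOptions) }
}
```

Because the object rest pattern builds a fresh object, the caller's `modelOptions` still carries `diarize` for model selection, while the request object never does.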
