stream-translator-gpt

stream-translator-gpt is a command-line tool for real-time transcription and translation of live streams. We have now added an easier-to-use WebUI entry point.

Try it on Colab:

WebUI	Command Line

(Due to frequent scraping and theft of API keys, we are unable to provide a trial API key. You need to fill in your own API key.)

Pipeline

flowchart LR
    subgraph ga["`**Input**`"]
        direction LR
        aa("`**FFmpeg**`")
        ab("`**Device audio**`")
        ac("`**yt-dlp**`")
        ad("`**Local media file**`")
        ae("`**Live streaming**`")
        ac --> aa
        ad --> aa
        ae --> ac
    end
    subgraph gb["`**Audio Slicing**`"]
        direction LR
        ba("`**Silero VAD**`")
    end
    subgraph gc["`**Transcription**`"]
        direction LR
        ca("`**Whisper**`")
        cb("`**Faster-Whisper**`")
        cc("`**Simul Streaming**`")
        cd("`**OpenAI Transcription API**`")
        ce("`**HuggingFace ASR**`")
    end
    subgraph gd["`**Translation**`"]
        direction LR
        da("`**GPT API**`")
        db("`**Gemini API**`")
    end
    subgraph ge["`**Output**`"]
        direction LR
        ea("`**Print to terminal**`")
        ee("`**Save to file**`")
        ec("`**Discord**`")
        ed("`**Telegram**`")
        eb("`**Cqhttp**`")
    end
    aa --> gb
    ab --> gb
    gb ==> gc
    gc ==> gd
    gd ==> ge

Uses yt-dlp to extract audio data from live streams.

Dynamic threshold audio slicing based on Silero-VAD.

Use Whisper / Faster-Whisper / Simul Streaming / HuggingFace ASR locally or call OpenAI Transcription API remotely for transcription.

Use OpenAI's GPT API / Google's Gemini API for translation.

Finally, the results can be printed to the terminal, saved to a file, or sent to a group via social media bot.

Prerequisites

Python >= 3.10
FFmpeg (skip if already installed):
- Windows: winget install ffmpeg
- Linux (Debian/Ubuntu): sudo apt install ffmpeg
Install CUDA on your system.
Install cuDNN to your CUDA dir if you want to use Faster-Whisper.
Install PyTorch (with CUDA) to your Python.
Create a Google API key if you want to use Gemini API for translation.
Create a OpenAI API key if you want to use OpenAI Transcription API for transcription or GPT API for translation.

Installation

WebUI

pip install stream-translator-gpt[webui] -U

Command Line

pip install stream-translator-gpt -U

Usage

The commands on Colab are the recommended usage, below are some other commonly used options.

Transcribe live streaming (default use Whisper):

stream-translator-gpt {URL} --language {input_language}
Transcribe by Faster-Whisper:

stream-translator-gpt {URL} --language {input_language} --use_faster_whisper
Transcribe by SimulStreaming:

stream-translator-gpt {URL} --language {input_language} --use_simul_streaming
Transcribe by SimulStreaming with Faster-Whisper as the encoder:

stream-translator-gpt {URL} --language {input_language} --use_simul_streaming --use_faster_whisper
Transcribe by OpenAI Transcription API:

stream-translator-gpt {URL} --language {input_language} --use_openai_transcription_api --openai_api_key {your_openai_key}
Transcribe by a HuggingFace ASR model (requires pip install stream-translator-gpt[hf_asr]):

stream-translator-gpt {URL} --model {hf_model_name} --use_hf_asr

Only models with pipeline_tag: automatic-speech-recognition on Hugging Face Hub are supported.
Translate to other language by Gemini:

stream-translator-gpt {URL} --language ja --translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
Translate to other language by GPT:

stream-translator-gpt {URL} --language ja --translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}
Using OpenAI Transcription API and Gemini at the same time:

stream-translator-gpt {URL} --language ja --use_openai_transcription_api --openai_api_key {your_openai_key} --translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
Local video/audio file as input:

stream-translator-gpt /path/to/file --language {input_language}
Record system audio as input:

stream-translator-gpt device --language {input_language}
Record microphone as input:

stream-translator-gpt device --language {input_language} --mic
Sending result to Discord:

stream-translator-gpt {URL} --language {input_language} --discord_webhook_url {your_discord_webhook_url}
Sending result to Telegram:

stream-translator-gpt {URL} --language {input_language} --telegram_token {your_telegram_token} --telegram_chat_id {your_telegram_chat_id}
Sending result to Cqhttp:

stream-translator-gpt {URL} --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}
Saving result to a .srt subtitle file:

stream-translator-gpt {URL} --language ja --translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key} --hide_transcribe_result --retry_if_translation_fails --output_timestamps --output_file_path ./result.srt

All options

Option	Default Value	Description
Overall Options
`--openai_api_key`		OpenAI API key if using GPT translation / Whisper API. If you have multiple keys, you can separate them with "," and each key will be used in turn.
`--google_api_key`		Google API key if using Gemini translation. If you have multiple keys, you can separate them with "," and each key will be used in turn.
`--openai_base_url`		Customize the API endpoint of OpenAI (Affects GPT translation & OpenAI Transcription).
`--google_base_url`		Customize the API endpoint of Google (Affects Gemini translation).
`--proxy`		Used to set the proxy for all --*_proxy flags if they are not specifically set. Also sets http_proxy environment variables.
Input Options
`URL`		The URL of the stream. If a local file path is filled in, it will be used as input. If fill in "device", the input will be obtained from your PC device.
`--format`	ba/wa*	Stream format code, this parameter will be passed directly to yt-dlp. You can get the list of available format codes by `yt-dlp {url} -F`
`--list_format`		Print all available formats then exit.
`--cookies`		Used to open member-only stream, this parameter will be passed directly to yt-dlp.
`--input_proxy`		Use the specified HTTP/HTTPS/SOCKS proxy for yt-dlp, e.g. http://127.0.0.1:7890.
`--device_index`		The index of the device that needs to be recorded. If not set, the system default recording device will be used.
`--list_devices`		Print all audio devices info then exit.
`--device_recording_interval`	0.5	The shorter the recording interval, the lower the latency, but it will increase CPU usage. It is recommended to set it between 0.1 and 1.0.
Audio Slicing Options
`--min_audio_length`	0.5	Minimum slice audio length in seconds.
`--max_audio_length`	30.0	Maximum slice audio length in seconds.
`--target_audio_length`	5.0	When dynamic no speech threshold is enabled (enabled by default), the program will slice the audio as close to this length as possible.
`--continuous_no_speech_threshold`	1.0	Slice if there is no speech during this number of seconds. If the dynamic no speech threshold is enabled (enabled by default), the actual threshold will be dynamically adjusted based on this value.
`--disable_dynamic_no_speech_threshold`		Set this flag to disable dynamic no speech threshold.
`--prefix_retention_length`	0.5	The length of the retention prefix audio during slicing.
`--vad_threshold`	0.35	Range 0~1. the higher this value, the stricter the speech judgment. If dynamic VAD threshold is enabled (enabled by default), this threshold will be adjusted dynamically based on the input speech's VAD results.
`--disable_dynamic_vad_threshold`		Set this flag to disable dynamic VAD threshold.
Transcription Options
`--model`	small	Select Whisper/Faster-Whisper/Simul Streaming model size. See here for available models.
`--language`	auto	Language spoken in the stream. See here for available languages.
`--use_faster_whisper`		Set this flag to use Faster-Whisper instead of Whisper. If used with --use_simul_streaming, SimulStreaming with Faster-Whisper as the encoder will be used.
`--use_simul_streaming`		Set this flag to use SimulStreaming instead of Whisper. If used with --use_faster_whisper, SimulStreaming with Faster-Whisper as the encoder will be used.
`--use_openai_transcription_api`		Set this flag to use OpenAI transcription API instead of the original local Whipser.
`--use_hf_asr`		Set this flag to use a HuggingFace ASR model. Use `--model` to specify the model ID. Requires `pip install stream-translator-gpt[hf_asr]`.
`--transcription_filters`	emoji_filter,repetition_filter	Filters apply to transcription results, separated by ",". We provide emoji_filter, repetition_filter and japanese_stream_filter.
`--transcription_initial_prompt`		General purpose prompt/glossary for transcription. Format: "Word1, Word2, Word3, ...". This text is always included in the prompt passed to the model.
`--disable_transcription_context`		Set this flag to disable context (previous sentence) propagation in transcription.
Translation Options
`--gpt_model`	gpt-5.4-nano	OpenAI's GPT model name, gpt-5.4 / gpt-5.4-mini / gpt-5.4-nano / gpt-5.5
`--gemini_model`	gemini-3.1-flash-lite	Google's Gemini model name, gemini-2.5-flash / gemini-2.5-flash-lite / gemini-3-flash-preview / gemini-3.1-flash-lite / gemini-3.5-flash
`--translation_prompt`		If set, will translate the result text to target language via GPT / Gemini API (According to which API key is filled in). Example: "Translate from Japanese to Chinese"
`--translation_history_size`	0	The number of previous transcripts sent as context when calling the LLM API. It is recommended to disable context (set to 0) for weaker models.
`--translation_timeout`	10	If the GPT / Gemini translation exceeds this number of seconds, the translation will be discarded.
`--use_json_result`		Using JSON result in LLM translation for some locally deployed models.
`--retry_if_translation_fails`		Retry when translation times out/fails. Used to generate subtitles offline.
`--temperature`		GPT/Gemini parameter. Controls output randomness, higher values produce more diverse results.
`--top_p`		GPT/Gemini parameter. Nucleus sampling threshold, only tokens with cumulative probability above this value are considered.
`--top_k`		Gemini parameter. Limits token selection to the top K most probable candidates.
`--prompt_cache_key`		GPT parameter. If set, enables prompt caching optimization on the API side.
`--reasoning_effort`		GPT parameter. Controls reasoning depth for reasoning models. Options: none / minimal / low / medium / high / xhigh.
`--verbosity`		GPT parameter. Controls the verbosity of the response. Options: auto / short / concise / detailed.
`--service_tier`		GPT parameter. Specifies processing priority tier. Options: auto / default / flex / priority.
`--debug_mode`		Enable debug mode. Print messages sent to LLM and usage info after each translation call.
`--processing_proxy`		Use the specified HTTP/HTTPS/SOCKS proxy for Whisper/GPT API (Gemini currently doesn't support specifying a proxy within the program), e.g. http://127.0.0.1:7890.
Output Options
`--output_timestamps`		Output the timestamp of the text when outputting the text.
`--hide_transcribe_result`		Hide the result of Whisper transcribe.
`--output_file_path`		If set, will save the result text to this path.
`--cqhttp_url`		If set, will send the result text to the cqhttp server.
`--cqhttp_token`		Token of cqhttp, if it is not set on the server side, it does not need to fill in.
`--discord_webhook_url`		If set, will send the result text to the discord channel.
`--telegram_token`		Token of Telegram bot.
`--telegram_chat_id`		If set, will send the result text to this Telegram chat. Needs to be used with "--telegram_token".
`--output_proxy`		Use the specified HTTP/HTTPS/SOCKS proxy for Cqhttp/Discord/Telegram, e.g. http://127.0.0.1:7890.

Contact me

Telegram: @ionic_bond

Donate

PayPal Donate or PayPal

Name		Name	Last commit message	Last commit date
Latest commit History 366 Commits
.github/workflows		.github/workflows
stream_translator_gpt		stream_translator_gpt
webui		webui
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
README_CN.md		README_CN.md
README_JP.md		README_JP.md
README_PyPI.md		README_PyPI.md
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
requirements_hf_asr.txt		requirements_hf_asr.txt
requirements_webui.txt		requirements_webui.txt
stream_translator.ipynb		stream_translator.ipynb
webui.ipynb		webui.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

stream-translator-gpt

Pipeline

Prerequisites

Installation

WebUI

Command Line

Usage

All options

Contact me

Donate

About

Uh oh!

Releases 43

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

stream-translator-gpt

Pipeline

Prerequisites

Installation

WebUI

Command Line

Usage

All options

Contact me

Donate

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 43

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages