Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.
Real-time microphone noise suppression on Linux.
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Automagically synchronize subtitles with video.
🔊 A comprehensive list of open-source datasets for voice and sound computing (95+ datasets).
Frontier CoreML audio models in your apps — text-to-speech, speech-to-text, voice activity detection, and speaker diarization. In Swift, powered by SOTA open source.
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
The Open Source Alternative to Cluely - A lightning-fast, privacy-first AI assistant that works seamlessly during meetings, interviews, and conversations without anyone knowing. Built with Tauri for native performance, just 10MB. Completely undetectable in video calls, screen shares, and recordings.
Voice activity detector (VAD) for the browser with a simple API
A Python package to build AI-powered real-time audio applications
Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
Real-time speech recognition and voice activity detection (VAD) using next-gen Kaldi with ncnn, with no Internet connection required. Supports iOS, Android, Linux, macOS, Windows, Raspberry Pi, VisionFive2, LicheePi4A, etc.
💎 A list of accessible speech corpora for ASR, TTS, and other Speech Technologies
Python AI assistant 🧠
Whisper.net. Speech to text made simple using Whisper Models
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
AI speech toolkit for Apple Silicon — ASR, TTS, speech-to-speech, VAD, and diarization powered by MLX and CoreML
Voice activity detection (VAD) toolkit including DNN-, bDNN-, LSTM-, and ACAM-based VAD. We also provide our directly recorded dataset.
An audio/acoustic activity detection and audio segmentation tool