Starred repositories
[ECCV 2026] Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance
NoMusic is a browser extension that removes background music from tab audio in real time with an entirely local processing pipeline.
Speed-optimized streaming neural speech enhancement network
Industrial audio online policy distillation (OPD) training stack for ASR and TTS, distilling compact audio models from stronger teacher models.
JUCE is an open-source cross-platform C++ application framework for desktop and mobile applications, including VST, VST3, AU, AUv3, LV2 and AAX audio plug-ins.
Magenta RealTime 2: An Open-Weights Live Music Model
SGLang is a high-performance serving framework for large language models and multimodal models.
Frontier CoreML audio models for your apps: text-to-speech, speech-to-text, voice activity detection, and speaker diarization. Written in Swift and powered by state-of-the-art open-source models.
A FOSS navigation SDK built from the ground up for the future
A mobile build of Valhalla for iOS and Android.
The official repository of UL-UNAS, an ultra-lightweight speech enhancement (SE) model.
[AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny model!
💬 An extensive collection of resources on talking face synthesis.
Audio-Visual Lip Synthesis via Intermediate Landmark Representation
LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar
SpeechDenoiser: Real-Time Speech Denoising with ONNX. Welcome to SpeechDenoiser, a simple and effective solution for real-time speech denoising using an ONNX model. This repository contains everythi…
The official implementation of GTCRN, an ultra-lightweight speech enhancement (SE) model.
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++
[ECCV 2026] Implementation of "Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length"
Very low latency speech to text, intent recognition, and text to speech, for building voice agents and interfaces
Foundational Models for State-of-the-Art Speech and Text Translation
Google Research
Zero-shot voice conversion and singing voice conversion, with real-time support
StreamDiffusion, Live Stream App
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
nijigenerate is an open-source editor for the nijilive puppet format, which is derived from Inochi2D (v0.8) technology. This application allows you to rig models for use in games or for other real-…
