- GuangZhou
- lovemefan.top
Lists (22)
AI Other
ASR
avatar
dataset
dataset for AI
Diffusion
✨ Inspiration
kws
Language Model
llm
large language model
Mindspore
Music
python code style
Python coding conventions
quantization
model quantization
RUST
Singing Voice Synthesis
Speech Editing
SpeechEnhance
speechllm
super resolution
TTS
Tools
Microservices
Stars
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
LLM Wiki is a cross-platform desktop application that turns your documents into an organized, interlinked knowledge base — automatically. Instead of traditional RAG (retrieve-and-answer from scratc…
An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine
Triton kernel fusion & CUDA Graph optimization for OmniVoice inference — RMSNorm, SwiGLU, Norm+Residual, SageAttention
RapidSpeech.cpp is a high-performance, edge-native speech intelligence framework written in pure C++. Built atop the ggml tensor library, it is designed to bridge the gap between state-of-the-art L…
Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios
HappyHorse AI turns text or images into remarkable 1080p cinematic video. Every HappyHorse AI video uses advanced motion synthesis — multi-shot storytelling, seamless transitions, and realism. Free…
High-Quality Voice Cloning TTS for 600+ Languages
A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes…
A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…
Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis
Pure C inference of Mistral Voxtral Realtime 4B speech to text model
The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.
A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations
Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.
A framework for efficient model inference with omni-modality models
Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…
End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.
A free, open source, and extensible speech-to-text application that works completely offline.



