lovemefan

Lovemefan lovemefan

Majored in singing voice synthesis, speech recognition, and speech enhancement

140 followers · 91 following

GuangZhou
lovemefan.top

Organizations

Lists (22)

✨ Inspiration

kws

1 repository

Language Model

3 repositories

llm

large language model

33 repositories

Mindspore

1 repository

Music

19 repositories

python code style

Python code style guidelines

3 repositories

quantization

model quantization

2 repositories

RUST

7 repositories

Singing Voice Synthesis

20 repositories

Speech Editing

2 repositories

SpeechEnhance

10 repositories

speechllm

19 repositories

super resolution

2 repositories

TTS

57 repositories

Tools

105 repositories

Microservices

1 repository

Stars

ASLP-lab / FlashTTS

Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Python 38 1 Updated Jun 9, 2026

nashsu / llm_wiki

LLM Wiki is a cross-platform desktop application that turns your documents into an organized, interlinked knowledge base — automatically. Instead of traditional RAG (retrieve-and-answer from scratc…

TypeScript 11,553 1,410 Updated Jun 14, 2026

Soul-AILab / SoulX-Transcriber

An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

Python 248 11 Updated Jun 4, 2026

netease-youdao / Confucius4-TTS

Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine

Python 168 17 Updated Jun 6, 2026

RapidAI / MaClaw

Next-generation self-evolving agent platform (GUI/TUI/Service/SDK)

Go 118 20 Updated Jun 15, 2026

newgrit1004 / omnivoice-triton

Triton kernel fusion & CUDA Graph optimization for OmniVoice inference — RMSNorm, SwiGLU, Norm+Residual, SageAttention

Python 49 6 Updated Jun 8, 2026

RapidAI / RapidSpeech.cpp

RapidSpeech.cpp is a high-performance, edge-native speech intelligence framework written in pure C++. Built atop the ggml tensor library, it is designed to bridge the gap between state-of-the-art L…

C 18 4 Updated Jun 15, 2026

meituan-longcat / LongCat-Video

Python 4,327 684 Updated May 27, 2026

XiaomiMiMo / MiMo-V2.5-ASR

Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios

Python 274 27 Updated Apr 23, 2026

Lourdle / cosyvoice.cpp

CosyVoice inference in C/C++

C++ 27 8 Updated Jun 14, 2026

happyhorseai / happyhorse

HappyHorse AI turns text or images into remarkable 1080p cinematic video. Every HappyHorse AI video uses advanced motion synthesis — multi-shot storytelling, seamless transitions, and realism. Free…

136 14 Updated Apr 8, 2026

k2-fsa / OmniVoice

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,477 1,169 Updated Jun 11, 2026

fishaudio / fish-speech

SOTA Open Source TTS

Python 30,826 2,634 Updated Jun 9, 2026

msitarzewski / agency-agents

A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes…

Shell 113,452 18,510 Updated Jun 15, 2026

FireRedTeam / FireRedASR2S

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 549 39 Updated Jun 2, 2026

Soul-AILab / SoulX-Singer

Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Python 743 98 Updated May 29, 2026

antirez / voxtral.c

Pure C inference of Mistral Voxtral Realtime 4B speech to text model

C 1,692 118 Updated Feb 15, 2026

ace-step / ACE-Step-1.5

The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.

Python 11,089 1,345 Updated May 27, 2026

ASLP-lab / WenetSpeech-Wu-Repo

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 152 4 Updated Feb 6, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,911 293 Updated Jan 30, 2026

vllm-project / vllm-omni

A framework for efficient model inference with omni-modality models

Python 5,149 1,113 Updated Jun 15, 2026

Lightricks / LTX-2

Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.

Python 7,293 1,180 Updated May 28, 2026

k2-fsa / Flow2GAN

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 142 8 Updated Mar 8, 2026

jingzhunxue / FlowMirror_HydraVox

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

UbiquitousLearning / mllm

Fast Multimodal LLM on Mobile Devices

C++ 1,540 205 Updated Jun 9, 2026

FunAudioLLM / Fun-ASR

End-to-end speech recognition large model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.

Python 1,280 125 Updated Jun 12, 2026

cjpais / Handy

A free, open source, and extensible speech-to-text application that works completely offline.

Rust 23,736 1,987 Updated Jun 11, 2026

Lists (22)

AI Other

ASR

avatar

dataset

Diffusion

✨ Inspiration

kws

Language Model

llm

Mindspore

Music

python code style

quantization

RUST

Singing Voice Synthesis

Speech Editing

SpeechEnhance

speechllm

super resolution

TTS

Tools

Microservices
