Organizations

@RapidAI

Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation

Python 38 1 Updated Jun 9, 2026

LLM Wiki is a cross-platform desktop application that turns your documents into an organized, interlinked knowledge base — automatically. Instead of traditional RAG (retrieve-and-answer from scratc…

TypeScript 11,553 1,410 Updated Jun 14, 2026
Python 660 47 Updated Jun 12, 2026

An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.

Python 248 11 Updated Jun 4, 2026

Confucius4-TTS: a Multilingual and Cross-Lingual Zero-Shot TTS Engine

Python 168 17 Updated Jun 6, 2026

Next-generation self-evolving agent platform (GUI/TUI/Service/SDK)

Go 118 20 Updated Jun 15, 2026
Python 172 16 Updated Jun 2, 2026

Triton kernel fusion & CUDA Graph optimization for OmniVoice inference — RMSNorm, SwiGLU, Norm+Residual, SageAttention

Python 49 6 Updated Jun 8, 2026

RapidSpeech.cpp is a high-performance, edge-native speech intelligence framework written in pure C++. Built atop the ggml tensor library, it is designed to bridge the gap between state-of-the-art L…

C 18 4 Updated Jun 15, 2026

Robust Speech Recognition Across Languages, Dialects, and Complex Acoustic Scenarios

Python 274 27 Updated Apr 23, 2026

CosyVoice inference in C/C++

C++ 27 8 Updated Jun 14, 2026

HappyHorse AI turns text or images into remarkable 1080p cinematic video. Every HappyHorse AI video uses advanced motion synthesis — multi-shot storytelling, seamless transitions, and realism. Free…

136 14 Updated Apr 8, 2026

High-Quality Voice Cloning TTS for 600+ Languages

Python 7,477 1,169 Updated Jun 11, 2026

SOTA Open Source TTS

Python 30,826 2,634 Updated Jun 9, 2026

A complete AI agency at your fingertips - From frontend wizards to Reddit community ninjas, from whimsy injectors to reality checkers. Each agent is a specialized expert with personality, processes…

Shell 113,452 18,510 Updated Jun 15, 2026

A SOTA Industrial-Grade All-in-One ASR system with ASR, VAD, LID, and Punc modules. FireRedASR2 supports Chinese (Mandarin, 20+ dialects/accents), English, code-switching, and both speech and singi…

Python 549 39 Updated Jun 2, 2026

Official inference code for SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis

Python 743 98 Updated May 29, 2026

Pure C inference of Mistral Voxtral Realtime 4B speech to text model

C 1,692 118 Updated Feb 15, 2026

The most powerful local music generation model that outperforms almost all commercial alternatives, supporting Mac, AMD, Intel, and CUDA devices.

Python 11,089 1,345 Updated May 27, 2026

A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations

Python 152 4 Updated Feb 6, 2026

Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music/song recognition, language detection and timestamp prediction.

Python 2,911 293 Updated Jan 30, 2026

A framework for efficient model inference with omni-modality models

Python 5,149 1,113 Updated Jun 15, 2026

Official Python inference and LoRA trainer package for the LTX-2 audio–video generative model.

Python 7,293 1,180 Updated May 28, 2026

Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation

Python 142 8 Updated Mar 8, 2026

FlowMirror-HydraVox — A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens per step for faster, high-quality speech synthesis, featuri…

Python 49 4 Updated Feb 17, 2026

Fast Multimodal LLM on Mobile Devices

C++ 1,540 205 Updated Jun 9, 2026

Large end-to-end speech recognition model: 31 languages, dialects, accents, lyrics, hotwords, timestamps, speaker diarization. Trained on tens of millions of hours.

Python 1,280 125 Updated Jun 12, 2026

A free, open source, and extensible speech-to-text application that works completely offline.

Rust 23,736 1,987 Updated Jun 11, 2026