BlackSamorez
💭 Locked In



code for "Tying the Loop - Tied Expert Layers in Mixture-of-Experts Language Models"

Python 5 Updated Jun 16, 2026
Vue 1 Updated May 22, 2026

Python package for LLM compression

Python 391 18 Updated Jun 19, 2026

Quartet II Official Code

Python 75 9 Updated May 1, 2026
Python 114 20 Updated Feb 26, 2026

An iOS app that integrates a Large Language Model (LLM) to process audio recordings for transcription and summarization.

C++ 17 2 Updated Nov 29, 2024

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ 185 22 Updated Nov 11, 2025
Jupyter Notebook 125 15 Updated Mar 18, 2026

First-of-its-kind AI benchmark for evaluating the protection capabilities of large language model (LLM) guard systems (guardrails and safeguards)

Python 70 5 Updated Mar 7, 2026

Autonomous coding agent as an SDK, IDE extension, or CLI assistant.

TypeScript 63,619 6,745 Updated Jun 22, 2026

Code for the EMNLP 2024 paper "Mathador-LM: A Dynamic Benchmark for Mathematical Reasoning on LLMs".

Python 8 Updated Jun 18, 2024
Python 176 20 Updated Jun 22, 2025

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.

Python 155 24 Updated Aug 21, 2025

Technical Note: From C++98 to C++2x

147 12 Updated Jun 8, 2025

QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

Python 123 5 Updated Mar 6, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.

Python 1,091 88 Updated Sep 4, 2024

QuIP quantization

Python 66 6 Updated Mar 17, 2024

Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python 1,319 194 Updated Feb 26, 2026
Go 5 Updated Feb 18, 2024

Friends don't let friends make certain types of data visualization - What are they and why are they bad.

R 7,077 286 Updated Sep 3, 2025
Python 598 52 Updated Oct 29, 2024

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,323 201 Updated Mar 27, 2024

Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".

Python 279 25 Updated Nov 3, 2023

Meditron is a suite of open-source medical Large Language Models (LLMs).

Python 2,186 211 Updated Apr 10, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,336 83 Updated Mar 6, 2025

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024

C++ 185 13 Updated Apr 16, 2024

💎 A site that contains a systematic review of optimization methods and theory

Jupyter Notebook 137 110 Updated Jun 15, 2026

distributed trainer for LLMs

Python 589 84 Updated May 20, 2024

Minimalist ML framework for Rust

Rust 20,524 1,611 Updated Jun 20, 2026