⚡ Boost inference speed of T5 models by 5x and reduce the model size by 3x.
Updated Apr 24, 2023 (Python)
[ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator"
Inference speed / accuracy tradeoff on text classification with transformer models such as BERT, RoBERTa, DeBERTa, SqueezeBERT, MobileBERT, Funnel Transformer, etc.
🚀 Achieve rapid training of NanoGPT (GPT-2 124M) on a single RTX 4090, targeting a validation loss below 3.28 with FineWeb-Edu data.
A practical guide for benchmarking TensorFlow Lite (TFLite) models, covering inference performance, resource usage, and runtime configuration using the TFLite Benchmark Tool.
Load Apple SEP firmware in Binary Ninja, split embedded Mach-O modules, map sections, add symbols, and resolve shared-library GOT references.
High-efficiency CNN for medical imaging. Achieved ~3.5x actual inference speedup over ShuffleNet-v2. Published in IEEE Access.