flash-attention pre-build wheels

This repository provides prebuilt wheels for flash-attention.

Building flash-attention from source takes a very long time and is resource-intensive, so this repository also builds and provides wheels for combinations of CUDA and PyTorch that are not officially distributed.

The GitHub Actions workflow used to build the wheels can be found here.

The built packages are available on the release page.

Install

pip install https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.0/flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
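pip only installs a wheel whose Python tag matches the running interpreter, so it can help to check the tag before downloading. The following is a minimal sketch, assuming the wheel filenames follow the layout used in this repository; the helper names `python_tag` and `wheel_matches_interpreter` are hypothetical and not part of any library.

```python
import re
import sys


def python_tag(version_info=None):
    """Return the CPython tag (for example 'cp312') for a (major, minor, ...) tuple.

    Defaults to the interpreter that runs this script.
    """
    if version_info is None:
        version_info = sys.version_info
    return f"cp{version_info[0]}{version_info[1]}"


def wheel_matches_interpreter(wheel_name, version_info=None):
    """Check that both cp tags in a linux_x86_64 wheel name match the interpreter.

    This only compares the Python tags; it does not check the CUDA or
    PyTorch versions encoded in the filename.
    """
    m = re.search(r"-(cp\d+)-(cp\d+)-linux_x86_64\.whl$", wheel_name)
    if m is None:
        return False
    tag = python_tag(version_info)
    return m.group(1) == tag and m.group(2) == tag


if __name__ == "__main__":
    name = "flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl"
    print(python_tag(), wheel_matches_interpreter(name))
```

The CUDA and PyTorch versions still need to be checked separately against the local environment.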

Packages

flash_attn-[FLASH_ATTN_VERSION]+cu[CUDA_VERSION]torch[TORCH_VERSION]-cp[PYTHON_VERSION]-cp[PYTHON_VERSION]-linux_x86_64.whl

# example: flash_attn=v2.6.3, CUDA=12.4.1, torch=2.5.1, Python=3.12
flash_attn-2.6.3+cu124torch2.5-cp312-cp312-linux_x86_64.whl
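The naming pattern above can be sketched as a small helper that builds the filename and download URL from version strings. This is an illustration only: the function names `wheel_filename` and `wheel_url` are hypothetical, and the shortening rules (CUDA reduced to major and minor without dots, PyTorch reduced to major.minor) are inferred from the single example above, so they may not hold for every release.

```python
# Base of the release download URLs, taken from the install command in this README.
BASE_URL = "https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download"


def wheel_filename(flash_attn, cuda, torch, python):
    """Build a wheel filename from full version strings.

    Assumed rules, inferred from the README example:
      flash_attn "v2.6.3" -> "2.6.3"   (leading "v" dropped)
      cuda       "12.4.1" -> "cu124"   (major + minor, no dots)
      torch      "2.5.1"  -> "torch2.5" (major.minor)
      python     "3.12"   -> "cp312"
    """
    flash_attn = flash_attn.lstrip("v")
    cuda_tag = "".join(cuda.split(".")[:2])
    torch_tag = ".".join(torch.split(".")[:2])
    py_tag = "cp" + "".join(python.split(".")[:2])
    return (
        f"flash_attn-{flash_attn}+cu{cuda_tag}torch{torch_tag}"
        f"-{py_tag}-{py_tag}-linux_x86_64.whl"
    )


def wheel_url(release_tag, filename):
    """Join the release tag (for example 'v0.0.0') and filename into a URL."""
    return f"{BASE_URL}/{release_tag}/{filename}"


if __name__ == "__main__":
    name = wheel_filename("v2.6.3", "12.4.1", "2.5.1", "3.12")
    print(name)
    print(wheel_url("v0.0.0", name))
```

Whether a given combination actually exists depends on the release; the tables below list what each release was built for.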

v0.0.6

Release

Flash-Attention	Python	PyTorch	CUDA
2.4.3, 2.5.9, 2.6.3, 2.7.4.post1	3.10, 3.11, 3.12	2.2.2, 2.3.1, 2.4.1, 2.5.1, 2.6.0	12.4.1, 12.6.3

v0.0.5

Release

Flash-Attention	Python	PyTorch	CUDA
2.6.3, 2.7.4.post1	3.10, 3.11, 3.12	2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1, 2.6.0	12.4.1, 12.6.3

v0.0.4

Release

Flash-Attention	Python	PyTorch	CUDA
2.7.3	3.10, 3.11, 3.12	2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1	11.8.0, 12.1.1, 12.4.1

v0.0.3

Release

Flash-Attention	Python	PyTorch	CUDA
2.7.2.post1	3.10, 3.11, 3.12	2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1	11.8.0, 12.1.1, 12.4.1

v0.0.2

Release

Flash-Attention	Python	PyTorch	CUDA
2.4.3, 2.5.6, 2.6.3, 2.7.0.post2	3.10, 3.11, 3.12	2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.1	11.8.0, 12.1.1, 12.4.1

v0.0.1

Release

Flash-Attention	Python	PyTorch	CUDA
1.0.9, 2.4.3, 2.5.6, 2.5.9, 2.6.3	3.10, 3.11, 3.12	2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.0	11.8.0, 12.1.1, 12.4.1

v0.0.0

Release

Flash-Attention	Python	PyTorch	CUDA
2.4.3, 2.5.6, 2.5.9, 2.6.3	3.11, 3.12	2.0.1, 2.1.2, 2.2.2, 2.3.1, 2.4.1, 2.5.0	11.8.0, 12.1.1, 12.4.1

Original

repo

@inproceedings{dao2022flashattention,
  title={Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
  author={Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2022}
}
@inproceedings{dao2023flashattention2,
  title={Flash{A}ttention-2: Faster Attention with Better Parallelism and Work Partitioning},
  author={Dao, Tri},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2024}
}
