Overview

Description

xFormers is a modular and field-agnostic library to flexibly generate transformer architectures from interoperable and optimized building blocks.

Getting started

The full documentation contains instructions for getting started, deep dives, and tutorials about the various APIs. If in doubt, please check out the HOWTO; only some general considerations are laid out in this README.

Installation

To install xFormers, it is recommended to use a dedicated virtual environment, as is common practice with Python, for instance through python-virtualenv or conda. There are two ways you can install it:

Directly from the pip package

You can fetch the latest release from PyPI. Note that this will not contain the sparse attention kernels, for which you will need to build from source.

conda create --name xformer_env
conda activate xformer_env
pip install xformers
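
As a quick sanity check (a minimal sketch, assuming a standard install), you can verify that the package imports and report its version:

# post-install sanity check: import the package and print its version
import xformers

print(xformers.__version__)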

Build from source (dev mode)

These commands will fetch the latest version of the code, create a dedicated conda environment, activate it, and install xFormers from source. If you want to build the sparse attention CUDA kernels, please make sure that the next point is covered before running these instructions.

git clone git@github.com:fairinternal/xformers.git
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .

Sparse attention kernels

Installing the CUDA-based sparse attention kernels may require extra care, as this mobilizes the CUDA toolchain. As a reminder, these kernels are built when you run pip install -e . and the CUDA build chain is available (NVCC compiler). Rebuilding can for instance be done via python3 setup.py clean && python3 setup.py develop; similarly, you can wipe the build folder and rerun pip install -e .

Some advice on building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:

  • NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with module unload cuda && module load cuda/xx.x (possibly also for nvcc); a quick check is sketched below
  • the version of GCC that you're using matches the current NVCC capabilities
  • the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
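
To check the first point (a minimal sketch, assuming nvcc is on your PATH), you can compare the toolkit compiler with the CUDA runtime PyTorch was built against:

# the nvcc toolkit version and the CUDA runtime PyTorch was built with
# should match for the extension build to succeed
import subprocess

import torch

print("PyTorch built with CUDA:", torch.version.cuda)
print(subprocess.check_output(["nvcc", "--version"], universal_newlines=True))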

Triton

Some parts of xFormers use Triton, and are only exposed if Triton is installed and a compatible GPU is present (an NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running pip install triton. You can optionally check that the installation is successful by running one of the Triton-related benchmarks, for instance python3 xformers/benchmarks/benchmark_triton_softmax.py

Triton will cache the compiled kernels to /tmp/triton by default. If this becomes an issue, this path can be specified through the TRITON_CACHE_DIR environment variable.
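
For instance (a minimal sketch, the path below being purely illustrative), the cache can be redirected before the Triton kernels are first compiled:

# point the Triton kernel cache away from /tmp/triton; this path is hypothetical
import os

os.environ["TRITON_CACHE_DIR"] = "/var/cache/triton"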

Testing the installation

This step is optional, and you will need some extra dependencies for it to go through: pip install -r requirements-benchmark.txt.

Once this is done, you can run a benchmark of the attention mechanisms exposed by xFormers, which will generate runtime and memory plots. If it concludes without errors, the installation is successful:

python3 xformers/benchmarks/benchmark_encoder.py --activations relu  --plot -emb 256 -bs 32 -heads 16

Using xFormers

Transformers key concepts

Let's start from a classical overview of the Transformer architecture (illustration from Lin et al., "A Survey of Transformers").

You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks and a residual path (typically referred to as pre- or post-layer norm). These boundaries do not work for all models, but we found in practice that, given some accommodations, they can capture most of the state of the art.

Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
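
As an illustration, here is a minimal sketch of working at the attention sub-block level. It assumes the build_attention helper from the components zoo, and the exact config keys may differ per attention variant, so treat them as indicative:

# instantiate one attention variant from a config dict; changing "name"
# swaps in another of the supported mechanisms
import torch

from xformers.components.attention import build_attention

attention = build_attention({
    "name": "scaled_dot_product",  # any registered attention variant
    "dropout": 0.1,
    "causal": False,
})
q = k = v = torch.rand(2, 128, 64)  # (batch, sequence, embedding)
out = attention(q, k, v)
print(out.shape)  # expected: (2, 128, 64)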

Repo map

├── components                  # Parts zoo, any of which can be used directly
│   ├── attention
│   │    └ ...                  # all the supported attentions
│   ├── feedforward             #
│   │    └ ...                  # all the supported feedforwards
│   ├── positional_embedding    #
│   │    └ ...                  # all the supported positional embeddings
│   ├── activations.py          #
│   └── multi_head_dispatch.py  # (optional) multihead wrap
├── factory
│   ├── block_factory.py        # (optional) helper to programmatically generate layers
│   └── model_factory.py        # (optional) helper to programmatically generate models
├── models
...                             # Full models, ready to be used

Attention mechanisms

Feed forward mechanisms

Positional embedding

Key Features

  1. Many attention mechanisms, interchangeable
  2. Optimized building blocks, beyond PyTorch primitives
    1. sparse attention
    2. block-sparse attention
    3. fused softmax
    4. fused linear layer
    5. fused layer norm
  3. Benchmarking and testing tools
    1. micro benchmarks
    2. transformer block benchmark
    3. LRA, with SLURM support
  4. Programmatic and sweep-friendly layer and model construction (see the sketch below)
  5. Hackable
    1. Not using monolithic CUDA kernels, composable building blocks
    2. Using Triton for some optimized parts, explicit, pythonic and user-accessible
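
To illustrate point 4 above, here is a minimal sketch of programmatic model construction through the factory. It mirrors the config structure used elsewhere on this page; the exact keys should be checked against the documentation:

# build a small encoder-only model programmatically (keys are indicative)
import torch

from xformers.factory import xFormer, xFormerConfig

EMB, SEQ, VOCAB = 384, 128, 64
config = xFormerConfig([
    {
        "reversible": False,
        "block_type": "encoder",
        "num_layers": 2,
        "dim_model": EMB,
        "position_encoding_config": {"name": "vocab", "seq_len": SEQ, "vocab_size": VOCAB},
        "multi_head_config": {
            "num_heads": 4,
            "residual_dropout": 0.0,
            "attention": {"name": "linformer", "dropout": 0.0, "causal": False, "seq_len": SEQ},
        },
        "feedforward_config": {
            "name": "MLP",
            "dropout": 0.0,
            "activation": "relu",
            "hidden_layer_multiplier": 4,
        },
    }
])
model = xFormer.from_config(config)
tokens = (torch.rand(8, SEQ) * VOCAB).to(torch.int)  # dummy token ids
print(model(tokens).shape)  # expected: (8, 128, 384)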

FAQ

We've tried to collect a relatively exhaustive list of explanations in the HOWTO.

License

xFormers has a BSD-style license, as found in the LICENSE file.

Citing xFormers

If you use xFormers in your publication, please cite it by using the following BibTeX entry.

@Misc{xFormers2021,
  author =       {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
  title =        {xFormers: A modular and hackable Transformer modelling library},
  howpublished = {\url{https://github.com/facebookresearch/xformers}},
  year =         {2021}
}
Comments
  • [feat] Dropout(Activation(x+bias)), now with partial BW fusion

    What does this PR do?

    This was a long time in the making: fusing the BW part of the activation/bias/dropout kernel. Not quite perfect, but in some places the speed goes really bananas (like 3x or 4x the naive calls). Fusing this implied flipping the whole problem upside down: basically, the seeds have to be per column, and the kernels (FW and BW) also work that way. This allows us to fuse the bias gradient computation, since it's a sum over that direction.

    TODO:

    • [x] add more unit tests to check that the dropout drops are respected on average
    • [x] possibly make sure that the rand mask does not repeat (may or may not be a big deal). Ok this is doable by making the kernels cooperate on the same col, like Phil does on LayerNorm
    • [x] improve on the scheduling for small buffers
    • [x] Fix the atomic add funkiness (works for now but this does not look completely right, num_warps dependent)

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 26
  • Support Windows and ideally build wheels for it

    🚀 Feature

    Supporting Windows in xformers.

    Motivation

    xformers provides excellent tools to increase the speed of inference, for example close to 2x in Stable Diffusion. Sadly, it lacks Windows support. This has barred us from using it on https://github.com/AUTOMATIC1111/stable-diffusion-webui as most users and developers (including myself) use Windows.

    Pitch

    Currently, xformers will fail to compile on Windows for a multitude of errors, some of which are trivial but most are not. Enabling Windows usage by fixing these errors and ideally distributing Windows wheels would allow projects to make xformers a necessary requirement & use it.

    Alternatives

    Additional context

    cc. @fmassa

    opened by C43H66N12O12S2 23
  • triton 2.0 changes

    What does this PR do?

    Fixes triton to work with version 2.0.0.

    TODOs:

    • [x] Move the syntax to triton2
    • [x] Fix fused dropout
    • [ ] Fix the blocksparse op API having changed
    • [x] Fix fused linear layer
    • [x] Update the benchmarks

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 23
  • Pip installation fails, `CUTLASS` not found

    🐛 Bug

    pip installation fails in a docker container: CUTLASS not found, git submodule update --init --recursive not executed

    To Reproduce

    Dockerfile

    FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    RUN pip install xformers
    

    then

    docker build .
    

    Error Trace

    #1 [internal] load build definition from Dockerfile
    #1 sha256:bc3772a9760c6470030d3506e7afa0b9caa2a77f63376fe30fc296a334d5c980
    #1 transferring dockerfile: 116B done
    #1 DONE 0.0s
    
    #2 [internal] load .dockerignore
    #2 sha256:5b674e66e988c8852edbf605c0d0921ac6eed40841cd55d9112e0d92242091a1
    #2 transferring context: 2B done
    #2 DONE 0.0s
    
    #3 [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    #3 sha256:409f78a4f3551ef4b6d7a4b064ff72bb54f0677d599351b4d0dcdff08b926834
    #3 DONE 0.8s
    
    #4 [1/2] FROM docker.io/pytorch/pytorch:[email protected]:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75
    #4 sha256:2e3e89abd93f2e7b42b070196f0e6be4ce38a2d360c98232440e1d90189bdb02
    #4 CACHED
    
    #5 [2/2] RUN pip install xformers
    #5 sha256:ef3133015f56a22d509f2aa1ef730afdcaa2591838105ba332650ff73ceb9ff9
    #5 1.012 Collecting xformers
    #5 1.313   Downloading xformers-0.0.13.tar.gz (292 kB)
    #5 1.429      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.5/292.5 kB 2.6 MB/s eta 0:00:00
    #5 1.534   Preparing metadata (setup.py): started
    #5 2.952   Preparing metadata (setup.py): finished with status 'error'
    #5 2.961   error: subprocess-exited-with-error
    #5 2.961
    #5 2.961   × python setup.py egg_info did not run successfully.
    #5 2.961   │ exit code: 1
    #5 2.961   ╰─> [8 lines of output]
    #5 2.961       Traceback (most recent call last):
    #5 2.961         File "<string>", line 36, in <module>
    #5 2.961         File "<pip-setuptools-caller>", line 34, in <module>
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 239, in <module>
    #5 2.961           ext_modules=get_extensions(),
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 158, in get_extensions
    #5 2.961           "CUTLASS submodule not found. Did you forget "
    #5 2.961       RuntimeError: CUTLASS submodule not found. Did you forget to run `git submodule update --init --recursive` ?
    #5 2.961       [end of output]
    #5 2.961
    #5 2.961   note: This error originates from a subprocess, and is likely not a problem with pip.
    #5 2.965 error: metadata-generation-failed
    #5 2.965
    #5 2.965 × Encountered error while generating package metadata.
    #5 2.965 ╰─> See above for output.
    #5 2.965
    #5 2.965 note: This is an issue with the package mentioned above, not pip.
    #5 2.965 hint: See above for details.
    #5 ERROR: executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    ------
     > [2/2] RUN pip install xformers:
    ------
    executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    

    Expected behavior

    installation should work.

    Environment

    in the container, running docker on windows

    PyTorch version: 1.12.1
    Is debug build: False
    CUDA used to build PyTorch: 11.3
    ROCM used to build PyTorch: N/A 
    
    OS: Ubuntu 18.04.6 LTS (x86_64) 
    GCC version: Could not collect  
    Clang version: Could not collect
    CMake version: Could not collect
    Libc version: glibc-2.17
    
    Python version: 3.7.13 (default, Mar 29 2022, 02:18:16)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
    Is CUDA available: True
    CUDA runtime version: Could not collect
    GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060
    Nvidia driver version: 517.48
    cuDNN version: Could not collect
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] numpy==1.21.5
    [pip3] torch==1.12.1
    [pip3] torchtext==0.13.1
    [pip3] torchvision==0.13.1
    [conda] blas                      1.0                         mkl
    [conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
    [conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
    [conda] mkl-service               2.4.0            py37h7f8727e_0
    [conda] mkl_fft                   1.3.1            py37hd3c417c_0
    [conda] mkl_random                1.2.2            py37h51133e4_0
    [conda] numpy                     1.21.5           py37he7a7128_2
    [conda] numpy-base                1.21.5           py37hf524024_2
    [conda] pytorch                   1.12.1          py3.7_cuda11.3_cudnn8.3.2_0    pytorch
    [conda] pytorch-mutex             1.0                        cuda    pytorch
    [conda] torchtext                 0.13.1                     py37    pytorch
    [conda] torchvision               0.13.1               py37_cu113    pytorch
    

    Additional context

    I don't think this problem has anything to do with OS/python/pytorch/cuda/nvcc versions; the setup.py seems to be tailored for a local / manual install, and fails in this context.

    opened by AbdBarho 20
  • [feat] add split_dim arg to reversible, remove retain_grad, add benchmark_reversible

    This PR removes the repeated chunk and cat operations in xformers' RevNet code. This way, the RevNet implementation will become a little bit faster.
    I'd strongly recommend calling a library like MemCNN or RevLib directly as they make it easier to switch the coupling function and generally give the user more freedom.

    Unfortunately, I can't sign the CLA at the moment, as it keeps saying

    Sorry, something went wrong. We're working on getting this fixed as soon as we can.

    CLA Signed 
    opened by ClashLuke 20
  • Added SmeLU

    What does this PR do?

    Fixes #262 .

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [x] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 17
  • [chore] release v0.0.13

    What does this PR do?

    bump the dev version number to be able to release v0.0.13, see #402

    Before submitting

    • [ ] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • [feat] Compositional attention

    What does this PR do?

    Implements Compositional Attention (based on the reference implementation), as mentioned in https://github.com/facebookresearch/xformers/issues/41

    Paper

    TODOs

    • [x] Sane defaults
    • [x] Speed up wherever possible. Looks like it also takes a lot of memory at the moment, probably some dummy mistakes
    • [x] Maybe self-attention optimization (single proj) -> doable if moving the projections within the attention to the inproj class, worth it?
    • [x] Add a lot of explanations/documentations
    • [ ] Some IR results? -> that would probably be for another task

    cc @sarthmit if interested

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [x] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • Does xformers still not support CUDA 12.0?

    ❓ Questions and Help

    I got the following error while installing...

    Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
    Obtaining file:///F:/Stable_Diffusion/stable-diffusion-webui-master/repositories/xformers
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [9 lines of output]
        No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0'
        Traceback (most recent call last):
          File "", line 2, in
          File "", line 34, in
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 293, in
            symlink_package(
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 83, in symlink_package
            os.symlink(src=path_from, dst=path_to)
        OSError: [WinError 1314] A required privilege is not held by the client: 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\third_party\flash-attention\flash_attn' -> 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\xformers_flash_attn'
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    Is this because of CUDA 12? Should I downgrade the CUDA version?

    or what is the problem, can anyone help?

    opened by debdip 13
  • Cannot install xformers on linux server

    ❓ Questions and Help

    When I try either pip install or building from source, I get this issue:

     × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [18 lines of output]
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/home/username/xformers/setup.py", line 239, in <module>
              ext_modules=get_extensions(),
            File "/home/username/xformers/setup.py", line 187, in get_extensions
              cuda_version = get_cuda_version(CUDA_HOME)
            File "/home/username/xformers/setup.py", line 51, in get_cuda_version
              raw_output = subprocess.check_output([nvcc_bin, "-V"], universal_newlines=True)
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 424, in check_output
              return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 505, in run
              with Popen(*popenargs, **kwargs) as process:
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 951, in __init__
              self._execute_child(args, executable, preexec_fn, close_fds,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 1821, in _execute_child
              raise child_exception_type(errno_num, err_msg, err_filename)
          FileNotFoundError: [Errno 2] No such file or directory: '/home/username/anaconda3/envs/test_env/bin/nvcc'
          [end of output]
    

    here's the output of nvcc --version

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Tue_Mar__8_18:18:20_PST_2022
    Cuda compilation tools, release 11.6, V11.6.124
    Build cuda_11.6.r11.6/compiler.31057947_0
    

    and as additional information, I was able to install PyTorch the usual way and verify that CUDA is available.

    opened by fedshyvana 13
  • Encoder decoder arch doesn't work when sequence lengths are different

    🐛 Bug

    I get an error when the sequence lengths to the encoder and decoder are different, e.g. in the code snippet below:

    Command

    EMB = 384
    SEQ_ENC = 128
    SEQ_DEC = 64
    BATCH = 16
    VOCAB = 64
    
    my_config = [
        # A list of the encoder or decoder blocks which constitute the Transformer.
        # Note that a sequence of different encoder blocks can be used, same for decoders
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
                "block_type": "encoder",
                "num_layers": 3,  # Optional, this means that this config will repeat N times
                "dim_model": EMB,
                "layer_norm_style": "pre",  # Optional, pre/post
                "position_encoding_config": {
                    "name": "vocab",  # whatever position encodinhg makes sense
                    "seq_len": SEQ_ENC,
                    "vocab_size": VOCAB,
                },
                "multi_head_config": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "linformer",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": False,
                        "seq_len": SEQ_ENC,
                    },
                },
                "feedforward_config": {
                    "name": "MLP",
                    "dropout": 0,
                    "activation": "relu",
                    "hidden_layer_multiplier": 4,
                },
            },
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
    
                "block_type": "decoder",
                "num_layers": 3,  # Optional, this means that this config will repeat N times
                "dim_model": EMB,
                "layer_norm_style": "pre",  # Optional, pre/post
                "position_encoding_config": {
                    "name": "vocab",  # whatever position encodinhg makes sense
                    "seq_len": SEQ_DEC,
                    "vocab_size": VOCAB,
                },
                "multi_head_config_masked": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "nystrom",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": True,
                        "seq_len": SEQ_DEC,
                    },
                },
                "multi_head_config_cross": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "favor",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": True,
                        "seq_len": SEQ_DEC,
                    },
                },
                "feedforward_config": {
                    "name": "MLP",
                    "dropout": 0,
                    "activation": "relu",
                    "hidden_layer_multiplier": 4,
                },
            },
    ]
    
    # This part of xFormers is entirely type checked and needs a config object,
    # could be changed in the future
    config = xFormerConfig(my_config)
    model = xFormer.from_config(config)
    
    #  Test out with dummy inputs
    src = (torch.rand((BATCH, SEQ_ENC)) * VOCAB).abs().to(torch.int)
    tgt = (torch.rand((BATCH, SEQ_DEC)) * VOCAB).abs().to(torch.int)
    y = model(src=src, tgt=tgt)
    
    print(y.shape)
    

    Expected behavior

    torch.Size([16, 64, 384])
    

    however, I get:

    RuntimeError: einsum(): operands do not broadcast with remapped shapes [original->remapped]: [64, 128, 96, 96]->[64, 128, 96, 96] [64, 64, 96]->[64, 64, 1, 96]
    
    ongoing 
    opened by kashif 13
  • How to set random seeds fixed

    ❓ Questions and Help

    Different results occur when I run the same code twice, and the set_seed function below runs before everything else.

    import os
    import random

    import numpy as np
    import torch

    def set_seed(seed, cudnn_benchmark=False, cudnn_deterministic=True):
        random.seed(seed)  # Python random module.
        np.random.seed(seed)  # Numpy module.
        os.environ['PYTHONHASHSEED'] = str(seed)
        torch.random.manual_seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
        torch.backends.cudnn.benchmark = cudnn_benchmark
        torch.backends.cudnn.deterministic = cudnn_deterministic
    
    opened by scp92 1
  • Allowing decoder only definition

    🚀 Feature

    Allow only a decoder config to be defined.

    Motivation

    I want to define only a decoder and pass in a memory vector from another source.

    Pitch

    I tried this change locally and it allows me to do what I want it to do: https://github.com/facebookresearch/xformers/compare/main...nh2liu:xformers:patch-1

    Not sure if this has extending implications because it seems this code has been around for a while, but the comment # If decoder: either use the encoder output, or just decode, both options are possible indicates that this may be a bug.

    Alternatives

    • NOOP encoder will also allow this functionality.
    opened by nh2liu 2
  • build from source failed

    🐛 Bug

    Command

    pip install ninja
    pip install -v -U git+https://github.com/facebookresearch/[email protected]#egg=xformers

    ERROR INFO

    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:650:66:   required from ‘void _GLOBAL__N__7fac2228_12_attention_cu_724ba955_12677::launch_attention(at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, float, at::PhiloxCudaState) [with bool compute_logsumexp = true]’
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:793:92:   required from here
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:612:58: warning: ‘at::GenericPackedTensorAccessor<T, N, PtrTraits, index_t> at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 3; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations]
      612 |     return attn_bias.packed_accessor<scalar_t, 3>();
          |                                                          ^
    /usr/local/lib/python3.8/dist-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
      247 |   GenericPackedTensorAccessor<T,N,PtrTraits,index_t> packed_accessor() const & {
          | ^ ~~~~~~~~~~~~~
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1901, in _run_ninja_build
        subprocess.run(
      File "/usr/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py", line 301, in <module>
        setuptools.setup(
      File "/usr/local/lib/python3.8/dist-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py", line 68, in run
        return orig.install.run(self)
      File "/usr/lib/python3.8/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
        _build_ext.build_ext.build_extensions(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
        _build_ext.build_extension(self, ext)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    Running setup.py install for xformers: finished with status 'error'
    

    ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"'; file='"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-lj6j_c0s/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/oppoer/.local/include/python3.8/xformers
    Check the logs for full command output.
    WARNING: You are using pip version 21.2.4; however, version 22.3.1 is available.
    You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.

    Environment

    My docker image is: nvcr.io/nvidia/pytorch:22.06-py3

    Collecting environment information...
    PyTorch version: 1.13.0a0+936e930
    Is debug build: False
    CUDA used to build PyTorch: 11.8
    ROCM used to build PyTorch: N/A

    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: Could not collect
    CMake version: version 3.24.1
    Libc version: glibc-2.31

    Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
    Python platform: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-glibc2.29
    Is CUDA available: True
    CUDA runtime version: 11.8.89
    GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
    Nvidia driver version: 470.129.06
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True

    Versions of relevant libraries:
    [pip3] functorch==1.13.0a0+936e930
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.22.2
    [pip3] pytorch-quantization==2.1.2
    [pip3] torch==1.13.0a0+936e930
    [pip3] torch-tensorrt==1.3.0a0
    [pip3] torchtext==0.13.0a0+fae8e8c
    [pip3] torchvision==0.15.0a0
    [conda] Could not collect

    opened by GxjGit 7
  • Unable to Build from latest

    🐛 Bug

    Command

    cd xformers
    git pull
    git submodule update --recursive --remote
    pip install -e .
    

    To Reproduce

    Steps to reproduce the behavior:

    1. pull latest from git (at hash f82722f61f972c02ebc54431e3e4717f21b3e9b9)
    2. pull latest submodules
    3. build

    Expected behavior

    Building to run successfully

    Environment

    Collecting environment information...
    PyTorch version: 1.12.1+cu116
    Is debug build: False
    CUDA used to build PyTorch: 11.6
    ROCM used to build PyTorch: N/A
    
    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: 10.0.0-4ubuntu1
    CMake version: version 3.25.0
    Libc version: glibc-2.31
    
    Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
    Is CUDA available: True
    CUDA runtime version: 11.6.124
    GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060 SUPER
    Nvidia driver version: 526.47
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.23.2
    [pip3] pytorch-lightning==1.7.5
    [pip3] torch==1.12.1+cu116
    [pip3] torchaudio==0.12.1+cu116
    [pip3] torchdynamo==1.12.0
    [pip3] torchmetrics==0.9.3
    [pip3] torchvision==0.13.1+cu116
    [conda] numpy                     1.23.2                   pypi_0    pypi
    [conda] pytorch-lightning         1.7.5                    pypi_0    pypi
    [conda] torch                     1.12.1+cu116             pypi_0    pypi
    [conda] torchaudio                0.12.1+cu116             pypi_0    pypi
    [conda] torchdynamo               1.12.0                   pypi_0    pypi
    [conda] torchmetrics              0.9.3                    pypi_0    pypi
    [conda] torchvision               0.13.1+cu116             pypi_0    pypi
    
    • PyTorch Version (e.g., 1.0): 1.12.1+cu116
    • OS (e.g., Linux): WSL
    • How you installed PyTorch (conda, pip, source): pip install -e .
    • Build command you used (if compiling from source): pip install -e .
    • Python version: 3.8.13
    • CUDA/cuDNN version: 11.6
    • GPU models and configuration: NVIDIA GeForce RTX 2060 SUPER
    • Any other relevant information: It worked on a previous commit

    Additional context

    Error message from compiler:

        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: namespace "cutlass::gemm::warp" has no member "WarpSize"
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: type name is not allowed
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: the global scope has no "value"
    
        3 errors detected in the compilation of "/home/jonno/xformers/xformers/csrc/attention/cuda/fmha/attention_forward_generic.cu".
        /home/jonno/anaconda3/envs/dyn/lib/python3.8/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.6
          warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
        error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    
    opened by JonnoFTW 3
Releases(v0.0.13)
  • v0.0.13(Sep 26, 2022)

  • v0.0.12(Aug 8, 2022)

    [0.0.12] - 2022-08-08

    Fixed

    • Removed duplicated biases in the FusedMLP layers [#317]
    • Rotary embeddings respecting input types [#326]
    • Poolformer style instantiating useless projection layers [#349]
    • Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
    • Pass use_triton flag to LayerNorm module [#336]

    Added

    • Four blocksparsity layouts from DeepSpeed [#320]
    • Support several initialization options [#312]
    • Conv2DFeedforward feedforward part [#321]
    • VisualAttention [#329]
    • Automatic blocksparse for causal attention [#334]
    • Better hierarchical transformer generation [#345]
    • Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
    • Refactor LRA code to use Pytorch Lightning [#343]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.11(May 30, 2022)

    [0.0.11] - 2022-05-30

    Fixed

    • Fix some torchscriptability [#246]
    • Fix FourierMix being compatible with AMP [#258]
    • Better asserts on QKV dimensions [#264]
    • Better perfs for FusedMLP and FusedLinearLayer [#283]
    • Deepnorm init missing self-attention [#284]

    Added

    • Simplicial Embeddings [#259]
    • Mem efficient attention, FW pass [#267]
    • MHA benchmark
    • MLP benchmark
    • Move all triton kernels to triton v2 [#272]
    • Mem efficient attention, BW pass [#281]
    • Metaformer support [#294]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.10(Mar 15, 2022)

    Fixed

    • Expose bias flag for feedforwards, same default as Timm [#220]
    • Update eps value for layernorm, same default as torch [#221]
    • PreNorm bugfix, only one input was normalized [#233]

    Added

    • Add DeepNet (DeepNorm) residual path and init [#227]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.9(Feb 9, 2022)

    Added

    • Compositional Attention [#41]
    • Experimental Ragged attention [#189]
    • Mixture of Experts [#181]
    • BlockSparseTensor [#202]
    • nd-tensor support for triton softmax [#210]

    Fixed

    • bugfix Favor, single feature map [#183]
    • sanity check blocksparse settings [#207]
    • fixed some pickability [#204]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.8(Jan 7, 2022)

  • v0.0.7(Nov 30, 2021)

  • v0.0.6(Nov 24, 2021)

    Fixed

    • Fix self attention optimization not being triggered, broken residual path [#119]
    • Improve speed by not using contiguous Tensors when not needed [#119]

    Added

    • Attention mask wrapper [#113]
    • ViT comparison benchmark [#117]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.5(Nov 18, 2021)

  • v0.0.4(Nov 17, 2021)

    • Fixing causality not being respected by the scaled dot product attention
    • Fixing Favor causal trainability
    • Enabling FusedLayerNorm by default if Triton is available
    • Fixing Favor with fp16
    Source code(tar.gz)
    Source code(zip)
  • v0.0.3(Nov 5, 2021)

  • v0.0.2(Nov 1, 2021)

    [0.0.2] - 2021-11-01

    Fixed

    • More robust blocksparse [#24]

    Added

    • Rotary embeddings [#32]
    • More flexible layernorm [#50]
    • More flexible blockfactory config (key deduplication)
    Source code(tar.gz)
    Source code(zip)
Owner
Facebook Research
chaii - hindi & tamil question answering

chaii - hindi & tamil question answering This is the solution for rank 5th in Kaggle competition: chaii - Hindi and Tamil Question Answering. The comp

abhishek thakur 33 Dec 18, 2022
Finetune gpt-2 in google colab

gpt-2-colab finetune gpt-2 in google colab sample result (117M) from retraining on A Tale of Two Cities by Charles Di

212 Jan 02, 2023
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

41 Jan 03, 2023
YACLC - Yet Another Chinese Learner Corpus

汉语学习者文本多维标注数据集YACLC V1.0 中文 | English 汉语学习者文本多维标注数据集(Yet Another Chinese Learner

BLCU-ICALL 47 Dec 15, 2022
Sequence modeling benchmarks and temporal convolutional networks

Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN) This repository contains the experiments done in the work An Empirical Evaluati

CMU Locus Lab 3.5k Jan 03, 2023
Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

Dennis Priskorn 9 Nov 17, 2022
The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data

Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"

Zhiyu Chen 114 Dec 29, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
The entmax mapping and its loss, a family of sparse softmax alternatives.

entmax This package provides a pytorch implementation of entmax and entmax losses: a sparse family of probability mappings and corresponding loss func

DeepSPIN 330 Dec 22, 2022
Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra. What is Lightning Tran

Pytorch Lightning 581 Dec 21, 2022
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
Weakly-supervised Text Classification Based on Keyword Graph

Weakly-supervised Text Classification Based on Keyword Graph How to run? Download data Our dataset follows previous works. For long texts, we follow C

Hello_World 20 Dec 29, 2022
Dope Wars game engine on StarkNet L2 roll-up

RYO Dope Wars game engine on StarkNet L2 roll-up. What TI-83 drug wars built as smart contract system. Background mechanism design notion here. Initia

104 Dec 04, 2022
This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

Neural Style Transfer Transition Video Processing By Brycen Westgarth and Tristan Jogminas Description This code extends the neural style transfer ima

Brycen Westgarth 110 Jan 07, 2023
[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021

Compact Transformers Preprint Link: Escaping the Big Data Paradigm with Compact Transformers By Ali Hassani[1]*, Steven Walton[1]*, Nikhil Shah[1], Ab

SHI Lab 367 Dec 31, 2022
Quantifiers and Negations in RE Documents

Quantifiers-and-Negations-in-RE-Documents This project was part of my work for a

Nicolas Ruscher 1 Feb 01, 2022
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

MTFAA-Net Unofficial PyTorch implementation of Baidu's MTFAA-Net: "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speec

Shimin Zhang 87 Dec 19, 2022
A music comments dataset, containing 39,051 comments for 27,384 songs.

Music Comments Dataset A music comments dataset, containing 39,051 comments for 27,384 songs. For academic research use only. Introduction This datase

Zhang Yixiao 2 Jan 10, 2022
nlpcommon is a python Open Source Toolkit for text classification.

nlpcommon nlpcommon, Python Text Tool. Guide Feature Install Usage Dataset Contact Cite Reference Feature nlpcommon is a python Open Source

xuming 3 May 29, 2022
Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

J.A.R.V.I.S Kindly consider starring this repository if you like the program :-) What/Who is J.A.R.V.I.S? J.A.R.V.I.S is an chatbot written that is bu

Epicalable 50 Dec 31, 2022