Overview

Description

xFormers is a modular and field-agnostic library to flexibly generate transformer architectures from interoperable and optimized building blocks.

Getting started

The full documentation contains instructions for getting started, deep dives, and tutorials about the various APIs. If in doubt, please check out the HOWTO; only some general considerations are laid out in this README.

Installation

To install xFormers, it is recommended to use a dedicated virtual environment, as is common practice with Python, for instance through python-virtualenv or conda. There are two ways you can install it:

Directly from the pip package

You can fetch the latest release from PyPI. Note that this will not contain the sparse attention kernels, for which you will need to build from source.

conda create --name xformer_env
conda activate xformer_env
pip install xformers
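
As a quick sanity check (a minimal sketch, assuming a standard install), you can verify that the package imports and report its version:

# post-install sanity check: import the package and print its version
import xformers

print(xformers.__version__)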

Build from source (dev mode)

These commands will fetch the latest version of the code, create a dedicated conda environment, activate it, and install xFormers from source. If you want to build the sparse attention CUDA kernels, please make sure that the next point is covered before running these instructions.

git clone git@github.com:fairinternal/xformers.git
conda create --name xformer_env python=3.8
conda activate xformer_env
cd xformers
pip install -r requirements.txt
pip install -e .

Sparse attention kernels

Installing the CUDA-based sparse attention kernels may require extra care, as this mobilizes the CUDA toolchain. As a reminder, these kernels are built when you run pip install -e . and the CUDA build chain is available (NVCC compiler). Rebuilding can for instance be done via python3 setup.py clean && python3 setup.py develop; similarly, you can wipe the build folder and rerun pip install -e .

Some advice on building these CUDA-specific components, tentatively addressing common pitfalls. Please make sure that:

  • NVCC and the current CUDA runtime match. Depending on your setup, you may be able to change the CUDA runtime with module unload cuda && module load cuda/xx.x (possibly also for nvcc); a quick check is sketched below
  • the version of GCC that you're using matches the current NVCC capabilities
  • the TORCH_CUDA_ARCH_LIST env variable is set to the architectures that you want to support. A suggested setup (slow to build but comprehensive) is export TORCH_CUDA_ARCH_LIST="6.0;6.1;6.2;7.0;7.2;8.0;8.6"
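
To check the first point (a minimal sketch, assuming nvcc is on your PATH), you can compare the toolkit compiler with the CUDA runtime PyTorch was built against:

# the nvcc toolkit version and the CUDA runtime PyTorch was built with
# should match for the extension build to succeed
import subprocess

import torch

print("PyTorch built with CUDA:", torch.version.cuda)
print(subprocess.check_output(["nvcc", "--version"], universal_newlines=True))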

Triton

Some parts of xFormers use Triton, and are only exposed if Triton is installed and a compatible GPU is present (an NVIDIA GPU with tensor cores). If Triton was not installed as part of the testing procedure, you can install it directly by running pip install triton. You can optionally check that the installation is successful by running one of the Triton-related benchmarks, for instance python3 xformers/benchmarks/benchmark_triton_softmax.py

Triton will cache the compiled kernels to /tmp/triton by default. If this becomes an issue, this path can be specified through the TRITON_CACHE_DIR environment variable.
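
For instance (a minimal sketch, the path below being purely illustrative), the cache can be redirected before the Triton kernels are first compiled:

# point the Triton kernel cache away from /tmp/triton; this path is hypothetical
import os

os.environ["TRITON_CACHE_DIR"] = "/var/cache/triton"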

Testing the installation

This step is optional, and you will need some extra dependencies for it to go through: pip install -r requirements-benchmark.txt.

Once this is done, you can run a benchmark of the attention mechanisms exposed by xFormers, which will generate runtime and memory plots. If it concludes without errors, the installation is successful:

python3 xformers/benchmarks/benchmark_encoder.py --activations relu  --plot -emb 256 -bs 32 -heads 16

Using xFormers

Transformers key concepts

Let's start from a classical overview of the Transformer architecture (illustration from Lin et al., "A Survey of Transformers").

You'll find the key repository boundaries in this illustration: a Transformer is generally made of a collection of attention mechanisms, embeddings to encode some positional information, feed-forward blocks and a residual path (typically referred to as pre- or post-layer norm). These boundaries do not work for all models, but we found in practice that, given some accommodations, they can capture most of the state of the art.

Models are thus not implemented in monolithic files, which are typically complicated to handle and modify. Most of the concepts present in the above illustration correspond to an abstraction level, and when variants are present for a given sub-block it should always be possible to select any of them. You can focus on a given encapsulation level and modify it as needed.
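
As an illustration, here is a minimal sketch of working at the attention sub-block level. It assumes the build_attention helper from the components zoo, and the exact config keys may differ per attention variant, so treat them as indicative:

# instantiate one attention variant from a config dict; changing "name"
# swaps in another of the supported mechanisms
import torch

from xformers.components.attention import build_attention

attention = build_attention({
    "name": "scaled_dot_product",  # any registered attention variant
    "dropout": 0.1,
    "causal": False,
})
q = k = v = torch.rand(2, 128, 64)  # (batch, sequence, embedding)
out = attention(q, k, v)
print(out.shape)  # expected: (2, 128, 64)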

Repo map

├── components                  # Parts zoo, any of which can be used directly
│   ├── attention
│   │    └ ...                  # all the supported attentions
│   ├── feedforward             #
│   │    └ ...                  # all the supported feedforwards
│   ├── positional_embedding    #
│   │    └ ...                  # all the supported positional embeddings
│   ├── activations.py          #
│   └── multi_head_dispatch.py  # (optional) multihead wrap
├── factory
│   ├── block_factory.py        # (optional) helper to programmatically generate layers
│   └── model_factory.py        # (optional) helper to programmatically generate models
├── models
...                             # Full models, ready to be used

Attention mechanisms

Feed forward mechanisms

Positional embedding

Key Features

  1. Many attention mechanisms, interchangeable
  2. Optimized building blocks, beyond PyTorch primitives
    1. sparse attention
    2. block-sparse attention
    3. fused softmax
    4. fused linear layer
    5. fused layer norm
  3. Benchmarking and testing tools
    1. micro benchmarks
    2. transformer block benchmark
    3. LRA, with SLURM support
  4. Programmatic and sweep-friendly layer and model construction (see the sketch below)
  5. Hackable
    1. Not using monolithic CUDA kernels, composable building blocks
    2. Using Triton for some optimized parts, explicit, pythonic and user-accessible
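
To illustrate point 4 above, here is a minimal sketch of programmatic model construction through the factory. It mirrors the config structure used elsewhere on this page; the exact keys should be checked against the documentation:

# build a small encoder-only model programmatically (keys are indicative)
import torch

from xformers.factory import xFormer, xFormerConfig

EMB, SEQ, VOCAB = 384, 128, 64
config = xFormerConfig([
    {
        "reversible": False,
        "block_type": "encoder",
        "num_layers": 2,
        "dim_model": EMB,
        "position_encoding_config": {"name": "vocab", "seq_len": SEQ, "vocab_size": VOCAB},
        "multi_head_config": {
            "num_heads": 4,
            "residual_dropout": 0.0,
            "attention": {"name": "linformer", "dropout": 0.0, "causal": False, "seq_len": SEQ},
        },
        "feedforward_config": {
            "name": "MLP",
            "dropout": 0.0,
            "activation": "relu",
            "hidden_layer_multiplier": 4,
        },
    }
])
model = xFormer.from_config(config)
tokens = (torch.rand(8, SEQ) * VOCAB).to(torch.int)  # dummy token ids
print(model(tokens).shape)  # expected: (8, 128, 384)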

FAQ

We've tried to collect a relatively exhaustive list of explanations in the HOWTO.

License

xFormers has a BSD-style license, as found in the LICENSE file.

Citing xFormers

If you use xFormers in your publication, please cite it by using the following BibTeX entry.

@Misc{xFormers2021,
  author =       {Benjamin Lefaudeux and Francisco Massa and Diana Liskovich and Min Xu and Jieru Hu and Marta Tintore and Susan Zhang},
  title =        {xFormers: A modular and hackable Transformer modelling library},
  howpublished = {\url{https://github.com/facebookresearch/xformers}},
  year =         {2021}
}
Comments
  • [feat] Dropout(Activation(x+bias)), now with partial BW fusion

    What does this PR do?

    This was a long time in the making: fusing the BW part of the activation/bias/dropout kernel. Not quite perfect, but in some places the speed goes really bananas (like 3x or 4x the naive calls). Fusing this implied flipping the whole problem upside down: basically, the seeds have to be per column, and the kernels (FW and BW) also work that way. This allows us to fuse the bias gradient computation, since it's a sum over that direction.

    TODO:

    • [x] add more unit tests to check that the dropout drops are respected on average
    • [x] possibly make sure that the rand mask does not repeat (may or may not be a big deal). Ok this is doable by making the kernels cooperate on the same col, like Phil does on LayerNorm
    • [x] improve on the scheduling for small buffers
    • [x] Fix the atomic add funkiness (works for now but this does not look completely right, num_warps dependent)

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 26
  • Support Windows and ideally build wheels for it

    🚀 Feature

    Supporting Windows in xformers.

    Motivation

    xformers provides excellent tools to increase the speed of inference, for example close to 2x in Stable Diffusion. Sadly, it lacks Windows support. This has barred us from using it on https://github.com/AUTOMATIC1111/stable-diffusion-webui as most users and developers (including myself) use Windows.

    Pitch

    Currently, xformers will fail to compile on Windows for a multitude of errors, some of which are trivial but most are not. Enabling Windows usage by fixing these errors and ideally distributing Windows wheels would allow projects to make xformers a necessary requirement & use it.

    Alternatives

    Additional context

    cc. @fmassa

    opened by C43H66N12O12S2 23
  • triton 2.0 changes

    What does this PR do?

    Fixes triton to work with version 2.0.0.

    TODOs:

    • [x] Move the syntax to triton2
    • [x] Fix fused dropout
    • [ ] Fix the blocksparse op API having changed
    • [x] Fix fused linear layer
    • [x] Update the benchmarks

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 23
  • Pip installation fails, `CUTLASS` not found

    🐛 Bug

    pip installation fails in a docker container: CUTLASS not found, git submodule update --init --recursive not executed

    To Reproduce

    Dockerfile

    FROM pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    RUN pip install xformers
    

    then

    docker build .
    

    Error Trace

    #1 [internal] load build definition from Dockerfile
    #1 sha256:bc3772a9760c6470030d3506e7afa0b9caa2a77f63376fe30fc296a334d5c980
    #1 transferring dockerfile: 116B done
    #1 DONE 0.0s
    
    #2 [internal] load .dockerignore
    #2 sha256:5b674e66e988c8852edbf605c0d0921ac6eed40841cd55d9112e0d92242091a1
    #2 transferring context: 2B done
    #2 DONE 0.0s
    
    #3 [internal] load metadata for docker.io/pytorch/pytorch:1.12.1-cuda11.3-cudnn8-runtime
    #3 sha256:409f78a4f3551ef4b6d7a4b064ff72bb54f0677d599351b4d0dcdff08b926834
    #3 DONE 0.8s
    
    #4 [1/2] FROM docker.io/pytorch/pytorch:[email protected]:0bc0971dc8ae319af610d493aced87df46255c9508a8b9e9bc365f11a56e7b75
    #4 sha256:2e3e89abd93f2e7b42b070196f0e6be4ce38a2d360c98232440e1d90189bdb02
    #4 CACHED
    
    #5 [2/2] RUN pip install xformers
    #5 sha256:ef3133015f56a22d509f2aa1ef730afdcaa2591838105ba332650ff73ceb9ff9
    #5 1.012 Collecting xformers
    #5 1.313   Downloading xformers-0.0.13.tar.gz (292 kB)
    #5 1.429      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 292.5/292.5 kB 2.6 MB/s eta 0:00:00
    #5 1.534   Preparing metadata (setup.py): started
    #5 2.952   Preparing metadata (setup.py): finished with status 'error'
    #5 2.961   error: subprocess-exited-with-error
    #5 2.961
    #5 2.961   × python setup.py egg_info did not run successfully.
    #5 2.961   │ exit code: 1
    #5 2.961   ╰─> [8 lines of output]
    #5 2.961       Traceback (most recent call last):
    #5 2.961         File "<string>", line 36, in <module>
    #5 2.961         File "<pip-setuptools-caller>", line 34, in <module>
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 239, in <module>
    #5 2.961           ext_modules=get_extensions(),
    #5 2.961         File "/tmp/pip-install-94ty405p/xformers_31debcecca1f46019eadae6eead5cc3f/setup.py", line 158, in get_extensions
    #5 2.961           "CUTLASS submodule not found. Did you forget "
    #5 2.961       RuntimeError: CUTLASS submodule not found. Did you forget to run `git submodule update --init --recursive` ?
    #5 2.961       [end of output]
    #5 2.961
    #5 2.961   note: This error originates from a subprocess, and is likely not a problem with pip.
    #5 2.965 error: metadata-generation-failed
    #5 2.965
    #5 2.965 × Encountered error while generating package metadata.
    #5 2.965 ╰─> See above for output.
    #5 2.965
    #5 2.965 note: This is an issue with the package mentioned above, not pip.
    #5 2.965 hint: See above for details.
    #5 ERROR: executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    ------
     > [2/2] RUN pip install xformers:
    ------
    executor failed running [/bin/sh -c pip install xformers]: exit code: 1
    

    Expected behavior

    installation should work.

    Environment

    in the container, running docker on windows

    PyTorch version: 1.12.1
    Is debug build: False
    CUDA used to build PyTorch: 11.3
    ROCM used to build PyTorch: N/A 
    
    OS: Ubuntu 18.04.6 LTS (x86_64) 
    GCC version: Could not collect  
    Clang version: Could not collect
    CMake version: Could not collect
    Libc version: glibc-2.17
    
    Python version: 3.7.13 (default, Mar 29 2022, 02:18:16)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-debian-buster-sid
    Is CUDA available: True
    CUDA runtime version: Could not collect
    GPU models and configuration: GPU 0: NVIDIA GeForce GTX 1060
    Nvidia driver version: 517.48
    cuDNN version: Could not collect
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] numpy==1.21.5
    [pip3] torch==1.12.1
    [pip3] torchtext==0.13.1
    [pip3] torchvision==0.13.1
    [conda] blas                      1.0                         mkl
    [conda] cudatoolkit               11.3.1               ha36c431_9    nvidia
    [conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
    [conda] mkl-service               2.4.0            py37h7f8727e_0
    [conda] mkl_fft                   1.3.1            py37hd3c417c_0
    [conda] mkl_random                1.2.2            py37h51133e4_0
    [conda] numpy                     1.21.5           py37he7a7128_2
    [conda] numpy-base                1.21.5           py37hf524024_2
    [conda] pytorch                   1.12.1          py3.7_cuda11.3_cudnn8.3.2_0    pytorch
    [conda] pytorch-mutex             1.0                        cuda    pytorch
    [conda] torchtext                 0.13.1                     py37    pytorch
    [conda] torchvision               0.13.1               py37_cu113    pytorch
    

    Additional context

    I don't think this problem has anything to do with OS/python/pytorch/cuda/nvcc versions; the setup.py seems to be tailored for a local / manual install, and fails in this context.

    opened by AbdBarho 20
  • [feat] add split_dim arg to reversible, remove retain_grad, add benchmark_reversible

    This PR removes the repeated chunk and cat operations in xformers' RevNet code. This way, the RevNet implementation will become a little bit faster.
    I'd strongly recommend calling a library like MemCNN or RevLib directly as they make it easier to switch the coupling function and generally give the user more freedom.

    Unfortunately, I can't sign the CLA at the moment, as it keeps saying

    Sorry, something went wrong. We're working on getting this fixed as soon as we can.

    CLA Signed 
    opened by ClashLuke 20
  • Added SmeLU

    What does this PR do?

    Fixes #262 .

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [x] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by kashif 17
  • [chore] release v0.0.13

    What does this PR do?

    bump the dev version number to be able to release v0.0.13, see #402

    Before submitting

    • [ ] Did you have fun?
      • Make sure you had fun coding 🙃
    • [ ] Did you read the contributor guideline?
    • [ ] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [ ] Did you make sure to update the docs?
      • [ ] N/A
    • [ ] Did you write any new necessary tests?
      • [ ] N/A
    • [ ] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • [feat] Compositional attention

    What does this PR do?

    Implements Compositional Attention (based on the reference implementation), as mentioned in https://github.com/facebookresearch/xformers/issues/41

    Paper

    TODOs

    • [x] Sane defaults
    • [x] Speed up wherever possible. Looks like it also takes a lot of memory at the moment, probably some dummy mistakes
    • [x] Maybe self-attention optimization (single proj) -> doable if moving the projections within the attention to the inproj class, worth it?
    • [x] Add a lot of explanations/documentations
    • [ ] Some IR results? -> that would probably be for another task

    cc @sarthmit if interested

    Before submitting

    • [x] Did you have fun?
      • Make sure you had fun coding 🙃
    • [x] Did you read the contributor guideline?
    • [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
      • [ ] N/A
    • [x] Did you make sure to update the docs?
      • [ ] N/A
    • [x] Did you write any new necessary tests?
      • [ ] N/A
    • [x] Did you update the changelog? (if needed)
      • [ ] N/A

    PR review

    Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

    CLA Signed 
    opened by blefaudeux 14
  • Does xformers still not support CUDA 12.0?

    ❓ Questions and Help

    I got the following error while installing...

    Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com/
    Obtaining file:///F:/Stable_Diffusion/stable-diffusion-webui-master/repositories/xformers
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [9 lines of output]
        No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.0'
        Traceback (most recent call last):
          File "", line 2, in
          File "", line 34, in
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 293, in
            symlink_package(
          File "F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\setup.py", line 83, in symlink_package
            os.symlink(src=path_from, dst=path_to)
        OSError: [WinError 1314] A required privilege is not held by the client: 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\third_party\flash-attention\flash_attn' -> 'F:\Stable_Diffusion\stable-diffusion-webui-master\repositories\xformers\xformers_flash_attn'
        [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed

    × Encountered error while generating package metadata.
    ╰─> See above for output.

    Is this because of CUDA 12? Should I downgrade the CUDA version?

    or what is the problem, can anyone help?

    opened by debdip 13
  • Cannot install xformers on linux server

    ❓ Questions and Help

    When I try either pip install or building from source, I get this issue:

     × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [18 lines of output]
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/home/username/xformers/setup.py", line 239, in <module>
              ext_modules=get_extensions(),
            File "/home/username/xformers/setup.py", line 187, in get_extensions
              cuda_version = get_cuda_version(CUDA_HOME)
            File "/home/username/xformers/setup.py", line 51, in get_cuda_version
              raw_output = subprocess.check_output([nvcc_bin, "-V"], universal_newlines=True)
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 424, in check_output
              return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 505, in run
              with Popen(*popenargs, **kwargs) as process:
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 951, in __init__
              self._execute_child(args, executable, preexec_fn, close_fds,
            File "/home/username/anaconda3/envs/test_env/lib/python3.9/subprocess.py", line 1821, in _execute_child
              raise child_exception_type(errno_num, err_msg, err_filename)
          FileNotFoundError: [Errno 2] No such file or directory: '/home/username/anaconda3/envs/test_env/bin/nvcc'
          [end of output]
    

    here's the output of nvcc --version

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Tue_Mar__8_18:18:20_PST_2022
    Cuda compilation tools, release 11.6, V11.6.124
    Build cuda_11.6.r11.6/compiler.31057947_0
    

    and as additional information, I was able to install PyTorch the usual way and verify that CUDA is available.

    opened by fedshyvana 13
  • Encoder decoder arch doesn't work when sequence lengths are different

    🐛 Bug

    I get an error when the sequence lengths to the encoder and decoder are different, e.g. in the code snippet below:

    Command

    EMB = 384
    SEQ_ENC = 128
    SEQ_DEC = 64
    BATCH = 16
    VOCAB = 64
    
    my_config = [
        # A list of the encoder or decoder blocks which constitute the Transformer.
        # Note that a sequence of different encoder blocks can be used, same for decoders
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
                "block_type": "encoder",
                "num_layers": 3,  # Optional, this means that this config will repeat N times
                "dim_model": EMB,
                "layer_norm_style": "pre",  # Optional, pre/post
                "position_encoding_config": {
                    "name": "vocab",  # whatever position encodinhg makes sense
                    "seq_len": SEQ_ENC,
                    "vocab_size": VOCAB,
                },
                "multi_head_config": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "linformer",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": False,
                        "seq_len": SEQ_ENC,
                    },
                },
                "feedforward_config": {
                    "name": "MLP",
                    "dropout": 0,
                    "activation": "relu",
                    "hidden_layer_multiplier": 4,
                },
            },
        {
            "reversible": False,  # Optionally make these layers reversible, to save memory
    
                "block_type": "decoder",
                "num_layers": 3,  # Optional, this means that this config will repeat N times
                "dim_model": EMB,
                "layer_norm_style": "pre",  # Optional, pre/post
                "position_encoding_config": {
                    "name": "vocab",  # whatever position encodinhg makes sense
                    "seq_len": SEQ_DEC,
                    "vocab_size": VOCAB,
                },
                "multi_head_config_masked": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "nystrom",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": True,
                        "seq_len": SEQ_DEC,
                    },
                },
                "multi_head_config_cross": {
                    "num_heads": 4,
                    "residual_dropout": 0,
                    "attention": {
                        "name": "favor",  # whatever attention mechanism
                        "dropout": 0,
                        "causal": True,
                        "seq_len": SEQ_DEC,
                    },
                },
                "feedforward_config": {
                    "name": "MLP",
                    "dropout": 0,
                    "activation": "relu",
                    "hidden_layer_multiplier": 4,
                },
            },
    ]
    
    # This part of xFormers is entirely type checked and needs a config object,
    # could be changed in the future
    config = xFormerConfig(my_config)
    model = xFormer.from_config(config)
    
    #  Test out with dummy inputs
    src = (torch.rand((BATCH, SEQ_ENC)) * VOCAB).abs().to(torch.int)
    tgt = (torch.rand((BATCH, SEQ_DEC)) * VOCAB).abs().to(torch.int)
    y = model(src=src, tgt=tgt)
    
    print(y.shape)
    

    Expected behavior

    torch.Size([16, 64, 384])
    

    however, I get:

    RuntimeError: einsum(): operands do not broadcast with remapped shapes [original->remapped]: [64, 128, 96, 96]->[64, 128, 96, 96] [64, 64, 96]->[64, 64, 1, 96]
    
    ongoing 
    opened by kashif 13
  • How to set random seeds fixed

    ❓ Questions and Help

    Different results occur when I run the same code twice, and the set_seed function below runs before everything else.

    import os
    import random

    import numpy as np
    import torch

    def set_seed(seed, cudnn_benchmark=False, cudnn_deterministic=True):
        random.seed(seed)  # Python random module.
        np.random.seed(seed)  # Numpy module.
        os.environ['PYTHONHASHSEED'] = str(seed)
        torch.random.manual_seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
        torch.backends.cudnn.benchmark = cudnn_benchmark
        torch.backends.cudnn.deterministic = cudnn_deterministic
    
    opened by scp92 1
  • Allowing decoder only definition

    🚀 Feature

    Allow only a decoder config to be defined.

    Motivation

    I want to define only a decoder and pass in a memory vector from another source.

    Pitch

    I tried this change locally and it allows me to do what I want it to do: https://github.com/facebookresearch/xformers/compare/main...nh2liu:xformers:patch-1

    Not sure if this has extending implications because it seems this code has been around for a while, but the comment # If decoder: either use the encoder output, or just decode, both options are possible indicates that this may be a bug.

    Alternatives

    • NOOP encoder will also allow this functionality.
    opened by nh2liu 2
  • build from source failed

    🐛 Bug

    Command

    pip install ninja
    pip install -v -U git+https://github.com/facebookresearch/[email protected]#egg=xformers

    ERROR INFO

    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:650:66:   required from ‘void _GLOBAL__N__7fac2228_12_attention_cu_724ba955_12677::launch_attention(at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, float, at::PhiloxCudaState) [with bool compute_logsumexp = true]’
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:793:92:   required from here
    /tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/xformers/csrc/attention/cuda/fmha/attention.cu:612:58: warning: ‘at::GenericPackedTensorAccessor<T, N, PtrTraits, index_t> at::Tensor::packed_accessor() const & [with T = float; long unsigned int N = 3; PtrTraits = at::DefaultPtrTraits; index_t = long int]’ is deprecated: packed_accessor is deprecated, use packed_accessor32 or packed_accessor64 instead [-Wdeprecated-declarations]
      612 |     return attn_bias.packed_accessor<scalar_t, 3>();
          |                                                          ^
    /usr/local/lib/python3.8/dist-packages/torch/include/ATen/core/TensorBody.h:247:1: note: declared here
      247 |   GenericPackedTensorAccessor<T,N,PtrTraits,index_t> packed_accessor() const & {
          | ^ ~~~~~~~~~~~~~
    ninja: build stopped: subcommand failed.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1901, in _run_ninja_build
        subprocess.run(
      File "/usr/lib/python3.8/subprocess.py", line 516, in run
        raise CalledProcessError(retcode, process.args,
    subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py", line 301, in <module>
        setuptools.setup(
      File "/usr/local/lib/python3.8/dist-packages/setuptools/__init__.py", line 153, in setup
        return distutils.core.setup(**attrs)
      File "/usr/lib/python3.8/distutils/core.py", line 148, in setup
        dist.run_commands()
      File "/usr/lib/python3.8/distutils/dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/install.py", line 68, in run
        return orig.install.run(self)
      File "/usr/lib/python3.8/distutils/command/install.py", line 589, in run
        self.run_command('build')
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/lib/python3.8/distutils/command/build.py", line 135, in run
        self.run_command(cmd_name)
      File "/usr/lib/python3.8/distutils/cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "/usr/lib/python3.8/distutils/dist.py", line 985, in run_command
        cmd_obj.run()
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 79, in run
        _build_ext.run(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 186, in run
        _build_ext.build_ext.run(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 340, in run
        self.build_extensions()
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
        build_ext.build_extensions(self)
      File "/usr/local/lib/python3.8/dist-packages/Cython/Distutils/old_build_ext.py", line 195, in build_extensions
        _build_ext.build_ext.build_extensions(self)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
        self._build_extensions_serial()
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
        self.build_extension(ext)
      File "/usr/local/lib/python3.8/dist-packages/setuptools/command/build_ext.py", line 202, in build_extension
        _build_ext.build_extension(self, ext)
      File "/usr/lib/python3.8/distutils/command/build_ext.py", line 528, in build_extension
        objects = self.compiler.compile(sources,
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 658, in unix_wrap_ninja_compile
        _write_ninja_file_and_compile_objects(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1573, in _write_ninja_file_and_compile_objects
        _run_ninja_build(
      File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
        raise RuntimeError(message) from e
    RuntimeError: Error compiling objects for extension
    Running setup.py install for xformers: finished with status 'error'
    

    ERROR: Command errored out with exit status 1: /usr/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"'; file='"'"'/tmp/pip-install-zruba6fo/xformers_c3f4b10bded7460eaa800569194ec7d2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(file) if os.path.exists(file) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' install --record /tmp/pip-record-lj6j_c0s/install-record.txt --single-version-externally-managed --user --prefix= --compile --install-headers /home/oppoer/.local/include/python3.8/xformers
    Check the logs for full command output.
    WARNING: You are using pip version 21.2.4; however, version 22.3.1 is available.
    You should consider upgrading via the '/usr/bin/python -m pip install --upgrade pip' command.

    Environment

    My docker image is: nvcr.io/nvidia/pytorch:22.06-py3

    Collecting environment information...
    PyTorch version: 1.13.0a0+936e930
    Is debug build: False
    CUDA used to build PyTorch: 11.8
    ROCM used to build PyTorch: N/A

    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: Could not collect
    CMake version: version 3.24.1
    Libc version: glibc-2.31

    Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
    Python platform: Linux-3.10.0-957.27.2.el7.x86_64-x86_64-with-glibc2.29
    Is CUDA available: True
    CUDA runtime version: 11.8.89
    GPU models and configuration: GPU 0: NVIDIA A100-SXM4-80GB
    Nvidia driver version: 470.129.06
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.7.0
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.7.0
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True

    Versions of relevant libraries:
    [pip3] functorch==1.13.0a0+936e930
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.22.2
    [pip3] pytorch-quantization==2.1.2
    [pip3] torch==1.13.0a0+936e930
    [pip3] torch-tensorrt==1.3.0a0
    [pip3] torchtext==0.13.0a0+fae8e8c
    [pip3] torchvision==0.15.0a0
    [conda] Could not collect

    opened by GxjGit 7
  • Unable to Build from latest

    🐛 Bug

    Command

    cd xformers
    git pull
    git submodule update --recursive --remote
    pip install -e .
    

    To Reproduce

    Steps to reproduce the behavior:

    1. pull latest from git (at hash f82722f61f972c02ebc54431e3e4717f21b3e9b9)
    2. pull latest submodules
    3. build

    Expected behavior

    Building to run successfully

    Environment

    Collecting environment information...
    PyTorch version: 1.12.1+cu116
    Is debug build: False
    CUDA used to build PyTorch: 11.6
    ROCM used to build PyTorch: N/A
    
    OS: Ubuntu 20.04.5 LTS (x86_64)
    GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
    Clang version: 10.0.0-4ubuntu1
    CMake version: version 3.25.0
    Libc version: glibc-2.31
    
    Python version: 3.8.13 (default, Mar 28 2022, 11:38:47)  [GCC 7.5.0] (64-bit runtime)
    Python platform: Linux-5.10.102.1-microsoft-standard-WSL2-x86_64-with-glibc2.17
    Is CUDA available: True
    CUDA runtime version: 11.6.124
    GPU models and configuration: GPU 0: NVIDIA GeForce RTX 2060 SUPER
    Nvidia driver version: 526.47
    cuDNN version: Probably one of the following:
    /usr/lib/x86_64-linux-gnu/libcudnn.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.4.1
    /usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.4.1
    HIP runtime version: N/A
    MIOpen runtime version: N/A
    Is XNNPACK available: True
    
    Versions of relevant libraries:
    [pip3] mypy-extensions==0.4.3
    [pip3] numpy==1.23.2
    [pip3] pytorch-lightning==1.7.5
    [pip3] torch==1.12.1+cu116
    [pip3] torchaudio==0.12.1+cu116
    [pip3] torchdynamo==1.12.0
    [pip3] torchmetrics==0.9.3
    [pip3] torchvision==0.13.1+cu116
    [conda] numpy                     1.23.2                   pypi_0    pypi
    [conda] pytorch-lightning         1.7.5                    pypi_0    pypi
    [conda] torch                     1.12.1+cu116             pypi_0    pypi
    [conda] torchaudio                0.12.1+cu116             pypi_0    pypi
    [conda] torchdynamo               1.12.0                   pypi_0    pypi
    [conda] torchmetrics              0.9.3                    pypi_0    pypi
    [conda] torchvision               0.13.1+cu116             pypi_0    pypi
    
    • PyTorch Version (e.g., 1.0): 1.12.1+cu116
    • OS (e.g., Linux): WSL
    • How you installed PyTorch (conda, pip, source): pip install -e .
    • Build command you used (if compiling from source): pip install -e .
    • Python version: 3.8.13
    • CUDA/cuDNN version: 11.6
    • GPU models and configuration: NVIDIA GeForce RTX 2060 SUPER
    • Any other relevant information: It worked on a previous commit

    Additional context

    Error message from compiler:

        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: namespace "cutlass::gemm::warp" has no member "WarpSize"
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: type name is not allowed
    
        /home/jonno/xformers/third_party/cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h(350): error: the global scope has no "value"
    
        3 errors detected in the compilation of "/home/jonno/xformers/xformers/csrc/attention/cuda/fmha/attention_forward_generic.cu".
        /home/jonno/anaconda3/envs/dyn/lib/python3.8/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.6
          warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
        error: command '/usr/local/cuda/bin/nvcc' failed with exit code 255
        [end of output]
    
    note: This error originates from a subprocess, and is likely not a problem with pip.
    
    opened by JonnoFTW 3
Releases(v0.0.13)
  • v0.0.13(Sep 26, 2022)

  • v0.0.12(Aug 8, 2022)

    [0.0.12] - 2022-08-08

    Fixed

    • Removed duplicated biases in the FusedMLP layers [#317]
    • Rotary embeddings respecting input types [#326]
    • Poolformer style instantiating useless projection layers [#349]
    • Fix layer position not being properly tracked, causing extra layernorms for programmatic xformers [#348]
    • Pass use_triton flag to LayerNorm module [#336]

    Added

    • Four blocksparsity layouts from DeepSpeed [#320]
    • Support several initialization options [#312]
    • Conv2DFeedforward feedforward part [#321]
    • VisualAttention [#329]
    • Automatic blocksparse for causal attention [#334]
    • Better hierarchical transformer generation [#345]
    • Fused operations with AOTAutograd/NVFuser, integration into MLP [#357]
    • Refactor LRA code to use Pytorch Lightning [#343]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.11(May 30, 2022)

    [0.0.11] - 2022-05-30

    Fixed

    • Fix some torchscriptability [#246]
    • Fix FourierMix being compatible with AMP [#258]
    • Better asserts on QKV dimensions [#264]
    • Better perfs for FusedMLP and FusedLinearLayer [#283]
    • Deepnorm init missing self-attention [#284]

    Added

    • Simplicial Embeddings [#259]
    • Mem efficient attention, FW pass [#267]
    • MHA benchmark
    • MLP benchmark
    • Move all triton kernels to triton v2 [#272]
    • Mem efficient attention, BW pass [#281]
    • Metaformer support [#294]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.10(Mar 15, 2022)

    Fixed

    • Expose bias flag for feedforwards, same default as Timm [#220]
    • Update eps value for layernorm, same default as torch [#221]
    • PreNorm bugfix, only one input was normalized [#233]

    Added

    • Add DeepNet (DeepNorm) residual path and init [#227]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.9(Feb 9, 2022)

    Added

    • Compositional Attention [#41]
    • Experimental Ragged attention [#189]
    • Mixture of Experts [#181]
    • BlockSparseTensor [#202]
    • nd-tensor support for triton softmax [#210]

    Fixed

    • bugfix Favor, single feature map [#183]
    • sanity check blocksparse settings [#207]
    • fixed some pickability [#204]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.8(Jan 7, 2022)

  • v0.0.7(Nov 30, 2021)

  • v0.0.6(Nov 24, 2021)

    Fixed

    • Fix self attention optimization not being triggered, broken residual path [#119]
    • Improve speed by not using contiguous Tensors when not needed [#119]

    Added

    • Attention mask wrapper [#113]
    • ViT comparison benchmark [#117]
    Source code(tar.gz)
    Source code(zip)
  • v0.0.5(Nov 18, 2021)

  • v0.0.4(Nov 17, 2021)

    • Fixing causality not being respected by the scaled dot product attention
    • Fixing Favor causal trainability
    • Enabling FusedLayerNorm by default if Triton is available
    • Fixing Favor with fp16
    Source code(tar.gz)
    Source code(zip)
  • v0.0.3(Nov 5, 2021)

  • v0.0.2(Nov 1, 2021)

    [0.0.2] - 2021-11-01

    Fixed

    • More robust blocksparse [#24]

    Added

    • Rotary embeddings [#32]
    • More flexible layernorm [#50]
    • More flexible blockfactory config (key deduplication)
    Source code(tar.gz)
    Source code(zip)
Owner
Facebook Research
chaii - hindi & tamil question answering

chaii - hindi & tamil question answering This is the solution for rank 5th in Kaggle competition: chaii - Hindi and Tamil Question Answering. The comp

abhishek thakur 33 Dec 18, 2022
Finetune gpt-2 in google colab

gpt-2-colab finetune gpt-2 in google colab sample result (117M) from retraining on A Tale of Two Cities by Charles Di

212 Jan 02, 2023
Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

Code for EmBERT, a transformer model for embodied, language-guided visual task completion.

41 Jan 03, 2023
YACLC - Yet Another Chinese Learner Corpus

汉语学习者文本多维标注数据集YACLC V1.0 中文 | English 汉语学习者文本多维标注数据集(Yet Another Chinese Learner

BLCU-ICALL 47 Dec 15, 2022
Sequence modeling benchmarks and temporal convolutional networks

Sequence Modeling Benchmarks and Temporal Convolutional Networks (TCN) This repository contains the experiments done in the work An Empirical Evaluati

CMU Locus Lab 3.5k Jan 03, 2023
Tool to add main subject to items on Wikidata using a WMFs CirrusSearch for named entity recognition or a manually supplied list of QIDs

ItemSubjector Tool made to add main subject statements to items based on the title using a home-brewed CirrusSearch-based Named Entity Recognition alg

Dennis Priskorn 9 Nov 17, 2022
The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data

Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"

Zhiyu Chen 114 Dec 29, 2022
Pervasive Attention: 2D Convolutional Networks for Sequence-to-Sequence Prediction

This is a fork of Fairseq(-py) with implementations of the following models: Pervasive Attention - 2D Convolutional Neural Networks for Sequence-to-Se

Maha 490 Dec 15, 2022
The entmax mapping and its loss, a family of sparse softmax alternatives.

entmax This package provides a pytorch implementation of entmax and entmax losses: a sparse family of probability mappings and corresponding loss func

DeepSPIN 330 Dec 22, 2022
Flexible interface for high-performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra.

Flexible interface for high performance research using SOTA Transformers leveraging Pytorch Lightning, Transformers, and Hydra. What is Lightning Tran

Pytorch Lightning 581 Dec 21, 2022
Implementation of Fast Transformer in Pytorch

Fast Transformer - Pytorch Implementation of Fast Transformer in Pytorch. This only work as an encoder. Yannic video AI Epiphany Install $ pip install

Phil Wang 167 Dec 27, 2022
Weakly-supervised Text Classification Based on Keyword Graph

Weakly-supervised Text Classification Based on Keyword Graph How to run? Download data Our dataset follows previous works. For long texts, we follow C

Hello_World 20 Dec 29, 2022
Dope Wars game engine on StarkNet L2 roll-up

RYO Dope Wars game engine on StarkNet L2 roll-up. What TI-83 drug wars built as smart contract system. Background mechanism design notion here. Initia

104 Dec 04, 2022
This code extends the neural style transfer image processing technique to video by generating smooth transitions between several reference style images

Neural Style Transfer Transition Video Processing By Brycen Westgarth and Tristan Jogminas Description This code extends the neural style transfer ima

Brycen Westgarth 110 Jan 07, 2023
[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021

Compact Transformers Preprint Link: Escaping the Big Data Paradigm with Compact Transformers By Ali Hassani[1]*, Steven Walton[1]*, Nikhil Shah[1], Ab

SHI Lab 367 Dec 31, 2022
Quantifiers and Negations in RE Documents

Quantifiers-and-Negations-in-RE-Documents This project was part of my work for a

Nicolas Ruscher 1 Feb 01, 2022
Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speech Enhancement

MTFAA-Net Unofficial PyTorch implementation of Baidu's MTFAA-Net: "Multi-Scale Temporal Frequency Convolutional Network With Axial Attention for Speec

Shimin Zhang 87 Dec 19, 2022
A music comments dataset, containing 39,051 comments for 27,384 songs.

Music Comments Dataset A music comments dataset, containing 39,051 comments for 27,384 songs. For academic research use only. Introduction This datase

Zhang Yixiao 2 Jan 10, 2022
nlpcommon is a python Open Source Toolkit for text classification.

nlpcommon nlpcommon, Python Text Tool. Guide Feature Install Usage Dataset Contact Cite Reference Feature nlpcommon is a python Open Source

xuming 3 May 29, 2022
Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

J.A.R.V.I.S Kindly consider starring this repository if you like the program :-) What/Who is J.A.R.V.I.S? J.A.R.V.I.S is an chatbot written that is bu

Epicalable 50 Dec 31, 2022