Huggingface Transformers + Adapters = ❤️

Overview

adapter-transformers

A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models

adapter-transformers is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules.

💡 Important: This library can be used as a drop-in replacement for HuggingFace Transformers and regularly synchronizes new upstream changes. Thus, most files in this repository are direct copies from the HuggingFace Transformers source, modified only with changes required for the adapter implementations.

Installation

adapter-transformers currently supports Python 3.6+ and PyTorch 1.3.1+. After installing PyTorch, you can install adapter-transformers from PyPI ...

pip install -U adapter-transformers

... or from source by cloning the repository:

git clone https://github.com/adapter-hub/adapter-transformers.git
cd adapter-transformers
pip install .
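As a quick sanity check before installing, the stated minimums (Python 3.6+, PyTorch 1.3.1+) can be compared against what is present. The following is a minimal sketch; `meets_minimum` is a hypothetical helper, not part of adapter-transformers, and it assumes plain dotted numeric versions, optionally followed by a local suffix such as `+cu117`.

```python
import sys


def meets_minimum(version, minimum):
    """Return True if a dotted version string is at least `minimum`.

    Hypothetical helper for illustration: it handles only numeric
    components and ignores a local suffix such as "+cu117".
    """
    def parse(text):
        return tuple(int(part) for part in text.split("+")[0].split("."))

    return parse(version) >= parse(minimum)


# Python requirement from the installation notes (3.6+).
python_ok = sys.version_info[:2] >= (3, 6)

# PyTorch requirement (1.3.1+); with PyTorch installed, pass torch.__version__
# instead of the example string used here.
torch_ok = meets_minimum("1.13.1+cu117", "1.3.1")
```

Pre-release version strings (for example ones containing `dev` or `rc`) are outside what this sketch parses.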

Getting Started

HuggingFace's great documentation on getting started with Transformers can be found here. adapter-transformers is fully compatible with Transformers.

To get started with adapters, refer to these locations:

  • Colab notebook tutorials, a series of notebooks providing an introduction to all the main concepts of (adapter-)transformers and AdapterHub
  • https://docs.adapterhub.ml, our documentation on training and using adapters with adapter-transformers
  • https://adapterhub.ml to explore available pre-trained adapter modules and share your own adapters
  • Examples folder of this repository containing HuggingFace's example training scripts, many adapted for training adapters

Citation

If you use this library for your work, please consider citing our paper AdapterHub: A Framework for Adapting Transformers:

@inproceedings{pfeiffer2020AdapterHub,
    title={AdapterHub: A Framework for Adapting Transformers},
    author={Pfeiffer, Jonas and
            R{\"u}ckl{\'e}, Andreas and
            Poth, Clifton and
            Kamath, Aishwarya and
            Vuli{\'c}, Ivan and
            Ruder, Sebastian and
            Cho, Kyunghyun and
            Gurevych, Iryna},
    booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
    pages={46--54},
    year={2020}
}
Comments
  • "Parallel" option for training? Parallel adapter outputs required (without interacting with each other).

    Hello,

    Thanks for this nice framework 👍 . I might be asking something that isn't yet possible but wanted to at least try asking!

    I am trying to feed the outputs of two BERT-based models to a subsequent NN. This requires loading two BERT models, but memory consumption becomes too high when I do so. To remedy this, I was wondering whether I could use something like "Parallel" at training time. (FYI, I am not trying to dynamically drop the first few layers; I am simply trying to create two BERT forward paths with lower memory consumption.)

    I understand that active adapters can be switched with set_active_adapters(). (Actually, could you confirm whether my understanding is correct?) But this doesn't seem to fit my purpose because, in my case, I need both adapters to output independent representations.

    Is there any way to keep the adapters from interacting with each other on the forward path without loading the original BERT parameters twice?

    • To make this question even more complex, I also need one adapter's parameters to be non-differentiable while still using them in the forward pass. Any ideas, perhaps? :)
    question 
    opened by leejayyoon 18
  • ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

    Hi I am trying with this example colab: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Training.ipynb#scrollTo=Lbwb3NRf8mBF

    getting this error:

    Traceback (most recent call last):
      File "test.py", line 11, in <module>
        from transformers import AutoTokenizer, EvalPrediction, GlueDataset, GlueDataTrainingArguments, AutoModelWithHeads, AdapterType
    ImportError: cannot import name 'AutoModelWithHeads' from 'transformers' (/idiap/user/rkarimi/libs/anaconda3/envs/adapter/lib/python3.7/site-packages/transformers/__init__.py)
    

    versions

    (adapter) [email protected]:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep transformers
    adapter-transformers      1.0.1                     <pip>
    transformers              3.5.1                     <pip>
    (adapter) [email protected]:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep pytorch
    pytorch-lightning         1.0.4                     <pip>
    adapter hub from github is installed
    
    bug 
    opened by rabeehkarimimahabadi 17
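The `conda list` output in the report above shows both `adapter-transformers` 1.0.1 and `transformers` 3.5.1 installed. One possible cause, assuming both distributions install a top-level `transformers` package (the README describes adapter-transformers as a drop-in replacement), is that the plain `transformers` install shadows the fork, so adapter-specific names such as `AutoModelWithHeads` are missing. The sketch below only detects that situation; `installed_versions` and `has_conflict` are hypothetical helpers.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_conflict(versions):
    """True when both distributions assumed to provide `transformers` are present."""
    return all(versions.get(name) for name in ("adapter-transformers", "transformers"))


# Versions copied from the `conda list` output in the report.
reported = {"adapter-transformers": "1.0.1", "transformers": "3.5.1"}
conflict = has_conflict(reported)

# The same check against whatever environment runs this script.
current = installed_versions(["adapter-transformers", "transformers"])
```

If both are present, uninstalling `transformers` and reinstalling `adapter-transformers` is a commonly suggested remedy, though whether it resolves this particular report is not confirmed here.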
  • training the language adapters in the MAD-X paper

    Hi, I need to train language adapters as done in the MAD-X paper. I have downloaded the Wikipedia data, but it is very large, and so far I have not managed to train on it. Could you share the script you used to train the language adapters? Thank you very much in advance.

    question 
    opened by dorost1234 13
  • Add t5 adapter

    Followed the pattern of Bart to add adapters to T5. One difference is that Bart has separate classes for the encoder and decoder, whereas T5 does not. So I am using is_decoder to distinguish encoder from decoder behavior, such as adding cross_attention adapters and adding invertible adapters.

    I'm working on some testing.

    opened by AmirAktify 12
  • Training an Adapter using own classification head and pytorch training loop

    Details

    Hello! I want to add adapters to my pre-trained BERT text-classification model, but I did not find a good explanation in the documentation of how to do that. My model class is the following:

    class BertClassifier(nn.Module):
        """Bert Model for Classification Tasks."""
        def __init__(self, freeze_bert=True):
            """
             @param    bert: a BertModel object
             @param    classifier: a torch.nn.Module classifier
             @param    freeze_bert (bool): Set `False` to fine-tune the BERT model
            """
            super(BertClassifier, self).__init__()
    
            # Instantiate BERT model
            # Specify hidden size of BERT, hidden size of our classifier, and number of labels
            self.bert = BertAdapterModel.from_pretrained(PREETRAINED_MODEL)
            self.D_in = 1024 
            self.H = 512
            self.D_out = 2
            
    
            # Add a new adapter
            self.bert.add_adapter("thermo_cl",set_active=True)
            self.bert.train_adapter(["thermo_cl"])
    
     
            # Instantiate the classifier head with some one-layer feed-forward classifier
            self.classifier = nn.Sequential(
                nn.Linear(self.D_in, 512),
                nn.Tanh(),
                nn.Linear(512, self.D_out),
                nn.Tanh()
            )
     
             # Freeze the BERT model
            if freeze_bert:
                for param in self.bert.parameters():
                    param.requires_grad = True
    
    
        def forward(self, input_ids, attention_mask):
            ''' Feed input to BERT and the classifier to compute logits.
             @param    input_ids (torch.Tensor): an input tensor with shape (batch_size,
                           max_length)
             @param    attention_mask (torch.Tensor): a tensor that hold attention mask
                           information with shape (batch_size, max_length)
             @return   logits (torch.Tensor): an output tensor with shape (batch_size,
                           num_labels) '''
             # Feed input to BERT
            outputs = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask)
             
             # Extract the last hidden state of the token `[CLS]` for classification task
            last_hidden_state_cls = outputs[0][:, 0, :]
     
             # Feed input to classifier to compute logits
            logits = self.classifier(last_hidden_state_cls)
     
            return logits
    

    The training loop is the following:

    def initialize_model(epochs):
        """ Initialize the Bert Classifier, the optimizer and the learning rate scheduler."""
        # Instantiate Bert Classifier
        bert_classifier = BertClassifier(freeze_bert=False) #false=freezed
    
        # Tell PyTorch to run the model on GPU
        bert_classifier = bert_classifier.to(device)
    
        # Create the optimizer
        optimizer = AdamW(bert_classifier.parameters(),
                          lr=lr,    # Default learning rate
                          eps=1e-8    # Default epsilon value
                          )
    
        # Total number of training steps
        total_steps = len(train_dataloader) * epochs
    
        # Set up the learning rate scheduler
        scheduler = get_linear_schedule_with_warmup(optimizer,
                                                    num_warmup_steps=0, # Default value
                                                    num_training_steps=total_steps)
    
        return bert_classifier, optimizer, scheduler
    
    def train(model, train_dataloader, val_dataloader, valid_loss_min_input, checkpoint_path, best_model_path, start_epochs, epochs, evaluation=True):
    
        """Train the BertClassifier model."""
        # Start training loop
        logging.info("--Start training...\n")
    
        # Initialize tracker for minimum validation loss
        valid_loss_min = valid_loss_min_input 
    
    
        for epoch_i in range(start_epochs, epochs):
    
                              ..............................
    
            if evaluation:
                # After the completion of each training epoch, measure the model's performance
                # on our validation set.
                val_loss, val_accuracy = evaluate(model, val_dataloader)
    
                # Print performance over the entire training data
                time_elapsed = time.time() - t0_epoch
                
                logging.info(f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^10.6f} | {time_elapsed:^9.2f}")
    
                logging.info("-"*70)
            logging.info("\n")
    
             # create checkpoint variable and add important data
            checkpoint = {
                'epoch': epoch_i + 1,
                'valid_loss_min': val_loss,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            }
            
            # save checkpoint
            save_ckp(checkpoint, False, checkpoint_path, best_model_path)
            
            ## TODO: save the model if validation loss has decreased
            if val_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(valid_loss_min,val_loss))
                # save checkpoint as best model
                save_ckp(checkpoint, True, checkpoint_path, best_model_path)
                valid_loss_min = val_loss
    
    
        model.save_adapter("./final_adapter", "thermo_cl")
        logging.info("-----------------Training complete--------------------------")
    
    
    bert_classifier, optimizer, scheduler = initialize_model(epochs=n_epochs)
    train(model = bert_classifier....)
    

    As you can see, I have my own custom classification head, so I don't want to use the .add_classification_head() method. Is it correct to train and activate the adapter in this way? I would like to know whether I'm using the adapter properly, and also how to save the checkpoint and my model weights, because at the end of training (where I am supposed to save the adapter) I receive this error:

    AttributeError: 'BertClassifier' object has no attribute 'save_adapter'
    

    Thanks for the help!

    question Stale 
    opened by Ch-rode 11
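The AttributeError in the report above is consistent with `save_adapter` living on the wrapped model (`self.bert`) rather than on the custom wrapper class. The sketch below illustrates only that attribute-lookup point. `StandInAdapterModel` is a hypothetical stand-in that records calls so the example runs without adapter-transformers, and the `save_adapter(save_directory, adapter_name)` argument order is taken from the call in the report.

```python
class StandInAdapterModel:
    """Hypothetical stand-in for the adapter-enabled BERT model; it only records calls."""

    def __init__(self):
        self.saved = []

    def save_adapter(self, save_directory, adapter_name):
        self.saved.append((save_directory, adapter_name))


class Classifier:
    """Wrapper in the style of BertClassifier: it owns the model as `self.bert`."""

    def __init__(self):
        self.bert = StandInAdapterModel()


model = Classifier()

# The wrapper does not define save_adapter itself, so calling
# model.save_adapter(...) would raise AttributeError.
wrapper_has_method = hasattr(model, "save_adapter")

# Calling through the wrapped model reaches the method.
model.bert.save_adapter("./final_adapter", "thermo_cl")
```

With the real classes, the equivalent call would be `model.bert.save_adapter("./final_adapter", "thermo_cl")`. The custom head's weights are not part of the adapter and would still need to be saved separately, for example through the existing `state_dict` checkpoint.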
  • Merge with original transformers library

    🚀 Feature request

    Merge this into the original transformers library.

    Motivation

    This library is awesome so thanks a lot but it would be much more convenient to have this merged into the original transformers library. The Huggingface team seems to be focused on adding lightweight options for their models and adapters are huge time-and-memory-savers for multitask use cases and would be a great addition to the transformers library.

    Your contribution

    You've done the integration here already so it should be straightforward but happy to help. I've posted an issue on huggingface's end as well.

    discussion Stale 
    opened by salimmj 11
  • Unintuitive slowdown in data loading and model updating on using adapters

    Environment info

    • transformers version: 1.0.1
    • Platform: Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-glibc2.10
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.7.0 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Yes

    Who can help: @LysandreJik @patrickvonplaten

    Model I am using: Bert

    Language I am using the model on: English

    Adapter setup I am using (if any): HoulsbyConfig

    The problem arises when using: My own modified scripts. I want to use adapters for a project of mine, which will require fine-tuning BERT multiple times. To understand how much speedup I would get from using adapters, I profiled the various steps in the training loop of BERT, both with and without adapters.

    The task I am working on is: Stanford Natural Language Inference (SNLI)

    To reproduce

    Steps to reproduce the behavior: The following function is executed for a period of 4 hours on identical GPUs (via an LSF batch system), once with UseAdapter set to True and once with it set to False. The path contains a preloaded and tokenized version of the SNLI training set (as well as the test and dev sets, dropped here via underscores).

    def load_and_train(path, UseAdapter):
        x_train,y_train,a_train,t_train,_,_,_,_,_,_,_,_=load(open(path,"rb"))
        train_inst=torch.tensor(x_train)
        train_att=torch.tensor(a_train)
        train_types=torch.tensor(t_train)
        train_targ=torch.tensor(y_train)
        train_data = TensorDataset(train_inst, train_att, train_types,train_targ)
        train_sampler = RandomSampler(train_data)
        train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=32)
        model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
        if UseAdapter:
            model.add_adapter("SNLI",AdapterType.text_task,HoulsbyConfig().__dict__)
            model.train_adapter(["SNLI"])
            model.set_active_adapters(["SNLI"])
        model.cuda()
        optimizer=AdamW(model.parameters(),lr=1e-4)
        scheduler=get_linear_schedule_with_warmup(optimizer,0,len(train_dataloader)*EPOCHS)
        iter=0
        time_load=0
        time_cler=0
        time_forw=0
        time_back=0
        time_updt=0
        for e in range(15):
            model.train()
            for batch in train_dataloader:
                last=time()
                x=batch[0].cuda()
                a=batch[1].cuda()
                t=batch[2].cuda()
                y=batch[3].cuda()
                time_load+=time()-last
                last=time()
                model.zero_grad()
                time_cler+=time()-last
                last=time()
                outputs = model(x, token_type_ids=t, attention_mask=a, labels=y)
                time_forw+=time()-last
                last=time()
                loss=outputs[0]
                loss.backward()
                time_back+=time()-last
                last=time()
                optimizer.step()
                scheduler.step()
                time_updt+=time()-last
                iter+=1
                print(time_load,time_cler,time_forw,time_back,time_updt)
    

    Expected behavior

    1. With Adapters the trainer is able to run through more batches than without by the time the job gets timed out
    2. Per Batch time_load is identical for both cases
    3. Per Batch time_cler is slightly lower with adapters due to the presence of fewer gradients
    4. Per Batch time_forw is slightly higher with adapters due to extra layers that are introduced
    5. Per Batch time_back is significantly lower with adapters since it needs to save fewer gradients
    6. Per Batch time_updt is lower with adapters due to having fewer parameters to update

    Observed Behaviour

    Overall times(seconds):

    Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update | Total | No of Batches
    -- | -- | -- | -- | -- | -- | -- | --
    No | 9.141064644 | 349.405822 | 873.8870151 | 11770.82554 | 1159.772 | 14163.03 | 69022
    Yes | 2721.683394 | 394.4980106 | 1652.686945 | 3192.402303 | 6304.335 | 14265.61 | 95981

    Per Batch Times(seconds):

    Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update
    -- | -- | -- | -- | -- | --
    No | 0.000132437 | 0.005062238 | 0.012660992 | 0.1705373 | 0.016803
    Yes | 0.028356481 | 0.004110168 | 0.017218897 | 0.033260774 | 0.065683

    As is evident from the tables, expectations 2 and 6 are not satisfied in this output. Note that similar observations were made in 2 reruns of the experiment. It is unclear to me if there is an explanation I am missing or if this is an implementation issue.
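The per-batch figures follow from dividing each total by the number of batches. The short sketch below reproduces that arithmetic from the reported totals and adds a with/without ratio per step; the dictionary keys are shorthand chosen for this example.

```python
# Totals in seconds, copied from the "Overall times" table.
totals = {
    "No": {"load": 9.141064644, "clear": 349.405822, "forward": 873.8870151,
           "backward": 11770.82554, "update": 1159.772, "batches": 69022},
    "Yes": {"load": 2721.683394, "clear": 394.4980106, "forward": 1652.686945,
            "backward": 3192.402303, "update": 6304.335, "batches": 95981},
}

# Per-batch time for each step: total seconds divided by batches processed.
per_batch = {
    adapter: {step: seconds / row["batches"]
              for step, seconds in row.items() if step != "batches"}
    for adapter, row in totals.items()
}

# Ratio of per-batch time with adapters to without (>1 means slower with adapters).
ratio = {step: per_batch["Yes"][step] / per_batch["No"][step]
         for step in per_batch["No"]}
```

On these numbers the load step comes out roughly two hundred times slower per batch with adapters and the update step roughly four times slower, while the backward step is roughly five times faster.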

    bug 
    opened by cs1160701 9
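One factor that may contribute to the surprising split between steps in the report above: CUDA kernels are launched asynchronously, so wall-clock timers placed around individual GPU steps can attribute one step's work to whichever later step first waits on the device. The sketch below is a hedged illustration, not a diagnosis of this report. `timed` is a hypothetical helper, and on a GPU run `torch.cuda.synchronize` would be passed as `sync`.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(totals, key, sync=None):
    """Accumulate wall-clock time for a step under totals[key].

    `sync` is an optional callable invoked before starting and before
    stopping the timer, so pending device work is charged to the step
    that issued it.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()
        totals[key] = totals.get(key, 0.0) + (time.perf_counter() - start)


# CPU-only demonstration: the lambda stands in for torch.cuda.synchronize.
totals = {}
sync_calls = []
with timed(totals, "forward", sync=lambda: sync_calls.append(1)):
    sum(range(1000))  # placeholder for the forward pass
```

Wrapping each of the five steps in the training loop this way would show whether the load and update times still differ once every step waits for its own GPU work.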
  • Loading custom adapters and 'output_attentions' for AdapterFusion

    Question

    Information

    Model I am using (Bert, XLNet ...): XLM-RoBERTa-base

    Language I am using the model on (English, Chinese ...): Korean

    Adapter setup I am using (if any):

    The problem arises when using:

    • [X] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)
    • Datasets: KorNLI and KorSTS (Machine translated Korean MNLI & STS-B dataset)
    • Its format and size are the same as the original datasets (MNLI & STS-B)

    Background

    What I'm doing is that:

    1. train Task-Adapters for KorNLI and KorSTS on the XLM-RoBERTa-base model (to train on Korean datasets) using the official code, 'run_glue_alt.py'
    2. fuse both adapters with a fusion layer using 'run_fusion_glue.py'

    Questions

    Sorry that I'm not familiar with the adapter-transformers codebase. Here are some questions about the AdapterFusion framework.

    1. Is it possible to load my own pre-trained adapters using the 'model.load_adapter' function in the current framework? (I'm using the latest version of adapter-transformers.)
    2. The performance on the target task (KorSTS) composed with KorSTS and KorNLI single task adapters is markedly lower than the single task adapter trained on the KorSTS dataset. Even with various hyperparameter (batch size, epoch, learning rate, fusion config, ...) search, the performance doesn't seem to be improved. Is there any way to check whether the fusion layer is trained properly?
    3. Connected with the questions above, is it possible to investigate the attention distribution of the trained fusion layer? I've checked there is an option 'output_attentions' defined in the BertModel class, but I could not find a way to output attention weights of the fusion layers, not the self-attention layers of the original pre-trained model.

    Environment info

    • transformers version:
    • Platform:
    • Python version: 3.6.3
    • PyTorch version (GPU?): 1.4
    • Tensorflow version (GPU?):
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No, I'm using a single GPU
    bug question 
    opened by bigkunzi 9
  • TypeError: unhashable type: 'Stack' error raised when using Parallel adapter heads

    Environment info

    • adapter-transformers version:
    • Platform: Linux
    • Python version: 3.6.8
    • PyTorch version (GPU?): GPU / 1.7
    • Tensorflow version (GPU?): NA
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Using nn.DataParallel

    Information

    Model I am using (Bert, XLNet ...): BERT pretrained model with 3 custom adapters + heads are used.

    Language I am using the model on (English, Chinese ...): EN

    Adapter setup I am using (if any): 3 Adapters (with default configuration) and 3 Classification Head.

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is: Multi-task finetuning using AdapterHub

    Error below :

     (from logs) active head : [<bound method AdapterCompositionBlock.last of Stack[combined, resource_type, action]>]
    
    Traceback (most recent call last):
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 1092, in forward
        head_inputs, head_name=head, attention_mask=attention_mask, return_dict=return_dict, **kwargs
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/adapters/heads.py", line 509, in forward_head
        if head not in self.heads:
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 304, in __contains__
        return key in self._modules
    TypeError: unhashable type: 'Stack'
    

    Modified code below

    
    from transformers import AutoModelWithHeads
    import transformers.adapters.composition as ac

    model = AutoModelWithHeads.from_pretrained('bert-base-uncased')
    
    # 3 adapters and classification heads are added.
    model.add_adapter('name_a')
    model.add_classification_head('name_a',  {'num_labels' : 100})
    
    model.add_adapter('name_b')
    model.add_classification_head('name_b')
    
    model.add_adapter('name_c')
    model.add_classification_head('name_c',  {'num_labels' : 5})
    
    
    # Use `Parallel` to enable multiple active heads.
    adapter_names  = ['name_a', 'name_b', 'name_c']
    model.active_heads =  ac.Parallel(adapter_names)
    
    for name in adapter_names:
        model.train_adapter(name)
        
    # Invoke forward pass. This will trigger the error. 
    model(inputs)
    
    

    Expected behavior

    Model forward pass should work.

    bug 
    opened by hchoi-moveworks 8
  • Hinglish Sentiment Adapter

    🌟 New Adapter setup

    Model and Data Description

    Hinglish: a Romanized version of Hindi that is immensely popular in India, where Hindi is spoken by millions of people but is quite often typed in Roman script.

    Dataset: SemEval 2020 Task 9 Sentiment Analysis: 3 classes, +ve, -ve and neutral

    Open source status

    • [x] Code Implementation for the Adapter: https://colab.research.google.com/drive/19lofRd9n142xJCtUteZb5L_r7spGcGLL?usp=sharing
    • [x] Past Work: Accepted Paper, Code and Model Weights
    • [x] Who are the authors: @NirantK and @meghanabhange

    What I need help with

    • [x] Because there were no examples other than Glue Datasets, I ended up implementing a new HinglishDataset class and other skeleton code -- I'd appreciate a review if I got something wrong

    Next Steps

    If all is well in the code above, I'd like to continue along and contribute an adapter for Hinglish under the Sentiment task.

    enhancement 
    opened by NirantK 8
  • Train adapters without Hugging Face Trainer scripts

    Hi, I was looking into example scripts for Adapter-Hub and almost all *_no_trainer.py scripts were not using adapters at all. Are you guys planning to add those scripts soon? I can also help in porting trainer scripts to no_trainer scripts if someone can guide me about what all changes will be required for that. Thank you!

    cc: @calpt

    question Stale 
    opened by bhavitvyamalik 7
  • T5: Missing tied weights crash `accelerate`

    First opened at https://github.com/huggingface/accelerate/issues/958. When Hugging Face accelerate is used via device_map='auto', a weight tied to the missing lm_head triggers a crash inside the device map planning code. It would be nice if there were a clear way to retain the head and tied weight during loading.

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
    • Python version: 3.9.16+
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.1+cu117 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes, device_map='auto'
    • Using distributed or parallel set-up in script?: no

    Information

    Model I am using (Bert, XLNet ...): google/flan-t5-base

    Language I am using the model on (English, Chinese ...): n/a

    Adapter setup I am using (if any): AutoAdapterModel.from_pretrained

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    import transformers
    model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map='auto')
    

    Result:

    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
    │ /home/user/scratch/test-2023-01-07.py:2 in <module>                                              │
    │                                                                                                  │
    │   1 import transformers                                                                          │
    │ ❱ 2 model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map=     │
    │   3                                                                                              │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:446 in    │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   443 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   444 │   │   elif type(config) in cls._model_mapping.keys():                                    │
    │   445 │   │   │   model_class = _get_model_class(config, cls._model_mapping)                     │
    │ ❱ 446 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   447 │   │   raise ValueError(                                                                  │
    │   448 │   │   │   f"Unrecognized configuration class {config.__class__} for this kind of AutoM   │
    │   449 │   │   │   f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapp   │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/modeling_utils.py:2121 in             │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   2118 │   │   │   no_split_modules = model._no_split_modules                                    │
    │   2119 │   │   │   # Make sure tied weights are tied before creating the device map.             │
    │   2120 │   │   │   model.tie_weights()                                                           │
    │ ❱ 2121 │   │   │   device_map = infer_auto_device_map(                                           │
    │   2122 │   │   │   │   model, no_split_module_classes=no_split_modules, dtype=torch_dtype, max_  │
    │   2123 │   │   │   )                                                                             │
    │   2124                                                                                           │
    │                                                                                                  │
    │ /shared/src/accelerate/src/accelerate/utils/modeling.py:545 in infer_auto_device_map             │
    │                                                                                                  │
    │   542 │   │   elif tied_param is not None:                                                       │
    │   543 │   │   │   # Determine the sized occupied by this module + the module containing the ti   │
    │   544 │   │   │   tied_module_size = module_size                                                 │
    │ ❱ 545 │   │   │   tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n in    │
    │   546 │   │   │   tied_module_name, tied_module = modules_to_treat[tied_module_index]            │
    │   547 │   │   │   tied_module_size += module_sizes[tied_module_name] - module_sizes[tied_param   │
    │   548 │   │   │   if current_max_size is not None and current_memory_used + tied_module_size >   │
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
    IndexError: list index out of range
    

    Expected behavior

    No crash. Ability to tie weights with seq2seq lm_head.

    bug 
    opened by xloem 0
  • Fusing task-specific and task-agnostic adapters

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
    • Python version: 3.8.11
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.12.1 (False)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: no

    Details

    Hi, I am trying to combine task-specific and task-agnostic adapters. Assume I have three tasks: Task-A, Task-B, and Task-C. I add the task-specific adapters and a task-agnostic adapter as follows:

    import transformers.adapters.composition as ac
    
    model.add_adapter("TASK-A")
    model.add_adapter("TASK-B")
    model.add_adapter("TASK-C")
    
    model.add_adapter("TASK-Agnostic")
    

    Now I want to fuse the task-specific adapter and the task-agnostic adapter dynamically, i.e., depending on the current task.

    Should I fuse the adapters as follows?

    model.add_adapter_fusion(["TASK-A", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-B", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-C", "TASK-Agnostic"])
    

    Inside the forward_pass of Trainer, I will set the active adapters as follows:

    task_name = get_task_name()
    model.active_adapters = ac.Fuse(task_name, "TASK-Agnostic")
    

    Is this the right way to implement this?

    Thanks
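
    The per-task pairing described above can be sketched as a small helper. This is only an illustration of the bookkeeping, not the library's recommended pattern: `fusion_pairs`, `register_fusions`, and `pair_for_task` are hypothetical names, and `model` is assumed to be any object exposing the `add_adapter_fusion` method used in the question.

    ```python
    from typing import List, Sequence

    SHARED = "TASK-Agnostic"  # name of the shared adapter, taken from the question


    def fusion_pairs(tasks: Sequence[str], shared: str = SHARED) -> List[List[str]]:
        """Build one [task, shared] adapter-name pair per task."""
        return [[task, shared] for task in tasks]


    def register_fusions(model, tasks: Sequence[str], shared: str = SHARED) -> None:
        """Call model.add_adapter_fusion once for every (task, shared) pair.

        `model` is assumed to provide add_adapter_fusion(list_of_names),
        as in the calls shown in the question.
        """
        for pair in fusion_pairs(tasks, shared):
            model.add_adapter_fusion(pair)


    def pair_for_task(task_name: str, tasks: Sequence[str], shared: str = SHARED) -> List[str]:
        """Return the adapter names to fuse for the given task."""
        if task_name not in tasks:
            raise KeyError(f"unknown task: {task_name}")
        return [task_name, shared]
    ```

    The pair returned by `pair_for_task` would then be wrapped as in the question, e.g. `ac.Fuse(*pair)`, before being assigned to `model.active_adapters`.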

    question 
    opened by murthyrudra 0
  • Stacking two parallel composition blocks

    Stacking two parallel composition blocks

    Hi,

    Can I stack two Parallel composition blocks like this? ac.Stack(ac.Parallel('a', 'b'), ac.Parallel('c', 'd'))

    I found that the inputs are replicated only once, but they should be replicated twice. Could you help me fix it?

    Thanks!
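
    To make the expectation in this question concrete, here is a toy model of batch replication. It does not use or describe the library's implementation: the `Parallel` and `Stack` classes below are stand-ins defined only for this sketch, and the rule that every Parallel block replicates its input once per child is the assumption the question makes.

    ```python
    class Parallel:
        """Stand-in for a parallel block: each child gets its own copy of the input."""

        def __init__(self, *children):
            self.children = children


    class Stack:
        """Stand-in for a stack block: children are applied one after another."""

        def __init__(self, *children):
            self.children = children


    def replication_factor(block) -> int:
        """How many copies of the original batch leave this block (toy assumption)."""
        if isinstance(block, str):
            # A single adapter does not change the batch size.
            return 1
        if isinstance(block, Stack):
            # Stacked blocks compound: each one replicates the previous output.
            factor = 1
            for child in block.children:
                factor *= replication_factor(child)
            return factor
        if isinstance(block, Parallel):
            # Each parallel channel contributes its own copies.
            return sum(replication_factor(child) for child in block.children)
        raise TypeError(f"unsupported block: {block!r}")


    setup = Stack(Parallel("a", "b"), Parallel("c", "d"))
    print(replication_factor(setup))
    ```

    Under this assumption the stacked setup yields four copies of the batch, whereas replicating only once yields two.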

    question 
    opened by HZQ950419 0
  • Add adapter to AutoModelForSequenceClassification model

    Add adapter to AutoModelForSequenceClassification model

    Environment info

    • adapter-transformers version: newest
    • Platform: Azure ML
    • Python version: 3.8
    • PyTorch version (GPU?):

    Details

    I am trying to use an AutoModelForSequenceClassification model (with BART). The documentation is not very clear on this, so I load the model directly and add a LoRA adapter to it. When I run the trainer, I get the following error:

    RestException: INVALID_PARAMETER_VALUE: Response: {'Error': {'Code': 'ValidationError', 'Severity': None, 'Message': 'No more than 255 characters per params Value. Request contains 1 of greater length.', 'MessageFormat': None, 'MessageParameters': None, 'ReferenceCode': None, 'DetailsUri': None, 'Target': None, 'Details': [], 'InnerError': None, 'DebugInfo': None, 'AdditionalInfo': None}, 'Correlation': {'operation': '04d45ce3752c5e51c54e71f3950411ca', 'request': '6d216d8faea19d26'}, 'Environment': 'westus', 'Location': 'westus', 'Time': '2023-01-04T17:45:03.5650777+00:00', 'ComponentName': 'mlflow', 'error_code': 'INVALID_PARAMETER_VALUE'}

    Any ideas on how to solve it?
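
    The error message says that one logged parameter value exceeds 255 characters. A minimal sketch for locating such values is shown below; `oversized_params` and the example `params` dictionary are hypothetical, and which parameters the trainer actually logs is not stated in this issue.

    ```python
    from typing import Any, Dict

    # Limit quoted in the error message above.
    PARAM_VALUE_LIMIT = 255


    def oversized_params(params: Dict[str, Any], limit: int = PARAM_VALUE_LIMIT) -> Dict[str, int]:
        """Return the string length of every parameter value longer than `limit`."""
        return {
            key: len(str(value))
            for key, value in params.items()
            if len(str(value)) > limit
        }


    # Hypothetical parameters, for illustration only.
    params = {"learning_rate": 1e-4, "long_config": "x" * 300}
    print(oversized_params(params))
    ```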

    question 
    opened by andyzengmath 0
  • Support for openai Whisper

    Support for openai Whisper

    🌟 New adapter setup

    Support for openai Whisper

    Add adapter integration for whisper.

    Open source status

    • [x] the model implementation is available: official code hf
    • [x] the model weights are available: hf
    • [x] who are the authors: @jongwook @ArthurZucker @sgugger
    enhancement 
    opened by karynaur 0
  • Add adapter configuration strings & restructure adapter method docs

    Add adapter configuration strings & restructure adapter method docs

    Configuration strings

    This PR adds the possibility to use flexible adapter configuration strings which allow specifying custom config attributes. Examples:

    • Set config attributes: model.add_adapter("name", config="parallel[reduction_factor=2]")
    • Config union: model.add_adapter("name", config="prefix_tuning|parallel")
    • more examples: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/tests_adapters/test_adapter_config.py#L95-L102

    Documentation: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/adapter_docs/overview.md

    Configuration strings allow passing complex configurations, e.g. via the command line.
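
    To illustrate the string format used in the examples above, here is a minimal parser sketch. It is not the implementation from this PR: `parse_config_string` and its output structure are hypothetical, and the grammar (`name[key=value,...]` items joined by `|`) is inferred only from the two examples.

    ```python
    import re
    from typing import Any, Dict, List

    # One item looks like "name" or "name[key=value,key=value]".
    _ITEM = re.compile(r"^\s*(\w+)\s*(?:\[(.*)\])?\s*$")


    def _parse_value(raw: str) -> Any:
        """Convert a raw attribute value to bool, int, or float where possible."""
        raw = raw.strip()
        if raw.lower() in ("true", "false"):
            return raw.lower() == "true"
        for cast in (int, float):
            try:
                return cast(raw)
            except ValueError:
                pass
        return raw


    def parse_config_string(config: str) -> List[Dict[str, Any]]:
        """Split a config string into a list of {"name", "kwargs"} entries."""
        parsed = []
        for part in config.split("|"):
            match = _ITEM.match(part)
            if match is None:
                raise ValueError(f"invalid config item: {part!r}")
            name, args = match.group(1), match.group(2)
            kwargs: Dict[str, Any] = {}
            if args:
                for pair in args.split(","):
                    key, sep, value = pair.partition("=")
                    if not sep:
                        raise ValueError(f"expected key=value, got: {pair!r}")
                    kwargs[key.strip()] = _parse_value(value)
            parsed.append({"name": name, "kwargs": kwargs})
        return parsed


    print(parse_config_string("parallel[reduction_factor=2]"))
    print(parse_config_string("prefix_tuning|parallel"))
    ```

    A string with more than one entry corresponds to a config union. This sketch does not handle attribute values that themselves contain `,` or `|`.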

    Documentation restructuring

    The adapter method documentation is now split into three pages:

    • Overview and Configuration: introduction, table, configuration
    • Adapter Methods
    • Method Combinations
    opened by calpt 0
Releases(adapters3.1.0)
  • adapters3.1.0(Sep 15, 2022)

    Based on transformers v4.21.3

    New

    New adapter methods

    New model integrations

    • Add Deberta and DebertaV2 integration (@hSterz via #340)
    • Add Vision Transformer integration (@calpt via #363)

    Misc

    • Add adapter_summary() method (@calpt via #371): More info
    • Return AdapterFusion attentions using output_adapter_fusion_attentions argument (@calpt via #417): Documentation

    Changed

    • Upgrade of underlying transformers version (@calpt via #344, #368, #404)

    Fixed

    • Infer label names for training for flex head models (@calpt via #367)
    • Ensure root dir exists when saving all adapters/heads/fusions (@calpt via #375)
    • Avoid attempting to set prediction head if non-existent (@calpt via #377)
    • Fix T5EncoderModel adapter integration (@calpt via #376)
    • Fix loading adapters together with full model (@calpt via #378)
    • Multi-gpu support for prefix-tuning (@alexanderhanboli via #359)
    • Fix issues with embedding training (@calpt via #386)
    • Fix initialization of added embeddings (@calpt via #402)
    • Fix model serialization using torch.save() & torch.load() (@calpt via #406)
  • adapters3.0.1(May 18, 2022)

    Based on transformers v4.17.0

    New

    • Support float reduction factors in bottleneck adapter configs (@calpt via #339)

    Fixed

    • [AdapterTrainer] add missing preprocess_logits_for_metrics argument (@stefan-it via #317)
    • Fix save_all_adapters such that with_head is not ignored (@hSterz via #325)
    • Fix inferring batch size for prefix tuning (@calpt via #335)
    • Fix bug when using compacters with AdapterSetup context (@calpt via #328)
    • [Trainer] Fix issue with AdapterFusion and load_best_model_at_end (@calpt via #341)
    • Fix generation with GPT-2, T5 and Prefix Tuning (@calpt via #343)
  • adapters3.0.0(Mar 23, 2022)

    Based on transformers v4.17.0

    New

    Efficient Fine-Tuning Methods

    • Add Prefix Tuning (@calpt via #292)
    • Add Parallel adapters & Mix-and-Match adapter (@calpt via #292)
    • Add Compacter (@hSterz via #297)

    Misc

    • Introduce XAdapterModel classes as central & recommended model classes (@calpt via #289)
    • Introduce ConfigUnion class for flexible combination of adapter configs (@calpt via #292)
    • Add AdapterSetup context manager to replace adapter_names parameter (@calpt via #257)
    • Add ForwardContext to wrap model forward pass with adapters (@calpt via #267, #295)
    • Search all remote sources when passing source=None (new default) to load_adapter() (@calpt via #309)

    Changed

    • Deprecate XModelWithHeads in favor of XAdapterModel (@calpt via #289)
    • Refactored adapter integration into model classes and model configs (@calpt via #263, #304)
    • Rename activation functions to match Transformers' names (@hSterz via #298)
    • Upgrade of underlying transformers version (@calpt via #311)

    Fixed

    • Fix seq2seq generation with flexible heads classes (@calpt via #275, @hSterz via #285)
    • Parallel composition for XLM-Roberta (@calpt via #305)
  • adapters2.3.0(Feb 9, 2022)

    Based on transformers v4.12.5

    New

    • Allow adding, loading & training of model embeddings (@hSterz via #245). See https://docs.adapterhub.ml/embeddings.html.

    Changed

    • Unify built-in & custom head implementation (@hSterz via #252)
    • Upgrade of underlying transformers version (@calpt via #255)

    Fixed

    • Fix documentation and consistency issues for AdapterFusion methods (@calpt via #259)
    • Fix serialization/ deserialization issues with custom adapter config classes (@calpt via #253)
  • adapters2.2.0(Oct 14, 2021)

    Based on transformers v4.11.3

    New

    Model support

    • T5 adapter implementation (@AmirAktify & @hSterz via #182)
    • EncoderDecoderModel adapter implementation (@calpt via #222)

    Prediction heads

    • AutoModelWithHeads prediction heads for language modeling (@calpt via #210)
    • AutoModelWithHeads prediction head & training example for dependency parsing (@calpt via #208)

    Training

    • Add a new AdapterTrainer for training adapters (@hSterz via #218, #241 )
    • Enable training of Parallel block (@hSterz via #226)

    Misc

    • Add get_adapter_info() method (@calpt via #220)
    • Add set_active argument to add & load adapter/fusion/head methods (@calpt via #214)
    • Minor improvements for adapter card creation for HF Hub upload (@calpt via #225)

    Changed

    • Upgrade of underlying transformers version (@calpt via #232, #234, #239 )
    • Allow multiple AdapterFusion configs per model; remove set_adapter_fusion_config() (@calpt via #216)

    Fixed

    • Incorrect referencing between adapter layer and layer norm for DataParallel (@calpt via #228)
  • adapters2.1.0(Jul 8, 2021)

    Based on transformers v4.8.2

    New

    Integration into HuggingFace's Model Hub

    • Add support for loading adapters from HuggingFace Model Hub (@calpt via #162)
    • Add method to push adapters to HuggingFace Model Hub (@calpt via #197)
    • Learn more

    BatchSplit adapter composition

    • BatchSplit composition block for adapters and heads (@hSterz via #177)
    • Learn more

    Various new features

    • Add automatic conversion of static heads when loaded via XModelWithHeads (@calpt via #181) Learn more
    • Add list_adapters() method to search for adapters (@calpt via #193) Learn more
    • Add delete_adapter(), delete_adapter_fusion() and delete_head() methods (@calpt via #189)
    • MAD-X 2.0 WikiAnn NER notebook (@hSterz via #187)
    • Upgrade of underlying transformers version (@hSterz via #183, @calpt via #194 & #200)

    Changed

    • Deprecate add_fusion() and train_fusion() in favor of add_adapter_fusion() and train_adapter_fusion() (@calpt via #190)

    Fixed

    • Suppress no-adapter warning when adapter_names is given (@calpt via #186)
    • leave_out in load_adapter() when loading language adapters from Hub (@hSterz via #177)
  • adapters2.0.1(May 28, 2021)

    Based on transformers v4.5.1

    New

    • Allow different reduction factors for different adapter layers (@hSterz via #161)
    • Allow dynamic dropping of adapter layers in load_adapter() (@calpt via #172)
    • Add method get_adapter() to retrieve weights of an adapter (@hSterz via #169)

    Changed

    • Re-add adapter_names argument to model forward() methods (@calpt via #176)

    Fixed

    • Fix resolving of adapter from Hub when multiple options available (@Aaronsom via #164)
    • Fix & improve adapter saving/ loading using Trainer class (@calpt via #178)
  • adapters2.0.0(Apr 29, 2021)

    Based on transformers v4.5.1

    All major new features & changes are described at https://docs.adapterhub.ml/v2_transition.

    • all changes merged via #105

    Additional changes & Fixes

    • Support loading adapters with load_best_model_at_end in Trainer (@calpt via #122)
    • Add setter for active_adapters property (@calpt via #132)
    • New notebooks for NER, text generation & AdapterDrop (@hSterz via #135)
    • Enable trainer to load adapters from checkpoints (@hSterz via #138)
    • Update & clean up example scripts (@hSterz via #154 & @calpt via #141, #155)
    • Add unfreeze_adapters param to train_fusion() (@calpt via #156)
    • Ensure eval/ train mode is correct for AdapterFusion (@calpt via #157)
  • adapters1.1.1(Jan 14, 2021)

    Based on transformers v3.5.1

    New

    • Modular & custom prediction heads for flex head models (@hSterz via #88)

    Fixed

    • Fixes for DistilBERT layer norm and AdapterFusion (@calpt via #102)
    • Fix for reloading full models with AdapterFusion (@calpt via #110)
    • Fix attention and logits output for flex head models (@calpt via #103 & #111)
    • Fix loss output of flex model with QA head (@hSterz via #88)
  • adapters1.1.0(Nov 30, 2020)

    Based on transformers v3.5.1

    New

    • New model with adapter support: DistilBERT (@calpt via #67)
    • Save label->id mapping of the task together with the adapter prediction head (@hSterz via #75)
    • Automatically set matching label->id mapping together with active prediction head (@hSterz via #81)
    • Upgraded underlying transformers version (@calpt via #55, #72 and #85)
    • Colab notebook tutorials showcasing all AdapterHub concepts (@calpt via #89)

    Fixed

    • Support for models with flexible heads in pipelines (@calpt via #80)
    • Adapt input to models with flexible heads to static prediction heads input (@calpt via #90)
  • adapters1.0.1(Oct 6, 2020)

    Based on transformers v2.11.0

    New

    • Adds squad-style QA prediction head to flex-head models

    Bug fixes

    • Fixes loading and saving of adapter config in model.save_pretrained()
    • Fixes parsing of adapter names in fusion setup
  • adapters1.0(Sep 9, 2020)
