🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch, TensorFlow, and JAX.

State-of-the-art Natural Language Processing for JAX, PyTorch and TensorFlow

🤗 Transformers provides thousands of pretrained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.

🤗 Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets and then share them with the community on our model hub. At the same time, each Python module defining an architecture is fully standalone and can be modified to enable quick research experiments.

🤗 Transformers is backed by the three most popular deep learning libraries (JAX, PyTorch and TensorFlow), with seamless integration between them. It's straightforward to train your models with one before loading them for inference with another.

Online demos

You can test most of our models directly on their pages from the model hub. We also offer private model hosting, versioning, and an inference API for public and private models.

Here are a few examples:

Write With Transformer, built by the Hugging Face team, is the official demo of this repo’s text generation capabilities.

If you are looking for custom support from the Hugging Face team, see the HuggingFace Expert Acceleration Program.

Quick tour

To immediately use a model on a given text, we provide the pipeline API. Pipelines group together a pretrained model with the preprocessing that was used during that model's training. Here is how to quickly use a pipeline to classify positive versus negative texts:

>>> from transformers import pipeline

# Allocate a pipeline for sentiment-analysis
>>> classifier = pipeline('sentiment-analysis')
>>> classifier('We are very happy to introduce pipeline to the transformers repository.')
[{'label': 'POSITIVE', 'score': 0.9996980428695679}]

The second line of code downloads and caches the pretrained model used by the pipeline, while the third evaluates it on the given text. Here the answer is "positive" with a confidence of 99.97%.
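As a small illustration of how to read that output, the sketch below takes the result printed by the classifier (copied here as a literal, so nothing is downloaded) and formats the score as a percentage. The helper name `describe` is hypothetical and not part of the library.

```python
# Result copied from the sentiment-analysis example above; a pipeline call
# returns one dict per input text.
results = [{'label': 'POSITIVE', 'score': 0.9996980428695679}]


def describe(result):
    """Format one pipeline result as 'label (xx.xx%)'. Hypothetical helper."""
    return f"{result['label'].lower()} ({result['score'] * 100:.2f}%)"


summary = describe(results[0])
print(summary)  # positive (99.97%)
```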

Many NLP tasks have a pretrained pipeline ready to go. For example, we can easily extract the answer to a question from a given context:

>>> from transformers import pipeline

# Allocate a pipeline for question-answering
>>> question_answerer = pipeline('question-answering')
>>> question_answerer({
...     'question': 'What is the name of the repository ?',
...     'context': 'Pipeline has been included in the huggingface/transformers repository'
... })
{'score': 0.30970096588134766, 'start': 34, 'end': 58, 'answer': 'huggingface/transformers'}

In addition to the answer, the pretrained model used here returned its confidence score, along with the start and end positions of the answer, given as character offsets into the context string. You can learn more about the tasks supported by the pipeline API in this tutorial.
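The `start` and `end` values in the example output can be checked against the raw context string. In this output they behave as character offsets, so slicing the context with them recovers the answer. The sketch below uses only the literal values printed above and does not call the library.

```python
# Context and result copied from the question-answering example above.
context = 'Pipeline has been included in the huggingface/transformers repository'
result = {
    'score': 0.30970096588134766,
    'start': 34,
    'end': 58,
    'answer': 'huggingface/transformers',
}

# Slice the context with the returned offsets (end is exclusive).
span = context[result['start']:result['end']]
print(span)  # huggingface/transformers
```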

To download and use any of the pretrained models on your given task, all it takes is three lines of code. Here is the PyTorch version:

>>> from transformers import AutoTokenizer, AutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = AutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="pt")
>>> outputs = model(**inputs)

And here is the equivalent code for TensorFlow:

>>> from transformers import AutoTokenizer, TFAutoModel

>>> tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
>>> model = TFAutoModel.from_pretrained("bert-base-uncased")

>>> inputs = tokenizer("Hello world!", return_tensors="tf")
>>> outputs = model(**inputs)

The tokenizer is responsible for all the preprocessing the pretrained model expects, and can be called directly on a single string (as in the above examples) or a list. It will output a dictionary that you can use in downstream code or simply directly pass to your model using the ** argument unpacking operator.
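To make the `**` unpacking concrete without downloading anything, here is a minimal sketch in plain Python. `fake_model` is a hypothetical stand-in for a model's forward call, and the dictionary only mimics the shape of tokenizer output with illustrative values (plain lists rather than tensors).

```python
def fake_model(input_ids=None, attention_mask=None, token_type_ids=None):
    """Hypothetical stand-in for a model call: report which arguments were passed."""
    received = {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }
    return sorted(name for name, value in received.items() if value is not None)


# Shaped like tokenizer output; the values are illustrative only.
inputs = {"input_ids": [[101, 7592, 102]], "attention_mask": [[1, 1, 1]]}

# fake_model(**inputs) is equivalent to
# fake_model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
received_names = fake_model(**inputs)
print(received_names)  # ['attention_mask', 'input_ids']
```

Each key of the dictionary becomes a keyword argument, which is why the tokenizer output can be passed straight to the model.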

The model itself is a regular PyTorch nn.Module or a TensorFlow tf.keras.Model (depending on your backend) which you can use normally. This tutorial explains how to integrate such a model into a classic PyTorch or TensorFlow training loop, or how to use our Trainer API to quickly fine-tune on a new dataset.

Why should I use transformers?

  1. Easy-to-use state-of-the-art models:

    • High performance on NLU and NLG tasks.
    • Low barrier to entry for educators and practitioners.
    • Few user-facing abstractions with just three classes to learn.
    • A unified API for using all our pretrained models.
  2. Lower compute costs, smaller carbon footprint:

    • Researchers can share trained models instead of always retraining.
    • Practitioners can reduce compute time and production costs.
    • Dozens of architectures with over 2,000 pretrained models, some in more than 100 languages.
  3. Choose the right framework for every part of a model's lifetime:

    • Train state-of-the-art models in 3 lines of code.
    • Move a single model between TF2.0/PyTorch frameworks at will.
    • Seamlessly pick the right framework for training, evaluation and production.
  4. Easily customize a model or an example to your needs:

    • We provide examples for each architecture to reproduce the results published by its original authors.
    • Model internals are exposed as consistently as possible.
    • Model files can be used independently of the library for quick experiments.

Why shouldn't I use transformers?

  • This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
  • The training API is not intended to work on any model but is optimized to work with the models provided by the library. For generic machine learning loops, you should use another library.
  • While we strive to present as many use cases as possible, the scripts in our examples folder are just that: examples. It is expected that they won't work out of the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs.

Installation

With pip

This repository is tested on Python 3.6+, Flax 0.3.2+, PyTorch 1.3.1+ and TensorFlow 2.3+.

You should install 🤗 Transformers in a virtual environment. If you're unfamiliar with Python virtual environments, check out the user guide.

First, create a virtual environment with the version of Python you're going to use and activate it.

Then, you will need to install at least one of Flax, PyTorch or TensorFlow. Please refer to the TensorFlow installation page, PyTorch installation page and/or Flax installation page for the specific install command for your platform.
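If you are unsure which backends are already present in your environment, a quick check along these lines can help. This is a minimal sketch using only the standard library. The mapping of backend names to import names (`flax`, `torch`, `tensorflow`) and the helper `installed_backends` are illustrative and not part of 🤗 Transformers.

```python
import importlib.util
import sys

# Import names of the three backends mentioned above.
BACKENDS = {"Flax": "flax", "PyTorch": "torch", "TensorFlow": "tensorflow"}


def installed_backends():
    """Return {backend name: True/False} based on whether the module can be found."""
    return {
        name: importlib.util.find_spec(module) is not None
        for name, module in BACKENDS.items()
    }


# The repository states it is tested on Python 3.6+.
python_ok = sys.version_info >= (3, 6)
status = installed_backends()

print("Python 3.6+:", python_ok)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'not found'}")
```

This only checks that the packages can be located. It does not verify their versions against the minimums listed above.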

When one of those backends has been installed, 🤗 Transformers can be installed using pip as follows:

pip install transformers

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you must install the library from source.

With conda

Since version v4.0.0, 🤗 Transformers has a conda channel: huggingface.

🤗 Transformers can be installed using conda as follows:

conda install -c huggingface transformers

Follow the installation pages of Flax, PyTorch or TensorFlow to see how to install them with conda.

Model architectures

All the model checkpoints provided by 🤗 Transformers are seamlessly integrated from the huggingface.co model hub where they are uploaded directly by users and organizations.

🤗 Transformers currently provides the following architectures (see here for a high-level summary of each of them):

  1. ALBERT (from Google Research and the Toyota Technological Institute at Chicago) released with the paper ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut.
  2. BART (from Facebook) released with the paper BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension by Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov and Luke Zettlemoyer.
  3. BARThez (from École polytechnique) released with the paper BARThez: a Skilled Pretrained French Sequence-to-Sequence Model by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis.
  4. BEiT (from Microsoft) released with the paper BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei.
  5. BERT (from Google) released with the paper BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
  6. BERT For Sequence Generation (from Google) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  7. BigBird-RoBERTa (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  8. BigBird-Pegasus (from Google Research) released with the paper Big Bird: Transformers for Longer Sequences by Manzil Zaheer, Guru Guruganesh, Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed.
  9. Blenderbot (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  10. BlenderbotSmall (from Facebook) released with the paper Recipes for building an open-domain chatbot by Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, Yinhan Liu, Jing Xu, Myle Ott, Kurt Shuster, Eric M. Smith, Y-Lan Boureau, Jason Weston.
  11. BORT (from Alexa) released with the paper Optimal Subarchitecture Extraction For BERT by Adrian de Wynter and Daniel J. Perry.
  12. ByT5 (from Google Research) released with the paper ByT5: Towards a token-free future with pre-trained byte-to-byte models by Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel.
  13. CamemBERT (from Inria/Facebook/Sorbonne) released with the paper CamemBERT: a Tasty French Language Model by Louis Martin*, Benjamin Muller*, Pedro Javier Ortiz Suárez*, Yoann Dupont, Laurent Romary, Éric Villemonte de la Clergerie, Djamé Seddah and Benoît Sagot.
  14. CANINE (from Google Research) released with the paper CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation by Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting.
  15. CLIP (from OpenAI) released with the paper Learning Transferable Visual Models From Natural Language Supervision by Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, Ilya Sutskever.
  16. ConvBERT (from YituTech) released with the paper ConvBERT: Improving BERT with Span-based Dynamic Convolution by Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan.
  17. CPM (from Tsinghua University) released with the paper CPM: A Large-scale Generative Chinese Pre-trained Language Model by Zhengyan Zhang, Xu Han, Hao Zhou, Pei Ke, Yuxian Gu, Deming Ye, Yujia Qin, Yusheng Su, Haozhe Ji, Jian Guan, Fanchao Qi, Xiaozhi Wang, Yanan Zheng, Guoyang Zeng, Huanqi Cao, Shengqi Chen, Daixuan Li, Zhenbo Sun, Zhiyuan Liu, Minlie Huang, Wentao Han, Jie Tang, Juanzi Li, Xiaoyan Zhu, Maosong Sun.
  18. CTRL (from Salesforce) released with the paper CTRL: A Conditional Transformer Language Model for Controllable Generation by Nitish Shirish Keskar*, Bryan McCann*, Lav R. Varshney, Caiming Xiong and Richard Socher.
  19. DeBERTa (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  20. DeBERTa-v2 (from Microsoft) released with the paper DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen.
  21. DeiT (from Facebook) released with the paper Training data-efficient image transformers & distillation through attention by Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
  22. DETR (from Facebook) released with the paper End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
  23. DialoGPT (from Microsoft Research) released with the paper DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
  24. DistilBERT (from HuggingFace), released together with the paper DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter by Victor Sanh, Lysandre Debut and Thomas Wolf. The same method has been applied to compress GPT2 into DistilGPT2, RoBERTa into DistilRoBERTa, Multilingual BERT into DistilmBERT and a German version of DistilBERT.
  25. DPR (from Facebook) released with the paper Dense Passage Retrieval for Open-Domain Question Answering by Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
  26. EncoderDecoder (from Google Research) released with the paper Leveraging Pre-trained Checkpoints for Sequence Generation Tasks by Sascha Rothe, Shashi Narayan, Aliaksei Severyn.
  27. ELECTRA (from Google Research/Stanford University) released with the paper ELECTRA: Pre-training text encoders as discriminators rather than generators by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
  28. FlauBERT (from CNRS) released with the paper FlauBERT: Unsupervised Language Model Pre-training for French by Hang Le, Loïc Vial, Jibril Frej, Vincent Segonne, Maximin Coavoux, Benjamin Lecouteux, Alexandre Allauzen, Benoît Crabbé, Laurent Besacier, Didier Schwab.
  29. FNet (from Google Research) released with the paper FNet: Mixing Tokens with Fourier Transforms by James Lee-Thorp, Joshua Ainslie, Ilya Eckstein, Santiago Ontanon.
  30. Funnel Transformer (from CMU/Google Brain) released with the paper Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
  31. GPT (from OpenAI) released with the paper Improving Language Understanding by Generative Pre-Training by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
  32. GPT-2 (from OpenAI) released with the paper Language Models are Unsupervised Multitask Learners by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
  33. GPT-J (from EleutherAI) released in the repository kingoflolz/mesh-transformer-jax by Ben Wang and Aran Komatsuzaki.
  34. GPT Neo (from EleutherAI) released in the repository EleutherAI/gpt-neo by Sid Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy.
  35. Hubert (from Facebook) released with the paper HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units by Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed.
  36. I-BERT (from Berkeley) released with the paper I-BERT: Integer-only BERT Quantization by Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer.
  37. LayoutLM (from Microsoft Research Asia) released with the paper LayoutLM: Pre-training of Text and Layout for Document Image Understanding by Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou.
  38. LayoutLMv2 (from Microsoft Research Asia) released with the paper LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding by Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou.
  39. LayoutXLM (from Microsoft Research Asia) released with the paper LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding by Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei.
  40. LED (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  41. Longformer (from AllenAI) released with the paper Longformer: The Long-Document Transformer by Iz Beltagy, Matthew E. Peters, Arman Cohan.
  42. LUKE (from Studio Ousia) released with the paper LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention by Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto.
  43. LXMERT (from UNC Chapel Hill) released with the paper LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering by Hao Tan and Mohit Bansal.
  44. M2M100 (from Facebook) released with the paper Beyond English-Centric Multilingual Machine Translation by Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin.
  45. MarianMT Machine translation models trained using OPUS data by Jörg Tiedemann. The Marian Framework is being developed by the Microsoft Translator Team.
  46. MBart (from Facebook) released with the paper Multilingual Denoising Pre-training for Neural Machine Translation by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
  47. MBart-50 (from Facebook) released with the paper Multilingual Translation with Extensible Multilingual Pretraining and Finetuning by Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, Angela Fan.
  48. Megatron-BERT (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
  49. Megatron-GPT2 (from NVIDIA) released with the paper Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism by Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper and Bryan Catanzaro.
  50. MPNet (from Microsoft Research) released with the paper MPNet: Masked and Permuted Pre-training for Language Understanding by Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu.
  51. MT5 (from Google AI) released with the paper mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.
  52. Pegasus (from Google) released with the paper PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
  53. ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  54. Reformer (from Google Research) released with the paper Reformer: The Efficient Transformer by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
  55. RemBERT (from Google Research) released with the paper Rethinking embedding coupling in pre-trained language models by Hyung Won Chung, Thibault Févry, Henry Tsai, M. Johnson, Sebastian Ruder.
  56. RoBERTa (from Facebook), released together with the paper RoBERTa: A Robustly Optimized BERT Pretraining Approach by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
  57. RoFormer (from ZhuiyiTechnology), released together with the paper RoFormer: Enhanced Transformer with Rotary Position Embedding by Jianlin Su and Yu Lu and Shengfeng Pan and Bo Wen and Yunfeng Liu.
  58. SpeechEncoderDecoder
  59. SpeechToTextTransformer (from Facebook), released together with the paper fairseq S2T: Fast Speech-to-Text Modeling with fairseq by Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Dmytro Okhonko, Juan Pino.
  60. SpeechToTextTransformer2 (from Facebook), released together with the paper Large-Scale Self- and Semi-Supervised Learning for Speech Translation by Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau.
  61. Splinter (from Tel Aviv University), released together with the paper Few-Shot Question Answering by Pretraining Span Selection by Ori Ram, Yuval Kirstain, Jonathan Berant, Amir Globerson, Omer Levy.
  62. SqueezeBert (from Berkeley) released with the paper SqueezeBERT: What can computer vision teach NLP about efficient neural networks? by Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, and Kurt W. Keutzer.
  63. T5 (from Google AI) released with the paper Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  64. T5v1.1 (from Google AI) released in the repository google-research/text-to-text-transfer-transformer by Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu.
  65. TAPAS (from Google AI) released with the paper TAPAS: Weakly Supervised Table Parsing via Pre-training by Jonathan Herzig, Paweł Krzysztof Nowak, Thomas Müller, Francesco Piccinno and Julian Martin Eisenschlos.
  66. Transformer-XL (from Google/CMU) released with the paper Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context by Zihang Dai*, Zhilin Yang*, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov.
  67. Vision Transformer (ViT) (from Google AI) released with the paper An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale by Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby.
  68. VisualBERT (from UCLA NLP) released with the paper VisualBERT: A Simple and Performant Baseline for Vision and Language by Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang.
  69. Wav2Vec2 (from Facebook AI) released with the paper wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.
  70. XLM (from Facebook) released together with the paper Cross-lingual Language Model Pretraining by Guillaume Lample and Alexis Conneau.
  71. XLM-ProphetNet (from Microsoft Research) released with the paper ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
  72. XLM-RoBERTa (from Facebook AI), released together with the paper Unsupervised Cross-lingual Representation Learning at Scale by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
  73. XLNet (from Google/CMU) released with the paper XLNet: Generalized Autoregressive Pretraining for Language Understanding by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
  74. XLSR-Wav2Vec2 (from Facebook AI) released with the paper Unsupervised Cross-Lingual Representation Learning For Speech Recognition by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
  75. Want to contribute a new model? We have added a detailed guide and templates to guide you in the process of adding a new model. You can find them in the templates folder of the repository. Be sure to check the contributing guidelines and contact the maintainers or open an issue to collect feedback before starting your PR.

To check if each model has an implementation in Flax, PyTorch or TensorFlow, or has an associated tokenizer backed by the 🤗 Tokenizers library, refer to this table.

These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations. You can find more details on performance in the Examples section of the documentation.

Learn more

Section | Description
Documentation | Full API documentation and tutorials
Task summary | Tasks supported by 🤗 Transformers
Preprocessing tutorial | Using the Tokenizer class to prepare data for the models
Training and fine-tuning | Using the models provided by 🤗 Transformers in a PyTorch/TensorFlow training loop and the Trainer API
Quick tour: Fine-tuning/usage scripts | Example scripts for fine-tuning models on a wide range of tasks
Model sharing and uploading | Upload and share your fine-tuned models with the community
Migration | Migrate to 🤗 Transformers from pytorch-transformers or pytorch-pretrained-bert

Citation

We now have a paper you can cite for the 🤗 Transformers library:

@inproceedings{wolf-etal-2020-transformers,
    title = "Transformers: State-of-the-Art Natural Language Processing",
    author = "Thomas Wolf and Lysandre Debut and Victor Sanh and Julien Chaumond and Clement Delangue and Anthony Moi and Pierric Cistac and Tim Rault and Rémi Louf and Morgan Funtowicz and Joe Davison and Sam Shleifer and Patrick von Platen and Clara Ma and Yacine Jernite and Julien Plu and Canwen Xu and Teven Le Scao and Sylvain Gugger and Mariama Drame and Quentin Lhoest and Alexander M. Rush",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations",
    month = oct,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-demos.6",
    pages = "38--45"
}
Comments
  • How to use BERT for finding similar sentences or similar news?

    I have used BERT NextSentencePredictor to find similar sentences or similar news. However, it's super slow, even on a Tesla V100, which is the fastest GPU available right now. It takes around 10 seconds for a query title against around 3,000 articles. Is there a better way to use BERT to find similar sentences or similar news in a corpus of news articles?

    opened by Raghavendra15 161
  • Summarization Fine Tuning

    ❓ Questions & Help

    Details

    I tried using T5 and Bart, but the abstractive summarization on scientific texts does not seem to give the results I want, since I think they are both trained on news corpora. I have scraped all of the free PMC articles and I am thinking about fine-tuning a seq2seq model between the articles and their abstracts to make an abstractive summarizer for scientific texts. This Medium article (https://medium.com/huggingface/encoder-decoders-in-transformers-a-hybrid-pre-trained-architecture-for-seq2seq-af4d7bf14bb8) provides a bit of an introduction to how to approach this but does not quite go into detail, so I am wondering how to approach this.

    I'm not really asking for help because I'm stuck; I just don't really know how to approach this problem.

    A link to the original question on Stack Overflow: https://stackoverflow.com/questions/61826443/train-custom-seq2seq-transformers-model

    Discussion wontfix 
    opened by kevinlu1248 79
  • ONNXConfig: Add a configuration for all available models

    This issue is about the working group specially created for this task. If you are interested in helping out, take a look at this organization, or add me on Discord: ChainYo#3610

    We want to contribute to HuggingFace's ONNX implementation for all available models on HF's hub. There are already a lot of architectures implemented for converting PyTorch models to ONNX, but we need more! We need them all!

    Feel free to join us in this adventure! Join the org by clicking here

    Here is a non-exhaustive list of the available models:

    • [x] Albert
    • [x] BART
    • [x] BeiT
    • [x] BERT
    • [x] BigBird
    • [x] BigBirdPegasus
    • [x] Blenderbot
    • [x] BlenderbotSmall
    • [x] BLOOM
    • [x] CamemBERT
    • [ ] CANINE
    • [x] CLIP
    • [x] CodeGen
    • [x] ConvNext
    • [x] ConvBert
    • [ ] CTRL
    • [ ] CvT
    • [x] Data2VecText
    • [x] Data2VecVision
    • [x] Deberta
    • [x] DebertaV2
    • [x] DeiT
    • [ ] DecisionTransformer
    • [x] DETR
    • [x] Distilbert
    • [ ] DPR
    • [ ] DPT
    • [x] ELECTRA
    • [ ] FNet
    • [ ] FSMT
    • [x] Flaubert
    • [ ] FLAVA
    • [ ] Funnel Transformer
    • [ ] GLPN
    • [x] GPT2
    • [x] GPTJ
    • [x] GPT-Neo
    • [ ] GPT-NeoX
    • [ ] Hubert
    • [x] I-Bert
    • [ ] ImageGPT
    • [x] LayoutLM
    • [ ] 🛠️ LayoutLMv2
    • [x] LayoutLMv3
    • [ ] LayoutXLM
    • [ ] LED
    • [x] LeViT
    • [x] Longformer
    • [x] LongT5
    • [ ] 🛠️ Luke
    • [ ] Lxmert
    • [x] M2M100
    • [ ] MaskFormer
    • [x] mBart
    • [ ] MCTCT
    • [ ] MPNet
    • [x] MT5
    • [x] MarianMT
    • [ ] MegatronBert
    • [x] MobileBert
    • [x] MobileViT
    • [ ] Nyströmformer
    • [x] OpenAIGPT-2
    • [ ] 🛠️ OPT
    • [x] OWLViT
    • [x] PLBart
    • [ ] Pegasus
    • [x] Perceiver
    • [ ] PoolFormer
    • [ ] ProphetNet
    • [ ] QDQBERT
    • [ ] RAG
    • [ ] REALM
    • [ ] 🛠️ Reformer
    • [x] RemBert
    • [x] ResNet
    • [ ] RegNet
    • [ ] RetriBert
    • [x] RoFormer
    • [x] RoBERTa
    • [ ] SEW
    • [ ] SEW-D
    • [ ] SegFormer
    • [ ] Speech2Text
    • [ ] Speech2Text2
    • [ ] Splinter
    • [x] SqueezeBERT
    • [ ] Swin Transformer
    • [x] T5
    • [ ] TAPAS
    • [ ] TAPEX
    • [ ] Transformer XL
    • [x] TrOCR
    • [ ] UniSpeech
    • [ ] UniSpeech-SAT
    • [ ] VAN
    • [x] ViT
    • [ ] Vilt
    • [ ] VisualBERT
    • [ ] Wav2Vec2
    • [ ] WavLM
    • [ ] XGLM
    • [x] XLM
    • [ ] XLMProphetNet
    • [x] XLM-RoBERTa
    • [x] XLM-RoBERTa-XL
    • [ ] 🛠️ XLNet
    • [x] YOLOS
    • [ ] Yoso

    🛠️ next to a model means that the PR is in progress. If there is nothing next to a model, it means that ONNX does not yet support the model, and thus we need to add support for it.

    If you need help implementing an unsupported model, here is a guide from HuggingFace's documentation.

    If you want an example of implementation, I did one for CamemBERT months ago.

    Good First Issue 
    opened by ChainYo 76
  • GPT-J-6B

    What does this PR do?

    Introduces the long awaited GPT J model class to HuggingFace! Concurrently with this PR being merged I will make a GPT J 6B checkpoint public on the EleutherAI HF page for people to use. The model has been evaluated as being within error tolerances of the GPT J 6B model we released in Jax two months ago.

    @patil-suraj was very helpful in assisting me to understand HF philosophy and how to make this PR most in line with the rest of the codebase. Other than that, the major design consideration was to make the configs compatible with GPT-2 rather than GPT-Neo. GPT-Neo has some usability limitations due to its configs having names unrelated to GPT-2’s (see #12183 for details). Given those problems and my hope that GPT-Neo will have it’s configs updated in the future, it seemed like a clear choice to align GPT J with GPT-2.

    Shout outs to @finetuneanon whose implementation this one is based off of, as well as @kumuruz for assistance optimizing and debugging.

    Supersedes #12243 #13010 #13022

    Closes #12098

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [X] Did you read the contributor guideline, Pull Request section?
    • [X] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case. It was discussed in Slack with @patil-suraj
    • [X] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [X] Did you write any new necessary tests?

    Who can review?

    • gpt2: @patrickvonplaten, @LysandreJik, @patil-suraj
    opened by StellaAthena 75
  • [DeepSpeed] [success] trained t5-11b on 1x 40GB gpu

    Managed to train t5-11b on 1x 40GB gpu w/ Deepspeed (A100-SXM4-40GB)

    Thank you, @PeterAJansen for letting me use your hardware!

    Thank you, @jeffra and @samyam, for not believing that it is impossible to train t5-11b on 1x 40GB gpu w/ DeepSpeed, and for supporting me, which led me to find a few bugs in the integration.

    Sharing details for those who need.

    If you want to try this at home please make sure you use transformers master as some bug fixes were just merged in

    Well, it's similar to the t5-3b on 24GB success reported here and here. But this time t5-11b on 1x 40GB gpu (or 4x if you wanted things faster)

    As someone asked me before you need a huge amount of general RAM to use ZeRO-Offload for a huge model:

    • for t5-3b on 1x 24GB gpu: ~71GB RAM
    • for t5-11b on 1x 40GB gpu: ~234GB RAM

    I was using /usr/bin/time -v program to get the peak memory measurement - it's the Maximum resident set size entry in the final report.

    Question: I don't think /usr/bin/time does the right thing for multi-process - I think it only measures the parent process. E.g. with 4x gpus it reported only 102GB RAM, but I clearly saw in top that it was around 240GB. If you have an easy way to measure peak memory that takes forked processes into account, I'm all ears.
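    One possible approach to the question above, given as a rough sketch rather than a tested tool: read /proc and sum VmRSS over the launcher process and all of its descendants. The function names and the proc_root parameter are hypothetical, the approach is Linux-only, and memory shared between processes is counted once per process, so the sum overestimates.

```python
import os


def read_ppid_and_rss_kb(pid, proc_root="/proc"):
    """Return (parent pid, VmRSS in kB) for one process, or None if it cannot be read."""
    try:
        with open(os.path.join(proc_root, str(pid), "status")) as f:
            fields = dict(line.split(":", 1) for line in f if ":" in line)
    except OSError:
        return None  # the process exited or is not readable
    ppid = int(fields["PPid"].strip())
    # Kernel threads have no VmRSS line, so fall back to zero.
    rss_kb = int(fields.get("VmRSS", "0 kB").split()[0])
    return ppid, rss_kb


def tree_rss_kb(root_pid, proc_root="/proc"):
    """Sum VmRSS over root_pid and every live descendant of it."""
    info = {}
    for name in os.listdir(proc_root):
        if name.isdigit():
            entry = read_ppid_and_rss_kb(int(name), proc_root)
            if entry is not None:
                info[int(name)] = entry

    # Build a parent -> children map, then walk the tree from the root.
    children = {}
    for pid, (ppid, _) in info.items():
        children.setdefault(ppid, []).append(pid)

    total, stack = 0, [root_pid]
    while stack:
        pid = stack.pop()
        if pid in info:
            total += info[pid][1]
            stack.extend(children.get(pid, []))
    return total
```

    Calling tree_rss_kb(launcher_pid) periodically while the job runs and keeping the largest value gives an approximate peak; the polling interval limits how short a spike it can catch.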

    Batch sizes on one gpu:

    • with buffers of 5e8 I was able to run BS=2, which might be too small for training,
    • but with 2e8 I managed to squeeze in BS=10 for training, but OOMed on prediction

    I'm referring to these buffer sizes in ds_config.json:

            "allgather_bucket_size": 2e8,
            "reduce_bucket_size": 2e8,
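    For readers editing the file by hand, here is a minimal sketch of where these two keys sit. It assumes they belong under the "zero_optimization" section of the DeepSpeed config; the helper name make_ds_config is hypothetical, and the real ds_config.json contains more settings than this issue shows, so the fragment is meant to be merged into an existing file rather than used alone.

```python
import json


def make_ds_config(bucket_size):
    """Return a config fragment with both ZeRO bucket sizes set to bucket_size."""
    return {
        "zero_optimization": {
            "allgather_bucket_size": bucket_size,
            "reduce_bucket_size": bucket_size,
        }
    }


# 2e8 is the smaller buffer setting reported above.
config = make_ds_config(int(2e8))
config_text = json.dumps(config, indent=4)
print(config_text)
```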
    

    And I tested for 2x and 4x DDP as well, BS=16 OOMed, BS=8 was good so I used that - but could probably squeeze some more.

    edit1: later tests show that my test was too short and wasn't letting the CPU Adam optimizer kick in, as it skips the first 20 or so steps because of the overflow. Once it kicks in it takes more GPU memory, so the practical BS is much smaller - I think around 2 on this setup. So most likely you will need to use BS=2 for real work, until things get optimized even more.

    edit2: things are getting reshuffled in the tests, so the default ds_config.json file has moved in master to a new, hopefully permanent home. It's now at examples/tests/deepspeed/ds_config.json, so you will need to adjust the command line to reflect this new location or simply copy it over to where the old one used to be.

    here is the full benchmark:

    # 1 gpu: 
    # only training fits with this BS, eval needs a smaller BS
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=1 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 31.0897, 'train_samples_per_second': 0.257, 'epoch': 1.0}
    
    # 2 gpus:
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=2 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 17.9026, 'train_samples_per_second': 0.223, 'epoch': 1.0}
    
    # 4 gpus
    
    export BS=8; rm -rf output_dir; PYTHONPATH=../../src USE_TF=0 /usr/bin/time -v deepspeed --num_gpus=4 ./finetune_trainer.py --model_name_or_path t5-11b --output_dir output_dir --adam_eps 1e-06 --data_dir wmt_en_ro --do_eval --do_predict --do_train --evaluation_strategy=steps --freeze_embeds --label_smoothing 0.1 --learning_rate 3e-5 --logging_first_step --logging_steps 1000 --max_source_length 128 --max_target_length 128 --num_train_epochs 1 --overwrite_output_dir --per_device_eval_batch_size $BS --per_device_train_batch_size $BS --predict_with_generate --eval_steps 25000  --sortish_sampler --task translation_en_to_ro --test_max_target_length 128 --val_max_target_length 128 --warmup_steps 5 --n_train 60 --n_val 10 --n_test 10 --deepspeed ds_config.json --fp16
    
    {'train_runtime': 10.4404, 'train_samples_per_second': 0.192, 'epoch': 1.0}
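    To put the three train_runtime values side by side, the following sketch computes the speedup relative to the 1-gpu run and the resulting parallel efficiency. The scaling helper is hypothetical, and with only 60 training samples the numbers are indicative at best.

```python
# train_runtime values in seconds, copied from the three runs above.
runtimes = {1: 31.0897, 2: 17.9026, 4: 10.4404}


def scaling(runtimes):
    """Return {num_gpus: (speedup vs. 1 gpu, parallel efficiency)}."""
    base = runtimes[1]
    return {n: (base / t, base / t / n) for n, t in runtimes.items()}


for n, (speedup, efficiency) in scaling(runtimes).items():
    print(f"{n} gpu(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```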
    

    Checkpointing should allow making even bigger batch sizes.

    DeepSpeed 
    opened by stas00 65
  • FP16 overflow with GPT-Neo when using sequence lengths of 2048.

    Environment info

    • transformers version: 4.5.0.dev0
    • Platform: Linux-5.4.0-54-generic-x86_64-with-glibc2.29
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.8.0+cu111
    • Tensorflow version (GPU?): N/A
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No

    Who can help

    @stas00

    Models:

    • GPT-Neo 1.3b

    Library:

    • deepspeed: @stas00

    Information

    Model I am using (Bert, XLNet ...):

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    1. Use GPT-Neo 1.3b with The Pile dataset and built in trainer. Artificial data also suffices. It does not matter what the data is, as long as the attention mask spans all 2048 tokens.
    2. Enable FP16 and set max_length to 2048
    3. Observe that all losses reported are NaN

    Also reproducible using AMP or DeepSpeed. It seems like there is code to circumvent this outlined in the GPT-Neo implementation, where q, k, v are cast to fp32 in the attention block.

    When the max_length is shorter (512) this overflow does not occur.
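    As background on why fp16 is fragile here, the sketch below emulates a dot product accumulated in half precision using only the standard library. It is an illustration of the fp16 range (largest finite value 65504), not a reproduction of the GPT-Neo code path: the issue does not say which operation overflows, and the helpers to_fp16 and dot_fp16 and the vector values are made up for the example.

```python
import struct


def to_fp16(x):
    """Round a float to the nearest fp16 value, mapping overflow to +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return float("inf") if x > 0 else float("-inf")


def dot_fp16(a, b):
    """Dot product where every product and partial sum is rounded to fp16."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp16(acc + to_fp16(to_fp16(x) * to_fp16(y)))
    return acc


short = dot_fp16([30.0] * 16, [30.0] * 16)    # stays within the fp16 range
long_ = dot_fp16([30.0] * 128, [30.0] * 128)  # exact result 115200 exceeds 65504
print(short, long_)

# Once an inf appears, subtracting a maximum (as a softmax does) yields NaN.
print(long_ - long_)
```

    Accumulating the same products in fp32 keeps the result finite, which is consistent with the cast to fp32 mentioned above.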

    Expected behavior

    I expected no overflows.

    Aside

    I'm reaching out on behalf of EleutherAI, Lysandre told us to create an issue about this.

    opened by LouisCastricato 62
  • [deepspeed] `bigscience/T0*` multi-gpu inference with ZeRO

    Environment info

    • transformers version: 4.17.0.dev0
    • Platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.10
    • Python version: 3.8.0
    • PyTorch version (GPU?): 1.10.1 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: yes (deepspeed)
    • Note: I installed DeepSpeed from source

    Who can help

    Models: (I'm actually trying to use T0pp but T5 is close enough)

    • T5, BART, Marian, Pegasus, EncoderDecoder: @patrickvonplaten

    Library:

    • Deepspeed: @stas00
    • Text generation: @patrickvonplaten @narsil

    Information

    Model I am using (Bert, XLNet ...): T0pp / T0_3B

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [X] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)

    To reproduce

    I want to load T0pp across 2 24GB GPUs and only run inference. From reading the documentation, I know DeepSpeed with ZeRO stage 3 is the way to go for this. I am following the HuggingFace example here to use DeepSpeed without a Trainer object.

    The error I get is

    [2022-01-28 18:36:41,193] [INFO] [partition_parameters.py:456:__exit__] finished initializing model with 2.85B parameters
    Traceback (most recent call last):
      File "multi_gpu_T0pp.py", line 26, in <module>
        engine = deepspeed.initialize(model=model, config_params=ds_config)
    AttributeError: module 'transformers.deepspeed' has no attribute 'initialize'
    

    My code:

    Run with CUDA_VISIBLE_DEVICES="0,1" deepspeed <script.py>

    """
    Example code to load a PyTorch model across GPUs
    """
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    from transformers.deepspeed import HfDeepSpeedConfig
    from transformers import deepspeed
    import pandas as pd
    import torch
    import pdb
    import os
    
    seed = 42
    torch.manual_seed(seed)
    
    ds_config = {
        "fp16": {
            "enabled": "auto",
            "loss_scale": 0,
            "loss_scale_window": 1000,
            "initial_scale_power": 16,
            "hysteresis": 2,
            "min_loss_scale": 1
        },
        "zero_optimization": {
            "stage": 3,
            "overlap_comm": true,
            "contiguous_gradients": true,
            "sub_group_size": 1e9,
            "reduce_bucket_size": "auto",
            "stage3_prefetch_bucket_size": "auto",
            "stage3_param_persistence_threshold": "auto",
            "stage3_max_live_parameters": 1e9,
            "stage3_max_reuse_distance": 1e9,
            "stage3_gather_fp16_weights_on_model_save": true
        },
        "gradient_accumulation_steps": 1,
        "gradient_clipping": 0,
        "steps_per_print": 2000,
        "train_batch_size": 2,
        "train_micro_batch_size_per_gpu": 1,
        "wall_clock_breakdown": false
    }
    
    if __name__ == "__main__":
        # must run before instantiating the model
        # ds_config is deepspeed config object or path to the file
        dschf = HfDeepSpeedConfig(ds_config)  # keep this object alive
    
        model_name = "bigscience/T0_3B"
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
    
        engine = deepspeed.initialize(model=model, config_params=ds_config)
    
        inputs = tokenizer.encode(
            "Is this review positive or negative? Review: this is the best cast iron skillet you will ever buy",
            return_tensors="pt")
        outputs = model.generate(inputs)
        print(tokenizer.decode(outputs[0]))
    

    Expected behavior

    T0pp (or T0_3B) to load across 2 GPUs, generate an answer, and then quit.
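    The traceback names transformers.deepspeed, which suggests that `from transformers import deepspeed` bound the name deepspeed to the integration submodule rather than to the standalone package that provides initialize. That reading is an inference from the error message, not something confirmed in the issue. The sketch below reproduces the same kind of name shadowing with two made-up modules, toolkit and wrapper.toolkit, built in memory so that it runs without either library installed.

```python
import sys
import types

# Hypothetical stand-ins: "toolkit" plays the standalone library and
# "wrapper.toolkit" plays an integration submodule that shares its name.
toolkit = types.ModuleType("toolkit")
toolkit.initialize = lambda **kwargs: "engine"
wrapper = types.ModuleType("wrapper")
wrapper_toolkit = types.ModuleType("wrapper.toolkit")
wrapper.toolkit = wrapper_toolkit
sys.modules.update(
    {"toolkit": toolkit, "wrapper": wrapper, "wrapper.toolkit": wrapper_toolkit}
)

from wrapper import toolkit as shadowed  # binds the submodule
import toolkit as standalone             # binds the standalone library


def try_initialize(module):
    """Call module.initialize and report an AttributeError as text."""
    try:
        return module.initialize(model=None)
    except AttributeError as err:
        return f"AttributeError: {err}"


print(try_initialize(standalone))
print(try_initialize(shadowed))
```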

    DeepSpeed 
    opened by AADeLucia 57
  • How to use fine-tuned BART for prediction?

    ❓ Questions & Help

    Details

    I fine-tuned the BART model on a custom summarization dataset using the transformers/examples/summarization/bart/finetune.py and transformers/examples/summarization/bart/run_train.sh files in the repository for training (which generated three checkpointepoch=*.ckpt files) and prediction (which generated a .txt file with the test loss scores).

    I have two questions on using this model for prediction:

    • How can I modify finetune.py to generate predictions for the test set, in addition to the loss scores? I see some test functions in finetune.py, but I'm not sure how to use these for generating a .txt file with the predictions.

    • How can I load the generated .ckpt files into BartForConditionalGeneration()? A config.json file was not generated along with the checkpoint files; there doesn't seem to be a TFBartForConditionalGeneration; and the convert_tf_checkpoint_to_pytorch.py script in the repo doesn't seem to support BART yet.

    Thank you for your time!

    Discussion wontfix 
    opened by riacheruvu 56
  • Add TF ViT MAE

    This PR adds the MAE [1] model in TensorFlow. It was developed by @arig23498 and myself.

    Fun facts about this PR:

    • Probably the third pure vision model in TensorFlow in transformers.

    References:

    [1] Masked Autoencoders Are Scalable Vision Learners

    Update

    The PR is now ready for review. @gante @Rocketknight1 @sgugger

    opened by sayakpaul 49
  • Installation Error - Failed building wheel for tokenizers

    🐛 Bug

    Information

    Model I am using (Bert, XLNet ...): N/A

    Language I am using the model on (English, Chinese ...): N/A

    The problem arises when using:

    • [X] the official example scripts: (give details below)

    Problem arises in transformers installation on Microsoft Windows 10 Pro, version 10.0.17763

    After creating and activating the virtual environment, installing transformers is not possible, because the following error occurs:

    "error: can not find Rust Compiler" "ERROR: Failed building wheel for tokenizers" Failed to build tokenizers ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed d

    The tasks I am working on is: [X ] transformers installation

    To reproduce

    Steps to reproduce the behavior:

    1. From command line interface, create and activate a virtual environment by following the steps in this URL: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
    2. Install transformers from source, by following the example in the topic From Source on this URL: https://github.com/huggingface/transformers
    -m pip --version
    -m pip install --upgrade pip
    -m pip install --user virtualenv
    -m venv env
    .\env\Scripts\activate
    pip install transformers
    
    ERROR: Command errored out with exit status 1:
       command: 'c:\users\vbrandao\env\scripts\python.exe' 'c:\users\vbrandao\env\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\vbrandao\AppData\Local\Temp\tmpj6evjmze'
           cwd: C:\Users\vbrandao\AppData\Local\Temp\pip-install-sza2_lmj\tokenizers
      Complete output (10 lines):
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib
      creating build\lib\tokenizers
      copying tokenizers\__init__.py -> build\lib\tokenizers
      running build_ext
      running build_rust
      error: Can not find Rust compiler
      ----------------------------------------
      ERROR: Failed building wheel for tokenizers
    Failed to build tokenizers
    ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly
    
    

    Expected behavior

    Installation of transformers should be complete.

    Environment info

    • transformers version: N/A - installation step
    • Platform: Command Line Interface / Virtual Env
    • Python version: python 3.8
    • PyTorch version (GPU?): N/A
    • Tensorflow version (GPU?): N/A
    • Using GPU in script?: N/A
    • Using distributed or parallel set-up in script?: N/A
    wontfix Core: Tokenization Installation 
    opened by victorlongo 49
  • Add TFConvNextModel

    This PR adds the ConvNeXt [1] model in TensorFlow. It was developed by @arig23498, @gante, and myself.

    Fun facts about this PR:

    • Probably the first pure conv model in transformers.
    • Probably the second pure vision model in TensorFlow in transformers.

    References:

    [1] A ConvNet for the 2020s: https://arxiv.org/abs/2201.03545.

    @gante @LysandreJik @Rocketknight1

    opened by sayakpaul 48
  • Fix race condition on cleaning checkpoints when save_total_limit set to 1

    What does this PR do?

    This PR fixes #20988 by testing whether the worker process is allowed to save (self.args.should_save is set to True).

    Fixes #20988
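    The idea of the fix can be sketched outside the Trainer as follows. This is a simplified illustration, not the code from the PR: the functions sorted_checkpoints and rotate_checkpoints are hypothetical, should_save is passed in as a plain boolean standing in for self.args.should_save, and details such as protecting the best-model checkpoint are left out.

```python
import os
import re
import shutil


def sorted_checkpoints(output_dir):
    """Return the checkpoint-<step> directories in output_dir, oldest step first."""
    found = []
    for name in os.listdir(output_dir):
        match = re.fullmatch(r"checkpoint-(\d+)", name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


def rotate_checkpoints(output_dir, save_total_limit, should_save):
    """Delete the oldest checkpoints, but only on a process that is allowed to save."""
    if not should_save or save_total_limit is None or save_total_limit <= 0:
        return []  # other workers leave the shared directory alone
    checkpoints = sorted_checkpoints(output_dir)
    to_delete = checkpoints[: max(0, len(checkpoints) - save_total_limit)]
    for path in to_delete:
        shutil.rmtree(path)
    return to_delete
```

    With a shared filesystem, only one process passes should_save=True, so no two workers call shutil.rmtree on the same directory.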

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [X] Did you read the contributor guideline, Pull Request section?
    • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
    • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [ ] Did you write any new necessary tests?

    Who can review?

    Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

    • trainer: @sgugger
    opened by radcheb 0
  • [Multi-node setup] Race condition on deleting checkpoint when using shared filesystem and save_total_limit=1

    System Info

    When running training on a multi-node setup with a shared filesystem (shared PVC on Kubernetes), we use the following configuration (full example in the Reproduction section):

            load_best_model_at_end=True,
            save_on_each_node=False,
            save_total_limit=1,
    

    When the training has finished all epochs, it fails with a FileNotFoundError on a random file. It seems all the workers are trying to delete the same files when we set save_total_limit=1. This causes the whole training script to fail:

    FileNotFoundError: [Errno 2] No such file or directory: 'rng_state_1.pth'
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7796)
    ...
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
    
    Who can help?

    @sgugger

    Information

    • [X] The official example scripts
    • [X] My own modified scripts

    Tasks

    • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [ ] My own task or dataset (give details below)

    Reproduction

    I created the following Python script trainer_bug.py; it runs the GLUE cola training task on a small sample of data:

    # pip install transformers==4.25.1 datasets==2.8.0 torch==1.13.1 scipy scikit-learn
    import numpy as np
    from datasets import load_dataset, load_metric
    from transformers import AutoTokenizer
    from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer
    
    
    task = "cola"
    model_checkpoint = "distilbert-base-uncased"
    num_labels = 2
    batch_size = 2
    metric_name = "matthews_correlation"
    validation_key  = "validation"
    SAMPLE_N_ROWS = 10
    
    if __name__ == "__main__":
        dataset = load_dataset("glue", task)
        for split in dataset:
            dataset[split] = dataset[split].select(range(SAMPLE_N_ROWS))
        metric = load_metric('glue', task)
        tokenizer = AutoTokenizer.from_pretrained(model_checkpoint, use_fast=True)
        def preprocess_function(examples):
            return tokenizer(examples["sentence"], truncation=True)
    
    
        def compute_metrics(eval_pred):
            predictions, labels = eval_pred
            predictions = np.argmax(predictions, axis=1)
            return metric.compute(predictions=predictions, references=labels)
    
        encoded_dataset = dataset.map(preprocess_function, batched=True)
        model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=num_labels)
    
        model_name = model_checkpoint.split("/")[-1]
    
        args = TrainingArguments(
            f"{model_name}-finetuned-{task}",
            evaluation_strategy="epoch",
            save_strategy="epoch",
            learning_rate=2e-5,
            per_device_train_batch_size=batch_size,
            per_device_eval_batch_size=batch_size,
            num_train_epochs=3,
            weight_decay=0.01,
            report_to="none",
            metric_for_best_model=metric_name,
            overwrite_output_dir=True,
            load_best_model_at_end=True,
            log_on_each_node=False,
            save_on_each_node=False,
            save_total_limit=1,
            # For a distributed CPU setup
            no_cuda=True,
            xpu_backend="gloo",
        )
    
        trainer = Trainer(
            model,
            args,
            train_dataset=encoded_dataset["train"],
            eval_dataset=encoded_dataset[validation_key],
            tokenizer=tokenizer,
            compute_metrics=compute_metrics
        )
    
        trainer.train()
    

    And then run it with this script, trainer_bug.sh, to simulate a 2-node setup on CPUs:

    WORLD_SIZE=2
    PROC_PER_NODE=1
    MASTER_HOSTNAME=localhost
    MASTER_PORT=12345
    
    # Run worker
    RANK=1
    CUDA_VISIBLE_DEVICES="" torchrun --nnodes=$WORLD_SIZE --nproc_per_node=$PROC_PER_NODE \
                --node_rank=$RANK --master_addr=$MASTER_HOSTNAME \
                --master_port=$MASTER_PORT \
                trainer_bug.py &
    
    # Run master
    RANK=0
    CUDA_VISIBLE_DEVICES="" torchrun --nnodes=$WORLD_SIZE --nproc_per_node=$PROC_PER_NODE \
                --node_rank=$RANK --master_addr=$MASTER_HOSTNAME \
                --master_port=$MASTER_PORT \
                trainer_bug.py
    
    

    Expected behavior

    The training is expected to finish successfully. However it fails with the following stack trace:

    Loading best model from distilbert-base-uncased-finetuned-cola/checkpoint-3 (score: 0.0).
    {'train_runtime': 24.6088, 'train_samples_per_second': 1.219, 'train_steps_per_second': 0.366, 'train_loss': 0.5689484278361002, 'epoch': 3.0}{'train_runtime': 24.6164, 'train_samples_per_second': 1.219, 'train_steps_per_second': 0.366, 'train_loss': 0.5813997056749132, 'epoch': 3.0}
    
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:24<00:00,  1.83s/it]
    Deleting older checkpoint [distilbert-base-uncased-finetuned-cola/checkpoint-9] due to args.save_total_limit
    100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:24<00:00,  2.74s/it]
    Traceback (most recent call last):
      File "trainer_bug.py", line 66, in <module>
        trainer.train()
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/transformers/trainer.py", line 1527, in train
        return inner_training_loop(
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/transformers/trainer.py", line 1920, in _inner_training_loop
        shutil.rmtree(checkpoint)
      File "/home/XXX/.pyenv/versions/3.8.13/lib/python3.8/shutil.py", line 718, in rmtree
        _rmtree_safe_fd(fd, path, onerror)
      File "/home/XXX/.pyenv/versions/3.8.13/lib/python3.8/shutil.py", line 675, in _rmtree_safe_fd
        onerror(os.unlink, fullname, sys.exc_info())
      File "/home/XXX/.pyenv/versions/3.8.13/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
        os.unlink(entry.name, dir_fd=topfd)
    FileNotFoundError: [Errno 2] No such file or directory: 'rng_state_1.pth'
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 7796) of binary: /home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/bin/python
    Traceback (most recent call last):
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/bin/torchrun", line 8, in <module>
        sys.exit(main())
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
        return f(*args, **kwargs)
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
        run(args)
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
        elastic_launch(
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/home/XXX/.cache/pypoetry/virtualenvs/XXX-training-zu6czGQ--py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
    ============================================================
    trainer_bug.py FAILED
    ------------------------------------------------------------
    Failures:
      <NO_OTHER_FAILURES>
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2023-01-03_18:28:49
      host      : XXXXXX
      rank      : 1 (local_rank: 0)
      exitcode  : 1 (pid: 7796)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
    
    opened by radcheb 0
  • Hugging Face Dies Silently when Memory insufficient for loading Model / Training Model

    Currently, when you load a model that is too large for memory, or try to train a model with insufficient memory, the process gets killed without an error message. It's a bit tough to track down what is going on as a result. I'm wondering if you can add an error message similar to the one PyTorch gives when there is insufficient memory to run a given process?

    opened by courtneysprouse 2
  • Added mask_time_prob and mask_time_length arguments to wav2vec2 pretraining script

    This PR relates to PR 19997, in which I messed up the PR by forgetting the --force flag when pushing. Hopefully this PR is correctly performed.

    @sanchit-gandhi @sgugger @patrickvonplaten

    opened by mpierrau 1
  • Add DETA

    What does this PR do?

    This PR adds DETA. DETA makes a slight change to Deformable DETR: it uses traditional IoU-based assignment as opposed to the Hungarian matching used in the original DETR, and incorporates NMS (non-maximum suppression) in the postprocessing.

    Note: this model has a torchvision dependency for NMS.

    opened by NielsRogge 0
  • Adding Support for Mixed Precision in Accelerator

    There's a bug in the code: we check accelerator.use_fp16, but that flag can never be True because we didn't pass it in. I've added the support by passing in the fp16 flag.

    What does this PR do?

    Fixes # (issue)

    Before submitting

    • [x] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [x] Did you read the contributor guideline, Pull Request section?
    • [ ] Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
    • [ ] Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
    • [ ] Did you write any new necessary tests?

    Who can review?

    Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

    opened by BiEchi 1
Releases(v4.25.1)
  • v4.25.1(Dec 2, 2022)

    PyTorch 2.0 stack support

    We are very excited by the newly announced PyTorch 2.0 stack. You can enable torch.compile on any of our models, and get support with the Trainer (and in all our PyTorch examples) by using the torchdynamo training argument. For instance, just add --torchdynamo inductor when launching those examples from the command line.

    This API is still experimental and may be subject to changes as the PyTorch 2.0 stack matures.

    Note that to get the best performance, we recommend:

    • using an Ampere GPU (or more recent)
    • sticking to fixed shapes for now (so use --pad_to_max_length in our examples)
    • Repurpose torchdynamo training args towards torch._dynamo by @sgugger in #20498

    Audio Spectrogram Transformer

    The Audio Spectrogram Transformer model was proposed in AST: Audio Spectrogram Transformer by Yuan Gong, Yu-An Chung, James Glass. The Audio Spectrogram Transformer applies a Vision Transformer to audio, by turning audio into an image (spectrogram). The model obtains state-of-the-art results for audio classification.

    • Add Audio Spectogram Transformer by @NielsRogge in #19981

    Jukebox

    The Jukebox model was proposed in Jukebox: A generative model for music by Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. It introduces a generative music model which can produce minute-long samples that can be conditioned on an artist, genres and lyrics.

    • Add Jukebox model (replaces #16875) by @ArthurZucker in #17826

    Switch Transformers

    The SwitchTransformers model was proposed in Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity by William Fedus, Barret Zoph, Noam Shazeer.

    It is the first MoE model supported in transformers, with the largest checkpoint currently available containing 1T parameters.

    • Add Switch transformers by @younesbelkada and @ArthurZucker in #19323

    RocBert

    The RoCBert model was proposed in RoCBert: Robust Chinese Bert with Multimodal Contrastive Pretraining by Hui Su, Weiwei Shi, Xiaoyu Shen, Xiao Zhou, Tuo Ji, Jiarui Fang, Jie Zhou. It’s a pretrained Chinese language model that is robust under various forms of adversarial attacks.

    • Add RocBert by @sww9370 in #20013

    CLIPSeg

    The CLIPSeg model was proposed in Image Segmentation Using Text and Image Prompts by Timo Lüddecke and Alexander Ecker. CLIPSeg adds a minimal decoder on top of a frozen CLIP model for zero- and one-shot image segmentation.

    • Add CLIPSeg by @NielsRogge in #20066

    NAT and DiNAT

    NAT

    NAT was proposed in Neighborhood Attention Transformer by Ali Hassani, Steven Walton, Jiachen Li, Shen Li, and Humphrey Shi.

    It is a hierarchical vision transformer based on Neighborhood Attention, a sliding-window self-attention pattern.

    DiNAT

    DiNAT was proposed in Dilated Neighborhood Attention Transformer by Ali Hassani and Humphrey Shi.

    It extends NAT by adding a Dilated Neighborhood Attention pattern to capture global context, and shows significant performance improvements over it.

    • Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models by @alihassanijr in #20219
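    The sliding-window and dilated patterns described above can be sketched in one dimension. This is an illustrative simplification, not the library's implementation: the helper name neighborhood is hypothetical, the real models operate on 2D feature maps, and the sketch assumes that windows are shifted inwards at the borders so every query keeps the same number of keys.

```python
def neighborhood(i, length, kernel_size, dilation=1):
    """Return the key positions that query position i attends to (1D sketch)."""
    # Positions sharing i's residue modulo the dilation form the candidate group.
    group = list(range(i % dilation, length, dilation))
    if kernel_size > len(group):
        raise ValueError("not enough positions for this kernel size and dilation")
    # Centre the window on i, then shift it inwards at the borders so that
    # every query still sees exactly kernel_size keys.
    centre = i // dilation
    start = min(max(centre - kernel_size // 2, 0), len(group) - kernel_size)
    return group[start:start + kernel_size]


print(neighborhood(4, 8, 3))              # [3, 4, 5]  plain sliding window
print(neighborhood(0, 8, 3))              # [0, 1, 2]  window shifted at the border
print(neighborhood(4, 8, 3, dilation=2))  # [2, 4, 6]  dilated window, wider reach
```

    With dilation=1 the helper reproduces a plain sliding window; larger dilations keep the number of keys fixed while spreading them further apart, which is how the dilated variant reaches more distant context.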

    MobileNetV2

    The MobileNetV2 model was proposed in MobileNetV2: Inverted Residuals and Linear Bottlenecks by Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.

    • add MobileNetV2 model by @hollance in #17845

    MobileNetV1

    The MobileNetV1 model was proposed in MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications by Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.

    • add MobileNetV1 model by @hollance in #17799

    Image processors

    Image processors replace feature extractors as the processing class for computer vision models.

    Important changes:

    • size parameter is now a dictionary of the form {"height": h, "width": w}, {"shortest_edge": s}, or {"shortest_edge": s, "longest_edge": l} instead of an int or tuple.
    • Addition of data_format flag. You can now specify if you want your images to be returned in "channels_first" - NCHW - or "channels_last" - NHWC - format.
    • Processing flags e.g. do_resize can be passed directly to the preprocess method instead of modifying the class attribute: image_processor([image_1, image_2], do_resize=False, return_tensors="pt", data_format="channels_last")
    • Leaving return_tensors unset will return a list of numpy arrays.

    The classes are backwards compatible and can be created using existing feature extractor configurations - with the size parameter converted.
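    As a rough illustration of what converting a legacy size value into the new dictionary form can look like, here is a minimal sketch. The helper to_size_dict and its default_to_square flag are hypothetical names used for illustration only, and the sketch assumes that a bare int meant a square image and that a tuple was ordered (height, width); the actual conversion depends on the individual image processor.

```python
def to_size_dict(size, default_to_square=True):
    """Convert a legacy int/tuple size into the dictionary form (hypothetical helper)."""
    if isinstance(size, dict):
        return dict(size)  # already in the new format
    if isinstance(size, int):
        # Assumption: a bare int meant a square image unless stated otherwise.
        if default_to_square:
            return {"height": size, "width": size}
        return {"shortest_edge": size}
    if isinstance(size, (tuple, list)) and len(size) == 2:
        # Assumption: legacy tuples are ordered (height, width).
        height, width = size
        return {"height": height, "width": width}
    raise ValueError(f"Unsupported size: {size!r}")


print(to_size_dict(224))                           # {'height': 224, 'width': 224}
print(to_size_dict(256, default_to_square=False))  # {'shortest_edge': 256}
print(to_size_dict((480, 640)))                    # {'height': 480, 'width': 640}
```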

    • Add Image Processors by @amyeroberts in #19796
    • Add Donut image processor by @amyeroberts in #20425
    • Add segmentation + object detection image processors by @amyeroberts in #20160
    • AutoImageProcessor by @amyeroberts in #20111

    Backbone for computer vision models

    We're adding support for a general AutoBackbone class, which turns any vision model (like ConvNeXt, Swin Transformer) into a backbone to be used with frameworks like DETR and Mask R-CNN. The design is in early stages and we welcome feedback.

    • Add AutoBackbone + ResNetBackbone by @NielsRogge in #20229
    • Improve backbone by @NielsRogge in #20380
    • [AutoBackbone] Improve API by @NielsRogge in #20407

    Support for safetensors offloading

    If the model you are using has a safetensors checkpoint and you have the library installed, offload to disk will take advantage of this to be more memory efficient and roughly 33% faster.

    • Safetensors offload by @sgugger in #20321

    Contrastive search in the generate method

    • Generate: TF contrastive search with XLA support by @gante in #20050
    • Generate: contrastive search with full optional outputs by @gante in #19963

    Breaking changes

    • 🚨 🚨 🚨 Fix Issue 15003: SentencePiece Tokenizers Not Adding Special Tokens in convert_tokens_to_string by @beneyal in #15775

    Bugfixes and improvements

    • add dataset by @stevhliu in #20005
    • Add BERT resources by @stevhliu in #19852
    • Add LayoutLMv3 resource by @stevhliu in #19932
    • fix typo by @stevhliu in #20006
    • Update object detection pipeline to use post_process_object_detection methods by @alaradirik in #20004
    • clean up vision/text config dict arguments by @ydshieh in #19954
    • make sentencepiece import conditional in bertjapanesetokenizer by @ripose-jp in #20012
    • Fix gradient checkpoint test in encoder-decoder by @ydshieh in #20017
    • Quality by @sgugger in #20002
    • Update auto processor to check image processor created by @amyeroberts in #20021
    • [Doctest] Add configuration_deberta_v2.py by @Saad135 in #19995
    • Improve model tester by @ydshieh in #19984
    • Fix doctest by @ydshieh in #20023
    • Show installed libraries and their versions in CI jobs by @ydshieh in #20026
    • reorganize glossary by @stevhliu in #20010
    • Now supporting pathlike in pipelines too. by @Narsil in #20030
    • Add **kwargs by @amyeroberts in #20037
    • Fix some doctests after PR 15775 by @ydshieh in #20036
    • [Doctest] Add configuration_camembert.py by @Saad135 in #20039
    • [Whisper Tokenizer] Make more user-friendly by @sanchit-gandhi in #19921
    • [FuturWarning] Add futur warning for LEDForSequenceClassification by @ArthurZucker in #19066
    • fix jit trace error for model forward sequence is not aligned with jit.trace tuple input sequence, update related doc by @sywangyi in #19891
    • Update esmfold conversion script by @Rocketknight1 in #20028
    • Fixed torch.finfo issue with torch.fx by @michaelbenayoun in #20040
    • Only resize embeddings when necessary by @sgugger in #20043
    • Speed up TF token classification postprocessing by converting complete tensors to numpy by @deutschmn in #19976
    • Fix ESM LM head test by @Rocketknight1 in #20045
    • Update README.md by @bofenghuang in #20063
    • fix tokenizer_type to avoid error when loading checkpoint back by @pacman100 in #20062
    • [Trainer] Fix model name in push_to_hub by @sanchit-gandhi in #20064
    • PoolformerImageProcessor defaults to match previous FE by @amyeroberts in #20048
    • change constant torch.tensor to torch.full by @MerHS in #20061
    • Update READMEs for ESMFold and add notebooks by @Rocketknight1 in #20067
    • Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 by @jordiclive in #20068
    • Allow passing arguments to model testers for CLIP-like models by @ydshieh in #20044
    • Show installed libraries and their versions in GA jobs by @ydshieh in #20069
    • Update defaults and logic to match old FE by @amyeroberts in #20065
    • Update modeling_tf_utils.py by @cakiki in #20076
    • Update hub.py by @cakiki in #20075
    • [Doctest] Add configuration_dpr.py by @Saad135 in #20080
    • Removing RobertaConfig inheritance from CamembertConfig by @Saad135 in #20059
    • Skip 2 tests in VisionTextDualEncoderProcessorTest by @ydshieh in #20098
    • Replace unsupported facebookresearch/bitsandbytes by @tomaarsen in #20093
    • docs: Resolve many typos in the English docs by @tomaarsen in #20088
    • use huggingface_hub.model_inifo() to get pipline_tag by @y-tag in #20077
    • Fix generate_dummy_inputs for ImageGPTOnnxConfig by @ydshieh in #20103
    • docs: Fixed variables in f-strings by @tomaarsen in #20087
    • Add new terms to the glossary by @stevhliu in #20051
    • Replace awkward timm link with the expected one by @tomaarsen in #20109
    • Fix AutoTokenizer with subfolder passed by @sgugger in #20110
    • [Audio Processor] Only pass sr to feat extractor by @sanchit-gandhi in #20022
    • Update github pr docs actions by @mishig25 in #20125
    • Adapt has_labels test when no labels were found by @sgugger in #20113
    • Improve tiny model creation script by @ydshieh in #20119
    • Remove BertConfig inheritance from RobertaConfig by @Saad135 in #20124
    • [Swin] Add Swin SimMIM checkpoints by @NielsRogge in #20034
    • Update CLIPSegModelTester by @ydshieh in #20134
    • Update SwinForMaskedImageModeling doctest values by @amyeroberts in #20139
    • Attempting to test automatically the _keys_to_ignore. by @Narsil in #20042
    • Generate: move generation_*.py src files into generation/*.py by @gante in #20096
    • add cv + audio labels by @stevhliu in #20114
    • Update VisionEncoderDecoder to use an image processor by @amyeroberts in #20137
    • [CLIPSeg] Add resources by @NielsRogge in #20118
    • Make DummyObject more robust by @mariosasko in #20146
    • Add RoCBertTokenizer to TOKENIZER_MAPPING_NAMES by @ydshieh in #20141
    • Adding support for LayoutLMvX variants for object-detection. by @Narsil in #20143
    • Add doc tests by @NielsRogge in #20158
    • doc comment fix: Args was in wrong place by @hollance in #20164
    • Update OnnxConfig.generate_dummy_inputs to check ImageProcessingMixin by @ydshieh in #20157
    • Generate: fix TF doctests by @gante in #20159
    • Fix arg names for our models by @Rocketknight1 in #20166
    • [processor] Add 'model input names' property by @sanchit-gandhi in #20117
    • Fix object-detection bug (height, width inversion). by @Narsil in #20167
    • [OWL-ViT] Make model consistent with CLIP by @NielsRogge in #20144
    • Fix type - update any PIL.Image.Resampling by @amyeroberts in #20172
    • Fix tapas scatter by @Bearnardd in #20149
    • Update README.md by @code-with-rajeev in #19530
    • Proposal Remove the weird inspect in ASR pipeline and make WhisperEncoder just nice to use. by @Narsil in #19571
    • Pytorch type hints by @IMvision12 in #20112
    • Generate: TF sample doctest result update by @gante in #20208
    • [ROC_BERT] Make CI happy by @younesbelkada in #20175
    • add _keys_to_ignore_on_load_unexpected = [r"pooler"] by @ArthurZucker in #20210
    • docs: translated index page to korean by @wonhyeongseo in #20180
    • feat: add i18n issue template by @wonhyeongseo in #20199
    • [Examples] Generalise Seq2Seq ASR to handle Whisper by @sanchit-gandhi in #19519
    • mark test_save_load_fast_init_from_base as is_flaky by @ydshieh in #20200
    • Update README.md by @Nietism in #20188
    • Downgrade log warning -> info by @amyeroberts in #20202
    • Generate: add Bloom fixes for contrastive search by @gante in #20213
    • Adding chunking for whisper (all seq2seq actually). Very crude matching algorithm. by @Narsil in #20104
    • [docs] set overflowing image width to auto-scale by @wonhyeongseo in #20197
    • Update tokenizer_summary.mdx by @bofenghuang in #20135
    • Make ImageSegmentationPipelineTests less flaky by @ydshieh in #20147
    • update relative positional embedding by @ArthurZucker in #20203
    • [WHISPER] Update modeling tests by @ArthurZucker in #20162
    • Add accelerate support for ViT family by @younesbelkada in #20174
    • Add param_name to size_dict logs & tidy by @amyeroberts in #20205
    • Add object detection + segmentation transforms by @amyeroberts in #20003
    • Typo on doctring in ElectraTokenizer by @FacerAin in #20192
    • Remove authorized_missing_keys in favor of _keys_to_ignore_on_load_missing by @ArthurZucker in #20228
    • Add missing ESM autoclass by @Rocketknight1 in #20177
    • fix device issue by @ydshieh in #20227
    • fixed spelling error in testing.mdx by @kasmith11 in #20220
    • Fix run_clip.py by @ydshieh in #20234
    • Fix docstring of CLIPTokenizer(Fast) by @TilmannR in #20233
    • Fix MaskformerFeatureExtractor by @NielsRogge in #20100
    • New logging support to "Trainer" Class (ClearML Logger) by @skinan in #20184
    • Enable PyTorch 1.13 by @sgugger in #20168
    • [CLIP] allow loading projection layer in vision and text model by @patil-suraj in #18962
    • Slightly alter Keras dummy loss by @Rocketknight1 in #20232
    • Add to DeBERTa resources by @Saad135 in #20155
    • Add clip resources to the transformers documentation by @ambujpawar in #20190
    • Update reqs to include min gather_for_metrics Accelerate version by @muellerzr in #20242
    • Allow trainer to return eval. loss for CLIP-like models by @ydshieh in #20214
    • Adds image-guided object detection support to OWL-ViT by @alaradirik in #20136
    • Adding audio-classification example in the doc. by @Narsil in #20235
    • Updating the doctest for conversational. by @Narsil in #20236
    • Adding doctest for fill-mask pipeline. by @Narsil in #20241
    • Adding doctest for feature-extraction. by @Narsil in #20240
    • Adding ASR pipeline example. by @Narsil in #20226
    • Adding doctest for document-question-answering by @Narsil in #20239
    • Adding an example for depth-estimation pipeline. by @Narsil in #20237
    • Complete doc migration by @mishig25 in #20267
    • Fix result saving errors of pytorch examples by @li-plus in #20276
    • Adding a doctest for table-question-answering pipeline. by @Narsil in #20260
    • Adding doctest for image-segmentation pipeline. by @Narsil in #20256
    • Adding doctest for text2text-generation pipeline. by @Narsil in #20261
    • Adding doctest for text-generation pipeline. by @Narsil in #20264
    • Add TF protein notebook to notebooks doc by @Rocketknight1 in #20271
    • Rephrasing the link. by @Narsil in #20253
    • Add Chinese-CLIP implementation by @yangapku in #20368
    • Adding doctest example for image-classification pipeline. by @Narsil in #20254
    • Adding doctest for zero-shot-image-classification pipeline. by @Narsil in #20272
    • Adding doctest for zero-shot-classification pipeline. by @Narsil in #20268
    • Adding doctest for visual-question-answering pipeline. by @Narsil in #20266
    • Adding doctest for text-classification pipeline. by @Narsil in #20262
    • Adding doctest for question-answering pipeline. by @Narsil in #20259
    • [Docs] Add resources of OpenAI GPT by @shogohida in #20084
    • Adding doctest for image-to-text pipeline. by @Narsil in #20257
    • Adding doctest for token-classification pipeline. by @Narsil in #20265
    • remaining pytorch type hints by @IMvision12 in #20217
    • Data collator for token classification pads labels column when receives pytorch tensors by @markovalexander in #20244
    • [Doctest] Add configuration_deformable_detr.py by @Saad135 in #20273
    • Fix summarization script by @muellerzr in #20286
    • [DOCTEST] Fix the documentation of RoCBert by @ArthurZucker in #20142
    • [bnb] Let's warn users when saving 8-bit models by @younesbelkada in #20282
    • Adding zero-shot-object-detection pipeline doctest. by @Narsil in #20274
    • Adding doctest for object-detection pipeline. by @Narsil in #20258
    • Image transforms functionality used instead by @amyeroberts in #20278
    • TF: add test for PushToHubCallback by @gante in #20231
    • Generate: general TF XLA constrastive search are now slow tests by @gante in #20277
    • Fixing the doctests failures. by @Narsil in #20294
    • set the default cache_enable to True, aligned with the default value in pytorch cpu/cuda amp autocast by @sywangyi in #20289
    • Add docstrings for canine model by @raghavanone in #19457
    • Add missing report button for Example test by @ydshieh in #20293
    • refactor test by @younesbelkada in #20300
    • [Tiny model creation] deal with ImageProcessor by @ydshieh in #20298
    • Fix blender bot missleading doc by @ArthurZucker in #20301
    • remove two tokens that should not be suppressed by @ArthurZucker in #20302
    • [ASR Examples] Update README for Whisper by @sanchit-gandhi in #20230
    • Add padding image transformation by @amyeroberts in #19838
    • Pin TensorFlow by @sgugger in #20313
    • Add AnyPrecisionAdamW optimizer by @atturaioe in #18961
    • [Proposal] Breaking change zero-shot-object-detection for improved consistency. by @Narsil in #20280
    • Fix flakey test with seed by @muellerzr in #20318
    • Pin TF 2.10.1 for Push CI by @ydshieh in #20319
    • Remove double brackets by @stevhliu in #20307
    • TF: future proof our keras imports by @gante in #20317
    • organize pipelines by modality by @stevhliu in #20306
    • Fix torch device issues by @ydshieh in #20304
    • Generate: add generation config class by @gante in #20218
    • translate zh quicktour(#20095) by @bfss in #20181
    • Add Spanish translation of serialization.mdx by @donelianc in #20245
    • Add LayerScale to NAT/DiNAT by @alihassanijr in #20325
    • [Switch Transformers] Fix failing slow test by @younesbelkada in #20346
    • fix: "BigSicence" typo in docs by @rajrajhans in #20331
    • Generate: model_kwargs can also be an input to prepare_inputs_for_generation by @gante in #20353
    • Update Special Language Tokens for PLBART by @jordiclive in #19980
    • Add resources by @NielsRogge in #20296
    • Enhance HfArgumentParser functionality and ease of use by @konstantinjdobler in #20323
    • Add inference section to task guides by @stevhliu in #18781
    • Fix toctree for Section 3 in Spanish Documentation by @donelianc in #20360
    • Generate: shorter XLA contrastive search tests by @gante in #20354
    • revert keys_to_ignore for M2M100 by @younesbelkada in #20381
    • add accelerate support for ESM by @younesbelkada in #20379
    • Fix nightly runs by @sgugger in #20352
    • Optimizes DonutProcessor token2json method for speed by @michaelnation26 in #20283
    • Indicate better minimal version of PyTorch in big model inference by @sgugger in #20385
    • Fix longformer onnx broken export by @fxmarty in #20292
    • Use tiny models for ONNX tests - text modality by @lewtun in #20333
    • [ESM] fix accelerate tests for esmfold by @younesbelkada in #20387
    • Generate: fix plbart generation tests by @gante in #20391
    • [bloom] convert script tweaks by @stas00 in #18593
    • Fix doctest file path by @ydshieh in #20400
    • [Image Transformers] to_pil fix float edge cases by @patrickvonplaten in #20406
    • make daily CI happy by @younesbelkada in #20410
    • fix nasty bnb bug by @younesbelkada in #20408
    • change the way sentinel tokens can retrived by @raghavanone in #20373
    • [BNB] Throw ValueError when trying to cast or assign by @younesbelkada in #20409
    • Use updated model_max_length when saving tokenizers by @ydshieh in #20401
    • Add Spanish translation of pr_checks.mdx by @donelianc in #20339
    • fix device in longformer onnx path by @fxmarty in #20419
    • Fix ModelOutput instantiation when there is only one tuple by @sgugger in #20416
    • accelerate support for OwlViT by @younesbelkada in #20411
    • [AnyPrecisionAdamW] test fix by @stas00 in #20454
    • fix word_to_tokens docstring format by @SaulLu in #20450
    • Fix typo in FSMT Tokenizer by @kamalkraj in #20456
    • Fix device issues in CLIPSegModelIntegrationTest by @ydshieh in #20467
    • Fix links for contrastive_loss by @ydshieh in #20455
    • Fix doctests for audio models by @ydshieh in #20468
    • Fix ESM checkpoints for tests by @Rocketknight1 in #20436
    • More TF int dtype fixes by @Rocketknight1 in #20384
    • make tensors in function build_relative_position created on proper device instead of always on cpu by @qq775294390 in #20434
    • update cpu related doc by @sywangyi in #20444
    • with pytorch cpu only version. without --no_cuda, using --bf16 will trigger error like "Your setup doesn't support bf16/gpu. You need torch>=1.10, using Ampere GPU with cuda>=11.0" by @sywangyi in #20445
    • [CLIPTokenizer] Improve warning by @patrickvonplaten in #20458
    • Replace assertions with value errors on distilbert model by @JuheonChu in #20463
    • [Doctest] Add configuration_fsmt.py by @sha016 in #19936
    • Replace assertion with ValueError exceptions in run_image_captioning_flax.py by @katiele47 in #20365
    • [FLAX] Add dtype to embedding for bert/bart/opt/t5 by @merrymercy in #20340
    • fix both failing RoCBert tests by @ArthurZucker in #20469
    • Include image processor in add-new-model-like by @amyeroberts in #20439
    • chore: add link to the video cls notebook. by @sayakpaul in #20386
    • add timeout option for deepspeed engine by @henghuiz in #20443
    • [Maskformer] Add MaskFormerSwin backbone by @NielsRogge in #20344
    • Extract warnings from CI artifacts by @ydshieh in #20474
    • Add Donut image processor by @amyeroberts in #20425
    • Fix torch meshgrid warnings by @fxmarty in #20475
    • Fix init import_structure sorting by @sgugger in #20477
    • extract warnings in GH workflows by @ydshieh in #20487
    • add in layer gpt2 tokenizer by @piEsposito in #20421
    • Replace assert statements with raise exceptions by @miyu386 in #20478
    • fixed small typo by @sandeepgadhwal in #20490
    • Fix documentation code to import facebook/detr-resnet-50 model by @JuanFKurucz in #20491
    • Fix disk offload for full safetensors checkpoints by @sgugger in #20497
    • [modelcard] Check for IterableDataset by @sanchit-gandhi in #20495
    • [modelcard] Set model name if empty by @sanchit-gandhi in #20496
    • Add segmentation + object detection image processors by @amyeroberts in #20160
    • remove attention_mask truncation in whisper by @ydshieh in #20488
    • Make add_special_tokens more clear by @ydshieh in #20424
    • [OPT/Galactica] Load large galactica models by @younesbelkada in #20390
    • Support extraction of both train and eval XLA graphs by @jeffhataws in #20492
    • fix ipex+fp32 jit trace error in ipex 1.13 by @sywangyi in #20504
    • Expected output for the test changed by @ArthurZucker in #20493
    • Fix TF nightly tests by @Rocketknight1 in #20507
    • Update doc examples feature extractor -> image processor by @amyeroberts in #20501
    • Fix Typo in Docs for GPU by @julianpollmann in #20509
    • Fix minimum version for device_map by @sgugger in #20489
    • Update AutomaticSpeechRecognitionPipeline doc example by @ydshieh in #20512
    • Add natten for CI by @ydshieh in #20511
    • Fix Data2VecTextForCasualLM example code documentation by @JuanFKurucz in #20510
    • Add some warning for Dynamo and enable TF32 when it's set by @sgugger in #20515
    • [modelcard] Update dataset tags by @sanchit-gandhi in #20506
    • Change Doctests CI launch time by @ydshieh in #20523
    • Fix PLBart doctest by @ydshieh in #20527
    • Fix ConditionalDetrForSegmentation doc example by @ydshieh in #20531
    • add doc for by @younesbelkada in #20525
    • Update ZeroShotObjectDetectionPipeline doc example by @ydshieh in #20528
    • update post_process_image_guided_detection by @fcakyon in #20521
    • QnA example: add speed metric by @sywangyi in #20522
    • Fix doctest by @NielsRogge in #20534
    • Fix Hubert models in TFHubertModel and TFHubertForCTC documentation code by @JuanFKurucz in #20516
    • Fix link in pipeline device map by @stevhliu in #20517

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @sww9370
      • Add RocBert (#20013)
    • @IMvision12
      • Pytorch type hints (#20112)
      • remaining pytorch type hints (#20217)
    • @alihassanijr
      • Add Neighborhood Attention Transformer (NAT) and Dilated NAT (DiNAT) models (#20219)
      • Add LayerScale to NAT/DiNAT (#20325)
    • @bfss
      • translate zh quicktour(#20095) (#20181)
    • @donelianc
      • Add Spanish translation of serialization.mdx (#20245)
      • Fix toctree for Section 3 in Spanish Documentation (#20360)
      • Add Spanish translation of pr_checks.mdx (#20339)
    • @yangapku
      • Add Chinese-CLIP implementation (#20368)
  • v4.24.0(Nov 1, 2022)

    ESM-2/ESMFold

    ESM-2 and ESMFold are new state-of-the-art Transformer protein language and folding models from Meta AI's Fundamental AI Research Team (FAIR). ESM-2 is trained with a masked language modeling objective, and it can be easily transferred to sequence and token classification tasks for proteins. Checkpoints exist in various sizes, from 8 million parameters up to a huge 15 billion parameter model.

    ESMFold is a state-of-the-art single-sequence protein folding model that produces high-accuracy predictions significantly faster than previous tools. Unlike earlier protein folding tools such as AlphaFold2 and openfold, ESMFold uses a pretrained protein language model to generate token embeddings that are used as input to the folding model, and so does not require a multiple sequence alignment (MSA) of related proteins as input. As a result, proteins can be folded in a single forward pass of the model without requiring any external databases or search/alignment tools to be present at inference time. This hugely reduces the time and compute requirements for folding.

    Transformer protein language models were introduced in the paper Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus.

    ESMFold was introduced in the paper Language models of protein sequences at the scale of evolution enable accurate structure prediction by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, and Alexander Rives.

    • Add ESMFold by @Rocketknight1 in #19977
    • TF port of ESM by @Rocketknight1 in #19587

    LiLT

    LiLT makes it possible to combine any pre-trained RoBERTa text encoder with a lightweight Layout Transformer, enabling LayoutLM-like document understanding for many languages.

    It was proposed in LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding by Jiapeng Wang, Lianwen Jin, Kai Ding.

    • Add LiLT by @NielsRogge in #19450

    Flan-T5

    FLAN-T5 is an enhanced version of T5 that has been finetuned on a mixture of tasks.

    It was released in the paper Scaling Instruction-Finetuned Language Models by Hyung Won Chung, Le Hou, Shayne Longpre, Barret Zoph, Yi Tay, William Fedus, Eric Li, Xuezhi Wang, Mostafa Dehghani, Siddhartha Brahma, Albert Webson, Shixiang Shane Gu, Zhuyun Dai, Mirac Suzgun, Xinyun Chen, Aakanksha Chowdhery, Sharan Narang, Gaurav Mishra, Adams Yu, Vincent Zhao, Yanping Huang, Andrew Dai, Hongkun Yu, Slav Petrov, Ed H. Chi, Jeff Dean, Jacob Devlin, Adam Roberts, Denny Zhou, Quoc V. Le, and Jason Wei.

    • Add flan-t5 documentation page by @younesbelkada in #19892

    Table Transformer

    Table Transformer is a model that can perform table extraction and table structure recognition from unstructured documents based on the DETR architecture.

    It was proposed in PubTables-1M: Towards comprehensive table extraction from unstructured documents by Brandon Smock, Rohith Pesala, Robin Abraham.

    • Add table transformer [v2] by @NielsRogge in #19614

    Contrastive search decoding

    Contrastive search decoding is a new state-of-the-art generation method that aims to reduce the repetitive patterns into which generation models often fall.

    It was introduced in A Contrastive Framework for Neural Text Generation by Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier.

    • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py by @gmftbyGMFTBY in #19477
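    To make the idea concrete, the selection rule from the paper can be sketched in plain Python: among the top-k candidate tokens, pick the one that maximises (1 - alpha) * probability - alpha * (maximum cosine similarity to the hidden states of the tokens generated so far). This is a toy sketch and not the code added to generation_utils.py; the function names, the (token, probability, hidden_state) tuples, and the two-dimensional hidden states are illustrative assumptions, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_step(candidates, context_states, alpha=0.6):
    """Pick the next token among the top-k candidates.

    candidates: list of (token, probability, hidden_state) for the top-k tokens.
    context_states: hidden states of the tokens generated so far.
    """
    best_token, best_score = None, -math.inf
    for token, prob, state in candidates:
        # Degeneration penalty: similarity to the most similar previous token.
        penalty = max(cosine(state, h) for h in context_states)
        score = (1 - alpha) * prob - alpha * penalty
        if score > best_score:
            best_token, best_score = token, score
    return best_token


context = [[1.0, 0.0]]
top_k = [("the", 0.6, [1.0, 0.0]), ("cat", 0.4, [0.0, 1.0])]
print(contrastive_step(top_k, context, alpha=0.6))  # cat
print(contrastive_step(top_k, context, alpha=0.0))  # the
```

    With alpha set to 0 the rule reduces to greedy selection among the candidates; a larger alpha favours tokens whose representation differs from what has already been generated, which is what discourages repetition.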

    Safety and security

    We continue to explore the new Pickle-free serialization format provided by the safetensors library, this time by adding support for TensorFlow models. More checkpoints have been converted to this format. Support is still experimental.

    • Safetensors tf by @sgugger in #19900

    🚨 Breaking changes

    The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what changed.

    • 🚨🚨🚨 TF: Remove TFWrappedEmbeddings (breaking: TF embedding initialization updated for encoder-decoder models) by @gante in #19263
    • 🚨🚨🚨 [Breaking change] Deformable DETR intermediate representations by @Narsil in #19678

    Bugfixes and improvements

    • Enabling custom TF signature draft by @dimitreOliveira in #19249
    • Fix whisper for pipeline by @ArthurZucker in #19482
    • Extend nested_XXX functions to mappings/dicts. by @Guillem96 in #19455
    • Syntax issues (lines 126, 203) by @kant in #19444
    • CLI: add import protection to datasets by @gante in #19470
    • Fix TFGroupViT CI by @ydshieh in #19461
    • Fix doctests for DeiT and TFGroupViT by @ydshieh in #19466
    • Update WhisperModelIntegrationTests.test_large_batched_generation by @ydshieh in #19472
    • [Swin] Replace hard-coded batch size to enable dynamic ONNX export by @lewtun in #19475
    • TF: TFBart embedding initialization by @gante in #19460
    • Make LayoutLM tokenizers independent from BertTokenizer by @arnaudstiegler in #19351
    • Make XLMRoberta model and config independent from Roberta by @asofiaoliveira in #19359
    • Fix get_embedding dtype at init. time by @ydshieh in #19473
    • Decouples XLMProphet model from Prophet by @srhrshr in #19406
    • Implement multiple span support for DocumentQuestionAnswering by @ankrgyl in #19204
    • Add warning in generate & device_map=auto & half precision models by @younesbelkada in #19468
    • Update TF whisper doc tests by @amyeroberts in #19484
    • Make bert_japanese and cpm independent of their inherited modules by @Davidy22 in #19431
    • Added tokenize keyword arguments to feature extraction pipeline by @quancore in #19382
    • Adding the README_es.md and reference to it in the others files readme by @Oussamaosman02 in #19427
    • [CvT] Tensorflow implementation by @mathieujouffroy in #18597
    • python3 instead of python in push CI setup job by @ydshieh in #19492
    • Update PT to TF CLI for audio models by @amyeroberts in #19465
    • New by @IMvision12 in #19481
    • Fix OPTForQuestionAnswering doctest by @ydshieh in #19479
    • Use a dynamic configuration for circleCI tests by @sgugger in #19325
    • Add multi-node conditions in trainer_qa.py and trainer_seq2seq.py by @regisss in #19502
    • update doc for perf_train_cpu_many by @sywangyi in #19506
    • Avoid Push CI failing to report due to many commits being merged by @ydshieh in #19496
    • [Doctest] Add configuration_bert.py to doctest by @ydshieh in #19485
    • Fix whisper doc by @ArthurZucker in #19518
    • Syntax issue (line 497, 526) Documentation by @kant in #19442
    • Fix pytorch seq2seq qa by @FilipposVentirozos in #19258
    • Add depth estimation pipeline by @nandwalritik in #18618
    • Adding links to pipelines parameters documentation by @AndreaSottana in #19227
    • fix MarkupLMProcessor option flag by @davanstrien in #19526
    • [Doctest] Bart configuration update by @imarekkus in #19524
    • Remove roberta dependency from longformer fast tokenizer by @sirmammingtonham in #19501
    • made tokenization_roformer independent of bert by @naveennamani in #19426
    • Remove bert fast dependency from electra by @Threepointone4 in #19520
    • [Examples] Fix typos in run speech recognition seq2seq by @sanchit-gandhi in #19514
    • [X-CLIP] Fix doc tests by @NielsRogge in #19523
    • Update Marian config default vocabulary size by @gante in #19464
    • Make MobileBert tokenizers independent from Bert by @501Good in #19531
    • [Whisper] Fix gradient checkpointing by @sanchit-gandhi in #19538
    • Syntax issues (paragraphs 122, 130, 147, 155) Documentation: @sgugger by @kant in #19437
    • using trunc_normal for weight init & cls_token by @mathieujouffroy in #19486
    • Remove MarkupLMForMaskedLM from MODEL_WITH_LM_HEAD_MAPPING_NAMES by @ydshieh in #19534
    • Image transforms library by @amyeroberts in #18520
    • Add a decorator for flaky tests by @sgugger in #19498
    • [Doctest] Add configuration_yolos.py by @daspartho in #19539
    • Albert config update by @imarekkus in #19541
    • [Doctest] Add configuration_whisper.py by @daspartho in #19540
    • Throw an error if getattribute_from_module can't find anything by @ydshieh in #19535
    • [Doctest] Beit Config for doctest by @daspartho in #19542
    • Create the arange tensor on device for enabling CUDA-Graph for Clip Encoder by @RezaYazdaniAminabadi in #19503
    • [Doctest] GPT2 Config for doctest by @daspartho in #19549
    • Build Push CI images also in a daily basis by @ydshieh in #19532
    • Fix checkpoint used in MarkupLMConfig by @ydshieh in #19547
    • add a note to whisper docs clarifying support of long-form decoding by @akashmjn in #19497
    • [Whisper] Freeze params of encoder by @sanchit-gandhi in #19527
    • [Doctest] Fixing the Doctest for imageGPT config by @RamitPahwa in #19556
    • [Doctest] Fixing mobile bert configuration doctest by @RamitPahwa in #19557
    • [Doctest] Fixing doctest bert_generation configuration by @Threepointone4 in #19558
    • [Doctest] DeiT Config for doctest by @daspartho in #19560
    • [Doctest] Reformer Config for doctest by @daspartho in #19562
    • [Doctest] RoBERTa Config for doctest by @daspartho in #19563
    • [Doctest] Add configuration_vit.py by @daspartho in #19561
    • [Doctest] bloom config update by @imarekkus in #19566
    • [Re-submit] Compute true loss Flax examples by @duongna21 in #19504
    • Fix fairseq wav2vec2-xls-r pretrained weights conversion scripts by @heatz123 in #19508
    • [Doctest] CTRL config by @imarekkus in #19574
    • [Doctest] Add configuration_canine.py by @IzicTemi in #19575
    • [Doctests] Config files for ViTMAE and YOSO by @grgkaran03 in #19567
    • Added type hints to DebertaV2ForMultipleChoice Pytorch by @IMvision12 in #19536
    • [WIP] Add type hints for Lxmert (TF) by @elusenji in #19441
    • [Doctests] add configuration_blenderbot.py by @grgkaran03 in #19577
    • [Doctest] adds trajectory_transformer config to Docs test by @SD-13 in #19586
    • [Doctests] add configuration_blenderbot_small.py by @grgkaran03 in #19589
    • [Doctest] Swin V2 Config for doctest by @daspartho in #19595
    • [Doctest] Swin Config for doctest by @daspartho in #19594
    • [Doctest] SEW Config for doctest by @daspartho in #19597
    • [Doctest] UniSpeech Config for doctest by @daspartho in #19596
    • [Doctest] SEW-D Config for doctest by @daspartho in #19598
    • [Doctest] fix doc test for megatron bert by @RamitPahwa in #19600
    • Adding type hints for TFXLnet by @thliang01 in #19344
    • [Doctest] Add configuration_bigbird_pegasus.py and configuration_big_bird.py by @Xabilahu in #19606
    • Cast masks to np.uint8 before converting to PIL.Image.Image by @amyeroberts in #19616
    • [Whisper] Don't return attention mask in feat extractor by @sanchit-gandhi in #19521
    • [Time Series Transformer] Add doc tests by @NielsRogge in #19607
    • fix BLOOM ONNX config by @NouamaneTazi in #19573
    • Fix test_tf_encode_plus_sent_to_model for TAPAS by @ydshieh in #19559
    • Allow usage of TF Text BertTokenizer on TFBertTokenizer to make it servable on TF Serving by @piEsposito in #19590
    • add gloo backend support for CPU DDP by @sywangyi in #19555
    • Fix ImageToTextPipelineTests.test_small_model_tf by @ydshieh in #19565
    • Fix FlaubertTokenizer by @ydshieh in #19552
    • Visual Bert config for doctest by @ztjhz in #19605
    • GPTTokenizer dependency removed from deberta class by @RamitPahwa in #19551
    • xlm roberta config for doctest by @ztjhz in #19609
    • Ernie config for doctest by @ztjhz in #19611
    • xlm roberta xl config for doctest by @ztjhz in #19610
    • fix: small error by @0xflotus in #19612
    • Improve error messaging for ASR pipeline. by @Narsil in #19570
    • [Doctest] LeViT Config for doctest by @daspartho in #19622
    • [Doctest] DistilBERT Config for doctest by @daspartho in #19621
    • [Whisper] Fix gradient checkpointing (again!) by @sanchit-gandhi in #19548
    • [Doctest] Add configuration_resnet.py by @daspartho in #19620
    • Fix whisper doc by @ArthurZucker in #19608
    • Sharding fails in TF when absolute scope was modified if . in layer name by @ArthurZucker in #19124
    • [Doctest] Add configuration_vision_text_dual_encoder.py by @SD-13 in #19580
    • [Doctest] Add configuration_vision_encoder_decoder.py by @SD-13 in #19583
    • [Doctest] Add configuration_time_series_transformer.py by @SD-13 in #19582
    • Tokenizer from_pretrained should not use local files named like tokenizer files by @sgugger in #19626
    • [Doctest] CodeGen config for doctest by @AymenBer99 in #19633
    • [Doctest] Add configuration_data2vec_text.py by @daspartho in #19636
    • [Doctest] Conditional DETR config for doctest by @AymenBer99 in #19641
    • [Doctest] XLNet config for doctest by @AymenBer99 in #19649
    • [Doctest] Add configuration_trocr.py by @thliang01 in #19658
    • Add doctest info in testingmdx by @ArthurZucker in #19623
    • Add pillow to layoutlmv3 example requirements.txt by @Spacefish in #19663
    • add return types for tf gptj, xlm, and xlnet by @sirmammingtonham in #19638
    • Fix pipeline predict transform methods by @s-udhaya in #19657
    • Type hints MCTCT by @rchan26 in #19618
    • added type hints for Yolos Pytorch model by @WhiteWolf47 in #19545
    • A few CI fixes for DocumentQuestionAnsweringPipeline by @ankrgyl in #19584
    • Removed Bert interdependency from Funnel transformer by @mukesh663 in #19655
    • fix warnings in deberta by @sanderland in #19458
    • word replacement line #231 by @shreem-123 in #19662
    • [Doctest] Add configuration_transfo_xl.py by @thliang01 in #19651
    • Update perf_train_gpu_one.mdx by @cakiki in #19676
    • object-detection instead of object_detection by @Spacefish in #19677
    • add return_tensor parameter for feature extraction by @ajsanjoaquin in #19257
    • Fix code examples of DETR and YOLOS by @NielsRogge in #19669
    • Revert "add return_tensor parameter for feature extraction (#19257)" by @sgugger
    • Fixed the docstring and type hint for forced_decoder_ids option in Ge… by @koreyou in #19640
    • Add normalize to image transforms module by @amyeroberts in #19544
    • [Doctest] Data2VecAudio Config for doctest by @daspartho in #19635
    • Update ESM checkpoints to point to facebook/ by @Rocketknight1 in #19675
    • Removed XLMModel inheritance from FlaubertModel(torch+tf) by @D3xter1922 in #19432
    • [Examples] make default preprocessing_num_workers=1 by @Yang-YiFan in #19684
    • [Doctest] Add configuration_convbert.py by @AymenBer99 in #19643
    • [Doctest] Add configuration_realm.py by @ak04p in #19646
    • Update CONTRIBUTING.md by @shreem-123 in #19689
    • [Doctest] Add configuration_data2vec_vision.py by @daspartho in #19637
    • Fix some CI torch device issues for PyTorch 1.13 by @ydshieh in #19681
    • Fix checkpoint used in VisualBertConfig doc example by @ydshieh in #19692
    • Fix dtype in randomly initialized head by @sgugger in #19690
    • fix tests by @ArthurZucker in #19670
    • fix test whisper with new max length by @ArthurZucker in #19668
    • check decoder_inputs_embeds is None before shifting labels by @ArthurZucker in #19671
    • Fix docs by @NielsRogge in #19687
    • update documentation by @ArthurZucker in #19706
    • Improve DETR models by @NielsRogge in #19644
    • Small fixes for TF-ESM1b and ESM-1b weight conversions by @Rocketknight1 in #19683
    • Fix typo in perf docs by @cakiki in #19705
    • Fix redundant normalization of OWL-ViT text embeddings by @alaradirik in #19712
    • Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode by @falcaopetri in #18351
    • [Doctest] CVT config for doctest by @AymenBer99 in #19695
    • [Doctest] Add configuration_wav2vec2.py to documentation_tests.py by @juancopi81 in #19698
    • Fixed pegasus config doctest by @mukesh663 in #19722
    • fix seq2seqtrainer predict without labels by @IvanSedykh in #19721
    • add return_tensors parameter for feature_extraction 2 by @Narsil in #19707
    • Improving image-segmentation pipeline tests. by @Narsil in #19710
    • [Doctest] Adding config files for convnext by @soma2000-lang in #19717
    • [Doctest] Fixing doctest configuration_pegasus_x.py by @mukesh663 in #19725
    • Specify TF framework in TF-related pipeline tests by @ydshieh in #19719
    • Add docs by @NielsRogge in #19729
    • Fix activations being all the same module by @sgugger in #19728
    • add accelerate support for Whisper by @younesbelkada in #19697
    • Clean up deprecation warnings by @Davidy22 in #19654
    • Repo utils test by @sgugger in #19696
    • Add decorator to flaky test by @amyeroberts in #19674
    • [Doctest] Add doctest for FlavaConfig and FNetConfig by @ndrohith09 in #19724
    • Update contribution guide by @stevhliu in #19700
    • [Doctest] Add wav2vec2_conformer for doctest by @juancopi81 in #19734
    • [Doctest] XLM Config for doctest by @AymenBer99 in #19685
    • [Doctest] Add configuration_clip.py by @daspartho in #19647
    • [Doctest] GPTNeoConfig , GPTNeoXConfig , GPTNeoXJapaneseConfig by @ndrohith09 in #19741
    • Update modeling_markuplm.py by @IMvision12 in #19723
    • Fix issue #19300 by @raghavanone in #19483
    • [Doctest] Add configuration_wavlm.py by @juancopi81 in #19749
    • Specify TF framework explicitly in more pipeline tests by @ydshieh in #19748
    • Fix cache version file creation by @sgugger in #19750
    • Image transforms add center crop by @amyeroberts in #19718
    • [Doctest] Add configuration_decision_transformer.py by @Xabilahu in #19751
    • [Doctest] Add configuration_detr.py by @Xabilahu in #19752
    • Fixed spacing errors by @shreya24ag in #19754
    • All broken links were fixed in contributing file by @mdfaizanahmed786 in #19760
    • [Doctest] SpeechToTextTransformer Config for doctest by @daspartho in #19757
    • [Doctest] SqueezeBERT Config for doctest by @daspartho in #19758
    • [Doctest] SpeechToTextTransformer2 Config for doctest by @daspartho in #19756
    • [Doctest] OpenAIGPTConfig and OPTConfig by @ndrohith09 in #19763
    • image-segmentation pipeline: re-enable small_model_pt test. by @Narsil in #19716
    • Update modeling_layoutlmv3.py by @IMvision12 in #19753
    • adding key pair dataset by @rohit1998 in #19765
    • Fix exception thrown using MishActivation by @chinoll in #19739
    • [FLAX] Add dtype to embedding for gpt2 model by @merrymercy in #18462
    • TF: sample generation compatible with XLA and dynamic batch sizes by @gante in #19773
    • Install tf2onnx dev version by @ydshieh in #19755
    • Fix docker image build by @ydshieh in #19759
    • PT <-> TF for composite models by @ydshieh in #19732
    • Add warning about restarting runtime to import errors by @Rocketknight1 in #19774
    • Added support for multivariate independent emission heads by @kashif in #19453
    • Update ImageToTextPipelineTests.test_small_model_tf by @ydshieh in #19785
    • Make public versions of private tensor utils by @sgugger in #19775
    • Update training.mdx by @ftorres16 in #19791
    • [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. by @davialvb in #19779
    • Add sentencepiece to BertJapaneseTokenizer by @conan1024hao in #19769
    • Fix CTRL test_torchscrip_xxx CI by updating _create_and_check_torchscript by @ydshieh in #19786
    • Fix nightly test setup by @sgugger in #19792
    • Fix image segmentation pipeline errors, resolve backward compatibility issues by @alaradirik in #19768
    • Fix error/typo in docstring of TokenClassificationPipeline by @pchr8 in #19798
    • Use None to detect if truncation was unset by @sgugger in #19794
    • Generate: contrastive search test updates by @gante in #19787
    • Run some TF Whisper tests in subprocesses to avoid GPU OOM by @ydshieh in #19772
    • Added translation of run_scripts.mdx to Portuguese Issue #16824 by @davialvb in #19800
    • Generate: minor docstring fix by @gante in #19801
    • [Doctest] MaskFormerConfig doctest by @sha016 in #19817
    • [Doctest] Add configuration_plbart.py by @ayaka14732 in #19809
    • [Doctest] Add configuration_poolformer.py by @ayaka14732 in #19808
    • [Doctest] Add configuration_electra.py by @ayaka14732 in #19807
    • [Doctest] Add configuration_nezha.py by @ayaka14732 in #19810
    • Display the number of trainable parameters when launching a training by @regisss in #19835
    • replace reference to Datasets in metrics deprecation with Evaluate by @angus-lherrou in #19812
    • Fix OOM in Config doctest by @ydshieh in #19840
    • fix broken links in testing.mdx by @XFFXFF in #19820
    • fix image2test args forwarding by @kventinel in #19648
    • Added translation of converting_tensorflow_models.mdx to Portuguese Issue #16824 by @davialvb in #19824
    • Fix nightly CircleCI by @ydshieh in #19837
    • fixed typo in fp16 training section for perf_train_gpu_one by @dsingal0 in #19736
    • Update LEDModelIntegrationTests expected values by @ydshieh in #19841
    • Improve check copies by @kventinel in #19829
    • Fix doctest for MarkupLM by @ydshieh in #19845
    • add small updates only by @stevhliu in #19847
    • Refactor conversion function by @sgugger in #19799
    • Spanish translation of multiple_choice.mdx, question_answering.mdx. by @alceballosa in #19821
    • Fix doctest for GenerationMixin.contrastive_search by @ydshieh in #19863
    • Add missing lang tokens in M2M100Tokenizer.get_vocab by @guillaumekln in #18416
    • Added translation of serialization.mdx to Portuguese Issue #16824 by @davialvb in #19869
    • Generate: contrastive search cosmetic tweaks by @gante in #19871
    • [Past CI] Vilt only supports PT >= v1.10 by @LysandreJik in #19851
    • Fix incorrect model<->tokenizer mapping in tokenization testing by @ydshieh in #19872
    • Update doc for revision and token by @sgugger in #19793
    • Factored out some code in the image-segmentation pipeline. by @Narsil in #19727
    • [DOCTEST] Config doctest for MCTCT, MBart and LayoutLM by @Revanth2002 in #19889
    • Fix LR by @regisss in #19875
    • Correct README image text by @KayleeDavisGitHub in #19883
    • No conv bn folding in ipex to avoid warning by @sanderland in #19870
    • Add missing information on token_type_ids for roberta model by @raghavanone in #19766
    • Change the import of kenlm from github to pypi by @raghavanone in #19770
    • Update max_diff in test_save_load_fast_init_to_base by @ydshieh in #19849
    • Allow flax subfolder by @patrickvonplaten in #19902
    • accelerate support for RoBERTa family by @younesbelkada in #19906
    • Add checkpoint links in a few config classes by @ydshieh in #19910
    • Generate: contrastive search uses existing abstractions and conventions by @gante in #19896
    • Convert None logits processor/stopping criteria to empty list. by @ccmaymay in #19880
    • Some fixes regarding auto mappings and test class names by @ydshieh in #19923
    • Fix bug in Wav2Vec2's GPU tests by @falcaopetri in #19803
    • Fix warning when collating list of numpy arrays by @sgugger in #19846
    • Add type hints to TFPegasusModel by @EdAbati in #19858
    • Remove embarrassing debug print() in save_pretrained by @Rocketknight1 in #19922
    • Add accelerate support for M2M100 by @younesbelkada in #19912
    • Add RoBERTa resources by @stevhliu in #19911
    • Add T5 resources by @stevhliu in #19878
    • Add BLOOM resources by @stevhliu in #19881
    • Add GPT2 resources by @stevhliu in #19879
    • Let inputs of fast tokenizers be tuples as well as lists by @sgugger in #19898
    • Add accelerate support for BART-like models by @younesbelkada in #19927
    • Create dummy models by @ydshieh in #19901
    • Support segformer fx by @dwlim-nota in #19924
    • Use self._trial to generate trial_name for Trainer. by @reyoung in #19874
    • Add Onnx Config for ImageGPT by @RaghavPrabhakar66 in #19868
    • Update Code of Conduct to Contributor Covenant v2.1 by @pankali in #19935
    • add resources for bart by @stevhliu in #19928
    • add resources for distilbert by @stevhliu in #19930
    • Add wav2vec2 resources by @stevhliu in #19931
    • [Conditional, Deformable DETR] Add postprocessing methods by @NielsRogge in #19709
    • Fix ONNX tests for ONNX Runtime v1.13.1 by @lewtun in #19950
    • donut -> donut-swin by @ydshieh in #19920
    • [Doctest] Add configuration_deberta.py by @Saad135 in #19968
    • gradient checkpointing for GPT-NeoX by @chiaolun in #19946
    • [modelcard] Update for ASR by @sanchit-gandhi in #19985
    • [ASR] Update 'tasks' for model card by @sanchit-gandhi in #19986
    • Transformers documentation translation to Italian #17459 by @draperkm in #19988
    • Pin torch to < 1.13 temporarily by @ydshieh in #19989
    • Add support for gradient checkpointing by @NielsRogge in #19990

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @arnaudstiegler
      • Make LayoutLM tokenizers independent from BertTokenizer (#19351)
    • @asofiaoliveira
      • Make XLMRoberta model and config independent from Roberta (#19359)
    • @srhrshr
      • Decouples XLMProphet model from Prophet (#19406)
    • @Davidy22
      • Make bert_japanese and cpm independent of their inherited modules (#19431)
      • Clean up deprecation warnings (#19654)
    • @mathieujouffroy
      • [CvT] Tensorflow implementation (#18597)
      • using trunc_normal for weight init & cls_token (#19486)
    • @IMvision12
      • New (#19481)
      • Added type hints to DebertaV2ForMultipleChoice Pytorch (#19536)
      • Update modeling_markuplm.py (#19723)
      • Update modeling_layoutlmv3.py (#19753)
    • @501Good
      • Make MobileBert tokenizers independent from Bert (#19531)
    • @mukesh663
      • Removed Bert interdependency from Funnel transformer (#19655)
      • Fixed pegasus config doctest (#19722)
      • [Doctest] Fixing doctest configuration_pegasus_x.py (#19725)
    • @D3xter1922
      • Removed XLMModel inheritance from FlaubertModel(torch+tf) (#19432)
    • @falcaopetri
      • Allow user-managed Pool in Wav2Vec2ProcessorWithLM.batch_decode (#18351)
      • Fix bug in Wav2Vec2's GPU tests (#19803)
    • @gmftbyGMFTBY
      • Adding the state-of-the-art contrastive search decoding methods for the codebase of generation_utils.py (#19477)
    • @davialvb
      • [ custom_models.mdx ] - Translated to Portuguese the custom models tutorial. (#19779)
      • Added translation of run_scripts.mdx to Portuguese Issue #16824 (#19800)
      • Added translation of converting_tensorflow_models.mdx to Portuguese Issue #16824 (#19824)
      • Added translation of serialization.mdx to Portuguese Issue #16824 (#19869)
    • @alceballosa
      • Spanish translation of multiple_choice.mdx, question_answering.mdx. (#19821)
  • v4.23.1(Oct 11, 2022)

    Fixes a revert, introduced by mistake, that broke the "automatic-speech-recognition" pipeline for Whisper.

    • Fix whisper for pipeline by @ArthurZucker in #19482
  • v4.23.0(Oct 10, 2022)

    Whisper

    The Whisper model was proposed in Robust Speech Recognition via Large-Scale Weak Supervision by Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever.

    Whisper is an encoder-decoder Transformer trained on 680,000 hours of labeled (transcribed) audio. The model shows impressive performance and robustness in a zero-shot setting, in multiple languages.

    • Add WhisperModel to transformers by @ArthurZucker in #19166
    • Add TF whisper by @amyeroberts in #19378

    Deformable DETR

    The Deformable DETR model was proposed in Deformable DETR: Deformable Transformers for End-to-End Object Detection by Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai.

    Deformable DETR mitigates the slow convergence issues and limited feature spatial resolution of the original DETR by leveraging a new deformable attention module which only attends to a small set of key sampling points around a reference.

    • Add Deformable DETR by @NielsRogge in #17281
    • [fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140

    Conditional DETR

    The Conditional DETR model was proposed in Conditional DETR for Fast Training Convergence by Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang.

    Conditional DETR presents a conditional cross-attention mechanism for fast DETR training. Conditional DETR converges 6.7× to 10× faster than DETR.

    • Add support for conditional detr by @DeppMeng in #18948
    • Improve conditional detr docs by @NielsRogge in #19154

    Time Series Transformer

    The Time Series Transformer model is a vanilla encoder-decoder Transformer for time series forecasting.

    The model is trained in a similar way to how one would train an encoder-decoder Transformer (like T5 or BART) for machine translation; i.e. teacher forcing is used. At inference time, one can autoregressively generate samples, one time step at a time.

    :warning: This is a recently introduced model and modality, so the API hasn't been tested extensively. There may be some bugs, and fixing them may require slight breaking changes in the future. If you see something strange, file a GitHub issue.

    • time series forecasting model by @kashif in #17965
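
    The difference between teacher forcing at training time and autoregressive generation at inference time can be sketched in plain Python. This is a toy illustration only: `toy_step` is a hypothetical, deterministic stand-in for a model, whereas the Time Series Transformer generates samples, and none of these function names come from the library.

```python
from typing import Callable, List


def toy_step(previous: float) -> float:
    """Hypothetical one-step 'model': predicts the next value from the previous one."""
    return 0.5 * previous + 1.0


def teacher_forced_predictions(
    targets: List[float], start: float, step: Callable[[float], float]
) -> List[float]:
    """Training-style pass: each step is conditioned on the ground-truth previous value."""
    # Decoder inputs are the targets shifted right by one position.
    decoder_inputs = [start] + targets[:-1]
    return [step(value) for value in decoder_inputs]


def autoregressive_forecast(
    start: float, horizon: int, step: Callable[[float], float]
) -> List[float]:
    """Inference-style pass: each step is conditioned on the model's own previous output."""
    predictions = []
    previous = start
    for _ in range(horizon):
        previous = step(previous)
        predictions.append(previous)
    return predictions


if __name__ == "__main__":
    print(teacher_forced_predictions([2.0, 2.0, 2.0], start=2.0, step=toy_step))
    print(autoregressive_forecast(start=0.0, horizon=3, step=toy_step))
```

    With teacher forcing, an error at one step does not propagate to the next, because the next input is always the true value; during autoregressive generation, each prediction feeds the following step.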

    Masked Siamese Networks

    The ViTMSN model was proposed in Masked Siamese Networks for Label-Efficient Learning by Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas.

    MSN (masked siamese networks) consists of a joint-embedding architecture that matches the prototypes of masked patches with those of the unmasked patches. With this setup, the method yields excellent performance in the low-shot and extreme low-shot regimes for image classification, outperforming other self-supervised methods such as DINO. For instance, with 1% of ImageNet-1K labels, the method achieves 75.7% top-1 accuracy.

    • MSN (Masked Siamese Networks) for ViT by @sayakpaul in #18815

    MarkupLM

    The MarkupLM model was proposed in MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding by Junlong Li, Yiheng Xu, Lei Cui, Furu Wei.

    MarkupLM is BERT, but applied to HTML pages instead of raw text documents. The model incorporates additional embedding layers to improve performance, similar to LayoutLM.

    The model can be used for tasks like question answering on web pages or information extraction from web pages. It obtains state-of-the-art results on 2 important benchmarks: WebSRC and SWDE.

    • Add MarkupLM by @NielsRogge in #19198

    Security & safety

    We are exploring a new serialization format that does not rely on Pickle and that can be used across the three frameworks we support: PyTorch, TensorFlow, and JAX. It is built on the safetensors library.

    At this stage, support is experimental and limited to PyTorch models.

    • Poc to use safetensors by @sgugger in #19175
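
    To give an idea of what a Pickle-free format can look like, here is a simplified sketch written with the standard library only. It loosely follows the layout described in the safetensors documentation (an 8-byte little-endian header length, a JSON header, then the raw tensor bytes), but the details are assumptions, only 1-D float32 data is handled, and `save_f32` / `load_f32` are hypothetical helper names. Use the safetensors library itself for real checkpoints.

```python
import json
import struct
from typing import Dict, List


def save_f32(tensors: Dict[str, List[float]]) -> bytes:
    """Serialize named 1-D float32 arrays as: header length, JSON header, raw bytes."""
    header = {}
    payload = b""
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            # Byte range of this tensor inside the payload.
            "data_offsets": [len(payload), len(payload) + len(raw)],
        }
        payload += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def load_f32(blob: bytes) -> Dict[str, List[float]]:
    """Parse the bytes written by save_f32; no code is executed while loading."""
    (header_length,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8 : 8 + header_length].decode("utf-8"))
    data = blob[8 + header_length :]
    tensors = {}
    for name, info in header.items():
        begin, end = info["data_offsets"]
        count = (end - begin) // 4  # 4 bytes per float32
        tensors[name] = list(struct.unpack(f"<{count}f", data[begin:end]))
    return tensors


if __name__ == "__main__":
    blob = save_f32({"weight": [1.0, 2.5, -3.0], "bias": [0.5]})
    print(load_f32(blob))
```

    Because loading only parses JSON and reads raw bytes, opening a file cannot run arbitrary code, which is the main safety concern with Pickle-based checkpoints.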

    Computer vision post-processing methods overhaul

    The processors for computer vision have been overhauled to ensure they have consistent naming, input arguments and outputs. :warning: The existing methods that are superseded by the introduced methods post_process_object_detection, post_process_semantic_segmentation, post_process_instance_segmentation, post_process_panoptic_segmentation are now deprecated.

    • Improve DETR post-processing methods by @alaradirik in #19205
    • Beit postprocessing by @alaradirik in #19099
    • Fix BeitFeatureExtractor postprocessing by @alaradirik in #19119
    • Add post_process_semantic_segmentation method to SegFormer by @alaradirik in #19072
    • Add post_process_semantic_segmentation method to DPTFeatureExtractor by @alaradirik in #19107
    • Add semantic segmentation post-processing method to MobileViT by @alaradirik in #19105
    • Detr preprocessor fix by @alaradirik in #19007
    • Improve and fix ImageSegmentationPipeline by @alaradirik in #19367
    • Restructure DETR post-processing, return prediction scores by @alaradirik in #19262
    • Maskformer post-processing fixes and improvements by @alaradirik in #19172
    • Fix MaskFormer failing postprocess tests by @alaradirik in #19354
    • Fix DETR segmentation postprocessing output by @alaradirik in #19363
    • fix docs example, add object_detection to DETR docs by @alaradirik in #19377
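
    As a rough illustration of what an object detection post-processing step involves for DETR-style models, the sketch below converts normalized (center_x, center_y, width, height) boxes to absolute corner coordinates and drops low-scoring predictions. It is plain Python with hypothetical helper names and a per-detection dictionary layout chosen for readability; it is not the implementation of post_process_object_detection.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def center_to_corners(box: Sequence[float], image_height: int, image_width: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to absolute (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def filter_detections(
    scores: Sequence[float],
    labels: Sequence[int],
    boxes: Sequence[Sequence[float]],
    image_size: Tuple[int, int],
    threshold: float = 0.5,
) -> List[Dict[str, object]]:
    """Keep predictions scoring above `threshold` and rescale their boxes.

    `image_size` is (height, width) of the original image.
    """
    height, width = image_size
    results = []
    for score, label, box in zip(scores, labels, boxes):
        if score > threshold:
            results.append(
                {"score": score, "label": label, "box": center_to_corners(box, height, width)}
            )
    return results


if __name__ == "__main__":
    detections = filter_detections(
        scores=[0.9, 0.2],
        labels=[17, 3],
        boxes=[(0.5, 0.5, 0.5, 0.5), (0.1, 0.1, 0.1, 0.1)],
        image_size=(100, 200),
    )
    print(detections)
```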

    🚨 Breaking changes

    The following changes are bugfixes that we have chosen to make even though they change the resulting behavior. We mark them as breaking changes, so if you are using this part of the codebase, we recommend you take a look at the PRs to understand exactly what was changed.

    Breaking change for ViT parameter initialization

    • 🚨🚨🚨 Fix ViT parameter initialization by @alaradirik in #19341

    Breaking change for the top_p argument of the TopPLogitsWarper of the generate method.

    • 🚨🚨🚨 Optimize Top P Sampler and fix edge case by @ekagra-ranjan in #18984
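
    For readers unfamiliar with top-p (nucleus) filtering, here is a minimal plain-Python sketch of the general idea: keep the smallest set of highest-probability tokens whose cumulative probability reaches top_p, and mask the rest. `top_p_filter` is a hypothetical helper, not the library's TopPLogitsWarper, and it does not attempt to reproduce the edge case addressed in the PR above.

```python
import math
from typing import List


def top_p_filter(logits: List[float], top_p: float) -> List[float]:
    """Mask (set to -inf) every logit outside the top-p nucleus."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in the interval (0, 1]")

    # Numerically stable softmax.
    maximum = max(logits)
    exponentials = [math.exp(value - maximum) for value in logits]
    total = sum(exponentials)
    probabilities = [value / total for value in exponentials]

    # Visit tokens from most to least likely, accumulating probability mass.
    order = sorted(range(len(logits)), key=lambda i: probabilities[i], reverse=True)
    kept = set()
    cumulative = 0.0
    for index in order:
        kept.add(index)  # the most likely token is always kept
        cumulative += probabilities[index]
        if cumulative >= top_p:
            break

    return [value if i in kept else -math.inf for i, value in enumerate(logits)]


if __name__ == "__main__":
    example_logits = [math.log(0.5), math.log(0.3), math.log(0.2)]
    print(top_p_filter(example_logits, top_p=0.7))
```

    Sampling then proceeds from the softmax of the filtered logits, so the masked tokens receive zero probability.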

    Model head additions

    OPT and BLOOM now have question answering heads available.

    • Add OPTForQuestionAnswering by @clementapa in #19402
    • Add BloomForQuestionAnswering by @younesbelkada in #19310

    Pipelines

    There is now a zero-shot object detection pipeline.

    • Add ZeroShotObjectDetectionPipeline by @sahamrit in #18445

    TensorFlow architectures

    The GroupViT model is now available in TensorFlow.

    • [TensorFlow] Adding GroupViT by @ariG23498 in #18020

    Bugfixes and improvements

    • Fix a broken link for deepspeed ZeRO inference in the docs by @nijkah in #19001
    • [doc] debug: fix import by @stas00 in #19042
    • [bnb] Small improvements on utils by @younesbelkada in #18646
    • Update image segmentation pipeline test by @amyeroberts in #18731
    • Fix test_save_load for TFViTMAEModelTest by @ydshieh in #19040
    • Pin minimum PyTorch version for BLOOM ONNX export by @lewtun in #19046
    • Update serving signatures and make sure we actually use them by @Rocketknight1 in #19034
    • Move cache: expand error message by @sgugger in #19051
    • Fixing OPT fast tokenizer option. by @Narsil in #18753
    • Fix custom tokenizers test by @sgugger in #19052
    • Run torchdynamo tests by @ydshieh in #19056
    • [fix] Add DeformableDetrFeatureExtractor by @NielsRogge in #19140
    • fix arg name in BLOOM testing and remove unused arg document by @shijie-wu in #18843
    • Adds package and requirement spec output to version check exception by @colindean in #18702
    • fix use_cache by @younesbelkada in #19060
    • FX support for ConvNext, Wav2Vec2 and ResNet by @michaelbenayoun in #19053
    • [doc] Fix link in PreTrainedModel documentation by @tomaarsen in #19065
    • Add FP32 cast in ConvNext LayerNorm to prevent rounding errors with FP16 input by @jimypbr in #18746
    • Organize test jobs by @sgugger in #19058
    • Automatically tag CLIP repos as zero-shot-image-classification by @osanseviero in #19064
    • Fix LeViT checkpoint by @ydshieh in #19069
    • TF: tests for (de)serializable models with resized tokens by @gante in #19013
    • Add type hints for PyTorch UniSpeech, MPNet and Nystromformer by @daspartho in #19039
    • replace logger.warn by logger.warning by @fxmarty in #19068
    • Fix tokenizer load from one file by @sgugger in #19073
    • Note about developer mode by @LysandreJik in #19075
    • german autoclass by @flozi00 in #19049
    • Add tests for legacy load by url and fix bugs by @sgugger in #19078
    • Add runner availability check by @ydshieh in #19054
    • fix working dir by @ydshieh in #19101
    • Added type hints for TFConvBertModel by @kishore-s-15 in #19088
    • Added Type hints for VIT MAE by @kishore-s-15 in #19085
    • Add type hints for TF MPNet models by @kishore-s-15 in #19089
    • Added type hints to ResNetForImageClassification by @kishore-s-15 in #19084
    • added type hints by @daspartho in #19076
    • Improve vision models docs by @NielsRogge in #19103
    • correct spelling in README by @flozi00 in #19092
    • Don't warn of move if cache is empty by @sgugger in #19109
    • HPO: keep the original logic if there's only one process, pass the trial to trainer by @sywangyi in #19096
    • Add documentation of Trainer.create_model_card by @sgugger in #19110
    • Added type hints for YolosForObjectDetection by @kishore-s-15 in #19086
    • Fix the wrong schedule by @ydshieh in #19117
    • Change document question answering pipeline to always return an array by @ankrgyl in #19071
    • german processing by @flozi00 in #19121
    • Fix: update ltp word segmentation call in mlm_wwm by @xyh1756 in #19047
    • Add a missing space in a script arg documentation by @bryant1410 in #19113
    • Skip test_export_to_onnx for LongT5 if torch < 1.11 by @ydshieh in #19122
    • Fix GLUE MNLI when using max_eval_samples by @lvwerra in #18722
    • [BugFix] Fix fsdp option on shard_grad_op. by @ZHUI in #19131
    • Fix FlaxPretTrainedModel pt weights check by @mishig25 in #19133
    • support deps from github by @lhoestq in #19141
    • Fix dummy creation for multi-frameworks objects by @sgugger in #19144
    • Allowing users to use the latest tokenizers release ! by @Narsil in #19139
    • Add some tests for check_dummies by @sgugger in #19146
    • Fixed typo in generation_utils.py by @nbalepur in #19145
    • Add accelerate support for ViLT by @younesbelkada in #18683
    • TF: check embeddings range by @gante in #19102
    • Reduce LR for TF MLM example test by @Rocketknight1 in #19156
    • update perf_train_cpu_many doc by @sywangyi in #19151
    • fix: ckpt paths. by @sayakpaul in #19159
    • Fix TrainingArguments documentation by @sgugger in #19162
    • fix HPO DDP GPU problem by @sywangyi in #19168
    • [WIP] Trainer supporting evaluation on multiple datasets by @timbmg in #19158
    • Add doctests to Perceiver examples by @stevenmanton in #19129
    • Add offline runners info in the Slack report by @ydshieh in #19169
    • Fix incorrect comments about atten mask for pytorch backend by @lygztq in #18728
    • Fixed type hint for pipelines/check_task by @Fei-Wang in #19150
    • Update run_clip.py by @enze5088 in #19130
    • german training, accelerate and model sharing by @flozi00 in #19171
    • Separate Push CI images from Scheduled CI by @ydshieh in #19170
    • Remove pos arg from Perceiver's Pre/Postprocessors by @aielawady in #18602
    • Use assertAlmostEqual in BloomEmbeddingTest.test_logits by @ydshieh in #19200
    • Move the model type check by @ankrgyl in #19027
    • Use repo_type instead of deprecated datasets repo IDs by @sgugger in #19202
    • Updated hf_argparser.py by @IMvision12 in #19188
    • Add warning for torchaudio <= 0.10 in MCTCTFeatureExtractor by @ydshieh in #19203
    • Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206
    • Remove unused cur_len in generation_utils.py by @ekagra-ranjan in #18874
    • add wav2vec2_alignment by @arijitx in #16782
    • add doc for hyperparameter search by @sywangyi in #19192
    • Add a use_parallel_residual argument to control the residual computing way by @NinedayWang in #18695
    • translated add_new_pipeline by @nickprock in #19215
    • More tests for regression in cached non existence by @sgugger in #19216
    • Use math.pi instead of torch.pi in MaskFormer by @ydshieh in #19201
    • Added tests for yaml and json parser by @IMvision12 in #19219
    • Fix small use_cache typo in the docs by @ankrgyl in #19191
    • Generate: add warning when left padding should be used by @gante in #19067
    • Fix deprecation warning for return_all_scores by @ogabrielluiz in #19217
    • Fix doctest for TFDeiTForImageClassification by @ydshieh in #19173
    • Document and validate typical_p in generation by @mapmeld in #19128
    • Fix trainer seq2seq qa.py evaluate log and ft script by @iamtatsuki05 in #19208
    • Fix cache names in CircleCI jobs by @ydshieh in #19223
    • Move AutoClasses under Main Classes by @stevhliu in #19163
    • Focus doc around preprocessing classes by @stevhliu in #18768
    • Fix confusing working directory in Push CI by @ydshieh in #19234
    • XGLM - Fix Softmax NaNs when using FP16 by @gsarti in #18057
    • Add a getattr method, which replaces _module_getattr in torch.fx.Tracer from PyTorch 1.13+ by @michaelbenayoun in #19233
    • Fix m2m_100.mdx doc example missing labels by @Mustapha-AJEGHRIR in #19149
    • Fix opt softmax small nit by @younesbelkada in #19243
    • Use hf_raise_for_status instead of deprecated _raise_for_status by @Wauplin in #19244
    • Fix TrainingArgs argument serialization by @atturaioe in #19239
    • Fix test fetching for examples by @sgugger in #19237
    • Cast TF generate() inputs by @Rocketknight1 in #19232
    • Skip pipeline tests by @sgugger in #19248
    • Add job names in Past CI artifacts by @ydshieh in #19235
    • Update Past CI report script by @ydshieh in #19228
    • [Wav2Vec2] Fix None loss in doc examples by @rbsteinm in #19218
    • Catch HFValidationError in TrainingSummary by @ydshieh in #19252
    • Add expected output to the sample code for ViTMSNForImageClassification by @sayakpaul in #19183
    • Add stop sequence to text generation pipeline by @KMFODA in #18444
    • Add notebooks by @JingyaHuang in #19259
    • Add beautifulsoup4 to the dependency list by @ydshieh in #19253
    • Fix Encoder-Decoder testing issue about repo. names by @ydshieh in #19250
    • Fix cached lookup filepath on windows for hub by @kjerk in #19178
    • Docs - Guide to add a new TensorFlow model by @gante in #19256
    • Update no_trainer script for summarization by @divyanshugit in #19277
    • Don't automatically add bug label by @sgugger in #19302
    • Breakup export guide by @stevhliu in #19271
    • Update Protobuf dependency version to fix known vulnerability by @qthequartermasterman in #19247
    • Update README.md by @ShubhamJagtap2000 in #19309
    • [Docs] Fix link by @patrickvonplaten in #19313
    • Fix for sequence regression fit() in TF by @Rocketknight1 in #19316
    • Added Type hints for LED TF by @IMvision12 in #19315
    • Added type hints for TF: rag model by @debjit-bw in #19284
    • alter retrived to retrieved by @gouqi666 in #18863
    • ci(stale.yml): upgrade actions/setup-python to v4 by @oscard0m in #19281
    • ci(workflows): update actions/checkout to v3 by @oscard0m in #19280
    • wrap forward passes with torch.no_grad() by @daspartho in #19279
    • wrap forward passes with torch.no_grad() by @daspartho in #19278
    • wrap forward passes with torch.no_grad() by @daspartho in #19274
    • wrap forward passes with torch.no_grad() by @daspartho in #19273
    • Removing BertConfig inheritance from LayoutLMConfig by @arnaudstiegler in #19307
    • docker-build: Update actions/checkout to v3 by @Sushrut1101 in #19288
    • Clamping hidden state values to allow FP16 by @SSamDav in #19229
    • Remove interdependency from OpenAI tokenizer by @E-Aho in #19327
    • removing XLMConfig inheritance from FlaubertConfig by @D3xter1922 in #19326
    • Removed interdependency of BERT's Tokenizer in tokenization of prophetnet by @divyanshugit in #19331
    • Remove bert interdependency from clip tokenizer by @shyamsn97 in #19332
    • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer by @D3xter1922 in #19330
    • Making camembert independent from roberta, clean by @Mustapha-AJEGHRIR in #19337
    • Add sudachi and jumanpp tokenizers for bert_japanese by @r-terada in #19043
    • Frees LongformerTokenizer of the Roberta dependency by @srhrshr in #19346
    • Change BloomConfig docstring by @younesbelkada in #19336
    • Test failing test while we resolve the issue. by @sgugger in #19355
    • Call _set_save_spec() when creating TF models by @Rocketknight1 in #19321
    • correct typos in README by @paulaxisabel in #19304
    • Removes Roberta and Bert config dependencies from Longformer by @srhrshr in #19343
    • Fix gather for metrics by @muellerzr in #19360
    • Fix pipeline tests for Roberta-like tokenizers by @sgugger in #19365
    • Change link of repojacking vulnerable link by @Ilaygoldman in #19393
    • Making ConvBert Tokenizer independent from bert Tokenizer by @IMvision12 in #19347
    • Fix gather for metrics by @muellerzr in #19389
    • Added Type hints for XLM TF by @IMvision12 in #19333
    • add ONNX support for swin transformer by @bibhabasumohapatra in #19390
    • removes prophet config dependencies from xlm-prophet by @srhrshr in #19400
    • Added type hints for TF: TransfoXL by @thliang01 in #19380
    • HF <-> megatron checkpoint reshaping and conversion for GPT by @pacman100 in #19317
    • Remove unneded words from audio-related feature extractors by @osanseviero in #19405
    • edit: cast attention_mask to long in DataCollatorCTCWithPadding by @ddobokki in #19369
    • Copy BertTokenizer dependency into retribert tokenizer by @Davidy22 in #19371
    • Export TensorFlow models to ONNX with dynamic input shapes by @dwyatte in #19255
    • update attention mask handling by @ArthurZucker in #19385
    • Remove dependency of Bert from Squeezebert tokenizer by @rchan26 in #19403
    • Removed Bert and XML Dependency from Herbert by @harry7337 in #19410
    • Clip device map by @patrickvonplaten in #19409
    • Remove Dependency between Bart and LED (slow/fast) by @Infrared1029 in #19408
    • Removed Bert interdependency in tokenization_electra.py by @OtherHorizon in #19356
    • Make Camembert TF version independent from Roberta by @Mustapha-AJEGHRIR in #19364
    • Removed Bert dependency from BertGeneration code base. by @Threepointone4 in #19370
    • Rework pipeline tests by @sgugger in #19366
    • Fix ViTMSNForImageClassification doctest by @ydshieh in #19275
    • Skip BloomEmbeddingTest.test_embeddings for PyTorch < 1.10 by @ydshieh in #19261
    • remove RobertaConfig inheritance from MarkupLMConfig by @D3xter1922 in #19404
    • Backtick fixed (paragraph 68) by @kant in #19440
    • Fixed duplicated line (paragraph #83) Documentation: @sgugger by @kant in #19436
    • fix marianMT convertion to onnx by @kventinel in #19287
    • Fix typo in image-classification/README.md by @zhawe01 in #19424
    • Stop relying on huggingface_hub's private methods by @LysandreJik in #19392
    • Add onnx support for VisionEncoderDecoder by @mht-sharma in #19254
    • Remove dependency of Roberta in Blenderbot by @rchan26 in #19411
    • fix: renamed variable name by @ariG23498 in #18850
    • Fix the error message in run_t5_mlm_flax.py by @yangky11 in #19282
    • Add Italian translation for add_new_model.mdx by @Steboss89 in #18713
    • Fix momentum and epsilon values by @amyeroberts in #19454
    • Generate: corrected exponential_decay_length_penalty type hint by @ShivangMishra in #19376
    • Fix misspelled word in docstring by @Bearnardd in #19415
    • Fixed a non-working hyperlink in the README.md file by @MikailINTech in #19434
    • fix by @ydshieh in #19469
    • wrap forward passes with torch.no_grad() by @daspartho in #19439
    • wrap forward passes with torch.no_grad() by @daspartho in #19438
    • wrap forward passes with torch.no_grad() by @daspartho in #19416
    • wrap forward passes with torch.no_grad() by @daspartho in #19414
    • wrap forward passes with torch.no_grad() by @daspartho in #19413
    • wrap forward passes with torch.no_grad() by @daspartho in #19412

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @flozi00
      • german autoclass (#19049)
      • correct spelling in README (#19092)
      • german processing (#19121)
      • german training, accelerate and model sharing (#19171)
    • @DeppMeng
      • Add support for conditional detr (#18948)
    • @sayakpaul
      • MSN (Masked Siamese Networks) for ViT (#18815)
      • fix: ckpt paths. (#19159)
      • Add expected output to the sample code for ViTMSNForImageClassification (#19183)
    • @IMvision12
      • Updated hf_argparser.py (#19188)
      • Added tests for yaml and json parser (#19219)
      • Added Type hints for LED TF (#19315)
      • Making ConvBert Tokenizer independent from bert Tokenizer (#19347)
      • Added Type hints for XLM TF (#19333)
    • @ariG23498
      • [TensorFlow] Adding GroupViT (#18020)
      • fix: renamed variable name (#18850)
    • @Mustapha-AJEGHRIR
      • Fix m2m_100.mdx doc example missing labels (#19149)
      • Making camembert independent from roberta, clean (#19337)
      • Make Camembert TF version independent from Roberta (#19364)
    • @D3xter1922
      • removing XLMConfig inheritance from FlaubertConfig (#19326)
      • [WIP]remove XLMTokenizer inheritance from FlaubertTokenizer (#19330)
      • remove RobertaConfig inheritance from MarkupLMConfig (#19404)
    • @srhrshr
      • Frees LongformerTokenizer of the Roberta dependency (#19346)
      • Removes Roberta and Bert config dependencies from Longformer (#19343)
      • removes prophet config dependencies from xlm-prophet (#19400)
    • @sahamrit
      • [WIP] Add ZeroShotObjectDetectionPipeline (#18445) (#18930)
    • @Davidy22
      • Copy BertTokenizer dependency into retribert tokenizer (#19371)
    • @rchan26
      • Remove dependency of Bert from Squeezebert tokenizer (#19403)
      • Remove dependency of Roberta in Blenderbot (#19411)
    • @harry7337
      • Removed Bert and XML Dependency from Herbert (#19410)
    • @Infrared1029
      • Remove Dependency between Bart and LED (slow/fast) (#19408)
    • @Steboss89
      • Add Italian translation for add_new_model.mdx (#18713)
  • v4.22.2(Sep 27, 2022)

    Fixes a bug where a cached tokenizer or model was no longer accessible offline, whether offline mode was forced or the internet connection was unavailable.

    • More tests for regression in cached non existence by @sgugger in #19216
    • Fix cached_file in offline mode for cached non-existing files by @sgugger in #19206
    • Don't warn of move if cache is empty by @sgugger in #19109
  • v4.22.0(Sep 14, 2022)

    Swin Transformer v2

    The Swin Transformer V2 model was proposed in Swin Transformer V2: Scaling Up Capacity and Resolution by Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo.

    Swin Transformer v2 improves the original Swin Transformer using 3 main techniques: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.

    • Add swin transformer v2 by @nandwalritik in #17469

    VideoMAE

    The VideoMAE model was proposed in VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training by Zhan Tong, Yibing Song, Jue Wang, Limin Wang. VideoMAE extends masked auto encoders (MAE) to video, claiming state-of-the-art performance on several video classification benchmarks.

    VideoMAE is an extension of ViTMAE for video.

    • Add VideoMAE by @NielsRogge in #17821

    Donut

    The Donut model was proposed in OCR-free Document Understanding Transformer by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder to perform document understanding tasks such as document image classification, form understanding and visual question answering.

    • Add Donut by @NielsRogge in #18488

    Pegasus-X

    The PEGASUS-X model was proposed in Investigating Efficiently Extending Transformers for Long Input Summarization by Jason Phang, Yao Zhao and Peter J. Liu.

    PEGASUS-X (PEGASUS eXtended) extends the PEGASUS models for long input summarization through additional long input pretraining and using staggered block-local attention with global tokens in the encoder.

    • PEGASUS-X by @zphang in #18551

    X-CLIP

    The X-CLIP model was proposed in Expanding Language-Image Pretrained Models for General Video Recognition by Bolin Ni, Houwen Peng, Minghao Chen, Songyang Zhang, Gaofeng Meng, Jianlong Fu, Shiming Xiang, Haibin Ling. The model consists of a text encoder, a cross-frame vision encoder, a multi-frame integration Transformer, and a video-specific prompt generator.

    X-CLIP is a minimal extension of CLIP for video-language understanding.

    • Add X-CLIP by @NielsRogge in #18852

    ERNIE

    ERNIE is a series of powerful models proposed by Baidu, particularly strong on Chinese tasks, including ERNIE1.0, ERNIE2.0, ERNIE3.0, ERNIE-Gram, ERNIE-health, etc. These models were contributed by nghuyong, and the official code can be found in PaddleNLP (in PaddlePaddle).

    • ERNIE-2.0 and ERNIE-3.0 models by @nghuyong in #18686

    TensorFlow models

    MobileViT and LayoutLMv3 are now available in TensorFlow.

    • TensorFlow MobileViT by @sayakpaul in #18555
    • [LayoutLMv3] Add TensorFlow implementation by @ChrisFugl in #18678

    New task-specific architectures

    A new question answering head was added for the LayoutLM model.

    • Add LayoutLMForQuestionAnswering model by @ankrgyl in #18407

    New pipelines

    Two new pipelines are available in transformers: a document question answering pipeline, as well as an image to text generation pipeline.

    • Add DocumentQuestionAnswering pipeline by @ankrgyl in #18414
    • Add Image To Text Generation pipeline by @OlivierDehaene in #18821

    M1 support

    Pipelines and the Trainer now support Mac M1 hardware when running on PyTorch.

    • pipeline support for device="mps" (or any other string) by @julien-c in #18494
    • mac m1 mps integration by @pacman100 in #18598
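
    As a rough illustration of the device="mps" string mentioned in the first PR title above, the sketch below picks a device and passes it to a pipeline. The helper names pick_device and build_classifier are hypothetical and not part of transformers; the check relies on torch.backends.mps.is_available(), which is assumed to be present only in recent PyTorch releases, so the code falls back to "cpu" otherwise.

```python
def pick_device():
    """Return "mps" when PyTorch reports the Apple-silicon backend as usable, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed at all.
        return "cpu"
    # torch.backends.mps is absent in older PyTorch releases, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def build_classifier(device):
    """Hypothetical helper: create a sentiment-analysis pipeline on `device`.

    Calling this downloads a default model, so it is not invoked here.
    """
    from transformers import pipeline

    return pipeline("sentiment-analysis", device=device)


device = pick_device()
print(device)
```

    On an M1 machine with a suitable PyTorch build, build_classifier(device) would then place the pipeline on the mps device; elsewhere it runs on the CPU.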

    Backend version compatibility

    Starting from version v4.22.0, we officially support PyTorch and TensorFlow versions that were released up to two years ago. Versions more than two years old will not be supported going forward.

    We're making this change as we begin actively testing transformers compatibility on older versions. This project can be followed here.

    • PyTorch >= 1.7.0 and TensorFlow >= 2.4.0 by @sgugger in #19016
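
    The minimum versions in the PR title above can be checked with a small version gate. This is only a sketch: parse_version, MINIMUM and is_supported are hypothetical names, the parser only looks at the leading numeric components, and it is not how transformers itself performs the check.

```python
import re

# Minimum versions taken from the PR title above.
MINIMUM = {"torch": "1.7.0", "tensorflow": "2.4.0"}


def parse_version(version):
    """Turn "1.12.1" or "1.12.1+cu113" into a comparable tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    # A missing patch component (e.g. "2.4") is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_supported(backend, installed):
    """Return True if `installed` meets the minimum version for `backend`."""
    return parse_version(installed) >= parse_version(MINIMUM[backend])


print(is_supported("torch", "1.12.1+cu113"))
print(is_supported("tensorflow", "2.3.4"))
```

    Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" sorts before "1.7.0", which would wrongly reject a newer release.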

    Generate method updates

    The generate method now starts enforcing stronger validation in order to ensure proper usage.

    • Generate: validate model_kwargs (and catch typos in generate arguments) by @gante in #18261
    • Generate: validate model_kwargs on TF (and catch typos in generate arguments) by @gante in #18651
    • Generate: add model class validation by @gante in #18902
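
    To give an idea of what validating keyword arguments and catching typos can look like, here is a minimal sketch built on inspect.signature. It is an illustration under stated assumptions, not the validation code used by generate: validate_model_kwargs and toy_forward are hypothetical names, and toy_forward merely stands in for a model's forward method.

```python
import inspect


def validate_model_kwargs(forward_fn, model_kwargs):
    """Raise ValueError if a key in model_kwargs is not a parameter of forward_fn."""
    params = inspect.signature(forward_fn).parameters
    # A **kwargs parameter accepts any name, so nothing can be checked.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return
    unused = [key for key in model_kwargs if key not in params]
    if unused:
        raise ValueError(f"These model_kwargs are not used by the model: {unused}")


def toy_forward(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model's forward method."""
    return input_ids


# A correctly spelled argument passes silently.
validate_model_kwargs(toy_forward, {"attention_mask": [1, 1]})

# A misspelled argument is reported instead of being silently ignored.
try:
    validate_model_kwargs(toy_forward, {"attention_maks": [1, 1]})
except ValueError as err:
    print(err)
```

    The point of failing loudly is that a misspelled argument would otherwise be dropped without any visible effect on the output.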

    API changes

    The as_target_tokenizer and as_target_processor context managers have been deprecated. The new API is to call the tokenizer/processor directly (its __call__ method) with keyword arguments. For instance:

    with tokenizer.as_target_tokenizer():
        encoded_labels = tokenizer(labels, padding=True)
    

    becomes

    encoded_labels = tokenizer(text_target=labels, padding=True)
    
    • Replace as_target context managers by direct calls by @sgugger in #18325

    Bits and bytes integration

    Bits and bytes (bitsandbytes) is now integrated within transformers. This feature can reduce the size of large models by up to a factor of 2, with low loss in precision.

    • Supporting seq2seq models for bitsandbytes integration by @younesbelkada in #18579
    • bitsandbytes - Linear8bitLt integration into transformers models by @younesbelkada in #17901
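
    The factor of 2 can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes a 16-bit baseline (2 bytes per weight) quantized to 8-bit (1 byte per weight), as the Linear8bitLt name in the PR title suggests; the 3-billion parameter count and the helper weight_memory_gib are hypothetical and only count the weights themselves.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory taken by the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


num_params = 3_000_000_000  # hypothetical 3B-parameter model

fp16 = weight_memory_gib(num_params, 2)  # assumed baseline: 16-bit weights
int8 = weight_memory_gib(num_params, 1)  # assumed target: 8-bit weights

print(f"fp16: {fp16:.2f} GiB, int8: {int8:.2f} GiB, ratio: {fp16 / int8:.1f}x")
```

    The saving is an upper bound under these assumptions: any weights that stay in higher precision would make the real reduction smaller.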

    Large model support

    Models that have sharded checkpoints in PyTorch can be loaded in Flax.

    • Load sharded pt to flax by @ArthurZucker in #18419

    TensorFlow improvements

    The TensorFlow examples have been rewritten to support all recent features developed in the past months.

    • TF Examples Rewrite by @Rocketknight1 in #18451

    DeBERTa-v2 is now trainable with XLA.

    • TF: XLA-trainable DeBERTa v2 by @gante in #18546

    Documentation changes

    • Split model list on modality by @stevhliu in #18328

    Improvements and bugfixes

    • sentencepiece shouldn't be required for the fast LayoutXLM tokenizer by @LysandreJik in #18320
    • Fix sacremoses sof dependency for Transformers XL by @sgugger in #18321
    • Owlvit test fixes by @alaradirik in #18303
    • [Flax] Fix incomplete batches in example scripts by @sanchit-gandhi in #17863
    • start from 1.12, torch_ccl is renamed as oneccl_bindings_for_pytorch … by @sywangyi in #18229
    • Update feature extractor docs by @stevhliu in #18324
    • fixed typo by @banda-larga in #18331
    • updated translation by @banda-larga in #18333
    • Updated _toctree.yml by @nickprock in #18337
    • Update automatic_speech_recognition.py by @bofenghuang in #18339
    • Fix codeparrot deduplication - ignore whitespaces by @loubnabnl in #18023
    • Remove Flax OPT from doctest for now by @ydshieh in #18338
    • Include tensorflow-aarch64 as a candidate by @ankrgyl in #18345
    • [BLOOM] Deprecate position_ids by @thomasw21 in #18342
    • Migrate metric to Evaluate library for tensorflow examples by @VijayKalmath in #18327
    • Migrate metrics used in flax examples to Evaluate by @VijayKalmath in #18348
    • [Docs] Fix Speech Encoder Decoder doc sample by @sanchit-gandhi in #18346
    • Fix OwlViT torchscript tests by @ydshieh in #18347
    • Fix some doctests by @ydshieh in #18359
    • [FX] Symbolic trace for Bloom by @michaelbenayoun in #18356
    • Fix TFSegformerForSemanticSegmentation doctest by @ydshieh in #18362
    • fix FSDP ShardedGradScaler by @pacman100 in #18358
    • Migrate metric to Evaluate in Pytorch examples by @atturaioe in #18369
    • Correct the spelling of bleu metric by @ToluClassics in #18375
    • Remove pt-like calls on tf tensor by @amyeroberts in #18393
    • Fix from_pretrained kwargs passing by @YouJiacheng in #18387
    • Add a check regarding the number of occurrences of ``` by @ydshieh in #18389
    • Add evaluate to test dependencies by @sgugger in #18396
    • Fix OPT doc tests by @ArthurZucker in #18365
    • Fix doc tests by @NielsRogge in #18397
    • Add balanced strategies for device_map in from_pretrained by @sgugger in #18349
    • Fix docs by @NielsRogge in #18399
    • Adding fine-tuning models to LUKE by @ikuyamada in #18353
    • Fix ROUGE add example check and update README by @sgugger in #18398
    • Add Flax BART pretraining script by @duongna21 in #18297
    • Rewrite push_to_hub to use upload_files by @sgugger in #18366
    • Layoutlmv2 tesseractconfig by @kelvinAI in #17733
    • fix: create a copy for tokenizer object by @YBooks in #18408
    • Fix uninitialized parameter in conformer relative attention. by @PiotrDabkowski in #18368
    • Fix the hub user name in a longformer doctest checkpoint by @ydshieh in #18418
    • Change audio kwarg to images in TROCR processor by @ydshieh in #18421
    • update maskformer docs by @alaradirik in #18423
    • Fix test_load_default_pipelines_tf test error by @ydshieh in #18422
    • fix run_clip README by @ydshieh in #18332
    • Improve generate docstring by @JoaoLages in #18198
    • Accept trust_remote_code and ignore it in PreTrainedModel.from_pretrained by @ydshieh in #18428
    • Update pipeline word heuristic to work with whitespace in token offsets by @davidbenton in #18402
    • Add programming languages by @cakiki in #18434
    • fixing error when using sharded ddp by @pacman100 in #18435
    • Update _toctree.yml by @stevhliu in #18440
    • support ONNX export of XDropout in deberta{,_v2} and sew_d by @garymm in #17502
    • Add Spanish translation of run_scripts.mdx by @donelianc in #18415
    • Update no trainer scripts for language modeling and image classification examples by @nandwalritik in #18443
    • Update pinned hhub version by @osanseviero in #18448
    • Fix failing tests for XLA generation in TF by @dsuess in #18298
    • add zero-shot obj detection notebook to docs by @alaradirik in #18453
    • fix: keras fit tests for segformer tf and minor refactors. by @sayakpaul in #18412
    • Fix torch version comparisons by @LSinev in #18460
    • [BLOOM] Clean modeling code by @thomasw21 in #18344
    • change shape to support dynamic batch input in tf.function XLA generate for tf serving by @nlpcat in #18372
    • HFTracer.trace can now take callables and torch.nn.Module by @michaelbenayoun in #18457
    • Update no trainer scripts for multiple-choice by @kiansierra in #18468
    • Fix load of model checkpoints in the Trainer by @sgugger in #18470
    • Add FX support for torch.baddbmm andd torch.Tensor.baddbmm by @thomasw21 in #18363
    • Add machine type in the artifact of Examples directory job by @ydshieh in #18459
    • Update no trainer examples for QA and Semantic Segmentation by @kiansierra in #18474
    • Add TF_MODEL_FOR_SEMANTIC_SEGMENTATION_MAPPING by @ydshieh in #18469
    • Fixing issue where generic model types wouldn't load properly with the pipeline by @Narsil in #18392
    • Fix TFSwinSelfAttention to have relative position index as non-trainable weight by @harrydrippin in #18226
    • Refactor TFSwinLayer to increase serving compatibility by @harrydrippin in #18352
    • Add TF prefix to TF-Res test class by @ydshieh in #18481
    • Remove py.typed by @sgugger in #18485
    • Fix pipeline tests by @sgugger in #18487
    • Use new huggingface_hub tools for download models by @sgugger in #18438
    • Fix test_dbmdz_english by updating expected values by @ydshieh in #18482
    • Move cache folder to huggingface/hub for consistency with hf_hub by @sgugger in #18492
    • Update some expected values in quicktour.mdx for resampy 0.3.0 by @ydshieh in #18484
    • disable Onnx test for google/long-t5-tglobal-base by @ydshieh in #18454
    • Typo reported by Joel Grus on TWTR by @julien-c in #18493
    • Just re-reading the whole doc every couple of months 😬 by @julien-c in #18489
    • transformers-cli login => huggingface-cli login by @julien-c in #18490
    • Add seed setting to image classification example by @regisss in #18519
    • [DX fix] Fixing QA pipeline streaming a dataset. by @Narsil in #18516
    • Clean up hub by @sgugger in #18497
    • update fsdp docs by @pacman100 in #18521
    • Fix compatibility with 1.12 by @sgugger in #17925
    • Specify en in doc-builder README example by @ankrgyl in #18526
    • New cache fixes: add safeguard before looking in folders by @sgugger in #18522
    • unpin resampy by @ydshieh in #18527
    • ✨ update to use interlibrary links instead of Markdown by @stevhliu in #18500
    • Add example of multimodal usage to pipeline tutorial by @stevhliu in #18498
    • [VideoMAE] Add model to doc tests by @NielsRogge in #18523
    • Update perf_train_gpu_one.mdx by @mishig25 in #18532
    • Update no_trainer.py scripts to include accelerate gradient accumulation wrapper by @Rasmusafj in #18473
    • Add Spanish translation of converting_tensorflow_models.mdx by @donelianc in #18512
    • Spanish translation of summarization.mdx by @AguilaCudicio in #15947
    • Let's not cast them all by @younesbelkada in #18471
    • fix: data2vec-vision Onnx ready-made configuration. by @NikeNano in #18427
    • Add mt5 onnx config by @ChainYo in #18394
    • Minor update of run_call_with_unpacked_inputs by @ydshieh in #18541
    • BART - Fix attention mask device issue on copied models by @younesbelkada in #18540
    • Adding a new align_to_words param to qa pipeline. by @Narsil in #18010
    • 📝 update metric with evaluate by @stevhliu in #18535
    • Restore _init_weights value in no_init_weights by @YouJiacheng in #18504
    • 📝 update documentation build section by @stevhliu in #18548
    • Preserve hub-related kwargs in AutoModel.from_pretrained by @sgugger in #18545
    • Use commit hash to look in cache instead of calling head by @sgugger in #18534
    • Update philosophy to include other preprocessing classes by @stevhliu in #18550
    • Properly move cache when it is not in default path by @sgugger in #18563
    • Adds CLIP to models exportable with ONNX by @unography in #18515
    • raise atol for MT5OnnxConfig by @ydshieh in #18560
    • fix string by @mrwyattii in #18568
    • Segformer TF: fix output size in documentation by @joihn in #18572
    • Fix resizing bug in OWL-ViT by @alaradirik in #18573
    • Fix LayoutLMv3 documentation by @pocca2048 in #17932
    • Change BartLearnedPositionalEmbedding's forward method signature to support Opacus training by @donebydan in #18486
    • german docs translation by @flozi00 in #18544
    • Deberta V2: Fix critical trace warnings to allow ONNX export by @iiLaurens in #18272
    • [FX] _generate_dummy_input supports audio-classification models for labels by @michaelbenayoun in #18580
    • Fix docstrings with last version of hf-doc-builder styler by @sgugger in #18581
    • fix owlvit tests, update docstring examples by @alaradirik in #18586
    • Return the permuted hidden states if return_dict=True by @amyeroberts in #18578
    • Add type hints for ViLT models by @donelianc in #18577
    • update doc for perf_train_cpu_many, add intel mpi introduction by @sywangyi in #18576
    • typos by @stas00 in #18594
    • FSDP bug fix for load_state_dict by @pacman100 in #18596
    • Add TFAutoModelForSemanticSegmentation to the main __init__.py by @ydshieh in #18600
    • Fix URLs by @NielsRogge in #18604
    • Update BLOOM parameter counts by @Muennighoff in #18531
    • [doc] fix anchors by @stas00 in #18591
    • [fsmt] deal with -100 indices in decoder ids by @stas00 in #18592
    • small change by @younesbelkada in #18584
    • Flax Remat for LongT5 by @KMFODA in #17994
    • Change scheduled CIs to use torch 1.12.1 by @ydshieh in #18644
    • Add checks for some workflow jobs by @ydshieh in #18583
    • TF: Fix generation repetition penalty with XLA by @gante in #18648
    • Update longt5.mdx by @flozi00 in #18634
    • Update run_translation_no_trainer.py by @zhoutang776 in #18637
    • [bnb] Minor modifications by @younesbelkada in #18631
    • Examples: add Bloom support for token classification by @stefan-it in #18632
    • Fix Yolos ONNX export test by @ydshieh in #18606
    • Fix matmul inputs dtype by @JingyaHuang in #18585
    • Update feature extractor methods to enable type cast before normalize by @amyeroberts in #18499
    • Allow users to force TF availability by @Rocketknight1 in #18650
    • [LongT5] Correct docs long t5 by @patrickvonplaten in #18669
    • Generate: validate model_kwargs on FLAX (and catch typos in generate arguments) by @gante in #18653
    • Ping detectron2 for CircleCI tests by @ydshieh in #18680
    • Rename method to avoid clash with property by @amyeroberts in #18677
    • Rename second input dimension from "sequence" to "num_channels" for CV models by @regisss in #17976
    • Fix repo consistency by @lewtun in #18682
    • Fix breaking change in onnxruntime for ONNX quantization by @severinsimmler in #18336
    • Add evaluate to examples requirements by @muellerzr in #18666
    • [bnb] Move documentation by @younesbelkada in #18671
    • Add an examples folder for code downstream tasks by @loubnabnl in #18679
    • model.tie_weights() should be applied after accelerator.prepare() by @Gladiator07 in #18676
    • Generate: add missing **model_kwargs in sample tests by @gante in #18696
    • Temp fix for broken detectron2 import by @patrickvonplaten in #18699
    • [Hotfix] pin detectron2 5aeb252 to avoid test fix by @ydshieh in #18701
    • Fix Data2VecVision ONNX test by @ydshieh in #18587
    • Add missing tokenizer tests - Longformer by @tgadeliya in #17677
    • remove check for main process for trackers initialization by @Gladiator07 in #18706
    • Unpin detectron2 by @ydshieh in #18727
    • Removing warning of model type for microsoft/tapex-base-finetuned-wtq by @Narsil in #18711
    • improve add_tokens docstring by @SaulLu in #18687
    • CLI: Don't check the model head when there is no model head by @gante in #18733
    • Update perf_infer_gpu_many.mdx by @mishig25 in #18744
    • Add minor doc-string change to include hp_name param in hyperparameter_search by @constantin-huetterer in #18700
    • fix pipeline_tutorial.mdx doctest by @ydshieh in #18717
    • Add TF implementation of XGLMModel by @stancld in #16543
    • fixed docstring typos by @JadeKim042386 in #18739
    • add warning to let the user know that the __call__ method is faster than encode + pad for a fast tokenizer by @SaulLu in #18693
    • examples/run_summarization_no_trainer: fixed incorrect param to hasattr by @rahular in #18720
    • Add ONNX support for Longformer by @deutschmn in #17176
    • Determine framework automatically before ONNX export by @rachthree in #18615
    • streamlining 'checkpointing_steps' parsing by @rahular in #18755
    • CLI: Improved error control and updated hub requirement by @gante in #18752
    • [VisionEncoderDecoder] Add gradient checkpointing by @patrickvonplaten in #18697
    • [Wav2vec2 + LM Test] Improve wav2vec2 with lm tests and make torch version dependent for now by @patrickvonplaten in #18749
    • Fix incomplete outputs of FlaxBert by @duongna21 in #18772
    • Fix broken link DeepSpeed documentation link by @philschmid in #18783
    • fix missing block when there is no failure by @ydshieh in #18775
    • fix a possible typo in auto feature extraction by @fcakyon in #18779
    • Fix memory leak issue in torch_fx tests by @ydshieh in #18547
    • Fix mock in test_cached_files_are_used_when_internet_is_down by @Wauplin in #18804
    • Add SegFormer and ViLT links by @NielsRogge in #18808
    • send model to the correct device by @ydshieh in #18800
    • Revert to and safely handle flag in owlvit config by @amyeroberts in #18750
    • Add docstring for BartForCausalLM by @ekagra-ranjan in #18795
    • up by @qqaatw in #18805
    • [Swin, Swinv2] Fix attn_mask dtype by @NielsRogge in #18803
    • Run tests if skip condition not met by @amyeroberts in #18764
    • Remove ViltForQuestionAnswering from check_repo by @NielsRogge in #18762
    • Adds OWLViT to models exportable with ONNX by @unography in #18588
    • Adds GroupViT to models exportable with ONNX by @unography in #18628
    • LayoutXLMProcessor: ensure 1-to-1 mapping between samples and images, and add test for it by @anthony2261 in #18774
    • Added Docstrings for Deberta and DebertaV2 [PyTorch] by @Tegzes in #18610
    • Improving the documentation for "word", within the pipeline. by @Narsil in #18763
    • Disable nightly CI temporarily by @ydshieh in #18820
    • Pin max tf version by @gante in #18818
    • Fix cost condition in DetrHungarianMatcher and YolosHungarianMatcher to allow zero-cost by @kongzii in #18647
    • oob performance improvement for cpu DDP by @sywangyi in #18595
    • Warn on TPUs when the custom optimizer and model device are not the same by @muellerzr in #18668
    • Update location identification by @LysandreJik in #18834
    • fix bug: register_for_auto_class should be defined on TFPreTrainedModel instead of TFSequenceSummary by @azonti in #18607
    • [DETR] Add num_channels attribute by @NielsRogge in #18714
    • Pin ffspec by @sgugger in #18837
    • Improve GPT2 doc by @ekagra-ranjan in #18787
    • Add an option to HfArgumentParser.parse_{dict,json_file} to raise an Exception when there extra keys by @FelixSchneiderZoom in #18692
    • Improve Text Generation doc by @ekagra-ranjan in #18788
    • Add SegFormer ONNX support by @NielsRogge in #18006
    • Add security warning about the from_pretrained() method by @lewtun in #18801
    • Owlvit memory leak fix by @alaradirik in #18734
    • Create pipeline_tutorial.mdx german docs by @flozi00 in #18625
    • Unpin fsspec by @albertvillanova in #18846
    • Delete state_dict to release memory as early as possible by @ydshieh in #18832
    • Generate: smaller TF serving test by @gante in #18840
    • add a script to get time info. from GA workflow jobs by @ydshieh in #18822
    • Pin rouge_score by @albertvillanova in #18247
    • Minor typo in prose of model outputs documentation. by @pcuenca in #18848
    • reflect max_new_tokens in Seq2SeqTrainer by @kumapo in #18786
    • Adds timeout argument to training_args to avoid socket timeouts in DDP by @gugarosa in #18562
    • Cache results of is_torch_tpu_available() by @comaniac in #18777
    • Tie weights after preparing the model in run_clm by @sgugger in #18855
    • Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests by @ankrgyl in #18854
    • Split docs on modality by @stevhliu in #18205
    • if learning rate is a tensor, get item (float) by @kmckiern in #18861
    • Fix naming issue with ImageToText pipeline by @OlivierDehaene in #18864
    • [LayoutLM] Add clarification to docs by @NielsRogge in #18716
    • Add OWL-ViT to the appropriate section by @NielsRogge in #18867
    • Clean up utils.hub using the latest from hf_hub by @sgugger in #18857
    • pin Slack SDK to 3.18.1 to avoid failing issue by @ydshieh in #18869
    • Fix number of examples for iterable datasets in multiprocessing by @sgugger in #18856
    • postpone bnb load until it's needed by @stas00 in #18859
    • A script to download artifacts and perform CI error statistics by @ydshieh in #18865
    • Remove cached torch_extensions on CI runners by @ydshieh in #18868
    • Update docs landing page by @stevhliu in #18590
    • Finetune guide for semantic segmentation by @stevhliu in #18640
    • Add Trainer to quicktour by @stevhliu in #18723
    • TF: TFMarianMTModel final logits bias as a layer by @gante in #18833
    • Mention TF and Flax checkpoints by @LysandreJik in #18894
    • Correct naming pegasus x by @patrickvonplaten in #18896
    • Update perf_train_gpu_one.mdx by @thepurpleowl in #18442
    • Add type hints to XLM-Roberta-XL models by @asofiaoliveira in #18475
    • Update Chinese documentation by @zkep in #18893
    • Generate: get the correct beam index on eos token by @gante in #18851
    • Mask t5 relative position bias then head pruned by @hadaev8 in #17968
    • updating gather function with gather_for_metrics in run_wav2vec2_pretraining by @arun99481 in #18877
    • Fix decode_input_ids to bare T5Model and improve doc by @ekagra-ranjan in #18791
    • Fix test_tf_encode_plus_sent_to_model for LayoutLMv3 by @ydshieh in #18898
    • fixes bugs to handle non-dict output by @alaradirik in #18897
    • Further reduce the number of calls to head for cached objects by @sgugger in #18871
    • unpin slack_sdk version by @ydshieh in #18901
    • Fix incorrect size of input for 1st strided window length in Perplexity of fixed-length models by @ekagra-ranjan in #18906
    • [VideoMAE] Improve code examples by @NielsRogge in #18919
    • Add checks for more workflow jobs by @ydshieh in #18905
    • Accelerator end training by @nbroad1881 in #18910
    • update the train_batch_size in case HPO change batch_size_per_device by @sywangyi in #18918
    • Update TF fine-tuning docs by @Rocketknight1 in #18654
    • TF: final bias as a layer in seq2seq models (replicate TFMarian fix) by @gante in #18903
    • remove _create_and_check_torch_fx_tracing in specific test files by @ydshieh in #18667
    • [DeepSpeed ZeRO3] Fix performance degradation in sharded models by @tjruwase in #18911
    • pin TF 2.9.1 for self-hosted CIs by @ydshieh in #18925
    • Fix XLA fp16 and bf16 error checking by @ymwangg in #18913
    • Starts on a list of external deps required for dev by @colindean in #18929
    • Add image height and width to ONNX dynamic axes by @lewtun in #18915
    • Skip some doctests in quicktour by @stevhliu in #18927
    • Fix LayoutXLM wrong link in README by @Devlee247 in #18932
    • Update translation requests contact by @NimaBoscarino in #18941
    • [JAX] Replace all jax.tree_* calls with jax.tree_util.tree_* by @sanchit-gandhi in #18361
    • Neptune.ai integration improvements by @Raalsky in #18934
    • Generate: Simplify is_pad_token_not_equal_to_eos_token_id by @ekagra-ranjan in #18933
    • Fix train_step, test_step and tests for CLIP by @Rocketknight1 in #18684
    • Exit early in load if no weights are in the sharded state dict by @sgugger in #18937
    • update black target version by @BramVanroy in #18955
    • RFC: Replace custom TF embeddings by Keras embeddings by @gante in #18939
    • TF: unpin maximum TF version by @gante in #18917
    • Revert "TF: unpin maximum TF version (#18917)" by @sgugger
    • remove unused activation dropout by @shijie-wu in #18842
    • add DDP HPO support for sigopt by @sywangyi in #18931
    • Remove decoder_position_ids from check_decoder_model_past_large_inputs by @ydshieh in #18980
    • create Past CI results as tables for GitHub issue by @ydshieh in #18953
    • Remove dropout in embedding layer of OPT by @shijie-wu in #18845
    • Fix TF start docstrings by @Rocketknight1 in #18991
    • Align try_to_load_from_cache with huggingface_hub by @sgugger in #18966
    • Fix tflongformer int dtype by @Rocketknight1 in #18907
    • TF: correct TFBart embeddings weights name when load_weight_prefix is passed by @gante in #18993
    • fix checkpoint name for wav2vec2 conformer by @ydshieh in #18994
    • added type hints by @daspartho in #18996
    • TF: TF 2.10 unpin + related onnx test skips by @gante in #18995
    • Fixed typo by @tnusser in #18921
    • Removed issue in wav2vec link by @chrisemezue in #18945
    • Fix MaskFormerFeatureExtractor instance segmentation preprocessing bug by @alaradirik in #18997
    • Add type hints for M2M by @daspartho in #18998
    • Fix tokenizer for XLMRobertaXL by @ydshieh in #19004
    • Update default revision for document-question-answering by @ankrgyl in #18938
    • Fixed bug which caused overwrite_cache to always be True by @rahular in #19000
    • add DDP HPO support for optuna by @sywangyi in #19002
    • add missing require_tf for TFOPTGenerationTest by @ydshieh in #19010
    • Re-add support for single url files in objects download by @sgugger in #19014

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @nandwalritik
      • Add swin transformer v2 (#17469)
      • Update no trainer scripts for language modeling and image classification examples (#18443)
    • @ankrgyl
      • Include tensorflow-aarch64 as a candidate (#18345)
      • Specify en in doc-builder README example (#18526)
      • Add LayoutLMForQuestionAnswering model (#18407)
      • Pin revision for LayoutLMForQuestionAnswering and TFLayoutLMForQuestionAnswering tests (#18854)
      • Add DocumentQuestionAnswering pipeline (#18414)
      • Update default revision for document-question-answering (#18938)
    • @ikuyamada
      • Adding fine-tuning models to LUKE (#18353)
    • @duongna21
      • Add Flax BART pretraining script (#18297)
      • Fix incomplete outputs of FlaxBert (#18772)
    • @donelianc
      • Add Spanish translation of run_scripts.mdx (#18415)
      • Add Spanish translation of converting_tensorflow_models.mdx (#18512)
      • Add type hints for ViLT models (#18577)
    • @sayakpaul
      • fix: keras fit tests for segformer tf and minor refactors. (#18412)
      • TensorFlow MobileViT (#18555)
    • @flozi00
      • german docs translation (#18544)
      • Update longt5.mdx (#18634)
      • Create pipeline_tutorial.mdx german docs (#18625)
    • @stancld
      • Add TF implementation of XGLMModel (#16543)
    • @ChrisFugl
      • [LayoutLMv3] Add TensorFlow implementation (#18678)
    • @zphang
      • PEGASUS-X (#18551)
    • @nghuyong
      • add task_type_id to BERT to support ERNIE-2.0 and ERNIE-3.0 models (#18686)
  • v4.21.3(Sep 5, 2022)

  • v4.21.2(Aug 24, 2022)

  • v4.21.1(Aug 4, 2022)

  • v4.21.0(Jul 27, 2022)

    TensorFlow XLA Text Generation

    The TensorFlow text generation method can now be wrapped with tf.function and compiled to XLA. You should be able to achieve up to 100x speedup this way. See our blog post and our benchmarks. You can also see XLA generation in action in our example notebooks, particularly for summarization and translation.

    import tensorflow as tf
    from transformers import AutoTokenizer, TFAutoModelForSeq2SeqLM
    
    tokenizer = AutoTokenizer.from_pretrained("t5-small")
    model = TFAutoModelForSeq2SeqLM.from_pretrained("t5-small")
    
    # Main changes with respect to the original generate workflow: `tf.function` and `pad_to_multiple_of`
    xla_generate = tf.function(model.generate, jit_compile=True)
    tokenization_kwargs = {"pad_to_multiple_of": 32, "padding": True, "return_tensors": "tf"}
    
    # The first prompt will be slow (compiling), the others will be very fast!
    input_prompts = [
        f"translate English to {language}: I have four cats and three dogs."
        for language in ["German", "French", "Romanian"]
    ]
    for input_prompt in input_prompts:
        tokenized_inputs = tokenizer([input_prompt], **tokenization_kwargs)
        generated_text = xla_generate(**tokenized_inputs, max_new_tokens=32)
        print(tokenizer.decode(generated_text[0], skip_special_tokens=True))
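
    Why `pad_to_multiple_of` matters in the snippet above: an XLA-compiled function is specialized to the shapes of its inputs, so a prompt with a new padded length triggers a fresh (slow) compilation. Rounding every prompt up to a multiple of 32 means prompts of many different lengths share a handful of shapes. The arithmetic can be sketched in plain Python; `padded_length` is a hypothetical helper written for this illustration and is not part of the library.

```python
def padded_length(num_tokens: int, multiple: int = 32) -> int:
    """Round a token count up to the nearest multiple (hypothetical helper)."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    # Ceiling division, then scale back up to a token count.
    return -(-num_tokens // multiple) * multiple


# Six different prompt lengths collapse into only two distinct padded shapes,
# so at most two compilations would be needed instead of six.
prompt_lengths = [5, 17, 31, 32, 33, 60]
shapes = sorted({padded_length(n) for n in prompt_lengths})
print(shapes)
```

    A larger multiple gives fewer distinct shapes (fewer compilations) at the cost of more padding tokens per prompt.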
    
    • Generate: deprecate default max_length by @gante in #18018
    • TF: GPT-J compatible with XLA generation by @gante in #17986
    • TF: T5 can now handle a padded past (i.e. XLA generation) by @gante in #17969
    • TF: XLA beam search + most generation-compatible models are now also XLA-generate-compatible by @gante in #17857
    • TF: generate without tf.TensorArray by @gante in #17801
    • TF: BART compatible with XLA generation by @gante in #17479

    New model additions

    OwlViT

    The OWL-ViT model (short for Vision Transformer for Open-World Localization) was proposed in Simple Open-Vocabulary Object Detection with Vision Transformers by Matthias Minderer, Alexey Gritsenko, Austin Stone, Maxim Neumann, Dirk Weissenborn, Alexey Dosovitskiy, Aravindh Mahendran, Anurag Arnab, Mostafa Dehghani, Zhuoran Shen, Xiao Wang, Xiaohua Zhai, Thomas Kipf, and Neil Houlsby. OWL-ViT is an open-vocabulary object detection network trained on a variety of (image, text) pairs. It can be used to query an image with one or multiple text queries to search for and detect target objects described in text.

    • Add OWL-ViT model for zero-shot object detection by @alaradirik in #17938
    • Fix OwlViT tests by @sgugger in #18253
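
    A minimal sketch of querying an image with text, assuming PyTorch is installed. The checkpoint name `google/owlvit-base-patch32` and the helper name `detect_with_text_queries` are assumptions made for this example and do not appear in the notes above.

```python
def detect_with_text_queries(image, queries, checkpoint="google/owlvit-base-patch32"):
    """Score every predicted box against every text query for a single image.

    `image` is a PIL image, `queries` a list of strings such as
    ["a photo of a cat", "a photo of a dog"]. The checkpoint is an assumed default.
    """
    # Imported lazily so that defining this function needs no heavy dependencies.
    import torch
    from transformers import OwlViTForObjectDetection, OwlViTProcessor

    processor = OwlViTProcessor.from_pretrained(checkpoint)
    model = OwlViTForObjectDetection.from_pretrained(checkpoint)

    # The processor expects one inner list of text queries per image in the batch.
    inputs = processor(text=[queries], images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # logits: (batch, num_boxes, num_queries); pred_boxes: (batch, num_boxes, 4), normalized.
    scores = outputs.logits.sigmoid()
    return scores[0], outputs.pred_boxes[0]
```

    Thresholding the returned scores per query gives the boxes that match each text description; the processor also offers post-processing helpers whose names vary between library versions.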

    NLLB

    The NLLB model was presented in No Language Left Behind: Scaling Human-Centered Machine Translation by Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, and Jeff Wang. No Language Left Behind (NLLB) is a model capable of delivering high-quality translations directly between any pair of 200+ languages — including low-resource languages like Asturian, Luganda, Urdu and more.

    • [M2M100] update conversion script by @patil-suraj in #17916
    • NLLB tokenizer by @LysandreJik in #18126
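
    A hedged sketch of translating with an NLLB checkpoint, assuming PyTorch is installed. The checkpoint name `facebook/nllb-200-distilled-600M`, the helper name `translate`, and the default language codes are assumptions chosen for illustration; languages are identified by FLORES-200 style codes such as `eng_Latn` and `fra_Latn`.

```python
def translate(text, src_lang="eng_Latn", tgt_lang="fra_Latn",
              checkpoint="facebook/nllb-200-distilled-600M"):
    """Translate `text` from `src_lang` to `tgt_lang` (assumed checkpoint and codes)."""
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # The source language is set on the tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, src_lang=src_lang)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    inputs = tokenizer(text, return_tensors="pt")
    # The target language is selected by forcing its code as the first generated token.
    generated = model.generate(
        **inputs,
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_new_tokens=64,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```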

    MobileViT

    The MobileViT model was proposed in MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer by Sachin Mehta and Mohammad Rastegari. MobileViT introduces a new layer that replaces local processing in convolutions with global processing using transformers.

    • add MobileViT model by @hollance in #17354

    Nezha

    The Nezha model was proposed in NEZHA: Neural Contextualized Representation for Chinese Language Understanding by Junqiu Wei et al. NEZHA is a language model based on BERT with a collection of proven improvements, which include Functional Relative Positional Encoding as an effective positional encoding scheme, Whole Word Masking strategy, Mixed Precision Training and the LAMB Optimizer in training the models.

    • Nezha Pytorch implementation by @sijunhe in #17776

    GroupViT

    The GroupViT model was proposed in GroupViT: Semantic Segmentation Emerges from Text Supervision by Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang. Inspired by CLIP, GroupViT is a vision-language model that can perform zero-shot semantic segmentation on any given vocabulary categories.

    • Adding GroupViT Models by @xvjiarui in #17313

    MVP

    The MVP model was proposed in MVP: Multi-task Supervised Pre-training for Natural Language Generation by Tianyi Tang, Junyi Li, Wayne Xin Zhao and Ji-Rong Wen. MVP is a generative language model, pre-trained on a labeled pre-training corpus from 45 datasets over seven generation tasks. For each task, the model is further pre-trained using specific soft prompts to stimulate the model capacity in performing a specific task.

    • Add MVP model by @StevenTang1998 in #17787

    CodeGen

    The CodeGen model was proposed in A Conversational Paradigm for Program Synthesis by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, and Caiming Xiong. CodeGen is an autoregressive language model for program synthesis trained sequentially on The Pile, BigQuery, and BigPython.

    • Add CodeGen model by @rooa in #17443
    • [CodeGen] support device_map="auto" for sharded checkpoints by @patil-suraj in #17871

    UL2

    The UL2 model was presented in Unifying Language Learning Paradigms by Yi Tay, Mostafa Dehghani, Vinh Q. Tran, Xavier Garcia, Dara Bahri, Tal Schuster, Huaixiu Steven Zheng, Neil Houlsby, Donald Metzler. UL2 is a unified framework for pretraining models that are universally effective across datasets and setups. UL2 uses Mixture-of-Denoisers (MoD), a pre-training objective that combines diverse pre-training paradigms together. UL2 introduces a notion of mode switching, wherein downstream fine-tuning is associated with specific pre-training schemes.

    • Add UL2 (just docs) by @patrickvonplaten in #17740

    Custom pipelines

    This adds the ability to support custom pipelines on the Hub and share them with everyone else. As with the code-on-the-Hub feature for models, tokenizers, etc., the user has to pass trust_remote_code=True to use one. Apart from this, the best way to get familiar with the feature is to look at the added documentation.

    • Custom pipeline by @sgugger in #18079
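
    Loading such a pipeline could look like the sketch below. The helper name `load_custom_pipeline` is hypothetical and `repo_id` is a placeholder for a Hub repository that ships its own pipeline code; no specific repository is implied by these notes.

```python
def load_custom_pipeline(repo_id: str):
    """Load a pipeline whose implementation lives in the Hub repo `repo_id`.

    `repo_id` is a placeholder argument. Passing trust_remote_code=True runs
    Python code downloaded from that repository, so only use repos you trust.
    """
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import pipeline

    return pipeline(model=repo_id, trust_remote_code=True)
```

    Because the flag executes code from the repository on your machine, reviewing that code first is a sensible precaution.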

    PyTorch to TensorFlow CLI utility

    This adds a CLI to convert PT weights into TF weights, validate them, and (optionally) open a PR.

    • CLI: tool to convert PT into TF weights and open hub PR by @gante in https://github.com/huggingface/transformers/pull/17497

    TensorFlow-specific improvements

    The following models have been ported to be used in TensorFlow: SegFormer, DeiT, ResNet and RegNet.

    • [SegFormer] TensorFlow port by @sayakpaul in #17910
    • Add TF DeiT implementation by @amyeroberts in #17806
    • Add TF ResNet model by @amyeroberts in #17427
    • TF implementation of RegNets by @ariG23498 in #17554

    Additionally, our TF models now support loading sharded checkpoints:

    • TF Sharded by @ArthurZucker in #17713
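
    The idea behind a sharded checkpoint can be sketched in plain Python: weights are split across several files that each stay under a size limit, and an index records which file holds which weight so a loader can fetch them shard by shard. This is a simplified illustration, not the library's implementation; `shard_by_size` and the file-name pattern are hypothetical.

```python
def shard_by_size(weight_sizes, max_shard_bytes):
    """Greedily group weights into shards and return a weight -> file-name index.

    `weight_sizes` maps weight names to sizes in bytes. Hypothetical helper:
    the real library has its own sharding logic and file naming.
    """
    shards = [[]]
    current = 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if shards[-1] and current + size > max_shard_bytes:
            shards.append([])
            current = 0
        shards[-1].append(name)
        current += size

    total = len(shards)
    weight_map = {}
    for i, names in enumerate(shards, start=1):
        filename = f"shard-{i:05d}-of-{total:05d}.bin"  # hypothetical naming
        for name in names:
            weight_map[name] = filename
    return weight_map


sizes = {"embed": 40, "layer0": 30, "layer1": 30, "head": 40}
index = shard_by_size(sizes, max_shard_bytes=70)
print(index)
```

    With an index like this, a loader only needs to hold one shard in memory at a time while populating the model.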

    Flax-specific improvements

    The following models have been ported to be used in JAX:

    • Flax t5 Encoder by @crystina-z in #17784

    Additionally, our JAX models now support loading sharded checkpoints:

    • Flax sharded by @ArthurZucker in #17760

    Additional model heads

    The following models now have a brand new head for new tasks:

    • Add ViltForTokenClassification e.g. for Named-Entity-Recognition (NER) by @gilad19 in #17924
    • Adding OPTForSeqClassification class by @oneraghavan in #18123

    ONNX support

    A continued community effort provides ONNX converters for an increasing number of models.

    • add ONNX support for LeVit by @gcheron in #18154
    • add ONNX support for BLOOM by @NouamaneTazi in #17961
    • Add ONNX support for LayoutLMv3 by @regisss in #17953
    • Mrbean/codegen onnx by @sam-h-bean in #17903
    • Add ONNX support for DETR by @regisss in #17904
    • add onnx support for deberta and debertav2 by @sam-h-bean in #17617

    Documentation translation

    The community effort to translate the documentation into several languages continues.

    Portuguese

    • Added translation of index.mdx to Portuguese Issue #16824 by @rzimmerdev in #17565

    Spanish

    • Add Spanish translation of custom_models.mdx by @donelianc in #17807

    Italian

    • Add Italian translation of sharing_custom_models.mdx by @Xpiri in #17631
    • Add Italian translation of converting_tensorflow_models.mdx by @Xpiri in #18283
    • Add Italian translation of create_model.mdx and serialization.mdx by @F02934 in #17640
    • Italian/accelerate by @mfumanelli in #17698
    • Italian/model sharing by @mfumanelli in #17828
    • Italian translation of run_scripts.mdx gh-17459 by @lorenzobalzani in #17642
    • Translation/debugging by @nickprock in #18230
    • Translation/training: italian translation training.mdx by @nickprock in #17662
    • Translation italian: multilingual.mdx by @nickprock in #17768
    • Added preprocessing.mdx italian translation by @nickprock in #17600

    Improvements and bugfixes

    • [EncoderDecoder] Improve docs by @NielsRogge in #18271
    • [DETR] Improve code examples by @NielsRogge in #18262
    • patch for smddp import by @carolynwang in #18244
    • Fix Sylvain's nits on the original KerasMetricCallback PR by @Rocketknight1 in #18300
    • Add PYTEST_TIMEOUT for CircleCI test jobs by @ydshieh in #18251
    • Add PyTorch 1.11 to past CI by @ydshieh in #18302
    • Raise a TF-specific error when importing Torch classes by @Rocketknight1 in #18280
    • [ create_a_model.mdx ] translate to pt by @Fellip15 in #18098
    • Update translation.mdx by @gorkemozkaya in #18169
    • Add TFAutoModelForImageClassification to pipelines.py by @ydshieh in #18292
    • Adding type hints of TF:OpenAIGPT by @Mathews-Tom in #18263
    • Adding type hints of TF:CTRL by @Mathews-Tom in #18264
    • Replace false parameter by a buffer by @sgugger in #18259
    • Fix ORTTrainer failure on gpt2 fp16 training by @JingyaHuang in #18017
    • Owlvit docs test by @alaradirik in #18257
    • Good difficult issue override for the stalebot by @LysandreJik in #18094
    • Fix dtype of input_features in docstring by @ydshieh in #18258
    • Fix command of doc tests for local testing by @oneraghavan in #18236
    • Fix TF bad words filter with XLA by @Rocketknight1 in #18286
    • Allows KerasMetricCallback to use XLA generation by @Rocketknight1 in #18265
    • Skip passes report for --make-reports by @ydshieh in #18250
    • Update serving code to enable saved_model=True by @amyeroberts in #18153
    • Change how take_along_axis is computed in DeBERTa to stop confusing XLA by @Rocketknight1 in #18256
    • Fix torch version check in Vilt by @ydshieh in #18260
    • change bloom parameters to 176B by @muhammad-ahmed-ghani in #18235
    • TF: use the correct config with (...)EncoderDecoder models by @gante in #18097
    • Fix no_trainer CI by @muellerzr in #18242
    • Update notification service by @ydshieh in #17921
    • Make errors for loss-less models more user-friendly by @sgugger in #18233
    • Fix TrainingArguments help section by @sgugger in #18232
    • Better messaging and fix for incorrect shape when collating data. by @CakeCrusher in #18119
    • Add support for Sagemaker Model Parallel >= 1.10 new checkpoint API by @viclzhu in #18221
    • Update add_new_pipeline.mdx by @zh-zheng in #18224
    • Add custom config to quicktour by @stevhliu in #18115
    • skip some test_multi_gpu_data_parallel_forward by @ydshieh in #18188
    • Change to FlavaProcessor in PROCESSOR_MAPPING_NAMES by @ydshieh in #18213
    • Fix LayoutXLM docstrings by @qqaatw in #17038
    • update cache to v0.5 by @ydshieh in #18203
    • Reduce console spam when using the KerasMetricCallback by @Rocketknight1 in #18202
    • TF: Add missing cast to GPT-J by @gante in #18201
    • Use next-gen CircleCI convenience images by @ydshieh in #18197
    • Typo in readme by @flozi00 in #18195
    • [From pretrained] Allow download from subfolder inside model repo by @patrickvonplaten in #18184
    • Update docs README with instructions on locally previewing docs by @snehankekre in #18196
    • bugfix: div-->dim by @orgoro in #18135
    • Add vision example to README by @sgugger in #18194
    • Remove use_auth_token from the from_config method by @duongna21 in #18192
    • FSDP integration enhancements and fixes by @pacman100 in #18134
    • BLOOM minor fixes small test by @younesbelkada in #18175
    • fix typo inside bloom documentation by @SaulLu in #18187
    • Better default for offload_state_dict in from_pretrained by @sgugger in #18183
    • Fix template for new models in README by @sgugger in #18182
    • FIX: Typo by @ayansengupta17 in #18156
    • Update TF(Vision)EncoderDecoderModel PT/TF equivalence tests by @ydshieh in #18073
    • Fix expected loss values in some (m)T5 tests by @ydshieh in #18177
    • [HPO] update to sigopt new experiment api by @sywangyi in #18147
    • Fix incorrect type hint for lang by @JohnGiorgi in #18161
    • Fix check for falsey inputs in run_summarization by @JohnGiorgi in #18155
    • Adding support for device_map directly in pipeline(..) function. by @Narsil in #17902
    • Fixing a hard to trigger bug for text-generation pipeline. by @Narsil in #18131
    • Enable torchdynamo with torch_tensorrt(fx path) by @frank-wei in #17765
    • Make sharded checkpoints work in offline mode by @sgugger in #18125
    • add dataset split and config to model-index in TrainingSummary.from_trainer by @loicmagne in #18064
    • Add summarization name mapping for MultiNews by @JohnGiorgi in #18117
    • supported python versions reference by @CakeCrusher in #18116
    • TF: unpack_inputs decorator independent from main_input_name by @gante in #18110
    • TF: remove graph mode distinction when processing boolean options by @gante in #18102
    • Fix BLOOM dtype by @Muennighoff in #17995
    • CLI: reenable pt_to_tf test by @gante in #18108
    • Report value for a step instead of epoch. by @zhawe01 in #18095
    • speed up test by @sijunhe in #18106
    • Enhance IPEX integration in Trainer by @jianan-gu in #18072
    • Bloom Optimize operations by @younesbelkada in #17866
    • Add filename to info displayed when downloading things in from_pretrained by @sgugger in #18099
    • Fix image segmentation and object detection pipeline tests by @sgugger in #18100
    • Fix RESOURCE_EXHAUSTED error when dealing with large datasets in Flax example scripts by @duongna21 in #18069
    • Fix torchscript tests for GPT-NeoX by @ydshieh in #18012
    • Fix some typos. by @Yulv-git in #17560
    • [bloom] fix alibi device placement by @stas00 in #18087
    • Make predict() close progress bars after finishing by @neverix in #17952
    • Update localized READMES when template is filled. by @sgugger in #18062
    • Fix type issue in using bucketing with Trainer by @seopbo in #18051
    • Fix slow CI by pinning resampy by @sgugger in #18077
    • Drop columns after loading samples in prepare_tf_dataset by @Rocketknight1 in #17967
    • [Generate Tests] Make sure no tokens are force-generated by @patrickvonplaten in #18053
    • Added Command for windows VENV activation in installation docs by @darthvader2 in #18008
    • Sort doc toc by @sgugger in #18034
    • Place inputs on device when include_inputs_for_metrics is True by @sgugger in #18046
    • Doc to dataset by @sgugger in #18037
    • Protect TFGenerationMixin.seed_generator so it's not created at import by @Rocketknight1 in #18044
    • Fix T5 incorrect weight decay in Trainer and official summarization example by @ADAning in #18002
    • Squash commits by @NielsRogge in #17981
    • Enable Past CI by @ydshieh in #17919
    • Fix T5/mT5 tests by @Rocketknight1 in #18029
    • [Flax] Bump to v0.4.1 by @sanchit-gandhi in #17966
    • Update expected values in DecisionTransformerModelIntegrationTest by @ydshieh in #18016
    • fixed calculation of ctc loss in TFWav2Vec2ForCTC by @Sreyan88 in #18014
    • Return scalar losses instead of per-sample means by @Rocketknight1 in #18013
    • sort list of models by @hollance in #18011
    • Replace BloomTokenizer by BloomTokenizerFast in doc by @regisss in #18005
    • Fix typo in error message in generation_utils by @regisss in #18000
    • Refactor to inherit from nn.Module instead of nn.ModuleList by @amyeroberts in #17501
    • Add link to existing documentation by @LysandreJik in #17931
    • only a stupid typo, but it can lead to confusion by @Dobatymo in #17930
    • Exclude Databricks from notebook env only if the runtime is below 11.0 by @davidheryanto in #17988
    • Shifting labels for causal LM when using label smoother by @seungeunrho in #17987
    • Restore original task in test_warning_logs by @ydshieh in #17985
    • Ensure PT model is in evaluation mode and lightweight forward pass done by @amyeroberts in #17970
    • XLA train step fixes by @Rocketknight1 in #17973
    • [Flax] Add remat (gradient checkpointing) by @sanchit-gandhi in #17843
    • higher atol to avoid flaky trainer test failure by @ydshieh in #17979
    • Fix FlaxBigBirdEmbeddings by @ydshieh in #17842
    • fixing fsdp autowrap functionality by @pacman100 in #17922
    • fix bias keyword argument in TFDebertaEmbeddings by @WissamAntoun in #17940
    • Update expected values in CodeGen tests by @ydshieh in #17888
    • Fix typo in perf_train_gpu_one.mdx by @aliencaocao in #17983
    • skip some gpt_neox tests that require 80G RAM by @ydshieh in #17923
    • feat: add pipeline registry abstraction by @aarnphm in #17905
    • skip some ipex tests until it works with torch 1.12 by @ydshieh in #17964
    • Fix number of examples for iterable dataset in distributed training by @sgugger in #17951
    • [Pipelines] Add revision tag to all default pipelines by @patrickvonplaten in #17667
    • Unifying training argument type annotations by @jannisborn in #17934
    • Fix GPT-NeoX-20B past handling, attention computation by @zphang in #17811
    • Fix #17893, removed dead code by @clefourrier in #17917
    • Fix prepare_tf_dataset when drop_remainder is not supplied by @Rocketknight1 in #17950
    • ExplicitEnum subclass str (JSON dump compatible) by @BramVanroy in #17933
    • PyTorch 1.12.0 for scheduled CI by @ydshieh in #17949
    • OPT - Fix Softmax NaN in half precision mode by @younesbelkada in #17437
    • Use explicit torch version in deepspeed CI by @ydshieh in #17942
    • fix regexes with escape sequence by @stas00 in #17943
    • Fix all is_torch_tpu_available issues by @muellerzr in #17936
    • Fix img seg tests (load checkpoints from hf-internal-testing) by @mishig25 in #17939
    • Remove imports and use forward references in ONNX feature by @sgugger in #17926
    • Fix job links in Slack report by @ydshieh in #17892
    • Add missing comment quotes by @leondz in #17379
    • Remove render tags by @NielsRogge in #17897
    • Fix the Conda package build by @bryant1410 in #16737
    • Remove DT_DOUBLE from the T5 graph by @szutenberg in #17891
    • Compute min_resolution in prepare_image_inputs by @ydshieh in #17915
    • Fixing a regression with return_all_scores introduced in #17606 by @Narsil in #17906
    • In group_texts function, drop last block if smaller than block_size by @billray0259 in #17908
    • Move logic into pixelshuffle layer by @amyeroberts in #17899
    • Fix loss computation in TFBertForPreTraining by @Rocketknight1 in #17898
    • Pin black to 22.3.0 to benefit from a stable --preview flag by @LysandreJik in #17918
    • Fix PyTorch/TF Auto tests by @ydshieh in #17895
    • Fix test_number_of_steps_in_training_with_ipex by @ydshieh in #17889
    • Update expected values in constrained beam search tests by @ydshieh in #17887
    • Fix bug in gpt2's (from-scratch) special scaled weight initialization by @karpathy in #17877
    • Update README_zh-hans.md by @mmdjiji in #17861
    • bert: add conversion script for BERT Token Dropping TF2 checkpoints by @stefan-it in #17142
    • Fix add new model like frameworks by @sgugger in #17869
    • Add type annotations for RoFormer models by @donelianc in #17878
    • fix by @ydshieh in #17890
    • fix mask by @younesbelkada in #17837
    • Add a TF in-graph tokenizer for BERT by @Rocketknight1 in #17701
    • Fix TF GPT2 test_onnx_runtime_optimize by @ydshieh in #17874
    • CLI: handle multimodal inputs by @gante in #17839
    • Properly get tests deps in test_fetcher by @sgugger in #17870
    • Fix test_inference_instance_segmentation_head by @ydshieh in #17872
    • Skip test_multi_gpu_data_parallel_forward for MaskFormer by @ydshieh in #17864
    • Use higher value for hidden_size in Flax BigBird test by @ydshieh in #17822
    • Fix: torch.utils.checkpoint import error. by @kumapo in #17849
    • Add type hints for gptneox models by @willtai in #17858
    • Fix Splinter test by @ydshieh in #17854
    • [tests/VisionEncoderDecoder] import to_2tuple from test utils by @patil-suraj in #17865
    • Fix Constrained beam search duplication and weird output issue by @boy2000-007man in #17814
    • Improve encoder decoder model docs by @Threepointone4 in #17815
    • Improve vision models by @NielsRogge in #17731
    • Auto-build Docker images before on-merge if setup.py was changed by @muellerzr in #17573
    • Properly calculate the total train iterations and recalculate num epochs in no_trainer scripts by @muellerzr in #17856
    • Index RNG states by global rank in saves by @sgugger in #17852
    • Change no trainer image_classification test by @muellerzr in #17635
    • Update modeling_cvt.py by @F02934 in #17846
    • Fix broken test for models with batchnorm by @Rocketknight1 in #17841
    • BLOOM minor changes on tokenizer by @younesbelkada in #17823
    • Improve performance docs by @lvwerra in #17750
    • Fix an error message in BigBird by @ydshieh in #17840
    • Fix properties of unset special tokens in non verbose mode by @guillaumekln in #17797
    • change message by @SaulLu in #17836
    • Add missing type hints for QDQBertModel by @willtai in #17783
    • Update type hints modeling_yoso.py by @F02934 in #17827
    • add doctests for DETR by @qherreros in #17786
    • Fix push CI artifact path by @ydshieh in #17788
    • Offload fixes by @sgugger in #17810
    • CLI: use hub's create_commit by @gante in #17755
    • initial commit by @ArthurZucker in #17818
    • Add logits_processor parameter, used by generate, to Seq2SeqTrainer methods evaluate and predict by @eranhirs in #17805
    • Fix top_k_top_p_filtering having unexpected behavior by @unifyh in #17744
    • Remove duplicate code by @lkm2835 in #17708
    • CLI: convert sharded PT models by @gante in #17959
    • Improve error message Union not allowed by @BramVanroy in #17769
    • Add final_layer_norm to OPT model by @thomasw21 in #17785
    • Properly check for a TPU device by @muellerzr in #17802
    • Fix test for BF16 detection by @sgugger in #17803
    • Use 5e-5 For BigBird PT/Flax equivalence tests by @ydshieh in #17780
    • Prepare transformers for v0.8.0 huggingface-hub release by @LysandreJik in #17716
    • Fix forward reference imports in DeBERTa configs by @sgugger in #17800
    • Fix Automatic Download of Pretrained Weights in DETR by @AnugunjNaman in #17712
    • [ViTMAE] Fix docstrings and variable names by @NielsRogge in #17710
    • Add link to notebook by @NielsRogge in #17791
    • [CodeParrot] Near-deduplication with jaccard similarity by @liyongsea in #17054
    • Update modeling_longt5.py by @bjascob in #17777
    • Not use -1e4 as attn mask by @ydshieh in #17306
    • Fix cache for GPT-Neo-X by @sgugger in #17764
    • deprecate is_torch_bf16_available by @stas00 in #17738
    • Attempt to change Push CI to workflow_run by @ydshieh in #17753
    • Save huggingface checkpoint as artifact in mlflow callback by @swethmandava in #17686
    • Migrate HFDeepSpeedConfig from trfrs to accelerate by @pacman100 in #17623
    • feat: add num_workers arg to DataLoader by @greg2451 in #17751
    • Enable PyTorch nightly build CI by @ydshieh in #17335

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @donelianc
      • Add Spanish translation of custom_models.mdx (#17807)
      • Add type annotations for RoFormer models (#17878)
    • @Xpiri
      • Add Italian translation of sharing_custom_models.mdx (#17631)
      • Add Italian translation of converting_tensorflow_models.mdx (#18283)
    • @F02934
      • Add Italian translation of create_model.mdx and serialization.mdx (#17640)
      • Update modeling_cvt.py (#17846)
      • Update type hints modeling_yoso.py (#17827)
    • @sayakpaul
      • [SegFormer] TensorFlow port (#17910)
    • @mfumanelli
      • Italian/accelerate (#17698)
      • Italian/model sharing (#17828)
    • @nickprock
      • Translation/debugging (#18230)
      • Translation/training: italian translation training.mdx (#17662)
      • Translation italian: multilingual.mdx (#17768)
      • Added preprocessing.mdx italian translation (#17600)
    • @sijunhe
      • speed up test (#18106)
      • Nezha Pytorch implementation (#17776)
    • @StevenTang1998
      • Add MVP model (#17787)
    • @ariG23498
      • TF implementation of RegNets (#17554)
    • @xvjiarui
      • Adding GroupViT Models (#17313)
    • @rooa
      • Add CodeGen model (#17443)
  • v4.20.1(Jun 21, 2022)

    This patch release fixes a bug in the OPT models and makes Transformers compatible with huggingface_hub version 0.8.1.

    • Add final_layer_norm to OPT model #17785
    • Prepare transformers for v0.8.0 huggingface-hub release #17716
  • v4.20.0(Jun 16, 2022)

    Big model inference

    You can now use the big model inference of Accelerate directly in any call to from_pretrained by specifying device_map="auto" (or your own device_map). The model is loaded automatically, filling your GPU(s) first, then offloading what doesn't fit to CPU RAM, or even to the hard drive if you don't have enough RAM. The model can then be used normally for inference, with nothing else to do.

    from transformers import AutoModelForSeq2SeqLM
    
    model = AutoModelForSeq2SeqLM.from_pretrained(
      "bigscience/T0pp", revision="sharded", device_map="auto"
    )
    
    • Use Accelerate in from_pretrained for big model inference by @sgugger in #17341
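
    To make the placement order described above concrete, here is a toy sketch in plain Python. It is not Accelerate's algorithm: the function name toy_device_map, the module names, the sizes and the budgets are all hypothetical, and real placement accounts for dtypes, tied weights and modules that must not be split. It only shows the idea of filling devices in priority order (GPU, then CPU, then disk).

```python
def toy_device_map(module_sizes, budgets):
    """Greedily place modules on devices in priority order.

    module_sizes: dict of module name -> size (arbitrary units), in execution order.
    budgets: list of (device, capacity) pairs in priority order.
    Hypothetical helper for illustration only, not part of any library.
    """
    device_map = {}
    remaining = [[device, capacity] for device, capacity in budgets]
    current = 0
    for name, size in module_sizes.items():
        # Move on to the next device once the current one cannot hold the module.
        while current < len(remaining) and remaining[current][1] < size:
            current += 1
        if current == len(remaining):
            raise ValueError(f"no device has room for {name}")
        device_map[name] = remaining[current][0]
        remaining[current][1] -= size
    return device_map


# Made-up module names and sizes; GPU 0 holds 8 units, CPU RAM holds 6, disk holds 100.
sizes = {"embed": 2, "block.0": 4, "block.1": 4, "block.2": 4, "lm_head": 2}
placement = toy_device_map(sizes, [(0, 8), ("cpu", 6), ("disk", 100)])
print(placement)
```

    The resulting dictionary has the same general shape as a hand-written device_map: module names mapped to a GPU index, "cpu" or "disk".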

    BLOOM

    The BLOOM model has been proposed with its various versions through the BigScience Workshop. The architecture of BLOOM is essentially similar to GPT3 (auto-regressive model for next token prediction), but it has been trained on 46 different languages, including code.

    • BLOOM by @younesbelkada in #17474

    CvT

    The Convolutional vision Transformer (CvT) improves the Vision Transformer (ViT) in performance and efficiency by introducing convolutions into ViT to yield the best of both designs.

    • Add CvT by @NielsRogge and @AnugunjNaman in #17299

    GPT Neo-X

    GPT-NeoX-20B is a 20 billion parameter autoregressive language model trained on the Pile, whose weights are made freely and openly available to the public through a permissive license. GPT-NeoX-20B is a particularly powerful few-shot reasoner and gains far more in performance when evaluated five-shot than similarly sized GPT-3 and FairSeq models.

    • Adding GPT-NeoX-20B by @zphang in #16659

    LayoutLMv3

    LayoutLMv3 simplifies LayoutLMv2 by using patch embeddings (as in ViT) instead of leveraging a CNN backbone, and pre-trains the model on 3 objectives: masked language modeling (MLM), masked image modeling (MIM) and word-patch alignment (WPA).

    • Add LayoutLMv3 by @NielsRogge in #17060

    LeViT

    LeViT improves the Vision Transformer (ViT) in performance and efficiency by a few architectural differences such as activation maps with decreasing resolutions in Transformers and the introduction of an attention bias to integrate positional information.

    • Adding LeViT Model by Facebook by @AnugunjNaman in #17466

    LongT5

    LongT5 is an extension of the T5 model that supports one of two efficient attention mechanisms: (1) local attention or (2) transient-global attention. It can handle input sequences of up to 16,384 tokens.

    • Add LongT5 model by @stancld in #16792
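
    As a rough illustration of the local attention idea mentioned above, the sketch below builds a sliding-window mask in plain Python. It assumes a symmetric window of a given radius around each token; the function name and the tiny sizes are hypothetical, it does not cover transient-global attention, and it is not how the library implements the mechanism.

```python
def local_attention_mask(seq_len, radius):
    """Return mask[i][j] == True when token i may attend to token j."""
    return [
        [abs(i - j) <= radius for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Each token sees itself and one neighbor on each side.
mask = local_attention_mask(seq_len=6, radius=1)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

    The number of allowed pairs grows linearly with the sequence length for a fixed radius, instead of quadratically as in full attention, which is what makes long inputs tractable.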

    M-CTC-T

    The M-CTC-T model is a 1B-param transformer encoder, with a CTC head over 8065 character labels and a language identification head over 60 language ID labels. It is trained on Common Voice (version 6.1, December 2020 release) and VoxPopuli. After training on Common Voice and VoxPopuli, the model is trained on Common Voice only. The labels are unnormalized character-level transcripts (punctuation and capitalization are not removed). The model takes as input Mel filterbank features from a 16 kHz audio signal.

    • M-CTC-T Model by @cwkeam in #16402
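
    For readers unfamiliar with CTC heads, here is a minimal sketch of greedy CTC decoding in plain Python: take the best label per frame, collapse consecutive repeats, then drop blanks. The blank id of 0 and the three-character vocabulary are assumptions made for the example; the real model's processor defines its own label set.

```python
def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated labels and remove blanks from per-frame predictions."""
    decoded = []
    previous = None
    for label in frame_ids:
        # Keep a label only when it starts a new run and is not the blank symbol.
        if label != previous and label != blank_id:
            decoded.append(label)
        previous = label
    return decoded


# Hypothetical vocabulary and per-frame argmax labels (0 is the assumed blank).
vocab = {1: "H", 2: "i", 3: "!"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3]
text = "".join(vocab[i] for i in ctc_greedy_decode(frames))
print(text)
```

    A blank between two identical labels keeps them as two separate characters, which is how CTC represents doubled letters.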

    Trajectory Transformer

    This Transformer is used for deep reinforcement learning. To use it, you need to create sequences from actions, states and rewards from all previous timesteps. This model will treat all these elements together as one big sequence (a trajectory).

    • Add trajectory transformer by @CarlCochet in #17141
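
    The idea of treating states, actions and rewards as one sequence can be sketched as follows. This is only an illustration in plain Python: the per-timestep ordering (state, then action, then reward), the helper name and the toy values are assumptions, and any discretization or normalization the real model needs is left out.

```python
def flatten_trajectory(states, actions, rewards):
    """Interleave per-timestep states, actions and rewards into one flat sequence."""
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must cover the same timesteps")
    sequence = []
    for state, action, reward in zip(states, actions, rewards):
        sequence.extend(state)   # state dimensions for this timestep
        sequence.extend(action)  # action dimensions for this timestep
        sequence.append(reward)  # scalar reward for this timestep
    return sequence


# Two timesteps with 2-dimensional states and 1-dimensional actions (made-up values).
states = [[0.1, 0.2], [0.3, 0.4]]
actions = [[1.0], [-1.0]]
rewards = [0.5, 1.5]
trajectory = flatten_trajectory(states, actions, rewards)
print(trajectory)
```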

    Wav2Vec2-Conformer

    The Wav2Vec2-Conformer was added to an updated version of fairseq S2T: Fast Speech-to-Text. It requires more parameters than Wav2Vec2, but also yields an improved word error rate.

    • [Wav2Vec2Conformer] Official release by @patrickvonplaten in #17709
    • Add Wav2Vec2Conformer by @patrickvonplaten in #16812

    TensorFlow implementations

    Data2VecVision for semantic segmentation, OPT and Swin are now available in TensorFlow.

    • Add TFData2VecVision for semantic segmentation by @sayakpaul in #17271
    • Opt in flax and tf by @ArthurZucker in #17388
    • Add Tensorflow Swin model by @amyeroberts in #16988

    Flax implementations

    OPT is now available in Flax.

    • Opt in flax and tf by @ArthurZucker in #17388

    Documentation translation in Italian and Portuguese

    A community effort has been started to translate the documentation into two new languages: Italian and Portuguese.

    • Translation/italian: added pipeline_tutorial.mdx [Issue: #17459] by @nickprock in #17507
    • Add installation.mdx Italian translation by @mfumanelli in #17530
    • Setup for Italian translation and add quicktour.mdx translation by @mfumanelli in #17472
    • Adding the Portuguese version of the tasks/token_classification.mdx documentation by @jonatasgrosman in #17492
    • Adding the Portuguese version of the tasks/sequence_classification.mdx documentation by @jonatasgrosman in #17352
    • [ fast_tokenizers.mdx ] - Added translation to portuguese to tutorial by @Fellip15 in #17076
    • Added translation of installation.mdx to Portuguese Issue #16824 by @rzimmerdev in #16979

    Improvements and bugfixes

    • Sort the model doc Toc Alphabetically by @sgugger in #17723
    • normalize keys_to_ignore by @stas00 in #17722
    • CLI: Add flag to push TF weights directly into main by @gante in #17720
    • Update requirements.txt by @jeffra in #17719
    • Revert "Change push CI to run on workflow_run event by @ydshieh in #17692)"
    • Documentation: RemBERT fixes by @stefan-it in #17641
    • Change push CI to run on workflow_run event by @ydshieh in #17692
    • fix tolerance for a bloom slow test by @younesbelkada in #17634
    • [LongT5] disable model parallel test by @patil-suraj in #17702
    • FX function refactor by @michaelbenayoun in #17625
    • Add BloomForSequenceClassification and BloomForTokenClassification classes by @haileyschoelkopf in #17639
    • Swin main layer by @amyeroberts in #17693
    • Include a comment to reflect Amy's contributions by @sayakpaul in #17689
    • Rag end2end new by @shamanez in #17650
    • [LongT5] Rename checkpoitns by @patrickvonplaten in #17700
    • Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference by @jianan-gu in #17153
    • Fix doc builder Dockerfile by @ydshieh in #17435
    • Add FP16 Support for SageMaker Model Parallel by @haohanchen-yagao in #17386
    • enable cpu distribution training using mpirun by @sywangyi in #17570
    • Add Ray's scope to training arguments by @BramVanroy in #17629
    • Update modeling_gpt_neox.py by @willfrey in #17575
    • Fix dtype getter by @sgugger in #17668
    • explicitly set utf8 for Windows by @BramVanroy in #17664
    • Fixed documentation typo, parameter name is evaluation_strategy, not eval_strategy by @sainttttt in #17669
    • Add Visual Question Answering (VQA) pipeline by @sijunhe in #17286
    • Fix typo in adding_a_new_model README by @ayushtues in #17679
    • Avoid GPU OOM for a TF Rag test by @ydshieh in #17638
    • fix typo from emtpy to empty by @domenicrosati in #17643
    • [Generation Test] Make fast test actually fast by @patrickvonplaten in #17661
    • [Data2Vec] Speed up test by @patrickvonplaten in #17660
    • [BigBirdFlaxTests] Make tests slow by @patrickvonplaten in #17658
    • update README.md by @loubnabnl in #17657
    • 🐛 Properly raise RepoNotFoundError when not authenticated by @SBrandeis in #17651
    • Fixes #17128 . by @mygithubid1 in #17356
    • Fix dtype getters by @sgugger in #17656
    • Add skip logic for attentions test - Levit by @amyeroberts in #17633
    • Enable crop_center method to handle (W, H, C) images by @alaradirik in #17626
    • Move Clip image utils to image_utils.py by @alaradirik in #17628
    • Skip tests until bug is fixed. by @sgugger in #17646
    • Translation/autoclass by @mfumanelli in #17615
    • didn't exist in pt-1.9 by @stas00 in #17644
    • convert assertion to raised exception in debertav2 by @sam-h-bean in #17619
    • Pre-build DeepSpeed by @ydshieh in #17607
    • [modeling_utils] torch_dtype/auto floating dtype fixes by @stas00 in #17614
    • Running a pipeline of float16. by @Narsil in #17637
    • fix use_amp rename after pr 17138 by @stas00 in #17636
    • Fix very long job failure text in Slack report by @ydshieh in #17630
    • Adding top_k argument to text-classification pipeline. by @Narsil in #17606
    • Mention in the doc we drop support for fairscale by @sgugger in #17610
    • Use shape_list to safely get shapes for Swin by @amyeroberts in #17591
    • Add ONNX support for ConvNeXT by @regisss in #17627
    • Add ONNX support for ResNet by @regisss in #17585
    • has_attentions - consistent test skipping logic and tf tests by @amyeroberts in #17495
    • CLI: Print all different tensors on exception by @gante in #17612
    • TF: Merge PT and TF behavior for Bart when no decoder_input_ids are passed by @gante in #17593
    • Fix telemetry URL by @sgugger in #17608
    • CLI: Properly detect encoder-decoder models by @gante in #17605
    • Fix link for community notebooks by @ngoquanghuy99 in #17602
    • Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch by @jianan-gu in #17138
    • fix train_new_from_iterator in the case of byte-level tokenizers by @SaulLu in #17549
    • Explicit versions in docker files by @ydshieh in #17586
    • CLI: add stricter automatic checks to pt-to-tf by @gante in #17588
    • fix by @ydshieh in #17589
    • quicktour.mdx en -> pt translation by @vitorfrois in #17074
    • Fx support for Deberta-v[1-2], Hubert and LXMERT by @michaelbenayoun in #17539
    • Add examples telemetry by @sgugger in #17552
    • Fix gendered sentence in Spanish translation by @omarespejel in #17558
    • Fix circular import in onnx.utils by @sgugger in #17577
    • Use latest stable PyTorch/DeepSpeed for Push & Scheduled CI by @ydshieh in #17417
    • Remove circular imports in layoutlm/init.py by @regisss in #17576
    • Add magic method to our TF models to convert datasets with column inference by @Rocketknight1 in #17160
    • [deepspeed / testing] reset global state by @stas00 in #17553
    • Remove RuntimeErrors for NaN-checking in 20B by @zphang in #17563
    • fix integration test levit by @AnugunjNaman in #17555
    • [deepspeed] fix load_best_model test by @stas00 in #17550
    • Update index.mdx by @BritneyMuller in #17547
    • Clean imports to fix test_fetcher by @sgugger in #17531
    • Update run_glue_no_trainer.py by @bofenghuang in #17546
    • Fix all offload and MP tests by @sgugger in #17533
    • Fix bug - layer names and activation from previous refactor by @amyeroberts in #17524
    • Add support for Perceiver ONNX export by @deutschmn in #17213
    • Allow from transformers import TypicalLogitsWarper by @teticio in #17477
    • Add Gated-SiLU to T5 by @DanielHesslow in #17420
    • Update URL for Hub PR docs by @lewtun in #17532
    • fix OPT-Flax CI tests by @ArthurZucker in #17512
    • [trainer/deepspeed] load_best_model (reimplement re-init) by @stas00 in #17151
    • Implemented loss for training AudioFrameClassification by @MorenoLaQuatra in #17513
    • Update configuration_auto.py by @kamalkraj in #17527
    • Check list of models in the main README and sort it by @sgugger in #17517
    • Fix when Accelerate is not installed by @sgugger in #17518
    • Clean README in post release job as well. by @sgugger in #17519
    • Fix CI tests hang forever by @ydshieh in #17471
    • Print more library versions in CI by @ydshieh in #17384
    • Split push CI into 2 workflows by @ydshieh in #17369
    • Fix Tapas tests by @ydshieh in #17510
    • CLI: tool to convert PT into TF weights and open hub PR by @gante in #17497
    • Fix flakey no-trainer test by @muellerzr in #17515
    • Deal with the error when task is regression by @fireindark707 in #16330
    • Fix CTRL tests by @ydshieh in #17508
    • Fix LayoutXLMProcessorTest by @ydshieh in #17506
    • Debug LukeForMaskedLM by @Ryou0634 in #17499
    • Fix MP and CPU offload tests for Funnel and GPT-Neo by @sgugger in #17503
    • Exclude Databricks from notebook env by @sgugger in #17496
    • Fix tokenizer type annotation in pipeline(...) by @willfrey in #17500
    • Refactor classes to inherit from nn.Module instead of nn.Sequential by @amyeroberts in #17493
    • Fix wav2vec2 export onnx model with attention_mask error by @nilboy in #16004
    • Add warning when using older version of torch for ViltFeatureExtractor by @xhluca in #16756
    • Fix typo of variable names for key and query projection layer by @Kyeongpil in #17155
    • Fixed wrong error message for missing weight file by @123jimin in #17216
    • Add OnnxConfig for SqueezeBert iss17314 by @Ruihua-Fang in #17315
    • [GPT2Tokenizer] Fix GPT2 with bos token by @patrickvonplaten in #17498
    • [Json configs] Make json prettier for all saved tokenizer files & ensure same json format for all processors (tok + feat_extract) by @patrickvonplaten in #17457
    • Accumulate tokens into batches in PreTrainedTokenizerBase.add_tokens() by @Witiko in #17119
    • Add HF.co for PRs / Issues regarding specific model checkpoints by @patrickvonplaten in #17485
    • Fix checkpoint name by @ydshieh in #17484
    • Docker image build in parallel by @ydshieh in #17434
    • Added XLM onnx config by @nandwalritik in #17030
    • Disk offload fix by @sgugger in #17428
    • TF: GPT-2 generation supports left-padding by @gante in #17426
    • Fix ViTMAEModelTester by @ydshieh in #17470
    • [Generate] Fix output scores greedy search by @patrickvonplaten in #17442
    • Fix nits by @omarespejel in #17349
    • Fx support for multiple model architectures by @michaelbenayoun in #17393
    • typo IBERT in repr quant_mode by @scratchmex in #17398
    • Fix typo (remove parenthesis) by @mikcnt in #17415
    • Improve notrainer examples by @pacman100 in #17449
    • [OPT] Fix bos token id default by @patrickvonplaten in #17441
    • Fix model parallelism test by @sgugger in #17439
    • Pin protobouf that breaks TensorBoard in PyTorch by @sgugger in #17440
    • Spanish translation of the file preprocessing.mdx by @yharyarias in #16299
    • Spanish translation of the files sagemaker.mdx and image_classification.mdx by @SimplyJuanjo in #17262
    • Added es version of bertology.mdx doc by @jQuinRivero in #17255
    • Wav2vec2 finetuning shared file system by @patrickvonplaten in #17423
    • fix link in performance docs by @lvwerra in #17419
    • Add link to Hub PR docs in model cards by @lewtun in #17421
    • Upd AutoTokenizer.from_pretrained doc examples by @c00k1ez in #17416
    • Support compilation via Torchdynamo, AOT Autograd, NVFuser by @anijain2305 in #17308
    • Add test for new model parallelism features by @sgugger in #17401
    • Make check_init script more robust and clean inits by @sgugger in #17408
    • Fix README localizer script by @sgugger in #17407
    • Fix expected value for OPT test test_inference_no_head by @ydshieh in #17395
    • Clean up CLIP tests by @NielsRogge in #17380
    • Enabling imageGPT auto feature extractor. by @Narsil in #16871
    • Add support for device_map="auto" to OPT by @sgugger in #17382
    • OPTForCausalLM lm_head input size should be config.word_embed_proj_dim by @vfbd in #17225
    • Traced models serialization and torchscripting fix by @michaelbenayoun in #17206
    • Fix Comet ML integration by @mxschmdt in #17381
    • Fix cvt docstrings by @AnugunjNaman in #17367
    • Correct & Improve Doctests for LayoutLMv2 by @gnolai in #17168
    • Fix CodeParrot training script by @loubnabnl in #17291
    • Fix a typo relative_postion_if_large -> relative_position_if_large by @stancld in #17366
    • Pin dill to fix examples by @sgugger in #17368
    • [Test OPT] Add batch generation test opt by @patrickvonplaten in #17359
    • Fix bug in Wav2Vec2 pretrain example by @ddobokki in #17326
    • fix for 17292 by @nadahlberg in #17293
    • [Generation] Fix Transition probs by @patrickvonplaten in #17311
    • [OPT] Run test in lower precision on GPU by @patrickvonplaten in #17353
    • Adding batch_size test to QA pipeline. by @Narsil in #17330
    • [BC] Fixing usage of text pairs by @Narsil in #17324
    • [tests] fix copy-n-paste error by @stas00 in #17312
    • Fix ci_url might be None by @ydshieh in #17332
    • fix by @ydshieh in #17337
    • Fix metric calculation in examples and setup tests to run on multi-gpu for no_trainer scripts by @muellerzr in #17331
    • docs for typical decoding by @jadermcs in #17186
    • Not send successful report by @ydshieh in #17329
    • Fix test_t5_decoder_model_past_large_inputs by @ydshieh in #17320
    • Add onnx export cuda support by @JingyaHuang in #17183
    • Add Information Gain Filtration algorithm by @mraunak in #16953
    • Fix typo by @kamalkraj in #17328
    • remove by @ydshieh in #17325
    • Accepting real pytorch device as arguments. by @Narsil in #17318
    • Updating the docs for max_seq_len in QA pipeline by @Narsil in #17316
    • [T5] Fix init in TF and Flax for pretraining by @patrickvonplaten in #17294
    • Add type hints for ProphetNet (Pytorch) by @jQuinRivero in #17223
    • fix by @patrickvonplaten in #17310
    • [LED] fix global_attention_mask not being passed for generation and docs clarification about grad checkpointing by @caesar-one in #17112
    • Add support for pretraining recurring span selection to Splinter by @jvcop in #17247
    • Add PR author in CI report + merged by info by @ydshieh in #17298
    • Fix dummy creation script by @sgugger in #17304
    • Doctest longformer by @KMFODA in #16441
    • [Test] Fix W2V-Conformer integration test by @patrickvonplaten in #17303
    • Improve mismatched sizes management when loading a pretrained model by @regisss in #17257
    • correct opt by @patrickvonplaten in #17301
    • Rewrite TensorFlow train_step and test_step by @Rocketknight1 in #17057
    • Fix tests of mixed precision now that experimental is deprecated by @Rocketknight1 in #17300
    • fix retribert's test_torch_encode_plus_sent_to_model by @SaulLu in #17231
    • [ConvNeXT] Fix drop_path_rate by @NielsRogge in #17280
    • Fix wrong PT/TF categories in CI report by @ydshieh in #17272
    • Fix missing job action button in CI report by @ydshieh in #17270
    • Fix test_model_parallelization by @lkm2835 in #17249
    • [Tests] Fix slow opt tests by @patrickvonplaten in #17282
    • docs(transformers): fix typo by @k-zehnder in #17263
    • logging documentation update by @sanderland in #17174
    • Use the PR URL in CI report by @ydshieh in #17269
    • Fix FlavaForPreTrainingIntegrationTest CI test by @ydshieh in #17232
    • Better error in the Auto API when a dep is missing by @sgugger in #17289
    • Make TrainerHyperParameterSigOptIntegrationTest slow test by @ydshieh in #17288
    • Automatically sort auto mappings by @sgugger in #17250
    • Mlflowcallback fix nonetype error by @orieg in #17171
    • Align logits and labels in OPT by @MichelBartels in #17237
    • Remove next sentence prediction from supported ONNX tasks by @lewtun in #17276
    • CodeParrot data pretokenization by @loubnabnl in #16932
    • Update codeparrot data preprocessing by @loubnabnl in #16944
    • Updated checkpoint support for Sagemaker Model Parallel by @cavdard in #17219
    • fixed bug in run_mlm_flax_stream.py by @KennethEnevoldsen in #17203
    • [doc] performance/scalability revamp by @stas00 in #15723
    • TF - Fix convnext classification example by @gante in #17261
    • Fix obvious typos in flax decoder impl by @cloudhan in #17279
    • Guide to create custom models in Spanish by @ignacioct in #17158
    • Translated version of model_sharing.mdx doc to spanish by @Gerard-170 in #16184
    • Add PR title to push CI report by @ydshieh in #17246
    • Fix push CI channel by @ydshieh in #17242
    • install dev. version of accelerate by @ydshieh in #17243
    • Fix Trainer for Datasets that don't have dict items by @sgugger in #17239
    • Handle copyright in add-new-model-like by @sgugger in #17218
    • fix --gpus option for docker by @ydshieh in #17235
    • Update self-push workflow by @ydshieh in #17177
    • OPT - fix docstring and improve tests slighly by @patrickvonplaten in #17228
    • OPT-fix by @younesbelkada in #17229
    • Fix typo in bug report template by @fxmarty in #17178
    • Black preview by @sgugger in #17217
    • update BART docs by @patil-suraj in #17212
    • Add test to ensure models can take int64 inputs by @Rocketknight1 in #17210

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @sayakpaul
      • Include a comment to reflect Amy's contributions (#17689)
      • Add TFData2VecVision for semantic segmentation (#17271)
    • @jianan-gu
      • Extend Transformers Trainer Class to Enable PyTorch Torchscript for Inference (#17153)
      • Extend Transformers Trainer Class to Enable CPU AMP and Integrate Intel Extension for PyTorch (#17138)
    • @stancld
      • Add LongT5 model (#16792)
      • Fix a typo relative_postion_if_large -> relative_position_if_large (#17366)
    • @mfumanelli
      • Translation/autoclass (#17615)
      • Add installation.mdx Italian translation (#17530)
      • Setup for Italian translation and add quicktour.mdx translation (#17472)
    • @cwkeam
      • M-CTC-T Model (#16402)
    • @zphang
      • Remove RuntimeErrors for NaN-checking in 20B (#17563)
      • Adding GPT-NeoX-20B (#16659)
    • @AnugunjNaman
      • fix integration test levit (#17555)
      • Adding LeViT Model by Facebook (#17466)
      • Fix cvt docstrings (#17367)
    • @yharyarias
      • Spanish translation of the file preprocessing.mdx (#16299)
    • @mraunak
      • Add Information Gain Filtration algorithm (#16953)
    • @rzimmerdev
      • Added translation of installation.mdx to Portuguese Issue #16824 (#16979)
  • v4.19.4(Jun 10, 2022)

    Fixes the error message shown when trying to access a repo that does not exist (it started to break due to changes in the Hub API).

    • 🐛 Properly raise RepoNotFoundError when not authenticated #17651

  • v4.19.3(Jun 9, 2022)

    This patch release fixes the install of protobuf when a user wants to do pip install transformers[sentencepiece].

    • Pin protobouf that breaks TensorBoard in PyTorch #17440
  • v4.19.1(May 13, 2022)

  • v4.19.0(May 12, 2022)

    Disclaimer: this release is the first release with no Python 3.6 support.

    OPT

    The OPT model was proposed in Open Pre-trained Transformer Language Models by Meta AI. OPT is a series of open-sourced large causal language models which perform similarly to GPT3.

    • Add OPT by @younesbelkada in #17088

    FLAVA

    The FLAVA model was proposed in FLAVA: A Foundational Language And Vision Alignment Model by Amanpreet Singh, Ronghang Hu, Vedanuj Goswami, Guillaume Couairon, Wojciech Galuba, Marcus Rohrbach, and Douwe Kiela and was accepted at CVPR 2022.

    The paper aims at creating a single unified foundation model which can work across vision, language as well as vision-and-language multimodal tasks.

    • [feat] Add FLAVA model by @apsdehal in #16654

    YOLOS

    The YOLOS model was proposed in You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection by Yuxin Fang, Bencheng Liao, Xinggang Wang, Jiemin Fang, Jiyang Qi, Rui Wu, Jianwei Niu, Wenyu Liu. YOLOS proposes to just leverage the plain Vision Transformer (ViT) for object detection, inspired by DETR. It turns out that a base-sized encoder-only Transformer can also achieve 42 AP on COCO, similar to DETR and much more complex frameworks such as Faster R-CNN.

    • Add YOLOS by @NielsRogge in #16848

    RegNet

    The RegNet model was proposed in Designing Network Design Spaces by Ilija Radosavovic, Raj Prateek Kosaraju, Ross Girshick, Kaiming He, Piotr Dollár.

    The authors design search spaces to perform Neural Architecture Search (NAS). They first start from a high dimensional search space and iteratively reduce the search space by empirically applying constraints based on the best-performing models sampled by the current search space.

    • RegNet by @FrancescoSaverioZuppichini in #16188

    TAPEX

    The TAPEX model was proposed in TAPEX: Table Pre-training via Learning a Neural SQL Executor by Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou. TAPEX pre-trains a BART model to solve synthetic SQL queries, after which it can be fine-tuned to answer natural language questions related to tabular data, as well as performing table fact checking.

    • Add TAPEX by @NielsRogge in #16473

    Data2Vec: vision

    The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli. Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

    The vision model is added in v4.19.0.

    • [Data2Vec] Add data2vec vision by @patrickvonplaten in #16760
    • Add Data2Vec for Vision in TF by @sayakpaul in #17008

    FSDP integration in Trainer

    PyTorch recently upstreamed the Fairscale FSDP into PyTorch Distributed with additional optimizations. This PR is aimed at integrating it into the Trainer API.

    FSDP enables distributed training at scale: it is a wrapper for sharding module parameters across data-parallel workers. This is inspired by Xu et al. as well as the ZeRO Stage 3 from DeepSpeed. PyTorch FSDP will focus more on production readiness and long-term support. This includes better integration with ecosystems and improvements on performance, usability, reliability, debuggability and composability.

    • PyTorch FSDP integration in Trainer by @pacman100 in #17136
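
    The core idea of sharding parameters across data-parallel workers can be sketched with plain Python lists. This is a toy model, not PyTorch FSDP: the helper names are hypothetical, there is no real communication, and gradient and optimizer-state sharding are left out. Each worker stores one shard, and the full parameter list is rebuilt (an "all-gather") only when it is needed.

```python
def shard_parameters(flat_params, world_size):
    """Split a flat parameter list into `world_size` equal shards, zero-padding the tail."""
    shard_len = -(-len(flat_params) // world_size)  # ceiling division
    padding = shard_len * world_size - len(flat_params)
    padded = flat_params + [0.0] * padding
    return [padded[rank * shard_len:(rank + 1) * shard_len] for rank in range(world_size)]


def all_gather(shards, original_len):
    """Rebuild the full parameter list from every worker's shard and drop the padding."""
    full = [value for shard in shards for value in shard]
    return full[:original_len]


# Five made-up parameter values split across two simulated workers.
params = [0.1, 0.2, 0.3, 0.4, 0.5]
shards = shard_parameters(params, world_size=2)
print(shards)
print(all_gather(shards, len(params)))
```

    Between gathers, each worker holds roughly 1/world_size of the parameters, which is where the memory savings come from.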

    Training scripts

    New example scripts were added for image classification and semantic segmentation. Both now have versions that leverage the Trainer API and Accelerate.

    • Add image classification script, no trainer by @NielsRogge in #16727
    • Add semantic script no trainer, v2 by @NielsRogge in #16788
    • Add semantic script, trainer by @NielsRogge in #16834

    Documentation in Spanish

    To continue democratizing good machine learning, we're making the Transformers documentation more accessible to non-English speakers, starting with Spanish (572M speakers worldwide).

    • Added es version of language_modeling.mdx doc by @jQuinRivero in #17021
    • Spanish translation of the file philosophy.mdx by @jkmg in #16922
    • Documentation: Spanish translation of fast_tokenizers.mdx by @jloayza10 in #16882
    • Translate index.mdx (to ES) and add Spanish models to quicktour.mdx examples by @omarespejel in #16685
    • Spanish translation of the file multilingual.mdx by @SimplyJuanjo in #16329
    • Added spanish translation of autoclass_tutorial. by @Duedme in #17069
    • Fix style error in Spanish docs by @osanseviero in #17197

    Improvements and bugfixes

    • [modeling_utils] rearrange text by @stas00 in #16632
    • Added Annotations for PyTorch models by @anmolsjoshi in #16619
    • Allow the same config in the auto mapping by @sgugger in #16631
    • Update no_trainer scripts with new Accelerate functionalities by @muellerzr in #16617
    • Fix doc example by @NielsRogge in #16448
    • Add inputs vector to calculate metric method by @lmvasque in #16461
    • [megatron-bert-uncased-345m] fix conversion by @stas00 in #16639
    • Remove parent/child tests in auto model tests by @sgugger in #16653
    • Updated _load_pretrained_model_low_mem to check if keys are in the state_dict by @FrancescoSaverioZuppichini in #16643
    • Update Support image on README.md by @BritneyMuller in #16615
    • bert: properly mention deprecation of TF2 conversion script by @stefan-it in #16171
    • add vit tf doctest with @add_code_sample_docstrings by @johko in #16636
    • Fix error in doc of DataCollatorWithPadding by @secsilm in #16662
    • Fix QA sample by @ydshieh in #16648
    • TF generate refactor - Beam Search by @gante in #16374
    • Add tests for no_trainer and fix existing examples by @muellerzr in #16656
    • only load state dict when the checkpoint is not None by @laurahanu in #16673
    • [Trainer] tf32 arg doc by @stas00 in #16674
    • Update audio examples with MInDS-14 by @stevhliu in #16633
    • add a warning in SpmConverter for sentencepiece's model using the byte fallback feature by @SaulLu in #16629
    • Fix some doc examples in task summary by @ydshieh in #16666
    • Jia multi gpu eval by @liyongsea in #16428
    • Generate: min length can't be larger than max length by @gante in #16668
    • fixed crash when deleting older checkpoint and a file f"{checkpoint_prefix}-*" exist by @sadransh in #16686
    • [Doctests] Correct task summary by @patrickvonplaten in #16644
    • Add Doc Test for BERT by @vumichien in #16523
    • Fix t5 shard on TPU Pods by @agemagician in #16527
    • update decoder_vocab_size when resizing embeds by @patil-suraj in #16700
    • Fix TF_MASKED_LM_SAMPLE by @ydshieh in #16698
    • Rename the method test_torchscript by @ydshieh in #16693
    • Reduce memory leak in _create_and_check_torchscript by @ydshieh in #16691
    • Enable more test_torchscript by @ydshieh in #16679
    • Don't push checkpoints to hub in no_trainer scripts by @muellerzr in #16703
    • Private repo TrainingArgument by @nbroad1881 in #16707
    • Handle image_embeds in ViltModel by @ydshieh in #16696
    • Improve PT/TF equivalence test by @ydshieh in #16557
    • Fix example logs repeating themselves by @muellerzr in #16669
    • [Bart] correct doc test by @patrickvonplaten in #16722
    • Add Doc Test GPT-2 by @ArEnSc in #16439
    • Only call get_output_embeddings when tie_word_embeddings is set by @smelm in #16667
    • Update run_translation_no_trainer.py by @raki-1203 in #16652
    • Qdqbert example add benchmark script with ORT-TRT by @shangz-ai in #16592
    • Replace assertion with exception by @anmolsjoshi in #16720
    • Change the chunk_iter function to handle by @Narsil in #16730
    • Remove duplicate header by @sgugger in #16732
    • Moved functions to pytorch_utils.py by @anmolsjoshi in #16625
    • TF: remove set_tensor_by_indices_to_value by @gante in #16729
    • Add Doc Tests for Reformer PyTorch by @hiromu166 in #16565
    • [FlaxSpeechEncoderDecoder] Fix input shape bug in weights init by @sanchit-gandhi in #16728
    • [FlaxWav2Vec2Model] Fix bug in attention mask by @sanchit-gandhi in #16725
    • add Bigbird ONNX config by @vumichien in #16427
    • TF generate: handle case without cache in beam search by @gante in #16704
    • Fix decoding score comparison when using logits processors or warpers by @bryant1410 in #10638
    • [Doctests] Fix all T5 doc tests by @patrickvonplaten in #16646
    • Fix #16660 (tokenizers setters of ids of special tokens) by @davidleonfdez in #16661
    • [from_pretrained] refactor find_mismatched_keys by @stas00 in #16706
    • Add Doc Test for GPT-J by @ArEnSc in #16507
    • Fix and improve CTRL doctests by @jeremyadamsfisher in #16573
    • [modeling_utils] better explanation of ignore keys by @stas00 in #16741
    • CI: setup-dependent pip cache by @gante in #16751
    • Reduce Funnel PT/TF diff by @ydshieh in #16744
    • Add defensive check for config num_labels and id2label by @sgugger in #16709
    • Add self training code for text classification by @tuvuumass in #16738
    • [self-scheduled ci] explain where dependencies are by @stas00 in #16757
    • Fixup no_trainer examples scripts and add more tests by @muellerzr in #16765
    • [Doctest] added doctest changes for electra by @bhadreshpsavani in #16675
    • Enabling Tapex in table question answering pipeline. by @Narsil in #16663
    • [Flax .from_pretrained] Raise a warning if model weights are not in float32 by @sanchit-gandhi in #16762
    • Fix batch size in evaluation loop by @sgugger in #16763
    • Make nightly install dev accelerate by @muellerzr in #16783
    • [deepspeed / m2m_100] make deepspeed zero-3 work with layerdrop by @stas00 in #16717
    • Kill async pushes when calling push_to_hub with blocking=True by @sgugger in #16755
    • Improve image classification example by @NielsRogge in #16585
    • [SpeechEncoderDecoderModel] Fix bug in reshaping labels by @sanchit-gandhi in #16748
    • Fix issue avoid-missing-comma found at https://codereview.doctor by @code-review-doctor in #16768
    • [trainer / deepspeed] fix hyperparameter_search by @stas00 in #16740
    • [modeling utils] revamp from_pretrained(..., low_cpu_mem_usage=True) + tests by @stas00 in #16657
    • Fix PT TF ViTMAE by @ydshieh in #16766
    • Update README.md by @NielsRogge in #16797
    • Pin Jax to last working release by @sgugger in #16808
    • CI: non-remote GH Actions now use a python venv by @gante in #16789
    • TF generate refactor - XLA sample by @gante in #16713
    • Raise error and suggestion when using custom optimizer with Fairscale or Deepspeed by @allanj in #16786
    • Create empty venv on cache miss by @gante in #16816
    • [ViT, BEiT, DeiT, DPT] Improve code by @NielsRogge in #16799
    • [Quicktour Audio] Improve && remove ffmpeg dependency by @patrickvonplaten in #16723
    • fix megatron bert convert state dict naming by @Codle in #15820
    • use base_version to check torch version in torch_less_than_1_11 by @nbroad1881 in #16806
    • Allow passing encoder_ouputs as tuple to EncoderDecoder Models by @jsnfly in #16814
    • Refactor issues with yaml by @LysandreJik in #16772
    • fix _setup_devices in case where there is no torch.distributed package in build by @dlwh in #16821
    • Clean up semantic segmentation tests by @NielsRogge in #16801
    • Fix LayoutLMv2 tokenization docstrings by @qqaatw in #16187
    • Wav2 vec2 phoneme ctc tokenizer optimisation by @ArthurZucker in #16817
    • [Flax] improve large model init and loading by @patil-suraj in #16148
    • Some tests misusing assertTrue for comparisons fix by @code-review-doctor in #16771
    • Type hints added for TFMobileBert by @Dahlbomii in #16505
    • fix rum_clm.py seeking text column name twice by @dandelin in #16624
    • Add onnx export of models with a multiple choice classification head by @echarlaix in #16758
    • [ASR Pipeline] Correct init docs by @patrickvonplaten in #16833
    • Add doc about attention_mask on gpt2 by @wiio12 in #16829
    • TF: Add sigmoid activation function by @gante in #16819
    • Correct Logging of Eval metric to Tensorboard by @Jeevesh8 in #16825
    • replace Speech2TextTokenizer by Speech2TextFeatureExtractor in some docstrings by @SaulLu in #16835
    • Type hints added to Speech to Text by @Dahlbomii in #16506
    • Improve test_pt_tf_model_equivalence on PT side by @ydshieh in #16731
    • Add support for bitsandbytes by @manuelciosici in #15622
    • [Typo] Fix typo in modeling utils by @patrickvonplaten in #16840
    • add DebertaV2 fast tokenizer by @mingboiz in #15529
    • Fixing return type tensor with num_return_sequences>1. by @Narsil in #16828
    • [modeling_utils] use less cpu memory with sharded checkpoint loading by @stas00 in #16844
    • [docs] fix url by @stas00 in #16860
    • Fix custom init sorting script by @sgugger in #16864
    • Fix multiproc metrics in no_trainer examples by @muellerzr in #16865
    • Long QuestionAnsweringPipeline fix. by @Narsil in #16778
    • t5: add conversion script for T5X to FLAX by @stefan-it in #16853
    • tiny tweak to allow BatchEncoding.token_to_char when token doesn't correspond to chars by @ghlai9665 in #15901
    • Adding support for array key in raw dictionnaries in ASR pipeline. by @Narsil in #16827
    • Return input_ids in ImageGPT feature extractor by @sgugger in #16872
    • Use ACT2FN to fetch ReLU activation by @eldarkurtic in #16874
    • Fix GPT-J onnx conversion by @ChainYo in #16780
    • Fix doctest list by @ydshieh in #16878
    • New features for CodeParrot training script by @loubnabnl in #16851
    • Add missing entries in mappings by @ydshieh in #16857
    • TF: rework XLA generate tests by @gante in #16866
    • Minor fixes/improvements in convert_file_size_to_int by @mariosasko in #16891
    • Add doc tests for Albert and Bigbird by @vumichien in #16774
    • Add OnnxConfig for ConvBERT by @ChainYo in #16859
    • TF: XLA repetition penalty by @gante in #16879
    • Changes in create_optimizer to support tensor parallelism with SMP by @cavdard in #16880
    • [DocTests] Fix some doc tests by @patrickvonplaten in #16889
    • add bigbird typo fixes by @ChainYo in #16897
    • Fix doc test quicktour dataset by @patrickvonplaten in #16929
    • Add missing ckpt in config docs by @ydshieh in #16900
    • Fix PyTorch RAG tests GPU OOM by @ydshieh in #16881
    • Fix RemBertTokenizerFast by @ydshieh in #16933
    • TF: XLA logits processors - minimum length, forced eos, and forced bos by @gante in #16912
    • TF: XLA Logits Warpers by @gante in #16899
    • added deit onnx config by @rushic24 in #16887
    • TF: XLA stable softmax by @gante in #16892
    • Replace deprecated logger.warn with warning by @sanchit-gandhi in #16876
    • Fix issue probably-meant-fstring found at https://codereview.doctor by @code-review-doctor in #16913
    • Limit the use of PreTrainedModel.device by @sgugger in #16935
    • apply torch int div to layoutlmv2 by @ManuelFay in #15457
    • FIx Iterations for decoder by @agemagician in #16934
    • Add onnx config for RoFormer by @skrsna in #16861
    • documentation: some minor clean up by @mingboiz in #16850
    • Fix RuntimeError message format by @ftnext in #16906
    • use original loaded keys to find mismatched keys by @tricktreat in #16920
    • [Research] Speed up evaluation for XTREME-S by @anton-l in #16785
    • Fix HubertRobustTest PT/TF equivalence test on GPU by @ydshieh in #16943
    • Misc. fixes for Pytorch QA examples: by @searchivarius in #16958
    • [HF Argparser] Fix parsing of optional boolean arguments by @NielsRogge in #16946
    • Fix distributed_concat with scalar tensor by @Yard1 in #16963
    • Update custom_models.mdx by @mishig25 in #16964
    • Fix add-new-model-like when model doesn't support all frameworks by @sgugger in #16966
    • Fix multiple deletions of the same files in save_pretrained by @sgugger in #16947
    • Fixup no_trainer save logic by @muellerzr in #16968
    • Fix doc notebooks links by @sgugger in #16969
    • Fix check_all_models_are_tested by @ydshieh in #16970
    • Add -e flag to some GH workflow yml files by @ydshieh in #16959
    • Update tokenization_bertweet.py by @datquocnguyen in #16941
    • Update check_models_are_tested to deal with Windows path by @ydshieh in #16973
    • Add parameter --config_overrides for run_mlm_wwm.py by @conan1024hao in #16961
    • Rename a class to reflect framework pattern AutoModelXxx -> TFAutoModelXxx by @amyeroberts in #16993
    • set eos_token_id to None to generate until max length by @ydshieh in #16989
    • Fix savedir for by epoch by @muellerzr in #16996
    • Update README to latest release by @sgugger in #16997
    • use scale=1.0 in floats_tensor called in speech model testers by @ydshieh in #17007
    • Update all require decorators to use skipUnless when possible by @muellerzr in #16999
    • TF: XLA bad words logits processor and list of processors by @gante in #16974
    • Make create_extended_attention_mask_for_decoder static method by @pbelevich in #16893
    • Update README_zh-hans.md by @tarzanwill in #16977
    • Updating variable names. by @Narsil in #16445
    • Revert "Updating variable names. by @Narsil in #16445)"
    • Replace dict/BatchEncoding instance checks by Mapping by @sgugger in #17014
    • Result of new doc style with fixes by @sgugger in #17015
    • Add a check on config classes docstring checkpoints by @ydshieh in #17012
    • Add translating guide by @omarespejel in #17004
    • update docs of length_penalty by @manandey in #17022
    • [FlaxGenerate] Fix bug in decoder_start_token_id by @sanchit-gandhi in #17035
    • Fx with meta by @michaelbenayoun in #16836
    • [Flax(Speech)EncoderDecoder] Fix bug in decoder_module by @sanchit-gandhi in #17036
    • Fix typo in RetriBERT docstring by @mpoemsl in #17018
    • add torch.no_grad when in eval mode by @JunnYu in #17020
    • Disable Flax GPU tests on push by @sgugger in #17042
    • Clean up vision tests by @NielsRogge in #17024
    • [Trainer] Move logic for checkpoint loading into separate methods for easy overriding by @calpt in #17043
    • Update no_trainer examples to use new logger by @muellerzr in #17044
    • Fix no_trainer examples to properly calculate the number of samples by @muellerzr in #17046
    • Allow all imports from transformers by @LysandreJik in #17050
    • Make the sacremoses dependency optional by @LysandreJik in #17049
    • Clean up setup.py by @sgugger in #17045
    • [T5 Tokenizer] Model has no fixed position ids - there is no hardcode… by @patrickvonplaten in #16990
    • [FlaxBert] Add ForCausalLM by @sanchit-gandhi in #16995
    • Move test model folders by @ydshieh in #17034
    • Make Trainer compatible with sharded checkpoints by @sgugger in #17053
    • Remove Python and use v2 action by @sgugger in #17059
    • Fix RNG reload in resume training from epoch checkpoint by @sgugger in #17055
    • Remove device parameter from create_extended_attention_mask_for_decoder by @pbelevich in #16894
    • Fix hashing for deduplication by @thomasw21 in #17048
    • Skip RoFormer ONNX test if rjieba not installed by @lewtun in #16981
    • Remove masked image modeling from BEIT ONNX export by @lewtun in #16980
    • Make sure telemetry arguments are not returned as unused kwargs by @sgugger in #17063
    • Type hint complete Albert model file. by @karthikrangasai in #16682
    • Deprecate model templates by @sgugger in #17062
    • Update to build via git for accelerate by @muellerzr in #17084
    • Allow saved_model export of TFCLIPModel in save_pretrained by @seanmor5 in #16886
    • Fix DeBERTa token_type_ids by @deutschmn in #17082
    • 📝 open fresh PR for pipeline doctests by @stevhliu in #17073
    • minor change on TF Data2Vec test by @ydshieh in #17085
    • type hints for pytorch models by @robotjellyzone in #17064
    • Add type hints for BERTGeneration by @robsmith155 in #17047
    • Fix MLflowCallback and add support for MLFLOW_EXPERIMENT_NAME by @orieg in #17091
    • Remove torchhub test by @sgugger in #17097
    • fix missing "models" in pipeline test module by @ydshieh in #17090
    • Fix link to example scripts by @stevhliu in #17103
    • Fix self-push CI report path in cat by @ydshieh in #17111
    • Added BigBirdPegasus onnx config by @nandwalritik in #17104
    • split single_gpu and multi_gpu by @ydshieh in #17083
    • LayoutLMv2Processor: ensure 1-to-1 mapping between images and samples in case of overflowing tokens by @ghlai9665 in #17092
    • Add type hints for BigBirdPegasus and Data2VecText PyTorch models by @robsmith155 in #17123
    • add mobilebert onnx configs by @manandey in #17029
    • [WIP] Fix Pyright static type checking by replacing if-else imports with try-except by @d-miketa in #16578
    • Add the auto_find_batch_size capability from Accelerate into Trainer by @muellerzr in #17068
    • Fix MLflowCallback end_run() and add support for tags and nested runs by @orieg in #17130
    • Fix all docs for accelerate install directions by @muellerzr in #17145
    • LogSumExp trick question_answering pipeline. by @Narsil in #17143
    • train args defaulting None marked as Optional by @d-miketa in #17156
    • [trainer] sharded _load_best_model by @stas00 in #17150
    • [Deepspeed] add many more models to the model zoo test by @stas00 in #12695
    • Fixing the output of code examples in the preprocessing chapter by @HallerPatrick in #17162
    • missing file by @stas00 in #17164
    • Add MLFLOW_FLATTEN_PARAMS support in MLflowCallback by @orieg in #17148
    • Fix template init by @sgugger in #17163
    • MobileBERT tokenizer tests by @leondz in #16896
    • [M2M100 doc] remove duplicate example by @patil-suraj in #17175
    • Extend Transformers Trainer Class to Enable PyTorch SGD/Adagrad Optimizers for Training by @jianan-gu in #17154
    • propagate "attention_mask" dtype for "use_past" in OnnxConfig.generate_dummy_inputs by @arampacha in #17105
    • Convert image to rgb for clip model by @hengkuanwee in #17101
    • Add missing RetriBERT tokenizer tests by @mpoemsl in #17017
    • [WIP] Enable reproducibility for distributed trainings by @hasansalimkanmaz in #16907
    • Remove unnecessary columns for all dataset types in Trainer by @Yard1 in #17166
    • Fix LED documentation by @manuelciosici in #17181
    • Ensure tensors are at least 1d for pad and concat by @Yard1 in #17179
    • add shift_tokens_right in FlaxMT5 by @patil-suraj in #17188
    • Remove columns before passing to data collator by @Yard1 in #17187
    • Remove duplicated os.path.join by @shijie-wu in #17192
    • Fix contents in index.mdx to match docs' sidebar by @omarespejel in #17198
    • ViT and Swin symbolic tracing with torch.fx by @michaelbenayoun in #17182
    • migrate azure blob for beit checkpoints by @donglixp in #16902
    • Update data2vec.mdx to include a Colab Notebook link (that shows fine-tuning) by @sayakpaul in #17194

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @anmolsjoshi
      • Added Annotations for PyTorch models (#16619)
      • Replace assertion with exception (#16720)
      • Moved functions to pytorch_utils.py (#16625)
    • @vumichien
      • Add Doc Test for BERT (#16523)
      • add Bigbird ONNX config (#16427)
      • Add doc tests for Albert and Bigbird (#16774)
    • @tuvuumass
      • Add self training code for text classification (#16738)
    • @sayakpaul
      • Add Data2Vec for Vision in TF (#17008)
    • @robotjellyzone
      • type hints for pytorch models (#17064)
    • @d-miketa
      • [WIP] Fix Pyright static type checking by replacing if-else imports with try-except (#16578)
      • train args defaulting None marked as Optional (#17156)
  • v4.18.0(Apr 7, 2022)

    New model additions

    You'll notice that we are starting to add several older models in vision. This is because those models are used as backbones in recent architectures. While we could rely on existing libraries for such pretrained models, we will ultimately need some support for those backbones in PyTorch, TensorFlow and Jax, and there is currently no library that supports all three frameworks. This is why we are starting to add those models to Transformers directly (here ResNet and VAN).

    GLPN

    The GLPN model was proposed in Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth by Doyeon Kim, Woonghyun Ga, Pyungwhan Ahn, Donggyu Joo, Sehwan Chun, Junmo Kim. GLPN combines SegFormer’s hierarchical mix-Transformer with a lightweight decoder for monocular depth estimation. The proposed decoder shows better performance than the previously proposed decoders, with considerably less computational complexity.

    • Add GLPN by @NielsRogge in https://github.com/huggingface/transformers/pull/16199

    ResNet

    The ResNet model was proposed in Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Our implementation follows the small changes made by Nvidia: we apply stride=2 for downsampling in the bottleneck’s 3x3 conv and not in the first 1x1. This is generally known as “ResNet v1.5”.

    ResNet introduced residual connections, which make it possible to train networks with a previously unseen number of layers (up to 1000). ResNet won the 2015 ILSVRC & COCO competitions, an important milestone in deep computer vision.
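To make the stride placement concrete, here is a minimal one-dimensional sketch in plain Python. It is not the library's code: the helper `sampled_positions` and the input size of 8 are assumptions chosen for illustration. It counts which input positions a strided convolution actually reads, which is the reason usually given for moving the stride from the 1x1 conv to the 3x3 conv.

```python
def sampled_positions(size, kernel, stride, padding):
    """Return (output length, set of input indices read) for a 1-D convolution."""
    out_size = (size + 2 * padding - kernel) // stride + 1
    touched = set()
    for out_idx in range(out_size):
        start = out_idx * stride - padding
        for k in range(kernel):
            pos = start + k
            # Positions outside [0, size) are padding, not real input.
            if 0 <= pos < size:
                touched.add(pos)
    return out_size, touched


size = 8  # hypothetical feature-map width

# Stride 2 on a 1x1 conv: every other input position is never read.
out_1x1, seen_1x1 = sampled_positions(size, kernel=1, stride=2, padding=0)

# Stride 2 on a 3x3 conv (padding 1): same output width, every position is read.
out_3x3, seen_3x3 = sampled_positions(size, kernel=3, stride=2, padding=1)

print(out_1x1, sorted(seen_1x1))
print(out_3x3, sorted(seen_3x3))
```

Both variants halve the width, but only the 3x3 variant uses every input position while doing so.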

    • Resnet by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15770

    VAN

    The VAN model was proposed in Visual Attention Network by Meng-Hao Guo, Cheng-Ze Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shi-Min Hu.

    This paper introduces a new attention layer based on convolution operations able to capture both local and distant relationships. This is done by combining normal and large kernel convolution layers. The latter uses a dilated convolution to capture distant correlations.
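As a rough illustration of why a dilated convolution helps capture distant correlations, the sketch below computes how far stacked stride-1 convolutions can reach. The kernel sizes and dilation in the example (a 5-wide regular conv followed by a 7-wide conv with dilation 3) are assumptions picked for the demonstration, not values taken from these release notes.

```python
def effective_kernel_size(kernel, dilation):
    """Span covered by one convolution kernel once dilation gaps are included."""
    return dilation * (kernel - 1) + 1


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convolutions applied one after another.

    `layers` is a list of (kernel, dilation) pairs.
    """
    field = 1
    for kernel, dilation in layers:
        field += effective_kernel_size(kernel, dilation) - 1
    return field


# Hypothetical combination: regular 5-wide conv, then 7-wide conv with dilation 3.
layers = [(5, 1), (7, 3)]
reach = stacked_receptive_field(layers)

# Weights per channel in 2-D: two small kernels versus one dense kernel of equal reach.
small_kernel_weights = sum(k * k for k, _ in layers)
dense_kernel_weights = reach * reach

print(reach, small_kernel_weights, dense_kernel_weights)
```

Under these assumptions the pair of small kernels covers the same span as a single dense kernel while using far fewer weights per channel.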

    • Visual Attention Network (VAN) by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16027

    VisionTextDualEncoder

    The VisionTextDualEncoderModel can be used to initialize a vision-text dual encoder model with any pretrained vision autoencoding model as the vision encoder (e.g. ViT, BEiT, DeiT) and any pretrained text autoencoding model as the text encoder (e.g. RoBERTa, BERT). Two projection layers are added on top of both the vision and text encoder to project the output embeddings to a shared latent space. The projection layers are randomly initialized, so the model should be fine-tuned on a downstream task. This model can be used to align the vision-text embeddings using CLIP-like contrastive image-text training and can then be used for zero-shot vision tasks such as image classification or retrieval.

    In LiT: Zero-Shot Transfer with Locked-image Text Tuning it is shown how leveraging pre-trained (locked/frozen) image and text models for contrastive learning yields significant improvement on new zero-shot vision tasks such as image classification or retrieval.
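The CLIP-like contrastive objective mentioned above can be sketched in plain Python as follows. This is a simplified illustration, not the model's implementation: the function names, the toy 2-D embeddings and the `logit_scale` value are all assumptions. Matching image/text pairs sit on the diagonal of the similarity matrix, and the loss is the average of a text-to-image and an image-to-text cross-entropy.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        max_logit = max(row)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(v - max_logit) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embeds, text_embeds, logit_scale=10.0):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    image_embeds = [l2_normalize(v) for v in image_embeds]
    text_embeds = [l2_normalize(v) for v in text_embeds]
    logits_per_text = [
        [logit_scale * sum(a * b for a, b in zip(t, img)) for img in image_embeds]
        for t in text_embeds
    ]
    # Transpose to get the image-to-text view of the same similarities.
    logits_per_image = [list(col) for col in zip(*logits_per_text)]
    return (cross_entropy_diagonal(logits_per_text) + cross_entropy_diagonal(logits_per_image)) / 2


# Toy projected embeddings: pair i of images and texts is meant to match.
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

aligned_loss = clip_style_loss(images, aligned_texts)
swapped_loss = clip_style_loss(images, swapped_texts)
print(aligned_loss, swapped_loss)
```

The loss is small when each text embedding is closest to its own image and large when the pairs are mixed up, which is the signal that trains the projection layers.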

    • add VisionTextDualEncoder and CLIP fine-tuning script by @patil-suraj in https://github.com/huggingface/transformers/pull/15701

    DiT

    DiT was proposed in DiT: Self-supervised Pre-training for Document Image Transformer by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei. DiT applies the self-supervised objective of BEiT (BERT pre-training of Image Transformers) to 42 million document images, allowing for state-of-the-art results on tasks including:

    • document image classification: the RVL-CDIP dataset (a collection of 400,000 images belonging to one of 16 classes).
    • document layout analysis: the PubLayNet dataset (a collection of more than 360,000 document images constructed by automatically parsing PubMed XML files).
    • table detection: the ICDAR 2019 cTDaR dataset (a collection of 600 training images and 240 testing images).

    • Add Document Image Transformer (DiT) by @NielsRogge in https://github.com/huggingface/transformers/pull/15984

    DPT

    The DPT model was proposed in Vision Transformers for Dense Prediction by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun. DPT is a model that leverages the Vision Transformer (ViT) as a backbone for dense prediction tasks like semantic segmentation and depth estimation.

    • Add DPT by @NielsRogge in https://github.com/huggingface/transformers/pull/15991

    Checkpoint sharding

    Large models are becoming more and more the norm and having a checkpoint in a single file is challenging for several reasons:

    • it's tougher to upload/download files bigger than 20/30 GB efficiently
    • the whole checkpoint might not fit into RAM even if you have enough GPU memory

    That's why the save_pretrained method will now automatically shard a checkpoint into several files when you go above a 10GB threshold for PyTorch models. from_pretrained will handle such sharded checkpoints as if there were only one file.
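The idea behind sharding can be sketched with a small greedy grouping routine. This is an illustration only, not the library's implementation: the function names, the byte sizes and the shard file-name pattern are hypothetical. Parameters are packed into a shard until the next one would exceed the size limit, and an index records which file holds each parameter so that loading can treat the shards as one checkpoint.

```python
def shard_state_dict(param_sizes, max_shard_size):
    """Greedily group parameters into shards that stay under max_shard_size bytes.

    `param_sizes` maps parameter name -> size in bytes (insertion order is kept).
    A single parameter larger than the limit ends up in a shard of its own.
    """
    shards = [{}]
    current_size = 0
    for name, size in param_sizes.items():
        # Start a new shard only if the current one already holds something.
        if shards[-1] and current_size + size > max_shard_size:
            shards.append({})
            current_size = 0
        shards[-1][name] = size
        current_size += size
    return shards


def build_index(shards):
    """Map every parameter name to the (hypothetical) file that stores it."""
    total = len(shards)
    weight_map = {}
    for number, shard in enumerate(shards, start=1):
        filename = f"shard-{number:05d}-of-{total:05d}.bin"  # illustrative naming
        for name in shard:
            weight_map[name] = filename
    return weight_map


# Toy sizes in bytes; a real checkpoint would use tensor sizes and a 10GB limit.
sizes = {"embed": 6, "layer.0": 3, "layer.1": 3, "head": 5}
shards = shard_state_dict(sizes, max_shard_size=10)
index = build_index(shards)
print(shards)
print(index)
```

A loader can then read the index, open only the shard that holds the weights it needs next, and release it before moving on, which keeps peak RAM usage closer to the size of one shard than to the size of the whole checkpoint.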

    • Checkpoint sharding by @sgugger in https://github.com/huggingface/transformers/pull/16343

    TensorFlow implementations

    GPT-J and ViTMAE are now available in TensorFlow.

    • Add TF implementation of GPT-J by @stancld in https://github.com/huggingface/transformers/pull/15623
    • Add TF ViT MAE by @sayakpaul in https://github.com/huggingface/transformers/pull/16255

    Documentation guides

    The IA migration is wrapped up with a new conceptual guide available.

    • Create concept guide section by @stevhliu in https://github.com/huggingface/transformers/pull/16369

    Improvements and bugfixes

    • Fix doc links in release utils by @sgugger in https://github.com/huggingface/transformers/pull/15903
    • Fix a TF Vision Encoder Decoder test by @ydshieh in https://github.com/huggingface/transformers/pull/15896
    • [Fix link in pipeline doc] by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15906
    • Fix and improve REALM fine-tuning by @qqaatw in https://github.com/huggingface/transformers/pull/15297
    • Freeze FlaxWav2Vec2 Feature Encoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15873
    • The tests were not updated after the addition of torch.diag by @Narsil in https://github.com/huggingface/transformers/pull/15890
    • [Doctests] Fix ignore bug and add more doc tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15911
    • Enabling MaskFormer in pipelines by @Narsil in https://github.com/huggingface/transformers/pull/15917
    • Minor fixes for MaskFormer by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15916
    • Add vision models to doc tests by @NielsRogge in https://github.com/huggingface/transformers/pull/15905
    • Fix #15898 by @davidleonfdez in https://github.com/huggingface/transformers/pull/15928
    • Update doc test readme by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15926
    • Re-enabling all fast pipeline tests. by @Narsil in https://github.com/huggingface/transformers/pull/15924
    • Support CLIPTokenizerFast for CLIPProcessor by @cosmoquester in https://github.com/huggingface/transformers/pull/15913
    • Updating the slow tests: by @Narsil in https://github.com/huggingface/transformers/pull/15893
    • Adding MODEL_FOR_INSTANCE_SEGMENTATION_MAPPING by @Narsil in https://github.com/huggingface/transformers/pull/15934
    • Add missing support for Flax XLM-RoBERTa by @versae in https://github.com/huggingface/transformers/pull/15900
    • [FlaxT5 Example] Fix flax t5 example pretraining by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15835
    • Do not change the output from tuple to list - to match PT's version by @ydshieh in https://github.com/huggingface/transformers/pull/15918
    • Tests for MaskFormerFeatureExtractor's post_process*** methods by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15929
    • Constrained Beam Search [With Disjunctive Decoding] by @cwkeam in https://github.com/huggingface/transformers/pull/15761
    • [LayoutLMv2] Update requires_backends of feature extractor by @NielsRogge in https://github.com/huggingface/transformers/pull/15941
    • Made MaskFormerModelTest faster by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15942
    • [Bug Fix] Beam search example in docs fails & a fix (integrating max_length in BeamScorer.finalize()) by @cwkeam in https://github.com/huggingface/transformers/pull/15555
    • remove re-defination of FlaxWav2Vec2ForCTCModule by @patil-suraj in https://github.com/huggingface/transformers/pull/15965
    • Support modern list type hints in HfArgumentParser by @konstantinjdobler in https://github.com/huggingface/transformers/pull/15951
    • Backprop Test for Freeze FlaxWav2Vec2 Feature Encoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15938
    • Fix Embedding Module Bug in Flax Models by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15920
    • Make is_thing_map in Feature Extractor post_process_panoptic_segmentation defaults to all instances by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15954
    • Update training scripts docs by @stevhliu in https://github.com/huggingface/transformers/pull/15931
    • Set scale_embedding to False in some TF tests by @ydshieh in https://github.com/huggingface/transformers/pull/15952
    • Fix LayoutLMv2 test by @NielsRogge in https://github.com/huggingface/transformers/pull/15939
    • [Tests] Fix ViTMAE integration test by @NielsRogge in https://github.com/huggingface/transformers/pull/15949
    • Returning outputs only when asked for for MaskFormer. by @Narsil in https://github.com/huggingface/transformers/pull/15936
    • Speedup T5 Flax training by using Numpy instead of JAX for batch shuffling by @yhavinga in https://github.com/huggingface/transformers/pull/15963
    • Do a pull in case docs were updated during build by @sgugger in https://github.com/huggingface/transformers/pull/15922
    • Fix TFEncDecModelTest - Pytorch device by @ydshieh in https://github.com/huggingface/transformers/pull/15979
    • [Env Command] Add hf hub to env version command by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15981
    • TF: Update multiple choice example by @gante in https://github.com/huggingface/transformers/pull/15868
    • TF generate refactor - past without encoder outputs by @gante in https://github.com/huggingface/transformers/pull/15944
    • Seed _get_train_sampler's generator with arg seed to improve reproducibility by @dlwh in https://github.com/huggingface/transformers/pull/15961
    • Add ForInstanceSegmentation models to image-segmentation pipelines by @Narsil in https://github.com/huggingface/transformers/pull/15937
    • [Doctests] Move doctests to new GPU & Fix bugs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15969
    • Removed an outdated check about hdf5_version by @ydshieh in https://github.com/huggingface/transformers/pull/16011
    • Swag example: Update doc format by @gante in https://github.com/huggingface/transformers/pull/16014
    • Fix github actions comment by @LysandreJik in https://github.com/huggingface/transformers/pull/16009
    • Simplify release utils by @sgugger in https://github.com/huggingface/transformers/pull/15921
    • Make pos optional in PerceiverAudioPreprocessor to avoid crashing PerceiverModel operation by @basilevh in https://github.com/huggingface/transformers/pull/15972
    • Fix MaskFormer failing test on master by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16012
    • Fix broken code blocks in README.md by @upura in https://github.com/huggingface/transformers/pull/15967
    • Use tiny models for get_pretrained_model in TFEncoderDecoderModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/15989
    • Add ONNX export for ViT by @lewtun in https://github.com/huggingface/transformers/pull/15658
    • Add FlaxBartForCausalLM by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15995
    • add doctests for bart like seq2seq models by @patil-suraj in https://github.com/huggingface/transformers/pull/15987
    • Fix warning message in ElectraForCausalLM by @pbelevich in https://github.com/huggingface/transformers/pull/16023
    • Freeze Feature Encoder in FlaxSpeechEncoderDecoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15997
    • Fix dependency error message in ServeCommand by @andstor in https://github.com/huggingface/transformers/pull/16033
    • [Docs] Improve PyTorch, Flax generate API by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15988
    • [Tests] Add attentions_option to ModelTesterMixin by @NielsRogge in https://github.com/huggingface/transformers/pull/15909
    • [README] fix url for Preprocessing tutorial by @patil-suraj in https://github.com/huggingface/transformers/pull/16042
    • Fix Bug in Flax-Speech-Encoder-Decoder Test by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16041
    • Fix TFDebertaV2ConvLayer in TFDebertaV2Model by @ydshieh in https://github.com/huggingface/transformers/pull/16031
    • Build the doc in a seperate folder then move it by @sgugger in https://github.com/huggingface/transformers/pull/16020
    • Don't compute metrics in LM examples on TPU by @sgugger in https://github.com/huggingface/transformers/pull/16029
    • TF: Unpack model inputs through a decorator by @gante in https://github.com/huggingface/transformers/pull/15907
    • Fix Bug in Flax Seq2Seq Models by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16021
    • DeBERTa/DeBERTa-v2/SEW Support for torch 1.11 by @LysandreJik in https://github.com/huggingface/transformers/pull/16043
    • support new marian models by @patil-suraj in https://github.com/huggingface/transformers/pull/15831
    • Fix duplicate arguments passed to dummy inputs in ONNX export by @lewtun in https://github.com/huggingface/transformers/pull/16045
    • FIX: updating doc/example for fine-tune for downstream Token Classification by @davidsbatista in https://github.com/huggingface/transformers/pull/16063
    • Fix a TF test name (LayoutLMModelTest) by @ydshieh in https://github.com/huggingface/transformers/pull/16061
    • Move QDQBert in just PyTorch block by @sgugger in https://github.com/huggingface/transformers/pull/16062
    • Remove assertion over possible activation functions in DistilBERT by @mfuntowicz in https://github.com/huggingface/transformers/pull/16066
    • Fix torch-scatter version by @LysandreJik in https://github.com/huggingface/transformers/pull/16072
    • Add type annotations for BERT and copies by @Rocketknight1 in https://github.com/huggingface/transformers/pull/16074
    • Adding type hints for TFRoBERTa by @Rocketknight1 in https://github.com/huggingface/transformers/pull/16057
    • Make sure 'torch.dtype' has str-type value in config and all nested dicts for JSON serializability by @feifang24 in https://github.com/huggingface/transformers/pull/16065
    • Run daily doctests without time-out at least once by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16077
    • Add soft length regulation for sequence generation by @kevinpl07 in https://github.com/huggingface/transformers/pull/15245
    • Update troubleshoot guide by @stevhliu in https://github.com/huggingface/transformers/pull/16001
    • Add type annotations for ImageGPT by @johnnv1 in https://github.com/huggingface/transformers/pull/16088
    • Rebuild deepspeed by @LysandreJik in https://github.com/huggingface/transformers/pull/16081
    • Add missing type hints for all flavors of RoBERTa PyTorch models. by @ChainYo in https://github.com/huggingface/transformers/pull/16086
    • [Fix doc example] FSMT by @ydshieh in https://github.com/huggingface/transformers/pull/16085
    • Audio/vision task guides by @stevhliu in https://github.com/huggingface/transformers/pull/15808
    • [ZeRO] Fixes issue with embedding resize by @jeffra in https://github.com/huggingface/transformers/pull/16093
    • [Deepspeed] add support for bf16 mode by @stas00 in https://github.com/huggingface/transformers/pull/14569
    • Change unpacking of TF Bart inputs to use decorator by @osanseviero in https://github.com/huggingface/transformers/pull/16094
    • add unpack_inputs decorator to mbart tf by @Abdelrhman-Hosny in https://github.com/huggingface/transformers/pull/16097
    • Add type annotations for segformer pytorch by @p-mishra1 in https://github.com/huggingface/transformers/pull/16099
    • Add unpack_input decorator to ViT model by @johnnv1 in https://github.com/huggingface/transformers/pull/16102
    • Add type hints to XLM model (PyTorch) by @jbrry in https://github.com/huggingface/transformers/pull/16108
    • Add missing type hints for all flavors of LayoutLMv2 PyTorch models. by @ChainYo in https://github.com/huggingface/transformers/pull/16089
    • Add TFCamembertForCausalLM and ONNX integration test by @lewtun in https://github.com/huggingface/transformers/pull/16073
    • Fix and document Zero Shot Image Classification by @osanseviero in https://github.com/huggingface/transformers/pull/16079
    • Fix Loading of Flax(Speech)EncoderDecoderModel kwargs from PreTrained Encoder-Decoder Checkpoints by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16056
    • Update convert_marian_to_pytorch.py by @jorgtied in https://github.com/huggingface/transformers/pull/16124
    • Make TF pt-tf equivalence test more aggressive by @ydshieh in https://github.com/huggingface/transformers/pull/15839
    • Fix ProphetNetTokenizer by @ydshieh in https://github.com/huggingface/transformers/pull/16082
    • Change unpacking of TF mobilebert inputs to use decorator by @vumichien in https://github.com/huggingface/transformers/pull/16110
    • Steps strategy fix for PushtoHubCallback and changed docstring by @merveenoyan in https://github.com/huggingface/transformers/pull/16138
    • [ViTMAE] Add copied from statements and fix prefix by @NielsRogge in https://github.com/huggingface/transformers/pull/16119
    • Spanish translation of the file training.mdx by @yharyarias in https://github.com/huggingface/transformers/pull/16047
    • Added missing type hints - ELECTRA PyTorch by @kamalkraj in https://github.com/huggingface/transformers/pull/16103
    • Added missing type hints - Deberta V1 and V2 by @kamalkraj in https://github.com/huggingface/transformers/pull/16105
    • [Fix doc example] Fix checkpoint name in docstring example by @ydshieh in https://github.com/huggingface/transformers/pull/16083
    • Better input variable naming for OpenAI (TF) by @bhavika in https://github.com/huggingface/transformers/pull/16129
    • Improve model variable naming - CLIP [TF] by @bhavika in https://github.com/huggingface/transformers/pull/16128
    • Add type hints for TFDistilBert by @PepijnBoers in https://github.com/huggingface/transformers/pull/16107
    • Choose framework for ONNX export by @michaelbenayoun in https://github.com/huggingface/transformers/pull/16018
    • Add type hints for Luke in PyTorch by @bhavika in https://github.com/huggingface/transformers/pull/16111
    • Add type hints for PoolFormer in Pytorch by @soomiles in https://github.com/huggingface/transformers/pull/16121
    • Add type hints for SqueezeBert PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16126
    • Added missing type hints - ELECTRA TF by @kamalkraj in https://github.com/huggingface/transformers/pull/16104
    • Docker images runtime -> devel by @LysandreJik in https://github.com/huggingface/transformers/pull/16141
    • Add type annotations for CLIP (torch) (#16059) by @jacobdineen in https://github.com/huggingface/transformers/pull/16106
    • Add type hints for FNet PyTorch by @wpan03 in https://github.com/huggingface/transformers/pull/16123
    • Use HF_ENDPOINT for custom endpoints by @sgugger in https://github.com/huggingface/transformers/pull/16139
    • update albert with tf decorator by @infinite-Joy in https://github.com/huggingface/transformers/pull/16147
    • clearer model variable naming: ELECTRA by @kamalkraj in https://github.com/huggingface/transformers/pull/16143
    • Add type hints for GPTNeo PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16127
    • Improve Swin for VisionEncoderDecoder by @NielsRogge in https://github.com/huggingface/transformers/pull/16070
    • Make transformers.utils.fx._SUPPORTED_MODELS unique by @pbelevich in https://github.com/huggingface/transformers/pull/16015
    • Shift responsibilities a bit for issues by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16154
    • typo "conaining" -> "containing" by @marxav in https://github.com/huggingface/transformers/pull/16132
    • Configurable Relative Position Max. Distance by @agemagician in https://github.com/huggingface/transformers/pull/16155
    • Added spanish translation of quicktour.mdx by @Duedme in https://github.com/huggingface/transformers/pull/16158
    • Use templates by @sgugger in https://github.com/huggingface/transformers/pull/16142
    • [Fix doc example] Fix first example for the custom_datasets tutorial by @MarkusSagen in https://github.com/huggingface/transformers/pull/16087
    • [Fix doc example] Fix 2 PyTorch Vilt docstring examples by @ydshieh in https://github.com/huggingface/transformers/pull/16076
    • TF XLA greedy generation by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15786
    • clearer model variable naming: pegasus by @kamalkraj in https://github.com/huggingface/transformers/pull/16152
    • Change unpacking of TF layoutlm inputs to use decorator by @vumichien in https://github.com/huggingface/transformers/pull/16112
    • update transformer XL with tf decorator by @infinite-Joy in https://github.com/huggingface/transformers/pull/16166
    • added type hints to yoso by @mowafess in https://github.com/huggingface/transformers/pull/16163
    • Framework split by @sgugger in https://github.com/huggingface/transformers/pull/16030
    • [MT5Config] add relative_attention_max_distance in config by @patil-suraj in https://github.com/huggingface/transformers/pull/16170
    • clearer model variable naming: Tapas by @kamalkraj in https://github.com/huggingface/transformers/pull/16145
    • clearer model variable naming: Deberta by @kamalkraj in https://github.com/huggingface/transformers/pull/16146
    • Add flaubert types by @ChainYo in https://github.com/huggingface/transformers/pull/16118
    • clearer model variable naming: xlnet by @kamalkraj in https://github.com/huggingface/transformers/pull/16150
    • Add type hints for Perceiver Pytorch by @jcmc00 in https://github.com/huggingface/transformers/pull/16174
    • Add type hints for Reformer PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16175
    • Fix some Flax models' hidden_states by @ydshieh in https://github.com/huggingface/transformers/pull/16167
    • Add the XTREME-S fine-tuning example by @anton-l in https://github.com/huggingface/transformers/pull/15985
    • [Xtreme-S] fix some namings by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16183
    • Replace all deprecated jax.ops operations with jnp's at by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16078
    • clearer model variable naming: funnel by @utkusaglm in https://github.com/huggingface/transformers/pull/16178
    • clearer model variable naming: blenderbot by @utkusaglm in https://github.com/huggingface/transformers/pull/16192
    • Minor fixes to XTREME-S by @anton-l in https://github.com/huggingface/transformers/pull/16193
    • unpack_input decorator for tf_convnext by @johko in https://github.com/huggingface/transformers/pull/16181
    • clearer model variable naming: blenderbot_small by @utkusaglm in https://github.com/huggingface/transformers/pull/16194
    • Adding type hints for Distilbert by @johnryan465 in https://github.com/huggingface/transformers/pull/16090
    • ResNet: update modules names by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16196
    • Update a CI job step name by @ydshieh in https://github.com/huggingface/transformers/pull/16189
    • Fix loading CLIPVisionConfig and CLIPTextConfig by @patil-suraj in https://github.com/huggingface/transformers/pull/16198
    • TF: add beam search tests by @gante in https://github.com/huggingface/transformers/pull/16202
    • Swin support for any input size by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15986
    • Fix generation min length by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16206
    • Add/type annotations/model vision by @johnnv1 in https://github.com/huggingface/transformers/pull/16151
    • VAN: update modules names by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16201
    • Fixes Loss for TransfoXL when using Trainer API v2 by @LysandreJik in https://github.com/huggingface/transformers/pull/16140
    • [Tests] Fix DiT test by @NielsRogge in https://github.com/huggingface/transformers/pull/16218
    • Fix FlaxRoFormerClassificationHead activation by @ydshieh in https://github.com/huggingface/transformers/pull/16168
    • Fix typos in docstrings of data_collator.py by @daysm in https://github.com/huggingface/transformers/pull/16208
    • Fix reproducibility in Training for PyTorch 1.11 by @sgugger in https://github.com/huggingface/transformers/pull/16209
    • Fix readmes by @qqaatw in https://github.com/huggingface/transformers/pull/16217
    • MaskFormer: fix device on test by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16219
    • Adding Unpack Decorator For DPR model by @forsc in https://github.com/huggingface/transformers/pull/16212
    • Skip equivalence test for TransfoXL by @LysandreJik in https://github.com/huggingface/transformers/pull/16224
    • Fix Type Hint of Nan/Inf Logging Filter Arg by @Sophylax in https://github.com/huggingface/transformers/pull/16227
    • [Flax] remove jax.ops.index by @patil-suraj in https://github.com/huggingface/transformers/pull/16220
    • Support PEP 563 for HfArgumentParser by @function2-llx in https://github.com/huggingface/transformers/pull/15795
    • add unpack_inputs decorator for marian by @johko in https://github.com/huggingface/transformers/pull/16226
    • fix(flax): generate with logits processor/warper by @borisdayma in https://github.com/huggingface/transformers/pull/16231
    • [FlaxSpeechEncoderDecoderModel] Skip from_encoder_decoder_pretrained by @patil-suraj in https://github.com/huggingface/transformers/pull/16236
    • [Generate Docs] Correct docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16133
    • [Deepspeed] non-HF Trainer doc update by @stas00 in https://github.com/huggingface/transformers/pull/16238
    • integrations: mlflow: skip start_run() if a run is already active and sanity check on enabling integration by @ktzsh in https://github.com/huggingface/transformers/pull/16131
    • Update expected slices for pillow > 9 by @NielsRogge in https://github.com/huggingface/transformers/pull/16117
    • Attention mask is important in the case of batching... by @Narsil in https://github.com/huggingface/transformers/pull/16222
    • Change assertion to warning when passing past_key_value to T5 encoder by @ZhaofengWu in https://github.com/huggingface/transformers/pull/16153
    • Override _pad in LEDTokenizer to deal with global_attention_mask by @ydshieh in https://github.com/huggingface/transformers/pull/15940
    • Update XLM with TF decorator by @louisowen6 in https://github.com/huggingface/transformers/pull/16247
    • Add unpack_inputs decorator for ctrl by @johko in https://github.com/huggingface/transformers/pull/16242
    • update jax version and re-enable some tests by @patil-suraj in https://github.com/huggingface/transformers/pull/16254
    • [Constrained Beam Search] Adding Notebook Example & Minor Typo Fix by @cwkeam in https://github.com/huggingface/transformers/pull/16246
    • value check for typical sampling by @cimeister in https://github.com/huggingface/transformers/pull/16165
    • Make Flax pt-flax equivalence test more aggressive by @ydshieh in https://github.com/huggingface/transformers/pull/15841
    • Aggressive PT/TF equivalence test on PT side by @ydshieh in https://github.com/huggingface/transformers/pull/16250
    • Update flaubert with TF decorator by @Tegzes in https://github.com/huggingface/transformers/pull/16258
    • Fix links in guides by @stevhliu in https://github.com/huggingface/transformers/pull/16182
    • Small fixes to the documentation by @sgugger in https://github.com/huggingface/transformers/pull/16180
    • [WIP] add has_attentions as done in PyTorch side by @ydshieh in https://github.com/huggingface/transformers/pull/16259
    • Make add-new-model-like work in an env without all frameworks by @sgugger in https://github.com/huggingface/transformers/pull/16239
    • Deberta v2 code simplification by @guillaume-be in https://github.com/huggingface/transformers/pull/15732
    • Add Slack notification support for doc tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16253
    • Framework split for Spanish version of doc quicktour.mdx by @omarespejel in https://github.com/huggingface/transformers/pull/16215
    • Removed the 'optional' string (in DETR post_process) by @dinesh-GDK in https://github.com/huggingface/transformers/pull/16266
    • Draft a guide with our code quirks for new models by @sgugger in https://github.com/huggingface/transformers/pull/16237
    • Fixed Error Raised Due to Wrongly Accessing Training Sample by @aflah02 in https://github.com/huggingface/transformers/pull/16115
    • Fix XGLM cross attention by @patil-suraj in https://github.com/huggingface/transformers/pull/16290
    • Fix a typo (add a comma) by @PolarisRisingWar in https://github.com/huggingface/transformers/pull/16291
    • Add type hints to xlnet by @mowafess in https://github.com/huggingface/transformers/pull/16214
    • Remove disclaimer from Longformer docs by @gchhablani in https://github.com/huggingface/transformers/pull/16296
    • Add argument "cache_dir" for transformers.onnx by @happyXia in https://github.com/huggingface/transformers/pull/16284
    • Add type hints transfoxl by @jcmc00 in https://github.com/huggingface/transformers/pull/16267
    • added type hints for BART model by @robotjellyzone in https://github.com/huggingface/transformers/pull/16270
    • ResNet & VAN: Fixed code sample tests by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16294
    • GPT2 TensorFlow Type Hints by @cakiki in https://github.com/huggingface/transformers/pull/16261
    • Added type hints for PyTorch T5 model by @yhl48 in https://github.com/huggingface/transformers/pull/16257
    • Fix Marian conversion script by @patil-suraj in https://github.com/huggingface/transformers/pull/16300
    • [SegFormer] Remove unused attributes by @NielsRogge in https://github.com/huggingface/transformers/pull/16285
    • Update troubleshoot with more content by @stevhliu in https://github.com/huggingface/transformers/pull/16243
    • fix last element in hidden_states for XGLM by @ydshieh in https://github.com/huggingface/transformers/pull/16301
    • [FlaxGPTJ] Fix bug in rotary embeddings by @patil-suraj in https://github.com/huggingface/transformers/pull/16298
    • Add missing type hints for PyTorch Longformer models by @johnnygreco in https://github.com/huggingface/transformers/pull/16244
    • Fix Seq2SeqTrainingArguments docs by @gchhablani in https://github.com/huggingface/transformers/pull/16295
    • [xtreme-s] Update Minds14 results by @anton-l in https://github.com/huggingface/transformers/pull/16241
    • added type hints for blenderbot and blenderbot_small (v2) by @IvanLauLinTiong in https://github.com/huggingface/transformers/pull/16307
    • Update Makefile Phonies by @gchhablani in https://github.com/huggingface/transformers/pull/16306
    • TF - update (vision_)encoder_decoder past variable by @gante in https://github.com/huggingface/transformers/pull/16260
    • Add Flaubert OnnxConfig to Transformers by @ChainYo in https://github.com/huggingface/transformers/pull/16279
    • TFLongformer: Add missing type hints and unpack inputs decorator by @johnnygreco in https://github.com/huggingface/transformers/pull/16228
    • add xglm conversion script by @patil-suraj in https://github.com/huggingface/transformers/pull/16305
    • Fix bugs of s2t fairseq model converting by @beomseok-lee in https://github.com/huggingface/transformers/pull/15593
    • Add type hints for Pegasus model (PyTorch) by @Tegzes in https://github.com/huggingface/transformers/pull/16324
    • Funnel type hints by @AMontgomerie in https://github.com/huggingface/transformers/pull/16323
    • Add type hints for ProphetNet PyTorch by @Tegzes in https://github.com/huggingface/transformers/pull/16272
    • [GLPN] Improve docs by @NielsRogge in https://github.com/huggingface/transformers/pull/16331
    • Added type hints for Pytorch Marian calls by @clefourrier in https://github.com/huggingface/transformers/pull/16200
    • VAN: Code sample tests by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16340
    • Add type annotations for Rembert/Splinter and copies by @jacobdineen in https://github.com/huggingface/transformers/pull/16338
    • [Bug template] Shift responsibilities for long-range by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16344
    • Fix code repetition in serialization guide by @osanseviero in https://github.com/huggingface/transformers/pull/16346
    • Adopt framework-specific blocks for content by @stevhliu in https://github.com/huggingface/transformers/pull/16342
    • Updates the default branch from master to main by @LysandreJik in https://github.com/huggingface/transformers/pull/16326
    • [T5] Add t5 download script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16328
    • Reorganize file utils by @sgugger in https://github.com/huggingface/transformers/pull/16264
    • [FlaxBart] make sure no grads are computed on bias by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16345
    • Trainer evaluation delay by @OllieBroadhurst in https://github.com/huggingface/transformers/pull/16356
    • Adding missing type hints for mBART model (TF) by @reichenbch in https://github.com/huggingface/transformers/pull/16281
    • Add type annotations of config for vision models by @johnnv1 in https://github.com/huggingface/transformers/pull/16263
    • TF - Fix interchangeable past/past_key_values and revert output variable name in GPT2 by @gante in https://github.com/huggingface/transformers/pull/16332
    • Swap inequalities by @OllieBroadhurst in https://github.com/huggingface/transformers/pull/16368
    • Make Transformers use cache files when hf.co is down by @sgugger in https://github.com/huggingface/transformers/pull/16362
    • Decision transformer gym by @edbeeching in https://github.com/huggingface/transformers/pull/15845
    • add GPT-J ONNX config to Transformers by @ChainYo in https://github.com/huggingface/transformers/pull/16274
    • Update docs/README.md by @ydshieh in https://github.com/huggingface/transformers/pull/16333
    • Make BigBird model compatible with fp16 dtype. by @xuzhao9 in https://github.com/huggingface/transformers/pull/16034
    • [Doctests] Make roberta-like meaningful by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16363
    • [Doctests] Make TFRoberta-like meaningful by @ydshieh in https://github.com/huggingface/transformers/pull/16370
    • Update readme with how to train offline and fix BPE command by @ncoop57 in https://github.com/huggingface/transformers/pull/15897
    • Fix BigBirdModelTester by @ydshieh in https://github.com/huggingface/transformers/pull/16310
    • Type hints and decorator for TF T5 by @Dahlbomii in https://github.com/huggingface/transformers/pull/16376
    • Add type hints for ConvBert model by @simonzli in https://github.com/huggingface/transformers/pull/16377
    • Update pt flax equivalence tests in pt by @ydshieh in https://github.com/huggingface/transformers/pull/16280
    • Bump cookiecutter version by @ydshieh in https://github.com/huggingface/transformers/pull/16387
    • Fix style by @LysandreJik in https://github.com/huggingface/transformers/pull/16391
    • Fix readme links and add CI check by @sgugger in https://github.com/huggingface/transformers/pull/16392
    • variable naming for Distilbert model by @robotjellyzone in https://github.com/huggingface/transformers/pull/16384
    • Added type hints by @yhl48 in https://github.com/huggingface/transformers/pull/16389
    • Rename semantic segmentation outputs by @NielsRogge in https://github.com/huggingface/transformers/pull/15849
    • Make FeaturesManager.get_model_from_feature a static method by @michaelbenayoun in https://github.com/huggingface/transformers/pull/16357
    • Big file_utils cleanup by @sgugger in https://github.com/huggingface/transformers/pull/16396
    • fixed typo from enable to disable in disable_progress_bar function by @Gladiator07 in https://github.com/huggingface/transformers/pull/16406
    • Rename master to main for notebooks links and leftovers by @sgugger in https://github.com/huggingface/transformers/pull/16397
    • TF PushToHubCallback fixes and updates by @Rocketknight1 in https://github.com/huggingface/transformers/pull/16409
    • Add ONNX support for Blenderbot and BlenderbotSmall by @lewtun in https://github.com/huggingface/transformers/pull/15875
    • [FlaxSpeechEncoderDecoder] Fix feature extractor gradient test by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16407
    • Fix Typo in Argument of FlaxWav2Vec2ForPreTrainingModule by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16084
    • Removed inputs_processing and replaced with decorator for lxmert by @silvererudite in https://github.com/huggingface/transformers/pull/16414
    • remove references to PDF reading via PIL by @garfieldnate in https://github.com/huggingface/transformers/pull/15293
    • Update comments in class BatchEncoding by @basicv8vc in https://github.com/huggingface/transformers/pull/15932
    • Fix broken links by @kurianbenoy in https://github.com/huggingface/transformers/pull/16113
    • cached_download ∘ hf_hub_url is hf_hub_download by @julien-c in https://github.com/huggingface/transformers/pull/16375
    • QDQBert example update by @shangz-ai in https://github.com/huggingface/transformers/pull/16395
    • [Flax] Improve Robustness of Back-Prop Tests by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16418
    • Fix typo in language modeling example comment by @dreamgonfly in https://github.com/huggingface/transformers/pull/16421
    • Use doc builder styler by @sgugger in https://github.com/huggingface/transformers/pull/16412
    • Fix PerceiverMLP and test by @jaesuny in https://github.com/huggingface/transformers/pull/16405
    • [FlaxSpeechEncoderDecoderModel] Ensure Input and Output Word Embeddings Are Not Tied by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16444
    • Translation from english to spanish of file pipeline_tutorial.mdx by @FernandoLpz in https://github.com/huggingface/transformers/pull/16149
    • Remove kwargs argument from IBERT MLM forward pass by @lewtun in https://github.com/huggingface/transformers/pull/16449
    • Fix blenderbot conversion script by @patil-suraj in https://github.com/huggingface/transformers/pull/16472
    • Adding DocTest to TrOCR by @arnaudstiegler in https://github.com/huggingface/transformers/pull/16398
    • [MNLI example] Prevent overwriting matched with mismatched metrics by @eldarkurtic in https://github.com/huggingface/transformers/pull/16475
    • Remove duplicate mLuke by @stevhliu in https://github.com/huggingface/transformers/pull/16460
    • Fix missing output_attentions in PT/Flax equivalence test by @ydshieh in https://github.com/huggingface/transformers/pull/16271
    • Fix some TF GPT-J CI testings by @ydshieh in https://github.com/huggingface/transformers/pull/16454
    • Fix example test and test_fetcher for examples by @sgugger in https://github.com/huggingface/transformers/pull/16478
    • fix wrong variable name by @wesleyacheng in https://github.com/huggingface/transformers/pull/16467
    • Add TF vision model code samples by @ydshieh in https://github.com/huggingface/transformers/pull/16477
    • missing trainer import by @wesleyacheng in https://github.com/huggingface/transformers/pull/16469
    • Add type hints for UniSpeech by @Tegzes in https://github.com/huggingface/transformers/pull/16399
    • TF: properly handle kwargs in encoder_decoder architectures by @gante in https://github.com/huggingface/transformers/pull/16465
    • added typehints for RAG pytorch models by @akashe in https://github.com/huggingface/transformers/pull/16416
    • Avoid accessing .dataset of a DataLoader in Trainer by @sanderland in https://github.com/huggingface/transformers/pull/16451
    • TF GPT2: clearer model variable naming with @unpack_inputs by @cakiki in https://github.com/huggingface/transformers/pull/16311
    • Raise diff tolerance value for TFViTMAEModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/16483
    • Do not initialize torch.distributed process group if one is already initialized by @Yard1 in https://github.com/huggingface/transformers/pull/16487
    • TF GPT-J Type hints and TF decorator by @Dahlbomii in https://github.com/huggingface/transformers/pull/16488
    • Nit: MCSCOCO -> MSCOCO by @AdityaKane2001 in https://github.com/huggingface/transformers/pull/16481
    • Add length to PreTrainedTokenizer train_new_from_iterator by @dctelus in https://github.com/huggingface/transformers/pull/16493
    • Add support for exporting GPT-J to ONNX-TRT by @tomerip in https://github.com/huggingface/transformers/pull/16492
    • TF: unpack inputs on Convbert, GPTJ, LED, and templates by @gante in https://github.com/huggingface/transformers/pull/16491
    • Feature Extractor accepts segmentation_maps by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15964
    • [examples] max samples can't be bigger than the len of dataset by @stas00 in https://github.com/huggingface/transformers/pull/16501
    • update smddp api to v1.4.0 by @roywei in https://github.com/huggingface/transformers/pull/16371
    • Support reduce_bucket_size="auto" for deepspeed stages <3 by @manuelciosici in https://github.com/huggingface/transformers/pull/16496
    • Modeling Outputs by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16341
    • make tuple annotation more specific to avoid failures during symbolic_trace by @chenbohua3 in https://github.com/huggingface/transformers/pull/16490
    • Spanish translation of the file multilingual.mdx by @SimplyJuanjo in https://github.com/huggingface/transformers/pull/16329
    • Translate installation.mdx to Spanish by @lilianabs in https://github.com/huggingface/transformers/pull/16229
    • Translate accelerate.mdx from english to spanish by @Sangohe in https://github.com/huggingface/transformers/pull/16176
    • [Typo][Example] Fixed a typo in run_qa_no_trainer.py by @bhadreshpsavani in https://github.com/huggingface/transformers/pull/16508
    • added type hints to xglm pytorch by @mowafess in https://github.com/huggingface/transformers/pull/16500
    • Fix syntax error in generate docstrings by @sgugger in https://github.com/huggingface/transformers/pull/16516
    • [research] link to the XTREME-S paper by @anton-l in https://github.com/huggingface/transformers/pull/16519
    • Fixed a typo in seq2seq_trainer.py by @Agoniii in https://github.com/huggingface/transformers/pull/16531
    • Add ONNX export for BeiT by @akuma12 in https://github.com/huggingface/transformers/pull/16498
    • call on_train_end when optuna trial is pruned by @fschlatt in https://github.com/huggingface/transformers/pull/16536
    • Type hints added to OpenAIGPT by @Dahlbomii in https://github.com/huggingface/transformers/pull/16529
    • Fix Bart type hints by @gchhablani in https://github.com/huggingface/transformers/pull/16297
    • Add VisualBert type hints by @gchhablani in https://github.com/huggingface/transformers/pull/16544
    • Adding missing type hints for mBART model (PyTorch) by @reichenbch in https://github.com/huggingface/transformers/pull/16429
    • Remove MBart subclass of XLMRoberta in tokenizer docs by @gchhablani in https://github.com/huggingface/transformers/pull/16546
    • Use random_attention_mask for TF tests by @ydshieh in https://github.com/huggingface/transformers/pull/16517
    • [GLPN] Improve code example by @NielsRogge in https://github.com/huggingface/transformers/pull/16450
    • Pin tokenizers version <0.13 by @LysandreJik in https://github.com/huggingface/transformers/pull/16539
    • add code samples for TF speech models by @ydshieh in https://github.com/huggingface/transformers/pull/16494
    • [FlaxSpeechEncoderDecoder] Fix dtype bug by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16581
    • Making the impossible to connect error actually report the right URL. by @Narsil in https://github.com/huggingface/transformers/pull/16446
    • Fix flax import in __init__.py: modeling_xglm -> modeling_flax_xglm by @stancld in https://github.com/huggingface/transformers/pull/16556
    • Add utility to find model labels by @sgugger in https://github.com/huggingface/transformers/pull/16526
    • Enable doc in Spanish by @sgugger in https://github.com/huggingface/transformers/pull/16518
    • Add use_auth to load_datasets for private datasets to PT and TF examples by @KMFODA in https://github.com/huggingface/transformers/pull/16521
    • add a test checking the format of convert_tokens_to_string's output by @SaulLu in https://github.com/huggingface/transformers/pull/16540
    • TF: Finalize unpack_inputs-related changes by @gante in https://github.com/huggingface/transformers/pull/16499
    • [SpeechEncoderDecoderModel] Correct Encoder Last Hidden State Output by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16586
    • initialize the default rank set on TrainerState by @andrescodas in https://github.com/huggingface/transformers/pull/16530
    • Fix CI: test_inference_for_pretraining in ViTMAEModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/16591
    • add a template to add missing tokenization test by @SaulLu in https://github.com/huggingface/transformers/pull/16553
    • PretrainedModel: made _load_pretrained_model_low_mem static + bug fix by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/16548
    • handle torch_dtype in low cpu mem usage by @patil-suraj in https://github.com/huggingface/transformers/pull/16580
    • [Doctests] Correct filenaming by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16599
    • Adding new train_step logic to make things less confusing for users by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15994
    • Adding missing type hints for BigBird model by @reichenbch in https://github.com/huggingface/transformers/pull/16555
    • [deepspeed] fix typo, adjust config name by @stas00 in https://github.com/huggingface/transformers/pull/16597
    • Add global_attention_mask to gen_kwargs in Seq2SeqTrainer.prediction_step by @JohnGiorgi in https://github.com/huggingface/transformers/pull/16485
    • [benchmark tool] trainer-benchmark.py by @stas00 in https://github.com/huggingface/transformers/pull/14934
    • Update summary of the tasks by @stevhliu in https://github.com/huggingface/transformers/pull/16528
    • added type hints to CTRL pytorch by @anmolsjoshi in https://github.com/huggingface/transformers/pull/16593
    • fix default num_attention_heads in segformer doc by @JunMa11 in https://github.com/huggingface/transformers/pull/16612
    • [Docs] Correct quicktour minds14 dataset by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16626
    • Fix seq2seq doc tests by @patil-suraj in https://github.com/huggingface/transformers/pull/16606
    • don't load state_dict twice when using low_cpu_mem_usage in from_pretrained by @patil-suraj in https://github.com/huggingface/transformers/pull/16602
    • Use CLIP model config to set some kwargs for components by @ydshieh in https://github.com/huggingface/transformers/pull/16609
    • [modeling_utils] typo by @stas00 in https://github.com/huggingface/transformers/pull/16621
    • [Speech2Text Doc] Fix docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/16611
    • [FlaxSpeechEncoderDecoderModel] More Rigorous PT-Flax Equivalence Tests by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/16589
    • Fix TFTransfoXLLMHeadModel outputs by @ydshieh in https://github.com/huggingface/transformers/pull/16590

    Impressive community contributors

    The community contributors below have significantly contributed to the v4.18.0 release. Thank you!

    • @sayakpaul, for contributing the TensorFlow version of ViTMAE
    • @stancld, for contributing the TensorFlow version of GPT-J

    New Contributors

    • @Soonhwan-Kwon made their first contribution in https://github.com/huggingface/transformers/pull/13727
    • @jonatasgrosman made their first contribution in https://github.com/huggingface/transformers/pull/15428
    • @ToluClassics made their first contribution in https://github.com/huggingface/transformers/pull/15432
    • @peregilk made their first contribution in https://github.com/huggingface/transformers/pull/15423
    • @bugface made their first contribution in https://github.com/huggingface/transformers/pull/15480
    • @AyushExel made their first contribution in https://github.com/huggingface/transformers/pull/14582
    • @thinksoso made their first contribution in https://github.com/huggingface/transformers/pull/15403
    • @davidleonfdez made their first contribution in https://github.com/huggingface/transformers/pull/15473
    • @sanchit-gandhi made their first contribution in https://github.com/huggingface/transformers/pull/15519
    • @arron1227 made their first contribution in https://github.com/huggingface/transformers/pull/15084
    • @cimeister made their first contribution in https://github.com/huggingface/transformers/pull/15504
    • @cwkeam made their first contribution in https://github.com/huggingface/transformers/pull/15416
    • @Albertobegue made their first contribution in https://github.com/huggingface/transformers/pull/13831
    • @derenrich made their first contribution in https://github.com/huggingface/transformers/pull/15614
    • @tkukurin made their first contribution in https://github.com/huggingface/transformers/pull/15636
    • @muzhi1991 made their first contribution in https://github.com/huggingface/transformers/pull/15638
    • @versae made their first contribution in https://github.com/huggingface/transformers/pull/15590
    • @jonrbates made their first contribution in https://github.com/huggingface/transformers/pull/15617
    • @arampacha made their first contribution in https://github.com/huggingface/transformers/pull/15413
    • @FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/transformers/pull/15657
    • @coyotte508 made their first contribution in https://github.com/huggingface/transformers/pull/15680
    • @heytanay made their first contribution in https://github.com/huggingface/transformers/pull/15531
    • @gautierdag made their first contribution in https://github.com/huggingface/transformers/pull/15702
    • @SSardorf made their first contribution in https://github.com/huggingface/transformers/pull/15741
    • @Crabzmatic made their first contribution in https://github.com/huggingface/transformers/pull/15740
    • @dreamgonfly made their first contribution in https://github.com/huggingface/transformers/pull/15644
    • @lsb made their first contribution in https://github.com/huggingface/transformers/pull/15468
    • @pbelevich made their first contribution in https://github.com/huggingface/transformers/pull/15776
    • @sayakpaul made their first contribution in https://github.com/huggingface/transformers/pull/15750
    • @rahul003 made their first contribution in https://github.com/huggingface/transformers/pull/15877
    • @rhjohnstone made their first contribution in https://github.com/huggingface/transformers/pull/15884
    • @cosmoquester made their first contribution in https://github.com/huggingface/transformers/pull/15913
    • @konstantinjdobler made their first contribution in https://github.com/huggingface/transformers/pull/15951
    • @yhavinga made their first contribution in https://github.com/huggingface/transformers/pull/15963
    • @dlwh made their first contribution in https://github.com/huggingface/transformers/pull/15961
    • @basilevh made their first contribution in https://github.com/huggingface/transformers/pull/15972
    • @andstor made their first contribution in https://github.com/huggingface/transformers/pull/16033
    • @davidsbatista made their first contribution in https://github.com/huggingface/transformers/pull/16063
    • @feifang24 made their first contribution in https://github.com/huggingface/transformers/pull/16065
    • @kevinpl07 made their first contribution in https://github.com/huggingface/transformers/pull/15245
    • @johnnv1 made their first contribution in https://github.com/huggingface/transformers/pull/16088
    • @Abdelrhman-Hosny made their first contribution in https://github.com/huggingface/transformers/pull/16097
    • @p-mishra1 made their first contribution in https://github.com/huggingface/transformers/pull/16099
    • @jbrry made their first contribution in https://github.com/huggingface/transformers/pull/16108
    • @jorgtied made their first contribution in https://github.com/huggingface/transformers/pull/16124
    • @vumichien made their first contribution in https://github.com/huggingface/transformers/pull/16110
    • @merveenoyan made their first contribution in https://github.com/huggingface/transformers/pull/16138
    • @yharyarias made their first contribution in https://github.com/huggingface/transformers/pull/16047
    • @bhavika made their first contribution in https://github.com/huggingface/transformers/pull/16129
    • @PepijnBoers made their first contribution in https://github.com/huggingface/transformers/pull/16107
    • @soomiles made their first contribution in https://github.com/huggingface/transformers/pull/16121
    • @Tegzes made their first contribution in https://github.com/huggingface/transformers/pull/16126
    • @jacobdineen made their first contribution in https://github.com/huggingface/transformers/pull/16106
    • @wpan03 made their first contribution in https://github.com/huggingface/transformers/pull/16123
    • @infinite-Joy made their first contribution in https://github.com/huggingface/transformers/pull/16147
    • @marxav made their first contribution in https://github.com/huggingface/transformers/pull/16132
    • @Duedme made their first contribution in https://github.com/huggingface/transformers/pull/16158
    • @MarkusSagen made their first contribution in https://github.com/huggingface/transformers/pull/16087
    • @mowafess made their first contribution in https://github.com/huggingface/transformers/pull/16163
    • @jcmc00 made their first contribution in https://github.com/huggingface/transformers/pull/16174
    • @utkusaglm made their first contribution in https://github.com/huggingface/transformers/pull/16178
    • @johko made their first contribution in https://github.com/huggingface/transformers/pull/16181
    • @johnryan465 made their first contribution in https://github.com/huggingface/transformers/pull/16090
    • @daysm made their first contribution in https://github.com/huggingface/transformers/pull/16208
    • @forsc made their first contribution in https://github.com/huggingface/transformers/pull/16212
    • @Sophylax made their first contribution in https://github.com/huggingface/transformers/pull/16227
    • @function2-llx made their first contribution in https://github.com/huggingface/transformers/pull/15795
    • @ktzsh made their first contribution in https://github.com/huggingface/transformers/pull/16131
    • @louisowen6 made their first contribution in https://github.com/huggingface/transformers/pull/16247
    • @omarespejel made their first contribution in https://github.com/huggingface/transformers/pull/16215
    • @dinesh-GDK made their first contribution in https://github.com/huggingface/transformers/pull/16266
    • @aflah02 made their first contribution in https://github.com/huggingface/transformers/pull/16115
    • @PolarisRisingWar made their first contribution in https://github.com/huggingface/transformers/pull/16291
    • @happyXia made their first contribution in https://github.com/huggingface/transformers/pull/16284
    • @robotjellyzone made their first contribution in https://github.com/huggingface/transformers/pull/16270
    • @yhl48 made their first contribution in https://github.com/huggingface/transformers/pull/16257
    • @johnnygreco made their first contribution in https://github.com/huggingface/transformers/pull/16244
    • @IvanLauLinTiong made their first contribution in https://github.com/huggingface/transformers/pull/16307
    • @beomseok-lee made their first contribution in https://github.com/huggingface/transformers/pull/15593
    • @clefourrier made their first contribution in https://github.com/huggingface/transformers/pull/16200
    • @OllieBroadhurst made their first contribution in https://github.com/huggingface/transformers/pull/16356
    • @reichenbch made their first contribution in https://github.com/huggingface/transformers/pull/16281
    • @edbeeching made their first contribution in https://github.com/huggingface/transformers/pull/15845
    • @xuzhao9 made their first contribution in https://github.com/huggingface/transformers/pull/16034
    • @Dahlbomii made their first contribution in https://github.com/huggingface/transformers/pull/16376
    • @simonzli made their first contribution in https://github.com/huggingface/transformers/pull/16377
    • @Gladiator07 made their first contribution in https://github.com/huggingface/transformers/pull/16406
    • @silvererudite made their first contribution in https://github.com/huggingface/transformers/pull/16414
    • @garfieldnate made their first contribution in https://github.com/huggingface/transformers/pull/15293
    • @basicv8vc made their first contribution in https://github.com/huggingface/transformers/pull/15932
    • @kurianbenoy made their first contribution in https://github.com/huggingface/transformers/pull/16113
    • @jaesuny made their first contribution in https://github.com/huggingface/transformers/pull/16405
    • @FernandoLpz made their first contribution in https://github.com/huggingface/transformers/pull/16149
    • @arnaudstiegler made their first contribution in https://github.com/huggingface/transformers/pull/16398
    • @wesleyacheng made their first contribution in https://github.com/huggingface/transformers/pull/16467
    • @akashe made their first contribution in https://github.com/huggingface/transformers/pull/16416
    • @sanderland made their first contribution in https://github.com/huggingface/transformers/pull/16451
    • @AdityaKane2001 made their first contribution in https://github.com/huggingface/transformers/pull/16481
    • @dctelus made their first contribution in https://github.com/huggingface/transformers/pull/16493
    • @tomerip made their first contribution in https://github.com/huggingface/transformers/pull/16492
    • @roywei made their first contribution in https://github.com/huggingface/transformers/pull/16371
    • @chenbohua3 made their first contribution in https://github.com/huggingface/transformers/pull/16490
    • @SimplyJuanjo made their first contribution in https://github.com/huggingface/transformers/pull/16329
    • @lilianabs made their first contribution in https://github.com/huggingface/transformers/pull/16229
    • @Sangohe made their first contribution in https://github.com/huggingface/transformers/pull/16176
    • @Agoniii made their first contribution in https://github.com/huggingface/transformers/pull/16531
    • @akuma12 made their first contribution in https://github.com/huggingface/transformers/pull/16498
    • @fschlatt made their first contribution in https://github.com/huggingface/transformers/pull/16536
    • @KMFODA made their first contribution in https://github.com/huggingface/transformers/pull/16521
    • @andrescodas made their first contribution in https://github.com/huggingface/transformers/pull/16530
    • @JohnGiorgi made their first contribution in https://github.com/huggingface/transformers/pull/16485
    • @JunMa11 made their first contribution in https://github.com/huggingface/transformers/pull/16612

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.17.0...v4.18.0

  • v4.17.0(Mar 3, 2022)

    New models

    XGLM

    The XGLM model was proposed in Few-shot Learning with Multilingual Language Models by Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona Diab, Veselin Stoyanov, Xian Li.

    XGLM is a GPT3-like multilingual model trained on a balanced corpus covering a diverse set of languages.

    • Add XGLM models by @patil-suraj in https://github.com/huggingface/transformers/pull/14876

    ConvNeXT

    The ConvNeXT model was proposed in A ConvNet for the 2020s by Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie.

    ConvNeXT is a pure convolutional model (ConvNet), inspired by the design of Vision Transformers, that claims to outperform them.

    • Add ConvNeXT by @NielsRogge in https://github.com/huggingface/transformers/pull/15277
    • Add TFConvNextModel by @sayakpaul in https://github.com/huggingface/transformers/pull/15750

    PoolFormer

    The PoolFormer model was proposed in MetaFormer is Actually What You Need for Vision by Sea AI Labs.

    • Add PoolFormer by @heytanay in https://github.com/huggingface/transformers/pull/15531

    PLBart

    The PLBART model was proposed in Unified Pre-training for Program Understanding and Generation by Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, Kai-Wei Chang.

    This is a BART-like model that can be used for code summarization, code generation, and code translation tasks. The pre-trained model plbart-base was trained with a multilingual denoising task on Java, Python and English.

    • Add PLBart by @gchhablani in https://github.com/huggingface/transformers/pull/13269
    • Add missing PLBart entry in README by @gchhablani in https://github.com/huggingface/transformers/pull/15721

    Data2Vec

    The Data2Vec model was proposed in data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language by Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu and Michael Auli.

    Data2Vec proposes a unified framework for self-supervised learning across different data modalities - text, audio and images. Importantly, predicted targets for pre-training are contextualized latent representations of the inputs, rather than modality-specific, context-independent targets.

    • Add Data2Vec by @edugp in https://github.com/huggingface/transformers/pull/15507

    MaskFormer

    The MaskFormer model was proposed in Per-Pixel Classification is Not All You Need for Semantic Segmentation by Bowen Cheng, Alexander G. Schwing, Alexander Kirillov.

    MaskFormer addresses semantic segmentation with a mask classification paradigm instead of performing classic pixel-level classification.

    • Maskformer by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15682

    Code in the Hub

    This is a new experimental feature added to the library. It allows you to share a custom model (with configuration, tokenizer, feature extractor, processor) with anyone through the Model Hub while still using the Auto-classes API of the Transformers library.

    See the documentation for more information!

    • Allow relative imports in dynamic code by @sgugger in https://github.com/huggingface/transformers/pull/15352
    • Save code of registered custom models by @sgugger in https://github.com/huggingface/transformers/pull/15379

    Documentation

    We are working on updating the existing guides in the documentation, and writing more!

    • Update model share tutorial by @stevhliu in https://github.com/huggingface/transformers/pull/15288
    • Get started docs by @stevhliu in https://github.com/huggingface/transformers/pull/15098
    • Update fine-tune docs by @stevhliu in https://github.com/huggingface/transformers/pull/15259
    • Update tutorial docs by @stevhliu in https://github.com/huggingface/transformers/pull/15165
    • Create a custom model guide by @stevhliu in https://github.com/huggingface/transformers/pull/15489
    • 🧼 NLP task guides by @stevhliu in https://github.com/huggingface/transformers/pull/15731
    • Inference for multilingual models by @stevhliu in https://github.com/huggingface/transformers/pull/15836

    Time Stamps for Speech models

    Speech models that have been trained with the CTC loss (Wav2Vec2, XLS-R, HuBERT, WavLM, ...) can now output time stamps in addition to the transcription of the input audio. For example, one can retrieve the start and end time of every transcribed word via the Wav2Vec2CTCTokenizer.decode method or the Wav2Vec2ProcessorWithLM.decode method. See the documentation here and here respectively.

    This feature can also be directly used via the ASR pipeline - see here and this example.
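
    As a rough illustration of how these time stamps are derived: when decoding with output_word_offsets=True, each word comes back with a start and end offset counted in CTC output frames, and multiplying by the duration of one frame gives seconds. The sketch below is self-contained and does not call the library. The helper name offsets_to_seconds and the sample offsets are hypothetical, and the defaults (320 input samples per output frame, 16 kHz audio) are assumptions that match common Wav2Vec2 checkpoints; in practice these values would be read from model.config.inputs_to_logits_ratio and feature_extractor.sampling_rate.

```python
from typing import Dict, List, Union

WordOffset = Dict[str, Union[str, int]]


def offsets_to_seconds(
    word_offsets: List[WordOffset],
    inputs_to_logits_ratio: int = 320,  # assumed: input samples per CTC output frame
    sampling_rate: int = 16_000,        # assumed: audio sampling rate in Hz
) -> List[Dict[str, Union[str, float]]]:
    """Convert frame-based word offsets into start/end times in seconds."""
    seconds_per_frame = inputs_to_logits_ratio / sampling_rate
    return [
        {
            "word": item["word"],
            "start_time": round(item["start_offset"] * seconds_per_frame, 2),
            "end_time": round(item["end_offset"] * seconds_per_frame, 2),
        }
        for item in word_offsets
    ]


# Hypothetical offsets in the shape of the word-level offsets returned by decoding.
example_offsets = [
    {"word": "HELLO", "start_offset": 10, "end_offset": 25},
    {"word": "WORLD", "start_offset": 30, "end_offset": 48},
]
timestamps = offsets_to_seconds(example_offsets)
for entry in timestamps:
    print(entry)
```

    With these assumed defaults each output frame covers 0.02 seconds of audio, so a checkpoint with a different stride or sampling rate would need different values.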

    • Add time stamps for wav2vec2 with lm by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15854
    • Adding timestamps for CTC with LM in ASR pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15863
    • Adding the option to return_timestamps on pure CTC ASR models. by @Narsil in https://github.com/huggingface/transformers/pull/15792
    • Time stamps for CTC models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15687

    Breaking change

    Unfortunately, some bugs had crept into CLIPTokenizerFast: the tokenizations produced by CLIPTokenizer and CLIPTokenizerFast were not identical. CLIPTokenizerFast has been corrected to encode text with the same strategy as CLIPTokenizer.

    What does this mean for you? You need to use the tokenizer that was used to train the CLIP model you are using. For example:

    • Case 1: you use openai/clip-vit-base-patch32, openai/clip-vit-base-patch16 or openai/clip-vit-large-patch14. Before v4.17.0, the correct tokenizer was CLIPTokenizer. From v4.17.0, you can use either CLIPTokenizer or CLIPTokenizerFast.
    • Case 2: you have trained your own CLIP model using CLIPTokenizerFast. Your tokenizer is no longer a CLIPTokenizerFast, and we recommend either loading your tokenizer.json directly in a PreTrainedTokenizerFast or continuing to use a version prior to v4.17.0.
    • Case 3: you have trained your own CLIP model using CLIPTokenizer. You can now produce a fast equivalent of your tokenizer with CLIPTokenizerFast.from_pretrained("Path to local folder or Hub repo with slow tokenizer files", from_slow=True).

    To make CLIPTokenizerFast identical to CLIPTokenizer, the template of the tokenization of a sentence pair (A,B) has been modified. The previous template was <|startoftext|> A B <|endoftext|> and the new one is <|startoftext|> A <|endoftext|> <|endoftext|> B <|endoftext|>.
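
    The difference between the two sentence-pair templates can be sketched in plain Python. This is only an illustration of the token layout described above, not the tokenizer's actual implementation: the function names are hypothetical, and the inputs are assumed to be already split into tokens.

```python
from typing import List

BOS = "<|startoftext|>"
EOS = "<|endoftext|>"


def pair_template_before_v4_17(a: List[str], b: List[str]) -> List[str]:
    """Previous layout: <|startoftext|> A B <|endoftext|>."""
    return [BOS, *a, *b, EOS]


def pair_template_from_v4_17(a: List[str], b: List[str]) -> List[str]:
    """New layout: <|startoftext|> A <|endoftext|> <|endoftext|> B <|endoftext|>."""
    return [BOS, *a, EOS, EOS, *b, EOS]


# Hypothetical, already-tokenized sentence pair.
sentence_a = ["a", "photo"]
sentence_b = ["of", "a", "cat"]

old_layout = pair_template_before_v4_17(sentence_a, sentence_b)
new_layout = pair_template_from_v4_17(sentence_a, sentence_b)

print(" ".join(old_layout))
print(" ".join(new_layout))
```

    The new layout is two tokens longer for any pair, which is worth keeping in mind when inputs are close to the model's maximum sequence length.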

    What's Changed

    • Fix tests_fetcher by @sgugger in https://github.com/huggingface/transformers/pull/15376
    • Fix code format for Accelerate doc by @stevhliu in https://github.com/huggingface/transformers/pull/15335
    • Add init to BORT by @LysandreJik in https://github.com/huggingface/transformers/pull/15378
    • Set syncfree AdamW as the default optimizer for xla:gpu device in amp mode by @ymwangg in https://github.com/huggingface/transformers/pull/15361
    • Fixing support batch_size and num_return_Sequences in text-generation pipeline by @Narsil in https://github.com/huggingface/transformers/pull/15318
    • Fix bad_words_ids not working with sentencepiece-based tokenizers by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15343
    • [docs] fix wrong file name in pr_check by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15380
    • Prepare deprecated ONNX exporter for torch v1.11 by @lewtun in https://github.com/huggingface/transformers/pull/15388
    • [Fix doc example] FlaxMarianPreTrainedModel by @ydshieh in https://github.com/huggingface/transformers/pull/15391
    • Make links explicit by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15395
    • [deepspeed] saving checkpoint fallback when fp16 weights aren't saved by @stas00 in https://github.com/huggingface/transformers/pull/14948
    • Fix missing eps arg for LayerNorm in ElectraGeneratorPredictions by @ydshieh in https://github.com/huggingface/transformers/pull/15332
    • Use argument for preprocessing workers in run_summarization by @sgugger in https://github.com/huggingface/transformers/pull/15394
    • Add support for XLM-R XL and XXL models by modeling_xlm_roberta_xl.py by @Soonhwan-Kwon in https://github.com/huggingface/transformers/pull/13727
    • Fix the inconsistency of loss calculation between PT/TF XLNetLMHeadModel by @ydshieh in https://github.com/huggingface/transformers/pull/15298
    • [XGLMTokenizer] fix init and add in AutoTokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15406
    • Add SegformerFeatureExtractor to Auto API by @NielsRogge in https://github.com/huggingface/transformers/pull/15410
    • Fix additional DataTrainingArguments documentation by @FremyCompany in https://github.com/huggingface/transformers/pull/15408
    • Add (M)Luke model training for Token Classification in the examples by @jplu in https://github.com/huggingface/transformers/pull/14880
    • Update README.md by @kamalkraj in https://github.com/huggingface/transformers/pull/15430
    • [Robust Speech Challenge] Add missing LR parameter by @jonatasgrosman in https://github.com/huggingface/transformers/pull/15428
    • [XGLM] fix gradient checkpointing by @patil-suraj in https://github.com/huggingface/transformers/pull/15427
    • [Hotfix] Fix Swin model outputs by @NielsRogge in https://github.com/huggingface/transformers/pull/15414
    • add t5 ner finetuning by @ToluClassics in https://github.com/huggingface/transformers/pull/15432
    • Add doc for add-new-model-like command by @sgugger in https://github.com/huggingface/transformers/pull/15433
    • [Swin] Add missing header by @NielsRogge in https://github.com/huggingface/transformers/pull/15434
    • [deepspeed doc] fix import, extra notes by @stas00 in https://github.com/huggingface/transformers/pull/15400
    • Fix loss calculation in TFXXXForTokenClassification models by @ydshieh in https://github.com/huggingface/transformers/pull/15294
    • Fix spurious warning in TF TokenClassification models by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15435
    • Change REALM checkpoint to new ones by @sgugger in https://github.com/huggingface/transformers/pull/15439
    • [Trainer] suppress warning for length-related columns by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15421
    • [examples/Flax] add a section about GPUs by @patil-suraj in https://github.com/huggingface/transformers/pull/15198
    • Fix TFLEDModel by @ydshieh in https://github.com/huggingface/transformers/pull/15356
    • [XGLMTokenizer] correct positional emb size by @patil-suraj in https://github.com/huggingface/transformers/pull/15441
    • [RobertaTokenizer] remove inheritance on GPT2Tokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15429
    • Misfiring tf warnings by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15442
    • Add 'with torch.no_grad()' to BEiT integration test forward passes by @itsTurner in https://github.com/huggingface/transformers/pull/14961
    • Update modeling_wav2vec2.py by @peregilk in https://github.com/huggingface/transformers/pull/15423
    • Error when group_by_length is used with an IterableDataset by @sgugger in https://github.com/huggingface/transformers/pull/15437
    • skip large generations pipeline test for XGLM by @patil-suraj in https://github.com/huggingface/transformers/pull/15445
    • [generate] fix synced_gpus default by @stas00 in https://github.com/huggingface/transformers/pull/15446
    • Remove "inputs" in tf common test script (no longer required) by @ydshieh in https://github.com/huggingface/transformers/pull/15262
    • Fix TF Causal LM models' returned logits by @ydshieh in https://github.com/huggingface/transformers/pull/15256
    • fix from_vision_text_pretrained doc example by @ydshieh in https://github.com/huggingface/transformers/pull/15453
    • [M2M100, XGLM] fix positional emb resize by @patil-suraj in https://github.com/huggingface/transformers/pull/15444
    • Update README.md by @kamalkraj in https://github.com/huggingface/transformers/pull/15462
    • replace assert with exception for padding_side arg in PreTrainedTokenizerBase __init__ by @SaulLu in https://github.com/huggingface/transformers/pull/15454
    • fix the tokenizer_config.json file for the slow tokenizer when a fast version is available by @SaulLu in https://github.com/huggingface/transformers/pull/15319
    • use mean instead of elementwise_mean in XLMPredLayer by @ydshieh in https://github.com/huggingface/transformers/pull/15436
    • [BartTokenizer] remove inheritance on RobertaTokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15461
    • Trainer.push_to_hub always tries to push to the Hub by @sgugger in https://github.com/huggingface/transformers/pull/15463
    • Harder check for IndexErrors in QA scripts by @sgugger in https://github.com/huggingface/transformers/pull/15438
    • Add option to resize like torchvision's Resize by @NielsRogge in https://github.com/huggingface/transformers/pull/15419
    • [Wav2Vec2ProcessorWithLM] add alpha & beta to batch decode & decode by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15465
    • Adding support for microphone streaming within pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15046
    • fix error posted in issue #15448 by @bugface in https://github.com/huggingface/transformers/pull/15480
    • Fix docstring of ASR pipeline by @sgugger in https://github.com/huggingface/transformers/pull/15481
    • Add W&B backend for hyperparameter sweep by @AyushExel in https://github.com/huggingface/transformers/pull/14582
    • Fix labels stored in model config for token classification examples by @sgugger in https://github.com/huggingface/transformers/pull/15482
    • fix set truncation attribute in __init__ of PreTrainedTokenizerBase by @SaulLu in https://github.com/huggingface/transformers/pull/15456
    • Correct eos_token_id settings in generate by @thinksoso in https://github.com/huggingface/transformers/pull/15403
    • fix TFMarianMTModel output by @ydshieh in https://github.com/huggingface/transformers/pull/15494
    • Cleanup load_weight_prefix in TFEncoderDecoderModel by @ydshieh in https://github.com/huggingface/transformers/pull/15101
    • [Flax tests] Disable scheduled GPU tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15503
    • Add general vision docstrings by @NielsRogge in https://github.com/huggingface/transformers/pull/15501
    • [deepspeed] fix a bug in a test by @stas00 in https://github.com/huggingface/transformers/pull/15493
    • Add preprocess_logits_for_metrics Trainer param by @davidleonfdez in https://github.com/huggingface/transformers/pull/15473
    • [deepspeed docs] memory requirements by @stas00 in https://github.com/huggingface/transformers/pull/15506
    • Remove loss from some flax models docs & examples by @ydshieh in https://github.com/huggingface/transformers/pull/15492
    • Fix TFElectraForMultipleChoice by @ydshieh in https://github.com/huggingface/transformers/pull/15509
    • Handle PyTorch to Flax conversion of 1D convolutions by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15519
    • Fix TFRemBertEncoder all_hidden_states by @ydshieh in https://github.com/huggingface/transformers/pull/15510
    • [parallelism docs] Megatron-Deepspeed info by @stas00 in https://github.com/huggingface/transformers/pull/15488
    • Standardize semantic segmentation models outputs by @sgugger in https://github.com/huggingface/transformers/pull/15469
    • [deepspeed docs] DeepSpeed ZeRO Inference by @stas00 in https://github.com/huggingface/transformers/pull/15486
    • Revert "Handle PyTorch to Flax conversion of 1D convolutions" by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15540
    • [ASR pipeline] correct asr pipeline for seq2seq models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15541
    • [torch_int_div] Correct true division in generation by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15498
    • [Trainer] Deeper length checks for IterableDatasetShard by @anton-l in https://github.com/huggingface/transformers/pull/15539
    • Add ASR CTC streaming example by @anton-l in https://github.com/huggingface/transformers/pull/15309
    • Wav2Vec2 models must either throw or deal with add_apater by @FremyCompany in https://github.com/huggingface/transformers/pull/15409
    • Remove Longformers from ONNX-supported models by @lewtun in https://github.com/huggingface/transformers/pull/15273
    • Fix TF T5/LED missing cross attn in retrun values by @ydshieh in https://github.com/huggingface/transformers/pull/15511
    • Make TF Wav2Vec2 outputs the same as PT's version by @ydshieh in https://github.com/huggingface/transformers/pull/15530
    • FX tracing improvement by @michaelbenayoun in https://github.com/huggingface/transformers/pull/14321
    • electra is added to onnx supported model by @arron1227 in https://github.com/huggingface/transformers/pull/15084
    • [GPTJ] fix docs by @patil-suraj in https://github.com/huggingface/transformers/pull/15558
    • Force use_cache to be False in PyTorch by @ydshieh in https://github.com/huggingface/transformers/pull/15385
    • Add TFSpeech2Text by @gante in https://github.com/huggingface/transformers/pull/15113
    • feat(flax): allow encoder_outputs in generate by @borisdayma in https://github.com/huggingface/transformers/pull/15554
    • Add codecarbon callback to docs by @nateraw in https://github.com/huggingface/transformers/pull/15563
    • [Flax tests] fix test_model_outputs_equivalence by @patil-suraj in https://github.com/huggingface/transformers/pull/15571
    • logger.warn --> logger.warning by @ydshieh in https://github.com/huggingface/transformers/pull/15572
    • PoC for a ProcessorMixin class by @sgugger in https://github.com/huggingface/transformers/pull/15549
    • add model scaling section by @lvwerra in https://github.com/huggingface/transformers/pull/15119
    • Upgrade black to version ~=22.0 by @LysandreJik in https://github.com/huggingface/transformers/pull/15565
    • Make sure custom configs work with Transformers by @sgugger in https://github.com/huggingface/transformers/pull/15569
    • Add Wav2Vec2 Adapter Weights to Flax by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15566
    • Click new version by @LysandreJik in https://github.com/huggingface/transformers/pull/15579
    • [Flax tests/FlaxBert] make from_pretrained test faster by @patil-suraj in https://github.com/huggingface/transformers/pull/15561
    • Add implementation of typical sampling by @cimeister in https://github.com/huggingface/transformers/pull/15504
    • Constrained Beam Search [without disjunctive decoding] by @cwkeam in https://github.com/huggingface/transformers/pull/15416
    • Fix tests hub failure by @sgugger in https://github.com/huggingface/transformers/pull/15580
    • update serving_output for some TF models by @ydshieh in https://github.com/huggingface/transformers/pull/15568
    • [trainer docs] document how to select specific gpus by @stas00 in https://github.com/huggingface/transformers/pull/15551
    • [ViTMAE] Add link to script by @NielsRogge in https://github.com/huggingface/transformers/pull/15588
    • Expand tutorial for custom models by @sgugger in https://github.com/huggingface/transformers/pull/15587
    • Add Tensorflow handling of ONNX conversion by @Albertobegue in https://github.com/huggingface/transformers/pull/13831
    • Add example batch size to all commands by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15596
    • Compute loss independent from decoder for TF EncDec models (as #14139) by @ydshieh in https://github.com/huggingface/transformers/pull/15175
    • Fix Seq2SeqTrainer for VisionEncoderDecoderModel by @NielsRogge in https://github.com/huggingface/transformers/pull/15603
    • Add local and TensorFlow ONNX export examples to docs by @lewtun in https://github.com/huggingface/transformers/pull/15604
    • [deepspeed docs] Correct JSON format by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15600
    • Small clean up generate by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15611
    • Mark "code in the Hub" API as experimental by @sgugger in https://github.com/huggingface/transformers/pull/15624
    • Enable ONNX export when PyTorch and TensorFlow installed in the same env by @lewtun in https://github.com/huggingface/transformers/pull/15625
    • TF: Add informative warning for inexistent CPU backprop ops by @gante in https://github.com/huggingface/transformers/pull/15612
    • Add aws studio notebooks by @mishig25 in https://github.com/huggingface/transformers/pull/15606
    • TF MT5 embeddings resize by @gante in https://github.com/huggingface/transformers/pull/15567
    • Fix broken link in CTRL docs by @stevhliu in https://github.com/huggingface/transformers/pull/15615
    • Fix _configuration_file argument getting passed to model by @sgugger in https://github.com/huggingface/transformers/pull/15629
    • [deepspeed docs] misc additions by @stas00 in https://github.com/huggingface/transformers/pull/15585
    • [research_projects] deal with security alerts by @stas00 in https://github.com/huggingface/transformers/pull/15594
    • Custom feature extractor by @sgugger in https://github.com/huggingface/transformers/pull/15630
    • Fix grammar in tokenizer_summary docs by @derenrich in https://github.com/huggingface/transformers/pull/15614
    • Add push to hub to feature extractor by @sgugger in https://github.com/huggingface/transformers/pull/15632
    • [Fix doc example] FlaxVisionEncoderDecoder by @ydshieh in https://github.com/huggingface/transformers/pull/15626
    • Fix a bug that QuestionAnsweringPipeline ignores max_seq_len parameter by @wptoux in https://github.com/huggingface/transformers/pull/15238
    • Report only the failed imports in requires_backends by @tkukurin in https://github.com/huggingface/transformers/pull/15636
    • Make Swin work with VisionEncoderDecoderModel by @NielsRogge in https://github.com/huggingface/transformers/pull/15527
    • Remove redundant error logging in from_pretrained() method by @lewtun in https://github.com/huggingface/transformers/pull/15631
    • Register feature extractor by @sgugger in https://github.com/huggingface/transformers/pull/15634
    • fix bug for the log of RNG states are not properly loaded lead to exception. by @muzhi1991 in https://github.com/huggingface/transformers/pull/15638
    • [SpeechEncoderDecoder] Make sure no EOS is generated in test by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15655
    • Require tokenizers>=0.11.1 by @aphedges in https://github.com/huggingface/transformers/pull/15266
    • Fix ASR pipelines from local directories with wav2vec models that have language models attached by @versae in https://github.com/huggingface/transformers/pull/15590
    • Fix typo in speech2text2 doc by @jonrbates in https://github.com/huggingface/transformers/pull/15617
    • Allow custom code for Processors by @sgugger in https://github.com/huggingface/transformers/pull/15649
    • add scores to Wav2Vec2WithLMOutput by @arampacha in https://github.com/huggingface/transformers/pull/15413
    • Update bad_words_ids usage by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15641
    • Updated the RAG training with latest Pytorch Lightning library and the RAY by @shamanez in https://github.com/huggingface/transformers/pull/15653
    • Add section about doc testing by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15659
    • add a network debug script and document it by @stas00 in https://github.com/huggingface/transformers/pull/15652
    • Re-export KeyDataset. by @Narsil in https://github.com/huggingface/transformers/pull/15645
    • Add decoder_kwargs to send to LM on asr pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15646
    • TF generate refactor - Greedy Search by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15562
    • [pipeline doc] fix api by @stas00 in https://github.com/huggingface/transformers/pull/15660
    • Fix TFSequenceSummary's activation by @ydshieh in https://github.com/huggingface/transformers/pull/15643
    • Fix model equivalence tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15670
    • Fix vit test by @LysandreJik in https://github.com/huggingface/transformers/pull/15671
    • Add a missing space in a deprecation message by @bryant1410 in https://github.com/huggingface/transformers/pull/15651
    • [t5/t0/mt5 models] faster/leaner custom layer norm by @stas00 in https://github.com/huggingface/transformers/pull/14656
    • Add push_to_hub method to processors by @sgugger in https://github.com/huggingface/transformers/pull/15668
    • Usage examples for logger by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15657
    • Fix dec_attn_mask in TFTransfoXLMainLayer by @ydshieh in https://github.com/huggingface/transformers/pull/15665
    • 🔥 Remove build_doc_test github action by @coyotte508 in https://github.com/huggingface/transformers/pull/15680
    • Add register method to AutoProcessor by @sgugger in https://github.com/huggingface/transformers/pull/15669
    • [Wav2Vec2ProcessorWithLM] Fix auto processor with lm by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15683
    • Fix Funnel configuration doc by @ydshieh in https://github.com/huggingface/transformers/pull/15686
    • Implementation of activations as pytorch modules by @eldarkurtic in https://github.com/huggingface/transformers/pull/15616
    • Add image classification notebook by @NielsRogge in https://github.com/huggingface/transformers/pull/15667
    • Minor fix on README.md by @ydshieh in https://github.com/huggingface/transformers/pull/15688
    • Fix shape by @gchhablani in https://github.com/huggingface/transformers/pull/15696
    • Add SimMIM by @NielsRogge in https://github.com/huggingface/transformers/pull/15586
    • Adding a model, more doc for pushing to the hub by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15690
    • fix CLIP fast tokenizer and change some properties of the slow version by @SaulLu in https://github.com/huggingface/transformers/pull/15067
    • Fix SiluActivation by @sgugger in https://github.com/huggingface/transformers/pull/15718
    • Add initializer_std to TFFunnelModelTester with a default value 0.02 by @ydshieh in https://github.com/huggingface/transformers/pull/15684
    • Fix DETR model deprecation warnings for int div by @gautierdag in https://github.com/huggingface/transformers/pull/15702
    • Fix LongformerModel hidden states by @ydshieh in https://github.com/huggingface/transformers/pull/15537
    • style_doc handles decorators in examples by @sgugger in https://github.com/huggingface/transformers/pull/15719
    • Fix auto model tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15706
    • Fix HfDeepSpeedConfig argument in Trainer by @jaketae in https://github.com/huggingface/transformers/pull/15711
    • fix bug in PT speech-encoder-decoder by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15699
    • Fix undoing preprocessing step in summarization example by @SSardorf in https://github.com/huggingface/transformers/pull/15741
    • Fix minor comment typos by @Crabzmatic in https://github.com/huggingface/transformers/pull/15740
    • add VisionTextDualEncoder and CLIP fine-tuning script by @patil-suraj in https://github.com/huggingface/transformers/pull/15701
    • Add layer_idx to CrossAttention of GPT2 model by @hyunwoongko in https://github.com/huggingface/transformers/pull/15730
    • TF text classification examples by @gante in https://github.com/huggingface/transformers/pull/15704
    • revert temporary addition to test next version of CLIPTokenizerFast by @SaulLu in https://github.com/huggingface/transformers/pull/15717
    • added link to our writing-doc document by @FrancescoSaverioZuppichini in https://github.com/huggingface/transformers/pull/15756
    • TF train_step docstring by @gante in https://github.com/huggingface/transformers/pull/15755
    • Gelu10 by @mfuntowicz in https://github.com/huggingface/transformers/pull/15676
    • fixed pipeline code by @Moumeneb1 in https://github.com/huggingface/transformers/pull/15607
    • Fix typo on examples/pytorch/question-answering by @dreamgonfly in https://github.com/huggingface/transformers/pull/15644
    • Cleanup transformers-cli by @julien-c in https://github.com/huggingface/transformers/pull/15767
    • Fix HfArgumentParser when passing a generator by @bryant1410 in https://github.com/huggingface/transformers/pull/15758
    • Adding ZeroShotImageClassificationPipeline by @Narsil in https://github.com/huggingface/transformers/pull/12119
    • [M2M100, XGLM] fix create_position_ids_from_inputs_embeds by @patil-suraj in https://github.com/huggingface/transformers/pull/15751
    • Supporting Merges.txt files than contain an endline. (hf-internal-testing/tiny-clip for instance) by @Narsil in https://github.com/huggingface/transformers/pull/15782
    • [CLIP] fix gradient checkpointing by @patil-suraj in https://github.com/huggingface/transformers/pull/15789
    • [ViLT] Fix checkpoint url in config by @patil-suraj in https://github.com/huggingface/transformers/pull/15790
    • Enable image-segmentation on AutoModelForSemanticSegmentation by @Narsil in https://github.com/huggingface/transformers/pull/15647
    • [doc] custom_models: mention security features of the Hub by @julien-c in https://github.com/huggingface/transformers/pull/15768
    • [Wav2Vec2FeatureExtractor] Align documentation with code by @lsb in https://github.com/huggingface/transformers/pull/15468
    • HTML dev docs by @coyotte508 in https://github.com/huggingface/transformers/pull/15678
    • Fix indent in doc-builder CI by @coyotte508 in https://github.com/huggingface/transformers/pull/15798
    • [Test refactor 1/5] Per-folder tests reorganization by @LysandreJik in https://github.com/huggingface/transformers/pull/15725
    • [Test refactor 2/5] Tests fetcher by @LysandreJik in https://github.com/huggingface/transformers/pull/15726
    • [Test refactor 3/5] Notification service improvement by @LysandreJik in https://github.com/huggingface/transformers/pull/15727
    • [Test refactor 4/5] Improve the scheduled tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15728
    • [Test refactor 5/5] Build docker images by @LysandreJik in https://github.com/huggingface/transformers/pull/15729
    • Fix build_documentation CI by @coyotte508 in https://github.com/huggingface/transformers/pull/15803
    • Fix model templates by @LysandreJik in https://github.com/huggingface/transformers/pull/15806
    • Fix add-new-model-like when old model checkpoint is not found by @sgugger in https://github.com/huggingface/transformers/pull/15805
    • Fix from_pretrained with default base_model_prefix by @sgugger in https://github.com/huggingface/transformers/pull/15814
    • Revert changes in logit size for semantic segmentation models by @sgugger in https://github.com/huggingface/transformers/pull/15722
    • [Unispeech] Fix slow tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15818
    • [Barthez Tokenizer] Fix saving by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15815
    • [TFXLNet] Correct tf xlnet generate by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15822
    • Fixes the "push" CI run by @LysandreJik in https://github.com/huggingface/transformers/pull/15807
    • Fix semantic segmentation pipeline test by @sgugger in https://github.com/huggingface/transformers/pull/15826
    • Fix dummy_inputs() to dummy_inputs in symbolic_trace doc string by @pbelevich in https://github.com/huggingface/transformers/pull/15776
    • Add model specific output classes to PoolFormer model docs by @heytanay in https://github.com/huggingface/transformers/pull/15746
    • HFTracer.trace should use self.graph to be compatible with torch.fx.Tracer by @pbelevich in https://github.com/huggingface/transformers/pull/15824
    • Fix tf.concatenate + test past_key_values for TF models by @ydshieh in https://github.com/huggingface/transformers/pull/15774
    • [examples/summarization and translation] fix readme by @patil-suraj in https://github.com/huggingface/transformers/pull/15833
    • Add ONNX Runtime quantization for text classification notebook by @echarlaix in https://github.com/huggingface/transformers/pull/15817
    • Re-enable doctests for the quicktour by @sgugger in https://github.com/huggingface/transformers/pull/15828
    • Framework split model report by @LysandreJik in https://github.com/huggingface/transformers/pull/15825
    • [UniSpeechSat] Revert previous incorrect change of slow tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15847
    • Flax Speech-Encoder-Decoder Model by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15613
    • Fix (deprecated) ONNX exporter to account for new tf2onnx API by @lewtun in https://github.com/huggingface/transformers/pull/15856
    • Fixing the timestamps with chunking. by @Narsil in https://github.com/huggingface/transformers/pull/15843
    • [TF-PT-Tests] Fix PyTorch - TF tests for different GPU devices by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15846
    • [Benchmark tools] Deprecate all by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15848
    • Add PT + TF automatic builds by @LysandreJik in https://github.com/huggingface/transformers/pull/15860
    • Update TF LM examples by @gante in https://github.com/huggingface/transformers/pull/15855
    • [ViLT] Add link to notebooks by @NielsRogge in https://github.com/huggingface/transformers/pull/15791
    • Scatter should run on CUDA by @LysandreJik in https://github.com/huggingface/transformers/pull/15872
    • [vision] Add problem_type support by @NielsRogge in https://github.com/huggingface/transformers/pull/15851
    • use python 3.7 for flax self-push tests by @patil-suraj in https://github.com/huggingface/transformers/pull/15865
    • Bump up doc node version to 16 by @mishig25 in https://github.com/huggingface/transformers/pull/15874
    • No self-hosted by @LysandreJik in https://github.com/huggingface/transformers/pull/15710
    • fix deepspeed tests by @stas00 in https://github.com/huggingface/transformers/pull/15881
    • Remove stash for now by @LysandreJik in https://github.com/huggingface/transformers/pull/15882
    • M2M100 support for ONNX export by @michaelbenayoun in https://github.com/huggingface/transformers/pull/15193
    • [Bart] Fix implementation note doc by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15879
    • Add TF generate sample tests with all logit processors by @gante in https://github.com/huggingface/transformers/pull/15852
    • TF: Update QA example by @gante in https://github.com/huggingface/transformers/pull/15870
    • Updates in Trainer to support new features in SM Model Parallel library by @rahul003 in https://github.com/huggingface/transformers/pull/15877
    • Fix tiny typo in docs by @rhjohnstone in https://github.com/huggingface/transformers/pull/15884
    • Fix Bug in FlaxWav2Vec2 Slow Test by @sanchit-gandhi in https://github.com/huggingface/transformers/pull/15887
    • [SegFormer] Add deprecation warning by @NielsRogge in https://github.com/huggingface/transformers/pull/15889
    • TF generate refactor - Sample by @gante in https://github.com/huggingface/transformers/pull/15793
    • [XGLM] run sampling test on CPU to be deterministic by @patil-suraj in https://github.com/huggingface/transformers/pull/15892
    • Fix SegformerForImageClassification by @NielsRogge in https://github.com/huggingface/transformers/pull/15895
    • Update delete-dev-doc job to match build-dev-doc by @sgugger in https://github.com/huggingface/transformers/pull/15891

    Impressive community contributors

    The community contributors below have significantly contributed to the v4.17.0 release. Thank you!

    • @sayakpaul, for contributing the TensorFlow version of ConvNext
    • @gchhablani, for contributing PLBart
    • @edugp, for contributing Data2Vec

    New Contributors

    • @Soonhwan-Kwon made their first contribution in https://github.com/huggingface/transformers/pull/13727
    • @jonatasgrosman made their first contribution in https://github.com/huggingface/transformers/pull/15428
    • @ToluClassics made their first contribution in https://github.com/huggingface/transformers/pull/15432
    • @peregilk made their first contribution in https://github.com/huggingface/transformers/pull/15423
    • @bugface made their first contribution in https://github.com/huggingface/transformers/pull/15480
    • @AyushExel made their first contribution in https://github.com/huggingface/transformers/pull/14582
    • @thinksoso made their first contribution in https://github.com/huggingface/transformers/pull/15403
    • @davidleonfdez made their first contribution in https://github.com/huggingface/transformers/pull/15473
    • @sanchit-gandhi made their first contribution in https://github.com/huggingface/transformers/pull/15519
    • @arron1227 made their first contribution in https://github.com/huggingface/transformers/pull/15084
    • @cimeister made their first contribution in https://github.com/huggingface/transformers/pull/15504
    • @cwkeam made their first contribution in https://github.com/huggingface/transformers/pull/15416
    • @Albertobegue made their first contribution in https://github.com/huggingface/transformers/pull/13831
    • @derenrich made their first contribution in https://github.com/huggingface/transformers/pull/15614
    • @tkukurin made their first contribution in https://github.com/huggingface/transformers/pull/15636
    • @muzhi1991 made their first contribution in https://github.com/huggingface/transformers/pull/15638
    • @versae made their first contribution in https://github.com/huggingface/transformers/pull/15590
    • @jonrbates made their first contribution in https://github.com/huggingface/transformers/pull/15617
    • @arampacha made their first contribution in https://github.com/huggingface/transformers/pull/15413
    • @FrancescoSaverioZuppichini made their first contribution in https://github.com/huggingface/transformers/pull/15657
    • @coyotte508 made their first contribution in https://github.com/huggingface/transformers/pull/15680
    • @heytanay made their first contribution in https://github.com/huggingface/transformers/pull/15531
    • @gautierdag made their first contribution in https://github.com/huggingface/transformers/pull/15702
    • @SSardorf made their first contribution in https://github.com/huggingface/transformers/pull/15741
    • @Crabzmatic made their first contribution in https://github.com/huggingface/transformers/pull/15740
    • @dreamgonfly made their first contribution in https://github.com/huggingface/transformers/pull/15644
    • @lsb made their first contribution in https://github.com/huggingface/transformers/pull/15468
    • @pbelevich made their first contribution in https://github.com/huggingface/transformers/pull/15776
    • @sayakpaul made their first contribution in https://github.com/huggingface/transformers/pull/15750
    • @rahul003 made their first contribution in https://github.com/huggingface/transformers/pull/15877
    • @rhjohnstone made their first contribution in https://github.com/huggingface/transformers/pull/15884

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.16.0...v4.17.0

  • v4.16.2(Jan 31, 2022)

    • Add header (huggingface#15434)
    • [Hotfix] Fix Swin model outputs (huggingface#15414)

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.16.1...v4.16.2

  • v4.16.1(Jan 28, 2022)

  • v4.16.0(Jan 27, 2022)

    New models

    Nyströmformer

    The Nyströmformer model was proposed in Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention by Yunyang Xiong, Zhanpeng Zeng, Rudrasis Chakraborty, Mingxing Tan, Glenn Fung, Yin Li, and Vikas Singh.

    The Nyströmformer model overcomes the quadratic complexity of self-attention on the input sequence length by adapting the Nyström method to approximate standard self-attention, enabling longer sequences with thousands of tokens as input.

    • Add Nystromformer by @novice03 in https://github.com/huggingface/transformers/pull/14659

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=nystromformer
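
As a rough illustration of the idea described above, the NumPy sketch below approximates softmax attention through a small set of landmark rows. It is not the library's implementation: the function names, the segment-mean landmarks as written here, and the use of `np.linalg.pinv` are assumptions made for this example, and other details of the actual model are left out.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def exact_attention(q, k, v):
    """Standard softmax attention; builds the full (n x n) score matrix."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def nystrom_attention(q, k, v, num_landmarks):
    """Nystrom-style approximation of softmax attention (illustrative sketch)."""
    n, d = q.shape
    assert n % num_landmarks == 0, "sequence length must split evenly into segments"
    seg = n // num_landmarks
    # Landmarks: the mean of each contiguous segment of queries and keys.
    q_land = q.reshape(num_landmarks, seg, d).mean(axis=1)
    k_land = k.reshape(num_landmarks, seg, d).mean(axis=1)
    scale = 1.0 / np.sqrt(d)
    f = softmax(q @ k_land.T * scale)       # (n, m)
    a = softmax(q_land @ k_land.T * scale)  # (m, m)
    b = softmax(q_land @ k.T * scale)       # (m, n)
    # Only (n x m), (m x m) and (m x n) matrices are built, never (n x n).
    return f @ np.linalg.pinv(a) @ (b @ v)
```

With a fixed number of landmarks, the cost of this sketch grows linearly with the sequence length. When `num_landmarks` equals the sequence length, every segment holds a single token and the result matches exact attention up to floating-point error.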

    REALM

    The REALM model was proposed in REALM: Retrieval-Augmented Language Model Pre-Training by Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat and Ming-Wei Chang.

    It is a retrieval-augmented language model that first retrieves documents from a textual knowledge corpus and then uses the retrieved documents to answer questions.

    • Add REALM by @qqaatw in https://github.com/huggingface/transformers/pull/13292
    • Add FastTokenizer to REALM by @qqaatw in https://github.com/huggingface/transformers/pull/15211

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=realm
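
The retrieve-then-read flow described above can be sketched as a toy inner-product search over document embeddings. The function name, the hand-written vectors, and the scoring by plain dot product are assumptions for illustration only; this is not the REALM retriever from the library.

```python
import numpy as np


def retrieve_top_k(query_vec, doc_vecs, k):
    """Return (indices, scores) of the k documents with the largest inner product."""
    scores = doc_vecs @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy corpus: three documents embedded in a 2-dimensional space (made-up values).
doc_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query_vec = np.array([1.0, 0.0])

top_idx, top_scores = retrieve_top_k(query_vec, doc_vecs, k=2)
# The documents selected here would then be handed to a reader model
# together with the question.
```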

    ViTMAE

    The ViTMAE model was proposed in Masked Autoencoders Are Scalable Vision Learners by Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick.

    The paper shows that, by pre-training a Vision Transformer (ViT) to reconstruct pixel values for masked patches, one can get results after fine-tuning that outperform supervised pre-training.

    • Add MAE by @NielsRogge in https://github.com/huggingface/transformers/pull/15120

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vit_mae
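
To make the masked-patch setup above concrete, here is a minimal NumPy sketch that splits an image into patches and hides a random subset of them. The function names, the patch size, and the 0.75 masking ratio used in the usage lines are assumptions for this example, not the library's code.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping patches."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)  # (gh, gw, p, p, c)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def random_masking(patches, mask_ratio, rng):
    """Pick a random subset of patches to keep.

    Returns (kept_patches, mask, keep_idx), where mask[i] is True for
    patches that are hidden and would have to be reconstructed.
    """
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    keep_idx = np.sort(rng.permutation(num_patches)[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask, keep_idx


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patches = patchify(image, patch_size=8)            # 16 patches of 8 * 8 * 3 values
visible, mask, keep_idx = random_masking(patches, 0.75, rng)
targets = patches[mask]                            # pixel values to reconstruct
```

In this sketch only `visible` would be passed to the encoder, while `targets` holds the pixel values of the hidden patches that a decoder would be trained to predict.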

    ViLT

    The ViLT model was proposed in ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision by Wonjae Kim, Bokyung Son, Ildoo Kim.

    ViLT incorporates text embeddings into a Vision Transformer (ViT), allowing it to have a minimal design for Vision-and-Language Pre-training (VLP).

    • Add ViLT by @NielsRogge in https://github.com/huggingface/transformers/pull/14895

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=vilt

    Swin Transformer

    The Swin Transformer was proposed in Swin Transformer: Hierarchical Vision Transformer using Shifted Windows by Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.

    The Swin Transformer serves as a general-purpose backbone for computer vision. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection. This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size.

    • Add Swin Transformer by @novice03 in https://github.com/huggingface/transformers/pull/15085

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=swin
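
The windowing scheme described above can be illustrated with a short NumPy sketch. The function names are assumptions for this example, and the sketch leaves out the attention itself as well as the masking a real model needs for positions that wrap around after the shift.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns an array of shape (num_windows, window_size * window_size, C).
    """
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window_size * window_size, c)


def shifted_window_partition(x, window_size):
    """Cyclically shift the map by half a window, then partition it.

    Tokens that sat in different windows before the shift now share a
    window, which is what lets information cross window borders.
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


feature_map = np.arange(64, dtype=float).reshape(8, 8, 1)
regular = window_partition(feature_map, window_size=4)
shifted = shifted_window_partition(feature_map, window_size=4)
```

Because attention would be computed inside each fixed-size window, the number of windows, and with it the total cost, grows linearly with the number of tokens in the feature map.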

    YOSO

    The YOSO model was proposed in You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling by Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh.

    YOSO approximates standard softmax self-attention via a Bernoulli sampling scheme based on Locality Sensitive Hashing (LSH). In principle, all the Bernoulli random variables can be sampled with a single hash.

    • Add YOSO by @novice03 in https://github.com/huggingface/transformers/pull/15091

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=yoso
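
The sampling idea above can be sketched with sign-random-projection (hyperplane) hashing in NumPy. This is an illustrative sketch under stated assumptions, not the library's kernel: the function names are made up for this example, the hash family is the standard hyperplane LSH, and output normalization is left out.

```python
import numpy as np


def hyperplane_hash(x, planes):
    """Hash each row of x to an integer code with one sign bit per hyperplane."""
    bits = (x @ planes.T) > 0
    weights = 1 << np.arange(planes.shape[0])
    return bits.astype(np.int64) @ weights


def collision_probability(q, k, tau):
    """Chance that two vectors agree on all tau sign bits of a random hash."""
    cos = (q @ k) / (np.linalg.norm(q) * np.linalg.norm(k))
    return (1.0 - np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi) ** tau


def sampled_attention_output(q, k, v, planes):
    """One-sample estimate of B @ v, where B[i, j] = 1 if q_i and k_j collide.

    A single hash function yields every Bernoulli variable at once: values
    are summed per key bucket, then each query reads its own bucket, so the
    (n x n) matrix B is never materialized.
    """
    codes_q = hyperplane_hash(q, planes)
    codes_k = hyperplane_hash(k, planes)
    table = np.zeros((2 ** planes.shape[0], v.shape[1]))
    np.add.at(table, codes_k, v)  # unbuffered add handles repeated codes
    return table[codes_q]
```

Averaging the collision indicator over many independent hash draws approaches `collision_probability`, which is the quantity standing in for the softmax weight in this sketch.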

    Add model like

    To help contributors add new models more easily to Transformers, there is a new command that will clone an existing model and set the various hooks in the library, so that you only have to write the tweaks needed to the modeling file. Just run transformers-cli add-new-model-like and fill in the questionnaire!

    • Add model like by @sgugger in https://github.com/huggingface/transformers/pull/14992

    Training scripts

    New training scripts were introduced: one for speech seq2seq models and an image pre-training script leveraging the ViTMAE models. An image captioning example in Flax was also added to the library.

    • Add Speech Seq2Seq Training script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14792
    • [ViTMAE] Add image pretraining script by @NielsRogge in https://github.com/huggingface/transformers/pull/15242
    • Add Flax image captioning example by @ydshieh in https://github.com/huggingface/transformers/pull/14864

    Pipelines

    Support was added for long audio files in the automatic-speech-recognition (ASR) pipeline, as well as for audio models with a language model (LM), which improves (lowers) the word error rate (WER) on many tasks. See the blogpost. Arguments and framework support also continue to be made more consistent across all pipelines.

    • Large audio chunking for the existing ASR pipeline by @anton-l in https://github.com/huggingface/transformers/pull/14896
    • Enabling TF on image-classification pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/15030
    • Pipeline ASR with LM. by @Narsil in https://github.com/huggingface/transformers/pull/15071
    • ChunkPipeline: batch_size enabled on zero-cls and qa pipelines. by @Narsil in https://github.com/huggingface/transformers/pull/14225
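
One way to picture the chunking of long audio mentioned above is a set of overlapping windows whose overlapping edges are discarded after inference. The sketch below only computes window boundaries. The function names and the symmetric stride are assumptions for illustration and do not mirror the pipeline's internal code.

```python
def chunk_boundaries(num_samples, chunk_len, stride):
    """Return (start, end, left, right) windows covering a long signal.

    `left` and `right` are the numbers of samples at each edge of the
    window whose predictions would be dropped, because a neighbouring
    window covers them with more context.
    """
    if chunk_len <= 2 * stride:
        raise ValueError("chunk_len must be larger than twice the stride")
    step = chunk_len - 2 * stride
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_len, num_samples)
        is_last = end == num_samples
        left = 0 if start == 0 else stride
        right = 0 if is_last else stride
        chunks.append((start, end, left, right))
        if is_last:
            return chunks
        start += step


def kept_regions(chunks):
    """Sample ranges whose predictions survive after the edges are dropped."""
    return [(start + left, end - right) for start, end, left, right in chunks]


chunks = chunk_boundaries(num_samples=100, chunk_len=40, stride=5)
regions = kept_regions(chunks)
```

The kept regions tile the signal without gaps or overlaps, so the per-chunk outputs can be concatenated in order.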

    PyTorch improvements

    The ELECTRA model can now be used as a decoder, enabling an ELECTRA encoder-decoder model.

    • Add ElectraForCausalLM -> Enable Electra encoder-decoder model by @stancld in https://github.com/huggingface/transformers/pull/14729

    TensorFlow improvements

    • Keras metric callback by @Rocketknight1 and @merveenoyan in https://github.com/huggingface/transformers/pull/14867

    The vision encoder decoder model can now be used in TensorFlow.

    • Add TFVisionEncoderDecoderModel by @ydshieh in https://github.com/huggingface/transformers/pull/14148

    CLIP gets ported to TensorFlow.

    • Add TFCLIPModel by @ydshieh in https://github.com/huggingface/transformers/pull/13967

    Flax improvements

    RoFormer gets ported to Flax.

    • Add Flax RoFormer by @stancld in https://github.com/huggingface/transformers/pull/15005

    Deprecations

    • Deprecates AdamW and adds --optim by @manuelciosici in https://github.com/huggingface/transformers/pull/14744

    Documentation

    The documentation has been fully migrated to Markdown. If you are making a contribution, make sure to read the updated guide on how to write good docstrings.

    • Convert rst files by @sgugger in https://github.com/huggingface/transformers/pull/14888
    • Doc styler v2 by @sgugger in https://github.com/huggingface/transformers/pull/14950
    • Convert last rst file by @sgugger in https://github.com/huggingface/transformers/pull/14952
    • Doc styler examples by @sgugger in https://github.com/huggingface/transformers/pull/14953
    • [doc] consistent True/False/None default format by @stas00 in https://github.com/huggingface/transformers/pull/14951
    • [doc] :obj: hunt by @stas00 in https://github.com/huggingface/transformers/pull/14954
    • [doc] :class: hunt by @stas00 in https://github.com/huggingface/transformers/pull/14955

    Bugfixes and improvements

    • Fix installation instructions for BART ONNX example by @lewtun in https://github.com/huggingface/transformers/pull/14885
    • Fix doc examples: ... takes no keyword arguments by @ydshieh in https://github.com/huggingface/transformers/pull/14701
    • Fix AttributeError from PreTrainedTokenizerFast.decoder by @aphedges in https://github.com/huggingface/transformers/pull/14691
    • Add 'with torch.no_grad()' to ALBERT integration test forward pass by @henholm in https://github.com/huggingface/transformers/pull/14808
    • Add ONNX support for MarianMT models by @lewtun in https://github.com/huggingface/transformers/pull/14586
    • add custom stopping criteria to human eval script by @lvwerra in https://github.com/huggingface/transformers/pull/14897
    • Set run_name in MLflowCallback by @YangDong2002 in https://github.com/huggingface/transformers/pull/14894
    • [AutoTokenizer] Fix incorrect from pretrained by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14900
    • [Tests] Update speech diarization and WavLM tolerances by @anton-l in https://github.com/huggingface/transformers/pull/14902
    • [doc] post-porting by @stas00 in https://github.com/huggingface/transformers/pull/14890
    • [Generate] Remove attention_mask and integrate model_main_input_name by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14856
    • Fix failing GPU trainer tests by @sgugger in https://github.com/huggingface/transformers/pull/14903
    • Better logic for getting tokenizer config in AutoTokenizer by @sgugger in https://github.com/huggingface/transformers/pull/14906
    • [doc] install - add link to jax installation by @stas00 in https://github.com/huggingface/transformers/pull/14912
    • [WavLM] fix wavlm docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14910
    • Fix Perceiver docs by @Sanster in https://github.com/huggingface/transformers/pull/14917
    • fix to issue #14833 in data_collator - consider no labels by @kleinay in https://github.com/huggingface/transformers/pull/14930
    • Fix duplicate call to save_checkpoint when using deepspeed by @MihaiBalint in https://github.com/huggingface/transformers/pull/14946
    • [WavLM] give model more precision tolerance in tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14958
    • [Speech Recognition Examples] Update README.md by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14965
    • [Tests] Speed up tokenizer tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14964
    • [Wav2Vec2] Rename model's feature extractor to feature encoder by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14959
    • Replace assertion with exception by @jaketae in https://github.com/huggingface/transformers/pull/14970
    • remove absl workaround as it's no longer needed by @stas00 in https://github.com/huggingface/transformers/pull/14909
    • Fixing a pathological case for slow tokenizers by @Narsil in https://github.com/huggingface/transformers/pull/14981
    • [AutoProcessor] Correct AutoProcessor and automatically add processor… by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14881
    • [Generate] correct encoder_outputs are passed without attention_mask by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14980
    • Adding num_return_sequences support for text2text generation. by @Narsil in https://github.com/huggingface/transformers/pull/14988
    • Enabling tokenizers upgrade. by @Narsil in https://github.com/huggingface/transformers/pull/14941
    • Allow training to resume even if RNG states are not properly loaded by @sgugger in https://github.com/huggingface/transformers/pull/14994
    • Map model_type and doc pages names by @sgugger in https://github.com/huggingface/transformers/pull/14944
    • Fixing t2t pipelines lists outputs. by @Narsil in https://github.com/huggingface/transformers/pull/15008
    • Improve truncation_side by @Narsil in https://github.com/huggingface/transformers/pull/14947
    • Fix doc examples: name 'torch' is not defined by @ydshieh in https://github.com/huggingface/transformers/pull/15016
    • [Tests] Correct Wav2Vec2 & WavLM tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15015
    • [doc] Update parallelism.mdx by @hyunwoongko in https://github.com/huggingface/transformers/pull/15013
    • Fix Code block speech pretraining example by @flozi00 in https://github.com/huggingface/transformers/pull/14983
    • Fix a little typo by @milyiyo in https://github.com/huggingface/transformers/pull/15002
    • Hotfix chunk_length_s instead of _ms. by @Narsil in https://github.com/huggingface/transformers/pull/15029
    • [doc] Update parallelism.mdx by @hyunwoongko in https://github.com/huggingface/transformers/pull/15018
    • [megatron convert] PYTHONPATH requirements by @stas00 in https://github.com/huggingface/transformers/pull/14956
    • Fix doc example: mask_time_indices (numpy) has no attribute 'to' by @ydshieh in https://github.com/huggingface/transformers/pull/15033
    • Adding QoL for batch_size arg (like others enabled everywhere). by @Narsil in https://github.com/huggingface/transformers/pull/15027
    • [CLIP] Fix PT test by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15041
    • [SpeechEncoderDecoder] Fix from pretrained by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15043
    • [CLIP] Fix TF test by @patil-suraj in https://github.com/huggingface/transformers/pull/15042
    • Wrap Roberta integration test forward passes with torch.no_grad() by @mattchurgin in https://github.com/huggingface/transformers/pull/15037
    • Add Detectron2 to Github actions by @NielsRogge in https://github.com/huggingface/transformers/pull/15053
    • Remove old asserts. by @Narsil in https://github.com/huggingface/transformers/pull/15012
    • Add 'with torch.no_grad()' to BertGeneration integration test forward passes by @itsTurner in https://github.com/huggingface/transformers/pull/14963
    • Update run_speech_recognition_seq2seq.py (max_eval_samples instead of train_samples) by @flozi00 in https://github.com/huggingface/transformers/pull/14967
    • [VisionTextDualEncoder] Fix doc example by @ydshieh in https://github.com/huggingface/transformers/pull/15057
    • Resubmit changes after rebase to master by @kct22aws in https://github.com/huggingface/transformers/pull/14982
    • [Fix doc examples] missing from_pretrained by @ydshieh in https://github.com/huggingface/transformers/pull/15044
    • [VisionTextDualEncoder] Add token_type_ids param by @ydshieh in https://github.com/huggingface/transformers/pull/15073
    • Fix convert for newer megatron-lm bert model by @yoquankara in https://github.com/huggingface/transformers/pull/14082
    • [Wav2Vec2 Speech Event] Add speech event v2 by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15083
    • fix model table cell text alignment by @ydshieh in https://github.com/huggingface/transformers/pull/14999
    • Update check_repo.py by @kamalkraj in https://github.com/huggingface/transformers/pull/15014
    • Make OpenAIGPTTokenizer work with SpaCy 2.x and 3.x by @cody-moveworks in https://github.com/huggingface/transformers/pull/15019
    • Change assignee for tokenizers by @LysandreJik in https://github.com/huggingface/transformers/pull/15088
    • support the trocr small models by @liminghao1630 in https://github.com/huggingface/transformers/pull/14893
    • [Fix doc example] RagModel by @ydshieh in https://github.com/huggingface/transformers/pull/15076
    • Model summary doc page horizontal banners by @mishig25 in https://github.com/huggingface/transformers/pull/15058
    • Use tqdm.auto in Pipeline docs by @bryant1410 in https://github.com/huggingface/transformers/pull/14920
    • [doc] normalize HF Transformers string by @stas00 in https://github.com/huggingface/transformers/pull/15023
    • Happy New Year! by @sgugger in https://github.com/huggingface/transformers/pull/15094
    • [DOC] fix doc examples for bart-like models by @patil-suraj in https://github.com/huggingface/transformers/pull/15093
    • [performance doc] Power and Cooling by @stas00 in https://github.com/huggingface/transformers/pull/14935
    • Add test to check reported training loss by @sgugger in https://github.com/huggingface/transformers/pull/15096
    • Take gradient accumulation into account when defining samplers by @sgugger in https://github.com/huggingface/transformers/pull/15095
    • [Fix doc example] Speech2TextForConditionalGeneration by @ydshieh in https://github.com/huggingface/transformers/pull/15092
    • Fix cookiecutter by @NielsRogge in https://github.com/huggingface/transformers/pull/15100
    • [Wav2Vec2ProcessorWithLM] improve decoder download by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15040
    • Adds IBERT to models exportable with ONNX by @MaximovaIrina in https://github.com/huggingface/transformers/pull/14868
    • change metric_key_prefix in seq2seq_trainer.py by @JejuWayfarer in https://github.com/huggingface/transformers/pull/15099
    • Print out durations of all scheduled tests by @LysandreJik in https://github.com/huggingface/transformers/pull/15102
    • Fix failing W2V2 test by @LysandreJik in https://github.com/huggingface/transformers/pull/15104
    • Doc styler tip by @sgugger in https://github.com/huggingface/transformers/pull/15105
    • Update ONNX docs by @lewtun in https://github.com/huggingface/transformers/pull/14904
    • Fix saving FlaubertTokenizer configs by @vmaryasin in https://github.com/huggingface/transformers/pull/14991
    • Update TF test_step to match train_step by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15111
    • use block_size instead of max_seq_length in tf run_clm example by @riklopfer in https://github.com/huggingface/transformers/pull/15036
    • fix: switch from slow to generic tokenizer class by @lvwerra in https://github.com/huggingface/transformers/pull/15122
    • Fix TFEncoderDecoder labels handling #14357 by @ydshieh in https://github.com/huggingface/transformers/pull/15001
    • Add ONNX configuration classes to docs by @lewtun in https://github.com/huggingface/transformers/pull/15121
    • Add with torch.no_grad() to DistilBERT integration test forward pass by @jaketae in https://github.com/huggingface/transformers/pull/14979
    • mBART support for run_summarization.py by @banda-larga in https://github.com/huggingface/transformers/pull/15125
    • doc-builder -> doc-build by @LysandreJik in https://github.com/huggingface/transformers/pull/15134
    • [Fix doc example] - ProphetNetDecoder by @ydshieh in https://github.com/huggingface/transformers/pull/15124
    • [examples/flax/language-modeling] set loglevel by @stas00 in https://github.com/huggingface/transformers/pull/15129
    • Update model_sharing.mdx by @carlos-aguayo in https://github.com/huggingface/transformers/pull/15142
    • Enable AMP for xla:gpu device in trainer class by @ymwangg in https://github.com/huggingface/transformers/pull/15022
    • [deepspeed tests] fix summarization by @stas00 in https://github.com/huggingface/transformers/pull/15149
    • Check the repo consistency in model templates test by @sgugger in https://github.com/huggingface/transformers/pull/15141
    • Add TF glu activation function by @gante in https://github.com/huggingface/transformers/pull/15146
    • Make sure all submodules are properly registered by @sgugger in https://github.com/huggingface/transformers/pull/15144
    • [Fix doc example] - OpenAIGPTDoubleHeadsModel by @ydshieh in https://github.com/huggingface/transformers/pull/15143
    • fix BertTokenizerFast tokenize_chinese_chars arg by @SaulLu in https://github.com/huggingface/transformers/pull/15158
    • Fix typo in test_configuration_common.py by @novice03 in https://github.com/huggingface/transformers/pull/15160
    • Add "open in hf spaces" gradio button issue #73 by @AK391 in https://github.com/huggingface/transformers/pull/15106
    • TF Bert inference - support np.ndarray optional arguments by @gante in https://github.com/huggingface/transformers/pull/15074
    • Fixing flaky test (hopefully). by @Narsil in https://github.com/huggingface/transformers/pull/15154
    • Better dummies by @sgugger in https://github.com/huggingface/transformers/pull/15148
    • Update from keras2onnx to tf2onnx by @gante in https://github.com/huggingface/transformers/pull/15162
    • [doc] performance: Efficient Software Prebuilds by @stas00 in https://github.com/huggingface/transformers/pull/15147
    • [Speech models] Disable non-existing chunking in tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15163
    • Added forward pass of test_inference_image_classification_head by @MrinalTyagi in https://github.com/huggingface/transformers/pull/14777
    • Fix dtype issue in TF BART by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15178
    • [doc] new MoE paper by @stas00 in https://github.com/huggingface/transformers/pull/15184
    • Mark bad tokenizers version by @sgugger in https://github.com/huggingface/transformers/pull/15188
    • [Fix doc example] UniSpeechSatForPreTraining by @ydshieh in https://github.com/huggingface/transformers/pull/15152
    • is_ctc needs to be updated to self.type == "ctc". by @Narsil in https://github.com/huggingface/transformers/pull/15194
    • [Fix doc example] TFRagModel by @ydshieh in https://github.com/huggingface/transformers/pull/15187
    • Error when code examples are improperly closed by @sgugger in https://github.com/huggingface/transformers/pull/15186
    • Fix deprecation warnings for int div by @sgugger in https://github.com/huggingface/transformers/pull/15180
    • Copies and docstring styling by @sgugger in https://github.com/huggingface/transformers/pull/15202
    • [ASR pipeline] correct with lm pipeline by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15200
    • Remove dependency to quiet Dependabot by @sgugger in https://github.com/huggingface/transformers/pull/15205
    • Ignore empty subfolders when identifying submodules by @sgugger in https://github.com/huggingface/transformers/pull/15204
    • [MBartTokenizer] remove dep on xlm-roberta tokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/15201
    • fix: #14486 do not use BertPooler in DPR by @PaulLerner in https://github.com/huggingface/transformers/pull/15068
    • [Fix doc example] Wrong checkpoint name by @ydshieh in https://github.com/huggingface/transformers/pull/15079
    • [Robust Speech Event] Add guides by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15155
    • Enable tqdm toggling by @jaketae in https://github.com/huggingface/transformers/pull/15167
    • [FLAX] glue training example refactor by @kamalkraj in https://github.com/huggingface/transformers/pull/13815
    • Rename compute_loss in TF models by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15207
    • Build dev documentation by @LysandreJik in https://github.com/huggingface/transformers/pull/15210
    • [Fix doc example] TFFunnelTokenizer' is not defined by @ydshieh in https://github.com/huggingface/transformers/pull/15225
    • Correct Speech Event Readme by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15226
    • [ViTMAE] Various fixes by @NielsRogge in https://github.com/huggingface/transformers/pull/15221
    • [Speech Event] Fix speech event readme by @patil-suraj in https://github.com/huggingface/transformers/pull/15227
    • Fix typo in BERT tokenization file by @qqaatw in https://github.com/huggingface/transformers/pull/15228
    • Fix PR number by @LysandreJik in https://github.com/huggingface/transformers/pull/15231
    • Adapt Common Voice Talk Title and Abstract by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15233
    • Update Trainer code example by @NielsRogge in https://github.com/huggingface/transformers/pull/15070
    • Make chunking smartly (long files) work on asr ctc_with_lm. by @Narsil in https://github.com/huggingface/transformers/pull/15219
    • Fix usage of additional kwargs in from_encoder_decoder_pretrained in encoder-decoder models by @jsnfly in https://github.com/huggingface/transformers/pull/15056
    • Update README.md by @anton-l in https://github.com/huggingface/transformers/pull/15239
    • Update README.md by @anton-l in https://github.com/huggingface/transformers/pull/15246
    • Update pipelines.mdx by @kamalkraj in https://github.com/huggingface/transformers/pull/15243
    • [Fix doc example] missing import by @ydshieh in https://github.com/huggingface/transformers/pull/15240
    • Fixes tf_default_data_collator sometimes guessing the wrong dtype for labels by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15234
    • Make sure to raise NotImplementedError with correct method name by @kumapo in https://github.com/huggingface/transformers/pull/15253
    • Fix crash when logs are empty because Keras has wiped them out of spite by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15258
    • Tentative workflow improvement by @LysandreJik in https://github.com/huggingface/transformers/pull/15255
    • Fix code examples by @NielsRogge in https://github.com/huggingface/transformers/pull/15257
    • Adds missing module_specs for usages of _LazyModule by @jkuball in https://github.com/huggingface/transformers/pull/15230
    • Prepare ONNX export for torch v1.11 by @lewtun in https://github.com/huggingface/transformers/pull/15270
    • Fix by @novice03 in https://github.com/huggingface/transformers/pull/15276
    • Move BART + ONNX example to research_projects by @lewtun in https://github.com/huggingface/transformers/pull/15271
    • Specify providers explicitly in ORT session initialization by @wangyems in https://github.com/huggingface/transformers/pull/15235
    • Fixes Benchmark example link by @evandrosks in https://github.com/huggingface/transformers/pull/15278
    • [Robust Speech Challenge] Add timeline by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15274
    • [Fix doc example] TFLayoutLMForTokenClassification: missing import tf by @ydshieh in https://github.com/huggingface/transformers/pull/15268
    • [Wav2Vec2ProcessorWithLM] improve multi processing by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15247
    • Refine errors for pretrained objects by @sgugger in https://github.com/huggingface/transformers/pull/15261
    • [PyTorch-nightly-test] Fix Wav2Vec2 LM & Phoneme tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15272
    • Update eval.py by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15310
    • Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/15290
    • Fix a typo in tag addition by @sgugger in https://github.com/huggingface/transformers/pull/15286
    • Remove old debug code leftover. by @Narsil in https://github.com/huggingface/transformers/pull/15306
    • [Fix doc example] fix missing import jnp by @ydshieh in https://github.com/huggingface/transformers/pull/15291
    • [LayoutLMV2 Tests] Make sure input is on GPU by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15314
    • Replace NystromformerTokenizer with AutoTokenizer by @novice03 in https://github.com/huggingface/transformers/pull/15312
    • [Beam Search] Correct returned beam scores by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14654
    • [Examples] Correct run ner label2id for fine-tuned models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15017
    • Avoid using get_list_of_files by @sgugger in https://github.com/huggingface/transformers/pull/15287
    • [Tests] Fix test by @NielsRogge in https://github.com/huggingface/transformers/pull/15324
    • Add 🤗 Accelerate tutorial by @stevhliu in https://github.com/huggingface/transformers/pull/15263
    • Added missing code in exemplary notebook - custom datasets fine-tuning by @Pawloch247 in https://github.com/huggingface/transformers/pull/15300
    • Fix encoder-decoder models when labels is passed by @ydshieh in https://github.com/huggingface/transformers/pull/15172
    • Fix table formatting in SegFormer docs by @deppen8 in https://github.com/huggingface/transformers/pull/15337
    • Fix deepspeed docs by @ngoquanghuy99 in https://github.com/huggingface/transformers/pull/15346
    • Fix 'eval_split_name' described as defaulting to 'train' by @FremyCompany in https://github.com/huggingface/transformers/pull/15348
    • Update doc writing guide by @sgugger in https://github.com/huggingface/transformers/pull/15350
    • Add YOSO by @novice03 in https://github.com/huggingface/transformers/pull/15091
    • [docs] post-PR merge fix by @stas00 in https://github.com/huggingface/transformers/pull/15355
    • Fix YosoConfig doc by @sgugger in https://github.com/huggingface/transformers/pull/15353
    • [DocTests Speech] Add doc tests for all speech models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/15031
    • Push to hub save by @sgugger in https://github.com/huggingface/transformers/pull/15327
    • Fix KerasMetricCallback prediction with generate() and inference of column names by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15351
    • Add a device argument to the eval script by @anton-l in https://github.com/huggingface/transformers/pull/15371
    • improve saving strategy of sentencepiece tokenizer by @SaulLu in https://github.com/huggingface/transformers/pull/15328
    • Implement fixes for TrainingArguments doc by @sgugger in https://github.com/huggingface/transformers/pull/15370
    • Super-small fix stops us confusing Keras console logging by modifying… by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15373
    • Add proper documentation for Keras callbacks by @sgugger in https://github.com/huggingface/transformers/pull/15374
    • Example script for PushToHubCallback by @Rocketknight1 in https://github.com/huggingface/transformers/pull/15375

    Impressive community contributors

    The community contributors below have significantly contributed to the v4.16.0 release. Thank you!

    • @novice03, for contributing Nyströmformer, Swin Transformer and YOSO
    • @qqaatw, for contributing REALM
    • @stancld, for adding support for ELECTRA as a decoder, and porting RoFormer to Flax
    • @ydshieh, for a myriad of documentation fixes, the port of CLIP to TensorFlow, the addition of the TensorFlow vision encoder-decoder model, and the contribution of an image captioning example in Flax.

    New Contributors

    • @YangDong2002 made their first contribution in https://github.com/huggingface/transformers/pull/14894
    • @Sanster made their first contribution in https://github.com/huggingface/transformers/pull/14917
    • @kleinay made their first contribution in https://github.com/huggingface/transformers/pull/14930
    • @MihaiBalint made their first contribution in https://github.com/huggingface/transformers/pull/14946
    • @milyiyo made their first contribution in https://github.com/huggingface/transformers/pull/15002
    • @mattchurgin made their first contribution in https://github.com/huggingface/transformers/pull/15037
    • @itsTurner made their first contribution in https://github.com/huggingface/transformers/pull/14963
    • @kct22aws made their first contribution in https://github.com/huggingface/transformers/pull/14982
    • @yoquankara made their first contribution in https://github.com/huggingface/transformers/pull/14082
    • @cody-moveworks made their first contribution in https://github.com/huggingface/transformers/pull/15019
    • @MaximovaIrina made their first contribution in https://github.com/huggingface/transformers/pull/14868
    • @JejuWayfarer made their first contribution in https://github.com/huggingface/transformers/pull/15099
    • @novice03 made their first contribution in https://github.com/huggingface/transformers/pull/14659
    • @banda-larga made their first contribution in https://github.com/huggingface/transformers/pull/15125
    • @manuelciosici made their first contribution in https://github.com/huggingface/transformers/pull/14744
    • @carlos-aguayo made their first contribution in https://github.com/huggingface/transformers/pull/15142
    • @gante made their first contribution in https://github.com/huggingface/transformers/pull/15146
    • @AK391 made their first contribution in https://github.com/huggingface/transformers/pull/15106
    • @MrinalTyagi made their first contribution in https://github.com/huggingface/transformers/pull/14777
    • @jsnfly made their first contribution in https://github.com/huggingface/transformers/pull/15056
    • @jkuball made their first contribution in https://github.com/huggingface/transformers/pull/15230
    • @wangyems made their first contribution in https://github.com/huggingface/transformers/pull/15235
    • @evandrosks made their first contribution in https://github.com/huggingface/transformers/pull/15278
    • @Pawloch247 made their first contribution in https://github.com/huggingface/transformers/pull/15300
    • @deppen8 made their first contribution in https://github.com/huggingface/transformers/pull/15337
    • @ngoquanghuy99 made their first contribution in https://github.com/huggingface/transformers/pull/15346

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.15.0...v4.16.0

  • v4.15.0(Dec 22, 2021)

    New Model additions

    WavLM

    WavLM was proposed in WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing by Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei.

    WavLM sets a new SOTA on the SUPERB benchmark.

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=wavlm

    • Add WavLM by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14354

    Wav2Vec2Phoneme

    Wav2Vec2Phoneme was proposed in Simple and Effective Zero-shot Cross-lingual Phoneme Recognition by Qiantong Xu, Alexei Baevski, Michael Auli. Wav2Vec2Phoneme enables phoneme classification as part of automatic speech recognition.

    • [Wav2Vec2 Phoneme] Let phonemizer lang default to tokenizer's settings by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14829

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=phoneme-recognition

    UniSpeech-SAT

    UniSpeech-SAT was proposed in UNISPEECH-SAT: UNIVERSAL SPEECH REPRESENTATION LEARNING WITH SPEAKER AWARE PRE-TRAINING by Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu.

    UniSpeech-SAT is especially good at speaker-related tasks.

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech-sat

    UniSpeech

    UniSpeech was proposed in UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data by Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang.

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=unispeech

    New Tasks

    Speaker Diarization and Verification

    Wav2Vec2-like architectures now have speaker diarization and speaker verification heads. You can try out the new task here: https://huggingface.co/spaces/microsoft/wavlm-speaker-verification

    • Add Speaker Diarization and Verification heads by @anton-l in https://github.com/huggingface/transformers/pull/14723
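
    Speaker verification is commonly framed as comparing two speaker embeddings and accepting the pair when they are similar enough. The sketch below illustrates only that comparison step in plain Python and does not call the library. The function names cosine_similarity and same_speaker, the toy vectors, and the 0.85 threshold are hypothetical; in practice the embeddings would come from a verification head and the threshold would be tuned on held-out data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(a: Sequence[float], b: Sequence[float], threshold: float = 0.85) -> bool:
    """Hypothetical decision rule: accept when similarity reaches the threshold."""
    return cosine_similarity(a, b) >= threshold


# Toy 3-dimensional embeddings standing in for real model outputs (assumed values).
enrolled = [0.9, 0.1, 0.4]
probe_same = [0.8, 0.2, 0.5]
probe_other = [-0.3, 0.9, 0.1]

print(same_speaker(enrolled, probe_same))
print(same_speaker(enrolled, probe_other))
```

    Cosine similarity ignores vector magnitude, so the decision depends only on the direction of the embeddings. That is one common choice for this task, not necessarily what the demo linked above uses.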

    What's Changed

    • Move import to avoid circular import by @sgugger in https://github.com/huggingface/transformers/pull/14787
    • PoC for conserving old links by @sgugger in https://github.com/huggingface/transformers/pull/14754
    • Removes images to put them in a dataset by @LysandreJik in https://github.com/huggingface/transformers/pull/14781
    • Post sphinx-clean up and contributing guide updates by @sgugger in https://github.com/huggingface/transformers/pull/14790
    • Fix the build documentation job by @sgugger in https://github.com/huggingface/transformers/pull/14788
    • Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/14799
    • Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/14800
    • Train step fix by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14796
    • [Generate] Make generate multi-modal by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14784
    • Remove require_datasets testing utility by @LysandreJik in https://github.com/huggingface/transformers/pull/14795
    • [WavLM] Correct position bias computation by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14805
    • Fix Perceiver multi GPU test by @NielsRogge in https://github.com/huggingface/transformers/pull/14810
    • [WavLM] Layerdrop is not allowed for first layer by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14811
    • [Generate] Correct input_ids detection by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14815
    • Implement head_mask for Flax BERT and other models copied from BERT by @stancld in https://github.com/huggingface/transformers/pull/14620
    • Convert rst to mdx bert by @LysandreJik in https://github.com/huggingface/transformers/pull/14806
    • Wav2Vec2 meets phonemes by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14353
    • [ImageGPT] Deprecate pixel_values input name to input_ids by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14801
    • [Seq2SeqTrainer] Remove model input name hack by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14802
    • [WavLM] Fix slow tests by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14845
    • Add SD and SV heads for WavLM by @anton-l in https://github.com/huggingface/transformers/pull/14847
    • Add an argument to set bucket_cap_mb for PyTorch DDP by @changlan in https://github.com/huggingface/transformers/pull/14756
    • Update CONTRIBUTING.md by @kamalkraj in https://github.com/huggingface/transformers/pull/14835
    • Fix dead link to benchmarks.ipynb by @DerekChia in https://github.com/huggingface/transformers/pull/14842
    • [Perceiver] Skip multi-gpu tests for now by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14813
    • Add 'with torch.no_grad()' to DeBERTa integration test forward pass by @henholm in https://github.com/huggingface/transformers/pull/14821
    • Add 'with torch.no_grad()' to BERT integration test forward pass by @henholm in https://github.com/huggingface/transformers/pull/14820
    • Add a main_input_name attribute to all models by @sgugger in https://github.com/huggingface/transformers/pull/14803
    • [doc] typo by @stas00 in https://github.com/huggingface/transformers/pull/14849
    • [logging] implement warning_advice / TRANSFORMERS_NO_ADVISORY_WARNINGS by @stas00 in https://github.com/huggingface/transformers/pull/14669
    • Make the onnx submodule init lazy by @sgugger in https://github.com/huggingface/transformers/pull/14855
    • Convert docstrings of modeling files by @sgugger in https://github.com/huggingface/transformers/pull/14850
    • [Bart] better error message by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14854
    • Only create the model card on process 0 by @sgugger in https://github.com/huggingface/transformers/pull/14857
    • [ASR example] Improve example + add more examples by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14848
    • Fix the value error typo of AdamW's betas' valid values checking by @dourgey in https://github.com/huggingface/transformers/pull/14780
    • Add custom stopping_criteria and logits_processor to generate by @lvwerra in https://github.com/huggingface/transformers/pull/14779
    • Replace commit sha by commit url for update jobs by @sgugger in https://github.com/huggingface/transformers/pull/14852
    • [examples/summarization] deal with None in data records by @stas00 in https://github.com/huggingface/transformers/pull/14816
    • [doc porting] several docs by @stas00 in https://github.com/huggingface/transformers/pull/14858
    • Mass conversion of documentation from rst to Markdown by @sgugger in https://github.com/huggingface/transformers/pull/14866
    • Fix FLAX_MULTIPLE_CHOICE_SAMPLE typo by @mishig25 in https://github.com/huggingface/transformers/pull/14871
    • Fixes in marian doc by @sgugger in https://github.com/huggingface/transformers/pull/14872
    • Fix FlaxMarianMTModel return block. by @sgugger in https://github.com/huggingface/transformers/pull/14873
    • Fix doc mistakes by @sgugger in https://github.com/huggingface/transformers/pull/14874
    • Convert model files from rst to mdx by @LysandreJik in https://github.com/huggingface/transformers/pull/14865
    • update the arguments add_prefix_space and trim_offsets in backend_tokenizer.post_processor of RobertaTokenizerFast by @SaulLu in https://github.com/huggingface/transformers/pull/14752
    • Feature/fix slow test in mluke by @Ryou0634 in https://github.com/huggingface/transformers/pull/14749
    • Updated deberta attention by @guillaume-be in https://github.com/huggingface/transformers/pull/14625
    • IterableDatasetShard should use per device batch size instead of real… by @SysuCharon in https://github.com/huggingface/transformers/pull/14714
    • Fix Perceiver code example by @NielsRogge in https://github.com/huggingface/transformers/pull/14879
    • Fix pytorch image classification example by @mariosasko in https://github.com/huggingface/transformers/pull/14883
    • Onnx enable tasks for supported models (part 2) by @michaelbenayoun in https://github.com/huggingface/transformers/pull/14700
    • Properly indent return block by @sgugger in https://github.com/huggingface/transformers/pull/14887

    New Contributors

    • @changlan made their first contribution in https://github.com/huggingface/transformers/pull/14756
    • @DerekChia made their first contribution in https://github.com/huggingface/transformers/pull/14842
    • @henholm made their first contribution in https://github.com/huggingface/transformers/pull/14821
    • @dourgey made their first contribution in https://github.com/huggingface/transformers/pull/14780
    • @SysuCharon made their first contribution in https://github.com/huggingface/transformers/pull/14714

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.14.0...v4.15.0

  • v4.14.1(Dec 15, 2021)

  • v4.14.0(Dec 15, 2021)

    Perceiver

    The Perceiver model was released in the previous version:

    Perceiver

    Eight new models are released as part of the Perceiver implementation: PerceiverModel, PerceiverForMaskedLM, PerceiverForSequenceClassification, PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing, PerceiverForOpticalFlow, PerceiverForMultimodalAutoencoding, in PyTorch.

    The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

    • Add Perceiver IO by @NielsRogge in https://github.com/huggingface/transformers/pull/14487

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver

    Version v4.14.0 adds support for Perceiver in multiple pipelines, including the fill mask and sequence classification pipelines.

    Keras model cards

    The Keras push-to-hub callback now generates model cards when pushing to the model hub. In addition to the callback, the model.push_to_hub() method also generates model cards by default.

    • TF model cards by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14720

    What's Changed

    • Fix : wrong link in the documentation (ConvBERT vs DistilBERT) by @Tikquuss in https://github.com/huggingface/transformers/pull/14705

    • Put back open in colab markers by @sgugger in https://github.com/huggingface/transformers/pull/14684

    • Fix doc examples: KeyError by @ydshieh in https://github.com/huggingface/transformers/pull/14699

    • Fix doc examples: 'CausalLMOutput...' object has no attribute 'last_hidden_state' by @ydshieh in https://github.com/huggingface/transformers/pull/14678

    • Adding Perceiver to AutoTokenizer. by @Narsil in https://github.com/huggingface/transformers/pull/14711

    • Fix doc examples: unexpected keyword argument by @ydshieh in https://github.com/huggingface/transformers/pull/14689

    • Automatically build doc notebooks by @sgugger in https://github.com/huggingface/transformers/pull/14718

    • Fix special character in MDX by @sgugger in https://github.com/huggingface/transformers/pull/14721

    • Fixing tests for perceiver (texts) by @Narsil in https://github.com/huggingface/transformers/pull/14719

    • [doc] document MoE model approach and current solutions by @stas00 in https://github.com/huggingface/transformers/pull/14725

    • [Flax examples] remove dependancy on pytorch training args by @patil-suraj in https://github.com/huggingface/transformers/pull/14636

    • Update bug-report.md by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14715

    • [Adafactor] Fix adafactor by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14713

    • Code parrot minor fixes/niceties by @ncoop57 in https://github.com/huggingface/transformers/pull/14666

    • Fix doc examples: modify config before super().init by @ydshieh in https://github.com/huggingface/transformers/pull/14697

    • Improve documentation of some models by @NielsRogge in https://github.com/huggingface/transformers/pull/14695

    • Skip Perceiver tests by @LysandreJik in https://github.com/huggingface/transformers/pull/14745

    • Add ability to get a list of supported pipeline tasks by @codesue in https://github.com/huggingface/transformers/pull/14732

    • Fix the perceiver docs by @LysandreJik in https://github.com/huggingface/transformers/pull/14748

    • [CI/pt-nightly] switch to cuda-11.3 by @stas00 in https://github.com/huggingface/transformers/pull/14726

    • Swap TF and PT code inside two blocks by @LucienShui in https://github.com/huggingface/transformers/pull/14742

    • Fix doc examples: cannot import name by @ydshieh in https://github.com/huggingface/transformers/pull/14698

    • Fix: change tooslow to slow by @ydshieh in https://github.com/huggingface/transformers/pull/14734

    • Small fixes for the doc by @sgugger in https://github.com/huggingface/transformers/pull/14751

    • Update transformers metadata by @sgugger in https://github.com/huggingface/transformers/pull/14724

    • Mention no images added to repository by @LysandreJik in https://github.com/huggingface/transformers/pull/14738

    • Avoid using tf.tile in embeddings for TF models by @ydshieh in https://github.com/huggingface/transformers/pull/14735

    • Change how to load config of XLNetLMHeadModel by @josutk in https://github.com/huggingface/transformers/pull/14746

    • Improve perceiver by @NielsRogge in https://github.com/huggingface/transformers/pull/14750

    • Convert Trainer doc page to MarkDown by @sgugger in https://github.com/huggingface/transformers/pull/14753

    • Update Table of Contents by @sgugger in https://github.com/huggingface/transformers/pull/14755

    • Fixing tests for Perceiver by @Narsil in https://github.com/huggingface/transformers/pull/14739

    • Make data shuffling in run_clm_flax.py respect global seed by @bminixhofer in https://github.com/huggingface/transformers/pull/13410

    • Adding support for multiple mask tokens. by @Narsil in https://github.com/huggingface/transformers/pull/14716

    • Fix broken links to distillation on index page of documentation by @amitness in https://github.com/huggingface/transformers/pull/14722

    • [doc] performance: groups of operations by compute-intensity by @stas00 in https://github.com/huggingface/transformers/pull/14757

    • Fix the doc_build_test job by @sgugger in https://github.com/huggingface/transformers/pull/14774

    • Fix preprocess_function in run_summarization_flax.py by @ydshieh in https://github.com/huggingface/transformers/pull/14769

    • Simplify T5 docs by @xhlulu in https://github.com/huggingface/transformers/pull/14776

    • Update Perceiver code examples by @NielsRogge in https://github.com/huggingface/transformers/pull/14783

    New Contributors

    • @Tikquuss made their first contribution in https://github.com/huggingface/transformers/pull/14705
    • @codesue made their first contribution in https://github.com/huggingface/transformers/pull/14732
    • @LucienShui made their first contribution in https://github.com/huggingface/transformers/pull/14742
    • @josutk made their first contribution in https://github.com/huggingface/transformers/pull/14746
    • @amitness made their first contribution in https://github.com/huggingface/transformers/pull/14722

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.13.0...v4.14.0

  • v4.13.0(Dec 9, 2021)

    New Model additions

    Perceiver

    Eight new models are released as part of the Perceiver implementation: PerceiverModel, PerceiverForMaskedLM, PerceiverForSequenceClassification, PerceiverForImageClassificationLearned, PerceiverForImageClassificationFourier, PerceiverForImageClassificationConvProcessing, PerceiverForOpticalFlow, PerceiverForMultimodalAutoencoding, in PyTorch.

    The Perceiver IO model was proposed in Perceiver IO: A General Architecture for Structured Inputs & Outputs by Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira.

    • Add Perceiver IO by @NielsRogge in https://github.com/huggingface/transformers/pull/14487

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=perceiver

    mLUKE

    The mLUKE tokenizer is added. The tokenizer can be used for the multilingual variant of LUKE.

    The mLUKE model was proposed in mLUKE: The Power of Entity Representations in Multilingual Pretrained Language Models by Ryokan Ri, Ikuya Yamada, and Yoshimasa Tsuruoka. It's a multilingual extension of the LUKE model trained on the basis of XLM-RoBERTa.

    • Add mLUKE by @Ryou0634 in https://github.com/huggingface/transformers/pull/14640

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=luke

    ImageGPT

    Three new models are released as part of the ImageGPT integration: ImageGPTModel, ImageGPTForCausalImageModeling, ImageGPTForImageClassification, in PyTorch.

    The ImageGPT model was proposed in Generative Pretraining from Pixels by Mark Chen, Alec Radford, Rewon Child, Jeffrey Wu, Heewoo Jun, David Luan, Ilya Sutskever. ImageGPT (iGPT) is a GPT-2-like model trained to predict the next pixel value, allowing for both unconditional and conditional image generation.

    • Add ImageGPT by @NielsRogge in https://github.com/huggingface/transformers/pull/14240

    Compatible checkpoints can be found on the hub: https://huggingface.co/models?other=imagegpt

    QDQBert

    Eight new models are released as part of the QDQBert implementation: QDQBertModel, QDQBertLMHeadModel, QDQBertForMaskedLM, QDQBertForSequenceClassification, QDQBertForNextSentencePrediction, QDQBertForMultipleChoice, QDQBertForTokenClassification, QDQBertForQuestionAnswering, in PyTorch.

    The QDQBERT model is described in Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation by Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev and Paulius Micikevicius.

    • Add QDQBert model and quantization examples of SQUAD task by @shangz-ai in https://github.com/huggingface/transformers/pull/14066

    Semantic Segmentation models

    The semantic segmentation models' API is unstable and bound to change between this version and the next.

    The first semantic segmentation models are added. In semantic segmentation, the goal is to predict a class label for every pixel of an image. The models that are added are SegFormer (by NVIDIA) and BEiT (by Microsoft Research). BEiT was already available in the library, but this release includes the model with a semantic segmentation head.
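
    The per-pixel prediction described above can be sketched in plain Python. This only illustrates the task, not the SegFormer or BEiT API: the logits layout (indexed as class, row, column in nested lists) and the function name segmentation_map are assumptions made for the example.

```python
from typing import List


def segmentation_map(logits: List[List[List[float]]]) -> List[List[int]]:
    """Return, for every pixel, the index of the class with the highest logit.

    `logits` is indexed as logits[class][row][col] (an assumed layout).
    """
    num_classes = len(logits)
    height = len(logits[0])
    width = len(logits[0][0])
    return [
        [
            # Pick the class whose logit is largest at this pixel.
            max(range(num_classes), key=lambda c: logits[c][row][col])
            for col in range(width)
        ]
        for row in range(height)
    ]


# Two classes on a 2x2 image (made-up numbers).
example_logits = [
    [[0.1, 2.0], [0.3, 0.0]],  # class 0
    [[1.0, 0.5], [0.2, 0.9]],  # class 1
]
example_map = segmentation_map(example_logits)
```

    The result has the same height and width as the input image, with one class index per pixel.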

    The SegFormer model was proposed in SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers by Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M. Alvarez, Ping Luo. The model consists of a hierarchical Transformer encoder and a lightweight all-MLP decode head to achieve great results on image segmentation benchmarks such as ADE20K and Cityscapes.

    The BEiT model was proposed in BEiT: BERT Pre-Training of Image Transformers by Hangbo Bao, Li Dong, Furu Wei. Rather than pre-training the model to predict the class of an image (as done in the original ViT paper), BEiT models are pre-trained to predict visual tokens from the codebook of OpenAI’s DALL-E model given masked patches.

    • Add SegFormer by @NielsRogge in https://github.com/huggingface/transformers/pull/14019
    • Add BeitForSemanticSegmentation by @NielsRogge in https://github.com/huggingface/transformers/pull/14096

    Vision-text dual encoder

    Adds a VisionTextDualEncoder model in PyTorch and Flax that can load any pre-trained vision model (ViT, DeiT, BEiT, CLIP's vision model) and text model (BERT, RoBERTa) in the library for vision-text tasks like CLIP.

    This model pairs a vision encoder with a text encoder and adds projection layers that map their embeddings into a shared embedding space of the same dimension, which can then be used to align the two modalities.
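
    As a rough illustration of the projection idea, the sketch below maps two embeddings of different sizes into one space and compares them with cosine similarity. It is not the library's implementation: the embeddings, the projection weights, and the helper names project and cosine_similarity are all made up for the example.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(embedding: Sequence[float], weight: Matrix) -> Vector:
    """Linear projection: one output value per row of `weight`."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in weight]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Hypothetical encoder outputs with different sizes (3-d image, 2-d text).
image_embedding = [1.0, 0.0, 2.0]
text_embedding = [0.5, 1.5]

# Hypothetical projection weights mapping both into a shared 2-d space.
vision_projection = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
text_projection = [[2.0, 0.0], [0.0, 1.0]]

image_vec = project(image_embedding, vision_projection)
text_vec = project(text_embedding, text_projection)
similarity = cosine_similarity(image_vec, text_vec)
```

    Once both modalities live in the same space, a similarity score like this can be used to compare images and texts directly.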

    • VisionTextDualEncoder by @patil-suraj in https://github.com/huggingface/transformers/pull/13511

    CodeParrot

    CodeParrot, a model trained to generate code, has been open-sourced in the research projects by @lvwerra.

    • Add CodeParrot 🦜 codebase by @lvwerra in https://github.com/huggingface/transformers/pull/14536

    Language model support for ASR

    Language model boosted decoding is added for all CTC models via https://github.com/kensho-technologies/pyctcdecode and https://github.com/kpu/kenlm.

    • Add language model support for CTC models by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14339

    See https://huggingface.co/patrickvonplaten/wav2vec2-xlsr-53-es-kenlm for more information.
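
    To give an intuition for what language model boosting does, here is a toy rescoring sketch in plain Python. It is not how pyctcdecode or KenLM work internally (their beam search is more involved). The candidate transcripts, their acoustic scores, the unigram language model, the weight alpha, and the function name rescore are all assumptions for illustration.

```python
import math
from typing import Dict, List, Tuple


def rescore(
    candidates: List[Tuple[str, float]],
    lm_logprob: Dict[str, float],
    alpha: float = 0.5,
    unknown_logprob: float = math.log(1e-4),
) -> str:
    """Pick the transcript maximizing acoustic + alpha * LM log-probability."""

    def total(candidate: Tuple[str, float]) -> float:
        text, acoustic = candidate
        # Toy unigram LM: sum word log-probabilities, penalizing unknown words.
        lm = sum(lm_logprob.get(word, unknown_logprob) for word in text.split())
        return acoustic + alpha * lm

    return max(candidates, key=total)[0]


# Hypothetical acoustic log-scores: the misspelling is slightly preferred.
candidates = [("helo world", -1.0), ("hello world", -1.2)]
# Hypothetical unigram language model.
lm = {"hello": math.log(0.05), "world": math.log(0.05)}

best_without_lm = rescore(candidates, lm, alpha=0.0)
best_with_lm = rescore(candidates, lm, alpha=0.5)
```

    With alpha set to 0 the acoustic score alone decides; with a positive alpha the language model steers the choice toward transcripts made of known words.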

    Flax-specific additions

    Adds Flax version of the vision encoder-decoder model, and adds a Flax version of GPT-J.

    • Add FlaxVisionEncoderDecoderModel by @ydshieh in https://github.com/huggingface/transformers/pull/13359
    • FlaxGPTJ by @patil-suraj in https://github.com/huggingface/transformers/pull/14396

    TensorFlow-specific additions

    Vision transformers are here! Convnets are so 2012, now that ML is converging on self-attention as a universal model.

    • Add TFViTModel by @ydshieh in https://github.com/huggingface/transformers/pull/13778

    Want to handle real-world tables, where text and data are positioned in a 2D grid? TAPAS is now here for both TensorFlow and PyTorch.

    • Tapas tf by @kamalkraj in https://github.com/huggingface/transformers/pull/13393

    Automatic checkpointing and cloud saves to the HuggingFace Hub during training are now live, allowing you to resume training when it's interrupted, even if your initial instance is terminated. This is an area of very active development - watch this space for future developments, including automatic model card creation and more.

    • Add model checkpointing to push_to_hub and PushToHubCallback by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14492

    Auto-processors

    A new class to automatically select processors is added: AutoProcessor. It can be used for all models that require a processor, in both computer vision and audio.

    • Auto processor by @sgugger in https://github.com/huggingface/transformers/pull/14465

    New documentation frontend

    A new documentation frontend is out for the transformers library! The goal of this documentation is to be better aligned with the rest of our website, and it contains tools to improve readability. The documentation can now be written in Markdown rather than RST.

    • Doc new front by @sgugger in https://github.com/huggingface/transformers/pull/14590

    LayoutLM Improvements

    The LayoutLMv2 feature extractor now supports non-English languages, and LayoutXLM gets its own processor.

    • LayoutLMv2FeatureExtractor now supports non-English languages when applying Tesseract OCR. by @Xargonus in https://github.com/huggingface/transformers/pull/14514
    • Add LayoutXLMProcessor (and LayoutXLMTokenizer, LayoutXLMTokenizerFast) by @NielsRogge in https://github.com/huggingface/transformers/pull/14115

    Trainer Improvements

    You can now take advantage of the Ampere hardware with the Trainer:

    • --bf16: train or evaluate in bfloat16 mixed precision
    • --bf16_full_eval: evaluate in full bfloat16
    • --tf32: turn TF32 mode on or off
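
    To make the precision trade-off behind bfloat16 concrete, here is a minimal pure-Python sketch. It is not Trainer code. It assumes the standard bfloat16 layout (the upper 16 bits of an IEEE 754 float32: 1 sign bit, 8 exponent bits, 7 mantissa bits) and uses plain truncation for simplicity, whereas hardware typically rounds. The helper name to_bfloat16 is hypothetical.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round-trip `x` through a truncated bfloat16 representation."""
    # Pack as a big-endian float32 and reinterpret as a 32-bit unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    # Keep only the upper 16 bits (sign, 8 exponent bits, 7 mantissa bits).
    bits &= 0xFFFF0000
    # Reinterpret the truncated bit pattern as a float32 again.
    (y,) = struct.unpack(">f", struct.pack(">I", bits))
    return y


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 0.001, 1e30):
        print(value, "->", to_bfloat16(value))
```

    bfloat16 keeps the exponent range of float32, so very large and very small values survive, but only about two to three significant decimal digits remain.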

    Improvements and bugfixes

    • Replace assertions with RuntimeError exceptions by @ddrm86 in https://github.com/huggingface/transformers/pull/14186
    • Adding batch_size support for (almost) all pipelines by @Narsil in https://github.com/huggingface/transformers/pull/13724
    • Remove n_ctx from configs by @thomasw21 in https://github.com/huggingface/transformers/pull/14165
    • Add BlenderbotTokenizerFast by @stancld in https://github.com/huggingface/transformers/pull/13720
    • Adding handle_long_generation paramters for text-generation pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/14118
    • Fix pipeline tests env and fetch by @sgugger in https://github.com/huggingface/transformers/pull/14209
    • Generalize problem_type to all sequence classification models by @sgugger in https://github.com/huggingface/transformers/pull/14180
    • Fixing image segmentation with inference mode. by @Narsil in https://github.com/huggingface/transformers/pull/14204
    • Add a condition for checking labels by @hrxorxm in https://github.com/huggingface/transformers/pull/14211
    • Torch 1.10 by @LysandreJik in https://github.com/huggingface/transformers/pull/14169
    • Add more missing models to models/init.py by @ydshieh in https://github.com/huggingface/transformers/pull/14177
    • Clarify QA examples by @NielsRogge in https://github.com/huggingface/transformers/pull/14172
    • Fixing image-segmentation tests. by @Narsil in https://github.com/huggingface/transformers/pull/14223
    • Tensor location is already handled by @Narsil in https://github.com/huggingface/transformers/pull/14224
    • Raising exceptions instead of using assertions for few models by @pdcoded in https://github.com/huggingface/transformers/pull/14219
    • Fix the write problem in trainer.py comment by @wmathor in https://github.com/huggingface/transformers/pull/14202
    • [GPTJ] enable common tests and few fixes by @patil-suraj in https://github.com/huggingface/transformers/pull/14190
    • improving efficiency of mlflow metric logging by @wamartin-aml in https://github.com/huggingface/transformers/pull/14232
    • Fix generation docstring by @qqaatw in https://github.com/huggingface/transformers/pull/14216
    • Fix test_configuration_tie in FlaxEncoderDecoderModelTest by @ydshieh in https://github.com/huggingface/transformers/pull/14076
    • [Tests] Fix DistilHubert path by @anton-l in https://github.com/huggingface/transformers/pull/14245
    • Add PushToHubCallback in main init by @sgugger in https://github.com/huggingface/transformers/pull/14246
    • Fixes Beit training for PyTorch 1.10+ by @sgugger in https://github.com/huggingface/transformers/pull/14249
    • Added Beit model ouput class by @lumliolum in https://github.com/huggingface/transformers/pull/14133
    • Update Transformers to huggingface_hub >= 0.1.0 by @sgugger in https://github.com/huggingface/transformers/pull/14251
    • Add cross attentions to TFGPT2Model by @ydshieh in https://github.com/huggingface/transformers/pull/14038
    • [Wav2Vec2] Adapt conversion script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14258
    • Put load_image function in image_utils.py & fix image rotation issue by @mishig25 in https://github.com/huggingface/transformers/pull/14062
    • minimal fixes to run DataCollatorForWholeWordMask with return_tensors="np" and return_tensors="tf" by @dwyatte in https://github.com/huggingface/transformers/pull/13891
    • Adding support for truncation parameter on feature-extraction pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/14193
    • Fix of issue #13327: Wrong weight initialization for TF t5 model by @dshirron in https://github.com/huggingface/transformers/pull/14241
    • Fixing typo in error message. by @Narsil in https://github.com/huggingface/transformers/pull/14226
    • Pin Keras cause they messed their release by @sgugger in https://github.com/huggingface/transformers/pull/14262
    • Quality explain by @sgugger in https://github.com/huggingface/transformers/pull/14264
    • Add more instructions to the release guide by @sgugger in https://github.com/huggingface/transformers/pull/14263
    • Fixing slow pipeline tests by @Narsil in https://github.com/huggingface/transformers/pull/14260
    • Fixing mishandling of ignore_labels. by @Narsil in https://github.com/huggingface/transformers/pull/14274
    • improve rewrite state_dict missing _metadata by @changwangss in https://github.com/huggingface/transformers/pull/14276
    • Removing Keras version pinning by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14280
    • Pin TF until tests are fixed by @sgugger in https://github.com/huggingface/transformers/pull/14283
    • [Hubert Docs] Make sure example uses a fine-tuned model by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14291
    • Add new LFS prune API by @sgugger in https://github.com/huggingface/transformers/pull/14294
    • Remove DPRPretrainedModel from docs by @xhlulu in https://github.com/huggingface/transformers/pull/14300
    • Handle long answer needs to be updated. by @Narsil in https://github.com/huggingface/transformers/pull/14279
    • [tests] Fix SegFormer and BEiT tests by @NielsRogge in https://github.com/huggingface/transformers/pull/14289
    • Fix typo on PPLM example README by @Beomi in https://github.com/huggingface/transformers/pull/14287
    • [Marian Conversion] Fix eos_token_id conversion in conversion script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14320
    • [Tests] Update audio classification tests to support torch 1.10 by @anton-l in https://github.com/huggingface/transformers/pull/14318
    • [TFWav2Vec2Model] Fix input shapes in TFWav2Vec2WeightNormConv1D by @anton-l in https://github.com/huggingface/transformers/pull/14319
    • Fixing tests on master. by @Narsil in https://github.com/huggingface/transformers/pull/14317
    • Fixing mutable default argument in pipeline. by @Narsil in https://github.com/huggingface/transformers/pull/14316
    • Changed relative imports to absolute to allow convert_graph_to_onnx.py to run as a script. by @nbertagnolli in https://github.com/huggingface/transformers/pull/14325
    • Expand dynamic supported objects to configs and tokenizers by @sgugger in https://github.com/huggingface/transformers/pull/14296
    • [deepspeed] Enable multiple test runs on single box, defer to DS_TEST_PORT if set by @jeffra in https://github.com/huggingface/transformers/pull/14331
    • Small change to Wav2Vec2 model to support Tensor-Parallelism with DeepSpeed by @RezaYazdaniAminabadi in https://github.com/huggingface/transformers/pull/14298
    • Correct order of overflowing tokens for LayoutLmV2 tokenizer by @Apoorvgarg-creator in https://github.com/huggingface/transformers/pull/13495
    • Update Seq2Seq QA example script to use SQuAD metric. by @karthikrangasai in https://github.com/huggingface/transformers/pull/14335
    • remove an irrelevant test from test_modeling_tf_layoutlm by @ydshieh in https://github.com/huggingface/transformers/pull/14341
    • bump flax version by @patil-suraj in https://github.com/huggingface/transformers/pull/14343
    • Rewrite guides for fine-tuning with Datasets by @stevhliu in https://github.com/huggingface/transformers/pull/13923
    • [Bert2Bert] allow bert2bert + relative embeddings by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14324
    • Support for TF >= 2.7 by @sgugger in https://github.com/huggingface/transformers/pull/14345
    • BatchFeature: Convert List[np.ndarray] to np.ndarray before converting to pytorch tensors by @eladsegal in https://github.com/huggingface/transformers/pull/14306
    • Adding some quality of life for pipeline function. by @Narsil in https://github.com/huggingface/transformers/pull/14322
    • Fix fast tokenization problems by @qqaatw in https://github.com/huggingface/transformers/pull/13930
    • Add notebook INC quantization for text classification tasks by @echarlaix in https://github.com/huggingface/transformers/pull/14293
    • enhance rewrite state_dict missing _metadata by @changwangss in https://github.com/huggingface/transformers/pull/14348
    • Fix list index out of range when padding nested empty lists by @qqaatw in https://github.com/huggingface/transformers/pull/13876
    • [testing] solve the port conflict by @stas00 in https://github.com/huggingface/transformers/pull/14362
    • Fix Flax params dtype by @patil-suraj in https://github.com/huggingface/transformers/pull/13098
    • [flax generate] allow passing params to encode by @patil-suraj in https://github.com/huggingface/transformers/pull/14370
    • Experimenting with adding proper get_config() and from_config() methods by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14361
    • Fixing requirements for TF LM models and use correct model mappings by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14372
    • fix loading flax bf16 weights in pt by @patil-suraj in https://github.com/huggingface/transformers/pull/14369
    • [wav2vec2] fix --gradient_checkpointing by @stas00 in https://github.com/huggingface/transformers/pull/13964
    • Adding support for raw python generator in addition to Dataset for pipelines by @Narsil in https://github.com/huggingface/transformers/pull/14352
    • minor doc fix by @patil-suraj in https://github.com/huggingface/transformers/pull/14377
    • [Wav2Vec2 Example] Improve fine-tuning script by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14373
    • Use AlbertConverter for FNet instead of using FNet's own converter by @qqaatw in https://github.com/huggingface/transformers/pull/14365
    • Add support for WMT21 tokenizer in M2M100Tokenizer by @patil-suraj in https://github.com/huggingface/transformers/pull/14376
    • [M2M100Tokenizer] fix _build_translation_inputs by @patil-suraj in https://github.com/huggingface/transformers/pull/14382
    • Raise exceptions instead of using asserts in modeling_openai #12789 by @nbertagnolli in https://github.com/huggingface/transformers/pull/14386
    • [doc] performance and parallelism updates by @stas00 in https://github.com/huggingface/transformers/pull/14391
    • Quick fix to TF summarization example by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14401
    • [Speech2Text2] Enable tokenizers by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14390
    • Fix TFViT by @NielsRogge in https://github.com/huggingface/transformers/pull/14399
    • Fix weight loading issue by @ydshieh in https://github.com/huggingface/transformers/pull/14016
    • Replace BertLayerNorm with LayerNorm by @eldarkurtic in https://github.com/huggingface/transformers/pull/14385
    • [Wav2Vec2] Make sure that gradient checkpointing is only run if needed by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14407
    • Allow per-version configurations by @LysandreJik in https://github.com/huggingface/transformers/pull/14344
    • Fix gradient_checkpointing backward compatibility by @sgugger in https://github.com/huggingface/transformers/pull/14408
    • Add forward method to dummy models by @sgugger in https://github.com/huggingface/transformers/pull/14419
    • Avoid looping when data exhausted by @valentindey in https://github.com/huggingface/transformers/pull/14413
    • Debug doc by @sgugger in https://github.com/huggingface/transformers/pull/14424
    • [Wav2Vec2] Add New Wav2Vec2 Translation by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14392
    • Improve semantic segmentation models by @NielsRogge in https://github.com/huggingface/transformers/pull/14355
    • [Gradient checkpoining] Update Wav2Vec scripts by @falcaopetri in https://github.com/huggingface/transformers/pull/14036
    • [Bart] Fix docs by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14434
    • [WIP] Ensure TF model configs can be converted to proper JSON by @Zahlii in https://github.com/huggingface/transformers/pull/14415
    • Recover Deleted XNLI Instructions by @Helw150 in https://github.com/huggingface/transformers/pull/14437
    • Fix EncoderDecoderModel code example by @NielsRogge in https://github.com/huggingface/transformers/pull/14441
    • Add a post init method to all models by @sgugger in https://github.com/huggingface/transformers/pull/14431
    • Fix finite IterableDataset test on multiple GPUs by @sgugger in https://github.com/huggingface/transformers/pull/14445
    • [Bert, et al] fix early device assignment by @stas00 in https://github.com/huggingface/transformers/pull/14447
    • Add GitPython to quality tools by @LysandreJik in https://github.com/huggingface/transformers/pull/14459
    • [ImageGPT] Small fixes by @NielsRogge in https://github.com/huggingface/transformers/pull/14460
    • [Generation] Allow inputs_embeds as an input by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14443
    • Adding support for hidden_states and attentions in unbatching support. by @Narsil in https://github.com/huggingface/transformers/pull/14420
    • add Tuple as possible type hint for EvalPredictions label_ids by @ameasure in https://github.com/huggingface/transformers/pull/14473
    • Fix dummy objects for quantization by @sgugger in https://github.com/huggingface/transformers/pull/14478
    • Moving pipeline tests from Narsil to hf-internal-testing. by @Narsil in https://github.com/huggingface/transformers/pull/14463
    • Improve add-new-pipeline docs a bit by @stancld in https://github.com/huggingface/transformers/pull/14485
    • [test] add test for --config_overrides by @stas00 in https://github.com/huggingface/transformers/pull/14466
    • Support for Training with BF16 by @JamesDeAntonis in https://github.com/huggingface/transformers/pull/13207
    • fixes some key names in LayoutLMv2 / LayoutXLM tokenizers by @valentindey in https://github.com/huggingface/transformers/pull/14493
    • Switch from using sum for flattening lists of lists in group_texts by @nbroad1881 in https://github.com/huggingface/transformers/pull/14472
    • [deepspeed] zero inference by @stas00 in https://github.com/huggingface/transformers/pull/14253
    • add cache_dir for tokenizer verification loading by @vmaryasin in https://github.com/huggingface/transformers/pull/14508
    • Fix feature extraction utils import by @LysandreJik in https://github.com/huggingface/transformers/pull/14515
    • [Tests] Improve vision tests by @NielsRogge in https://github.com/huggingface/transformers/pull/14458
    • [CI] clear ~/.cache/torch_extensions between builds by @stas00 in https://github.com/huggingface/transformers/pull/14520
    • Fix a slow test. by @Narsil in https://github.com/huggingface/transformers/pull/14527
    • added save_directories for _psave_pretrained_pt and _tf, changed model to tf_model and pt_model, enable the notebook to run cleanly from top to bottom without error by @cfregly in https://github.com/huggingface/transformers/pull/14529
    • Quicktour updates by @LysandreJik in https://github.com/huggingface/transformers/pull/14533
    • Fixes by @LysandreJik in https://github.com/huggingface/transformers/pull/14534
    • [flax] unfreeze initial cache in gpt models by @patil-suraj in https://github.com/huggingface/transformers/pull/14535
    • Tokenizers docs: Specify which class contains __call__ method by @xhlulu in https://github.com/huggingface/transformers/pull/14379
    • Rename ImageGPT by @NielsRogge in https://github.com/huggingface/transformers/pull/14526
    • [Generate] Fix generate with inputs_embeds on GPU by @patrickvonplaten in https://github.com/huggingface/transformers/pull/14564
    • [Flax] token-classification model steps enumerate start from 1 by @kamalkraj in https://github.com/huggingface/transformers/pull/14547
    • Fix sentinel token IDs in data collator for Flax T5 pretraining script by @rahuln in https://github.com/huggingface/transformers/pull/14477
    • Fix backend regex by @sgugger in https://github.com/huggingface/transformers/pull/14566
    • [Flax] Add FlaxBlenderbot by @stancld in https://github.com/huggingface/transformers/pull/13633
    • Add documentation for multi-label classification by @gsnidero in https://github.com/huggingface/transformers/pull/14168
    • use functional interface for softmax in attention by @t-vi in https://github.com/huggingface/transformers/pull/14198
    • Fix mask token handling by @qqaatw in https://github.com/huggingface/transformers/pull/14364
    • [doc] bf16/tf32 guide by @stas00 in https://github.com/huggingface/transformers/pull/14579
    • Rename toctree.yml -> _toctree.yml by @mishig25 in https://github.com/huggingface/transformers/pull/14594
    • Update doc img links by @mishig25 in https://github.com/huggingface/transformers/pull/14593
    • Adds a git pull instruction to the documentation builder by @LysandreJik in https://github.com/huggingface/transformers/pull/14597
    • [Flax] Add FlaxBlenderbotSmall by @stancld in https://github.com/huggingface/transformers/pull/14576
    • Python 3.6 -> Python 3.7 for TF runs by @LysandreJik in https://github.com/huggingface/transformers/pull/14598
    • change tf.math.divide with int(/) in distilbert model by @yis11178 in https://github.com/huggingface/transformers/pull/14600
    • fix #14524 (IndexError when mask prob is too low) by @nikvaessen in https://github.com/huggingface/transformers/pull/14525
    • Improve tokenizer tests by @qqaatw in https://github.com/huggingface/transformers/pull/13594
    • [CI] move env print to util, add pt, nccl versions by @stas00 in https://github.com/huggingface/transformers/pull/14607
    • 2022 is the year of multi-modality by @LysandreJik in https://github.com/huggingface/transformers/pull/14610
    • Fix doc builder by @LysandreJik in https://github.com/huggingface/transformers/pull/14616
    • [trainer] add tf32-mode control by @stas00 in https://github.com/huggingface/transformers/pull/14606
    • Make DefaultDataCollator importable from root by @Rocketknight1 in https://github.com/huggingface/transformers/pull/14588
    • fix a typo by @yuchenlin in https://github.com/huggingface/transformers/pull/14626
    • updated pytorch token-classification readme by @kamalkraj in https://github.com/huggingface/transformers/pull/14624
    • Add Flax example tests by @patil-suraj in https://github.com/huggingface/transformers/pull/14599
    • fix typo by @patil-suraj in https://github.com/huggingface/transformers/pull/14635
    • add flax example tests in CI workflow by @patil-suraj in https://github.com/huggingface/transformers/pull/14637
    • [urls to hub] Replace outdated model tags with their now-canonical pipeline types by @julien-c in https://github.com/huggingface/transformers/pull/14617
    • Update the example of exporting Bart + BeamSearch to ONNX module to resolve comments. by @fatcat-z in https://github.com/huggingface/transformers/pull/14310
    • Add GPTJForQuestionAnswering by @tucan9389 in https://github.com/huggingface/transformers/pull/14503
    • doc: mismatch between pooler/d_output by @guhur in https://github.com/huggingface/transformers/pull/14641
    • fix flax example tests by @patil-suraj in https://github.com/huggingface/transformers/pull/14643
    • Auto processor fix by @LysandreJik in https://github.com/huggingface/transformers/pull/14623
    • Fix syntax for class references by @sgugger in https://github.com/huggingface/transformers/pull/14644
    • Add a job to test the documentation build by @sgugger in https://github.com/huggingface/transformers/pull/14645
    • fix flax examples tests by @patil-suraj in https://github.com/huggingface/transformers/pull/14646
    • Use cross_attention_hidden_size in Encoder-Decoder models by @ydshieh in https://github.com/huggingface/transformers/pull/14378
    • [deepspeed] fix --load_best_model_at_end by @stas00 in https://github.com/huggingface/transformers/pull/14652
    • quick fix SummarizationPipeline error messages by @NouamaneTazi in https://github.com/huggingface/transformers/pull/14618
    • Fix a Bug, trainer_seq2seq.py, in the else branch at Line 172, generation_inputs should be a dict by @TranSirius in https://github.com/huggingface/transformers/pull/14546
    • [trainer] conditional ctx managers into one wrapper by @stas00 in https://github.com/huggingface/transformers/pull/14663
    • Fixing Dataset for TQA + token-classification. by @Narsil in https://github.com/huggingface/transformers/pull/14658
    • fix deprecated tf method by @ZOHETH in https://github.com/huggingface/transformers/pull/14671
    • Fix doc builder by @LysandreJik in https://github.com/huggingface/transformers/pull/14676
    • [AutoProcessor] Add Wav2Vec2WithLM & small fix #14675 (@patrickvonplaten)
    • Added support for other features for already supported models #14358 (@michaelbenayoun)
    • Revert "Added support for other features for already supported models" #14679 (@lewtun)
    • Convert tutorials #14665 (@sgugger)
    • fix: verify jsonlines file in run_translation (#14660) #14661 (@GaurangTandon)
    • Improvements to Comet Integration #14680 (@DN6)
    • Fixes in init #14681 (@sgugger)
    • Revert open-in-colab and add perceiver #14683 (@sgugger)
    • Fix wrong checkpoint paths in doc examples #14685 (@ydshieh)
    • [bf16 support] tweaks #14580 (@stas00)
    • [trainer] support UserDict inputs (torch-nightly) #14688 (@stas00)
    • Move pyctcdecode #14686 (@sgugger)
    • Make MLuke tokenizer tests slow #14690 (@sgugger)
    • Fix doc examples: name '...' is not defined #14687 (@ydshieh)
    • Add a job to test doc building (for realsies this time) #14662 (@sgugger)
    • Fix Perceiver tests #14703 (@NielsRogge)
    • add str hub token to repository when provided else fallback to default #14682 (@philschmid)
    • Fix typo in toctree #14704 (@mishig25)

    New Contributors

    • @hrxorxm made their first contribution in https://github.com/huggingface/transformers/pull/14211
    • @pdcoded made their first contribution in https://github.com/huggingface/transformers/pull/14219
    • @wmathor made their first contribution in https://github.com/huggingface/transformers/pull/14202
    • @wamartin-aml made their first contribution in https://github.com/huggingface/transformers/pull/14232
    • @lumliolum made their first contribution in https://github.com/huggingface/transformers/pull/14133
    • @dwyatte made their first contribution in https://github.com/huggingface/transformers/pull/13891
    • @dshirron made their first contribution in https://github.com/huggingface/transformers/pull/14241
    • @changwangss made their first contribution in https://github.com/huggingface/transformers/pull/14276
    • @xhlulu made their first contribution in https://github.com/huggingface/transformers/pull/14300
    • @Beomi made their first contribution in https://github.com/huggingface/transformers/pull/14287
    • @nbertagnolli made their first contribution in https://github.com/huggingface/transformers/pull/14325
    • @jeffra made their first contribution in https://github.com/huggingface/transformers/pull/14331
    • @RezaYazdaniAminabadi made their first contribution in https://github.com/huggingface/transformers/pull/14298
    • @echarlaix made their first contribution in https://github.com/huggingface/transformers/pull/14293
    • @valentindey made their first contribution in https://github.com/huggingface/transformers/pull/14413
    • @Zahlii made their first contribution in https://github.com/huggingface/transformers/pull/14415
    • @Helw150 made their first contribution in https://github.com/huggingface/transformers/pull/14437
    • @shangz-ai made their first contribution in https://github.com/huggingface/transformers/pull/14066
    • @vmaryasin made their first contribution in https://github.com/huggingface/transformers/pull/14508
    • @cfregly made their first contribution in https://github.com/huggingface/transformers/pull/14529
    • @Xargonus made their first contribution in https://github.com/huggingface/transformers/pull/14514
    • @rahuln made their first contribution in https://github.com/huggingface/transformers/pull/14477
    • @gsnidero made their first contribution in https://github.com/huggingface/transformers/pull/14168
    • @t-vi made their first contribution in https://github.com/huggingface/transformers/pull/14198
    • @JamesDeAntonis made their first contribution in https://github.com/huggingface/transformers/pull/13207
    • @yis11178 made their first contribution in https://github.com/huggingface/transformers/pull/14600
    • @nikvaessen made their first contribution in https://github.com/huggingface/transformers/pull/14525
    • @yuchenlin made their first contribution in https://github.com/huggingface/transformers/pull/14626
    • @Ryou0634 made their first contribution in https://github.com/huggingface/transformers/pull/14640
    • @NouamaneTazi made their first contribution in https://github.com/huggingface/transformers/pull/14618
    • @TranSirius made their first contribution in https://github.com/huggingface/transformers/pull/14546
    • @ZOHETH made their first contribution in https://github.com/huggingface/transformers/pull/14671

    Full Changelog: https://github.com/huggingface/transformers/compare/v4.12.0...v4.13.0

  • v4.12.5(Nov 17, 2021)

  • v4.12.4(Nov 16, 2021)

    • Fix gradient_checkpointing backward compatibility (#14408)
    • [Wav2Vec2] Make sure that gradient checkpointing is only run if needed (#14407)
    • Experimenting with adding proper get_config() and from_config() methods (#14361)
    • enhance rewrite state_dict missing _metadata (#14348)
    • Support for TF >= 2.7 (#14345)
    • improve rewrite state_dict missing _metadata (#14276)
    • Fix of issue #13327: Wrong weight initialization for TF t5 model (#14241)
  • v4.12.3(Nov 3, 2021)

Owner
Hugging Face
The AI community building the future.