Code for the ICML 2021 paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"

Overview

ViLT


The main figure

Install

pip install -r requirements.txt
pip install -e .

Download Pretrained Weights

We provide five pretrained weights:

  1. ViLT-B/32 Pretrained with MLM+ITM for 200k steps on GCC+SBU+COCO+VG (ViLT-B/32 200k) link
  2. ViLT-B/32 200k finetuned on VQAv2 link
  3. ViLT-B/32 200k finetuned on NLVR2 link
  4. ViLT-B/32 200k finetuned on COCO IR/TR link
  5. ViLT-B/32 200k finetuned on F30K IR/TR link

Out-of-the-box MLM + Visualization Demo

pip install gradio==1.6.4
python demo.py with num_gpus=<0 if you have no gpus else 1> load_path="<YOUR_WEIGHT_ROOT>/vilt_200k_mlm_itm.ckpt"

ex)
python demo.py with num_gpus=0 load_path="weights/vilt_200k_mlm_itm.ckpt"

Out-of-the-box VQA Demo

pip install gradio==1.6.4
python demo_vqa.py with num_gpus=<0 if you have no gpus else 1> load_path="<YOUR_WEIGHT_ROOT>/vilt_vqa.ckpt" test_only=True

ex)
python demo_vqa.py with num_gpus=0 load_path="weights/vilt_vqa.ckpt" test_only=True
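
The two demo invocations differ only in the script name, the checkpoint path, and the optional test_only flag. As a rough sketch, the helper below assembles the same "with key=value" argument list shown above. The name build_demo_command is hypothetical and not part of this repository, and treating the presence of nvidia-smi on PATH as a sign that a GPU is usable is an assumption, not how the repository decides.

```python
import shutil
from typing import List, Optional


def build_demo_command(script: str, ckpt_path: str,
                       num_gpus: Optional[int] = None,
                       test_only: bool = False) -> List[str]:
    """Assemble the argv list for a demo script (hypothetical helper)."""
    if num_gpus is None:
        # Heuristic assumption: a GPU is usable if nvidia-smi is on PATH.
        num_gpus = 1 if shutil.which("nvidia-smi") else 0
    cmd = ["python", script, "with",
           f"num_gpus={num_gpus}", f"load_path={ckpt_path}"]
    if test_only:
        cmd.append("test_only=True")
    return cmd


# Mirrors the VQA example above; pass the list to subprocess.run to launch it.
print(" ".join(build_demo_command("demo_vqa.py", "weights/vilt_vqa.ckpt",
                                  num_gpus=0, test_only=True)))
```

Passing num_gpus explicitly skips the heuristic, which is the safer choice on machines where nvidia-smi is installed but no GPU is visible.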

Dataset Preparation

See DATA.md

Train New Models

See TRAIN.md

Evaluation

See EVAL.md

Citation

If you use any part of this code or the pretrained weights, please cite our paper.

@article{kim2021vilt,
  title={ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision},
  author={Kim, Wonjae and Son, Bokyung and Kim, Ildoo},
  journal={arXiv preprint arXiv:2102.03334},
  year={2021}
}

Contact for Issues

Comments
  • Add ViLT to HuggingFace Transformers

    Hi,

    I've been reading the ViLT paper and was impressed by the simplicity, as it only adds text embeddings to a ViT.

    As ViT is already available in HuggingFace Transformers, adding ViLT should be relatively easy.

    I've currently implemented the model (see here for my current implementation). It includes a conversion script (convert_vilt_original_to_pytorch.py) to convert the weights from this repository (the PyTorch Lightning module) to its HuggingFace counterpart, for all models (base one + the ones with a head on top).

    However, I'm facing some issues when performing a forward pass with the original implementation in Google Colab (when just doing pip install -r requirements.txt and running the demo_vqa.py script, you get the following):

    Traceback (most recent call last):
      File "demo_vqa.py", line 17, in <module>
        from vilt.modules import ViLTransformerSS
      File "/content/ViLT/vilt/modules/__init__.py", line 1, in <module>
        from .vilt_module import ViLTransformerSS
      File "/content/ViLT/vilt/modules/vilt_module.py", line 3, in <module>
        import pytorch_lightning as pl
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/__init__.py", line 62, in <module>
        from pytorch_lightning import metrics
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/__init__.py", line 14, in <module>
        from pytorch_lightning.metrics.metric import Metric
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/metric.py", line 23, in <module>
        from pytorch_lightning.metrics.utils import _flatten, dim_zero_cat, dim_zero_mean, dim_zero_sum
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/metrics/utils.py", line 18, in <module>
        from pytorch_lightning.utilities import rank_zero_warn
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/__init__.py", line 24, in <module>
        from pytorch_lightning.utilities.apply_func import move_data_to_device
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/apply_func.py", line 25, in <module>
        from torchtext.data import Batch
    ImportError: cannot import name 'Batch' from 'torchtext.data' (/usr/local/lib/python3.7/dist-packages/torchtext/data/__init__.py)
    
    If you suspect this is an IPython bug, please report it at:
        https://github.com/ipython/ipython/issues
    or send an email to the mailing list at ipython-dev@python.org
    
    You can print a more detailed traceback right now with "%tb", or use "%debug"
    to interactively debug it.
    
    Extra-detailed tracebacks for bug-reporting purposes can be enabled via:
        %config Application.verbose_crash=True
    

    Upgrading PyTorch Lightning to the latest version also returns an error:

    Traceback (most recent call last):
      File "demo_vqa.py", line 17, in <module>
        from vilt.modules import ViLTransformerSS
      File "/content/ViLT/vilt/modules/__init__.py", line 1, in <module>
        from .vilt_module import ViLTransformerSS
      File "/content/ViLT/vilt/modules/vilt_module.py", line 7, in <module>
        from vilt.modules import heads, objectives, vilt_utils
      File "/content/ViLT/vilt/modules/vilt_utils.py", line 11, in <module>
        from vilt.gadgets.my_metrics import Accuracy, VQAScore, Scalar
      File "/content/ViLT/vilt/gadgets/my_metrics.py", line 2, in <module>
        from pytorch_lightning.metrics import Metric
    ModuleNotFoundError: No module named 'pytorch_lightning.metrics'
    

    This is because PL deprecated the metrics module.

    Are you able to provide a simple Colab notebook to perform inference on an image+text pair?

    Thanks!

    opened by NielsRogge 13
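
Both tracebacks in this thread come from an import that only works with certain PyTorch Lightning releases. A minimal sketch of a tolerant import, assuming (this is not stated in the thread) that the Metric base class is available from the separate torchmetrics package when pytorch_lightning.metrics is missing. The helper names first_importable and find_metric_base are hypothetical.

```python
import importlib
from typing import Iterable, Optional


def first_importable(module_names: Iterable[str]):
    """Return the first module that imports cleanly, or None if none do."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            continue
    return None


def find_metric_base() -> Optional[type]:
    """Look up a Metric base class in the old location, then the assumed new one."""
    module = first_importable(["pytorch_lightning.metrics", "torchmetrics"])
    return getattr(module, "Metric", None) if module is not None else None
```

Pinning the versions listed in requirements.txt avoids the problem entirely; a shim like this only helps when the environment cannot be pinned.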
  • python run.py with data_root=/data/workspace/dataset num_gpus=4 num_nodes=1 task_finetune_irtr_f30k_randaug per_gpu_batchsize=4 load_path=

    python run.py with data_root=/data/workspace/dataset num_gpus=4 num_nodes=1 task_finetune_irtr_f30k_randaug per_gpu_batchsize=4 load_path="weights/vilt_200k_mlm_itm.ckpt"

    Saving latest checkpoint...
    INFO - lightning - Saving latest checkpoint...
    ERROR - ViLT - Failed after 1:05:38!
    Traceback (most recent calls WITHOUT Sacred internals):
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 524, in train
        self.train_loop.run_training_epoch()
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 572, in run_training_epoch
        batch_output = self.run_training_batch(batch, batch_idx, dataloader_idx)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 704, in run_training_batch
        self.trainer.hiddens)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 818, in training_step_and_backward
        result = self.training_step(split_batch, batch_idx, opt_idx, hiddens)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 339, in training_step
        training_step_output = self.trainer.accelerator_backend.training_step(args)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 158, in training_step
        return self._step(args)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 170, in _step
        output = self.trainer.model(*args)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/overrides/data_parallel.py", line 179, in forward
        output = self.module.training_step(*inputs[0], **kwargs[0])
      File "/data/workspace/ViLT/vilt/modules/vilt_module.py", line 219, in training_step
        vilt_utils.set_task(self)
      File "/data/workspace/ViLT/vilt/modules/vilt_utils.py", line 177, in set_task
        picked = all_gather(current_tasks)
      File "/data/workspace/ViLT/vilt/modules/dist_utils.py", line 165, in all_gather
        size_list, tensor = _pad_to_largest_tensor(tensor, group)
      File "/data/workspace/ViLT/vilt/modules/dist_utils.py", line 129, in _pad_to_largest_tensor
        dist.all_gather(size_list, local_size, group=group)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1870, in all_gather
        work.wait()
    RuntimeError: [/pytorch/third_party/gloo/gloo/transport/tcp/unbound_buffer.cc:84] Timed out waiting 1800000ms for recv operation to complete

    During handling of the above exception, another exception occurred:

    Traceback (most recent calls WITHOUT Sacred internals):
      File "/data/workspace/ViLT/run.py", line 72, in main
        trainer.fit(model, datamodule=dm)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
        results = self.accelerator_backend.train()
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
        results = self.ddp_train(process_idx=self.task_idx, model=model)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 305, in ddp_train
        results = self.train_or_test()
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/accelerators/accelerator.py", line 69, in train_or_test
        results = self.trainer.train()
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 555, in train
        self.train_loop.on_train_end()
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 200, in on_train_end
        self.check_checkpoint_callback(should_save=True, is_last=True)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/trainer/training_loop.py", line 234, in check_checkpoint_callback
        callback.on_validation_end(self.trainer, model)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 203, in on_validation_end
        self.save_checkpoint(trainer, pl_module)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 238, in save_checkpoint
        self._validate_monitor_key(trainer)
      File "/root/anaconda3/envs/vilt/lib/python3.7/site-packages/pytorch_lightning/callbacks/model_checkpoint.py", line 516, in _validate_monitor_key
        raise MisconfigurationException(m)
    pytorch_lightning.utilities.exceptions.MisconfigurationException: ModelCheckpoint(monitor='val/the_metric') not found in the returned metrics: ['irtr/train/irtr_loss', 'itm/train/loss', 'itm/train/wpa_loss', 'itm/train/accuracy']. HINT: Did you call self.log('val/the_metric', tensor) in the LightningModule?

    Epoch 0: 0%| | 24/9691 [30:14<202:59:08, 75.59s/it, loss=0.579, v_num=0]

    opened by raojay7 9
  • got error when I turn on mpp (Masked Patch Prediction)

    Hi, @dandelin

    When I turn on the mpp (Masked Patch Prediction), I get this error:

    AttributeError: 'VisionTransformer' object has no attribute 'mask_token'

    The error is raised in vision_transformer.py. Could you please tell me how to address it?

    Thank you for your help.

    Best regards, Ge-Peng.

    opened by GewelsJI 8
  • Unable to reproduce the 100k results

    Dear Authors, thanks for open-sourcing the code. I tried pretraining for 100k steps and fine-tuning on VQAv2, but my test-dev score is about 65, unlike the 70.8 reported in the paper.

    Here are my pretraining and fine-tuning commands:

    python run.py with data_root=vilt_dataset/ \
    	num_gpus=8 num_nodes=8 task_mlm_itm whole_word_masking=True step100k \
    	per_gpu_batchsize=64 exp_name=pretrain 
    
    python run.py with data_root=vilt_dataset/ \
    	num_gpus=8 num_nodes=1 task_finetune_vqa_randaug \
    	per_gpu_batchsize=32 load_path="result/pretrain_seed0_from_/version_0/checkpoints/last.ckpt" \
    	exp_name=vqa_finetune
    

    Generate JSON with

    python run.py with data_root=vilt_dataset/ \
    	num_gpus=4 num_nodes=1 task_finetune_vqa \
    	per_gpu_batchsize=256 load_path="result/vqa_finetune_seed0_from_last/version_0/checkpoints/last.ckpt" \
    	test_only=True  exp_name="test_vqa"
    

    Here are my pretraining and fine-tuning TensorBoard logs: [three screenshots, dated 2021-06-10]

    opened by JACKHAHA363 8
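
When comparing a run against the paper, it can help to check how many samples each optimizer step actually sees. The sketch below is plain arithmetic over the flags in the commands above. It assumes that training reaches a fixed target global batch size through gradient accumulation. That assumption, the helper names, and the target of 4096 in the usage line are illustrative and not taken from this thread.

```python
def samples_per_forward(per_gpu_batchsize: int, num_gpus: int, num_nodes: int) -> int:
    """Samples processed across all devices in one forward/backward pass."""
    return per_gpu_batchsize * num_gpus * num_nodes


def grad_accumulation_steps(target_batch_size: int, per_gpu_batchsize: int,
                            num_gpus: int, num_nodes: int) -> int:
    """Accumulation steps needed to reach the (assumed) target global batch."""
    per_forward = samples_per_forward(per_gpu_batchsize, num_gpus, num_nodes)
    if per_forward <= 0:
        raise ValueError("batch size, GPU count and node count must be positive")
    return max(target_batch_size // per_forward, 1)


# Pretraining command above: 64 per GPU on 8 GPUs x 8 nodes.
print(samples_per_forward(64, 8, 8))            # 4096
# Same flags on a single node, with a hypothetical target of 4096.
print(grad_accumulation_steps(4096, 64, 8, 1))  # 8
```

If the effective batch size differs between two runs, the learning-rate schedule and the number of samples seen in 100k steps differ as well, which is one thing to rule out before looking elsewhere.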
  • Possible out-of-memory issue of dataloader

    Hello,

    I have read through your code, but haven't run the code yet. One question about the dataloader implementation. According to

    https://github.com/dandelin/ViLT/blob/master/vilt/datasets/base_dataset.py#L43

    You load all the arrow files into memory. The pre-training data amount to hundreds of gigabytes. Could this cause an out-of-memory issue? Or does this implementation assume a machine with large memory?

    Thanks,

    opened by zhiqiangdon 4
  • AttributeError: module 'vilt' has no attribute 'modules'

    I run into the following error when I run the "Evaluate VQAv2" command:

    File "run.py", line 6, in <module>
      from vilt.modules import ViLTransformerSS
    File "ViLT/vilt/moudles/vilt/moudules/init.py", line 1, in <module>
      from .vilt_module import ViLTransformerSS
    File "ViLT/vilt/moudles/vilt_moudule.py", line 4, in <module>
      import vilt.module.vision_transformer as vit
    AttributeError: module 'vilt' has no attribute 'modules'

    opened by leonodelee 4
  • I reproduced the code of pytorch version, but get different result

    In image retrieval, the R@1 is 68.4, which is higher than the 61.9 in the paper. In text retrieval, the R@1 is 73.5, which is lower than the 81.4 in the paper.

    So I wonder whether the input format in my code is wrong.

    For images, I use the "pixelbert_transform" function with size=384. For text, I use the BERT base tokenizer with max length 40, which includes [CLS] and the word tokens but no [SEP]. For Flickr30k, I use dataset_flickr30k.json to get the test set, and I chose the first of the five captions for each image.

    Thanks very much for your help!

    opened by NostalgiaOfTime 4
  • Why is answers set to 0 for irtr even for the positive case?

    Hello, thanks for the amazing repository. If I understand correctly, for IRTR the answer should be 1 for the first element, which is the true one, and 0 for the remaining false texts (https://github.com/dandelin/ViLT/blob/master/vilt/modules/objectives.py#L429).

    But in the code, it sets all of them including the positive sample to be zero. Am I missing something here? Thanks!

    opened by TheShadow29 3
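
One possible reading, not confirmed in this thread: if the loss is a softmax cross-entropy over the candidate texts and the positive is always placed in slot 0, then the "answer" is a class index rather than a per-candidate binary label, so an all-zero answer tensor means "the correct candidate is at index 0" for every query. A small pure-Python sketch of that idea follows. The scores are made up and the function is not the repository's code.

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], target_index: int) -> float:
    """Softmax cross-entropy for one query; the target is a class index."""
    m = max(scores)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target_index]


# One query scored against 1 positive (slot 0) and 3 negatives.
loss_when_positive_wins = cross_entropy([5.0, 0.0, 0.0, 0.0], target_index=0)
loss_when_negative_wins = cross_entropy([0.0, 5.0, 0.0, 0.0], target_index=0)
print(loss_when_positive_wins < loss_when_negative_wins)  # True
```

Under this reading, the target 0 never labels the positive as "false"; it only names which slot should receive the highest score.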
  • Can you share 'vqa_dict.json' file for vqa_demo?

    Hi, @dandelin. Thank you for your interesting work. I am running demo_vqa.py and it fails at line 38 (https://dl.dropboxusercontent.com/s/otya4i5sagt4f5p/vqa_dict.json) because this link is no longer available. Could you share this file? Also, the vilt_200k_mlm_itm.ckpt link is no longer available either. Looking forward to your reply!

    opened by Senwang98 3
  • Reproduce Flickr30k Evaluation results - DataSet problem

    Hello again @dandelin ,

    I was trying to reproduce the steps from https://github.com/dandelin/ViLT/blob/master/EVAL.md to get the results from Flickr30k T2IR.

    First I did what is suggested in https://github.com/dandelin/ViLT/blob/master/DATA.md.

    So I have in a folder /content/flickr30k this structure:

        /content/flickr30k
         ├── flickr30k_images            
         │   ├── ....jpg
         |   ├── ....jpg
         ├── karpathy          
             ├── dataset_flickr30k.json              
    

    Then I do the transformation:

    from vilt.utils.write_f30k_karpathy import make_arrow
    make_arrow( '/content/flickr30k',  '/content/arrow')
    

    But when I run:

    python run.py with data_root='/content/arrow' num_gpus=1 num_nodes=1 per_gpu_batchsize=4 task_finetune_irtr_f30k_randaug test_only=True load_path="/content/TFM_Sparse_Embeddings/vilt_irtr_f30k.ckpt"
    

    I get the error:

    ERROR - ViLT - Failed after 0:00:06!
    Traceback (most recent calls WITHOUT Sacred internals):
      File "run.py", line 73, in main
        trainer.test(model, datamodule=dm)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 755, in test
        results = self.__test_given_model(model, test_dataloaders)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 820, in __test_given_model
        results = self.fit(model)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
        results = self.accelerator_backend.train()
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
        results = self.ddp_train(process_idx=self.task_idx, model=model)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 305, in ddp_train
        results = self.train_or_test()
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/accelerators/accelerator.py", line 67, in train_or_test
        results = self.trainer.run_test()
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 662, in run_test
        eval_loop_results, _ = self.run_evaluation(test_mode=True)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/trainer.py", line 566, in run_evaluation
        dataloaders, max_batches = self.evaluation_loop.get_evaluation_dataloaders(max_batches)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/evaluation_loop.py", line 56, in get_evaluation_dataloaders
        self.trainer.reset_test_dataloader(model)
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/data_loading.py", line 299, in reset_test_dataloader
        self._reset_eval_dataloader(model, 'test')
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/trainer/data_loading.py", line 249, in _reset_eval_dataloader
        num_batches = len(dataloader) if has_len(dataloader) else float('inf')
      File "/usr/local/lib/python3.7/dist-packages/pytorch_lightning/utilities/data.py", line 33, in has_len
        raise ValueError('`Dataloader` returned 0 length.'
    ValueError: `Dataloader` returned 0 length. Please make sure that your Dataloader at least returns 1 batch
    
    opened by JoanFM 2
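
A dataloader of length 0 usually means the dataset found no rows, so a quick first check is whether the directory passed as data_root actually contains non-empty .arrow files for the split being evaluated. The sketch below assumes that each split is stored as "<name>.arrow" directly under data_root. The helper name and the split name in the usage line are hypothetical and should be replaced with whatever make_arrow wrote.

```python
from pathlib import Path
from typing import Iterable, List


def missing_arrow_files(data_root: str, expected_names: Iterable[str]) -> List[str]:
    """Return the expected split names whose .arrow file is absent or empty."""
    root = Path(data_root)
    missing = []
    for name in expected_names:
        path = root / f"{name}.arrow"
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Hypothetical split name; list the real ones with sorted(Path(root).glob("*.arrow")).
print(missing_arrow_files("/content/arrow", ["f30k_caption_karpathy_test"]))
```

An empty result only shows that the files exist; it does not verify that they contain the expected columns or rows.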
  • small difference between paper and code about token type embedding

    Thanks for your paper and code; they help me a lot. One small point confuses me. In Section 3.1 of your paper, the text embedding consists of a word embedding, a position embedding, and a modal-type embedding.

    while in the source code of vilt/modules/vilt_module.py, the text_embedding is implemented by:

    from transformers.models.bert.modeling_bert import BertConfig, BertEmbeddings
    ...
      self.text_embeddings = BertEmbeddings(bert_config)
    

    and an extra token-type embedding, self.token_type_embeddings = nn.Embedding(2, config["hidden_size"]).

    As far as I know, BertEmbeddings already contains a token-type embedding, so there are actually two token-type embeddings for text input and one for image input. I understand that self.token_type_embeddings is used as the modal-type embedding to distinguish between image and text. Is this a mistake? Is it OK not to remove the token-type embedding inside BertEmbeddings(bert_config)? Will it make any difference? I hope for your reply, thanks!

    opened by AAbathur 2
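
To make the question concrete, here is a toy sketch of how the two tables would combine, assuming (this is an assumption, not something confirmed in the thread) that text inputs always use token type 0 inside the BERT embedding layer. In that case the BERT token-type row adds the same vector to every text token and behaves like a learned bias on top of the modal-type embedding. All tables and values below are made up.

```python
from typing import List

Vector = List[float]


def add(*vectors: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


# Toy 2-dimensional tables; every value is invented for illustration.
word = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
position = [[0.1, 0.1], [0.2, 0.2]]
bert_token_type = [[0.5, -0.5], [9.0, 9.0]]  # table inside the BERT embedding layer
modal_type = [[0.0, 2.0], [3.0, 0.0]]        # 0 = text, 1 = image


def text_embedding(token: str, pos: int) -> Vector:
    # Assumption: the BERT token type is always 0 for text, so row 0 is a constant offset.
    return add(word[token], position[pos], bert_token_type[0], modal_type[0])


def text_embedding_without_bert_type(token: str, pos: int) -> Vector:
    return add(word[token], position[pos], modal_type[0])
```

Under this assumption the difference between the two functions is the same vector for every token and position, so the extra table would change the learned parameters but not the kind of function the model can represent.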
  • pyarrow.lib.ArrowInvalid: Not an Arrow file

    While running fine-tuning for VQAv2 with the following command:

    python run.py with data_root=/data2/dsets/dataset num_gpus=8 num_nodes=1 task_finetune_vqa_randaug per_gpu_batchsize=64 load_path="weights/vilt_200k_mlm_itm.ckpt"

    I'm encountering the following error:

    WARNING - root - Changed type of config entry "max_steps" from int to NoneType
    WARNING - ViLT - No observers have been added to this run
    INFO - ViLT - Running command 'main'
    INFO - ViLT - Started
    Global seed set to 0
    INFO - lightning - Global seed set to 0
    GPU available: True, used: True
    INFO - lightning - GPU available: True, used: True
    TPU available: None, using: 0 TPU cores
    INFO - lightning - TPU available: None, using: 0 TPU cores
    Using environment variable NODE_RANK for node rank (0).
    INFO - lightning - Using environment variable NODE_RANK for node rank (0).
    LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    INFO - lightning - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
    Using native 16bit precision.
    INFO - lightning - Using native 16bit precision.
    Missing logger folder: result/finetune_vqa_randaug_seed0_from_vilt_200k_mlm_itm
    WARNING - lightning - Missing logger folder: result/finetune_vqa_randaug_seed0_from_vilt_200k_mlm_itm
    Global seed set to 0
    INFO - lightning - Global seed set to 0
    initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
    INFO - lightning - initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
    INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:1 to store for rank: 0
    INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
    ERROR - ViLT - Failed after 0:00:06!
    Traceback (most recent call last):
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/experiment.py", line 312, in run_commandline
        return self.run(
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/experiment.py", line 276, in run
        run()
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/run.py", line 238, in __call__
        self.result = self.main_function(*args)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/config/captured_function.py", line 42, in captured_function
        result = wrapped(*args, **kwargs)
      File "run.py", line 71, in main
        trainer.fit(model, datamodule=dm)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
        results = self.accelerator_backend.train()
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
        results = self.ddp_train(process_idx=self.task_idx, model=model)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 268, in ddp_train
        self.trainer.call_setup_hook(model)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 859, in call_setup_hook
        self.datamodule.setup(stage_name)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
        return fn(*args, **kwargs)
      File "/home/imt2018525/ViLT/vilt/datamodules/multitask_datamodule.py", line 34, in setup
        dm.setup(stage)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
        return fn(*args, **kwargs)
      File "/home/imt2018525/ViLT/vilt/datamodules/vqav2_datamodule.py", line 19, in setup
        super().setup(stage)
      File "/home/imt2018525/ViLT/vilt/datamodules/datamodule_base.py", line 138, in setup
        self.set_val_dataset()
      File "/home/imt2018525/ViLT/vilt/datamodules/datamodule_base.py", line 88, in set_val_dataset
        self.val_dataset = self.dataset_cls(
      File "/home/imt2018525/ViLT/vilt/datasets/vqav2_dataset.py", line 16, in __init__
        super().__init__(
      File "/home/imt2018525/ViLT/vilt/datasets/base_dataset.py", line 43, in __init__
        tables = [
      File "/home/imt2018525/ViLT/vilt/datasets/base_dataset.py", line 44, in <listcomp>
        pa.ipc.RecordBatchFileReader(
      File "/home/imt2018525/.local/lib/python3.8/site-packages/pyarrow/ipc.py", line 94, in __init__
        self._open(source, footer_offset=footer_offset)
      File "pyarrow/ipc.pxi", line 624, in pyarrow.lib._RecordBatchFileReader._open
      File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
      File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
    pyarrow.lib.ArrowInvalid: Not an Arrow file

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "run.py", line 11, in <module>
        def main(_config):
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/experiment.py", line 190, in automain
        self.run_commandline()
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/experiment.py", line 347, in run_commandline
        print_filtered_stacktrace()
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/utils.py", line 493, in print_filtered_stacktrace
        print(format_filtered_stacktrace(filter_traceback), file=sys.stderr)
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/utils.py", line 528, in format_filtered_stacktrace
        return "".join(filtered_traceback_format(tb_exception))
      File "/home/imt2018525/.local/lib/python3.8/site-packages/sacred/utils.py", line 568, in filtered_traceback_format
        current_tb = tb_exception.exc_traceback
    AttributeError: 'TracebackException' object has no attribute 'exc_traceback'

    I am not sure whether this is an issue with the pyarrow version. Can someone help me resolve this error? Thanks in advance.

    opened by psrimanreddy 0
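
Before suspecting the pyarrow version, it may be worth checking whether the file on disk is an Arrow IPC file at all, for example not a truncated download or an HTML error page saved under an .arrow name. The Arrow IPC file format begins and ends with the magic bytes "ARROW1", so a rough check needs only the standard library. The helper name is hypothetical, and the check is a heuristic that does not validate the file's contents.

```python
import os
from pathlib import Path

ARROW_MAGIC = b"ARROW1"  # magic bytes at the start and end of an Arrow IPC file


def looks_like_arrow_file(path: str) -> bool:
    """Heuristic check of the leading and trailing magic bytes only."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < 2 * len(ARROW_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(len(ARROW_MAGIC))
        f.seek(-len(ARROW_MAGIC), os.SEEK_END)
        tail = f.read(len(ARROW_MAGIC))
    return head == ARROW_MAGIC and tail == ARROW_MAGIC
```

Running this over every .arrow file under data_root should point to the file that the reader rejects, if the cause is a damaged or mislabeled file.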
  • Mistakes in vqa_dict.json ?

    Hey, I ran demo_vqa.py and did some further checks. I found that the "ids" in vqa_dict.json (which is downloaded from the URL "https://github.com/dandelin/ViLT/releases/download/200k/" in demo_vqa.py) are missing the id "125". That is, the ids jump from "124" to "126", which caused some bugs. Could you please check this and tell me which answer originally pairs with id "125"? Thanks a lot!

    opened by Rom-Worker 0
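
A gap like this can be confirmed programmatically. The sketch below assumes that the ids are integer-like strings used as keys of a mapping. That structure is a guess at how vqa_dict.json is organized, and the helper name is hypothetical.

```python
from typing import Iterable, List


def missing_ids(keys: Iterable[str]) -> List[int]:
    """Return the integers absent between the smallest and largest id."""
    ids = sorted(int(k) for k in keys)
    if not ids:
        return []
    present = set(ids)
    return [i for i in range(ids[0], ids[-1] + 1) if i not in present]


print(missing_ids(["123", "124", "126"]))  # [125]
```

With the real file, the keys would come from json.load on vqa_dict.json; code that indexes answers by the model's argmax should guard against any id this function reports.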
  • The problem of fine-flickr30k

    Hello, what configuration does your vilt use when fine-tuning flickr30k? Eight gpus? What is the memory of each gpu? What is the result of fine-tuning flickr30k? Is it a weight file?

    opened by wuqiang12345 0
  • pretrain datasets

    Hello, the author, great work! As time goes by, a lot of image urls in the dataset become invalid. Is there any solution? Could you provide the data arrow?

    opened by mactavish91 0
  • Question about train on coco dataset

    Traceback (most recent calls WITHOUT Sacred internals):
      File "run.py", line 71, in main
        trainer.fit(model, datamodule=dm)
      File "/home/amax/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 473, in fit
        results = self.accelerator_backend.train()
      File "/home/amax/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 152, in train
        results = self.ddp_train(process_idx=self.task_idx, model=model)
      File "/home/amax/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/accelerators/ddp_accelerator.py", line 268, in ddp_train
        self.trainer.call_setup_hook(model)
      File "/home/amax/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 859, in call_setup_hook
        self.datamodule.setup(stage_name)
      File "/home/amax/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
        return fn(*args, **kwargs)
      File "/data/zjw/ViLT/vilt/datamodules/multitask_datamodule.py", line 34, in setup
        dm.setup(stage)
      File "/home/amax/anaconda3/envs/vilt/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
        return fn(*args, **kwargs)
      File "/data/zjw/ViLT/vilt/datamodules/datamodule_base.py", line 137, in setup
        self.set_train_dataset()
      File "/data/zjw/ViLT/vilt/datamodules/datamodule_base.py", line 76, in set_train_dataset
        self.train_dataset = self.dataset_cls(
      File "/data/zjw/ViLT/vilt/datasets/coco_caption_karpathy_dataset.py", line 17, in __init__
        super().__init__(*args, **kwargs, names=names, text_column_name="caption")
      File "/data/zjw/ViLT/vilt/datasets/base_dataset.py", line 53, in __init__
        self.table_names += [name] * len(tables[i])
    IndexError: list index out of range

    opened by Bmilab22 0
Owner
Wonjae Kim