Pythia: Facebook's latest open source visual and language multitasking learning framework

2022-07-27 21:52:00 【Xiaobai learns vision】

Today, Facebook released Pythia, a new multitask learning framework. It is built on PyTorch and targets joint vision-and-language tasks. Pythia is a modular, plug-and-play framework that lets data scientists and machine learning developers quickly build, reproduce, and benchmark models.

Project address: https://github.com/facebookresearch/pythia

What is Pythia?

Pythia is a deep learning framework that supports multitask learning across vision and language. It is built on the open-source PyTorch library, and its modular, plug-and-play design lets researchers build models quickly. Pythia is aimed at vision-and-language tasks such as answering questions about visual data and automatic image captioning.
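The plug-and-play idea can be illustrated with a name-to-class registry: each component registers itself under a string name, and a config file then selects components by name without touching the training loop. The sketch below is a minimal, hypothetical stand-in in plain Python; the names `Registry`, `register`, `build`, and `ToyVQAModel` are assumptions for illustration and are not Pythia's actual API.

```python
# Hypothetical sketch of a plug-and-play component registry.
# Registry/register/build are illustrative names, not Pythia's API.
from typing import Callable, Dict


class Registry:
    """Maps string names (as they might appear in a config file) to classes."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, type]] = {}

    def register(self, kind: str, name: str) -> Callable[[type], type]:
        """Class decorator that records `cls` under (kind, name)."""
        def decorator(cls: type) -> type:
            bucket = self._entries.setdefault(kind, {})
            if name in bucket:
                raise KeyError(f"{kind} '{name}' is already registered")
            bucket[name] = cls
            return cls
        return decorator

    def build(self, kind: str, name: str, **kwargs):
        """Instantiate a registered component by its name."""
        try:
            cls = self._entries[kind][name]
        except KeyError:
            raise KeyError(f"unknown {kind}: '{name}'") from None
        return cls(**kwargs)


registry = Registry()


@registry.register("model", "toy_vqa")
class ToyVQAModel:
    def __init__(self, hidden_dim: int = 16) -> None:
        self.hidden_dim = hidden_dim


# A config entry selects the class by name; no training-loop changes needed.
config = {"model": "toy_vqa", "hidden_dim": 32}
model = registry.build("model", config["model"], hidden_dim=config["hidden_dim"])
print(type(model).__name__, model.hidden_dim)  # ToyVQAModel 32
```

Adding a new model, loss, or metric in this style only means writing a class and decorating it; the code that reads the config stays unchanged.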

Besides distributed training and a variety of datasets, Pythia supports custom loss functions, metrics, schedulers, and optimizers. It also provides common vision and language layers, all of which support distributed training. Another feature is its many built-in corpora, including VQA, VizWiz, TextVQA, and VisualDialog, which can be used for multitask learning: Pythia can train a single multitask model on several corpora at the same time.
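To make "a single model trained on several corpora at the same time" concrete, here is one common way to interleave batches from multiple datasets: pick the next dataset with probability proportional to how many batches it still has left. This is a hypothetical, framework-free sketch in plain Python; the function name `multitask_batches` and the proportional sampling strategy are assumptions, and it does not reproduce Pythia's own multitask sampler.

```python
import random
from typing import Dict, Iterator, Sequence, Tuple


def multitask_batches(
    datasets: Dict[str, Sequence], seed: int = 0
) -> Iterator[Tuple[str, object]]:
    """Interleave batches from several datasets for one multitask epoch.

    At each step a dataset is chosen with probability proportional to the
    number of batches it still has left, so larger datasets are visited more
    often and every batch is yielded exactly once, in its original order.
    """
    rng = random.Random(seed)
    iterators = {name: iter(batches) for name, batches in datasets.items()}
    remaining = {name: len(batches) for name, batches in datasets.items()}
    while any(remaining.values()):
        names = [n for n, left in remaining.items() if left > 0]
        weights = [remaining[n] for n in names]
        name = rng.choices(names, weights=weights, k=1)[0]
        remaining[name] -= 1
        yield name, next(iterators[name])


# Toy "batches": in real training these would be tensors from each corpus.
toy = {
    "vqa": [f"vqa_batch_{i}" for i in range(6)],
    "textvqa": [f"textvqa_batch_{i}" for i in range(3)],
    "vizwiz": [f"vizwiz_batch_{i}" for i in range(1)],
}
for task, batch in multitask_batches(toy, seed=42):
    # A shared model would route `batch` to the head and loss for `task` here.
    print(task, batch)
```

Proportional sampling keeps small datasets from being over-repeated; other schemes (round-robin, fixed task weights) are equally valid design choices.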

Overall, Pythia's main features are as follows:

Model Zoo: reference implementations of state-of-the-art vision and language models, including LoRRA (SoTA on VQA and TextVQA), the Pythia model (VQA 2018 Challenge winner), and BAN.
Multitasking: multitask support, allowing simultaneous training on multiple datasets.
Datasets: built-in support for multiple datasets, including VQA, VizWiz, TextVQA, and VisualDialog.
Modules: implementations of many layers commonly used in vision and language.
Distributed: support for distributed training based on DataParallel and DistributedDataParallel.
Unopinionated: makes no assumptions about the datasets and model implementations built on top of it.
Customization: custom losses, metrics, scheduling, optimizers, and TensorBoard, to suit your customization needs.
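DistributedDataParallel runs one process per GPU, so each process must see a different slice of the data (DataParallel, by contrast, splits each batch across GPUs inside a single process). The plain-Python sketch below illustrates the partitioning scheme behind PyTorch's `torch.utils.data.distributed.DistributedSampler`: shuffle with a shared seed, pad to an even multiple of the number of processes, then stride by rank. The function `shard_indices` is a hypothetical helper for illustration; it is neither Pythia code nor the PyTorch implementation itself (which shuffles with `torch.randperm`, so the exact order differs).

```python
import math
import random
from typing import List


def shard_indices(
    dataset_len: int,
    world_size: int,
    rank: int,
    shuffle: bool = True,
    seed: int = 0,
    epoch: int = 0,
) -> List[int]:
    """Return the sample indices that process `rank` of `world_size` handles."""
    indices = list(range(dataset_len))
    if shuffle:
        # Same seed on every process, so all ranks agree on the permutation.
        random.Random(seed + epoch).shuffle(indices)
    num_samples = math.ceil(dataset_len / world_size)
    total_size = num_samples * world_size
    padding = total_size - len(indices)
    if padding > 0:
        # Repeat indices from the front so every rank gets the same count.
        indices += (indices * math.ceil(padding / len(indices)))[:padding]
    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total_size:world_size]


for rank in range(3):
    print(rank, shard_indices(10, world_size=3, rank=rank, shuffle=False))
# 0 [0, 3, 6, 9]
# 1 [1, 4, 7, 0]
# 2 [2, 5, 8, 1]
```

Padding means a few samples are seen twice per epoch, which is the trade-off for keeping all processes in lockstep.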

What is Pythia used for?

Pythia contains elements of Facebook's winning entries in recent AI competitions (the VQA 2018 Challenge and the VizWiz 2018 Challenge). Its reference implementations show how previous state-of-the-art models reached their benchmark results and make it quick to evaluate new models. Besides multitasking, Pythia also supports distributed training, a range of datasets, and custom loss functions, metrics, schedulers, and optimizers.

Pythia official documentation: https://learnpythia.readthedocs.io/en/latest/

Pythia can be used for multimodal vision-and-language research projects. A typical example, illustrated with a figure in the original article, is visual question answering, where a model has to learn from both the image and the text.

How to use Pythia

Installing Pythia is simple, and its dependencies are installed automatically:

# Clone Pythia repository
git clone https://github.com/facebookresearch/pythia ~/pythia

# Install dependencies and setup
cd ~/pythia
python setup.py develop

Getting the data

Each currently supported dataset requires two parts: features and an ImDB. For example, for TextVQA we need to download the following data and pretrained weights:

cd ~/pythia;
# Create data folder
mkdir -p data && cd data;

# Download and extract the features
wget https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
tar xf open_images.tar.gz

# Get vocabularies
wget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
tar xf vocab.tar.gz

# Download detectron weights required by some models
wget http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz
tar xf detectron_weights.tar.gz

# Download and extract ImDB
mkdir -p imdb && cd imdb
wget https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
tar xf textvqa_0.5.tar.gz
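For readers who prefer to script this step, the shell commands above can also be expressed with Python's standard library. This is a sketch under assumptions: `fetch_and_extract`, `download_textvqa`, and `TEXTVQA_ARCHIVES` are hypothetical names, the archive list simply mirrors the URLs and folders used above, and the archives are assumed to be trusted `.tar.gz` files.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download a .tar.gz archive into dest_dir (unless already there) and extract it in place."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():  # skip the download when re-running the script
        partial = archive.with_name(archive.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(archive)  # only keep fully downloaded files
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # safer extraction on newer Pythons
        else:
            tar.extractall(dest)
    return archive


# Hypothetical list: same archives and target folders as the shell commands above.
TEXTVQA_ARCHIVES = [
    ("https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz", "."),
    ("http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz", "."),
    ("http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz", "."),
    ("https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz", "imdb"),
]


def download_textvqa(data_root: str) -> None:
    """Fetch the TextVQA features, vocabularies, weights, and ImDB into data_root."""
    root = Path(data_root).expanduser()
    for url, subdir in TEXTVQA_ARCHIVES:
        fetch_and_extract(url, str(root / subdir))
```

Calling `download_textvqa("~/pythia/data")` fetches the same four archives as the shell commands; like them, it needs network access and enough disk space for the image features.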

Training

Once the data is downloaded, you can start training directly:

cd ~/pythia;
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
    configs/vqa/textvqa/lorra.yml

Inference

To run inference or generate predictions, download the corresponding pretrained model and run the following commands:

cd ~/pythia/data
mkdir -p models && cd models;

wget https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth
cd ../..

python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
    configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
    --evalai_inference 1 --run_type inference

A complete example is available on Colab: https://colab.research.google.com/drive/1Z9fsh10rFtgWe4uy8nvU4mQmqdokdIRR
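The `--evalai_inference 1` run appears to write predictions for submission to the EvalAI evaluation server, where VQA-style datasets such as TextVQA are scored by consensus with 10 human answers per question. The sketch below implements the commonly used VQA accuracy: a prediction scores min(1, matches / 3) against each leave-one-out subset of the annotators, averaged over the subsets. The function name is hypothetical, and answer normalization is simplified to lowercasing and stripping (the official evaluation scripts also handle punctuation, articles, and number words).

```python
from itertools import combinations
from typing import Sequence


def vqa_accuracy(prediction: str, human_answers: Sequence[str]) -> float:
    """VQA-style consensus accuracy for one question (simplified normalization).

    For each leave-one-out subset of the annotators, the prediction scores
    min(1, matches / 3); the final score is the mean over those subsets.
    """
    if len(human_answers) < 2:
        raise ValueError("expected several human answers (VQA and TextVQA use 10)")
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    scores = [
        min(1.0, sum(a == pred for a in subset) / 3)
        for subset in combinations(answers, len(answers) - 1)
    ]
    return sum(scores) / len(scores)


answers = ["coca cola"] * 3 + ["coke"] * 7
print(round(vqa_accuracy("Coca Cola", answers), 4))  # 0.9
print(vqa_accuracy("coke", answers))  # 1.0
print(vqa_accuracy("pepsi", answers))  # 0.0
```

An answer given by exactly three annotators scores 0.9 rather than 1.0, because dropping one of those three annotators leaves only two matches in that subset.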

Why Pythia matters

Pythia makes it easier to enter the fast-developing vision-and-language subfield and lets researchers focus on faster prototyping and experimentation. Facebook's goal is to accelerate progress by improving the reproducibility of these models and their results, so that the community can more easily build successful systems and benchmark them.

Facebook hopes that, with some of these obstacles removed, researchers can more quickly develop new ways for people to communicate with intelligent machines. This work should also help researchers build adaptive AI that combines multiple kinds of understanding into a more context-based, multimodal understanding. Beyond this open-source release, Facebook plans to add more tools, tasks, datasets, and reference models.

Reference article: https://code.fb.com/ai-research/pythia/

Copyright notice
This article was created by [Xiaobai learns vision]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/199/202207151351464860.html
