Pythia: Facebook's latest open source visual and language multitasking learning framework
2022-07-27 21:52:00 【Xiaobai learns vision】
Today, Facebook released Pythia, a new multi-task learning framework. It is built on PyTorch and targets joint vision-and-language tasks. Pythia is a modular, plug-and-play framework that lets data scientists and machine learning developers quickly build, reproduce, and benchmark models.
Project address :https://github.com/facebookresearch/pythia
What is Pythia?
Pythia is a deep learning framework that supports multi-task learning across vision and language. It is built on the open-source PyTorch, and its modular, plug-and-play design lets researchers build models quickly. Pythia is designed for vision-and-language tasks such as answering questions about visual data and automatic image captioning.
Beyond distributed training and a variety of datasets, Pythia supports custom loss functions, metrics, schedulers, and optimizers. It also provides common vision-and-language layers, all of which support distributed training. Another feature is its set of built-in datasets, including VQA, VizWiz, TextVQA, and VisualDialog, which can be used for multi-task learning: Pythia can train a single multi-task model on several datasets at the same time.
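Training one model on several datasets at once means deciding, for every batch, which dataset it comes from. The sketch below is a hypothetical, framework-agnostic illustration of one common strategy, sampling the source dataset in proportion to its size; it is not Pythia's actual implementation, and the dataset names and sizes are made up.

```python
import random
from typing import Dict, List


def proportional_task_schedule(
    dataset_sizes: Dict[str, int], num_batches: int, seed: int = 0
) -> List[str]:
    """Choose a source dataset for each batch, weighted by dataset size.

    Hypothetical helper for illustration; it is not part of Pythia's API.
    """
    rng = random.Random(seed)  # a private, seeded RNG keeps the schedule reproducible
    names = list(dataset_sizes)
    weights = [dataset_sizes[name] for name in names]
    return rng.choices(names, weights=weights, k=num_batches)


# Made-up sizes: "vqa2" is four times larger than "textvqa",
# so roughly 80% of the batches should be drawn from it.
schedule = proportional_task_schedule({"vqa2": 400, "textvqa": 100}, num_batches=1000)
print(schedule[:5])
print(schedule.count("vqa2") / len(schedule))
```

Sampling by size behaves much like concatenating the datasets; changing the weights is a simple way to upsample a small dataset so it is not drowned out by a large one.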
Overall, Pythia's main features are:
Model Zoo: reference implementations of state-of-the-art (SoTA) vision-and-language models, including LoRRA (SoTA on VQA and TextVQA), the Pythia model (winner of the VQA 2018 challenge), and BAN.
Multi-tasking: support for training on multiple datasets simultaneously.
Datasets: built-in support for several datasets, including VQA, VizWiz, TextVQA, and VisualDialog.
Modules: implementations of many layers commonly used in vision and language.
Distributed: distributed training based on DataParallel and DistributedDataParallel.
Unopinionated: makes no assumptions about the datasets and model implementations built on top of it.
Customization: custom loss functions, metrics, scheduling, optimizers, and TensorBoard logging to meet custom needs.
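Plug-and-play customization like this is often built on a registry: components register themselves under a name, and a configuration file selects them by that name. The following is a minimal sketch of the pattern; the `Registry` class and the registered names are hypothetical and are not Pythia's actual registry API.

```python
from typing import Callable, Dict, List


class Registry:
    """Map names from a config to user-supplied components (hypothetical)."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Callable]] = {}

    def register(self, kind: str, name: str) -> Callable[[Callable], Callable]:
        """Decorator that stores a component under (kind, name)."""
        def decorator(fn: Callable) -> Callable:
            bucket = self._items.setdefault(kind, {})
            if name in bucket:
                raise KeyError(f"{kind} '{name}' is already registered")
            bucket[name] = fn
            return fn
        return decorator

    def get(self, kind: str, name: str) -> Callable:
        """Look up a registered component, with a readable error if it is missing."""
        try:
            return self._items[kind][name]
        except KeyError:
            raise KeyError(f"no {kind} registered as '{name}'") from None


registry = Registry()


@registry.register("loss", "mean_abs_error")
def mean_abs_error(preds: List[float], targets: List[float]) -> float:
    """Example custom loss: average absolute difference."""
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(preds)


@registry.register("metric", "exact_match")
def exact_match(preds: List[str], targets: List[str]) -> float:
    """Example custom metric: fraction of predictions equal to the target."""
    return sum(p == t for p, t in zip(preds, targets)) / len(preds)


# A real config would come from a YAML file; here it is a plain dict.
config = {"loss": "mean_abs_error", "metric": "exact_match"}
loss_fn = registry.get("loss", config["loss"])
metric_fn = registry.get("metric", config["metric"])
print(loss_fn([1.0, 2.0], [0.0, 4.0]))  # 1.5
```

The point of the design is that the training loop only ever sees names from the config, so adding a new loss or metric never requires editing the framework itself.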
What is Pythia for?
Pythia incorporates elements of Facebook's winning entries in recent AI competitions (the VQA 2018 challenge and the VizWiz 2018 challenge). Its reference implementations show how previous SoTA models achieved their benchmark results, which makes it quick to evaluate new models against them. In addition to multi-tasking, Pythia supports distributed training, a range of datasets, and custom loss functions, metrics, schedulers, and optimizers.
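For context, results on VQA-style benchmarks such as VQA and TextVQA are commonly reported with a soft accuracy: a predicted answer gets full credit if at least 3 of the (typically 10) human annotators gave it, i.e. acc = min(#matching annotators / 3, 1). The sketch below implements only that simplified formula; the official evaluation scripts also normalize answer strings more thoroughly and average over annotator subsets, which this sketch does not do.

```python
from typing import List


def vqa_soft_accuracy(predicted: str, human_answers: List[str]) -> float:
    """Simplified VQA accuracy: min(#annotators who gave `predicted` / 3, 1).

    Only lowercases and strips whitespace; the official evaluation applies
    more answer normalization and averages over annotator subsets.
    """
    pred = predicted.strip().lower()
    matches = sum(1 for ans in human_answers if ans.strip().lower() == pred)
    return min(matches / 3.0, 1.0)


answers = ["red"] * 2 + ["dark red"] * 8
print(vqa_soft_accuracy("red", answers))       # 0.6666666666666666
print(vqa_soft_accuracy("dark red", answers))  # 1.0
```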
Pythia official documentation: https://learnpythia.readthedocs.io/en/latest/
Pythia can be used for vision-and-language multimodal research projects. The figure below shows visual question answering, a task that requires learning about both the image and the text.

How do you use Pythia?
Installing Pythia is simple, and its dependencies are installed automatically:
# Clone Pythia repository
git clone https://github.com/facebookresearch/pythia ~/pythia
# Install dependencies and setup
cd ~/pythia
python setup.py develop
Get data
The datasets Pythia currently supports require two parts: features and an ImDB. For TextVQA, for example, download the following data and pretrained weights:
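Each step in these commands is a download followed by `tar xf`. Where `wget` is unavailable, the same steps could be scripted with Python's standard library. This is a minimal sketch: the `fetch_and_extract` and `download_textvqa_data` helpers are hypothetical, do no retries or checksum verification, assume https works for every URL listed here, and assume the `~/pythia/data` layout used by the commands.

```python
import tarfile
import urllib.request
from pathlib import Path

# Host taken from the download commands in this article.
BASE_URL = "https://dl.fbaipublicfiles.com/pythia"


def fetch_and_extract(url: str, dest_dir: str) -> Path:
    """Download a .tar.gz archive into dest_dir (if missing) and extract it there."""
    dest = Path(dest_dir).expanduser()
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]  # keep the original file name
    if not archive.exists():  # skip the download when the archive is already present
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:*") as tar:
        # The "data" filter (Python 3.12+ and recent security releases)
        # rejects unsafe members such as absolute paths or "../" entries.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return archive


def download_textvqa_data(data_dir: str = "~/pythia/data") -> None:
    """Hypothetical helper mirroring the TextVQA download steps (not called here)."""
    fetch_and_extract(f"{BASE_URL}/features/open_images.tar.gz", data_dir)
    fetch_and_extract(f"{BASE_URL}/data/vocab.tar.gz", data_dir)
    fetch_and_extract(f"{BASE_URL}/data/detectron_weights.tar.gz", data_dir)
    fetch_and_extract(f"{BASE_URL}/data/imdb/textvqa_0.5.tar.gz", f"{data_dir}/imdb")
```

The equivalent shell commands are: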
cd ~/pythia;
# Create data folder
mkdir -p data && cd data;
# Download and extract the features
wget https://dl.fbaipublicfiles.com/pythia/features/open_images.tar.gz
tar xf open_images.tar.gz
# Get vocabularies
wget http://dl.fbaipublicfiles.com/pythia/data/vocab.tar.gz
tar xf vocab.tar.gz
# Download detectron weights required by some models
wget http://dl.fbaipublicfiles.com/pythia/data/detectron_weights.tar.gz
tar xf detectron_weights.tar.gz
# Download and extract ImDB
mkdir -p imdb && cd imdb
wget https://dl.fbaipublicfiles.com/pythia/data/imdb/textvqa_0.5.tar.gz
tar xf textvqa_0.5.tar.gz
Training
Once the data is downloaded, you can start training directly:
cd ~/pythia;
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml
Inference
To run inference or generate predictions, download the corresponding pretrained model and run the following commands:
cd ~/pythia/data
mkdir -p models && cd models;
wget https://dl.fbaipublicfiles.com/pythia/pretrained_models/textvqa/lorra_best.pth
cd ../..
python tools/run.py --tasks vqa --datasets textvqa --model lorra --config \
configs/vqa/textvqa/lorra.yml --resume_file data/models/lorra_best.pth \
--evalai_inference 1 --run_type inference
A complete example is available on Colab: https://colab.research.google.com/drive/1Z9fsh10rFtgWe4uy8nvU4mQmqdokdIRR
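When experiments are driven from a Python script or notebook, the training and inference invocations of `tools/run.py` above can be assembled programmatically. This is a sketch: `build_run_command` and `run_in_repo` are hypothetical helpers that reuse only the flags shown in the commands above, and the `~/pythia` location is assumed from the installation step.

```python
import os
import shlex
import subprocess
from typing import List, Optional


def build_run_command(resume_file: Optional[str] = None) -> List[str]:
    """Build argv for LoRRA on TextVQA: training by default, inference if resume_file is set.

    Hypothetical helper; it only reuses the flags from the commands above.
    """
    cmd = [
        "python", "tools/run.py",
        "--tasks", "vqa",
        "--datasets", "textvqa",
        "--model", "lorra",
        "--config", "configs/vqa/textvqa/lorra.yml",
    ]
    if resume_file is not None:
        # Flags copied from the inference command above.
        cmd += [
            "--resume_file", resume_file,
            "--evalai_inference", "1",
            "--run_type", "inference",
        ]
    return cmd


def run_in_repo(cmd: List[str], repo_dir: str = "~/pythia") -> None:
    """Run cmd from the repository root; raises CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, cwd=os.path.expanduser(repo_dir), check=True)


# Print the commands as shell strings instead of running them.
print(shlex.join(build_run_command()))
print(shlex.join(build_run_command("data/models/lorra_best.pth")))
```

Passing an argument list to `subprocess.run` (rather than a single shell string) avoids quoting problems, and `check=True` makes a failed run raise instead of passing silently.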
Why Pythia matters
Pythia smooths the path into the fast-developing vision-and-language subfield, letting researchers focus on faster prototyping and experimentation. Facebook's goal is to accelerate this process by improving the reproducibility of these models and results, so that the community can more easily build successful systems and benchmark them.
Facebook hopes that, with some of these barriers removed, researchers can more quickly develop new ways for people to communicate with intelligent machines. This work should also help researchers build more adaptive AI that combines multiple kinds of understanding into a more context-based, multimodal understanding. Beyond this open-source release, Facebook also plans to add more tools, tasks, datasets, and reference models.
Reference article: https://code.fb.com/ai-research/pythia/