Resources for our AAAI 2022 paper: "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

Overview

LOREN

Resources for our AAAI 2022 paper (pre-print): "LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification".

front

DEMO System

Check out our demo system! Note that the results will be slightly different from the paper, since we use an up-to-date Wikipedia as the evidence source whereas FEVER uses Wikipedia dated 2017.

Dependencies

  • CUDA > 11
  • Prepare requirements: pip3 install -r requirements.txt.
    • Also works for allennlp==2.3.0, transformers==4.5.1, torch==1.8.1.
  • Set environment variable $PJ_HOME: export PJ_HOME=/YOUR_PATH/LOREN/.

Download Pre-processed Data and Checkpoints

  • Pre-processed data at Google Drive. Unzip it and put them under LOREN/data/.

    • Data for training a Seq2Seq MRC is at data/mrc_seq2seq_v5/.
    • Data for training veracity prediction is at data/fact_checking/v5/*.json.
      • Note: dev.json uses ground truth evidence for validation, where eval.json uses predicted evidence for validation. This is consistent with the settings in KGAT.
    • Evidence retrieval models are not required for training LOREN, since we directly adopt the retrieved evidence from KGAT, which is at data/fever/baked_data/ (using only during pre-processing).
    • Original data is at data/fever/ (using only during pre-processing).
  • Pre-trained checkpoints at Huggingface Models. Unzip it and put them under LOREN/models/.

    • Checkpoints for veracity prediciton are at models/fact_checking/.
    • Checkpoint for generative MRC is at models/mrc_seq2seq/.
    • Checkpoints for KGAT evidence retrieval models are at models/evidence_retrieval/ (not used in training, displayed only for the sake of completeness).

Training LOREN from Scratch

For quick training and inference with pre-processed data & pre-trained models, please go to Veracity Prediction.

First, go to LOREN/src/.

1 Building Local Premises from Scratch

1) Extract claim phrases and generate questions

You'll need to download three external models in this step, i.e., two models from AllenNLP in parsing_client/sentence_parser.py and a T5-based question generation model in qg_client/question_generator.py. Don't worry, they'll be automatically downloaded.

  • Run python3 pproc_client/pproc_questions.py --roles eval train val test
  • This generates cached json files:
    • AG_PREFIX/answer.{role}.cache: extracted phrases are stored in the field answers.
    • QG_PREFIX/question.{role}.cache: generated questions are stored in the field cloze_qs, generate_qs and questions (two types of questions concatenated).

2) Train Seq2Seq MRC

Prepare self-supervised MRC data (only for SUPPORTED claims)
  • Run python3 pproc_client/pproc_mrc.py -o LOREN/data/mrc_seq2seq_v5.
  • This generates files for Seq2Seq training in a HuggingFace style:
    • data/mrc_seq2seq_v5/{role}.source: concatenated question and evidence text.
    • data/mrc_seq2seq_v5/{role}.target: answer (claim phrase).
Training Seq2Seq
  • Go to mrc_client/seq2seq/, which is modified based on HuggingFace's examples.
  • Follow script/train.sh.
  • The best checkpoint will be saved in $output_dir (e.g., models/mrc_seq2seq/).
    • Best checkpoints are decided by ROUGE score on dev set.

3) Run MRC for all questions and assemble local premises

  • Run python3 pproc_client/pproc_evidential.py --roles val train eval test -m PATH_TO_MRC_MODEL/.
  • This generates files:
    • {role}.json: files for veracity prediction. Assembled local premises are stored in the field evidential_assembled.

4) Building NLI prior

Before training veracity prediction, we'll need a NLI prior from pre-trained NLI models, such as DeBERTa.

  • Run python3 pproc_client/pproc_nli_labels.py -i PATH_TO/{role}.json -m microsoft/deberta-large-mnli.
  • Mind the order! The predicted classes [Contradict, Neutral, Entailment] correspond to [REF, NEI, SUP], respectively.
  • This generates files:
    • Adding a new field nli_labels to {role}.json.

2 Veracity Prediction

This part is rather easy (less pipelined :P). A good place to start if you want to skip the above pre-processing.

1) Training

  • Go to folder check_client/.
  • See what scripts/train_*.sh does.

2) Testing

  • Stay in folder check_client/
  • Run python3 fact_checker.py --params PARAMS_IN_THE_CODE
  • This generates files:
    • results/*.predictions.jsonl

3) Evaluation

  • Go to folder eval_client/

  • For Label Accuracy and FEVER score: fever_scorer.py

  • For CulpA (turn on --verbose in testing): culpa.py

Citation

If you find our paper or resources useful to your research, please kindly cite our paper (pre-print, official published paper coming soon).

@misc{chen2021loren,
      title={LOREN: Logic-Regularized Reasoning for Interpretable Fact Verification}, 
      author={Jiangjie Chen and Qiaoben Bao and Changzhi Sun and Xinbo Zhang and Jiaze Chen and Hao Zhou and Yanghua Xiao and Lei Li},
      year={2021},
      eprint={2012.13577},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
Owner
Jiangjie Chen
Ph.D. student.
Jiangjie Chen
Implementation for Paper "Inverting Generative Adversarial Renderer for Face Reconstruction"

StyleGAR TODO: add arxiv link Implementation of Inverting Generative Adversarial Renderer for Face Reconstruction TODO: for test Currently, some model

155 Oct 27, 2022
Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data"

Code and Datasets from the paper "Self-supervised contrastive learning for volcanic unrest detection from InSAR data" You can download the pretrained

Bountos Nikos 3 May 07, 2022
Deeper insights into graph convolutional networks for semi-supervised learning

deeper_insights_into_GCNs Deeper insights into graph convolutional networks for semi-supervised learning References data and utils.py come from Implem

Davidham3 17 Dec 16, 2022
This dlib-based facial login system

Facial-Login-System This dlib-based facial login system is a technology capable of matching a human face from a digital webcam frame capture against a

Mushahid Ali 3 Apr 23, 2022
This repo provides function call to track multi-objects in videos

Custom Object Tracking Introduction This repo provides function call to track multi-objects in videos with a given trained object detection model and

Jeff Lo 51 Nov 22, 2022
The 2nd Version Of Slothybot

SlothyBot Go to this website: "https://bitly.com/SlothyBot" The 2nd Version Of Slothybot. The Bot Has Many Features, Such As: Moderation Commands; Kic

Slothy 0 Jun 01, 2022
Code image classification of MNIST dataset using different architectures: simple linear NN, autoencoder, and highway network

Deep Learning for image classification pip install -r http://webia.lip6.fr/~baskiotisn/requirements-amal.txt Train an autoencoder python3 train_auto

Hector Kohler 0 Mar 30, 2022
Code for weakly supervised segmentation of a single class

SingleClassRL Implementation of weak single object segmentation from paper "Regularized Loss for Weakly Supervised Single Class Semantic Segmentation"

16 Nov 14, 2022
Just-Now - This Is Just Now Login Friendlist Cloner Tools

JUST NOW LOGIN FRIENDLIST CLONER TOOLS Install $ apt update $ apt upgrade $ apt

MAHADI HASAN AFRIDI 21 Mar 09, 2022
DAT4 - General Assembly's Data Science course in Washington, DC

DAT4 Course Repository Course materials for General Assembly's Data Science course in Washington, DC (12/15/14 - 3/16/15). Instructors: Sinan Ozdemir

Kevin Markham 779 Dec 25, 2022
Few-shot Learning of GPT-3

Few-shot Learning With Language Models This is a codebase to perform few-shot "in-context" learning using language models similar to the GPT-3 paper.

Tony Z. Zhao 224 Dec 28, 2022
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes

Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized C

Sam Bond-Taylor 139 Jan 04, 2023
This repository contains an implementation of the Permutohedral Attention Module in Pytorch

Permutohedral_attention_module This repository contains an implementation of the Permutohedral Attention Module

Samuel JOUTARD 26 Nov 27, 2022
ExCon: Explanation-driven Supervised Contrastive Learning

ExCon: Explanation-driven Supervised Contrastive Learning Link to the paper: https://arxiv.org/pdf/2111.14271.pdf Contributors of this repo: Zhibo Zha

Zhibo (Darren) Zhang 18 Nov 01, 2022
Using machine learning to predict and analyze high and low reader engagement for New York Times articles posted to Facebook.

How The New York Times can increase Engagement on Facebook Using machine learning to understand characteristics of news content that garners "high" Fa

Jessica Miles 0 Sep 16, 2021
The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation

PointNav-VO The Surprising Effectiveness of Visual Odometry Techniques for Embodied PointGoal Navigation Project Page | Paper Table of Contents Setup

Xiaoming Zhao 41 Dec 15, 2022
The implementation code for "DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction"

DAGAN This is the official implementation code for DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruct

TensorLayer Community 159 Nov 22, 2022
PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in clustering (CVPR2021)

PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering Jang Hyun Cho1, Utkarsh Mall2, Kavita Bala2, Bharath Harihar

Jang Hyun Cho 164 Dec 30, 2022
A time series processing library

Timeseria Timeseria is a time series processing library which aims at making it easy to handle time series data and to build statistical and machine l

Stefano Alberto Russo 11 Aug 08, 2022
Pytorch implementation of COIN, a framework for compression with implicit neural representations 🌸

COIN 🌟 This repo contains a Pytorch implementation of COIN: COmpression with Implicit Neural representations, including code to reproduce all experim

Emilien Dupont 104 Dec 14, 2022