One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".

Overview

Introduction

  • One implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".
  • Users can apply it to parse the input text from scratch, and get the EDU segmentations and the parsed tree structure.
  • The model supports both sentence-level and document-level RST discourse parsing.
  • This repo and the pre-trained model is only for research use.

Package Requirements

  1. pytorch==1.7.1
  2. transformers==4.8.2

Supported Languages

We trained and evaluated the model with the multilingual collection of RST discourse treebanks, and it natively supports 6 languages: English, Portuguese, Spanish, German, Dutch, Basque. Interested users can also try other languages.

Data Format

  • [Input] InputSentence: The input document/sentence, and the raw text will be tokenizaed and encoded by the xlm-roberta-base language backbone. '|| ' denotes the EDU boundary positions.

    • Although the report, || which has released || before the stock market opened, || didn't trigger the 190.58 point drop in the Dow Jones Industrial Average, || analysts said || it did play a role in the market's decline. ||
  • [Output] EDU_Breaks: The indices of the EDU boundary tokens, including the last word of the sentence.

    • [2, 5, 10, 22, 24, 33]
  • [Output] tree_parsing_output: The model outputs of the discourse parsing tree follow this format.

    • (1:Satellite=Contrast:4,5:Nucleus=span:6) (1:Nucleus=Same-Unit:3,4:Nucleus=Same-Unite:4) (5:Satellite=Attribution:5,6:Nucleus=span:6) (1:Satellite=span:1,2:Nucleus=Elaboration:3) (2:Nucleus=span:2,3:Satellite=Temporal:3)

How to use it for parsing

  • Put the text paragraph to the file ./data/text_for_inference.txt.
  • Run the script MUL_main_Infer.py to obtain the RST parsing result. See the script for detailed model output.
  • We recommend users to run the parser on a GPU-equipped environment.

Citation

@article{liu2021dmrst,
  title={DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing},
  author={Liu, Zhengyuan and Shi, Ke and Chen, Nancy F},
  journal={arXiv preprint arXiv:2110.04518},
  year={2021}
}
@inproceedings{liu2020multilingual,
  title={Multilingual Neural RST Discourse Parsing},
  author={Liu, Zhengyuan and Shi, Ke and Chen, Nancy},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={6730--6738},
  year={2020}
}
Owner
seq-to-mind
seq-to-mind
Codes for ACL-IJCNLP 2021 Paper "Zero-shot Fact Verification by Claim Generation"

Zero-shot-Fact-Verification-by-Claim-Generation This repository contains code and models for the paper: Zero-shot Fact Verification by Claim Generatio

Liangming Pan 47 Jan 01, 2023
Efficient and intelligent interactive segmentation annotation software

Efficient and intelligent interactive segmentation annotation software

294 Dec 30, 2022
ImageNet-CoG is a benchmark for concept generalization. It provides a full evaluation framework for pre-trained visual representations which measure how well they generalize to unseen concepts.

The ImageNet-CoG Benchmark Project Website Paper (arXiv) Code repository for the ImageNet-CoG Benchmark introduced in the paper "Concept Generalizatio

NAVER 23 Oct 09, 2022
Pytorch implementation of TailCalibX : Feature Generation for Long-tail Classification

TailCalibX : Feature Generation for Long-tail Classification by Rahul Vigneswaran, Marc T. Law, Vineeth N. Balasubramanian, Makarand Tapaswi [arXiv] [

Rahul Vigneswaran 34 Jan 02, 2023
Large scale embeddings on a single machine.

Marius Marius is a system under active development for training embeddings for large-scale graphs on a single machine. Training on large scale graphs

Marius 107 Jan 03, 2023
Regression Metrics Calculation Made easy for tensorflow2 and scikit-learn

Regression Metrics Installation To install the package from the PyPi repository you can execute the following command: pip install regressionmetrics I

Ashish Patel 11 Dec 16, 2022
Code repository for "Free View Synthesis", ECCV 2020.

Free View Synthesis Code repository for "Free View Synthesis", ECCV 2020. Setup Install the following Python packages in your Python environment - num

Intelligent Systems Lab Org 253 Dec 07, 2022
Extreme Lightwegith Portrait Segmentation

Extreme Lightwegith Portrait Segmentation Please go to this link to download code Requirements python 3 pytorch = 0.4.1 torchvision==0.2.1 opencv-pyt

HYOJINPARK 59 Dec 16, 2022
An Approach to Explore Logistic Regression Models

User-centered Regression An Approach to Explore Logistic Regression Models This tool applies the potential of Attribute-RadViz in identifying correlat

0 Nov 12, 2021
For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training.

LongScientificFormer For encoding a text longer than 512 tokens, for example 800. Set max_pos to 800 during both preprocessing and training. Some code

Athar Sefid 6 Nov 02, 2022
Elevation Mapping on GPU.

Elevation Mapping cupy Overview This is a ros package of elevation mapping on GPU. Code are written in python and uses cupy for GPU calculation. * pla

Robotic Systems Lab - Legged Robotics at ETH Zürich 183 Dec 19, 2022
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023
Open-AI's DALL-E for large scale training in mesh-tensorflow.

DALL-E in Mesh-Tensorflow [WIP] Open-AI's DALL-E in Mesh-Tensorflow. If this is similarly efficient to GPT-Neo, this repo should be able to train mode

EleutherAI 432 Dec 16, 2022
Retinal Vessel Segmentation with Pixel-wise Adaptive Filters (ISBI 2022)

Retinal Vessel Segmentation with Pixel-wise Adaptive Filters (ISBI 2022) Introdu

anonymous 14 Oct 27, 2022
PyTorchVideo is a deeplearning library with a focus on video understanding work

PyTorchVideo is a deeplearning library with a focus on video understanding work. PytorchVideo provides resusable, modular and efficient components needed to accelerate the video understanding researc

Facebook Research 2.7k Jan 07, 2023
Implementation of Ag-Grid component for Streamlit

streamlit-aggrid AgGrid is an awsome grid for web frontend. More information in https://www.ag-grid.com/. Consider purchasing a license from Ag-Grid i

Pablo Fonseca 556 Dec 31, 2022
Bib-parser - Convenient script to parse .bib files with the ACM Digital Library like metadata

Bib Parser Convenient script to parse .bib files with the ACM Digital Library li

Mehtab Iqbal (Shahan) 1 Jan 26, 2022
[CVPR 2021] MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition (CVPR 2021) arXiv Prerequisite PyTorch = 1.2.0 Python3 torchvision PIL argpar

51 Nov 11, 2022
An excellent hash algorithm combining classical sponge structure and RNN.

SHA-RNN Recurrent Neural Network with Chaotic System for Hash Functions Anonymous Authors [摘要] 在这次作业中我们提出了一种新的 Hash Function —— SHA-RNN。其以海绵结构为基础,融合了混

Houde Qian 5 May 15, 2022
Code for ACL2021 long paper: Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases

LANKA This is the source code for paper: Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases (ACL 2021, long paper) Referen

Boxi Cao 30 Oct 24, 2022