System Combination for Grammatical Error Correction Based on Integer Programming

Related tags

Deep Learninggec_ip
Overview

System Combination for Grammatical Error Correction Based on Integer Programming


This repository contains the code and scripts that implement the system combination approach for grammatical error correction in Lin and Ng (2021).

Reference

Ruixi Lin and Hwee Tou Ng (2021). System Combination for Grammatical Error Correction Based on Integer Programming.

Please cite:

@inproceedings{lin2021gecip,
  author    = "Lin, Ruixi and Ng, Hwee Tou",
  title     = "System Combination for Grammatical Error Correction Based on Integer Programming",
  booktitle = "Proceedings of Recent Advances in Natural Language Processing",
  year      = "2021",
  pages     = "829-834"
}

Table of contents

Prerequisites

Example

License

Prerequisites

conda create --name comb python=3.6
conda activate comb
pip install spacy
python -m spacy download en

For the nonlinear integer programming solver, we use

LINGO10.0

Note that educational institutions can obtain a free license to use the LINGO solver.

Example

Combine the 3 GEC systems listed in the paper using the IP approach. The three systems are UEdin-MS (https://aclanthology.org/W19-4427), Kakao (https://aclanthology.org/W19-4423), and Tohoku (https://aclanthology.org/D19-1119). The core functions for the IP objective are implemented in model.lg4. You can find model.lg4 under lingo/inputs.

  1. Run python prepare_data.py -dir . -list kakao uedinms tohoku to generate aggregated TP, FP, and FN counts. The counts files are stored under lingo/inputs.

  2. Load model.lg4 into the LINGO console and specify the input data path with the counts file path, select the INLP model, and run optimizations. Store the solutions to lingo/outputs/sol_kakao_uedinms_tohoku.txt.

  3. Run ./comb.sh . sol_kakao_uedinms_tohoku.txt to load LINGO solutions, merge and apply edits. The resulted blind test file can be found under submissions. It can be zipped and submitted to the BEA CodeLab website (https://competitions.codalab.org/competitions/20228) for evaluations.

The data folder provides individual GEC system output files, and .m2 files generated using ERRANT for the listed systems. For more information, please visit the ERRANT github page.

We include the IP combined .m2 files under merged_m2, and the corresponding text files under submissions.

License

The source code and models in this repository are licensed under the GNU General Public License v3.0 (see LICENSE). For further research interests and commercial use of the code and models, please contact Ruixi Lin ([email protected]) and Prof. Hwee Tou Ng ([email protected]).

Owner
NUS NLP Group
National University of Singapore
NUS NLP Group
A Dataset of Python Challenges for AI Research

Python Programming Puzzles (P3) This repo contains a dataset of python programming puzzles which can be used to teach and evaluate an AI's programming

Microsoft 850 Dec 24, 2022
Binary classification for arrythmia detection with ECG datasets.

HEART DISEASE AI DATATHON 2021 [Eng] / [Kor] #English This is an AI diagnosis modeling contest that uses the heart disease echocardiography and electr

HY_Kim 3 Jul 14, 2022
This is the code repository for the paper "Identification of the Generalized Condorcet Winner in Multi-dueling Bandits" (NeurIPS 2021).

Code Repository for the Paper "Identification of the Generalized Condorcet Winner in Multi-dueling Bandits" (To appear in: Proceedings of NeurIPS20

1 Oct 03, 2022
A collection of 100 Deep Learning images and visualizations

A collection of Deep Learning images and visualizations. The project has been developed by the AI Summer team and currently contains almost 100 images.

AI Summer 65 Sep 12, 2022
My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs (GNN, GAT, GraphSAGE, GCN)

machine-learning-with-graphs My solutions for Stanford University course CS224W: Machine Learning with Graphs Fall 2021 colabs Course materials can be

Marko Njegomir 7 Dec 14, 2022
DP-CL(Continual Learning with Differential Privacy)

DP-CL(Continual Learning with Differential Privacy) This is the official implementation of the Continual Learning with Differential Privacy. If you us

Phung Lai 3 Nov 04, 2022
This project is the official implementation of our accepted ICLR 2021 paper BiPointNet: Binary Neural Network for Point Clouds.

BiPointNet: Binary Neural Network for Point Clouds Created by Haotong Qin, Zhongang Cai, Mingyuan Zhang, Yifu Ding, Haiyu Zhao, Shuai Yi, Xianglong Li

Haotong Qin 59 Dec 17, 2022
pybaum provides tools to work with pytrees which is a concept burrowed from JAX.

pybaum provides tools to work with pytrees which is a concept burrowed from JAX.

Open Source Economics 9 May 11, 2022
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation

E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation E2EC: An End-to-End Contour-based Method for High-Quality H

zhangtao 146 Dec 29, 2022
Implements Stacked-RNN in numpy and torch with manual forward and backward functions

Recurrent Neural Networks Implements simple recurrent network and a stacked recurrent network in numpy and torch respectively. Both flavours implement

Vishal R 1 Nov 16, 2021
Instant-nerf-pytorch - NeRF trained SUPER FAST in pytorch

instant-nerf-pytorch This is WORK IN PROGRESS, please feel free to contribute vi

94 Nov 22, 2022
HairCLIP: Design Your Hair by Text and Reference Image

Overview This repository hosts the official PyTorch implementation of the paper: "HairCLIP: Design Your Hair by Text and Reference Image". Our single

322 Jan 06, 2023
Manage the availability of workspaces within Frappe/ ERPNext (sidebar) based on user-roles

Workspace Permissions Manage the availability of workspaces within Frappe/ ERPNext (sidebar) based on user-roles. Features Configure foreach workspace

Patrick.St. 18 Sep 26, 2022
Code for Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games How to run our algorithm? Create the new environment using: conda

MARL @ SJTU 8 Dec 27, 2022
Codebase for ECCV18 "The Sound of Pixels"

Sound-of-Pixels Codebase for ECCV18 "The Sound of Pixels". *This repository is under construction, but the core parts are already there. Environment T

Hang Zhao 318 Dec 20, 2022
MEDS: Enhancing Memory Error Detection for Large-Scale Applications

MEDS: Enhancing Memory Error Detection for Large-Scale Applications Prerequisites cmake and clang Build MEDS supporting compiler $ make Build Using Do

Secomp Lab at Purdue University 34 Dec 14, 2022
Self-Supervised Methods for Noise-Removal

SSMNR | Self-Supervised Methods for Noise Removal Image denoising is the task of removing noise from an image, which can be formulated as the task of

1 Jan 16, 2022
Multi-Agent Reinforcement Learning (MARL) method to learn scalable control polices for multi-agent target tracking.

scalableMARL Scalable Reinforcement Learning Policies for Multi-Agent Control CD. Hsu, H. Jeong, GJ. Pappas, P. Chaudhari. "Scalable Reinforcement Lea

Christopher Hsu 17 Nov 17, 2022
Train a deep learning net with OpenStreetMap features and satellite imagery.

DeepOSM Classify roads and features in satellite imagery, by training neural networks with OpenStreetMap (OSM) data. DeepOSM can: Download a chunk of

TrailBehind, Inc. 1.3k Nov 24, 2022
Repo for EMNLP 2021 paper "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression"

beyond-preserved-accuracy Repo for EMNLP 2021 paper "Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression" How to implemen

Kevin Canwen Xu 10 Dec 23, 2022