Code for the ACL 2020 paper "Rigid Formats Controlled Text Generation"

Last update: Dec 17, 2022

Overview

SongNet

SongNet: SongCi + Song (Lyrics) + Sonnet + etc.

@inproceedings{li-etal-2020-rigid,
    title = "Rigid Formats Controlled Text Generation",
    author = "Li, Piji and Zhang, Haisong and Liu, Xiaojiang and Shi, Shuming",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.68",
    doi = "10.18653/v1/2020.acl-main.68",
    pages = "742--751"
}

Run

python prepare_data.py
./train.sh

Evaluation

In test.py, set m_path to the path of the best dev model, then run:
./test.sh
python metrics.py
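
The Run and Evaluation commands above can also be chained from Python. This is only a minimal sketch, not part of the repository: the names `TRAIN_STEPS`, `EVAL_STEPS`, `run_steps`, and `main` are hypothetical, and it assumes the scripts are invoked from the repository root exactly as listed above. Setting m_path in test.py remains a manual step.

```python
import subprocess

# Commands copied from the Run and Evaluation sections, in order.
TRAIN_STEPS = [
    ["python", "prepare_data.py"],
    ["./train.sh"],
]
EVAL_STEPS = [
    ["./test.sh"],
    ["python", "metrics.py"],
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in steps:
        # check=True makes subprocess.run raise CalledProcessError
        # when a command exits with a non-zero status.
        runner(cmd, check=True)


def main():
    """Hypothetical driver: train, pause for the manual edit, then evaluate."""
    run_steps(TRAIN_STEPS)
    # Manual step: m_path in test.py must point at the best dev model.
    input("Set m_path in test.py, then press Enter to evaluate... ")
    run_steps(EVAL_STEPS)
```

Calling `main()` from the repository root would run the steps in sequence; `run_steps` accepts a substitute `runner` so the command order can be checked without launching training.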

Polish

./polish.sh

Download

The pretrained Chinese Language Model: https://drive.google.com/file/d/1g2tGyUwPe86vPn2nub1vkQva5lwtZ6Rd/view
The finetuned SongCi model: https://drive.google.com/file/d/16A2AzuU7slf7xj2QdLcBAorUCCaCk650/view

Reference

Guyu: https://github.com/lipiji/Guyu
Pretraining: https://github.com/lipiji/big_tpl_zh_10_base

Owner

Piji Li