Grammar Induction using a Template Tree Approach


Gitta

Gitta ("Grammar Induction using a Template Tree Approach") is a method for inducing context-free grammars. It performs particularly well on datasets that have latent templates, e.g. forum topics, writing prompts and output from template-based text generators. The found context-free grammars can easily be converted into grammars for use in grammar languages such as Tracery & Babbly.

Demo

A demo for Gitta can be found & executed on Google Colaboratory.

Example

from gitta import grammar_induction

dataset = [
    "I like cats and dogs",
    "I like bananas and geese",
    "I like geese and cats",
    "bananas are not supposed to be in a salad",
    "geese are not supposed to be in the zoo",
]
induced_grammar = grammar_induction.induce_grammar_using_template_trees(
    dataset,
    relative_similarity_threshold=0.1,
)
print(induced_grammar)
print(induced_grammar.generate_all())

This outputs the following grammar:

{
    "origin": [
        "<B> are not supposed to be in <C>",
        "I like <B> and <B>"
    ],
    "B": [
        "bananas",
        "cats",
        "dogs",
        "geese"
    ],
    "C": [
        "a salad",
        "the zoo"
    ]
}

This grammar in turn generates all of the following texts:

{"dogs are not supposed to be in the zoo",
"cats are not supposed to be in a salad",
"I like geese and cats",
"cats are not supposed to be in the zoo", 
bananas are not supposed to be in a salad",
"I like dogs and dogs",
"bananas are not supposed to be in the zoo",
"I like dogs and bananas",
"geese are not supposed to be in the zoo",
"geese are not supposed to be in a salad",
"I like cats and dogs",
"I like dogs and geese",
"I like cats and bananas",
"I like bananas and dogs",
"I like bananas and bananas",
"I like cats and geese",
"I like geese and dogs",
"I like dogs and cats",
"I like geese and bananas",
"I like bananas and geese",
"dogs are not supposed to be in a salad",
"I like cats and cats",
"I like geese and geese",
"I like bananas and cats"}

Performance

We tested this grammar induction algorithm on Twitterbots built with the Tracery grammar modelling tool. Gitta saw only 25, 50 or 100 example generations, and had to induce a grammar that could generate similar texts. Every setting was run 5 times; the table below reports the median number of in-language texts (generations that the original grammar could also produce) and not-in-language texts (texts that the induced grammar generated, but the original grammar could not). The median number of production rules ("size") is also included, to show the generalisation performance.

| Name | # generations (orig.) | size (orig.) | in lang (25 ex.) | not in lang (25 ex.) | size (25 ex.) | in lang (50 ex.) | not in lang (50 ex.) | size (50 ex.) | in lang (100 ex.) | not in lang (100 ex.) | size (100 ex.) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| botdoesnot | 380292 | 363 | 648 | 0 | 64 | 2420 | 0 | 115 | 1596 | 4 | 179 |
| BotSpill | 43452 | 249 | 75 | 0 | 32 | 150 | 0 | 62 | 324 | 0 | 126 |
| coldteabot | 448 | 24 | 39 | 0 | 38 | 149 | 19 | 63 | 388 | 9 | 78 |
| hometapingkills | 4080 | 138 | 440 | 0 | 48 | 1184 | 3240 | 76 | 2536 | 7481 | 106 |
| InstallingJava | 390096 | 95 | 437 | 230 | 72 | 2019 | 1910 | 146 | 1156 | 3399 | 228 |
| pumpkinspiceit | 6781 | 6885 | 25 | 0 | 26 | 50 | 0 | 54 | 100 | 8 | 110 |
| SkoolDetention | 224 | 35 | 132 | 0 | 31 | 210 | 29 | 41 | 224 | 29 | 49 |
| soundesignquery | 15360 | 168 | 256 | 179 | 52 | 76 | 2 | 83 | 217 | 94 | 152 |
| whatkilledme | 4192 | 132 | 418 | 0 | 45 | 1178 | 0 | 74 | 2646 | 0 | 108 |
| Whinge_Bot | 450805 | 870 | 3092 | 6 | 80 | 16300 | 748 | 131 | 59210 | 1710 | 222 |
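
The in-language and not-in-language counts above reduce to set comparisons between the two grammars' full output sets. A minimal sketch of that bookkeeping (a hypothetical helper, not the evaluation code used for the paper):

def evaluate(induced_texts, original_texts):
    # Texts the induced grammar shares with the original language,
    # and texts it overgenerates beyond it.
    in_lang = induced_texts & original_texts
    not_in_lang = induced_texts - original_texts
    return len(in_lang), len(not_in_lang)

print(evaluate({"I like cats and dogs", "I like cats and cats"},
               {"I like cats and dogs", "I like dogs and cats"}))
# (1, 1)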

Credits & Paper citation

If you like this work, consider following me on Twitter. If you use this work in an academic context, please consider citing the following paper:

@article{winters2020gitta,
    title={Discovering Textual Structures: Generative Grammar Induction using Template Trees},
    author={Winters, Thomas and De Raedt, Luc},
    journal={Proceedings of the 11th International Conference on Computational Creativity},
    pages={177--180},
    year={2020},
    publisher={Association for Computational Creativity}
}

Or APA style:

Winters, T., & De Raedt, L. (2020). Discovering Textual Structures: Generative Grammar Induction using Template Trees. Proceedings of the 11th International Conference on Computational Creativity.