Text Summarization - WCN — Weighted Contextual N-gram method for evaluation of Text Summarization

Last update: Jan 03, 2022

In this project, I fine-tune a T5 model on the Extreme Summarization (XSum) dataset, achieving a ROUGE-2 F-score of 9.5% on the test data. I then discuss the drawbacks of n-gram-based metrics as well as contextual word metrics.
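To make the n-gram drawback concrete, here is a minimal sketch of a ROUGE-2 style F-score in plain Python. It is a simplification for illustration, not the scoring code used in this project: it assumes lowercase whitespace tokenization with no stemming, and the names `bigrams` and `rouge2_f` are hypothetical.

```python
from collections import Counter


def bigrams(text):
    """Count adjacent word pairs after naive lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_f(candidate, reference):
    """Simplified ROUGE-2 F-score: harmonic mean of bigram precision and recall."""
    cand, ref = bigrams(candidate), bigrams(reference)
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


reference = "the cat sat on the mat"
exact = rouge2_f("the cat sat on the mat", reference)
paraphrase = rouge2_f("a feline rested upon the rug", reference)
print(f"exact match: {exact:.2f}, paraphrase: {paraphrase:.2f}")
```

The paraphrase carries roughly the same meaning as the reference but shares no bigram with it, so a purely n-gram-based score gives it no credit at all.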

Finally, I propose the Weighted Contextual N-gram (WCN) method, an alternative metric that may be more effective for evaluating text generation tasks.

The complete documentation of the project can be found here

Dataset

I use the Extreme Summarization (XSum) Dataset. The dataset can be downloaded from here

The dataset consists of BBC articles and accompanying single-sentence summaries. Specifically, each article is prefaced with an introductory sentence (the summary), which is professionally written, typically by the author of the article.

There are two features in this dataset:
(1) document: Input news article.
(2) summary: One-sentence summary of the article.

The idea is to generate a short, one-sentence news summary answering the question “What is the article about?”. In total there are about 226k samples: 204,045 for training, 11,332 for validation and 11,334 for testing. The average document is 431.07 words long (19.77 sentences), and the average summary is 23.26 words long.
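As a quick sanity check on these figures, the sketch below recomputes the total, the share of each split, and the ratio of average summary length to average document length. The numbers are the ones quoted above; the variable names are only illustrative.

```python
# Split sizes and average lengths as quoted in the text above.
splits = {"train": 204_045, "validation": 11_332, "test": 11_334}
avg_doc_words = 431.07
avg_summary_words = 23.26

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

# How much shorter the average summary is than the average document.
compression = avg_summary_words / avg_doc_words

for name, count in splits.items():
    print(f"{name:<10} {count:>8,} {fractions[name]:.1%}")
print(f"{'total':<10} {total:>8,}")
print(f"summary/document length ratio: {compression:.1%}")
```

The split works out to roughly 90/5/5, and the average summary is only about 5% of the length of its article, which is what makes the task "extreme" summarization.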

Code

The source code for this project can be found at text_summarization.ipynb.

Owner

Aditya Shah
