A repository of materials for the CS-332 NLP tutorials

Last update: Feb 15, 2022

Overview

CS-332-NLP

Tutorial 1:
- Introduction
- Corpus
- Regular expressions
- Tokenization

Tutorial 2:
- Normalization
- Parsing
- Morphemes
- Stemming
- Lemmatization
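
To give a rough feel for a few of the topics listed above (regular expressions, tokenization, normalization, and stemming), here is a minimal sketch using only the Python standard library. It is not taken from the tutorial materials: the sample sentence, the function names `tokenize`, `normalize`, and `toy_stem`, and the suffix list are all assumptions made for illustration, and the stemmer is a deliberately crude stand-in rather than a real algorithm such as Porter's.

```python
import re

# Hypothetical sample text; not taken from the tutorial materials.
TEXT = "The cats were running quickly, and the dogs jumped."


def tokenize(text):
    """Split text into word tokens and single punctuation marks with a regex."""
    return re.findall(r"\w+|[^\w\s]", text)


def normalize(tokens):
    """Lowercase the tokens and drop tokens that are not purely alphanumeric."""
    return [t.lower() for t in tokens if t.isalnum()]


def toy_stem(word):
    """Strip at most one common suffix (illustrative only, not a real stemmer)."""
    for suffix in ("ing", "ed", "ly", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = tokenize(TEXT)
words = normalize(tokens)
stems = [toy_stem(w) for w in words]

print(tokens)
print(words)
print(stems)
```

On the sample sentence the toy stemmer turns "running" into "runn" and leaves "were" unchanged. A lemmatizer would instead map these to the dictionary forms "run" and "be", which is why lemmatization generally needs a vocabulary or morphological analysis rather than suffix stripping alone.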

Acknowledgements

- Jurafsky, D., & Martin, J. H. Speech and Language Processing (2nd and 3rd editions).
- Marcinkiewicz, M. A. (1994). Building a large annotated corpus of English: The Penn Treebank. Using Large Corpora, 273.
- http://su.diva-portal.org/smash/record.jsf?pid=diva2%3A686162&dswid=9114

Owner

Alok Singh

