Course materials for a 3-day seminar "Machine Learning and NLP: Advances and Applications" at New College of Florida

Overview

Machine Learning and NLP: Advances and Applications

This repository hosts the course materials used for a 3-day seminar "Machine Learning and NLP: Advances and Applications" as part of Independent Study Period 2020 at New College of Florida.

Note that the seminar was held in January 2020, so some of the content may be outdated (as of February 2022). Please also refer to the Fall 2021 full-semester course "CIS6930 Topics in Computing for Data Science", which covers a much wider range of more recent Deep Learning topics.

Syllabus

Course Description

This 3-day course provides students with an opportunity to learn Machine Learning and Natural Language Processing (NLP) from basics to applications. The course covers some state-of-the-art NLP techniques, including Deep Learning. Each day consists of a lecture and a hands-on session to help students learn how to apply those techniques to real-world applications. During the hands-on sessions, students will be given assignments that involve writing Python code. Three days are too short to fully understand the concepts covered by the course and to learn to apply those techniques to actual problems. Students are therefore strongly encouraged to complete the reading assignments before the lectures so that they are ready for the course assignments, and to bring a lot of questions to the course. :)

Learning Objectives

Students successfully completing the course will

  • demonstrate the ability to apply machine learning and natural language processing techniques to various types of problems.
  • demonstrate the ability to build their own machine learning models using Python libraries.
  • demonstrate the ability to read and understand research papers in ML and NLP.

Course Outline

  • Wed 1/22 Day 1: Machine Learning basics [Slides]

    • Machine learning examples
    • Problem formulation
    • Evaluation and hyper-parameter tuning
    • Data Processing basics with pandas
    • Machine Learning with scikit-learn
    • Hands-on material: [ipynb] Open In Colab
  • Thu 1/23 Day 2: NLP basics [Slides]

    • Unsupervised learning and visualization
    • Topic models
    • NLP basics with SpaCy and NLTK
    • Understanding NLP pipeline for feature extraction
    • Machine learning for NLP tasks (text classification, sequential tagging)
    • Hands-on material: [ipynb] Open In Colab
    • Follow-up
      • Commonsense Reasoning (Winograd Schema Challenge)
  • Fri 1/24 Day 3: Advanced techniques and applications [Slides]

    • Basic Deep Learning techniques
    • Word embeddings
    • Advanced Deep Learning techniques for NLP
    • Problem formulation and applications to (non-)NLP tasks
    • Pre-training models: ELMo and BERT
    • Hands-on material: [ipynb] Open In Colab
    • Follow-up
      • The Illustrated Transformer (Jay Alammar)
      • Cross-lingual word/sentence embeddings

Reading Assignments & Recommendations:

The following online tutorials are intended for students who are not familiar with the Python libraries used in the course. Each day has a hands-on session that requires those libraries, so please do not expect to have enough time to learn how to use them during the lecture.

The following list is a good starting point.

The course will cover the following papers as examples of (non-NLP) applications (probably on Day 3). Students who would like to learn how to apply Deep Learning techniques to their own problems are encouraged to read them.

  • [1] A. Asai, S. Evensen, B. Golshan, A. Halevy, V. Li, A. Lopatenko, D. Stepanov, Y. Suhara, W.-C. Tan, Y. Xu, "HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments," Proc. LREC 2018, 2018. [Paper] [Dataset]
  • [2] S. Evensen, Y. Suhara, A. Halevy, V. Li, W.-C. Tan, S. Mumick, "Happiness Entailment: Automating Suggestions for Well-Being," Proc. ACII 2019, 2019. [Paper]
  • [3] Y. Suhara, Y. Xu, A. Pentland, "DeepMood: Forecasting Depressed Mood Based on Self-Reported Histories via Recurrent Neural Networks," Proc. WWW '17, 2017. [Paper]
  • [4] N. Bhutani, Y. Suhara, W.-C. Tan, A. Halevy, H. V. Jagadish, "Open Information Extraction from Question-Answer Pairs," Proc. NAACL-HLT 2019, 2019. [Paper]

Computing Resources:

The course requires students to write code:

  • Students are expected to have a personal computer at their disposal. Students should have a Python interpreter and the listed libraries installed on their machines.

The hands-on sessions will require the following Python libraries. Please install those libraries on your computer prior to the course. See also the reading assignment section for the recommended tutorials.

  • pandas
  • scikit-learn
  • gensim
  • spacy
  • nltk
  • torch (PyTorch)
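
Before the first hands-on session, it may help to confirm that the libraries listed above can actually be imported. The following is a minimal sketch, not part of the original course materials: the helper name `find_missing` and the `REQUIRED` mapping are hypothetical, and the import names are assumed to be the usual ones for these packages (scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Display name -> import name for the libraries listed above.
# Assumption: these are the standard top-level import names.
REQUIRED = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "gensim": "gensim",
    "spacy": "spacy",
    "nltk": "nltk",
    "torch (PyTorch)": "torch",
}


def find_missing(required=REQUIRED):
    """Return the display names of libraries that cannot be found."""
    missing = []
    for display_name, module_name in required.items():
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is None:
            missing.append(display_name)
    return missing


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are installed.")
```

This only checks that each package can be located; it does not verify versions or download any extra data (for example, spaCy language models or NLTK corpora) that a particular notebook may need.
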
Owner
Yoshi Suhara