[email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media" | PythonRepo" /> [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media" | PythonRepo">

This was initially the repo for the project of [email protected] of Asaf Mazar, Millad Kassaie and Georgios Chochlakis named "Powered by the Will? Exploring Lay Theories of Behavior Change through Social Media"

Overview

Subreddit Analysis

This repo includes tools for Subreddit analysis, originally developed for our class project of PSYC 626 in USC, titled "Powered by the Will?: Themes in online discussions of Fitness".

Installation and Requirements

You need to use Python 3.9, R 4.1.0 and git basically to run the scripts provided in this repo. For Ubuntu, to install essential dependencies:

sudo apt update
sudo apt install git python3.9 python3-pip
pip3 install virtualenv

Now clone this repo:

git clone https://github.com/gchochla/subreddit-analysis
cd subreddit-analysis

Create and activate a python environment to download the python requirements for the scripts:

~/.local/bin/virtualenv .venv
source .venv/bin/activate
pip install .

Usage

  1. Download a subreddit into a JSON that preserves the hierarchical structure of the posts by running:
python subreddit_analysis/subreddit_forest.py -r <SUBREDDIT_NAME>

where <SUBREDDIT_NAME> is the name of the subreddit after r/. You can also limit the number of submissions returned by setting -l <LIMIT>. The result can be found in the file <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json.

  1. Transform this JSON to a rectangle (CSV), you can use:
python subreddit_analysis/json_forest_to_csv.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json

which creates <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.csv.

  1. To have a background corpus for control, you can download posts from the redditors that have posted in your desired subreddit from other subreddits:
python subreddit_analysis/user_baseline.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift.json -pl 200

where -pl specifies the number of posts per redditor to fetch (before filtering the desired subreddit). The file is saved as <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.json

  1. Transform that as well to a CSV:
python subreddit_analysis/json_baseline_to_csv.py -fn <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.json

which creates <SUBREDDIT_NAME>-<NUM_OF_POSTS>-pushshift-baseline-<pl>.csv.

  1. Create a folder, <ROOT>, move the subreddit CSV to it, and create another folder inside it named dictionaries that includes a file (note: the filename -- with a possible extensions -- will be used as the header of the loading) per distributed dictionary with space-separated words:
positive joy happy excited
  1. Tokenize CSVs using the r_scripts.

  2. Compute each post's loadings and write it into the CSV:

python subreddit_analysis/submission_loadings.py -d <ROOT> -doc <CSV_FILENAME>

where <CSV_FILENAME> is relative to <ROOT>.

  1. If annotations are available, which should be in a CSV with (at least) a column for the labels themselves and the ID of the post with a post_id header, you can use these to design a data-driven distributed dictionary. You can first train an RNN to create another annotation file with a predicted label for each post with:
python subreddit_analysis/rnn.py --doc_filename <SUBREDDIT_CSV> --label_filename <ANNOTATION_CSV> --label_column <LABEL_HEADER_1> <LABEL_HEADER_2> ... <LABEL_HEADER_N> --out_filename <NEW_ANNOTATION_CSV>

where you can provide multiple labels for multitasking, thought the model provides predictions only for the first specified label for now. Finally, if annotations are ordinal, you can get learned coefficients from Ridge Regression for each word in the vocabulary of all posts (in descending order of importance) using a tf-idf model to represent each document using:

python subreddit_analysis/bow_model.py --doc_filename <SUBREDDIT_CSV> --label_filename <ANY_ANNOTATION_CSV> --label_column <LABEL_HEADER> --out_filename <IMPORTANCE_CSV>
  1. Run analyses using r_scripts.
Owner
Georgios Chochlakis
ML researcher; CS PhD student @ Uni of Southern California
Georgios Chochlakis
Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021)

TDEER (WIP) Code For TDEER: An Efficient Translating Decoding Schema for Joint Extraction of Entities and Relations (EMNLP2021) Overview TDEER is an e

Alipay 6 Dec 17, 2022
A "gym" style toolkit for building lightweight Neural Architecture Search systems

A "gym" style toolkit for building lightweight Neural Architecture Search systems

Jack Turner 12 Nov 05, 2022
This project is based on our SIGGRAPH 2021 paper, ROSEFusion: Random Optimization for Online DenSE Reconstruction under Fast Camera Motion .

ROSEFusion šŸŒ¹ This project is based on our SIGGRAPH 2021 paper, ROSEFusion: Random Optimization for Online DenSE Reconstruction under Fast Camera Moti

219 Dec 27, 2022
an implementation of 3D Ken Burns Effect from a Single Image using PyTorch

3d-ken-burns This is a reference implementation of 3D Ken Burns Effect from a Single Image [1] using PyTorch. Given a single input image, it animates

Simon Niklaus 1.4k Dec 28, 2022
Reproduce results and replicate training fo T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

T-Zero This repository serves primarily as codebase and instructions for training, evaluation and inference of T0. T0 is the model developed in Multit

BigScience Workshop 253 Dec 27, 2022
Lane follower: Lane-detector (OpenCV) + Object-detector (YOLO5) + CAN-bus

Lane Follower This code is for the lane follower, including perception and control, as shown below. Environment Hardware Industrial Camera Intel-NUC(1

Siqi Fan 3 Jul 07, 2022
Tensorboard for pytorch (and chainer, mxnet, numpy, ...)

tensorboardX Write TensorBoard events with simple function call. The current release (v2.3) is tested on anaconda3, with PyTorch 1.8.1 / torchvision 0

Tzu-Wei Huang 7.5k Dec 28, 2022
Data visualization app for H&M competition in kaggle

handm_data_visualize_app Data visualization app by streamlit for H&M competition in kaggle. competition page: https://www.kaggle.com/competitions/h-an

Kyohei Uto 12 Apr 30, 2022
Code Repo for the ACL21 paper "Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning"

Common Sense Beyond English: Evaluating and Improving Multilingual LMs for Commonsense Reasoning This is the Github repository of our paper, "Common S

INK Lab @ USC 19 Nov 30, 2022
Label Studio is a multi-type data labeling and annotation tool with standardized output format

Website ā€¢ Docs ā€¢ Twitter ā€¢ Join Slack Community What is Label Studio? Label Studio is an open source data labeling tool. It lets you label data types

Heartex 11.7k Jan 09, 2023
A fast implementation of bss_eval metrics for blind source separation

fast_bss_eval Do you have a zillion BSS audio files to process and it is taking days ? Is your simulation never ending ? Fear no more! fast_bss_eval i

Robin Scheibler 99 Dec 13, 2022
This is the PyTorch implementation of GANs Nā€™ Roses: Stable, Controllable, Diverse Image to Image Translation

Official PyTorch repo for GAN's N' Roses. Diverse im2im and vid2vid selfie to anime translation.

1.1k Jan 01, 2023
Official PyTorch implementation of "VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization" (CVPR 2021)

VITON-HD ā€” Official PyTorch Implementation VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization Seunghwan Choi*1, Sunghyun Pa

Seunghwan Choi 250 Jan 06, 2023
Provably Rare Gem Miner.

Provably Rare Gem Miner just another random project by yoyoismee.eth useful link main site market contract useful thing you should know read contract

34 Nov 22, 2022
Library extending Jupyter notebooks to integrate with Apache TinkerPop and RDF SPARQL.

Graph Notebook: easily query and visualize graphs The graph notebook provides an easy way to interact with graph databases using Jupyter notebooks. Us

Amazon Web Services 501 Dec 28, 2022
Code and models for ICCV2021 paper "Robust Object Detection via Instance-Level Temporal Cycle Confusion".

Robust Object Detection via Instance-Level Temporal Cycle Confusion This repo contains the implementation of the ICCV 2021 paper, Robust Object Detect

Xin Wang 69 Oct 13, 2022
Simple Python project using Opencv and datetime package to recognise faces and log attendance data in a csv file.

Attendance-System-based-on-Facial-recognition-Attendance-data-stored-in-csv-file- Simple Python project using Opencv and datetime package to recognise

3 Aug 09, 2022
Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection

CP-Cluster Confidence Propagation Cluster aims to replace NMS-based methods as a better box fusion framework in 2D/3D Object detection, Instance Segme

Yichun Shen 41 Dec 08, 2022
deep learning model that learns to code with drawing in the Processing language

sketchnet sketchnet - processing code generator can we teach a computer to draw pictures with code. We use Processing and java/jruby code paired with

41 Dec 12, 2022
PyTorch3D is FAIR's library of reusable components for deep learning with 3D data

Introduction PyTorch3D provides efficient, reusable components for 3D Computer Vision research with PyTorch. Key features include: Data structure for

Facebook Research 6.8k Jan 01, 2023