Improve the current data preprocessing of FTM's WOB data to analyze contacts between Shell and the Dutch government.

Overview

data-preprocessing_toogoodtogo_threatlines

We're the hackathon leftovers, but we are Too Good To Go ;-). A repo by Lukas Schubotz, Stef van Buuren, and Raymon van Dinter. We aim to improve the current data preprocessing of FTM's WOB data in order to analyze contacts between Shell and the Dutch government.

Synchronous visualisation of email threads

Publications from the FTM "Dossier SHELL papers" (https://www.ftm.nl/dossier/shell-papers) suggest that the timing of events is critical in the interactions between actors. It would therefore be useful to visualise the mail exchanges over time.

The idea is to visualise threads of mail exchanges between actors over time. When this is done for multiple threads, the display gives rapid insight into the structure and timing of exchanges between actors. For example, suppose we are able to construct a single thread from "RE:" and "FW:" mails in the data. A simple visualisation would be a single thread line with one point per mail, ordered in time.

See https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.9825&rep=rep1&type=pdf for variations on this display, for example drawing arcs to show the interactions between actors, or re-sorting the mails by actor pair.

A generalisation to multiple simultaneous threads would stack multiple lines, similar to a dot plot. Such a design calls for relatively simple thread displays that are synchronised in time. Therefore we will concentrate on a simple thread line that plots mail chronology against calendar time.
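The stacked layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `thread_line_layout`, the thread ids, and the dates are hypothetical, and every thread is assumed to contain at least one mail. The sketch computes plotting coordinates only; any plotting library could draw them.

```python
from datetime import datetime


def thread_line_layout(threads):
    """Assign each thread a row and each mail an (x, y) position.

    `threads` maps a thread id to a non-empty list of send times.
    Returns a list of (thread_id, send_time, row) tuples, where send_time
    is the x coordinate (calendar time) and row is the y coordinate.
    """
    # Order threads by their earliest mail so the rows follow chronology.
    ordered = sorted(threads.items(), key=lambda item: min(item[1]))
    points = []
    for row, (thread_id, send_times) in enumerate(ordered):
        # Within a thread, mails are placed left to right in time order.
        for sent in sorted(send_times):
            points.append((thread_id, sent, row))
    return points


# Hypothetical example data, not taken from the WOB documents.
example_threads = {
    "permit": [datetime(2017, 3, 2, 9, 0), datetime(2017, 3, 1, 16, 30)],
    "meeting": [datetime(2016, 11, 20, 8, 15)],
}
layout = thread_line_layout(example_threads)
```

Because all rows share the same time axis, mails from different threads that were sent close together line up vertically, which is the synchronisation the design relies on.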

A somewhat grander idea would be to create a "film of events". The user would place a cursor on the time axis, and scroll through time. The new information per mail is displayed as the cursor passes the send time of the email.
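A minimal sketch of the cursor logic, assuming the mails are already sorted by send time. The names `mails_up_to` and `newly_revealed` and the example mails are hypothetical; a real viewer would attach this to a slider or scroll event.

```python
from bisect import bisect_right
from datetime import datetime


def mails_up_to(mails, cursor):
    """Return all mails sent at or before `cursor`.

    `mails` is a list of (send_time, summary) tuples sorted by send_time.
    """
    send_times = [sent for sent, _ in mails]
    return mails[:bisect_right(send_times, cursor)]


def newly_revealed(mails, previous_cursor, cursor):
    """Return the mails the cursor passed while moving forward in time."""
    send_times = [sent for sent, _ in mails]
    start = bisect_right(send_times, previous_cursor)
    stop = bisect_right(send_times, cursor)
    return mails[start:stop]


# Hypothetical example data.
film = [
    (datetime(2018, 1, 1), "initial request"),
    (datetime(2018, 1, 5), "reply"),
    (datetime(2018, 2, 1), "forward to ministry"),
]
visible = mails_up_to(film, datetime(2018, 1, 31))
fresh = newly_revealed(film, datetime(2018, 1, 1), datetime(2018, 1, 31))
```

Binary search keeps each cursor move cheap, which matters when the user scrolls continuously through several years of mail.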

Issues to resolve

We need complex/advanced text processing. Some of the issues include:

  1. How can we split multiple emails in a RE/FW into a set of elementary mails, each corresponding to just one sender?
  2. How well can we form threads by matching on subject lines?
  3. Do duplicates extracted from RE/FW serve any useful purpose?
  4. What is the percentage of threads for which we can find the parent mail (the mail that started the thread)?
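Issues 2 and 4 can be explored with a small experiment such as the one below. It is a sketch, not the method used in this repo: the prefix list (including the Dutch variants "Antw:" and "Doorst:"), the function names, and the `subject` field are assumptions, and a mail is treated as a parent simply because its subject carries no reply or forward prefix.

```python
import re
from collections import defaultdict

# Assumed reply/forward prefixes; the real data may need more variants.
PREFIX = re.compile(r"^\s*((re|fw|fwd|antw|doorst)\s*:\s*)+", re.IGNORECASE)


def normalise_subject(subject):
    """Strip leading reply/forward prefixes and normalise case."""
    return PREFIX.sub("", subject).strip().lower()


def has_prefix(subject):
    """True if the subject starts with a reply/forward prefix."""
    return PREFIX.match(subject) is not None


def build_threads(mails):
    """Group mails into threads keyed by their normalised subject."""
    threads = defaultdict(list)
    for mail in mails:
        threads[normalise_subject(mail["subject"])].append(mail)
    return dict(threads)


def share_with_parent(threads):
    """Fraction of threads containing a mail without a RE/FW prefix."""
    if not threads:
        return 0.0
    with_parent = sum(
        any(not has_prefix(mail["subject"]) for mail in mails)
        for mails in threads.values()
    )
    return with_parent / len(threads)


# Hypothetical example data.
sample = [
    {"subject": "Permit update"},
    {"subject": "RE: Permit update"},
    {"subject": "FW: RE: Meeting notes"},
]
sample_threads = build_threads(sample)
parent_share = share_with_parent(sample_threads)
```

Matching on subject alone will merge unrelated threads that happen to share a subject and split threads whose subject was edited, so the result is best read as a rough first estimate.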

Experiment 1

The first design plots all thread lines between 2016 and 2020 on one chart.

Experiment 2

The second design uses trelliscopejs to plot the same information in smaller pieces.

The user can switch between 27 panes, each containing about 20 threads.
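The paging itself is simple to illustrate independently of trelliscopejs. The sketch below is an assumption-laden example: `split_into_panes` is a hypothetical helper, and the total of 530 threads is invented to match 27 panes of about 20 threads.

```python
def split_into_panes(thread_ids, per_pane=20):
    """Split a list of thread ids into consecutive panes of `per_pane`."""
    return [
        thread_ids[start:start + per_pane]
        for start in range(0, len(thread_ids), per_pane)
    ]


# Hypothetical thread count; the last pane holds the remainder.
panes = split_into_panes(list(range(530)))
```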

Try out the interactive version

Experiment 3

The third design returns to figure 1, but now plotted with rbokeh, so that we can zoom and use tooltips (the interaction is not supported by GitHub markdown).

Owner
ASReview hackathon for Follow the Money