Company clustering with K-means/GMM and visualization with PCA, t-SNE, using SSAN relation extraction

Overview

RE results graph visualization and company clustering

Installation

  1. pip install -r requirements.txt

  2. python -m nltk.downloader stopwords

  3. python3.7 main.py

1. Paragraph-Level Relation Extraction using rule-based and SSAN

|- df4rule.py

  • Prerequiste

    • You need csv files that are generated with finiancial_news_api
    • Those files should be located in "visualization_code/rule_base_datasets/*.csv"
  • This code extracts relations with rule-based patterns.

    • (S + V + O) -> (head: S, relation: V, tail: O )

|- df4ssan.py

  • Prerequiste
    • We recommend you run SSAN independently, and make sure all relation extraction.json file from SSAN code saved in "output/*/SSAN_result_all_relation.json"
  • This code convert json file to dataframe and concat all the dataframes from various companies.

2. Graph visualization by degree and betweeness centrality using networkx

|- visualize_cent.py

  • output
    • degree_centrality: "./graph_png/degree.png"
    • betweenness_centrality: "./graph_png/between.png"

3. Get embedding vector with Node2vec Company clustering with K-means and GMM

|- node.py

|-similarity.py

  • output
    • consine similarity: "./similarity_result/consine_similarity.csv"
    • l2 norm: "./similarity_result/l2_norm.csv"

|- company_cluster.py

  • GMM (soft clustering) k: number of clusters

    main.py company_clustering(com_list, com_vec, 4, 'gmm')

  • K-means (hard clustering)

    main.py company_clustering(com_list, com_vec, 4, 'kmeans')

4. Visualize with PCA and TSNE

|-cluster_visualize.py

  • output
    • PCA: "./graph_png/company_cluster_pca.png"
    • TSNE: "./graph_png/company_cluster_tsne.png"

Output

  • degree_centrality: "./graph_png/degree.png"
  • betweenness_centrality: "./graph_png/between.png"
  • consine similarity: "./similarity_result/consine_similarity.csv"
  • l2 norm: "./similarity_result/l2_norm.csv"
  • PCA: "./graph_png/company_cluster_pca.png"
  • TSNE: "./graph_png/company_cluster_tsne.png"
Owner
Jieun Han
Jieun Han
Predict bus arrival time using VertexAI and Nvidia's Jetson Nano

bus_prediction predict bus arrival time using VertexAI and Nvidia's Jetson Nano imagenet the command for imagenet.py look like this python3 /path/to/i

10 Dec 22, 2022
This code is an implementation for Singing TTS.

MLP Singer This code is an implementation for Singing TTS. The algorithm is based on the following papers: Tae, J., Kim, H., & Lee, Y. (2021). MLP Sin

Heejo You 22 Dec 23, 2022
METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL)

Nautilus-OCR The National Library of Luxembourg (BnL) started its first initiative in digitizing newspapers, with layout recognition and OCR on articl

National Library of Luxembourg 36 Dec 05, 2022
A simple Python library for stochastic graphical ecological models

What is Viridicle? Viridicle is a library for simulating stochastic graphical ecological models. It implements the continuous time models described in

Theorem Engine 0 Dec 04, 2021
Dynamic Graph Event Detection

DyGED Dynamic Graph Event Detection Get Started pip install -r requirements.txt TODO Paper link to arxiv, and how to cite. Twitter Weather dataset tra

Mert Koşan 3 May 09, 2022
Pyramid addon for OpenAPI3 validation of requests and responses.

Validate Pyramid views against an OpenAPI 3.0 document Peace of Mind The reason this package exists is to give you peace of mind when providing a REST

Pylons Project 79 Dec 30, 2022
Official implementation of the ICML2021 paper "Elastic Graph Neural Networks"

ElasticGNN This repository includes the official implementation of ElasticGNN in the paper "Elastic Graph Neural Networks" [ICML 2021]. Xiaorui Liu, W

liuxiaorui 34 Dec 04, 2022
Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions

Using Clinical Drug Representations for Improving Mortality and Length of Stay Predictions Usage Clone the code to local. https://github.com/tanlab/MI

Computational Biology and Machine Learning lab @ TOBB ETU 3 Oct 18, 2022
Code for "AutoMTL: A Programming Framework for Automated Multi-Task Learning"

AutoMTL: A Programming Framework for Automated Multi-Task Learning This is the website for our paper "AutoMTL: A Programming Framework for Automated M

Ivy Zhang 40 Dec 04, 2022
mmfewshot is an open source few shot learning toolbox based on PyTorch

OpenMMLab FewShot Learning Toolbox and Benchmark

OpenMMLab 514 Dec 28, 2022
Official Pytorch implementation for 2021 ICCV paper "Learning Motion Priors for 4D Human Body Capture in 3D Scenes" and trained models / data

Learning Motion Priors for 4D Human Body Capture in 3D Scenes (LEMO) Official Pytorch implementation for 2021 ICCV (oral) paper "Learning Motion Prior

165 Dec 19, 2022
Code for "Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations"

Infinitely Deep Bayesian Neural Networks with SDEs This library contains JAX and Pytorch implementations of neural ODEs and Bayesian layers for stocha

Winnie Xu 95 Nov 26, 2021
An implementation of MobileFormer

MobileFormer An implementation of MobileFormer proposed by Yinpeng Chen, Xiyang Dai et al. Including [1] Mobile-Former proposed in:

slwang9353 62 Dec 28, 2022
Multi-Horizon-Forecasting-for-Limit-Order-Books

Multi-Horizon-Forecasting-for-Limit-Order-Books This jupyter notebook is used to demonstrate our work, Multi-Horizon Forecasting for Limit Order Books

Zihao Zhang 116 Dec 23, 2022
Piotr - IoT firmware emulation instrumentation for training and research

Piotr: Pythonic IoT exploitation and Research Introduction to Piotr Piotr is an emulation helper for Qemu that provides a convenient way to create, sh

Damien Cauquil 51 Nov 09, 2022
This repository contains the implementation of the paper: "Towards Frequency-Based Explanation for Robust CNN"

RobustFreqCNN About This repository contains the implementation of the paper "Towards Frequency-Based Explanation for Robust CNN" arxiv. It primarly d

Sarosij Bose 2 Jan 23, 2022
Bayesian Optimization using GPflow

Note: This package is for use with GPFlow 1. For Bayesian optimization using GPFlow 2 please see Trieste, a joint effort with Secondmind. GPflowOpt GP

GPflow 257 Dec 26, 2022
Full-featured Decision Trees and Random Forests learner.

CID3 This is a full-featured Decision Trees and Random Forests learner. It can save trees or forests to disk for later use. It is possible to query tr

Alejandro Penate-Diaz 3 Aug 15, 2022
Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021)

Flexible Networks for Learning Physical Dynamics of Deformable Objects (2021) By Jinhyung Park, Dohae Lee, In-Kwon Lee from Yonsei University (Seoul,

Jinhyung Park 0 Jan 09, 2022
Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation

Tiny-NewsRec The source codes for our paper "Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation". Requirements PyTorch == 1.6.0 Tensor

Yang Yu 3 Dec 07, 2022