Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph

Overview

Open-CyKG

Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph

Journal Paper Google Scholar LinkedIn

Model Description

Open-CyKG is a framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings.

Datasets

  • OIE dataset: Malware DB
  • NER dataset: Microsoft Security Bulletins (MSB) and Cyber Threat Intelligence reports (CTI)

For dataset files please refer to the appropiate refrence in the paper.

Code:

Dependencies

  • Compatible with Python 3.x

  • Dependencies can be installed as specified in Block 1 in the respective notebooks.

  • All the code was implemented on Google Colab using GPU. Please ensure that you are using the version as specified in the "Ïnstallion and Drives" block.

  • Make sure to adapt the code based on your dataset and choice of word embeddings.

  • To utlize CRF in NER model using Keras; plase make sure to:

    -- Use tensorFlow version and Keras version:

    -- In tensorflow_backend.py and Optimizer.py write down those 2 liness ---> then restart runtime

      ```
      import tensorflow.compat.v1 as tf
      tf.disable_v2_behavior()
      ```
    

For more details on the how the exact process was carried out and the final hyper-parameters used; please refer to Open-CyKG paper.

Citing:

Please cite Open-CyKG if you use any of this material in your work.

I. Sarhan and M. Spruit, Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph, Knowledge-Based Systems (2021), doi: https://doi.org/10.1016/j.knosys.2021.107524.

@article{SARHAN2021107524,
title = {Open-CyKG: An Open Cyber Threat Intelligence Knowledge Graph},
journal = {Knowledge-Based Systems},
volume = {233},
pages = {107524},
year = {2021},
issn = {0950-7051},
doi = {https://doi.org/10.1016/j.knosys.2021.107524},
url = {https://www.sciencedirect.com/science/article/pii/S0950705121007863},
author = {Injy Sarhan and Marco Spruit},
keywords = {Cyber Threat Intelligence, Knowledge Graph, Named Entity Recognition, Open Information Extraction, Attention network},
abstract = {Instant analysis of cybersecurity reports is a fundamental challenge for security experts as an immeasurable amount of cyber information is generated on a daily basis, which necessitates automated information extraction tools to facilitate querying and retrieval of data. Hence, we present Open-CyKG: an Open Cyber Threat Intelligence (CTI) Knowledge Graph (KG) framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings. As a result, security professionals can execute queries to retrieve valuable information from the Open-CyKG framework. Experimental results demonstrate that our proposed components that build up Open-CyKG outperform state-of-the-art models.11Our implementation of Open-CyKG is publicly available at https://github.com/IS5882/Open-CyKG.}
}

Implementation Refrences:

  • Contextualized word embediings: link to Flairs word embedding documentation, Hugging face link of all pretrained models https://huggingface.co/transformers/v2.3.0/pretrained_models.html
  • Functions in block 3&9 are originally refrenced from the work of Stanvosky et al. Please refer/cite his work, with exception of some modification in the functions Stanovsky, Gabriel, et al. "Supervised open information extraction." Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
  • OIE implements Bahdanau attention (https://arxiv.org/pdf/1409.0473.pdf). Towards Data Science Blog
  • NER refrence blog
  • Knowledge Graph fusion motivated by the work of CESI Vashishth, Shikhar, Prince Jain, and Partha Talukdar. "Cesi: Canonicalizing open knowledge bases using embeddings and side information." Proceedings of the 2018 World Wide Web Conference. 2018..
  • Neo4J was used for Knowledge Graph visualization.

Please cite the appropriate reference(s) in your work

Owner
Injy Sarhan
Injy Sarhan
Code for visualizing the loss landscape of neural nets

Visualizing the Loss Landscape of Neural Nets This repository contains the PyTorch code for the paper Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer

Tom Goldstein 2.2k Jan 09, 2023
Official Keras Implementation for UNet++ in IEEE Transactions on Medical Imaging and DLMIA 2018

UNet++: A Nested U-Net Architecture for Medical Image Segmentation UNet++ is a new general purpose image segmentation architecture for more accurate i

Zongwei Zhou 1.8k Dec 27, 2022
Get the partition that a file belongs and the percentage of space that consumes

tinos_eisai_sy Get the partition that a file belongs and the percentage of space that consumes (works only with OSes that use the df command) tinos_ei

Konstantinos Patronas 6 Jan 24, 2022
Streamlit component for TensorBoard, TensorFlow's visualization toolkit

streamlit-tensorboard This is a work-in-progress, providing a function to embed TensorBoard, TensorFlow's visualization toolkit, in Streamlit apps. In

Snehan Kekre 27 Nov 13, 2022
Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation

Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation This is the inference codes of Context-Aware Image Matting for Simultaneo

Qiqi Hou 125 Oct 22, 2022
Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations

Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations Requirements The code is implemented in Python and requires

1 Nov 03, 2021
Retinal vessel segmentation based on GT-UNet

Retinal vessel segmentation based on GT-UNet Introduction This project is a retinal blood vessel segmentation code based on UNet-like Group Transforme

Kent0n 27 Dec 18, 2022
Deep Learning Algorithms for Hedging with Frictions

Deep Learning Algorithms for Hedging with Frictions This repository contains the Forward-Backward Stochastic Differential Equation (FBSDE) solver and

Xiaofei Shi 3 Dec 22, 2022
Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun

ARAE Code for the paper "Adversarially Regularized Autoencoders (ICML 2018)" by Zhao, Kim, Zhang, Rush and LeCun https://arxiv.org/abs/1706.04223 Disc

Junbo (Jake) Zhao 399 Jan 02, 2023
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility

Tensorpack is a neural network training interface based on TensorFlow. Features: It's Yet Another TF high-level API, with speed, and flexibility built

Tensorpack 6.2k Jan 01, 2023
A simple Rock-Paper-Scissors game using CV in python

ML18_Rock-Paper-Scissors-using-CV A simple Rock-Paper-Scissors game using CV in python For IITISOC-21 Rules and procedure to play the interactive game

Anirudha Bhagwat 3 Aug 08, 2021
No Code AI/ML platform

NoCodeAIML No Code AI/ML platform - Community Edition Video credits: Uday Kiran Typical No Code AI/ML Platform will have features like drag and drop,

Bhagvan Kommadi 5 Jan 28, 2022
Pytorch implementation of MLP-Mixer with loading pre-trained models.

MLP-Mixer-Pytorch PyTorch implementation of MLP-Mixer: An all-MLP Architecture for Vision with the function of loading official ImageNet pre-trained p

Qiushi Yang 2 Sep 29, 2022
You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors

You Only Hypothesize Once: Point Cloud Registration with Rotation-equivariant Descriptors In this paper, we propose a novel local descriptor-based fra

Haiping Wang 80 Dec 15, 2022
An open source Jetson Nano baseboard and tools to design your own.

My Jetson Nano Baseboard This basic baseboard gives the user the foundation and the flexibility to design their own baseboard for the Jetson Nano. It

NVIDIA AI IOT 57 Dec 29, 2022
Semantic Segmentation with Pytorch-Lightning

This is a simple demo for performing semantic segmentation on the Kitti dataset using Pytorch-Lightning and optimizing the neural network by monitoring and comparing runs with Weights & Biases.

Boris Dayma 58 Nov 18, 2022
FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics

FusionNet_Pytorch FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics Requirements Pytorch 0.1.11 Pyt

Choi Gunho 102 Dec 13, 2022
An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax

Simple Transformer An implementation of the "Attention is all you need" paper without extra bells and whistles, or difficult syntax. Note: The only ex

29 Jun 16, 2022
Check out the StyleGAN repo and place it in the same directory hierarchy as the present repo

Variational Model Inversion Attacks Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani Most commands are in run_scripts. W

Jackson Wang 15 Dec 26, 2022
GndNet: Fast ground plane estimation and point cloud segmentation for autonomous vehicles using deep neural networks.

GndNet: Fast Ground plane Estimation and Point Cloud Segmentation for Autonomous Vehicles. Authors: Anshul Paigwar, Ozgur Erkent, David Sierra Gonzale

Anshul Paigwar 114 Dec 29, 2022