Summary of datasets in the intrusion detection field

2022-07-06 05:43:00 Charming pie star

Reference

The data in this article are taken from the following paper:

Yang, Zhen, et al. “A systematic literature review of methods and datasets for anomaly-based network intrusion detection.” Computers & Security (2022): 102675.

Terminology

  • emulated: network traffic generated in an experimental environment
  • real: network traffic captured in a real-world setting

The values in parentheses after each dataset name correspond to:

  1. Year the dataset was published
  2. Whether the traffic is emulated or real
  3. Total number of records in the dataset
  4. Whether the data is labeled
  5. Number of data categories
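
The five-field convention above is regular enough to parse mechanically. The following is a minimal sketch, assuming each heading has the form `NAME(year / emulated-or-real / size / yes-or-no / categories)`; the names `DatasetSummary` and `parse_heading` are hypothetical, and non-numeric values such as "huge" or "unknown" are mapped to `None`.

```python
import re
from typing import NamedTuple, Optional


class DatasetSummary(NamedTuple):
    name: str
    year: int
    source: str                 # "emulated" or "real"
    size: Optional[int]         # None when the size is "huge" or "unknown"
    labeled: bool
    categories: Optional[int]   # None when the count is "unknown"


# Dataset name, followed by the parenthesized, slash-separated fields.
_HEADING = re.compile(r"^(?P<name>.+?)\((?P<fields>[^()]*)\)\s*$")


def _to_int(text: str) -> Optional[int]:
    """Return the integer in `text`, or None for values such as 'huge'."""
    digits = text.replace(",", "")
    return int(digits) if digits.isdigit() else None


def parse_heading(line: str) -> DatasetSummary:
    """Parse one heading line written in the convention listed above."""
    match = _HEADING.match(line.strip())
    if match is None:
        raise ValueError(f"not a dataset heading: {line!r}")
    fields = [part.strip() for part in match.group("fields").split("/")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    year, source, size, labeled, categories = fields
    return DatasetSummary(
        name=match.group("name").strip(),
        year=int(year),
        source=source,
        size=_to_int(size),
        labeled=labeled.lower() == "yes",
        categories=_to_int(categories),
    )
```

Calling `parse_heading` on every heading in this article would give a small table that can be filtered, for example, to keep only the real-traffic datasets with a known size.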

KDD99(1999 / emulated / 5,00,000 / yes / 4)

The KDD99 dataset was created by Lee and Stolfo (2000) from the DARPA network dataset files. It covers seven weeks of network traffic and contains about 4.9 million records. Attacks are divided into four types: (1) user to root (U2R); (2) remote to local (R2L); (3) probing; (4) DoS. Each instance is represented by 41 features in three categories: (1) basic; (2) traffic; (3) content. Basic features are extracted from the TCP/IP connection. Traffic features are divided into same-host features and same-service features. Content features relate to suspicious behavior in the data portion of the traffic. KDD99 is the most widely used dataset for evaluating intrusion detection models.

Dataset Links :http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
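
To make the four attack types concrete, here is a minimal sketch that groups raw KDD99 labels into DoS, U2R, R2L, and Probe. The label-to-category table follows the attack-type list distributed with the KDD Cup 99 training data, not the paper cited above, and the function names are hypothetical. Labels in the raw files end with a period (for example `smurf.`), which the sketch strips.

```python
from collections import Counter
from typing import Iterable

# Attack names from the KDD Cup 99 training data, grouped by category.
ATTACK_CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "buffer_overflow": "U2R", "loadmodule": "U2R",
    "perl": "U2R", "rootkit": "U2R",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "multihop": "R2L", "phf": "R2L", "spy": "R2L",
    "warezclient": "R2L", "warezmaster": "R2L",
    "ipsweep": "Probe", "nmap": "Probe",
    "portsweep": "Probe", "satan": "Probe",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to its attack category."""
    name = label.strip().rstrip(".")
    if name == "normal":
        return "normal"
    # Attacks that appear only in the test data are not in the table above.
    return ATTACK_CATEGORY.get(name, "unknown")


def category_counts(labels: Iterable[str]) -> Counter:
    """Count how many labels fall into each category."""
    return Counter(categorize(label) for label in labels)
```

Because the test portion of KDD99 contains attack names that are absent from the training data, the sketch returns "unknown" for them rather than guessing a category.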

NSL-KDD(2009 / emulated / 148,517 / yes / 4)

NSL-KDD was proposed to resolve some inherent problems of the KDD99 dataset. Although this new version of the KDD dataset still suffers from some of the problems discussed by Tavallaee et al. (2009) and may not be a perfect representative of existing real networks, the lack of public datasets for network-based IDSs means it can still serve as an effective benchmark that helps researchers compare different intrusion detection methods. In addition, the number of records in the NSL-KDD training and test sets is reasonable, which makes it affordable to run experiments on the complete set without randomly selecting a small portion. As a result, evaluation results from different research works are consistent and comparable.

Dataset Links :https://www.unb.ca/cic/datasets/nsl.html

UNSW-NB15(2015 / emulated / 2,540,044 / yes / 9)

The UNSW-NB15 dataset was created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). It is widely used because of its variety of novel attacks. The attack types are Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. A partition of the data is provided as a training set of 175,341 records and a test set of 82,332 records.

Dataset Links :https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys?path=%2FUNSW-NB15%20-%20CSV%20Files

CICIDS2017(2017 / emulated / 2,830,743 / yes / 7)

The CICIDS2017 dataset contains benign traffic and common attacks. It includes the raw data (PCAP) and the results of network traffic analysis (CSV files), with flows labeled by timestamp, source and destination IP, source and destination port, protocol, and attack. The researchers used the B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate benign background traffic. The dataset covers the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols. The attacks include brute-force FTP, brute-force SSH, DoS, Heartbleed, web attacks, infiltration, botnets, and DDoS.

Dataset Links :https://www.unb.ca/cic/datasets/ids-2017.html

CICDDoS2019(2019 / emulated / huge / yes / 11)

The CICDDoS2019 dataset contains recent DDoS attacks and resembles real-world data. It includes the results of network traffic analysis using CICFlowMeter-V3, with flows labeled by timestamp, source and destination IP, source and destination port, protocol, and attack.

Dataset Links :https://www.unb.ca/cic/datasets/ddos-2019.html

Kyoto 2006+(2006 / real / unknown / yes / unknown)

The Kyoto 2006+ dataset is a publicly available honeypot dataset of real network traffic. It contains only a small amount and a narrow range of realistic, normal user behavior. The researchers converted packet-based traffic into a new format called sessions. Each session has 24 attributes: 14 are statistical features inspired by the KDD CUP 99 dataset, and the remaining 10 are typical traffic-based attributes such as IP address (anonymized), port, and duration. The data were collected over three years and comprise about 93 million sessions.

Dataset Links :http://www.takakura.com/Kyoto_data/

NDSec-1(2016 / emulated / huge / yes / 8)

The NDSec-1 dataset contains traces and log files of network attacks synthesized by researchers from network facilities. It is publicly available and was captured in packet-based format in 2016. It also contains additional syslog and Windows event log information. The attack mix includes botnets, brute force (against FTP, HTTP, and SSH), DoS (HTTP, SYN, and UDP flooding), exploits, port scans, spoofing, and XSS/SQL injection.

Dataset Links :https://www2.hs-fulda.de/NDSec/NDSec-1/Files/

CTU-13(2014 / real / huge / yes / 7)

The CTU-13 dataset was captured in 2013 and is provided in packet, unidirectional flow, and bidirectional flow formats. It was captured in a university network, and its 13 scenarios include different botnet attacks. More information about the infected hosts is available on the website. Flows are labeled in three stages: 1) all traffic to and from infected hosts is labeled as botnet; 2) traffic that matches specific filters is labeled as normal; 3) the remaining traffic is labeled as background. Consequently, background traffic may be either normal or malicious.

Dataset Links :http://mcfp.weebly.com/
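
The three-stage labeling described for CTU-13 can be sketched as a simple ordered rule. This is an illustration only, not the dataset authors' tooling: the `Flow` fields, the `label_flow` function, and the way the "normal" filter is passed in as a predicate are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Flow:
    """Hypothetical minimal flow record used only for this sketch."""
    src_ip: str
    dst_ip: str
    dst_port: int


def label_flow(
    flow: Flow,
    infected_hosts: Set[str],
    is_known_normal: Callable[[Flow], bool],
) -> str:
    """Apply the three labeling stages in order and return the label."""
    # Stage 1: any traffic to or from an infected host is botnet traffic.
    if flow.src_ip in infected_hosts or flow.dst_ip in infected_hosts:
        return "botnet"
    # Stage 2: traffic matching the supplied filter is normal.
    if is_known_normal(flow):
        return "normal"
    # Stage 3: everything else is background (normal or malicious).
    return "background"
```

The order of the checks matters: a flow between a trusted host and an infected host is labeled botnet because stage 1 runs before the normal filter, and anything that neither rule claims stays unverified background.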

BoT-IoT(2019 / real / 73,360,900 / yes / 2)

The BoT-IoT dataset contains more than 72 million records, including DDoS, DoS, OS and service scan, keylogging, and data exfiltration attacks. The Node-RED tool was used to simulate the network behavior of IoT devices. MQTT, a lightweight communication protocol, was used to link machine-to-machine (M2M) communications. The IoT scenarios of the testbed are a weather station, a smart fridge, motion-activated lights, a remotely activated garage door, and a smart thermostat.

Dataset Links :https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/bot_iot.php

IoT-23(2020 / real / unknown / yes / 20)

The IoT-23 dataset consists of 23 captures (called scenarios) of IoT network traffic: 20 captures (PCAP files) from infected IoT devices and three captures of real IoT device traffic. In each malicious scenario, a malware sample was run on a Raspberry Pi; the malware used multiple protocols and performed different actions. The benign scenarios capture the network traffic of three real IoT devices: a Philips HUE smart LED lamp, an Amazon Echo home intelligent personal assistant, and a Somfy smart door lock. Both malicious and benign scenarios ran in a controlled network environment with an unrestricted Internet connection, like any real IoT device.

Dataset Links :https://mcfp.felk.cvut.cz/publicDatasets/IoT-23-Dataset/iot_23_datasets_small.tar.gz

ICML-09(2009 / real / 2,400,000 / yes / 1)

Dataset Links :http://www.sysnet.ucsd.edu/projects/url/

CDX(2009 / real / 5771 / yes / 2)

Dataset Links :https://www.usma.edu/centers-and-research/cyber-research-center/data-sets

ISOT Botnet(2010 / real / 1,675,424 / yes / unknown)

Dataset Links :https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php

ISCX-IDS(2012 / real / 2,450,324 / yes / unknown)

Dataset Links :https://www.unb.ca/cic/datasets/ids.html

Botnet-2014(2014 / real / 283,770 / yes / 16)

Dataset Links :https://www.unb.ca/cic/datasets/botnet.html

CIDDS-001(2017 / emulated / 31,959,267 / yes / 6)

Dataset Links :http://www.hs-coburg.de/cidds

CIDDS-002(2017 / emulated / 16,161,183 / yes / 5)

Dataset Links :http://www.hs-coburg.de/cidds

TRAbID(2017 / emulated / huge / yes / 2)

Dataset Links :https://secplab.ppgia.pucpr.br/?q=trabid

ISOT HTTP Botnet(2017 / emulated / huge / yes / 9)

Dataset Links :https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php

ISOT CID(2018 / real / 36,938,985 / yes / 18)

Dataset Links :https://www.uvic.ca/engineering/ece/isot/datasets/cloud-security/index.php

InSDN(2020 / real / unknown / yes / 20)

Dataset Links :http://aseados.ucd.ie/?p=177

CIRA-CIC-DoHBrw 2020(2020 / emulated / 1,185,286 / yes / 3)

Dataset Links :https://www.unb.ca/cic/datasets/dohbrw-2020.html

OPCUA(2020 / emulated / 107,634 / yes / 3)

Dataset Links :https://digi2-feup.github.io/OPCUADataset/

To be added …
