Summary of data sets in intrusion detection field
2022-07-06 05:43:00 【Charming pie star】
Reference
The data in this article are taken from the following paper:
Yang, Zhen, et al. “A systematic literature review of methods and datasets for anomaly-based network intrusion detection.” Computers & Security (2022): 102675.
Terminology
- emulated: network traffic generated in an experimental environment
- real: network traffic captured in a real-world setting
The values in parentheses after each dataset name are, in order:
- Year the dataset was published
- Whether the dataset is emulated or real
- Total number of records in the dataset
- Whether the data is labeled
- Total number of categories
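The parenthesized summary that follows each dataset name can be read mechanically. The snippet below is a minimal sketch of such a parser, assuming the five slash-separated fields described above; the names `DatasetSummary` and `parse_summary` are hypothetical and are not part of the cited paper or of any dataset tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetSummary:
    name: str
    year: int
    source: str              # "emulated" or "real"
    records: Optional[int]   # None when the summary says "huge" or "unknown"
    labeled: bool
    classes: Optional[int]   # None when the summary says "unknown"


# Dataset name, then one parenthesized group holding the five fields.
_PATTERN = re.compile(r"^(?P<name>.+?)\s*\((?P<fields>[^()]*)\)\s*$")


def _to_int(text: str) -> Optional[int]:
    """Convert '148,517' to 148517; return None for non-numeric values."""
    digits = text.replace(",", "")
    return int(digits) if digits.isdigit() else None


def parse_summary(line: str) -> DatasetSummary:
    """Parse a line such as 'NSL-KDD(2009 / emulated / 148,517 / yes / 4)'."""
    match = _PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a dataset summary line: {line!r}")
    fields = [field.strip() for field in match.group("fields").split("/")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    year, source, records, labeled, classes = fields
    return DatasetSummary(
        name=match.group("name"),
        year=int(year),
        source=source,
        records=_to_int(records),
        labeled=labeled.lower() == "yes",
        classes=_to_int(classes),
    )
```

For example, `parse_summary("NSL-KDD(2009 / emulated / 148,517 / yes / 4)").records` evaluates to `148517`, while entries whose size is given as "huge" or "unknown" yield `None` for that field.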
KDD99(1999 / emulated / 5,00,000 / yes / 4)
The KDD99 dataset was created by Lee and Stolfo (2000) from the DARPA network dataset files. It covers seven weeks of network traffic and contains about 4.9 million records. Attack types are divided into: (1) user to root (U2R); (2) remote to local (R2L); (3) probing; (4) DoS. Each instance is represented by 41 features in three categories: (1) basic; (2) traffic; (3) content. Basic features are extracted from TCP/IP connections. Traffic features are split into those computed over connections to the same host and those computed over connections to the same service. Content features relate to suspicious behavior in the data portion of a connection. KDD99 is the most widely used dataset for evaluating intrusion detection models.
Dataset link: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
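To make the four attack categories above (U2R, R2L, probing, DoS) concrete, here is a minimal sketch that maps the per-record attack names found in the KDD Cup 99 training data to those categories and counts them. The name-to-category table follows the usual KDD Cup 99 task description rather than this article, and the input format (comma-separated lines with 41 features followed by a label that ends in a period, such as `smurf.`) is an assumption about the raw files. The helper names `categorize` and `count_categories` are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Attack name -> category, per the KDD Cup 99 task description
# (training-set attack types only; not taken from the article itself).
ATTACK_CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to 'DoS', 'Probe', 'R2L', 'U2R' or 'normal'."""
    name = label.strip().rstrip(".").lower()  # assumed trailing period in raw labels
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")


def count_categories(lines: Iterable[str]) -> Counter:
    """Count records per category; the label is assumed to be the last CSV field."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        label = line.split(",")[-1]
        counts[categorize(label)] += 1
    return counts
```

Attack names that appear only in the test data are not in the table and fall into `"unknown"`, so the table would need extending before evaluating on the test set.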
NSL-KDD(2009 / emulated / 148,517 / yes / 4)
NSL-KDD was proposed to resolve some inherent problems of the KDD99 dataset. Although this newer version of the KDD dataset still suffers from some of the problems discussed by Tavallaee et al. (2009) and may not be a perfect representative of existing real networks, the lack of public datasets for network-based IDSs means it can still serve as an effective benchmark that helps researchers compare different intrusion detection methods. In addition, the number of records in the NSL-KDD training and test sets is reasonable, which makes it affordable to run experiments on the complete set without randomly selecting a small portion. As a result, evaluation results from different research works are consistent and comparable.
Dataset link: https://www.unb.ca/cic/datasets/nsl.html
UNSW-NB15(2015 / emulated / 2,540,044 / yes / 9)
The UNSW-NB15 dataset was created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). It is widely used because of its variety of novel attacks. The attack types are Fuzzers, Analysis, Backdoor, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. It has a training set containing 82,332 records and a test set containing 175,341 records.
Dataset link: https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys?path=2FUNSW-NB1520-20CSV20Files
CICIDS2017(2017 / emulated / 2,830,743 / yes / 7)
The CICIDS2017 dataset contains benign traffic and common attacks. It includes the raw data (PCAP) and the results of network traffic analysis, with flows labeled by timestamp, source and destination IPs, source and destination ports, protocol, and attack (CSV files). The researchers used the B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate benign background traffic. The dataset includes the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols. The attacks include brute-force FTP, brute-force SSH, DoS, Heartbleed, web attacks, infiltration, botnet, and DDoS.
Dataset link: https://www.unb.ca/cic/datasets/ids-2017.html
CICDDoS2019(2019 / emulated / huge / yes / 11)
The CICDDoS2019 dataset contains the latest DDoS attacks and resembles real-world data. It includes the results of network traffic analysis using CICFlowMeter-V3, with flows labeled by timestamp, source and destination IPs, source and destination ports, protocol, and attack.
Dataset link: https://www.unb.ca/cic/datasets/ddos-2019.html
Kyoto 2006+(2006 / real / unknown / yes / unknown)
The Kyoto 2006+ dataset is a publicly available honeypot dataset of real network traffic that contains only a small amount and a narrow range of realistic, normal user behavior. The researchers converted packet-based traffic into a new format called sessions. Each session has 24 attributes: 14 are statistical features inspired by the KDD CUP 99 dataset, and the remaining 10 are typical flow-based attributes such as IP addresses (anonymized), ports, and duration. The data were collected over three years and include about 93 million sessions.
Dataset link: http://www.takakura.com/Kyoto_data/
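The idea of turning packet-based traffic into sessions can be illustrated with a generic sketch. This is not the conversion used to build Kyoto 2006+; it only shows one common approach, assuming packets that share a 5-tuple (source IP, source port, destination IP, destination port, protocol) belong to the same session until an idle gap is exceeded. The `Packet` and `Session` types, the `to_sessions` function, and the 60-second `idle_timeout` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    timestamp: float   # seconds
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    size: int          # bytes


@dataclass
class Session:
    key: Tuple[str, int, str, int, str]
    start: float
    end: float
    packets: int = 0
    total_bytes: int = 0

    @property
    def duration(self) -> float:
        return self.end - self.start


def to_sessions(packets: Iterable[Packet], idle_timeout: float = 60.0) -> List[Session]:
    """Group packets by 5-tuple; an idle gap above idle_timeout opens a new session."""
    sessions: List[Session] = []
    latest: Dict[tuple, Session] = {}  # most recent session per 5-tuple
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        current = latest.get(key)
        if current is None or pkt.timestamp - current.end > idle_timeout:
            current = Session(key=key, start=pkt.timestamp, end=pkt.timestamp)
            latest[key] = current
            sessions.append(current)
        current.end = pkt.timestamp
        current.packets += 1
        current.total_bytes += pkt.size
    return sessions
```

The key here is directional, so the two directions of a connection end up in separate sessions; a bidirectional variant would normalize the endpoint order before building the key.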
NDSec-1(2016 / emulated / huge / yes / 8)
The NDSec-1 dataset contains traces and log files of network attacks synthesized by the researchers from network facilities. It is publicly available and was captured in packet-based format in 2016. It also contains additional syslog and Windows event log information. The attack mix includes botnet, brute force (against FTP, HTTP, and SSH), DoS (HTTP, SYN, and UDP flooding), exploits, port scans, spoofing, and XSS/SQL injection.
Dataset link: https://www2.hs-fulda.de/NDSec/NDSec-1/Files/
CTU-13(2014 / real / huge / yes / 7)
The CTU-13 dataset was captured in 2013 and is available in packet, unidirectional flow, and bidirectional flow formats. It was captured in a university network, and its 13 scenarios include different botnet attacks. More information about the infected hosts is available on the website. Flows are labeled in three stages: (1) all traffic to and from infected hosts is labeled as botnet; (2) traffic that matches specific filters is labeled as normal; (3) the remaining traffic is labeled as background. Consequently, background traffic may be either normal or malicious.
Dataset link: http://mcfp.weebly.com/
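The three-stage labeling described above amounts to an ordered sequence of checks. The following is a minimal sketch under stated assumptions: the function name `label_flow`, the `is_known_normal` callback standing in for the dataset's "specific filters", and the IP addresses in the usage example are all hypothetical and do not come from the CTU-13 documentation.

```python
from typing import Callable, Set


def label_flow(
    src_ip: str,
    dst_ip: str,
    infected_hosts: Set[str],
    is_known_normal: Callable[[str, str], bool],
) -> str:
    """Apply the three labeling stages in order and return the flow's label."""
    # Stage 1: any traffic to or from an infected host is botnet traffic.
    if src_ip in infected_hosts or dst_ip in infected_hosts:
        return "botnet"
    # Stage 2: traffic matching the (assumed) normal filter is normal.
    if is_known_normal(src_ip, dst_ip):
        return "normal"
    # Stage 3: everything else is background, which may be normal or malicious.
    return "background"


# Usage with made-up addresses:
infected = {"10.0.0.5"}
trusted = {"10.0.0.20", "10.0.0.21"}


def both_trusted(src: str, dst: str) -> bool:
    return src in trusted and dst in trusted


print(label_flow("10.0.0.5", "8.8.8.8", infected, both_trusted))
print(label_flow("10.0.0.20", "10.0.0.21", infected, both_trusted))
print(label_flow("10.0.0.30", "8.8.8.8", infected, both_trusted))
```

The order of the checks matters: a flow between an infected host and a trusted host is labeled botnet because stage 1 runs first.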
BoT-IoT(2019 / real / 73,360,900 / yes / 2)
The BoT-IoT dataset contains more than 72 million records, including DDoS, DoS, OS and service scan, keylogging, and data exfiltration attacks. The Node-RED tool was used to simulate the network behavior of IoT devices. MQTT, a lightweight communication protocol, was used to link machine-to-machine (M2M) communications. The IoT scenarios in the testbed are a weather station, a smart fridge, motion-activated lights, a remotely activated garage door, and a smart thermostat.
Dataset link: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/bot_iot.php
IoT-23(2020 / real / unknown / yes / 20)
The IoT-23 dataset consists of 23 network captures (called scenarios) of IoT traffic: 20 captures (PCAP files) from infected IoT devices and three captures of real IoT network traffic. In each malicious scenario, a malware sample was run on a Raspberry Pi; the malware used multiple protocols and performed different actions. The benign scenarios capture the network traffic of three real IoT devices: a Philips HUE smart LED lamp, an Amazon Echo home intelligent personal assistant, and a Somfy smart door lock. Both the malicious and the benign scenarios ran in a controlled network environment with an unrestricted Internet connection, just like any real IoT device.
Dataset link: https://mcfp.felk.cvut.cz/publicDatasets/IoT-23-Dataset/iot_23_datasets_small.tar.gz
ICML-09(2009 / real / 2,400,000 / yes / 1)
Dataset link: http://www.sysnet.ucsd.edu/projects/url/
CDX(2009 / real / 5771 / yes / 2)
Dataset link: https://www.usma.edu/centers-and-research/cyber-research-center/data-sets
ISOT Botnet(2010 / real / 1,675,424 / yes / unknown)
Dataset link: https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php
ISCX-IDS(2012 / real / 2,450,324 / yes / unknown)
Dataset link: https://www.unb.ca/cic/datasets/ids.html
Botnet-2014(2014 / real / 283,770 / yes / 16)
Dataset link: https://www.unb.ca/cic/datasets/botnet.html
CIDDS-001(2017 / emulated / 31,959,267 / yes / 6)
Dataset link: http://www.hs-coburg.de/cidds
CIDDS-002(2017 / emulated / 16,161,183 / yes / 5)
Dataset link: http://www.hs-coburg.de/cidds
TRAbID(2017 / emulated / huge / yes / 2)
Dataset link: https://secplab.ppgia.pucpr.br/?q=trabid
ISOT HTTP Botnet(2017 / emulated / huge / yes / 9)
Dataset link: https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php
ISOT CID(2018 / real / 36,938,985 / yes / 18)
Dataset link: https://www.uvic.ca/engineering/ece/isot/datasets/cloud-security/index.php
InSDN(2020 / real / unknown / yes / 20)
Dataset link: http://aseados.ucd.ie/?p=177
CIRA-CIC-DoHBrw 2020(2020 / emulated / 1,185,286 / yes / 3)
Dataset link: https://www.unb.ca/cic/datasets/dohbrw-2020.html
OPCUA(2020 / emulated / 107,634 / yes / 3)
Dataset link: https://digi2-feup.github.io/OPCUADataset/
To be added …