Summary of datasets in the intrusion detection field
2022-07-06 05:43:00 【Charming pie star】
Reference
The data in this post are taken from the following paper:
Yang, Zhen, et al. “A systematic literature review of methods and datasets for anomaly-based network intrusion detection.” Computers & Security (2022): 102675.
Terminology
- emulated: network traffic generated in an experimental environment
- real: network traffic captured in a real-world setting
The values in parentheses after each dataset name are, in order:
- Year the dataset was published
- Whether the dataset is emulated or real
- Total number of records in the dataset
- Whether the data are labeled
- Total number of data categories
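The header format described above is regular enough to parse mechanically. The following is a minimal sketch, assuming the headers keep the `name(year / origin / size / labeled / categories)` shape used in this post; the `DatasetSummary` class and `parse_header` function are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetSummary:
    name: str
    year: int
    origin: str                 # "emulated" or "real"
    size: Optional[int]         # None when the post says "huge" or "unknown"
    labeled: bool
    categories: Optional[int]   # None when the post says "unknown"


# Dataset name, then five slash-separated fields inside one pair of parentheses.
HEADER = re.compile(r"^(?P<name>.+?)\((?P<fields>[^()]*)\)$")


def _to_int(text: str) -> Optional[int]:
    """Return the integer value of '148,517'-style text, or None if not numeric."""
    digits = text.replace(",", "")
    return int(digits) if digits.isdigit() else None


def parse_header(line: str) -> DatasetSummary:
    match = HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a dataset header: {line!r}")
    fields = [field.strip() for field in match.group("fields").split("/")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    year, origin, size, labeled, categories = fields
    return DatasetSummary(
        name=match.group("name").strip(),
        year=int(year),
        origin=origin,
        size=_to_int(size),
        labeled=(labeled == "yes"),
        categories=_to_int(categories),
    )
```

For example, `parse_header("NSL-KDD(2009 / emulated / 148,517 / yes / 4)")` yields a record with `size=148517`, while a header whose size is given as "huge" or "unknown" yields `size=None`.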
KDD99(1999 / emulated / 5,00,000 / yes / 4)
The KDD99 dataset was created by Lee and Stolfo (2000) from the DARPA network dataset files. It covers seven weeks of network traffic and contains about 4.9 million records. Attacks are divided into four types: (1) user to root (U2R); (2) remote to local (R2L); (3) probing; (4) DoS. Each instance is represented by 41 features in three categories: (1) basic; (2) traffic; (3) content. Basic features are extracted from TCP/IP connections. Traffic features are split into same-host features and same-service features. Content features relate to suspicious behavior in the data portion of the traffic. KDD99 is the most widely used dataset for evaluating intrusion detection models.
Dataset link: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
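As a rough illustration of the four attack types, the sketch below maps raw KDD99 connection labels to their category. The label names are those of the public KDD Cup 99 training data rather than anything listed in this post, and the `categorize` helper is a hypothetical name.

```python
# Attack labels from the KDD Cup 99 training data, grouped by attack type.
# This grouping is an assumption based on the public dataset documentation.
ATTACK_CATEGORY = {
    # DoS
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # User to root
    "buffer_overflow": "U2R", "loadmodule": "U2R",
    "perl": "U2R", "rootkit": "U2R",
    # Remote to local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L",
    "multihop": "R2L", "phf": "R2L", "spy": "R2L",
    "warezclient": "R2L", "warezmaster": "R2L",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe",
    "portsweep": "Probe", "satan": "Probe",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to one of the four attack types."""
    name = label.strip().rstrip(".").lower()  # raw labels end with a period
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")
```

Labels that only appear in the test data are not covered here and fall through to `"unknown"`.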
NSL-KDD(2009 / emulated / 148,517 / yes / 4)
NSL-KDD was proposed to resolve some inherent problems of the KDD99 dataset. This newer version of the KDD dataset still has some of the problems discussed by Tavallaee et al. (2009) and may not be a perfect representative of existing real networks. However, because public datasets for network-based IDSs are scarce, it can still serve as an effective benchmark that helps researchers compare different intrusion detection methods. In addition, the number of records in the NSL-KDD training and test sets is reasonable, which makes it affordable to run experiments on the complete set without randomly selecting a small portion. As a result, evaluation results from different research works are consistent and comparable.
Dataset link: https://www.unb.ca/cic/datasets/nsl.html
UNSW-NB15(2015 / emulated / 2,540,044 / yes / 9)
The UNSW-NB15 dataset was created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). It is widely used because of its variety of novel attacks. The attack types are Fuzzers, Analysis, Backdoor, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. It provides a training set of 175,341 records and a test set of 82,332 records.
Dataset link: https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys?path=%2FUNSW-NB15%20-%20CSV%20Files
CICIDS2017(2017 / emulated / 2,830,743 / yes / 7)
The CICIDS2017 dataset contains benign traffic and common attacks. It includes the raw data (PCAPs) and the results of network traffic analysis (CSV files), with flows labeled by timestamp, source and destination IPs, source and destination ports, protocol, and attack. The researchers used the B-Profile system (Sharafaldin et al., 2016) to profile the abstract behavior of human interactions and generate benign background traffic. The dataset covers the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH, and email protocols. The attacks include brute-force FTP, brute-force SSH, DoS, Heartbleed, web attack, infiltration, botnet, and DDoS.
Dataset link: https://www.unb.ca/cic/datasets/ids-2017.html
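To make the labeled-flow CSV format more concrete, here is a minimal sketch that counts flows per label. The column names and the three sample rows are invented for illustration and are not the real CICIDS2017 schema or data; check the actual CSV headers before adapting it.

```python
import csv
import io
from collections import Counter

# Made-up sample rows; the column names are hypothetical, not the real schema.
SAMPLE_CSV = """timestamp,src_ip,dst_ip,src_port,dst_port,protocol,label
2017-01-01 09:00:01,10.0.0.5,10.0.0.50,49152,21,6,brute-force-ftp
2017-01-01 09:00:02,10.0.0.7,10.0.0.50,49153,443,6,benign
2017-01-01 09:00:03,10.0.0.8,10.0.0.50,49154,80,6,benign
"""


def count_labels(csv_text: str, label_column: str = "label") -> Counter:
    """Count how many flows carry each label in a labeled-flow CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column].strip() for row in reader)


if __name__ == "__main__":
    for label, count in count_labels(SAMPLE_CSV).most_common():
        print(f"{label}: {count}")
```

A per-label count like this is a quick first check for class imbalance before training a detector on flow data.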
CICDDoS2019(2019 / emulated / huge / yes / 11)
The CICDDoS2019 dataset contains up-to-date DDoS attacks and resembles real-world data. It includes the results of network traffic analysis using CICFlowMeter-V3, with flows labeled by timestamp, source and destination IPs, source and destination ports, protocol, and attack.
Dataset link: https://www.unb.ca/cic/datasets/ddos-2019.html
Kyoto 2006+(2006 / real / unknown / yes / unknown)
The Kyoto 2006+ dataset is a publicly available honeypot dataset of real network traffic. It contains only a small amount and a narrow range of realistic, normal user behavior. The researchers converted packet-based traffic into a new format called sessions. Each session has 24 attributes: 14 are statistical features inspired by the KDD CUP 99 dataset, and the remaining 10 are typical flow-based attributes such as IP addresses (anonymized), ports, and duration. The data were collected over three years and include about 93 million sessions.
Dataset link: http://www.takakura.com/Kyoto_data/
NDSec-1(2016 / emulated / huge / yes / 8)
The NDSec-1 dataset contains traces and log files of network attacks synthesized by researchers from network facilities. It is publicly available and was captured in packet-based format in 2016. It also contains syslog and Windows event log information. The attacks include botnets, brute force (against FTP, HTTP, and SSH), DoS (HTTP, SYN, and UDP flooding), exploits, port scans, spoofing, and XSS/SQL injection.
Dataset link: https://www2.hs-fulda.de/NDSec/NDSec-1/Files/
CTU-13(2014 / real / huge / yes / 7)
The CTU-13 dataset was captured in 2013 and is available in packet, unidirectional flow, and bidirectional flow formats. It was captured in a university network, and its 13 scenarios include different botnet attacks. More information about the infected hosts is available on the website. Flows were labeled in three stages: 1) all traffic to and from infected hosts is labeled as botnet; 2) traffic that matches specific filters is labeled as normal; 3) the remaining traffic is labeled as background. Background traffic may therefore be either normal or malicious.
Dataset link: http://mcfp.weebly.com/
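The three-stage labeling can be sketched as a short function. This is only an illustration of the order in which the rules apply; the `label_flow` function, the `is_normal` filter, and the IP addresses are hypothetical and do not reproduce the filters actually used for CTU-13.

```python
from typing import Callable, Set


def label_flow(
    src: str,
    dst: str,
    infected_hosts: Set[str],
    is_normal: Callable[[str, str], bool],
) -> str:
    """Label one flow by applying the three stages in order."""
    # Stage 1: any traffic to or from an infected host is botnet traffic.
    if src in infected_hosts or dst in infected_hosts:
        return "botnet"
    # Stage 2: traffic matching the known-normal filter is normal.
    if is_normal(src, dst):
        return "normal"
    # Stage 3: everything else is background (may be normal or malicious).
    return "background"


if __name__ == "__main__":
    infected = {"10.0.0.66"}            # hypothetical infected host
    trusted = {"10.0.0.10"}             # hypothetical known-clean host

    def normal_filter(src: str, dst: str) -> bool:
        return src in trusted or dst in trusted

    print(label_flow("10.0.0.66", "8.8.8.8", infected, normal_filter))
    print(label_flow("10.0.0.10", "8.8.8.8", infected, normal_filter))
    print(label_flow("10.0.0.20", "8.8.8.8", infected, normal_filter))
```

Because the infected-host check runs first, a flow between an infected host and a trusted host is still labeled as botnet.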
BoT-IoT(2019 / real / 73,360,900 / yes / 2)
The BoT-IoT dataset contains more than 72 million records, including DDoS, DoS, OS and service scan, keylogging, and data exfiltration attacks. The Node-RED tool was used to simulate the network behavior of IoT devices. MQTT, a lightweight communication protocol, was used to link machine-to-machine (M2M) communications. The IoT scenarios of the testbed are a weather station, a smart fridge, motion-activated lights, a remotely activated garage door, and a smart thermostat.
Dataset link: https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/bot_iot.php
IoT-23(2020 / real / unknown / yes / 20)
The IoT-23 dataset consists of 23 captures (called scenarios) of IoT network traffic: 20 captures (PCAP files) from infected IoT devices and three captures of real IoT device traffic. In each malicious scenario, malware running on a Raspberry Pi used multiple protocols and performed different actions. The benign scenarios capture the network traffic of three real IoT devices: a Philips HUE smart LED lamp, an Amazon Echo home intelligent personal assistant, and a Somfy smart door lock. Both malicious and benign scenarios ran in a controlled network environment with unrestricted Internet access, just like any real IoT device.
Dataset link: https://mcfp.felk.cvut.cz/publicDatasets/IoT-23-Dataset/iot_23_datasets_small.tar.gz
ICML-09(2009 / real / 2,400,000 / yes / 1)
Dataset link: http://www.sysnet.ucsd.edu/projects/url/
CDX(2009 / real / 5771 / yes / 2)
Dataset link: https://www.usma.edu/centers-and-research/cyber-research-center/data-sets
ISOT Botnet(2010 / real / 1,675,424 / yes / unknown)
Dataset link: https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php
ISCX-IDS(2012 / real / 2,450,324 / yes / unknown)
Dataset link: https://www.unb.ca/cic/datasets/ids.html
Botnet-2014(2014 / real / 283,770 / yes / 16)
Dataset link: https://www.unb.ca/cic/datasets/botnet.html
CIDDS-001(2017 / emulated / 31,959,267 / yes / 6)
Dataset link: http://www.hs-coburg.de/cidds
CIDDS-002(2017 / emulated / 16,161,183 / yes / 5)
Dataset link: http://www.hs-coburg.de/cidds
TRAbID(2017 / emulated / huge / yes / 2)
Dataset link: https://secplab.ppgia.pucpr.br/?q=trabid
ISOT HTTP Botnet(2017 / emulated / huge / yes / 9)
Dataset link: https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php
ISOT CID(2018 / real / 36,938,985 / yes / 18)
Dataset link: https://www.uvic.ca/engineering/ece/isot/datasets/cloud-security/index.php
InSDN(2020 / real / unknown / yes / 20)
Dataset link: http://aseados.ucd.ie/?p=177
CIRA-CIC-DoHBrw 2020(2020 / emulated / 1,185,286 / yes / 3)
Dataset link: https://www.unb.ca/cic/datasets/dohbrw-2020.html
OPCUA(2020 / emulated / 107,634 / yes / 3)
Dataset link: https://digi2-feup.github.io/OPCUADataset/
To be added …