Summary of data sets in intrusion detection field
2022-07-06 05:43:00 【Charming pie star】
Reference
The data in this article are taken from the following paper:
Yang, Zhen, et al. “A systematic literature review of methods and datasets for anomaly-based network intrusion detection.” Computers & Security (2022): 102675.
Terminology
- emulated: network traffic generated in an experimental environment
- real: network traffic captured in real-world scenarios
The values in parentheses after each dataset name are, in order:
- Year the dataset was published
- Whether the dataset is emulated or real
- Total number of records in the dataset
- Whether the data is labeled
- Total number of categories
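The header format described above can be read mechanically. The following is a minimal sketch, assuming every header follows the `name(year / type / size / labeled / categories)` pattern used in this article; the `DatasetSummary` class and `parse_header` function are hypothetical names introduced only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class DatasetSummary:
    name: str
    year: int
    source: str      # "emulated" or "real"
    size: str        # kept as text: may be a count, "huge" or "unknown"
    labeled: bool
    categories: str  # kept as text: may be a number or "unknown"


# Dataset name, then five slash-separated fields inside one pair of parentheses.
HEADER = re.compile(r"^(?P<name>.+?)\s*\((?P<fields>[^()]*)\)\s*$")


def parse_header(line: str) -> DatasetSummary:
    """Parse one header line such as 'NSL-KDD(2009 / emulated / 148,517 / yes / 4)'."""
    match = HEADER.match(line)
    if match is None:
        raise ValueError(f"not a dataset header: {line!r}")
    fields = [field.strip() for field in match.group("fields").split("/")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    year, source, size, labeled, categories = fields
    return DatasetSummary(
        name=match.group("name").strip(),
        year=int(year),
        source=source,
        size=size,
        labeled=(labeled == "yes"),
        categories=categories,
    )


summary = parse_header("NSL-KDD(2009 / emulated / 148,517 / yes / 4)")
print(summary)
```

The size and category fields are kept as strings because several entries use "huge" or "unknown" instead of a number.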
KDD99(1999 / emulated / 5,000,000 / yes / 4)
The KDD99 dataset was created by Lee and Stolfo (2000) from the DARPA network dataset files. It covers seven weeks of network traffic and contains about 4.9 million records. Attacks are divided into four types: (1) user to root (U2R); (2) remote to local (R2L); (3) probing; (4) DoS. Each instance is represented by 41 features in three categories: (1) basic; (2) traffic; (3) content. Basic features are extracted from TCP/IP connections. Traffic features are split into same-host features and same-service features. Content features relate to suspicious behavior in the data portion of a connection. KDD99 is the most widely used dataset for evaluating intrusion detection models.
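As an illustration of the four attack types, the sketch below maps individual KDD99 attack labels to their category. It assumes the 22 attack names commonly listed for the KDD Cup 99 training data and assumes that labels in the data files end with a period (for example `smurf.`); the function name `categorize` and the category strings are hypothetical choices.

```python
# Attack names as commonly documented for the KDD Cup 99 training set (assumption).
ATTACK_CATEGORY = {
    # Denial of service
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    # Probing
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    # Remote to local
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    # User to root
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def categorize(label: str) -> str:
    """Return 'normal', one of the four attack categories, or 'unknown'."""
    name = label.strip().rstrip(".").lower()  # drop the trailing period if present
    if name == "normal":
        return "normal"
    return ATTACK_CATEGORY.get(name, "unknown")


for raw in ["normal.", "smurf.", "guess_passwd.", "rootkit."]:
    print(raw, "->", categorize(raw))
```

Labels that appear only in the test data are not covered by this table and fall through to "unknown".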
Dataset Links :http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
NSL-KDD(2009 / emulated / 148,517 / yes / 4)
NSL-KDD was proposed to resolve some inherent problems of the KDD99 dataset. This new version of the KDD dataset still suffers from some of the problems discussed by Tavallaee et al. (2009) and may not be a perfect representative of existing real networks. However, because public datasets for network-based IDSs are scarce, it can still serve as an effective benchmark to help researchers compare different intrusion detection methods. In addition, the number of records in the NSL-KDD training and test sets is reasonable, which makes it affordable to run experiments on the complete set without randomly selecting a small portion. As a result, evaluation results from different research works are consistent and comparable.
Dataset Links :https://www.unb.ca/cic/datasets/nsl.html
UNSW-NB15(2015 / emulated / 2,540,044 / yes / 9)
The UNSW-NB15 dataset was created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). It is widely used because of its variety of novel attacks. The attack types are Fuzzers, Analysis, Backdoor, DoS, Exploits, Generic, Reconnaissance, Shellcode and Worms. It provides a training set of 175,341 records and a test set of 82,332 records.
Dataset Links :https://cloudstor.aarnet.edu.au/plus/index.php/s/2DhnLGDdEECo4ys?path=2FUNSW-NB1520-20CSV20Files
CICIDS2017(2017 / emulated / 2,830,743 / yes / 7)
The CICIDS2017 dataset contains benign traffic and common attacks. It includes the raw data (PCAP) and the results of network traffic analysis (CSV files), with flows labeled by timestamp, source and destination IP, source and destination port, protocol and attack. The researchers used the B-Profile system (Sharafaldin, et al. 2016) to profile the abstract behavior of human interactions and generate benign background traffic. The dataset includes the abstract behavior of 25 users based on the HTTP, HTTPS, FTP, SSH and email protocols. The attacks include Brute Force FTP, Brute Force SSH, DoS, Heartbleed, Web Attack, Infiltration, Botnet and DDoS.
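A common first step with labeled flow CSV files like these is to check the class distribution. The sketch below counts records per label using only the standard library. The column names (`Label` and the others in the sample) and the sample rows are assumptions for illustration, not the guaranteed layout of the published files; the addresses come from documentation ranges.

```python
import csv
import io
from collections import Counter


def count_labels(csv_text: str, label_column: str = "Label") -> Counter:
    """Count flow records per label in CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    counts: Counter = Counter()
    for row in reader:
        # Strip stray whitespace around column names and values.
        cleaned = {
            key.strip(): value.strip()
            for key, value in row.items()
            if key is not None and value is not None
        }
        counts[cleaned[label_column]] += 1
    return counts


# Hypothetical sample in the spirit of a labeled flow file.
sample = (
    "Source IP, Source Port, Destination IP, Destination Port, Protocol, Label\n"
    "192.0.2.10, 51544, 198.51.100.5, 80, 6, BENIGN\n"
    "192.0.2.11, 51550, 198.51.100.5, 80, 6, DDoS\n"
    "192.0.2.12, 40222, 198.51.100.7, 22, 6, BENIGN\n"
)
print(count_labels(sample))
```

For the real files, the same function can be fed the file contents, with `label_column` adjusted to whatever the header actually uses.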
Dataset Links :https://www.unb.ca/cic/datasets/ids-2017.html
CICDDoS2019(2019 / emulated / huge / yes / 11)
The CICDDoS2019 dataset contains the latest DDoS attacks and resembles real-world data. It includes the results of network traffic analysis using CICFlowMeter-V3, with flows labeled by timestamp, source and destination IP, source and destination port, protocol and attack.
Dataset Links :https://www.unb.ca/cic/datasets/ddos-2019.html
Kyoto 2006+(2006 / real / unknown / yes / unknown)
The Kyoto 2006+ dataset is a publicly available honeypot dataset of real network traffic that contains only a small amount, and a narrow range, of realistic normal user behavior. The researchers converted the packet-based traffic into a new format called sessions. Each session has 24 attributes: 14 are statistical features inspired by the KDD CUP 99 dataset, and the remaining 10 are typical flow-based attributes such as IP address (anonymized), port and duration. The data were collected over three years and include about 93 million sessions.
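To make the idea of turning packets into sessions concrete, here is a minimal sketch that groups packets by their five-tuple and derives a few per-session attributes. This is not the conversion actually used for Kyoto 2006+, which the source does not describe in detail; the `Packet` fields, the grouping key and the derived attributes are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Packet:
    timestamp: float  # seconds
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    size: int         # bytes


def to_sessions(packets: Iterable[Packet]) -> List[Dict]:
    """Group packets by five-tuple and summarize each group as one session."""
    grouped: Dict[Tuple, List[Packet]] = defaultdict(list)
    for packet in packets:
        key = (packet.src_ip, packet.dst_ip, packet.src_port,
               packet.dst_port, packet.protocol)
        grouped[key].append(packet)

    sessions = []
    for key, group in grouped.items():
        times = [p.timestamp for p in group]
        sessions.append({
            "key": key,
            "duration": max(times) - min(times),
            "packets": len(group),
            "bytes": sum(p.size for p in group),
        })
    return sessions


packets = [
    Packet(1.0, "192.0.2.1", "198.51.100.2", 40000, 80, "tcp", 60),
    Packet(3.5, "192.0.2.1", "198.51.100.2", 40000, 80, "tcp", 1500),
    Packet(2.0, "192.0.2.9", "198.51.100.2", 40001, 22, "tcp", 80),
]
for session in to_sessions(packets):
    print(session)
```

A real converter would also need timeouts and direction handling; they are left out to keep the sketch short.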
Dataset Links :http://www.takakura.com/Kyoto_data/
NDSec-1(2016 / emulated / huge / yes / 8)
The NDSec-1 dataset contains traces and log files of network attacks synthesized by researchers from network facilities. It is publicly available and was captured in packet-based format in 2016. It also contains additional syslog and Windows event log information. The attack mix includes botnet, brute force (against FTP, HTTP and SSH), DoS (HTTP, SYN and UDP flooding), exploits, port scans, spoofing and XSS/SQL injection.
Dataset Links :https://www2.hs-fulda.de/NDSec/NDSec-1/Files/
CTU-13(2014 / real / huge / yes / 7)
The CTU-13 dataset was captured in 2013 and is available in packet, unidirectional flow and bidirectional flow formats. It was captured in a university network, and its 13 scenarios include different botnet attacks. More information about the infected hosts is available on the website. Flows were labeled in three stages: 1) all traffic to and from infected hosts is labeled as botnet; 2) traffic that matches specific filters is labeled as normal; 3) the remaining traffic is labeled as background. Background traffic may therefore be either normal or malicious.
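The three-stage labeling can be written as a short decision procedure. The sketch below is an illustration only: the `Flow` fields, the set of infected hosts and the `is_known_normal` filter are hypothetical stand-ins, since the source does not specify the actual filters.

```python
from dataclasses import dataclass
from typing import Callable, Set


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    dst_port: int


def label_flow(flow: Flow,
               infected_hosts: Set[str],
               is_known_normal: Callable[[Flow], bool]) -> str:
    """Apply the three labeling stages in order."""
    # Stage 1: any traffic to or from an infected host is botnet traffic.
    if flow.src_ip in infected_hosts or flow.dst_ip in infected_hosts:
        return "botnet"
    # Stage 2: traffic matching a specific filter is normal.
    if is_known_normal(flow):
        return "normal"
    # Stage 3: everything else is background (may be normal or malicious).
    return "background"


infected = {"192.0.2.50"}
trusted_hosts = {"192.0.2.10"}  # hypothetical filter: traffic from verified hosts


def from_trusted(flow: Flow) -> bool:
    return flow.src_ip in trusted_hosts


print(label_flow(Flow("192.0.2.50", "198.51.100.1", 6667), infected, from_trusted))
print(label_flow(Flow("192.0.2.10", "198.51.100.1", 443), infected, from_trusted))
print(label_flow(Flow("192.0.2.77", "198.51.100.1", 53), infected, from_trusted))
```

The order of the checks matters: a flow between a trusted host and an infected host is labeled botnet because stage 1 runs first.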
Dataset Links :http://mcfp.weebly.com/
BoT-IoT(2019 / real / 73,360,900 / yes / 2)
The BoT-IoT dataset contains more than 72 million records, covering DDoS, DoS, OS and service scan, keylogging and data exfiltration attacks. The Node-RED tool is used to simulate the network behavior of IoT devices. MQTT, a lightweight communication protocol, is used to link machine-to-machine (M2M) communications. The IoT scenarios of the testbed are a weather station, a smart fridge, motion-activated lights, a remotely activated garage door and a smart thermostat.
Dataset Links :https://www.unsw.adfa.edu.au/unsw-canberra-cyber/cybersecurity/ADFA-NB15-Datasets/bot_iot.php
IoT-23(2020 / real / unknown / yes / 20)
The IoT-23 dataset consists of 23 captures (called scenarios) of IoT network traffic: 20 captures (PCAP files) from infected IoT devices and 3 captures of real IoT device traffic. In each malicious scenario, malware was run on a Raspberry Pi, using multiple protocols and performing different actions. The benign scenarios capture the network traffic of three real IoT devices: a Philips HUE smart LED lamp, an Amazon Echo home intelligent personal assistant and a Somfy smart door lock. Both malicious and benign scenarios ran in a controlled network environment with an unrestricted Internet connection, like any real IoT device.
Dataset Links :https://mcfp.felk.cvut.cz/publicDatasets/IoT-23-Dataset/iot_23_datasets_small.tar.gz
ICML-09(2009 / real / 2,400,000 / yes / 1)
Dataset Links :http://www.sysnet.ucsd.edu/projects/url/
CDX(2009 / real / 5771 / yes / 2)
Dataset Links :https://www.usma.edu/centers-and-research/cyber-research-center/data-sets
ISOT Botnet(2010 / real / 1,675,424 / yes / unknown)
Dataset Links :https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php
ISCX-IDS(2012 / real / 2,450,324 / yes / unknown)
Dataset Links :https://www.unb.ca/cic/datasets/ids.html
Botnet-2014(2014 / real / 283,770 / yes / 16)
Dataset Links :https://www.unb.ca/cic/datasets/botnet.html
CIDDS-001(2017 / emulated / 31,959,267 / yes / 6)
Dataset Links :http://www.hs-coburg.de/cidds
CIDDS-002(2017 / emulated / 16,161,183 / yes / 5)
Dataset Links :http://www.hs-coburg.de/cidds
TRAbID(2017 / emulated / huge / yes / 2)
Dataset Links :https://secplab.ppgia.pucpr.br/?q=trabid
ISOT HTTP Botnet(2017 / emulated / huge / yes / 9)
Dataset Links :https://www.uvic.ca/engineering/ece/isot/datasets/botnet-ransomware/index.php
ISOT CID(2018 / real / 36,938,985 / yes / 18)
Dataset Links :https://www.uvic.ca/engineering/ece/isot/datasets/cloud-security/index.php
InSDN(2020 / real / unknown / yes / 20)
Dataset Links :http://aseados.ucd.ie/?p=177
CIRA-CIC-DoHBrw 2020(2020 / emulated / 1,185,286 / yes / 3)
Dataset Links :https://www.unb.ca/cic/datasets/dohbrw-2020.html
OPCUA(2020 / emulated / 107,634 / yes / 3)
Dataset Links :https://digi2-feup.github.io/OPCUADataset/
To be added …