Malware detection method based on convolutional neural network
2022-07-08 00:41:00 【biyezuopinvip】
Resource download address: https://download.csdn.net/download/sheziqiong/85948113
Project structure
software/                    // Executable files
    malware/
    normal/
utils/                       // Scripts for processing executable files
    exe_add_remove_prefix.py
    exe_filter.py
    exe_analyse.py
    exe_to_bytes.py
    exe_to_img.py
imgs/                        // Grayscale images
    malware_img/
    normal_img/
data/                        // Training set and validation set
    train/
        malware/
        normal/
    valid/
        malware/
        normal/
split_data.py                // Splits the data into training and validation sets
exe_rm.txt
loss_history.py              // Helper class for plotting accuracy and loss curves
model/                       // Stores trained models
output/                      // Stores feature vectors
result/                      // Stores model validation results
vgg16-finetune.py            // VGG16 single-model evaluation and feature vector extraction
inceptionv3-finetune.py      // Inception-v3 single-model evaluation and feature vector extraction
xception-finetune.py         // Xception single-model evaluation and feature vector extraction
resnet50-finetune.py         // ResNet50 single-model evaluation and feature vector extraction
merge_all.py                 // Model fusion
my_model.py                  // Custom model
predict.py                   // Evaluates model accuracy, false positive rate, false negative rate, and time cost
merge_all_predict.py         // Evaluates accuracy, false positive rate, and false negative rate of the fused model
inception.py                 // Custom Inception model (sample program, not used in the final model evaluation)
run.sh
Basic workflow
Data collection
First, collect a large number of malware and benign software samples from various sources. The benign software mainly comes from WinXP, Win7, Win8, and Win10 systems. The malware mainly comes from websites that collect malware samples.
exe_add_remove_prefix.py
Adds a winxp_, win7_, win8_, or win10_ prefix to each benign file name to indicate which operating system the file came from.
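The renaming step can be sketched as follows. This is a minimal illustration, not the project's actual script: the function names add_prefix, remove_prefix, and rename_all are hypothetical, and the sketch assumes all files for one operating system sit in a single flat folder.

```python
import os


def add_prefix(name: str, prefix: str) -> str:
    """Return name with prefix added, unless it is already there."""
    return name if name.startswith(prefix) else prefix + name


def remove_prefix(name: str, prefix: str) -> str:
    """Return name with a leading prefix stripped, if present."""
    return name[len(prefix):] if name.startswith(prefix) else name


def rename_all(folder: str, prefix: str, add: bool = True) -> int:
    """Add or remove prefix on every regular file in folder.

    Returns the number of files that were actually renamed.
    Assumption: no two files end up with the same target name.
    """
    renamed = 0
    for name in sorted(os.listdir(folder)):
        src = os.path.join(folder, name)
        if not os.path.isfile(src):
            continue
        new_name = add_prefix(name, prefix) if add else remove_prefix(name, prefix)
        if new_name != name:
            os.rename(src, os.path.join(folder, new_name))
            renamed += 1
    return renamed
```

Calling rename_all(folder, "win7_") tags every file in the folder, and rename_all(folder, "win7_", add=False) undoes it. Both operations are idempotent, so running the script twice does not double the prefix.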
exe_analyse.py
Analyzes the size distribution of the malware and benign software. The analysis supports the following conclusions:
- About 99% of benign software is between 0 and 8 MB in size.
- 99.87% of benign software is larger than 1 KB.
- About 99% of malware is between 0 and 3 MB in size.
- About 99% of malware is larger than 200 B.
The detailed results are as follows:
====================================================================================================
Total number of benign software : 10059
> 0 MB Number of benign software : 10059 100.00%
> 1 MB Number of benign software : 1935 19.24%
> 2 MB Number of benign software : 1110 11.03%
> 3 MB Number of benign software : 661 6.57%
> 4 MB Number of benign software : 372 3.70%
> 5 MB Number of benign software : 224 2.23%
> 6 MB Number of benign software : 160 1.59%
> 7 MB Number of benign software : 119 1.18%
> 8 MB Number of benign software : 78 0.78%
> 9 MB Number of benign software : 43 0.43%
> 10 MB Number of benign software : 0 0.00%
====================================================================================================
Total malware : 57988
> 0 MB Number of malware : 57988 100.00%
> 1 MB Number of malware : 2092 3.61%
> 2 MB Number of malware : 920 1.59%
> 3 MB Number of malware : 558 0.96%
> 4 MB Number of malware : 304 0.52%
> 5 MB Number of malware : 163 0.28%
> 6 MB Number of malware : 91 0.16%
> 7 MB Number of malware : 71 0.12%
> 8 MB Number of malware : 64 0.11%
> 9 MB Number of malware : 50 0.09%
> 10 MB Number of malware : 46 0.08%
====================================================================================================
Total number of benign software : 10059
> 0 KB Number of benign software : 10059 100.00%
> 1 KB Number of benign software : 10046 99.87%
> 2 KB Number of benign software : 10035 99.76%
> 3 KB Number of benign software : 10030 99.71%
> 4 KB Number of benign software : 10019 99.60%
> 5 KB Number of benign software : 10004 99.45%
> 6 KB Number of benign software : 9990 99.31%
> 7 KB Number of benign software : 9972 99.14%
> 8 KB Number of benign software : 9941 98.83%
> 9 KB Number of benign software : 9902 98.44%
> 10 KB Number of benign software : 9848 97.90%
====================================================================================================
Total malware : 57988
> 0 KB Number of malware : 57988 100.00%
> 1 KB Number of malware : 55711 96.07%
> 2 KB Number of malware : 54521 94.02%
> 3 KB Number of malware : 53616 92.46%
> 4 KB Number of malware : 52961 91.33%
> 5 KB Number of malware : 52446 90.44%
> 6 KB Number of malware : 51812 89.35%
> 7 KB Number of malware : 51230 88.35%
> 8 KB Number of malware : 50403 86.92%
> 9 KB Number of malware : 49807 85.89%
> 10 KB Number of malware : 49154 84.77%
====================================================================================================
Total malware : 57988
> 0 B Number of malware : 57988 100.00%
> 100 B Number of malware : 57791 99.66%
> 200 B Number of malware : 57555 99.25%
> 300 B Number of malware : 57315 98.84%
> 400 B Number of malware : 57094 98.46%
> 500 B Number of malware : 56903 98.13%
> 600 B Number of malware : 56475 97.39%
> 700 B Number of malware : 56268 97.03%
> 800 B Number of malware : 56119 96.78%
> 900 B Number of malware : 55971 96.52%
> 1000 B Number of malware : 55785 96.20%
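A report of this shape could be produced along the following lines. This is only a sketch: size_report and file_sizes are hypothetical names, the thresholds are treated as strict "greater than" comparisons, and 1 KB is assumed to be 1024 bytes, none of which is confirmed by the original script.

```python
import os


def size_report(sizes, unit_bytes, unit_name, steps, label):
    """Return report lines counting files strictly larger than each threshold.

    sizes      -- file sizes in bytes
    unit_bytes -- bytes per unit (for example 1024 for KB)
    steps      -- thresholds expressed in that unit
    """
    total = len(sizes)
    lines = [f"Total {label}: {total}"]
    for step in steps:
        count = sum(1 for s in sizes if s > step * unit_bytes)
        pct = 100.0 * count / total if total else 0.0
        lines.append(f"> {step} {unit_name} {label}: {count} {pct:.2f}%")
    return lines


def file_sizes(folder):
    """Collect the size in bytes of every regular file directly inside folder."""
    paths = (os.path.join(folder, n) for n in os.listdir(folder))
    return [os.path.getsize(p) for p in paths if os.path.isfile(p)]
```

For example, size_report(file_sizes("software/malware"), 1024 * 1024, "MB", range(0, 11), "malware") would give one line per megabyte threshold, similar to the second block of the output above.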
exe_filter.py
Based on the size analysis, the malware and benign software can be given a preliminary screening:
- Remove malware larger than 2 MB or smaller than 500 B.
- Remove benign software larger than 5 MB or smaller than 1 KB.
The script generates the file exe_rm.txt in the project root directory. This file records the names of all exe files to be deleted.
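The screening rules above can be sketched as a small size filter. The names LIMITS, should_remove, and write_removal_list are hypothetical, the folder layout (root/malware and root/normal) follows the project structure, and the sketch assumes 1 KB = 1024 bytes and one file name per line in exe_rm.txt; the original script may differ on all of these.

```python
import os

KB = 1024
MB = 1024 * KB

# (minimum bytes, maximum bytes) allowed per class, taken from the rules above.
LIMITS = {
    "malware": (500, 2 * MB),
    "normal": (1 * KB, 5 * MB),
}


def should_remove(size: int, kind: str) -> bool:
    """True if a file of `size` bytes falls outside the allowed range for `kind`."""
    low, high = LIMITS[kind]
    return size < low or size > high


def write_removal_list(root: str, out_path: str = "exe_rm.txt") -> list:
    """Write the names of all out-of-range files under root/<kind>/ to out_path."""
    removed = []
    for kind in LIMITS:
        folder = os.path.join(root, kind)
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if os.path.isfile(path) and should_remove(os.path.getsize(path), kind):
                removed.append(name)
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(removed))
    return removed
```

Writing the list to a file first, rather than deleting immediately, leaves room to review the candidates before anything is removed.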
exe_to_img.py
Converts each executable file into a grayscale image.
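One common way to do this conversion is to treat every byte of the file as one pixel with intensity 0 to 255 and lay the bytes out row by row. The sketch below illustrates that idea using only the standard library. The fixed width of 256 pixels, the zero padding of the last row, and the PGM output format are all assumptions; the original exe_to_img.py may use a different width, image library, or file format.

```python
import math


def bytes_to_rows(data: bytes, width: int = 256) -> list:
    """Split data into rows of `width` pixels, zero-padding the last row."""
    height = max(1, math.ceil(len(data) / width))
    padded = data.ljust(width * height, b"\x00")
    return [padded[r * width:(r + 1) * width] for r in range(height)]


def exe_to_pgm(exe_path: str, img_path: str, width: int = 256) -> tuple:
    """Write the bytes of exe_path as a binary PGM (P5) grayscale image.

    Returns the (width, height) of the written image.
    """
    with open(exe_path, "rb") as fh:
        rows = bytes_to_rows(fh.read(), width)
    with open(img_path, "wb") as fh:
        # PGM header: magic number, dimensions, maximum gray value.
        fh.write(f"P5\n{width} {len(rows)}\n255\n".encode("ascii"))
        fh.write(b"".join(rows))
    return width, len(rows)
```

Because the image height depends on the file size, images produced this way usually have to be resized to the fixed input size a CNN expects before training.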
split_data.py
Splits the dataset and creates the data/ folder according to the following rules:
- There are as many benign samples as malware samples.
- 80% of the data goes to the training set and 20% to the validation set.
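The two rules above can be sketched as follows. The function name balance_and_split is hypothetical, and the sketch samples uniformly at random from the larger class with a fixed seed; the original split_data.py may select malware per family or in some other way.

```python
import random


def balance_and_split(malware, normal, train_ratio=0.8, seed=0):
    """Downsample both classes to the same size, then split each into train/valid.

    Returns {"train": {"malware": [...], "normal": [...]}, "valid": {...}}.
    """
    rng = random.Random(seed)
    n = min(len(malware), len(normal))  # size of the smaller class
    cut = int(n * train_ratio)          # number of training samples per class
    split = {"train": {}, "valid": {}}
    for label, items in (("malware", malware), ("normal", normal)):
        chosen = rng.sample(list(items), n)  # n distinct items in random order
        split["train"][label] = chosen[:cut]
        split["valid"][label] = chosen[cut:]
    return split
```

Splitting each class separately keeps the training and validation sets balanced as well, so accuracy on the validation set is not skewed toward one class.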
The program output is as follows:
Backdoor num: 10854
Trojan num: 24428
worm num: 1349
exploit num: 409
Number of normal software : 9822
The number of malware : 9820
run.sh
Trains and evaluates all models.
Model fusion diagram
The diagrams of the other models are more complex; see the result folder.
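Since the project stores per-model feature vectors in output/ and combines them in merge_all.py, one plausible reading of "model fusion" is feature-level fusion: the feature vectors that each pretrained network produces for a sample are concatenated, and a small classifier is trained on the result. The sketch below shows only the concatenation step. The function name fuse_features is hypothetical, and whether the original code fuses features this way is an assumption.

```python
def fuse_features(per_model_features):
    """Concatenate each sample's feature vectors across models.

    per_model_features -- one entry per model; each entry is a list holding
                          one feature vector (a list of floats) per sample,
                          with samples in the same order for every model.
    Returns one fused vector per sample.
    """
    sample_counts = {len(features) for features in per_model_features}
    if len(sample_counts) != 1:
        raise ValueError("every model must provide features for the same samples")
    return [
        [value for vector in sample for value in vector]
        for sample in zip(*per_model_features)
    ]
```

With two models producing 2-dimensional and 1-dimensional features, each fused vector has 3 entries. The same pattern scales to the much larger feature vectors produced by VGG16, Xception, Inception-v3, and ResNet50.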
Evaluation results
Model | Accuracy | False positive rate | False negative rate | Time cost |
---|---|---|---|---|
VGG16 | 95.92% | 3.46% | 4.59% | 58s |
Xception | 95.16% | 6.87% | 2.80% | 62s |
inception-v3 | 94.20% | 5.95% | 5.65% | 47s |
ResNet50 | 94.40% | 6.26% | 4.94% | 54s |
Model fusion | 96.65% | 3.21% | 3.49% | / |
Custom model | 96.75% | 3.11% | 3.39% | 73s |
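The three rates in the table can be computed from predictions as sketched below. The sketch assumes the usual definitions, with malware as the positive class (label 1): false positive rate = FP / (FP + TN) and false negative rate = FN / (FN + TP). The function name evaluate is hypothetical, and the source does not state which definitions predict.py uses.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, false positive rate, and false negative rate.

    Labels: 1 = malware (positive), 0 = benign (negative).
    Assumes both classes occur at least once in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign flagged as malware
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware that was missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }
```

For malware detection the false negative rate is often the more costly error, since each false negative is a malicious file that passes as benign.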