[Anomaly Detection] Malware Detection: MaMaDroid (NDSS 2017)
2022-06-22 06:55:00 【chad_ lee】
This paper models API call sequences with Markov chains. The premise is that malicious and benign apps may call the same APIs, but in a different order. The method relies on pure static analysis.
《MAMADROID: Detecting Android Malware by Building Markov Chains of Behavioral Models》NDSS 2017
Method: four steps

Overview
![Figure](/img/f5/a314f5c84d256f01e4fca4f8526609.jpg)
The paper does not actually use the API calls themselves, but rather each API's family or package. As the figure above shows, an API belongs to a class, a class belongs to a package, and a package belongs to a family. The authors had their reasons: static analysis yields far too many distinct APIs, as discussed in detail below.
Step 1: decompile the APK with tools such as Soot to obtain the call graph. Step 2: traverse the call graph to extract the API sequences and, at the same time, abstract them into package/family sequences. Step 3: model the sequences as a Markov chain. Step 4: classify.
Step 1: Extract the call graph
Take the app's APK and use the static analysis tools Soot and FlowDroid to extract the API call graph. For example, Figure 2 is a code snippet from a malware sample:

The API call graph that can be extracted from it:

Step 2: Extract the sequences
First identify all entry nodes in the graph, then traverse the entire graph to find all execution paths. This yields the app's API sequences.
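The traversal described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `CALL_GRAPH`, the method names in it, and the helper functions are all hypothetical, and treating "nodes that nobody calls" as entry nodes is an assumption made for the sketch.

```python
from typing import Dict, List

# Hypothetical toy call graph (caller -> callees); not taken from the paper.
CALL_GRAPH: Dict[str, List[str]] = {
    "com.example.Main.onCreate()": [
        "java.lang.Throwable.getMessage()",
        "android.util.Log.e()",
    ],
    "android.util.Log.e()": ["java.io.PrintStream.println()"],
}


def entry_nodes(graph: Dict[str, List[str]]) -> List[str]:
    """Assumption: an entry node is a caller that no other node calls."""
    callees = {c for targets in graph.values() for c in targets}
    return sorted(n for n in graph if n not in callees)


def enumerate_paths(graph: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Depth-first enumeration of all simple paths from start to a leaf."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        # Skip nodes already on the path so that cycles terminate.
        successors = [s for s in graph.get(node, []) if s not in path]
        if not successors:
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


def extract_sequences(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Collect every path starting from every entry node."""
    return [p for e in entry_nodes(graph) for p in enumerate_paths(graph, e)]


if __name__ == "__main__":
    for seq in extract_sequences(CALL_GRAPH):
        print(" -> ".join(seq))
```

Enumerating all paths can grow exponentially on real call graphs, which is one more reason the vocabulary reduction discussed next matters.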
That is the ideal. In practice, because the paper extracts API sequences by static analysis, the 44k apps yield more than 10 million distinct APIs. The dictionary would therefore have 10 million entries, and the transition matrix 10 million states. So the only option is to categorize the APIs, abstracting each one to its package or family.
There are 340 packages in total: 243 Android, 95 Google API, 1 self-defined (which may contain the malicious behavior), and 1 obfuscated.
There are 11 families in total, including self-defined and obfuscated.
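As a rough sketch of this abstraction step, the snippet below maps a fully qualified API call to a package or family by prefix matching. The lists `KNOWN_PACKAGES` and `FAMILY_PREFIXES` are abbreviated, illustrative assumptions (the notes above give only the counts, 340 and 11), the function names are hypothetical, and detection of obfuscated calls is left out entirely.

```python
from typing import Callable, List

# Assumed, heavily abbreviated stand-in for the 340 known packages.
KNOWN_PACKAGES = [
    "android.util",
    "android.telephony",
    "java.lang",
    "java.io",
    "com.google.android.gms",
]

# Assumed prefix -> family mapping; only a few illustrative families.
FAMILY_PREFIXES = {
    "android.": "android",
    "com.google.": "google",
    "java.": "java",
    "javax.": "javax",
}

SELF_DEFINED = "self-defined"  # anything not recognized as a known API


def to_package(api: str) -> str:
    """Abstract an API call to the longest matching known package."""
    matches = [p for p in KNOWN_PACKAGES if api.startswith(p + ".")]
    return max(matches, key=len) if matches else SELF_DEFINED


def to_family(api: str) -> str:
    """Abstract an API call to its family by name prefix."""
    for prefix, family in FAMILY_PREFIXES.items():
        if api.startswith(prefix):
            return family
    return SELF_DEFINED


def abstract_sequence(seq: List[str], mode: Callable[[str], str]) -> List[str]:
    """Apply a package-level or family-level abstraction to a whole sequence."""
    return [mode(api) for api in seq]


if __name__ == "__main__":
    seq = ["com.example.Main.onCreate()", "android.util.Log.e()", "java.io.PrintStream.println()"]
    print(abstract_sequence(seq, to_package))
    print(abstract_sequence(seq, to_family))
```

Either abstraction function can be passed as `mode`, so the same sequence can be viewed at package or family granularity.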
Abstracting the APIs this way shrinks the dictionary. The extracted sequences are shown in Figure 4:

Step 3: Markov chain modeling
A first-order Markov chain is used, as in Fig. 5, which corresponds to Fig. 4 and Fig. 3 above:

This gives a transition matrix of size $vocab\_size \times vocab\_size$, and the app's feature vector is this transition matrix. PCA can be applied to reduce its dimensionality, which can improve the results.
The classifiers in the next step all operate on the feature vectors extracted from the Markov chains.
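To make the feature construction concrete, here is a minimal sketch of a first-order transition matrix built from abstracted sequences. Each row is normalized by the number of transitions leaving that state, and the matrix is flattened row by row into the feature vector. The function names, the state ordering, and the toy sequences are assumptions for illustration. With 11 families the flattened vector has 11 × 11 = 121 entries, and with 340 packages 340 × 340 = 115,600.

```python
from collections import Counter
from typing import List, Sequence


def transition_matrix(sequences: Sequence[Sequence[str]], states: Sequence[str]) -> List[List[float]]:
    """Estimate first-order transition probabilities from state sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        # Count every consecutive (source, destination) pair.
        for src, dst in zip(seq, seq[1:]):
            counts[(src, dst)] += 1

    row_totals: Counter = Counter()
    for (src, _dst), n in counts.items():
        row_totals[src] += n

    matrix = []
    for src in states:
        total = row_totals[src]
        # States with no outgoing transition get an all-zero row.
        matrix.append([counts[(src, dst)] / total if total else 0.0 for dst in states])
    return matrix


def feature_vector(matrix: List[List[float]]) -> List[float]:
    """Flatten the matrix row by row into one feature vector."""
    return [p for row in matrix for p in row]


if __name__ == "__main__":
    states = ["self-defined", "java", "android"]  # hypothetical 3-state vocabulary
    sequences = [
        ["self-defined", "java"],
        ["self-defined", "android", "java"],
    ]
    m = transition_matrix(sequences, states)
    for state, row in zip(states, m):
        print(state, row)
    print(len(feature_vector(m)))
```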
Step 4: Classification
Four classification algorithms are used: Random Forests, 1-Nearest Neighbor (1-NN), 3-Nearest Neighbor (3-NN), and SVM. The input to each algorithm is the feature vector obtained in step 3.
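As an illustration of how such feature vectors feed a classifier, the sketch below implements k-nearest-neighbor voting with the standard library only, covering the 1-NN and 3-NN cases. The training vectors and labels are made-up toy data, and the use of Euclidean distance is an assumption. A real experiment would more likely use an existing machine learning library.

```python
import math
from typing import List, Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[str],
    query: Sequence[float],
    k: int = 1,
) -> str:
    """Return the majority label among the k training vectors closest to query."""
    # Rank training samples by Euclidean distance to the query vector.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes: List[str] = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    # Toy flattened 2x2 transition matrices (hypothetical data).
    train_x = [
        [0.9, 0.1, 0.5, 0.5],
        [0.8, 0.2, 0.6, 0.4],
        [0.1, 0.9, 0.2, 0.8],
        [0.2, 0.8, 0.1, 0.9],
    ]
    train_y = ["benign", "benign", "malware", "malware"]
    query = [0.15, 0.85, 0.2, 0.8]
    print(knn_predict(train_x, train_y, query, k=1))
    print(knn_predict(train_x, train_y, query, k=3))
```

Using an odd `k` with two classes avoids tied votes.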
Experimental results

The experimental results are very good. After all, the method classifies an app using all of that app's sequence information, which is a lot of information, so the task is much simpler than the one in our own work.
However, model aging is a serious problem: when the model is trained on old samples, its accuracy on newer samples drops. (When trained on new samples, its accuracy on old samples is very good; those figures are omitted here.)

The next direction this paper calls for and points toward is solving the model aging problem.