[Anomaly detection] Malware detection: MaMaDroid (NDSS 2017)

2022-06-22 06:55:00 chad_ lee

This paper models API call sequences with a Markov chain. It builds on the observation that malicious and benign apps may call the same APIs, but in a different order. The paper uses purely static analysis.

"MAMADROID: Detecting Android Malware by Building Markov Chains of Behavioral Models", NDSS 2017

Method: four steps

[Figure: overview of the method]

[Figure: an API call with its class, package, and family (image missing from the source)]

The paper does not actually work with raw API calls but with each call's family or package. As pictured above, an API call belongs to a class, the class belongs to a package, and the package belongs to a family. The authors had little choice here: static analysis yields far too many distinct APIs, as discussed in detail below.

Step one: decompile the APK with tools such as Soot to obtain the call graph. Step two: traverse the call graph to extract API sequences, abstracting them into package/family sequences along the way. Step three: Markov chain modeling. Step four: classification.

One. Extract the call graph

Take the app's APK and use the static analysis tools Soot and FlowDroid to extract the API call graph. For example, Figure 2 is a code snippet from a malware sample:

[Figure 2: code snippet from a malware sample]

From it, the following API call graph can be extracted:

[Figure 3: the extracted API call graph]

Two. Extract the sequences

First, identify all the starting nodes in the graph, then traverse the entire graph to find all execution paths. This gives the app's API sequences.
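The traversal described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy graph format (caller mapped to a list of callees), the rule that a starting node is one that is never called, and the choice to skip already-visited nodes to avoid cycles are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical call graph format: caller -> list of callees.
CallGraph = Dict[str, List[str]]


def entry_nodes(graph: CallGraph) -> List[str]:
    """Treat nodes that are never called by another node as starting nodes."""
    callees = {c for targets in graph.values() for c in targets}
    return sorted(n for n in graph if n not in callees)


def call_sequences(graph: CallGraph) -> List[List[str]]:
    """Enumerate every path from a starting node until no unvisited callee remains."""
    sequences: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]
        # Skip callees already on the current path so cycles terminate.
        nexts = [n for n in graph.get(node, []) if n not in path]
        if not nexts:
            sequences.append(path)
            return
        for n in nexts:
            walk(n, path)

    for start in entry_nodes(graph):
        walk(start, [])
    return sequences
```

For a toy graph such as `{"main": ["a", "b"], "a": ["c"], "b": [], "c": []}`, this yields the two paths `main, a, c` and `main, b`.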

That is the ideal. In practice, because the paper extracts API sequences through static analysis, the 44k apps yield more than 10 million distinct APIs. The vocabulary would therefore have 10 million entries, and the transition matrix would have 10 million rows and columns. The only option is to group the APIs, abstracting each API call to its package or family.

There are 340 packages in total: 243 Android packages, 95 Google API packages, 1 self-defined (which may contain the malicious behavior), and 1 obfuscated.

There are 11 families in total, including self-defined and obfuscated.
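To make the size argument concrete, here is a rough count of transition-matrix entries at each level of abstraction. Treating "more than 10 million" as exactly $10^7$ is a simplifying assumption for the estimate.

$$
\underbrace{10^{7} \times 10^{7}}_{\text{API level}} = 10^{14},
\qquad
\underbrace{340 \times 340}_{\text{package level}} = 115{,}600,
\qquad
\underbrace{11 \times 11}_{\text{family level}} = 121
$$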

Abstracting the API calls this way shrinks the vocabulary. The extracted sequences are shown in Figure 4:

[Figure 4: the abstracted package/family sequences]
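The abstraction step can be sketched as a prefix lookup. The prefix lists below are a small illustrative subset chosen for this example, not the paper's lists of 340 packages and 11 families, and the function names are hypothetical. Detection of obfuscated names is omitted, so anything unrecognized is simply labeled self-defined.

```python
# Illustrative subset only (assumption); the real package list has 340 entries.
PACKAGE_PREFIXES = ["java.lang", "java.io", "android.util", "android.telephony"]

# Illustrative prefix -> family mapping (assumption); the real list has 11 families.
FAMILY_PREFIXES = {
    "android.": "android",
    "com.google.": "google",
    "java.": "java",
    "javax.": "javax",
}


def abstract_to_package(api_call: str) -> str:
    """Map a fully qualified API call to its longest matching known package."""
    matches = [p for p in PACKAGE_PREFIXES if api_call.startswith(p + ".")]
    return max(matches, key=len) if matches else "self-defined"


def abstract_to_family(api_call: str) -> str:
    """Map a fully qualified API call to its family, or to self-defined."""
    for prefix, family in FAMILY_PREFIXES.items():
        if api_call.startswith(prefix):
            return family
    return "self-defined"
```

With these lists, `java.lang.Throwable.getMessage()` abstracts to the package `java.lang` and the family `java`, while a developer-defined call such as `com.example.app.Helper.run()` becomes `self-defined` at both levels.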

Three. Markov chain modeling

This is a first-order Markov chain, shown in Fig. 5, which corresponds to Fig. 4 and Fig. 3 above:

[Figure 5: the resulting Markov chain]

This gives a transition matrix of size $vocab\_size \times vocab\_size$, and the app's feature vector is this transition matrix. PCA can be applied to reduce the dimensionality, which can work better.

The classifiers that follow all operate on the feature vectors extracted from the Markov chains.
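A minimal sketch of how such a first-order transition matrix might be built from abstracted sequences. The function names and the row-by-row flattening are assumptions for illustration. Each row is normalized by the number of transitions leaving that state, and a state with no outgoing transitions keeps an all-zero row.

```python
from typing import List, Sequence


def transition_matrix(sequences: Sequence[Sequence[str]],
                      states: Sequence[str]) -> List[List[float]]:
    """Estimate first-order transition probabilities between abstracted states."""
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for seq in sequences:
        # Count every consecutive (source, destination) pair in the sequence.
        for src, dst in zip(seq, seq[1:]):
            counts[index[src]][index[dst]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def feature_vector(matrix: List[List[float]]) -> List[float]:
    """Flatten the matrix row by row into a vector of length vocab_size ** 2."""
    return [p for row in matrix for p in row]
```

For example, with the states `android`, `java`, `self-defined` and the two sequences `self-defined, java, android` and `self-defined, android`, the `self-defined` row has probability 0.5 for `android` and 0.5 for `java`.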

Four. Classification

Four classification algorithms are used: Random Forests, 1-Nearest Neighbor (1-NN), 3-Nearest Neighbor (3-NN), and SVM. The input to each algorithm is the feature vector obtained in step three.
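As an illustration of the simplest of the four, here is a sketch of 1-NN on flattened transition-matrix features. The Euclidean distance metric and the function name are assumptions for this example, and the other three classifiers are not shown.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor_label(train: Sequence[Tuple[List[float], str]],
                           query: List[float]) -> str:
    """Return the label of the training vector closest to the query (1-NN)."""
    # math.dist computes the Euclidean distance between two equal-length vectors.
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label
```

Given one benign training vector `[1.0, 0.0, 0.0, 0.0]` and one malware training vector `[0.0, 0.0, 1.0, 0.0]`, the query `[0.1, 0.0, 0.9, 0.0]` is closer to the malware vector and receives that label.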

Experimental results

[Figure: experimental results]

The experimental results are very good. After all, the method classifies an app using all of its sequence information, which is a lot of information, so the task is much simpler than ours.

However, the model aging problem is serious: when the model is trained on old samples, its accuracy on new samples drops. (When it is trained on new samples, accuracy on old samples is still very good; that figure is not shown here.)

[Figure: accuracy on newer samples when training on older samples]

The next research direction this paper calls for and points toward is solving the model aging problem.

Copyright notice
This article was written by [chad_ lee]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202220543470499.html