
Angr (12): Official Documentation (Part 3)

2022-08-02 00:13:00 【c1rcl3】

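These notes read through and organize angr's official documentation in order to deepen my understanding of angr and to learn more, and more general, ways of using it.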

Reference links:

angr Documentation

angr-doc

Core Concepts

Simulation Managers

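The most important control interface in angr is the simulation manager (SimulationManager). It lets you control the symbolic execution of a group of states simultaneously and apply search strategies to explore a program's state space.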

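The simulation manager provides a neat way to manage many states at once. States are organized into stashes, which you can step forward, filter, merge, or move around as needed. For example, you can step states in two different stashes at different rates and then merge them together. The default stash is active, which is where states are placed when the simulation manager is initialized.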

Stepping

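The most basic capability of a simulation manager is stepping every state in a given stash forward by one basic block, which is done with .step().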

(angr) [email protected]:/home/c1rcl3/Desktop/Angr/doc# python
Python 3.6.9 (default, Jun 29 2022, 11:45:57) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import angr, claripy
>>> project = angr.Project('./fauxware')
>>> state = project.factory.entry_state()
>>> simgr = project.factory.simgr(state)
>>> simgr.active
[<SimState @ 0x400580>]
>>> simgr.step()
WARNING | 2022-07-28 18:30:32,239 | angr.storage.memory_mixins.default_filler_mixin | The program is accessing memory with an unspecified value. This could indicate unwanted behavior.
WARNING | 2022-07-28 18:30:32,240 | angr.storage.memory_mixins.default_filler_mixin | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2022-07-28 18:30:32,240 | angr.storage.memory_mixins.default_filler_mixin | 1) setting a value to the initial state
WARNING | 2022-07-28 18:30:32,240 | angr.storage.memory_mixins.default_filler_mixin | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2022-07-28 18:30:32,240 | angr.storage.memory_mixins.default_filler_mixin | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to suppress these messages.
WARNING | 2022-07-28 18:30:32,241 | angr.storage.memory_mixins.default_filler_mixin | Filling memory at 0x7fffffffffeff8c with 4 unconstrained bytes referenced from 0x400585 (_start+0x5 in fauxware (0x400585))
<SimulationManager with 1 active>
>>> simgr.active
[<SimState @ 0x400540>]

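The real power of the stash model is that when a state reaches a symbolic branch condition, both successor states are added to the stash, and they can then be stepped in sync. If you do not need fine-grained control over the analysis and simply want to keep stepping until there is nothing left to execute, use the .run() method.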

>>> while len(simgr.active) == 1:
...     simgr.step()
... 
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
<SimulationManager with 1 active>
WARNING | 2022-07-28 18:37:42,260 | angr.storage.memory_mixins.default_filler_mixin | Filling memory at 0x7fffffffffeff70 with 8 unconstrained bytes referenced from 0x79d820 (strcmp+0x0 in libc.so.6 (0x9d820))
<SimulationManager with 1 active>
<SimulationManager with 2 active>
>>> simgr.active
[<SimState @ 0x400692>, <SimState @ 0x400699>]
>>> simgr.run()
WARNING | 2022-07-28 18:43:41,479 | angr.state_plugins.posix | Trying to open unknown file b'\x00\x00\x00\x00\x00\x00\x00\x00' - created a symbolic file since ALL_FILES_EXIST is set
WARNING | 2022-07-28 18:43:41,546 | angr.storage.memory_mixins.default_filler_mixin | Filling memory at 0x7fffffffffeff00 with 8 unconstrained bytes referenced from 0x810020 (read+0x0 in libc.so.6 (0x110020))
WARNING | 2022-07-28 18:43:41,598 | angr.storage.memory_mixins.default_filler_mixin | Filling memory at 0x7fffffffffeff30 with 8 unconstrained bytes referenced from 0x79d820 (strcmp+0x0 in libc.so.6 (0x9d820))
<SimulationManager with 3 deadended>

When a state fails to produce any successors during execution, it is removed from the active stash and placed in the deadended stash.

To move states between stashes, use the .move() method. For example, to move every state whose output contains a particular string:

>>> simgr.move(from_stash='deadended', to_stash='authenticated', filter_func=lambda s: b'Welcome' in s.posix.dumps(1))
<SimulationManager with 1 deadended, 2 authenticated>

A new stash is created simply by asking for states to be moved into it. Every state in the authenticated stash has the string Welcome in its standard output.
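The stash bookkeeping behind .move() can be pictured with a small pure-Python model. This is only a sketch under simplifying assumptions, not angr's implementation: ToyStashes is a hypothetical class, and each "state" is reduced to the bytes it wrote to standard output.

```python
from collections import defaultdict


class ToyStashes:
    """Hypothetical stand-in for stash bookkeeping (not angr's code)."""

    def __init__(self, active=()):
        # Stashes are plain lists keyed by name; an unknown name starts out empty,
        # which is why moving into a new name "creates" the stash.
        self.stashes = defaultdict(list)
        self.stashes["active"] = list(active)

    def move(self, from_stash, to_stash, filter_func=lambda s: True):
        # States accepted by filter_func leave from_stash and join to_stash.
        keep, moved = [], []
        for s in self.stashes[from_stash]:
            (moved if filter_func(s) else keep).append(s)
        self.stashes[from_stash] = keep
        self.stashes[to_stash].extend(moved)
        return self


# Made-up "states": each one is just its captured stdout.
mgr = ToyStashes()
mgr.stashes["deadended"] = [b"Go away!", b"Welcome to the admin console", b"Welcome, user"]
mgr.move("deadended", "authenticated", filter_func=lambda out: b"Welcome" in out)

print(len(mgr.stashes["deadended"]), len(mgr.stashes["authenticated"]))
```

The authenticated stash did not exist before the call; it appears because states were moved into it, which mirrors the behavior described above.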

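Each stash is a list, so you can index into it or iterate over it to reach individual states. Prefixing the stash name with one_ returns the first state in that stash; prefixing it with mp_ returns a mulpyplexed version of the stash.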

>>> for s in simgr.deadended + simgr.authenticated:
...     print(s)
... 
<SimState @ 0x743110>
<SimState @ 0xe00058>
<SimState @ 0xe00058>
>>> simgr.one_deadended
<SimState @ 0x743110>
>>> simgr.mp_authenticated
MP([<SimState @ 0xe00058>, <SimState @ 0xe00058>])
>>> simgr.mp_authenticated.posix.dumps(0)
MP([b'\x00\x00\x00\x00\x00\x00\x00\x00\x00SOSNEAKY\x00', b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'])

step, run, and the other methods that operate on a stash accept a stash argument that specifies which stash to act on.

Stash Types

You can use any stash names you like, but a few stashes are used to categorize particular kinds of states:

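Stash	Description
active	The stash that step operates on by default, unless another stash is specified through an argument.
deadended	A state is moved here when it cannot continue executing for some reason, such as having no more valid instructions, having no satisfiable successors, or having an invalid instruction pointer.
pruned	With the LAZY_SOLVES option, the satisfiability of a state's constraints is not checked unless absolutely necessary. When a state is found to be unsatisfiable under LAZY_SOLVES, its ancestors are checked to find the point where it first became unsatisfiable. Every descendant of that point is unsatisfiable too, so those states are pruned and placed in this stash.
unconstrained	If the save_unconstrained option is passed to the simulation manager's constructor, states judged to be unconstrained (the instruction pointer is controlled by user input or some other source of symbolic data) are placed here.
unsat	If the save_unsat option is passed to the simulation manager's constructor, states found to be unsatisfiable are placed here.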

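There is one more list of states: errored. If an error occurs during execution, the state is wrapped in an ErrorRecord object, which holds the state (record.state) and the error it raised (record.error), and the record is inserted into the errored list. Calling record.debug() starts a debugging shell.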

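An extremely common operation in symbolic execution is to find a state that reaches a certain address while discarding every state that passes through another address. The simulation manager handles this pattern with the .explore() method.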

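When explore is given a find argument, it runs until it finds a state that matches the find condition, which can be the address of an instruction, a list of addresses, or a function that takes a state and decides whether it meets some criterion. When states in the active stash match the find condition, they are placed in the found stash and execution stops. You can then iterate over the states in found. An avoid condition can be specified in the same formats as find. When a state matches the avoid condition, it is moved to the avoided stash and execution continues. Finally, the num_find argument controls how many states should be found before returning, with a default of 1. If the active stash runs out of states before num_find solutions are found, execution stops anyway.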

Example:

1. Load the binary

(angr) [email protected]:/home/c1rcl3/Desktop/Angr/doc# python
Python 3.6.9 (default, Jun 29 2022, 11:45:57) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import angr, claripy
>>> project = angr.Project('./crackme0x00a')

2. Create a simulation manager

>>> simgr = project.factory.simgr()

3. Execute symbolically until a state matching the condition is found

>>> simgr.explore(find=0x804853A)
WARNING | 2022-07-28 19:45:14,656 | angr.storage.memory_mixins.default_filler_mixin | The program is accessing register with an unspecified value. This could indicate unwanted behavior.
WARNING | 2022-07-28 19:45:14,657 | angr.storage.memory_mixins.default_filler_mixin | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2022-07-28 19:45:14,657 | angr.storage.memory_mixins.default_filler_mixin | 1) setting a value to the initial state
WARNING | 2022-07-28 19:45:14,657 | angr.storage.memory_mixins.default_filler_mixin | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2022-07-28 19:45:14,657 | angr.storage.memory_mixins.default_filler_mixin | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to suppress these messages.
WARNING | 2022-07-28 19:45:14,658 | angr.storage.memory_mixins.default_filler_mixin | Filling register edi with 4 unconstrained bytes referenced from 0x8048571 (__libc_csu_init+0x1 in crackme0x00a (0x8048571))
WARNING | 2022-07-28 19:45:14,659 | angr.storage.memory_mixins.default_filler_mixin | Filling register ebx with 4 unconstrained bytes referenced from 0x8048573 (__libc_csu_init+0x3 in crackme0x00a (0x8048573))
<SimulationManager with 1 active, 1 found>

4. Retrieve the correct input

>>> s = simgr.found[0]
>>> print(s.posix.dumps(0))
b'g00dJ0B!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
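
The find, avoid, and num_find rules of explore can be modeled without angr on a tiny made-up graph. The sketch below is an illustration only: toy_explore, the successor map, and the addresses are all hypothetical, and "states" are reduced to bare addresses.

```python
def toy_explore(successors, start, find, avoid=(), num_find=1):
    """Step a hypothetical address graph, sorting states the way explore is described to."""
    stashes = {"active": [start], "found": [], "avoided": [], "deadended": []}
    # Stop once enough states are found, or when nothing is left to step.
    while stashes["active"] and len(stashes["found"]) < num_find:
        stepped = []
        for addr in stashes["active"]:
            nxt = successors.get(addr, [])
            if not nxt:
                stashes["deadended"].append(addr)  # no successors at all
            stepped.extend(nxt)
        stashes["active"] = []
        for addr in stepped:
            if addr in find:
                stashes["found"].append(addr)
            elif addr in avoid:
                stashes["avoided"].append(addr)    # dropped, exploration goes on
            else:
                stashes["active"].append(addr)
    return stashes


# Made-up control flow: 0x10 branches to 0x20 and 0x30, which lead to 0x40 and 0x50.
graph = {0x10: [0x20, 0x30], 0x20: [0x40], 0x30: [0x50]}

hit = toy_explore(graph, 0x10, find={0x40}, avoid={0x30})
miss = toy_explore(graph, 0x10, find={0x99}, avoid={0x30})
print(hit["found"], hit["avoided"], miss["found"], miss["deadended"])
```

In the second call no state ever matches find, so the loop ends only because the active list runs dry, which corresponds to the "stops anyway" case in the description.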

Exploration Techniques

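angr packages certain pieces of functionality so that the behavior of a simulation manager can be customized; these are called exploration techniques. An exploration technique changes the default way in which the program's state space is explored. To use one, call simgr.use_technique(tech), where tech is an instance of an ExplorationTechnique subclass. The built-in exploration techniques can be found under angr.exploration_techniques.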

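DFS	Depth-first search. Keeps only one state active at a time and puts the rest in the deferred stash until the active state deadends or errors.
Explorer	Implements the .explore() functionality, letting you search for some states and avoid others through find and avoid.
LengthLimiter	Caps the maximum length of the path a state may travel.
LoopSeer	Uses a reasonable approximation of loop counting to discard states that go through a loop too many times. They are placed in the spinning stash and pulled back out if no other viable states remain.
ManualMergepoint	Marks an address in the program as a merge point. States that reach it are held briefly, and any other states that reach the same point within a timeout are merged with them.
MemoryWatcher	Checks how much memory is free on the system between simulation manager steps and stops exploration if it gets too low.
Oppologist	When angr encounters an unsupported instruction, concretizes all inputs to that instruction and emulates that single instruction with the unicorn engine so that execution can continue.
Spiller	When there are too many active states, dumps some of them to disk to keep memory consumption low.
Threading	Adds thread-level parallelism to the stepping process.
Tracer	Makes execution follow a dynamic trace recorded from some other source.
Veritesting	Automatically identifies merge points.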
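
To make the DFS entry more concrete, here is a minimal pure-Python model of "one active state, everything else deferred". It is a sketch rather than angr's DFS class: toy_dfs and the graph are hypothetical, and the choice to resume the most recently deferred state is an assumption made for this illustration.

```python
def toy_dfs(successors, start):
    """Explore a hypothetical successor map while keeping at most one active state."""
    stashes = {"active": [start], "deferred": [], "deadended": []}
    visited = []
    while stashes["active"] or stashes["deferred"]:
        if not stashes["active"]:
            # The active state is gone, so pull one deferred state back in
            # (assumed here: the most recently deferred one).
            stashes["active"].append(stashes["deferred"].pop())
        state = stashes["active"].pop()
        visited.append(state)
        nxt = successors.get(state, [])
        if not nxt:
            stashes["deadended"].append(state)
        else:
            stashes["active"].append(nxt[0])     # follow one successor
            stashes["deferred"].extend(nxt[1:])  # park the others
    return visited, stashes


# Made-up branching structure: A forks into B and C, and B forks into D and E.
tree = {"A": ["B", "C"], "B": ["D", "E"]}
visited, final = toy_dfs(tree, "A")
print(visited, final["deadended"])
```

Compared with stepping every active state in lockstep, this keeps the working set to a single state at the cost of visiting branches one after another.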

Execution Engines

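When angr executes a step (a basic block), something has to actually carry out the execution. angr uses a series of engines (subclasses of the SimEngine class) to emulate the effect that a piece of code has on an input state. The core of angr's execution tries the available engines in order and uses the first one that can handle the step. The default list of engines, in order, is: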

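The failure engine is used when the previous step led to a state from which execution cannot continue.
The syscall engine is used when the previous step ended in a system call.
The hook engine is used when the current address is hooked.
The unicorn engine is used when the unicorn option is enabled and the state contains no symbolic data.
The VEX engine is used as the fallback when none of the above applies.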
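
The "first engine that applies wins" rule in the list above can be sketched as an ordered chain of predicates. Everything below is a hypothetical model, not angr's engine code: ToyState, its boolean fields, and pick_engine are invented for illustration, and the real engines inspect actual state properties instead of ready-made flags.

```python
from dataclasses import dataclass


@dataclass
class ToyState:
    addr: int
    stuck: bool = False              # previous step led somewhere execution cannot continue
    ended_in_syscall: bool = False   # previous step ended in a system call
    has_symbolic_data: bool = True   # the state holds symbolic data


def pick_engine(state, hooked_addrs=frozenset(), unicorn_enabled=False):
    """Return the name of the first engine whose condition holds, in list order."""
    checks = [
        ("failure", lambda s: s.stuck),
        ("syscall", lambda s: s.ended_in_syscall),
        ("hook", lambda s: s.addr in hooked_addrs),
        ("unicorn", lambda s: unicorn_enabled and not s.has_symbolic_data),
        ("vex", lambda s: True),  # fallback: always applies
    ]
    for name, applies in checks:
        if applies(state):
            return name


print(pick_engine(ToyState(0x400580)))
print(pick_engine(ToyState(0x400580), hooked_addrs={0x400580}))
print(pick_engine(ToyState(0x400580, ended_in_syscall=True), hooked_addrs={0x400580}))
```

Because the checks are ordered, a state that both ended in a system call and sits on a hooked address is handled by the syscall engine, not the hook engine.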

SimSuccessors

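The code that tries each engine in turn is project.factory.successors(state, **kwargs), which passes its arguments on to each engine. This function is at the heart of state.step() and simulation_manager.step(). It returns a SimSuccessors object, whose purpose is to perform a simple categorization of the successor states and store them in separate list attributes: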

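Attribute	Guard condition	Instruction pointer	Description
successors	True (the state's constraints are satisfiable)	Can be symbolic (with at most 256 solutions)	A normal, satisfiable successor state produced by the engine. Its instruction pointer may be symbolic (for example, a jump computed from user input), so a state in this list may actually represent several potential continuations.
unsat_successors	False (the state's constraints are unsatisfiable)	Can be symbolic	Unsatisfiable successors. Their constraints can never hold (for example, a jump that cannot be taken, or the default branch of a jump that must be taken).
flat_successors	True (the state's constraints are satisfiable)	Concrete value	As noted above, states in successors can have symbolic instruction pointers. This is a problem, because in a single step one state is expected to represent the execution of one basic block. To resolve this, when a state in successors has a symbolic instruction pointer, all of its possible concrete solutions are computed (up to 256) and a copy of the state is made for each one. This process is called flattening.
unconstrained_successors	True (the state's constraints are satisfiable)	Symbolic value	During flattening, if the symbolic instruction pointer turns out to have more than 256 solutions, it is assumed to hold unconstrained data (for example, a stack overflow with user data). This assumption is not sound in general. Such states are placed in unconstrained_successors.
all_successors	Anything	Can be symbolic	successors + unsat_successors + unconstrained_successors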
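
The categorization in the table can be mimicked with a few lines of plain Python. This is a simplified model, not the SimSuccessors implementation: ToySuccessor and classify are hypothetical, satisfiability is a precomputed flag, and a "symbolic" instruction pointer is represented by the explicit list of values it could take.

```python
from dataclasses import dataclass

MAX_IP_SOLUTIONS = 256  # the limit quoted in the table


@dataclass
class ToySuccessor:
    satisfiable: bool
    ip_solutions: list  # every concrete value the instruction pointer may take


def classify(candidates):
    """Sort candidate successors into the lists described in the table."""
    out = {"successors": [], "unsat_successors": [],
           "flat_successors": [], "unconstrained_successors": []}
    for c in candidates:
        if not c.satisfiable:
            out["unsat_successors"].append(c)
        elif len(c.ip_solutions) > MAX_IP_SOLUTIONS:
            # Too many targets: treated as an unconstrained instruction pointer.
            out["unconstrained_successors"].append(c)
        else:
            out["successors"].append(c)
            # Flattening: one copy of the state per concrete target.
            for ip in c.ip_solutions:
                out["flat_successors"].append(ToySuccessor(True, [ip]))
    out["all_successors"] = (out["successors"] + out["unsat_successors"]
                             + out["unconstrained_successors"])
    return out


result = classify([
    ToySuccessor(True, [0x400692]),                # ordinary concrete successor
    ToySuccessor(True, [0x1000, 0x2000, 0x3000]),  # symbolic IP with three solutions
    ToySuccessor(False, [0x400699]),               # branch that cannot be taken
    ToySuccessor(True, list(range(300))),          # more than 256 solutions
])
print({name: len(states) for name, states in result.items()})
```

The second candidate contributes one entry to successors but three entries to flat_successors, which is the difference between the two lists.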

Breakpoints

angr supports breakpoints:

(angr) [email protected]:/home/c1rcl3/Desktop/Angr/doc# python
Python 3.6.9 (default, Jun 29 2022, 11:45:57) 
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import angr
>>> project = angr.Project('./fauxware')
>>> state = project.factory.entry_state()
>>> state.inspect.b('mem_write') # set a breakpoint
<BP before-action with conditions {}, no condition func, no action func>
>>> def debug_func(state):
...     print("state is about to do a memory write")
... 
>>> state.inspect.b('mem_write', when=angr.BP_AFTER, action=debug_func) # set a breakpoint and handle it with a callback when it fires
<BP after-action with conditions {}, no condition func, with action func>
>>> state.inspect.b('mem_write', when=angr.BP_AFTER, action=angr.BP_IPYTHON) # set a breakpoint and drop into an IPython shell when it fires
<BP after-action with conditions {}, no condition func, with action func>

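Memory writes are only one of many events you can break on; the others are listed in the table below. For each event you can choose whether to break before or after it happens.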

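mem_read	Memory is being read
mem_write	Memory is being written
address_concretization	A symbolic memory access is being resolved
reg_read	A register is being read
reg_write	A register is being written
tmp_read	A temp is being read
tmp_write	A temp is being written
expr	An expression is being created (for example, the result of an arithmetic operation or a constant in the IR)
statement	An IR statement is being executed
instruction	A new instruction is being executed
irsb	A new basic block is being executed
constraints	New constraints are being added to the state
exit	A successor state is being generated from execution
fork	A symbolic execution state has forked into multiple states
symbolic_variable	A new symbolic variable is being created
call	A call instruction is hit
return	A return instruction is hit
simprocedure	A simprocedure (or syscall) is executed
dirty	A dirty IR callback is executed
syscall	A syscall is executed (fired in addition to the simprocedure event)
engine_process	A SimEngine is about to process some code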

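The attributes of each event are listed in the official documentation. Inside a breakpoint callback they are available as members of state.inspect, and they can even be modified there. The same attributes can also be passed as keyword arguments to inspect.b to make a breakpoint conditional. A condition can also be supplied as a function.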

About the mem_read breakpoint:

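The mem_read breakpoint fires whenever the executing program reads memory. It also fires if you use state.mem to load data from a memory address while a mem_read breakpoint is set, because that is technically a memory read as well.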

So to load data from memory without triggering a mem_read breakpoint that you have set, use state.memory.load and pass the keyword arguments disable_actions=True and inspect=False.

The same keyword arguments can be passed to state.find to keep mem_read breakpoints from firing.

Analyses

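angr aims to make binary analysis easier. To that end, it lets you package analysis code in a common format so that it can be applied to any project with little effort. All analyses appear under project.analyses, where they can be called like functions and return an analysis result instance.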

Built-in Analyses

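CFGFast	Builds a fast control-flow graph of the program
CFGEmulated	Builds an accurate control-flow graph of the program
VFG	Performs VSA on every function of the program, building a value-flow graph and detecting stack variables
DDG	Computes a data dependency graph, which lets you determine which statements a given value depends on
BackwardSlice	Computes a backward slice of the program with respect to a given target
Identifier	Identifies common library functions in CGC binaries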