FCSC 2021 - ShuffleMe
ShuffleMe
This challenge was part of the Reverse category, and was worth 200 points.
Retrouvez les entrées permettant de valider.
Discovery
First, we identify what type of binary this :
$ file shuffleMe
shuffleMe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, stripped
This is a 64-bit ELF. It is stripped. Let’s try running the binary :
$ ./shuffleMe
Usage: ./shuffleMe <secret>
$ ./shuffleMe 1
bobcat
Nope.
The program expects a secret as a first argument, then waits for input through stdin. Afterwards, it simply says Nope.
Let’s have a look at the assembly of the program - starting with the main function.
0x00401172 55 push rbp
0x00401173 4889e5 mov rbp, rsp
0x00401176 4883ec70 sub rsp, 0x70
0x0040117a 897d9c mov dword [rbp - 0x64], edi
0x0040117d 48897590 mov qword [rbp - 0x70], rsi
0x00401181 837d9c02 cmp dword [rbp - 0x64], 2
┌─< 0x00401185 742a je 0x4011b1
│ 0x00401187 488b4590 mov rax, qword [rbp - 0x70]
│ 0x0040118b 488b10 mov rdx, qword [rax]
│ 0x0040118e 488b05cb4e00. mov rax, qword [obj.stderr] ; [0x406060:8]=0
│ 0x00401195 be04404000 mov esi, str.Usage:__s__secret ; 0x404004 ; "Usage: %s <secret>\n"
│ 0x0040119a 4889c7 mov rdi, rax
│ 0x0040119d b800000000 mov eax, 0
│ 0x004011a2 e8a9feffff call sym.imp.fprintf
│ 0x004011a7 bf01000000 mov edi, 1
│ 0x004011ac e8cffeffff call sym.imp.exit
└─> 0x004011b1 488b4590 mov rax, qword [rbp - 0x70]
0x004011b5 4883c008 add rax, 8
0x004011b9 488b00 mov rax, qword [rax]
0x004011bc 4889c7 mov rdi, rax
0x004011bf e89cfeffff call sym.imp.atoi
0x004011c4 8945fc mov dword [rbp - 4], eax
0x004011c7 488d45a0 lea rax, [rbp - 0x60]
0x004011cb ba50000000 mov edx, 0x50 ; 'P' ; 80
0x004011d0 be00000000 mov esi, 0
0x004011d5 4889c7 mov rdi, rax
0x004011d8 e863feffff call sym.imp.memset
0x004011dd 488d45a0 lea rax, [rbp - 0x60]
0x004011e1 4889c6 mov rsi, rax
0x004011e4 bf18404000 mov edi, str.70s ; 0x404018 ; "%70s"
0x004011e9 b800000000 mov eax, 0
0x004011ee e87dfeffff call sym.imp.__isoc99_scanf
0x004011f3 8b55fc mov edx, dword [rbp - 4]
0x004011f6 488d45a0 lea rax, [rbp - 0x60]
0x004011fa 4889d6 mov rsi, rdx
0x004011fd 4889c7 mov rdi, rax
0x00401200 e82b000000 call 0x401230
0x00401205 488945f0 mov qword [rbp - 0x10], rax
0x00401209 48817df00000. cmp qword [rbp - 0x10], 0x460000
┌─< 0x00401211 750c jne 0x40121f
│ 0x00401213 bf1d404000 mov edi, str.Yay ; 0x40401d ; "Yay!"
│ 0x00401218 e813feffff call sym.imp.puts
┌──< 0x0040121d eb0a jmp 0x401229
│└─> 0x0040121f bf22404000 mov edi, str.Nope. ; 0x404022 ; "Nope."
│ 0x00401224 e807feffff call sym.imp.puts
└──> 0x00401229 b800000000 mov eax, 0
0x0040122e c9 leave
0x0040122f c3 ret
The code of this function is fairly straight forward : it checks that argc
is equal to 2
, else, it prints the usage and exits.
If we do provide an argument, it calls atoi
on our argument, so we know it probably expects an integer as argument.
Afterwards, it zeroes out a stack buffer, and calls scanf("%70s", stack_buffer)
. Knowing that most challenge flags are 70 characters long, I suspect this expects the flag to be inputed.
Finally, it calls a function at 0x401230
, with our stack buffer and the integer as arguments. If the return value is 0x460000
, it prints Yay!
, otherwise Nope.
. So we know the real magic happens in this function, the goal is to get this function to return 0x460000
The main logic
Let’s have a look at the function at 0x401230
.
0x00401230 4831db xor rbx, rbx
0x00401233 4d31c9 xor r9, r9
0x00401236 4d31db xor r11, r11
0x00401239 49c7c4330000. mov r12, 0x33 ; '3' ; 51
0x00401240 e800000000 call 0x401245
0x00401245 4158 pop r8
0x00401247 4983c038 add r8, 0x38 ; 56
┌┌┌┌─> 0x0040124b 50 push rax
╎╎╎╎ 0x0040124c 4889f0 mov rax, rsi
╎╎╎╎ 0x0040124f 48c7c1b50000. mov rcx, 0xb5 ; 181
╎╎╎╎ 0x00401256 48f7e1 mul rcx
╎╎╎╎ 0x00401259 4805d9000000 add rax, 0xd9 ; 217
╎╎╎╎ 0x0040125f 48c7c1000200. mov rcx, 0x200 ; 512
╎╎╎╎ 0x00401266 48f7f1 div rcx
╎╎╎╎ 0x00401269 4889d6 mov rsi, rdx
╎╎╎╎ 0x0040126c 4889f1 mov rcx, rsi
╎╎╎╎ 0x0040126f 48c1e104 shl rcx, 4
╎╎╎╎ 0x00401273 4c01c1 add rcx, r8
╎╎╎╎ 0x00401276 58 pop rax
╎╎╎╎ 0x00401277 4983eb07 sub r11, 7
╎╎╎╎ 0x0040127b ffe1 jmp rcx
╎╎╎╎ 0x0040127d 4d01d1 add r9, r10
╎╎╎╎ 0x00401280 90 nop
╎╎╎╎ 0x00401281 90 nop
╎╎╎╎ 0x00401282 90 nop
╎╎╎╎ 0x00401283 90 nop
╎╎╎╎ 0x00401284 90 nop
╎╎╎╎ 0x00401285 90 nop
╎╎╎╎ 0x00401286 90 nop
╎╎╎╎ 0x00401287 90 nop
└────< 0x00401288 e9beffffff jmp 0x40124b
╎╎╎ 0x0040128d 4909d9 or r9, rbx
╎╎╎ 0x00401290 90 nop
╎╎╎ 0x00401291 90 nop
╎╎╎ 0x00401292 90 nop
╎╎╎ 0x00401293 90 nop
╎╎╎ 0x00401294 90 nop
╎╎╎ 0x00401295 90 nop
╎╎╎ 0x00401296 90 nop
╎╎╎ 0x00401297 90 nop
└───< 0x00401298 e9aeffffff jmp 0x40124b
╎╎ 0x0040129d 4831f3 xor rbx, rsi
╎╎ 0x004012a0 90 nop
╎╎ 0x004012a1 90 nop
╎╎ 0x004012a2 90 nop
...
This function seems atypical : it is very long - we can’t even see a ret
. Let’s try to understand the prologue.
Firstly, it clears out rbx
, r9
and r11
, and sets r12
to 0x33
. Then, it call 0x401245
. This is the address imediately after ! Why does the program do that ? The first instruction after the call
is a pop r8
, and when a call
occurs, it pushes the return address onto the stack, which there is immediately pop
’d in r8
. This is simply a way of setting r8
to 0x401245
.
Next, it adds 0x38
to r8
, which now contains 0x40127d
. Note that this is the address of slightly later in the function.
Afterwards, it saves rax
by push
‘ing it, then performs some math of rsi
, which is the 2nd argument passed to our function, the argument we passed through argv
.
This is the equivalent python code to the calculations :
rsi = ((0xd9 + (0xb5 * rsi)) % 0x200)
It restores rax
, then adds our calculated rsi << 4
to the previously calculated address in rcx
, substracts 7
to r11
, and finally jumps to rcx
.
What this does is calculate an address to which it jumps, and the possibles addresses are right after : it jumps to one instruction followed by a number of nop
, then jumps back to the top, calculate a new rsi
value, jumps somewhere else, etc… It is some sort of loop, and the our argv
is what determines the seed for the loop.
So how do we figure out the correct value for our argv
?
Finding the correct argument
It is not easy to follows the many jumps the program does, so I wrote a python script to print out the first instruction in the loop based, using the capstone
library.
from capstone import *
base = 0x40127d
with open("shuffleMe", "rb") as f:
data = f.read()
md = Cs(CS_ARCH_X86, CS_MODE_64)
def print_first_instructions(seed):
for _ in range(25):
#Let's print the first 25 instructions
seed = ((0xd9 + (0xb5 * seed)) % 0x200)
addr = base + (seed<<4)
CODE = data[addr - 0x400000 : addr+20 - 0x400000]
#We just want one instruction, but we don't know how many bytes it is, so we take more and break after the first instruction printed out
for dis in md.disasm(CODE, addr):
print("0x%x:\t%s\t%s" %(dis.address, dis.mnemonic, dis.op_str))
break
for i in range(0x200):
print(f"---------{hex(i)}---------")
print_first_instructions(i)
Let’s have a look at the output for the first values :
---------0x0---------
0x40200d: add r11, r12
0x4016dd: mov bl, byte ptr [rdi + 0xa]
0x4017ed: mov bh, bl
0x40183d: xor bx, 0x6783
0x4030cd: xor rbx, rsi
0x402e9d: or r9, rbx
0x4022ad: add r9, r10
0x4031fd: add r11, r12
0x40258d: mov bl, byte ptr [rdi + 0xb]
0x401a5d: mov bh, bl
0x40316d: xor bx, 0x64b0
0x401fbd: xor rbx, rsi
0x401e4d: or r9, rbx
0x401a1d: add r9, r10
0x40242d: add r11, r12
0x40217d: mov bl, byte ptr [rdi + 0xc]
0x401b0d: mov bh, bl
0x402ddd: xor bx, 0x64e3
0x401aed: xor rbx, rsi
0x40173d: or r9, rbx
0x401bcd: add r9, r10
0x40159d: add r11, r12
0x4015ad: mov bl, byte ptr [rdi + 0xd]
0x4020fd: mov bh, bl
0x40208d: xor bx, 0x35da
---------0x1---------
0x402b5d: add r9, r10
0x40166d: add r11, r12
0x4028bd: mov bl, byte ptr [rdi + 0x40]
0x401b4d: mov bh, bl
0x401b1d: xor bx, 0x355e
0x40192d: xor rbx, rsi
0x401a7d: or r9, rbx
0x40280d: add r9, r10
0x401edd: add r11, r12
0x401fed: mov bl, byte ptr [rdi + 0x41]
0x40203d: mov bh, bl
0x4018cd: xor bx, 0x3072
0x40169d: xor rbx, rsi
0x402aad: or r9, rbx
0x4019fd: add r9, r10
0x402d8d: add r11, r12
0x40225d: mov bl, byte ptr [rdi + 0x42]
0x40196d: mov bh, bl
0x4027bd: xor bx, 0x3804
0x40264d: xor rbx, rsi
0x40221d: or r9, rbx
0x402c2d: add r9, r10
0x40297d: add r11, r12
0x40230d: mov bl, byte ptr [rdi + 0x43]
0x4015dd: mov bh, bl
---------0x2---------
0x4016ad: or r9, rbx
0x4015fd: add r9, r10
0x40198d: add r11, r12
0x401e5d: mov bl, byte ptr [rdi + 2]
0x40256d: mov bh, bl
0x4023bd: xor bx, 0x52ae
0x40324d: xor rbx, rsi
0x401e1d: or r9, rbx
0x40182d: add r9, r10
0x40257d: add r11, r12
0x402f0d: mov bl, byte ptr [rdi + 3]
...
The first thing that we notice is that in each case, it takes on byte from [rdi + x]
, and perform some xor
operations on it. Remember that in rdi
we have our first argument to the function, the address of our input. So it is taking one byte from our input. More so, it seems to do it in an incremental manner, but starts at different offset each time. So, to find out the correct argument, we probably have to find on which starts to take values from [rdi + 0]
.
It is the case for 0xc1
!
---------0xc1---------
0x40275d: shl r10, 8
0x40226d: dec rax
0x4024bd: mov bl, byte ptr [rdi]
0x40274d: mov bh, bl
0x40171d: xor bx, 0x476d
0x40252d: xor rbx, rsi
0x40167d: or r9, rbx
0x40140d: add r9, r10
0x401add: add r11, r12
0x402bed: mov bl, byte ptr [rdi + 1]
0x401c3d: mov bh, bl
0x4024cd: xor bx, 0x4341
0x40129d: xor rbx, rsi
0x4016ad: or r9, rbx
0x4015fd: add r9, r10
0x40198d: add r11, r12
0x401e5d: mov bl, byte ptr [rdi + 2]
0x40256d: mov bh, bl
0x4023bd: xor bx, 0x52ae
0x40324d: xor rbx, rsi
0x401e1d: or r9, rbx
0x40182d: add r9, r10
0x40257d: add r11, r12
0x402f0d: mov bl, byte ptr [rdi + 3]
Now we have our value for the argument, time to figure out how it checks the flag.
Finding the correct input
Let’s focus on the instructions following the mov bl, byte ptr [rdi + x]
, after that we always have a mov bh, bl
, so both of the lower bytes of rbx are set to the byte of our input that we are currently checking.
Next, it performs a xor bx, 0xXXYY
followed by a xor rbx, rsi
. First, let’s imagine assume that our goal is to set rbx
to 0
, which would be the simplest way to check our input - without checking what it does to r9
, we can come back to it if needed.
Then, because we know rsi
value easily - it is the seed
variable in our script, we can easily figure out each character by performing a simple 0xXXYY ^ seed
.
Let’s modify our python script to this :
from capstone import *
base = 0x40127d
with open("shuffleMe", "rb") as f:
data = f.read()
md = Cs(CS_ARCH_X86, CS_MODE_64)
def get_flag():
flag = ""
val = 0
seed = 0xc1
while True:
#Let's print the first 25 instructions
seed = ((0xd9 + (0xb5 * seed)) % 0x200)
addr = base + (seed<<4)
CODE = data[addr - 0x400000 : addr+20 - 0x400000]
#We just want one instruction, but we don't know how many bytes it is, so we take more and break after the first instruction printed out
for dis in md.disasm(CODE, addr):
if "xor" in dis.mnemonic and "bx" in dis.op_str and "rsi" not in dis.op_str:
val = int(dis.op_str.split(", ")[1], 16)
if "rsi" in dis.op_str:
ch = chr(int(hex((val ^ seed))[-2:], 16))
#We do this because the result is 2 bytes long, twice the same byte and we just want one
flag = flag + ch
if ch == "}":
print(flag)
return
break
get_flag()
With this, we get the flag : FCSC{af22cfdd46bfc8105cb28e04988f904049dd60be1245718ffef5b9bbf9b509d5}
We did not check how it calculates the return value to get 0x460000
, but that was efficient.