FCSC 2021 - ShuffleMe

May 3, 2021

ShuffleMe

This challenge was part of the Reverse category, and was worth 200 points.

Retrouvez les entrées permettant de valider.

Discovery

First, we identify what type of binary this :

$ file shuffleMe 
shuffleMe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, stripped

This is a 64-bit ELF. It is stripped. Let’s try running the binary :

$ ./shuffleMe 
Usage: ./shuffleMe <secret>
$ ./shuffleMe 1
bobcat
Nope.

The program expects a secret as a first argument, then waits for input through stdin. Afterwards, it simply says Nope. Let’s have a look at the assembly of the program - starting with the main function.

            0x00401172      55             push rbp
            0x00401173      4889e5         mov rbp, rsp
            0x00401176      4883ec70       sub rsp, 0x70
            0x0040117a      897d9c         mov dword [rbp - 0x64], edi
            0x0040117d      48897590       mov qword [rbp - 0x70], rsi
            0x00401181      837d9c02       cmp dword [rbp - 0x64], 2
        ┌─< 0x00401185      742a           je 0x4011b1
        │   0x00401187      488b4590       mov rax, qword [rbp - 0x70]
        │   0x0040118b      488b10         mov rdx, qword [rax]
        │   0x0040118e      488b05cb4e00.  mov rax, qword [obj.stderr] ; [0x406060:8]=0
        │   0x00401195      be04404000     mov esi, str.Usage:__s__secret ; 0x404004 ; "Usage: %s <secret>\n"
        │   0x0040119a      4889c7         mov rdi, rax
        │   0x0040119d      b800000000     mov eax, 0
        │   0x004011a2      e8a9feffff     call sym.imp.fprintf
        │   0x004011a7      bf01000000     mov edi, 1
        │   0x004011ac      e8cffeffff     call sym.imp.exit
        └─> 0x004011b1      488b4590       mov rax, qword [rbp - 0x70]
            0x004011b5      4883c008       add rax, 8
            0x004011b9      488b00         mov rax, qword [rax]
            0x004011bc      4889c7         mov rdi, rax
            0x004011bf      e89cfeffff     call sym.imp.atoi
            0x004011c4      8945fc         mov dword [rbp - 4], eax
            0x004011c7      488d45a0       lea rax, [rbp - 0x60]
            0x004011cb      ba50000000     mov edx, 0x50               ; 'P' ; 80
            0x004011d0      be00000000     mov esi, 0
            0x004011d5      4889c7         mov rdi, rax
            0x004011d8      e863feffff     call sym.imp.memset
            0x004011dd      488d45a0       lea rax, [rbp - 0x60]
            0x004011e1      4889c6         mov rsi, rax
            0x004011e4      bf18404000     mov edi, str.70s            ; 0x404018 ; "%70s"
            0x004011e9      b800000000     mov eax, 0
            0x004011ee      e87dfeffff     call sym.imp.__isoc99_scanf
            0x004011f3      8b55fc         mov edx, dword [rbp - 4]
            0x004011f6      488d45a0       lea rax, [rbp - 0x60]
            0x004011fa      4889d6         mov rsi, rdx
            0x004011fd      4889c7         mov rdi, rax
            0x00401200      e82b000000     call 0x401230
            0x00401205      488945f0       mov qword [rbp - 0x10], rax
            0x00401209      48817df00000.  cmp qword [rbp - 0x10], 0x460000
        ┌─< 0x00401211      750c           jne 0x40121f
        │   0x00401213      bf1d404000     mov edi, str.Yay            ; 0x40401d ; "Yay!"
        │   0x00401218      e813feffff     call sym.imp.puts
       ┌──< 0x0040121d      eb0a           jmp 0x401229
       │└─> 0x0040121f      bf22404000     mov edi, str.Nope.          ; 0x404022 ; "Nope."
       │    0x00401224      e807feffff     call sym.imp.puts
       └──> 0x00401229      b800000000     mov eax, 0
            0x0040122e      c9             leave
            0x0040122f      c3             ret

The code of this function is fairly straight forward : it checks that argc is equal to 2, else, it prints the usage and exits. If we do provide an argument, it calls atoi on our argument, so we know it probably expects an integer as argument. Afterwards, it zeroes out a stack buffer, and calls scanf("%70s", stack_buffer). Knowing that most challenge flags are 70 characters long, I suspect this expects the flag to be inputed. Finally, it calls a function at 0x401230, with our stack buffer and the integer as arguments. If the return value is 0x460000, it prints Yay!, otherwise Nope.. So we know the real magic happens in this function, the goal is to get this function to return 0x460000

The main logic

Let’s have a look at the function at 0x401230.

            0x00401230      4831db         xor rbx, rbx
            0x00401233      4d31c9         xor r9, r9
            0x00401236      4d31db         xor r11, r11
            0x00401239      49c7c4330000.  mov r12, 0x33               ; '3' ; 51
            0x00401240      e800000000     call 0x401245
            0x00401245      4158           pop r8
            0x00401247      4983c038       add r8, 0x38                ; 56
     ┌┌┌┌─> 0x0040124b      50             push rax
     ╎╎╎╎   0x0040124c      4889f0         mov rax, rsi
     ╎╎╎╎   0x0040124f      48c7c1b50000.  mov rcx, 0xb5               ; 181
     ╎╎╎╎   0x00401256      48f7e1         mul rcx
     ╎╎╎╎   0x00401259      4805d9000000   add rax, 0xd9               ; 217
     ╎╎╎╎   0x0040125f      48c7c1000200.  mov rcx, 0x200              ; 512
     ╎╎╎╎   0x00401266      48f7f1         div rcx
     ╎╎╎╎   0x00401269      4889d6         mov rsi, rdx
     ╎╎╎╎   0x0040126c      4889f1         mov rcx, rsi
     ╎╎╎╎   0x0040126f      48c1e104       shl rcx, 4
     ╎╎╎╎   0x00401273      4c01c1         add rcx, r8
     ╎╎╎╎   0x00401276      58             pop rax
     ╎╎╎╎   0x00401277      4983eb07       sub r11, 7
     ╎╎╎╎   0x0040127b      ffe1           jmp rcx
     ╎╎╎╎   0x0040127d      4d01d1         add r9, r10
     ╎╎╎╎   0x00401280      90             nop
     ╎╎╎╎   0x00401281      90             nop
     ╎╎╎╎   0x00401282      90             nop
     ╎╎╎╎   0x00401283      90             nop
     ╎╎╎╎   0x00401284      90             nop
     ╎╎╎╎   0x00401285      90             nop
     ╎╎╎╎   0x00401286      90             nop
     ╎╎╎╎   0x00401287      90             nop
     └────< 0x00401288      e9beffffff     jmp 0x40124b
      ╎╎╎   0x0040128d      4909d9         or r9, rbx
      ╎╎╎   0x00401290      90             nop
      ╎╎╎   0x00401291      90             nop
      ╎╎╎   0x00401292      90             nop
      ╎╎╎   0x00401293      90             nop
      ╎╎╎   0x00401294      90             nop
      ╎╎╎   0x00401295      90             nop
      ╎╎╎   0x00401296      90             nop
      ╎╎╎   0x00401297      90             nop
      └───< 0x00401298      e9aeffffff     jmp 0x40124b
       ╎╎   0x0040129d      4831f3         xor rbx, rsi
       ╎╎   0x004012a0      90             nop
       ╎╎   0x004012a1      90             nop
       ╎╎   0x004012a2      90             nop
...

This function seems atypical : it is very long - we can’t even see a ret. Let’s try to understand the prologue. Firstly, it clears out rbx, r9 and r11, and sets r12 to 0x33. Then, it call 0x401245. This is the address imediately after ! Why does the program do that ? The first instruction after the call is a pop r8, and when a call occurs, it pushes the return address onto the stack, which there is immediately pop’d in r8. This is simply a way of setting r8 to 0x401245.

Next, it adds 0x38 to r8, which now contains 0x40127d. Note that this is the address of slightly later in the function.

Afterwards, it saves rax by push‘ing it, then performs some math of rsi, which is the 2nd argument passed to our function, the argument we passed through argv. This is the equivalent python code to the calculations :

rsi = ((0xd9 + (0xb5 * rsi)) % 0x200)

It restores rax, then adds our calculated rsi << 4 to the previously calculated address in rcx, substracts 7 to r11, and finally jumps to rcx. What this does is calculate an address to which it jumps, and the possibles addresses are right after : it jumps to one instruction followed by a number of nop, then jumps back to the top, calculate a new rsi value, jumps somewhere else, etc… It is some sort of loop, and the our argv is what determines the seed for the loop. So how do we figure out the correct value for our argv ?

Finding the correct argument

It is not easy to follows the many jumps the program does, so I wrote a python script to print out the first instruction in the loop based, using the capstone library.

from capstone import *

base = 0x40127d


with open("shuffleMe", "rb") as f:
        data = f.read()

md = Cs(CS_ARCH_X86, CS_MODE_64)


def print_first_instructions(seed):
    for _ in range(25):
        #Let's print the first 25 instructions
        seed = ((0xd9 + (0xb5 * seed)) % 0x200)
        addr = base + (seed<<4)
        CODE = data[addr - 0x400000 : addr+20 - 0x400000]
        #We just want one instruction, but we don't know how many bytes it is, so we take more and break after the first instruction printed out
        for dis in md.disasm(CODE, addr):
            print("0x%x:\t%s\t%s" %(dis.address, dis.mnemonic, dis.op_str))
            break


for i in range(0x200):
    print(f"---------{hex(i)}---------")
    print_first_instructions(i)

Let’s have a look at the output for the first values :

---------0x0---------                                                                                                                                                     
0x40200d:       add     r11, r12                                                                                                                                        
0x4016dd:       mov     bl, byte ptr [rdi + 0xa]                                                                                                                        
0x4017ed:       mov     bh, bl                                                                                                                                          
0x40183d:       xor     bx, 0x6783                                                                                                                                      
0x4030cd:       xor     rbx, rsi                                                                                                                                        
0x402e9d:       or      r9, rbx                                                                                                                                         
0x4022ad:       add     r9, r10                                                                                                                                         
0x4031fd:       add     r11, r12                                                                                                                                        
0x40258d:       mov     bl, byte ptr [rdi + 0xb]                                                                                                                        
0x401a5d:       mov     bh, bl                                                                                                                                          
0x40316d:       xor     bx, 0x64b0                                                                                                                                      
0x401fbd:       xor     rbx, rsi                                                                                                                                        
0x401e4d:       or      r9, rbx                                                                                                                                         
0x401a1d:       add     r9, r10                                                                                                                                         
0x40242d:       add     r11, r12                                                                                                                                        
0x40217d:       mov     bl, byte ptr [rdi + 0xc]                                                                                                                        
0x401b0d:       mov     bh, bl                                                                                                                                          
0x402ddd:       xor     bx, 0x64e3                                                                                                                                      
0x401aed:       xor     rbx, rsi                                                                                                                                        
0x40173d:       or      r9, rbx                                                                                                                                         
0x401bcd:       add     r9, r10                                                                                                                                         
0x40159d:       add     r11, r12                                                                                                                                        
0x4015ad:       mov     bl, byte ptr [rdi + 0xd]                                                                                                                        
0x4020fd:       mov     bh, bl                                                                                                                                          
0x40208d:       xor     bx, 0x35da                                                                                                                                      
---------0x1---------                                                                                                                                                     
0x402b5d:       add     r9, r10
0x40166d:       add     r11, r12
0x4028bd:       mov     bl, byte ptr [rdi + 0x40]
0x401b4d:       mov     bh, bl
0x401b1d:       xor     bx, 0x355e
0x40192d:       xor     rbx, rsi
0x401a7d:       or      r9, rbx
0x40280d:       add     r9, r10
0x401edd:       add     r11, r12
0x401fed:       mov     bl, byte ptr [rdi + 0x41]
0x40203d:       mov     bh, bl
0x4018cd:       xor     bx, 0x3072
0x40169d:       xor     rbx, rsi
0x402aad:       or      r9, rbx
0x4019fd:       add     r9, r10
0x402d8d:       add     r11, r12
0x40225d:       mov     bl, byte ptr [rdi + 0x42]
0x40196d:       mov     bh, bl
0x4027bd:       xor     bx, 0x3804
0x40264d:       xor     rbx, rsi
0x40221d:       or      r9, rbx
0x402c2d:       add     r9, r10
0x40297d:       add     r11, r12
0x40230d:       mov     bl, byte ptr [rdi + 0x43]
0x4015dd:       mov     bh, bl
---------0x2---------
0x4016ad:       or      r9, rbx
0x4015fd:       add     r9, r10
0x40198d:       add     r11, r12
0x401e5d:       mov     bl, byte ptr [rdi + 2]
0x40256d:       mov     bh, bl
0x4023bd:       xor     bx, 0x52ae
0x40324d:       xor     rbx, rsi
0x401e1d:       or      r9, rbx
0x40182d:       add     r9, r10
0x40257d:       add     r11, r12
0x402f0d:       mov     bl, byte ptr [rdi + 3]
...

The first thing that we notice is that in each case, it takes on byte from [rdi + x], and perform some xor operations on it. Remember that in rdi we have our first argument to the function, the address of our input. So it is taking one byte from our input. More so, it seems to do it in an incremental manner, but starts at different offset each time. So, to find out the correct argument, we probably have to find on which starts to take values from [rdi + 0].

It is the case for 0xc1 !

---------0xc1---------
0x40275d:       shl     r10, 8
0x40226d:       dec     rax
0x4024bd:       mov     bl, byte ptr [rdi]
0x40274d:       mov     bh, bl
0x40171d:       xor     bx, 0x476d
0x40252d:       xor     rbx, rsi
0x40167d:       or      r9, rbx
0x40140d:       add     r9, r10
0x401add:       add     r11, r12
0x402bed:       mov     bl, byte ptr [rdi + 1]
0x401c3d:       mov     bh, bl
0x4024cd:       xor     bx, 0x4341
0x40129d:       xor     rbx, rsi
0x4016ad:       or      r9, rbx
0x4015fd:       add     r9, r10
0x40198d:       add     r11, r12
0x401e5d:       mov     bl, byte ptr [rdi + 2]
0x40256d:       mov     bh, bl
0x4023bd:       xor     bx, 0x52ae
0x40324d:       xor     rbx, rsi
0x401e1d:       or      r9, rbx
0x40182d:       add     r9, r10
0x40257d:       add     r11, r12
0x402f0d:       mov     bl, byte ptr [rdi + 3]

Now we have our value for the argument, time to figure out how it checks the flag.

Finding the correct input

Let’s focus on the instructions following the mov bl, byte ptr [rdi + x], after that we always have a mov bh, bl, so both of the lower bytes of rbx are set to the byte of our input that we are currently checking. Next, it performs a xor bx, 0xXXYY followed by a xor rbx, rsi. First, let’s imagine assume that our goal is to set rbx to 0, which would be the simplest way to check our input - without checking what it does to r9, we can come back to it if needed. Then, because we know rsi value easily - it is the seed variable in our script, we can easily figure out each character by performing a simple 0xXXYY ^ seed.

Let’s modify our python script to this :

from capstone import *

base = 0x40127d


with open("shuffleMe", "rb") as f:
        data = f.read()

md = Cs(CS_ARCH_X86, CS_MODE_64)


def get_flag():
    flag = ""
    val = 0
    seed = 0xc1
    while True:
        #Let's print the first 25 instructions
        seed = ((0xd9 + (0xb5 * seed)) % 0x200)
        addr = base + (seed<<4)
        CODE = data[addr - 0x400000 : addr+20 - 0x400000]
        #We just want one instruction, but we don't know how many bytes it is, so we take more and break after the first instruction printed out
        for dis in md.disasm(CODE, addr):
            if "xor" in dis.mnemonic and "bx" in dis.op_str and "rsi" not in dis.op_str:
                val = int(dis.op_str.split(", ")[1], 16)
            if "rsi" in dis.op_str:
                ch = chr(int(hex((val ^ seed))[-2:], 16))
                #We do this because the result is 2 bytes long, twice the same byte and we just want one
                flag = flag + ch
                if ch == "}":
                    print(flag)
                    return
            break


get_flag()

With this, we get the flag : FCSC{af22cfdd46bfc8105cb28e04988f904049dd60be1245718ffef5b9bbf9b509d5}

We did not check how it calculates the return value to get 0x460000, but that was efficient.