🗃️ Reversing a Go stripped binary
You are a brand new member of the Gophers team wohoo! But before you become an official member, you need to check the Gopher Archive and see if you know someone in there so that you can talk to them. One small issue: You weren’t given the password to that archive, just the archive itself. It shouldn’t be a problem for you to find the password to that archive thanks to your Gopher reverse engineering talents!
I chose to write a long blog post about this challenge as it is a Go binary, which isn’t faced a lot in CTFs, and it is also stripped - which makes it harder. With that explained and the description of the challenge above, let’s get right into reversing it
File Analysis
When first downloading the file the main idea is to check what kind of file we are facing:
krypton@spacelab:~$ file gopher-archive gopher-archive: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, Go BuildID=DZY5-NZWffdx_11K_RuZ/ecYcwlsBPPJC4gB4vUTq/R6IjWjgUnvavmgqByNvV/V2BcxgTPHcU3p7JRtY_J, stripped |
As we can see, it is a ‘normal’ linux executable file. One special thing is that we can see it’s a Go build, so the file is a Go binary. We can also see that the binary is stripped, meaning that it will most likely be horrible to read the assembly code and recognize what the binary does.
When executing the binary we have a normal welcome message, with a Gopher ASCII art. It is asking for a password to access the archive. We can try some overflow but that of course does not work, and in the end it’s a reversing challenge; not a pwn challenge
First look at disassembled program
Functions
Let’s open up the binary in a disassembler; as expected it’s kind of a mess due to the binary being stripped. We see lots of sub_XXXXXX
function names and none of them directly seem to look like the main
function of the binary. We however see a _start
function but once again, that doesn’t look like it’s the main
function.
Strings
In Go binaries strings are handled differently than C binaries, a pointer to the string is first loaded and then the length we need is moved in a register, as per Go’s source code:
1 | type stringStruct struct { str unsafe.Pointer len int } |
Looking at the strings, when searching for the string that appears when the password entered was incorrect we get the following:
810668945312588817841970012523233890533447265625[ @f | @)) | | @)) l 0 _/ \nSorry, that password is incorrect! |
Our string is inside that one, but it’s definitely not used only at the place where we print the error message. Looking at the cross references of that string:
We see that it’s used at only 4 places. We can now use GDB to place a breakpoint on each of the calls and see if it reaches the breakpoint before printing the welcome message.
Debugging
Finding the main function
Once we reach a breakpoint before the welcome message gets printed, we can simply use ni
to get to the next instruction and hold Enter pressed until we see the welcome message with user prompt, here we must have called the main function. And indeed, before printing the welome message and asking for user input we had two important instructions:
1 | 0x4347c2 mov rax, qword [rel data_4aa980] {sub_489920} 0x4347c9 lea rdx, [rel data_4aa980] ; Less important 0x4347d0 call rax {sub_489920} |
We make a call to $rax
which has the value of the address of the sub_489920
function. With that information, we’ve found the main function!
Disassembling the main function
Understanding the logic
Once we’ve found the main function we need to understand how the password checking system works. Now we can go inside the main function (at 0x489920
) and look at the assembly code there.
We first see lots of calls being made one after the other to always the same function. The graph view confirms that there are no checks being made and the functions are just called with no specific behavior. We have a Gopher ASCII art printed when running the file, we can safely consider these calls as being responsible for printing this text. So we can rename the function sub_484100
to print
and not look at its content, as it’s not what we care about.
Once we come at the end of that list of calls, we see two calls followed by a condition. If the condition succeeded we go and print a single time. If the condition failed we go and print twice. So out of the error message we can say that left is the error message and right the success message. Now we just have to look at the functions sub_44afc0
and sub_48a3c0
to see what they do. One, or maybe both, will be the password check.
Debugging the two unknown functions
Second unknown function
Let’s put a breakpoint on the first function, sub_44afc0
. We don’t hit it before having to put user input, so it’s not responsible of asking for user input. But let’s continue the debugging and use ni
to get to the next function call. After calling the function we can look at the content of $al
to see if the condition will pass or fail.
gef➤ p $al
$1 = 0x0 |
So the condition will succeed (0x0 & 0x0 = 0x0
) and we will jump to 0x489df1
. We can see that the function sub_48a3c0
will check if the password is valid, so let’s rename it to isPasswordValid
.
First unknown function
Before jumping in the isPasswordValid
function we need to find where the user input is asked, just to gain more debugging experience
We have confirmed that it’s not the function above the password checking function, so let’s break one function call higher - at 0x489d2d
. We run the program, then use ni
and now we get the user input. After giving it we see our user input in the $rbx
register.
gef➤ x/s $rbx 0xc0000b2000: "1337test\n" |
So our previously unknown function is, for our reversing task, kind of useless. But we can go on and rename the function above, sub_469900
, to handleUserInput
.
Disassembling the isPasswordValid function
String splitting
When first looking at the function we see a call to sub_468740
and before that we load the string "-./05:<=?CLMPSU..."
in the $rcx
register. As per Go’s way of handling strings, we see the length of the string we will need for the next call getting moved in the $edi
register:
1 | 0x48a3dd lea rcx, [rel data_4a1bdd] {"-./05:<=?CLMPSUZ[]`hms{} + @ P […"} 0x48a3e4 mov edi, 0x1 |
So we can see that the string used will be just "-"
. After that the call to the function is made and we can see that there is a comparison:
1 | 0x48a3f7 cmp rbx, 0x2 |
Our string is getting splitted with "-"
and checked if there are two strings at the end! We can confirm this supposition by using the debugger:
$rax : 0x0000c0000b6040 → 0x0000c0000bc000 → "hello-world-123-456" $rbx : 0x4 |
Here we have 4 elements after the string is getting splitted. We can rename the function sub_468740
to splitString
.
Strings length
Continuing witht the assembly code, we have previously seen that it’s doing a check if the elements are two, so we know our password will be something like XYZ-ZYX
.
Looking at the next conditions, we see two checks made with 0x6
, we can assume each part of the password will need to have a length of 6. So something like 123456-123456
. We can confirm this with the debugger but I feel like I don’t need to show how it’s being done.
Moving forward we can see two functions being called; sub_48a0e0
and sub_48a2a0
. And as always their return value in $al
is being checked. Considering we have two strings part, we can guess the first call will check the first part of the string and the second call will check the second part of the string.
Debugging first string validation function
Disassembling the rough logic
The first function, sub_48a0e0
, is responsible for validating the first part of the password, we can rename it. Looking at it with the disassembler we don’t get a lot of information because there are lots of calls being made and at the end a check with a 0x28
long string - "df4c2d865b38db152e7f6bb4ad2f325de9570185"
.
1 | 0x48a222 lea rbx, [rel data_4a84fe] {"df4c2d865b38db152e7f6bb4ad2f325d…"} 0x48a229 mov ecx, 0x28 0x48a22e call compare 0x48a233 test al, al 0x48a235 je 0x48a248 |
I renamed the function sub_402940
to compare
as it’s pretty obvious by looking at the disassembled code of it. Now let’s jump into a debugging session to see what this function does.
String getting lowered
When breaking at 0x48a0e0
and going one instruction after the other, we can see that after calling the function sub_468fc0
our first part of the string got lowered:
$rax : 0x0000c000138010 → 0x00616161616161 ("abcdef"?) |
So we can now rename this function, we don’t need to dig deep into how that function works.
String getting reversed
After continuing we see nothing really big going on until we face a call at 0x48a1a9
on the function sub_44b240
that will pass the "abcdef"
string as an argument.
Looking at the results after the call, we see in $rax
register "fedcban:/bin:/snap/bin:/"
. Do you see something special? Yes! We can see that our abcdef
string got reversed to fedcba
- remember how strings are handled. Another function to rename :D
Loop over characters in string
If we continue with the next instructions we see we stumble into a loop, it can also be seen in the graph view:
Looking at it closer we see that we first give a value to the $edx
register to be 0x1ca3
(7331). After that we jump to 0x48a1d2
and compare if $rbx
is equals to $rcx
if it is, then we jump to 0x48a213
and exit the loop. Otherwise we continue and load into the $rdi
register $rcx+0x1
, this is a counter. Considering $rbx
contains the length of the string, we can say this loop goes over every character of the string.
Now if we look at what exactly the loop does, we see the following assembly code:
1 | 0x48a1c2 movsxd rsi, esi 0x48a1c5 imul rcx, rsi 0x48a1c9 add rsi, rcx 0x48a1cc add rdx, rsi 0x48a1cf mov rcx, rdi ; Not really useful 0x48a1d2 mov qword [rsp+0x28 {var_60_1}], rdx |
Here is the logic explained when using ABCDEF-GHIJKL as password and i
being 1 which means we are iterating over e (remember, it’s reversed):
$esi
contains the hex value of the current character and it is being moved to$rsi
($rsi : 0x65
)- We multiply
$rsi
with$rcx
which contains the current value ofi
(0x65*0x1
is stored in$rcx
) - We add to
$rsi
the result of the multiplication (0x65+0x65
is stored in$rsi
) - We add to
$rdx
the value of$rsi
(0x1d09+0xca
is stored in$rdx
) - This resulting value is then stored in
$rsp+0x28
- We check if
$rsp+0x28
is equals to0x256c
more below after calling another unknown function
So we can resume this loop as being like the following:
1 | x = 0x1ca3 while (True): if i >= len(s): break x += ord(s[i]) + (i * ord(s[i])) i += 0x1 if x == 0x256c: # Do something |
Hashing function
When two functions are called one after the other, the return value of the first function is passed as an argument to the second automatically, you will see this in the incoming steps.
As previously said, there is another unknown function going on (sub_489e60
). Right after calling it, it is being checked if the length of the resulting string is 0x28
which is a 40, the only hashing algorithm I know that has a resulting length of 40 is SHA1. So let’s try to hash fedcba
with SHA1.
The string coming out of the function is "82639466494baa873844ae6cdc593cd54b5c054e"
and our input SHA1 hashed is "122fe41b518797f9474d5f6f4665e411c449512c"
. That doesn’t look like it’s the same, maybe we should go deeper in the function, because Google confirmed SHA1 has a length of 40 characters. When putting a breakpoint on a function call we can use si
to jump in the function itself. We enter a loop once again, this time a new string is getting generated by the loop:
$rax : 0x0000c0001b0000 → "8df67bd6d5e123d52a9a" $rax : 0x0000c0001b0000 → "8df67bd6d5e123d52a9a9" ... $rax : 0x0000c0001b0000 → "8df67bd6d5e123d52a9a966bb31cf719" |
If we continue with the loop we see a new function called with string above as argument. Looking at the generated string, it’s the SHA1 value above starting with 826
. So what is happening during the first part? The resulting string has a length of 32 characters, this is MD5. So basically what is happening is result = SHA1(MD5(string))
. We can confirm it easily by doing it ourselves and doing a SHA1 hash of "fedcba"
and then make a MD5 hash of the resulting SHA1 hash.
Phew! That’s it for this hashing function, we can rename it to something like md5sha1
.
Comparison
At the end we check if the resulting string is equals to the value in $rbx
and take 0x28
characters of that string.
$rbx : 0x000000004a84fe → "df4c2d865b38db152e7f6bb4ad2f325de9570185failed to [...]" $rcx : 0x28 |
So we compare if MD5(SHA1(string))
is equals to "df4c2d865b38db152e7f6bb4ad2f325de9570185"
.
Finally! The first check is reverse engineered.
Debugging second string validation function
Disassembling the logic
The second check function is sub_48a2a0
, so let’s rename it.
Looking at it we have some similar pattern to the previous one:
1 | 0x48a2c0 call lower ; ... 0x48a360 call reverse 0x48a365 call md5sha1 ; ... 0x48a374 lea rbx, [rel data_4a845e] {"a2ce7d4c220df20b186f5458d9449a56…"} 0x48a37b mov ecx, 0x28 0x48a380 call compare |
Phew! We have basically everything needed to already finish the reversing of this function.
It does everything the first function does, just without the loop!
Generating the flag
We are now finished with reversing the functions and we can start generating the flag!
First string
The first string validity in TL;DR is:
- Reverse the string
- Add the hex values of each character to a counter
- That counter, add the hex values of each character multiplied by the current index of the counter
- If the counter is equals 9580 the first check is valid
- If the reversed string md5 hashed and then sha1 hashed is equals to ?
"df4c2d865b38db152e7f6bb4ad2f325de9570185"
the second check is valid - If both checks are valid, the first string is valid
Second string
The second string validity in TL;DR is:
- Reverse the string
- If the reversed string md5 hashed and then sha1 hashed is equals to
"a2ce7d4c220df20b186f5458d9449a56e0e36149"
the check is valid - If the check is valid, the second string is valid
solver.py
The entire code for the solver would be the following:
1 | def solve_first() -> str: from string import ascii_lowercase import itertools import hashlib def iter(): for s in itertools.product(ascii_lowercase, repeat=6): yield "".join(s) for s in iter(): s = s[::-1] x = 7331 i = 0 while (True): if (i >= len(s)): break x += ord(s[i]) + (i * ord(s[i])) i += 1 print(f"Trying {s[::-1]}", end="\r") if (x == 9580) and (hashlib.sha1(hashlib.md5(s.encode()).hexdigest().encode()).hexdigest() == 'df4c2d865b38db152e7f6bb4ad2f325de9570185'): return s def solve_second() -> str: import hashlib with open('rockyou.txt', 'r', encoding="latin-1") as f: for l in f.read().splitlines(): print(f"Trying {l}", end="\r") reversed = l[::-1] if (hashlib.sha1(hashlib.md5(reversed.encode()).hexdigest().encode()).hexdigest() == 'a2ce7d4c220df20b186f5458d9449a56e0e36149'): return l first = solve_first() print(f"Found first string: {first[::-1]}") second = solve_second() print(f"Found second string: {second}") print(f"\nFlag/password: {first}-{second}") |
I could’ve used a wordlist in the first check but I decided to two different ways of solving it, even though it’s really slow. I also tried with itertools.permutations
and using multiprocessing
but it wasn’t really faster at getting the permutations. If you want to try yourself you can edit the first check by downloading, for example, a wordlist of English words and make a wordlist bruteforce just like the second check - that’s what I did to actually get the string.
Conclusion
If you read the post in its entirety until here, you are amazing!
It was an overall awesome challenge and I’ve definitely learned a lot, not only because it was stripped but also because it was a Go binary and not a usual C binary. Took me lots of time to make this writeup and solve the challenge but I am happy about it :) Hopefully you enjoyed this challenge and if you want to try it yourself you can download the binary and have a look at it!
~ Krypton 🐺