Analyzing Linux x86 shellcodes

In the previous posts, we looked at creating shellcode. In this post, I will analyze three shellcodes generated with msfvenom, all targeting the Linux x86 architecture. The aim is to understand how these shellcodes are crafted and what happens in the background, i.e. which syscalls and instructions they use.

Let’s jump directly to the first shellcode.

Linux/x86/exec

Creating a workable exploit

The first payload we are going to look at is linux/x86/exec. We will start by creating a workable exploit for this shellcode. The payload executes a command supplied by the user when the payload is generated. The following command generates a payload that executes “ifconfig”:

root@kali:~# msfvenom -p linux/x86/exec CMD=ifconfig -f C
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 44 bytes
Final size of c file: 209 bytes
unsigned char buf[] =
"\x6a\x0b\x58\x99\x52\x66\x68\x2d\x63\x89\xe7\x68\x2f\x73\x68"
"\x00\x68\x2f\x62\x69\x6e\x89\xe3\x52\xe8\x09\x00\x00\x00\x69"
"\x66\x63\x6f\x6e\x66\x69\x67\x00\x57\x53\x89\xe1\xcd\x80";

Note that we have used the -f flag to get the output in a format that can be embedded directly in our C program.

#include <stdio.h>
#include <string.h>

unsigned char code[] = \
"\x6a\x0b\x58\x99\x52\x66\x68\x2d\x63\x89\xe7\x68\x2f\x73\x68"
"\x00\x68\x2f\x62\x69\x6e\x89\xe3\x52\xe8\x09\x00\x00\x00\x69"
"\x66\x63\x6f\x6e\x66\x69\x67\x00\x57\x53\x89\xe1\xcd\x80";

main()

{
        printf("Executing Shellcode... \n");
        printf("******************************************************************************\n");
        int (*ret)() = (int(*)())code;
        ret();
}

The final step is to compile this using gcc and then run it. We need to add -fno-stack-protector to disable the stack protector and -z execstack to make the stack executable.

root@kali:~/slae/assignments/assignment-5# gcc -fno-stack-protector -z execstack -o linux_x86_exec linux_x86_exec.c
linux_x86_exec.c:9:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
    9 | main()
      | ^~~~

Running the shellcode runs “ifconfig” on my system. The output is as shown.

Now that we have a workable exploit for exec shellcode, let’s move on to the analysis part.

What’s under the hood !

There are multiple ways to analyze the shellcode. I am going to use the following approach for this payload:

Explore the payload using Libemu
Load the program in GDB and perform step by step analysis

Libemu

Libemu is an x86 emulator that works well for shellcode analysis. We can pipe the raw output from msfvenom into sctest, one of the tools that ships with Libemu. Installation and basic usage are documented elsewhere, so I will jump straight to the analysis.

Feeding sctest with raw exec payload from msfvenom gives the following output:

root@kali:~/slae/assignments/assignment-5# msfvenom -p linux/x86/exec CMD=ifconfig -f raw | sctest -v -Ss 1000 |more
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 44 bytes

verbose = 1
execve
int execve (const char *dateiname=00416fc0={/bin/sh}, const char * argv[], const char *envp[]);
cpu error error accessing 0x00000004 not mapped

stepcount 15
int execve (
     const char * dateiname = 0x00416fc0 =>
           = "/bin/sh";
     const char * argv[] = [
           = 0x00416fb0 =>
               = 0x00416fc0 =>
                   = "/bin/sh";
           = 0x00416fb4 =>
               = 0x00416fc8 =>
                   = "-c";
           = 0x00416fb8 =>
               = 0x0041701d =>
                   = "ifconfig";
           = 0x00000000 =>
             none;
     ];
     const char * envp[] = 0x00000000 =>
         none;
) =  0;

As you can see, the output is quite clean because it is presented in a C-like format. We can deduce the following from it:

The syscall being invoked is execve
The program being run is /bin/sh
The arguments passed to it are “/bin/sh -c ifconfig”

With just a single command, we know how msfvenom crafted the program.

To dig deeper and see this shellcode in action, let’s load the executable we generated earlier in GDB.

GDB

GDB will help us step through each instruction and analyse the registers, memory locations, etc. This will give us a deeper understanding.

From Libemu, we already know that the execve syscall is being used. The man reference of the execve syscall is as follows:

int execve(const char *pathname, char *const argv[], char *const envp[]);

Based on the information from the Libemu analysis, the arguments map to registers as follows:

Register	Argument	Value
EAX	syscall number	0xb
EBX	*pathname	/bin/sh
ECX	argv[]	Address of the pointer array {“/bin/sh”, “-c”, “ifconfig”, NULL}
EDX	envp[]	0

Let’s load the program in GDB and disassemble it. The breakpoint was set at code, which is the shellcode.

The above disassembly shows the code for our payload.

Let’s break up the code for better understanding. The first part is as follows:

   0x00404040 <+0>:     push   0xb
   0x00404042 <+2>:     pop    eax
   0x00404043 <+3>:     cdq

The initial instructions push 0xb onto the stack and pop it into EAX, so EAX is loaded with 0xb, the syscall number for execve. Then cdq sign-extends EAX into EDX: it copies the sign bit of EAX (which is 0) into every bit of EDX, so EDX is set to NULL.

   0x00404044 <+4>:     push   edx
   0x00404045 <+5>:     pushw  0x632d
   0x00404049 <+9>:     mov    edi,esp

Next, we push NULL onto the stack (EDX is 0), followed by the 16-bit value 0x632d, which is “-c” in little-endian ASCII. The next instruction copies ESP into EDI, so EDI points to “-c” followed by NULL. The following will make this clearer.

(gdb) break *0x0040404b
Breakpoint 2 at 0x40404b
(gdb) c
Continuing.

Breakpoint 2, 0x0040404b in code ()
(gdb) disassemble $eip,+10
Dump of assembler code from 0x40404b to 0x404055:
=> 0x0040404b <code+11>:        push   0x68732f
   0x00404050 <code+16>:        push   0x6e69622f
End of assembler dump.
(gdb) print /x $edi
$1 = 0xbffff156
(gdb) x/s 0xbffff156
0xbffff156:     "-c"

   0x0040404b <+11>:    push   0x68732f
   0x00404050 <+16>:    push   0x6e69622f
   0x00404055 <+21>:    mov    ebx,esp
   0x00404057 <+23>:    push   edx

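The first two instructions push “/bin/sh” onto the stack. The top byte of 0x68732f is zero, so the string is already NULL-terminated. Then ESP is copied to EBX (*pathname), which now points to “/bin/sh” followed by NULL. The final push edx places a NULL dword on the stack, which will terminate the argv array.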

Dump of assembler code from 0x40404b to 0x404055:
=> 0x0040404b <code+11>:        push   0x68732f
   0x00404050 <code+16>:        push   0x6e69622f
End of assembler dump.
(gdb) stepi
0x00404050 in code ()
(gdb) disassemble $eip,+10
Dump of assembler code from 0x404050 to 0x40405a:
=> 0x00404050 <code+16>:        push   0x6e69622f
   0x00404055 <code+21>:        mov    ebx,esp
   0x00404057 <code+23>:        push   edx
   0x00404058 <code+24>:        call   0x404066 <code+38>
End of assembler dump.
(gdb) print /x $esp
$2 = 0xbffff152
(gdb) x/s 0xbffff152
0xbffff152:     "/sh"
(gdb) stepi
0x00404055 in code ()
(gdb) print /x $esp
$3 = 0xbffff14e
(gdb) x/s 0xbffff14e
0xbffff14e:     "/bin/sh"

0x00404058 <+24>:    call   0x404066 <code+38>
0x0040405d <+29>:    imul   esp,DWORD PTR [esi+0x63],0x69666e6f

The next bit is interesting. We encounter a call instruction. A call pushes the address of the instruction that follows it onto the stack and then jumps to the given address, so 0x0040405d should be pushed onto the stack. This is verified below. GDB disassembles the bytes at that address as an imul instruction, but examining them as a string shows that they are the actual command of our payload: ifconfig.

(gdb) stepi
(gdb) disassemble $eip,+10
Dump of assembler code from 0x404066 to 0x404070:
=> 0x00404066 <code+38>:        push   edi
   0x00404067 <code+39>:        push   ebx
   0x00404068 <code+40>:        mov    ecx,esp
   0x0040406a <code+42>:        int    0x80
   0x0040406c <code+44>:        add    BYTE PTR [eax],al
   0x0040406e:  add    BYTE PTR [eax],al
End of assembler dump.
(gdb) print /x $esp
$5 = 0xbffff146
(gdb) x/w 0xbffff146
0xbffff146:     0x0040405d
(gdb) x/s 0x0040405d
0x40405d <code+29>:     "ifconfig"

=> 0x00404066 <code+38>:        push   edi
   0x00404067 <code+39>:        push   ebx
   0x00404068 <code+40>:        mov    ecx,esp
   0x0040406a <code+42>:        int    0x80

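The next instructions push EDI, which points to “-c”, and then EBX, which points to “/bin/sh”. Together with the address of “ifconfig” pushed by the call and the NULL pushed before it, the stack now holds the pointer array {“/bin/sh”, “-c”, “ifconfig”, NULL}. ECX is then loaded with ESP, so ECX (argv) points to this array.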

(gdb) disassemble $eip,+10
Dump of assembler code from 0x404066 to 0x404070:
=> 0x00404066 <code+38>:        push   edi
   0x00404067 <code+39>:        push   ebx
   0x00404068 <code+40>:        mov    ecx,esp
   0x0040406a <code+42>:        int    0x80
   0x0040406c <code+44>:        add    BYTE PTR [eax],al
   0x0040406e:  add    BYTE PTR [eax],al
End of assembler dump.
(gdb) break *0x0040406a
Breakpoint 3 at 0x40406a
(gdb) c
Continuing.

Breakpoint 3, 0x0040406a in code ()
(gdb) disassemble $eip,+5
Dump of assembler code from 0x40406a to 0x40406f:
=> 0x0040406a <code+42>:        int    0x80
   0x0040406c <code+44>:        add    BYTE PTR [eax],al
   0x0040406e:  add    BYTE PTR [eax],al
End of assembler dump.
(gdb) print /x $ecx
$6 = 0xbffff13e
(gdb) x/4w 0xbffff13e
0xbffff13e:     0xbffff14e      0xbffff156      0x0040405d      0x00000000
(gdb) x/s 0xbffff14e
0xbffff14e:     "/bin/sh"
(gdb) x/s 0xbffff156
0xbffff156:     "-c"
(gdb) x/s 0x0040405d
0x40405d <code+29>:     "ifconfig"

Continuing the program runs ifconfig. This concludes the analysis of the linux/x86/exec shellcode.

Linux/x86/chmod

Creating a workable exploit

Next up for analysis is linux/x86/chmod. This payload runs chmod on a specified file with a specified mode. We will use msfvenom to create a payload that changes the mode of /root/slae/assignments/assignment-5/permtest.txt to 0777.

root@kali:~# msfvenom -p linux/x86/chmod FILE=/root/slae/assignments/assignment-5/permtest.txt MODE=0777 -f C
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 73 bytes
Final size of c file: 331 bytes
unsigned char buf[] =
"\x99\x6a\x0f\x58\x52\xe8\x31\x00\x00\x00\x2f\x72\x6f\x6f\x74"
"\x2f\x73\x6c\x61\x65\x2f\x61\x73\x73\x69\x67\x6e\x6d\x65\x6e"
"\x74\x73\x2f\x61\x73\x73\x69\x67\x6e\x6d\x65\x6e\x74\x2d\x35"
"\x2f\x70\x65\x72\x6d\x74\x65\x73\x74\x2e\x74\x78\x74\x00\x5b"
"\x68\xff\x01\x00\x00\x59\xcd\x80\x6a\x01\x58\xcd\x80";

Embedding the above shellcode in our C skeleton exploit and running it (using the same steps as earlier), we see that the permissions of the file have been modified.

root@kali:~/slae/assignments/assignment-5# ls -l permtest.txt
-rw-r--r-- 1 root root 20 Sep 12 07:16 permtest.txt
root@kali:~/slae/assignments/assignment-5# ./linux_x86_chmod
Shellcode length: 7
root@kali:~/slae/assignments/assignment-5# ls -l permtest.txt
-rwxrwxrwx 1 root root 20 Sep 12 07:16 permtest.txt

What’s under the hood !

GDB

For this exploit, we will directly jump into GDB for analysis. Hooking the program and breaking at code brings us to the following screen:

Let’s break up this code and walk through one section at a time.

Dump of assembler code for function code:
=> 0x00404040 <+0>:     cdq
   0x00404041 <+1>:     push   0xf
   0x00404043 <+3>:     pop    eax
   0x00404044 <+4>:     push   edx

The code starts with a cdq instruction, which sets EDX to NULL provided the sign bit of EAX is clear at this point. Then 0xf is loaded into EAX using push and pop instructions. This tells us that the syscall being invoked is chmod. The final push edx places a NULL dword on the stack. The man reference can now be queried for the parameters this syscall needs.

int chmod(const char *pathname, mode_t mode);

Pathname points to the file on which changes need to be made. Mode specifies the new bit mask to be set for the file.

Let’s step into the next section of the instruction.

0x00404045 <+5>:     call   0x40407b <code+59>
0x0040404a <+10>:    das

The next instruction is a call. It pushes the address of the instruction that follows it (0x0040404a) onto the stack. GDB disassembles the bytes at that address as instructions such as das, but they are actually data: examining the address as a string confirms that it holds the path of the file to be modified.

(gdb) x/s 0x0040404a
0x40404a <code+10>:     "/root/slae/assignments/assignment-5/permtest.txt"

Let’s step into the next instructions.

   0x0040407b <+59>:    pop    ebx
   0x0040407c <+60>:    push   0x1ff
   0x00404081 <+65>:    pop    ecx
   0x00404082 <+66>:    int    0x80

The next instruction pops the top of the stack (the address of the file pathname) into EBX. Then ECX is populated with 0x1ff using push and pop instructions. 0x1ff is 0777 in octal, which is the new file mode. After this, the syscall is executed.

The following summarizes the values in registers EAX, EBX and ECX just before the syscall. The instructions after it load 1 into EAX and invoke the exit syscall.

(gdb) disassemble $eip,+10
Dump of assembler code from 0x404082 to 0x40408c:
=> 0x00404082 <code+66>:        int    0x80
   0x00404084 <code+68>:        push   0x1
   0x00404086 <code+70>:        pop    eax
   0x00404087 <code+71>:        int    0x80
   0x00404089 <code+73>:        add    BYTE PTR [eax],al
   0x0040408b:  add    BYTE PTR [eax],al
End of assembler dump.
(gdb) print /x $eax
$1 = 0xf
(gdb) print /x $ebx
$2 = 0x40404a
(gdb) x/s 0x40404a
0x40404a <code+10>:     "/root/slae/assignments/assignment-5/permtest.txt"
(gdb) print /x $ecx
$3 = 0x1ff

Linux/x86/read_file

Creating a workable exploit

The third and last payload we will analyze is linux/x86/read_file. When executed, this payload reads a file from the file system and writes it to the specified file descriptor.

In our case, let’s create shellcode to read a file on the local system (/root/slae/assignments/assignment-5/readfile.txt) and print it on the screen (FD=1).

root@kali:~# msfvenom -p linux/x86/read_file FD=1 PATH=/root/slae/assignments/assignment-5/readfile.txt -f C
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 110 bytes
Final size of c file: 488 bytes
unsigned char buf[] =
"\xeb\x36\xb8\x05\x00\x00\x00\x5b\x31\xc9\xcd\x80\x89\xc3\xb8"
"\x03\x00\x00\x00\x89\xe7\x89\xf9\xba\x00\x10\x00\x00\xcd\x80"
"\x89\xc2\xb8\x04\x00\x00\x00\xbb\x01\x00\x00\x00\xcd\x80\xb8"
"\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xc5\xff\xff"
"\xff\x2f\x72\x6f\x6f\x74\x2f\x73\x6c\x61\x65\x2f\x61\x73\x73"
"\x69\x67\x6e\x6d\x65\x6e\x74\x73\x2f\x61\x73\x73\x69\x67\x6e"
"\x6d\x65\x6e\x74\x2d\x35\x2f\x72\x65\x61\x64\x66\x69\x6c\x65"
"\x2e\x74\x78\x74\x00";

Following the same steps as we did in the first shellcode, we need to embed this in our C skeleton exploit code and then compile it. Once run, you can see that the file is read and the output is printed on the screen.

Now that we have a workable exploit, let’s move to the analysis part.

What’s under the hood !

GDB

To start with, the program was hooked with GDB and then a breakpoint was set at code, which is the actual shellcode in our exploit. The complete disassembly is shown below:

(gdb) break *&code
Breakpoint 1 at 0x4040
(gdb) run
Starting program: /root/slae/assignments/assignment-5/linux_x86_read_file
Shellcode length: 4

Breakpoint 1, 0x00404040 in code ()
(gdb) disassemble
Dump of assembler code for function code:
=> 0x00404040 <+0>:     jmp    0x404078 <code+56>
   0x00404042 <+2>:     mov    eax,0x5
   0x00404047 <+7>:     pop    ebx
   0x00404048 <+8>:     xor    ecx,ecx
   0x0040404a <+10>:    int    0x80
   0x0040404c <+12>:    mov    ebx,eax
   0x0040404e <+14>:    mov    eax,0x3
   0x00404053 <+19>:    mov    edi,esp
   0x00404055 <+21>:    mov    ecx,edi
   0x00404057 <+23>:    mov    edx,0x1000
   0x0040405c <+28>:    int    0x80
   0x0040405e <+30>:    mov    edx,eax
   0x00404060 <+32>:    mov    eax,0x4
   0x00404065 <+37>:    mov    ebx,0x1
   0x0040406a <+42>:    int    0x80
   0x0040406c <+44>:    mov    eax,0x1
   0x00404071 <+49>:    mov    ebx,0x0
   0x00404076 <+54>:    int    0x80
   0x00404078 <+56>:    call   0x404042 <code+2>
   0x0040407d <+61>:    das
   0x0040407e <+62>:    jb     0x4040ef
   0x00404080 <+64>:    outs   dx,DWORD PTR ds:[esi]
   0x00404081 <+65>:    je     0x4040b2
   0x00404083 <+67>:    jae    0x4040f1
   0x00404085 <+69>:    popa
   0x00404086 <+70>:    gs das
   0x00404088 <+72>:    popa
   0x00404089 <+73>:    jae    0x4040fe
   0x0040408b <+75>:    imul   esp,DWORD PTR [edi+0x6e],0x746e656d
   0x00404092 <+82>:    jae    0x4040c3
   0x00404094 <+84>:    popa
   0x00404095 <+85>:    jae    0x40410a
   0x00404097 <+87>:    imul   esp,DWORD PTR [edi+0x6e],0x746e656d
   0x0040409e <+94>:    sub    eax,0x65722f35
   0x004040a3 <+99>:    popa
   0x004040a4 <+100>:   imul   bp,WORD PTR fs:[ebp+eiz*2+0x2e],0x7874
   0x004040ac <+108>:   je     0x4040ae <code+110>
   0x004040ae <+110>:   add    BYTE PTR [eax],al
End of assembler dump.

Let’s break this into sections.

Dump of assembler code for function code:
=> 0x00404040 <+0>:     jmp    0x404078 <code+56>
   0x00404042 <+2>:     mov    eax,0x5
   0x00404047 <+7>:     pop    ebx
   0x00404048 <+8>:     xor    ecx,ecx
   0x0040404a <+10>:    int    0x80
   0x0040404c <+12>:    mov    ebx,eax
   0x0040404e <+14>:    mov    eax,0x3
   0x00404053 <+19>:    mov    edi,esp
   0x00404055 <+21>:    mov    ecx,edi
   0x00404057 <+23>:    mov    edx,0x1000
   0x0040405c <+28>:    int    0x80
   0x0040405e <+30>:    mov    edx,eax
   0x00404060 <+32>:    mov    eax,0x4
   0x00404065 <+37>:    mov    ebx,0x1
   0x0040406a <+42>:    int    0x80
   0x0040406c <+44>:    mov    eax,0x1
   0x00404071 <+49>:    mov    ebx,0x0
   0x00404076 <+54>:    int    0x80
   0x00404078 <+56>:    call   0x404042 <code+2>
   0x0040407e <+62>:    jb     0x4040ef

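Here is how the first syscall unfolds. The payload opens with the classic JMP-CALL-POP technique: the jmp at +0 goes to the call at +56, which pushes the address of the bytes that follow it (0x0040407d) and jumps back to +2. The pop at +7 then loads that address into EBX. Those bytes are not instructions but the path of the file to be read. The dump below starts one byte later, at 0x0040407e, so the leading “/” of the path is not shown.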

(gdb) x/s 0x0040407e
0x40407e <code+62>:     "root/slae/assignments/assignment-5/readfile.txt"

Also, EAX is populated with 0x5, which is the syscall number for the open syscall. After that, ECX is XORed with itself to set it to NULL. The man reference of the open syscall is shown below:

int open(const char *pathname, int flags);

Comparing this to the values populated in the registers, we can confirm that EAX holds the syscall number, EBX holds the pointer to the pathname, and flags is set to 0 (O_RDONLY) through ECX. After this, the open syscall is invoked and our file is opened.

Now, let’s look at the second syscall.

   0x0040404c <+12>:    mov    ebx,eax
   0x0040404e <+14>:    mov    eax,0x3
   0x00404053 <+19>:    mov    edi,esp
   0x00404055 <+21>:    mov    ecx,edi
   0x00404057 <+23>:    mov    edx,0x1000
   0x0040405c <+28>:    int    0x80

The value 0x3 loaded into EAX tells us that this is the read syscall. Before that, EBX is loaded with the value in EAX, which is the file descriptor returned by the open syscall. The man reference of the read syscall is as follows:

 ssize_t read(int fd, void *buf, size_t count);

We already have EAX populated with the syscall number and EBX with fd, the file descriptor of the file to be read. The next two instructions copy ESP into EDI and then into ECX, which is the second argument, buf. It points to the current top of the stack, which serves as the buffer the file is read into. In the next instruction, EDX is set to 0x1000 for the third argument, count, which is 4096 in decimal. After this, the syscall is invoked.

Our next step is to write the output back. In this case, the output is written on the screen. Let’s look at the next syscall to demystify this.

   0x0040405e <+30>:    mov    edx,eax
   0x00404060 <+32>:    mov    eax,0x4
   0x00404065 <+37>:    mov    ebx,0x1
   0x0040406a <+42>:    int    0x80
   0x0040406c <+44>:    mov    eax,0x1
   0x00404071 <+49>:    mov    ebx,0x0
   0x00404076 <+54>:    int    0x80

Since EAX is loaded with 0x4, which is the write syscall, let’s look at the man reference of this syscall:

ssize_t write(int fd, const void *buf, size_t count);

The first instruction copies EAX, which holds the return value of the last syscall, into EDX. This populates the count argument with the number of bytes actually read. ECX is left untouched, so it still points to the buffer that read filled.

Then EBX is loaded with 1, which is the value of the file descriptor fd. This means that the output will be written to STDOUT. After this, the syscall is invoked and we see the output on the screen. The last three instructions load 1 into EAX and 0 into EBX and invoke the exit syscall with status 0.

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification: https://www.pentesteracademy.com/course?id=3

Student ID: PA-16521

The code is also stored at GitHub. Thanks for reading !
