Nothing Escapes The Simulation

May 14, 2025

Overview

In this article, I discuss using AFL++ to perform instrumented fuzzing for discovering memory bugs in the Astro-8 emulator.

Introduction

One of my favorite topics is emulation, and I especially enjoy the ability to play old games developed for hardware that is no longer readily available. However, the possibilities for emulation are endless and even allow people to run systems that have never existed in physical form. A cool example of this is Astro-8 which is a custom computer design. It defines specifications for a 16-bit CPU architecture, RAM, and video display.

As the hardware for Astro-8 has never been physically built, the only way to experience it is through emulation. You could recreate the individual logic components according to Astro-8’s specifications in a tool such as Logisim Evolution. However, the best way to actually enjoy programs and games written for Astro-8 is through high-level, dedicated emulation software. There are several such programs designed to run Astro-8, but this article will focus on the Astro-8 developer’s own emulator software written in C++.

When it comes to emulation, one of the main security concerns is containing the emulated software. Even security-concious people are much more likely to run untrusted software in an emulator, believing that the programs are perfectly isolated inside of the emulator’s execution environment. Often, this is a reasonable assumption, especially if the emulator does not allow programs to have direct host file access or make network connections. But what if there was a way for a malicious program to escape the emulator and run malicious software directly on the host? A somewhat famous example of this is an alleged (we have never seen a proof-of-concept for this) vulnerability in ZSNES as demonstrated in this video.

There are quite a few ways to escape an emulator’s sandbox. For example, Bishop Fox discovered a way to escape RetroArch emulators through a command injection vulnerability in how text-to-speech is handled on Windows RetroArch builds. For emulators developed in languages that are not memory safe like C and C++, another possibility is finding memory bugs like buffer overflows. This could potentially be used to trick the host into running arbitrary commands and downloading malware.

There are a variety of techniques to search for these memory bugs, but my favorite way is through fuzzing which involves mutating inputs (in the case of emulators, usually ROM files) and checking if the program crashes. Programs like AFL++ are capable of instrumented fuzzing, which goes a step further and monitors the program’s execution as the files are mutated. This allows the fuzzer to track how the mutations affect the program logic and discover new paths through the program that may result in further crashes. There are many reasons a program might crash, but what we are interested in is crashes caused by potentially-exploitable memory bugs.

Fuzzing System Setup

You can fuzz programs on very minimal computer hardware, but the more CPU threads your system has, the more parallel AFL++ instances that can be run, and the faster crashes can be found. For this setup, we’ll use a Debian VM running inside of an R710 server with 2x L5640 CPUs. This gives the system a total of 24 execution threads, and we’ll use 22 of them (and leave the last 2 for system overhead and copying files around).

We also want to be mindful that AFL++ does an enormous amount of system writes which may damage SSDs with long enough fuzz times. We’re going to use an HDD to store test cases and crashes (which will take up lots of space) and then try to keep everything else in system memory.

Clean Build

We’ll create an ~/astro8 folder as a working directory. Inside of astro8, We’ll create a builds folder to store different builds.

Now, let’s download the latest code for the Astro-8 emulator. This emulator can be found in the Astro8-Computer repo that also stores examples and the Logisim Evolution circuit.

$ git clone https://github.com/sam-astro/Astro8-Computer.git

We’ll rename the repo to 41fb1a5_clean to track the fact that this is on commit 41fb1a5 (the latest at the time of writing) and that it is a clean build without instrumentation or modification.

According to the README.md, we need to navigate to 41fb1a5_clean/Astro8-Emulator/linux-build, run cmake and then run the generated Makefile with make.

$ cd 41fb1a5_clean/Astro8-Emulator/linux-build
$ cmake ..
$ make

And we get an error, which is definitely not supposed to happen on our clean build:

/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/main.cpp:2426:9: error: expected unqualified-id before ‘for’
 2426 |         for (int ins = 1; ins < (sizeof(instructioncodes) / sizeof(instructioncodes[0])); ins++)  // Iterate through all definitions of instructions

Okay, looking at the code there’s a stray } on line 2420 so we’ll remove that and try again. This time, it generates warnings but successfully builds:

[ 33%] Building CXX object CMakeFiles/astro8.dir/main.cpp.o
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/main.cpp: In function ‘int main(int, char**)’:
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/main.cpp:1148:71: warning: comparison between ‘enum SDL_Scancode’ and ‘enum SDL_KeyCode’ [-Wenum-compare]
 1148 |                                         if (event.key.keysym.scancode == SDLK_g) {
      |                                             ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/main.cpp: In function ‘void Update()’:
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/main.cpp:1285:33: warning: statement will never be executed [-Wswitch-unreachable]
 1285 |                                 [[likely]]
      |                                 ^~~~~~~~~~
[ 66%] Building CXX object CMakeFiles/astro8.dir/strops.cpp.o
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/strops.cpp: In function ‘std::string JoinArrayPieces(std::string*)’:
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/strops.cpp:7:36: warning: ‘sizeof’ on array function parameter ‘input’ will return size of ‘std::string*’ {aka ‘std::__cxx11::basic_string<char>*’} [-Wsizeof-array-argument]
    7 |         for (int i = 0; i < sizeof(input) / sizeof(input[0]); i++) {
      |                                   ~^~~~~~
/home/elelzedel/astro8/builds/41fb1a5_clean/Astro8-Emulator/strops.cpp:4:41: note: declared here
    4 | std::string JoinArrayPieces(std::string input[])
      |                             ~~~~~~~~~~~~^~~~~~~
[100%] Linking CXX executable astro8
[100%] Built target astro8

We perform a clean build so that later we can test our crashes against an unmodified version of the application, and also to make sure that it builds correctly before we start modifying it.

Let’s test it against one of the example programs:

$ ./astro8 ../../example_assembly_programs/hello_world.txt

Hello World

Whew, it works!

Instrumenting the Program

In order for AFL++ to track how different inputs progress through the program code, it needs to be instrumented.

AFL++ has a bunch of different ways to instrument applications and is even capable of instrumenting binary-only apps for when you don’t have access to source code. However, if you do have access to source code (which we do) the best way to instrument the program is to build it using afl-cc which allows you to build the binary using compilers like Clang and GCC but with instrumentation compiled into the target.

First, we will download, build, and install AFL++ using the official guide.

$ git clone https://github.com/AFLplusplus/AFLplusplus
$ cd AFLplusplus
$ make source-only
$ sudo make install

This AFL++ documentation details how to choose the best compiler options for your target. Because we have LLVM 15 on our system, we will use afl-clang-lto and afl-clang-lto++ which symlinks to afl-cc and tells it to instrument the binary with Clang in LTO mode.

Let us clone a fresh copy of the Astro8-Computer to 41fb1a5_harness to store our instrumented version of the Astro-8 Emulator and modifications we will make to it later to optimize it. Navigating to 41fb1a5_harness/Astro8-Emulator/linux-build, we simply need to tell CMake to use afl-clang-lto and afl-clang-lto++ as the C and C++ compilers respectively:

$ CC=afl-clang-lto CXX=afl-clang-lto++ cmake ..

Now, we’ll build it again:

$ make

# SNIP...
[+] Instrumented 12713 locations (206 selects) (non-hardened mode).
[100%] Built target astro8

It compiled the program like last time, but now it has instrumentation compiled into it and AFL++ will be able to run it and track its behavior.

Initial Test Cases

We could actually just give AFL++ a blank file to use as a starting input. It would then keep adding bytes, modifying bytes, and exploring how those mutations change the program’s behavior. Given enough time, this could actually result in AFL++ “learning” how to build valid programs that the Astro-8 Emulator will accept and run.

A much better option, however, is to use valid files that already result in covering a lot of code paths in the Astro-8 Emulator. Then, AFL++ has a much better starting point to form new test cases by mutating these initially-valid files. Thankfully, the Astro8-Computer repo contains a bunch of demo programs that will be perfect to use as a set of initial test cases to mutate.

There are two main types of inputs the Astro-8 Emulator will accept. It will accept raw Astro-8 assembly code and as well as Armstrong code, which is a custom language developed to make building programs for Astro-8 easier. Both of these are plaintext files, but Armstrong programs have #AS at the top to let the emulator know it needs to be compiled first. There’s a lot of good potential for memory bugs in the process of compiling Armstrong code so we definitely want to use both types of files as inputs.

We’ll copy all of the example_armstrong_programs and example_assembly_programs into a new input folder in our astro8 base folder.

$ mkdir ~/astro8/input
$ cp \
      ~/astro8/builds/41fb1a5_clean/example_armstrong_programs/*.arm \
      ~/astro8/builds/41fb1a5_clean/example_assembly_programs/*.txt \
      ~/astro8/input/

First Instrumented Run

Let’s create an output folder in the root astro8 folder and run afl-fuzz with our newly instrumented binary:

$ mkdir ~/astro8/output
$ afl-fuzz \
      -i ~/astro8/input \
      -o ~/astro8/output \
      -- \
      ~/astro8/builds/41fb1a5_harness/Astro8-Emulator/linux-build/astro8 @@

This command tells afl-fuzz our input and output, folder locations followed by a --, folled by the command to run the instrumented binary. The @@ tells AFL++ to substitute the test case file for this parameter as Astro-8 expects the first parameter to be the input program.

After that, we’re off to the races!

[-] The program took more than 1000 ms to process one of the initial test cases.
    This is bad news; raising the limit with the -t option is possible, but
    will probably make the fuzzing process extremely slow.

    If this test case is just a fluke, the other option is to just avoid it
    altogether, and find one that is less of a CPU hog.

[-] PROGRAM ABORT : Test case 'id:000001,time:0,execs:0,orig:NEGATIVE.arm' results in a timeout
         Location : perform_dry_run(), src/afl-fuzz-init.c:1018

Actually no, AFL++ almost immediately aborts. The reason for this is that it does a clean run against the initial test cases to make sure the program is working correctly. The error indicates that the first test case it ran timed out. The problem is that the Astro-8 program will run indefinitely (unless the user quits) and AFL++ will treat this as a hang.

We need to modify the Astro-8 emulator (specifically the version in our 41fb1a5_harness folder) so that it runs the program for a limited period of time and then exits. Reviewing the code in int main(...), we see a bunch of initialization code and then what appears to be the main emulation loop:

while (running) {
  // Statistics printing code
  // CPU emulation code
}

What if we just turned that into a for loop that only runs for a few cycles.

for (int cycles = 0; cycles < 63; cycles++) {
  // Statistics printing code
  // CPU emulation code
}

We’ll make that change, run make again, and rerun our `afl-fuzz`` command, but with a really long timeout:

$ afl-fuzz \
      -i ~/astro8/input \
      -o ~/astro8/output \
      -t 3000 \
      -- \
      ~/astro8/builds/41fb1a5_harness/Astro8-Emulator/linux-build/astro8 @@

This time, AFL++ gets past the calibration phase and into fuzzing the application!

 AFL ++4.32c {default} (...ness/Astro8-Emulator/linux-build/astro8) [explore] 
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 0 min, 2 sec       │  cycles done : 0     │
│   last new find : none seen yet                     │ corpus count : 29    │
│last saved crash : none seen yet                     │saved crashes : 0     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 14.0 (48.3%)       │    map density : 6.56% / 19.28%     │
│  runs timed out : 0 (0.00%)          │ count coverage : 1.93 bits/tuple    │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : inference              │ favored items : 11 (37.93%)         │
│ stage execs : 0/5 (0.00%)            │  new edges on : 12 (41.38%)         │
│ total execs : 249                    │ total crashes : 0 (0 saved)         │
│  exec speed : 1.68/sec (zzzz...)     │  total tmouts : 0 (0 saved)         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/0, 0/0, 0/0                        │    levels : 1         │
│  byte flips : 0/0, 0/0, 0/0                        │   pending : 20        │
│ arithmetics : 0/0, 0/0, 0/0                        │  pend fav : 11        │
│  known ints : 0/0, 0/0, 0/0                        │ own finds : 0         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 0/0, 0/0                             │ stability : 99.67%    │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : n/a, n/a                             │          [cpu000: 16%]
└─ strategy: explore ────────── state: started :-) ──┘

Optimizing

With our super basic harness we are getting ~1.7 executions per second. If we run 22 parallel instances we should get about 37 executions per second. This can definitely start getting coverage but this is considered very slow. Let’s see what we can do to make our harness run faster without preventing it from getting interesting coverage.

For example, look at this code:

// Attempt to parse assembly, will throw error if not proper assembly
if (!runAstroExecutable) {
    try {
        // Generate memory from code and convert from hex to decimal
        parseCode(code);
        vector<vector<std::string>> mbytes = parseCode(code);
        for (int membank = 0; membank < mbytes.size(); membank++)
            for (int memindex = 0; memindex < mbytes[membank].size(); memindex++)
                memoryBytes[membank][memindex] = (HexToDec(mbytes[membank][memindex]));

        // Store memory into an .AEXE file
        std::ofstream f(projectDirectory + programName + ".aexe");
        f << "ASTRO-8 AEXE Executable file" << '\n';
        f << (usingKeyboard == true ? "1" : "0");
        f << (usingWebcam == true ? "1" : "0");
        f << (usingMouse == true ? "1" : "0");
        f << (verbose == true ? "1" : "0");
        f << "," << std::to_string(target_cpu_freq);
        f << '\n';
        for (vector<string>::const_iterator i = mbytes[0].begin(); i != mbytes[0].end(); ++i) {
            f << *i << '\n';
        }
        f.close();
        PrintColored("Binary executable written to " + projectDirectory + programName + ".aexe\n", whiteFGColor, "");
    }

    // SNIP...
}

The program parses assembly (which may have been directly loaded from the input file or compiled from Armstrong code) and then writes it to the filesystem as an .aexe file. This is a slow operation and not something we care about for our fuzzing, let’s just comment out those file write operations:

// Attempt to parse assembly, will throw error if not proper assembly
if (!runAstroExecutable) {
    try {
        // Generate memory from code and convert from hex to decimal
        parseCode(code);
        vector<vector<std::string>> mbytes = parseCode(code);
        for (int membank = 0; membank < mbytes.size(); membank++)
            for (int memindex = 0; memindex < mbytes[membank].size(); memindex++)
                memoryBytes[membank][memindex] = (HexToDec(mbytes[membank][memindex]));

        // Store memory into an .AEXE file
        //std::ofstream f(projectDirectory + programName + ".aexe");
        //f << "ASTRO-8 AEXE Executable file" << '\n';
        //f << (usingKeyboard == true ? "1" : "0");
        //f << (usingWebcam == true ? "1" : "0");
        //f << (usingMouse == true ? "1" : "0");
        //f << (verbose == true ? "1" : "0");
        //f << "," << std::to_string(target_cpu_freq);
        //f << '\n';
        //for (vector<string>::const_iterator i = mbytes[0].begin(); i != mbytes[0].end(); ++i) {
        //    f << *i << '\n';
        //}
        //f.close();
        //PrintColored("Binary executable written to " + projectDirectory + programName + ".aexe\n", whiteFGColor, "");
    }

Changes like this will make slight improvements in performance, but if we want a massive improvement in performance we need to run AFL++ in persistent mode. Currently AFL++ has to do the following for every test case:

AFL++ creates a new test case
AFL++ writes the test case to the disk
AFL++ runs the Astro-8 Emulator harness program
Astro-8 Emulator has to reinitialize everything
Astro-8 Emulator loads the test case from the disk
Astro-8 Emulator runs the test case
Astro-8 Emulator cleans up everything

What if we could keep the program running (unless a test case causes it to crash) and just keep accepting new test cases from AFL++ directly from memory instead of loading it from the disk.

AFL++ starts the Astro-8 Emulator
Astro-8 Emulator initializes everything
Astro-8 Emulator loops the following:
- Astro-8 Emulator requests a new test case from AFL++
- AFL++ creates a new test case
- AFL++ directly gives the new test case to Astro-8 in memory
- Astro-8 Emulator runs the test case
- Astro-8 Emulator cleans up so that it is in a fresh state for the next test case

This is what persistent mode does. It keeps the program running and passes test cases through memory. The downside is that it requires quite a bit of tweaking to make the program work.

First, we need to put __AFL_FUZZ_INIT() in after the includes of main.cpp. This will tell AFL++ that we are using persistent mode and to add some initialization code.

#include <limits.h>

#include <algorithm>
#include <chrono>
#include <string>
#include <vector>
//#include <SDL.h>
//#include <SDL_mixer.h>
#include <time.h>

#include <codecvt>
#include <filesystem>
#include <fstream>
#include <map>
#include <queue>
#include <sstream>

#include "VSSynth/VSSynth.h"
#include "armstrong-compiler.h"
#include "processing.h"

__AFL_FUZZ_INIT();

Now, we need to put a loop into int main(...) to iterate over test cases and rewrite it to load the assembly/Armstrong in from AFL++ instead of from a file.

We need to tweak a lot of the code to make sure that as much program initialization happens outside of the test case loop as possible and also that the program returns to a clean state between test cases. Until we get all of that working, the program will probably crash from inconsistent states between runs (e.g. attempting to use a pointer that got freed in a previous run).

Before we implement all the AFL++ persistent mode code, let’s run the program with GDB instead of AFL++ so that we can debug the crashes. Then, we’ll write the program so it iterates over a clean, working input file. This will help us debug the harness itself and confirm that the program creates a new clean state for each test case.

After a few iterations this is what int main(...) looks like now:

int main(int argc, char** argv) {
	InitGraphics("Astro-8 Emulator", 108, 108, 5);

	int pixnum = GenerateCharacterROM();
	GenerateMicrocode();
	GenerateAsciiSdciiTables();

	// Channel A
	Tone t0(
		[](double frequency, double time) {
			return Waveforms::square(
				frequency,
				time,
				0);	 // LFO
		});
	tone[0] = &t0;
	(*tone[0]).setVolume(30);
	// Channel B
	Tone t1(
		[](double frequency, double time) {
			return Waveforms::square(
				frequency,
				time,
				0);	 // LFO
		});
	tone[1] = &t1;
	(*tone[1]).setVolume(30);
	// Channel C
	Tone t2(
		[](double frequency, double time) {
			return Waveforms::triangle(
				frequency,
				time,
				0);	 // LFO
		});
	tone[2] = &t2;
	(*tone[2]).setVolume(30);
	// Channel D
	Tone t3(
		[](double frequency, double time) {
			return Waveforms::noise();
		});
	tone[3] = &t3;
	(*tone[3]).setVolume(30);



	for (int testCaseI = 0; testCaseI < 128; testCaseI++) {
		// Open and read file
		std::string li;
		ifstream fileStr("/home/elelzedel/astro8/input/powder-sim.arm");

		std::string code = "";

		if (fileStr.is_open())
		{
			while (getline(fileStr, li)) {
				code += trim(li) + "\n";
			}
			fileStr.close();
		}
		else {
			PrintColored("\nError: could not open file ", redFGColor, "");
			cout << "\n\nPress Enter to Exit...";
			cin.ignore();
			exit(1);
		}

		// Fill the memory
		memoryBytes = vector<vector<uint16_t>>(6, vector<uint16_t>(65535, 0));
		videoBuffer = vector<vector<uint16_t>>(2, vector<uint16_t>(11990, 0));

		//// Fill video buffers with random data to emulate real ram chip
		std::srand(time(0));
		std::generate(videoBuffer[0].begin(), videoBuffer[0].end(), std::rand);
		std::generate(videoBuffer[1].begin(), videoBuffer[1].end(), std::rand);
		videoBuffer[1][148] = 14;
		videoBuffer[1][149] = 27;
		videoBuffer[1][150] = 27;
		videoBuffer[1][151] = 32;
		videoBuffer[1][152] = 21;
		videoBuffer[1][153] = 26;
		videoBuffer[1][154] = 19;
		videoBuffer[1][155] = 54;
		videoBuffer[1][156] = 54;
		videoBuffer[1][157] = 54;


		// Print `building` message if applicable
		if (!runAstroExecutable) {
			PrintColored("Building:", blackFGColor, whiteBGColor);
			cout << "\n\n";
		}

		
		if ((split(code, "\n")[0] == "#AS" || compileOnly) && !assembleOnly && !runAstroExecutable) {
			vector<std::string> codelines = PreProcess(code);

			for (int i = 0; i < codelines.size(); i++) {
				// If including another file:    #include "./path/to/file.arm"
				if (split(codelines[i], " ")[0] == "#include") {
					std::string clCpy = codelines[i];

					codelines.erase(codelines.begin() + i);	 // Remove the #include

					std::string path = trim(split(clCpy, " ")[1]);
					path.erase(std::remove(path.begin(), path.end(), '\''), path.end());  // Remove all single quotes
					path.erase(std::remove(path.begin(), path.end(), '\"'), path.end());  // Remove all double quotes
					path = path.substr(path.find_last_of("/\\") + 1, path.size());

					// Open and read file, appending code onto it after
					std::string codeTmp = "";
					std::string li;
					ifstream fileStr(path);
					if (fileStr.is_open()) {
						// Add jump to end (this prevents this code from executing accidentally)
						int randNum = rand() % 9999;
						codeTmp += "\ngoto #" + path + to_string(randNum) + "\n";

						// Get all lines of file
						while (getline(fileStr, li)) {
							if (li != "#AS")  // We don't need another Armstrong label, so we can remove it
								codeTmp += li + "\n";
						}

						// Label to jump to
						codeTmp += "\n#" + path + to_string(randNum);
						fileStr.close();
					}
					else {
						PrintColored("\nError: could not open file ", redFGColor, "");
						PrintColored("\"" + path + "\"\n", brightBlueFGColor, "");
						PrintColored("as used with include here:  ", redFGColor, "");
						PrintColored(clCpy + "\n", yellowFGColor, "");
						cout << "\n\nPress Enter to Exit...";
						cin.ignore();
						exit(1);
					}

					code = codeTmp + "\n\n" + VecToString(codelines);

					// Reprocess and go back to beginning
					codelines = PreProcess(code);
					i = 0;
				}
			}

			// Compile
			cout << "* Begin Compiling Armstrong...\n";
			code = CompileCode(code);

			if (code != "") {
				if (verbose) {
					cout << "   -  Output:\n";
					ColorAndPrintAssembly(code, instructions);
				}
			}
			else
				exit(0);
		}
		else if (!runAstroExecutable && verbose)
			ColorAndPrintAssembly(code, instructions);


		// Attempt to parse assembly, will throw error if not proper assembly
		if (!runAstroExecutable) {
			try {
				// Generate memory from code and convert from hex to decimal
				parseCode(code);
				vector<vector<std::string>> mbytes = parseCode(code);
				for (int membank = 0; membank < mbytes.size(); membank++)
					for (int memindex = 0; memindex < mbytes[membank].size(); memindex++)
						memoryBytes[membank][memindex] = (HexToDec(mbytes[membank][memindex]));
			}
			catch (const std::exception& e) {
				cout << e.what() << endl;
				PrintColored("\nError: failed to parse code. if you are trying to run Armstrong, make sure the first line of code contains  \"#AS\" ", redFGColor, "");
				cout << "\n\nPress Enter to Exit...";
				cin.ignore();
				exit(1);
			}
		}

		// Stop executing if successfully compiled/assembled
		if (compileOnly || assembleOnly) {
			PrintColored("\nFinished successfully", greenFGColor, "");
			exit(1);
		}

		// Start Emulation
		cout << "\n\n";
		PrintColored("Starting Emulation...", blackFGColor, whiteBGColor);
		cout << "\n\n";


		bool keyPress = false;
		bool running = true;
		SDL_Event event;
		SDL_Event lastEvent = SDL_Event();
		SDL_Event pendingEvent;
		int eventUses = 0;

		int updateCount = 0;
		int frameCount = 0;
		auto lastSecond = std::chrono::high_resolution_clock::now();
		auto lastFrame = lastSecond;
		auto lastTick = lastSecond;

		uint16_t lastKey = 0;
		uint16_t samekeyUses = 10;

		deque<KeyPress> keyRollover = {};
		std::map<int, bool> pressedKeys;

		// Draw the initial random data in the buffer, then clear it which would be done by the BIOS
		Draw();

		for (size_t i = 0; i < 10000000; i++) { videoBuffer[0][0] = 0; }
		videoBuffer = vector<vector<uint16_t>>(2, vector<uint16_t>(11990, 0));

		std::string receivedPath;
		std::string receivedData;
		bool returningFileData = false;
		uint16_t fileData[65535 * 4];
		int fileIterator = 0;
		uint16_t fileLength = 0;
		ofstream outputProgramFileStream;

        // Limit how long the program runs
		for (int cycles = 0; cycles < 64; cycles++) {
			auto startTime = std::chrono::high_resolution_clock::now();
			double secondDiff = std::chrono::duration<double, std::chrono::milliseconds::period>(startTime - lastSecond).count();
			double frameDiff = std::chrono::duration<double, std::chrono::milliseconds::period>(startTime - lastFrame).count();
			double tickDiff = std::chrono::duration<double, std::chrono::milliseconds::period>(startTime - lastTick).count();

			// CPU tick
			// Update 1000 times, otherwise chrono becomes a bottleneck
			// Chrono is slow in any case, so it may be replaced later
			// SDL_GetTick is probably not precise enough
			const int numUpdates = 1000;
			if (tickDiff > (numUpdates * 1000.0 / target_cpu_freq)) {
				lastTick = startTime;
				for (int i = 0; i < numUpdates; ++i)
					Update();
				updateCount += numUpdates;
			}

			// Frame
			if (frameDiff > (1000.0 / TARGET_RENDER_FPS)) {
				lastFrame = startTime;
				++frameCount;
				pendingEvent = SDL_Event();
				bool keyboardDecided = false;
				int undecidedKey = 168;

				if (usingFileSystem) {
					if (returningFileData) {
						if (memoryBytes[1][53505] == 0) {
							memoryBytes[1][53505] = fileData[fileIterator] | 0b100000000000000;
							if (fileIterator >= fileLength) {
								fileIterator = 0;
								returningFileData = false;
								memoryBytes[1][53505] = 4095;
							}
							else {
								fileIterator++;
							}
						}
					}
					else if (memoryBytes[1][53505] >= 0b1111000000000000) {	   // Save raw data
						uint8_t ch = memoryBytes[1][53505] & 255;			   // SDCII format character
						if (ch == 85 && !outputProgramFileStream.is_open()) {  // If newline character, attempt to open new file
							outputProgramFileStream.open(receivedPath);
							receivedPath = "";
						}
						else if (ch == 78 && outputProgramFileStream.is_open()) {  // If newline character, attempt to open new file
							outputProgramFileStream.close();
						}
						else {
							if (outputProgramFileStream.is_open())
								outputProgramFileStream << (char)(ConvertSdciiToAscii(ch));
							else
								receivedPath += (char)(ConvertSdciiToAscii(ch));
						}
						memoryBytes[1][53505] = 0;
					}
					else if (memoryBytes[1][53505] >= 0b1110000000000000) {	   // Save data
						uint8_t ch = memoryBytes[1][53505] & 255;			   // SDCII format character
						if (ch == 85 && !outputProgramFileStream.is_open()) {  // If newline character, attempt to open new file
							outputProgramFileStream.open(receivedPath);
							outputProgramFileStream << "Astro-8 Binary Text File Format\n";
							outputProgramFileStream << "<settings>\n";
							receivedPath = "";
						}
						else if (ch == 78 && outputProgramFileStream.is_open()) {  // If newline character, attempt to open new file
							outputProgramFileStream.close();
						}
						else {
							if (outputProgramFileStream.is_open())
								outputProgramFileStream << DecToHexFilled(ch, 4) << "\n";
							else
								receivedPath += (char)(ConvertSdciiToAscii(ch));
						}
						memoryBytes[1][53505] = 0;
					}
					else if (memoryBytes[1][53505] >= 0b1101000000000000) {	 // Load raw text data
						uint8_t ch = memoryBytes[1][53505] & 255;			 // SDCII format character
						if (ch == 85) {										 // If newline character, attempt to load from file
							cout << "\n\nLoading from file: " << receivedPath << endl
								 << endl;
							int it = 0;

							fstream fin(receivedPath, fstream::in);
							char cc;
							while (fin >> noskipws >> cc) {
								fileData[it] = ascToSdcii[cc];
								it++;
							}
							fileLength = it;
							cout << it << " characters read\n";
							fin.close();
							receivedPath = "";

							memoryBytes[1][53505] = 0;
							returningFileData = true;
						}
						else {
							receivedPath += (char)(ConvertSdciiToAscii(ch));
							memoryBytes[1][53505] = 0;
						}
					}
					else if (memoryBytes[1][53505] >= 0b1100000000000000) {	 // Load data
						uint8_t ch = memoryBytes[1][53505] & 255;			 // SDCII format character
						if (ch == 85) {										 // If newline character, attempt to load from file
							cout << "\n\nLoading from file: " << receivedPath << endl
								 << endl;
							ifstream hexFile;
							hexFile.open(receivedPath);
							receivedPath = "";
							std::string tmp;
							int it = 0;
							if (hexFile.is_open()) {
								getline(hexFile, tmp);
								getline(hexFile, tmp);
								while (getline(hexFile, tmp)) {
									fileData[it] = HexToDec(tmp);

									it++;
								}
								cout << it << " lines read\n";
								fileLength = it;
							}
							hexFile.close();
							memoryBytes[1][53505] = 0;
							returningFileData = true;
						}
						else {
							receivedPath += (char)(ConvertSdciiToAscii(ch));
							memoryBytes[1][53505] = 0;
						}
					}
					else if (memoryBytes[1][53505] >= 0b1000000000000000) {	 // Load data and run
						uint8_t ch = memoryBytes[1][53505] & 255;			 // SDCII format character
						if (ch == 85) {										 // If newline character, attempt to load from file
							cout << "\n\nStarting from file: " << receivedPath << endl
								 << endl;
							ifstream hexFile;
							hexFile.open(receivedPath);
							receivedPath = "";
							std::string tmp;
							int it = 0;
							if (hexFile.is_open()) {
								// Clear memory before loading
								for (int i = 0; i < memoryBytes.size(); i++) {
									memoryBytes[0][i] = 0;
									memoryBytes[1][i] = 0;
								}
								for (int i = 0; i < videoBuffer[0].size(); i++) {
									videoBuffer[0][i] = 0;
									videoBuffer[1][i] = 0;
								}
								getline(hexFile, tmp);
								getline(hexFile, tmp);
								while (getline(hexFile, tmp)) {
									memoryBytes[0][it] = HexToDec(tmp);

									it++;
								}
								programCounter = 0;
								cout << it << " lines read\n";
							}
							hexFile.close();
							memoryBytes[1][53505] = 0;
						}
						else {
							receivedPath += (char)(ConvertSdciiToAscii(ch));
							memoryBytes[1][53505] = 0;
						}
					}
				}

				memoryBytes[1][53500] = 168;
			}
		}
	}

	SDL_Quit();

	return 0;
}

There is a lot of code here, but the important thing is that we have rearranged it so that the program first initializes things that will stay active for as long as the program is running like SDL, importing the static Character ROM, configuring tones, etc. After that, it repeatedly loads in the powder-sim.arm program and executes it. The emulator doesn’t crash and appears to be correctly loading the program in each iteration so now it’s time to make it work with AFL++.

First, we’ll add unsigned char *buf = __AFL_FUZZ_TESTCASE_BUF before the test case loop. This pointer will point to the shared memory that AFL++ puts test cases into and our harness loads them from.

Second, we’ll swap out that for (int testCaseI = 0; testCaseI < 128; testCaseI++) loop with while (__AFL_LOOP(10000)). Every time a new loop happens, AFL++ will generate a new test case and put it in the memory that aflBuf points to. The length of test cases may vary and currently the program doesn’t know how much of aflBuf to read. For this, we’ll add the code int aflLen = __AFL_FUZZ_TESTCASE_LEN after the loop to get the test case size from AFL++.

Finally, we’ll update this code:

std::string li;
ifstream fileStr("/home/elelzedel/astro8/input/powder-sim.arm");

std::string code = "";

if (fileStr.is_open())
{
    while (getline(fileStr, li)) {
        code += trim(li) + "\n";
    }
    fileStr.close();
}

It needs to stop loading the program in from a static file and instead load it in from aflBuf.

Putting it all together, int main(...) now looks like this:

int main(int argc, char** argv)
{
    // Initialization code...
	
	unsigned char *aflBuf = __AFL_FUZZ_TESTCASE_BUF;

	while (__AFL_LOOP(10000)) {
		int aflLen = __AFL_FUZZ_TESTCASE_LEN;

		std::string li;
		stringstream stringStr(std::string(reinterpret_cast<char const*>(aflBuf), aflLen));

		std::string code = "";

		while (getline(stringStr, li)) {
			code += trim(li) + "\n";
		}

        // Parsing program code...

        // Main loop
		for (int cycles = 0; cycles < 64; cycles++) {
            // CPU emulation code...
            // Frame rendering code...
        }
    }

	SDL_Quit();

	return 0;
}

This time when we run afl-fuzz (dropping the @@ because it’s not pointing to a file anymore), it detects that the binary is designed to run in persistent mode:

[+] Persistent mode binary detected.

That was a lot of work, but now we’re getting 5 times the execution speed we originally got:

 AFL ++4.32c {default} (...ness/Astro8-Emulator/linux-build/astro8) [explore] 
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 0 min, 20 sec      │  cycles done : 0     │
│   last new find : 0 days, 0 hrs, 0 min, 19 sec      │ corpus count : 30    │
│last saved crash : none seen yet                     │saved crashes : 0     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 14.1 (46.7%)       │    map density : 3.13% / 19.11%     │
│  runs timed out : 0 (0.00%)          │ count coverage : 4.01 bits/tuple    │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : interest 32/8          │ favored items : 12 (40.00%)         │
│ stage execs : 13/950 (1.37%)         │  new edges on : 16 (53.33%)         │
│ total execs : 499                    │ total crashes : 0 (0 saved)         │
│  exec speed : 8.75/sec (zzzz...)     │  total tmouts : 0 (0 saved)         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/160, 0/159, 0/157                  │    levels : 2         │
│  byte flips : 0/20, 0/19, 0/17                     │   pending : 26        │
│ arithmetics : 0/1386, 0/2544, 0/2240               │  pend fav : 12        │
│  known ints : 0/175, 0/712, 0/0                    │ own finds : 1         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 0/0, 0/0                             │ stability : 72.71%    │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : n/a, 95.00%                          │          [cpu000: 25%]
└─ strategy: explore ────────── state: started :-) ──┘

Stability

Unfortunately, our new harness has a new issue. The stability was originally ~99%, but it has gone down to ~72% with the persistent harness. What does this mean? Stability is a representation of how consistent the program behaves with identical inputs. A stability of 100% means that the same test case will follow the same execution paths within the program every time it runs. The lower the stability, the more random the execution paths are and the less effective instrumentation will be.

To get our stability up, let’s first look at obvious examples of randomness being introduced into the program:

//// Fill video buffers with random data to emulate real ram chip
std::srand(time(0));
std::generate(videoBuffer[0].begin(), videoBuffer[0].end(), std::rand);
std::generate(videoBuffer[1].begin(), videoBuffer[1].end(), std::rand);

Every time the video buffer is initialized, it’s random. If any of the test cases are making decisions on data in the buffer it may cause instability. Let’s just patch it so it seeds with the same value every time:

std::srand(0);

Now let’s look at how throttling might cause instability. Within the main emulation loop we can see code that attempts to hit a specific CPU target frequency:

// CPU tick
// Update 1000 times, otherwise chrono becomes a bottleneck
// Chrono is slow in any case, so it may be replaced later
// SDL_GetTick is probably not precise enough
const int numUpdates = 1000;
if (tickDiff > (numUpdates * 1000.0 / target_cpu_freq)) {
    lastTick = startTime;
    for (int i = 0; i < numUpdates; ++i)
        Update();
    updateCount += numUpdates;
}

This makes a lot of sense, the emulated Astro-8 system should execute according to its specifications and not according to how fast the host computer runs. For our purposes though, this introduces a lot of instability as the load of the fuzzing system will affect whether CPU updates occur or not.

Let’s just have our harness execute exactly 1000 CPU cycles each loop while completely disregarding timing. This will massively improve the stability of the program and even allow us to squeeze more execution speed and coverage out of the program because it can max out the fuzzing system’s CPU without delaying to match the Astro-8 intended execution speed.

Finally, the huge jump in instability from the persistent harnesses tells us that a lot of variables and buffers are not being fully reset between runs. There are a lot of ways to track this down, but ultimately I had to just review all the variables being used by the program and make sure they are reset before each run. Some of the variables were very hard to track down like these:

vector<string> vars;
vector<string> labels;
vector<int> labelLineValues;
vector<string> compiledLines;

These are global variables hidden between functions in main.cpp. When the program runs for the first time they are all empty, but on subsequent runs they may have data in them. For these and all the variables like them, we need to add code after the start of the while (__AFL_LOOP(10000)) loop to reset them:

while (__AFL_LOOP(10000)) {
    vars.clear();
    labels.clear();
    labelLineValues.clear();
    compiledLines.clear();

    VideoBufReg = false;

    compileOnly = false;
    assembleOnly = false;
    runAstroExecutable = false;
    verbose = false;
    superVerbose = false;
    usingWebcam = false;
    imageOnlyMode = false;

    usingKeyboard = true;
    usingMouse = true;
    performanceMode = true;
    usingFileSystem = true;

    target_cpu_freq = 16000000;

    AReg = 0;
    BReg = 0;
    CReg = 0;
    BankReg = 0;
    InstructionReg = 0;
    flags[0] = 0;
    flags[1] = 0;
    bus = 0;
    outputReg = 0;
    memoryIndex = 0;
    programCounter = 0;

    imgX = 0;
    imgY = 0;
    charPixX = 0;
    charPixY = 0;
    characterRamIndex = 0;
    pixelRamIndex = 0;

    for (int i = 0; i < 4; i++) {
        lastFreq[i] = 440;
        isPlaying[i] = false;
        channelsPlaying[i] = false;
    }

    for (int i = 0; i < 6; i++)
        for(int k = 0; k < 65535; k++)
            memoryBytes[i][k] = 0;

    for (int i = 0; i < 2; i++)
        for(int k = 0; k < 11990; k++)
            videoBuffer[i][k] = 0;

    for (int i = 0; i < 108 * 108 * 4; i++)
        pixels[i] = 0;

    variableMap.clear();
    
    // ...
}

With those changes, our new harness stability jumps up to ~97%.

 AFL ++4.32c {master} (...rness/Astro8-Emulator/linux-build/astro8) [explore] 
┌─ process timing ────────────────────────────────────┬─ overall results ────┐
│        run time : 0 days, 0 hrs, 0 min, 15 sec      │  cycles done : 0     │
│   last new find : 0 days, 0 hrs, 0 min, 0 sec       │ corpus count : 28    │
│last saved crash : none seen yet                     │saved crashes : 0     │
│ last saved hang : none seen yet                     │  saved hangs : 0     │
├─ cycle progress ─────────────────────┬─ map coverage┴──────────────────────┤
│  now processing : 1.0 (3.6%)         │    map density : 3.86% / 19.21%     │
│  runs timed out : 0 (0.00%)          │ count coverage : 2.88 bits/tuple    │
├─ stage progress ─────────────────────┼─ findings in depth ─────────────────┤
│  now trying : inference              │ favored items : 12 (42.86%)         │
│ stage execs : 16/-                   │  new edges on : 16 (57.14%)         │
│ total execs : 296                    │ total crashes : 0 (0 saved)         │
│  exec speed : 0.00/sec (zzzz...)     │  total tmouts : 0 (0 saved)         │
├─ fuzzing strategy yields ────────────┴─────────────┬─ item geometry ───────┤
│   bit flips : 0/0, 0/0, 0/0                        │    levels : 2         │
│  byte flips : 0/0, 0/0, 0/0                        │   pending : 27        │
│ arithmetics : 0/0, 0/0, 0/0                        │  pend fav : 12        │
│  known ints : 0/0, 0/0, 0/0                        │ own finds : 5         │
│  dictionary : 0/0, 0/0, 0/0, 0/0                   │  imported : 0         │
│havoc/splice : 0/0, 0/0                             │ stability : 97.30%    │
│py/custom/rq : unused, unused, unused, unused       ├───────────────────────┘
│    trim/eff : disabled, n/a                        │          [cpu000: 20%]
└─ strategy: explore ────────── state: started :-) ──┘

There are some more tricks we can do to narrow down the remaining unstable edges by adding the __AFL_COVERAGE_ON() and __AFL_COVERAGE_OFF() macros to different places and retesting. However, for our purposes >97% is already really good.

Parallel Run

AFL++ has the ability to do parallel fuzzing where multiple afl-fuzz instances share test cases and work together to find new paths through the program. Doing this on one system is fairly straightforward, all you need to do is run a main node and several secondary instances pointed at the same output directory. We’ll automate launching all these instances with a quick script that launches them in a tmux session.

For testing, I kept as much SDL rendering in place as possible so I could visually check that the Astro-8 Emulator harness was still running the test cases correctly. Now that that we are ready for parallel fuzzing though, rather than having 22 windows trying to render to the screen, we will set the environment variable SDL_VIDEODRIVER to dummy. This will drastically improve performance and make it easier to interact with the system.

Here is what the script looks like with all of this in mind:

#!/bin/bash

BASE_DIR="/home/elelzedel/astro8"

INPUT_DIR="$BASE_DIR/input"
OUTPUT_DIR="$BASE_DIR/output"

EXE="$BASE_DIR/builds/41fb1a5_harness/Astro8-Emulator/linux-build/astro8"

TIMEOUT=1000

TMUX_SESSION=fuzz
WORKERS=21

export SDL_VIDEODRIVER=dummy

tmux kill-ses -t $TMUX_SESSION
tmux new -d -s $TMUX_SESSION

for i in $(seq $WORKERS); do
	tmux new-window -t $TMUX_SESSION
done

# Launch main
tmux send-keys -t $TMUX_SESSION:0.0 "afl-fuzz -t $TIMEOUT \\" Enter
tmux send-keys -t $TMUX_SESSION:0.0 "	-M main \\" Enter
tmux send-keys -t $TMUX_SESSION:0.0 "	-i $INPUT_DIR \\" Enter
tmux send-keys -t $TMUX_SESSION:0.0 "	-o $OUTPUT_DIR \\" Enter
tmux send-keys -t $TMUX_SESSION:0.0 "	-- $EXE" Enter

# Launch workers
for i in $(seq $WORKERS); do
	tmux send-keys -t $TMUX_SESSION:$i.0 "afl-fuzz -t $TIMEOUT \\" Enter
	tmux send-keys -t $TMUX_SESSION:$i.0 "	-S worker_$i \\" Enter
	tmux send-keys -t $TMUX_SESSION:$i.0 "	-i $INPUT_DIR \\" Enter
	tmux send-keys -t $TMUX_SESSION:$i.0 "	-o $OUTPUT_DIR \\" Enter
	tmux send-keys -t $TMUX_SESSION:$i.0 "	-- $EXE" Enter
done

After running the script, we can tab through the tmux session to see individual fuzzer stats, or use the afl-whatsup command to see an overview of all the instances:

$ afl-whatsup -s ~/astro8/output
/usr/local/bin/afl-whatsup status check tool for afl-fuzz by Michal Zalewski
 
Summary stats
=============

        Fuzzers alive : 22
       Total run time : 22 minutes, 0 seconds
          Total execs : 13 thousands
     Cumulative speed : 207 execs/sec
  Total average speed : 9 execs/sec
Current average speed : 132 execs/sec
        Pending items : 239 faves, 417 total
   Pending per fuzzer : 10 faves, 18 total (on average)
     Coverage reached : 17.86%
        Crashes saved : 4
          Hangs saved : 0
 Cycles without finds : 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0
   Time without finds : 50 seconds

Triaging Crashes

After a few days, the parallel fuzzers have ran an impressive 42 million executions and have saved 9371 crashes:

$ afl-whatsup -s ~/astro8/output                                                                                       ~/astro8/scripts
/usr/local/bin/afl-whatsup status check tool for afl-fuzz by Michal Zalewski

Summary stats
=============

        Fuzzers alive : 22
       Total run time : 51 days, 8 hours
          Total execs : 42 millions
     Cumulative speed : 199 execs/sec
  Total average speed : 9 execs/sec
Current average speed : 211 execs/sec
        Pending items : 0 faves, 28375 total
   Pending per fuzzer : 0 faves, 1289 total (on average)
     Coverage reached : 28.65%
        Crashes saved : 9371
          Hangs saved : 1251
 Cycles without finds : 0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0/0
   Time without finds : 1 minutes, 14 seconds

Over 9000

We need a version of the Astro-8 Emulator with symbols added and optimization disabled so that we can debug everything in GDB effectively. First, we will create a new 41fb1a5_debug directory with a fresh clone of the Astro-8 repo. Then, we will pass some environment variables to CMake to have it generate a Makefile that builds the program with symbols added and optimization disabled. Finally, we will call make against that Makefile:

$ cmake -DCMAKE_BUILD_TYPE=Debug -DCMAKE_C_FLAGS_DEBUG="-g -O0" -DCMAKE_CXX_FLAGS_DEBUG="-g -O0" ..
$ make

Many of these crashes likely have the same root cause, and manually reviewing all of them in GDB will take ages. What if we instead build a Python script that attempts to group these crashes together based on the similarity of their crash state? We can create a hash of the backtrace for every crash (I stole this idea from the exploitable plugin). We can then group crashes with the same hash into the same folder and put a single text file that shows the backtrace also in the folder. For cases where the program does not crash at all, or crashes in a way that is not particularly dangerous, we can throw those in an uninteresting folder. Here is what the completed script looks like:

import os
import hashlib
from pprint import pprint
from time import sleep, time
import threading
from queue import Queue

from pygdbmi import gdbmiparser
from pygdbmi.gdbcontroller import GdbController

TIMEOUT = 20
THREADS = 23

ROOT_DIR = "/home/elelzedel/astro8"
EXE_PATH = f"{ROOT_DIR}/builds/41fb1a5_debug/Astro8-Emulator/linux-build/astro8"
CRASH_DIR = f"{ROOT_DIR}/crashes"

CRASH_UNINTERESTING_DIR = f"{CRASH_DIR}/uninteresting"

UNINTERESTING_STRINGS = [
    "Error: failed to parse code.",
    "std::__throw_invalid_argument",
    "std::__throw_out_of_range",
    "terminate called after throwing an instance of 'std::out_of_range'",
    "Error: could not open file"
]

def worker(crash_queue):
    gdbmi = GdbController()

    gdbmi.write(f"-file-exec-and-symbols {EXE_PATH}")

    while True:
        try:
            crash = crash_queue.get(timeout=5)
        except:
            return

        print(f"-exec-arguments {CRASH_DIR}/{crash}")
        gdbmi.write(f"-exec-arguments {CRASH_DIR}/{crash}", timeout_sec=20)
        gdbmi.write("-exec-run", timeout_sec=20)

        stack = None

        start_time = time()

        while True:
            for response in gdbmi.write(
                "-stack-list-frames",
                raise_error_on_timeout=False
            ):
                if "message" in response.keys() and response["message"] == "done":
                    stack = response["payload"]["stack"]
                    break

            if stack is not None:
                break

            if time() - start_time > TIMEOUT:
                break

            sleep(1)

        os.kill(gdbmi.gdb_process.pid, 2)

        # The program timed out and didn't crash against the debug exe
        if stack is None:
            os.rename(f"{CRASH_DIR}/{crash}", f"{CRASH_UNINTERESTING_DIR}/{crash}")
        elif any(uninteresting_string in str(stack) for uninteresting_string in UNINTERESTING_STRINGS):
            os.rename(f"{CRASH_DIR}/{crash}", f"{CRASH_UNINTERESTING_DIR}/{crash}")
        elif "Error: failed to parse code." in str(stack):
            os.rename(f"{CRASH_DIR}/{crash}", f"{CRASH_UNINTERESTING_DIR}/{crash}")
        else:
            sha1 = hashlib.sha1()
            sha1.update(str(stack).encode("utf-8"))
            digest = sha1.hexdigest()
            os.system(f"mkdir -p {CRASH_DIR}/crashes_{digest}")
            os.rename(f"{CRASH_DIR}/{crash}", f"{CRASH_DIR}/crashes_{digest}/{crash}")
            if os.path.exists(f"{CRASH_DIR}/crashes_{digest}/stack"):
                continue

            with open(f"{CRASH_DIR}/crashes_{digest}/stack", "w") as stack_file:
                for response in gdbmi.write("bt"):
                    if response["payload"] is not None:
                        stack_file.write(response["payload"])

if __name__ == "__main__":
    os.system(f"rm -fr {CRASH_DIR}")
    os.system(f"mkdir -p {CRASH_DIR}")
    os.system(f"cp -r {ROOT_DIR}/output/*/crashes/* {CRASH_DIR}")
    os.system(f"rm {CRASH_DIR}/README.md")

    i = 0
    for crash in os.listdir(CRASH_DIR):
        i_str = str(i).zfill(4)
        os.rename(f"{CRASH_DIR}/{crash}", f"{CRASH_DIR}/crash_{i_str}")
        i = i + 1

    os.system(f"mkdir -p {CRASH_UNINTERESTING_DIR}")

    crash_queue = Queue()

    for crash in os.listdir(CRASH_DIR):
        if os.path.isdir(f"{CRASH_DIR}/{crash}"):
            continue
        crash_queue.put(crash)

    threads = []
    for _ in range(THREADS):
        thread = threading.Thread(target=worker, args=(crash_queue,))
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

After about 16 minutes, this script was able to sort all 9371 crashes neatly into 70 folders:

$ tree ~/astro8/crashes

crashes_ec2ad4b05725652d28d8d083735546338609bc1b
├── crash_0058
├── crash_0534
├── crash_3722
├── crash_4286
├── crash_5446
├── crash_6307
├── crash_7157
├── crash_7523
├── crash_7818
└── stack
crashes_efb7019f3a251fe713c8c42a3dc02ac0ab8da044
├── crash_6465
└── stack
crashes_fcdcbb6f837e9c92e995b552123d30357067477a
├── crash_0356
├── crash_2179
├── crash_3070
├── crash_3099
├── crash_5926
├── crash_7617
├── crash_7910
├── crash_8088
├── crash_8182
├── crash_8610
├── crash_8787
└── stack
crashes_fdc09e58b36b616ed424d1ab768bad0f8b164514
├── crash_2693
├── crash_6787
└── stack
uninteresting
├── crash_0000
├── ...
└── crash_9353
...

70 directories, 9431 files

We can now test just one crash from each folder since they should all be similar.

Crash Root Cause Analysis

After looking at the stack files in a few folders, the crashes_19d9f1b27afe8328d5ebf22c7034d9637c8f31b0 group looks like they trigger out-of-bounds writes which has the most potential for danger. Let us manually load it in with GDB:

$ gdb ~/astro8/builds/41fb1a5_debug/Astro8-Emulator/linux-build/astro8

// ...

(gdb) set args /home/elelzedel/astro8/crashes/crashes_19d9f1b27afe8328d5ebf22c7034d9637c8f31b0/crash_0130
(gdb) run

// ...

Thread 1 "astro8" received signal SIGSEGV, Segmentation fault.
SetMem (bank=7, address=288, data=0) at /home/elelzedel/astro8/builds/41fb1a5_debug/Astro8-Emulator/main.cpp:1850
1850				memoryBytes[bank][address] = data;

Here is the function that triggered that segmentation fault:

void SetMem(uint16_t bank, uint16_t address, uint16_t data)
{
    // SNIP: Code to handle other banks
	// Otherwise just set the memory value to the bus
	else {
		if (bank == 1 && address >= 53546)	// If a video location
			videoBuffer[(int)VideoBufReg][address - 53546] = data;
		else  // Else a normal memory location
			memoryBytes[bank][address] = data; // This is where the segmentation fault happens
	}
}

It appears that data is being written outside of the allocated memory for memoryBytes. Looking at memoryBytes closer, here is where it is initialized:

int main(int argc, char** argv)
{
#if DEV_MODE
	verbose = true;
#endif

	// Fill the memory
	memoryBytes = vector<vector<uint16_t>>(6, vector<uint16_t>(65535, 0));
	videoBuffer = vector<vector<uint16_t>>(2, vector<uint16_t>(11990, 0));

    // ...
}

It looks like memoryBytes is only supposed to have 6 banks, but we are able to access 8 banks (7 is the 8th bank because the arrays are zero-indexed).

There are several Astro-8 instructions that can update the bank the virtual CPU is using, but they all include this line:

BankReg = arg & 0b111;

This is a bitwise AND that ensures that only the lowest 3 bits of the argument are kept when setting the BankReg register. This essentially limits BankReg to values between 0 and 7. This is an issue because only 0 to 5 have memory allocated to them. The crashes that actually trigger this vulnerability take a very convoluted path. Since this vulnerability isn’t too complicated, though, we can just read through the Astro-8 Instruction Set and write our own program that triggers these conditions:

LDIA 65
STLGE 6
HERE 50000

The first line loads the decimal value 65 into register A. The second line tells the processor to store register A to the subsequent 16-bit address in bank 6. The last line puts the decimal 50000 into memory directly as a full 16-bit value so that the previous instruction knows where to store register A’s value.

Going back to GDB, let us load this in and set a breakpoint to SetMem to see if everything works.

(gdb) set args /home/elelzedel/astro8/workspace/crash_poc
(gdb) b SetMem
Breakpoint 1 at 0x3e430: file /home/elelzedel/astro8/builds/41fb1a5_debug/Astro8-Emulator/main.cpp, line 1799.
(gdb) run

// ...

Thread 1 "astro8" hit Breakpoint 1, SetMem (bank=6, address=50000, data=65)
    at /home/elelzedel/astro8/builds/41fb1a5_debug/Astro8-Emulator/main.cpp:1799
1799		if (bank == 1 && address == 53502) {

Perfect! Our program has correctly set each of the arguments in the SetMem call. Let us see if it crashes:

(gdb) c
Continuing.

Thread 1 "astro8" received signal SIGSEGV, Segmentation fault.
SetMem (bank=6, address=50000, data=65) at /home/elelzedel/astro8/builds/41fb1a5_debug/Astro8-Emulator/main.cpp:1851
1851				memoryBytes[bank][address] = data;

It crashes! We have a perfect starting point to build this into a POC that can do something more dangerous. For the purposes of this article though, we have enough information to write up a bug report and just get it fixed. Either the bank register needs to be limited to values from 0 or 6 or the memoryBytes needs to be expanded to reserve 8 banks.

Conclusion

I hope you learned something in this rather lengthy discussion of fuzzing, harnesses, AFL++, speed optimization, and stability tuning. Security testing comes in many forms and, for memory unsafe languages, instrumented fuzzing is one of the most interesting and fun techniques. Despite AFL++ doing the actual testing, it takes a lot of expertise to actually turn a program into something it can use.

Tags:

Elelzedel's Projects

Security and hardware enthusiast