Making a C++23 Toolchain for the Mega Drive


[Cross-post from Clownacy's Corner]

One of the great things about C++ is its zero-overhead abstractions, which enable the creation of code which is every bit as performant as C, or even assembly, while being far more concise. Since 2012, I have been programming in assembly for the Mega Drive, whose CPU is a Motorola 68000. This CPU is supported by both GCC and Clang/LLVM, which had me wondering how feasible it is to create homebrew using C++ by leveraging one of these two compilers.

Of course, there already exist toolchains for using C++ to create Mega Drive homebrew, such as the venerable SGDK, but I wanted to start from scratch, going through the bootstrapping process as if the Mega Drive were a new embedded platform. For this, I would need to learn how to produce a cross-compiler.

Creating the Cross-Compiler
My research quickly led me to an invaluable article on the OSDev wiki. It detailed that I would need to download the source code of GNU Binutils and GCC, then configure, compile, and install them both in typical autotools fashion. Despite the very Unix-centric build system, I was able to build both of these on Windows, using MSYS2. I opted to use the path 'C:/msys2/opt/clownsdk' as the installation location, similarly to devkitPro.

Binutils did not require any special configuration, but GCC did. These are the options that I used:

--target=m68k-elf --prefix=/opt/clownsdk --disable-nls --enable-languages=c,c++ --without-headers --disable-multilib --with-cpu=68000

These options specify that the compiler should only target the Motorola 68000: '--target=m68k-elf' selects the m68k family of CPUs, '--with-cpu=68000' makes the compiler target the 68000 by default, and '--disable-multilib' disables support for other CPUs in the m68k family such as the 68020 and 68040. Because the Mega Drive is a bare-metal embedded platform, the '--without-headers' flag is used to alert the compiler that there is no C standard library available.

With these settings, I was able to build a working Motorola 68000 compiler. By passing it the '-S' command line flag, I could convert C/C++ to Motorola 68000 assembly, enabling me to examine how compiler flags would influence the generation of assembly code.

Affecting Assembly Code Generation
By default, GCC treats 'int' as 32-bit, which does not suit the 68000 well: 32-bit operations are slower than 16-bit and 8-bit ones, and the CPU has no 32-bit multiplication or division instructions. As a result, 32-bit multiplications and divisions are performed by calls to helper functions, which are incredibly slow. Passing GCC the '-mshort' flag makes it treat 'int' as 16-bit instead, which results in the generation of far more natural assembly.
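As a rough illustration of why operand width matters, here is a minimal sketch of a multiplication whose operands are explicitly 16-bit. The function name is hypothetical. The 68000's 'muls.w' instruction multiplies two 16-bit values into a 32-bit result, so the compiler should be able to use it here instead of calling a helper function, though examining the '-S' output is the only way to be sure for a given compiler version and set of flags.

```cpp
#include <stdint.h>

// Hypothetical helper: multiplies two 16-bit values into a 32-bit result.
// Both operands fit the 16-bit inputs of the 68000's 'muls.w' instruction,
// so no 32-bit multiplication helper should be required.
constexpr int32_t MultiplyWords(const int16_t a, const int16_t b)
{
    return static_cast<int32_t>(a) * static_cast<int32_t>(b);
}

// Checked at compile time, so this adds nothing to the generated code.
static_assert(MultiplyWords(-300, 200) == -60000, "the product should widen to 32 bits");
```

Using the fixed-width types from 'stdint.h' also keeps the code's behaviour the same whether or not '-mshort' is in effect.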

When compiling code, it is important to pass the '-ffreestanding' and '-nostdlib' flags to GCC, as these prevent the compiler from trying to use the non-existent C standard library. By doing this, it is possible to compile C/C++ to object files ('.o') and even shared object files ('.so'). Compiling to executable files ('.elf') mostly works, though the linker will complain about the lack of an entry point.

Only a small part of the C standard library is available, such as the 'stdbool.h' and 'stdint.h' headers. Because of this, it is not possible to use things like the 'strlen', 'assert', and 'qsort' functions. Likewise, there is no C++ standard library whatsoever. This is very much a 'naked' version of C and C++, where you are forced to make do with only the features of the language itself.
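In practice, making do means writing small replacements for whatever is missing. The following is a minimal sketch of a stand-in for 'strlen'; the name 'StringLength' is an assumption for illustration, not something from the toolchain. It relies only on 'stddef.h', which is one of the headers that a free-standing compiler provides.

```cpp
#include <stddef.h>

// Hypothetical replacement for 'strlen', which is unavailable without a
// C standard library. Counts characters up to the terminating null byte.
constexpr size_t StringLength(const char* const string)
{
    size_t length = 0;

    while (string[length] != '\0')
        ++length;

    return length;
}

// Being 'constexpr', it can also measure string literals at compile time.
static_assert(StringLength("SEGA") == 4, "four characters before the null byte");
```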

It was with this build environment that I was able to start writing some Mega Drive library code in C++. In particular, I created a partial port of my modified SMPS sound driver - the Sonic 2 Clone Driver v2. By making the compiler produce position-independent code with no global state, I could make it generate a binary which I could include directly into a Sonic ROM-hack and use as a kind of binary blob.

Normally, the generated object files are not truly position-independent, as they require relocation at runtime. This is because GCC does not use Program Counter-relative addressing by default; however, it can be made to do so by passing it the '-mpcrel' flag. The result is truly position-independent code, which can be used as-is with no relocation.
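Avoiding global state means that everything the code needs must be handed to it by the caller. Here is a minimal sketch of that style; the 'SoundState' structure and its tempo logic are hypothetical and are not taken from the actual sound driver.

```cpp
// Hypothetical driver state. The caller owns this and decides where in RAM
// it lives, so the code itself never refers to a fixed RAM address.
struct SoundState
{
    unsigned char tempo;             // Added to the accumulator every frame.
    unsigned char tempo_accumulator; // A tick occurs whenever this overflows.
    unsigned int ticks;              // Number of ticks that have elapsed.
};

// Advances the state by one frame, touching nothing but its argument.
constexpr SoundState AdvanceFrame(SoundState state)
{
    const unsigned int sum = state.tempo_accumulator + state.tempo;

    state.tempo_accumulator = static_cast<unsigned char>(sum & 0xFF);

    if (sum > 0xFF)
        ++state.ticks;

    return state;
}

static_assert(AdvanceFrame(SoundState{0x80, 0xC0, 0}).ticks == 1, "0xC0 + 0x80 overflows a byte");
```

The state is passed and returned by value here only so that the example can be checked at compile time; real code for the 68000 would more likely take a reference to it.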

Because the Mega Drive is an embedded platform, it is necessary to read from and write to various memory addresses, such as to access the console's YM2612 sound chip. For this, volatile pointers are necessary:

static volatile unsigned char &YM2612_A0 = *reinterpret_cast<volatile unsigned char*>(0xA04000);
static volatile unsigned char &YM2612_D0 = *reinterpret_cast<volatile unsigned char*>(0xA04001);
static volatile unsigned char &YM2612_A1 = *reinterpret_cast<volatile unsigned char*>(0xA04002);
static volatile unsigned char &YM2612_D1 = *reinterpret_cast<volatile unsigned char*>(0xA04003);

Unfortunately, GCC's handling of volatile pointers is quite clumsy, causing it to frequently reload the address into a register for no reason. To avoid this, accessing raw memory addresses can instead be handled by inline assembly:

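void WriteFMI(const unsigned char port, const unsigned char value)
{
   // 'YM2612' is assumed to point at the chip's first address port (0xA04000).
   asm volatile(
       "0:\n"                     // Local label: wait here while the chip is busy.
       "    tst.b    (%0)\n"     // 8(2/0)
       "    bmi.s    0b\n"       // 10(2/0) | 8(1/0)
       "    move.b    %1,(%0)\n"  // 8(1/1)
       "    move.b    %2,1(%0)\n" // 12(2/1)
       "    nop\n"              // 4(1/0)
       "    nop\n"              // 4(1/0)
       "    nop\n"              // 4(1/0)
       :                         // No outputs.
       : "a" (YM2612), "idQUm" (port), "idQUm" (value)
       : "cc"
   );
}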

The syntax for this is fairly complex, but also surprisingly flexible and powerful: with clever usage of the input and output operands, it is possible to allow the compiler to inline literal inputs and to load non-literal inputs into registers and reuse them between instances of the inline assembly.

GCC's calling convention for the 68000 is terrible: every argument is passed as a 32-bit value on the stack. Because of this, it is desirable to minimise function calls. This can be achieved by marking functions as 'static' wherever possible, allowing the compiler to inline them. For class methods and functions with external visibility, the process is a bit more complicated: link-time optimisation would suffice, but globally-visible functions and methods are considered to be 'exported' by default, meaning that they are made part of the library's API, which prevents the compiler from being able to inline them. To resolve this, the default visibility must be changed as well. Enabling link-time optimisation and changing the default visibility are done by passing the '-flto' and '-fvisibility=hidden' flags, respectively. With this, slow, stack-hungry function calls will be avoided as much as possible.
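To sketch how this looks in source code: internal helpers are marked 'static', while the few functions that really must be callable from outside are given default visibility explicitly. Both function names below are hypothetical, and the 'visibility' attribute is a GCC/Clang extension.

```cpp
// Internal helper: 'static' gives it internal linkage, so the compiler is
// free to inline it and discard the stand-alone copy.
static constexpr unsigned char ClampVolume(const unsigned int volume)
{
    return static_cast<unsigned char>(volume > 0x7F ? 0x7F : volume);
}

// Hypothetical public entry point. With '-fvisibility=hidden', only
// functions explicitly marked like this remain part of the exported API.
extern "C" __attribute__((visibility("default")))
unsigned char Sound_VolumeToAttenuation(const unsigned int volume)
{
    return static_cast<unsigned char>(0x7F - ClampVolume(volume));
}
```

Everything not marked this way is then hidden, which leaves the compiler and link-time optimiser free to inline it or remove it entirely.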

C++ Utilities
While I did not find the lack of a C standard library to be a problem, the lack of C++ niceties such as 'std::array', 'std::min', 'std::max', and 'std::optional' was, as it meant that all of the zero-overhead abstractions that I liked so much were gone. However, a post on OSDev's forum tipped me off that it is possible to add a portion of the C++ standard library to the toolchain: GCC must be reconfigured with the '--disable-hosted-libstdcxx' flag, and then 'make all-target-libstdc++-v3 install-strip-target-libstdc++-v3' can be run to produce and install a "free-standing" C++ standard library. Now, all of the aforementioned C++ utilities were available for use!
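As a small taste of what these utilities allow, here is a minimal sketch that combines 'std::array', 'std::min', and 'std::optional'. The table contents and function names are hypothetical, and which headers are usable in free-standing mode depends on the GCC version, so treat the includes as an assumption.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <optional>

// Hypothetical lookup table; 'std::array' carries its size with it.
constexpr std::array<unsigned char, 4> volume_table = {0x00, 0x10, 0x28, 0x7F};

// Returns the table entry, or nothing if the index is out of range.
constexpr std::optional<unsigned char> LookUpVolume(const std::size_t index)
{
    if (index >= volume_table.size())
        return std::nullopt;

    return volume_table[index];
}

// Alternative that clamps the index to the last entry using 'std::min'.
constexpr unsigned char LookUpVolumeClamped(const std::size_t index)
{
    return volume_table[std::min(index, volume_table.size() - 1)];
}

static_assert(LookUpVolume(2).value_or(0) == 0x28, "index 2 is in range");
```

None of this needs a heap or an operating system, which is why it can work on a bare-metal target.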

By doing all of this, I was now able to write appealing C++ code that compiled to efficient 68000 assembly code. However, I soon became interested in creating more than just position-independent libraries: I wanted to be able to make executables.

Making Executables
I knew from the 68 Katy's Linux kernel port that bare-metal C/C++ software needs to be linked with a small bootstrapping program. This program is typically written in assembly and is responsible for initialising the hardware and copying the contents of the executable's '.data' section to RAM before finally executing the C/C++ software. This is not too dissimilar to the usual start-up process of Mega Drive games, so I was able to create such a bootstrapping program without much trouble: the program starts with a standard 68000 vector table, followed by the standard Sega boot-code for initialising the hardware ('ICD_BLK4.PRG'), followed by an instruction that jumps to the C++ code's EntryPoint function:

   dc.l    0x00000000,.Lentry,BusErrorHandler,AddressErrorHandler
   dc.l    IllegalInstructionHandler,DivisionByZeroHandler,CHKHandler,TRAPVHandler
   dc.l    PrivilegeViolationHandler,TraceHandler,UnimplementedInstructionLineAHandler,UnimplementedInstructionLineFHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UninitialisedInterruptHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    SpuriousInterruptHandler,Level1InterruptHandler,Level2InterruptHandler,Level3InterruptHandler
   dc.l    Level4InterruptHandler,Level5InterruptHandler,Level6InterruptHandler,Level7InterruptHandler
   dc.l    TRAP0Handler,TRAP1Handler,TRAP2Handler,TRAP3Handler
   dc.l    TRAP4Handler,TRAP5Handler,TRAP6Handler,TRAP7Handler
   dc.l    TRAP8Handler,TRAP9Handler,TRAP10Handler,TRAP11Handler
   dc.l    TRAP12Handler,TRAP13Handler,TRAP14Handler,TRAP15Handler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
.Lentry:
   .incbin "ICD_BLK4.BIN"

   | Load DATA section.
   lea    (_DATA_ROM_START_).l,%a0
   lea    (_DATA_RAM_START_).l,%a1
   move.l    #_DATA_SIZE_,%d0
   move.w    %d0,%d1
   lsr.l    #4,%d0
   andi.w    #0xC,%d1
   eori.w    #0xC,%d1
   lsr.w    #1,%d1
   jmp    .Lloop(%pc,%d1.w)
.Lloop:
   move.l    (%a0)+,(%a1)+
   move.l    (%a0)+,(%a1)+
   move.l    (%a0)+,(%a1)+
   move.l    (%a0)+,(%a1)+
   dbf    %d0,.Lloop

   | Clear BSS section.
   moveq    #0,%d2
   lea    (_BSS_START_).l,%a1
   move.l    #_BSS_SIZE_,%d0
   move.w    %d0,%d1
   lsr.l    #4,%d0
   andi.w    #0xC,%d1
   eori.w    #0xC,%d1
   lsr.w    #1,%d1
   jmp    .Lloop2(%pc,%d1.w)
.Lloop2:
   move.l    %d2,(%a1)+
   move.l    %d2,(%a1)+
   move.l    %d2,(%a1)+
   move.l    %d2,(%a1)+
   dbf    %d0,.Lloop2

   | Jump into the user-code.
   jmp    (EntryPoint).l

For the bootstrapping program to be properly linked to the executable, a linker script would be needed. With a linker script, I could obtain the position and length of the '.data' section, ensure that the bootstrapping program be located at the start of the executable, specify the memory layout of the Mega Drive (so that code is expected to be located in ROM at 0x000000 and variables are expected to be located in RAM at 0xFFFF0000), and make the linker output a raw flat binary instead of an ELF executable file:


MEMORY {
   ROM (rx) : ORIGIN = 0x00000000, LENGTH = 4M
   RAM (wx) : ORIGIN = 0xFFFF0000, LENGTH = 64K
}

SECTIONS {
   .rom() : {
       . = ALIGN(4);
       . = ALIGN(4);
       . = ALIGN(4);
   } > ROM

   .ram : {
       . = ALIGN(2);
       . = ALIGN(4);
       . = ALIGN(2);
_BSS_START_ = .;
       . = ALIGN(4);
_BSS_END_ = .;
   } > RAM AT> ROM

   /DISCARD/ : {
       /* Global destructors are never run, so their sections are dropped. */
       *(.dtors*)
       *(.fini*)
   }
}

The bootstrapping program expects various symbols to be provided by the C++ code, such as 'EntryPoint', 'BusErrorHandler' and 'Level1InterruptHandler'. All except for EntryPoint are interrupt handlers, and respond to things like the vertical-blanking interrupt and various types of crashes and exceptions. These interrupt handlers need to be declared in a special way so that the compiler knows to end them with the 'rte' instruction instead of the usual 'rts' instruction. This is done by declaring the functions with '__attribute__ ((interrupt))'. All of these functions need to be declared with 'extern "C"' so that the bootstrapping program can link to them properly.
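A handler declared along these lines might look like the following minimal sketch. The 'INTERRUPT_HANDLER' macro and the frame counter are hypothetical; the macro only applies the attribute when the '__m68k__' macro that GCC predefines for this target is present, so that the same code can also be compiled on a PC for testing.

```cpp
#include <type_traits>

// Hypothetical convenience macro. The 'interrupt' attribute makes the
// function end with 'rte', and 'extern "C"' keeps the symbol name unmangled
// so that the vector table in the bootstrapping program can refer to it.
#if defined(__m68k__)
    #define INTERRUPT_HANDLER extern "C" __attribute__((interrupt)) void
#else
    #define INTERRUPT_HANDLER extern "C" void
#endif

// Hypothetical counter, 'volatile' because it changes behind the back of
// whatever code was interrupted.
static volatile unsigned int frame_counter = 0;

// On the Mega Drive, the level 6 interrupt is the vertical-blanking interrupt.
INTERRUPT_HANDLER Level6InterruptHandler()
{
    frame_counter = frame_counter + 1;
}

// Handlers are entered by the CPU, so they take and return nothing.
static_assert(std::is_same<decltype(Level6InterruptHandler), void()>::value, "unexpected handler signature");
```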

With all that done, I was able to produce a standard Mega Drive ROM file, and I was pleasantly surprised to see that it successfully booted in my Mega Drive emulator. After that, I worked on creating a library for interfacing with the Mega Drive hardware, such as reading the controllers and uploading graphical data to the video display processor. That, in turn, allowed me to make the homebrew more elaborate, eventually developing into a Columns-like block stacking puzzle game.


As I was making this homebrew, I encountered a peculiar error: "symbol '__umodsi3' undefined". This is one of those helper functions that I mentioned earlier, which implements 32-bit modulo because the 68000 lacks such an instruction. These functions are provided by libgcc, which can be compiled and installed similarly to the libstdc++-v3 library from before. Instructions for this can be found in the OSDev article. Even with libgcc installed, however, the error persisted; this is because, unlike libstdc++-v3, libgcc needs to be linked to the program ('-lgcc'). With that done, the error was finally gone.
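Where the full 32-bit range is not actually needed, the helper call can sometimes be sidestepped by narrowing the operands first. The sketch below is one such approach, with a hypothetical function name. Whether the compiler then uses the 68000's 'divu.w' instruction instead of calling into libgcc depends on the compiler version and flags, so the '-S' output should be checked.

```cpp
#include <stdint.h>

// Hypothetical helper: picks one of 'total_colours' colours from a 32-bit
// random number. Only the low 16 bits take part in the modulo, keeping the
// operation within what the 68000's 'divu.w' instruction can do.
constexpr uint16_t ColourFromRandom(const uint32_t random, const uint16_t total_colours)
{
    return static_cast<uint16_t>(static_cast<uint16_t>(random) % total_colours);
}

// 0x5678 is 22136, and 22136 modulo 6 is 2.
static_assert(ColourFromRandom(0x12345678, 6) == 2, "only the low word is used");
```

The trade-off is that the upper 16 bits of the random number are thrown away, which is harmless for picking among a handful of colours but would not suit every use.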

Supporting Global Constructors
As is typical for a puzzle game, the first piece that the player is given should be random. To implement this, I had a global variable that represented the colour of the current piece, and I initialised it using a call to the game's RandomColour function. I expected such code to produce a compiler error, as it does in C; however, C++ actually allows this. Yet, when I ran the game, the piece would always be the default colour. I could even booby-trap the RandomColour function to crash the game, and yet it would not. Clearly, the variable was not being initialised properly.
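The distinction at work here is between constant initialisation, where the value is known at compile time and can simply be stored in the '.data' section, and dynamic initialisation, where code has to run before the program's entry point. A minimal sketch of the two, using hypothetical names rather than the game's actual code:

```cpp
// Hypothetical stand-in for a colour generator; 'constexpr' lets it be
// evaluated at compile time when its argument is a constant.
constexpr unsigned int NextColour(const unsigned int seed)
{
    return (seed * 5u + 1u) % 6u;
}

// Constant initialisation: the initialiser is a constant expression, so the
// value can be stored directly in the '.data' section.
unsigned int default_colour = NextColour(3);

// Hypothetical seed source. Reading a 'volatile' variable is not a constant
// expression, so the result is only known at run time.
static volatile unsigned int seed_source = 3;

static unsigned int ReadSeed()
{
    return seed_source;
}

// Dynamic initialisation: something must call this initialiser before the
// entry point runs, or the variable is left holding zero.
unsigned int first_colour = NextColour(ReadSeed());
```

On a hosted platform the second kind is taken care of automatically, which is why it is easy to forget that it involves any code at all.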

Leave it to the OSDev wiki to save the day again! In another article, it is detailed that a few extra assembly files need to be written and linked, and the bootstrapping code needs to call a function called '_init', which will in turn call the global constructors. The instructions given are mainly for x86 platforms, but it is easy enough to fill in the gaps for the 68000. Here are the 68000 versions of the 'crti.s' and 'crtn.s' files:

| crti.s
.section .init
.global _init
_init:

.section .fini
.global _fini
_fini:

| crtn.s
.section .init
   rts

.section .fini
   rts

With this done, RandomColour was finally being called and the piece was set to a random colour at the start of the game!

Because Mega Drive games are incapable of exiting, there is no need to implement global destructors. As a result, the linker script discards anything from the '.dtors' and '.fini' sections.

At this point, the toolchain seems to be very complete, at least as far as free-standing C++ compilers go.

Writing C++ is way easier than writing assembly, since I do not have to worry about register allocation and implementing complex algorithms with long lists of dual-operand instructions. The code-generation can be sub-par ('compilers write better assembly than humans', my arse), but if you are mindful to work around GCC's quirks, you can still get it to produce very efficient assembly.

I have always wondered about the process of turning object files into executables and how they are executed, so to finally explore this subject has been a treat! For the longest time, C and C++ existed in a bubble to me: while I understood the languages themselves, I did not understand the environment in which they operated, and yet this was the complete opposite of how I understood assembly, as I knew the process of initialising the Mega Drive all the way down to the 68000's vector table. With this knowledge, I can finally bridge the gap between the two, and now understand how to initialise a system with assembly, and then run C/C++ code!