Making a C++23 Toolchain for the Mega Drive

Clownacy

[Cross-post from Clownacy's Corner]

One of the great things about C++ is its zero-overhead abstractions, which enable the creation of code which is every bit as performant as C, or even assembly, while being far more concise. Since 2012, I have been programming in assembly for the Mega Drive, whose CPU is a Motorola 68000. This CPU is supported by both GCC and Clang/LLVM, which had me wondering how feasible it is to create homebrew using C++ by leveraging one of these two compilers.

Of course, there already exist toolchains for using C++ to create Mega Drive homebrew, such as the venerable SGDK, but I wanted to start from scratch, going through the bootstrapping process as if the Mega Drive were a new embedded platform. For this, I would need to learn how to produce a cross-compiler.

Creating the Cross-Compiler
My research quickly led me to this invaluable article on the OSDev wiki. It detailed that I would need to download the source code of GNU Binutils and GCC, then configure, compile, and install them both in typical autotools fashion. Despite the very Unix-centric build system, I was able to build both of these on Windows, using MSYS2. I opted to use the path 'C:/msys2/opt/clownsdk' as the installation location, similarly to devkitPro.

Binutils did not require any special configuration; GCC, however, did. These are the options that I used:

Code:
--target=m68k-elf --prefix=/opt/clownsdk --disable-nls --enable-languages=c,c++ --without-headers --disable-multilib --with-cpu=68000


These options specify that the compiler should only target the Motorola 68000: '--target=m68k-elf' selects the m68k family of CPUs, '--with-cpu=68000' makes the compiler target the 68000 by default, and '--disable-multilib' disables support for other CPUs in the m68k family such as the 68020 and 68040. Because the Mega Drive is a bare-metal embedded platform, the '--without-headers' flag is used to alert the compiler that there is no C standard library available.

With these settings, I was able to build a working Motorola 68000 compiler. By passing it the '-S' command line flag, I could convert C/C++ to Motorola 68000 assembly, enabling me to examine how compiler flags would influence the generation of assembly code.

Affecting Assembly Code Generation
By default, GCC treats 'int' as 32-bit, which does not suit the 68000 well as 32-bit operations are slower than 16-bit and 8-bit, and it has no 32-bit multiplication and division instructions. As a result of this, multiplications and divisions are achieved with calls to helper functions instead, which are incredibly slow. By passing GCC the '-mshort' flag, it can be made to treat 'int' as 16-bit instead, which results in the generation of far more natural assembly.
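
To keep code independent of whether '-mshort' is in effect, fixed-width types can be used wherever the size matters. The following is a minimal sketch, not code from the toolchain: 'Multiply16' is a hypothetical helper that multiplies two 16-bit values into a 32-bit result. That is the shape of multiplication that the 68000 can perform in a single instruction ('muls.w'), so the compiler at least has the opportunity to avoid a helper call.

Code:
```cpp
#include <cstdint>

// Hypothetical helper: multiplies two 16-bit values, producing a 32-bit result.
// The operands are explicitly 16-bit regardless of how wide 'int' is, and the
// cast widens the multiplication so that the result cannot overflow.
constexpr std::int32_t Multiply16(const std::int16_t a, const std::int16_t b)
{
    return static_cast<std::int32_t>(a) * b;
}
```

Whether a single multiply instruction is actually emitted depends on the compiler version and optimisation level, so it is worth checking the output of '-S'.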

When compiling code, it is important to pass the '-ffreestanding' and '-nostdlib' flags to GCC, as these prevent the compiler from trying to use the non-existent C standard library. By doing this, it is possible to compile C/C++ to object files ('.o') and even shared object files ('.so'). Compiling to executable files ('.elf') mostly works, though the linker will complain about the lack of an entry point.

Only a small part of the C standard library is available, such as the 'stdbool.h' and 'stdint.h' headers. Because of this, it is not possible to use things like the 'strlen', 'assert', and 'qsort' functions. Likewise, there is no C++ standard library whatsoever. This is very much a 'naked' version of C and C++, where you are forced to make do with only the features of the language itself.
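
In practice, this means writing small replacements by hand whenever a library function is needed. As an illustration only, here is what a stand-in for 'strlen' might look like; the name 'StringLength' is hypothetical and not part of the toolchain.

Code:
```cpp
#include <cstddef>

// Hypothetical stand-in for 'strlen': counts the characters before the
// terminating null byte. Being 'constexpr', it can also be evaluated at
// compile time when given a string literal.
constexpr std::size_t StringLength(const char* const string)
{
    std::size_t length = 0;

    while (string[length] != '\0')
        ++length;

    return length;
}
```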

It was with this build environment that I was able to start writing some Mega Drive library code in C++. In particular, I created a partial port of my modified SMPS sound driver - the Sonic 2 Clone Driver v2. By making the compiler produce position-independent code with no global state, I could make it generate a binary which I could include directly into a Sonic ROM-hack and use as a kind of binary blob.

Normally, the generated object files are not truly position-independent as they require relocation at runtime. They require relocation because GCC does not use Program Counter-relative addressing by default; however, it can be made to do so by passing it the '-mpcrel' flag. By doing this, truly position-independent code is produced, which can be used as-is with no relocation.

Being an embedded platform, it is necessary to read from and write to various memory addresses, such as to access the console's YM2612 sound chip. For this, volatile pointers are necessary:

Code:
static volatile unsigned char &YM2612_A0 = *reinterpret_cast<volatile unsigned char*>(0xA04000);
static volatile unsigned char &YM2612_D0 = *reinterpret_cast<volatile unsigned char*>(0xA04001);
static volatile unsigned char &YM2612_A1 = *reinterpret_cast<volatile unsigned char*>(0xA04002);
static volatile unsigned char &YM2612_D1 = *reinterpret_cast<volatile unsigned char*>(0xA04003);


Unfortunately, GCC's handling of volatile pointers is quite clumsy, often causing it to needlessly reload the address into a register. To avoid this, accessing raw memory addresses can instead be handled by inline assembly:

Code:
void WriteFMI(const unsigned char port, const unsigned char value)
{
   asm volatile(
       "0:\n"
       "    tst.b    (%0)\n"     // 8(2/0)
       "    bmi.s    0b\n"       // 10(2/0) | 8(1/0)
       "    move.b    %1,(%0)\n"  // 8(1/1)
       "    move.b    %2,1(%0)\n" // 12(2/1)
       "    nop\n"              // 4(1/0)
       "    nop\n"              // 4(1/0)
       "    nop\n"              // 4(1/0)
       :
       : "a" (YM2612), "idQUm" (port), "idQUm" (value)
       : "cc"
   );
}


The syntax for this is fairly complex, but also surprisingly flexible and powerful: with clever usage of the input and output operands, it is possible to allow the compiler to inline literal inputs and to load non-literal inputs into registers and reuse them between instances of the inline assembly.
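
For comparison, the same wait-then-write sequence can be sketched in plain C++ using volatile accesses. This is an illustrative assumption rather than the driver's actual code: 'FMPort' and 'WriteFM' are hypothetical names, the busy flag is taken to be bit 7 of the status byte (which is what the 'tst.b'/'bmi.s' pair tests), and the trailing 'nop' delay is omitted.

Code:
```cpp
// Hypothetical layout of one YM2612 port: an address register immediately
// followed by a data register.
struct FMPort
{
    volatile unsigned char address; // Reading this returns the chip's status.
    volatile unsigned char data;
};

// Waits until the busy flag (bit 7 of the status) is clear, then writes a
// register number and its value.
inline void WriteFM(FMPort &port, const unsigned char reg, const unsigned char value)
{
    while ((port.address & 0x80) != 0)
    {
        // Busy-wait, like the 'tst.b'/'bmi.s' pair in the inline assembly.
    }

    port.address = reg;
    port.data = value;
}
```

On hardware, the port would be obtained with something like '*reinterpret_cast<FMPort*>(0xA04000)'. This version leaves register allocation entirely to the compiler, which is exactly where the needless reloads described above can creep in.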

GCC's calling convention for the 68000 is terrible: every argument is passed as a 32-bit value on the stack. Because of this, it is desirable to minimise function calls. This can be achieved by marking functions as 'static' wherever possible, allowing the compiler to inline them. For class methods and functions with external visibility, the process is a bit more complicated: link-time optimisation would suffice, but globally-visible functions and methods are considered to be 'exported' by default, meaning that they are made part of the library's API, which prevents the compiler from being able to inline them. To resolve this, the default visibility must also be changed. Enabling link-time optimisation and changing the default visibility are done by passing the '-flto' and '-fvisibility=hidden' flags, respectively. With this, slow, stack-hungry function calls will be avoided as much as possible.
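
As a rough sketch of how this looks in source code: helpers with internal linkage are free to be inlined, while the few functions that genuinely need to be callable from outside can be exported again with GCC's 'visibility' attribute once '-fvisibility=hidden' is in use. The names 'ScaleVolume' and 'Driver_GetVolume' are hypothetical.

Code:
```cpp
namespace
{
    // Internal linkage: invisible outside of this translation unit, so the
    // compiler is free to inline it into every caller and discard the original.
    constexpr unsigned int ScaleVolume(const unsigned int volume, const unsigned int scale)
    {
        return volume * scale / 0x80;
    }
}

// Deliberately exported despite '-fvisibility=hidden', because outside code
// (such as a ROM-hack calling into the binary blob) needs to find it.
extern "C" __attribute__((visibility("default"))) unsigned int Driver_GetVolume(const unsigned int track_volume)
{
    return ScaleVolume(track_volume, 0x40);
}
```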

C++ Utilities
While I did not find the lack of a C standard library to be a problem, the lack of C++ niceties such as 'std::array', 'std::min', 'std::max', and 'std::optional' was, as it meant that all of the zero-overhead abstractions that I liked so much were gone. However, a post on OSDev's forum tipped me off that it is possible to add a portion of the C++ standard library to the toolchain: GCC must be reconfigured with the '--disable-hosted-libstdcxx' flag, and then 'make all-target-libstdc++-v3 install-strip-target-libstdc++-v3' can be run to produce and install a "free-standing" C++ standard library. Now, all of the aforementioned C++ utilities were available for use!
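
To give an idea of what these utilities allow, here is a small sketch that uses 'std::array', 'std::min', and 'std::optional' without any heap allocation or runtime support. The palette and function names are made up for illustration, and it assumes that the free-standing library provides these headers as described above.

Code:
```cpp
#include <algorithm>
#include <array>
#include <optional>

// Hypothetical four-entry palette.
constexpr std::array<unsigned int, 4> palette = {0x000, 0x00E, 0x0E0, 0xE00};

// Returns the index of 'colour' within the palette, or nothing if it is absent.
constexpr std::optional<unsigned int> FindColour(const unsigned int colour)
{
    for (unsigned int i = 0; i < palette.size(); ++i)
        if (palette[i] == colour)
            return i;

    return std::nullopt;
}

// Limits an index so that it always refers to a valid palette entry.
constexpr unsigned int ClampIndex(const unsigned int index)
{
    return std::min(index, static_cast<unsigned int>(palette.size() - 1));
}
```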

By doing all of this, I was now able to write appealing C++ code that compiled to efficient 68000 assembly code. However, I soon became interested in creating more than just position-independent libraries: I wanted to be able to make executables.

Making Executables
I knew from the 68 Katy's Linux kernel port that bare-metal C/C++ software needs to be linked with a small bootstrapping program. This program is typically written in assembly and is responsible for initialising the hardware and copying the contents of the executable's '.data' section to RAM before finally executing the C/C++ software. This is not too dissimilar to the usual start-up process of Mega Drive games, so I was able to create such a bootstrapping program without much trouble: the program starts with a standard 68000 vector table, followed by the standard Sega boot-code for initialising the hardware ('ICD_BLK4.PRG'), followed by an instruction that jumps to the C++ code's EntryPoint function:

Code:
   dc.l    0x00000000,.Lentry,BusErrorHandler,AddressErrorHandler
   dc.l    IllegalInstructionHandler,DivisionByZeroHandler,CHKHandler,TRAPVHandler
   dc.l    PrivilegeViolationHandler,TraceHandler,UnimplementedInstructionLineAHandler,UnimplementedInstructionLineFHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UninitialisedInterruptHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    SpuriousInterruptHandler,Level1InterruptHandler,Level2InterruptHandler,Level3InterruptHandler
   dc.l    Level4InterruptHandler,Level5InterruptHandler,Level6InterruptHandler,Level7InterruptHandler
   dc.l    TRAP0Handler,TRAP1Handler,TRAP2Handler,TRAP3Handler
   dc.l    TRAP4Handler,TRAP5Handler,TRAP6Handler,TRAP7Handler
   dc.l    TRAP8Handler,TRAP9Handler,TRAP10Handler,TRAP11Handler
   dc.l    TRAP12Handler,TRAP13Handler,TRAP14Handler,TRAP15Handler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
   dc.l    UnassignedHandler,UnassignedHandler,UnassignedHandler,UnassignedHandler
.Lentry:
   .incbin "ICD_BLK4.BIN"

   | Load DATA section.
   lea    (_DATA_ROM_START_).l,%a0
   lea    (_DATA_RAM_START_).l,%a1
   move.l    #_DATA_SIZE_,%d0
   move.w    %d0,%d1
   lsr.l    #4,%d0
   andi.w    #0xC,%d1
   eori.w    #0xC,%d1
   lsr.w    #1,%d1
   jmp    .Lloop(%pc,%d1.w)
.Lloop:
   move.l    (%a0)+,(%a1)+
   move.l    (%a0)+,(%a1)+
   move.l    (%a0)+,(%a1)+
   move.l    (%a0)+,(%a1)+
   dbf    %d0,.Lloop

   | Clear BSS section.
   moveq    #0,%d2
   lea    (_BSS_START_).l,%a1
   move.l    #_BSS_SIZE_,%d0
   move.w    %d0,%d1
   lsr.l    #4,%d0
   andi.w    #0xC,%d1
   eori.w    #0xC,%d1
   lsr.w    #1,%d1
   jmp    .Lloop2(%pc,%d1.w)
.Lloop2:
   move.l    %d2,(%a1)+
   move.l    %d2,(%a1)+
   move.l    %d2,(%a1)+
   move.l    %d2,(%a1)+
   dbf    %d0,.Lloop2

   | Jump into the user-code.
   jmp    (EntryPoint).l

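For readers who are less comfortable with 68000 assembly, the intent of the two loops above can be sketched in C++: one copies the initial values of the '.data' section from ROM to RAM, and the other zeroes the '.bss' section. This is only a model of the behaviour, ignoring the loop unrolling; the function names are hypothetical, and the real work has to be done in assembly because C++ code cannot safely run before these sections are set up.

Code:
```cpp
#include <cstddef>
#include <cstdint>

// Copies 'count' longwords, as the first loop does from ROM to RAM.
constexpr void CopyLongwords(const std::uint32_t* const source, std::uint32_t* const destination, const std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        destination[i] = source[i];
}

// Zeroes 'count' longwords, as the second loop does for the BSS section.
constexpr void ClearLongwords(std::uint32_t* const destination, const std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
        destination[i] = 0;
}
```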

For the bootstrapping program to be properly linked to the executable, a linker script would be needed. With a linker script, I could obtain the position and length of the '.data' section, ensure that the bootstrapping program be located at the start of the executable, specify the memory layout of the Mega Drive (so that code is expected to be located in ROM at 0x000000 and variables are expected to be located in RAM at 0xFFFF0000), and make the linker output a raw flat binary instead of an ELF executable file:

Code:
STARTUP(bin/init.o)
OUTPUT_FORMAT(binary)

MEMORY
{
   ROM (rx) : ORIGIN = 0x00000000, LENGTH = 4M
   RAM (wx) : ORIGIN = 0xFFFF0000, LENGTH = 64K
}

SECTIONS
{
   .rom() : {
       *(.text)
       *(.text.*)
       *(.rodata)
       *(.rodata.*)
       . = ALIGN(4);
       *(.ctors)
       . = ALIGN(4);
       *(.init)
       . = ALIGN(4);
       *(.eh_frame)
       *(.tm_clone_table)
   } > ROM

   .ram : {
       . = ALIGN(2);
_DATA_ROM_START_ = LOADADDR(.ram);
_DATA_RAM_START_ = .;
       *(.data)
       *(.data.*)
       . = ALIGN(4);
_DATA_RAM_END_ = .;
_DATA_SIZE_ = ABSOLUTE(_DATA_RAM_END_ - _DATA_RAM_START_);
       . = ALIGN(2);
_BSS_START_ = .;
       *(.bss)
       *(.bss.*)
       . = ALIGN(4);
_BSS_END_ = .;
_BSS_SIZE_ = ABSOLUTE(_BSS_END_ - _BSS_START_);
   } > RAM AT> ROM

   /DISCARD/ : {
       *(.dtors)
       *(.fini)
       *(.comment)
       *(.debug_str)
       *(.debug_line)
       *(.debug_line_str)
       *(.debug_info)
       *(.debug_abbrev)
       *(.debug_aranges)
   }
}


The bootstrapping program expects various symbols to be provided by the C++ code, such as 'EntryPoint', 'BusErrorHandler' and 'Level1InterruptHandler'. All except for EntryPoint are interrupt handlers, and respond to things like the vertical-blanking interrupt and various types of crashes and exceptions. These interrupt handlers need to be declared in a special way so that the compiler knows to end them with the 'rte' instruction instead of the usual 'rts' instruction. This is done by declaring the functions with '__attribute__ ((interrupt))'. All of these functions need to be declared with 'extern "C"' so that the bootstrapping program can link to them properly.
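
A handler declared this way might look like the following sketch. The frame counter and 'GetFrameCounter' are hypothetical, and the attribute is wrapped in a macro (on the assumption that the m68k compiler defines '__m68k__') purely so that the same code can also be built with a desktop compiler for testing. On the Mega Drive, the level 6 interrupt is the vertical-blanking interrupt.

Code:
```cpp
// The attribute is only meaningful to the m68k compiler, so it is compiled
// out elsewhere.
#ifdef __m68k__
#define INTERRUPT_ATTRIBUTE __attribute__((interrupt))
#else
#define INTERRUPT_ATTRIBUTE
#endif

// Hypothetical counter of how many frames have been displayed so far.
// 'volatile' because it is modified behind the main program's back.
static volatile unsigned long frame_counter = 0;

// 'extern "C"' prevents name mangling, so the vector table in the
// bootstrapping program can refer to the handler by this exact name.
extern "C" INTERRUPT_ATTRIBUTE void Level6InterruptHandler()
{
    frame_counter = frame_counter + 1;
}

unsigned long GetFrameCounter()
{
    return frame_counter;
}
```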

With all that done, I was able to produce a standard Mega Drive ROM file, and I was pleasantly surprised to see that it successfully booted in my Mega Drive emulator. After that, I worked on creating a library for interfacing with the Mega Drive hardware, such as reading the controllers and uploading graphical data to the video display processor. That, in turn, allowed me to make the homebrew more elaborate, eventually developing into a Columns-like block stacking puzzle game:

image-1.png

libgcc
As I was making this homebrew, I encountered a peculiar error: "symbol '__umodsi3' undefined". This is one of those helper functions that I mentioned earlier, which implements 32-bit modulo because the 68000 lacks such an instruction. These functions are provided by libgcc, which can be compiled and installed similarly to the libstdc++-v3 library from before. Instructions for this can be found in the OSDev article. Even with libgcc installed, however, the error persisted; this is because, unlike libstdc++-v3, libgcc needs to be linked to the program ('-lgcc'). With that done, the error was finally gone.

Supporting Global Constructors
As is typical for a puzzle game, the first piece that the player is given should be random. To implement this, I had a global variable that represented the colour of the current piece, and I initialised it using a call to the game's RandomColour function. I expected such code to produce a compiler error, as it does in C; however, C++ actually allows this. Yet, when I ran the game, the piece would always be the default colour. I could even booby-trap the RandomColour function to crash the game, and yet it would not. Clearly, the variable was not being initialised properly.

Leave it to the OSDev wiki to save the day again! In another article, it is detailed that a few extra assembly files need to be written and linked, and the bootstrapping code needs to call a function called '_init', which will in turn call the global constructors. The instructions given are mainly for x86 platforms, but it is easy enough to fill in the gaps for the 68000. Here are the 68000 versions of the 'crti.s' and 'crtn.s' files:

crti.s
Code:
.section .init
.global _init
_init:

.section .fini
.global _fini
_fini:

crtn.s
Code:
.section .init
   rts

.section .fini
   rts


With this done, RandomColour was finally being called and the piece was set to a random colour at the start of the game!
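
Conceptually, the '.ctors' section that the linker script keeps in ROM is a table of function pointers, each of which performs the run-time initialisation of some globals, and the code reached through '_init' calls every one of them. The following is only a model of that idea, with hypothetical names; the real implementation lives in GCC's own start-up objects and differs in details such as iteration order and how the end of the table is found.

Code:
```cpp
// Each entry initialises some global variables at run time.
using Constructor = void (*)();

// Calls every constructor in the table, from 'begin' up to (not including) 'end'.
inline void CallConstructors(const Constructor* const begin, const Constructor* const end)
{
    for (const Constructor *entry = begin; entry != end; ++entry)
        (*entry)();
}
```

If nothing ever walks this table, the globals simply keep whatever the '.data' and '.bss' sections gave them, which matches the symptom of the piece always being the default colour.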

Because Mega Drive games are incapable of exiting, there is no need to implement global destructors. As a result, the linker script discards anything from the '.dtors' and '.fini' sections.

Conclusion
At this point, the toolchain seems to be very complete, at least as far as free-standing C++ compilers go.

Writing C++ is way easier than writing assembly, since I do not have to worry about register allocation and implementing complex algorithms with long lists of dual-operand instructions. The code-generation can be sub-par ('compilers write better assembly than humans', my arse), but if you are mindful to work around GCC's quirks, you can still get it to produce very efficient assembly.

I have always wondered about the process of turning object files into executables and how they are executed, so to finally explore this subject has been a treat! For the longest time, C and C++ existed in a bubble to me: while I understood the languages themselves, I did not understand the environment in which they operated, and yet this was the complete opposite of how I understood assembly, as I knew the process of initialising the Mega Drive all the way down to the 68000's vector table. With this knowledge, I can finally bridge the gap between the two, and now understand how to initialise a system with assembly, and then run C/C++ code!
 
[Cross-post from the blog]

Long time, no see! I have been busy with many things lately, leaving me with a backlog of projects to talk about, the most recent of which is a revised version of my C++23 Mega Drive toolchain!

In summary, I have made some Bash scripts to automate the process of building and installing the toolchain, bundled them with an example program, and pushed everything to GitHub.

Changes
Since the last blog post, GNU Binutils and GCC have been updated from 2.42 and 14.1.0 to 2.43 and 14.2.0, the Mega Drive utility library 'md.h' has been greatly expanded, implementations of the C standard library 'memset' and 'memcpy' functions have been added, and Makefile scripts have been introduced to make it as easy as possible for projects to utilise the toolchain.

That last point in particular warrants some explanation: previously, each project that used the toolchain needed to manually set the appropriate CC, CXX, LD, CFLAGS, CXXFLAGS, and LDFLAGS in its Makefile, to use the correct compiler and linker script. This was messy and led to much code duplication. Now, all that is necessary is to add a single line to the start of the Makefile: 'include /opt/clownmdsdk/rom.mk'. By doing this, the Makefile will be automatically configured to produce Mega Drive ROMs.

Building the toolchain was previously done manually, as described in the previous blog post. This was very complicated, as it involved many steps that needed to be done in a particular order, and required commands which were difficult to memorise. To streamline the process, I have created a series of Bash scripts to do these automatically. Ideally, I would have used a proper Linux-style package-building script, like a 'PKGBUILD' file, but that would limit the script to only work on platforms with the Pacman package manager, such as Arch Linux and MSYS2. The Bash scripts require slightly more work from the user, but should work on any Unix-like platform. With these, it is considerably easier to build the toolchain from scratch and try it out!

A novelty that I was also able to achieve was to compile the toolchain using Clang. I recently switched from using GCC to Clang in my MSYS2 installation so that I could use sanitisers on Windows, and I did not want to keep a copy of GCC around solely to build my Mega Drive toolchain. Clang is mostly able to compile the toolchain without any modification; however, Clang's linker does not support partial linking on Windows, so GCC's build system needed patching in order to not rely on that feature.

To prevent builds of the toolchain from being dependent on a particular MSYS2 environment, I also found a way to statically link the toolchain, eliminating its dependency on environment-specific DLLs. In fact, the toolchain can theoretically be used on Windows without MSYS2 installed at all! This also allows me to bundle up a pre-made toolchain for anyone to use. Speaking of which...

Release
The toolchain is now public, and can be found on GitHub under the name 'clownmdsdk'. All code is released under the terms of the 0BSD licence. Binaries are provided for x86-64 Windows.

Also provided is an example program, to showcase how to use the toolchain and its utility libraries. This is the same program that I developed to test the toolchain in the previous blog post:

image-1.png

There is also a certain other project that I worked on to test out the toolchain...

Bonus Project
image.png

This may look like Sonic 2, but people that are familiar with my older works may know what it really is: back in 2015, I began developing a port of Sonic 2 to PC, which was done by manually converting the game's 68000 assembly code to C. It had its chance to shine in the 2017 Sonic Hacking Contest (where it won a trophy), but since then has remained dormant.

Since my toolchain is able to compile C code to 68000 assembly, I figured that it was possible to port this PC port back to the Mega Drive. After a day of work, I found that I was correct! It lacks optimisation, so it lags a lot, but it does work! Anybody that is curious to try this port can download it here: https://sonicresearch.org/clownacy/sonic2portmegadrive.bin

Hopefully this shows that my toolchain is capable of handling serious projects, being able to power one of the Mega Drive's flagship titles!

Closing
That covers just about everything that has happened with this toolchain since the last update. In the future, I hope to finish the C++ port of my Sonic 2 Clone Driver v2 sound driver, and perhaps even integrate it into the Sonic 2 port. It would also be nice to get that Sonic 2 port running at full speed consistently.
 