Progress Report #6

2020 came to an end and left me with an output of two progress reports and a simple, short release note. That's less than I was hoping for, but most of my time last year went into improving the codebase and some performance tinkering for personal pleasure. In my defense, the last progress report was of much higher quality than the ones before, and I'd like to keep it that way!

Undefined Behavior

In that spirit, let's start with an issue that fleroviux reported in June last year. She tried to play Pokémon Sapphire, and her game froze right after the intro sequence, when the character shrinks in size and then enters the world in a moving truck. The same happens in Ruby because the two are essentially the same game.

  • Freezing during intro sequence
  • In the moving truck where we belong

She used the bundled replacement BIOS by Normmatt, where bugs in games are to be expected. I tried it with the original one, and the freezing stopped happening. So I closed the issue and blamed the BIOS for doing some unexpected things, but she quickly reassured me that the bug wasn't happening in her emulator or mGBA when using the same BIOS. I verified that and was left scratching my head about the possible origin of this problem.

I figured it had something to do with the BIOS implementation, but I couldn't find anything wrong with it. Many failed attempts later, GitHub suggested the pret project to me, which is the decompilation of all GB, GBA and even some NDS Pokémon games. I can't believe how people do such things, but I'll gladly use their work to fix bugs in my emulator. I skimmed through the intro-sequence related parts of the code and found this function:

static void InitBackupMapLayoutConnections(struct MapHeader *mapHeader) {
  int count = mapHeader->connections->count;
  struct MapConnection *connection = mapHeader->connections->connections;
  int i;

  gMapConnectionFlags = sDummyConnectionFlags;
  for (i = 0; i < count; i++, connection++) {
    // Handle
  }
}

At first glance, there seems to be nothing wrong with it, but this comment doesn't agree:

BUG: This results in a null pointer dereference when mapHeader->connections is NULL, causing count to be assigned a garbage value. This garbage value just so happens to have the most significant bit set, so it is treated as negative and the loop below thankfully never executes in this scenario.

camthesaxman

I never ran into this bug during testing because it has been fixed in Pokémon Emerald, and that's the game I usually use for quick testing (and pure nostalgia). The dereferenced null pointer returns something they call garbage, which is quite offensive to the poor BIOS, in my opinion. Why the BIOS? Because it starts at address zero, and that's where a dereferenced null pointer reads from.

00000000-00003FFF   BIOS - System ROM
00004000-01FFFFFF   Not used
02000000-0203FFFF   WRAM - On-board Work RAM
02040000-02FFFFFF   Not used
03000000-03007FFF   WRAM - On-chip Work RAM
03008000-03FFFFFF   Not used
04000000-040003FE   I/O Registers
04000400-04FFFFFF   Not used

The BIOS in the Game Boy Advance is read-protected to prevent dumping. Guess how that turned out. That means we can only read from the BIOS if the program counter is inside of it. In plain English: only BIOS functions can read BIOS memory. Any other read returns the most recently fetched BIOS value, which will be the one located at the address:

  • 0x0DC+8 after startup
  • 0x188+8 after SWI
  • 0x134+8 during IRQ
  • 0x13C+8 after IRQ
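To make the protection a bit more tangible, here is a minimal sketch of how an emulator could model it. The struct and its member names are made up for illustration and are not taken from eggvance, and it assumes that only opcode fetches update the latched value.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of the BIOS read protection (illustrative names).
struct Bios {
    static constexpr std::uint32_t kSize = 0x4000;

    std::array<std::uint32_t, kSize / 4> words{};  // BIOS image as 32-bit words
    std::uint32_t last_fetched = 0;                // most recently fetched BIOS opcode

    // Called for every opcode fetch inside the BIOS.
    // Assumption: only fetches update the latched value.
    std::uint32_t fetch(std::uint32_t addr) {
        last_fetched = words[(addr & (kSize - 1)) / 4];
        return last_fetched;
    }

    // Data read: allowed only while the program counter is inside the BIOS.
    // Any other read sees the latched value instead of the real contents.
    std::uint32_t read(std::uint32_t addr, std::uint32_t pc) const {
        if (pc < kSize)
            return words[(addr & (kSize - 1)) / 4];
        return last_fetched;
    }
};
```

With this model, a read of address zero from game code returns whatever the pipeline fetched last before leaving the BIOS, no matter which BIOS address was requested.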

In the case of our dereferenced null pointer, we've just returned from a SWI. The code for this in the original BIOS looks like the following:

movs      pc, lr          ; addr: 00000188  data: E1B0F00E
mov       r12, 0x4000000  ; addr: 0000018C  data: E3A0C301
mov       r2, 0x4         ; addr: 00000190  data: E3A02004

It uses the instruction movs pc, lr to move the link register into the program counter. The link register contains the address of the instruction following a function call, so this pretty much acts like your typical return. Because of the GBA's three-stage instruction pipeline, the CPU has already fetched the value at address 0x190 (0x188+8) by then, and that value will be returned for future protected BIOS reads like dereferenced null pointers. In this case, the value 0xE3A02004 has its sign bit set, so count is negative and the loop body is never executed.

movs      pc, lr                ; addr: 000000AC  data: E1B0F00E
andeq     r1, r0, r4, lsl 0x10  ; addr: 000000B0  data: 00001804
andeq     r1, r0, r4, lsr 0xA   ; addr: 000000B4  data: 00001524

Unfortunately, the replacement BIOS doesn't reproduce this behavior. Here we return with the same instruction, but the prefetched value 0x00001524 is now positive and we run the loop 0x1524 (5412) times. I thought this was the source of the problem, but it wasn't. Up to this point, the emulator did everything correctly and the bug hunt ended with an anticlimactic result.
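The difference between the two BIOS versions boils down to how the prefetched word looks when it is read as a signed int. The following sketch mimics the loop header from the decompiled function with a hypothetical helper, loopIterations, which is not part of the game or the emulator.

```cpp
#include <cstdint>

// Counts how often `for (i = 0; i < count; i++)` runs when `count` is the
// raw 32-bit word returned by the protected BIOS read.
int loopIterations(std::uint32_t raw) {
    // Reinterpret the word as a signed value, like `int count = ...` does.
    // This is two's complement on every relevant compiler and guaranteed
    // since C++20.
    const auto count = static_cast<std::int32_t>(raw);

    int iterations = 0;
    for (std::int32_t i = 0; i < count; ++i)
        ++iterations;
    return iterations;
}
```

Calling loopIterations(0xE3A02004) for the original BIOS gives 0 because the value is negative, while loopIterations(0x00001524) for the replacement BIOS gives 5412.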

I fixed the bug eventually when working on something seemingly unrelated. Reads from unused memory regions return values based on prefetched values in the CPU's pipeline. Some small issues in that code were fixed with this commit. It seems that the game tries to access unused memory regions at some point when running the loop with the corrupted loop counter, and returning the "proper bad value" fixes the freezing behavior.

Sprite Render Cycles

This issue was quite a simple fix compared to the previous one. The number of available sprite render cycles per scanline is limited to 954 if the "H-Blank interval free" bit in the DISPCNT register is set and 1210 otherwise. That means the number of sprites per scanline is limited. If you ignore that limit, you end up with something like the first image, where the sprite on top overlaps with the status bar.

  • Gunstar Super Heroes without render cycle limit
  • Gunstar Super Heroes with render cycle limit

Calculating the number of cycles it takes to render a sprite is quite easy. A normal sprite takes width cycles, and a sprite with an affine transformation takes 2 * width + 10 cycles. Enabled sprites with x-coordinates outside of the screen also count against this quota, so programmers should be mindful to explicitly disable them instead of moving them outside the screen.
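As a rough illustration of the formula, the sketch below counts how many sprites fit into a given cycle budget. The Sprite struct and both function names are hypothetical, and it simplifies things by only counting sprites that fit completely instead of cutting one off halfway.

```cpp
#include <vector>

// Hypothetical sprite description, reduced to what the formula needs.
struct Sprite {
    int width;    // rendered width in pixels
    bool affine;  // true if the sprite uses an affine transformation
};

// Render cost of a single sprite on one scanline.
int renderCycles(const Sprite& sprite) {
    return sprite.affine ? 2 * sprite.width + 10 : sprite.width;
}

// Number of sprites, taken in order, that fit completely into the budget.
int spritesWithinBudget(const std::vector<Sprite>& sprites, int budget) {
    int used = 0;
    int count = 0;
    for (const Sprite& sprite : sprites) {
        used += renderCycles(sprite);
        if (used > budget)
            break;
        ++count;
    }
    return count;
}
```

A 64 pixel wide affine sprite costs 138 cycles, so a budget of 1210 cycles has room for eight of them and a budget of 954 cycles only for six.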

Real-Time Clock

The next thing I want to talk about is an actual new feature of the emulator. If you own a third-generation Pokémon game and start it, you will notice that it complains about a drained battery. Time-based events will no longer work because the battery that powers its internal real-time clock ran dry.

  • Empty battery warning
  • Ever saw that back then? (bonus)

Until recently, eggvance perfectly emulated old Pokémon cartridges in the sense that neither RTC worked. It was a feature, I swear. The RTC is connected to three of the four GamePak GPIO pins as follows:

  • Serial Clock (SCK) at bit 0 of the GPIO data port (0x80000C4)
  • Serial Input/Output (SIO) at bit 1 of the GPIO data port
  • Chip Select (CS) at bit 2 of the GPIO data port

A typical transfer looks like this:

  1. Set CS=0 and SCK=1
  2. Wait for a rising CS edge
  3. Receive command byte (described below)
  4. Send/receive the data bytes that belong to the command
  5. Wait for a falling CS edge

Receiving a command byte looks like this:

  1. Wait for a rising SCK edge
  2. Read SIO bit
  3. Repeat until a byte has been transferred
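The byte-level flow above can be sketched as a tiny state machine that is fed the pin levels after every port write. The class name is made up, and the bit order (least significant bit first) is an assumption for the sake of the example, so check it against GBATEK before relying on it.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper that assembles one byte from the SIO pin.
class SerialByteReceiver {
public:
    // Feed the current SCK and SIO levels after every port write.
    // Returns the byte once eight bits have been collected.
    std::optional<std::uint8_t> update(bool sck, bool sio) {
        const bool rising = !prev_sck_ && sck;
        prev_sck_ = sck;
        if (!rising)
            return std::nullopt;

        // Assumption: the least significant bit arrives first.
        value_ = static_cast<std::uint8_t>(
            value_ | (static_cast<unsigned>(sio) << count_));
        if (++count_ < 8)
            return std::nullopt;

        const std::uint8_t byte = value_;
        value_ = 0;
        count_ = 0;
        return byte;
    }

private:
    bool prev_sck_ = true;  // transfers start with SCK high
    std::uint8_t value_ = 0;
    int count_ = 0;
};
```

The caller pulls SCK low, puts the next bit on SIO, and raises SCK again. After the eighth rising edge, update returns the assembled byte and the receiver is ready for the next one.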

Combining these two flows allows us to implement a functioning RTC. The documentation in GBATEK can be quite confusing in that regard because it first describes the NDS RTC and then only how the GBA one differs from it. Once everything had been put into place, I was able to grow berries in Pokémon Emerald.

  • Saplings planted
  • Time to harvest

I later stumbled across a NanoBoyAdvance issue reported by Robert Peip, which mentions that the RTC doesn't work in Sennen Kazoku. The game boots and then shows an error screen mentioning "broken clock equipment". I tested it in my emulator and observed the same behavior.

  • Complaints about a bad RTC
  • Fixed and ready for the intro

Debugging the game showed that Sennen Kazoku doesn't set SCK high in step one of the transfer sequence. I removed that condition and then everything worked as expected. Other games with RTCs continued to work with that change, so I kept it.

 switch (state) {
   case State::InitOne:
-    if (port.cs.low() && port.sck.high())
+    if (port.cs.low())
       setState(State::InitTwo);
     break;
   // ...
 }

Accuracy Improvements

I mentioned three remaining things in the last release post: RTC emulation, improved accuracy and audio. With RTC off the list, there was only one thing left before I could start implementing audio. Even though eggvance was quite accurate, it had some problems in the timing section because it didn't emulate the prefetch buffer.

Here are some of the things I implemented/changed:

  • DMA bus
  • Memory improvements
  • Interrupt delay
  • Timer delay
  • Prefetch emulation (not perfect)

And the resulting mGBA suite coverage compared to other established emulators:

Test            eggvance 0.2   eggvance 0.3   mGBA 0.8.4   higan v115   Total
Memory          1456           1552           1552         1552         1552
I/O read        123            123            114          123          123
Timing          404            1496           1520         1424         1660
Timer count-up  365            496            610          449          936
Timer IRQ       28             65             70           36           90
Shifter         140            140            132          132          140
Carry           93             93             93           93           93
Multiply long   52             52             52           52           72
DMA             1048           1220           1232         1136         1256
Edge case       1              2              6            1            10

I was happy to finally have something you could call relatively cycle-accurate. But it came at a cost. Prefetch emulation tanked performance, going from 635 fps in the Pokémon Emerald hometown down to a mere 485 fps. I was shocked, but the issue turned out to be easier to fix than expected. The MSVC optimizer just didn't inline the prefetch code.

That might not sound like a problem until you realize that we are on the hottest of paths out there. It gets called millions of times per second, so eliminating the function call overhead is very important. After force-inlining it, I was back at 575 fps, which is a good value. My goal is to finish the emulator at something around the 500 fps mark for demanding games, meaning the ones that don't utilize the CPU's halt functionality. I am looking at you, Game Freak devs.
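For reference, force-inlining usually looks something like the sketch below. The macro name and the addCycles function are placeholders I made up for this example, while __forceinline (MSVC) and __attribute__((always_inline)) (GCC and Clang) are the actual compiler keywords.

```cpp
// Hypothetical macro name; the compiler-specific keywords are real.
#if defined(_MSC_VER)
#  define EGG_FORCE_INLINE __forceinline
#elif defined(__GNUC__)  // also defined by Clang
#  define EGG_FORCE_INLINE inline __attribute__((always_inline))
#else
#  define EGG_FORCE_INLINE inline
#endif

// Stand-in for a small function on the hot path.
EGG_FORCE_INLINE int addCycles(int total, int cycles) {
    return total + cycles;
}
```

The plain inline keyword is only a hint that the optimizer is free to ignore, which is why the compiler-specific variants are needed on a path this hot.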

Sound?

I love my writing efficiency. I began this progress report at the start of January, with all the previous topics outlined as bullet points. Then I continued working on my emulator, implemented the FIFO channels relatively quickly, and decided to merge them. And then the square channels. And then the wave channel. And then the noise channel. And now I'm here with a well-working APU/DSP, but it was never supposed to be a part of this report.

Intro sequence of Rhythm Tengoku with some nice stereo

I'll write another one where I describe the sound basics and also give some examples. Most of the GBA's sound is composed of FIFO samples, so it's hard to show all audio channels in action and how they are combined into a nice result.

Final Words

That's it. I'm done. The roadmap for this year is:

  • Implement all sound channels
  • Optimize sound
  • Implement a scheduler
  • Improve AGS coverage
  • Implement more features (better config, save states, whatever)
  • Final code cleanup

Then I will put this project to rest and dive into something new. I thought about writing a classic Game Boy emulator, which shouldn't take more than a month because the GBA is a supercharged version of that, and most code could be reused. Another nice thing would be an NES emulator, which was my first console back in the day. I also thought about jumping into NDS emulation, but I'm not sure if I'm good enough for that. We'll see…