Chris White - Emulation & Decompilation

Monday, February 22, 2010

C Compilers

Following my previous post, where I questioned whether Gauntlet's code was written in C as opposed to assembler, I've come to the realisation that OutRun's slave CPU code was probably generated by a compiler and is not hand-written assembler.

The slave CPU controls the road rendering: it generates the curves, height variation, road splitting and appearance from the level data.

This has added an extra layer of obfuscation to the decompilation, as the resulting code is less logical and harder to follow. Maybe AM2 had spare CPU cycles to play with, and decided to simplify the source code for this complex area by writing it in C, as opposed to the assembly code used for the main CPU.

One of the first instructions sets one of the address registers to point at the start of RAM:

lea ($60000).l,a5

The a5 register is never changed afterwards, and the code leans heavily on this block of memory in places where a data register would be much faster:

move.l  d1,$712(a5)
; ...
add.l   $712(a5),d2     ; Why not use add.l d1,d2?
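To show why this pattern smells like compiler output to me, here is a minimal C sketch. It is my own illustration, not recovered source. It assumes the long at $712(a5) is an ordinary global and that d1 is not modified in the elided instructions. The names work_ram, scratch, add_via_memory and add_via_register are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical block of globals; on the real hardware a5 would point at it. */
struct work_ram {
    uint32_t scratch;   /* stands in for the long at $712(a5) */
};

static struct work_ram ram;
static struct work_ram *const a5 = &ram;   /* fixed base pointer, never reassigned */

/* Statement-by-statement translation: store the value, then reload it. */
uint32_t add_via_memory(uint32_t d1, uint32_t d2)
{
    a5->scratch = d1;      /* move.l d1,$712(a5) */
    d2 += a5->scratch;     /* add.l  $712(a5),d2 */
    return d2;
}

/* What a hand coder would probably write while d1 is still live. */
uint32_t add_via_register(uint32_t d1, uint32_t d2)
{
    d2 += d1;              /* add.l d1,d2 */
    return d2;
}
```

Both functions return the same value. The first one simply pays for an extra memory round trip, which is the kind of thing a simple compiler does when it treats every statement in isolation.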

And then there are blocks of code that are pure spaghetti or serve no purpose. I don't know much about compilers, but if this is compiler output, the ones available in 1986 produced dreadful code.

ROM:00001C80 tst.w   $720(a5)
ROM:00001C84 beq.w   *+4          ; What is this here for?
ROM:00001C88 addi.w  #$100,d1
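The branch can be modelled in a few lines of C. This is only a sketch. It assumes that "*" in the listing means the address of the beq.w instruction itself, and that beq.w occupies four bytes (an opcode word plus a 16-bit displacement). The function names are hypothetical.

```c
#include <stdint.h>

/* Address that execution continues at after the beq.w at 0x1C84. */
uint32_t next_pc_after_beq(uint16_t word_at_720)
{
    const uint32_t beq_addr = 0x1C84;
    const uint32_t beq_size = 4;            /* opcode word + 16-bit displacement */
    const uint32_t target   = beq_addr + 4; /* "*+4": four bytes past the beq */
    int z = (word_at_720 == 0);             /* tst.w sets Z when the operand is zero */

    return z ? target : beq_addr + beq_size; /* branch taken vs fall through */
}

/* Runs the three-instruction snippet and returns the new low word of d1. */
uint16_t run_snippet(uint16_t word_at_720, uint16_t d1)
{
    uint32_t pc = next_pc_after_beq(word_at_720);

    if (pc == 0x1C88)
        d1 = (uint16_t)(d1 + 0x100);        /* addi.w #$100,d1 */
    return d1;
}
```

Under those assumptions, both outcomes of the test land on 0x1C88, so the addi.w always executes and the tst.w/beq.w pair has no effect on the result.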

On the other hand, a quick look at Gauntlet's code in MAME's debugger does not seem to yield equal levels of insanity. Maybe Gauntlet was coded in assembler after all.

I'd like to know what compiler Sega/AM2 were using. Does anyone know if it's possible to determine from a signature in the code? Is C the likely source language?

6 Comments:

  • This notation is used a lot in 68000 code. I remember using it on the Atari ST and Amiga. It is used to make the code "PC relative" depending on where the operating system loads the code into memory.

    I guess they were assembling the code and testing it on a test rig, not burning it to a ROM every time, so they'd normally use lea start(pc),a5 during testing, where "start" is a label in the assembler before the first line of code. For the arcade hardware they'd need to use absolute addressing, so they'd use lea ($60000).l,a5. It saves having to use absolute addresses everywhere. They probably had an IFDEF for their build mode.

    ROM:00001C80 tst.w $720(a5)
    ROM:00001C84 beq.w *+4 ; What is this here for?

    tst.w sets the Z flag, so if 0 then skip else addi.w #$100,d1

    If they did use a C compiler then that's pretty efficient for an if-then-else.

    By Blogger Michael, at 3 March 2010 11:27  

  • Thanks for the reply.

    I understand the notation and it has its uses, but unless I'm mistaken it doesn't make sense in this situation. (This is just one of many examples of bizarre code the sub CPU runs; there are thousands of lines riddled with weirdness.)

    The BEQ simply jumps to 0x1C88 depending on the zero flag as you point out. However, the code will fall through to the same address regardless of whether the zero flag is set. Make sense?

    By Blogger yt, at 3 March 2010 11:38  

  • One other point that leads me to the compiler conclusion is that the main CPU code is written in a different style.

    It's perfectly logical, there is virtually no redundant code and it's clearly been written by hand in assembler. So far the main CPU code has been relatively painless to work with.

    The sub CPU, on the other hand, is littered with dead code, duplicated code and misses basic optimisations that even a beginner would spot. The structure and organisation of the routines doesn't make a lot of sense even when you figure out how they work... you end up thinking "why code it like that?!"

    On the plus side I'm nearly done with the sub CPU. :)

    By Blogger yt, at 3 March 2010 12:02  

  • RE:
    ROM:00001C80 tst.w $720(a5)

    However the conditional ended up in the final assembled code, it perhaps indicates that the routine (or that address) previously served as a testing point to isolate portions of code and is now no longer relevant.

    The dead code (i.e. classic NOP instructions) is a good indicator of a C compiler, or maybe it is there to align the code to even addresses too.

    Michael's posting is absolutely spot on regarding the use of (a5).

    Another example on the Amiga would be:
    lea $dff000,a5 (base address of custom registers)

    then move data indirectly:
    move.w #$fff,$180(a5) as opposed to move.w #$fff,$dff180


    If that "beq.w" *is* C compiled, that is quite efficient, optimised code, as a compiler would probably have used a CMP.W then a BEQ, adding extra instruction cycles.

    I have an old archived website resource that has some 68000 reference material.
    http://amigarealm.whdownload.com/computing/nkb14.htm (see the PDF file)

    Regards.

    By Blogger Paul Andrews Jr., at 4 March 2010 18:30  

  • This post has been removed by the author.

    By Blogger yt, at 4 March 2010 20:01  

  • Ah - interesting stuff. Yes, what Michael said about the PC relative code makes sense. It was the redundant BEQ which confused me more, and your idea about the testing point is definitely a good one.

    I guess the road generation code could easily be reused from a previous Sega title as well.

    Thanks for the link.

    By Blogger yt, at 4 March 2010 20:02  
