Mark's Project Blog: Petris

Petris now has a theme song, courtesy of my niece. :)

I asked for some help from my nieces to come up with a small bit of music for the game, since I'm not much of a composer myself. After I described the constraints of the NES to them (two square waves, one triangle, one noise), my oldest niece was able to get it to me as a recording she put together on an iPad; I just had to get it transcribed into a format the NES would accept and a music engine to play it.

Enter FamiStudio.

FamiStudio

FamiStudio is a nifty combination of tool and framework for constructing NES music and sound effects. It consists of a couple pieces:

A desktop application for composing music or sounds against the NES hardware
Several audio engines in assembly that can be embedded in an existing NES source code

The system supports both the in-hardware features of the chip and a few software-implemented effects (such as arpeggio). Music is broken down into "patterns" and "instruments," where patterns allow for saving memory by repeating the same musical phrase and instruments combine "voicing" rules with notes to create various timbres the sound circuitry can support. I put together a FamiStudio project and dove in.

My niece gave me a fairly simple piano-and-drum piece, so I only needed two instruments for it. For the drum, I used a bassy frequency and tweaked the volume envelope to give a sharp hit with a quick die-off. The volume is quiet for one frame to give a "squishy" feel, which sounds more bass (soft mallet hitting a big membrane, maybe? I don't know much music physics ;) ).

To get the sound a bit more "drummy," I used the arpeggio effect to tweak the noise frequency higher for two frames, then back down to the selected frequency. It puts a bit of hiss on the front, like a metal rattle that dies off; not quite snare drum, but a bit more color than a flat "boomph" sound.

The square wave carries the two-note melody and only has a few tweaks. Duty cycle selects what percent of time the square wave is "on" and can vary between 12.5%, 25%, 50%, or 75% (which is just 25% again; human ear can't tell the difference). 50% gives me a good old-fashioned "pure" square wave with no flavor, which is very piano. For volume, I went with a sharp attack and a long-tail decay; no deep thought on this I just doodled on the waveform until it "felt" good.

Now that the song exists, I can export it as FamiStudio Music Code using the NESASM format, since Petris is written in nbasic and nbasic includes an escape hatch to NESASM. The result is a data file containing a representation of the music, but not the commands to play it.

Halfway there

The nbasic language doesn't include a "file import" operator, so I copied-and-pasted the content of that file between asm and endasm directives. Now, I just needed some code to play it.

FamiStudio Sound Engine

FamiStudio's GitHub repository includes an implementation of the audio engine in several different NES assembly languages. I grabbed the one for NESASM and pasted it directly into Petris's source file. The audio engine is quite featureful, and quite large; everything I'd written for Petris to that point now composes 19% of the soruce file. So I compiled and..... The assembler choked.

So I'm not sure if I'm using an older version of NESASM (version 3.1) or if FamiStudio was built against a newer version, but NESASM itself has a hard limit of about 32 characters worth of label that the FamiStudio Sound Engine completely ignores. I made and documented 14 search-replaces to trim the label lengths down to something acceptable to NESASM, and we were off to the races.

To connect the audio engine to my game, I had to run some procedures exposed by the engine. The engine documents these pretty well, but a quick summary follows. Since we're operating at assembly level and don't have a standard function-call protocol, each function documents what the a, x, and y registers have to be set to for the function to work.

famistudio_init sets everyting up. It accepts a selector of 0 for PAL timing and 1 for NTSC timing (a), and a pointer to the label at the top of the music data spit out by FamiStudio (low-byte x, high-byte y; in my case, that's untitled_music_data.
famistudio_music_play begins playing a song. It takes the index of the song to play (a); I only have one, so I just had to load a 0 into the accumulator.
famistudio_music_stop kills playback and takes no input params
famistudio_update is a "meat and potatoes" function that must be called once per animation frame (i.e. once per non-maskable interrupt cycle) to allow the engine to move the music forward. More on this later.

The nbasic language is very good about bridging to assembly, so calling these was no more complex than throwing down a small wrapper subroutine to make juggling variables easy.

It's almost certainly overkill to save all the registers, but no harm done

I added gosubs to these routines to the relevant locations, through a call to the updater into my nmi interrupt logic, and... Started an earthquake.

Here's what happened. I made the mistake of adding the update function into my code here, after drawing is completed:

You feel like you're gonna have a bad time...

Problem is, that means I'm running the audio update loop inside the vertical blanking period (the NMI fires at the start of vertical blank. This is, generally, setting yourself up for failure; if the audio processing takes so long that the vblank period ends, any more manipulation of the PPU before the next NMI fires is going to fight the PPU itself for control of state, as the PPU uses its memory latches to generate a frame of video output. Specifically in my case, vwait_end does some cleanup and sets the scroll register correctly, so firing it after vblank ends causes offset artifacts as video rendering continues with the wrong scroll value.

Fortunately, the solution is real straightforward: don't do that.

Now we're safely setting up audio while the PPU is doing the heavy-lifting of emitting one frame of video, and life goes on.

Lessons Learned

This was a fun exercise to finish, and I'm honestly very surprised at how good the public tooling is for NES audio. The audio engine available under the MIT license was a huge win, and I want to extend a huge thank you to Mathieu Gauthier (a.k.a. BleuBleu) for making that exist. He doesn't appear to have a Patreon, so drop a note on one of his video tutorials and tell him what a cool guy he is.

A few more thoughts about creating a NES game with nbasic (see previous story http://fixermark.blogspot.com/2020/08/petris-creating-nes-game.html for the game itself).

Code to read state of the NES controllers

I chose to use nbasic for this game because I had experience with it in the past; in college, I participated in a student-taught course run by the author of nbasic, which was excellent. I still use the course's site as a reference for "toe-dipping" into NES programming (along with the vast nesdev wiki). I and a team of 2 others created a small Zelda knock-off that is, unfortunately, lost to history. But using nbasic was a positive experience, so I sought it out again when I got the bug in my brain to try and write another NES game.

I'll talk a bit about the structure of Petris's code, then about my experience working with nbasic, and my plans moving forward.

A Dive Into Petris's Code

Roughly speaking, Petris is divided into initialization, game phase loops, game logic code, PPU control code (i.e. render logic), a big chunk of raw data to control what is shown on screen, and a mandatory footer. The header and footer are still cargo-cult code for me; I started from the Starter Source on the nbasic site and expanded out from there (leaning on examples also found on the course site as I went).

Initialization

Initialization begins at the start: label, where we switch off the PPU, do some things that are heavily recommended as best-practice, and configure the PPU as we wish to use it later. Disabling the PPU puts nothing on the screen, but guarantees draw behavior won't interfere with heavy-duty memory manipulation for the PPU, as the CPU and PPU both use the same latches for manipulating memory. It's useful for setup.

We then declare a set of arrays of various sizes. Nbasic allows for arrays to be declared that can be fit anywhere, or at specific locations (this is useful for naming dedicated memory addresses), or in "zeropage" memory (it's a quirk of the 6502 CPU design that RAM with a 0x00 high-order-byte is faster to access, because special opcodes can be used to fetch it with a one-byte address identifier). The design of nbasic doesn't lend itself super-well to structures, so instead of, for example, having a structure encapsulating the current controller values, values from the previous frame, and "rising edge buttons" (i.e. button-down events on this frame), I have three separate arrays, each of length 4 for four controllers.

The last step, at initgame:, is prepping up game-state logic: timers, sprite initial state (position and animation frame), scores, and a playing flag indicating the game is in progress. Then a bunch of subroutines are called to draw the game into PPU memory (since it's a simple one-screen game, there isn't any scrolling and scroll logic is used just to flip between the main game screen and the "Player wins!" screen).

Game Phase Loops

The game has, broadly, four phases: wait for start, countdown to play, play game, and player-win screen. A series of four blocks of code handles these. Each is structured basically the same way:

Wait for NMI ("non-maskable-interrupt", the signal used for, among other things, telling the CPU that the PPU is entering vertical-blanking mode and it's safe to mess with PPU memory)
Update the PPU to move things around on the screen
Use the vwait_end subroutine to wait for the PPU to start drawing things on screen, which is a good time to run game logic
Do game logic

Game logic varies from phase to phase: some just run counters, some ask for updates from controllers. The playing flag is set while the main-game phase is running, which other subroutines can use to decide if we're actually in the game phase.

Game Logic Code

Various subroutines in this area handle things the game needs to do to maintain game state, including reading controllers, updating scores, and updating logic that drives animations. Some of these subroutines take input in the form of assuming nbasic named variables are set (I could possibly have used the stack for this, but see "Using nbasic" for why I chose not to). They also use named variables to maintain internal state, which caused me to trip over a bug in nesasm; nesasm tops out at 32 characters for any label (a simple buffer-overrun protection), but it handles long labels by ceasing to read characters, not by throwing an error. This approach is memory-wasteful, in that each subroutine with internal state is going to take a couple bytes of memory that nothing else can use (but the alternative is coming up with a clever protocol for re-using bytes, which risks having two subroutines "cross talk" and tap-dance on each other's internal state).

Of particular note here is load_joysticks: and detect_edge:, which populates joy_edge, an array of bits representing which buttons are freshly-pressed this detection cycle. This is needed for distinguishing new taps from buttons held down (scoring is done at the rate of 1 point per fresh tap, to get that old-school "hammer on the controller for more points" feel).

This is also where animate lives, which is a simple routine for driving "animations" in the form of x and y offsets to a sprite over time. Each sprite's position is controlled by two value sets: the actual displayed x and y position (which lives in the sprite_mem array and is flashed into the PPU every redraw) and the logical x and y position (which live in sprite_x and sprite_y). The sprite_anim_idx value is an offset into an animation data field that drives animation; every frame, the sprite's x and y offset are read from animation at sprite_anim_idx, added to sprite_x and sprite_y, and the result stored to the on-screen position in sprite_mem. Then sprite_anim_idx is rolled forward, and if it points to the sentinel value 0x80, the animation is done. This is the beginning of a mechanism that could allow for more complex animations (for example, we could add changing the displayed CHR for the sprite, or an additional sentinel value could be added to create looping animations).

Squirreled away under the PPU control code and before the raw data is a bit more game logic that includes a shamelessly-stolen randomizer. The NES itself doesn't have a "true" source of entropy (i.e. there's no built-in battery-backed clock, so without battery-backed memory, the NES's operation is fully deterministic from powerup based on code and controller inputs). The game uses the time between start and player 1 pressing 'A' to start the game to seed a random number generator driving where the dog wants to be pet.

PPU Control Code

This code controls moving things to the PPU for display. Huge, repetitive chunks of this code are loading and storing slabs of images on screen. There's plenty of room for improvement, and I sort of got better over time as I had to write more (for example, I'm not at all proud of load_dog in its current form, which is split up to account for the limitations of CPU speed to make room for screen redraws between loading chunks of the dog). In the dog-loading code, I also accounted for a limitation of the 6502 CPU architecture: when loading data from an absolute memory offset, the 8-bit math only allows at most 256 bytes of data to be addressed from one absolute offset. There are much better ways to design this (the 6502 also allows "indirect indexed" and "indexed indirect" memory access, which reads and writes values based on two bytes in zero-page memory and is the closest thing to modern pointer arithmetic the 6502 offers). The nbasic language itself doesn't have a syntax to take advantage of this feature (as far as I could tell), and I shied away from using direct assembly (for reasons I describe in "Using nbasic"), so I repeated a lot of simple "ripper logic" loops for the sole purpose of loading the same kind of thing from different points in memory.

Raw Data

The wide blocks of raw data are mostly used to map objects in the game to their positions on screen and the CHR tiles that make up their images. Generally, these come in two flavors:

starting PPU Y and X address, first CHR, and how many CHRs make up the image (mostly for text)
starting PPU Y and X address and a list of CHR bytes making up the image horizontally (for big things like the dog)

Finally, the game palette is configured to give some nice colors for the dog and four distinct player colors for the hands.

Were I to write this all over again, indirect-indexed addressing could really cut down on the amount of repeititon here, as well as in the PPU control code (possibly at the cost of speed, since the indirect CPU operations cost more clock cycles... But this is not a game in need of high-performance tuning!).

Using nbasic

Overall, I found nbasic to be relatively straightforward and much preferred to raw assembly-bashing; it makes some things easier and a few things maybe a little harder

Pros

The language really simplifies some things that would require verbose proper form in assembly. In particular, abstracting indirect memory access as arrays and simple arithmetic as prefix-based expressions makes that logic much easier to read than gleaning the meaning of blobs of assembly. I found I would quickly fall into a "flow" of bashing out new segments of the code.
The language isn't far abstracted from the underlying assembly; in places where it did something surprising, taking the compiler's assembly dump and matching statements up against the statements in nbasic was very easy. For something like this, where individual bytes can matter (for both space and CPU cycles), that lack of indirection on the abstraction is helpful.

Cons

I tripped over a couple of bugs in the implementation. One I was able to push a fix for; the others were more challenging to reproduce and included an issue mixing binary, hex, and decimal representations of numbers in one data statement. There are also a few cases where nbasic will emit code that doesn't pass the assembler (for example, the branchto operator doesn't confirm whether or not the branch distance is too far, trusting the assembler to know). These issues were minor and relatively easy to work around.
While nbasic allows users to touch the three 6502 registers directly (i.e. a, x, and y refer to the actual registers), the way they interact with nbasic code is ill-specified and very implementation-dependent. In practice, I found it safest to use assembly directly when I manipulated those variables.
There is no concept of functions or local state in nbasic. This could be considered both a pro and a con; the NES architecture is small, and cycles matter. Managing internal state like that costs resources. However, a lightweight function or local-state abstraction would be nice-to-have.
As far as I can tell, nbasic doesn't really allow for "full" pointer arithmetic; memory can be accessed by offset from absolute addresses, but accessing it via "the 2-byte address stored at this address" seems to be nonexistent (unless I missed it).
Partially because of the difficulty of doing pointer arithmetic, one of the bits of toil coding in this environment is repeating functions with minor tweaks, such as changing the labeled input. The ability to do templates or macro expansions would be valuable for decreasing the amount of repetition needed in the code.

Possible Future Extensions

If I were to extend nbasic, the main thing I'd add is a function-call operation. The 6502 has a stack abstraction that makes a push-call-pop argument-passing interface extremely possible. It could be a fun project to extend nbasic in this way. Local state could be a bit trickier, but coupled wit h function calls, the notion of a local variable can be managed by pushing them to the stack when a call occurs.

An abstraction for pointer arithmetic would also be valuable. I can imagine doing it by providing a simple abstraction for setting two subsequent bytes of zero-page memory to a labeled address, then allowing the [array offset] accessor to reference them. It's possible one would even want those two-byte zero-page values to have a special name (maybe a new pointer declaration, like the existing array declaration) to denote their purpose?

What is next?

There are other languages available for programming the 6502, and I'm interested in checking them out. For my next project, I plan to explore the CO2 language, which is a LISP-based system created to build the game "What Remains." I'll be interested to see how a more abstract language changes the game of making the game.

Mark's Project Blog

Thursday, November 5, 2020

Adding music to a NES game