Friday, September 25, 2020

Handwave code and using co2

Let's break down the code in Handwave to see how co2 works as a NES game development language.

Controller logic code in Handwave
Code to read state of the NES controllers



Here, I'll talk a bit about Handwave's code and some of my experiences working with co2 as a programming language.

A Dive Into Handwave's Code

Not unlike my previous game, Handwave is broken up into initialization, game phase loops, game logic, controller logic, PPU control code, and raw data. Additionally, there is audio logic to control the APU, and two large state machines to read the music data (one to convert the data to notes on the screen, and the other to drive detection of when notes should be played and trigger the APU correctly). I was able to get started from co2's packaged-in example code and leaned on some snippets of What Remains's source code for examples and language quirks.

The source code is here for following along.

Initialization

At the top of the file is a nes-header macro that declares some "hardware" configuration (this basically tells emulators what sort of cartridge this game would have run on, were it a real cart). We then have a set of defaddr and defconst declarations to name things. I absolutely love that this language allows for constant declarations; takes a lot of cognitive load off of managing the code to be able to name things!

The game's code code essentially has two entrypoints defined by the (defvector) declaractions: reset is the code run when the system is powered on (or the reset button is pressed), and nmi is the code run every time the vertical blank signal (i.e. a "non-maskable-interrupt") is sent to indicate we're between frames of video. In reset, we do the regular setup for preparing the game (initializing variables and rendering an initial screen to the PPU). Several init- and load- subroutines live here, which prep various pieces with various patterns of data. co2 provides several convenience routines in its "standard library," as it were, that simplify this process; ppu-memset is sugar on top of initializing a range of PPU with a constant value (by storing an address to the PPU_ADDR address and then looping across storing data to the PPU_DATA address); ppu-memcpy is similar sugar for ripping a chunk of RAM into the PPU. A variety of set-sprite- methods do the math to set pieces of sprite data (assuming the sprite data is positioned at the recommended #x200 address).

Game Phase Loops

At the moment, the game has two phases: waiting for start, and playing. Since there are only two, a simple g-playing global variable tracks which one we're in. Every nmi, we first update everything that requires the PPU to be quiescent (;;; TIMING CRITICAL CODE), then we drop into the relevant loop depending on mode.

In the waiting-to-play mode, we listen for buttons to indicate players want to play (check-wave-activations), listen for "netcode mode" to be toggled (toggle-netcode), clear the logged-in players if player 1 hits select (clear-active-handwaves), and start when start is pressed (by setting g-playing to #t). "Netcode mode" is a feature I added to make the game easier to play on multiple machines over a network, which I'll explain in a future post.

Play mode scrolls the notes, checks to see if players are playing notes, and runs the animation loop for moving sprites around.

Game Logic

Several pieces of game logic live here, split into their own subroutines. (check-wave-activations) looks to see if controller buttons have been pushed to sound the handwave "bells". Any buttons pressed trigger (on-wave-triggered), which either logs the player in (if we're not g-playing) or plays the note (there are two ways notes play, depending on whether we're in "netcode mode" or not).

There's a utility function buried in here that I want to highlight: (roll-left-n). This sub gets around a small "bug" (really, a lack of feature) in co2: the left-shift operator (<<) can only accept a constant value as the number of bits to shift (note: this is still more functionality than raw 6502 assembly gives us; the underlying ASL operator always shifts left only one bit!). This utility function accepts a second argument and uses it as the number of bits to shift. The name isn't great; I forgot that ROL is also a 6502 assembly operator, which "rolls" a field of 8 bits (moves the MSB to the LSB and moves every other bit one towards MSB). I like now easy it is in co2 to lay down subroutines like this; now that I have it, it's just as easy to use as the existing (<<) routine (though a lot more CPU-intense).

(handle-next-draw-note) is one of the two "parsers" for the music representation language I created. The language consists of bytes that are either "nodes" or "directives." A "node" guides how music is played, and can be a note or a rest; a "directive" controls the musical staff. It currently supports ending the song, but in the future it can be used to indicate changing the voicing of a handwave. The format is detailed in the area headed by ;;;; SONG DATA. This parser is called once every "beat" ( about 8 frames of animation); it consumes 1 or more bytes of music information to determine what notes should be rendered at the right edge of the screen. The return value is a state machine flag (either read another byte or pause reading for n beats to rest) and 16 bits indicating which notes should be drawn; input to the function is 16 bits serving as accumulators for the notes to be drawn. A global, g-song-render-index, tracks where in memory we are.

Pointers are a very cool feature of co2; they smooth over the fact that in the 6502, arithmetic registers are 8-bits wide but the address space is 16 bits. Several specialized opcodes in the 6502 allow you to do indirect indexed lookup of memory by treating two sequential bytes in "zero-page" memory (the addresses $00-$FF) as one 16-bit address. So co2 has utility functions to represent a single 16-bit zero-page value and to use those values to "peek" and "poke" (read and write) memory and to increment the pointer (taking care of the overflow in the math to step the MSB of the 16-bit address when the LSB overflows). Handwave uses two pointers to track song progression: the g-song-render-index draws notes, and the g-song-play-index handles audio playback, 27 beats behind the first pointer (to give notes time to crawl right-to-left across the screen).

Shortly after (handle-next-draw-note) is (handle-next-play-note). It's very similar in structure, but instead of adjusting PPU state, it determines if the player is trying to sound the note and adjusts the audio state. The state machine itself is basically the same, and it also takes in two 8-bit accumulators to track the state of all 16 playable "handwaves." There is some fanciness to account for netcode mode; outside of that mode, the logic determines if a note is for a handwave not controlled by a player and auto-plays it, but inside of that mode it also tracks whether the player is trying to play a note and might be lagging.

Controller Logic

The (read-joypads) sub pulls data from all four controllers and updates a set of global variables:
  1. pad-data, which tracks which buttons were pressed
  2. pad-data-last-frame, which tracks button presses on the previous run of read-joypads
  3. pad-press, which is the buttons that are newly-pressed (i.e. "button down" events)
This data is read from (button-pressed), which gives a few into the buttons newly-pressed this read.

Graphics Logic

(plot-notes) draws 16 notes onto the column of the staff just off-screen, using (find-scroll-edge) to locate which column should be updated. Next, there's an (anim-sprites) routine. This is a rewrite of a similar routine I built for Petris; it runs through some small lists of offsets to sprite position each frame to "bump" the sprite until a "stop" indicator of #xFF is reached, which ends the animation. Setting a memory location to an index of one of the animation offsets starts applying the animation to the sprite the next time the subroutine is called.

Audio Routines

The NES audio toolkit consists of five waveform generators and a simple mixer to combine them. Handwave currently uses only the two square wave ("pulse") generators (the sawtooth generator has no volume control, which makes it a poor fit for a bell-like sound). The audio processor provides a limited amount of automation, of a sort; it's capable of automatically tracking note play duration, a pitch-bend effect, and a "fade" effect (which we use to give a nice bell sound). Timbre can also be adjusted by tweaking the duty cycle; the square wave can be "on" for 12%, 25%, 50%, or 75% of the time. We use a 50% pulse width and a volume-decay of 3, so the envelope logic diminishes volume once per 3+1 = 4 quarter-frames, i.e. once per audio frame (audio runs at about 60Hz timing on frames); since the decay level steps from 15 to 0, the sound decays over 15 frames, which is about 1/4 second.

For Handwave, notes are played by (play-note). We set the pitch from a precomputed lookup table for the 16 notes (calculated by the logic in song.scm, which I'll explain in a future post). A simple flip-flop value in g-which-pulse alternates between 0 and 1 every time a note is played to choose whether it's played on the first or second pulse generator (so up to 2 notes can sound simultaneously).


Data Section

The final portion of the code hard-codes various data values, including which buttons play which handwaves, the note pitches, the palette configs for sprites and background, the animation for the handwave icons to "bump" when they're played, and the song data itself.

How NES programming differs from modern web apps

My day job is to do user interfaces for web applications. One of the things I enjoy about NES programming is that the discipline itself has different best practices; I like working on something that makes me shift my focus and remember there are other modes of development. Some big differences:
  • You own the world: Modern app development often has to account for the possibility that the user is doing multiple things. In a web app, the user could close or reload the page at any time, or could put the whole machine to sleep or disconnect the network. None of that is a real concern in a NES game; nothing else is fighting for your resources, the entire CPU, graphics, and audio systems are your own, and if the user resets, it's because they want to restart the game. Battery-backed games can find the need to care about saving and restoring state, but simple games don't worry.

  • Pointer arithmetic is different from data arithmetic: In almost all modern programming on a system of at least the complexity of a CPU, the width of data and a memory address are the same. All general-purpose-computer architectures in this half-decade are using 64-bit address space and 64-bit arithmetic (at least!). And addresses are unlikely to get larger (2^64 is enough indexes to assign a unique number to every grain of sand on every beach on Earth, with a bit left over... that "bit" being "a second Earth's beaches"). It's extremely convenient for memory addresses to be smaller than (or the same size) as numbers the CPU can do math on; that means "pointer arithmetic" is just regular arithmetic.

    The 6502-series microprocessor in the NES uses 8 bits for arithmetic and 16 for addresses. Address math requires setup, relative to regular math. And the CPU has a whole set of special (slower) opcodes to allow for doing operations on memory locations referenced by 2-byte "pointers"---as long as those pointers live at a memory address between #x00 and #xFF. Fortunately, co2 abstracts much of this with special pointer functions.

  • Global variables are best practice: In big systems applications, context and local state is preferred over global variables because "global" means "anything can read or write it." That gets messy fast when you have multiple components touching multiple pieces; it quickly becomes a system where nobody knows what's going on.

    In a small system like a NES game, we don't have the luxury of context. The stack is extremely shallow and intended mostly for remembering subroutine addresses. And as we've seen, scrubbing back and forth across memory takes extra setup and precious CPU cycles. So for many things (especially game state and position of things on screen), global variables rule the day. There's no reason to pass a pointer down four or five levels of a call stack to reference the beginning of sprite memory if every part of the code that works with sprites just knows the address of the first one!

    Unfortunately, this best practice harms one thing: code re-use. Keeping state local makes it easier to copy-and-paste a chunk of code without dangling references. This matters a lot less in the NES world (not much code can be sensibly copy-and-pasted, since every game is so different), but it does matter.

    Worth noting is that this best practice doesn't imply we never use local state. Like all best practices, it has exceptions; utility subroutines that could be called from many places are harder to use if they need to use global variables to "pass arguments." The co2 language has a great "compiled stack" feature to address this, which can intelligently carve up spare memory at compile-time into slices that are mutually-exclusive, so they can be re-used by different subroutines. You get the benefits of modern heap-allocated memory without heap management.

What comes next?

In subsequent posts, I'll talk a little bit about the mini-language I put together to craft songs and the hack I added to account for lag as we tried playing the game using Kosmi.


Tuesday, September 8, 2020

More NES Roms (and languages!): Writing Handwave in co2

Another NES game! This time, I wanted to make a cooperative music game in a similar vein to Rock Band. Handwave is a game for up to eight or sixteen players (sharing 4 controllers). As notes scroll across the screen, you push your button to sound your note, like in a handbell choir. No scoring; you know you're "winning" when the song sounds good.

Ding

For this game, I changed up my tools just a bit: I changed the emulator I tested against and changed the language and tooling I used to write the game. This post talks about those changes; subsequent posts will talk a bit about the difference between the new language and nbasic, break down the code, go into detail about the mini-language I wrote to make the music, and talk a bit about the challenges I faced trying to make a multiplayer music game work over a network.

FCEUX emulator

FCEUX is an emulator that is popular in the TAS (tool-assisted speedrun) crowd. It sports several features over the one I previously used, Nestopia: the main ones I care about are the ability to reload the ROM completely without closing and re-opening it (ctrl-F1) and a memory watcher / debugger / editor, which proved useful for tracking down some hard-to-see bugs. It's also the basis code for the NES emulator that runs in-browser on the Kosmi website, which is where we play NES games.

One issue I ran into is that the link one gets from the FCEUX main site isn't great; it's hosting a many-years-old version that has some known bugs (the one that bit me is that it mis-maps controller inputs 3 and 4). The best source for working copies of the emulator is nightlies hosted on GitHub.

co2 language

co2 is a racket-derived language and build tool that generates asm6-compatible assembly. It's in the LISP family of languages, which makes it quite a bit more distant from raw assembly than the nbasic I've used previously. That having been said, I really enjoyed it and can give it a strong recommendation, even in its current slightly-buggy state. Here's what I encountered that I'm most happy about.

Pros

  1. clean function abstraction: co2 offers more abstractions atop the machine code than I found in nbasic (or, obviously, in rolling raw 6502 assembly). The language relies heavily on subroutines (created with (defsub)), which can take (and return) multiple arguments. This is a significant departure from the nbasic way, which was arguments passed by developer's choice of standard.
  2. local variables: co2 offers local variables via a system of intelligently auto-assigning free memory in such a way that no call hierarchy "stomps on" the memory used by any caller in the call chain. It pulls this trick off via a method called "compiled stack," which precludes recursive calls but otherwise works (and operates statically; each subroutine knows at compile time what RAM is safe to use for local variables, so you're not paying the overhead of stack manipulation or some kind of heap manager on a system with meager CPU cycles to devote to such a thing). For me, this feature was killer; I didn't have to manually keep track of the bytes used by individual functions and worry about whether they'd clobber each other.
  3. standard library support: Relative to nbasic, the set of basic NES PPU and RAM manipulation features in co2 is fairly robust. There are utility methods for setting pieces of sprite data (set-sprite-id!, set-sprite-y!, etc.), RAM memset and memcpy utilities, complex racket-derived flow control constructs (like cond and while), and even the ability to do some limited 16-bit manipulations (mostly supported at compile-time). 
  4. racket: I'm sure some people will put this in the "cons" category, but I'm very comfortable with LISP-family languages, and racket adds some great features atop the basic LISP model that make it very attractive for writing domain-specific languages. I was able to follow the pattern of co2's design to derive my own micro-compiler for a music mini-language that simplified programming the game's music.

Cons

  1. bugs: I tripped over a few, enumerated later.
  2. documentation: It's incomplete. The team is extremely receptive to donations of more docs (and if I get time, I plan to add a few more). But I did have to learn enough racket to read the source code so I could understand what some functions and standard library routines did. I have some unanswered questions as a result (such as how to use defbuffer and whether it can be used with poke!).
  3. runtime: While it didn't bother me, the environment co2 imposes on your game does create a runtime cost; certain memory locations are reserved for co2's own use. This could introduce challenges if you find yourself needing to squeeze out every CPU cycle, or if you try and copy-paste assembly code that makes assumptions about memory layout. Personally, I think the benefits far outweigh the cost of having someone dictate my memory layout.

Bugs

There are a few bugs in co2's implementation. Note: these are all observed as of this commit and might be fixed in future versions).
  1. Some constructs should work but don't, for example, 
    (peek animations (+ 1 (<< cur-anim-frame 1)))
    fails to compile with "ERROR arg->str: (<< cur-anim-frame 1)", which appears to be a sub-error due to the compiler failing to stringify the (<< etc) list.
  2. The compiler can emit code the assembler can't handle (for example, "branch out of range" failure for large if statements). The solution to that one is to bundle the too-large expression into a subroutine.
  3. Attempting to (<< constant variable) is an error, but the error is "assert failed: variable #<procedure:number?>". This one makes sense in the sense that in the underlying 6502 assembly, left- or right-shifting by a variable isn't a supported feature (i.e. both operators only take a literal for number of bits to shift), but it's a bit surprising that the language doesn't abstract that detail. I went ahead and wrote a variable-amount bit-shifter in the game to compensate.
  4. Mixing multi-return with a subroutine call doesn't work (i.e.
    (set! bar 1)
    (return foo bar)
    is fine, but
    (return foo (calculate-bar args))
    gets dicey)
  5. (loop n a b body) accepts literals and variables as a or b but not expressions that evaluate to a value. If you store the result of the expression evaluation to a temporary variable, you're fine.

Future development

Handwave isn't done. Some features I'd like to add in the future:
  • Support for all five sound channels on the NES (currently, it only uses the two square waves)
  • The ability to "put down and pick up" a "wave" (i.e. change voicing rules for one of the notes mid-song)
  • Letting a wave "ring" vs. "damping" (i.e. holding a button down to keep a note sounding; makes the most sense for sawtooth sounds, which tend to run continuously)
  • Some sort of scoring
  • Improvements to the "netcode" for multiplayer over an environment with latency

Conclusions

I really enjoyed putting this game together, and the excuse to play around in a very unique LISP-family language. If you'd like to play around with (or play) the game, give it a try! The ROM and source are available on GitHub.

Have fun!