Thursday, August 6, 2020

Petris code and using nbasic

A few more thoughts about creating a NES game with nbasic (see previous story for the game itself).

Code to read state of the NES controllers

I chose to use nbasic for this game because I had experience with it in the past; in college, I participated in a student-taught course run by the author of nbasic, which was excellent. I still use the course's site as a reference for "toe-dipping" into NES programming (along with the vast nesdev wiki). I and a team of 2 others created a small Zelda knock-off that is, unfortunately, lost to history. But using nbasic was a positive experience, so I sought it out again when I got the bug in my brain to try and write another NES game.

I'll talk a bit about the structure of Petris's code, then about my experience working with nbasic, and my plans moving forward.

A Dive Into Petris's Code

Roughly speaking, Petris is divided into initialization, game phase loops, game logic code, PPU control code (i.e. render logic), a big chunk of raw data to control what is shown on screen, and a mandatory footer. The header and footer are still cargo-cult code for me; I started from the Starter Source on the nbasic site and expanded out from there (leaning on examples also found on the course site as I went).


Initialization begins at the start: label, where we switch off the PPU, do some things that are heavily recommended as best-practice, and configure the PPU as we wish to use it later. Disabling the PPU puts nothing on the screen, but guarantees draw behavior won't interfere with heavy-duty memory manipulation for the PPU, as the CPU and PPU both use the same latches for manipulating memory. It's useful for setup.

We then declare a set of arrays of various sizes. Nbasic allows for arrays to be declared that can be fit anywhere, or at specific locations (this is useful for naming dedicated memory addresses), or in "zeropage" memory (it's a quirk of the 6502 CPU design that RAM with a 0x00 high-order-byte is faster to access, because special opcodes can be used to fetch it with a one-byte address identifier). The design of nbasic doesn't lend itself super-well to structures, so instead of, for example, having a structure encapsulating the current controller values, values from the previous frame, and "rising edge buttons" (i.e. button-down events on this frame), I have three separate arrays, each of length 4 for four controllers.

The last step, at initgame:, is  prepping up game-state logic: timers, sprite initial state (position and animation frame), scores, and a playing flag indicating the game is in progress. Then a bunch of subroutines are called to draw the game into PPU memory (since it's a simple one-screen game, there isn't any scrolling and scroll logic is used just to flip between the main game screen and the "Player wins!" screen).

Game Phase Loops

The game has, broadly, four phases: wait for start, countdown to play, play game, and player-win screen. A series of four blocks of code handles these. Each is structured basically the same way:
  1. Wait for NMI ("non-maskable-interrupt",  the signal used for, among other things, telling the CPU that the PPU is entering vertical-blanking mode and it's safe to mess with PPU memory)
  2. Update the PPU to move things around on the screen
  3. Use the vwait_end subroutine to wait for the PPU to start drawing things on screen, which is a good time to run game logic
  4. Do game logic
Game logic varies from phase to phase: some just run counters, some ask for updates from controllers. The playing flag is set while the main-game phase is running, which other subroutines can use to decide if we're actually in the game phase.

Game Logic Code

Various subroutines in this area handle things the game needs to do to maintain game state, including reading controllers, updating scores, and updating logic that drives animations. Some of these subroutines take input in the form of assuming nbasic named variables are set (I could possibly have used the stack for this, but see "Using nbasic" for why I chose not to). They also use named variables to maintain internal state, which caused me to trip over a bug in nesasm; nesasm tops out at 32 characters for any label (a simple buffer-overrun protection), but it handles long labels by ceasing to read characters, not by throwing an error. This approach is memory-wasteful, in that each subroutine with internal state is going to take a couple bytes of memory that nothing else can use (but the alternative is coming up with a clever protocol for re-using bytes, which risks having two subroutines "cross talk" and tap-dance on each other's internal state).

Of particular note here is load_joysticks: and detect_edge:, which populates joy_edge, an array of bits representing which buttons are freshly-pressed this detection cycle. This is needed for distinguishing new taps from buttons held down (scoring is done at the rate of 1 point per fresh tap, to get that old-school "hammer on the controller for more points" feel). 

This is also where animate lives, which is a simple routine for driving "animations" in the form of x and y offsets to a sprite over time. Each sprite's position is controlled by two value sets: the actual displayed x and y position (which lives in the sprite_mem array and is flashed into the PPU every redraw) and the logical x and y position (which live in sprite_x and sprite_y). The sprite_anim_idx value is an offset into an animation data field that drives animation; every frame, the sprite's x and y offset are read from animation at sprite_anim_idx, added to sprite_x and sprite_y, and the result stored to the on-screen position in sprite_mem. Then sprite_anim_idx is rolled forward, and if it points to the sentinel value 0x80, the animation is done. This is the beginning of a mechanism that could allow for more complex animations (for example, we could add changing the displayed CHR for the sprite, or an additional sentinel value could be added to create looping animations).

Squirreled away under the PPU control code and before the raw data is a bit more game logic that includes a shamelessly-stolen randomizer. The NES itself doesn't have a "true" source of entropy (i.e. there's no built-in battery-backed clock, so without battery-backed memory, the NES's operation is fully deterministic from powerup based on code and controller inputs). The game uses the time between start and player 1 pressing 'A' to start the game to seed a random number generator driving where the dog wants to be pet.

PPU Control Code

This code controls moving things to the PPU for display. Huge, repetitive chunks of this code are loading and storing slabs of images on screen. There's plenty of room for improvement, and I sort of got better over time as I had to write more (for example, I'm not at all proud of load_dog in its current form, which is split up to account for the limitations of CPU speed to make room for screen redraws between loading chunks of the dog). In the dog-loading code, I also accounted for a limitation of the 6502 CPU architecture: when loading data from an absolute memory offset, the 8-bit math only allows at most 256 bytes of data to be addressed from one absolute offset. There are much better ways to design this (the 6502 also allows "indirect indexed" and "indexed indirect" memory access, which reads and writes values based on two bytes in zero-page memory and is the closest thing to modern pointer arithmetic the 6502 offers). The nbasic language itself doesn't have a syntax to take advantage of this feature (as far as I could tell), and I shied away from using direct assembly (for reasons I describe in "Using nbasic"), so I repeated a lot of simple "ripper logic" loops for the sole purpose of loading the same kind of thing from different points in memory.

Raw Data

The wide blocks of raw data are mostly used to map objects in the game to their positions on screen and the CHR tiles that make up their images. Generally, these come in two flavors:
  • starting PPU Y and X address, first CHR, and how many CHRs make up the image (mostly for text)
  • starting PPU Y and X address and a list of CHR bytes making up the image horizontally (for big things like the dog)
Finally, the game palette is configured to give some nice colors for the dog and four distinct player colors for the hands.

Were I to write this all over again, indirect-indexed addressing could really cut down on the amount of repeititon here, as well as in the PPU control code (possibly at the cost of speed, since the indirect CPU operations cost more clock cycles... But this is not a game in need of high-performance tuning!).

Using nbasic

Overall, I found nbasic to be relatively straightforward and much preferred to raw assembly-bashing; it makes some things easier and a few things maybe a little harder


  • The language really simplifies some things that would require verbose proper form in assembly. In particular, abstracting indirect memory access as arrays and simple arithmetic as prefix-based expressions makes that logic much easier to read than gleaning the meaning of blobs of assembly. I found I would quickly fall into a "flow" of bashing out new segments of the code.
  • The language isn't far abstracted from the underlying assembly; in places where it did something surprising, taking the compiler's assembly dump and matching statements up against the statements in nbasic was very easy. For something like this, where individual bytes can matter (for both space and CPU cycles), that lack of indirection on the abstraction is helpful.


  • I tripped over a couple of bugs in the implementation. One I was able to push a fix for; the others were more challenging to reproduce and included an issue mixing binary, hex, and decimal representations of numbers in one data statement. There are also a few cases where nbasic will emit code that doesn't pass the assembler (for example, the branchto operator doesn't confirm whether or not the branch distance is too far, trusting the assembler to know). These issues were minor and relatively easy to work around.
  • While nbasic allows users to touch the three 6502 registers directly (i.e. a, x, and y refer to the actual registers), the way they interact with nbasic code is ill-specified and very implementation-dependent. In practice, I found it safest to use assembly directly when I manipulated those variables.
  • There is no concept of functions or local state in nbasic. This could be considered both a pro and a con; the NES architecture is small, and cycles matter. Managing internal state like that costs resources. However, a lightweight function or local-state abstraction would be nice-to-have.
  • As far as I can tell, nbasic doesn't really allow for "full" pointer arithmetic; memory can be accessed by offset from absolute addresses, but accessing it via "the 2-byte address stored at this address" seems to be nonexistent (unless I missed it).
  • Partially because of the difficulty of doing pointer arithmetic, one of the bits of toil coding in this environment is repeating functions with minor tweaks, such as changing the labeled input. The ability to do templates or macro expansions would be valuable for decreasing the amount of repetition needed in the code.

Possible Future Extensions

If I were to extend nbasic, the main thing I'd add is a function-call operation. The 6502 has a stack abstraction that makes a push-call-pop argument-passing interface extremely possible. It could be a fun project to extend nbasic in this way. Local state could be a bit trickier, but coupled wit h function calls, the notion of a local variable can be managed by pushing them to the stack when a call occurs.

An abstraction for pointer arithmetic would also be valuable. I can imagine doing it by providing a simple abstraction for setting two subsequent bytes of zero-page memory to a labeled address, then allowing the [array offset] accessor to reference them. It's possible one would even want those two-byte zero-page values to have a special name (maybe a new pointer declaration, like the existing array declaration) to denote their purpose?

What is next?

There are other languages available for programming the 6502, and I'm interested in checking them out. For my next project, I plan to explore the CO2 language, which is a LISP-based system created to build the game "What Remains." I'll be interested to see how a more abstract language changes the game of making the game.

Monday, August 3, 2020

Petris: Creating a NES game

There is a nifty site called Kosmi that allows you to play NES games as a group with up to 4 players using a web-based emulator. As my friends and I have been sheltering to avoid COVID-19, we used it to play a couple of old NES games. One tricky bit is there aren't a lot of great 4-player games out there.

Of course, the NES architecture is well-understood, so I decided to make one. Based on a suggestion from my wife, I made a simple game, where the object is to pet an adorable dog. Along the way, I learned some neat things and was reminded of how fascinatingly ancient the NES architecture is.


For crafting the ROM, I relied heavily on resources put together by Bob Rost in 2004, including his own NES programming language, nbasic. It's an enhancement on the relatively standard nesasm assembly language that allows for some additional "sugar" features at a relatively low cost to program size and efficiency; it saved me a bundle of assembly-language hacking. I forked my own instance of the nbasic project (for reasons I may go into in a later blog post). 

For graphics, I used a simple editor called YY-CHR, which is tuned for the features of the NES CHR format (in particular, it respects the 4-color constraint and makes it easy to see how changes map to results in the final product). A simple Makefile to keep everything synchronized, and I was off to the races.



The NES used a 6502-series microprocessor. This is actually the first micro-language architecture I ever touched (it's the same assembly as used in the old Apple ][ computer line). By modern standards, it's quirky and ancient; the system is limited to three registers (a, x, y), 16-bit addressing, and 8-bit math; the addressing being wider than the math means the architecture isn't capable of considering one entire memory address for pointer arithmetic, so I find the need to write a lot of redundant code; very similar functions end up being separate because all references to memory need to work from different base addresses, coded into the code itself.

While it's quirky, it's enough CPU to run a game.


Editing the CHR file with YY-CHR

The 6502 chip isn't clocked nearly fast enough to push pixels to a screen at a rate that keeps the NTSC or PAL television standards happy. To solve this, the Nintendo used a custom chip called the "PPU," which kept images flowing to the screen using some very regimented rules. The rules are described on the NESDEV wiki, but some brief highlights:
  • The exact pixels generated come from tiles stored in a particular ROM (in the cartridge), the pattern table. The pattern table describes tiles of 8-by-8 pixels that can be rendered. The nesasm system can take CHR files and "bake" them directly into the ROM as the pattern table.
  • Color of the pixels is determined by palettes. For background, there is one shared "background" color and three colors that can vary in each palette; there are four total background palettes, for variation of up to 13 colors on screen at a time. Foreground has similar rules (4 palettes, 3 colors), with one color reserved as "transparent", which shows the background.
  • Background images are rendered by setting values in the "nametable," an array of values indexing into the CHR tiles. So each 8x8 chunk of the background corresponds to one tile. The color of pixels in the tile is controlled by palette selection in the "attribute tables," which select color for 16x16 (i.e. 2-tiles-by-2-tiles) segments of the screen. There are up to 4 nametable / attribute table pairs, allowing for screen scrolling.
  • Sprites are configured as X and Y position, the CHR they display (or 2 CHRs, if "8x16 mode" is selected), and some attributes controlling color, flipping the sprite, etc.
The CPU accesses all of this complexity using some "magic addresses" that set values on the PPU. Writing values to address $2006 sets an address latch in the PPU's memory, and bytes written to $2007 are then stored to that address, cursor-style (i.e. if $10, $00 is stored to $2006, then values written to $2007 will be stored at address $1000, $1001, $1002, etc.). One tricky bit is that the $2006 latch is used by the PPU when it's sending colors to the TV, so changing it when the NES is actually drawing can change what colors are output and mess up the screen. This is actually a feature, not a bug; fancy graphics effects (such as scrolling one section of the screen separately from another, like in Legend of Zelda) are implemented by altering the screen scroll mid-draw operation.


Relative to the PPU, controllers are very simple. The controller architecture in the NES is a serial bus; setting a 1 to the $4016 address tells every controller to remember what buttons are currently pressed, and then setting it back to 0 and reading from $4016 (and $4017 for controller 2) gives one button's press state at a time. Since I made a 4-player game, it also needs the button values from controllers 3 and 4; the NES "Four Score" 4-controller adapter did this by adding controller 3 to 1's output and controller 4 to 2's output, essentially making it look like controllers 1 and 2 had twice as many buttons. Not too hard to work with.

The Game

Petris is a game where you pet an adorable dog. You score 1 point for every time you pet the dog in its favorite spot, which changes as time goes on. At the end, the player with the most points (and the adorable dog) win!

To try it out, download the NES file and load it into your favorite NES emulator. If your emulator supports 4 controllers, you can play with three friends. Enjoy! :)