Monday, October 18, 2021

Moving big chunks of text: how I use emacs

(This post is a follow-up to a previous post about using emacs for blobs of text.)

As a programmer, sometimes your job is to move blobs of text.

This is unavoidable for many reasons: while we strive for approaches that maximize brevity and do our best to avoid repeating ourselves, there are tradeoffs. Configuration files, in particular, tend to be "flat." Logic that isn't repeated can be harder to comprehend (you have to seek information in two or more files to understand a concept), and if there is a mechanism for expanding a meta-configuration language into a "flat" configuration, someone has to build and maintain that mechanism.

As an example of a task I have to do often, let's assume I have the configuration for a dashboard program. The dashboard has multiple configurations in which it runs (monitoring several environments, perhaps).

A list of directories displayed in the emacs *dired* mode

Each directory might look a little different, but they all have a 'config' subdirectory.

An environment subdirectory in emacs *dired* mode

The config subdir has several files, one of which is an environment config:

A config subdirectory in emacs *dired* mode, showing the env.conf file

Finally, env.conf might have several key-value pairs, which vary from file to file.

The contents of an env.conf file

Let's say that I need to update where we get our data from, and for all of these environments, I now need to pull from the copernicus datasource instead of the tycho datasource. I could go through, one by one, and make the edit, but that takes time (and repeatedly re-typing the same small set of keystrokes is error-prone). When I have to manage nearly the same configuration for multiple parallel processes, I frequently find myself needing to copy the same config change across many files. When I need to do that, I turn to emacs keyboard macros.

Because emacs is built as a text editor running in a virtual LISP machine (as discussed in my previous post), it is capable of recording and replaying every input. We can start recording a macro at any time with Ctrl-X followed by (. For brevity, I'm using emacs abbreviations from here forward, so that key sequence is written C-x (. C-x ) stops recording, and C-x e replays the macro. At the end of macro replay, emacs is in a special "mini-mode" where hitting e again will play the macro again, over and over. I like this approach because it lets me break down the task into smaller steps and spot-check the work. Doing these kinds of edits as a shell script is some people's preference, but I feel the shell solution is usually a bit like hammering a few nails by renting a steamroller to drive over them: quite a bit of setup, and if you mess up, you really mess up.

So here's how I approach this task:

  • Navigate to the top-level directory
  • Do the task by hand one time to check for sharp edges
  • Before doing it the second time, C-x ( to start a macro
  • Now, record the macro. As we go, we're going to be sensitive to how individual configs could vary and lean towards using commands, not the arrow keys, to navigate. Arrow navigation will fail us if a subdirectory has too many files or the env.conf file has too many parameters.
    • Enter to descend to the dashboard subdir (dev, in my case)
    • C-x [ to move the cursor to the beginning of the buffer (strictly, C-x [ is backward-page, which lands at the top when the buffer has no page breaks; M-< always goes to the beginning of the buffer)
    • C-s, interactive search forward. Type config. Tricky bit: I have to be careful typing; if the search comes up empty here, my macro will "bell." More on that in a bit, but if I "bell," I usually just C-x ) to stop recording the macro, finish this one out by hand, and start a macro on the next one.
    • Enter to resolve the search, then Enter again to descend into the directory
    • Same plan: beginning of buffer, C-s, env.conf, enter-enter
    • Now that we're in the config file, C-x [ to beginning of buffer, then C-s and search for DATASOURCE=. Enter to confirm the search, which leaves the edit point just after the equals sign
    • C-k, kill the rest of the line (the old value after the equals sign)
    • type copernicus
    • C-x C-s to save the buffer, updating the file
    • C-x k enter, C-x k enter, C-x k enter to get to the dashboards directory again (closing the env.conf file and all the subdirectory buffers as we go)
    • (This is key) Hit down arrow one time to move the directory cursor to the next directory
    • C-x ) to close macro


Now that we've done it one time, I can just hit C-x e, then e e e to update prod, remote, staging, and test.
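For comparison, here is roughly what one replay of that macro does to a single environment, written as a short Python function. Treat it as an illustrative sketch rather than a replacement for the macro: the name set_datasource is made up for this example, and it assumes env.conf holds one KEY=value pair per line with DATASOURCE= at the start of its line.

```python
from pathlib import Path


def set_datasource(env_dir, new_value):
    """Rewrite the DATASOURCE= line in <env_dir>/config/env.conf.

    Hypothetical helper. Raises FileNotFoundError if env.conf is missing
    and LookupError if there is no DATASOURCE= line, much as a failed
    search "bells" and stops the macro.
    """
    conf = Path(env_dir) / "config" / "env.conf"
    lines = conf.read_text().splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("DATASOURCE="):
            # Replace only the value and keep the original line ending
            # (the C-k, type, and save steps of the macro).
            body = line.rstrip("\r\n")
            ending = line[len(body):]
            lines[i] = "DATASOURCE=" + new_value + ending
            conf.write_text("".join(lines))
            return
    raise LookupError(f"no DATASOURCE= line in {conf}")
```

Calling set_datasource("dev", "copernicus") from the top-level directory would correspond to one press of e. What the function does not give you is the pause between presses where you can glance at the result.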

Checking your work and the zen of emacs: buffers are state machines

So why bother with this instead of a shell script? What I like about this approach is that if something goes wrong, it's much easier to recover than it is with a shell script. If an error occurs while a shell script is running and the script bails, I'm now in a not-great state: running the script again will try to re-edit the files that are already edited, which is rarely what I want. There's no way for the script to know; it has no context on previous runs. But emacs keeps context in the form of the point (cursor) position in the top-level directory buffer, which doubles as a progress-tracker. This is a valuable piece of the zen of using emacs, which is worth highlighting:

Emacs buffers are stateful. The point doubles as a progress tracker.

This gives emacs a nice tradeoff between fully-manual and fully-automatic edits for repetitive tasks. The command line is a sword; a shell script is a machine gun nest. Emacs keyboard macros are a semi-automatic weapon: every push of e is a pull of the trigger. If something unexpected happens (e.g. a search fails because the env.conf file or the DATASOURCE row is missing), emacs will take a "bell" event and the macro will interrupt, which allows me to correct the state of the world and then run the macro on the next line instead of starting over.
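To make the "point as progress tracker" idea concrete, here is a minimal Python sketch of a loop that behaves the same way: it stops at the first failure and hands back the position where it stopped, so the next run can pick up from there instead of starting over. The function run_from and its arguments are hypothetical names for illustration, not anything emacs provides.

```python
def run_from(items, start, action):
    """Apply action to items[start:], stopping at the first failure.

    Returns (index, error). On failure, index is the position of the
    item that failed, the way the dired point stays on the directory
    that still needs attention. On success, index is len(items) and
    error is None.
    """
    for i in range(start, len(items)):
        try:
            action(items[i])
        except Exception as err:
            # Like the macro's "bell": stop here and report where we are.
            return i, err
    return len(items), None
```

If run_from(envs, 0, edit) comes back with position 2, I would fix that environment by hand and then call run_from(envs, 3, edit) to carry on with the rest. A plain script that exits on error keeps no such position unless you build it in.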

Using the buffer point as state opens up a couple of clever tricks that I find significantly harder to do in shell. Say, for example, that instead of switching everything from tycho to copernicus, I needed to set each file to its own DATASOURCE. In shell, this'd be a little tricky; I'd have to build a lookup file of some kind. With emacs, I just create a new temporary buffer, *datasources*, in which I put a sequence of environment-name / datasource-name pairs ("admin: newton", "dev: einstein", etc.). Then, I'd change the procedure I described previously as follows:


  • Open the temporary buffer in a second window
  • At the beginning of the macro: before opening the directory, select the directory name and use M-w (meta-w; the meta key is usually Alt) to save the directory name
  • At the step where I'd insert copernicus, instead do C-x o to switch to the second buffer
  • C-x [ to go to the beginning of the buffer, then C-s C-y enter to search the buffer for the name of the directory
  • Right arrow to the beginning of the value next to the directory name, then C-space, C-e, M-w to select the value and copy it to the "kill ring" (emacs' concept of a paste buffer)
  • C-x o to go back to the env.conf buffer and do the copernicus replacement, but use C-y to paste ("yank") the value copied from the other buffer
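For readers who think in code, the lookup the macro performs against the *datasources* buffer can be sketched in Python. This is only an illustration under assumptions: parse_pairs and with_datasource are hypothetical names, the buffer is assumed to hold one "name: value" pair per line as described above, and env.conf is assumed to have DATASOURCE= at the start of its line.

```python
def parse_pairs(text):
    """Parse lines like 'admin: newton' into {'admin': 'newton'}."""
    pairs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'name: value', got {line!r}")
        pairs[name.strip()] = value.strip()
    return pairs


def with_datasource(conf_text, value):
    """Return conf_text with the value on the DATASOURCE= line replaced."""
    out = []
    found = False
    for line in conf_text.splitlines(keepends=True):
        if line.startswith("DATASOURCE="):
            body = line.rstrip("\r\n")
            # Keep the original line ending; swap only the value.
            line = "DATASOURCE=" + value + line[len(body):]
            found = True
        out.append(line)
    if not found:
        raise LookupError("no DATASOURCE= line")
    return "".join(out)
```

With pairs = parse_pairs(buffer_text), the new contents for the dev environment would be with_datasource(old_text, pairs["dev"]). A directory name that is missing from the pairs raises KeyError, which plays the same role as the failed search in the second buffer.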

Here's what that looks like in emacs.

There are many ways to solve this problem, and different coders will have different favorite approaches. I'll definitely find people who swear by using a shell script for all this. But I think the important thing is to talk about it; something I've noticed in my career is how rarely developers talk about the way they do their craft. If you're a professional developer, I definitely recommend taking some time to look into other people's favorite approaches and find what works best for you.
