Monday, November 22, 2021

When not to use classes in JavaScript

... all the time.

Do you want to be more specific?

Okay, yes, I probably should.

JavaScript is secretly a very simple language with some ill-thought-out features layered on top of it. The most famous one is probably the class architecture, which started as "duck-typed" (objects can have arbitrary fields assigned to them), prototype-based (an object is only a specific kind of object because it happens to have a previously-very-visible-but-now-hidden-the-visible-field-is-deprecated-forget-I-mentioned-it-__proto__-what-is-that-even "Prototype" property that adds fields beyond the ones living directly in the object), and mutable (an object can become a different type of object by calling Object.setPrototypeOf(instance, newPrototype), which on modern browsers will stab your performance directly in the jibblies but will also change an object to another type of object). Instances of classes are very convenient—Who doesn't love getting an object and just calling myObject.doSomeStuff(args)?—and they can really help you organize your code.

Here's why you should use them less.

They require special serialization and deserialization

Since you're using JavaScript, you're probably in a web browser or talking to a web browser (note: if you are using JavaScript and not in those circumstances, I'm sending you a digital hug and I highly encourage you to explore the wide, beautiful world of programming languages that aren't a hacked-together shambling mass built atop a Netscape demo from 1995 with name and syntax specifically chosen to capitalize on the popularity of a totally-unrelated language to the Scheme that birthed it). This means that sooner or later, you're shipping data over the network to or from the server. If your data can be represented as 'POJOs' (plain ol' JavaScript objects), the entire process of converting the data to or from network format is a single pair of library calls in modern JavaScript runtimes, JSON.stringify and JSON.parse. You'll likely still need to validate the parsed data is the right format, since the Web is a hellworld full of active attackers (such as your own server code compiled to the wrong version), but you're 90% of the way there.

But if your objects are class instances? Oh dear. I'm sorry you did that to yourself. Don't forget to define a toJSON method to explicitly select what fields you want serialized, and keep it up-to-date as the class changes. If you don't, you'll get the object's "enumerable properties" (what those are is left as an exercise for the reader). And on the deserialization side, don't forget to specify a reviver function that takes pieces of your parsed JSON, pattern-matches them against the original instances, and uses the class constructor to change the object to an instance. Be careful: reviver is looking at sub-trees and you're at risk if some sub-trees with the same properties should be different classes; I recommend synchronizing the toJSON for those classes to add a 'tag' field that can be inspected to pop the POJO back into an object instance. And don't forget to synchronize all that serialization and deserialization logic with the server's representation of the data, or uh-oh!

... or you could do none of that, and just have the in-memory object representation track more closely to the on-wire model by not using classes.

They add overhead

Every instance of an object is carrying around some indication of what its prototype is. That's not a lot of data on a large object, but on a small object (such as an RGB "Color" specifier or a three-number 3D coordinate), the "what kind of thing am I" field is over 10% of the data stored in the object. And since JavaScript is so dynamic, there are very few optimizations the runtime can practically do to strip out that data. Unless your code frequently has to pass colors or 3D coordinates in the same information channel and doesn't know which (which we might consider a code smell), all those extra "Hey I'm a color here are my methods" tags are wasted space.

They're a pain in the ass to unit test

What you want to do in unit tests is confirm your functions manipulate state correctly. Because class use encourages state hiding, it makes it trickier to write unit tests... Do you want to test your private or protected methods? "No," says the purist, "you should test your public API." Okay, but the public API is relying internally on some couple dozen private functions, so now I'm writing big, jangly dependency-heavy tests to get around the fact that I can't just call myInstance.privateMethod directly and test its output. And if I'm using a mocking library, I'm now mocking stuff up in MyClass.prototype, and sometimes I'm working around private methods by adding a makeTestable public method of some kind... It's all a bit of a mess. JavaScript's class model has no escape hatches for letting test logic sneak a peek into the permission space of a class, and it's frustrating.

Inheritance and instanceof are traps

One reason people sometimes turn to classes is inheritance. The "Shape" that could be a "Circle" or "Rectangle" is the most classic example. Tricky thing is, it's also one of the more corner-case examples (are you doing GUI programming?); many inheritance hierarchies aren't nearly so clean, and JavaScript handles the classic "diamond of death" problem (where one class inherits from two other classes that inherit from the same base class) by... Not allowing multiple inheritance. Inheritance also somewhat clashes with the mutable-class "feature" of the language... You can change the type of an object by editing its prototype, but you can't edit the prototype chain to, for example, swap one instance's grandparent with some other class (if you try to do so by editing prototype relationships, congratulations... You've now mutated every class in your runtime).

When you have to dynamically determine if an unknown object is an instance of some class, you can use the built-in instanceof operator. This walks the prototype chain for the object to see if the specified class shows up anywhere as a parent. This works great until you get into anything complicated involving libraries and modules. Suddenly, you discover that your ThreeDCoordinate isn't a ThreeDCoordinate because it was built with ThreeDLib version 2.7, but you're using npm and the code you're running right now is in a library you added which is depending on version 2.9 of ThreeDLib, and no, the 2.9 and 2.7 ThreeDCoordinate classes aren't the same class even though they are 100% the same code.

So what should we do instead?

JavaScript is actually perfectly capable of supporting a functional pattern riding atop POJO data. In this model, we rarely use classes; we just build objects by calling functions that instantiate them and manipulate those objects as "bags of data referenced by field." In fact, the language's "duck-typed" nature makes this simpler than in other languages: we don't have to be overly-cautious about type. I don't need my inputs to getDistance(coord1, coord2) to really be ThreeDCoordinate objects; if they have x,y, and z fields that are numbers, I can act on them.

With POJOs manipulated by functions, I don't need special handling for serialization, my objects are much smaller (and I can't modify a prototype chain so I can't incur the expense of doing a very slow operation in a modern browser), and I can get inheritance by either extending objects (taking one object and adding fields to it... Not great, because this also incurs browser overhead) or composing objects (making a new object that has a field containing the "parent" object).

There are some possible downsides to this approach. One is the lack of enforced discipline in only having some methods available on some objects means you'll have to be more careful with your code to keep your inputs straight (it's easier to pass the wrong object to the wrong handler function if you're not referencing the methods via myObject.method()). Another is functions divorced from the data they care about can tend to end up wordy; it's no longer myCoordinate.translateX(value), it's ThreeDCoordTranslateX(coord, value). The latter is actually a namespace problem, not a class problem; modern JavaScript offers some good tooling around namespacing functions (either via modules or the "poor man's module," a class full of static functions (one weird tip; Google hates it)).

One additional downside is the lack of private members. To be honest, while I find these conceptually useful I don't find I need the language itself enforcing discipline around them these days. My experience is that the question "how private is private" is wuzzier than I want it to be. The object model enforces it as "data only visible inside the methods of the class," and I find myself needing to "jail-break" that abstraction (for testing or "friend-class" reasons) too often. For functions, I can get privacy by scoping them to the module level. For data, if the API is sufficiently complex that private data matters, I put creation and maintenance of the object behind constructor and mutator functions and only change the data through those functions.

A TypeScript plug

It's definitely worth noting that I don't program much in plain JavaScript these days. The language enables a lot of shoot-your-own-leg-off opportunities regarding its near-total lack of static type enforcement.

To implement the approach I'm describing here, what I really do these days is build my types of objects as interfaces and use interface inheritance to indicate when one object can be treated as a subset of another object. I construct objects in functions declared to return a particular interface-conforming object and write functions that take in a particular interface-conforming object. The compiler will do the work at compile time to let me know if I'm trying to pass the wrong type to the wrong function. In the relatively rare cases that I'm handling multiple types of object on the same channel, I can use tag fields (or the structure of the data) and type guards to turn mystery-typed objects into an understood type.

The zeroth rule is there are no rules

I've been hard on classes, but this walk has been a bit tongue-and-cheek. In reality, classes are an integral part of JavaScript, they aren't going anywhere, and it's okay to use them. I wanted to get us out of the mindset that they're the only way to solve problems, so I've asked you to imagine a world where we should never be using them.

I actually use them often, but I have some specific rules of thumb on when to use and when to avoid them:

1. If you have a big type family and inheritance is cheaper than special-casing

If you're dealing with a family of a dozen or more related types, where most of them share implementation but a few do have special handling needs (i.e. the traditional "shapes are circles and rectangles" problem), you may very well be better off using classes than an elaborate family of handler functions and special-case logic for switching on particular instances of the type family. In my experience, big bags of things mapping to tangible objects will fit this description.

2. Don't use them to describe data on the wire

It's hardly ever worth it to do the heavyweight serialization / deserialization of mapping data on the wire to class instances. If your data is going on the wire, keep it simple. Note that point 1 and point 2 come into conflict sometimes. There is no universal answer here; you'll be making tradeoffs one way or the other if you class-up your big type family that also serializes onto the wire. At least if you go that road, you can make implement toJSON on every class and a reviver that understands the whole class family part of the process.

3. Don't use instanceof

At this point in my career, I consider instanceof harmful; it actively conflicts with the ability to use different versions of a library in the same codebase, and fails in silent and confusing ways. It also bakes knowledge of the class hierarchy into possibly-unrelated code. Try not to do it.

No comments:

Post a Comment