brokensandals.net -> Technical -> Stuff I learned writing a javascript tracer

Posted 2020-02-12.

I haven’t done a ton of work in javascript, and I’ve generally only needed a superficial knowledge of it. Recently I wanted to make something that would let me run code in the browser and display what happened at every step of its execution. Javascript was the obvious choice for this, but I needed to learn a lot more about the language. Here’s an assortment of factoids that stood out to me along the way.

Focusing on weird or surprising bits may come across as negative, but actually this project improved my opinion of javascript dramatically. It may lack the conceptual elegance of some languages, but ES2015 and later are perfectly pleasant to work in.

Sorting numbers is an uncommon use case, right?

Let’s start with a tidbit that’s probably widely known: called without a comparison function, sort converts the elements to strings and compares them in UTF-16 code unit order. Remember that or you’ll get surprising results when you sort an array of numbers.

[1, 2, 10].sort() // => [1, 10, 2]
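To get numeric order you have to pass a comparison function. Here is a minimal sketch; the `sortNumbers` helper is my own name, and the `a - b` comparator assumes the array holds ordinary numbers (it misbehaves if `NaN` is present).

```javascript
// Hypothetical helper: sort numbers numerically without mutating the input.
function sortNumbers(values) {
  // sort() works in place, so copy first; a - b orders ascending.
  return [...values].sort((a, b) => a - b);
}

console.log([1, 2, 10].sort());       // => [1, 10, 2]
console.log(sortNumbers([1, 2, 10])); // => [1, 2, 10]
```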

NaN isn’t NaN except when it is

Also well-known is that NaN isn’t equal to itself according to ===.

NaN === NaN // => false

You might expect this to cause problems if you use NaN as a key in a Map, but Map keys are compared using a slightly different algorithm, SameValueZero, under which NaN does equal itself.

const m = new Map();
m.set(NaN, 'hi!');
m.get(NaN) // => 'hi!'
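For reference, here is how the different equality checks treat NaN (and zero). As I understand the spec, Map and Set keys and Array.prototype.includes use SameValueZero, Object.is uses SameValue, and indexOf uses ===; the sketch below just records each result.

```javascript
// Compare the equality algorithms used in different places.
const results = {
  tripleEquals: NaN === NaN,           // false
  objectIs: Object.is(NaN, NaN),       // true (SameValue)
  includes: [NaN].includes(NaN),       // true (SameValueZero)
  indexOf: [NaN].indexOf(NaN),         // -1 (uses ===)
  setHasNaN: new Set([NaN]).has(NaN),  // true (SameValueZero)
  setZeros: new Set([0]).has(-0),      // true: SameValueZero equates +0 and -0
  objectIsZeros: Object.is(0, -0),     // false: SameValue does not
};

console.log(results);
```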

Symbols

Symbols are useful when you want to attach a property to an object without worrying that its name might collide with other properties.

The existence of symbol properties means Object.getOwnPropertyNames alone doesn’t give you a complete list of an object’s properties.

const obj = {
  str: 'foo',
  [Symbol('sym')]: 'bar',
};

Object.getOwnPropertyNames(obj) // => [ 'str' ]
Object.getOwnPropertySymbols(obj) // => [ Symbol(sym) ]
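If you want both kinds of keys at once, Reflect.ownKeys returns string keys (including non-enumerable ones) followed by symbol keys. A small sketch; the `hidden` property is added only to show that non-enumerable keys are included:

```javascript
const sym = Symbol('sym');
const obj = {
  str: 'foo',
  [sym]: 'bar',
};
// A non-enumerable string key, which Object.keys would skip.
Object.defineProperty(obj, 'hidden', { value: 'baz', enumerable: false });

// String keys come first (in creation order), then symbol keys.
const keys = Reflect.ownKeys(obj);
console.log(keys);             // => [ 'str', 'hidden', Symbol(sym) ]
console.log(Object.keys(obj)); // => [ 'str' ]
```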

The selling point of symbols is that they can be unique even when their descriptions collide:

const sym1 = Symbol('foo');
const sym2 = Symbol('foo');

sym1 === sym2 // => false
sym1 === sym1 // => true

const obj = {
  [sym1]: 'a',
  [sym2]: 'b',
};

Object.getOwnPropertySymbols(obj) // => [ Symbol(foo), Symbol(foo) ]

I wanted my tracer to display as much information as possible about the values it traces, which means it should not only show a symbol’s description, but also enable you to tell whether two symbols traced at different points would be considered equal or not. To accomplish that I had the serializer keep a map of all symbols it encounters and assign an ID to each one.
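A minimal sketch of that idea, assuming IDs are just sequential integers; `describeSymbol` and `symbolIds` are hypothetical names, not the tracer's actual code:

```javascript
// Hypothetical registry: each distinct symbol gets a stable numeric ID.
const symbolIds = new Map();

function describeSymbol(sym) {
  if (!symbolIds.has(sym)) {
    symbolIds.set(sym, symbolIds.size + 1);
  }
  // toString() is needed: a symbol can't be implicitly converted to a string.
  return `${sym.toString()}#${symbolIds.get(sym)}`;
}

const sym1 = Symbol('foo');
const sym2 = Symbol('foo');
console.log(describeSymbol(sym1)); // => 'Symbol(foo)#1'
console.log(describeSymbol(sym2)); // => 'Symbol(foo)#2'
console.log(describeSymbol(sym1)); // => 'Symbol(foo)#1'
```

Two traced symbols are the same symbol exactly when their IDs match, even if their descriptions are identical.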

Properties on primitives

One weird bit is that symbols are immutable, yet in non-strict code they quietly ignore attempts to change them rather than throwing an error.

const sym = Symbol('foo');
sym.description // => 'foo'

sym.description = 'bar';
sym.description // => 'foo'

sym.x = 'y';
sym.x // => undefined

In fact, the other primitive types behave that way too (again outside strict mode, where these assignments throw a TypeError), except for null and undefined, which throw either way:

const str = 'hi';
str.x = 'y';
str.x // => undefined

const num = 3;
num.x = 'y';
num.x // => undefined

const bool = true;
bool.x = 'y';
bool.x // => undefined

const nullval = null;
nullval.x = 'y'; // throws TypeError

const undefval = undefined;
undefval.x = 'y'; // throws TypeError
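Whether these assignments fail silently depends on strict mode. The sketch below compares both; it builds the two functions with `new Function` only because that produces non-strict code regardless of how the surrounding file is loaded (`sloppyAssign`, `strictAssign` and `attempt` are hypothetical helpers):

```javascript
// new Function bodies are non-strict unless they opt in with 'use strict'.
const sloppyAssign = new Function('value', "value.x = 'y'; return value.x;");
const strictAssign = new Function('value', "'use strict'; value.x = 'y'; return value.x;");

// Run an assignment and report either its result or the kind of error.
function attempt(assign, value) {
  try {
    return assign(value);
  } catch (e) {
    return e instanceof TypeError ? 'TypeError' : 'other error';
  }
}

for (const value of ['hi', 3, true, Symbol('foo'), null, undefined]) {
  // Primitives: undefined (sloppy) vs TypeError (strict); null/undefined: TypeError in both.
  console.log(String(value), attempt(sloppyAssign, value), attempt(strictAssign, value));
}
```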

Weak maps and symbol keys

I was pleasantly surprised to learn javascript now has a WeakMap type. This allowed me to write an encoder/serializer for arbitrary objects that assigns consistent IDs to each object over time, without leaking memory¹.

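I was unpleasantly surprised to learn that WeakMaps can’t use symbols for keys (ES2023 later relaxed this for symbols not created with Symbol.for, but engines predating it still reject them):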

const wm = new WeakMap();
wm.set(Symbol('foo'), 'bar'); // throws TypeError

My encoder has to remember each symbol in case it appears again, and has no way to signal to the garbage collector that the map entry for a given symbol only matters if there are other references to that symbol too. So, if a program were to continuously create new symbols and pass them to my encoder, the program would eventually run out of memory. Bummer. Not likely to come up, though.
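Here is a rough sketch of that kind of encoder, under the assumption that only objects, functions and symbols need IDs. `IdEncoder` and `idFor` are hypothetical names; objects go in a WeakMap, while symbols go in a regular Map, which is where the leak described above comes from.

```javascript
// Hypothetical encoder: assigns stable IDs to objects, functions and symbols.
class IdEncoder {
  constructor() {
    this.nextId = 1;
    // Weakly held: an entry can be collected along with its object.
    this.objectIds = new WeakMap();
    // Strongly held: symbols stay reachable for the encoder's lifetime.
    this.symbolIds = new Map();
  }

  idFor(value) {
    const isObject =
      (typeof value === 'object' && value !== null) || typeof value === 'function';
    let table;
    if (isObject) {
      table = this.objectIds;
    } else if (typeof value === 'symbol') {
      table = this.symbolIds;
    } else {
      throw new TypeError('only objects, functions and symbols get IDs');
    }
    if (!table.has(value)) {
      table.set(value, this.nextId++);
    }
    return table.get(value);
  }
}

const enc = new IdEncoder();
const a = {};
console.log(enc.idFor(a), enc.idFor(Symbol('s')), enc.idFor(a)); // => 1 2 1
```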

Getters that return functions

As you may know, javascript binds the this keyword based on the calling code. For example:

const obj = {
  name: 'world',
  greet() {
    console.log('hello ' + this.name);
  }
};

obj.greet() // => hello world

const f = obj.greet;
f() // => hello undefined (or TypeError if you're in strict mode)
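If you need to call such a detached method with the right this, call and apply let you supply it for a single invocation. A small sketch; here greet returns its string instead of logging it so the result is easy to check:

```javascript
const obj = {
  name: 'world',
  greet() {
    return 'hello ' + this.name;
  },
};

const f = obj.greet;
// Both pass `this` explicitly; apply additionally takes arguments as an array.
console.log(f.call(obj));                 // => 'hello world'
console.log(f.apply({ name: 'Newman' })); // => 'hello Newman'
```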

Now, it’s clear what this code will do:

const obj = {
  name: 'world',
  get greeting() {
    return 'hello ' + this.name;
  }
};

obj.greeting // => 'hello world'

But it’s less obvious, at least to me, that the following will do the same:

const obj = {
  name: 'world',
  get greetingMaker() {
    return function() {
      return 'hello ' + this.name;
    }
  }
};

obj.greetingMaker() // => 'hello world'

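Apparently, given x.y(), where y refers to a getter and that getter itself returns a function, javascript binds this to x for both the invocation of the getter and the invocation of the function it returns. That’s because this is determined by the shape of the call expression: evaluating x.y runs the getter with this set to x, and since the call is still of the form x.y(), whatever the getter returned is also called with this set to x.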
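A small sketch of where that stops being true, assuming nothing beyond standard semantics: once the getter’s result is pulled out of the member expression, the returned function no longer receives obj as this. Parentheses around obj.greetingMaker don’t count as pulling it out, but the comma operator does. The returned function reports 'detached' instead of reading a name, so the outcome is the same in strict and non-strict code:

```javascript
const obj = {
  name: 'world',
  get greetingMaker() {
    // The getter itself always runs with `this` === obj.
    return function () {
      // Report whether the returned function received obj as `this`.
      return this === obj ? 'hello ' + this.name : 'detached';
    };
  },
};

console.log(obj.greetingMaker());      // => 'hello world'
console.log((obj.greetingMaker)());    // => 'hello world' (parentheses keep the reference)
const maker = obj.greetingMaker;      // the getter runs here
console.log(maker());                  // => 'detached'
console.log((0, obj.greetingMaker)()); // => 'detached' (comma operator drops the base)
```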

Changing Function.prototype.apply

Javascript is highly dynamic; you’re free to change the behavior of a lot of built-in things.

Consider the apply method:

function greet() {
  console.log('hi!');
}

greet(); // => hi!
greet.apply(); // => hi!

What if we alter it?

greet.apply = () => console.log('go away.');
greet(); // => hi!
greet.apply(); // => go away.

OK, so that affects explicit calls to .apply, but has no deeper implications. Boring.

But what if we try to change it everywhere?

Function.prototype.apply = () => console.log('GO AWAY');
greet(); // => GO AWAY
greet.apply(); // => GO AWAY
new Date() // => GO AWAY
'hello world'.toUpperCase() // => GO AWAY

Oops. We broke all function calls. Think of all the security holes we just plugged, though!

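(This is the behavior I saw in the Node.js REPL, and it’s not the same everywhere. An ordinary call like greet() invokes the function directly rather than looking up its apply method, so the blanket breakage presumably comes from the REPL’s own machinery calling apply.)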
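To check whether ordinary calls are affected at all, here is a minimal sketch of the experiment as a standalone script rather than in the REPL. Based on how the language defines a normal call, I’d expect only the explicit .apply lookup to change, while greet() and Reflect.apply keep working; probe is a hypothetical helper that restores the original method before returning:

```javascript
// Keep a reference so the override can be undone.
const originalApply = Function.prototype.apply;

function greet() {
  return 'hi!';
}

function probe() {
  // Temporary replacement, used only for this experiment.
  Function.prototype.apply = function () {
    return 'GO AWAY';
  };
  try {
    return {
      direct: greet(),                            // ordinary call
      viaApply: greet.apply(),                    // looks up .apply on the prototype
      viaReflect: Reflect.apply(greet, null, []), // calls the function directly
    };
  } finally {
    Function.prototype.apply = originalApply;
  }
}

console.log(probe()); // => { direct: 'hi!', viaApply: 'GO AWAY', viaReflect: 'hi!' }
```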

Bound functions are black boxes

The bind method is often used to wrap a function so that this will always have the same value no matter how the function is called.

const obj = {
  name: 'world',
  greet() {
    console.log('hello ' + this.name);
  }
};

const fn = obj.greet.bind(obj);
fn() // => hello world

If you’re inspecting a bound function (say, for troubleshooting), there are some obvious things you’d want to know:

What is the original function object it’s wrapping?
What object is bound to this?
What values are bound to any other parameters?

I would like my tracer to serialize all that information. Unfortunately, javascript doesn’t seem to provide any way to do that. Bound functions do not have properties pointing to the original function or any of the bindings. The only hint you can get is that their names start with 'bound '.

obj.greet.toString() // => "greet() {\n    console.log('hello ' + this.name);\n  }"
obj.greet.bind(obj).toString() // => 'function () { [native code] }'

obj.greet.name // => 'greet'
obj.greet.bind(obj).name // => 'bound greet'
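One possible workaround is to wrap Function.prototype.bind so that every function bound afterwards is recorded in a WeakMap. This is a sketch under obvious assumptions: boundInfo and getBindingInfo are hypothetical names, and functions bound before the patch runs (or through a saved reference to the original bind) remain opaque.

```javascript
// Hypothetical registry: bound function -> { target, boundThis, boundArgs }.
const boundInfo = new WeakMap();
const originalBind = Function.prototype.bind;

Function.prototype.bind = function (thisArg, ...args) {
  // `this` is the function being bound; delegate to the real bind.
  const bound = Reflect.apply(originalBind, this, [thisArg, ...args]);
  boundInfo.set(bound, { target: this, boundThis: thisArg, boundArgs: args });
  return bound;
};

function getBindingInfo(fn) {
  return boundInfo.get(fn); // undefined if fn wasn't bound after the patch
}

const demo = {
  name: 'world',
  greet(greeting) {
    return greeting + ' ' + this.name;
  },
};

const fn = demo.greet.bind(demo, 'hello');
console.log(fn());                                  // => 'hello world'
console.log(getBindingInfo(fn).target === demo.greet); // => true
console.log(getBindingInfo(fn).boundArgs);           // => [ 'hello' ]
```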

Another oddity: you can call bind on a bound function, but the binding for this is set in stone after the first bind.

function greet() {
  console.log('hello ' + this.name);
}

const obj1 = { name: 'world' };
const obj2 = { name: 'Newman' };

greet.bind(obj1)() // => hello world
greet.bind(obj2)() // => hello Newman
greet.bind(obj1).bind(obj2)() // => hello world

Bindings for other arguments accumulate, though:

function add(a, b) {
  return a + b;
}

add.bind(null, 1)(2) // => 3
add.bind(null, 1).bind(null, 2)() // => 3
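Two related consequences, sketched below: call and apply can’t override a bound this either, and each partial application shows up in the bound function’s length, which drops by the number of arguments bound so far:

```javascript
function greet() {
  return 'hello ' + this.name;
}

function add(a, b) {
  return a + b;
}

const bound = greet.bind({ name: 'world' });
// The bound `this` wins over the one passed to call.
console.log(bound.call({ name: 'Newman' })); // => 'hello world'

// length reflects the parameters still left to supply.
console.log(add.length);                             // => 2
console.log(add.bind(null, 1).length);               // => 1
console.log(add.bind(null, 1).bind(null, 2).length); // => 0
```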

Proxy objects are very sneaky

I wanted the serializer for my tracer to record all the details about any object it encounters, without accidentally invoking any code associated with the object. This means, for example, that it uses Object.getOwnPropertyDescriptor(obj, key) instead of obj[key], so that if the key refers to a getter, the getter won’t be called.
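A minimal sketch of that descriptor-based inspection, assuming all you want is a flat summary of own properties; inspect is a hypothetical helper, and accessors are reported as a placeholder instead of being invoked:

```javascript
let getterCalls = 0;
const obj = {
  plain: 1,
  get expensive() {
    getterCalls++;
    return 42;
  },
};

// Hypothetical passive inspector: reads descriptors, never values via obj[key].
function inspect(target) {
  const result = {};
  for (const key of Reflect.ownKeys(target)) {
    const desc = Object.getOwnPropertyDescriptor(target, key);
    // Data descriptors carry `value`; accessor descriptors carry get/set instead.
    result[String(key)] = 'value' in desc ? desc.value : '[accessor]';
  }
  return result;
}

console.log(inspect(obj)); // => { plain: 1, expensive: '[accessor]' }
console.log(getterCalls);  // => 0
```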

For better or worse, there’s at least one type of object for which this passive inspection is unachievable: proxies.

const obj = { description: 'jejune' };
let count = 0;
const proxy = new Proxy(obj, {
  get(target, prop, receiver) {
    count++;
    return Reflect.get(target, prop, receiver);
  },

  getOwnPropertyDescriptor(target, prop) {
    count++;
    return Object.getOwnPropertyDescriptor(target, prop);
  },
});

proxy.description // => 'jejune'
Object.getOwnPropertyDescriptor(proxy, 'description').value // => 'jejune'
count // => 2

To my knowledge, there’s not a reliable cross-platform way to determine if an object is a Proxy or to get access to the target or handler without invoking the handler.

However, as someone pointed out on StackOverflow, you can redefine the Proxy constructor itself to let you spy on any subsequently-created proxies:

const knownProxies = new WeakMap();

Proxy = new Proxy(Proxy, {
  construct(func, args) {
    const proxy = new func(...args);
    knownProxies.set(proxy, args);
    return proxy;
  }
});

function isProxy(obj) {
  return knownProxies.has(obj);
}

function getProxyTarget(obj) {
  return knownProxies.get(obj)[0];
}

function getProxyHandler(obj) {
  return knownProxies.get(obj)[1];
}
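One gap in that approach, as far as I can tell, is Proxy.revocable, which creates proxies without going through the constructor. The sketch below extends the same idea by also intercepting reads of revocable on the wrapper; the handler shape and the knownProxies registry are my own assumptions, and only proxies created after this code runs are tracked:

```javascript
// Registry of proxies created after this code runs: proxy -> [target, handler].
const knownProxies = new WeakMap();
const OriginalProxy = Proxy;

globalThis.Proxy = new OriginalProxy(OriginalProxy, {
  // `new Proxy(target, handler)` lands here.
  construct(func, args) {
    const proxy = new func(...args);
    knownProxies.set(proxy, args);
    return proxy;
  },
  // Reading Proxy.revocable lands here; hand back a recording wrapper.
  get(func, prop, receiver) {
    if (prop === 'revocable') {
      return function revocable(target, handler) {
        const result = func.revocable(target, handler);
        knownProxies.set(result.proxy, [target, handler]);
        return result;
      };
    }
    return Reflect.get(func, prop, receiver);
  },
});

let trapCalls = 0;
const handler = {
  get(target, prop, receiver) {
    trapCalls++;
    return Reflect.get(target, prop, receiver);
  },
};

const plain = new Proxy({ secret: 42 }, handler);
const { proxy: revocableProxy, revoke } = Proxy.revocable({}, handler);

// WeakMap lookups compare identity, so they never invoke the handler.
console.log(knownProxies.has(plain), knownProxies.has(revocableProxy)); // => true true
console.log(trapCalls); // => 0
revoke();
```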

¹ The encoder has to keep track of what objects it has seen before and what ID it assigned them. If it used a regular Map, no object you serialize could ever be garbage collected, because it would stay in the encoder’s map forever. An alternative approach would be for the encoder to attach a property to the original object containing the assigned ID, but this carries greater risk of causing problems or inconvenience elsewhere in the program.