Laymen’s JS spec
While reading this, you are strongly advised to have a JS console open, so you can try things out for yourself. This document won't explain the syntactical oddities of JS, just the runtime semantics, for someone looking to implement a JS runtime.
1. Value types
- Undefined and null - two distinct values that both mean "no value" and behave almost identically
- Booleans (true, false)
- Numbers (64 bit floats)
- Symbols - unique and special "marker" GC-able values that contain nothing
- BigInts - integers with no size restrictions
- Strings - characters are of a 16-bit width, and may contain \0
- Objects - a string to value dictionary, with a prototype (later on)
- Functions - an object that can be called, basically an object with a func ptr appended to it
- Arrays - Objects that have special behaviors when accessing number members, as well as “length”
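A quick way to see these categories from the console is the typeof operator. The snippet below is only an illustration (the variable names are made up); note that typeof reports null as "object" and has no separate answer for arrays.

```javascript
// typeof maps each runtime value to a coarse type name.
const kinds = [
    typeof undefined,      // "undefined"
    typeof null,           // "object" (a historical quirk, null is not an object)
    typeof true,           // "boolean"
    typeof 1.5,            // "number"
    typeof Symbol("tag"),  // "symbol"
    typeof 10n,            // "bigint"
    typeof "text",         // "string"
    typeof {},             // "object"
    typeof function () {}, // "function"
    typeof [],             // "object" (use Array.isArray to tell arrays apart)
];
console.log(kinds.join(", "));

// "\0" is an ordinary character inside a string.
const withNul = "a\0b";
console.log(withNul.length); // 3
```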
2. Objects
2.1. Object basics
In layman's terms, a JS object is just a string to value hashmap. When we do a member lookup, we have to just convert the lookup key to a string and search for it in the hashmap (of course there's more to it, but let's get to it later).
NOTE: Keys are stored in order, so if you set "a", "b" and "c" in that order, iterating the object's keys will yield them in the same order - a, b and c. You can try it out in the console, by writing out the object literal ({ a: 10, b: 20, c: 30 }). You can shuffle the properties around, and they will keep the order you wrote them in. One exception: keys that look like array indices (such as "0" or "42") are always enumerated first, in ascending numeric order.
Objects define a few base operations for members:
- Member lookup (get the member's value)
- Member setting
- Member query (check if we have the member)
- Member removal
- Member definition (for now, it might look the same as member set, but we'll talk about it later)
- Member enumeration (in order)
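For reference, here is roughly how each of these operations looks from the JS side. This is a sketch with made-up names, not a description of how an engine implements them.

```javascript
const obj = { a: 10 };

const looked = obj.a;                 // member lookup -> 10
obj.b = 20;                           // member setting
const hasB = "b" in obj;              // member query -> true
delete obj.a;                         // member removal
Object.defineProperty(obj, "c", {     // member definition
    value: 30, writable: true, enumerable: true, configurable: true,
});
const keys = Object.keys(obj);        // member enumeration, in insertion order

console.log(looked, hasB, keys);      // 10 true [ 'b', 'c' ]
```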
2.2. Object prototypes
Prototypes are at the base of JS's OOP model. In short, a prototype is an object that the lookup will "fall back" to if we don't find the member in our object. Every JS object can have a prototype (some objects have what's called a null prototype). Since all objects can have a prototype, we can have object A that has prototype B, and B can have prototype C. In this case, C can define "foo", B can define "bar" and A can define "baz". When we look up "foo" in A, the lookup will fall back on B, which will fall back on C, and then the member will be found.
var A = { baz: 1 };
var B = { bar: 2 };
var C = { foo: 3 };
Object.setPrototypeOf(A, B);
Object.setPrototypeOf(B, C);
A.foo // 3
B.foo // 3
A.bar // 2
C.foo = 10;
A.foo // 10
B.foo // 10
Prototypes, however, make it ambiguous which object an operation acts upon. This is why JS defines two kinds of member operations - normal ones, and ones restricted to the current object (aka they ignore the prototype). With this in mind, our actual list of member operations is the following:
- Own member lookup
- Own member set
- Own member query
- Own member enumeration
- Own member removal
- Own member definition (again, we will get to the difference from set)
- Normal member lookup
- Normal member set
- Normal member query
- Normal member enumeration
Note how we can only define and remove members on the specific object. In short, this is because we almost never want to delete a member from the prototype if we don't find it in the object itself. A prototype may be used by thousands of objects, and if one of them accidentally deletes a property of the prototype, it will break the rest. The same logic follows for member definition.
Additionally, we can define three more operations with the prototype:
- Prototype get
- Prototype set
- Prototype remove
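The difference between the own and the normal operations, as well as the three prototype operations, can be observed with standard built-ins. A small sketch (the object names are illustrative):

```javascript
const proto = { inherited: 1 };
const child = Object.create(proto);   // child's prototype is proto
child.own = 2;

// Normal operations walk the prototype chain.
const normalQuery = "inherited" in child;            // true
const normalLookup = child.inherited;                // 1
const normalKeys = [];
for (const key in child) normalKeys.push(key);       // own keys first, then inherited

// Own operations look at child only.
const ownQuery = Object.prototype.hasOwnProperty.call(child, "inherited"); // false
const ownKeys = Object.keys(child);                  // ["own"]

// Prototype get, set and remove (removal is just setting it to null).
const protoBefore = Object.getPrototypeOf(child) === proto; // true
Object.setPrototypeOf(child, null);
const afterRemoval = child.inherited;                // undefined, nothing to fall back on
```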
2.3. Properties and fields
Until now, we've looked at just members, which were just JS values. However, the JS model is far more flexible. First of all, we have two types of members: fields and properties.
Fields are the simpler member - they are just a value, with the following additional options:
- writable - if false, sets to this member will not have an effect
- configurable - we'll get to it
- enumerable - if false, the enumeration will skip over this member by default
Properties are the more complex counterpart of fields - when we try to get them, a user-defined JS function will be called, and its return value will be returned as the member's value. Similarly, a user-defined JS function with one argument is called when setting the value of the member. Those two functions - the getter and the setter - are both optional (aka a user may define only the getter, making the property effectively read-only).
Properties have the same flags as fields, except for writable - as was said above, the setter can just be omitted.
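To make the distinction concrete, here is a sketch that defines one field and one property on the same object. The object and member names are invented for the example; Reflect.set is used because it reports failure as a boolean instead of throwing or silently doing nothing.

```javascript
const account = {};

// A field: a stored value plus flags.
Object.defineProperty(account, "id", {
    value: 7, writable: false, enumerable: true, configurable: false,
});

// A property: a getter without a setter, so it is effectively read-only.
let reads = 0;
Object.defineProperty(account, "label", {
    get() { reads++; return "account #" + this.id; },
    enumerable: false, configurable: true,
});

const setWorked = Reflect.set(account, "id", 8); // false: the field is not writable
const label = account.label;                      // runs the getter
const visible = Object.keys(account);             // ["id"], "label" is not enumerable
console.log(setWorked, account.id, label, reads, visible);
```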
2.4. Member configuration (redefinition) and deletion
Since members have the above mentioned flags, we can "tweak" them under certain circumstances. This is called a configuration. The act of redefining or deleting a member, to be precise, is a configuration.
As you might've spotted, we have a "configurable" flag. This flag is used to "lock down" the member, preventing people from redefining it. After setting "configurable" to false, we can no longer delete the member or redefine it freely. In fact, just one kind of redefinition is still allowed for non-configurable members - if the member is a writable field, we are allowed to redefine it as a field with the same enumerable and configurable flags, optionally with a new value and with "writable" set to false.
Apart from that, if we try to redefine a non-configurable member with the exact same parameters, the operation will succeed, and nothing will change.
Another caveat is that when a configurable property is redefined as a property, it isn't overwritten outright: if the new definition leaves out the getter or the setter, the respective function is carried over from the old definition:
var obj = {};
Object.defineProperty(obj, "test", { get: () => 10, enumerable: true, configurable: true });
Object.getOwnPropertyDescriptor(obj, "test");
/*
{
    get: [Function],
    set: undefined,
    configurable: true,
    enumerable: true,
}
*/
// Redefine with only a setter: the getter is carried over from the old definition
Object.defineProperty(obj, "test", { set: v => console.log(v), enumerable: false, configurable: true });
Object.getOwnPropertyDescriptor(obj, "test");
/*
{
    get: [Function],
    set: [Function],
    configurable: true,
    enumerable: false,
}
*/
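The pseudocode for redefining a member is as follows (members is the object's own key-to-member map, and next is the requested definition):
// A member is a property if it carries a getter or a setter, otherwise it is a field
const isProperty = (member) => "get" in member || "set" in member;

function redefine(members, key, next) {
    const old = members.get(key);
    if (old.configurable) {
        // A property redefined as a property keeps the functions the new definition omits
        if (isProperty(old) && isProperty(next)) {
            next.get = next.get ?? old.get;
            next.set = next.set ?? old.set;
        }
        members.set(key, next);
        return true;
    }
    // From here on, the old member is non-configurable
    if (old.enumerable !== next.enumerable) return false;
    if (next.configurable) return false;
    if (isProperty(old) !== isProperty(next)) return false;
    if (isProperty(old)) {
        // Only an identical redefinition is accepted, and it changes nothing
        return old.get === next.get && old.set === next.set;
    }
    if (!old.writable) {
        // Read-only field: only an identical redefinition is accepted
        return !next.writable && Object.is(old.value, next.value);
    }
    // Writable field: the value may change and "writable" may be turned off
    members.set(key, next);
    return true;
}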
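These rules can be checked against a real engine. The sketch below uses Reflect.defineProperty, which returns false where Object.defineProperty would throw a TypeError; the object name is illustrative.

```javascript
const locked = {};
Object.defineProperty(locked, "x", {
    value: 1, writable: true, enumerable: true, configurable: false,
});

// Changing a flag other than "writable" is rejected.
const flipEnumerable = Reflect.defineProperty(locked, "x", { enumerable: false }); // false
// The one allowed transition: writable -> read-only.
const makeReadOnly = Reflect.defineProperty(locked, "x", { writable: false });     // true
// Redefining with the exact same parameters succeeds and changes nothing.
const sameAgain = Reflect.defineProperty(locked, "x", { value: 1, writable: false }); // true
// The field is read-only now, so neither the value nor the flag can change back.
const newValue = Reflect.defineProperty(locked, "x", { value: 2 });                // false
const backToWritable = Reflect.defineProperty(locked, "x", { writable: true });    // false
```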
Deletion is much more straightforward - we only need to check whether the member exists and is configurable. Note that a deletion will succeed if the member doesn't exist - it will fail only if the member exists and isn't configurable.
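The three possible outcomes of a deletion can be seen with Reflect.deleteProperty, which returns the result as a boolean (the plain delete operator throws instead in strict mode). The names below are made up for the example.

```javascript
const target = {};
Object.defineProperty(target, "locked", { value: 1, configurable: false });
Object.defineProperty(target, "loose", { value: 2, configurable: true });

const deletedMissing = Reflect.deleteProperty(target, "nope");   // true: nothing to delete
const deletedLoose = Reflect.deleteProperty(target, "loose");    // true: configurable
const deletedLocked = Reflect.deleteProperty(target, "locked");  // false: non-configurable
```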
2.5. Normal, non-extensible, sealed and frozen objects
These are the four states of a JS object, which will determine to what extent we can modify it.
- Normal - all the semantics explained so far apply unchanged
- Non-extensible - we can't define new members, but we are still allowed to redefine and delete the existing ones. The prototype of the object is also locked as-is
- Sealed - Same as non-extensible, but all the members are non-configurable, too (can't delete them and can't redefine them, except for the mentioned mechanics)
- Frozen - Same as non-extensible, but all members are non-configurable and all fields are readonly. Properties retain their setters
Note that a runtime may implement only an extensible flag on the object, and the rest may be done from JS instead.
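As a sketch of that last point, freezing can be approximated in plain JS on top of the extensible flag alone. freezeFromJs is a hypothetical name, not a standard function, and this is only one way it could be written.

```javascript
// A userland approximation of Object.freeze built on Object.preventExtensions.
function freezeFromJs(obj) {
    Object.preventExtensions(obj);
    for (const key of Reflect.ownKeys(obj)) {
        const desc = Object.getOwnPropertyDescriptor(obj, key);
        const change = { configurable: false };
        if ("value" in desc) change.writable = false; // only fields become read-only
        Object.defineProperty(obj, key, change);
    }
    return obj;
}

let stored = 0;
const point = freezeFromJs({ x: 1, set y(v) { stored = v; } });
point.y = 5;                                                        // setters still run
const addWorked = Reflect.defineProperty(point, "z", { value: 3 }); // false: not extensible
const writeWorked = Reflect.set(point, "x", 2);                     // false: read-only field
console.log(Object.isFrozen(point), stored, addWorked, writeWorked); // true 5 false false
```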
3. Arrays
In short, arrays are nothing more than special-case objects. All arrays have the "length" member, as well as members with numeric keys, ranging from 0 to length - 1. These keys are arbitrary, and it's just a "contract" of sorts that they will represent elements of an array. JS arrays are sparse as well. This means that a JS array may define elements from 0 to 5, and then from 8 to 10. In this case, elements 6 and 7 are not members of the array (try it out by writing the [1,,2] literal in your console. You will see [1, <empty>, 2], and when you call Object.getOwnPropertyDescriptor([1,,2], "1"), you will get undefined, aka no such field). Another caveat is that when you define the member "5" of an array of length 3, the array's length will automagically update to 6 (because index 5 is the 6th element). Setting the "length" property of the array can have two effects: if the new length is bigger, the array just grows, and the new slots stay empty; if the new length is smaller, the array automagically shrinks to that length, deleting the members that fall outside its new bounds.
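The behaviors described above can be verified with a few lines in the console. The variable names are only for illustration.

```javascript
const sparse = [1, , 2];
const hasHole = 1 in sparse;                                    // false: index 1 is not a member
const holeDesc = Object.getOwnPropertyDescriptor(sparse, "1");  // undefined

const arr = ["a", "b", "c"];
arr[5] = "f";                         // defining index 5 bumps length to 6
const grownLength = arr.length;       // 6
const grownKeys = Object.keys(arr);   // ["0", "1", "2", "5"]

arr.length = 2;                       // shrinking deletes members past the new end
const shrunkKeys = Object.keys(arr);  // ["0", "1"]

arr.length = 4;                       // growing adds no members, only empty slots
const regrownKeys = Object.keys(arr); // still ["0", "1"]
```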
Most sane engines of course define a special case object for arrays, which are usually backed by a linear buffer. When you try to grow the array by setting members out of its bounds, it will grow its buffer to accommodate its new members. When deleting a member (which will make the array sparse), engines usually take one of two routes: they either enter panic mode and convert the whole array to a normal object, or just set a special "flag" value in place of the empty value.
Other nasty things the user can do are: defining a normal member (for example "pe6o" or "0.5") on the array, or defining a property (getter-setter member) with the name "0" on it. The first two examples are handled by just having an underlying object on speed dial. The last example however is more interesting - in such cases, most engines will either, again, enter panic mode and revert to object mode, or just overlay the array with a backing object.
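All three of these cases are legal JS, as the short sketch below shows (names are illustrative). Whatever strategy an engine picks internally, the observable results have to match.

```javascript
const odd = [10, 20];
odd.pe6o = "named";                // a plain string key
odd["0.5"] = "fraction";           // not a valid index, so also a plain key
const oddLength = odd.length;      // 2: neither key counts as an element

// Replace element 0 with a getter-backed property.
Object.defineProperty(odd, "0", { get: () => 99, enumerable: true, configurable: true });
const first = odd[0];              // 99, produced by the getter
const oddKeys = Object.keys(odd);  // indices first, then the string keys in insertion order
```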
4. Functions
Functions are at the core of what makes the JS clock tick. As in any language, they are the storage of JavaScript code that gets executed when the function value gets invoked. But JS functions are objects as well. This means that you can work with a function as you would with any other regular object - define properties of it, freeze it, list its keys, etc.
However, a function wouldn't be a function without its ability to be called. In JS, functions can be called in two distinct ways:
- apply - a normal call, aka my_func(a, b, c)
- construct - a call with new, aka new my_func(a, b, c)
Applying functions
When applying a function, you will pass all the arguments you put in the parens, but an implicit this argument will get passed, too. In most cases, this will be passed as undefined (in strict mode, that is; sloppy-mode functions see the global object instead). After that, the function will consume the arguments, execute its body and return the result. Then the result will become the evaluated value of the call expression.
Now, the JS syntax allows for one special way of calling - a member call. This is achieved by calling a member expression: a.b(c). In this case, the value of a will be passed as the implicit this argument of the call. This leads to a lot of JS gotchas, mostly when passing a method of an object as a value to somewhere else, which is when you get those "this is undefined" exceptions. This is easily resolved by calling a.member.bind(a), which will produce a new function that, when invoked, will replace the implicit this argument with the passed value instead.
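The gotcha and the bind fix look like this in practice. The counter object is invented for the example, and the "use strict" directive is there so that a plain call really receives undefined as this.

```javascript
const counter = {
    count: 0,
    increment() {
        "use strict";                     // plain calls get this === undefined
        this.count++;
        return this.count;
    },
};

const viaMember = counter.increment();    // member call: this is counter -> 1

const detached = counter.increment;       // the method passed around as a plain value
let failure = null;
try {
    detached();                           // plain call: this is undefined
} catch (err) {
    failure = err.name;                   // "TypeError"
}

const bound = counter.increment.bind(counter);
const viaBound = bound();                 // this is forced to counter -> 2
```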
Constructing functions
In JS, we achieve OOP by "simulating" a class with a function. When we call a function with the new prefix, we effectively call it in the special "construct" mode. In this mode, as the this argument, a special object is passed in that the function will modify. After the function evaluates, if its return value is a primitive, that special object is returned. Otherwise, the return value of the function becomes the evaluated value of the expression. In pseudocode, this is how a high level "construct" would look:
function construct(func, ...args) {
    const obj = {};
    const res = func.apply(obj, args);
    if (is_primitive(res)) return obj;
    else return res;
}
The Function.prototype member
This member is a special member of each function. It contains the object that will become the prototype (__proto__) of the newly created this for the constructed function. In it we define the instance methods, getters and setters of the class.
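Putting the construct mode and the prototype member together, a pre-ES6 new can be approximated as follows. construct here is a hypothetical helper for illustration, and it ignores edge cases such as a function whose prototype member is not an object.

```javascript
// A sketch of what `new func(...args)` does for plain functions.
function construct(func, ...args) {
    const obj = Object.create(func.prototype);  // the new this gets func.prototype as prototype
    const res = func.apply(obj, args);
    const isObject = (typeof res === "object" && res !== null) || typeof res === "function";
    return isObject ? res : obj;                // primitives are discarded in favor of obj
}

function Point(x, y) { this.x = x; this.y = y; }
Point.prototype.sum = function () { return this.x + this.y; };

const p1 = construct(Point, 1, 2);
const p2 = new Point(1, 2);
console.log(p1.sum(), p2.sum(), p1 instanceof Point); // 3 3 true
```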
The post ES6 construct method
ES6 introduced classes and inheritance, which led to the necessity of reworking the function model a little bit. In essence, the following was changed:
- In a constructor of a derived class, the this "variable" remains uninitialized until the super constructor has been called. When it is called, the object it produces becomes the value of this.
- The function is given the function that is being instantiated, instead of the this object. This is because the function may be the super class of another, and it needs to make the prototype of this equal to the prototype field of the derived class, instead of its own.
Implementing this change in the runtime however is somewhat trivial.
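Both changes are observable from JS: new.target exposes the function being instantiated, and Reflect.construct lets you pass it explicitly. A small sketch with made-up class names:

```javascript
class Base {
    constructor() {
        // new.target is the function that is actually being instantiated.
        this.createdBy = new.target.name;
    }
}

class Derived extends Base {
    constructor() {
        let early = "initialized";
        try { this.x = 1; } catch (err) { early = err.name; } // this is not ready yet
        super();                                               // now this exists
        this.early = early;
    }
}

const d = new Derived();
console.log(d.createdBy, d.early);                             // Derived ReferenceError
console.log(Object.getPrototypeOf(d) === Derived.prototype);   // true

// Run Base directly, but with Derived as the function being instantiated.
const viaReflect = Reflect.construct(Base, [], Derived);
console.log(viaReflect instanceof Derived);                    // true
```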
5. Variables
Variables in JS are simple: each function has a single repository of variables that are accessible from the inception of the function. When you declare a variable, it is not visible from the point of declaration, but instead from the point of the function start. This can lead to some confusion, but also makes the runtime implementation 10 times easier.
Capturing variables
Since in JS a function may be defined inside another, we need a mechanism via which the variables of the parent function can be made accessible to the child function. This is done via variable capturing. Basically, if during compilation it's determined that a function accesses a variable of its parent function, that variable is marked as "capturable", aka it is kept as a pointer to a value instead of just a value, so that other functions can share that pointer. The child function, in turn, keeps track of which variables it has captured and which parent variables they correspond to.
The runtime, however, only needs to keep track of the captured, capturable and regular variables, and provide a mechanism for constructing a function from the raw function body (aka the instructions, name and other metadata) and a list of the captured variable instances.
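One way to picture this is to model the mechanism in JS itself. In the sketch below, a capturable is a box object ({ value }), and makeFunction is a hypothetical runtime helper that pairs a raw body with its list of captured boxes; none of these names come from a real engine.

```javascript
// Hypothetical runtime helper: build a callable from a raw body and its captures.
function makeFunction(body, captures) {
    return (...args) => body(captures, ...args);
}

// Raw bodies have no free variables; they reach captured variables by index.
const incrementBody = (captures) => ++captures[0].value;
const readBody = (captures) => captures[0].value;

// Equivalent of: function makeCounter() { let count = 0; return { increment, read }; }
function makeCounter() {
    const count = { value: 0 };          // "count" is capturable, so it lives in a box
    return {
        increment: makeFunction(incrementBody, [count]),
        read: makeFunction(readBody, [count]),
    };
}

const first = makeCounter();
const second = makeCounter();
first.increment();
first.increment();
console.log(first.read(), second.read()); // 2 0
```

Both functions of one counter share the same box, while each call to makeCounter creates a fresh one.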
ES6 variables
Since the addition of the new let and const variables, the runtime model has been complicated a little bit - a function still has one repository of variables, but the runtime now needs to keep track of whether or not the variable has been initialized yet. This is because we can have code like this:
const a = () => console.log(b);
a();
const b = 10;
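For code like this, static analysis alone can't always tell whether a variable is initialized at the point of access (here, a() runs before b is initialized, so the access must throw). Of course, the compiler can feel free to omit these checks whenever it determines that the variable is definitely assigned, but the defined-ness of the variable must be guaranteed for it to be accessible.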
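The first half of the sketch below shows what the language does for such code; the second half is one hypothetical way a runtime could implement the check, using a sentinel value for uninitialized slots. UNINITIALIZED, readSlot and slot are invented names.

```javascript
// Language behavior: reading the binding before its declaration has run throws.
let outcome;
try {
    const show = () => later;       // refers to a binding declared further down
    show();                         // runs before `later` is initialized
    const later = 10;
    outcome = "no error";
} catch (err) {
    outcome = err.name;             // "ReferenceError"
}

// A hypothetical runtime-side check: every slot starts out holding a sentinel.
const UNINITIALIZED = Symbol("uninitialized");
function readSlot(slot, name) {
    if (slot.value === UNINITIALIZED) {
        throw new ReferenceError(name + " is not initialized");
    }
    return slot.value;
}

const slot = { value: UNINITIALIZED };
let sketchOutcome = "no error";
try { readSlot(slot, "b"); } catch (err) { sketchOutcome = err.name; }
slot.value = 10;                        // the declaration has now executed
const afterInit = readSlot(slot, "b");  // 10
```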
Another consequence of the new variable model is that we can have code like this:
const funcs = [];
for (let i = 0; i < 100; i++) {
funcs[i] = () => i;
}
console.log(funcs[69]()); // prints 69
In this case, for each iteration of the loop a new capturable instance needs to be created. This requires the runtime to have a mechanism of dynamically creating (and destroying) capturables.
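The contrast with var makes the per-iteration instances visible. In this sketch (names are illustrative), the var loop shares one variable across all closures, while the let loop gives each closure its own.

```javascript
const withVar = [];
for (var i = 0; i < 3; i++) withVar[i] = () => i;   // one shared `i` for the whole function

const withLet = [];
for (let j = 0; j < 3; j++) withLet[j] = () => j;   // a fresh `j` for every iteration

const varResults = withVar.map((f) => f());         // [3, 3, 3]
const letResults = withLet.map((f) => f());         // [0, 1, 2]
console.log(varResults, letResults);
```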
The global scope
When the compiler tries to resolve a variable, but doesn't find it anywhere along the scope chain, the variable is converted to a global variable access. In essence, this is just a fancier way to get a field from an implicit "globals" object. However, unlike the traditional my_obj.test, where undefined is returned if test is not a member of my_obj, here an error reporting that no such variable exists is thrown. The same mechanic applies when trying to assign to a nonexistent property of the global scope (in strict mode, that is; sloppy-mode code silently creates the global instead).
The usual way of defining a new value in the global scope is by using var name = value at the top level of a script. This statement gets converted more or less 1 to 1 to globalThis.name = value (globalThis, for the confused, is just a reference to the global object. Depending on the environment, alternatives are self, window and just global).
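The difference between a missing member and a missing global can be seen in a few lines. The identifiers below are deliberately unusual made-up names, so that they don't collide with anything real.

```javascript
const holder = {};
const missingMember = holder.test;            // undefined, no error

let readError = null;
try {
    hypotheticalUndeclaredName;               // not found anywhere along the scope chain
} catch (err) {
    readError = err.name;                     // "ReferenceError"
}

// Once the global object has the member, the bare name resolves to it.
globalThis.hypotheticalGlobalName = 42;
const viaBareName = hypotheticalGlobalName;   // 42

// typeof is the one read that never throws for a missing global.
const probe = typeof anotherUndeclaredName;   // "undefined"
```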