Node.js Startup: Removing code cache copies


In an earlier post of this series, we successfully captured a profile of Node.js’s startup. I noticed from the CPU profiles that there was a lot of memory copying:

Finding the sources of these copies was pretty easy: I set a breakpoint on memmove, looked at the backtrace to figure out why the copy was occurring, and did some refactoring to remove it. A lot of the memory copies came from unnecessary copies of the “code cache”. JavaScript typically gets compiled into bytecode before execution. Node.js also stores the compiled bytecode for all of its builtin modules. (For those familiar with Python, this is like __pycache__, but only for builtin modules.)

One interesting (& in retrospect, obvious) source of copies was code like this:

const uint8_t data[] = {1, 2, 3, 4};

std::vector<uint8_t> data_vec{data, data + sizeof(data)};

Creating data_vec requires copying data onto the heap. This was pretty easy to fix by changing the consumers to work with a const uint8_t* pointer and a length instead.
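As a minimal sketch of the idea (the function name is illustrative, not Node.js's actual code): a consumer that takes a pointer and a length can read the static array in place, with no heap allocation or copy.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical consumer that previously took a std::vector<uint8_t>
// (forcing a heap copy of the static array) and now reads the bytes
// directly from the read-only data section.
size_t ChecksumInPlace(const uint8_t* data, size_t length) {
    size_t sum = 0;
    for (size_t i = 0; i < length; i++) sum += data[i];
    return sum;
}
```

In newer C++ a std::span<const uint8_t> parameter expresses the same thing while keeping the pointer and length bundled together.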

While fixing this issue, I noticed something strange about the bytecode cache. Here is a pseudo-code implementation of the bytecode cache:

compileModule(string id, string function) {
    cached_bytecode = NULL;
    if (cache.has(id)) {
        cached_bytecode = cache.remove(id);
    }

    compiled_function = Compile(function, cached_bytecode);
    // (Compile consumes cached_bytecode,
    //  it can no longer be used.)

    cache.add(id, compiled_function.get_bytecode());
}

Unfortunately, re-populating the cache with compiled_function.get_bytecode() was not cheap! In fact, the bytecode cache was slowing down compilation rather than speeding it up: it would have been cheaper (from a startup perspective) to have no cache at all.

The solution is pretty simple: we leave the bytecode in the cache instead of removing it. This requires changing some code so that cached_bytecode can be reused multiple times, but that was pretty tractable. In addition to speeding up the “empty” startup, this made the average Node.js builtin module around 20% faster to require, which should speed up startup even more for real-world use-cases.
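The fixed flow can be sketched like this (names and types are illustrative, not Node.js's actual internals; the "compiler" is stubbed out): the cache is consulted with a non-consuming lookup, so a hit reuses the stored bytecode in place instead of removing it and paying to re-insert a fresh copy.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative bytecode cache keyed by module id.
struct ByteCache {
    std::map<std::string, std::vector<uint8_t>> entries;

    // Non-consuming lookup: returns a pointer into the cache, or nullptr
    // on a miss. The entry stays in the cache for the next caller.
    const std::vector<uint8_t>* Peek(const std::string& id) const {
        auto it = entries.find(id);
        return it == entries.end() ? nullptr : &it->second;
    }
};

struct CompileResult {
    std::vector<uint8_t> bytecode;
    bool used_cache;
};

CompileResult CompileModule(ByteCache& cache, const std::string& id,
                            const std::string& source) {
    if (const std::vector<uint8_t>* cached = cache.Peek(id)) {
        // Cache hit: reuse the bytecode in place. No removal, and no
        // expensive re-population via get_bytecode().
        return {*cached, true};
    }
    // Cache miss: "compile" (stubbed here as copying the source bytes)
    // and populate the cache once.
    std::vector<uint8_t> bytecode(source.begin(), source.end());
    cache.entries[id] = bytecode;
    return {bytecode, false};
}
```

The key change from the pseudo-code above is that Compile no longer consumes cached_bytecode, so the cache entry survives across compilations.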

Combining this optimization with another that removes one more unnecessary copy, we get a nice 12% improvement in startup time:

and by removing the extra copies floating around in memory, we also get a solid decrease in unique set size (USS, the memory not shared with any other process):

Posted on 2023-05-22