Node.js Startup: Removing code cache copies
Tags: nodejs-startup, low-level
In an earlier post of this series, we successfully captured a profile of Node.js’s startup. I noticed from the CPU profiles that there was a lot of memory copying. Finding the sources of these copies was pretty easy: I set a breakpoint on `memmove`, looked at the backtrace to figure out why the copy was occurring, and did some refactoring to remove it. A lot of the memory copies came from unnecessary copies of the “code cache”.
JavaScript typically gets compiled into bytecode before execution. Node.js also stores the compiled bytecode for all of its builtin modules. (For those familiar with Python, this is like a version of `__pycache__`, only for builtin modules.)
One interesting (& in retrospect, obvious) source of copies was code like this:

```cpp
const uint8_t data[] = {1, 2, 3, 4};
std::vector<uint8_t> data_vec{data, data + sizeof(data)};
```

Creating `data_vec` requires copying `data` onto the heap. This was pretty easy to fix by just changing the users to deal with a `const uint8_t*` instead.
While fixing this issue, I noticed something strange about the bytecode cache. Here is a pseudo-code implementation of the bytecode cache:

```cpp
compileModule(string id, string function) {
  cached_bytecode = NULL;
  if (cache.has(id)) {
    cached_bytecode = cache.remove(id);
  }
  compiled_function = Compile(function, cached_bytecode);
  // (Compile consumes cached_bytecode,
  // it can no longer be used.)
  cache.add(id, compiled_function.get_bytecode());
}
```
Unfortunately, re-populating the cache with `compiled_function.get_bytecode()` was not cheap! In fact, the bytecode cache was slowing down compilation rather than speeding it up: it would have been cheaper (from a startup perspective) to not have the cache at all.
The solution is pretty simple: we leave the bytecode in the cache instead of removing it. This requires changing some code so that `cached_bytecode` can be reused multiple times, but that was pretty tractable. In addition to speeding up the “empty” startup, this made the average Node.js builtin module around 20% faster to require, which should speed up startup even more for real-world use-cases.
Combining this optimization with another one that removes one more unnecessary copy, we get a nice 12% improvement in startup time. And by removing the extra copies floating around in memory, we also get a solid decrease in unique set size.