Unwinding Node.js/V8 Javascript stacks in eBPF
Tags: low-level
ustackjs is a Node.js/V8 Javascript stack unwinder in eBPF, allowing you to view backtraces, including the Javascript frames, for calls into native C++ code. It is available here.
To see how to use it, let’s consider the following Javascript program.
function foo() { return new Uint8Array(1024); }
function bar() { foo(); }
bar();
Allocating a Uint8Array eventually calls a C++ function known as “v8::Isolate::AdjustAmountOfExternalAllocatedMemory”. You can trace calls to this function using ustackjs. First, we will need to get the mangled name:
$ nm `which node` | grep AdjustAmountOfExternalAllocatedMemory
0000000000b9ac00 T _ZN2v87Isolate37AdjustAmountOfExternalAllocatedMemoryEl
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
...
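If you would rather script this lookup than eyeball the nm output, something like the following could work. This is only a sketch: find_text_symbols and find_in_binary are hypothetical helpers of mine and not part of ustackjs, and I am assuming the default nm output layout of "address type name" shown above.

```python
import subprocess


def find_text_symbols(nm_output, needle):
    """Return (address, mangled_name) pairs for code symbols containing needle.

    Assumes the default `nm` layout: "<hex address> <type letter> <name>".
    """
    matches = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. undefined symbols, which have no address column
        address, kind, name = parts
        # "T"/"t" mark symbols in the text (code) section.
        if kind in ("T", "t") and needle in name:
            matches.append((int(address, 16), name))
    return matches


def find_in_binary(path, needle):
    """Run `nm` on a binary and filter its output (hypothetical wrapper)."""
    result = subprocess.run(
        ["nm", path], capture_output=True, text=True, check=True
    )
    return find_text_symbols(result.stdout, needle)
```

For example, find_in_binary("/usr/bin/node", "AdjustAmountOfExternalAllocatedMemory") would return the candidates to pass to ustackjs.py, assuming that is where your node binary lives.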
Now we can pass that to ustackjs.py to see all the callsites.
$ sudo python3 ustackjs.py --node `which node` \
_ZN2v87Isolate37AdjustAmountOfExternalAllocatedMemoryEl
5845 5845 5845 12888.364267112: 1 event:
561614bf9f40 v8::internal::Builtin_ArrayBufferConstructor(int, unsigned long*, v8::internal::Isolate*)+0x120 ([node])
5616155ecd79 Builtins_CEntry_Return1_DontSaveFPRegs_ArgvOnStack_BuiltinExit+0x39 ([node])
56161556e7ec Builtins_JSBuiltinsConstructStub+0xec ([node])
561615660652 Builtins_CreateTypedArray+0x892 ([node])
5616155de2c7 Builtins_TypedArrayConstructor+0x87 ([node])
56161556e7ec Uint8Array ([js])
561595706add [unknown]
561595706bb6 foo ([js])
561595706cb6 bar ([js])
# ...
There are both C++ functions, like Builtins_CreateTypedArray, and Javascript functions (foo and bar).
The output is in perf-script(1) format, which lets you import it into various tools like Speedscope.
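If you want to post-process the output yourself, for instance to count how often each call path shows up, a small script can collapse it into "folded" stacks (one line per unique stack, root first, frames joined by ";"). The sketch below is not part of ustackjs; parse_frame and fold_stacks are names I made up, and it assumes the layout shown above: a header line ending in ":" followed by one "address symbol ([module])" line per frame.

```python
import re
from collections import Counter

MODULE_RE = re.compile(r"\s+\(\[\w+\]\)$")     # trailing " ([node])" or " ([js])"
OFFSET_RE = re.compile(r"\+0x[0-9a-fA-F]+$")   # trailing "+0x120"
ADDRESS_RE = re.compile(r"^[0-9a-fA-F]+$")


def parse_frame(line):
    """Return the symbol on a frame line, or None if this is not a frame."""
    line = line.strip()
    if line.endswith(":"):
        return None  # event header, not a frame
    parts = line.split(None, 1)
    if len(parts) != 2 or not ADDRESS_RE.match(parts[0]):
        return None
    symbol = MODULE_RE.sub("", parts[1])
    return OFFSET_RE.sub("", symbol)


def fold_stacks(text):
    """Count identical stacks, keyed by 'root;...;leaf'."""
    folded = Counter()
    stack = []

    def flush():
        if stack:
            # Frames are printed leaf first, so reverse them to get root first.
            folded[";".join(reversed(stack))] += 1
            stack.clear()

    for line in text.splitlines():
        if line.strip().endswith(":"):
            flush()  # a new event starts here
            continue
        symbol = parse_frame(line)
        if symbol is not None:
            stack.append(symbol)
    flush()
    return folded
```

Feeding it the trace above would give a single entry whose key starts with "bar;foo;" and whose count is 1.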
This functionality of tracing calls to a particular native function is
useful, but of course it may not quite be what you want. For example,
you may want to trace system calls, capture the arguments passed to the
function, etc. Since the tool is open-source, you can modify ustackjs to your liking in order to capture whatever information you need.
To my knowledge, this is the first tool of its class for Node.js/V8.
Similar tools exist for other interpreted languages, like Python or
Ruby, but Javascript is particularly complicated because of its JIT. And while tools like perf or gdb can be used for native stack traces, neither of those supports low-overhead ways to get backtraces like eBPF can!
How it works
I adapted the algorithm here from llnode, which can get the backtrace from a coredump. There is not too much to it, since V8 actually makes unwinding pretty easy. Essentially the register rbp points to the saved frame pointer (i.e., the rbp of the previous function). For Javascript frames, you can traverse some nearby objects to eventually get to the name of the function. For C++ frames, you’ll realize that the frame isn’t a Javascript one because the Javascript unwinding fails. Then you can set rbp <- old rbp and keep unwinding, until you eventually reach some maximum limit or fail to unwind.
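To make the loop concrete, here is a minimal Python simulation of it. It is a sketch rather than the actual eBPF code: memory is a plain dict standing in for reads from the traced process, resolve_js_name is a hypothetical callback for the "traverse some nearby objects" step, and "[native]" is just a placeholder label I picked for frames where that step fails.

```python
def unwind(memory, rbp, resolve_js_name, max_depth=16):
    """Walk the frame-pointer chain starting at rbp.

    memory maps addresses to 8-byte words. resolve_js_name(memory, rbp)
    returns a function name, or None when the frame is not a Javascript
    frame (i.e. the Javascript unwinding failed).
    """
    frames = []
    for _ in range(max_depth):           # stop at the maximum limit
        saved_rbp = memory.get(rbp)      # [rbp] holds the previous rbp
        if saved_rbp is None:            # failed read: stop unwinding
            break
        name = resolve_js_name(memory, rbp)
        frames.append(name if name is not None else "[native]")
        rbp = saved_rbp                  # rbp <- old rbp
    return frames


# Three fake frames: foo -> (some C++ frame) -> bar, innermost first.
memory = {0x100: 0x200, 0x200: 0x300, 0x300: 0}
names = {0x100: "foo", 0x300: "bar"}
frames = unwind(memory, 0x100, lambda mem, fp: names.get(fp))
# frames == ["foo", "[native]", "bar"]
```

The max_depth bound matters more in eBPF than it does here, since the verifier needs loops to be bounded.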
Here are the gory details in a picture, showing the pointers you need to traverse to get the name of a Javascript function.
┌──────────────────────────┐
│      return address      │
├──────────────────────────┤
│  saved frame pointer   ◄─┼─ rbp
├──────────────────────────┤
│        "context"         │
├──────────────────────────┤
│  JSFunction pointer  ────┼──┐
└──────────────────────────┘  │
                              │
┌──────────────────────────┐  │
│  JSFunction map        ◄─┼──┘
├──────────────────────────┤
│                          │
│           ...            │
│                          │
├──────────────────────────┤
│ SharedFunctionInfo ptr  ─┼───┐
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│ SharedFunctionInfo map ◄─┼───┘
├──────────────────────────┤
│                          │
│           ...            │
├──────────────────────────┤
│ name or scope info ptr  ─┼───┐
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│  ScopeInfo map         ◄─┼───┘
├──────────────────────────┤
│           ...            │
├──────────────────────────┤
│   context_local_count    │
├──────────────────────────┤
│       followed by        │
│ 2 * context_local_count  │
│       8-byte words       │
├──────────────────────────┤
│  name pointer           ─┼───┐
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│  String map            ◄─┼───┤
├──────────────────────────┤   │
│     length of string     │   │
├──────────────────────────┤   │
│    string data: "foo"    │   │
└──────────────────────────┘   │
                               │
┌──────────────────────────┐   │
│  Root map pointer      ◄─┼───┘
├──────────────────────────┤
│      instance type       │
└──────────────────────────┘
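The same pointer chase can be written as a short Python simulation over a fake heap. Treat it as a sketch under loud assumptions: every offset and instance-type value below is a made-up placeholder (the real ones depend on the V8 version), the heap is a dict rather than process memory, and it reads the exact name slot instead of probing several offsets.

```python
WORD = 8
# Hypothetical offsets and type tags, NOT real V8 constants.
FRAME_FUNCTION = -2 * WORD          # JSFunction slot, two words below [rbp]
HEAPOBJECT_MAP = 0                  # every heap object starts with its map
MAP_INSTANCE_TYPE = 8
JSFUNCTION_SHARED = 24
SHARED_NAME_OR_SCOPE_INFO = 8
SCOPEINFO_CONTEXT_LOCAL_COUNT = 16
SCOPEINFO_LOCALS_START = 24
STRING_LENGTH = 8
STRING_DATA = 16
ONE_BYTE_SEQ_STRING_TYPE = 0x08
SCOPE_INFO_TYPE = 0x90


def instance_type(mem, obj):
    """Follow the object's map pointer and read the map's instance type."""
    return mem[mem[obj + HEAPOBJECT_MAP] + MAP_INSTANCE_TYPE]


def read_js_function_name(mem, rbp):
    """Return the function name for the frame at rbp, or None on failure."""
    try:
        function = mem[rbp + FRAME_FUNCTION]
        shared = mem[function + JSFUNCTION_SHARED]
        obj = mem[shared + SHARED_NAME_OR_SCOPE_INFO]
        if instance_type(mem, obj) == SCOPE_INFO_TYPE:
            # Skip 2 * context_local_count words to reach the name pointer.
            count = mem[obj + SCOPEINFO_CONTEXT_LOCAL_COUNT]
            obj = mem[obj + SCOPEINFO_LOCALS_START + 2 * count * WORD]
        if instance_type(mem, obj) != ONE_BYTE_SEQ_STRING_TYPE:
            return None  # only one-byte seq strings are handled
        length = mem[obj + STRING_LENGTH]
        return mem[obj + STRING_DATA][:length].decode("latin-1")
    except KeyError:
        return None  # a read failed, so this is not a Javascript frame


def build_fake_heap():
    """Build the objects from the picture for a function named foo."""
    mem = {}
    string_map, scope_map = 0x1000, 0x1100
    name, scope, shared, function, rbp = 0x2000, 0x3000, 0x4000, 0x5000, 0x7000
    mem[string_map + MAP_INSTANCE_TYPE] = ONE_BYTE_SEQ_STRING_TYPE
    mem[scope_map + MAP_INSTANCE_TYPE] = SCOPE_INFO_TYPE
    mem[name + HEAPOBJECT_MAP] = string_map
    mem[name + STRING_LENGTH] = 3
    mem[name + STRING_DATA] = b"foo"
    mem[scope + HEAPOBJECT_MAP] = scope_map
    mem[scope + SCOPEINFO_CONTEXT_LOCAL_COUNT] = 1
    mem[scope + SCOPEINFO_LOCALS_START + 2 * 1 * WORD] = name
    mem[shared + SHARED_NAME_OR_SCOPE_INFO] = scope
    mem[function + JSFUNCTION_SHARED] = shared
    mem[rbp + FRAME_FUNCTION] = function
    return mem, rbp
```

Calling read_js_function_name(*build_fake_heap()) walks frame, JSFunction, SharedFunctionInfo, ScopeInfo and String in turn and returns "foo"; any failed read along the way returns None, which is the signal to treat the frame as a C++ one.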
Is it safe to use?
I have not formally measured the overhead of this; my current guesstimate is somewhere in the double-digit microseconds per stack trace. I recommend running with low values for --max-depth and --max-function-name-length, and turning them up carefully until you get enough information to debug whatever you’re looking at.
What’s left to do?
This was more of a proof-of-concept, although I’ve already found it surprisingly useful. Here are some things which are missing and could be nice additions:
- Precisely quantify the performance impact.
- Get various constants from the binary, rather than hardcoding them. llnode uses “v8dbg_*” symbols, which lets it work across different Node.js versions.
- Cache some data until a GC occurs. This can avoid pointer chasing, which is probably not great for performance.
- Support other types of strings. Currently we only support “one-byte seq strings”. This means that we can’t always print out the Javascript function name: for example, we’ll choke on any name that needs Unicode characters outside the one-byte range. There is a tradeoff here in that supporting more types of strings is slower.
- Don’t probe 4 slots for the right string. Right now this copies a hack from llnode: it actually tries multiple offsets in ScopeInfo to find the location of the function name. We should be able to get the exact slot at the cost of some additional code complexity.
- Some support for inlining. It would be nice if we could optionally detect inlining and correctly unwind it. This needs to be optional as the performance hit is likely quite high.
- Support WebAssembly functions.