[exploration] More precise breakpoint filtering for better debugging performance
Categories
(Core :: JavaScript: WebAssembly, task, P3)
People
(Reporter: lth, Unassigned)
References
(Blocks 1 open bug)
Attachments
(1 file)
Bug 1756951 removed breakpoint patching (in order to remove code cloning), but the cost of this is that we must filter breakpoints (per-instruction + function enter/leave) in some other way. It is much too expensive to call out the C++ debug trap handler for each breakpoint site, so the solution adopted in that bug was a per-function flag table that is checked inside a per-function stub. For functions that are not targeted by any breakpoints, the resulting filtering overhead is quite acceptable.
However, a function that is targeted by any breakpoint slows down by three orders of magnitude, as the debug trap handler is invoked for every breakpoint site within the function. This can become a practical problem for some kinds of debugging (cf. the impracticality of using conditional breakpoints in loops in, e.g., gdb). We could therefore benefit from more precise filtering. This bug collects ideas for such filtering.
Precise per-breakpoint filters
The most precise filter is per-breakpoint-site. For this, we need one bit per site that could have a breakpoint. We would need to compute the table index from the breakpoint site location. The filtering code is called from the breakpoint site so it has access to the address of the site; the site address is therefore just the return address minus some base address.
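A minimal sketch of such a per-site bit table, assuming the filtering stub can compute the site's code offset as the return address minus the code base (all names here are hypothetical, not SpiderMonkey's actual ones):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per potential breakpoint site, indexed by code offset.
struct BreakpointBitTable {
  std::vector<uint8_t> bits;
  explicit BreakpointBitTable(size_t codeBytes) : bits((codeBytes + 7) / 8, 0) {}

  void set(uintptr_t offset)   { bits[offset >> 3] |= uint8_t(1u << (offset & 7)); }
  void clear(uintptr_t offset) { bits[offset >> 3] &= uint8_t(~(1u << (offset & 7))); }
  bool test(uintptr_t offset) const { return (bits[offset >> 3] >> (offset & 7)) & 1; }
};

// In the filtering stub: call out to the C++ debug trap handler only if the
// bit for this site is set. The site offset is returnAddress - codeBase.
bool shouldTrap(const BreakpointBitTable& table,
                uintptr_t returnAddress, uintptr_t codeBase) {
  return table.test(returnAddress - codeBase);
}
```

With this filter, sites without breakpoints pay only the bit lookup rather than a full call into C++.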
On x86 any code location can be a breakpoint site - code is not aligned - so we'd get about a 12.5% memory increase (one bit per eight bits). This is pretty expensive, and it would be worth investigating whether somehow aligning breakpoint sites would be better, reducing the overhead to 3.1% (one bit per 32 bits). We'd pay on average 2 bytes of padding per site for alignment in the code, but if breakpoint sites are rare enough the total memory overhead might be lower. Still, data memory is cheaper than code memory, so it might be better to pay for a large data table.
On arm64, the code location is already aligned, so the overhead of a per-instruction table would be 3.1%, probably fairly reasonable in practice.
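The overhead figures above follow directly from one filter bit per N bytes of code; a trivial helper (purely illustrative) makes the arithmetic explicit:

```cpp
// Data-table overhead, as a percentage of code size, when the filter keeps
// one bit per `granularityBytes` bytes of code.
double overheadPercent(unsigned granularityBytes) {
  return 100.0 / (8.0 * granularityBytes);
}
// overheadPercent(1) == 12.5  (x86, unaligned: one bit per code byte)
// overheadPercent(4) == 3.125 (arm64, or x86 with 32-bit-aligned sites)
```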
Imprecise per-coderange filters
Should we still decide that we need to skimp on memory, we can lump ranges of code together. For example, on x86, we could have a bit per four bytes even if we don't align code. There's a small risk that one breakpoint's bit would also cover another site, but since such sites would be close together the imprecision would rarely matter in practice.
Even lumping 16 bytes of code together we'd be down to less than 1% overhead, and 32 bytes of code gives us about 0.4% overhead. In practice, no functions are smaller than 16 bytes and all functions are 16-byte aligned, so a 16-byte range would naturally incorporate per-function filtering as well.
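The coderange variant only changes how the table index is derived from the site offset: shift by the log2 of the range size before indexing. A sketch under the same hypothetical naming as above:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One bit per 2^LOG2_RANGE bytes of code; all sites within a range share a bit.
template <unsigned LOG2_RANGE>
struct RangeBitTable {
  std::vector<uint8_t> bits;
  explicit RangeBitTable(size_t codeBytes)
      : bits(((codeBytes >> LOG2_RANGE) + 7) / 8, 0) {}

  void set(uintptr_t offset) {
    size_t i = offset >> LOG2_RANGE;
    bits[i >> 3] |= uint8_t(1u << (i & 7));
  }
  bool test(uintptr_t offset) const {
    size_t i = offset >> LOG2_RANGE;
    return (bits[i >> 3] >> (i & 7)) & 1;
  }
};
```

Note the deliberate imprecision: with 16-byte ranges, setting a breakpoint at offset 0x23 also makes every site in 0x20..0x2F pass the filter, so those sites fall through to the (slower) debug trap handler, which must still do the exact check.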
Filtering cost
The current filter first checks whether the instance is being debugged at all via an inline check (checking whether the breakpoint handler is installed), calling the filtering stub if so. The filtering stub then looks up the function index in the table to determine whether the function is being debugged. This second filter could be replaced by the more precise filter, or, if the bit lookup turns out to be too expensive, could be augmented with a third, code-address filter. Only benchmarking will provide an answer here.
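The layering described above can be sketched as a cascade of cheap checks, each eliminating calls to the C++ handler earlier. The `Instance` fields and function names here are illustrative placeholders, not SpiderMonkey's actual data structures:

```cpp
#include <cstdint>
#include <vector>

struct Instance {
  void* debugTrapHandler = nullptr;      // null => instance not being debugged
  std::vector<bool> funcHasBreakpoints;  // per-function flag table
  std::vector<bool> siteBits;            // precise per-site filter, offset-indexed
};

bool shouldCallDebugTrapHandler(const Instance& inst, uint32_t funcIndex,
                                uintptr_t siteOffset) {
  // Level 1: inline check at each site -- is the breakpoint handler installed?
  if (!inst.debugTrapHandler) return false;
  // Level 2: in the per-function stub -- is this function being debugged?
  if (!inst.funcHasBreakpoints[funcIndex]) return false;
  // Level 3: precise filter -- does this particular site have a breakpoint?
  return inst.siteBits[siteOffset];
}
```

Whether level 3 replaces level 2 or sits behind it is exactly the trade-off the paragraph above leaves to benchmarking: the per-function table is a single indexed load, while the bit table adds shift-and-mask work on the hot path.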