Bug 642054 (Closed): Opened 13 years ago, Closed 11 years ago

Implement a profiler for JavaScript

Categories

(Core :: JavaScript Engine, defect)
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED INVALID

People

(Reporter: dmandelin, Assigned: sfink)

References

Details

Attachments

(5 files, 11 obsolete files)

We should have a profiler that charges time against scripts/code-lines/JSOps.
The hottest "code" is:

    3786    0.740 unknown jitcode/stub called via jitcode 0

This is probably either (a) IC jitcode or (b) C++ functions called from jitcode. For (a), we can just add IC jitcode chunks to the map, although this may require a more scalable data structure (currently I scan a vector). (b) is harder, but it seems reasonable to try to walk the C stack back until we reach jitcode. For long-running C++ functions (e.g., js_GC), we will probably want to record that we are in that function and put it in the profile explicitly.
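One option for the more scalable structure, just as a sketch: keep the jitcode chunks sorted by start address and binary-search on lookup. The JitcodeRange/JitcodeMap names below are made up for illustration, not from the patch.

  // Sketch of an address-range map for jitcode chunks, kept sorted by
  // start address so lookup is a binary search rather than a linear scan.
  #include <algorithm>
  #include <vector>
  #include <cstdint>

  struct JitcodeRange {
      uintptr_t start;
      uintptr_t end;       // one past the last byte of the chunk
      const char *label;   // e.g. "script.js:123" or "IC stub"
  };

  class JitcodeMap {
      std::vector<JitcodeRange> ranges;  // sorted by start address
  public:
      void add(const JitcodeRange &r) {
          ranges.insert(std::lower_bound(ranges.begin(), ranges.end(), r,
                            [](const JitcodeRange &a, const JitcodeRange &b) {
                                return a.start < b.start;
                            }),
                        r);
      }
      const JitcodeRange *lookup(uintptr_t pc) const {
          // Find the first range starting after pc, then step back one.
          auto it = std::upper_bound(ranges.begin(), ranges.end(), pc,
                        [](uintptr_t p, const JitcodeRange &r) {
                            return p < r.start;
                        });
          if (it == ranges.begin())
              return nullptr;
          --it;
          return pc < it->end ? &*it : nullptr;
      }
  };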
I'm starting to get some possibly useful output. Here is 'earley', 100 iters, with coarse-grained classification on an opt build. Lots of time in lambda/callobj/arguments.
Attached file 'earley' by script
Output by script. Most of our time is in the top 7, which are sc_list, a couple of local lambdas (loop2, loop3), sc_Pair, another pair of local lambdas (deriv_trees, sc_loop1_98), and sc_append.
Same as the previous, but broken down by activity and line number within script.
Attached file Postprocessing script
This postprocesses profiler output into summaries by activity and script.
The v4 patch seems to be missing jsprofiler.*

What is the plan for this going forward?  Is there a plan to make it usable with the browser and convey profiling info without needing postprocessing?  Not sure whether to look more into this timer-based approach or to use performance counters as in bug 652535.
Blocks: 652535
Attached patch with jsprofiler.cpp (obsolete) — Splinter Review
(In reply to comment #10)
> The v4 patch seems to be missing jsprofiler.*
> 
> What is the plan for this going forward?  Is there a plan to make it usable
> with the browser and convey profiling info without needing postprocessing?  

I would like to see that happen.

> Not sure whether to look more into this timer-based approach or to use 
> performance counters as in bug 652535.

Both are potentially interesting and seem worthwhile. Performance counters seem to have the advantage of being more customizable, while sampling has the advantage of (with a bunch more work) being able to charge time to C++ slow calls by name.
Attached patch SIGPROF-based JS profiler (obsolete) — Splinter Review
I took a look at this today. I'm attaching an updated version that works on x86_64 (and attempts to work on ARM, though I haven't tried compiling there) and also works around some of the threading issues.

I restricted the SIGPROF signal to just the main thread as much as possible. It should be enough for now. Otherwise, I get way too many collisions between multiple threads running the signal handler at the same time.

However, this still leaves some pretty big signal-related issues. As written, the signal handler has to do async-signal-unsafe operations -- allocating memory, in particular, is very bad. I get a deadlock within a second or so because SIGPROF fires while the thread is inside malloc, and the handler's own allocations then deadlock on malloc's lock.

To get this to work, it seems like we'd need to queue up the gathered information and send it off to another thread to be processed. But that'll also require keeping anything the consumer needs alive in memory (so, e.g., script->filename and JIT code don't go away). I'm not sure how doable that is. I also don't know whether that would be enough to allow collection on all threads.

Hmm... thinking more, perhaps the thing to do is to send everything over to another thread for processing: every JIT code creation event would report the generated address range, along with the script filename and whatever else was needed. It adds overhead even for things that never get a sample hit, but the signal handler would only need to write() (or msgsnd() or tack onto a lockless finite queue or whatever). It would report the IP, the profiler's current activity value, and perhaps a backtrace of IPs.
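To make that concrete, the "lockless finite queue" part could look something like this sketch -- assuming the sampled thread's handler is the only producer and a single processing thread is the only consumer. SampleRecord/SampleQueue are illustrative names, not from any patch.

  // Sketch: async-signal-safe sample queue. The SIGPROF handler only
  // writes fixed-size records; a separate thread drains and resolves
  // them. Single producer (the handler) / single consumer assumed.
  #include <atomic>
  #include <cstddef>
  #include <cstdint>

  struct SampleRecord {
      uintptr_t ip;        // sampled program counter
      uint32_t activity;   // profiler's current activity tag
  };

  class SampleQueue {
      static const size_t kSize = 4096;    // power of two
      SampleRecord buf[kSize];
      std::atomic<size_t> head{0};         // written only by the handler
      std::atomic<size_t> tail{0};         // written only by the consumer
  public:
      // Called from the signal handler: no locks, no allocation.
      bool push(const SampleRecord &r) {
          size_t h = head.load(std::memory_order_relaxed);
          size_t t = tail.load(std::memory_order_acquire);
          if (h - t >= kSize)
              return false;                // full: drop the sample
          buf[h & (kSize - 1)] = r;
          head.store(h + 1, std::memory_order_release);
          return true;
      }
      // Called from the processing thread.
      bool pop(SampleRecord *out) {
          size_t t = tail.load(std::memory_order_relaxed);
          size_t h = head.load(std::memory_order_acquire);
          if (t == h)
              return false;                // empty
          *out = buf[t & (kSize - 1)];
          tail.store(t + 1, std::memory_order_release);
          return true;
      }
  };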
(In reply to comment #13)
> However, this still leaves some pretty big signal-related issues. As
> written, the signal handler has to do unsafe operations -- specifically,
> allocating memory is very bad. I get a deadlock within a second or so
> because SIGPROF fires within malloc and so it deadlocks with its own
> allocations.

You might want to look at the approach in bug 652535, which doesn't use signals and profiles a randomized sample of call entries/exits at a pretty small runtime cost.  Doesn't have the fidelity of this approach (just reports time per script), but signal handlers might be even more problematic going forward with IonMonkey's plan to put the JS stack on the C stack.

Is it possible to avoid signals and just have another thread periodically poll the main thread to see what it's doing?  Vaguely recall that's how V8's profiler works, but I never looked at it in detail.
(In reply to comment #14)
> You might want to look at the approach in bug 652535, which doesn't use
> signals and profiles a randomized sample of call entries/exits at a pretty
> small runtime cost.  Doesn't have the fidelity of this approach (just
> reports time per script), but signal handlers might be even more problematic
> going forward with IonMonkey's plan to put the JS stack on the C stack.

Cool, I've read that bug but hadn't noticed your sampling trick.

> Is it possible to avoid signals and just have another thread periodically
> poll the main thread to see what it's doing?  Vaguely recall that's how V8's
> profiler works, but I never looked at it in detail.

I was trying to think of a way to do that. On Linux, I could use ptrace, which is a little gross. I guess I should look at V8.

systemtap would let me grab an IP and stack at every context switch.
I think it would be really cool if we could make use of the new Linux perf infrastructure. It's incredibly powerful, and it's undergoing a ton of development right now.

You can hook into it in various ways. At the lowest level, you can make a few syscalls and it will start filling a circular buffer with IP/stack trace info. But you can also get lots more info about cache misses, IPC, etc. It would be awesome if we could connect this back to the JS source.

Here's some info:
  https://perf.wiki.kernel.org/index.php/Main_Page
  http://www.linux-kongress.org/2010/slides/lk2010-perf-acme.pdf
Unfortunately, the best docs are in the kernel tree itself, under tools/perf. However, on most distributions you can try this stuff out with a command like |perf stat ls|. At most, you'll have to install a package that's specific to the kernel you're running.
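For the syscall-level route, a minimal counter setup looks roughly like the example in the perf_event_open(2) man page. This is just a sketch -- it counts retired instructions for the current process and isn't wired into the JS engine at all.

  // Count retired instructions for this process via perf_event_open(2).
  #include <linux/perf_event.h>
  #include <asm/unistd.h>
  #include <sys/ioctl.h>
  #include <unistd.h>
  #include <cstring>
  #include <cstdio>
  #include <cstdint>

  static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                  int cpu, int group_fd, unsigned long flags) {
      return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
  }

  int main() {
      struct perf_event_attr attr;
      memset(&attr, 0, sizeof(attr));
      attr.type = PERF_TYPE_HARDWARE;
      attr.size = sizeof(attr);
      attr.config = PERF_COUNT_HW_INSTRUCTIONS;
      attr.disabled = 1;
      attr.exclude_kernel = 1;
      attr.exclude_hv = 1;

      // Measure this process (pid 0) on any CPU (-1).
      int fd = sys_perf_event_open(&attr, 0, -1, -1, 0);
      if (fd == -1) {
          perror("perf_event_open");
          return 1;
      }

      ioctl(fd, PERF_EVENT_IOC_RESET, 0);
      ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

      for (volatile int i = 0; i < 1000000; i++) {}  // stand-in for real work

      ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
      uint64_t count = 0;
      if (read(fd, &count, sizeof(count)) == sizeof(count))
          printf("instructions: %llu\n", (unsigned long long)count);
      close(fd);
      return 0;
  }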
(In reply to comment #14)
> Is it possible to avoid signals and just have another thread periodically
> poll the main thread to see what it's doing?  Vaguely recall that's how V8's
> profiler works, but I never looked at it in detail.

I've just skimmed through v8's profiler. On Linux, they do pretty much what I said in the last paragraph of comment 13 (though it's just a thread, so no write() etc silliness needed). They use SIGPROF, but the signal handler just adds some fixed-size info onto the head of a circular queue. A separate profiler thread reads the records off and processes them. Interestingly, they don't use an interval timer to trigger the SIGPROFs. Instead, the profiler thread sends them. A code comment says they get much better accuracy from

  while (1) {
    send signal;
    usleep();
  }

than from an interval timer. Interesting.

On Windows, the scheduling is the same (a while loop with a sleep for the profiler thread), but the profiler can use GetThreadContext(profiled_thread) to reach into the other thread's state.

OSX is similar to Windows, but the profiler thread suspends the profiled thread first, then uses thread_get_state() to peek into it.

v8 has two types of profiling: CPU profiling and JS profiling. (I think the latter might be called runtime profiling in the source?) CPU profiling grabs the ip, sp, fp, and I think the stack and some global "VM state" context marker.

JS profiling notes whether or not JS code is running at that sample instant, and computes a windowed average that answers "what percent of the time are we currently spending running JS code?" They use it to control how much optimization to do: <20% JS means never optimize, <75% means only optimize really hot stuff (they triple the threshold), and >=75% means normal behavior: check which JS functions exceed the optimization threshold and queue them up for optimization.

Funny optimization hack: if only JS profiling is enabled, it'll notice if no JS has run for a while, and suspend the profiler thread until JS code is reentered.

v8 has to track executable code creation/destruction (as do we), and handle code movement triggered by their moving GC. If something in the sampling window moves, they update the pointer within the window.
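For reference, the Linux-side scheduling they describe amounts to something like the sketch below. It assumes a single profiled thread; profiled_thread and interval_us are placeholders, not names from v8 or from any patch here.

  // A profiler thread drives the sampling by signalling the profiled
  // thread directly, instead of using an interval timer.
  #include <pthread.h>
  #include <signal.h>
  #include <unistd.h>
  #include <atomic>

  static std::atomic<bool> profiling(true);

  void *SamplerThread(void *arg) {
      pthread_t profiled_thread = *static_cast<pthread_t *>(arg);
      const useconds_t interval_us = 1000;   // ~1 kHz sampling
      while (profiling.load(std::memory_order_relaxed)) {
          // The SIGPROF handler on the target thread records one sample.
          pthread_kill(profiled_thread, SIGPROF);
          usleep(interval_us);
      }
      return NULL;
  }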
Here's a WIP update that resolves some of the concurrency issues and might conceivably work across multiple threads, though I haven't tried it yet.

It's big and ugly, Linux-only, and has numerous remaining concurrency problems. I don't think it handles context shutdown or even script discard yet.

But it gives output similar to dmandelin's original output, and now survives my test cases that hung with the older version.
Assignee: general → sphink
Status: NEW → ASSIGNED
Did a bunch of cleanup. Starting to resemble something real. The biggest concurrency issues are now resolved, though several remain. (The asynchronous signal handler is a surprisingly large pain.)

For correctness, this patch depends on the one I'm about to post, which makes it possible to pass a NULL context to JS_PCToLineNumber; it'll still run without that patch, though.
Even though JS_PCToLineNumber just scans through the src notes attached to a JSScript, it requires a JSContext*, and, if you're really unlucky, it will temporarily grab the debugger lock. (Specifically, if you give it the initial PC of a function definition -- JSOP_DEFFUN -- and you've set a trap on that PC, it needs to take the debugger lock so it can access the trap table.)

That makes it useless for the sampling profiler, because you might get a SIGPROF while the debugger lock is held, and you'd deadlock. (And NSPR doesn't provide a trylock.)
That last one tends to seg fault pretty quickly when it accesses a partially-built stack frame.

As a (perhaps) temporary workaround, I've updated the patch to mark the stack as off-limits while the interpreter is pushing a frame, and while in JIT code or in unlabeled code called from JIT code. (The latter should be possible to eliminate.) The tracejit is not handled specially; I haven't really thought through what it'll need.

This survives earley-boyer, at least. Next I want to validate that the output is sane, before I attempt to resolve the remaining concurrency issues.
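The off-limits marking amounts to something like this sketch, assuming the signal is delivered on the same thread that's pushing the frame, so a plain sig_atomic_t flag suffices; the names are illustrative, not from the patch.

  // Mark the JS stack as off-limits around frame pushes so the SIGPROF
  // handler skips samples taken mid-construction.
  #include <csignal>

  static volatile sig_atomic_t stackBusy = 0;

  struct AutoStackBusy {
      AutoStackBusy() { stackBusy = 1; }
      ~AutoStackBusy() { stackBusy = 0; }
  };

  // In the interpreter, around pushing and initializing a frame:
  //   { AutoStackBusy busy; /* push the frame */ }

  static void SigprofHandler(int) {
      if (stackBusy)
          return;  // drop the sample rather than walk a half-built frame
      // ...otherwise record ip/activity into the sample queue...
  }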
I still haven't marked all of the places where the stack is off-limits, and I haven't implemented the script lifetime management, but I wanted to stare at the data for a while as a sanity check and to see whether it's collecting any useful info yet. For a visual display of what it's producing, please check out http://10.250.2.131/~sfink/v8/ui.html (from within the moz network).

For earley-boyer, the main test case I've been staring at for this bug, it very visibly continues to point the finger at STUB_CREATETHIS (stubs::CreateThis).

Note that the above URL gives data for js -m run-<test>.js, so it's not representative of cases where we would actually trace a significant portion. There's nothing stopping me from running it with -m -t or whatever; it's just that I only have hooks for providing rich data when running under the mjit.
Updated a bit and moved to a public URL: http://people.mozilla.org/~sfink/v8/ui.html

This has the bulk of the "unknown" bits categorized, though it'd be really nice to get symbol information to fill in the rest.
Blocks: 675102
I don't want to deal with juggling the patches here, but I'd like to keep a more or less working copy available, so I'm going to obsolete the patches and use http://hg.mozilla.org/users/sfink_mozilla.com/patches-jsprofiler/ as a public patch queue instead.
SPS covers this now.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → INVALID