Figure out a way to correlate JavaScript execution with event loop non-responsiveness

RESOLVED WORKSFORME

Status

()

RESOLVED WORKSFORME
8 years ago
6 years ago

People

(Reporter: ted, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

One of the Firefox goals is to ensure that the UI is responsive. Dietrich's suggestion is, for a first pass, to try to figure out what JavaScript is executing when we're unresponsive. I'm going to try to mash up my instrumentation from bug 606574 along with sfink's work from bug 580055 to map unresponsive times to JavaScript execution, so we can target slow parts of the UI.
Longer-term, it would be nice to have this all be somewhat scriptable, so that we could have a testing extension that logged when we started an action, collected the non-responsiveness/JS calls data, and logged when we stopped, and could provide results in-browser. I don't know how feasible that is.
Posted file correlatejs.py
This script takes a two logfiles on the commandline:
1) The logfile from the event loop responsiveness instrumentation (MOZ_INSTRUMENT_EVENT_LOP_OUTPUT)
2) The logfile from slowcalls (NSPR_LOG_FILE, when you set NSPR_LOG_MODULES=SlowCalls:5 with the patch from bug 580055)

It spits out output like:
(T - T+x),[stack]

Where T - T+x are the timestamps (in milliseconds) of the unresponsive period, and [stack] is a JS callstack of something executing at that time. Output needs work!
Assignee: nobody → ted.mielczarek
Bug 642054 seems related.  One could imagine collecting only the profile data for unresponsive intervals to answer the question: when we are unresponsive, where is our time spent in JS?
Right. As it stands, the code I implemented doesn't know whether the app is unresponsive until after the fact. :) In bug 653703 I talked about possibly changing that so that the background thread would wait for 50ms at max, then start profiling until the event loop responded, which would probably work.
Or, depending on the overhead, always be profiling, but throw away the data at the end of each interval if the interval wasn't very long.
(In reply to comment #4)
> Bug 642054 seems related.  One could imagine collecting only the profile
> data for unresponsive intervals to answer the question: when we are
> unresponsive, where is our time spent in JS?

Yes, I was thinking of that too. (At the moment, I'm also spitting out native stack frames for any sample that hits at an unrecognized instruction pointer, and manually postprocessing those with addr2line. So you kinda sorta don't need the "in JS" qualifier.)

(In reply to comment #5)
> Right. As it stands, the code I implemented doesn't know whether the app is
> unresponsive until after the fact. :) In bug 653703 I talked about possibly
> changing that so that the background thread would wait for 50ms at max, then
> start profiling until the event loop responded, which would probably work.

...but that makes it a little trickier!

Though perhaps in practice it could be workable. I'm actually queuing up samples for a while before a processing thread reads them, so you sort of have about 160ms to decide whether you were unresponsive. That's tricky to guarantee, though, so either doing:

(In reply to comment #6)
> Or, depending on the overhead, always be profiling, but throw away the data
> at the end of each interval if the interval wasn't very long.

or keeping all samples but copying them out into another queue that only gets flushed when the event loop monitor says to would work. (Actually, if you provided a global or preferably thread-local variable that said "I am currently processing event X, and if I still say this by time 1123234324324.332234, then I was unresponsive since then", I could do it as a plain consumer.)

But I still have a fair amount of work to get bug 642054 in reasonable shape and cross-platform first.
No longer blocks: 651580
The new profiler (bug 713227 and friends) will fix this instead.
Depends on: 713227
No longer depends on: 580055
Assignee: ted.mielczarek → nobody
Both bugs blocking this have been fixed, and SPS is well-established.  I'm going to close this.  Please reopen if you disagree.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Resolution: FIXED → WORKSFORME
You need to log in before you can comment on or make changes to this bug.