Open Bug 714216 Opened 13 years ago Updated 2 years ago

Simple Jank Metering

Categories

(Firefox :: General, defect)

12 Branch
x86_64
Windows 7
defect

Tracking

()

People

(Reporter: dmandelin, Unassigned)

References

Details

Attachments

(3 files)

Attached patch PoC patchSplinter Review
I wanted to try out a simple idea for measuring jank. Mostly, it just times how long each event in the main thread event loop takes to run. For a few items that I know can jank (GC, CC, saving sessionstore.js), it prints out how long that activity takes, so it can be compared to the delay of the next event.

Preliminary results and remarks in following comments.
Attached file Analysis script
This is from a sample run of Firefox with about 8 tabs open. The analysis excludes startup: I used a simple manually triggered 'mark' line in the output to decide when startup is done. In this session, I opened some Google News tabs (experiencing a lot of subjective jank), closed them, and then deleted a bunch of emails in Zimbra, which often janks for me.

One thing I forgot to mention before is that the instrumentation only prints out events that take longer than 1 ms to handle, just to cut down on irrelevant data.

The basic output in the attachment shows summary statistics in several groups: first for all events, then for ones during which a GC happened, then for ones with a CC, etc. 

Remarks:

- It's a very skewed distribution: the mean time for these events is about 30 ms, but the median is only 2 ms.
- The browser was busy handling events that take 1 ms or longer for about 37% of the wall clock time. That seems pretty high--it's not surprising that the browser is generally not so responsive if it is busy >30% of the time.
- When a GC or CC occurs, the event during which it occurs is almost always about the same length.
- With session store saves, the correlation is weaker: often the session store save takes less time than the event. Not sure why.

- In this session, we get a 1 second event every 13 seconds or so. I would classify that as 'terrible'.
- We get a 100 ms event every 3 seconds or so. So that means glitching in animations pretty frequently.
- We get a 10 ms event about 3 times a second. That's going to hurt 60 fps gaming.

- GC-triggering events are pretty long: 30 ms median, 80 ms mean. But, we don't get that many GCs: they are only about 1/6 of >100 ms events, and were never over 1 s. In general browsing, I do get GC jank very frequently (observed with the console).
- CC-triggering events are usually not bad at all for me: median of 10 ms, mean 20 ms, but there was one 60 ms CC. That also matches my experience: usually CC pauses are not noticeable in general browsing, but sometimes they are. The small ones will hurt 60fps gaming, though.
- Session store is also usually fairly quick, but slower than CC, and sometimes it's long.
- The mystery group is big: most pauses, especially the longest ones, are neither GC, CC, nor session store.
I also made a few graphs:

https://docs.google.com/spreadsheet/ccc?key=0AsEOzaxftycIdGlhUWdmQWN6a2dYWE1od21XY3FXdmc

Jank Period shows how often pauses of length > X occur. For example, it shows that a pause of 200 ms or more happens about every 6 seconds. It's kind of interesting that for pauses under 400 ms or so, the period is proportional to the pause length (or in other words, frequency is 1/length, which is a power-law distribution). Longer pauses follow a different pattern, and there are too few to generalize.

By Class shows pause lengths, with dot color indicating the kind of event (or really, known jank cause that occurred during the event). The X axis is a dummy: I plot at random X values to make it easier to see how many values there are of roughly the same length. This visually shows that most pause events (and all long pause events) are not CC, GC, or session, which I already noted above from the summary stats.

Short By Class shows the same thing, but only for events under 500 ms, so that they are easier to see on the Y scale. Again, it gives a nice visual comparison of CC vs. GC and so on. It also makes it a lot easier to see the spread than the summary stats do: e.g., you can see that GC pauses were pretty consistent, and that CC was really consistent except for a few outliers.
(In reply to Reuben Morais [:reuben] from comment #3)
> Isn't this doing something very similar to Tp5 Responsiveness?
> 
> https://mxr.mozilla.org/mozilla-central/source/toolkit/xre/EventTracer.cpp#38
> https://groups.google.com/d/msg/mozilla.dev.platform/0AkEHXV3iiU/4FJnIkNiFboJ

Indeed, I was inspired by EventTracer (forgot to mention that in comment 0). But I wanted to measure everything, and measure it directly. I also wanted to be able to easily run my own analyses and correlate the results with other things I know about.
Some things I think it would be interesting to add:

- Also measure the time between an event being dispatched and it starting to run. (EventTracer measures wait+run, currently my PoC measures only run.)
- On top of that, analyze which events are making other events wait.
- Classify events according to what triggered them or what they are trying to do.
- Capture stacks during long event runs or wait periods.
(In reply to David Mandelin from comment #5)
> (In reply to Reuben Morais [:reuben] from comment #3)
> > Isn't this doing something very similar to Tp5 Responsiveness?
> > 
> > https://mxr.mozilla.org/mozilla-central/source/toolkit/xre/EventTracer.cpp#38
> > https://groups.google.com/d/msg/mozilla.dev.platform/0AkEHXV3iiU/4FJnIkNiFboJ
> 
> Indeed, I was inspired by EventTracer (forgot to mention that in comment 0).
> But I wanted to measure everything, and measure it directly. I also wanted
> to be able to easily run my own analyses and correlate the results with
> other things I know about.

Cool, I was just checking if you were rewriting stuff that already exists :)

Interesting data – I find it weird that most of the GC/CC pauses are under 150ms.
I'll use a build with this patch and try to reproduce some of the huge CC pauses so we can correlate them to other pause durations.
> - The mystery group is big: most pauses, especially the longest ones, are neither GC,
> CC, nor session store.

Other obvious candidates to consider logging are "content script" and "refresh driver tick".
You could also look through the dependent bugs in bug 429592 for some ideas.

> Interesting data – I find it weird that most of the GC/CC pauses are under 150ms.

If you look at telemetry, GC/CC pauses aren't all that terrible overall.  It just has a really awful long tail, especially for people who have a ton of tabs.
Please note that you're only measuring the jank of XPCOM events, not native events. Because much of our interactive time is spent processing native events, we should make sure to measure those as well.
It would definitely be interesting to run with both EventTracer+this analysis and see if things correlate well, or if there are pauses that either methodology is missing.

When we were tuning EventTracer to run in Tp5, jlebar wanted to increase the sampling frequency to increase coverage. We did test runs with the sampling interval as low as 1ms, but at that point it started to perturb Talos results, so we bumped it back up to the current levels.
> When we were tuning EventTracer to run in Tp5, jlebar wanted to increase the sampling frequency to 
> increase coverage.

Well, I'm a bit more interested in decreasing noise in the final number.  The current design adds noise smaller than the sampling frequency (we settled on 10ms?), but this would get us exact number.

But now that we have event-tracer running on Talos, I see that we have tons of noise on some platforms and close to zero noise on others.  Something is going on, but I suspect that the sampling-frequency noise isn't at fault (since then the noise would be everywhere).  Unfortunately graphserver is not showing me TP5-Responsiveness atm, so I can't paste a link here.

One advantage of measuring directly rather than sampling is that this doesn't destroy battery life.  It might even be fast enough to run in nightly/aurora builds.
Oh, I see; apparently the TP5-Responsivess isn't running on m-i?  Anyway, graph here: http://goo.gl/TKNBp
Ted thinks that the win/mac differences we see in TP5-Responsiveness may be due to differences in the native event loops on mac/win and/or differences in how we inject events into those event loops.

It would be great if we could test this hypothesis by using the same (or at least similar) code cross-platform.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #10)
> Please note that you're only measuring the jank of XPCOM events, not native
> events. Because much of our interactive time is spent processing native
> events, we should make sure to measure those as well.

Ah, that might explain why some subjective janks didn't show up in the output. I knew nothing about our event loop(s), I just used a debugger to find out where stuff was getting called from, and kept finding what is apparently the XPCOM event loop. Is the Windows native event loop this thing:

  http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.cpp#343
(In reply to Ted Mielczarek [:ted, :luser] from comment #11)
> It would definitely be interesting to run with both EventTracer+this
> analysis and see if things correlate well, or if there are pauses that
> either methodology is missing.

Where is EventTracer documented again (i.e., how do I run it)? Does it just dump to stdout, too, so I'm just comparing its messages to the other kind?
Blocks: 715774
> http://mxr.mozilla.org/mozilla-central/source/widget/windows/nsAppShell.
> cpp#343

Yes mostly. You can grep for calls to the hang monitor http://mxr.mozilla.org/mozilla-central/search?string=hangmonitor%3A%3A to catch places where we block on events or unblock to run an event. I'm thinking that it may be best to just hook the jank metering directly up with the hangmonitor calls.
Whiteboard: [Snappy]
I'm going to mark this wontfix, since we have a proper profiler now (bug 713227)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: