Open Bug 1505944 Opened 6 years ago Updated 21 days ago

ts_paint results don't match firstPaint time in a profile (base time in the profile uses StartupInfo.main but results in talos use StartupInfo.processCreation)

Categories

(Core :: Gecko Profiler, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: bgrins, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

In two profiles gathered locally with:

./mach talos-test -a ts_paint --geckoProfile --cycles 4

browser.xhtml
- https://perfht.ml/2Db2MOb

browser.xul:
- https://perfht.ml/2DajPjl

This test (ts_paint) is waiting for the content process to start up and load a page, which is guarded on [onload] running in the chrome. Looking at the markers before onload, there's a large RefreshDriverTick immediately after readystatechange in browser.xhtml, driven largely by Styles flushing.

Note that in browser.xhtml we do set the root node to be hidden until readystatechange to prevent incremental layout from constructing XBL during parse, and that attribute gets removed here: https://searchfox.org/mozilla-central/rev/17f55aee76b7c4610a974cffd3453454e0c8de7b/browser/base/content/browser.xul#102. So presumably the RefreshDriverTick and style flush is in result of that.

But in the b.xhtml case that flush takes 25ms (with the full RefreshDriverTick taking 54ms):

{"start":532.314513,"dur":25.434350999999992,"name":"Styles","data":{"category":"Paint","elementsMatched":303,"elementsTraversed":340,"docShellId":"{863a3376-adc2-9f44-85a3-6437f7f9e441}","elementsStyled":340,"stylesShared":37,"startTime":532.314513,"endTime":557.748864,"type":"Styles","stylesReused":9,"cause":{"time":497.378935,"stack":19210}},"title":null}

In b.xul I see a number of tiny style flushes early on that basically traverse 0 elements. The first "big" ones happen after DOMContentLoaded (but still before onload), and they are .8ms and 1.8ms respectively:

{"start":770.285668,"dur":0.869246999999973,"name":"Styles","data":{"category":"Paint","elementsMatched":14,"elementsTraversed":73,"docShellId":"{78482f18-1fca-b340-b063-bb31e99cc311}","elementsStyled":29,"stylesShared":6,"startTime":770.285668,"endTime":771.154915,"type":"Styles","stylesReused":1,"cause":{"time":531.7254780000001,"stack":19439}},"title":null}

{"start":799.8638070000001,"dur":1.8775319999999738,"name":"Styles","data":{"category":"Paint","elementsMatched":43,"elementsTraversed":262,"docShellId":"{78482f18-1fca-b340-b063-bb31e99cc311}","elementsStyled":261,"stylesShared":0,"startTime":799.8638070000001,"endTime":801.741339,"type":"Styles","stylesReused":52,"cause":{"time":799.634703,"stack":19505}},"title":null}
One thing I'd like to figure out is why so many more elements seem to be matched in the browser.xhtml case. A couple thoughts:
- Is the xul document somehow suppressing styles?
- Do we have to do more work to check if a css rule matches now that most elements are xul namespaced but the document is html namespaced? Some of the CSS files don't seem to specify a default namespace (like https://searchfox.org/mozilla-central/source/browser/base/content/tabbrowser.css).
For taking the profile in b.xhtml I built with `mk_add_options 'export MOZ_BROWSER_XHTML=1'` (artifact builds work fine for ts_paint, although it needs to be a full build for tpaint since that uses window.open).
In the XHTML profile's 25ms style flush:
 - 8ms is in "Gecko_GetFontMetrics" (which involves e.g. 3ms of "TDescriptorSource::CreateSplicedFonts" and other stuff that looks like font initialization).  Those functions don't show up in the XUL profile at all, strangely.

 - 5ms is in StartupCache::GetBuffer (which seems to be unzipping some cached-to-disk XBL info, or something?)

 - Of the remaining time (if I drop samples with those functions ^^ from the profile), and look at the remaining samples:
   * Servo_TraverseSubtree is only in 3ms of the remaining samples.
   * "LoadBindings" is in 7ms of the remaining samples (with strings like "AtomizeChars" and "RegexpParser" in some of the mangled function names)


So on the whole, it looks like this 25ms style flush is doing very little Stylo work, and is mostly initializing/loading XBL bindings and font stuff.
>   Those functions don't show up in the XUL profile at all, strangely.

Oops, sorry -- this suspicious observation was wrong. I was looking at the "marker chart" which unsurprisingly didn't have any results for those searches. :) But the call tree does have results.
Specifically:
 - the XUL profile's "Call Tree" pane does shows 16ms of _Gecko_GetFontMetrics, during the "MozBeforeInitialXULLayout" event.
 - it also shows 3ms of CreateSplicedFonts inside of "__ZN7mozilla3dom11XULDocument11DoneWalkingEv"
 - it also shows 5ms of StartupCache::GetBuffer inside of that same function (__ZN7mozilla3dom11XULDocument11DoneWalkingEv)
So at first glance, it looks like the XUL case may be spreading out some busywork to happen a bit earlier (before the style flush where the busywork happens in the XHTML case).

Note that XUL's 16ms "Gecko_GetFontMetrics" call is part of part of some lazy restyle work that happens during some XBL setup ("toolbar_XBL_Constructor").  It doesn't show up as a blue bar on the profile timeline, but it does have "resolvestyle" in the backtrace, and it happens because XBL is causing JS to run which depends on some style info.

So: at first glance, it seems like that 25ms style flush is a batch of work that's mostly happening at roughly the same time in both cases, but in one case it's mostly triggered by XBL->JS->restyle, whereas in the other case (when we suppress XBL) it's triggered by a refresh driver tick.

So, there doesn't seem to be anything too concerning here, AFAICT.
Thanks Daniel! I'm going to look into why we are running XBL in this case and also see if we can get away from hiding the entire root node in the hope that it will spread some of this work out.
Here's a profile where instead of hiding the root node, I prevent XBL from running on the toolbar element until after readystatechange (which is needed to avoid JS errors due to early construction): https://perfht.ml/2D9Zk6m.

This helps with the time spent inside Styles (although we end up doing a pretty huge number of elements). But there's a big RefreshDriverTick still before DOMContentLoaded. Does anything stand out to you here - specifically this one: https://perfht.ml/2D8GT20. The xul document seems to somehow avoid any individual RefreshDriverTick of that size.
Flags: needinfo?(dholbert)
So, I'm looking at the timing of the "load" event in comment 7's profile vs. the XUL profile from comment 0 (since that seems to be held up by the refresh driver tick), and I'm noticing one thing:

* In the XHTML example from comment 7, we have 27ms of nsCSSFrameConstructor work before the "load" marker, 10ms of which is during the refresh driver tick.

...whereas:
* In the XUL profile, we do *much less* frame construction before the "load" marker, and we do *much more* frame construction between "load" and "DOMContentLoaded". There are only 10ms with nsCSSFrameConstructor in the backtrace up until "load" fires (nearly all in nsCSSFrameConstructor::ConstructRootFrame, doing stuff with fonts and other OS-level initialization stuff).  And then there are 20ms with nsCSSFrameConstructor in the backtrace *between load and DOMContentLoaded*.

So to the extent that "load" is an event that we're measuring here, that difference may be something we want to look into.
Relatedly:
* in the XHTML example from comment 7, we spend 21ms with "Reflow" in the backtrace before DOMContentLoaded is fired (all of which is during the refresh driver tick, and is shown as a blue bar)

* in the XUL profile, we spend 0ms with "Reflow" in the backtrace before DOMContentLoaded is fired. (The marker chart does show one 0.11ms reflow as part of a 0.13ms refresh driver tick just before "load" fires), but clearly that's negligible.)


This is likely just a downstream effect of the observations in comment 9 -- we're delaying most of the frame construction in the XUL case, which means we've got nothing to reflow until later on.
Flags: needinfo?(dholbert)
(In reply to Daniel Holbert [:dholbert] from comment #9)
> So to the extent that "load" is an event that we're measuring here, that
> difference may be something we want to look into.

I believe "load" is tangentially what we are measuring here. For ts_paint I'm not positive when the timing starts (startup?) but it ends once the first content process is ready (MozAfterPaint?). The thing is that we don't start initializing the content process until the first MozAfterPaint in the parent (inside delayedStartup: https://searchfox.org/mozilla-central/rev/a3894a4dcfb5d42f2e6eee6cb9faf7141879ef1a/browser/base/content/browser.js#1453). And we don't run MozAfterPaint until [onload] is complete.

Mike, is that an accurate summary? And also, do you know what the start of the measurement is for ts_paint?
Flags: needinfo?(mconley)
Hi bgrins,

ts_paint is a little misleading, because it looks like it waits for / cares about MozAfterPaint events, but really the value that's being recorded is the delta between the firstPaint startup marker and the process start startup marker:

https://searchfox.org/mozilla-central/rev/a3894a4dcfb5d42f2e6eee6cb9faf7141879ef1a/testing/talos/talos/startup_test/tspaint_test.html#25
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) (:⚙️) from comment #12)
> Hi bgrins,
> 
> ts_paint is a little misleading, because it looks like it waits for / cares
> about MozAfterPaint events, but really the value that's being recorded is
> the delta between the firstPaint startup marker and the process start
> startup marker:
> 
> https://searchfox.org/mozilla-central/rev/
> a3894a4dcfb5d42f2e6eee6cb9faf7141879ef1a/testing/talos/talos/startup_test/
> tspaint_test.html#25

I only see firstPaint markup in the parent process, so are we are comparing the diff between that marker in the parent with when the first content process starts (#2 in this case https://perfht.ml/2Db2MOb)? Or comparing it with when the parent process started ?
Flags: needinfo?(mconley)
(In reply to Brian Grinstead [:bgrins] from comment #13)
> I only see firstPaint markup in the parent process, so are we are comparing
> the diff between that marker in the parent with when the first content
> process starts (#2 in this case https://perfht.ml/2Db2MOb)? Or comparing it
> with when the parent process started ?

We're comparing it against the parent process start. Both readings come from the startupInfo object which is queried from the parent, and reports the parent's results.
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) (:⚙️) from comment #14)
> (In reply to Brian Grinstead [:bgrins] from comment #13)
> > I only see firstPaint markup in the parent process, so are we are comparing
> > the diff between that marker in the parent with when the first content
> > process starts (#2 in this case https://perfht.ml/2Db2MOb)? Or comparing it
> > with when the parent process started ?
> 
> We're comparing it against the parent process start. Both readings come from
> the startupInfo object which is queried from the parent, and reports the
> parent's results.

Alright thanks, that will help a lot. I have one more question, but it's not urgent and can wait till next week:

When I run, say `./mach talos-test -a ts_paint --geckoProfile --cycles 4` and then afterwards open up `testing/mozharness/build/local.json` (where talos tells me the results are), I see something like:

```
{
  "framework": {
    "name": "talos"
  },
  "suites": [
    {
      "extraOptions": [
        "e10s",
        "stylo",
        "geckoProfile"
      ],
      "name": "ts_paint",
      "subtests": [
        {
          "name": "ts_paint",
          "replicates": [
            996.0,
            906.0,
            6961.0,
            888.0
          ],
          "value": 906.0
        }
      ]
    }
  ]
}
```

But, then when I open the profile (say Cycle 0), flip to the marker table, filter for firstPaint I see .831 seconds: https://perfht.ml/2DeQwMN. I was expecting this would be 9.96 seconds (to match the first result). What gives?
Flags: needinfo?(mconley)
See Also: → 1507018
Component: General → CSS Parsing and Computation
Product: Firefox → Core
Good question. Let's see:

The firstPaint start-up recording is done here:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/view/nsViewManager.cpp#361

This calls RecordOnce, which itself calls Record(ev):

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/toolkit/components/startup/StartupTimeline.cpp#44

which sets the profiler marker here:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/toolkit/components/startup/StartupTimeline.h#62


Interestingly, RecordOnce special-cases FIRST_PAINT right here:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/toolkit/components/startup/StartupTimeline.cpp#46-53

and that's the value that gets written into Telemetry.

Okay, now, let's look at the nsIAppStartup::GetStartupInfo thing that's used for the ts_paint results:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/toolkit/components/startup/nsAppStartup.cpp#753-802

It looks like it's pulling that info directly from the timestamps that are recorded in StartupTimeline. And then it looks like the timestamp is adjusted so that it's a delta from the time of process start:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/toolkit/components/startup/nsAppStartup.cpp#133-139

So... in theory, the marker and the delta should somewhat agree - unless the difference is coming from what nsAppStartup thinks to be the process start time. I'm not sure. :/

Maybe gsvelto might know? He worked on the nsAppStartup code.
Flags: needinfo?(mconley) → needinfo?(gsvelto)
Both timers are using deltas from TimeStamp::ProcessCreation() so they should be coherent. The telemetry timestamps is in milliseconds from what I can tell so it should be 0.996s, not 9.96s. The results are still different but not terribly so. The profiler is getting it's own timestamp here:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/tools/profiler/core/platform.cpp#3734

... while the StartupTimeline is taking it's own here:

https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/toolkit/components/startup/StartupTimeline.h#63

So they're bound to be different, I just wouldn't expect this much of a difference (100+ ms is still a lot). One way to make them coherent would be to call the two-parameters version of profiler_add_marker() [1] with the same timestamp as the one that will be recorded in telemetry.

I can cook up a patch that does that if it would help here.

[1] https://searchfox.org/mozilla-central/rev/d850d799a0009f851b5535580e0a8b4bb2c591d7/tools/profiler/public/GeckoProfiler.h#596
Flags: needinfo?(gsvelto)
Try this patch and see if it helps. If it does I'll polish it up and get it landed.
Priority: -- → P3
(In reply to Gabriele Svelto [:gsvelto] from comment #19)
> Created attachment 9025154 [details] [diff] [review]
> [PATCH] Use the same timestamp for StartupTimeline records and corresponding
> profiler markers
> 
> Try this patch and see if it helps. If it does I'll polish it up and get it
> landed.

Thanks for adding this. I did a test where I run `./mach talos-test -a ts_paint --cycles 2 --geckoProfile` both with and without the patch.

1) Without patch:
- Profile cycle 0 - firstPaint .976s: https://perfht.ml/2Tsod2R
- Profile cycle 1 - firstPaint .940s: https://perfht.ml/2AaCgRI
- Open testing/mozharness/build/local.json
    {
      "framework": {
        "name": "talos"
      },
      "suites": [
        {
          "extraOptions": [
            "e10s",
            "stylo",
            "geckoProfile"
          ],
          "name": "ts_paint",
          "subtests": [
            {
              "name": "ts_paint",
              "replicates": [
                1108.0,
                1095.0
              ],
              "value": 1095.0
            }
          ]
        }
      ]
    }

2) With patch:
- Profile cycle 0 - firstPaint 1.084s: https://perfht.ml/2A8s8ZV 
- Profile cycle 1 - firstPaint 1.018s: https://perfht.ml/2AaCgRI
- Open testing/mozharness/build/local.json
    {
      "framework": {
        "name": "talos"
      },
      "suites": [
        {
          "extraOptions": [
            "e10s",
            "stylo",
            "geckoProfile"
          ],
          "name": "ts_paint",
          "subtests": [
            {
              "name": "ts_paint",
              "replicates": [
                1263.0,
                1184.0
              ],
              "value": 1184.0
            }
          ]
        }
      ]
    }

In both cases, they are directionally correct (the replicate with the lower value ends up hitting firstPaint earlier than the one with the higher value), but the absolute numbers don't seem to match up in either case (off by ~100-200ms).
Flags: needinfo?(gsvelto)
(In reply to Brian Grinstead [:bgrins] from comment #20)
> In both cases, they are directionally correct (the replicate with the lower
> value ends up hitting firstPaint earlier than the one with the higher
> value), but the absolute numbers don't seem to match up in either case (off
> by ~100-200ms).

It seems that whatever telemetry is putting in the ts_paint event is not what we expect. I'll leave the NI and I'll try to investigate this further ASAP.
Flags: needinfo?(gsvelto)
Flags: needinfo?(gsvelto)
OK, I figured this out. I did some tests with a debug build and the results were off the scale with ts_paint being over 2s later than the firstPaint marker in the profiler. That got me thinking and I started poking at the StartupInfo structure... and found out that the firstPaint marker in the profile matched exactly the |StartupInfo.firstPaint - StartupInfo.main| difference while the time in ts_paint was |StartupInfo.firstPaint - StartupInfo.processCreation|.

That's it. I'm not familiar with the profiler code so I don't know why it does that but the discrepancy between the two numbers is just that, they're computing the offsets from a different starting point.
Flags: needinfo?(gsvelto)

Hey florian, do you or someone on your team know the answer to why the firstPaint marker's origin is StartupInfo.main rather than StartupInfo.processCreation for the profiler? (see comment 22)

Flags: needinfo?(florian)

Morphing this bug into the profiler issue, since the initial reported issue appears to be due mostly to moving around frame construction, and the perf fix for browser.xhtml will likely be prototype-cache related, not layout related.

Component: CSS Parsing and Computation → Gecko Profiler
Summary: browser.xhtml: Investigate slow refreshdrivertick / style flush before DOMContentLoaded → ts_paint results don't match firstPaint time in a profile (base time in the profile uses StartupInfo.main but results in talos use StartupInfo.processCreation)
No longer blocks: top-level-html
Attachment #9023817 - Attachment is obsolete: true
Severity: normal → S3
Flags: needinfo?(florian)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: