Closed Bug 1392280 Opened 7 years ago Closed 7 years ago

Regression in FX_NEW_WINDOW_MS

Categories

(Firefox :: General, defect, P1)

defect

Tracking

()

RESOLVED FIXED
Firefox 57
Iteration:
57.2 - Aug 29
Tracking Status
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- fixed

People

(Reporter: past, Assigned: eoger)

References

(Blocks 1 open bug)

Details

(Whiteboard: [photon-performance])

Attachments

(2 files)

Telemetry shows a regression in the time it takes to open a new tab since Friday:

https://mzl.la/2vXkYFv

The regression is significant (takes us back to early-55 days) so we should figure out what caused it.
Whiteboard: [photon-performance]
(In reply to Panos Astithas [:past] (56 Regression Engineering Owner) (please ni?) from comment #0)
> Telemetry shows a regression in the time it takes to open a new tab since
> Friday:
> 

To be clear, this probe measures the time to open new windows, and not new tabs.
Whiteboard: [photon-performance] → [photon-performance] [triage]
Normally, this kind of bug would be hellish to try to diagnose, especially because tpaint (the window open talos test) didn't notice any regression.

I was, however, able to reproduce this regression locally. Here's what I did:

1) Get a baseline measurement of FX_NEW_WINDOW_MS on the 2017-08-17 Nightly. I used:

./mach mozregression --launch=2017-08-17 --pref "toolkit.telemetry.enabled:true"

Then opened a bunch of windows using Ctrl-N, and then opened up about:telemetry and looked at the FX_NEW_WINDOW_MS histogram. Here's what I got:

FX_NEW_WINDOW_MS
11 samples, average = 60.4, sum = 664

 22 |  0  0%
 37 |#########################  6  55%
 62 |#####################  5  45%
103 |  0  0%


Then I did the same thing for the 2017-08-19 build:

./mach mozregression --launch=2017-08-19 --pref "toolkit.telemetry.enabled:true"

FX_NEW_WINDOW_MS
18 samples, average = 99.6, sum = 1793

 37 |  0  0%
 62 |#########################  12  67%
103 |#############  6  33%
171 |  0  0%

The text-based histograms are probably making this a little difficult to notice, but there's been a definite shift towards the low 100's in the 2017-08-19th build as compared to the 2017-08-17th build.

I was able to reproduce this regression with multiple repetitions of the experiment.

2) Then, I ran the following mozregression command:

./mach mozregression --good=2017-08-17 --bad=2017-08-19 --pref "toolkit.telemetry.enabled:true"

and repeated the above experiment, using a fixed number of windows to manually open with Ctrl-N (I chose 7 arbitrarily).


Here's what I got:

 3:57.38 INFO: Narrowed inbound regression window from [6a8b37d4, ccffcdc9] (3 builds) to [6a8b37d4, b0f76b21] (2 builds) (~1 steps left)
 3:57.38 INFO: No more inbound revisions, bisection finished.
 3:57.38 INFO: Last good revision: 6a8b37d47733df46e639c885eda6616e66379711
 3:57.38 INFO: First bad revision: b0f76b216a196dc786c2853e3077746d8b772b2a
 3:57.38 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6a8b37d47733df46e639c885eda6616e66379711&tochange=b0f76b216a196dc786c2853e3077746d8b772b2a

That's bug 1380377. A manual backout of that patch locally confirms that this improves the FX_NEW_WINDOW_MS probe by a significant degree for me locally.

So I think that's our best candidate for the regressor.
Blocks: 1380377
Bug 1380377 is simple enough that a backout for a few days would probably confirm or refute that it's the regressor.
Attached patch not_a_fix.patchSplinter Review
I can reproduce this reliably following mconley's comment 3 instructions.
Funny thing is, dropping a <html:img> with a correct src attribute anywhere in the file """"fixes"""" the problem. Not really sure what is happening here. (see attached file).
What course of action can we take?
(In reply to Edouard Oger [:eoger] from comment #5)
> What course of action can we take?

I recommend the following:

1) Back out bug 1380377, and ensure that we see our FX_NEW_WINDOW_MS numbers bounce back
2) Put window.performance.mark markers around the start and end points for the FX_NEW_WINDOW_MS probe, and then use window.performance.measure to construct a measurement between those two markers. This will cause the time range being measured to show up in the Gecko Profiler timeline.
3) Capture a profile of opening a few windows without the patch, and then with the patch, and then manually compare the profiles looking for an explanation.

I can coach somebody through (2) and (3), but I think the highest priority is probably (1).
Flags: qe-verify-
Priority: -- → P2
Whiteboard: [photon-performance] [triage] → [photon-performance]
Assignee: nobody → eoger
Status: NEW → ASSIGNED
Priority: P2 → P1
Normal profile:
https://perfht.ml/2xpHfeB

Regressed profile:
https://perfht.ml/2xpGdz1

We can see that we are calling an expensive nsDocument::DispatchContentLoadedEvents method in the regressed profile, which might explain the slowdown. Can we ni? someone who might know what is happening here?
Merged backout:
https://hg.mozilla.org/mozilla-central/rev/5f5549876ea6

Does this resolve this bug and reopen 1380377?
Flags: needinfo?(eoger)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Iteration: --- → 57.2 - Aug 29
Target Milestone: --- → Firefox 57
(In reply to Edouard Oger [:eoger] from comment #10)
> Normal profile:
> https://perfht.ml/2xpHfeB
> 
> Regressed profile:
> https://perfht.ml/2xpGdz1
> 
> We can see that we are calling an expensive
> nsDocument::DispatchContentLoadedEvents method in the regressed profile,
> which might explain the slowdown. Can we ni? someone who might know what is
> happening here?

Comparing these 2 profiles, I don't see a big difference in the code that's running in them. The major difference I see is that in the "normal" profile the "window-open" measurement stops around when we start executing the 'load' handler of the new window, whereas in the 'regressed' profile the "window-open" measurement stops just before the 'MozAfterPaint' event.

This makes me wonder if it's possible that the FX_NEW_WINDOW_MS probe has been measuring incorrect data because the "document-shown" observer notification may not be what we want (after paint is more relevant).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: