Telemetry shows a regression in the time it takes to open a new tab since Friday: https://mzl.la/2vXkYFv The regression is significant (takes us back to early-55 days) so we should figure out what caused it.
(In reply to Panos Astithas [:past] (56 Regression Engineering Owner) (please ni?) from comment #0) > Telemetry shows a regression in the time it takes to open a new tab since > Friday: > To be clear, this probe measures the time to open new windows, and not new tabs.
Here are the pushes to mozilla-central between the August 17th and August 18th Nightly builds: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=932388b8c22c9775264e543697ce918415db9e23&tochange=a6a1f5c1d971dbee67ba6eec7ead7902351ddca2
Whiteboard: [photon-performance] → [photon-performance] [triage]
Normally, this kind of bug would be hellish to try to diagnose, especially because tpaint (the window open talos test) didn't notice any regression. I was, however, able to reproduce this regression locally. Here's what I did: 1) Get a baseline measurement of FX_NEW_WINDOW_MS on the 2017-08-17 Nightly. I used: ./mach mozregression --launch=2017-08-17 --pref "toolkit.telemetry.enabled:true" Then opened a bunch of windows using Ctrl-N, and then opened up about:telemetry and looked at the FX_NEW_WINDOW_MS histogram. Here's what I got: FX_NEW_WINDOW_MS 11 samples, average = 60.4, sum = 664 22 | 0 0% 37 |######################### 6 55% 62 |##################### 5 45% 103 | 0 0% Then I did the same thing for the 2017-08-19 build: ./mach mozregression --launch=2017-08-19 --pref "toolkit.telemetry.enabled:true" FX_NEW_WINDOW_MS 18 samples, average = 99.6, sum = 1793 37 | 0 0% 62 |######################### 12 67% 103 |############# 6 33% 171 | 0 0% The text-based histograms are probably making this a little difficult to notice, but there's been a definite shift towards the low 100's in the 2017-08-19th build as compared to the 2017-08-17th build. I was able to reproduce this regression with multiple repetitions of the experiment. 2) Then, I ran the following mozregression command: ./mach mozregression --good=2017-08-17 --bad=2017-08-19 --pref "toolkit.telemetry.enabled:true" and repeated the above experiment, using a fixed number of windows to manually open with Ctrl-N (I chose 7 arbitrarily). Here's what I got: 3:57.38 INFO: Narrowed inbound regression window from [6a8b37d4, ccffcdc9] (3 builds) to [6a8b37d4, b0f76b21] (2 builds) (~1 steps left) 3:57.38 INFO: No more inbound revisions, bisection finished. 3:57.38 INFO: Last good revision: 6a8b37d47733df46e639c885eda6616e66379711 3:57.38 INFO: First bad revision: b0f76b216a196dc786c2853e3077746d8b772b2a 3:57.38 INFO: Pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=6a8b37d47733df46e639c885eda6616e66379711&tochange=b0f76b216a196dc786c2853e3077746d8b772b2a That's bug 1380377. A manual backout of that patch locally confirms that this improves the FX_NEW_WINDOW_MS probe by a significant degree for me locally. So I think that's our best candidate for the regressor.
Bug 1380377 is simple enough that a backout for a few days would probably confirm or refute that it's the regressor.
Created attachment 8899571 [details] [diff] [review] not_a_fix.patch I can reproduce this reliably following mconley's comment 3 instructions. Funny thing is, dropping a <html:img> with a correct src attribute anywhere in the file """"fixes"""" the problem. Not really sure what is happening here. (see attached file). What course of action can we take?
(In reply to Edouard Oger [:eoger] from comment #5) > What course of action can we take? I recommend the following: 1) Back out bug 1380377, and ensure that we see our FX_NEW_WINDOW_MS numbers bounce back 2) Put window.performance.mark markers around the start and end points for the FX_NEW_WINDOW_MS probe, and then use window.performance.measure to construct a measurement between those two markers. This will cause the time range being measured to show up in the Gecko Profiler timeline. 3) Capture a profile of opening a few windows without the patch, and then with the patch, and then manually compare the profiles looking for an explanation. I can coach somebody through (2) and (3), but I think the highest priority is probably (1).
Priority: -- → P2
Whiteboard: [photon-performance] [triage] → [photon-performance]
Assignee: nobody → eoger
Status: NEW → ASSIGNED
Priority: P2 → P1
Comment on attachment 8899879 [details] Bug 1392280 - Backout bug 1380377. https://reviewboard.mozilla.org/r/171212/#review176354 Thanks!
Attachment #8899879 - Flags: review?(mconley) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/5f5549876ea6 Backout bug 1380377. r=mconley
Normal profile: https://perfht.ml/2xpHfeB Regressed profile: https://perfht.ml/2xpGdz1 We can see that we are calling an expensive nsDocument::DispatchContentLoadedEvents method in the regressed profile, which might explain the slowdown. Can we ni? someone who might know what is happening here?
Merged backout: https://hg.mozilla.org/mozilla-central/rev/5f5549876ea6 Does this resolve this bug and reopen 1380377?
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox55: --- → unaffected
status-firefox56: --- → unaffected
status-firefox57: affected → fixed
Resolution: --- → FIXED
(In reply to Edouard Oger [:eoger] from comment #10) > Normal profile: > https://perfht.ml/2xpHfeB > > Regressed profile: > https://perfht.ml/2xpGdz1 > > We can see that we are calling an expensive > nsDocument::DispatchContentLoadedEvents method in the regressed profile, > which might explain the slowdown. Can we ni? someone who might know what is > happening here? Comparing these 2 profiles, I don't see a big difference in the code that's running in them. The major difference I see is that in the "normal" profile the "window-open" measurement stops around when we start executing the 'load' handler of the new window, whereas in the 'regressed' profile the "window-open" measurement stops just before the 'MozAfterPaint' event. This makes me wonder if it's possible that the FX_NEW_WINDOW_MS probe has been measuring incorrect data because the "document-shown" observer notification may not be what we want (after paint is more relevant).
You need to log in before you can comment on or make changes to this bug.