Closed
Bug 1366314
Opened 9 years ago
Closed 7 years ago
WebPageTest's migration from Addon to Web Extension regresses page load
Categories
(Firefox :: Extension Compatibility, defect)
Firefox
Extension Compatibility
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: Harald, Unassigned)
Details
Attachments
(3 files)
Followup for bug 1348394.
Mark from Speedcurve reported a slow down in first paint. Page load metrics beyond page load are not affected (see attached graph).
Recording before: https://wpt.speedcurve.com/result/170428_7Q_1e2c421f26e287b8b82926ede9d8be5b/
After: https://wpt.speedcurve.com/result/170429_7Y_ba2d3619f6c702b3ef16ba738248f3de/
Aside, another possible root cause is also the Firefox did not have e10s enabled before the web extension shipped; which makes this an e10s issue.
| Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(dothayer)
Comment 1•9 years ago
|
||
I would probably just dupe this to bug 1363905 where we are explicitly measuring using Talos the first paint metrics for WebExtensions. Is there something specific to WebPageTest that might be causing this, it is possible for a WebExtension to slow a page down?
| Reporter | ||
Comment 2•9 years ago
|
||
I would probably add it to the meta, not dupe it. I don't see any mention of first paint bug 1363905 but I'd assume its browser startup and not page load?
There might be something specific in the extension causing this, but it was mostly copied over from the Chrome extension so it doesn't intentionally slow down first paint ;).
This bug is first paint for page load and it is especially curious that later metrics are not affected, so it just seems to wait for more information before flushing out the first pixels.
Comment 3•9 years ago
|
||
What's alarming to me is the magnitude of the difference seen in that graph. I can think of a few things that might have some performance impact (loading a script on document_start, which we can change, and doing blocking interception of response headers, which is trickier to change), but they should only make a difference on the order of milliseconds, if that - definitely not the multiple seconds that we're seeing here. I have a suspicion that the difference is due to the fact that the new webextension, like the chrome extension, uses a window.performance.now() timestamp on payload, whereas the old extension uses no such timestamp (still looking into where the actual number comes from).
There's a comment above the place where they do this in the Chrome extension:
// This script is automatically injected into every page before it loads.
// We need to use it to register for the earliest onLoad callback
// since the navigation timing times are sometimes questionable.
window.addEventListener('load', function() {
var timestamp = 0;
if (window['performance'] != undefined)
timestamp = window.performance.now();
var fixedViewport = 0;
if (document.querySelector("meta[name=viewport]"))
fixedViewport = 1;
chrome.extension.sendRequest({'message': 'wptLoad',
'fixedViewport': fixedViewport,
'timestamp': timestamp}, function(response) {});
}, false);
Any chance our navigation timing times also questionable / based on a different clock?
In any case I'm going to continue looking at this, just thought I'd give an update.
Flags: needinfo?(dothayer)
| Reporter | ||
Comment 4•9 years ago
|
||
> I have a suspicion that the difference is due to the fact that the new webextension, like the chrome extension, uses a window.performance.now() timestamp on payload, whereas the old extension uses no such timestamp (still looking into where the actual number comes from).
> Any chance our navigation timing times also questionable / based on a different clock?
That sounds like you are looking into why the metrics might be reported wrongly by the browser, but all metrics in the graph are visual metrics, not reported by the browser but by analyzing captured video of the viewport.
Another hint might be in https://github.com/WPO-Foundation/webpagetest/issues/873
Comment 5•9 years ago
|
||
(In reply to :Harald Kirschner :digitarald from comment #4)
> That sounds like you are looking into why the metrics might be reported
> wrongly by the browser, but all metrics in the graph are visual metrics, not
> reported by the browser but by analyzing captured video of the viewport.
Ack, well ignore me then. In that case, I'm still confused by the graphs. Isn't SpeedIndex basically the definite integral of remaining page content with respect to time? To be unchanged like it is in the graph wouldn't it actually have to be faster at showing the rest of the page in order to compensate for the time in which nothing was painted? Or does SpeedIndex only take the integral from Start Render to Visually Complete?
| Reporter | ||
Comment 6•9 years ago
|
||
Maybe Patrick can confirm that Speed Index should not move if first render is delayed. Maybe there is more to those numbers.
Flags: needinfo?(patmeenan)
Comment 7•9 years ago
|
||
You'd expect it to move depending on how significant the content painted early is. If it's just a tiny header and the bulk of the content displays at the same time then it will move less.
Before/after filmstrips and waterfalls would really help.
Flags: needinfo?(patmeenan)
| Reporter | ||
Comment 8•9 years ago
|
||
Comparing the 2 recordings referenced in the first comment: https://wpt.speedcurve.com/video/compare.php?tests=170428_7Q_1e2c421f26e287b8b82926ede9d8be5b-r:1-c:0,170429_7Y_ba2d3619f6c702b3ef16ba738248f3de-r:1-c:0
Comment 9•9 years ago
|
||
From the filmstrips, Speed Index does about what you'd expect. Start render is just a bit of the header and some of the text paints sooner too but the one with the slower start render paints the big image areas sooner which pulls the Speed Index back in.
It looks like the new one is doing a fair bit more work when the document commits (CPU spikes for longer, pushing out the DNS lookups for the static resources). Could be the content script injection at that point (unless multi-process is also active and doing something).
It should be possible to do most of what is needed without injecting the content script until onload which could help.
| Reporter | ||
Comment 10•9 years ago
|
||
:dthayer, how much impact could injection + multiprocess make? Would be interesting to record a profile with the extension running; would that be possible?
Just to understand if this is a pathological issue; do we have telemetry for how long injection of content script takes or known issues around this?
Flags: needinfo?(dothayer)
Comment 11•9 years ago
|
||
It's worth noting that at most we're talking about 500ms from the content script injection in the linked test. Dom Interactive and DCL both moved by something like 4.5 seconds and a LOT more image resources loaded while it was waiting.
If the UA string is accurate, the test was run with a rather ancient Firefox 44. Can we have them verify if they are running an old build or if they are pulling the latest updates from WPT and just using a custom UA string?
FWIW, testing on the public WPT from the same location shows the same delayed DCL and Interactive: https://www.webpagetest.org/result/170520_GS_9d4cbd6e9c1d9f89a293d2186cf5bc79/
Is it possible that the timing just changed significantly with the move to multi-process and the main parser is getting delayed more by image decodes/renders and other activity?
Comment 12•9 years ago
|
||
Raw test data is showing 53.0.0.6312 for the browser version so must be a custom UA string.
My money is timing changes with the move to electrolysis radically changed the timing for layout/paint/parse scheduling, at least in this one case. The old addon forced single-process which is why the change triggered.
Comment 13•9 years ago
|
||
Confirming it's a custom UA string and the actual browser is Firefox 53.0.0.6312. We're seeing the same shift in start render across a range of sites. Screenshot attached.
| Reporter | ||
Comment 14•9 years ago
|
||
https://speedcurve.com/demo/site/?b=firefox&d=-1&de=1494831599&ds=1492412400&r=eu-west-1&s=306&u=934 shows probably a worst case I have seen so far in the speedcurve demo pages. Slowdown across all metrics: First render 3s to 8s, Speed Index 5s to 14s, Visual Complete 5s to 15s.
Next steps for me are:
1) install the 3.0 agent with the sdk addon on a test machine to check if Firefox runs with e10s
2) collect a platform profile with the web extension
Doug, can you investigate the overhead of content scripts or other web extension artifacts that could impact page load.
Summary: WebPageTest's migration from Addon to Web Extension delays first paint → WebPageTest's migration from Addon to Web Extension regresses page load
| Reporter | ||
Comment 15•9 years ago
|
||
Nominating for triage, the numbers in the last comment suggest a [qf:investigate:p1]
Whiteboard: [qf]
Comment 16•9 years ago
|
||
Is there a command-line option or a pref that will force e10s to be disabled in 53? That should make it easy to test the same config with the same extension with it enabled/disabled to see if that is the cause.
I saw browser.tabs.remote.autostart but I wasn't sure if it was still there and if it worked for turning it off (instead of just a flag to enable it).
| Reporter | ||
Comment 17•9 years ago
|
||
Comment 18•9 years ago
|
||
I can't seem to reproduce this locally. I have a couple of profiles of loading The Verge, but nothing remarkable is going on in them. Per https://github.com/WPO-Foundation/webpagetest/issues/744 if I know where to send a profile to and what the flow should look like, it should be trivial to get a profile with the new add-on, but that will only work on 55. In any case I think the best thing to do is try the new add-on with e10s disabled.
(In reply to Patrick Meenan from comment #16)
> Is there a command-line option or a pref that will force e10s to be disabled
> in 53? That should make it easy to test the same config with the same
> extension with it enabled/disabled to see if that is the cause.
>
> I saw browser.tabs.remote.autostart but I wasn't sure if it was still there
> and if it worked for turning it off (instead of just a flag to enable it).
I think the best pref for that is browser.tabs.remote.force-disable
Flags: needinfo?(dothayer)
Comment 19•9 years ago
|
||
Thanks for the pref, that helped. To disable e10s on WPT you can use a script like:
firefoxPref browser.tabs.remote.force-disable true
navigate https://aws.amazon.com/
(make sure to use tabs between each of the pieces in case formatting gets lost)
Here is a comparison with e10s enabled and disabled: https://www.webpagetest.org/video/compare.php?tests=170523_D6_ec1220582b2a8bc231cc2d5cb90f51d2,170523_7G_c2640de013346fd21963933a75490446
Looks like at least for the initial reported page, the render difference comes from e10s and not the extension (at least most of it). Same goes for the massive change in DOM Content Loaded (though onload fires a lot sooner).
It's worth noting that the EC2 VM's are pretty anemic (m3.medium, single core) and multi-process is probably suffering quite a bit as a result.
Running on the faster thinkpads brings the performance a lot closer together: https://www.webpagetest.org/video/compare.php?tests=170523_45_9ba55f6356fc127c2d08ee8f7ad35f25,170523_ZG_f9f2a7cd04205916d912544f808ec5de
| Reporter | ||
Comment 20•9 years ago
|
||
Mark mentioned that m3.medium is used as well for Speedcurve. This so far sounds like the culprit is e10s on single core virtual machines. I will verify on a local reference hardware as well.
| Reporter | ||
Comment 21•9 years ago
|
||
Pending on me profiling on dual-core hardware to verify that this is a single-core specific issue.
Whiteboard: [qf]
| Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(hkirschner)
| Reporter | ||
Comment 22•9 years ago
|
||
I have not been able to get much traction on that as this only seems to affect single core machines. I tried to reproduce the issue on Nightly, but only the WPT Thinkpads have Nightly installed. The following comparison does the previous analysis for Nightly/Firefox, e10s/non-e10s; and the issue is not visible: https://www.webpagetest.org/video/compare.php?tests=170606_Q6_17EF-r:1-l:nonE10sNightly,170606_RF_17EY-r:2-l:e10sNightly,170606_FP_17PR-c:0-r:4-l:nonE10sRelease,170606_8G_17Q6-c:0-r:5-l:e10sRelease
Flags: needinfo?(hkirschner)
Comment 23•7 years ago
|
||
Closing due to lack of activity.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•