Open Bug 1502138 Opened 1 year ago Updated 3 months ago
[meta] Determine cause(s) of variable page load timing results
When using Firefox we observe a high degree of variability in page load times. Possibly the best way to reproduce this is to run the raptor tp6 tests locally. These tests use Mitmproxy to record pages and play them back locally, thus minimizing network variability. But the recorded results still reveal that page load events occur at significantly different times across runs. Consider this example of the hero element timing on the captured amazon.com page (note that the first sample is the cold load, and is thus expected to be slower):

"name": "raptor-tp6-amazon-firefox-hero:hero1",
"replicates": [
  1630, 848, 850, 835, 858, 846, 797, 854, 891, 778, 840, 847, 786,
  820, 793, 833, 1177, 793, 795, 851, 846, 846, 791, 805, 823
]

Ignoring the first (cold load) sample, these results show a relative standard deviation of 9.15%. This bug is to determine the components, and the functionality within those components, that are causing this high degree of variability.
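For reference, the 9.15% figure can be reproduced from the replicates above with a few lines of Python (a sketch using the sample standard deviation, i.e. with Bessel's correction, after discarding the first cold-load sample):

```python
from statistics import mean, stdev

# Hero element timings (ms) from the amazon.com capture above.
replicates = [
    1630, 848, 850, 835, 858, 846, 797, 854, 891, 778, 840, 847, 786,
    820, 793, 833, 1177, 793, 795, 851, 846, 846, 791, 805, 823,
]

warm = replicates[1:]  # drop the first (cold load) sample
rsd = stdev(warm) / mean(warm) * 100
print(f"relative standard deviation: {rsd:.2f}%")  # -> 9.15%
```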
Updates:
• The variance has been reproduced on Windows and OSX, both locally and on lab hardware.
• The DAMP tools for minimizing noise (e.g. forcing GCs) cannot be used here since the tests are driven by a web extension without access to privileged APIs. I did make a patch which forced GCs and CCs, but this did not reduce noise.
• An implementation of the setTimeout() clamp discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1373723 also did not reduce noise.
• Significant variance is seen in Chrome as well, although it's noticeably lower.
• As a test, disabling the baseline JIT significantly reduces variance. However, there are still very long outliers.
• When profiling subtests, large outliers can frequently be seen. Repeatedly, these are caused by long network requests. e.g. compare these profiles of the same page reload:
  fast (hover to see that load 708 takes 1.2ms): https://perf-html.io/public/2c1f6d9e42e5c9734cd139728116c8ab549a2d6b/network-chart/?globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-3&localTrackOrderByPid=23635-1-0~23638-0~23636-0~&range=1.9505_3.4173~2.0038_2.0852&thread=5&v=3
  slow (hover to see that load 2085 takes 187ms): https://perf-html.io/public/afc715b6d5f2385651a5aa03636aa88e02bbe1f1/network-chart/?globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-3&localTrackOrderByPid=23635-1-0~23638-0~23636-0~&range=1.9510_3.6641~1.9989_2.2543&thread=5&v=3
Related: :jesup put up a patch that puts the necko request cache status (Hit, Miss, etc.) in the profiler's Network markers: https://bugzilla.mozilla.org/show_bug.cgi?id=1504930
Updates:
• On local raptor tp6 tests (both Windows and OSX opt release), noise (measured as relative standard deviation) can be cut approximately in half by making a hacked change to Necko which increases the expiration time of http resources. This prevents the large http resource response time discrepancies that we see in the above profiles.
• Strangely, when this test change is run on a try server, a similar drop in noise has not yet been observed, i.e. I'm not seeing much of a difference in the lab. I'm investigating why this is the case.
Making the small changes required to allow raptor-tp6 to capture 25 profiles on the try server allowed me to see that http response time is also a contributing factor in the slowest test results on lab hardware, at least when profiled. Compare these two captures, run in the lab, where one loads in 932ms: https://perfht.ml/2OGZXGQ And the other in 1070ms: https://perfht.ml/2OH059i
Discovered that long http request response times are still making it through the experimental patch. These introduce a massive amount of noise. e.g.
fast run: https://perf-html.io/public/098258668043a372ed5ad720895d06da59cff37c/network-chart/?globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-3&localTrackOrderByPid=78670-1-0~78673-0~78671-0~&thread=5&v=3
slow run: https://perf-html.io/public/a2f737991b24e94e78b5928ee9dd607705d8d358/network-chart/?globalTrackOrder=0-1-2-3-4&hiddenGlobalTracks=1-2-3&localTrackOrderByPid=78670-1-0~78673-0~78671-0~&thread=5&v=3
This single run, 1 of 24, accounts for 1/3 of the standard deviation.
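To illustrate how heavily a single slow load can dominate the spread, here is a small sketch using the warm amazon.com replicates from comment 0, comparing the sample standard deviation with and without the one long outlier (1177ms):

```python
from statistics import stdev

# Warm-load hero timings (ms) from the amazon.com capture in comment 0.
warm = [
    848, 850, 835, 858, 846, 797, 854, 891, 778, 840, 847, 786,
    820, 793, 833, 1177, 793, 795, 851, 846, 846, 791, 805, 823,
]

with_outlier = stdev(warm)                            # ~77 ms
without_outlier = stdev(x for x in warm if x != 1177) # ~30 ms
print(f"{with_outlier:.1f}ms vs {without_outlier:.1f}ms")
```

One sample out of 24 here accounts for more than half of the standard deviation, which is why the long-request runs matter so much.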
Updates:
• The long http requests, mentioned above, were discovered to also be cases where cached resources were not used (e.g. a "no-cache" directive), and either mitmproxy or some component in the runtime is taking an exceptionally long time to complete them. Extending the Necko cache hack ("generous-cache") allows these items to remain in the cache by ignoring the cache validation protocol. This is useful only for testing performance changes and cannot be enabled in the runtime, but it raises the question of _why_ we are seeing these spikes in the processing of recorded requests.
• In profiling the youtube pageload (almost always the noisiest suite), I found another problem: it doesn’t appear that there is sufficient time between page cycles to finish executing JS. In addition, the screen is always refreshing due to a spinner. This isn’t the cleanest profile, but I think you can see that the browser is anything but idle prior to the test start, and we see fallout from that in GC major: https://perfht.ml/2OOvQ08 A simple patch extends the 1 second delay between page loads to 15 seconds, and this significantly drops the noise.
Patches have been made to band-aid these sources of noise, and the reduced variance can be seen in these results (from Windows 10, x64 opt, run in the lab): https://docs.google.com/spreadsheets/d/1nMQjBARxlz1mVZR_I11MRXTjG6o6ndfze_cBhW6-6ow/edit#gid=0
Updates:
• Building on the above patches, I ran brief experiments to defer garbage collection by modifying the heap growth parameters. This in fact increased the noisiness of the tests, suggesting that more frequent GC could possibly reduce noise. However, the impact of GC is still minor compared to the network requests.
• Realized that our recorded webpage playback tool, mitmproxy, by default makes live connections to upstream servers (e.g. https://www.amazon.com). (This was very much unexpected for me!) The reason is a feature called "upstream certificate sniffing": https://mitmproxy.readthedocs.io/en/v2.0.2/features/upstreamcerts.html mitmproxy pauses the recorded playback and makes a brief connection to the upstream server to have a look at the ssl certificate. This is needed for cases where the client connects by direct IP instead of hostname (and two other subtle cert naming fallbacks). Naturally this introduces significant variability (particularly locally over wifi) due to the network requests. Using the "--no-upstream-cert" option allows the proxy to play back entirely from disk without a network connection. I have evidence that this reduces the variance significantly, and am investigating the implications and results.
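For anyone wanting to try the offline playback locally, the invocation looks roughly like the following sketch. The flag spellings are assumptions based on the mitmproxy 2.x CLI (-S enables server replay from a recorded flows file; newer releases spell the last option --set upstream_cert=false), and "amazon.mp" is a hypothetical recording file:

```python
# Sketch of how a harness might invoke mitmdump for fully offline playback.
# Flag names follow the mitmproxy 2.x CLI and are an assumption here.
def playback_command(recording_path):
    return [
        "mitmdump",
        "-S", recording_path,  # replay recorded server responses from disk
        "--no-upstream-cert",  # don't contact upstream servers to sniff certs
    ]

cmd = playback_command("amazon.mp")
print(" ".join(cmd))
```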
Updates:
• Disabling the mitmproxy "upstream certificate sniffing" behavior mentioned in comment 7 reduces the variance of the tp6 raptor tests in general (by ~25% in lab tests and ~50% in local testing). In some cases the removal does regress performance. I'm investigating why (unexpected, since the network responses are now all served from disk).
• In looking closer at the metrics recorded, I logged this issue where it appears that the hero element timing (measured through IntersectionObserver) is not lining up with the recorded screenshots for a given profile: https://bugzilla.mozilla.org/show_bug.cgi?id=1510999 (understandable, since the rendering occurs asynchronously).
• The Performance and Performance Test Engineering teams shared a very productive meeting in Mozlando. Out of this discussion came the consensus that pageload (effectively the load event) would be added as the tier 1 performance test. See https://bugzilla.mozilla.org/show_bug.cgi?id=1504013 The existing tests will be moved to tier 2. (Previously their results were geomeaned to form the pageload value.) This should let us track a metric more representative of the perceived load and also facilitate further reduction in the noise, since we're no longer tracking 4 markers (dcm, fnbp, ttfi, hero); ttfi never fires for some pages (e.g. youtube, due to constant JS), and, as we saw, the hero element tracking is not correct in at least two pages.
• As Kyle Lahnakoski pointed out, the noise in the raptor tp6 tests is bimodal, which is making some of our current statistical comparisons effectively invalid. The plan is still to find the source of the behaviors which introduce the bimodality, rather than using statistical techniques to deal with the bimodal data.
So when looking at the page load times I'm seeing a bimodal distribution where the significantly slower results are caused by one of the following:
1. Over the course of the 24 reloads, certain HTTP resources will (based on their cache directive) need to be refetched, or at the very least re-validated. This takes significantly longer than serving items from the cache, and can throw the load time of a page off by hundreds of milliseconds.
2. Over the course of the 24 reloads there will be runs with significant Garbage Collection during the early stages of the pageload. This has a tendency to delay other operations (which presumably require access to the parent process's main thread). This can also throw off the load time of a page by hundreds of milliseconds.
Notes:
• An experiment was done with the latest mitmproxy 4.0.4 and deterministic JS added. (Not a full integration of mitmproxy 4.0.4 into raptor, but recording and playback of a page in Firefox using the Performance.timing web API.) The load times were still bimodal.
• An experiment was done in which the Necko feature "race-network-with-cache" was disabled. This did not remove the bimodal load time behaviour from the tests.
• An experiment was done where the raptor tests were run with the Necko cache (memory/disk) disabled. This did significantly reduce the noise (generally all loads were very slow), but the practicality of this change is questionable as the cache is a key component of the browser. It was suggested that a better solution would be to have raptor tp6 run cold load tests (i.e. loads from a single profile) as well as the current reload tests. Note that cold load tests are already planned for 2019.
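One quick way to surface the bimodality described above is to find the two-cluster split of the replicates that minimizes within-cluster variance and compare the cluster means. A minimal sketch (the sample values are hypothetical, standing in for a fast mode plus a slow mode caused by cache-miss refetches or early GC):

```python
from statistics import mean, pvariance

def best_split(samples):
    """Split samples into two clusters, minimizing the total
    within-cluster sum of squared deviations (1-D, exhaustive search)."""
    xs = sorted(samples)
    best = None
    for i in range(1, len(xs)):
        lo, hi = xs[:i], xs[i:]
        cost = pvariance(lo) * len(lo) + pvariance(hi) * len(hi)
        if best is None or cost < best[0]:
            best = (cost, lo, hi)
    return best[1], best[2]

# Hypothetical load times (ms): a tight fast mode and a slow mode.
lo, hi = best_split([812, 805, 820, 799, 808, 1150, 1134, 1162])
print(f"fast mode ~{mean(lo):.0f}ms, slow mode ~{mean(hi):.0f}ms")
```

A large gap between the two cluster means, with small spread inside each cluster, is the signature that makes a plain mean/stddev comparison misleading for these tests.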
Summary: Determine cause(s) of variable page load timing results → [meta] Determine cause(s) of variable page load timing results