2.93% Quantum_1 (windows7-32) regression on push 12783b5a0fca4162a36a61b45f7b2a5bc2b452d1 (Thu Jun 1 2017)

NEW
Assigned to

Status

()

Core
Networking
P2
normal
6 months ago
a month ago

People

(Reporter: igoldan, Assigned: schien)

Tracking

(Blocks: 1 bug, 4 keywords)

unspecified
Unspecified
Windows 7
perf, regression, stale-bug, talos-regression
Points:
---

Firefox Tracking Flags

(firefox55 wontfix, firefox56 wontfix, firefox57 wontfix)

Details

(Whiteboard: [PBg-HTTP-M4][necko-triaged])

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=12783b5a0fca4162a36a61b45f7b2a5bc2b452d1

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  3%  Quantum_1 summary windows7-32 opt e10s     452.17 -> 465.42


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6992

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Blocks: 1015466, 1346783
Component: Untriaged → Networking
Product: Firefox → Core
This is a brand new test we just introduced. It only does a simple page load on http://www.google.com

:schien Could you look over this issue and confirm it?
Flags: needinfo?(schien)
on my radar and working on it. keep the ni flag.
Assignee: nobody → schien
@igoldan, few questions:

1. is the number directly represents the ms to load google.com?
2. how can I run this performance test on local machine and on try server?
3. can you provide the test script of this test case? (a DXR link should be enough)
Flags: needinfo?(schien) → needinfo?(ionut.goldan)
this number right now is measuring the first time to get the onload handler after loading the url.

This test only runs on windows right now and can be run with |mach talos-test quantum_1| just like any other talos test.  It does download and install a lot of different software pieces as this is a new test we just got turned on last week and have many further steps to take with it.

on try server, I know for sure running |try -b o -p win32 -u none -t all| will get the Q1 job to run.

:rwood can confirm/clarify
Flags: needinfo?(ionut.goldan) → needinfo?(rwood)
Thanks @jmaher.

Still trying to locate the exact place where the 13ms spent. One theory is that additional PBackground IPDL is created for deliver HTTP data off-main-thread. 13ms seems to be a reasonable amount of time to create a IPC channel. However I still need more concrete prove to justify the usage.
I tried some try pushes and found that you need to do |try -b o -p win32 -u none -t all| to get q1 to work, we need to add quantum-pageload to the list of available try jobs to run.

Comment 7

6 months ago
I filed Bug 1370244 re: getting the test to run via correct try syntax
Flags: needinfo?(rwood)
I measure the time cost on each stage between HTTP channel async open and onStopRequest. The major time difference is the HTTP background channel setup delay. The average time of creating HTTP background channel is 16ms on try server while running Quantum_1. However the PBackground mechanism can help reduce the propagation delay from 22ms to 14ms while sending the following OnStart/OnData/OnStop IPC message. Based on the statistics, the loading time for each HTTP channel will regress about 8ms.

The loading pattern of google.com is 1) load the main HTML document, 2) load JS and image in parallel. Therefore the total amount of delay before onload is around (8ms * 2) = 16ms, which is about the time difference we recorded on Quantum_1.

Here is the following plan:
1) Test Bug 1338493 to see if we can reduce more propagation delay by further OMT HTTP on content process.
2) To reduce the creation overhead of PBackground IPC, we can try create single HTTP background channel per process instead of per channel.
After applying Bug 1338493
1) background channel setup delay and IPC propagation delay on background channel will be reduces to less than 1 ms.
2) IPC propagation delay on OnStartRequest is increased to 26ms (3 ms more than before applying any PBg-Http patches)

Although the background channel IPC delay is reduced significantly, the overall HTTP channel operation is still 3~5 ms bad because OnStartRequest is blocking the process of those IPC messages.

If we can successfully move OnStartRequest IPC to background channel, we'll be able to reduce about 20 ms on the HTTP channel operation.
Blocks: 1338218
Whiteboard: [necko-active][PBg-HTTP-M3]
tried to get gecko profiler data from try server but failed.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef63d5d58815cf7753dba50062e8ee7281afbc7c

No profiling data is uploaded but a crash is observed after test finished.
@igoldan, do you know how to properly get profiler data while running the Quantum_1 test cases?
Flags: needinfo?(ionut.goldan)
the profiling support has been partially broken for a few months and I am not sure if we need to do something special for this test.  I recall something about needing to push to try with --rebuild 2 in order to get it to work, but that was for linux/osx- I am not sure about windows.  

:mconley has done recent work with geckoProfiling and talos, possibly he knows what works and doesn't.
Flags: needinfo?(ionut.goldan) → needinfo?(mconley)
Trying to run this test locally to get profiling data today but failed.

|mach talos-test quantum_1| is not recognized by mach command.
|mach talos-test -a quantum_pageload_google| can be executed and toolchains are downloaded successfully. However Firefox will be crash with error message "FATAL ERROR: Non-local network connections are disabled and a connection".

@jmaher do you have any insight?
Flags: needinfo?(jmaher)
|mach talos-test -a quantum_pageload_google| is the right one, but the non-local network connections we need to resolve.  You can hack this for now by commenting out:
https://dxr.mozilla.org/mozilla-central/source/testing/talos/talos/ffsetup.py?q=path%3Affsetup.py&redirect_type=single#72

I filed bug 1375456 to fix this properly.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) from comment #11)
> :mconley has done recent work with geckoProfiling and talos, possibly he
> knows what works and doesn't.

Yep, I'm trying to fix this in bug 1374333.

The patches I've already written seem to have made this possible in automation.

Here's a profile I got from this try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=25547351e25027e742e1554a06bf493ac8c24de0

http://bit.ly/2rWwxxK

schien - I'll upload my WIP patches to bug 1374333 if you want to work with them on this while I try to get my patches into the tree.
Flags: needinfo?(mconley)
Whiteboard: [necko-active][PBg-HTTP-M3] → [necko-active][PBg-HTTP-M4]
quantum_pageload is moved from monitoring "window.onload" to first paint.

I tried manually disable PBackground IPC for HTTP channel and rerun the talos test.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=3491ddfe3180171a78febefd3b29551c5cd06604&newProject=try&newRevision=36c892dbea102bbda865cf665c99212d5caec9ba&framework=1&showOnlyImportant=0

tp6_google shows no noticeable difference.
tp6_amazon / tp6_youtube shows improvements while PBackground IPC is used.
tp6_facebook shows performance degradation while PBackground IPC is used.
I've identified that reason why window.onload is regressed after OMT Http IPC is enabled and it is related to how HTML parser OMT is implemented.

Here is the important event sequence observed from gecko profiling result on current m-c:

0) parent process send OnStartRequest via main thread IPC and ODA/OnStopRequest via PBackground IPC
1) content process receives ODA/OnStopRequest on STS thread and queue these IPC messages in HttpBackgroundChannelChild
2) content process receives OnStartRequest on main thread, notify HttpBackgroundChannelChild to replay pending messages
3) On STS thread, ODA is dispatched to HTML parser thread
4) On parser thread, HTMLTreeOp is generated and dispatched to main thread after parsing html segements received via ODA
5) On parser thread, StopRequestEvent is dispatched to main thread after all the ODA events are consumed by HTML parser
6) On main thread, HTMLTreeOp will manipulate DOM tree and execute JS in <script>
7) On main thread, Vsync might be processed after HTMLTreeOp and reflow will be triggered since DOM tree is largely updated
8) On main thread, StopRequestEvent is finally processed, OnStopRequest is called and this channel object is removed from load group

Before all the PBg-Http works were enabled, StopRequestEvent is usually run on the same call stack after ODA is invoked. Therefore, the OnStopRequest callback and removal from load group is executed before the long HTMLTreeOp and reflow. In this case. window.onload will have higher chance to be invoked earlier.

By preventing HTML parser from invoking HttpChannelChild::RetargetDeliveryTo it shows performance improvements on tp6.
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=04c1d92cabea5de81678468f381dd42d727e5f9c&newProject=try&newRevision=550025b311929e3de867dd7d10eefb0cecccf948&framework=1&showOnlyImportant=0
(note: HTML parser will still copy the data from ODA and dispatch the parsing task to parser thread. Main thread will not be occupied by HTML parsing in this case)

I'm running performance test on more combinations of configuration for enabling OMT IPC and/or OMT ODA based on different content type, to find out the one with best performance.
Bulk priority update: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1

Comment 18

3 months ago
This is the on-going performance investigation work for 57, keeping as P1.
status-firefox57: --- → affected
Whiteboard: [necko-active][PBg-HTTP-M4] → [PBg-HTTP-M4]

Updated

3 months ago
status-firefox55: --- → wontfix
status-firefox56: --- → wontfix
Keywords: stale-bug
Whiteboard: [PBg-HTTP-M4] → [PBg-HTTP-M4][necko-triaged]
Did some experiment as comment #16 mentioned, I figured out that playing with the event order is not going to get performance gain on all perspectives. That is, onload time might be improved but the first-paint time might be regressed at the same time.

In addition, the performance measurement of Quantum_1 is switched from onload to first-paint, HTTP OMT + PBg-HTTP shows improvements on the first-paint time. I'm going to move the priority of this bug to P2 since this no longer blocking the PBg-HTTP.
No longer blocks: 1338218, 1015466
Priority: P1 → P2

Updated

a month ago
status-firefox57: affected → wontfix
You need to log in before you can comment on or make changes to this bug.