3.09 - 8.77% remote-tp4m / remote-tsvg (android-6-0-armv8-api15, android-7-1-armv8-api15) regression on push d1c5998858a36a1a07afeb9fa8d3c52f95a7cae5 (Tue Jun 6 2017)

RESOLVED FIXED in Firefox 56

Status

()

Core
Networking: HTTP
RESOLVED FIXED
6 months ago
6 months ago

People

(Reporter: jmaher, Assigned: kershaw)

Tracking

(Blocks: 1 bug, {perf, regression})

53 Branch
mozilla56
perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 unaffected, firefox56 fixed)

Details

(Whiteboard: [necko-active])

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

6 months ago
We have detected an autophone (Android) regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=d1c5998858a36a1a07afeb9fa8d3c52f95a7cae5

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  9%  remote-tsvg summary android-7-1-armv8-api15 opt      100.35 -> 109.15
  3%  remote-tp4m summary android-6-0-armv8-api15 opt      228.20 -> 235.25


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7065

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/EngineeringProductivity/Autophone
(Reporter)

Comment 1

6 months ago
I believe this is http related vs javascript related (bug 1369337 landed at the same time).

:kershaw, I see that you are the patch author and I believe your changes are the cause of this regression.  Could you help us figure this out by looking at the code from bug 1360581 and determining if it would cause a pageload regression on android (hardware, not emulators).
Flags: needinfo?(kechang)
(I was afraid of this.)  In the worst case we can always back bug 1360581 out and do it more properly next time (at least to push to talos before landing).

Updated

6 months ago
Component: Untriaged → Networking: HTTP
Product: Firefox → Core
(Assignee)

Comment 3

6 months ago
(In reply to Joel Maher ( :jmaher) from comment #1)
> I believe this is http related vs javascript related (bug 1369337 landed at
> the same time).
> 
> :kershaw, I see that you are the patch author and I believe your changes are
> the cause of this regression.  Could you help us figure this out by looking
> at the code from bug 1360581 and determining if it would cause a pageload
> regression on android (hardware, not emulators).

I am not surprised to see this performance regression since this is definitely caused by the code from bug 1360581.
In my patch, every uri has been checked against local blacklist and whitelist before starting connection, so this regression is inevitable.

(In reply to Honza Bambas (:mayhemer) from comment #2)
> (I was afraid of this.)  In the worst case we can always back bug 1360581
> out and do it more properly next time (at least to push to talos before
> landing).

So, can we live with this regression for now and file another bug to improve later, or just back out bug 1360581?
Flags: needinfo?(kechang) → needinfo?(honzab.moz)
(Reporter)

Comment 4

6 months ago
I leave that decision up to the networking team and android team.  I would lean towards leaving this in as we understand the issue and there is no clear path to fix this.  I am sure with more work and brainstorming there are wins to be had!
Do I understand this is Android only regressions?  If so, would it be possible to have a pref that would revert the previous behavior and switch it off on android?

Otherwise, I think if we can find a simple fix, that is verified to at least minimize the perfomance impact and not cause other regressions, in 3 days, then let's leave the bug in until then.  

If we don't make it, then please back it out, since the regression really is not small and we are too close to beta.
Flags: needinfo?(honzab.moz)
(Reporter)

Comment 6

6 months ago
I haven't seen desktop issues, only android so far.
(Reporter)

Comment 7

6 months ago
it appears the root cause was backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/78aa5a83314a


and we see some performance improvements:
== Change summary for alert #7081 (as of June 06 2017 20:04 UTC) ==

Improvements:

  9%  remote-tsvg summary android-7-1-armv8-api15 opt      109.69 -> 100.15
  3%  remote-tp4m summary android-6-0-armv8-api15 opt      235.19 -> 228.91

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7081


so the initial analysis was correct.
(In reply to Joel Maher ( :jmaher) from comment #7)
> it appears the root cause was backed out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/78aa5a83314a

Thanks Joel.
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED
Assignee: nobody → kechang
status-firefox54: --- → unaffected
status-firefox55: --- → fixed
status-firefox-esr52: --- → unaffected
Target Milestone: --- → mozilla55
(Reporter)

Comment 9

6 months ago
and this landed again:
== Change summary for alert #7279 (as of June 14 2017 10:38 UTC) ==

Regressions:

  7%  remote-blank summary android-6-0-armv8-api15 opt      624.63 -> 667.76
  4%  remote-blank summary android-4-2-armv7-api15 opt      1,602.99 -> 1,662.01
  3%  remote-blank summary android-4-4-armv7-api15 opt      998.21 -> 1,030.63

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7279

:kershaw, can you take a look at this- in this case we have different failures, (all blank page loads vs loading real content).
Status: RESOLVED → REOPENED
Flags: needinfo?(kechang)
Resolution: FIXED → ---
(Assignee)

Comment 10

6 months ago
(In reply to Joel Maher ( :jmaher) from comment #9)
> and this landed again:
> == Change summary for alert #7279 (as of June 14 2017 10:38 UTC) ==
> 
> Regressions:
> 
>   7%  remote-blank summary android-6-0-armv8-api15 opt      624.63 -> 667.76
>   4%  remote-blank summary android-4-2-armv7-api15 opt      1,602.99 ->
> 1,662.01
>   3%  remote-blank summary android-4-4-armv7-api15 opt      998.21 ->
> 1,030.63
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=7279
> 
> :kershaw, can you take a look at this- in this case we have different
> failures, (all blank page loads vs loading real content).

I am working on this.
Will update my analysis result later.
Flags: needinfo?(kechang)
Whiteboard: [necko-active]
(Assignee)

Comment 11

6 months ago
Created attachment 8879402 [details] [diff] [review]
Get URIClassifier service eariler

After lots times of try pushes, I found the performance regression is caused by initializing nsIUrlClassifierDBService too late. The latest try push [1] can prove this.

In our current code on m-c, the first time we get nsIUrlClassifierDBService is at [2], which is after start http connection. Since the initialization of nsIUrlClassifierDBService might take a while on main thread, this could block the time when OnStartRequest is called.

This patch moves do_GetService(NS_URICLASSIFIERSERVICE_CONTRACTID) before calling ShouldEnableTrackingProtection(), so we can ensure that the nsIUrlClassifierDBService will be initialized before starting http connection.


[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=165d386f8271426d6a89346f993250683152702f
[2] http://searchfox.org/mozilla-central/rev/2bcd258281da848311769281daf735601685de2d/netwerk/base/nsChannelClassifier.cpp#572
Attachment #8879402 - Flags: review?(francois)
Comment on attachment 8879402 [details] [diff] [review]
Get URIClassifier service eariler

Review of attachment 8879402 [details] [diff] [review]:
-----------------------------------------------------------------

Good find!
Attachment #8879402 - Flags: review?(francois) → review+
(Assignee)

Comment 13

6 months ago
Created attachment 8879418 [details] [diff] [review]
Get URIClassifier service eariler, r=francois

Thanks for the quick review!
Attachment #8879402 - Attachment is obsolete: true
Attachment #8879418 - Flags: review+
(Assignee)

Updated

6 months ago
Keywords: checkin-needed
status-firefox55: fixed → unaffected
status-firefox56: --- → affected
Target Milestone: mozilla55 → ---

Comment 14

6 months ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2254061f8f51
Get URIClassifier service eariler. r=francois
Keywords: checkin-needed

Comment 15

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/2254061f8f51
Status: REOPENED → RESOLVED
Last Resolved: 6 months ago6 months ago
status-firefox56: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Good work Kershaw!  Thanks.
(Reporter)

Comment 17

6 months ago
I see some improvements already:
== Change summary for alert #7427 (as of June 20 2017 17:25 UTC) ==

Improvements:

  5%  remote-blank summary android-7-1-armv8-api15 opt      831.79 -> 787.47
  4%  remote-blank summary android-4-2-armv7-api15 opt      1,646.01 -> 1,576.88
  4%  remote-blank summary android-4-4-armv7-api15 opt      1,025.71 -> 983.55
  3%  remote-twitter summary android-7-1-armv8-api15 opt    1,110.74 -> 1,079.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7427
You need to log in before you can comment on or make changes to this bug.