Closed Bug 1370583 Opened 7 years ago Closed 7 years ago

3.09 - 8.77% remote-tp4m / remote-tsvg (android-6-0-armv8-api15, android-7-1-armv8-api15) regression on push d1c5998858a36a1a07afeb9fa8d3c52f95a7cae5 (Tue Jun 6 2017)

Categories

(Core :: Networking: HTTP, defect)

53 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: jmaher, Assigned: kershaw)

References

Details

(Keywords: perf, regression, Whiteboard: [necko-active])

Attachments

(1 file, 1 obsolete file)

We have detected an autophone (Android) regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=d1c5998858a36a1a07afeb9fa8d3c52f95a7cae5

As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  9%  remote-tsvg summary android-7-1-armv8-api15 opt      100.35 -> 109.15
  3%  remote-tp4m summary android-6-0-armv8-api15 opt      228.20 -> 235.25


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=7065

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/EngineeringProductivity/Autophone
I believe this is http related vs javascript related (bug 1369337 landed at the same time).

:kershaw, I see that you are the patch author and I believe your changes are the cause of this regression.  Could you help us figure this out by looking at the code from bug 1360581 and determining if it would cause a pageload regression on android (hardware, not emulators).
Flags: needinfo?(kechang)
(I was afraid of this.)  In the worst case we can always back bug 1360581 out and do it more properly next time (at least to push to talos before landing).
Component: Untriaged → Networking: HTTP
Product: Firefox → Core
(In reply to Joel Maher ( :jmaher) from comment #1)
> I believe this is http related vs javascript related (bug 1369337 landed at
> the same time).
> 
> :kershaw, I see that you are the patch author and I believe your changes are
> the cause of this regression.  Could you help us figure this out by looking
> at the code from bug 1360581 and determining if it would cause a pageload
> regression on android (hardware, not emulators).

I am not surprised to see this performance regression since this is definitely caused by the code from bug 1360581.
In my patch, every uri has been checked against local blacklist and whitelist before starting connection, so this regression is inevitable.

(In reply to Honza Bambas (:mayhemer) from comment #2)
> (I was afraid of this.)  In the worst case we can always back bug 1360581
> out and do it more properly next time (at least to push to talos before
> landing).

So, can we live with this regression for now and file another bug to improve later, or just back out bug 1360581?
Flags: needinfo?(kechang) → needinfo?(honzab.moz)
I leave that decision up to the networking team and android team.  I would lean towards leaving this in as we understand the issue and there is no clear path to fix this.  I am sure with more work and brainstorming there are wins to be had!
Do I understand this is Android only regressions?  If so, would it be possible to have a pref that would revert the previous behavior and switch it off on android?

Otherwise, I think if we can find a simple fix, that is verified to at least minimize the perfomance impact and not cause other regressions, in 3 days, then let's leave the bug in until then.  

If we don't make it, then please back it out, since the regression really is not small and we are too close to beta.
Flags: needinfo?(honzab.moz)
I haven't seen desktop issues, only android so far.
it appears the root cause was backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/78aa5a83314a


and we see some performance improvements:
== Change summary for alert #7081 (as of June 06 2017 20:04 UTC) ==

Improvements:

  9%  remote-tsvg summary android-7-1-armv8-api15 opt      109.69 -> 100.15
  3%  remote-tp4m summary android-6-0-armv8-api15 opt      235.19 -> 228.91

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7081


so the initial analysis was correct.
(In reply to Joel Maher ( :jmaher) from comment #7)
> it appears the root cause was backed out:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/78aa5a83314a

Thanks Joel.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee: nobody → kechang
Target Milestone: --- → mozilla55
and this landed again:
== Change summary for alert #7279 (as of June 14 2017 10:38 UTC) ==

Regressions:

  7%  remote-blank summary android-6-0-armv8-api15 opt      624.63 -> 667.76
  4%  remote-blank summary android-4-2-armv7-api15 opt      1,602.99 -> 1,662.01
  3%  remote-blank summary android-4-4-armv7-api15 opt      998.21 -> 1,030.63

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7279

:kershaw, can you take a look at this- in this case we have different failures, (all blank page loads vs loading real content).
Status: RESOLVED → REOPENED
Flags: needinfo?(kechang)
Resolution: FIXED → ---
(In reply to Joel Maher ( :jmaher) from comment #9)
> and this landed again:
> == Change summary for alert #7279 (as of June 14 2017 10:38 UTC) ==
> 
> Regressions:
> 
>   7%  remote-blank summary android-6-0-armv8-api15 opt      624.63 -> 667.76
>   4%  remote-blank summary android-4-2-armv7-api15 opt      1,602.99 ->
> 1,662.01
>   3%  remote-blank summary android-4-4-armv7-api15 opt      998.21 ->
> 1,030.63
> 
> For up to date results, see:
> https://treeherder.mozilla.org/perf.html#/alerts?id=7279
> 
> :kershaw, can you take a look at this- in this case we have different
> failures, (all blank page loads vs loading real content).

I am working on this.
Will update my analysis result later.
Flags: needinfo?(kechang)
Whiteboard: [necko-active]
After lots times of try pushes, I found the performance regression is caused by initializing nsIUrlClassifierDBService too late. The latest try push [1] can prove this.

In our current code on m-c, the first time we get nsIUrlClassifierDBService is at [2], which is after start http connection. Since the initialization of nsIUrlClassifierDBService might take a while on main thread, this could block the time when OnStartRequest is called.

This patch moves do_GetService(NS_URICLASSIFIERSERVICE_CONTRACTID) before calling ShouldEnableTrackingProtection(), so we can ensure that the nsIUrlClassifierDBService will be initialized before starting http connection.


[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=165d386f8271426d6a89346f993250683152702f
[2] http://searchfox.org/mozilla-central/rev/2bcd258281da848311769281daf735601685de2d/netwerk/base/nsChannelClassifier.cpp#572
Attachment #8879402 - Flags: review?(francois)
Comment on attachment 8879402 [details] [diff] [review]
Get URIClassifier service eariler

Review of attachment 8879402 [details] [diff] [review]:
-----------------------------------------------------------------

Good find!
Attachment #8879402 - Flags: review?(francois) → review+
Thanks for the quick review!
Attachment #8879402 - Attachment is obsolete: true
Attachment #8879418 - Flags: review+
Keywords: checkin-needed
Target Milestone: mozilla55 → ---
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2254061f8f51
Get URIClassifier service eariler. r=francois
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2254061f8f51
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Good work Kershaw!  Thanks.
I see some improvements already:
== Change summary for alert #7427 (as of June 20 2017 17:25 UTC) ==

Improvements:

  5%  remote-blank summary android-7-1-armv8-api15 opt      831.79 -> 787.47
  4%  remote-blank summary android-4-2-armv7-api15 opt      1,646.01 -> 1,576.88
  4%  remote-blank summary android-4-4-armv7-api15 opt      1,025.71 -> 983.55
  3%  remote-twitter summary android-7-1-armv8-api15 opt    1,110.74 -> 1,079.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7427
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: