Closed Bug 1585582 Opened 5 years ago Closed 5 years ago

6.05 - 10.6% raptor-tp6-tumblr-firefox loadtime regression on push d52a0108ed8ef2593db6fe0dafedbc2790941fd0 (Mon September 30 2019)

Categories

(Core :: Networking: HTTP, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla73
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- unaffected
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- fixed

People

(Reporter: marauder, Assigned: jkt)

References

(Regression)

Details

(Keywords: perf, perf-alert, regression; Whiteboard: [necko-triaged])

Attachments

(5 files)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=d52a0108ed8ef2593db6fe0dafedbc2790941fd0

(linux64-shippable, linux64-shippable-qr, macosx1014-64-shippable, windows10-64-shippable, windows10-64-shippable-qr, windows7-32-shippable)

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

11% raptor-tp6-tumblr-firefox loadtime macosx1014-64-shippable opt 942.85 -> 1,042.75
9% raptor-tp6-tumblr-firefox loadtime windows7-32-shippable opt 599.27 -> 654.92
8% raptor-tp6-tumblr-firefox loadtime windows10-64-shippable opt 614.17 -> 665.42
8% raptor-tp6-tumblr-firefox loadtime windows10-64-shippable-qr opt 611.15 -> 661.29
7% raptor-tp6-tumblr-firefox loadtime linux64-shippable-qr opt 632.46 -> 673.58
6% raptor-tp6-tumblr-firefox loadtime linux64-shippable opt 613.27 -> 650.38

Improvements:

3% raptor-tp6-google-mail-firefox fcp linux64-shippable-qr opt 241.98 -> 235.75
2% raptor-tp6-google-mail-firefox linux64-shippable-qr opt 268.75 -> 262.12

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=23284

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/TestEngineering/Performance/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Blocks: 1578356
Component: Performance → Networking: HTTP
Flags: needinfo?(jkt)
Product: Testing → Core
Regressed by: 1237782
Target Milestone: --- → mozilla71
Version: Version 3 → unspecified

:marauder, how confident are we that my patch caused this? The graph doesn't have many data points on it.

I can't see that tumblr loads a manifest at all, which, granted, would explain this regression if it did.
The zip file appears to contain everything that it loads; I'm not sure I am reading it correctly, but again I can't see an appcache manifest file loading here.
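
For context, a page opts into AppCache by putting a manifest attribute on its root element, e.g. <html manifest="page.appcache">, and the manifest file itself must begin with the literal line CACHE MANIFEST; those are the markers one would expect to find in the recording if tumblr were actually using AppCache.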

Flags: needinfo?(jkt) → needinfo?(marian.raiciof)
Flags: needinfo?(marian.raiciof)
Priority: -- → P1
Whiteboard: [necko-triaged]
Assignee: nobody → jkt

To me this test looks broken: I compared a reversion of the patch against current central on macOS, and the reversion was slower:

https://treeherder.mozilla.org/perf.html?selectMulti=raptor-tp6-tumblr-firefox+opt#/graphs?timerange=1209600&series=try,2014435,1,10&series=mozilla-central,1872186,1,10&highlightedRevisions=16891d6bd2d99&highlightedRevisions=a1c23ce146b21&highlightAlerts=0&zoom=1569630724365,1570030096005,403.84859259259247,896.3651851851853

I have a patch that changes all the code to use static prefs and also prevents the browser from even spinning up the parser, and again it was slower locally. The test seems flaky in its results to me, and perhaps another patch that landed in central around the time of mine caused this.

I'll update this bug when my series of patches has passed through try. I'm confident it would fix any regression that could be fixed, given that I turned everything static. I think we should probably land these changes regardless, as they vastly simplify the cache service... However, I'm not confident we will see a perf improvement in this particular test, and I still think we should be looking for another cause.
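
Roughly, the shape of the static-pref change looks like the sketch below; the pref and accessor names are illustrative rather than the exact ones in the patches. Instead of registering a pref observer and caching the value in a member, callers read a generated StaticPrefs getter and bail out before any parser work:

```cpp
// Illustrative sketch only: the pref/accessor names are assumptions, not the
// actual patch. StaticPrefs accessors are generated from StaticPrefList.yaml,
// so there is no observer to register and no member variable to keep in sync.
#include "mozilla/StaticPrefs_browser.h"

void MaybeParseAppCacheManifest(/* document, manifest URI, ... */) {
  // A mirrored static pref is a cheap atomic read on every call.
  if (!mozilla::StaticPrefs::browser_cache_offline_enable()) {
    return;  // AppCache disabled: never spin up the manifest parser.
  }
  // ... proceed with manifest parsing ...
}
```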

Flags: needinfo?(fstrugariu)

To shed some light on the performance sheriffing process: we base our performance data on autoland and inbound.
Also, when we confirm a performance alert (improvement or regression), we retrigger and run the same tests on the commits before and after the suspected alert to make sure we are detecting the correct commit.

From your comment I see you are comparing try with mozilla-central, which is a moving target.

I would suggest testing your changes on try: first push a commit based on the latest autoland, and the second push should contain the same "latest" autoland plus your changes. This way you can compare them and see whether your changes introduce any performance improvements/regressions.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Running_Raptor_on_Try
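
In practice this usually means two ./mach try pushes selecting the raptor-tp6 tumblr jobs, with several retriggers of each so the noise averages out before comparing; the wiki page above walks through the job-selection syntax.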

The raptor-tp6-tumblr-firefox test contains recordings of the tumblr website that are replayed through a proxy service with no outbound connections, to make sure we are in an "air-tight" environment.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Warm_Page-Load

As your initial bug makes changes to appcache, I suspect tumblr uses this technology in their implementation, and that's why our test identified a regression.
Also, if you take a look, your changes introduce a "bimodality" to the test results, in a test that historically was quite stable.

If you have any issues running our tests on try or questions about performance, you can find us on the #perftest channel on Slack.

Flags: needinfo?(fstrugariu) → needinfo?(jkt)

Hey :)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #8)

To shed some light on the performance sheriffing process: we base our performance data on autoland and inbound.
Also, when we confirm a performance alert (improvement or regression), we retrigger and run the same tests on the commits before and after the suspected alert to make sure we are detecting the correct commit.

From your comment I see you are comparing try with mozilla-central, which is a moving target.

I would suggest testing your changes on try: first push a commit based on the latest autoland, and the second push should contain the same "latest" autoland plus your changes. This way you can compare them and see whether your changes introduce any performance improvements/regressions.

I will test them now against autoland. I think they are worthwhile anyway in reducing complexity, but we can always move them into another bug.

https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Running_Raptor_on_Try

Sorry, my previous comment wasn't clear, as I had both try and central on the graph I linked to (https://treeherder.mozilla.org/perf.html?selectMulti=raptor-tp6-tumblr-firefox+opt#/graphs?timerange=1209600&series=try,2014435,1,10&series=mozilla-central,1872186,1,10&highlightedRevisions=16891d6bd2d99&highlightedRevisions=a1c23ce146b21&highlightAlerts=0&zoom=1569630724365,1570030096005,403.84859259259247,896.3651851851853).

The highlighted commits were both from try: the latest central at the time pushed to try, vs. the latest central with my patch reverted. The build with the patch reverted was slower.

The raptor-tp6-tumblr-firefox test contains recordings of the tumblr website that are replayed through a proxy service with no outbound connections, to make sure we are in an "air-tight" environment.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Warm_Page-Load

As your initial bug makes changes to appcache, I suspect tumblr uses this technology in their implementation, and that's why our test identified a regression.

I downloaded the zip that the proxy service uses, and from what I can tell nothing was using appcache.

Also, if you take a look, your changes introduce a "bimodality" to the test results, in a test that historically was quite stable.

If you have any issues running our tests on try or questions about performance, you can find us on the #perftest channel on Slack.

Some questions:

  • If the problem is indeed that tumblr or a subresource used appcache, can we update the test, as it doesn't look like they still do?
  • If they do on the live site (I don't have an account to verify their account pages right now), can we accept the performance hit?
  • Is there anything else that might introduce this bimodality?

I ran the tests again a few times:

Inbound (https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=269781247&revision=4c706780fe5e7ddf1bc781edcd2c41444a648999):
667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56

Inbound with patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=7579cdd52e02def2baf447fc861fe00489d3cac4):
738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04

Inbound with backout (https://treeherder.mozilla.org/#/jobs?repo=try&revision=708c015b2805a70c1b185bb4592f9f6a013caa69):
697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55
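
For a rough sanity check, the per-push means work out to about 694 (baseline), 709 (with patches) and 678 (with backout), while individual runs within each push vary by roughly 110-175, so the differences between the pushes are well inside the run-to-run noise.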

I don't see anything obvious in any of these; maybe try isn't stable enough for perf testing this?

Thanks

Flags: needinfo?(jkt) → needinfo?(fstrugariu)

Jonathan, can we get an update on the status of this bug? (I see that you have approved patches that could land, and we are close to merge day.) Thanks!

Flags: needinfo?(jkt)

Sorry, I moved on to another bug; I wasn't able to reproduce any notable changes with any of the patches or the backout. I can land those and we can see if there is an improvement?

Flags: needinfo?(jkt) → needinfo?(pascalc)

(In reply to Jonathan Kingston [:jkt] from comment #11)

Sorry, I moved on to another bug; I wasn't able to reproduce any notable changes with any of the patches or the backout. I can land those and we can see if there is an improvement?

That sounds good to me.

Flags: needinfo?(pascalc)
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d30cbc91bfda
Don't parse manifest when storage is disabled r=baku
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8015660b170f
Simplify exposure of AppCache interface for SecureContext r=baku
https://hg.mozilla.org/integration/autoland/rev/c13ec2bbc0a7
Remove all pref observers in CacheService r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/98d7a4f4388f
Set appcache prefs on before start to prevent races in tests. r=baku
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78a947a05f08
Simplify exposure of AppCache interface for SecureContext r=baku
https://hg.mozilla.org/integration/autoland/rev/0bd7ff0df367
Remove all pref observers in CacheService r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/a55fbe377803
Set appcache prefs on before start to prevent races in tests. r=baku
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla71 → ---
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/43eaaab0cec7
Simplify exposure of AppCache interface for SecureContext r=baku
https://hg.mozilla.org/integration/autoland/rev/8e5bb8f2f831
Remove all pref observers in CacheService r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/341d72ef4bf5
Set appcache prefs on before start to prevent races in tests. r=baku

(In reply to Jonathan Kingston [:jkt] from comment #9)

[...]
I downloaded the zip that the proxy service uses, and from what I can tell nothing was using appcache.
[...]
Some questions:

  • If the problem is indeed that tumblr or a subresource used appcache, can we update the test, as it doesn't look like they still do?

If the downloaded zip wasn't hinting at anything about our recordings using appcache, then this probably isn't trivial to check. Still, we need to know for sure whether these recordings use appcache or not. Please try to simulate something at the Firefox source-code level, so we can tell whether the appcache mechanics are hit when running the affected perf tests.

  • If they do on the live site (I don't have an account to verify their account pages right now), can we accept the performance hit?

If we can properly confirm that the appcache mechanics are hit, then we can consider accepting the performance hit.

  • Is there anything else that might introduce this bimodality?

Not really. We've done many retriggers & concluded that bug 1237782 alone brought the bimodality.

I ran the tests again a few times:

Inbound (https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=269781247&revision=4c706780fe5e7ddf1bc781edcd2c41444a648999):
667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56

Inbound with patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=7579cdd52e02def2baf447fc861fe00489d3cac4):
738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04

Inbound with backout (https://treeherder.mozilla.org/#/jobs?repo=try&revision=708c015b2805a70c1b185bb4592f9f6a013caa69):
697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55

I don't see anything obvious in any of these; maybe try isn't stable enough for perf testing this?

I don't think there are issues with simulating this on Try. We checked the noise levels, which are similar to what we have on trunk branches.

Flags: needinfo?(fstrugariu)
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Is this something that needs Beta uplift consideration? It looks like a pretty sizeable regression.

Blocks: 1592626
No longer blocks: 1592626

:igoldan, do we know if the patches fixed the regression and if we should uplift?

Flags: needinfo?(jkt) → needinfo?(igoldan)

(In reply to Jonathan Kingston [:jkt] from comment #23)

:igoldan, do we know if the patches fixed the regression and if we should uplift?

I checked the push from comment 19 and it didn't bring any kind of improvement.

Flags: needinfo?(igoldan)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

:jkt, please make sure to follow the steps described in comment 20; otherwise, we cannot conclude this bug and close it.

Flags: needinfo?(jkt)

:igoldan, sorry, I have been looking at some Fission blockers, so I have been unable to give any time to this. I will say that I haven't been able to see the code being triggered under the Raptor tests; however, I also wasn't able to confirm whether the test actually notices when a content process crashes... that's about where I got to last time I looked.
So until I can 100% confirm that this isn't appcache, I suspect this should remain open. I'm pretty sure it must be appcache, though; otherwise we would be seeing regressions across the board, e.g. if the new tab page had some reason to spin up the appcache service even when it wasn't being used, every test would regress, not just this one.
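
One low-effort way to confirm it either way would be a temporary MOZ_LOG probe in the manifest-handling path, then running the Raptor job with that module enabled. This is only a sketch; the module name and the function it sits in are placeholders rather than real code in the tree:

```cpp
// Temporary diagnostic sketch, not the real patch: the log module name and
// the function it is placed in are placeholders. Running the browser with
// MOZ_LOG=AppCacheProbe:5 would then show whether the tumblr replay ever
// reaches the AppCache code at all.
#include "mozilla/Logging.h"

static mozilla::LazyLogModule gAppCacheProbe("AppCacheProbe");

void OnAppCacheManifestSelected(/* document, manifest URI, ... */) {
  MOZ_LOG(gAppCacheProbe, mozilla::LogLevel::Debug,
          ("AppCache manifest selected during page load"));
  // ... existing handling ...
}
```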

If it is urgent that we investigate further to ensure we don't produce a stable regression, are we able to test whether flipping back the pref I added fixes the regression?

Flags: needinfo?(jkt)

Ionuț, could you answer Jonathan's question above, please? Thanks

Flags: needinfo?(igoldan)

(In reply to Jonathan Kingston [:jkt] from comment #26)

If it is urgent that we investigate further to ensure we don't produce a stable regression

If we don't investigate now, we'll lose the ability to conclude this, because we'll update our tests and their new baselines won't tell us whether the regressions went away or not.
In time, this ticket would then be closed as WONTFIX or INCOMPLETE.

are we able to test whether flipping back the pref I added fixes the regression?

Are you referring to a new fix?

Flags: needinfo?(igoldan) → needinfo?(jkt)

Are you referring to a new fix?

Sorry, no. I was interested in whether the regression is removed by flipping back the pref which the patch adds. That of course means we could chase it again in the future too.

Flags: needinfo?(jkt) → needinfo?(igoldan)

(In reply to Jonathan Kingston [:jkt] from comment #29)

Are you referring to a new fix?

Sorry, no. I was interested in whether the regression is removed by flipping back the pref which the patch adds.

Could you check that via a Try push and share the results?

Flags: needinfo?(igoldan)

We have shipped our last beta for 71, so this is wontfix for 71.

Flags: needinfo?(jkt)

Even though the baselines have returned to normal since around Oct 22, that is attributable only to bug 1583700, which caused those perf changes.

If bug 1591864's changes are somewhat related to bug 1237782, please let us know.

Does this bug still need to stay open? Are we accepting the perf regression here?
In any case, it is likely too late for a fix in 72.

Should I try outreach to tumblr?

When changing the pref, I didn't see any difference in Raptor: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=1259e9c66e5295cba3543c2bc0e0830e037af518&newProject=try&newRevision=8f32be0fa95103fd28c36cdcef7be7f3097b02d1&framework=10

This might mean that something in the change has an impact outside of the pref. I was looking into this but got put onto other work.

I would like to keep this open to investigate further; however, I can't guarantee set time on this to fix it. For me, anyway, this at least blocks rolling this pref change out further to the release channel.

I'm not even sure we should reach out to tumblr, given that it isn't even clear the regression is from appcache.

Flags: needinfo?(jkt)

OK, with the latest patch there isn't any difference in performance; however, with the latest patch and the pref disabled I'm seeing a ~6% improvement on the test:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=14e686d8583a0a7ac15856dad4572d0cad44f45e&newProject=try&newRevision=b4a869f06939ef24ec4c5e0ceb0f505f7ceca2e0&framework=10

It does still seem bimodal though, which is odd.

Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8f9b1c2601bc
Ensure AppCache is disabled in all cases. r=baku
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla72 → mozilla73
Has Regression Range: --- → yes
