Open Bug 1585582 Opened 2 months ago Updated 4 days ago

6.05 - 10.6% raptor-tp6-tumblr-firefox loadtime regression on push d52a0108ed8ef2593db6fe0dafedbc2790941fd0 (Mon September 30 2019)

Categories

(Core :: Networking: HTTP, defect, P1)


Tracking


REOPENED
mozilla72
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- unaffected
firefox71 --- affected
firefox72 --- affected

People

(Reporter: marauder, Assigned: jkt)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: perf, perf-alert, regression, Whiteboard: [necko-triaged])

Attachments

(4 files)

Raptor has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/autoland/pushloghtml?changeset=d52a0108ed8ef2593db6fe0dafedbc2790941fd0

(linux64-shippable, linux64-shippable-qr, macosx1014-64-shippable, windows10-64-shippable, windows10-64-shippable-qr, windows7-32-shippable)

As the author of one of the patches included in that push, we need your help to address this regression.

Regressions:

11% raptor-tp6-tumblr-firefox loadtime macosx1014-64-shippable opt 942.85 -> 1,042.75
9% raptor-tp6-tumblr-firefox loadtime windows7-32-shippable opt 599.27 -> 654.92
8% raptor-tp6-tumblr-firefox loadtime windows10-64-shippable opt 614.17 -> 665.42
8% raptor-tp6-tumblr-firefox loadtime windows10-64-shippable-qr opt 611.15 -> 661.29
7% raptor-tp6-tumblr-firefox loadtime linux64-shippable-qr opt 632.46 -> 673.58
6% raptor-tp6-tumblr-firefox loadtime linux64-shippable opt 613.27 -> 650.38
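The percentage figures above are simple relative deltas between the before and after loadtime values; a quick sketch (hypothetical helper, using the numbers reported for three of the platforms) reproduces them:

```python
def pct_change(before, after):
    """Relative change in percent, as reported in the alert summary above."""
    return (after - before) / before * 100

# Before/after loadtime values (ms) copied from the regression list above.
regressions = {
    "macosx1014-64-shippable": (942.85, 1042.75),
    "windows7-32-shippable": (599.27, 654.92),
    "linux64-shippable": (613.27, 650.38),
}

for platform, (before, after) in regressions.items():
    print(f"{platform}: {pct_change(before, after):+.2f}%")
```

The macOS and Linux figures (10.60% and 6.05%) bound the "6.05 - 10.6%" range in the bug title.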

Improvements:

3% raptor-tp6-google-mail-firefox fcp linux64-shippable-qr opt 241.98 -> 235.75
2% raptor-tp6-google-mail-firefox linux64-shippable-qr opt 268.75 -> 262.12

You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=23284

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a Treeherder page showing the Raptor jobs in a pushlog format.

To learn more about the regressing test(s) or reproducing them, please see: https://wiki.mozilla.org/TestEngineering/Performance/Raptor

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/TestEngineering/Performance/Talos/RegressionBugsHandling

Blocks: 1578356
Component: Performance → Networking: HTTP
Flags: needinfo?(jkt)
Product: Testing → Core
Regressed by: 1237782
Target Milestone: --- → mozilla71
Version: Version 3 → unspecified

:marauder how confident are we that my patch caused this? The graph doesn't have many data points on it.

I can't see that tumblr loads a manifest at all, which, granted, would probably cause this regression.
The zip file appears to contain everything that it loads; I'm not sure I'm reading it correctly, but again I can't see an appcache manifest file loading here.

Flags: needinfo?(jkt) → needinfo?(marian.raiciof)
Priority: -- → P1
Whiteboard: [necko-triaged]
Assignee: nobody → jkt

To me this test looks broken: I compared a reversion of the patch (on macOS) against current central, and the reversion was slower:

https://treeherder.mozilla.org/perf.html?selectMulti=raptor-tp6-tumblr-firefox+opt#/graphs?timerange=1209600&series=try,2014435,1,10&series=mozilla-central,1872186,1,10&highlightedRevisions=16891d6bd2d99&highlightedRevisions=a1c23ce146b21&highlightAlerts=0&zoom=1569630724365,1570030096005,403.84859259259247,896.3651851851853

I have a patch that changes all the code to use static prefs and also prevents the browser from even spinning up the parser; again, locally it was slower. The test's results seem flaky to me, and perhaps another patch that landed in central around the time of mine caused this.

I'll update this bug when my series of patches has passed through try; I'm confident it would fix any regression that could be fixed here, given that I made everything static. We should probably consider landing these changes regardless, as they vastly simplify the cache service... However, I'm not confident we will see a perf improvement in this particular test, and I still think we should be looking for another cause.

Flags: needinfo?(fstrugariu)

Depends on D47959

To shed some light on the performance sheriffing process: we base our performance data on autoland and inbound.
Also, when we confirm a performance alert (improvement or regression), we re-trigger and run the same tests against the commits before and after the suspected alert, to make sure we have detected the correct commit.

From your comment I see you are comparing try with mozilla central, which is a moving target.

I would suggest testing your changes on try: first push a commit with the latest autoland, and the second push should contain the same "latest" autoland plus your changes. This way you can compare them and see whether your changes bring any performance improvements/regressions.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Running_Raptor_on_Try

The raptor-tp6-tumblr-firefox test contains recordings of the tumblr website that are replayed through a proxy service with no outbound connections, to make sure we are in an "air tight" environment.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Warm_Page-Load

As your initial bug makes changes to appcache, I suspect tumblr uses this technology in its implementation, and that's why our test identified a regression.
Also, if you take a look, your changes introduce a "bimodality" into the results of a test that was historically quite stable.

If you have any issues running our tests on try or questions about performance you can find us on the #perftest channel on Slack

Flags: needinfo?(fstrugariu) → needinfo?(jkt)

Hey :)

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #8)

To shed some light on the performance sheriffing process: we base our performance data on autoland and inbound.
Also, when we confirm a performance alert (improvement or regression), we re-trigger and run the same tests against the commits before and after the suspected alert, to make sure we have detected the correct commit.

From your comment I see you are comparing try with mozilla central, which is a moving target.

I would suggest testing your changes on try: first push a commit with the latest autoland, and the second push should contain the same "latest" autoland plus your changes. This way you can compare them and see whether your changes bring any performance improvements/regressions.

I will test them now on autoland, I think they are worthwhile anyway in reducing complexity but we can always move them into another bug.

https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Running_Raptor_on_Try

Sorry, my previous comment wasn't clear, as I had both try and central on the graph I linked to (https://treeherder.mozilla.org/perf.html?selectMulti=raptor-tp6-tumblr-firefox+opt#/graphs?timerange=1209600&series=try,2014435,1,10&series=mozilla-central,1872186,1,10&highlightedRevisions=16891d6bd2d99&highlightedRevisions=a1c23ce146b21&highlightAlerts=0&zoom=1569630724365,1570030096005,403.84859259259247,896.3651851851853).

The highlighted commits were both from try: the latest central at the time, pushed to try, vs. the latest central code with my patch reverted. The reverted build was slower.

The raptor-tp6-tumblr-firefox test contains recordings of the tumblr website that are replayed through a proxy service with no outbound connections, to make sure we are in an "air tight" environment.
https://wiki.mozilla.org/TestEngineering/Performance/Raptor#Warm_Page-Load

As your initial bug makes changes to appcache, I suspect tumblr uses this technology in its implementation, and that's why our test identified a regression.

I downloaded the zip that the proxy service uses and from what I can tell nothing was using appcache.

Also, if you take a look, your changes introduce a "bimodality" into the results of a test that was historically quite stable.

If you have any issues running our tests on try or questions about performance you can find us on the #perftest channel on Slack

Some questions:

  • If the problem is indeed that tumblr or a subresource used appcache, can we update the test, since it doesn't look like they still do?
  • If they still use it in production (I don't have an account to verify their account pages right now), can we accept the performance hit?
  • Is there anything else that might introduce this bimodality?

I ran the tests again a few times:

Inbound (https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=269781247&revision=4c706780fe5e7ddf1bc781edcd2c41444a648999):
667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56

Inbound with patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=7579cdd52e02def2baf447fc861fe00489d3cac4):
738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04

Inbound with backout (https://treeherder.mozilla.org/#/jobs?repo=try&revision=708c015b2805a70c1b185bb4592f9f6a013caa69):
697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55
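For a rough numerical comparison of the three runs above, a sketch using only Python's stdlib (just means and sample standard deviations, not a proper significance test):

```python
import statistics

# Loadtime samples (ms) copied from the three try pushes above.
runs = {
    "inbound": [667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56],
    "inbound + patches": [738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04],
    "inbound + backout": [697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55],
}

for name, samples in runs.items():
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    print(f"{name}: mean={mean:.1f} ms, stdev={stdev:.1f} ms (n={len(samples)})")
```

The three means differ by less than one standard deviation of any single run, which is consistent with the observation that nothing obvious separates the configurations.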

I don't see anything obvious in any of these; maybe try isn't stable enough for perf-testing this?

Thanks

Flags: needinfo?(jkt) → needinfo?(fstrugariu)

Jonathan, can we get an update on the status of this bug? (I see that you have approved patches that could land and we are close to merge day) Thanks!

Flags: needinfo?(jkt)

Sorry, I moved on to another bug; I wasn't able to reproduce any notable changes with any of the patches or the backout. I can land them and we can see if there is an improvement?

Flags: needinfo?(jkt) → needinfo?(pascalc)

(In reply to Jonathan Kingston [:jkt] from comment #11)

Sorry, I moved on to another bug; I wasn't able to reproduce any notable changes with any of the patches or the backout. I can land them and we can see if there is an improvement?

That sounds good to me.

Flags: needinfo?(pascalc)
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d30cbc91bfda
Don't parse manifest when storage is disabled r=baku
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8015660b170f
Simplify exposure of AppCache interface for SecureContext r=baku
https://hg.mozilla.org/integration/autoland/rev/c13ec2bbc0a7
Remove all pref observers in CacheService r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/98d7a4f4388f
Set appcache prefs on before start to prevent races in tests. r=baku
Status: NEW → RESOLVED
Closed: Last month
Resolution: --- → FIXED
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/78a947a05f08
Simplify exposure of AppCache interface for SecureContext r=baku
https://hg.mozilla.org/integration/autoland/rev/0bd7ff0df367
Remove all pref observers in CacheService r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/a55fbe377803
Set appcache prefs on before start to prevent races in tests. r=baku
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla71 → ---
Pushed by jkingston@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/43eaaab0cec7
Simplify exposure of AppCache interface for SecureContext r=baku
https://hg.mozilla.org/integration/autoland/rev/8e5bb8f2f831
Remove all pref observers in CacheService r=mayhemer
https://hg.mozilla.org/integration/autoland/rev/341d72ef4bf5
Set appcache prefs on before start to prevent races in tests. r=baku

(In reply to Jonathan Kingston [:jkt] from comment #9)

[...]
I downloaded the zip that the proxy service uses and from what I can tell nothing was using appcache.
[...]
Some questions:

  • If the problem is indeed that tumblr or a subresource used appcache, can we update the test, since it doesn't look like they still do?

If the downloaded zip wasn't hinting at anything about our recordings using appcache, then this probably isn't trivial to check. Still, we need to know for sure whether these recordings use appcache or not. Please try to simulate something at the Firefox source code level, so we can tell whether the appcache mechanics are hit when running the affected perf tests.

  • If they still use it in production (I don't have an account to verify their account pages right now), can we accept the performance hit?

If we can properly confirm that the appcache mechanics are hit, then we can consider accepting the performance hit.

  • Is there anything else that might introduce this bimodality?

Not really. We've done many retriggers & concluded that bug 1237782 alone brought the bimodality.

I ran the tests again a few times:

Inbound (https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=269781247&revision=4c706780fe5e7ddf1bc781edcd2c41444a648999):
667.98, 743.64, 728.74, 621.98, 733.17, 641.49, 720.56

Inbound with patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=7579cdd52e02def2baf447fc861fe00489d3cac4):
738.8, 688.22, 660.91, 642.16, 713.79, 727.71, 755.35, 745.04

Inbound with backout (https://treeherder.mozilla.org/#/jobs?repo=try&revision=708c015b2805a70c1b185bb4592f9f6a013caa69):
697.62, 700.58, 619.79, 699.12, 610.98, 608.65, 706.36, 784.55

I don't see anything obvious in any of these; maybe try isn't stable enough for perf-testing this?

I don't think there are issues with simulating this on Try. We checked the noise levels, which are similar to what we have on the trunk branches.

Flags: needinfo?(fstrugariu)
Status: REOPENED → RESOLVED
Closed: 26 days ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla72

Is this something which needs Beta uplift consideration? Looks like a pretty sizeable regression.

No longer blocks: 1592626

:igoldan do we know if the patches fixed the regression and if we should uplift?

Flags: needinfo?(jkt) → needinfo?(igoldan)

(In reply to Jonathan Kingston [:jkt] from comment #23)

:igoldan do we know if the patches fixed the regression and if we should uplift?

I checked the push from comment 19 and it didn't bring any kind of improvements.

Flags: needinfo?(igoldan)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

:jkt please make sure to follow the steps described in comment 20. Otherwise, we cannot reach a conclusion on this bug and close it.

Flags: needinfo?(jkt)

:igoldan sorry, I have been looking at some Fission blockers, so I haven't been able to give any time to this. I will say that I haven't been able to see the code being triggered under the Raptor tests; however, I also wasn't able to confirm whether the test actually notices when a content process crashes... that's about where I got the last time I looked.
So until I can 100% confirm that this isn't appcache, I suspect this should remain open. I'm pretty sure it must be, as otherwise we would be seeing regressions across the board if, say, the new tab page had some reason to spin up the appcache service even when it wasn't being used.

If it's urgent that we investigate further, to ensure we don't produce a stable regression, are we able to test whether the pref I added fixes it?

Flags: needinfo?(jkt)