Closed Bug 1660458 Opened 4 years ago Closed 4 years ago

0.42% installer size (osx-shippable) regression on push 8cb615e26bdcff161b73c03d919db2c7cf140e16 (Thu August 20 2020)

Categories

(Core :: Gecko Profiler, defect, P1)

Firefox 81
defect

Tracking

()

RESOLVED FIXED
82 Branch
Tracking Status
firefox-esr68 --- unaffected
firefox-esr78 --- unaffected
firefox79 --- unaffected
firefox80 --- wontfix
firefox81 --- fixed
firefox82 --- fixed

People

(Reporter: aesanu, Assigned: mozbugz)

References

(Regression)

Details

(Keywords: perf-alert, regression)

Attachments

(1 file)

Perfherder has detected a build_metrics performance regression from push 8cb615e26bdcff161b73c03d919db2c7cf140e16. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

0.42% installer size osx-shippable opt nightly 75,624,499.58 -> 75,944,078.75

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.

For more information on performance sheriffing please see our FAQ.

Flags: needinfo?(gsquelart)
Component: Performance → Gecko Profiler
Product: Testing → Core

Hey Gerald, which of the Bug 1659901 and Bug 1657174 make sense to cause regression?

Bug 1657174 only added some assertions in a cpp, while bug 1659901 added some inline-able code in a popular header, so I'd say it's most probably from bug 1659901.

I'll investigate soon...

Assignee: nobody → gsquelart
Severity: -- → S3
Flags: needinfo?(gsquelart)
Priority: -- → P1
No longer regressed by: 1657174
Has Regression Range: --- → yes

I've been investigating this over the past few days. (It can be slow for me, it takes >1h to build on my Mac, or I need to use Try.)

First, it seems that bug 1659901 fixed a source of 600+ crashes per day on Android (tracked in bug 1657174), so I think we do need to keep it, or at least something like it.

I did an experiment on Try, to verify the assumption that these patches caused the regression (and also, to test&compare potential fixes). Running the "Bpgo B" task, here are the installer sizes:

So that confirms this bug, an increase of more than 300KB.

I've got a WIP patch attempting to fix the size issue, while trying to keep the benefits of fixing crashes:

So there is still a small size increase, but it's only 80KB. I hope that's more acceptable.

Andra, could you please confirm:

  1. Do my try numbers look correct to you, am I testing the right thing?
  2. Since the patches causing this size regression are fixing lots of crashes, you will not back them out automatically?
  3. A size increase of around 80KB (instead of 300+) would be acceptable?
Flags: needinfo?(aesanu)

Hey Gerald,

  1. The numbers seem to be alright.
  2. It's not the case for an automatically backout.
  3. A regression is considered if the installer size is >= 100KB, a size increase of around 80KB would resolve this problem.
Flags: needinfo?(aesanu)

The profiler should guarantee that TLS initializations are done only once (from profiler_init()), so there is no need to potentially do it at every TLS access.
Instead, the initialization functions set the TLS states once (to Initialized or Unavailable, depending on the underlying TLS initialization success), and later accesses to this effectively-read-only flag can be done without thread synchronization.
This reduces the generated code size.

There are also DEBUG-build checks that no accesses are done before initialization; this is theoretically racy, but it's only used to detect too-early accesses in DEBUG builds, which should never happen anyway.

Installer sizes: (Sizes in MiB, differences in KiB)

Try Mac Linux (32b) Windows (32b)
Base 72.8 71.1 77.9
Revert regression 72.6 (-235.5) 71.0 (-117.4) 77.8 (-57.9)
This patch 72.7 (+115.3) 71.0 (+42.7) 77.8 (+10.0)

The 2nd row indicates how much space the regression patches cost (e.g., 235.5KiB for the Mac installer), and the 3rd row shows how much the proposed patch here costs after reverting the regression (e.g., 115.3KiB on Mac).

It's not quite as good as comment 3 gave us hope for, but I think it's still better than before (at least half the size of the previous patches), knowing that we need these patches to fix a correctness issue that resulted in lots of crashes on Android (but probably on other platforms as well though in smaller numbers).
So I would propose that we review and land this patch, and I'm happy to open a follow-up, to see if we can make it even better later on...

Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fe8841cd50d7 Improve TLS initialization and usage, remove inline-able function-static init - r=canaltinova
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 82 Branch

This commit introduced a new regression. As this is already a regression bug I will reopen it so we can investigate it here.

== Change summary for alert #26891 (as of Wed, 09 Sep 2020 01:22:30 GMT) ==

Regressions:

0.19% installer size osx-shippable opt instrumented 104,324,197.33 -> 104,518,962.17

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=26891

Status: RESOLVED → REOPENED
Flags: needinfo?(gsquelart)
Resolution: FIXED → ---

(In reply to Florin Strugariu [:Bebe] (needinfo me) from comment #9)

This commit introduced a new regression. As this is already a regression bug I will reopen it so we can investigate it here.
0.19% installer size osx-shippable opt instrumented 104,324,197.33 -> 104,518,962.17

Looking back at my tries (from comment 6), for the mac "instr" builds, we have:

  • Base: 104,352,248
  • Reverting the previous regression: 104,437,498
  • This patch: 104,519,662

So even though the patch does add 163KB, it's really "only" 80KB compared to what would it would have been before bug 1659901.

More importantly I think, that's the "instrumented" installer, I thought it was only used to then do the PGO build with goes into the published installer? (Please let me know if I'm incorrect.)
So is it important to keep low?
Especially since the patch here seems to have decreased the non-instrumented installer by 178KB:
https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&highlightedRevisions=80ac8d8c74d53&series=autoland,2259234,1,2&series=autoland,1921012,1,2&timerange=1209600

Florin, what do you think?
In any case, I'd prefer to keep this patch (better code than before) associated with this bug, so please file a new bug if you think we really should investigate the instrumented-build size regression.

Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Flags: needinfo?(gsquelart)
Resolution: --- → FIXED

Agreed that instrumented build regressions aren't important if it's not affecting the builds we actually ship to users. I'm a bit torn on whether this is worth uplifting this late in the cycle, though. We're down to our final beta before RC week next week. Gerald, what are your thoughts?

(In reply to Ryan VanderMeulen [:RyanVM] from comment #11)

Agreed that instrumented build regressions aren't important if it's not affecting the builds we actually ship to users. I'm a bit torn on whether this is worth uplifting this late in the cycle, though. We're down to our final beta before RC week next week. Gerald, what are your thoughts?

Thank you Ryan.

While I'm fairly confident the code here is even better than the one we uplifted in bug 1659901, it hasn't had much time to prove it's indeed doing its job correctly; It does bring a bit more risk than before:

  • Bug 1659901: Function-static initialization, guaranteed to be thread-safe by the compiler (assuming we trust the compiler of course!)
  • Now: No active thread safety, but assuming that the initialization code is always run first before any other access (which I believe to be true because it's called very early when Firefox starts, but can't be guaranteed 100%).

So I personally don't think it's worth the risk of uplifting, for the small decrease in installer size. But that decrease will still happen as this bug rides the train.

Thanks for the input. Let's let it ride 82 then.

Comment on attachment 9174157 [details]
Bug 1660458 - Improve TLS initialization and usage, remove inline-able function-static init - r?canaltinova

Beta/Release Uplift Approval Request

  • User impact if declined: needed for a crash fix uplift (bug 1666246)
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It has been on Nightly for three weeks and no problems were found.
  • String changes made/needed:
Attachment #9174157 - Flags: approval-mozilla-release?

Comment on attachment 9174157 [details]
Bug 1660458 - Improve TLS initialization and usage, remove inline-able function-static init - r?canaltinova

Approved for 81.0.1.

Attachment #9174157 - Flags: approval-mozilla-release? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: