1515213 - Investigation usage of clang -ftrivial-auto-var-init flag

Paul Theriault [:pauljt] (no longer reading bugmail)

Reporter

Description

•

6 years ago

Bug 1514965 has the details here - I'm just filing this bug so we have a tracking bug in our queue for this. Decoder suggested it might be a useful activity for our team to pursue.

Chris Peterson [:cpeterson]

Comment 1

•

6 years ago

Microsoft enabled a similar MSVC flag for auto initialization (with default value 0) for Windows kernel code. If we used clang on all platforms, we could use -ftrivial-auto-var-init=zero to zero-initialize auto variables for safety, not just debugging. https://twitter.com/JosephBialek/status/1062774315098112001 "Please join the Windows kernel in wishing farewell to uninitialized plain-old-data structs on the stack. As of today's WIPFast build, any Windows code compiled with /kernel also gets compiled with InitAll, a compiler security feature that initializes POD structs at declaration. Between 2017 and mid 2018, this feature would have killed 49 MSRC cases that involved uninitialized struct data leaking across a trust boundary. It would have also mitigated a number of bugs involving uninitialized struct data being used directly. To date, we are seeing noise level performance regressions caused by this change. We accomplished this by improving the compilers ability to kill redundant stores. While everything is initialized at declaration, most of these initializations can be proven redundant and eliminated. We are conservative with how we roll this out. Starting with kernel components and than expanding scope. It's a bit scary from a perf perspective and I'd like to avoid fire drills. Kernel components are by far the most impacted in MSRC case data so it's a good starting point."

Frederik Braun [:freddy]

Updated

•

6 years ago

Type: enhancement → task

Paul Theriault [:pauljt] (no longer reading bugmail)

Reporter

Updated

•

6 years ago

Priority: -- → P2

Chris Peterson [:cpeterson]

Updated

•

6 years ago

Depends on: 1546873

Frederik Braun [:freddy]

Updated

•

5 years ago

Component: Security: Review Requests → Source Code Analysis

Product: Firefox → Firefox Build System

Chris Peterson [:cpeterson]

Comment 6

•

3 years ago

Here is the Chrome bug to enable -ftrivial-auto-var-init=pattern in official release builds. It's currently enabled in only non-official builds for desktop and ChromeOS, but not Android.

https://bugs.chromium.org/p/chromium/issues/detail?id=977230

Blocks: exploit-mitigation

See Also: → https://bugs.chromium.org/p/chromium/issues/detail?id=977230

Tyson Smith [:tsmith]

Updated

•

3 years ago

Keywords: sec-want

Tyson Smith [:tsmith]

Comment 7

•

3 years ago

•

Edited

Here is a perfherder comparison with -ftrivial-auto-var-init=pattern enabled on an opt build.

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=ef2d1eaa5a80d405df665726cc3e6497289db79d&newProject=try&newRevision=7f330cc417d914f6fd6a0107307884712a5bca7b&framework=13&page=1

Additional notes:

In a recent fuzzing collaboration meeting it was mentioned that efforts to run MSan instrumented builds of Chromium will likely not continue. Comment #6 has a link to the Chromium bug to enable -ftrivial-auto-var-init=pattern
MSan and Valgrind are not reasonable methods of identifying issues. MSan would required a huge initial investment to get Firefox and all of it's dependencies built and instrumented and would require ongoing maintenance (and as mentioned above is being moved away from by other vendors). Valgrind requires a special build (which is no longer maintained on TC) and requires constant suppression list maintenance. It also adds an unreasonable runtime performance overhead that makes fuzzing much less effective. Both methods depend on test suites and fuzzing to identify issues.
Enabling -ftrivial-auto-var-init=pattern should not have ongoing development costs and does not rely on test suites and fuzzing to identify issues to keep users safe. It mitigates risks without treating uninitialized memory usage as defined behavior (preventing developer from actively relying on it).
Forced initialization should be optimized out when using optimization levels >0 if memory is properly initialized before use.
no-opt builds will suffer a higher performance regression (as mentioned previously) and should be excluded from the list of builds enabling this flag.

Questions that I think need to be answered before moving forward are:

What is the threshold or upper limit of the acceptable performance regression?
How can the performance regressions identified by the above perfherder link be investigated further?
Are there areas of the code that we know should be excluded?
Would it be reasonable to enable for Nightly and early Beta builds? This is only meant as an alternative if enabling for all channels is for some reason not an option. This would help shake out any existing latent or intermittent crashes dues to uninitialized memory usage.
- Would this give us any additional performance metrics?
- Should we start on a single platform?

Sylvestre Ledru [:Sylvestre]

Comment 8

•

3 years ago

Dave, if you have an opinion on this :)

Flags: needinfo?(dave.hunt)

Tyson Smith [:tsmith]

Updated

•

3 years ago

URL: https://clang.llvm.org/docs/ClangComm...

Emilio Cobos Álvarez (:emilio)

Comment 9

•

3 years ago

Would it be reasonable to enable for Nightly and early Beta builds? This would help shake out any existing latent or intermittent crashes dues to uninitialized memory usage.

Would it? Wouldn't it paper over issues in Nightly / Early beta, but crash and/or be potentially exploitable in release? (I might be misunderstanding something though)

Alex Thayer [:alexical] (she/her)

Comment 10

•

3 years ago

Quick drive-by question: I'm wondering what this does with AutoTArray (and similar things like nsAutoString) - my guess would be that initialization of its stack buffer is something that doesn't tend to get optimized out and actually affects us?

Tyson Smith [:tsmith]

Comment 11

•

3 years ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #9)

Would it be reasonable to enable for Nightly and early Beta builds? This would help shake out any existing latent or intermittent crashes dues to uninitialized memory usage.

Would it? Wouldn't it paper over issues in Nightly / Early beta, but crash and/or be potentially exploitable in release? (I might be misunderstanding something though)

At the moment there is no mitigation in place, so all channels are vulnerable. The pattern used may or my not trigger crashes but it is not zero and it's consistent so if it does trigger a crash then it should be much easier to track down the cause. The pattern should also be much easier to eye-ball in crash stats. The intention is not to find the bugs that just a bonus :)

Of course the intention is to enable in all channels.

Emilio Cobos Álvarez (:emilio)

Comment 12

•

3 years ago

Ah i see. As long as it makes stuff more likely to crash on nightly then it'd be an improvement i guess :)

Randell Jesup [:jesup] (needinfo me)

Comment 13

•

3 years ago

•

Edited

From the Chrome bug:

Binary size increase: 1.5%
linux-perf/speedometer2 Total Score: 2.5% slower
linux-perf/blink_perf.dom and linux-perf/dromaeo regression vary from 0 to 5%

FYI. Zero initialization:
Binary size increase: 1.05%
linux-perf/speedometer2 Total Score: 0.5% faster
linux-perf/blink_perf.dom and linux-perf/dromaeo from up 12% speedup
                                                 to 4.5% slowdown

Also, I see a bunch of followup bugs to fix 5-15% regressions in things like VP9 encode, CSS, etc (disabling the flag for those I think).

The perfherder results need more runs; most are "medium" confidence - small differences need more runs for statistical confidence. I see small regressions is most speedindex numbers, and other performance numbers, typically in the 1-6% range.

Some focused benchmarks show large regressions, like WebAudio - up to 33%: https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=7f330cc417d914f6fd6a0107307884712a5bca7b&originalSignature=3442639&newSignature=3442639&framework=13&originalRevision=ef2d1eaa5a80d405df665726cc3e6497289db79d&page=1

Taking this sort of hit on pageload would be concerning, and would require a lot more looking and thought about the impact (and pointwise mitigations/disabling).

Alex Thayer [:alexical] (she/her)

Comment 14

•

3 years ago

•

Edited

(In reply to Doug Thayer [:dthayer] (he/him) from comment #10)

Quick drive-by question: I'm wondering what this does with AutoTArray (and similar things like nsAutoString) - my guess would be that initialization of its stack buffer is something that doesn't tend to get optimized out and actually affects us?

Yeah - here's an example: https://searchfox.org/mozilla-central/rev/20ebe9f39e8790e2db123f89fab3293dc0a796ac/dom/base/nsFrameLoader.cpp#1946

Here's the important bit of disassembly around this line in the shippable build with -ftrivial-auto-var-init:

...
 60056d3: ba 00 01 00 00        mov    $0x100,%edx
 60056d8: 48 89 df              mov    %rbx,%rdi
 60056db: be aa 00 00 00        mov    $0xaa,%esi
 60056e0: ff 15 12 15 eb 03     callq  *0x3eb1512(%rip)        # 9eb6bf8 <memset@GLIBC_2.2.5>
 60056e6: 41 0f 10 84 24 68 02  movups 0x268(%r12),%xmm0
...

We memset the 256-byte buffer (16 nsIDs which are each 16 bytes) to 0xaa before passing it into nsISHistory::RemoveEntries. The joke here is that we're only filling it with one item to begin with and nsISHistory::RemoveEntries doesn't append to the ids array and then we don't do anything with the array afterwards.

Can we decorate AutoTArray's mAutoBuf with __attribute((uninitialized)) to get the compiler to explicitly leave it uninitialized? In these types of cases of stack-storage container types I think that's relatively safe.

EDIT: it appears the uninitialized attribute only works for local variables. Rats.

Jesse Schwartzentruber (:truber)

Comment 15

•

3 years ago

Assuming we can't mitigate pattern initialization performance enough to make it feasible everywhere, another option would be to use pattern-init in nightly/beta and zero-init in release -- assuming the latter is better optimized based on chromium's measurements in comment 13. That way we can't unintentionally rely on zero-init semantics but still get the security benefits of initialization.

Andrew Creskey [:acreskey]

Comment 16

•

3 years ago

Tyson, it's interesting that the perfherder comparison does flag some reproducible regressions.
I would like to see how this comparison does on a Windows shippable (aka PGO) build, where the majority of users are.

Also, can we include the Talos performance tests? If we ask in #perftest:mozilla.org we should be able to get some help on the useful tests.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 17

•

3 years ago

(In reply to Tyson Smith [:tsmith] from comment #7)

What is the threshold or upper limit of the acceptable performance regression?

There is no such threshold defined. Perfherder alerts on regressions exceeding 2%. Regressions of over 100ms are more likely to be perceived by users, however we risk accumulation of smaller regressions contributing to a perceivable degradation over time.

How can the performance regressions identified by the above perfherder link be investigated further?

The jobs can be retriggered with the profiler enabled, which can help with investigation.

Flags: needinfo?(dave.hunt)

Tyson Smith [:tsmith]

Updated

•

3 years ago

Depends on: 1769118

Tyson Smith [:tsmith]

Updated

•

3 years ago

Depends on: 1769128

Tyson Smith [:tsmith]

Updated

•

3 years ago

Depends on: 1771223

Alex Thayer [:alexical] (she/her)

Updated

•

3 years ago

Depends on: 1771996

BMO Automation

Updated

•

2 years ago

Product: Firefox Build System → Developer Infrastructure

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

[:sergesanspaille]

Comment 18

•

2 years ago

While doing some extra bibliography on the subject, I realized that clang supports the attribute __attribute__((uninitialized)) which means that if the source of regression can be identified, and the initialization here is not needed, there is a way out, which is great.

[:sergesanspaille]

Comment 19

•

2 years ago

The following methodology makes it possible to detect individual stack allocation that would require some care (i.e. either code rewrite or manual annotation) to not be automatically initialized:

MOZCONFIG=./local-llvm.mozconfig ./mach build

where local-llvm.mozconfig contains

ac_add_options --enable-optimize="-O2  -ftrivial-auto-var-init=pattern -g -Rpass-missed=annotation-remarks"`

which outputs, for instance

0:16.45 /home/ssp/sources/mozilla-unified/security/nss/lib/certhigh/ocspsig.c:563:22: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 48 bytes.
 0:16.45  Written Variables: response (48 bytes). [-Rpass-missed=annotation-remarks]
 0:16.45     CERTOCSPResponse response;

Would that make sense to go through these?

[:sergesanspaille]

Comment 20

•

2 years ago

(In reply to [:sergesanspaille] from comment #19)

The following methodology makes it possible to detect individual stack allocation that would require some care (i.e. either code rewrite or manual annotation) to not be automatically initialized:
MOZCONFIG=./local-llvm.mozconfig ./mach build
where local-llvm.mozconfig contains
ac_add_options --enable-optimize="-O2  -ftrivial-auto-var-init=pattern -g -Rpass-missed=annotation-remarks"`
which outputs, for instance
0:16.45 /home/ssp/sources/mozilla-unified/security/nss/lib/certhigh/ocspsig.c:563:22: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 48 bytes.
 0:16.45  Written Variables: response (48 bytes). [-Rpass-missed=annotation-remarks]
 0:16.45     CERTOCSPResponse response;
Would that make sense to go through these?

Flags: needinfo?(twsmith)

Tyson Smith [:tsmith]

Comment 21

•

2 years ago

(In reply to [:sergesanspaille] from comment #20)

Would that make sense to go through these?

I think so, if it is not an unreasonable amount of effort. We will likely come across some potential security issues. I wonder what the ratio of code rewrite vs manual annotation will be.

Let's track this work in another bug. Can you please create a tracking bug and link issues to it when you have time?

Flags: needinfo?(twsmith) → needinfo?(sguelton)

Tom Ritter [:tjr]

Comment 22

•

2 years ago

•

Edited

So Serge's technique lists all places where we are adding a memset call (and potentially introducing a slowdown) - we don't have to manually edit the code or add an annotation (the flag is adding the memset call for us) - we only need to do that if it provides us a benefit (i.e. fixes a perf regression).

If we can cross-reference the list with code coverage we might be able to identify which of those are really hot paths and potentially be the source of measurable slowdown...

[:sergesanspaille]

Comment 23

•

2 years ago

Attached file trivial-auto-var-init.log — Details

Flags: needinfo?(sguelton)

[:sergesanspaille]

Comment 24

•

2 years ago

Attached to this comment a sorted list of places were auto-init have been added. I don't have coverage info, but some values are frightening. I'll give a try at removing the most unneeded ones. I've opened https://bugzilla.mozilla.org/show_bug.cgi?id=1808891 to track those.

Henri Sivonen (:hsivonen)

Comment 25

•

2 years ago

That way we can't unintentionally rely on zero-init semantics but still get the security benefits of initialization.

Why shouldn't we rely on zero-init semantics? We're using clang only these days, but MSVC&GCC also supporting zero-init suggests that it's plausible enough as a long-term multi-compiler feature that it doesn't paint us in a corner. (Personally, I like to have faith that the committee will eventually do the reasonable thing and accept the paper to hoist zero init to the standard.)

A productivity anecdote to balance against performance issues:

For reasons, bug 543435 went on hold for a while. When I resumed work and rebased, all tests were orange even though reftests and crashtests as well as most WPTs had been green previously. I couldn't bisect, because the mach bootstrap toolchain hosting wasn't valid far enough back. I saw that the failure was in Marionette code that was waiting for the initial browser window to open, so that explained why all test suites were affected. I tried undoing the most recent changes to my patch. Didn't help. I tried investigating recent changes to Marionette. Nope. In Pernosco, I saw that browser.xhtml never fired its load event. This was a plausible failure mode that could arise from the changes in my patch interacting badly with some change done on trunk meanwhile. While failing to find anything obvious, I found another issue and got a bit ahead of myself trying to fix that other thing before finding the root cause. That other attempt caused an assertion to fire in a plausible way. However, when I tried to record it, a different assertion fired. I debugged that in Pernosco. By analyzing where a weird value came from, I eventually realized that I had failed to initialize one bit in a field of adjacent 1-bit bools. clang had made the effort to let the bit show through from whatever was there previously instead of making use of the UB to just do a wide zero write across the uninitialized bit and adjacent zero-initialized bits. Either this had changed between mach bootstrap-provided clang versions or the surrounding code had changed what showed through the uninitialized bit.

The failure mode involved plausible incorrect rabbit holes, I bothered people in meetings about the issue to get ideas, the problem was resolved partly by luck (that I tried to advance something different before debugging the problem at had to completion), and the whole productivity-harming thing could have been avoided by zero-initializing by default.

Henri Sivonen (:hsivonen)

Comment 26

•

2 years ago

It's been pointed out to me that the clang flag here and the paper only cover automatic storage duration and the bitfield from comment 25 was a member of a heap-allocated object. Anyway, the general point of what kind of grief uninitialized values can cause should still be relevant.

[:sergesanspaille]

Comment 27

•

1 years ago

Extra data point: major clang users have switched to use -ftrivial-auto-var-init=pattern in debug build and -ftrivial-auto-var-init=zero in release builds, leading clang to support the later while it was being discouraged at some point. There's a very nice thread on the associated RFC

https://discourse.llvm.org/t/making-ftrivial-auto-var-init-zero-a-first-class-option/55143

I''l compare the two approach soon on firefox builds.

[:sergesanspaille]

Comment 28

•

1 years ago

Current speedometer3 results with -ftrivial-auto-var-init=pattern:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=41e6e1aeac9783eb3d7d95a5c955f5eeea0043b8&newProject=try&newRevision=cff9c5bb4e7b20baa6b2081d2de137c9fdcba0a1&page=1&framework=13

Current speedometer3 results with -ftrivial-auto-var-init=zero:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=41e6e1aeac9783eb3d7d95a5c955f5eeea0043b8&newProject=try&newRevision=e89cf1dd9410eed20881f2160bdb68aff1ee0b8a&page=1&framework=13

Basically, and as expected, it's fatser to initialize to zero rather than a pattern. Maybe surprisingly, we even get a speedup on Linux by doing so, probably because it fills the cache in a more linear pattern (educated guess).

Having a lot at build metrics, the generated binary is also smaller, so that really looks good. And we're left with a -1.5% speedometer3 regression on Windows.

Chris Peterson [:cpeterson]

Comment 29

•

1 year ago

Chrome is removing -ftrivial-auto-var-init from (AV1 decoder) libaom's compilation flags because it added over 13% overhead in local tests compared with standalone libaom:

https://bugs.chromium.org/p/chromium/issues/detail?id=977230#c141