Open Bug 1515213 Opened 6 years ago Updated 1 year ago

Investigation usage of clang -ftrivial-auto-var-init flag

Categories

(Developer Infrastructure :: Source Code Analysis, task, P2)

Tracking

(Not tracked)

People

(Reporter: pauljt, Unassigned)

References

(Depends on 3 open bugs, Blocks 1 open bug, )

Details

(Keywords: sec-want, Whiteboard: investigation)

Attachments

(1 file)

Bug 1514965 has the details here - I'm just filing this bug so we have a tracking bug in our queue for this. Decoder suggested it might be a useful activity for our team to pursue.
Microsoft enabled a similar MSVC flag for auto initialization (with default value 0) for Windows kernel code. If we used clang on all platforms, we could use -ftrivial-auto-var-init=zero to zero-initialize auto variables for safety, not just debugging. https://twitter.com/JosephBialek/status/1062774315098112001 "Please join the Windows kernel in wishing farewell to uninitialized plain-old-data structs on the stack. As of today's WIPFast build, any Windows code compiled with /kernel also gets compiled with InitAll, a compiler security feature that initializes POD structs at declaration. Between 2017 and mid 2018, this feature would have killed 49 MSRC cases that involved uninitialized struct data leaking across a trust boundary. It would have also mitigated a number of bugs involving uninitialized struct data being used directly. To date, we are seeing noise level performance regressions caused by this change. We accomplished this by improving the compilers ability to kill redundant stores. While everything is initialized at declaration, most of these initializations can be proven redundant and eliminated. We are conservative with how we roll this out. Starting with kernel components and than expanding scope. It's a bit scary from a perf perspective and I'd like to avoid fire drills. Kernel components are by far the most impacted in MSRC case data so it's a good starting point."
Type: enhancement → task
Priority: -- → P2
Depends on: 1546873
Component: Security: Review Requests → Source Code Analysis
Product: Firefox → Firefox Build System

Here is the Chrome bug to enable -ftrivial-auto-var-init=pattern in official release builds. It's currently enabled in only non-official builds for desktop and ChromeOS, but not Android.

https://bugs.chromium.org/p/chromium/issues/detail?id=977230

Keywords: sec-want

Here is a perfherder comparison with -ftrivial-auto-var-init=pattern enabled on an opt build.

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=ef2d1eaa5a80d405df665726cc3e6497289db79d&newProject=try&newRevision=7f330cc417d914f6fd6a0107307884712a5bca7b&framework=13&page=1

Additional notes:

  • In a recent fuzzing collaboration meeting it was mentioned that efforts to run MSan instrumented builds of Chromium will likely not continue. Comment #6 has a link to the Chromium bug to enable -ftrivial-auto-var-init=pattern

  • MSan and Valgrind are not reasonable methods of identifying issues. MSan would required a huge initial investment to get Firefox and all of it's dependencies built and instrumented and would require ongoing maintenance (and as mentioned above is being moved away from by other vendors). Valgrind requires a special build (which is no longer maintained on TC) and requires constant suppression list maintenance. It also adds an unreasonable runtime performance overhead that makes fuzzing much less effective. Both methods depend on test suites and fuzzing to identify issues.

  • Enabling -ftrivial-auto-var-init=pattern should not have ongoing development costs and does not rely on test suites and fuzzing to identify issues to keep users safe. It mitigates risks without treating uninitialized memory usage as defined behavior (preventing developer from actively relying on it).

  • Forced initialization should be optimized out when using optimization levels >0 if memory is properly initialized before use.

  • no-opt builds will suffer a higher performance regression (as mentioned previously) and should be excluded from the list of builds enabling this flag.

Questions that I think need to be answered before moving forward are:

  • What is the threshold or upper limit of the acceptable performance regression?

  • How can the performance regressions identified by the above perfherder link be investigated further?

  • Are there areas of the code that we know should be excluded?

  • Would it be reasonable to enable for Nightly and early Beta builds? This is only meant as an alternative if enabling for all channels is for some reason not an option. This would help shake out any existing latent or intermittent crashes dues to uninitialized memory usage.

    • Would this give us any additional performance metrics?
    • Should we start on a single platform?

Dave, if you have an opinion on this :)

Flags: needinfo?(dave.hunt)

Would it be reasonable to enable for Nightly and early Beta builds? This would help shake out any existing latent or intermittent crashes dues to uninitialized memory usage.

Would it? Wouldn't it paper over issues in Nightly / Early beta, but crash and/or be potentially exploitable in release? (I might be misunderstanding something though)

Quick drive-by question: I'm wondering what this does with AutoTArray (and similar things like nsAutoString) - my guess would be that initialization of its stack buffer is something that doesn't tend to get optimized out and actually affects us?

(In reply to Emilio Cobos Álvarez (:emilio) from comment #9)

Would it be reasonable to enable for Nightly and early Beta builds? This would help shake out any existing latent or intermittent crashes dues to uninitialized memory usage.

Would it? Wouldn't it paper over issues in Nightly / Early beta, but crash and/or be potentially exploitable in release? (I might be misunderstanding something though)

At the moment there is no mitigation in place, so all channels are vulnerable. The pattern used may or my not trigger crashes but it is not zero and it's consistent so if it does trigger a crash then it should be much easier to track down the cause. The pattern should also be much easier to eye-ball in crash stats. The intention is not to find the bugs that just a bonus :)

Of course the intention is to enable in all channels.

Ah i see. As long as it makes stuff more likely to crash on nightly then it'd be an improvement i guess :)

From the Chrome bug:

Binary size increase: 1.5%
linux-perf/speedometer2 Total Score: 2.5% slower
linux-perf/blink_perf.dom and linux-perf/dromaeo regression vary from 0 to 5%

FYI. Zero initialization:
Binary size increase: 1.05%
linux-perf/speedometer2 Total Score: 0.5% faster
linux-perf/blink_perf.dom and linux-perf/dromaeo from up 12% speedup
                                                 to 4.5% slowdown

Also, I see a bunch of followup bugs to fix 5-15% regressions in things like VP9 encode, CSS, etc (disabling the flag for those I think).

The perfherder results need more runs; most are "medium" confidence - small differences need more runs for statistical confidence. I see small regressions is most speedindex numbers, and other performance numbers, typically in the 1-6% range.

Some focused benchmarks show large regressions, like WebAudio - up to 33%: https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=7f330cc417d914f6fd6a0107307884712a5bca7b&originalSignature=3442639&newSignature=3442639&framework=13&originalRevision=ef2d1eaa5a80d405df665726cc3e6497289db79d&page=1

Taking this sort of hit on pageload would be concerning, and would require a lot more looking and thought about the impact (and pointwise mitigations/disabling).

(In reply to Doug Thayer [:dthayer] (he/him) from comment #10)

Quick drive-by question: I'm wondering what this does with AutoTArray (and similar things like nsAutoString) - my guess would be that initialization of its stack buffer is something that doesn't tend to get optimized out and actually affects us?

Yeah - here's an example: https://searchfox.org/mozilla-central/rev/20ebe9f39e8790e2db123f89fab3293dc0a796ac/dom/base/nsFrameLoader.cpp#1946

Here's the important bit of disassembly around this line in the shippable build with -ftrivial-auto-var-init:

...
 60056d3: ba 00 01 00 00        mov    $0x100,%edx
 60056d8: 48 89 df              mov    %rbx,%rdi
 60056db: be aa 00 00 00        mov    $0xaa,%esi
 60056e0: ff 15 12 15 eb 03     callq  *0x3eb1512(%rip)        # 9eb6bf8 <memset@GLIBC_2.2.5>
 60056e6: 41 0f 10 84 24 68 02  movups 0x268(%r12),%xmm0
...

We memset the 256-byte buffer (16 nsIDs which are each 16 bytes) to 0xaa before passing it into nsISHistory::RemoveEntries. The joke here is that we're only filling it with one item to begin with and nsISHistory::RemoveEntries doesn't append to the ids array and then we don't do anything with the array afterwards.

Can we decorate AutoTArray's mAutoBuf with __attribute((uninitialized)) to get the compiler to explicitly leave it uninitialized? In these types of cases of stack-storage container types I think that's relatively safe.

EDIT: it appears the uninitialized attribute only works for local variables. Rats.

Assuming we can't mitigate pattern initialization performance enough to make it feasible everywhere, another option would be to use pattern-init in nightly/beta and zero-init in release -- assuming the latter is better optimized based on chromium's measurements in comment 13. That way we can't unintentionally rely on zero-init semantics but still get the security benefits of initialization.

Tyson, it's interesting that the perfherder comparison does flag some reproducible regressions.
I would like to see how this comparison does on a Windows shippable (aka PGO) build, where the majority of users are.

Also, can we include the Talos performance tests? If we ask in #perftest:mozilla.org we should be able to get some help on the useful tests.

(In reply to Tyson Smith [:tsmith] from comment #7)

  • What is the threshold or upper limit of the acceptable performance regression?

There is no such threshold defined. Perfherder alerts on regressions exceeding 2%. Regressions of over 100ms are more likely to be perceived by users, however we risk accumulation of smaller regressions contributing to a perceivable degradation over time.

  • How can the performance regressions identified by the above perfherder link be investigated further?

The jobs can be retriggered with the profiler enabled, which can help with investigation.

Flags: needinfo?(dave.hunt)
Depends on: 1769118
Depends on: 1769128
Depends on: 1771223
Product: Firefox Build System → Developer Infrastructure
Severity: normal → S3

While doing some extra bibliography on the subject, I realized that clang supports the attribute __attribute__((uninitialized)) which means that if the source of regression can be identified, and the initialization here is not needed, there is a way out, which is great.

The following methodology makes it possible to detect individual stack allocation that would require some care (i.e. either code rewrite or manual annotation) to not be automatically initialized:

MOZCONFIG=./local-llvm.mozconfig ./mach build

where local-llvm.mozconfig contains

ac_add_options --enable-optimize="-O2  -ftrivial-auto-var-init=pattern -g -Rpass-missed=annotation-remarks"`

which outputs, for instance

0:16.45 /home/ssp/sources/mozilla-unified/security/nss/lib/certhigh/ocspsig.c:563:22: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 48 bytes.
 0:16.45  Written Variables: response (48 bytes). [-Rpass-missed=annotation-remarks]
 0:16.45     CERTOCSPResponse response;

Would that make sense to go through these?

(In reply to [:sergesanspaille] from comment #19)

The following methodology makes it possible to detect individual stack allocation that would require some care (i.e. either code rewrite or manual annotation) to not be automatically initialized:

MOZCONFIG=./local-llvm.mozconfig ./mach build

where local-llvm.mozconfig contains

ac_add_options --enable-optimize="-O2  -ftrivial-auto-var-init=pattern -g -Rpass-missed=annotation-remarks"`

which outputs, for instance

0:16.45 /home/ssp/sources/mozilla-unified/security/nss/lib/certhigh/ocspsig.c:563:22: remark: Call to memset inserted by -ftrivial-auto-var-init. Memory operation size: 48 bytes.
 0:16.45  Written Variables: response (48 bytes). [-Rpass-missed=annotation-remarks]
 0:16.45     CERTOCSPResponse response;

Would that make sense to go through these?

Flags: needinfo?(twsmith)

(In reply to [:sergesanspaille] from comment #20)

Would that make sense to go through these?

I think so, if it is not an unreasonable amount of effort. We will likely come across some potential security issues. I wonder what the ratio of code rewrite vs manual annotation will be.

Let's track this work in another bug. Can you please create a tracking bug and link issues to it when you have time?

Flags: needinfo?(twsmith) → needinfo?(sguelton)

So Serge's technique lists all places where we are adding a memset call (and potentially introducing a slowdown) - we don't have to manually edit the code or add an annotation (the flag is adding the memset call for us) - we only need to do that if it provides us a benefit (i.e. fixes a perf regression).

If we can cross-reference the list with code coverage we might be able to identify which of those are really hot paths and potentially be the source of measurable slowdown...

Flags: needinfo?(sguelton)

Attached to this comment a sorted list of places were auto-init have been added. I don't have coverage info, but some values are frightening. I'll give a try at removing the most unneeded ones. I've opened https://bugzilla.mozilla.org/show_bug.cgi?id=1808891 to track those.

That way we can't unintentionally rely on zero-init semantics but still get the security benefits of initialization.

Why shouldn't we rely on zero-init semantics? We're using clang only these days, but MSVC&GCC also supporting zero-init suggests that it's plausible enough as a long-term multi-compiler feature that it doesn't paint us in a corner. (Personally, I like to have faith that the committee will eventually do the reasonable thing and accept the paper to hoist zero init to the standard.)

A productivity anecdote to balance against performance issues:

For reasons, bug 543435 went on hold for a while. When I resumed work and rebased, all tests were orange even though reftests and crashtests as well as most WPTs had been green previously. I couldn't bisect, because the mach bootstrap toolchain hosting wasn't valid far enough back. I saw that the failure was in Marionette code that was waiting for the initial browser window to open, so that explained why all test suites were affected. I tried undoing the most recent changes to my patch. Didn't help. I tried investigating recent changes to Marionette. Nope. In Pernosco, I saw that browser.xhtml never fired its load event. This was a plausible failure mode that could arise from the changes in my patch interacting badly with some change done on trunk meanwhile. While failing to find anything obvious, I found another issue and got a bit ahead of myself trying to fix that other thing before finding the root cause. That other attempt caused an assertion to fire in a plausible way. However, when I tried to record it, a different assertion fired. I debugged that in Pernosco. By analyzing where a weird value came from, I eventually realized that I had failed to initialize one bit in a field of adjacent 1-bit bools. clang had made the effort to let the bit show through from whatever was there previously instead of making use of the UB to just do a wide zero write across the uninitialized bit and adjacent zero-initialized bits. Either this had changed between mach bootstrap-provided clang versions or the surrounding code had changed what showed through the uninitialized bit.

The failure mode involved plausible incorrect rabbit holes, I bothered people in meetings about the issue to get ideas, the problem was resolved partly by luck (that I tried to advance something different before debugging the problem at had to completion), and the whole productivity-harming thing could have been avoided by zero-initializing by default.

It's been pointed out to me that the clang flag here and the paper only cover automatic storage duration and the bitfield from comment 25 was a member of a heap-allocated object. Anyway, the general point of what kind of grief uninitialized values can cause should still be relevant.

Extra data point: major clang users have switched to use -ftrivial-auto-var-init=pattern in debug build and -ftrivial-auto-var-init=zero in release builds, leading clang to support the later while it was being discouraged at some point. There's a very nice thread on the associated RFC

https://discourse.llvm.org/t/making-ftrivial-auto-var-init-zero-a-first-class-option/55143

I''l compare the two approach soon on firefox builds.

Current speedometer3 results with -ftrivial-auto-var-init=pattern:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=41e6e1aeac9783eb3d7d95a5c955f5eeea0043b8&newProject=try&newRevision=cff9c5bb4e7b20baa6b2081d2de137c9fdcba0a1&page=1&framework=13

Current speedometer3 results with -ftrivial-auto-var-init=zero:

https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=41e6e1aeac9783eb3d7d95a5c955f5eeea0043b8&newProject=try&newRevision=e89cf1dd9410eed20881f2160bdb68aff1ee0b8a&page=1&framework=13

Basically, and as expected, it's fatser to initialize to zero rather than a pattern. Maybe surprisingly, we even get a speedup on Linux by doing so, probably because it fills the cache in a more linear pattern (educated guess).

Having a lot at build metrics, the generated binary is also smaller, so that really looks good. And we're left with a -1.5% speedometer3 regression on Windows.

Chrome is removing -ftrivial-auto-var-init from (AV1 decoder) libaom's compilation flags because it added over 13% overhead in local tests compared with standalone libaom:

https://bugs.chromium.org/p/chromium/issues/detail?id=977230#c141

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: