Open Bug 1031653 Opened 10 years ago Updated 2 years ago

Always compile all C and C++ code with -fwrapv

Categories

(Firefox Build System :: General, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: jruderman, Assigned: andi)

References

(Blocks 1 open bug)

Details

(Keywords: platform-parity, sec-want)

Attachments

(1 file)

[Split from bug 919486]

In "standard C++", signed integer overflow is undefined behavior. This leads compilers to do surprising things like "optimizing out" explicit overflow checks that happen too late or themselves overflow. Sometimes these overflow checks are needed for memory safety.

We should turn on -fwrapv or -fno-strict-overflow, switching to a dialect of C++ in which signed integer overflow is defined behavior.  Fixing only the problems we spot with UBSan & STACK is a losing game.  Many other open-source projects, including the Linux Kernel and CPython, made the same decision.

MSVC seems to default to something like -fwrapv: http://blogs.msdn.com/b/oldnewthing/archive/2014/06/27/10537746.aspx#10538032
I can help with talos results as needed.
An upcoming version of MSVC will be similar, exploiting undefined signed overflow by default, but providing a flag to disable these sketchy optimizations.

https://blogs.msdn.microsoft.com/vcblog/2016/05/04/new-code-optimizer/

"Historically, Visual C++ did not take advantage of the fact that the C and C++ standards consider the result of overflowing signed operations undefined. Other compilers are very aggressive in this regard, which motivated the decision to implement some patterns which take advantage of undefined integer overflow behavior. We implemented the ones we thought were safe and didn’t impose any unnecessary security risks in generated code."

"A new undocumented compiler flag has been added to disable these optimizations, in case an application that is not standard-conformant fails: -d2SSAOptimizerUndefinedIntOverflow-."
Somewhat relatedly, it sounds like Android is going to ship UBSan compiled code in their media stack to detect some overflows: http://android-developers.blogspot.fr/2016/05/hardening-media-stack.html
Product: Core → Firefox Build System
Signed integer UB continues to be a nuisance to developers and forces us
to write less efficient code just to avoid it.  That is, because this UB
is exploited by compilers to generate bogus code that can create security
issues when an overflow occurs (bug 1292443), we write less performant code
that is even slower than if the compiler just didn't exploit this UB in
the first place.  IOW, this compiler optimization is either
counter-productive or unsafe.

I feel rather strongly that we should add -fwrapv to all our builds.
OS: Mac OS X → All
Hardware: x86_64 → All
Summary: Build with -fwrapv when using gcc/clang → Always compile all C and C++ code with -fwrapv
Fwiw, there seems to be some momentum recently towards actually making
signed integer arithmetic defined as two-complements in the C/C++ standards.
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r0.html
That would be wonderful, but in the meantime we need -fwrapv.
#include "mozilla/WrappingOperations.h" implements guaranteed-to-wrap operations.  I also have much of a patch for a Wrapping<T> class along similar lines to Rust's wrapping<T> struct, that I haven't found time to find a first easy use to replace.

We have both signed- and unsigned-overflow sanitizer build options:

  --enable-signed-overflow-sanitizer
  --enable-unsigned-overflow-sanitizer

that work just fine with clang (and I think maybe the former with gcc, memory hazy on the point).  The JS engine is very nearly free of signed-overflow errors.  We hit none on any of our tests.  The only signed overflows I'm aware of, are in some gnarly date code where I don't even know what we *want* (tho WrappingAdd/Subtract is an easy hackaround if demanded).

We also have blacklists of files and functions with known overflows, or functions that we don't want to impose overflow-checking costs on:

  build/sanitizers/ubsan_signed_overflow_blacklist.txt
  build/sanitizers/ubsan_unsigned_overflow_blacklist.txt

Building with the above options uses these files.  The blacklists may seem long, but I think they're probably mostly lies -- the majority of the signed exemptions are really for *unsigned* overflow.  I just haven't had time to sort through them.  (The two files unfortunately were combined at one time, and when I split them I lacked the time to go through the duplicates to determine *which* blacklist each entry belonged in.)

IMO signed integer overflow is not an intractable problem to detect on treeherder (even in the field, if we wanted to ship nightlies with sanitizer support), and with a little examination it would not be hard to add the sanitizer flags to our builds to run on treeherder.
(The unsigned-overflow sanitizer is offered on the theory that unsigned overflows are often unintended bugs.  I mostly doubt the theory -- particularly because it fails on the common

  unsigned x = ..;
  while (x-- > 0)
    ...;

pattern, but there it is.

Also -- #include "mozilla/Attributes.h" offers attributes that will disable either overflow sanitizer on a per-function basis.  Hash functions, for example, routinely will have unsigned overflows, and such functions should generally -- after inspection -- be tagged with MOZ_NO_SANITIZE_UNSIGNED_OVERFLOW.  And in super-hot functions that can't tolerate signed-overflow checks, MOZ_NO_SANITIZE_SIGNED_OVERFLOW also exists to disable the checks.)
According to Botond, the "Signed integers are two’s complement" proposal has been accepted for C++20:
https://botondballo.wordpress.com/2018/06/20/trip-report-c-standards-meeting-in-rapperswil-june-2018/

So, since this will happen anyway in a future compiler we might as well add -fwrapv now.
This sounds pretty reasonable. Per comment 3 we should also add –d2UndefIntOverflow– for MSVC. If there are specific bits of code where we've manually written code to avoid UB it might be interesting to compare the currently generated code and the generated code with the manual workarounds removed, but with -fwrapv so we could have concrete evidence of your theory.

In any event, if the C++ standard is moving in this direction then I can't think of any sensible counterargument.
(In reply to Mats Palmgren (:mats) from comment #10)
> According to Botond, the "Signed integers are two’s complement" proposal has
> been accepted for C++20:
> https://botondballo.wordpress.com/2018/06/20/trip-report-c-standards-meeting-
> in-rapperswil-june-2018/
> 
> So, since this will happen anyway in a future compiler we might as well add
> -fwrapv now.

Note that adding -fwrapv is still a behavior change, even under the accepted proposal:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r2.html

"Status-quo[:] If a signed operation would naturally produce a value that is not within the range of the result type, the behavior is undefined. The author had hoped to make this well-defined as wrapping (the operations produce the same value bits as for the corresponding unsigned type), but WG21 had strong resistance against this."
Sigh, it's not all that I hoped for then.

I still think adding -fwrapv is the sensible thing to do here.

Can we please add -fwrapv to our default compilation flags?
IMHO, it's unacceptable that we have UB in our code when we know it can cause exploitable security issues.

Manually wrapping every arithmetic operation (using WrappingAdd or NSCoordSaturatingAdd etc) in code
that mostly deals with manipulating coordinates (Layout, Graphics etc) isn't a tenable solution.

I think this is a matter of adding a line like this, or maybe in security_hardening_cflags instead, depending on whether we want to allow others to compile without it I suppose.

But we should probably run this through raptor/talos at the very least.

Assignee: nobody → mhentges
Status: NEW → ASSIGNED

to use raptor/talos, I would do a large try push before and one after. This looks to be linux/windows/osx - will the affect android as well?

ignoring android, I would do something like this:

hg update central
./mach try fuzzy -q 'test-linux1804 raptor' --rebuild 6
hg update <rev> # update to change made
./mach try fuzzy -q 'test-linux1804 raptor' --rebuild 6

once this is done you have two try pushes and use the perfherder compare to find any hints of what changed.

of course I would recommend running raptor, talos, browsertime; and platforms: test-linux1804, test-windows10-64, test-macosx1015-64.

If you are testing on android, hardware is limited, so I would recommend finding a weekend to do a before/after.

I don't have the cycles for this at the moment, though this does sound valuable to do in the future.

When I originally assigned myself to this, it seemed like a valuable improvement for the C++ folks.
However, when the build team does have time to commit to such build improvements, I'd like to have a good chat with some C++ build peers to ensure that the work that's done aligns with their perception of the most high-impact changes.

Assignee: mhentges → nobody
Status: ASSIGNED → NEW
Priority: -- → P3

(In reply to Andi [:andi] from comment #20)

Default mozilla-central - https://treeherder.mozilla.org/jobs?repo=try&revision=55f80843626e7cb609a0049f0b2e64c583865a23
With the patch - https://treeherder.mozilla.org/jobs?repo=try&revision=e3b0274f66c32879a9c3607ac728d77ecdfb58d6

The pushes only contain raptor on Linux, we should do more suites (raptor, talos, browsertime), and more platforms (linux, win, mac).

(In reply to Marco Castelluccio [:marco] from comment #21)

(In reply to Andi [:andi] from comment #20)

Default mozilla-central - https://treeherder.mozilla.org/jobs?repo=try&revision=55f80843626e7cb609a0049f0b2e64c583865a23
With the patch - https://treeherder.mozilla.org/jobs?repo=try&revision=e3b0274f66c32879a9c3607ac728d77ecdfb58d6

The pushes only contain raptor on Linux, we should do more suites (raptor, talos, browsertime), and more platforms (linux, win, mac).

of course, these are only some preliminary runs.

(In reply to Chris Peterson [:cpeterson] from comment #23)

Here is the Chrome bug to enable -fwrapv and/or -ftrapv:

https://bugs.chromium.org/p/chromium/issues/detail?id=1166960

It seems no one has been working on it for a while now. What do you make of the above results?

As discussed with Marco, if this is causing performance issues, we can also look into enabling this partially on a directory basis or excluding certain parts of the codebase until the performance issues can be resolved (e.g. I assume the layout subdir might cause more issues than other code).

We could also profile more accurately which parts of our code are exhibiting most slowdown from this. According to Google, many options that cause these slowdowns actually only cause major slowdowns through a small portion of the overall code, so we should consider going this route as well.

Assignee: nobody → bpostelnicu
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: