Closed Bug 1494207 Opened Last year Closed Last year

ASan reports ILL when UBSan is enabled

Categories

(Firefox Build System :: General, defect)

defect
Not set

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: tsmith, Assigned: tsmith)

References

Details

Attachments

(2 files, 2 obsolete files)

It appears in cases where we are intentionally (and elsewhere?) crashing via a null deference we get an unexpected SIGILL on the 'ud2' instruction.

When building with UBSan it appear that clang makes use of the 'ud2' instruction.

For example:
sigill was received when executing instruction 'ud2'

0x00007f1b13ce3aee in AnnotateMozCrashReason ()
    at /home/user/code/mozilla-central/objdir-ff-asan-ubsan/dist/include/mozilla/Assertions.h:40
40	  gMozCrashReason = reason;
(gdb) bt
#0  0x00007f1b13ce3aee in AnnotateMozCrashReason ()
    at /home/user/code/mozilla-central/objdir-ff-asan-ubsan/dist/include/mozilla/Assertions.h:40
#1  0x00007f1b13ce3aee in ProcessingError() ()
    at /home/user/code/mozilla-central/dom/ipc/ContentChild.cpp:2461
#2  0x00007f1b0d16193c in MaybeHandleError() ()
    at /home/user/code/mozilla-central/ipc/glue/MessageChannel.cpp:2641
#3  0x00007f1b0d15d9d5 in DispatchMessage() ()
...

I'm not exactly sure how to work around this at the moment. It's not a show stopper but it does complicate bucketing issues and potentially interferes with current tests. ATM this only affect Linux ASan opt and Linux ASan fuzzing opt builds by default (anything using 'ac_add_options --enable-address-sanitizer' combined with 'ac_add_options --enable-address-sanitizer').

I can try adding "Assertions.h" to the compile time blacklist but I don't think this prevent the use of 'ud2'.

I have seen reports of using different optimization levels (<-O2) can prevent this from happening but I have yet to confirm it. And I'm not sure that is the solution we'd want for these builds.

Any other suggestions?
I think the comment on this bug report explains it.

https://bugs.llvm.org/show_bug.cgi?id=27432

I assume adding check "return" to --enable-undefined-sanitizer would catch it properly.
Assignee: nobody → twsmith
Or I think that was the wrong explanation for this particular case, and it should rather be check "null".

I suggest reading point 4. of "Clang's Approach to Handling Undefined Behavior" at the link below, which also seems to confirm that it's only with optimized code.

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html

Anyway, there is lots more of those by searching llvm.org for ud2.
I don't understand why it seems to happen with UBSAN when it also should without it, so ignore my comments.
(In reply to Tyson Smith [:tsmith] from comment #0)
> It appears in cases where we are intentionally (and elsewhere?) crashing via
> a null deference we get an unexpected SIGILL on the 'ud2' instruction.

Why is that bad? We're intentionally aborting, right? Does it matter if it happens via null or ud2?

(I mean these as genuine questions -- not trying to sound confrontational. Maybe things are different on Linux. On Windows, it's all treated the same: ud2 pops up breakpad or a debugger etc. ud2 is even emitted by rustc as the implementation of `panic` on that platform.)
decoder, I agree with dmajor here (as per our conversation in slack yesterday). I can explore alternatives but can you please indicate what problems have been caused do to this. I believe this should be handled at the crash bucketing level. From what I can tell what matters for instance is that we crash/abort (stop execution) for example at AnnotateMozCrashReason(). The way we do it, segv at null or sigill or abort, really shouldn't matter at the code/execution level.

Right now I think our alternatives are:
1) have stand alone UBSan and ASan builds for T1 builds so tests are run against both to prevent regressions
2) try to patch/hack MOZ_CRASH() etc to fail with a segv at null like before when we compile with UBSan enabled. I don't know how this is possible or even if it is.
Flags: needinfo?(choller)
(In reply to David Major [:dmajor] from comment #4)
> (In reply to Tyson Smith [:tsmith] from comment #0)
> > It appears in cases where we are intentionally (and elsewhere?) crashing via
> > a null deference we get an unexpected SIGILL on the 'ud2' instruction.
> 
> Why is that bad? We're intentionally aborting, right? Does it matter if it
> happens via null or ud2?
> 

It matters because for ud2/ILL we won't know if this was a NULL crash or not and this is crucial for bucketing and other analysis. Essentially all of our crash buckets use crash addresses and the fact that ud2/ILL incorrectly shows up here breaks all of them.

I also believe that this is not intended behavior on UB's side, but we can only know if we start with a smaller example than Firefox. I will see if I can get it to reproduce locally with a smaller example.
Flags: needinfo?(choller)
Does "bucketing" refer to crash-stats, or something else? And what does it do with rust panics?
(In reply to David Major [:dmajor] from comment #7)
> Does "bucketing" refer to crash-stats, or something else? And what does it
> do with rust panics?

It refers to our own crash processing system in fuzzing. We do also process rust panics, I'm not sure what we implemented for them.
For clarity this only seems to be happening in scenarios where we are force crashing. Other null or near null crashes have the expected SEGV output from ASan.
Just for reference, the problem is almost certainly in `MOZ_REALLY_CRASH`, which is where `MOZ_CRASH` and most of our other assertions bottom out:
https://dxr.mozilla.org/mozilla-central/rev/d03b538b6b417ba892d0a92fd693945b741246e1/mfbt/Assertions.h#224
Attached file test.cpp
decoder and I have narrowed it down to building this testcase with:
$ clang++ -fsanitize=bounds -o test test.cpp
The interesting part about this is also that if you build with full UBSan (-fsanitize=undefined), the ILL goes away and it SEGVs again as it should. I don't think this might really be intentional and it if switches back and forth we need to figure out why.
The UBSan docs mention that you can use `__attribute__((no_sanitize("undefined")))` on functions to disable specific checks, but we intentionally have `MOZ_CRASH` defined as a macro so it gets expanded inline so the top frame of the crash stack is the right function. If we had a way to disable that on a block we could presumably do that.

http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#id10
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #13)
> The UBSan docs mention that you can use
> `__attribute__((no_sanitize("undefined")))` on functions to disable specific
> checks, but we intentionally have `MOZ_CRASH` defined as a macro so it gets
> expanded inline so the top frame of the crash stack is the right function.
> If we had a way to disable that on a block we could presumably do that.
> 
> http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#id10

I tried this already with the MOZ_NoReturn function instead using the macro rather than using it directly. But that didn't change the situation (Which makes it even harder for me to understand what is going on).
Here are some funny results:


-fsanitize=address -fsanitize=bounds -fno-sanitize-recover=all
    => ILL

-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all
    => SEGV

-fsanitize=address -fsanitize=alignment,bool,builtin,bounds,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,pointer-overflow,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=all
    => ILL


The last example lists all of the flags separately that would otherwise be enabled by -fsanitize=undefined. This smells like a bug to me. Removing "bounds" from the list makes it go back to SEGV btw.
So :dmajor and I worked through this and it appears that the NULL-write itself that we use for MOZ_REALLY_CRASH is the problem. The compiler randomly decides to rewrite that depending on the UBSan flags I use.

However, if I switch the address from NULL to 0x1, it segfaults reliably. I now suggested to make a patch just for MOZ_UBSAN to build with 0x1 instead, but :dmajor pointed out that we should maybe consider changing this for regular builds as well because Clang might randomly decide to compile this differently even without UBSan.

Nathan, do you have any opinion on this?
Flags: needinfo?(nfroyd)
I can't find the language in the spec that says nullptr dereferences are undefined behavior.  It does explicitly say that for pointer-to-members, and I'm sure it says it for normal pointers...I just can't find it at the moment.  So I think it's reasonable that UBSan flags that.

I'm less excited about switching to 0x1 everywhere, and would rather not do it unless we had to.
Flags: needinfo?(nfroyd)
I'd be fine making this UBSan only (and it would be a fairly simple patch, too).

:dmajor, did you have any particular problems or any particular part of the spec in mind that should cause us to change it for all the builds? If so, would you mind discussing this with Nathan? I'll be happy to make a patch for either of the two if you both can decide on which strategy it should be :) Thanks!
Flags: needinfo?(dmajor)
I defer to Nathan.
Flags: needinfo?(dmajor)
Attached patch crash_addr.patch (obsolete) — Splinter Review
Here is decoder's patch with a comment added.
Attachment #9013040 - Flags: review?(nfroyd)
Comment on attachment 9013040 [details] [diff] [review]
crash_addr.patch

Review of attachment 9013040 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the comment change below.

::: mfbt/Assertions.h
@@ +227,5 @@
> + * crash. NULL is preferred however if for some reason NULL cannot be used
> + * this makes choosing another value possible.
> + *
> + * In the case of UBSan certain checks, bounds specifically, cause the compiler
> + * to emit the 'ud2' instruction. This causes forced crashes to manifest as

"...emit the instruction when storing to 0x0."?
Attachment #9013040 - Flags: review?(nfroyd) → review+
Attached patch crash_addr.patch (obsolete) — Splinter Review
Updated wording.
Attachment #9013040 - Attachment is obsolete: true
Attachment #9013336 - Flags: review+
Keywords: checkin-needed
Hi, When trying to apply your patch in order to land it I was given this error.

applying crash_addr.patch
patching file mfbt/Assertions.h
Hunk #1 FAILED at 220
1 out of 1 hunks FAILED -- saving rejects to file mfbt/Assertions.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and qrefresh crash_addr.patch

I think it needs a rebase.
Flags: needinfo?(twsmith)
Keywords: checkin-needed
Attached patch crash_addr.patchSplinter Review
Attachment #9013336 - Attachment is obsolete: true
Flags: needinfo?(twsmith)
Attachment #9014597 - Flags: review+
Keywords: checkin-needed
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/463cee9e37d9
Add MOZ_CRASH_WRITE_ADDR to avoid ILL with UBSan. r=nfroyd
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/463cee9e37d9
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
I think ubsan's behavior is valid. 
If you want to make sure you get a SEGV, do something like this:

static volatile int *mynullptr = 0;
...
      *mynullptr = 42;
You need to log in before you can comment on or make changes to this bug.