Closed Bug 1494207 Opened 6 years ago Closed 6 years ago

ASan reports ILL when UBSan is enabled

Tracking

(firefox64 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla64

Tracking Flags:

Tracking

Status

firefox64

---

fixed

People

(Reporter: tsmith, Assigned: tsmith)

References

Details

Attachments

(2 files, 2 obsolete files)

test.cpp 6 years ago Tyson Smith [:tsmith] 644 bytes, text/plain		Details
crash_addr.patch 6 years ago Tyson Smith [:tsmith] 1.20 KB, patch	froydnj : review+	Details \| Diff \| Splinter Review
crash_addr.patch 6 years ago Tyson Smith [:tsmith] 1.22 KB, patch	tsmith : review+	Details \| Diff \| Splinter Review
crash_addr.patch 6 years ago Tyson Smith [:tsmith] 1.50 KB, patch	tsmith : review+	Details \| Diff \| Splinter Review

Tyson Smith [:tsmith]

Assignee

Description

•

6 years ago

It appears in cases where we are intentionally (and elsewhere?) crashing via a null deference we get an unexpected SIGILL on the 'ud2' instruction.

When building with UBSan it appear that clang makes use of the 'ud2' instruction.

For example:
sigill was received when executing instruction 'ud2'

0x00007f1b13ce3aee in AnnotateMozCrashReason ()
    at /home/user/code/mozilla-central/objdir-ff-asan-ubsan/dist/include/mozilla/Assertions.h:40
40	  gMozCrashReason = reason;
(gdb) bt
#0  0x00007f1b13ce3aee in AnnotateMozCrashReason ()
    at /home/user/code/mozilla-central/objdir-ff-asan-ubsan/dist/include/mozilla/Assertions.h:40
#1  0x00007f1b13ce3aee in ProcessingError() ()
    at /home/user/code/mozilla-central/dom/ipc/ContentChild.cpp:2461
#2  0x00007f1b0d16193c in MaybeHandleError() ()
    at /home/user/code/mozilla-central/ipc/glue/MessageChannel.cpp:2641
#3  0x00007f1b0d15d9d5 in DispatchMessage() ()
...

I'm not exactly sure how to work around this at the moment. It's not a show stopper but it does complicate bucketing issues and potentially interferes with current tests. ATM this only affect Linux ASan opt and Linux ASan fuzzing opt builds by default (anything using 'ac_add_options --enable-address-sanitizer' combined with 'ac_add_options --enable-address-sanitizer').

I can try adding "Assertions.h" to the compile time blacklist but I don't think this prevent the use of 'ud2'.

I have seen reports of using different optimization levels (<-O2) can prevent this from happening but I have yet to confirm it. And I'm not sure that is the solution we'd want for these builds.

Any other suggestions?

u473386

Comment 1

•

6 years ago

I think the comment on this bug report explains it.

https://bugs.llvm.org/show_bug.cgi?id=27432

I assume adding check "return" to --enable-undefined-sanitizer would catch it properly.

Tyson Smith [:tsmith]

Assignee

Updated

•

6 years ago

Assignee: nobody → twsmith

u473386

Comment 2

•

6 years ago

Or I think that was the wrong explanation for this particular case, and it should rather be check "null".

I suggest reading point 4. of "Clang's Approach to Handling Undefined Behavior" at the link below, which also seems to confirm that it's only with optimized code.

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html

Anyway, there is lots more of those by searching llvm.org for ud2.

u473386

Comment 3

•

6 years ago

I don't understand why it seems to happen with UBSAN when it also should without it, so ignore my comments.

(Away)

Comment 4

•

6 years ago

(In reply to Tyson Smith [:tsmith] from comment #0)
> It appears in cases where we are intentionally (and elsewhere?) crashing via
> a null deference we get an unexpected SIGILL on the 'ud2' instruction.

Why is that bad? We're intentionally aborting, right? Does it matter if it happens via null or ud2?

(I mean these as genuine questions -- not trying to sound confrontational. Maybe things are different on Linux. On Windows, it's all treated the same: ud2 pops up breakpad or a debugger etc. ud2 is even emitted by rustc as the implementation of `panic` on that platform.)

Tyson Smith [:tsmith]

Assignee

Comment 5

•

6 years ago

decoder, I agree with dmajor here (as per our conversation in slack yesterday). I can explore alternatives but can you please indicate what problems have been caused do to this. I believe this should be handled at the crash bucketing level. From what I can tell what matters for instance is that we crash/abort (stop execution) for example at AnnotateMozCrashReason(). The way we do it, segv at null or sigill or abort, really shouldn't matter at the code/execution level.

Right now I think our alternatives are:
1) have stand alone UBSan and ASan builds for T1 builds so tests are run against both to prevent regressions
2) try to patch/hack MOZ_CRASH() etc to fail with a segv at null like before when we compile with UBSan enabled. I don't know how this is possible or even if it is.

Flags: needinfo?(choller)

Christian Holler (:decoder)

Comment 6

•

6 years ago

(In reply to David Major [:dmajor] from comment #4)
> (In reply to Tyson Smith [:tsmith] from comment #0)
> > It appears in cases where we are intentionally (and elsewhere?) crashing via
> > a null deference we get an unexpected SIGILL on the 'ud2' instruction.
> 
> Why is that bad? We're intentionally aborting, right? Does it matter if it
> happens via null or ud2?
> 

It matters because for ud2/ILL we won't know if this was a NULL crash or not and this is crucial for bucketing and other analysis. Essentially all of our crash buckets use crash addresses and the fact that ud2/ILL incorrectly shows up here breaks all of them.

I also believe that this is not intended behavior on UB's side, but we can only know if we start with a smaller example than Firefox. I will see if I can get it to reproduce locally with a smaller example.

Flags: needinfo?(choller)

(Away)

Comment 7

•

6 years ago

Does "bucketing" refer to crash-stats, or something else? And what does it do with rust panics?

Christian Holler (:decoder)

Comment 8

•

6 years ago

(In reply to David Major [:dmajor] from comment #7)
> Does "bucketing" refer to crash-stats, or something else? And what does it
> do with rust panics?

It refers to our own crash processing system in fuzzing. We do also process rust panics, I'm not sure what we implemented for them.

Tyson Smith [:tsmith]

Assignee

Comment 9

•

6 years ago

For clarity this only seems to be happening in scenarios where we are force crashing. Other null or near null crashes have the expected SEGV output from ASan.

(not currently active) Ted Mielczarek

Comment 10

•

6 years ago

Just for reference, the problem is almost certainly in `MOZ_REALLY_CRASH`, which is where `MOZ_CRASH` and most of our other assertions bottom out:
https://dxr.mozilla.org/mozilla-central/rev/d03b538b6b417ba892d0a92fd693945b741246e1/mfbt/Assertions.h#224

Tyson Smith [:tsmith]

Assignee

Comment 11

•

6 years ago

Attached file test.cpp — Details

decoder and I have narrowed it down to building this testcase with:
$ clang++ -fsanitize=bounds -o test test.cpp

Christian Holler (:decoder)

Comment 12

•

6 years ago

The interesting part about this is also that if you build with full UBSan (-fsanitize=undefined), the ILL goes away and it SEGVs again as it should. I don't think this might really be intentional and it if switches back and forth we need to figure out why.

(not currently active) Ted Mielczarek

Comment 13

•

6 years ago

The UBSan docs mention that you can use `__attribute__((no_sanitize("undefined")))` on functions to disable specific checks, but we intentionally have `MOZ_CRASH` defined as a macro so it gets expanded inline so the top frame of the crash stack is the right function. If we had a way to disable that on a block we could presumably do that.

http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#id10

Christian Holler (:decoder)

Comment 14

•

6 years ago

(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #13)
> The UBSan docs mention that you can use
> `__attribute__((no_sanitize("undefined")))` on functions to disable specific
> checks, but we intentionally have `MOZ_CRASH` defined as a macro so it gets
> expanded inline so the top frame of the crash stack is the right function.
> If we had a way to disable that on a block we could presumably do that.
> 
> http://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html#id10

I tried this already with the MOZ_NoReturn function instead using the macro rather than using it directly. But that didn't change the situation (Which makes it even harder for me to understand what is going on).

Christian Holler (:decoder)

Comment 15

•

6 years ago

Here are some funny results:


-fsanitize=address -fsanitize=bounds -fno-sanitize-recover=all
    => ILL

-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all
    => SEGV

-fsanitize=address -fsanitize=alignment,bool,builtin,bounds,enum,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,pointer-overflow,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,vla-bound,vptr -fno-sanitize-recover=all
    => ILL


The last example lists all of the flags separately that would otherwise be enabled by -fsanitize=undefined. This smells like a bug to me. Removing "bounds" from the list makes it go back to SEGV btw.

Christian Holler (:decoder)

Comment 16

•

6 years ago

So :dmajor and I worked through this and it appears that the NULL-write itself that we use for MOZ_REALLY_CRASH is the problem. The compiler randomly decides to rewrite that depending on the UBSan flags I use.

However, if I switch the address from NULL to 0x1, it segfaults reliably. I now suggested to make a patch just for MOZ_UBSAN to build with 0x1 instead, but :dmajor pointed out that we should maybe consider changing this for regular builds as well because Clang might randomly decide to compile this differently even without UBSan.

Nathan, do you have any opinion on this?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Comment 17

•

6 years ago

I can't find the language in the spec that says nullptr dereferences are undefined behavior.  It does explicitly say that for pointer-to-members, and I'm sure it says it for normal pointers...I just can't find it at the moment.  So I think it's reasonable that UBSan flags that.

I'm less excited about switching to 0x1 everywhere, and would rather not do it unless we had to.

Flags: needinfo?(nfroyd)

Christian Holler (:decoder)

Comment 18

•

6 years ago

I'd be fine making this UBSan only (and it would be a fairly simple patch, too).

:dmajor, did you have any particular problems or any particular part of the spec in mind that should cause us to change it for all the builds? If so, would you mind discussing this with Nathan? I'll be happy to make a patch for either of the two if you both can decide on which strategy it should be :) Thanks!

Flags: needinfo?(dmajor)

(Away)

Comment 19

•

6 years ago

I defer to Nathan.

Flags: needinfo?(dmajor)

Tyson Smith [:tsmith]

Assignee

Comment 20

•

6 years ago

Attached patch crash_addr.patch (obsolete) — Details — Splinter Review

Here is decoder's patch with a comment added.

Attachment #9013040 - Flags: review?(nfroyd)

Nathan Froyd [:froydnj]

Comment 21

•

6 years ago

Comment on attachment 9013040 [details] [diff] [review]
crash_addr.patch

Review of attachment 9013040 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the comment change below.

::: mfbt/Assertions.h
@@ +227,5 @@
> + * crash. NULL is preferred however if for some reason NULL cannot be used
> + * this makes choosing another value possible.
> + *
> + * In the case of UBSan certain checks, bounds specifically, cause the compiler
> + * to emit the 'ud2' instruction. This causes forced crashes to manifest as

"...emit the instruction when storing to 0x0."?

Attachment #9013040 - Flags: review?(nfroyd) → review+

Tyson Smith [:tsmith]

Assignee

Comment 22

•

6 years ago

Attached patch crash_addr.patch (obsolete) — Details — Splinter Review

Updated wording.

Attachment #9013040 - Attachment is obsolete: true

Attachment #9013336 - Flags: review+

Tyson Smith [:tsmith]

Assignee

Comment 23

•

6 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9fe35c869f6c8ab144ade4df262d8618c2971e37

Tyson Smith [:tsmith]

Assignee

Updated

•

6 years ago

Keywords: checkin-needed

Dorel Luca [:dluca]

Comment 24

•

6 years ago

Hi, When trying to apply your patch in order to land it I was given this error.

applying crash_addr.patch
patching file mfbt/Assertions.h
Hunk #1 FAILED at 220
1 out of 1 hunks FAILED -- saving rejects to file mfbt/Assertions.h.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working directory
errors during apply, please fix and qrefresh crash_addr.patch

I think it needs a rebase.

Flags: needinfo?(twsmith)

Keywords: checkin-needed

Tyson Smith [:tsmith]

Assignee

Comment 25

•

6 years ago

Attached patch crash_addr.patch — Details — Splinter Review

Attachment #9013336 - Attachment is obsolete: true

Flags: needinfo?(twsmith)

Attachment #9014597 - Flags: review+

Tyson Smith [:tsmith]

Assignee

Updated

•

6 years ago

Keywords: checkin-needed

Pulsebot

Comment 26

•

6 years ago

Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/463cee9e37d9
Add MOZ_CRASH_WRITE_ADDR to avoid ILL with UBSan. r=nfroyd

Keywords: checkin-needed

Cosmin Sabou [:CosminS]

Comment 27

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/463cee9e37d9

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox64: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla64

Kostya Serebryany

Comment 28

•

6 years ago

I think ubsan's behavior is valid. 
If you want to make sure you get a SEGV, do something like this:

static volatile int *mynullptr = 0;
...
      *mynullptr = 42;

You need to log in before you can comment on or make changes to this bug.