Open Bug 1070674 Opened 10 years ago Updated 2 years ago

MOZ_REALLY_CRASH should not invoke undefined behavior (implementation-defined behavior should be enough)

Categories

(Core :: MFBT, defect)

Other Branch
defect

Tracking

()

People

(Reporter: bjacob, Unassigned)

Details

Currently MOZ_REALLY_CRASH (which is what MOZ_ASSERT failures trigger) is implemented by dereferencing the literal NULL pointer. That's undefined behavior, as far as I know. This complicates reasoning on the interaction between MOZ_ASSERT and other undefined behavior in bug 990764 comment 46 and below. It would be nice if we could avoid invoking undefined behavior; just implementation-defined behavior should be enough. For example, dereference the pointer value 1, or do an integer division by zero.
Note: of course, this is a 'mild' form of UB, in the sense that compiler implementers know that it is almost always unintentional, and treat it accordingly. So of course, we are not seeing any crazy UB-induced effects from MOZ_REALLY_CRASH. Still --- the above-mentioned bug shows how this unnecessarily complicates some conversations.
Integer division by zero is undefined behavior.

I don't have corresponding definitions in C++11, but in C11, *(uintptr_t*)1 is definitely undefined behavior:
If an invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined.

It's not really possible to avoid undefined behavior to cause this sort of exit.
How about calling abort() then? I don't suppose that has undefined behavior?
Another avenue would be to replace the compiler-time bad NULL pointer by a runtime value to prevent the compiler from reasoning about it. In this way, even if it might in principle still be UB, that won't be a kind of UB that can affect the compiler's behavior.
I have some evidence that this undefined behavior can cause real problems. While debugging a misbehaving project embedding SpiderMonkey, I watched it emit multiple 'MOZ_CRASH' messages to the console, without the process actually terminating.

I believe that the compiler (GCC 4.9) had concluded that the call to ::abort could be eliminated since it followed a null pointer dereference and was therefore impossible/unreachable. Commenting out the null pointer dereference restored the expected behavior that MOZ_CRASH would lead to process termination, now by way of abort.
Did your compiler remove the null deref too, or did the null deref fail to crash for some other reason?
For gcc and clang, MOZ_CRASH should probably use __builtin_trap. It's efficient, it's not UB, and it's easy for compilers to optimize the surrounding code.
Semi-issues with __builtin_trap().

One, it might well change crash signature to not be a crash at that exact location.  For debugging purposes, we really don't want MOZ_CRASH not parking an attached debugger on the line containing the MOZ_CRASH, forcing every developer to 'up' one or more times before doing any debugging -- it's a really poor use of developer time.  We've gone to some length to eliminate these 'up's (it once was two, then one, then zero).

Two, MOZ_CRASH writes the line number in the null-deref.  This assists with crashdump-debugging and such by giving an extra hint to the developer, reading surrounding assembly, exactly which crash might have been hit, in code that inlines/coalesces multiple crashes to a single location.  We used to just write in a zero, then we changed it to differentiate nearly-positioned crashes this way.

Three, __builtin_trap() or whatever could potentially not be as "safe" a crash as a null-deref is.  To a very real extent we need a guarantee as to exactly what's done in MOZ_CRASH() so we can know this.  Working from the assumption that writing to null dereferences the zero page, and that page is permanently unallocatable on all platforms, we're safe with what the current implementation causes most compilers to do.  Even if technically compilers are within their rights to make the user spew nasal demons.

I suppose the best way around this -- maybe -- would to have the macro expand to inline assembly, written per-architecture.  That would avoid UB problems, I guess, while still letting us deal with the others appropriately.
If #1 or #3 is a problem (seems unlikely), we can file bugs against clang, because they expect us to use __builtin_trap:

>    foo.c:4:2: warning: indirection of non-volatile null pointer will be deleted, not trap [-Wnull-dereference]
>            *(int *)0;
>            ^~~~~~~~~
>    foo.c:4:2: note: consider using __builtin_trap() or qualifying pointer with 'volatile'

For #2, we're already asking developers to look at surrounding assembly, so we can add our own assembly that does something harmless with the line number.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.