Consider adding a variant of MOZ_CRASH that doesn't require string literals

RESOLVED FIXED in Firefox -esr52

Status

()

defect
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: ehoogeveen, Assigned: ehoogeveen)

Tracking

Trunk
mozilla54
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 fixed, firefox54 fixed)

Details

Attachments

(10 attachments, 16 obsolete attachments)

3.35 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
1.25 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
953 bytes, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
5.79 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
1.11 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
1.99 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
2.66 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
2.11 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
6.27 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
1.42 KB, patch
ehoogeveen
: review+
Details | Diff | Splinter Review
The crash reason provided by MOZ_CRASH and saved in crash reports is very useful, but sometimes a string literal can't provide enough information. Excessive inlining can also make callstacks hard to understand. As a result we've been seeing more and more custom variants of MOZ_CRASH popping up [1].

[1] https://dxr.mozilla.org/mozilla-central/search?q=MOZ_REALLY_CRASH+-path%3Amfbt&redirect=false
This patch adds MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC. The former just moves the crash out of line and can also be used if you just want to print a string passed in from elsewhere. The latter acts like snprintf, printing the desired string into an outside buffer (which should outlive the program).

Both of these are potentially less safe than MOZ_CRASH, since the caller needs to make sure its buffer outlives the program and snprintf can get pretty complicated, but I think it's better to centralize this functionality in MFBT than have all these custom variants scattered around the tree.
Attachment #8836109 - Flags: review?(nfroyd)
I believe this was added in bug 1336345. See the description of part 1 for details.
Attachment #8836113 - Flags: review?(jwwang)
I believe this was added in bug 1102052. See the description of part 1 for details.
Attachment #8836115 - Flags: review?(mrbkap)
We do this in several places in SpiderMonkey (some added by me). Jan, could you take a look?
Attachment #8836116 - Flags: review?(jdemooij)
I believe this was added in bug 1289001. See the description of part 1 for details.
Attachment #8836119 - Flags: review?(valentin.gosu)
I believe this was added in bug 1301407. See the description of part 1 for details.
Nathan, I believe you added this in bug 1276669.
Attachment #8836121 - Flags: review?(nfroyd)
... And I believe you reviewed this one in bug 1159244 :)
Attachment #8836122 - Flags: review?(nfroyd)
Mike, I know this isn't what you were talking about in bug 1336662, but I think it's nicer than just calling MOZ_CRASH() with no explanation string.
Attachment #8836123 - Flags: review?(mh+mozilla)
Comment on attachment 8836109 [details] [diff] [review]
Part 1: Add MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC to crash with a runtime generated explanation string.

Oh, I forgot to mention: Unlike MOZ_CRASH itself, MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC don't wrap the reason string as "MOZ_CRASH([reason])". This would make them a lot more complicated (I do have a patch locally that does this for MOZ_CRASH_DYNAMIC though), and I believe crash stats just assumes MOZ_CRASH by default (it does use the similar wrapper for MOZ_RELEASE_ASSERT to differentiate it).
Will Socorro need extra logic to coalesce MOZ_CRASH crashes from the same line number that have different explanation strings?
I don't think so; it doesn't affect the signature, and you can already look for moz crash reasons that contain, start with or end with a particular string. To the extent that this might be a problem, it is also preexisting: with the exception of part 9, these places were all using similar logic already (though I think at least one wasn't setting the crash reason at all).
Having only static string literals was a deliberate decision, because it's one less attack vector for making MOZ_CRASH unreliable.  A dynamic version of such is necessarily less secure, because the process of building up that string -- in a situation where we're potentially off the rails with a corrupted heap, or stack, or worse -- could be hijacked by an attacker.  I'm very, very leery of making this change.
I messed up the printf annotation when adding the file and line number arguments. This fixes that, and also adds MOZ_MAYBE_UNUSED to fix dead code warnings-as-errors. Unfortunately on Windows this expands to a weird __pragma warning suppression with the idiosyncrasy that the function name can't be more than one line below it, so I had to ignore the column limit (even 'continuing' the line with a backslash doesn't work).
Attachment #8836109 - Attachment is obsolete: true
Attachment #8836109 - Flags: review?(nfroyd)
Attachment #8836270 - Flags: review?(nfroyd)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #13)
> Having only static string literals was a deliberate decision, because it's
> one less attack vector for making MOZ_CRASH unreliable.  A dynamic version
> of such is necessarily less secure, because the process of building up that
> string -- in a situation where we're potentially off the rails with a
> corrupted heap, or stack, or worse -- could be hijacked by an attacker.  I'm
> very, very leery of making this change.

I understand that, but we also have to deal with the fact that people are already doing this anyway, as evidenced by parts 2 through 8 (I admit to contributing to this problem). So we either need to put our foot down and change all those places to do something else (and do something to keep people from adding more instances down the line), or allow it some form.

I can think of three mitigations:
1) Make MOZ_CRASH_DYNAMIC and maybe MOZ_CRASH_OOL do nothing but crash on Beta and Release (or set the crash reason to the format string as a compromise)
2) #undef MOZ_REALLY_CRASH in Assertions.h after we've defined all the macros using it (with some sort of comment)
3) Create some custom version of printf that only handles simple cases (probably hard to define 'simple' though)

Having MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC available might tempt more people to use them, but it also gives us a better choke point to limit the rot.
Thinking about this more, we can also
4) Add some pretty strict sanity checks, e.g. check that the capacity is less than some constant, check that the format string is smaller than the capacity, and maybe that any string arguments aren't too long (that would require some rudimentary parsing of the format string though).

That way we would have some protection against printf going too far off the rails, which would already be safer than what some of these callers are doing.
Comment on attachment 8836116 [details] [diff] [review]
Part 4: Use MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC in SpiderMonkey.

Review of attachment 8836116 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this.

::: js/src/gc/Memory.cpp
@@ +854,5 @@
>      MOZ_RELEASE_ASSERT(size > 0);
>      MOZ_RELEASE_ASSERT(p);
>  #if defined(XP_WIN)
>      DWORD oldProtect;
> +    if (!VirtualProtect(p, size, PAGE_NOACCESS, &oldProtect))

Nit: the then-block spans multiple lines so keep the {}

@@ +872,5 @@
>      MOZ_RELEASE_ASSERT(size > 0);
>      MOZ_RELEASE_ASSERT(p);
>  #if defined(XP_WIN)
>      DWORD oldProtect;
> +    if (!VirtualProtect(p, size, PAGE_READONLY, &oldProtect))

And here.
Attachment #8836116 - Flags: review?(jdemooij) → review+
Thanks! Carrying forward r+.

In addition to the change below, I added back the MOZ_ReportCrash call in the JS shell function #ifndef DEBUG, as automation (the sm-asan build at least) seems to rely on it. I'm not entirely sure why (or whether the sm-asan build should even be an opt build), but the missing MOZ_ReportCrash call in opt builds was the only change in behavior introduced by this patch.

> > +    if (!VirtualProtect(p, size, PAGE_NOACCESS, &oldProtect))
> 
> Nit: the then-block spans multiple lines so keep the {}

I'm never sure how to interpret that rule. Fixed in all 3 places.
Attachment #8836116 - Attachment is obsolete: true
Attachment #8836768 - Flags: review+
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #18)
> as automation (the sm-asan build at
> least) seems to rely on it. I'm not entirely sure why (or whether the
> sm-asan build should even be an opt build), but the missing MOZ_ReportCrash
> call in opt builds was the only change in behavior introduced by this patch.

Aha, found it: https://dxr.mozilla.org/mozilla-central/source/js/src/tests/lib/jittests.py#415

That workaround definitely relies on the crash message being present, but we also do a similar test below to distinguish OOM from other crashes, so I think making sure we always print the message is the right call for this special case.
With those fixes for part 1 and part 4, this series is green on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c887cfd8045f59725096700087ef07ac65f5dde
Attachment #8836115 - Flags: review?(mrbkap) → review+
Comment on attachment 8836270 [details] [diff] [review]
Part 1 v1.1: Add MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC to crash with a runtime generated explanation string.

Hmm, I handled the return code of vsnprintf wrong. I'm also adding some of the safety checks I proposed, so clearing review for now.
Attachment #8836270 - Flags: review?(nfroyd)
Attachment #8836120 - Flags: review?(dkeeler)
David, I intended to tag you when I attached this in comment #6. See comment #1 for details.
Comment on attachment 8836120 [details] [diff] [review]
Part 6: Use MOZ_CRASH_DYNAMIC in NSS.

Review of attachment 8836120 [details] [diff] [review]:
-----------------------------------------------------------------

Cool - thanks!
Attachment #8836120 - Flags: review?(dkeeler) → review+
This adds several checks on the arguments passed to MOZ_CRASH_DYNAMIC. Most of them I think are objectively good, but in one case the cure might be worse than the disease. I expect this will have to go through at least one round of changes, but it compiles and I'd like to see what you think.

This implements the following checks:
1) A static_assert that the capacity is <= 1024 (an arbitrary limit)
2) A static_assert that the number of additional arguments is between 1 and 4 (also arbitrary)
3) A check that should reject format strings that aren't string literals
4) A runtime check that the format string fits inside the buffer (for MOZ_CRASH_OOL, a runtime check that the length of the string is <= 1024)
5) A runtime check that the length of every string argument is <= 512 (also arbitrary).

The last check is the hairy one, since it requires doing some rudimentary parsing on the format string. For brevity I made it reject wide strings outright, but I don't know if whatever safety it might offer is worth the additional work.

I still need to test all the cases this is supposed to guard against and make sure it doesn't reject anything valid.
Attachment #8836270 - Attachment is obsolete: true
Attachment #8836952 - Flags: feedback?(nfroyd)
Attachment #8836113 - Flags: review?(jwwang) → review+
Comment on attachment 8836952 [details] [diff] [review]
Part 1 v2: Add MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC to crash with a runtime generated explanation string.

Review of attachment 8836952 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Assertions.h
@@ +525,5 @@
> +  MOZ_REALLY_CRASH();
> +}
> +
> +#ifndef DEBUG
> +#  define MOZ_CRASH_DYNAMIC(str, size, format, ...) \

I think the macro is more useful if we don't have to pass a buffer and the buffer size.
E.g. MOZ_CRASH_DYNAMIC("some crash rv=%x", rv);

And MOZ_CRASH_PRINTF might be a name to explain the purpose better.
Attachment #8836119 - Flags: review?(valentin.gosu) → review+
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #25)
> I think the macro is more useful if we don't have to pass a buffer and the
> buffer size.
> E.g. MOZ_CRASH_DYNAMIC("some crash rv=%x", rv);

Hmm, I didn't do this because I think we want to avoid malloc here if possible, and I'm worried about possible races that might result from using the same region of memory for any crash. But thinking about it more, racing issues are probably already possible for some of the places that do this.

Maybe we should just reserve a small region of memory in Assertions.cpp and move MOZ_CrashDynamic in there too, then guard against races using an atomic variable. If another thread tries to initiate a crash while another thread is already crashing, we can just loop until the process has finished crashing. That might be the safest way to do this overall.

> And MOZ_CRASH_PRINTF might be a name to explain the purpose better.

Yeah, I'm not tied to the names here, we can call them whatever. I'll wait for Nathan to weigh in.
Comment on attachment 8836952 [details] [diff] [review]
Part 1 v2: Add MOZ_CRASH_OOL and MOZ_CRASH_DYNAMIC to crash with a runtime generated explanation string.

Review of attachment 8836952 [details] [diff] [review]:
-----------------------------------------------------------------

I having an internal buffer as suggested in comment 26 is a good idea; that way we don't have a bunch of `static const char kCrashBuffer[1024];` things hanging around.

Some comments below.  I share Waldo's concerns from comment 13; many people are doing this, but from a quick skim of the callsites for MOZ_REALLY_CRASH, it looks like people aren't doing this when we think memory has been corrupted (well, mostly; there are definitely checks in the JS engine for memory corruption), but when we've just hit an error we simply can't recover from (e.g. VirtualProtect failing in the JS engine).  If we make it generally available, I'm unsure that people will be as cautious about it.

::: mfbt/Assertions.h
@@ +428,5 @@
> + */
> +static const size_t sMaxExplanationStringLength = 1024;
> +static const size_t sMaxSubstringLength = 512;
> +
> +static inline int

I assume this is `int` because it's intended for use in C code?

@@ +488,5 @@
> +                 char* aStr, const size_t aSize, const char* aFormat, ...)
> +#endif
> +{
> +  MOZ_RELEASE_ASSERT(MOZ_CheckStringLength(aFormat, aSize),
> +                     "The supplied format string is too long!");

If we were *really* C++ programmers, we'd be doing this check and the one below with constexpr functions...  (please don't)

@@ +493,5 @@
> +  va_list aArgs;
> +  va_start(aArgs, aFormat);
> +
> +  va_list argCopy;
> +  va_copy(argCopy, aArgs);

Does this compile everywhere?  We have some gnarly configure code:

http://dxr.mozilla.org/mozilla-central/source/old-configure.in#1727

for checking how to copy `va_list`, and we switch on the result of that.  I *think* the last time I looked, it might have been MSVC that didn't provide the correct primitives, but maybe MSVC 2015 has gotten better about this?

@@ +496,5 @@
> +  va_list argCopy;
> +  va_copy(argCopy, aArgs);
> +  for (size_t n = 0; aFormat[n]; ++n) {
> +    if (aFormat[n] == '%') {
> +      const char next = aFormat[n + 1];

Maybe call this something more descriptive, like `formatChar`?

@@ +506,5 @@
> +        const char* arg = va_arg(argCopy, const char*);
> +        MOZ_RELEASE_ASSERT(MOZ_CheckStringLength(arg, sMaxSubstringLength),
> +                           "The supplied string argument is too long!");
> +      } else {
> +        va_arg(argCopy, int);

I don't think this is right.  What if next == 'p' on a 64-bit platform or 'lld' on a 32-bit platform?  Doesn't that mean that we don't skip over the right number of bytes for this argument?

I see that the SysV x86-64 ABI--and possibly the Win64 ABI--align things to 8-byte boundaries, but the 32-bit case is a concern here, and I wouldn't like to assume that all platforms are as nice as x86-64 in alignment respects.

If we can't use `int` here, then that's going to be quite some code bloat, and I'd like to see if we could possibly move this function out of the header...or debate whether we want to keep all this checking.

@@ +529,5 @@
> +#  define MOZ_CRASH_DYNAMIC(str, size, format, ...) \
> +     do { \
> +       MOZ_STATIC_ASSERT_VALID_ARG_COUNT(__VA_ARGS__); \
> +       static_assert(MOZ_PASTE_PREFIX_AND_ARG_COUNT(, __VA_ARGS__) <= 4, \
> +                     "Only up to 4 additional arguments are allowed!"); \

Should we static_assert that there *are* additional arguments?  I think format checking would catch that case, but it might be more clear to catch it here.

@@ +532,5 @@
> +       static_assert(MOZ_PASTE_PREFIX_AND_ARG_COUNT(, __VA_ARGS__) <= 4, \
> +                     "Only up to 4 additional arguments are allowed!"); \
> +       static_assert((size) <= sMaxExplanationStringLength, \
> +         "The supplied buffer for the explanation string is too big!"); \
> +       MOZ_CrashDynamic((str), (size), "" format, __VA_ARGS__); \

I'm not excited about having to duplicate this macro just so we can pass additional arguments to MOZ_CrashDynamic.  WDYT about:

#ifndef DEBUG
#  define MOZ_CALL_CRASH_DYNAMIC(str, size, format, ...) \
  MOZ_CrashDynamic((str), (size), "" format, ##__VA_ARGS__);
#else
#  define MOZ_CALL_CRASH_DYNAMIC(str, size, format, ...) \
  MOZ_CrashDynamic(__FILE__, __LINE__, (str), (size), "" format, ##__VA_ARGS__);
#endif

And then MOZ_CRASH_DYNAMIC can have a single implementation that calls this macro.  We'd #undef MOZ_CALL_CRASH_DYNAMIC somewhere before the end of the file.
Attachment #8836952 - Flags: feedback?(nfroyd) → feedback+
Attachment #8836121 - Flags: review?(nfroyd) → review+
Attachment #8836122 - Flags: review?(nfroyd) → review+
Attachment #8836123 - Flags: review?(mh+mozilla) → review+
Some purely mechanical prep work for part 1, split out to make part 1 easier to review.
Attachment #8837744 - Flags: review?(nfroyd)
Okay, so in this version I changed a few things:

1) I renamed MOZ_CRASH_OOL to MOZ_CRASH_UNSAFE_OOL, MOZ_CRASH_DYNAMIC to MOZ_CRASH_UNSAFE_PRINTF and MOZ_CrashDynamic to MOZ_CrashPrintf as a more heavy-handed way to deter people from using them unnecessarily (and dynamic -> printf to be more intuitive)
2) MOZ_CrashOOL and MOZ_CrashPrintf now live in Assertions.cpp
3) MOZ_CrashPrintf now always uses the same statically allocated region of memory (defined in Assertions.cpp)
4) MOZ_CrashPrintf now uses an atomic variable to guard against reentrancy (let me know what you think of how I handled that). I could have guarded MOZ_CrashOOL and even MOZ_CRASH the same way, but I don't think the danger is as significant for them (since all they do is set a variable and crash)
5) I removed the dynamic checks. It's a bit of a shame, but I don't think fixing that parser to work correctly is worth it
6) MOZ_CrashOOL and MOZ_CrashPrintf now take a line parameter even opt builds, which they pass to MOZ_REALLY_CRASH(), to hopefully help disambiguate the function calls (since that's partially the point of MOZ_CrashOOL)

Since Assertions.cpp now needs to include Assertions.h, I had to add IMPL_MFBT to an obscure component to avoid errors about redefinitions. Let me know if you think glandium should have a look, but I think it's trivial enough not to bother him.

> I having an internal buffer as suggested in comment 26 is a good idea; that
> way we don't have a bunch of `static const char kCrashBuffer[1024];` things
> hanging around.

Done; this does make it much nicer. The other patches need to be adjusted for this, which I've done locally (I'll upload them once part 1 is set to go).

> Some comments below.  I share Waldo's concerns from comment 13; many people
> are doing this, but from a quick skim of the callsites for MOZ_REALLY_CRASH,
> it looks like people aren't doing this when we think memory has been
> corrupted (well, mostly; there are definitely checks in the JS engine for
> memory corruption), but when we've just hit an error we simply can't recover
> from (e.g. VirtualProtect failing in the JS engine).  If we make it
> generally available, I'm unsure that people will be as cautious about it.

I suppose that's true, but I'll be honest - worrying about memory corruption didn't cross my mind when I added those VirtualProtect failure diagnostics. I added "UNSAFE" to the method names to hopefully make people think a bit harder than I did.

> I assume this is `int` because it's intended for use in C code?

Yeah, C code can use this file so I didn't want to use bool. I guess it does exist as of C99 as a macro, but only if you include <stdbool.h>. Either way, this is gone now.

> > +  MOZ_RELEASE_ASSERT(MOZ_CheckStringLength(aFormat, aSize),
> > +                     "The supplied format string is too long!");
> 
> If we were *really* C++ programmers, we'd be doing this check and the one
> below with constexpr functions...  (please don't)

Heh, I'd have to do it outside the function I think, but it'd be pretty rough without C++14 constexpr (which MSVC 2015 doesn't support). I added a static assertion to do the same thing though - since I made the macro require a string literal for the format string, we can just use the sizeof operator on it. On the other hand, this being a string literal now, I don't think it adds a lot of safety.

By the way, I think we can get rid of MOZ_STATIC_ASSERT in this file - all our supported compilers seem happy with plain static_assert, even in C code.

> > +  va_copy(argCopy, aArgs);
> 
> Does this compile everywhere?  We have some gnarly configure code:
> 
> http://dxr.mozilla.org/mozilla-central/source/old-configure.in#1727
> 
> for checking how to copy `va_list`, and we switch on the result of that.  I
> *think* the last time I looked, it might have been MSVC that didn't provide
> the correct primitives, but maybe MSVC 2015 has gotten better about this?

It definitely compiled, but I didn't get around to checking the details due to the problem below. If https://msdn.microsoft.com/en-us/library/kb57fad8.aspx is to be believed, the versions exposed by stdarg.h should be ISO C99 compliant.

> > +        va_arg(argCopy, int);
> 
> I don't think this is right.  What if next == 'p' on a 64-bit platform or
> 'lld' on a 32-bit platform?  Doesn't that mean that we don't skip over the
> right number of bytes for this argument?

Ouch, you're right. I just assumed the arguments were stored in an array under the hood, but after reading up on it this is definitely not okay. Fixing the parser to take every format specifier into account seems like too much effort for this bug - if we still think it's worth doing I think we should do it in another bug. The implementation could go into Assertions.cpp though, so bloating the header isn't an issue.

> > +       MOZ_STATIC_ASSERT_VALID_ARG_COUNT(__VA_ARGS__); \
> > +       static_assert(MOZ_PASTE_PREFIX_AND_ARG_COUNT(, __VA_ARGS__) <= 4, \
> > +                     "Only up to 4 additional arguments are allowed!"); \
> 
> Should we static_assert that there *are* additional arguments?  I think
> format checking would catch that case, but it might be more clear to catch
> it here.

That's what MOZ_STATIC_ASSERT_VALID_ARG_COUNT is for - it errors out if the number of arguments isn't in the range 1 to 50.

> I'm not excited about having to duplicate this macro just so we can pass
> additional arguments to MOZ_CrashDynamic.  WDYT about:

Good point, I just kept bolting on more things without thinking about how to make it more compact.

> We'd #undef MOZ_CALL_CRASH_DYNAMIC somewhere before the end of the file.

Unfortunately we can't, same with MOZ_REALLY_CRASH - the macros that use them aren't instantiated until later in other files, so they need to stay defined.
Attachment #8836952 - Attachment is obsolete: true
Attachment #8837756 - Flags: review?(nfroyd)
Oh, we'll probably also need to tell Socorro to ignore MOZ_CrashOOL and MOZ_CrashPrintf in crash signatures, so they don't all get grouped together. What's the process for doing that?
Flags: needinfo?(chris.lonnen)
Just a heads-up that a patch from bug 1301407 might conflict with attachment 8836120 [details] [diff] [review] slightly (it just removes the #ifdef ANDROID / #endif bits - everything else is the same).
@ehoogeveen -- we'll need to make some skiplist changes. see https://github.com/mozilla/socorro/tree/master/socorro/siglists
Flags: needinfo?(chris.lonnen)
Attachment #8837744 - Flags: review?(nfroyd) → review+
Comment on attachment 8837756 [details] [diff] [review]
Part 1 v3: Add MOZ_CRASH_UNSAFE_OOL and MOZ_CRASH_UNSAFE_PRINTF to crash with a runtime generated explanation string.

Review of attachment 8837756 [details] [diff] [review]:
-----------------------------------------------------------------

These changes generally look good.  I am very skeptical of the sleeping in the out-of-line crash code, though.

::: mfbt/Assertions.cpp
@@ +55,5 @@
> +{
> +  if (!sCrashing.compareExchange(false, true)) {
> +    // Sleep briefly to give the other thread time to crash, but proceed
> +    // if nothing is happening (in case sCrashing is true due to a bitflip).
> +    Sleep(1);

Uh.  We actually want both threads to crash here?  I thought we'd just spin forever here on the thread that didn't win, since the process is going down and some other thread is crashing.  Especially when the threads are sharing a static buffer to print to.

::: mfbt/Assertions.h
@@ +274,5 @@
> +/*
> + * MOZ_CRASH_UNSAFE_OOL(explanation-string) can be used if the explanation
> + * string cannot be a string literal (but no other processing needs to be done
> + * on it). A regular MOZ_CRASH() is preferred wherever possible, as passing
> + * arbitrary strings from a potentially compromised process is not without risk.

I kinda wish we didn't need this, but...WDYT about adding language here along the lines of, "If you are passing the result of a printf-style function here, you should consider using MOZ_CRASH_UNSAFE_PRINTF instead."?
Attachment #8837756 - Flags: review?(nfroyd) → review-
Thanks for the review! I'll get an updated version up tomorrow.

(In reply to Nathan Froyd [:froydnj] from comment #33)
> Uh.  We actually want both threads to crash here?  I thought we'd just spin
> forever here on the thread that didn't win, since the process is going down
> and some other thread is crashing.  Especially when the threads are sharing
> a static buffer to print to.

My thought process went something like this: One thread should always win the race to crash (in the unlikely event that there *is* a crash), but I don't think we want two threads trying to print into the same buffer concurrently. At the same time, sCrashing could end up getting set due to hardware failure, and we don't want to get stuck forever if that happens. Thus we wait for a while, and proceed if nothing is happening.

But thinking about it a bit more, a race like this should be extremely rare in the first place, and our first priority is still to crash safely. So instead, if sCrashing is set, let's just skip processing the crash reason and crash right away. The chances of this impacting crash stats in any way are extremely low, and any weirdness is probably dwarfed by the amount of nonsense signatures we already get from faulty hardware.

> > + * MOZ_CRASH_UNSAFE_OOL(explanation-string) can be used if the explanation
> > + * string cannot be a string literal (but no other processing needs to be done
> > + * on it). A regular MOZ_CRASH() is preferred wherever possible, as passing
> > + * arbitrary strings from a potentially compromised process is not without risk.
> 
> I kinda wish we didn't need this, but...WDYT about adding language here
> along the lines of, "If you are passing the result of a printf-style
> function here, you should consider using MOZ_CRASH_UNSAFE_PRINTF instead."?

Sure, I can add something like that.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #34)
> (in the unlikely event that there *is* a crash)

Er, in the unlikely event that there is a *race*.
Depends on: 1341035
(In reply to Chris Lonnen :lonnen from comment #32)
> @ehoogeveen -- we'll need to make some skiplist changes. see
> https://github.com/mozilla/socorro/tree/master/socorro/siglists

Thanks Chris, I filed bug 1341035 about this.
Comment on attachment 8839146 [details] [diff] [review]
Part 1 v3.1: Add MOZ_CRASH_UNSAFE_OOL and MOZ_CRASH_UNSAFE_PRINTF to crash with a runtime generated explanation string.

Review of attachment 8839146 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #8839146 - Flags: review?(nfroyd) → review+
Updated to use MOZ_CRASH_UNSAFE_PRINTF, removing more code.
Attachment #8836113 - Attachment is obsolete: true
Attachment #8839536 - Flags: review+
Updated to use MOZ_CRASH_UNSAFE_PRINTF, simpler than before.
Attachment #8836115 - Attachment is obsolete: true
Attachment #8839537 - Flags: review+
Updated with the new names, and removed the buffers that are no longer needed.
Attachment #8836768 - Attachment is obsolete: true
Attachment #8839538 - Flags: review+
Updated with the new name.
Attachment #8836119 - Attachment is obsolete: true
Attachment #8839539 - Flags: review+
Attachment #8839539 - Attachment description: Part 5: Use MOZ_CRASH_UNSAFE_OOL in Necko. → Part 5 v1.1: Use MOZ_CRASH_UNSAFE_OOL in Necko.
Thanks for the heads up! This was an easy rebase though. Updated to use MOZ_CRASH_UNSAFE_PRINTF, removing the now unneeded static buffer.
Attachment #8836120 - Attachment is obsolete: true
Attachment #8839541 - Flags: review+
I rolled parts 7 and 8 together as they both affect XPCOM. Also updated to use MOZ_CRASH_UNSAFE_PRINTF, removing more code.
Attachment #8836121 - Attachment is obsolete: true
Attachment #8836122 - Attachment is obsolete: true
Attachment #8839542 - Flags: review+
Updated for the new name (and renumbered).
Attachment #8836123 - Attachment is obsolete: true
Attachment #8839544 - Flags: review+
Thanks for the reviews everyone! I'll land this as soon as the Socorro part is done (bug 1341035).
Nicholas, just a heads up: this will change some signatures (prepending MOZ_CrashOOL or MOZ_CrashPrintf, removing jemalloc_crash) and should give mozjemalloc crash reasons (from part 8).
Now with a crucial last minute fix and 100% more functional!

Assertions.cpp can't use MOZ_CRASH_ANNOTATE because it doesn't have the right defines - but it can just set gMozCrashReason directly.
Attachment #8839535 - Attachment is obsolete: true
Attachment #8839612 - Flags: review+
Green on try aside from known intermittents: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8d98b78305ce89e9c7e2aaeeb866fdaf660d2d
Status: NEW → ASSIGNED
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/00d0810a4e6b
Part 0: Pass __LINE__ as an argument to MOZ_REALLY_CRASH() instead of using it directly. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/7fdb44e79e51
Part 1: Add MOZ_CRASH_UNSAFE_OOL and MOZ_CRASH_UNSAFE_PRINTF to crash with a runtime generated explanation string. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/253f65ad681f
Part 2: Use MOZ_CRASH_UNSAFE_PRINTF in Media. r=jwwang
https://hg.mozilla.org/integration/mozilla-inbound/rev/02ca86a397be
Part 3: Use MOZ_CRASH_UNSAFE_PRINTF in IPC glue. r=mrbkap
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a7baab53d69
Part 4: Use MOZ_CRASH_UNSAFE_OOL and MOZ_CRASH_UNSAFE_PRINTF in SpiderMonkey. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/32194844a42a
Part 5: Use MOZ_CRASH_UNSAFE_OOL in Necko. r=valentin
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9cafd66b9e9
Part 6: Use MOZ_CRASH_UNSAFE_PRINTF in NSS. r=keeler
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b533dd2bb8b
Part 7: Use MOZ_CRASH_UNSAFE_PRINTF in XPCOM. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/50e44effc231
Part 8: Use MOZ_ASSERT and MOZ_CRASH_UNSAFE_OOL in mozjemalloc. r=glandium
Keywords: checkin-needed
I don't understand how this happened, but the wrong version of part 1 landed here (version 3.2 instead of version 3.3). Just ran into this while testing the patch in bug 1341889.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Setting r+ on this because this was technically already reviewed.
Attachment #8840190 - Flags: review+
Attachment #8840190 - Attachment description: Assertions.cpp can't use MOZ_CRASH_ANNOTATE, so set the crash reason directly. → Part 9: Assertions.cpp can't use MOZ_CRASH_ANNOTATE, so set the crash reason directly.
Parts 1-8 have already landed, so just part 9 please.
Keywords: checkin-needed
Blocks: 1341889
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/294c257938dc
Assertions.cpp can't use MOZ_CRASH_ANNOTATE, so set the crash reason directly. r=froydnj
Keywords: checkin-needed
Hi Ryan, any idea how the mix up here happened? Is it possible you pushed the changesets from my Try run? I didn't do another run for part 1 v3.3 because the changes compared to v3.2 (now part 9) wouldn't have shown up on Try.
Flags: needinfo?(ryanvm)
https://hg.mozilla.org/mozilla-central/rev/294c257938dc
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #58)
> Hi Ryan, any idea how the mix up here happened? Is it possible you pushed
> the changesets from my Try run? I didn't do another run for part 1 v3.3
> because the changes compared to v3.2 (now part 9) wouldn't have shown up on
> Try.

Yes, that's exactly what happened. Importing a giant patchset from Bugzilla is a pain, so I tend to look for a Try push after the final patches were posted and import from there instead. Obviously I missed that the one in comment 51 didn't included the revision from comment 50. Sorry for the hassle.
Flags: needinfo?(ryanvm)
Alright, that explains it. I'll keep that in mind in future!
FWIW, with git, there's a command "git bz apply 12345" which will download and apply all of the patches from bug 12345, asking you as it goes along. Maybe the hg tools have something similar. Or you could switch to git. ;)
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #61)
> Alright, that explains it. I'll keep that in mind in future!

Or you could push things yourself...
(In reply to Mike Hommey [:glandium] from comment #63)
> Or you could push things yourself...

Yeah, I've been thinking about applying for level 3 access. Still on level 1.
With level 1 you should be able to push to mozreview, where someone else then can trigger autoland, which will always use what's there, not some random things from a previous try push :)
(but at this point, level 3 makes sense too)
For context, we ran into bustage uplifting another patch to ESR52 because it was using the MOZ_CRASH_UNSAFE_OOL macro added by this bug. After discussion with Nathan, we felt that while we could spot fix that specific failure away, it was likely to be a recurring issue given how early we are in the ESR52 cycle. Therefore, we uplifted the patches here to avoid more pain down the road. Ritu took a look at the patches and signed off on it too.
You need to log in before you can comment on or make changes to this bug.