Annotate crash reports triggered by MOZ_CRASH in release builds

RESOLVED FIXED in Firefox 44

Status

()

defect
RESOLVED FIXED
4 years ago
3 years ago

People

(Reporter: Nika, Assigned: Nika, NeedInfo)

Tracking

unspecified
mozilla44
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox44 fixed)

Details

Attachments

(1 attachment, 6 obsolete attachments)

No description provided.
We would like to annotate crashes caused by an explicit MOZ_CRASH invocation in the crashreporter.
This patch annotates MOZ_CRASH invocations performed within libXUL with a crashreporter annotation.
Attachment #8633108 - Flags: review?(nfroyd)
Comment on attachment 8633108 [details] [diff] [review]
Annotate crash reports triggered by MOZ_CRASH in release builds

Review of attachment 8633108 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Assertions.h
@@ +264,5 @@
>   * If we're a DEBUG build and we crash at a MOZ_CRASH which provides an
>   * explanation-string, we print the string to stderr.  Otherwise, we don't
>   * print anything; this is because we want MOZ_CRASH to be 100% safe in release
>   * builds, and it's hard to print to stderr safely when memory might have been
>   * corrupted.

This comment needs to be updated for the new behavior, I think.
Attachment #8633108 - Flags: review?(nfroyd) → review+
Blocks: 1181217
What are the plans to get this landed?
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #2)
> >   * If we're a DEBUG build and we crash at a MOZ_CRASH which provides an
> >   * explanation-string, we print the string to stderr.  Otherwise, we don't
> >   * print anything; this is because we want MOZ_CRASH to be 100% safe in release
> >   * builds, and it's hard to print to stderr safely when memory might have been
> >   * corrupted.
> 
> This comment needs to be updated for the new behavior, I think.

I'd really like to know more about this.  There are definitely, absolutely places where we're using MOZ_CRASH() where we are 100% sure memory corruption has occurred.  I'm very, very unsure that we should give up this guarantee of providing *safe* crashing behavior.

What exactly does crash-report annotation do?  If it definitely makes assumptions about the state of the heap, I am strongly inclined to not take this.  Why can't annotation simply write into a block of statically allocated memory, without allocating anything and without requiring an uncorrupted heap?
We appear to be using MOZ_CRASH for (at least) two diffrent purposes.  One of them is the memory corruption, and we'd want as safe a solution (e.g., current) as possible.  The other one is where something perplexing has occured, we want to crash because we don't know what to do about it, but as far as we can tell there is no corruption, so it's safe to do some more work.  We definitely want to know about these perplexing ones, and those are the ones we want to search for in the crash reports.  Well, at least I do, see bug 1181217 created only a couple of days before this one.  I'm happy with MOZ_CRASH_LOUD or some such that can then be used explicitly in the "no corruption scenario" to do some more work, or extra flags, etc. that would distinguish the two.
Okay.  I think we both agree "loud" is vague and imprecise and that we need a better name, that discourages use except in cases where it's *believed* semantics are off but nothing is intrinsically inconsistent.  I suspect it can't be a single-word modifier or similar, but maybe I'm just not being creative enough.  Suggestions welcome.
Where are the places we MOZ_CRASH when we detect memory corruption?  Getting a good understanding on how common that is will be beneficial here.
Flags: needinfo?(jwalden+bmo)
Assignee: nobody → michael
In crash investigation, it would be great if we could in general search for crashes that happen with an intentional MOZ_CRASH call and also get the string in an annotation that was passed to it as a reason. Those are definitely worth being distinguished from places where we just crash even more unexpectedly and potentially in unsafe ways. That said, I understand there's places where we do MOZ_CRASH because we need to assume anything we do at that point is unsafe. It would be great if we could fit both sides of that somehow.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #7)
> Where are the places we MOZ_CRASH when we detect memory corruption?  Getting
> a good understanding on how common that is will be beneficial here.

I'm going to claim that a great majority (all?) of the graphics MOZ_ASSERTs are the "not a memory corruption" kind.

Perhaps the thing to do is to create a new name for the old/safe functionality, if the memory corruption is the less common situation.  MOZ_CORRUPTION or something like that...
(In reply to Milan Sreckovic [:milan] from comment #9)
> (In reply to Ehsan Akhgari (don't ask for review please) from comment #7)
> > Where are the places we MOZ_CRASH when we detect memory corruption?  Getting
> > a good understanding on how common that is will be beneficial here.
> 
> I'm going to claim that a great majority (all?) of the graphics MOZ_ASSERTs
> are the "not a memory corruption" kind.

Yes, I think that may actually be the majority of MOZ_CRASHes in the entire code base.  I'm asking this because I have never seen a MOZ_CRASH for a case where we know there is memory corruption (it's usually extremely difficult if possible at all to say for sure that we have memory corruption.)

> Perhaps the thing to do is to create a new name for the old/safe
> functionality, if the memory corruption is the less common situation. 
> MOZ_CORRUPTION or something like that...

Yes, if my suspicion that this case is the rare one proves to be accurate, I think the right solution would be to make a new macro for those cases and take this patch as is.  But let's wait for Waldo to weigh in.
(In reply to Milan Sreckovic [:milan] from comment #9)
> (In reply to Ehsan Akhgari (don't ask for review please) from comment #7)
> > Where are the places we MOZ_CRASH when we detect memory corruption?  Getting
> > a good understanding on how common that is will be beneficial here.
> 
> I'm going to claim that a great majority (all?) of the graphics MOZ_ASSERTs
> are the "not a memory corruption" kind.

IIRC MOZ_RELEASE_ASSERT doesn't use MOZ_CRASH internally (and this patch wouldn't annotate MOZ_RELEASE_ASSERT failures - though I agree that should happen too).

> 
> Perhaps the thing to do is to create a new name for the old/safe
> functionality, if the memory corruption is the less common situation. 
> MOZ_CORRUPTION or something like that...

It might be more appropriate to instead say that `MOZ_CRASH` is for corruption issues, and that `MOZ_ASSERT` should instead be used in situations where you want to crash and it is safe to annotate?
MOZ_CRASH and MOZ_RELEASE_ASSERT are very close; the later just does a log/stderr message before crashing the same way that MOZ_CRASH does.  So, now or eventually merging them to do the same thing - seems like a reasonable thing to do.  MOZ_CRASH is rather explicit and shorter, so I prefer that name if we're going to just retain one.  On the other hand, if we keep the Android logging part, then the "new" MOZ_CRASH becomes even more dangerous thing to use.  Perhaps we keep them separate until we're happy with how the "new" MOZ_CRASH behaves and then figure it out?
(In reply to Milan Sreckovic [:milan] from comment #12)
> MOZ_CRASH and MOZ_RELEASE_ASSERT are very close; the later just does a
> log/stderr message before crashing the same way that MOZ_CRASH does.  So,
> now or eventually merging them to do the same thing - seems like a
> reasonable thing to do.  MOZ_CRASH is rather explicit and shorter, so I
> prefer that name if we're going to just retain one.  On the other hand, if
> we keep the Android logging part, then the "new" MOZ_CRASH becomes even more
> dangerous thing to use.  Perhaps we keep them separate until we're happy
> with how the "new" MOZ_CRASH behaves and then figure it out?

What I'm trying to say is that MOZ_RELEASE_ASSERT isn't used for corruption issues (Even in release, it reports to the console: https://dxr.mozilla.org/mozilla-central/source/mfbt/Assertions.h#357-380 and https://dxr.mozilla.org/mozilla-central/source/mfbt/Assertions.h#134-149), while MOZ_CRASH doesn't report in release builds at all (https://dxr.mozilla.org/mozilla-central/source/mfbt/Assertions.h#251-259), out of caution for potential corrupted memory. Perhaps we should annotate in MOZ_RELEASE_ASSERT, but not in MOZ_CRASH, and keep MOZ_CRASH for memory corruption, and MOZ_RELEASE_ASSERT for other situations (which will be logged). 

That being said, there are a lot of instances of MOZ_CRASH in our codebase, and it would be really great to annotate them. Perhaps we switch to MOZ_RELEASE_ASSERT and MOZ_CRASH which report, and MOZ_SAFELY_CRASH which will avoid performing any reporting before giving up and killing the process. It might take a lot of effort to find all of the cases where we expect memory corruption and update them though.
waldo: ping?
How about we make this change nightly+dev edition only for 43, land it, and let it ride the 44 train through beta and release as well?
Flags: needinfo?(michael)
(In reply to Milan Sreckovic [:milan] from comment #15)
> How about we make this change nightly+dev edition only for 43, land it, and
> let it ride the 44 train through beta and release as well?

Would we also want to annotate MOZ_RELEASE_ASSERT at the same time? Or just MOZ_CRASH.
Flags: needinfo?(michael)
I'd go with MOZ_CRASH for now.
I still like to understand what Waldo was worrying about...
Should I continue to wait until jwalden responds to the ni? or should I be going ahead with landing this?
No, please go ahead.  If there are concerns in the future we can address them.

Please post to dev-platform when this lands.  Thanks a lot!

(Also, is there a Socorro bug on file to make use of this information?)
Duplicate of this bug: 1181217
(In reply to Ehsan Akhgari (don't ask for review please) from comment #20)
> No, please go ahead.  If there are concerns in the future we can address
> them.
> 
> Please post to dev-platform when this lands.  Thanks a lot!
> 
> (Also, is there a Socorro bug on file to make use of this information?)

I don't believe that there is.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #20)
> (Also, is there a Socorro bug on file to make use of this information?)

Once this landed, we should get a Socorro bug on file - we really should provide a crash report ID there for reference so it can be verified. That said, the Socorro changes there can be done pretty fast and without any code updates on their side.
You didn't address Nathan's comment 2.
I'm late to this discussion but I'm really worried about doing work within a MOZ_CRASH. In general we just don't know what state we're in. For example, AnnotateCrashReport does this pessimistic thing where it enumerates the hash table (to build the annotation string) every time you call Annotate, because it can't be certain that it can delay that work until crash-time.

I'm also worried about how this interacts with our OOM crashes. If we MOZ_CRASH due to memory exhaustion, will calling Annotate in a low-memory condition cause us to lose existing annotation data? (I haven't checked)

I haven't checked this either, but I'm also worried about a potential infinite recursion in:
NS_ABORT_OOM -> MOZ_CRASH -> AnnotateCrashReport -> nsBaseHashtable::Put -> NS_ABORT_OOM
(In reply to David Major [:dmajor] from comment #26)
> I'm late to this discussion but I'm really worried about doing work within a
> MOZ_CRASH. In general we just don't know what state we're in. For example,
> AnnotateCrashReport does this pessimistic thing where it enumerates the hash
> table (to build the annotation string) every time you call Annotate, because
> it can't be certain that it can delay that work until crash-time.

How can we do it better?

> I'm also worried about how this interacts with our OOM crashes. If we
> MOZ_CRASH due to memory exhaustion, will calling Annotate in a low-memory
> condition cause us to lose existing annotation data? (I haven't checked)

Do we use MOZ_CRASH to detect OOM?  That is surprising to me.  I was under the impression that OOM crashes are coming from infallible allocation failures?

> I haven't checked this either, but I'm also worried about a potential
> infinite recursion in:
> NS_ABORT_OOM -> MOZ_CRASH -> AnnotateCrashReport -> nsBaseHashtable::Put ->
> NS_ABORT_OOM

This should definitely be investigated, yes.  Michael, do you mind doing that please?
(In reply to Ehsan Akhgari (don't ask for review please) from comment #27)
> (In reply to David Major [:dmajor] from comment #26)
> > I'm late to this discussion but I'm really worried about doing work within a
> > MOZ_CRASH. In general we just don't know what state we're in. For example,
> > AnnotateCrashReport does this pessimistic thing where it enumerates the hash
> > table (to build the annotation string) every time you call Annotate, because
> > it can't be certain that it can delay that work until crash-time.
> 
> How can we do it better?

I would have preferred some new macro that we substitute only in places that are safe (e.g. graphics) but I may be too late on this.
 
> > I'm also worried about how this interacts with our OOM crashes. If we
> > MOZ_CRASH due to memory exhaustion, will calling Annotate in a low-memory
> > condition cause us to lose existing annotation data? (I haven't checked)
> 
> Do we use MOZ_CRASH to detect OOM?  That is surprising to me.  I was under
> the impression that OOM crashes are coming from infallible allocation
> failures?

I don't understand what you mean by "detect" OOM. When an infallible allocation fails, it calls mozalloc_handle_oom -> mozalloc_abort -> MOZ_CRASH. (Or at least that's mozjemalloc. The words may be slightly different for jemalloc4.) We also have the "optional fallible" pattern, such as in nsBaseHashTable::Put, where the infallible version calls the fallible version, and on failure calls NS_ABORT_OOM -> MOZ_CRASH.
Sorry. I thought I had fixed the feedback already on my local copy, oops.

I don't have time to investigate the oom concerns today, so I think I'll back this out and come back to it tomorrow.
Not sure if pulsebot will catch the backout - because I might've done it wrong :S https://hg.mozilla.org/integration/mozilla-inbound/rev/50126e0f1d13

(Sorry if I did it wrong)
Flags: needinfo?(michael)
(In reply to David Major [:dmajor] from comment #28)
> (In reply to Ehsan Akhgari (don't ask for review please) from comment #27)
> > (In reply to David Major [:dmajor] from comment #26)
> > > I'm late to this discussion but I'm really worried about doing work within a
> > > MOZ_CRASH. In general we just don't know what state we're in. For example,
> > > AnnotateCrashReport does this pessimistic thing where it enumerates the hash
> > > table (to build the annotation string) every time you call Annotate, because
> > > it can't be certain that it can delay that work until crash-time.
> > 
> > How can we do it better?
> 
> I would have preferred some new macro that we substitute only in places that
> are safe (e.g. graphics) but I may be too late on this.

The problem is that MOZ_CRASH is already being used for a purpose and we want to detect when we actually crash on it.  Using a different macro name doesn't solve that problem.

> > > I'm also worried about how this interacts with our OOM crashes. If we
> > > MOZ_CRASH due to memory exhaustion, will calling Annotate in a low-memory
> > > condition cause us to lose existing annotation data? (I haven't checked)
> > 
> > Do we use MOZ_CRASH to detect OOM?  That is surprising to me.  I was under
> > the impression that OOM crashes are coming from infallible allocation
> > failures?
> 
> I don't understand what you mean by "detect" OOM. When an infallible
> allocation fails, it calls mozalloc_handle_oom -> mozalloc_abort ->
> MOZ_CRASH. (Or at least that's mozjemalloc. The words may be slightly
> different for jemalloc4.) We also have the "optional fallible" pattern, such
> as in nsBaseHashTable::Put, where the infallible version calls the fallible
> version, and on failure calls NS_ABORT_OOM -> MOZ_CRASH.

Ah yeah, that is definitely something we should address.  Thanks for the clarification!

(In reply to Michael Layzell [:mystor] from comment #29)
> Sorry. I thought I had fixed the feedback already on my local copy, oops.
> 
> I don't have time to investigate the oom concerns today, so I think I'll
> back this out and come back to it tomorrow.

Sounds good.  Thank you!
(In reply to Michael Layzell [:mystor] from comment #30)
> Not sure if pulsebot will catch the backout - because I might've done it
> wrong :S https://hg.mozilla.org/integration/mozilla-inbound/rev/50126e0f1d13

It looks fine!  It should do the right thing.
(In reply to Ehsan Akhgari (don't ask for review please) from comment #31)
> The problem is that MOZ_CRASH is already being used for a purpose and we
> want to detect when we actually crash on it.  Using a different macro name
> doesn't solve that problem.

Can you tell me more about what you're trying to achieve? If you just want to differentiate MOZ_CRASHes from e.g. bad pointer derefs, you can filter those already by checking for reason==EXCEPTION_BREAKPOINT. Do you need something more than that?
If you need to log the message string, you could set up a "backdoor" to nsExceptionHandler.cpp that bypasses the AnnotateCrashReport machinery. We do this already for a few variables that we don't want to stick in the hash table (search for isGarbageCollecting or eventloopNestingLevel or gOOMAllocationSize). So, MOZ_ReportMozCrashToCrashReporter would do nothing more than stash a const char* into a global. That's pretty safe!
Comment #34 sounds great. What I'd like to be able to is to immediately recognize an intentional MOZ_CRASH from data, and to get the message string in the crash report so we can search for it and hand it to developers investigating the issue. So, if we can do a light-weight annotation of the message string, that would be awesome, as it would exactly provide this from what I can tell.
Kairo did a great job explaining the goal in comment 34.  Is EXCEPTION_BREAKPOINT *only* used for MOZ_CRASH?  Also is it similarly detectable on all platforms?  If the answer to both is yes, I think comment 34 accomplishes our goal in a much better way!
Flags: needinfo?(dmajor)
Err, the first "comment 34" is actually "comment 35".  </long-day>
(In reply to Ehsan Akhgari (don't ask for review please) from comment #37)
> Kairo did a great job explaining the goal in comment 34.  Is
> EXCEPTION_BREAKPOINT *only* used for MOZ_CRASH?  Also is it similarly
> detectable on all platforms?

EXCEPTION_BREAKPOINT means an "int 3" instruction (aka __debugbreak()). Almost always it's from a MOZ_CRASH. On rare occasion it can be seen in other libraries like the C runtime or graphics drivers, where they're doing their own analogue of MOZ_CRASH. I don't know the story on other platforms.

In any case, filtering by the reason code won't actually show you the crash message, so I guess it doesn't meet KaiRo's goal anyway.

> If the answer to both is yes, I think comment
> 34 accomplishes our goal in a much better way!

I intended comment 34 as an alternative, in case filtering on EXCEPTION_BREAKPOINT wasn't good enough (which it sounds like it isn't). So you don't need a "yes" for that :-)
Flags: needinfo?(dmajor)
Makes sense!  Thanks for the clarification.
This would have come in very handy in the days leading up to 41 release, where a start up MOZ_CRASH eventually turned out to be blocking the unthrottling of the said release.  I know we're trying to make sure everything gets done just right, but we're in a desperate need of this and the matching socorro functionality.
(In reply to Milan Sreckovic [:milan] from comment #41)
> This would have come in very handy in the days leading up to 41 release,
> where a start up MOZ_CRASH eventually turned out to be blocking the
> unthrottling of the said release.  I know we're trying to make sure
> everything gets done just right, but we're in a desperate need of this and
> the matching socorro functionality.

I am minutes away from getting a new patch which doesn't perform heap allocation up - so hopefully it will happen very soon!
This is a different patch, so I'm r?-ing froydnj again.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=615b9a6b4f06
Attachment #8633108 - Attachment is obsolete: true
Attachment #8667331 - Flags: review?(nfroyd)
Flags: needinfo?(michael)
Thanks Michael! That's exactly what I had in mind, and ought to be much safer.

A minor drive-by nit-pick: since you're writing to |eventFile|, please add MozCrashReason to the list of comments near |static char const * const kCrashEventAnnotations[]|.
Comment added as per comment 44
Attachment #8667331 - Attachment is obsolete: true
Attachment #8667331 - Flags: review?(nfroyd)
Attachment #8667338 - Flags: review?(nfroyd)
Just for my understanding, do I see it correctly that "MozCrashReason" is the field that's annotated by this patch?
If so, once the patch here lands and looks like it sticks (ideally we'd also have a crash ID with this annotation), I'll file a bug to make that available in Socorro's search UI.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #46)
> Just for my understanding, do I see it correctly that "MozCrashReason" is
> the field that's annotated by this patch?
> If so, once the patch here lands and looks like it sticks (ideally we'd also
> have a crash ID with this annotation), I'll file a bug to make that
> available in Socorro's search UI.

Yes, that is the name of the annotation - Thanks :)
That's great, thanks Michael.  Do you mind also hitting a synthetic MOZ_CRASH in a local build with this patch and crashreporter turned on so that KaiRo can test his Socorro changes against that?
Flags: needinfo?(michael)
KaiRo - I synthesized a crash with MOZ_CRASH and my patch applied, and I saw the MozCrashReason in the detailed report in the crash reporter - but I can't find it in the actual report. (Maybe you know where to find it better than I do :))

Here's the link: https://crash-stats.mozilla.com/report/index/77078ffa-9b41-47d9-bcff-216ed2150930
Flags: needinfo?(michael)
Blocks: 1209958
(In reply to Michael Layzell [:mystor] from comment #49)
> KaiRo - I synthesized a crash with MOZ_CRASH and my patch applied, and I saw
> the MozCrashReason in the detailed report in the crash reporter - but I
> can't find it in the actual report. (Maybe you know where to find it better
> than I do :))
> 
> Here's the link:
> https://crash-stats.mozilla.com/report/index/77078ffa-9b41-47d9-bcff-216ed2150930

Thanks, I see it in the metadata - but that's exactly why I need to file a bug, the field needs to get whitelisted for public view there and for being available in search. And I've done that in bug 1209958. :)
I was talking with dmajor, and he pointed out that the current messages (which included __FILE__ and __LINE__) would probably have too much noise to be really useful, as they would change between nightlies and build slaves. So I've switched to a more-complex-to-generate-but-hopefully-more-stable message format: "MOZ_CRASH(reason) at filename.cpp:functionName"

new try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a120387db34f
Attachment #8667338 - Attachment is obsolete: true
Attachment #8667338 - Flags: review?(nfroyd)
Attachment #8667950 - Flags: review?(nfroyd)
Here's a new crash which I think should have that format: https://crash-stats.mozilla.com/report/index/6acc6ec6-1dcb-448f-ac07-28a5a2150930
Michael, did you get a chance to see what the size differences between builds were?
Flags: needinfo?(michael)
(In reply to Nathan Froyd [:froydnj] from comment #53)
> Michael, did you get a chance to see what the size differences between
> builds were?

Sorry - I haven't had too much of a chance, because I've been busy with other patches as well.

I pulled down 2 random linux x86-64 builds, one with my patch and one without. libxul was 97.3 MB without, and 97.6 MB with. I don't have time to pull down the others just yet, sorry.
Flags: needinfo?(michael)
Comment on attachment 8667950 [details] [diff] [review]
Annotate crash reports triggered by MOZ_CRASH in release builds

Review of attachment 8667950 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to see more size numbers (even if we might just have to eat the size increase to make our crashes easier to diagnose), and to make sure that __func__ and WHATEVER_MSVC_USES return comparable results.

::: dom/media/mediasource/ContainerParser.cpp
@@ -119,5 @@
>      , mOffset(0)
>    {}
>  
>    static const unsigned NS_PER_USEC = 1000;
> -  static const unsigned USEC_PER_SEC = 1000000;

I assume this is somehow related to the new include brought in by Assertions.h?

::: mfbt/Assertions.h
@@ +263,5 @@
> +#  ifdef MOZ_CRASH_CRASHREPORT
> +#    define MOZ_CRASH(...) \
> +       do { \
> +         CrashReporter::AnnotateMozCrashReason("MOZ_CRASH(" __VA_ARGS__ ") at ", \
> +                                               __FILE__, __func__); \

I have a couple of concerns about this:

1. __func__ is not universally available (see your try run and places in the tree where we polyfill __func__ for MSVC).
2. The size increase on 64-bit Linux is concerning, even if the size increase is probably slightly less on 32-bit platforms like Android and Windows.
3. (nitpicky) I don't know how __func__ does in the face of templates and such, since the example crash report doesn't include class names in the signature.  Admittedly, some function name is better than what we have today, but...

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +476,5 @@
> +                            const char* aFunc)
> +{
> +  // Build the reason string. XP_PATH_MAX is arbitrarially chosen as
> +  // the size for the char buffer, because why not.
> +  int size = XP_PATH_MAX;

Maybe this should be sizeof(gMozCrashReason) or ArrayLength(gMozCrashReason)?
Attachment #8667950 - Flags: review?(nfroyd) → feedback+
We can choose to impose a size limit of let's say 32 characters for MOZ_CRASH arguments.
(In reply to Nathan Froyd [:froydnj] from comment #55)
> Comment on attachment 8667950 [details] [diff] [review]
> Annotate crash reports triggered by MOZ_CRASH in release builds
> 
> Review of attachment 8667950 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'd like to see more size numbers (even if we might just have to eat the
> size increase to make our crashes easier to diagnose), and to make sure that
> __func__ and WHATEVER_MSVC_USES return comparable results.

I've got a full try run of builds with the changes: https://treeherder.mozilla.org/#/jobs?repo=try&revision=108d1ae5c654,
And I'll compare them to a different try run I've done recently to see how the sizes compare. I'll put that in a follow-up comment.

> ::: dom/media/mediasource/ContainerParser.cpp
> @@ -119,5 @@
> >      , mOffset(0)
> >    {}
> >  
> >    static const unsigned NS_PER_USEC = 1000;
> > -  static const unsigned USEC_PER_SEC = 1000000;
> 
> I assume this is somehow related to the new include brought in by
> Assertions.h?

Yup - I removed that include because it caused serious problems on windows (nsExceptionHandler.h imports windows.h, which is horrible :S). So this change should be missing from this patch. 

> ::: mfbt/Assertions.h
> @@ +263,5 @@
> > +#  ifdef MOZ_CRASH_CRASHREPORT
> > +#    define MOZ_CRASH(...) \
> > +       do { \
> > +         CrashReporter::AnnotateMozCrashReason("MOZ_CRASH(" __VA_ARGS__ ") at ", \
> > +                                               __FILE__, __func__); \
> 
> I have a couple of concerns about this:
> 
> 1. __func__ is not universally available (see your try run and places in the
> tree where we polyfill __func__ for MSVC).

I've polyfilled it in this patch.

> 2. The size increase on 64-bit Linux is concerning, even if the size
> increase is probably slightly less on 32-bit platforms like Android and
> Windows.

I'm going to pin this size increase on the use of __FILE__ for the file path. In the report I only use the last path segment, to avoid build slaves modifying the message and making it harder to search.

> 3. (nitpicky) I don't know how __func__ does in the face of templates and
> such, since the example crash report doesn't include class names in the
> signature.  Admittedly, some function name is better than what we have
> today, but...

__func__ doesn't take into account templates, or any other form of path to a given function, it just includes the function's name. That's why I was making a point of including __FILE__ as well.

If we were willing to be more non-standard, we could use __PRETTY_FUNCTION__ on clang and gcc, which expands to the type signature of the function (including path IIRC). Unfortunately MSVC doesn't support that, so we'd have to use __FUNCSIG__ which unfortunately produces a different signature for the same function.

If we wanted to do that, we'd end up with something like: 

and for the annotation call we'd use something like:
CrashReporter::AnnotateMozCrashReason("\"" __VA_ARGS__ "\" : ", MOZ_CRASH_FUNCTION_SIGNATURE);

> ::: toolkit/crashreporter/nsExceptionHandler.cpp
> @@ +476,5 @@
> > +                            const char* aFunc)
> > +{
> > +  // Build the reason string. XP_PATH_MAX is arbitrarially chosen as
> > +  // the size for the char buffer, because why not.
> > +  int size = XP_PATH_MAX;
> 
> Maybe this should be sizeof(gMozCrashReason) or ArrayLength(gMozCrashReason)?

Fixed.
Attachment #8667950 - Attachment is obsolete: true
Attachment #8670311 - Flags: review?(nfroyd)
(In reply to Ehsan Akhgari (don't ask for review please) from comment #56)
> We can choose to impose a size limit of let's say 32 characters for
> MOZ_CRASH arguments.

The size problem right now is not the size 32 characters for MOZ_CRASH arguments - the problem is that __FILE__ is used to get the last segment of the file path, and actually includes a string literal containing the entire path to the file in the output binary (although most of it is unused). I'm unsure of a better way to handle that without losing valuable information (because I believe that __func__ isn't descriptive enough of the site, and that there are too many MOZ_CRASH calls with no arguments, which would be all assigned the same reason (my quick count with `git grep 'MOZ_CRASH()' | wc -l` produced 1026 instances).)

Another way which I just thought of to reduce file size might be to make the no-argument case special cased to include __FILE__ and __func__, while if you provide a reason, we don't provide that information. Not sure how much I like that option though.
I fear that this is never going to land if we keep trying to perfect it. Is it not enough to simply record the MOZ_CRASH text? I'll point out that the mechanism by which you'll be looking up these annotations (Super Search) can already tell you the function signature.
dmajor, you make a compelling argument. Here's a MVP version which just reports the MOZ_CRASH arguments. It should have the absolute minimal file size impact, and work just fine.

I'm not obsoleting the old patch, but it would be great if we could just land this so we get at least
some MOZ_CRASH annotations.
Attachment #8670338 - Flags: review?(nfroyd)
Blocks: 1211979
(In reply to David Major [:dmajor] from comment #59)
> I fear that this is never going to land if we keep trying to perfect it. Is
> it not enough to simply record the MOZ_CRASH text? I'll point out that the
> mechanism by which you'll be looking up these annotations (Super Search) can
> already tell you the function signature.

+1
Comment on attachment 8670311 [details] [diff] [review]
Annotate crash reports triggered by MOZ_CRASH in release builds

I'm going to make the more complex patch obsolete, because I don't think we want to go with this route - I think we want the simpler patch so it can land faster.
Attachment #8670311 - Attachment is obsolete: true
Attachment #8670311 - Flags: review?(nfroyd)
Once you have landed a patch for this and it looks ok, I would be interested in uplifting it to aurora.
Comment on attachment 8670367 [details] [diff] [review]
(OPTION 2: MVP) Annotate crash reports triggered by MOZ_CRASH in release builds

Review of attachment 8670367 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay in reviewing this.
Attachment #8670367 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/cb177e743d72
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
You need to log in before you can comment on or make changes to this bug.