Closed Bug 987995 Opened 6 years ago Closed 6 years ago

In crash dumps, record whether a JS OOM occurred recently

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: jorendorff, Assigned: jorendorff)

Details

Attachments

(4 files)

billm pointed out that if we go through a GC and survive, we're probably OK and should clear the bit.
I'm not sure we should clear the bit: maybe make it a tri-state (no OOM/oom/oom-gc-continue)
QA Contact: jorendorff
Jason, setting you as the assignee (which is what I assume you wanted to do, rather than QA contact).
Assignee: general → jorendorff
QA Contact: jorendorff
These will be used in some tests in later parts.

The initial-allocation-failure code in {calloc,realloc}CanGC is commoned up because that's what reportLargeAllocationFailure needs to call.
Attachment #8418253 - Flags: review?(terrence)
The purpose of this will be clear in part 4 --- I want to add crash reporter annotations, and that code is nicer if these three hooks (GC, OOM, large allocation failure) all live in the same place.

It would also be possible to add another virtual method CustomLargeAllocationFailureCallback, and then the "memory-pressure" notification code could live in the XPCJSRuntime subclass. I don't know any of this code very well, so I don't know where it'd make the most sense for stuff to live.
Attachment #8418262 - Flags: review?(continuation)
Attachment #8418267 - Flags: review?(continuation)
Comment on attachment 8418262 [details] [diff] [review]
Part 3 - Move memory pressure callbacks from nsJSEnvironment/XPCJSRuntime to CycleCollectedJSRuntime.

Review of attachment 8418262 [details] [diff] [review]:
-----------------------------------------------------------------

At some point CycleCollectedJSRuntime needs to be renamed to BrowserJSRuntime or something. It is nice to see this stuff being centralized a bit.

::: xpcom/base/CycleCollectedJSRuntime.cpp
@@ +1193,5 @@
> +
> +void
> +CycleCollectedJSRuntime::OnLargeAllocationFailure()
> +{
> +  nsCOMPtr<nsIObserverService> os =

Yeah, you should move this into a CustomCallback like with OnOutOfMemory.  There's no observer service in workers, so this is just a little confusing.

r=me with that
Attachment #8418262 - Flags: review?(continuation) → review+
Comment on attachment 8418267 [details] [diff] [review]
Part 4 - Add new crash reporter annotations

Review of attachment 8418267 [details] [diff] [review]:
-----------------------------------------------------------------

You should also get somebody who knows something about the crash reporter to review the code changes and tests, like bsmedberg or somebody.

::: toolkit/crashreporter/test/unit/test_crash_oom.js
@@ +1,4 @@
>  function run_test()
>  {
>    if (!("@mozilla.org/toolkit/crash-reporter;1" in Components.classes)) {
> +    dump("INFO | test_crash_oom.js | Can't test crashreporter in a non-libxul build.\n");

Nice...

::: xpcom/base/CycleCollectedJSRuntime.cpp
@@ +1155,5 @@
>      MOZ_ASSERT(!mFinalizeRunnable);
>    }
>  }
>  
> +void 

nit: trailing whitespace

@@ +1156,5 @@
>    }
>  }
>  
> +void 
> +CycleCollectedJSRuntime::AnnotateOutOfMemory(OOMState *aStatePtr, OOMState aNewState)

This name is confusing, because it just sounds like you are just annotating, when you are setting, also.  AnnotateAndSetOutOfMemory()?

You should assert that aNewState != OK, because you'll produce a wrong report in that case. The comparison to a field offset looks odd to me, but maybe that's a normal thing I'm unfamiliar with.

You should also assert that aStatePtr is equal to mOutOfMemoryState or the other one, for the same reason.

I'm not sure how much this function really buys you over just copy pasting and specializing everywhere you want to annotate a report, but I guess it is ok.

@@ +1233,5 @@
>    if (os) {
>      os->NotifyObservers(nullptr, "memory-pressure", MOZ_UTF16("heap-minimize"));
>    }
> +
> +  AnnotateOutOfMemory(&mLargeAllocationFailureState, OOMState::Reported);

I think the observer service is synchronous, so I feel like the OOM state won't be set to Reported() before the JSGC_END case of OnGC is triggered.  But maybe I'm wrong.

::: xpcom/base/CycleCollectedJSRuntime.h
@@ +181,5 @@
>    };
>  
>    void FinalizeDeferredThings(DeferredFinalizeType aType);
>  
> +  // Type of mOutOfMemoryState and mLargeAllocationFailedState.

This comment is not very useful.

@@ +185,5 @@
> +  // Type of mOutOfMemoryState and mLargeAllocationFailedState.
> +  MOZ_BEGIN_NESTED_ENUM_CLASS(OOMState, uint32_t)
> +    OK,          // OOM has never happened in this runtime.
> +    Reporting,   // We are currently reporting OOM.
> +    Reported,    // OOM has been reported since the last GC.

I don't understand what Reported is for.  I guess it is so you have some kind of indication that some OOM thing happened even if the custom callback goes haywire?

You could probably write a test for the |Reporting case|, by registering an observer for memory-pressure that crashes.  Assuming the observer service is available in XPCShell, which it may not be.
Attachment #8418267 - Flags: review?(continuation) → review+
Comment on attachment 8418253 [details] [diff] [review]
Part 1 - Add testing functions reportOutOfMemory and reportLargeAllocationFailure

Review of attachment 8418253 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: js/src/builtin/TestingFunctions.cpp
@@ +1610,5 @@
> +ReportOutOfMemory(JSContext *cx, unsigned argc, jsval *vp)
> +{
> +    CallArgs args = CallArgsFromVp(argc, vp);
> +    JS_ReportOutOfMemory(cx);
> +    cx->clearPendingException();

It seems like fuzzing with this could trigger some very weird artificial situations. I guess we can always move it to fuzzing-unsafe later if it turns out to be problematic.
Attachment #8418253 - Flags: review?(terrence) → review+
Comment on attachment 8418267 [details] [diff] [review]
Part 4 - Add new crash reporter annotations

Andrew McCreight wrote:
> You should also get somebody who knows something about the crash reporter
> to review the code changes and tests, like bsmedberg or somebody.

OK...
Attachment #8418267 - Flags: review?(benjamin)
Comment on attachment 8418258 [details] [diff] [review]
Part 2 - Add a data parameter to two memory-pressure-related callbacks

Oops, I thought I already r+d this.  Maybe this is what I was reviewing when the power went out earlier...
Attachment #8418258 - Flags: review?(luke) → review+
Comment on attachment 8418267 [details] [diff] [review]
Part 4 - Add new crash reporter annotations

It appears that once an error occurs, the hunk at CycleCollectedJSRuntime::OnGC will re-annotate at every subsequent GC. Is that correct? If so, we should try to avoid that since annotations are eagerly serialized and that's unnecessary work.

Should we reset mOutOfMemoryState and mLargeAllocationFailedState after that annotation is complete?
Flags: needinfo?(jorendorff)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #12)
> Comment on attachment 8418267 [details] [diff] [review]
> Part 4 - Add new crash reporter annotations
> 
> It appears that once an error occurs, the hunk at
> CycleCollectedJSRuntime::OnGC will re-annotate at every subsequent GC. Is
> that correct?

The first GC will call AnnotateOutOfMemory, which will assign
  mOutOfMemoryState = OOMState::Recovered;

Subsequent GCs will see that (Recovered != Reported in the code below) and avoid calling AnnotateOutOfMemory at all.

CycleCollectedJSRuntime::OnGC says:
>+      if (mOutOfMemoryState == OOMState::Reported) {
>+        AnnotateOutOfMemory(&mOutOfMemoryState, OOMState::Recovered);
>+      }
Flags: needinfo?(jorendorff)
Attachment #8418267 - Flags: review?(benjamin) → review+
(In reply to Andrew McCreight [:mccr8] from comment #8)
> > +void 
> > +CycleCollectedJSRuntime::AnnotateOutOfMemory(OOMState *aStatePtr, OOMState aNewState)
> 
> This name is confusing, because it just sounds like you are just annotating,
> when you are setting, also.  AnnotateAndSetOutOfMemory()?

Changed. Yeah, it's not pretty. I tried writing a little helper class just to avoid the weirdness of this method's signature, but it was also ugly and more code. :-P

> You should assert that aNewState != OK, because you'll produce a wrong
> report in that case. The comparison to a field offset looks odd to me, but
> maybe that's a normal thing I'm unfamiliar with.

It's an address comparison -- no pointers-to-members here.

> @@ +1233,5 @@
> >    if (os) {
> >      os->NotifyObservers(nullptr, "memory-pressure", MOZ_UTF16("heap-minimize"));
> >    }
> > +
> > +  AnnotateOutOfMemory(&mLargeAllocationFailureState, OOMState::Reported);
> 
> I think the observer service is synchronous, so I feel like the OOM state
> won't be set to Reported() before the JSGC_END case of OnGC is triggered. 
> But maybe I'm wrong.

Yep. That's intentional; see below.

> @@ +185,5 @@
> > +  // Type of mOutOfMemoryState and mLargeAllocationFailedState.
> > +  MOZ_BEGIN_NESTED_ENUM_CLASS(OOMState, uint32_t)
> > +    OK,          // OOM has never happened in this runtime.
> > +    Reporting,   // We are currently reporting OOM.
> > +    Reported,    // OOM has been reported since the last GC.
> 
> I don't understand what Reported is for.  I guess it is so you have some
> kind of indication that some OOM thing happened even if the custom callback
> goes haywire?

"Reporting" is what you see if a custom callback goes haywire. Worth reporting differently because it means (a) the bug is not in error-checking code; (b) the allocation is definitely still on the stack, a very nice clue.

"Reported" is what you see if the callbacks do their best, the allocation fails anyway, and then we crash --- probably due to buggy error-handling code when allocation returns nullptr. Note how this implicates a different kind of bug.

I added more in-depth comments here explaining these enum values.

> You could probably write a test for the |Reporting case|, by registering an
> observer for memory-pressure that crashes.  Assuming the observer service is
> available in XPCShell, which it may not be.

It sure is. Thanks!
(In reply to Carsten Book [:Tomcat] from comment #16)
> sorry had to backout this cset for build failures like
> https://tbpl.mozilla.org/php/getParsedLog.php?id=40185188&tree=Mozilla-
> Inbound

more specific it was part 4
You need to log in before you can comment on or make changes to this bug.