Closed Bug 988619 Opened 6 years ago Closed 6 years ago

crash in js::gc::CrashAtUnhandlableOOM(char const*)

Categories

(Core :: JavaScript Engine, defect, critical)

29 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla31
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- verified
firefox31 --- verified
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: jbecerra, Assigned: bhackett)

References

Details

(Keywords: topcrash-win)

Crash Data

Attachments

(2 files, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-3c549ee1-c1bb-4762-8750-b56382140325.
=============================================================

This is around number 20 in the top crashers for Firefox 29. There are a few comments from people who experience this crash a few times a day. Not a lot to go on. Most of the crashes are in Windows 7 installations. 

0 	mozjs.dll 	js::gc::CrashAtUnhandlableOOM(char const *) 	js/src/gc/Verifier.cpp
1 	mozjs.dll 	js::types::TypeObject::sweep(js::FreeOp *) 	js/src/jsinfer.cpp
2 	mozjs.dll 	js::types::TypeZone::sweep(js::FreeOp *,bool) 	js/src/jsinfer.cpp
3 	mozjs.dll 	JS::Zone::sweep(js::FreeOp *,bool) 	js/src/gc/Zone.cpp
4 	mozjs.dll 	BeginSweepingZoneGroup 	js/src/jsgc.cpp
5 	mozjs.dll 	SweepPhase 	js/src/jsgc.cpp
6 	mozjs.dll 	IncrementalCollectSlice 	js/src/jsgc.cpp
This is the expected signature coming out of solving a security-relevant crash by crashing earlier and in a safe way.

Jan, are we working on reducing those crashes in some way or is this just another "yes, OOM is bad, but the only way to fix is to address the OOM issues on a more global scale" issue?
Flags: needinfo?(jdemooij)
Crash Signature: [@ js::gc::CrashAtUnhandlableOOM(char const*)] → [@ js::gc::CrashAtUnhandlableOOM(char const*)] [@ js::CrashAtUnhandlableOOM(char const*)]
Attached file memory-report.json.gz
So my last crash report led me to bug 805406.  All my symptoms seemed to line up with that.

Anyway, was instructed to save a memory report when things started to go south before the crash, which I did.  But the crash report sent me here, so I'm uploading the report hoping it might help.

Let me know if you have any questions.
Adding ni?njn for feedback on the memory report.

I *think* this is a memory fragmentation issue, given that vsize is way higher than private/resident, and vsize-max-contiguous is 16mb.

Also, this report contains plaintext URLs of what was tabs were open at the time of the report. njn, should we have about:memory warn users about this before dumping the report? And potentially offer to mask them to protect privacy. We don't want users withholding otherwise useful reports because of what tabs they had open, nor do we want users exposing this information without their knowledge.
Flags: needinfo?(n.nethercote)
(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> Also, this report contains plaintext URLs of what was tabs were open at the
> time of the report. 

Good to know, thanks. (I *think* I closed all my midget-related tabs before this.)
Wondering if Private tabs are also included in this report? If so that would make your point especially poignant.  
However we manage the non-private tabs we should probably default to masking the private ones.
It's surprising that 'vsize' is so much larger than 'resident' and 'private'. I don't know why that would be the case. I'm not surprised it crashed soon after; you did well to get 'vsize' to 3,757 MiB on Windows.

Caspy7, I see 346 about:blank entries. Would you have had 346 empty tabs open?

Bug 709326 is open for improving the about:memory / private browsing situation.
Flags: needinfo?(n.nethercote)
(In reply to Nicholas Nethercote [:njn] from comment #5)
> Caspy7, I see 346 about:blank entries. Would you have had 346 empty tabs
> open?

I might have a bit of a "tab problem" (honest I can quit any time).

These are unrestored/unloaded tabs.
I don't know if they normally show as about:blank, but I Reset this profile a bit ago and many tabs have not been selected since.
I also use an addon which unloads tabs after a period of time.
> These are unrestored/unloaded tabs.

Ah, yes, that'd be it. Having 346 unrestored tabs isn't nearly as strange as 346 blank tabs.

> I might have a bit of a "tab problem" (honest I can quit any time).

I'm not judging :)
Yeah I'm afraid these OOM crashes will be hard to fix... It's weird though that TypeObject::sweep OOMs so frequently, it doesn't allocate a lot of memory I think.

Brian, any thoughts?
Flags: needinfo?(jdemooij) → needinfo?(bhackett1024)
BTW, can we have different stacks leading to CrashAtUnhandlableOOM? If so, should we append the next frame on the stack to it on crash signatures (i.e. add it to the prefix list)?
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #10)
> BTW, can we have different stacks leading to CrashAtUnhandlableOOM? If so,
> should we append the next frame on the stack to it on crash signatures (i.e.
> add it to the prefix list)?

Yes, please!
Attached patch patch (obsolete) — Splinter Review
This patch tries to recover from these OOMs by marking type sets / objects which have allocation issues during copying as unknown, and then discarding all JIT code and new script property information afterwards (since the constraints and type information are now out of sync).

I don't think this function allocates all that much in normal cases, but in zones with lots of stuff it could allocate a lot, and since this happens during GC sweeping the system has a higher chance of being low on memory.
Attachment #8404716 - Flags: review?(jdemooij)
Flags: needinfo?(bhackett1024)
Assignee: nobody → bhackett1024
Attached patch patchSplinter Review
Actually, I didn't test the first patch enough.  TypeZone::sweep leaves stale recompile index entries in IonScripts for JSScripts which are about to be collected, so calling Invalidate() after sweep() wasn't working.
Attachment #8404716 - Attachment is obsolete: true
Attachment #8404716 - Flags: review?(jdemooij)
Attachment #8404751 - Flags: review?(jdemooij)
Depends on: 994913
This continues to rise on Nightly and is currently #7, accounting for 1.89% of our Firefox 31 crashes in the last week.
Keywords: crashtopcrash-win
(In reply to Anthony Hughes, QA Mentor (:ashughes) from comment #14)
> This continues to rise on Nightly and is currently #7, accounting for 1.89%
> of our Firefox 31 crashes in the last week.

I'll review the patch right now. We can see what happens to our crash stats after it lands.
It looks like the skiplist isn't working yet for CrashAtUnhandlableOOM, so it is possible a bunch of unrelated signatures are being combined together.
Comment on attachment 8404751 [details] [diff] [review]
patch

Review of attachment 8404751 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Zone.h
@@ +314,5 @@
>      }
>  
>      js::types::TypeZone types;
>  
> +    void sweep(js::FreeOp *fop, bool releaseTypes, bool *oom);

Why the OOM outparam instead of return type? To force callers to handle it somehow?

::: js/src/jsinfer.h
@@ +621,5 @@
>  
>    public:
>      /* Mark this type set as representing a non-data property. */
>      inline void setNonDataProperty(ExclusiveContext *cx);
> +    inline void setNonDataProperty(); // Variant for use during GC.

It'd be good to give it a more scary name so that people don't accidentally call it outside GC.

Something like setNonDataPropertyOnOOM, setNonDataPropertyDuringGC, setNonDataPropertyUnsafe, setNonDataPropertyIgnoreConstraints.
Attachment #8404751 - Flags: review?(jdemooij) → review+
The OOM outparam does make it hard for callers to avoid ignoring it, and additionally indicates that the OOM behaves differently from OOM propagation in other parts of the codebase --- we need to sweep all type information in the zone regardless of whether there is an OOM at some point in the middle.

https://hg.mozilla.org/integration/mozilla-inbound/rev/21aca7217e7a
jit/ExecutionModeInlines.h is handled inconsistently by the includes style-checker, so the positioning in this patch caused |make check| failures on tbpl.  I removed *Inlines.h support from the style checker, renamed the file to *-inl.h, and fixed up the locations of all the existing includes to address this.

https://hg.mozilla.org/integration/mozilla-inbound/rev/81dd41a77b43

I also just filed bug 996385 to make style-checking happen in the build process, so this mistake can't happen again.
https://hg.mozilla.org/mozilla-central/rev/21aca7217e7a
https://hg.mozilla.org/mozilla-central/rev/81dd41a77b43
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Depends on: 997992
This is currently the #12 topcrasher on Aurora(Fx30). I project (by virtue of crashes above it with fixes in flight or just landed) it will be in top ten by the time 30 is merged to beta. It's also still in top 20 on Beta(Fx29)

Is this something that should be uplifted to Beta and Aurora now, or is it best left to ride the 31 train?
Tracy, for sure, we will take this patch for aurora. For beta, it will depend the risk evaluation but the patch is rather big.
Depends on: 999402
Brain, can you do an uplift request for this one for Aurora?
Crash Signature: [@ js::gc::CrashAtUnhandlableOOM(char const*)] [@ js::CrashAtUnhandlableOOM(char const*)] → [@ js::gc::CrashAtUnhandlableOOM(char const*)] [@ js::CrashAtUnhandlableOOM(char const*)] [@ js::CrashAtUnhandlableOOM(char const*) | js::types::TypeZone::sweep(js::FreeOp*, bool)] [@ js::gc::CrashAtUnhandlableOOM(char const*) | js::types::TypeObject::…
Flags: needinfo?(bhackett1024)
Comment on attachment 8404751 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 980630
User impact if declined: Higher chance of (safe) OOM crashes
Testing completed (on m-c, etc.): on m-c
Risk to taking this patch (and alternatives if risky): none
Attachment #8404751 - Flags: approval-mozilla-aurora?
Flags: needinfo?(bhackett1024)
Attachment #8404751 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Needs rebasing for the Aurora uplift.
Flags: needinfo?(bhackett1024)
Depends on: 1000145
Pushed, with the fix for this patch from bug 1000145 included.

https://hg.mozilla.org/releases/mozilla-aurora/rev/ec1d3e26a172
Flags: needinfo?(bhackett1024)
Duplicate of this bug: 1002544
I'm not seeing these signatures anymore in Firefox 29 nor 31. Strangely though I'm seeing js::CrashAtUnhandlableOOM(char const*) | js::types::TypeZone::sweep(js::FreeOp*, bool) rise in Firefox 30:

https://crash-stats.mozilla.com/topcrasher/products/Firefox/versions/30.0a2/date_range_type/report/crash_type/browser/os_name/None/result_count/300?days=7

It's currently at #12 (0.91%), up 14 positions in the last week. Should this bug be reopened to investigate further?
This didn't land in aurora until 4/25, and there aren't crashes in 30.0 builds more recent than that in your link.
(In reply to Anthony Hughes, QA Mentor (:ashughes) [Away until 2014-04-28] from comment #28)
> I'm not seeing these signatures anymore in Firefox 29 nor 31.

Actually, "js::gc::CrashAtUnhandlableOOM(char const*) | js::types::TypeObject::sweep(js::FreeOp*)" is there on 29, as expected given this patch did not land there (and the skiplist changes adding the second frame).
I don't see any crashes for "js::CrashAtUnhandlableOOM(char const*) | js::types::TypeZone::sweep(js::FreeOp*, bool)" later than the 20140425004002 build. 


For  js::CrashAtUnhandlableOOM(char const*) , there are only 2 crash reports in the last 7 days. 

Based on that huge reduction in crashes across all these signatures, I'll verify this as fixed.
Status: RESOLVED → VERIFIED
Duplicate of this bug: 998808
You need to log in before you can comment on or make changes to this bug.