Closed
Bug 1072151
Opened 10 years ago
Closed 7 years ago
crash in OOM | unknown | js::CrashAtUnhandlableOOM(char const*) | js::Nursery::moveToTenured(js::gc::MinorCollectionTracer*, JSObject*)
Categories
(Core :: JavaScript: GC, defect)
Tracking
()
People
(Reporter: JasnaPaka, Unassigned)
References
Details
(Keywords: crash, topcrash, Whiteboard: [tbird crash])
Crash Data
This bug was filed from the Socorro interface and is
report bp-86285ca1-1931-4a1f-b4c7-6b2a02140924.
=============================================================
I was scrolling on Facebook when the crash happened. It appears to be random.
Comment 1•10 years ago
I reproduced the crash on Firefox 34 Beta 2 on Windows 7 32bit. Here is the crash report: https://crash-stats.mozilla.com/report/index/bp-0adaedc8-6a94-4910-b6e6-4d0b72141021.
I don't have proper STR. I had many sites opened (Facebook, Youtube, Yahoo Mail, Pinterest, Google Maps) and I was navigating between them.
In the last week, 2259 crashes occurred with this signature (on Windows 7).
Comment 2•10 years ago
This is the #10 topcrash in Firefox 33, and is also showing up significantly in 34.0b1. The top urls for this crash signature are for Facebook.
Crashing thread:
0 mozjs.dll js::CrashAtUnhandlableOOM(char const*) js/src/jscntxt.cpp
1 mozjs.dll js::Nursery::moveToTenured(js::gc::MinorCollectionTracer*, JSObject*) js/src/gc/Nursery.cpp
2 mozjs.dll js::Nursery::collectToFixedPoint(js::gc::MinorCollectionTracer*, js::Nursery::TenureCountCache&) js/src/gc/Nursery.cpp
3 mozjs.dll js::Nursery::collect(JSRuntime*, JS::gcreason::Reason, js::Vector<js::types::TypeObject*, 0, js::SystemAllocPolicy>*) js/src/gc/Nursery.cpp
4 mozjs.dll js::gc::GCRuntime::gcCycle(bool, __int64, js::JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp
5 mozjs.dll js::gc::GCRuntime::collect(bool, __int64, js::JSGCInvocationKind, JS::gcreason::Reason) js/src/jsgc.cpp
6 mozjs.dll RunLastDitchGC js/src/jsgc.cpp
7 mozjs.dll js::gc::ArenaLists::refillFreeList<1>(js::ThreadSafeContext*, js::gc::AllocKind) js/src/jsgc.cpp
8 mozjs.dll js::gc::AllocateNonObject<JSFatInlineString, 1>(js::ThreadSafeContext*) js/src/jsgcinlines.h
9 mozjs.dll js::ConcatStrings<1>(js::ThreadSafeContext*, JS::Handle<JSString*>, JS::Handle<JSString*>) js/src/vm/String.cpp
10 libGLESv2.dll gl::ResourceManager::getTexture(unsigned int) gfx/angle/src/libglesv2/ResourceManager.cpp
11 libGLESv2.dll gl::GetCurrentData() gfx/angle/src/libglesv2/main.cpp
12 libGLESv2.dll glActiveTexture gfx/angle/src/libglesv2/libGLESv2.cpp
13 xul.dll mozilla::WebGLContext::UnbindFakeBlackTextures() dom/canvas/WebGLContextDraw.cpp
14 xul.dll mozilla::WebGLContext::DrawElements(unsigned int, int, unsigned int, __int64) dom/canvas/WebGLContextDraw.cpp
15 xul.dll mozilla::dom::WebGLRenderingContextBinding::drawElements obj-firefox/dom/bindings/WebGLRenderingContextBinding.cpp
16 @0x52c837bf
Component: General → JavaScript: GC
Comment 3•10 years ago
Is there anything that can be done about this, Terrence?
Flags: needinfo?(terrence)
Comment 4•10 years ago
Most of these have less than 300 MB of address space remaining, so this may just be the first thing they fail on.
It's interesting though that there is a minority of crashes with loads of virtual and pagefile remaining: https://crash-stats.mozilla.com/search/?signature=%3DOOM+|+unknown+|+js%3A%3ACrashAtUnhandlableOOM%28char+const*%29+|+js%3A%3ANursery%3A%3AmoveToTenured%28js%3A%3Agc%3A%3AMinorCollectionTracer*%2C+JSObject*%29&available_virtual_memory=%3E1000000000&available_page_file=%3E1000000000&_facets=signature&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=available_virtual_memory
Also, any chance for a size annotation on these aborts?
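For anyone who wants to see exactly what the search link above filters on, here is a small sketch that decodes its query string with Python's standard library. The query is copied from the link, with the `_columns` parameters left out for brevity; nothing here contacts the crash-stats server.

```python
from urllib.parse import parse_qs

# Query string copied from the Super Search link in this comment.
# The _columns parameters are omitted for brevity.
QUERY = (
    "signature=%3DOOM+|+unknown+|+js%3A%3ACrashAtUnhandlableOOM%28char+const*%29"
    "+|+js%3A%3ANursery%3A%3AmoveToTenured%28js%3A%3Agc%3A%3AMinorCollectionTracer*%2C+JSObject*%29"
    "&available_virtual_memory=%3E1000000000"
    "&available_page_file=%3E1000000000"
    "&_facets=signature&_facets=version"
)

# parse_qs percent-decodes each value and turns '+' back into a space.
filters = parse_qs(QUERY)
for name, values in filters.items():
    print(name, "=", values)
```

The decoded filters ask for this signature with more than 1000000000 bytes of both available virtual memory and available page file. The leading `=` and `>` appear to be Super Search comparison operators; that reading is an assumption based on the link itself.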
Comment 5•10 years ago
Written in parallel with David's comment 4, so not taking advantage of the new data there:
We generally shouldn't be getting LastDitchGC, as it indicates that our GC heap limit tripped before our malloc trigger: we really don't want this to happen ever, because last-ditch GCs are non-incremental. It's probably something particular to FB's workload; it would be nice to know what part of the fast-heap-growth curve FB is in when this happens.
In the long term, we need to find a way to cope better when we're near the limit; however, if FB is tripping this right now, we need to do something in the short term as well. We could either scale down our heap-growth triggers so that we GC sooner, while there is still memory available, or we could keep a larger ballast around. Of course, if we're near the heap limit anyway, neither of these is going to help much, and it's going to hurt performance elsewhere at the same time.
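To make the two short-term options concrete, here is a toy model of a heap-growth trigger. It is not SpiderMonkey's actual trigger logic: the function name, the percentage-based growth factor, and the treatment of ballast as headroom reserved below the heap limit are all assumptions made for illustration.

```python
MIB = 1024 * 1024

def gc_trigger_bytes(last_heap_bytes, growth_percent, heap_limit_bytes, ballast_bytes=0):
    """Heap size at which the next GC would start in this toy model.

    Hypothetical helper: the heap may grow to growth_percent of its size
    after the last GC, but never into the ballast kept below the limit.
    """
    grown = last_heap_bytes * growth_percent // 100
    return min(grown, heap_limit_bytes - ballast_bytes)

limit = 1000 * MIB

# Baseline: heap may grow by half before the next GC.
baseline = gc_trigger_bytes(600 * MIB, 150, limit)
# Option 1: scale the growth trigger down so the GC starts sooner.
scaled_down = gc_trigger_bytes(600 * MIB, 120, limit)
# Option 2: keep a larger ballast so the trigger stays clear of the limit.
with_ballast = gc_trigger_bytes(800 * MIB, 150, limit, ballast_bytes=200 * MIB)

print(baseline // MIB, scaled_down // MIB, with_ballast // MIB)
```

In this model both options move the trigger away from the limit, and both do so by collecting earlier than the baseline would, which is where the performance cost mentioned above comes from.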
Flags: needinfo?(terrence)
Comment 6•10 years ago
(In reply to David Major [:dmajor] (UTC+13) from comment #4)
>
> Also, any chance for a size annotation on these aborts?
1MiB, via VirtualAlloc, 1MiB aligned. The alignment requirement might be killing us, although :ehoogeveen did a ton to help mitigate that issue a few months ago.
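A rough sketch of why the alignment requirement hurts: a free region that is big enough for 1 MiB may still not contain a 1 MiB block that starts on a 1 MiB boundary. The helper below is hypothetical and only does the address arithmetic; it does not call VirtualAlloc or mirror the GC's real allocation code.

```python
MIB = 1024 * 1024

def first_aligned_fit(region_start, region_size, size=MIB, alignment=MIB):
    """Return the lowest aligned address inside the free region where
    `size` bytes fit, or None if there is no such address."""
    # Round the region start up to the next multiple of `alignment`.
    aligned = (region_start + alignment - 1) // alignment * alignment
    if aligned + size <= region_start + region_size:
        return aligned
    return None

# A free region of exactly 1 MiB works only if it starts on a boundary.
on_boundary = first_aligned_fit(5 * MIB, MIB)
off_boundary = first_aligned_fit(5 * MIB + 4096, MIB)
# A misaligned region needs extra slack to reach the next boundary.
with_slack = first_aligned_fit(5 * MIB + 4096, 2 * MIB - 4096)

print(on_boundary, off_boundary, with_slack)
```

In the worst case a misaligned region needs almost 2 MiB of contiguous free space to yield one aligned 1 MiB chunk, so a fragmented address space can fail this request even when plenty of smaller holes remain.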
Comment 7•10 years ago
(In reply to Terrence Cole [:terrence] from comment #5)
> In the long term, we need to find a way to cope better when we're near the
> limit; however, If we've FB tripping this right now, we need to do something
> in the short term as well.
Still a top crash on 34 beta, and still mostly on FB. Is there any hope for a fix before the train leaves?
Flags: needinfo?(terrence)
Comment 8•10 years ago
[Tracking Requested - why for this release]:
This wasn't marked topcrash (till now) and wasn't marked for tracking for 34, somehow. It is the #10 topcrash for 34.0b10 but not at a super high volume.
I'm tagging it now, but this may not make it into 34.
status-firefox33:
--- → affected
status-firefox34:
--- → affected
status-firefox35:
--- → affected
tracking-firefox34:
--- → ?
tracking-firefox35:
--- → ?
Keywords: topcrash
Comment 9•10 years ago
(In reply to David Major [:dmajor] (UTC+13) from comment #7)
> (In reply to Terrence Cole [:terrence] from comment #5)
> > In the long term, we need to find a way to cope better when we're near the
> > limit; however, If we've FB tripping this right now, we need to do something
> > in the short term as well.
>
> Still a top crash on 34 beta, and still mostly on FB. Is there any hope for
> a fix before the train leaves?
Not really. The short term solution would have been bug 1095620, but the blocking bug 1074961 took almost 2 months more to complete than expected due to existing wrongness. I think we're going to have to let this ride for another release. :-(
Flags: needinfo?(terrence)
Comment 10•10 years ago
Thanks terrence! I will go ahead and mark it wontfix for 34 then. Our overall crash rate for 34 is looking pretty good actually!
I feel that you should have this animated gif of a kitten with a butterfly:
http://image.blingee.com/images19/content/output/000/000/000/7a9/785756861_1112924.gif
Comment 11•10 years ago
Our fork of jemalloc now caches up to 128 chunks worth of memory (bug 1073662), and may be getting a variant of the GC allocation logic (bug 1005844). Once we have both, it might be a good idea to see if we can rip out the GC allocation logic in favor of making jemalloc do the heavy lifting, so the GC can benefit from the recycled chunks.
Unfortunately, the chunk recycling logic fundamentally cannot handle chunks of different sizes on Windows, so it is limited to chunk-sized allocations there (so it won't help with, say, allocating the nursery itself). I'm also not sure if jemalloc actually *exposes* a way to choose the desired alignment, but I'm sure we can make it do so.
Also note that if the patch in bug 1005844 is rejected, I do not think we should unify the logic as it will be a step backward for the GC (just caching chunks does not help use all available chunks in high fragmentation situations).
Comment 12•10 years ago
This is at #19 for Firefox 35. Marking wontfix, given that we have nothing new to try and bug 1074961 is resolved in 36. We should be OK shipping with this, and should see reduced or no volume in the 36 release.
Comment 13•10 years ago
Yep, this still exists in 35.0.1.
Firefox 35.0.1 Crash Report [@ OOM | unknown | js::CrashAtUnhandlableOOM(char const*) | js::Nursery::moveToTenured(js::gc::MinorCollectionTracer*, JSObject*) ]
https://crash-stats.mozilla.com/report/index/f2bfa16c-6386-4258-8219-0d9c82150129
But if it's fixed for 36, let's let it rest.
Comment 14•10 years ago
I initially wanted to leave this thread in peace, but my system fails with this several times a day.
The last two reports:
https://crash-stats.mozilla.com/report/index/75fb60a5-842b-4e4d-b52a-ceadf2150204
https://crash-stats.mozilla.com/report/index/800f9df1-bd6d-4ac5-8c6c-447912150204
I will have to stop watching YouTube playlists.
Comment 15•10 years ago
This is the #15 topcrash on Firefox 36.0b with a high number of crashes still in 36.0b5. Not in the top 10 but significantly high volume.
Comment 16•10 years ago
This may actually be fixed in 36.0b6. Kairo, is there enough data by now to judge, or should it wait another day?
Flags: needinfo?(kairo)
Comment 17•10 years ago
(In reply to Liz Henry :lizzard from comment #16)
> This may be fixed in 36.0b6 actually. Kairo is there enough data by now to
> judge or should it wait another day?
It's still #8, but it's in a similar position in 35, and we've always had GC crashes around there, so I'm not that concerned about the signature itself. If we find reproducible cases, we surely want to look into them, and the same goes if the JS team has a good idea of what's going on here, but otherwise it doesn't sound very actionable to me.
Flags: needinfo?(kairo)
Comment 18•10 years ago
Ugh, so bug 1074961 was supposed to make it so that we could easily do something like bug 1073662 and keep more memory live to use as a buffer. But it turns out that to do that safely (e.g. without causing OOM elsewhere), we really need to be able to estimate when our GC triggers are going to fire. Currently our GC triggers are a catastrophe: 5+ years of ad-hoc additions, each its own special snowflake. You can read about the current situation at [1], and the work to fix them is ongoing at [2]. We can at least see the light at the end of the tunnel finally, but I'm afraid none of this is going to be suitable for uplift. In the meantime, I'll keep trying to think of a safe shorter-term solution as I continue getting more and more context on the problem.
1 - https://dxr.mozilla.org/mozilla-central/source/js/src/gc/GCRuntime.h?from=GCRuntime.h&case=true#198
2 - https://bugzilla.mozilla.org/show_bug.cgi?id=1130211
Comment 19•10 years ago
Terrence or Kairo, do you have any reason to believe that this crash is related to JS memory or that GC is related? I suspect that this is just a symptom of running out of memory, and that this is primarily related to the OOM issues with youtube video in 36 (as in comment 14).
We can confirm what is actually using memory with a combination of looking at the memory mappings in the minidump and the about:memory data for those crashes which have it. Here's a supersearch link to 36.0b6 crashes with this signature and which come with about:memory data:
https://crash-stats.mozilla.com/search/?signature=%3DOOM+|+unknown+|+js%3A%3ACrashAtUnhandlableOOM%28char+const*%29+|+js%3A%3ANursery%3A%3AmoveToTenured%28js%3A%3Agc%3A%3AMinorCollectionTracer*%2C+JSObject*%29&contains_memory_report=!__null__&version=36.0&build_id=20150202183609&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#crash-reports
You'll need to use the API to get access to the about:memory data, since it's not exposed in the UI anywhere. It's not public, so you might need extra permissions.
e.g. I loaded the memory report from https://crash-stats.mozilla.com/report/index/0731731b-62f3-4903-8c19-97c882150203 and this is clearly not JS-related:
Explicit Allocations
286.06 MB (100.0%) ++ explicit
423.11 MB ── private
1,109.52 MB ── resident
3,944.82 MB ── vsize
1.08 MB ── vsize-max-contiguous
So we're running out of virtual memory in this case.
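The arithmetic behind that conclusion can be sketched as follows, using only the figures from the report above and the 1 MiB chunk size from comment 6. The 4 GiB address-space ceiling is an assumption (a 32-bit process with the full 4 GiB user address space), and the report's "MB" is treated as the same unit as MiB.

```python
# Figures copied from the memory report quoted above, in MB as printed.
report_mb = {"vsize": 3944.82, "vsize-max-contiguous": 1.08}

# Assumption: a 32-bit process with a 4 GiB user address space.
ADDRESS_SPACE_MB = 4096.0
# From comment 6: the GC asks for 1 MiB, aligned to 1 MiB.
CHUNK_MB = 1.0

# Address space not yet reserved or committed by the process.
free_mb = ADDRESS_SPACE_MB - report_mb["vsize"]
# Room to spare if the chunk is placed in the largest free block.
slack_mb = report_mb["vsize-max-contiguous"] - CHUNK_MB

print(f"unreserved address space: {free_mb:.2f} MB")
print(f"slack in largest free block: {slack_mb:.2f} MB")
```

Under these assumptions about 151 MB of address space is still unreserved, but the largest hole exceeds the chunk size by only about 0.08 MB, so an aligned 1 MiB request succeeds only if that hole happens to start on, or just below, a 1 MiB boundary.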
I think that dmajor has a way of categorizing groups of crashes based on this data, but I'm not 100% sure.
Flags: needinfo?(dmajor)
Comment 20•10 years ago
Odds are the JS engine is trying to allocate a new 1 MiB chunk to allocate an object in. It wouldn't be too surprising if this is one of the first things to fail in a low-memory situation.
Comment 21•10 years ago
Yeah this is just low memory. JS isn't really at the heart of the problem.
There's a supersearch field 'write_combine_size' that shows how much memory is going to the gfx stack. Between that field and the Youtube URLs, a lot of these are pointing to the recent video OOM issues.
Flags: needinfo?(dmajor)
Comment 22•10 years ago
Seems it is a wontfix for 36.
Tracking for 37 since it is in the top 10 (even if we have been tracking this bug for a while now).
Comment 23•10 years ago
Is this bug actionable or do we expect that it will be actionable at some point? Should we simply resolve this as wontfix?
Comment 24•10 years ago
For any "OOM | unknown" crash, one action item is to make the size known. I've been grumbling about js::CrashAtUnhandlableOOM for a long time, but given the JS code patterns it may not be easy to annotate. Not sure if it should be dealt with here or a more general bug.
IMO wontfix seems harsh, I'd prefer that you just untrack or call it incomplete, but I guess I don't really care that much.
Comment 25•10 years ago
(In reply to David Major [:dmajor] (UTC+13) from comment #24)
> IMO wontfix seems harsh, I'd prefer that you just untrack or call it
> incomplete, but I guess I don't really care that much.
Given that this is still an active crash, I think you're right about this. I'm going to untrack, as I don't see the value in following up here until we come up with a way to obtain additional information to help us debug this. I also see instances of the bug on 38 and 39, and so have marked both releases as affected.
Comment 26•9 years ago
Hello, we are getting a report on SUMO that this is happening again.
Comment 27•9 years ago
[Tracking Requested - why for this release]:
status-firefox40:
--- → ?
status-firefox41:
--- → ?
Updated•9 years ago
Crash Signature: [@ OOM | unknown | js::CrashAtUnhandlableOOM(char const*) | js::Nursery::moveToTenured(js::gc::MinorCollectionTracer*, JSObject*)] → [@ OOM | unknown | js::CrashAtUnhandlableOOM(char const*) | js::Nursery::moveToTenured(js::gc::MinorCollectionTracer*, JSObject*)]
[@ OOM | unknown | js::CrashAtUnhandlableOOM | js::Nursery::moveToTenured]
Comment 28•9 years ago
Another crash reported on SUMO with TB 38.4.0:
bp-0fa034eb-2833-4185-bccf-0acd32151207
See https://support.mozilla.org/en-US/questions/1097870
Comment 29•9 years ago
(In reply to Christian Riechers from comment #28)
> Another crash reported on SUMO with TB 38.4.0:
> bp-0fa034eb-2833-4185-bccf-0acd32151207
>
> See https://support.mozilla.org/en-US/questions/1097870
This is the #80 crash signature for Thunderbird 38.4.0, but many are multiple reports from the same users. And, as bsmedberg suggests, these look like straight-up OOM, so the crash signature is of no help.
Whiteboard: [tbird crash]
Comment 31•7 years ago
This signature ends after Firefox 39, as does every signature containing js::Nursery::moveToTenured.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME