Bug 706442 - Firefox 10.0a2 Crash Report [@ js::LifoAlloc::getOrCreateChunk(unsigned int) ]
Status: RESOLVED FIXED
Whiteboard: [js:waitingforinfo][qa?]
Keywords: crash, regression, sec-critical
Product: Core
Classification: Components
Component: JavaScript Engine
Version: 10 Branch
Platform: x86 Windows 7
Importance: -- critical
Target Milestone: mozilla11
Assigned To: Chris Leary [:cdleary] (not checking bugmail)
Depends on: 772338
Reported: 2011-11-30 05:25 PST by Carsten Book [:Tomcat]
Modified: 2015-10-07 18:44 PDT

Attachments
Clear the chunk's next field after releasing further chain chunks. (1.48 KB, patch)
2011-11-30 12:07 PST, Chris Leary [:cdleary] (not checking bugmail)
luke: review+
jpr: approval‑mozilla‑aurora+

Description Carsten Book [:Tomcat] 2011-11-30 05:25:55 PST
Noticed this signature on the top-changer list on crash-stats.

Example Crash-Report -> https://crash-stats.mozilla.com/report/index/72527b8c-3a12-4e7f-8a27-087c22111129

general overview: https://crash-stats.mozilla.com/report/list?range_value=3&range_unit=days&signature=js%3A%3ALifoAlloc%3A%3AgetOrCreateChunk%28unsigned%20int%29&version=Firefox%3A10.0a2

All of the crashes are on Windows 7/XP.


Crashing Thread
Frame 	Module 	Signature 	Source
0 	mozjs.dll 	js::LifoAlloc::getOrCreateChunk 	js/src/ds/LifoAlloc.cpp:180
1 	mozjs.dll 	js::analyze::ScriptAnalysis::addJump 	js/src/jsanalyze.cpp:81
2 	mozjs.dll 	js::analyze::ScriptAnalysis::analyzeBytecode 	js/src/jsanalyze.cpp:593
3 	mozjs.dll 	JSScript::makeAnalysis 	js/src/jsinfer.cpp:5507
4 	mozjs.dll 	JSScript::ensureRanAnalysis 	js/src/jsinferinlines.h:1270
5 	mozjs.dll 	js::types::TypeMonitorCall 	js/src/jsinferinlines.h:327
6 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3959
7 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4972
8 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:584
9 	mozjs.dll 	js_fun_call 	js/src/jsfun.cpp:1761
10 	mozjs.dll 	js::types::TypeSet::addType 	js/src/jsinferinlines.h:1028
11 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3948
12 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4972
13 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:584
14 	mozjs.dll 	js_fun_call 	js/src/jsfun.cpp:1761
15 	mozjs.dll 	js::types::TypeSet::addType 	js/src/jsinferinlines.h:1028
16 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3948
17 	mozjs.dll 	js::CallObject::create 	js/src/vm/CallObject.cpp:78
18 	mozjs.dll 	js::CreateFunCallObject 	js/src/jsfun.cpp:743
19 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:647
20 	mozjs.dll 	js_fun_call 	js/src/jsfun.cpp:1761
21 	mozjs.dll 	js::types::TypeSet::addType 	js/src/jsinferinlines.h:1028
22 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3541
23 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4972
24 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:584
25 	mozjs.dll 	js_fun_call 	js/src/jsfun.cpp:1761
26 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:629
27 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3948
28 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4972
29 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:584
30 	mozjs.dll 	js::Invoke 	js/src/jsinterp.h:148
31 	mozjs.dll 	js_fun_apply 	js/src/jsfun.cpp:1817
32 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:629
33 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3948
34 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4963
35 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:584
36 	mozjs.dll 	js::Invoke 	js/src/jsinterp.h:148
37 	mozjs.dll 	js_fun_apply 	js/src/jsfun.cpp:1817
38 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:629
39 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:3948
40 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4963
41 	mozjs.dll 	js::RunScript 	js/src/jsinterp.cpp:584
42 	mozjs.dll 	js_fun_call 	js/src/jsfun.cpp:1761
43 	mozjs.dll 	js::ContextStack::currentScript 	js/src/vm/Stack-inl.h:619
44 	mozjs.dll 	js::Interpret 	js/src/jsinterp.cpp:4049
45 	mozjs.dll 	js::ContextStack::pushInvokeFrame 	js/src/vm/Stack.cpp:691
46 	mozjs.dll 	js::InvokeKernel 	js/src/jsinterp.cpp:647
47 	mozjs.dll 	JS_CallFunctionValue 	js/src/jsapi.cpp:5199
48 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/xpconnect/src/XPCWrappedJSClass.cpp:1530
49 		@0xffffff81 	
50 	mozjs.dll 	JS_WrapObject 	js/src/jsapi.cpp:1438
51 	xul.dll 	XPCConvert::NativeInterface2JSObject 	js/xpconnect/src/XPCConvert.cpp:1276
52 		@0x72c94ff 	
53 	mozjs.dll 	js::types::TypeMonitorCallSlow 	js/src/jsinfer.cpp:4972
54 	mozjs.dll 	js::ContextStack::pushInvokeFrame 	js/src/vm/Stack.cpp:691
55 	xul.dll 	nsHttpTransaction::LocateHttpStart 	netwerk/protocol/http/nsHttpTransaction.cpp:734
56 	xul.dll 	SelectorMatches 	layout/style/nsCSSRuleProcessor.cpp:2146
57 	xul.dll 	SelectorMatches 	layout/style/nsCSSRuleProcessor.cpp:2146
58 	xul.dll 	nsEventDispatcher::Dispatch 	content/events/src/nsEventDispatcher.cpp:677
Comment 1 Chris Leary [:cdleary] (not checking bugmail) 2011-11-30 12:07:59 PST
Created attachment 578039
Clear the chunk's next field after releasing further chain chunks.

This bug is on aurora, but not beta.
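
The patch itself is not reproduced in this thread, so the following is only a minimal sketch of the kind of bug the patch title describes, not the actual change. `Chunk`, `releaseChunksAfter` and `chainLength` are hypothetical names for a simplified singly linked chunk chain: when the chunks after a given chunk are freed, that chunk's `next` field has to be cleared, otherwise it keeps pointing at freed memory.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for one chunk in a LIFO (bump) allocator.
struct Chunk {
    Chunk *next = nullptr;
};

// Free every chunk that follows `last` in the chain.
// Without the final assignment, `last->next` would keep pointing at freed
// memory, and a later walk of the chain would follow a dangling pointer.
inline void releaseChunksAfter(Chunk *last) {
    assert(last);
    Chunk *cur = last->next;
    while (cur) {
        Chunk *following = cur->next;  // read the link before deleting
        delete cur;
        cur = following;
    }
    last->next = nullptr;  // the step the patch title describes
}

// Count the chunks reachable from `head` (helper for illustration only).
inline std::size_t chainLength(const Chunk *head) {
    std::size_t n = 0;
    for (; head; head = head->next)
        ++n;
    return n;
}
```

With a three-chunk chain, calling `releaseChunksAfter` on the first chunk leaves a chain of length one whose `next` is null.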
Comment 2 Chris Leary [:cdleary] (not checking bugmail) 2011-12-01 18:02:27 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/818d5be34ab7
Comment 3 Chris Leary [:cdleary] (not checking bugmail) 2011-12-02 11:12:35 PST
https://hg.mozilla.org/mozilla-central/rev/818d5be34ab7
Comment 4 Chris Leary [:cdleary] (not checking bugmail) 2011-12-03 14:33:26 PST
Fixed on trunk, but still waiting for aurora approval decision.
Comment 5 JP Rosevear [:jpr] 2011-12-13 14:52:07 PST
Comment on attachment 578039
Clear the chunk's next field after releasing further chain chunks.

There was no risk assessment, but we believe this is a low risk fix.
Comment 6 Chris Leary [:cdleary] (not checking bugmail) 2011-12-14 15:40:45 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/1b3ce7846516

Sorry about the risk assessment -- promise I'll remember to include it next time!
Comment 7 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2011-12-28 13:33:30 PST
Is this something QA can verify?
Comment 8 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-02-01 12:57:18 PST
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #7)
> Is this something QA can verify?

bump
Comment 9 [:philipp] 2012-06-09 15:09:56 PDT
this crash seems to be still around in current versions...
Comment 11 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-11 13:41:17 PDT
Seems to be most crashy with Firefox 12 on Windows XP. One of the comments mentions CastleVille on Facebook. Crash volume on Firefox 14, 15, and 16 seems to be pretty low, though I'm not sure if that's because of ADUs or if this has been mitigated by a patch.
Comment 12 Alex Keybl [:akeybl] 2012-06-13 13:19:08 PDT
Do we have a test case here that we can verify? Does Comment 10 imply that we still have an sg:crit issue in our code?
Comment 13 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-13 15:03:55 PDT
If Firefox 11 was fixed when in mozilla-central (as comment 3 and the status flag indicate), this should (in theory) be fixed for Firefox 12 and onward by nature of the train model.

If Firefox 10 was fixed when in mozilla-aurora (as comment 6 and the status flag indicate), this should be fixed in the first Firefox 10.0 ESR.

Though, since QA was never able to reproduce this bug, we were also never able to truly verify the fix. I'm happy to test if there is something we can try.
Comment 14 Scoobidiver (away) 2012-06-13 23:09:38 PDT
It's #169 top browser crasher in 10.0.2, #136 in 11.0 over the last 4 weeks, #116 in 12.0, #110 in 13.0, #99 in 14.0b6, #91 in 15.0a2, and #200 in 16.0a1 over the last week.
10.0.2, 10.0.3, 10.0.4 and 10.0.5 ESR are affected.

Based on 140 comments over the last 4 weeks, some people were playing a game on Facebook, were using Facebook, were playing videos, were signing into their email account, or were opening an email.

Here are interesting comments:
"Virtual memory warning box appeared just before crash."
"Everytime we tries to log off any program this happens."
"since it was updated it crashes all the time" (Fx 13)
"suddenly at facebook - play castle ville canceled and the desktop was visible. reloaded Firefox, facebook, then again the game loaded, was originally built and then grey screen with an AIDS in a circle. "reload page" pressed, again of the game and then again demolition and desktop and this page beginning of visible. https://apps.facebook.com/playcastleville/?fb_source=bookmark_apps&ref=bookmarks&count=0&1_0.1.1=fb_bmpos" (translated from German by Bing Translator)
Comment 15 Al Billings [:abillings] 2012-06-14 05:26:14 PDT
The crashstats don't make this sound very fixed.
Comment 16 David Mandelin [:dmandelin] 2012-06-25 17:17:00 PDT
This appears to be some sort of OOM bug, but it's hard to tell. It's crashing on this line of code:

    BumpChunk *newChunk = BumpChunk::new_(chunkSize);

which calls js_malloc and then initializes the returned memory. The crash addresses are also always page starts, which makes me think somehow the allocator is returning an unmapped page. I thought Windows wasn't supposed to do that, though. It could be a bug in jemalloc.

I did notice that type inference was what was calling into the LIFO allocator on the crash reports that I clicked, so it's possible sometimes type inference is allocating more than it should.
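
To make the "malloc, then initialize the returned memory" pattern concrete, here is a minimal sketch under stated assumptions. `DemoChunk`, `newDemoChunk` and `destroyDemoChunk` are hypothetical names and this is not the real `BumpChunk::new_`; it only shows raw allocation followed by in-place construction, with a null check in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a bump chunk: a small header followed by payload.
struct DemoChunk {
    std::size_t capacity;  // usable bytes after the header
    explicit DemoChunk(std::size_t cap) : capacity(cap) {}
};

// Allocate raw memory, then construct the header in place.
// Returns nullptr if the request is too small or the allocator fails,
// so the constructor never writes through a null pointer.
inline DemoChunk *newDemoChunk(std::size_t chunkSize) {
    if (chunkSize < sizeof(DemoChunk))
        return nullptr;
    void *mem = std::malloc(chunkSize);
    if (!mem)
        return nullptr;  // out of memory: report failure to the caller
    return new (mem) DemoChunk(chunkSize - sizeof(DemoChunk));
}

// Destroy the header and return the memory to the allocator.
inline void destroyDemoChunk(DemoChunk *chunk) {
    if (!chunk)
        return;
    chunk->~DemoChunk();
    std::free(chunk);
}
```

Note that a null check like this only catches an allocator that reports failure. It would not help in the scenario suspected above, where the allocator hands back a non-null pointer to memory that is not actually usable.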
Comment 17 Scoobidiver (away) 2012-06-29 23:34:05 PDT
A user using Sync with a corrupt profile has a Firefox that crashes sometimes with this signature: see bug 769556.
Comment 18 Andrew McCreight [:mccr8] 2012-06-29 23:55:34 PDT
(In reply to David Mandelin from comment #16)
> This appears to be some sort of OOM bug, but it's hard to tell.

Windows crash reports now have information about memory usage.  For instance, 4 random reports with this signature that I selected have system memory usage percentage at 96%, 39%, 98% and 98%.
Comment 19 David Mandelin [:dmandelin] 2012-07-09 15:49:53 PDT
(In reply to Andrew McCreight [:mccr8] from comment #18)
> (In reply to David Mandelin from comment #16)
> > This appears to be some sort of OOM bug, but it's hard to tell.
> 
> Windows crash reports now have information about memory usage.  For
> instance, 4 random reports with this signature that I selected have system
> memory usage percentage at 96%, 39%, 98% and 98%.

OK, so reasonably likely :-) to be mostly OOM-related, possibly from runaway alloc in TI but I don't think that's likely. 

What can we do now? Who knows the real story about what happens on Windows if you call jemalloc and the system is very low on memory?
Comment 20 Andrew McCreight [:mccr8] 2012-07-09 19:04:33 PDT
(In reply to David Mandelin from comment #19)
> What can we do now? Who knows the real story about what happens on Windows
> if you call jemalloc and the system is very low on memory?
jlebar may know.
Comment 21 Justin Lebar (not reading bugmail) 2012-07-09 20:13:41 PDT
> Who knows the real story about what happens on Windows if you call jemalloc and the system is very 
> low on memory?

The common thread I see in all the crash reports I looked at is low "available page file".  That number is ullAvailPageFile from MEMORYSTATUSEX, "The maximum amount of memory the current process can commit, in bytes." [1]

It's pretty weird in some cases [2, 3, 4] that we have less than 5MB of available page file and 100+ MB of "available physical memory" (that's ullAvailPhys: "amount of physical memory currently available, in bytes. This is the amount of physical memory that can be immediately reused without having to write its contents to disk first.").

Looking at some random, unrelated crash reports, there's always much more available page file than available physical memory, so I think this is an anomalous situation.

I'm not sure, but I think this may mean that the system has run out of space in its pagefile -- that is, the pagefile is too small, and Windows can't grow it, perhaps because the system is out of disk space.  Windows doesn't overcommit -- I understand that this means that if a process allocates a bunch of MEM_COMMIT virtual memory but doesn't touch it, that space is reserved and must fit either in core or the page file.  So if something (perhaps Firefox) on the user's machine is eating up a lot of MEM_COMMIT vmem, it's possible we could get into this state where there's a lot of physical memory available (because the pages haven't /actually/ been touched yet), but no pagefile available.

Exactly what this means for jemalloc, I'm not sure yet.  But if anyone cares to correct me on the above (glandium?) that might help.  :)

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/aa366770%28v=vs.85%29.aspx
[2] https://crash-stats.mozilla.com/report/index/d41a602f-015b-4dbb-9263-f687e2120709
[3] https://crash-stats.mozilla.com/report/index/90fcf4e0-db21-4d0b-abf1-8c6cb2120709
[4] https://crash-stats.mozilla.com/report/index/82cce1b7-2ede-42dc-b2f5-6c1312120709
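
As a rough illustration of the two numbers compared above, the sketch below reads `ullAvailPageFile` and `ullAvailPhys` via `GlobalMemoryStatusEx` on Windows and applies a simple heuristic. `MemSnapshot`, `looksCommitStarved`, `readMemSnapshot` and the 5 MB default threshold are assumptions made for this example (the threshold is taken loosely from the "less than 5MB" observation); this is not how the crash reporter collects or classifies the data.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical snapshot of the two MEMORYSTATUSEX fields discussed above.
struct MemSnapshot {
    std::uint64_t availPageFile;  // ullAvailPageFile: bytes this process can still commit
    std::uint64_t availPhys;      // ullAvailPhys: physical bytes immediately reusable
};

// Heuristic (an assumption for illustration): flag the anomalous state where
// commit space is nearly exhausted while more physical memory is still free.
inline bool looksCommitStarved(const MemSnapshot &m,
                               std::uint64_t lowCommitBytes = 5ull * 1024 * 1024) {
    return m.availPageFile < lowCommitBytes && m.availPhys > m.availPageFile;
}

#ifdef _WIN32
// Fill `out` from the OS; returns false if the query fails.
inline bool readMemSnapshot(MemSnapshot *out) {
    MEMORYSTATUSEX status;
    status.dwLength = sizeof(status);  // must be set before the call
    if (!GlobalMemoryStatusEx(&status))
        return false;
    out->availPageFile = status.ullAvailPageFile;
    out->availPhys = status.ullAvailPhys;
    return true;
}
#endif
```

A report with 3 MB of available page file and 120 MB of available physical memory would be flagged; one with plenty of both would not.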
Comment 22 Justin Lebar (not reading bugmail) 2012-07-09 20:18:42 PDT
We never check the return value in jemalloc's pages_commit (VirtualAlloc(MEM_COMMIT)).  I wouldn't be surprised if that's what's failing here.
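
For illustration only, a checked commit could look roughly like the sketch below. `CommitFn`, `commitOrAbort` and `win32Commit` are hypothetical names, and this is not jemalloc's `pages_commit` or any change that landed in the tree; the only real API used is `VirtualAlloc`, which returns NULL when the commit fails.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

#ifdef _WIN32
#include <windows.h>
#endif

// Signature of a commit primitive: returns nullptr on failure.
using CommitFn = void *(*)(void *addr, std::size_t size);

// Commit [addr, addr + size) and abort immediately if the commit fails,
// instead of letting a later write fault on an uncommitted page.
inline void *commitOrAbort(CommitFn commit, void *addr, std::size_t size) {
    void *result = commit(addr, size);
    if (!result) {
        std::fputs("commit failed: possibly out of commit space\n", stderr);
        std::abort();
    }
    return result;
}

#ifdef _WIN32
// Thin wrapper over the real Windows call; NULL means the commit failed.
inline void *win32Commit(void *addr, std::size_t size) {
    return VirtualAlloc(addr, size, MEM_COMMIT, PAGE_READWRITE);
}
#endif
```

Aborting at the point of failure would at least give such crashes their own signature, rather than having them surface later in whichever caller first touches the page.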
Comment 23 Scoobidiver (away) 2012-07-10 00:20:42 PDT
In addition to comment 17, that user hit this crash with a new profile, suspicious software uninstalled and a disk check done. Certain other applications (Chrome, Safari, TB) also crash.
Comment 24 Andrew McCreight [:mccr8] 2012-07-10 04:52:47 PDT
I'm not sure this really needs to be a security bug.  It seems like we have a new crash that just happens to have the same signature as an existing fixed sg:crit.  But maybe it is too early to tell...
Comment 25 Justin Lebar (not reading bugmail) 2012-07-10 06:22:24 PDT
> It seems like we have a new crash that just happens to have the same signature as an existing fixed 
> sg:crit.

Indeed, that's what appears to be happening.

We should be able to tell when I land the abort in bug 772338.
Comment 26 David Mandelin [:dmandelin] 2012-07-11 18:34:30 PDT
Great analysis. Thanks, Justin. By the way, how do you read the MEMORYSTATUSEX out of a crashdump? Do you just call GlobalMemoryStatusEx in the debugger? I had no idea such things were possible...
Comment 27 Justin Lebar (not reading bugmail) 2012-07-11 18:39:43 PDT
> By the way, how do you read the MEMORYSTATUSEX out of a crashdump? Do you just call 
> GlobalMemoryStatusEx in the debugger? I had no idea such things were possible...

No, it's not so magical.  :)  We just make the syscall while we're building the crash report; the data we collect shows up on the crash-stats page.
Comment 28 David Mandelin [:dmandelin] 2012-07-17 14:29:36 PDT
(In reply to Justin Lebar [:jlebar] from comment #27)
> > By the way, how do you read the MEMORYSTATUSEX out of a crashdump? Do you just call 
> > GlobalMemoryStatusEx in the debugger? I had no idea such things were possible...
> 
> No, it's not so magical.  :)  We just make the syscall while we're building
> the crash report; the data we collect shows up on the crash-stats page.

Oh. :-) Well, I never saw those fields before, so I still learned something!
Comment 29 David Mandelin [:dmandelin] 2012-07-17 14:31:56 PDT
This particular flavor of the crash seems unlikely to be sec-critical, so removing the whiteboard tag to prevent confusion.
Comment 30 Andrew McCreight [:mccr8] 2012-08-29 12:07:00 PDT
This really needs a new bug filed for it if we want to track the new regression, as it is unrelated to the original issue.
Comment 31 Lukas Blakk [:lsblakk] use ?needinfo 2012-09-06 16:51:58 PDT
Triage comment:
Since this appears to have been classified as non-exploitable we're wontfixing for ESR10 and will get it in ESR17.
