Closed Bug 503638 Opened 12 years ago Closed 12 years ago

OOM crash [@ PL_DHashTableOperate | _MD_CURRENT_THREAD] in nsCycleCollector

Categories

(Core :: XPCOM, defect)

x86
Windows XP
defect
Not set
critical

Tracking

()

VERIFIED FIXED
Tracking Status
blocking1.9.2 --- .20+
status1.9.2 --- .20-fixed
status1.9.1 --- wanted

People

(Reporter: goatboy100, Assigned: timeless)

References

Details

(Keywords: crash, verified1.9.2, Whiteboard: [qa-examined-192] [qa-needs-STR])

Crash Data

Attachments

(4 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.11) Gecko/2009060215 Firefox/3.0.11 (.NET CLR 3.5.30729)

As the title states, the memory usage causes a sudden and frankly disturbing crash. No data is lost, but when multiple tabs have been opened and then closed, the memory usage hardly changes. I don't know the specifics of the code, so let me be broad: I think you need to determine if FF is keeping handles on memory that should be cleared, and take the appropriate steps to fix it. If it's not FF keeping handles on memory that is moot (for instance, a webpage that you closed out 35 tabs ago), then please comment on this with what you've found to be the case, and propose a solution.

Reproducible: Always

Steps to Reproduce:
1. start firefox.
2. browse the web as normal, but do not close the session; only a few tabs here and there inside the main window.
3. watch the screen to see FF simply disappear; no errors, no nothing.
Version: unspecified → 3.0 Branch
Does this happen with Firefox 3.5 or later, in Firefox safe mode, or a new firefox profile? What is normal web browsing? Flash, silverlight, javascript heavy, what? Do you get a crash, meaning, does firefox close on you?
OS: All → Windows XP
Hardware: All → x86
It happens sooner with FF 3.5 than with 3.0.x. Normal web browsing consists of text-heavy websites like Yahoo or Google Mail's web interface, but without Flash enabled; I've found that it actually cut by ½ the time it takes FF to crash in this manner. I don't have any addons like Silverlight, but I seem to remember reading somewhere that webmail like Yahoo and Gmail is JavaScript intensive... As for the last of your questions, yes, Firefox does close on me, though I've noticed that when FF isn't the latest app to open (top of the heap/bottom of the stack) its memory usage ebbs and flows until it (or another application) crashes. In the second instance, I can tell that it's FF because the constant increase in RAM usage is accounted for by looking at the "mem usage" field of the Windows Task Manager "Processes" tab; firefox.exe's memory count increases while all other processes remain static in their mem usage. I'm now 99.99999% sure it's a heap+stack collision, but I need someone who knows the code to help me prove this... Any volunteers?
That stack is from 3.0. Could you maybe try to get one from 3.5? Will probably be the same, but just for reference.
Summary: Memory Usage Causes Sudden Crash. → Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate ]
Keywords: crash
Product: Firefox → Core
QA Contact: general → general
Version: 3.0 Branch → 1.9.0 Branch
We need to figure out what's causing us to crash in PL_DHashTableOperate, since it's not usually an NSPR bug...
Flags: wanted1.9.0.x?
:ss, how do I go about doing that without knowing how to read the FF source code? I don't have any of the test suites (mochi, smoketest, litmus, etc) installed/ready to use, and I wouldn't know how to use them anyway. I'm somewhat computer literate, but not in the programming sense... plz explain the steps like you would to your grandmother (no offense!).
Jacob: That wasn't a specific comment to you, more of a general one. From you, we need better reproducible steps so developers can determine what the problem is. If you can try safe mode to see if that helps, it might help to get specific steps to reproduce.

http://support.mozilla.com/en-US/kb/Safe+Mode
I saw a crash in PL_DHashTableOperate today under 3.5.3.  The stack doesn't look the same though so I filed it as a new bug (bug 516113).  If it turns out it's the same, feel free to dupe.
Jacob, how much memory is Firefox using before it crashes?  If it's over 1GB I can easily believe that this is an out-of-memory issue.
The specs on the computer are as follows: 2.4GHz Intel Celeron CPU, 1GB RAM (PageFile.Present = False; // I try to minimize the wear on the hard drive, and that means #EXCLUDE<page_file.pf>), a 120GB hard drive, a CD\DVD burner, and it's all powered by a 500W PSU. I'm now posting from FF 3.5.3, and I've had low-memory conditions with 3.5.*, but they don't usually result in a crash like in the 3.0.* branch; just scripts on web pages being stopped (not a big deal), and the "close tab" button (and select others) disappear (they're no longer displayed in the FF window), though their functionality doesn't go away. Will file a bug for that later. In the mean time, I've got to figure out how best to free up hard drive space.
Summary: Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate ] → Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate ], possibly due to Windows page file being disabled
Summary: Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate ], possibly due to Windows page file being disabled → Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate | _MD_CURRENT_THREAD], possibly due to Windows page file being disabled
ss: not that it matters, but technically pl_dhashtable isn't in nspr at all (histerical explanation omitted)

Signature	PL_DHashTableOperate
UUID	20ffc204-ca51-4b65-95b0-a044d2090711
Time 	2009-07-11 13:43:32.12669
Uptime	61106
Last Crash	7775607 seconds before submission
Product	Firefox
Version	3.0.11
Build ID	2009060215
Branch	1.9.0
OS	Windows NT
OS Version	5.1.2600 Service Pack 3
CPU	x86
CPU Info	GenuineIntel family 15 model 2 stepping 7
Crash Reason	EXCEPTION_ACCESS_VIOLATION
Crash Address	0x8

Crashing Thread
Frame 	Module 	Signature 	Source
0 	xul.dll 	PL_DHashTableOperate 	pldhash.c:588
1 	nspr4.dll 	_MD_CURRENT_THREAD 	mozilla/nsprpub/pr/src/md/windows/w95thred.c:298
2 		@0x37903f 	

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/xpcom/glue/pldhash.c&rev=3.12&mark=588#580

bsmedberg 1.1 keyHash = table->ops->hashKey(table, key);

as hashKey is not the first / second offset, iirc this means that table is null (see crash address). Sadly, that's obviously incredibly unhelpful given the lack of a useful stack.

reporter: please try using ff3.5.4 and https://developer.mozilla.org/En/How_to_get_a_stacktrace_with_WinDbg

ff3.0 is while not technically unsupported, realistically uninteresting (i'm sorry for your crash, but we have newer products with many crash fixes,...).

it's fairly expensive for me to help you debug this and the return for debugging a crash in ff3.0 is rather poor. if we're lucky, it's already fixed. if we're unlucky, i don't help, and both of us lose quite a bit of time.
jesse, et al: please be aware that _MD_CURRENT_THREAD is not the correct caller, it's just somehow the garbage we're getting. That said, I don't mind having a tag, although i worry we might miss someone who has a correct stack.

note that disabling the page file is merely a way of causing us to run out of memory sooner, it's not actually relevant in any useful way. We don't use the page file directly, it just means that windows has more virtual memory for apps (including us).
Summary: Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate | _MD_CURRENT_THREAD], possibly due to Windows page file being disabled → Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate | _MD_CURRENT_THREAD], (oom)
I disable the pagefile because it reduces the wear and tear on the hard drive overall. With Windows, the pagefile holds the memory stack, and the stack is where everything is referenced from. All that constant reading from and writing to the page file wears the hard drive down MUCH faster. I try to make the computer and computer parts I have last as long as I can.
jacob: sure, you're free to do that. i'm just explaining that technically as far as our software is concerned, the fact that you've disabled it is irrelevant (and thus doesn't belong in the bug summary). i'm not saying you don't have reason to do it, or that you shouldn't do it.
So am I reading this bug correctly and it just states "we don't handle OOM gracefully in some unknown case"? In that case I'm inclined to resolve it, since Jacob hasn't replied in 3 months. timeless?
Whiteboard: [needs a real stack trace from the reporter]
Stack trace of last out of memory condition before crash.
Attached file Trigger for this bug
Trigger for this bug. Tested on firefox 3.5.8 on windows XP SP3 with 3 GB of ram.
oren: thanks, although it'd be nice if you provided a better warning in your testcases (preferably require the user to click a button in the testcase before killing their browser).

attachment 428529 [details] shows we do:
1316 GCGraphBuilder::GCGraphBuilder(GCGraph &aGraph,
1322     if (!PL_DHashTableInit(&mPtrToNodeMap, &PtrNodeOps, nsnull,
1324         mPtrToNodeMap.ops = nsnull;

which fails, leaving ops as null

attachment 428528 [details] shows we crash:
599     keyHash = table->ops->hashKey(table, key);
   table->ops = 0
        mov     eax,dword ptr [eax+8] ds:0023:00000008=????????

not surprisingly
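
The faulting address in the disassembly above follows from struct layout: with `ops` null, loading `hashKey` reads from address zero plus the offset of `hashKey` inside the ops struct. Below is a minimal sketch of that arithmetic. The structs are simplified stand-ins, not the real pldhash.h declarations, and the assumption that `hashKey` is the third pointer-sized member is inferred only from the `[eax+8]` operand on 32-bit x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Simplified stand-ins for illustration only. The members that precede
// hashKey are assumptions, not copied from pldhash.h.
struct SketchTable;
struct SketchOps {
    void *(*allocTable)(SketchTable *, std::size_t);
    void (*freeTable)(SketchTable *, void *);
    unsigned (*hashKey)(SketchTable *, const void *);
};
struct SketchTable {
    const SketchOps *ops;   // left null when initialization fails
};

// Address the CPU would try to read when evaluating table->ops->hashKey
// with ops == nullptr: the member's offset within SketchOps.
std::size_t FaultAddressWithNullOps()
{
    // ops is the first member, so a null *table* would fault at 0 instead.
    assert(offsetof(SketchTable, ops) == 0);
    return offsetof(SketchOps, hashKey);
}

void PrintFaultAddress()
{
    std::printf("fault address: 0x%zx\n", FaultAddressWithNullOps());
}
```

Under these assumptions the offset is two pointer widths, which is 8 when pointers are 4 bytes, consistent with the reported crash address of 0x8.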

2444 nsCycleCollector::BeginCollection()
2449     GCGraphBuilder builder(mGraph, mRuntimes);
<the bug is here> nothing checks to see that builder's mPtrToNodeMap.ops is happy
2456             mRuntimes[i]->BeginCycleCollection(builder);

the code is from bug 378514, and still exists today, so while oren's stack is from 3.5.8 (1.9.1), i'm changing this to trunk.
Status: UNCONFIRMED → NEW
Component: General → XPCOM
Depends on: 378514
Ever confirmed: true
QA Contact: general → xpcom
Whiteboard: [needs a real stack trace from the reporter]
Version: 1.9.0 Branch → Trunk
Summary: Memory Usage Causes Sudden Crash [@ PL_DHashTableOperate | _MD_CURRENT_THREAD], (oom) → OOM crash [@ PL_DHashTableOperate | _MD_CURRENT_THREAD] in nsCycleCollector
Attached patch proposal
Assignee: nobody → timeless
Status: NEW → ASSIGNED
Attachment #428677 - Flags: review?(dbaron)
Attachment #428677 - Flags: review?(dbaron) → review+
Comment on attachment 428677 [details] [diff] [review]
proposal

>@@ -1395,6 +1395,7 @@ public:
> 
>     // nsCycleCollectionTraversalCallback methods.
>     NS_IMETHOD_(void) NoteXPCOMRoot(nsISupports *root);
>+    bool Initialized();
> 
> private:
>     NS_IMETHOD_(void) DescribeNode(CCNodeType type, nsrefcnt refCount,
>@@ -1498,6 +1499,11 @@ GCGraphBuilder::NoteXPCOMRoot(nsISupport
>     NoteRoot(nsIProgrammingLanguage::CPLUSPLUS, root, cp);
> }
> 
>+bool
>+GCGraphBuilder::Initialized()
>+{
>+    return !!mPtrToNodeMap.ops;
>+}
> 

Please move both the declaration and definition higher (probably to right after the destructor), since this method is not an nsCycleCollectionTraversalCallback method.


r=dbaron with that
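
The quoted hunks show only the new accessor, not the call site. As a rough sketch of how a caller might use such a check to bail out before the table is touched, the following self-contained example simulates an init failure. Every name here (`SketchTable`, `SketchTableInit`, `SketchBuilder`, `BeginCollectionSketch`, the `simulateOOM` flag) is hypothetical and stands in for the real cycle collector code, which is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical stand-ins for a hash table whose init can fail under OOM.
struct SketchOps { int unused; };
static const SketchOps kSketchOps = { 0 };

struct SketchTable {
    const SketchOps *ops;
    int *storage;
};

// Mimics an init routine that reports allocation failure by returning false.
static bool SketchTableInit(SketchTable *table, const SketchOps *ops,
                            std::size_t capacity, bool simulateOOM)
{
    table->ops = ops;
    table->storage = simulateOOM ? nullptr
                                 : new (std::nothrow) int[capacity]();
    return table->storage != nullptr;
}

class SketchBuilder {
public:
    explicit SketchBuilder(bool simulateOOM)
    {
        // Same convention as the quoted constructor: null ops marks failure.
        if (!SketchTableInit(&mMap, &kSketchOps, 16, simulateOOM))
            mMap.ops = nullptr;
    }
    ~SketchBuilder() { delete[] mMap.storage; }   // delete[] on null is a no-op

    bool Initialized() const { return !!mMap.ops; }

private:
    SketchTable mMap;
};

// The caller checks the builder before using it instead of crashing later.
bool BeginCollectionSketch(bool simulateOOM)
{
    SketchBuilder builder(simulateOOM);
    if (!builder.Initialized())
        return false;   // bail out; nothing dereferences the null ops
    // ... graph building would happen here ...
    return true;
}
```

Calling `BeginCollectionSketch(true)` takes the early-return path, while `BeginCollectionSketch(false)` proceeds normally. Whether the real fix returns early, skips the collection, or reports an error is not stated in the quoted diff.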
http://hg.mozilla.org/mozilla-central/rev/3874a469cf09

we probably want this for branches...
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Blocks: 624268
blocking1.9.2: --- → .18+
Crash Signature: [@ PL_DHashTableOperate | _MD_CURRENT_THREAD]
Comment on attachment 428677 [details] [diff] [review]
proposal

Approved for 1.9.2.18, a=dveditz for release-drivers
Attachment #428677 - Flags: approval1.9.2.18+
blocking1.9.2: .18+ → .19+
Comment on attachment 428677 [details] [diff] [review]
proposal

Didn't make 3.6.18
Attachment #428677 - Flags: approval1.9.2.19+
Attachment #428677 - Flags: approval1.9.2.18-
Attachment #428677 - Flags: approval1.9.2.18+
We just fixed this old bug on 1.9.2 but did it ever have clear steps to reproduce?
Whiteboard: [qa-examined-192] [qa-needs-STR]
The reported issue (with the steps to reproduce in the description) is not reproducible on the 3.6.20 build:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.20) Gecko/20110803 Firefox/3.6.20
Setting this as Verified. If there are other (more clear) steps to reproduce, please post them in order to have a double check on this issue. Thanks
Status: RESOLVED → VERIFIED
Whiteboard: [qa-examined-192] [qa-needs-STR] → [qa-examined-192] [qa-needs-STR][verified1.9.2]
Keywords: verified1.9.2
Whiteboard: [qa-examined-192] [qa-needs-STR][verified1.9.2] → [qa-examined-192] [qa-needs-STR]