Closed Bug 590512 Opened 12 years ago Closed 11 years ago

Firefox 4.0b Crash [@ PL_DHashTableOperate | nsPrefBranch::GetBoolPref(char const*, int*) ]

Categories

(Core :: Preferences: Backend, defect)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- final+

People

(Reporter: chofmann, Assigned: dwitte)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 1 obsolete file)

A start up crash rising in volume in beta4.  looks like its mostly windows7.

checking --- PL_DHashTableOperate...nsPrefBranch::GetBoolP 20100824-crashdata.csv
found in: 4.0b4 4.0b3
release total-crashes
              PL_DHashTableOperate...nsPrefBranch::GetBoolP crashes
                         pct.
all     283844  38      0.000133876
4.0b4   2929    24      0.00819392
4.0b3   17067   14      0.000820296

os breakdown
PL_DHashTableOperate...nsPrefBranch::GetBoolPTotal 38
Win5.1  0.11
Win6.0  0.03
Win6.1  0.87


Its possible this is associated with the inability to restart and results in repeated crashes.

user comments:

multiplied appearance of "mozilla crash reporters"

won't let me restart. 

several comments about gmail notifier.

Installed the gmail notifier, using the compatibility check addons. After the restart, I get this crash every time.

I just installed Greasemeonkey and Minefield crashed. Go figure.

Stack looks like

http://crash-stats.mozilla.com/report/index/ceb76946-636b-4dc7-8128-e7deb2100825

Frame  	Module  	Signature [Expand]  	Source
0 	xul.dll 	PL_DHashTableOperate 	obj-firefox/xpcom/build/pldhash.c:615
1 	xul.dll 	nsPrefBranch::GetBoolPref 	modules/libpref/src/nsPrefBranch.cpp:194
2 	xul.dll 	nsPrefService::GetBoolPref 	modules/libpref/src/nsPrefService.h:60
3 	xul.dll 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:102
4 	xul.dll 	js::InvokeCommon<int > 	js/src/jsinterp.cpp:561
5 	xul.dll 	js::Invoke 	js/src/jsinterp.cpp:694
6 	xul.dll 	js::Interpret 	js/src/jsinterp.cpp:4710
7 	xul.dll 	js::Execute 	js/src/jsinterp.cpp:886
8 	xul.dll 	JS_ExecuteScript 	js/src/jsapi.cpp:4754
9 	xul.dll 	nsJSContext::ExecuteScript 	dom/base/nsJSEnvironment.cpp:1975
10 	xul.dll 	nsXULDocument::ExecuteScript 	content/xul/document/src/nsXULDocument.cpp:3620
11 	xul.dll 	nsXULDocument::ExecuteScript 	content/xul/document/src/nsXULDocument.cpp:3643
12 	xul.dll 	nsXULDocument::OnStreamComplete 	content/xul/document/src/nsXULDocument.cpp:3511
13 	xul.dll 	nsStreamLoader::OnStopRequest 	netwerk/base/src/nsStreamLoader.cpp:125
14 	xul.dll 	nsJARChannel::OnStopRequest 	modules/libjar/nsJARChannel.cpp:874
15 	xul.dll 	nsInputStreamPump::OnStateStop 	netwerk/base/src/nsInputStreamPump.cpp:578
16 	xul.dll 	nsInputStreamPump::OnInputStreamReady 	netwerk/base/src/nsInputStreamPump.cpp:403
17 	xul.dll 	nsInputStreamReadyEvent::Run 	xpcom/io/nsStreamUtils.cpp:112
18 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:547
19 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:142
20 	xul.dll 	xul.dll@0xb7a45b 	
21 	xul.dll 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:219
22 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
23 	xul.dll 	_SEH_epilog4 	
24 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
25 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:175
26 	xul.dll 	nsAppShell::Run 	widget/src/windows/nsAppShell.cpp:243
27 		@0x77b5ffff 	
28 	xul.dll 	nsCaret::GetCaretRect 	layout/base/nsCaret.h:151
29 	advapi32.dll 	PcwCollectData 	
30 	xul.dll 	nsHTMLOptionElement::QueryInterface 	
31 		@0x7595ffff

more reports at

http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&signature=PL_DHashTableOperate%20|%20nsPrefBranch%3A%3AGetBoolPref%28char%20const*%2C%20int*%29

also possible that addon compat is mostly turned off 

addons_checked
   2 checked
  36 [unknown]
blocking2.0: --- → ?
Keywords: crash, regression
OS: Mac OS X → Windows 7
Blocking based on regression, we can unblock if this magically goes away.
Assignee: nobody → dwitte
blocking2.0: ? → final+
maybe something to do with migration.  we saw the orginal spike after beta 4 was released, then decline, then smaller another spike after beta 5 was released.

date     tl crashes at
         PL_DHashTableOperate...nsPrefBranch::GetBoolPref.char.const...int..
            count version, count version,...

20100821 16  4.0b3,
20100822 14  4.0b3,
20100823 56  ,31 4.0b3,15 4.0b4,9 4.0b5pre,1 4.0b2,

 -- beta 4 released

20100824 38  ,24 4.0b4,14 4.0b3,
20100825 100 ,89 4.0b4,8 4.0b5pre,2 4.0b3,1 4.0b4pre,
20100826 83  ,81 4.0b4,2 4.0b2,
20100827 40  4.0b4,
20100828 9   ,5 4.0b3,4 4.0b4,
20100829 44  ,33 4.0b4,11 4.0b3,
20100830 35  ,32 4.0b4,3 4.0b5pre,
20100831 25  ,21 4.0b4,4 4.0b5pre,
20100901 10  ,5 4.0b4,3 4.0b5pre,2 4.0b5,
20100902 17  ,15 4.0b4,2 4.0b6pre,
20100903 36  ,34 4.0b4,1 4.0b5pre,1 4.0b2,
20100904 16  ,10 4.0b4,4 4.0b6pre,2 4.0b2,
20100905 28  4.0b4
20100906 6   ,3 4.0b4,2 4.0b6pre,1 4.0b2,

  -- beta 5 released

20100907 45  ,22 4.0b5,21 4.0b4,1 4.0b6pre,1 4.0b5pre,
20100908 59  ,57 4.0b5,2 4.0b4,
20100909 26  ,21 4.0b5,4 4.0b6pre,1 4.0b2,
20100910 29  ,14 4.0b5,9 4.0b6pre,6 4.0b3,
20100911 1   4.0b5
20100912 8   4.0b5
20100913 24  4.0b5
20100914 4   ,2 4.0b7pre,1 4.0b5,1 4.0b4,
20100915 16  ,15 4.0b6,1 4.0b7pre,
on day of the last spike...

checking --- PL_DHashTableOperate...nsPrefBranch::GetBoolPref.char.const...int.. 20100908-crashdata.csv
found in: 4.0b5 4.0b4
release total-crashes
              PL_DHashTableOperate...nsPrefBranch::GetBoolPref.char.const...int.. crashes
                         pct.
all     312235  59      0.00018896
4.0b5   18501   57      0.00308091
4.0b4   20322   2       9.84155e-05

Correlation to startup or time of session
59 total crashes for PL_DHashTableOperate...nsPrefBranch::GetBoolPref.char.const...int.. on 20100908-crashdata.csv
44 startup crashes inside 2 sec.
57 startup crashes inside 30 sec.
59 startup crashes inside 3 min.

os breakdown
PL_DHashTableOperate...nsPrefBranch::GetBoolPref.char.const...int..Total 59
Win5.1  0.31
Win6.0  0.07
Win6.1  0.63
This isn't going away, we've got crashes every day from now back to 9/2. All with the same stack (i.e. from JS). Gmail Notifier is frequent among the reports, but not common to all. Based on the comments it's likely to be related.I recall seeing a posting in the newsgroups within the last week -- someone seeing NS_ERROR_OUT_OF_MEMORY being returned by a call to getBoolPref from script. One of the very few ways this can happen is if gHashTable.ops is NULL. Something's probably up with gHashTable.ops here too, so they could be related. I'll email the guy and try to get more details.
Severity: normal → critical
OK, newsgroup posting was a red herring. Going to look into what Gmail Notifier is up to...
the trend in commment 2 continued in beta6. as we see a decline in the growth of the beta population there seems to be a decline in the crash volume.

20100914 4

-- beta 6 released

20100915 16   33k adus
20100916 33  218k adus
20100917 7   337k adus
20100918 18  353k adus
20100919 18  408k adus
20100920 12  524k adus
20100921 8   557k adus
20100922 1   572k adus
Attached patch fix (obsolete) — Splinter Review
Apparently xpconnect is fine with converting 'undefined' to a null char* string; which kinda asplodes. (When we rejiggered prefbranch to remove security checks, a nullcheck got lost.)

This also removes a couple unnecessary checks on outparam pointers -- the only risk on those is C++ callers, which means they generally always come from the stack.

This patch fixes the crash for me, though Gmail Notifier is still broken somewhere else. But it doesn't crash. :)
Attachment #478391 - Flags: review?(doug.turner)
This is on top of the patch in bug 589905, btw.
using NS_ENSURE_ARG_POINTER will throw back to js a invalid pointer error.  maybe it is better to use NS_ERROR_INVALID_ARG instead for null pref names.

I do not understand why we want to remove the check for a non-null _retval in PrefHasUserValue?
Sure, I can switch to NS_ERROR_INVALID_ARG.

About _retval -- see comment 7.
right, i read 7.  still don't know why we want to review the code.
Because the way you'd typically call it is:

  PRBool ret;
  pb->PrefHasUserValue("blah", &ret);

which makes it impossible to have a null outparam pointer. And from JS, xpconnect will make it impossible.
Comment on attachment 478391 [details] [diff] [review]
fix


I understand that, what I don't have a grasp of is any other callers that might be passing around a bool* that gets nulled.

It just seems like removing something that has been around for a long time without data that proves that the above case never happens in our code or in any code that is built above preference is risky.

Also, tests?
Attached patch v2Splinter Review
Now with better errors and more tests.
Attachment #478391 - Attachment is obsolete: true
Attachment #478471 - Flags: review?(doug.turner)
Attachment #478391 - Flags: review?(doug.turner)
Attachment #478471 - Flags: review?(doug.turner) → review+
http://hg.mozilla.org/mozilla-central/rev/dc689419476d

Going to leave this open for a bit to see if the crashes disappear.
Er, wrong bug. This is definitely fixed. ;)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Duplicate of this bug: 602694
Crash Signature: [@ PL_DHashTableOperate | nsPrefBranch::GetBoolPref(char const*, int*) ]
You need to log in before you can comment on or make changes to this bug.