Closed Bug 470500 Opened 12 years ago Closed 12 years ago

Firefox 3.1b2 Crash Report [@ nssutil3.dll@0x34c0 ]

Categories

(NSS :: Libraries, defect, P2)

3.12.2
x86
All
defect

Tracking

(Not tracked)

RESOLVED FIXED
3.12.4

People

(Reporter: chofmann, Assigned: nelson)

Details

(Keywords: regression, topcrash)

Crash Data

Attachments

(2 files)

This crash is currently ranked #9 on Firefox 3.1b2 -- appears to be a new regression

0  	nssutil3.dll  	nssutil3.dll@0x34c0  	
1 	nss3.dll 	nss3.dll@0x2551c 	
2 	nss3.dll 	nss3.dll@0x2565a 	
3 	nss3.dll 	nss3.dll@0x999a 	
4 	xul.dll 	NS_InvokeByIndex_P 	xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp:101
5 	xul.dll 	XPCWrappedNative::CallMethod 	js/src/xpconnect/src/xpcwrappednative.cpp:2422 

Not much to go on but comments indicate:

* Every time I close the program, the Crash Reporter appears, but there are no other indications of an issue.

* Updated Session Manager add-on and crashed on restart

* brought computer out of stand by

Any easy way to tell if this is in NSS or the calling code?
Flags: blocking1.9.1?
This would all be much easier if Firefox's builds of NSS kept the symbols.
Depends on: 458553
Chris,
I suspect this is yet another crash due to using NSS while it is uninitialized
(e.g. before it is initialized or after it has been shut down).
This looks a LOT like the stack shown in Bug 465974.
That bug was exacerbated by the fix for Bug 462806, which caused a lot of
code to use NSS without initializing it.  
But since this bug claims to happen at shutdown, I'm thinking it's more 
likely to be related to Bug 427715 or Bug 450468.
Benjamin: is this maybe a straight dupe of bug 427715?

There's not a lot to go on here to make this a blocker, especially since it's at shutdown. Please renominate if the numbers continue to climb, though.
Flags: blocking1.9.1? → blocking1.9.1-
Attached file nssutil3.dll-crashes
More comments from recent 3.5 b4 users. This has now moved up to the #4 top crash.

Users also seem to be in low-memory conditions when they hit this signature. Many are running Facebook apps and viewing pages there; a few other sites are also listed in the attachment. Maybe the shutdown problems also occur under low memory...
I agree with Nelson that there should be debug symbols in your NSS builds. At the minimum, we need to know the names of the entry point(s) in nss3.dll and nssutil3.dll that are being suspected of causing a problem here.
julien: it'd be great if your team fixed NSS so that debug symbols are built by default.
Did something change? We have symbols on the 1.9.0 branch for Windows, but apparently not 1.9.1.

1.9.1 mozconfig:
http://hg.mozilla.org/build/buildbot-configs/file/d943ec01e814/mozilla2/win32/mozilla-1.9.1/release/mozconfig (note it sets MOZ_DEBUG_SYMBOLS=1)
1.9.0 mozconfig:
http://mxr.mozilla.org/mozilla/source/tools/tinderbox-configs/firefox/win32/mozconfig (I believe the tinderbox client script sets MOZ_DEBUG_SYMBOLS=1 here)

I can't tell that we're doing anything different from our end. Did something in NSS change to break this?
dunno, we haven't had symbols in nss for months now. it's really bad. i've been complaining and been ignored for a while.
I filed bug 468701 a while ago, but I thought it was only for Linux/Mac.
*shrug*, we've been w/o coverage on all platforms for a while.
timeless,

Re: comment 6, I wasn't aware of this problem until now, and I don't think there is an NSS bug filed. Most NSS developers don't build the browser, only NSS standalone. I'm not sure that it is an NSS problem. Debug symbols are built with debug builds of NSS by default. I think if you set MOZ_DEBUG_SYMBOLS=1, you can get them for optimized builds as well, at least for Windows. I'm not sure about other platforms.
I'm not sure this is an NSS bug AT ALL.  

Stepping back from issues about NSS symbols, I think there's a more fundamental
question to be asked, which is: 

    How & why did NS_InvokeByIndex_P call any NSS code at all?  

NS_InvokeByIndex_P is a function that enables JavaScript code to call C++ 
methods on C++ objects.  It finds the vTable for the object, and finds the 
entry point in that vTable using an index, then calls that vTable entry point.

But NSS is not written in C++.  It has no C++ classes and no vtables.  So, it
seems unlikely that NS_InvokeByIndex_P could have legitimately called any NSS
function at all.

I suspect this is a case of a call through a "wild pointer".  

It's conceivable to me that someone has cobbled together some structures to 
look like a C++ object and C++ vTable, and has put the address of an NSS 
function in that table.  If that was done, it was not done in NSS.  It's 
not clear to me how that would work, since no NSS function expects a "this"
pointer.  

So, I think any further effort asking "why did NSS crash" will be fruitless.
It's not surprising that NSS would crash if there was a wild call into it.
A better question is: how/why does that wild call occur?
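
To illustrate the point about vtables and "this" pointers, here is a minimal sketch in C of what an index-based call through a method table looks like. All names (`struct object`, `invoke_by_index`, and so on) are hypothetical stand-ins, not XPCOM or NSS code; the real NS_InvokeByIndex_P is platform-specific and handles arbitrary argument lists.

```c
#include <stddef.h>

/* Hypothetical stand-ins: none of these names come from XPCOM or NSS. */
typedef int (*method_fn)(void *self, int arg);

struct object {
    const method_fn *vtable; /* first word of the object points at the method table */
    int value;
};

/* Two "methods". Each receives the object itself as its first argument. */
int get_value(void *self, int arg)
{
    (void)arg;
    return ((struct object *)self)->value;
}

int add_value(void *self, int arg)
{
    return ((struct object *)self)->value + arg;
}

const method_fn object_vtable[] = { get_value, add_value };

/* Look up the method stored at `index` and call it with the object as `self`.
 * No bounds check is done here, so a bad index or a corrupted table pointer
 * turns this into a call through a wild pointer. */
int invoke_by_index(struct object *obj, size_t index, int arg)
{
    return obj->vtable[index](obj, arg);
}
```

Every slot in such a table receives the object as its first argument. A plain C function placed in a slot would misread that pointer as its own first parameter, which is why a legitimate call from this path straight into NSS seems unlikely.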
In reply to comment 11:
Julien, there are at least two bugs on file about the fact that, when NSS is 
built as part of Firefox, it is built without any symbols.  Two such bugs are
cited in previous comments in this bug.  But let's let that issue be 
addressed and resolved in those bugs, not in this one.
NS_InvokeByIndex is mostly a placeholder. Without symbols for NSS, the debuggers have no clue what the optimizers have done to NSS or the callees, and can't guess where the functions are in the call stack. I think that, because of the sheer size of an InvokeByIndex frame, it might be easier for the debuggers to find those frames.

But really, until someone gets us symbols for nss, we can't do anything.
Timeless,  When I build NSS with NSS Makefiles on Windows, I get symbols.

Mozilla builds NSS differently than the NSS team does.  Mozilla builds 
override numerous NSS Makefile variables.  Any differences between the NSS 
team's builds and Mozilla's builds of NSS are Mozilla's responsibility.  
It is likely that Mozilla's builds either
a) do not define MOZ_DEBUG_SYMBOLS , or
b) override the definition of OPTIMIZER and/or CFLAGS and/or NOMD_CFLAGS
There are a handful of different signatures with longer stacks than the one in comment 0:

http://crash-stats.mozilla.com/report/index/010382ab-f863-4e76-802b-a1bb72090513

0  	nssutil3.dll  	nssutil3.dll@0x34c0  	
1 	nss3.dll 	nss3.dll@0x25be6 	
2 	nss3.dll 	nss3.dll@0x25d24 	
3 	nss3.dll 	nss3.dll@0x99c1 	
4 	xul.dll 	XPCWrappedNative::CallMethod 	js/src/xpconnect/src/xpcwrappednative.cpp:2450
5 	xul.dll 	XPC_WN_CallMethod 	js/src/xpconnect/src/xpcwrappednativejsops.cpp:1583
6 	js3250.dll 	js_Invoke 	js/src/jsinterp.cpp:1365
7 	js3250.dll 	js_Interpret 	js/src/jsinterp.cpp:5132
8 	js3250.dll 	js_Invoke 	js/src/jsinterp.cpp:1373
9 	js3250.dll 	js_fun_call 	js/src/jsfun.cpp:1688
10 	js3250.dll 	js_Interpret 	js/src/jsinterp.cpp:5100
11 	js3250.dll 	js_Invoke 	js/src/jsinterp.cpp:1373
12 	xul.dll 	nsXPCWrappedJSClass::CallMethod 	js/src/xpconnect/src/xpcwrappedjsclass.cpp:1614
13 	xul.dll 	nsXPCWrappedJS::CallMethod 	js/src/xpconnect/src/xpcwrappedjs.cpp:561
14 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:114
15 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/src/md/win32/xptcstubs.cpp:141
16 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:465
17 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:512
Nelson,

Re: comment 13, I looked at every bug referenced in this one, and didn't find any filed against NSS that complains about debug symbols. The only other one that is about missing NSS debug symbols is bug 468701, but it's filed against product "Core", not against NSS. If there is an NSS bug for missing debug symbols, we should make it a blocker for this one.
Flags: wanted1.9.1?
Julien, The two bugs cited above are bug 458553 and bug 468701.  One of them 
already blocks this bug.  One of them is probably a dup of the other. 
Neither is an NSS bug because this bug is not the fault of any NSS file.
See comment 15.
(In reply to comment #14)
> But really, until someone gets us symbols for nss, we can't do anything.

I believe that the patch I just attached to bug 458553 (attachment 379056) should help in debugging these crashes. Is there any chance of getting it landed for RC1, or is it too late for that?
Now that symbols are available at least for Windows (after the resolution of bug 458553), could nssutil3.dll@0x34c0 actually mean NSSRWLock_LockRead_Util? See e.g.

http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5b99&platform=windows&query_search=signature&query_type=exact&query=NSSRWLock_LockRead_Util&date=&range_value=1&range_unit=weeks&do_query=1&signature=NSSRWLock_LockRead_Util

(from the preview of Firefox 3.5, crashes within the last week, Windows only)
Reminiscent of bug 427715, if so.
The lock is one that is created at NSS initialization time. If the call to NSSRWLock_LockRead crashes, that means the lock is not there.
This is most likely because NSS has already been shut down, or not initialized yet. I recommend this bug be moved to PSM. It should not call HASH_Create without NSS being initialized.
Given those crash reports, probably a straight dupe of bug 427715 (which needs to be reopened, I guess)
Here is a URL that fetches a table of crash reports that all bear the same
"signature" as the one reported for this bug.
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.6a1pre&query_search=signature&query_type=exact&query=&date=&range_value=4&range_unit=weeks&do_query=1&signature=nssutil3.dll%400x34c0

An examination of that table reveals that not all the stacks are alike.
There are several very distinct stacks in that table. Consider these
different and unique stacks:

http://crash-stats.mozilla.com/report/index/0ce72a70-0c85-4d43-b24e-f76382090602
http://crash-stats.mozilla.com/report/index/37751c81-d531-483f-af4f-1c4732090528
http://crash-stats.mozilla.com/report/index/a819f4df-3c4f-48fd-890a-b69802090524
http://crash-stats.mozilla.com/report/index/323e8b84-0a1c-42c6-b360-673f22090527
http://crash-stats.mozilla.com/report/index/05142ce2-0a5d-47a6-b120-f544b2090521

I wouldn't assume that they all have the same cause (although, they might). 
If indeed it is the case in each and all of these that NSS has been invoked 
while it is not initialized, then these 5 stacks show 5 different bugs that all 
need to be fixed.  It's unfortunate that crash-stats lumps them all together.
Bob, please review.

In the past, the NSS team's position has been that all crashes that occur in
NSS as a result of calling NSS while NSS is in an uninitialized state were,
by definition, not the fault of NSS.  We also argued that eliminating those
crashes would, in most cases, merely delay the failure and/or conceal the 
true cause of the problem.  

But clearly the browser folks cannot rid themselves of this fault, and there
are certain very common cases that we can easily detect and avoid.  So, this
patch attempts to do the very thing we have avoided doing, for at least some
(by no means all) common cases. 

If we put this patch in, I predict that the browser will begin to experience
all sorts of new failures, failures that formerly crashed.  But in some of 
those cases, the failure will be in non-NSS code, and so mozilla won't blame
NSS for those failures.
Assignee: nobody → nelson
Status: NEW → ASSIGNED
Attachment #383205 - Flags: review?(rrelyea)
Priority: -- → P2
Target Milestone: --- → 3.12.4
Version: unspecified → 3.12.2
Comment on attachment 383205
Patch v1 for NSS Trunk (untested)

r+ rrelyea

These don't hurt.

I'm almost tempted to Assert ("You haven't Initialized NSS yet!!!"), but the diplomatic return error is probably sufficient.

bob
Attachment #383205 - Flags: review?(rrelyea) → review+
Nelson,

Re: comment 25, just returning an error as in your patch is likely to move failures to some other place. However they are likely to still manifest themselves in usages of NSS. I'm not sure how much time we should spend on this. I don't see any alternative to the browser fixing those issues to resolve the problem.

If we are going to spend time to return errors in optimized builds, I think we probably should assert for debug builds too.

I also think there is a much more systematic way to go about this than trying to figure out the failure relatively "late" as in your patch, i.e. when a global structure is missing:

if (!NSS_IsInitialized()) {
    PORT_SetError(SEC_ERROR_NOT_INITIALIZED);
    PORT_Assert(0);
    return SECFailure; /* or whatever other error code is appropriate */
}

We can turn this into a macro that takes the correct return value as an argument, and insert a call to this macro at the top of most entry points in libnss/libssl/libsmime that require NSS to be initialized before they are called. There are a few exceptions, like the SSL session cache init functions, but we should be able to list them and leave the macro out of those.

Of course, even those macros wouldn't completely take care of the problem, if there is a race condition - one thread shuts down NSS, while another thread was in the middle of executing an NSS function. But they still would probably go a long way.
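
A minimal sketch of how such a macro might look, assuming stand-in names throughout: `sketch_is_initialized`, `sketch_set_error`, `SKETCH_REQUIRE_INIT` and the status values are hypothetical and exist only so the example compiles without the NSS headers. It is not taken from any NSS patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch compiles without NSS headers. */
typedef enum { SKETCH_FAILURE = -1, SKETCH_SUCCESS = 0 } sketch_status;
enum { SKETCH_ERROR_NOT_INITIALIZED = 1 };

int sketch_initialized = 0; /* set once the library has been initialized */
int sketch_last_error = 0;  /* stands in for a per-thread error code */

int sketch_is_initialized(void) { return sketch_initialized; }
void sketch_set_error(int code) { sketch_last_error = code; }

/* Assert only in debug builds; optimized builds just return the error. */
#ifdef DEBUG
#define SKETCH_ASSERT(cond) assert(cond)
#else
#define SKETCH_ASSERT(cond) ((void)0)
#endif

/* Guard for the top of an entry point. `retval` is whatever that entry
 * point returns on failure, so one macro serves every return type. */
#define SKETCH_REQUIRE_INIT(retval)                         \
    do {                                                    \
        if (!sketch_is_initialized()) {                     \
            sketch_set_error(SKETCH_ERROR_NOT_INITIALIZED); \
            SKETCH_ASSERT(0);                               \
            return (retval);                                \
        }                                                   \
    } while (0)

/* An entry point that reports failure through a status code. */
sketch_status sketch_do_work(void)
{
    SKETCH_REQUIRE_INIT(SKETCH_FAILURE);
    return SKETCH_SUCCESS;
}

/* An entry point that reports failure by returning NULL. */
const char *sketch_get_name(void)
{
    SKETCH_REQUIRE_INIT(NULL);
    return "token";
}
```

The `do { ... } while (0)` wrapper keeps the macro safe to use as a single statement. As noted above, a check like this does nothing about a shutdown racing with a call that has already passed the guard.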
julien: what's the point, we already recognize this crash as meaning nss isn't initialized, it's been years of crashing like this, it hasn't gotten fixed :)
timeless,

It can't be fixed by changes in NSS. The best NSS can do is give earlier warnings/errors/asserts when this erroneous situation occurs. NSS cannot implicitly initialize like NSPR.
Checking in pk11auth.c; new revision: 1.10; previous revision: 1.9
Checking in pk11slot.c; new revision: 1.98; previous revision: 1.97
Checking in pk11util.c; new revision: 1.54; previous revision: 1.53

These changes won't fix ALL possible crashes that are due to using NSS 
while it is uninitialized, but they will detect and avoid the ones reported
in this bug.
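
For readers unfamiliar with this kind of "late" check, here is a rough sketch of the general pattern, assuming a global lock that exists only between initialization and shutdown. The names (`demo_module_lock`, `demo_find_slot`, and the rest) are hypothetical and the code is not the checked-in patch; it only illustrates returning an error instead of dereferencing a missing lock.

```c
#include <stddef.h>

/* Hypothetical stand-ins: not the NSS data structures. */
struct demo_lock { int readers; };

enum { DEMO_ERROR_NOT_INITIALIZED = 1 };

struct demo_lock demo_lock_storage;
struct demo_lock *demo_module_lock = NULL; /* created at init, gone after shutdown */
int demo_last_error = 0;

void demo_init(void)
{
    demo_lock_storage.readers = 0;
    demo_module_lock = &demo_lock_storage;
}

void demo_shutdown(void)
{
    demo_module_lock = NULL;
}

/* Returns 0 on success, or -1 if the library state is missing. */
int demo_find_slot(void)
{
    struct demo_lock *lock = demo_module_lock;

    if (lock == NULL) {
        /* Library not initialized, or already shut down: report an
         * error rather than crash inside the lock routine. */
        demo_last_error = DEMO_ERROR_NOT_INITIALIZED;
        return -1;
    }

    lock->readers++; /* stands in for taking the read lock */
    /* ... the real work would walk the module list here ... */
    lock->readers--; /* stands in for releasing it */
    return 0;
}
```

A check like this only covers the entry points that happen to touch the missing structure, which is why it cannot catch every use of an uninitialized library.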
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Flags: wanted1.9.1?
Flags: wanted1.9.1.x?
Crash Signature: [@ nssutil3.dll@0x34c0 ]