Bug 772226
Opened 12 years ago
Closed 12 years ago
Assertion failure: lock != NULL, at prulock.c
Categories
(NSPR :: NSPR, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: KaiE, Assigned: wtc)
Attachments
(1 file)
1.01 KB,
patch
Since around 2012/07/05 09:46:02 we have had a new failure on the Windows XP machine buildnss03. Most failures are preceded by:
Assertion failure: lock != NULL, at e:/mozilla/security/tinderlight/data/buildnss03_trunk_64_DBG/mozilla/nsprpub/pr/src/threads/combined/prulock.c:198
Example logfile: http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1341848422.1341855358.22814.gz&fulltext=1
Assignee
Comment 1•12 years ago
That assertion failure means some NSS (or NSPR) code is passing a NULL 'lock' argument to PR_Lock: http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/src/threads/combined/prulock.c&rev=3.15&mark=189,198#186 We should track this down. Kai, could you get the call stack of the assertion failure? This requires logging into the VM and running the modutil or certutil command manually. You may need to run the command in a debugger.
Reporter
Comment 2•12 years ago
I'm having trouble reproducing this error when running the commands manually. I saw a different error: "Failed to add module ... unknown pkcs#11 error". Because my command appeared to be fine, I started to experiment. One of the experiments made the command work: I copied the dll to a different, shorter path. Is it possible that we have a limit on the path length? A path of 125 characters fails. A libfile path of 88 characters works... Currently the tests are being run from the path /tinderbox/mozilla/security/tinderlight. I'm considering moving that directory to a shorter path, letting the tinderbox test script run, and seeing what happens...
Reporter
Comment 3•12 years ago
Moved to /tinderbox/tinderlight ... let's wait for the next cycles of buildnss03
Assignee
Comment 4•12 years ago
Kai: thank you for trying to track this down. I suspect the PR_Lock(null) error occurred during NSS_Shutdown. Perhaps NSS destroyed some lock and then later tried to acquire the lock. Just a wild guess.
Reporter
Comment 5•12 years ago
I'm now able to reproduce the assertion when running modutil in the Windows debugger. While I'm still confused, because I cannot reproduce the behaviour seen during the test run, I was able to reproduce the assertion using a different approach. I get it when running modutil without any arguments at all, in other words, when modutil simply prints the usage output and exits. But there appears to be a race. In most scenarios the assertion warning is printed, but sometimes it's not.
Wan-Teh, yes, it's true that someone called PR_Lock with null. I can see that modutil has two active threads. (I don't know why modutil would start a second thread when it simply prints the usage information.) The assertion happens on the secondary thread. It appears to be some kind of automatic thread. Thread name is _threadstartex. The stack is:
msvcr90.dll
- threadstartex
- callthreadstartex
libnspr4.dll
- pr_root
- _PR_nativeRunThread line 391 (just after comment "add to list of active threads")
- PR_Lock(0)
- PR_Assert
Reporter
Comment 6•12 years ago
Stack of the main thread:
modutil.exe
- main
libnspr4.dll
- PR_Cleanup line 429
- _PR_CleanupBeforeExit line 313
- _PR_MD_CLEANUP_BEFORE_EXIT line 109
Reporter
Comment 7•12 years ago
I see that _pr_activeLock gets destroyed and set to null in _PR_CleanupThreads. I restarted and set a breakpoint in _PR_CleanupThreads. That breakpoint gets hit first, called from PR_Cleanup line 422. At this time, the main thread is the only thread. I set a breakpoint on each line in PR_Cleanup after 422. Each time the debugger stopped, up to line 429, there was just one single thread. The additional thread gets created inside _PR_CleanupBeforeExit, at the time it calls WSACleanup.
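The ordering described in this comment can be replayed in a small, single-threaded C sketch. It is only an illustration: `demo_lock`, `active_lock`, and the `demo_*` functions are hypothetical stand-ins, not NSPR code, and the real failure involves a second thread rather than sequential calls. The sketch shows why a thread that starts after the global lock has been destroyed and set to null ends up with a null pointer when it tries to take that lock.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock object; this is not NSPR's PRLock. */
typedef struct { int held; } demo_lock;

/* Hypothetical stand-in for the global "active threads" lock. */
static demo_lock *active_lock = NULL;

/* Runtime start-up: create the global lock. */
static void demo_init(void) {
    active_lock = calloc(1, sizeof *active_lock);
}

/* Runtime teardown: destroy the lock and clear the pointer. */
static void demo_cleanup_threads(void) {
    free(active_lock);
    active_lock = NULL;
}

/* First thing a newly started thread does: register itself under the lock.
 * Returns 0 on success, -1 if the lock is already gone. */
static int demo_thread_start(void) {
    if (active_lock == NULL) {
        return -1;  /* the situation the assertion reports */
    }
    active_lock->held = 1;
    /* ... add the thread to the list of active threads ... */
    active_lock->held = 0;
    return 0;
}

/* Replays the order: init, thread start, cleanup, late thread start.
 * Returns 1 if the early start succeeded and the late one found no lock. */
static int demo_late_thread_after_cleanup(void) {
    demo_init();
    int before = demo_thread_start();
    demo_cleanup_threads();
    int after = demo_thread_start();
    return (before == 0 && after == -1);
}
```

In this sketch the late start simply returns an error. A debug build that asserts on the lock pointer would stop at that point instead, which corresponds to the message reported in this bug.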
Reporter
Comment 8•12 years ago
The startFunc of the secondary thread is: ContinuationThread
Reporter
Comment 9•12 years ago
I hope these details help you understand what's wrong. Please let me know if you need further information. I have stopped the continuous building/testing on buildnss03 and will wait for your feedback.
Assignee
Comment 10•12 years ago
Kai: thank you for the info. I can understand the problem you described. However, it is not clear whether it is the same PR_Lock(0) call observed in a normal test run on that tinderbox. I noticed that this is the "WINNT" build configuration, which is why I lowered the priority and severity of this bug. The "ContinuationThread" is an internal thread created by NSPR for UDP support. Since NSS doesn't have any tests that use UDP yet, we can comment out the ContinuationThread as an experiment. If the assertion regularly fails on the tinderbox buildnss03, please apply this patch to the NSS source tree locally. If the assertion failure is gone, this will prove that the assertion failure is related to the ContinuationThread. Thanks.
Assignee
Comment 11•12 years ago
Comment on attachment 645085
Patch for debugging: comment out ContinuationThread
Kai: I have trouble testing this patch at work. I will test this patch at home tonight. Please do not test this patch until I have tested it.
Reporter
Comment 12•12 years ago
Wan-Teh, I guess you didn't find time since you wrote comment 11 (2 months ago). (The affected Windows buildnss03 machine has been deactivated since that time.) How should we proceed?
Reporter
Comment 13•12 years ago
(In reply to Wan-Teh Chang from comment #10)
> I noticed that this is the "WINNT" build configuration, which is
> why I lowered the priority and severity of this bug.
I don't understand. Why is the WINNT configuration a low priority? The WINNT build configuration appears to be the default one! We're using it on all the currently active Windows NSS build machines.
Reporter
Comment 14•12 years ago
changing component to NSPR
Assignee: nobody → wtc
Component: Test → NSPR
Product: NSS → NSPR
Version: 3.14 → 4.9.2
Reporter
Updated•12 years ago
Whiteboard: [waiting for wtc, comment 11]
Reporter
Comment 15•12 years ago
I would also like to clarify that all Windows XP testing on buildnss03 has been sitting idle for 3 months because of comment 11. I just re-enabled this build machine, but as expected it still shows this bug. Wan-Teh, I'm not sure why you asked me to wait for you to test this locally. If you want, just let me know, and I can test this patch on buildnss03 (as you proposed in comment 10, before you asked me to wait for you).
Reporter
Comment 16•12 years ago
Wan-Teh clarified that Mozilla is using OS_TARGET=WIN95. I will have buildnss03 do one more cycle with WINNT and the debug patch applied, to see what we get. Afterwards, I'll change buildnss03 to use WIN95.
Reporter
Comment 17•12 years ago
It seems like the patch didn't help? As you can see in this build log http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1351272886.1351284204.11257.gz&fulltext=1 the tree contains a patch for ntio.c but we still get the assertions. Anyway. It's time to switch buildnss03 to WIN95 now.
Reporter
Comment 18•12 years ago
We don't get this failure on buildnss03. Since Wan-Teh clarified that the WINNT configuration is outdated, I have no interest in tracking this bug any more. I suggest WONTFIX.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
Reporter
Updated•12 years ago
Whiteboard: [waiting for wtc, comment 11]
Comment 19•11 years ago
Hi Kai, Wan-Teh, we are also getting this same error message on Windows 2003 and 2008 (32 bits). But the error is thrown only with debug bits. The same problem does not occur with an optimized build. Do you have any update on this issue? Is there any workaround/solution to this problem?
Comment 20•11 years ago
We are building NSPR 4.9.5 with NSS 3.14.3
Comment 21•11 years ago
> We are also getting this same error message on Windows 2003 and 2008 (32
> bits). But the error is thrown only with debug bits. Same problem does not
> occur with optimized build.
> Do you have any update on this issue? Is there any workaround/solution to
> this problem?
This problem will never occur in any optimized build because assertions are disabled in optimized builds.
Are you using OS_TARGET=WIN95 or OS_TARGET=WINNT?
Please attach a stack trace to this bug.
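As a rough illustration of why the message only appears in debug builds, here is a sketch of the usual debug-only assertion pattern in C. `DEMO_ASSERT`, `DEMO_DEBUG`, `demo_assert_fail`, and `demo_lock` are made-up names for this example. The explicit null check after the macro exists only to keep the sketch well-defined; it is not a claim about what PR_Lock does in an optimized build.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro: active only when DEMO_DEBUG is defined at compile
 * time, and compiled away to a no-op otherwise. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT(expr) \
    ((expr) ? (void)0 : demo_assert_fail(#expr, __FILE__, __LINE__))
#else
#define DEMO_ASSERT(expr) ((void)0)
#endif

/* Prints a message in the same style as the one in this bug, then aborts. */
void demo_assert_fail(const char *expr, const char *file, int line) {
    fprintf(stderr, "Assertion failure: %s, at %s:%d\n", expr, file, line);
    abort();
}

/* Hypothetical lock routine.
 * Returns 1 if the lock was "taken", 0 if the pointer was null. */
int demo_lock(int *lock) {
    DEMO_ASSERT(lock != NULL);  /* fires only in a DEMO_DEBUG build */
    if (lock == NULL) {
        return 0;               /* sketch-only guard, see the note above */
    }
    *lock = 1;
    return 1;
}
```

Compiled without `-DDEMO_DEBUG`, `demo_lock(NULL)` returns 0 and prints nothing. Compiled with it, the same call prints the assertion message and aborts. The underlying bad call is present in both builds; only its visibility differs.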
Comment 22•11 years ago
We are using OS_TARGET=WINNT. I will get back to you with a stack trace of the failure.
Comment 23•11 years ago
Please use OS_TARGET=WIN95.