Assertion failure: lock != NULL, at prulock.c

RESOLVED WONTFIX

Status

NSPR
NSPR
--
major
RESOLVED WONTFIX
6 years ago
5 years ago

People

(Reporter: kaie, Assigned: Wan-Teh Chang)

Tracking

4.9.2
x86_64
Windows XP

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
Since around 2012/07/05 09:46:02 we have a new failure on Windows XP machine buildnss03.

Most failures are preceded by:

Assertion failure: lock != NULL, at e:/mozilla/security/tinderlight/data/buildnss03_trunk_64_DBG/mozilla/nsprpub/pr/src/threads/combined/prulock.c:198

Example logfile
http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1341848422.1341855358.22814.gz&fulltext=1
(Assignee)

Comment 1

6 years ago
That assertion failure means some NSS (or NSPR) code is passing a NULL
'lock' argument to PR_Lock:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/src/threads/combined/prulock.c&rev=3.15&mark=189,198#186

We should track this down.

Kai, could you get the call stack of the assertion failure?  This requires
logging into the VM and running the modutil or certutil command manually.
You may need to run the command in a debugger.
(Reporter)

Comment 2

6 years ago
I'm having trouble reproducing this error when running the commands manually.

I saw a different error "Failed to add module ... unknown pkcs#11 error".

Because my command appeared to be fine, I started to experiment. One of the experiments made the command work: I copied the DLL to a different, shorter path.

Is it possible that we have a limit for the path? A path of 125 characters fails. A libfile path of 88 characters works...

Currently the tests are being run from path /tinderbox/mozilla/security/tinderlight.

I'm considering moving that directory to a shorter path, letting the tinderbox test script run, and seeing what happens...
(Reporter)

Comment 3

6 years ago
Moved to /tinderbox/tinderlight ... let's wait for the next cycles of buildnss03
(Assignee)

Comment 4

6 years ago
Kai: thank you for trying to track this down.  I suspect the PR_Lock(null)
error occurred during NSS_Shutdown.  Perhaps NSS destroyed some lock and
then later tried to acquire the lock.  Just a wild guess.
(Reporter)

Comment 5

6 years ago
I'm now able to reproduce the assertion when running modutil in the Windows debugger.

While I'm still confused, because I cannot reproduce the behaviour seen during the test run, I was able to reproduce the assertion using a different approach. I get it when running modutil without any argument at all, in other words, if modutil simply prints the usage output and exits. But there appears to be a race. In most scenarios the assertion warning is printed, but sometimes it's not.

Wan-Teh, yes, it's true that someone called PR_Lock with null.

I can see that modutil has two active threads. (I don't know why modutil would start a second thread when it simply prints the usage information.)

The assertion happens on the secondary thread. It appears to be some kind of automatic thread. The thread name is _threadstartex. The stack is:
msvcr90.dll  - threadstartex
             - callthreadstartex
libnspr4.dll - pr_root
             - _PR_nativeRunThread - line 391
                     (just after comment "add to list of active threads")
             - PR_Lock(0)
             - PR_Assert
(Reporter)

Comment 6

6 years ago
stack of main thread:

modutil.exe  - main
libnspr4.dll - PR_Cleanup line 429
             - _PR_CleanupBeforeExit line 313
             - _PR_MD_CLEANUP_BEFORE_EXIT line 109
(Reporter)

Comment 7

6 years ago
I see that _pr_activeLock gets destroyed and set to null in _PR_CleanupThreads.

I restarted and set a breakpoint in _PR_CleanupThreads.

That breakpoint gets hit first, called from PR_Cleanup line 422.
At this time, the main thread is the only thread.

I set a breakpoint for each line in PR_Cleanup after 422.
Each time the debugger stopped, up to and including line 429, there was only a single thread.

The additional thread gets created inside _PR_CleanupBeforeExit,
at the time it calls WSACleanup.
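
The ordering described above can be sketched in a few lines of C. This is only an illustration of the sequence (lock destroyed and set to null first, a thread start-up path trying to acquire it afterwards), not NSPR code: every `demo_` name is a hypothetical stand-in, and the real thread and Winsock machinery is left out so that the sequence is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT NSPR's real types or functions. */
typedef struct { int held; } demo_lock;

/* Plays the role of _pr_activeLock. */
static demo_lock *active_lock = NULL;

/* Returns 0 on success, -1 if the lock pointer is NULL, which is the
 * condition the "lock != NULL" assertion guards against. */
static int demo_lock_acquire(demo_lock *lock)
{
    if (lock == NULL)
        return -1;
    lock->held = 1;
    return 0;
}

static void demo_lock_release(demo_lock *lock)
{
    assert(lock != NULL);
    lock->held = 0;
}

/* Library initialization: create the lock. */
static void demo_init(void)
{
    active_lock = calloc(1, sizeof *active_lock);
}

/* Plays the role of _PR_CleanupThreads: destroy the lock, clear the pointer. */
static void demo_cleanup_threads(void)
{
    free(active_lock);
    active_lock = NULL;
}

/* Plays the role of the thread start-up path that takes the lock in order
 * to add the new thread to the list of active threads. */
static int demo_thread_start(void)
{
    if (demo_lock_acquire(active_lock) != 0)
        return -1;              /* lock is already gone */
    /* ... register the thread here ... */
    demo_lock_release(active_lock);
    return 0;
}
```

Calling `demo_init()`, then `demo_thread_start()`, succeeds. Calling `demo_cleanup_threads()` first and `demo_thread_start()` afterwards reaches the acquire call with a null pointer, which matches the order seen in the debugger: cleanup runs at line 422 and the extra thread only appears later.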
(Reporter)

Comment 8

6 years ago
The startFunc of the secondary thread is:
  ContinuationThread
(Reporter)

Comment 9

6 years ago
I hope these details help you understand what's wrong.
Please let me know if you need further information. I have stopped the continuous building/testing on buildnss03 and will wait for your feedback.
(Assignee)

Comment 10

6 years ago
Created attachment 645085 [details] [diff] [review]
Patch for debugging: comment out ContinuationThread

Kai: thank you for the info.  I understand the problem you
described.  However, it is not clear whether it is the same PR_Lock(0)
call observed in a normal test run on that tinderbox.

I noticed that this is the "WINNT" build configuration, which is
why I lowered the priority and severity of this bug.

The "ContinuationThread" is an internal thread created by NSPR
for UDP support.  Since NSS doesn't have any tests that use UDP
yet, we can comment out the ContinuationThread as an experiment.

If the assertion regularly fails on the tinderbox buildnss03,
please apply this patch to the NSS source tree locally.  If
the assertion failure is gone, this will prove that the assertion
failure is related to the ContinuationThread.

Thanks.
(Assignee)

Comment 11

6 years ago
Comment on attachment 645085 [details] [diff] [review]
Patch for debugging: comment out ContinuationThread

Kai: I have trouble testing this patch at work.
I will test this patch at home tonight.  Please do
not test this patch until I have tested it.
(Reporter)

Comment 12

6 years ago
Wan-Teh, I guess you didn't find time since you wrote comment 11 (2 months ago).

(The affected Windows buildnss03 machine has been deactivated since that time.)

How should we proceed?
(Reporter)

Comment 13

6 years ago
(In reply to Wan-Teh Chang from comment #10)
> 
> I noticed that this is the "WINNT" build configuration, which is
> why I lowered the priority and severity of this bug.

I don't understand. Why is the WINNT configuration a low priority?

The WINNT build configuration appears to be the default one!
We're using it on all the currently active Windows NSS build machines.
(Reporter)

Comment 14

6 years ago
changing component to NSPR
Assignee: nobody → wtc
Component: Test → NSPR
Product: NSS → NSPR
Version: 3.14 → 4.9.2
(Reporter)

Updated

6 years ago
Whiteboard: [waiting for wtc, comment 11]
(Reporter)

Comment 15

6 years ago
I would also like to clarify that all Windows XP testing on buildnss03 has been sitting idle for 3 months, because of comment 11.

I just reenabled this build machine, but as expected it still shows this bug.

Wan-Teh, I'm not sure why you asked me to wait for you to test this locally.

If you want, just let me know, and I can test this patch on buildnss03 (as you had proposed in comment 10, before you asked me to wait for you).
(Reporter)

Comment 16

6 years ago
Wan-Teh clarified that Mozilla is using OS_TARGET=WIN95.

I will have buildnss03 do one more cycle with WINNT and the debug patch applied, to see what we get.

Afterwards, I'll change buildnss03 to use WIN95.
(Reporter)

Comment 17

6 years ago
It seems like the patch didn't help?

As you can see in this build log
http://tinderbox.mozilla.org/showlog.cgi?log=NSS/1351272886.1351284204.11257.gz&fulltext=1

the tree contains a patch for ntio.c but we still get the assertions.


Anyway. It's time to switch buildnss03 to WIN95 now.
(Reporter)

Comment 18

6 years ago
We no longer get this failure on buildnss03.

Since Wan-Teh clarified that the WINNT configuration is outdated, I'm no longer interested in tracking this bug.

I suggest WONTFIX.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → WONTFIX
(Reporter)

Updated

6 years ago
Whiteboard: [waiting for wtc, comment 11]

Comment 19

5 years ago
Hi Kai, Wan-Teh,

We are also getting this same error message on Windows 2003 and 2008 (32-bit), but the error is thrown only with debug bits. The same problem does not occur with an optimized build.
Do you have any update on this issue? Is there any workaround or solution to this problem?

Comment 20

5 years ago
We are building NSPR 4.9.5 with NSS 3.14.3
> We are also getting this same error message on Windows 2003 and 2008 (32
> bits). But the error is thrown only with debug bits. Same problem does not
> occur with optimized build.
> Do you have any update on this issue? Is there any workaround/solution to
> this problem?

This problem will never occur in any optimized build because assertions are disabled in optimized builds.
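
As a minimal sketch of why this is so, here is the general debug-only assertion pattern in C. The names `DEMO_DEBUG`, `DEMO_ASSERT`, `demo_assert_fail` and `demo_lock` are hypothetical and chosen for illustration; this is not NSPR's actual macro definition.

```c
#include <assert.h>   /* standard assert() is likewise compiled out under NDEBUG */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef DEMO_DEBUG
/* Debug build: report the failed expression and abort the process. */
static void demo_assert_fail(const char *expr, const char *file, int line)
{
    fprintf(stderr, "Assertion failure: %s, at %s:%d\n", expr, file, line);
    abort();
}
#define DEMO_ASSERT(expr) \
    ((expr) ? (void)0 : demo_assert_fail(#expr, __FILE__, __LINE__))
#else
/* Optimized build: the check expands to nothing. */
#define DEMO_ASSERT(expr) ((void)0)
#endif

/* Hypothetical lock entry point that checks its argument in debug builds. */
static int demo_lock(void *lock)
{
    DEMO_ASSERT(lock != NULL);
    return (lock != NULL) ? 0 : -1;
}
```

Compiled without `-DDEMO_DEBUG`, `demo_lock(NULL)` prints no message and simply returns -1. Compiled with `-DDEMO_DEBUG`, the same call prints an "Assertion failure" line and aborts. The null argument is passed in both builds; only the debug build reports it.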

Are you using OS_TARGET=WIN95 or OS_TARGET=WINNT?

Please attach a stack trace to this bug.

Comment 22

5 years ago
We are using OS_TARGET=WINNT.

I will get back to you with stack trace of the failure.
Please use OS_TARGET=WIN95.