This bug was reported and tracked down by Takis Psarogiannakopoulos. NSPR's pthreads code assumes that 0 is an invalid pthread_t value and stores 0 in a lock or monitor's 'owner' field to mean the lock or monitor is not locked (i.e., has no owner). This assumption is not portable. First, pthread_t is not necessarily an integral type. Second, 0 may be a valid pthread_t value, and indeed that is true on DG/UX. The fix is to add a new boolean field, 'locked', to indicate whether a lock or monitor is locked, and the 'owner' field is only used when 'locked' is true. I will only do this for PRLock. For PRMonitor, I will make its 'owner' field a PRThread* pointer and use the value NULL to indicate that the monitor is not locked.
I checked in my patch on the trunk. Larry, I would appreciate it if you could review my patch.
r=larryh. Looks OK to me.
For reference, this patch made PRMonitor unusable for locking in memory allocators or wrappers of memory allocators -- see bug 95275.
The PRMonitor part of this patch has been backed out. We will add ifdef's to handle the platforms where an invalid pthread id is not 0. Here are some of the comments I'm going to add to _pth.h to explain the decision. Unfortunately some of our clients depend on certain properties of our PRMonitor implementation, preventing us from replacing it by a portable implementation. - High-performance servers like the fact that PR_EnterMonitor only calls PR_Lock and PR_ExitMonitor only calls PR_Unlock. (A portable implementation would use a PRLock and a PRCondVar to implement the recursive lock in a monitor and call both PR_Lock and PR_Unlock in PR_EnterMonitor and PR_ExitMonitor.) Unfortunately this forces us to read the monitor owner field without holding a lock. - One way to make it safe to read the monitor owner field without holding a lock is to make that field a PRThread* (one should be able to read a pointer with a single machine instruction). However, PR_GetCurrentThread calls calloc if it is called by a thread that was not created by NSPR. The malloc tracing tools in the Mozilla client use PRMonitor for locking in their malloc, calloc, and free functions. If PR_EnterMonitor calls any of these functions, infinite recursion ensues.