Open Bug 516989 Opened 15 years ago Updated 2 years ago

Improve condition variable implementation for modern windows platforms

Categories

(NSPR :: NSPR, enhancement)

x86
Windows XP
enhancement

Tracking

(Not tracked)

People

(Reporter: bas.schouten, Unassigned)

References

Details

Attachments

(2 files, 4 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.9.0.14) Gecko/2009082707 Firefox/3.0.14 (.NET CLR 3.5.30729) Build Identifier: The current implementation of condition variables relies on a relatively complicated scheduling mechanism. This was written for Windows 95, since windows 95 a wider array of wait functions has become available, one of them being SignalObjectAndWait, which, from windows 2000 onwards can be used to produce better implementations. On Vista and onwards it could use the platform native condition variables. Reproducible: Always
Status: UNCONFIRMED → NEW
Ever confirmed: true
This is an initial attempt to do this. This is based on the paper at http://www.cs.wustl.edu/~schmidt/win32-cv-1.html. I made some small changes replacing some of the critical sections by atomic operations, that I believe should in these cases suffice. This should work for Mozilla by the virtue of the surrounding code always having one lock associated with the cvar and locking the associated lock before entering the Notify functions. The #ifdefs are not all in the right place and it plain inserts the defined and replaces w95cv with w32cv. Currently this passes the unit tests and seems to perform worse in the cvar2.exe test with large numbers of threads. I believe partially because the implementation requires all locks to have been changed from mutex to critical section. Making there be a larger locking/unlocking overhead in the test cases, which only do rapid locking/unlocking. I believe it might be worth trying to implement a custom mutex which allows utilizing SignalObjectAndWait and still maintaining the benefit of preventing a kernel mode transition on acquiring the mutex.
Erm, I also forgot to change the comment block on the top of the file to roughly describe the working. It still describes the w95cv.c implementation.
Thanks for the patch. There are bugs in Schmidt & Pyarali's algorithm. (You may be able to find descriptions of the bugs with a web search or in the pthreads-win32 source code.) That's why I implemented my own wait list, rather than letting the threads wait on an event or semaphore object. We need to use critical sections for better performance. That means we can't use SignalObjectAndWait.
(In reply to comment #3) > Thanks for the patch. There are bugs in Schmidt & Pyarali's > algorithm. (You may be able to find descriptions of the bugs > with a web search or in the pthreads-win32 source code.) > That's why I implemented my own wait list, rather than letting > the threads wait on an event or semaphore object. > > We need to use critical sections for better performance. That > means we can't use SignalObjectAndWait. Right, I wrote another piece of code, which I'm still testing and need to profile. Which uses a custom mutex system(which seems to work but heh, I better test it pretty heavily for making any such claim with as many edge cases as mutexes suffer from), it does something fairly similar in avoiding the kernel mode calls. I believe pthreads-win32 does something similar but I still need to take the time to dig into their code and see what exactly they did. I tested this with the cvar2 test case and lock test case. In the lock test case it performs similar to criticalsections. In the cvar2 I still see significantly better performance in the current implementation than my SOAW algorithm in the case of a single condvar waking up a series of threads. I also think this is partially due to the current code being 'optimized' for that test somewhat :-). The releasing on unlock is very good at preventing contention when a large number of threads gets released all at once. I believe the actual CPU time consumed by the SOAW implementation is less. But sadly I haven't been able to run a profiler on it yet.
I added a new patch. This implements the previously suggested 'custom critical section' which will work with SignalObjectAndWait. It passes all unit tests and the browser seems to run well. It performs slightly better to much better on most of the cvar2 tests, and somewhat worse on one of them. I have theories about the extreme differences but it would be a bit lengthy to discuss them here. I was unable to find cases where this would violate the conditions of POSIX condition variables. The discussions in win32-pthreads seem to consider the older solutions proposed in the paper. But I might have missed something, if you happen to know where things go wrong exactly or point me at a document describing the issue I'd like to hear.
Comment on attachment 403363 [details] [diff] [review] Win32 CVars using SOAW and implementing light-weight mutex I still think this approach is better (the patch could use some clean-up, but hey). I'd still like to know where in this approach the problems are. As virtually all objective measurements seem to show an improvement.
Attachment #403363 - Flags: feedback?(wtc)
Bas: thank you very much for your patch. I am very sorry for my lack of response. I believe I intended to reply to you but forgot to do that. I updated your patch to the current tip of the NSPR source tree and made the following changes: 1. I define the WIN32_CV macro in _win95.h rather than on the compiler command line. This avoid changing the makefiles. (I know you did a good job of modifying only the makefiles that need -DWIN32_CV.) _win95.h is one of the places where we can define configuration macros for NSPR's internal use. 2. I changed #ifndef WIN32_CV A #else B #endif to #ifdef WIN32_CV B #else A #endif 3. I folded lines longer than 80 characters, fixed indentation, removed spaces at the end of lines, changed "mutices" to "mutexes", and updated the MPL in the new w32cv.c file to MPL 2.0. Comments on the patch: It is not easy to emulate a condition variables using semaphores and events. People who tried this made subtle mistakes. So I cannot check in this patch unless I have reviewed it, and I am not sure if I can or have the time to do a good review. I noticed that the thread that signals or broadcasts a condition variable has to block until all the waiting threads have waken up. The current implementation in w95cv.c does not do this kind of blocking after signaling or broadcasting a condition variable. So I am worried about performance problems. However, none of the NSPR test programs that measure condition variable performance showed more than a negligible difference, and actually the difference is in favor of your implementation. There seems to be a bug: if I run the perf.exe program repeatedly, occasionally it hangs. I didn't investigate it. Your implementation is smaller (down from 315 lines to 180 lines). It may be a good idea to do another implementation based on the new condition variable functions in Windows Vista. Thank you again.
Attachment #401071 - Attachment is obsolete: true
Attachment #403363 - Attachment is obsolete: true
Attachment #403363 - Flags: feedback?(wtc)
Thanks a lot for your extensive response! I must admit it's been a long while since I wrote this patch so I'll hopefully have some time in the near future to look at this again.
This patch is incomplete. I still need to add a reference count to _MDCVar. I also need to compare the code in ptsynch.c with it.
Removed unrelated cleanup of pr/src/md/windows/w95io.c from the patch.
Attachment #8398887 - Attachment is obsolete: true
Added a TODO comment about thread.md.blocked_sema.
Attachment #8398888 - Attachment is obsolete: true
See Also: → 1323316

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: