Open
Bug 194339
Opened 22 years ago
Updated 1 year ago
Implement atomic PRStack for Darwin PowerPC
Categories
(NSPR :: NSPR, enhancement)
Tracking
(Not tracked)
NEW
People
(Reporter: wtc, Unassigned)
Details
Attachments
(1 file, 1 obsolete file)
1.88 KB,
patch
|
Details | Diff | Splinter Review |
We should implement atomic PRStack for Darwin PowerPC in PowerPC assembly language.
Comment 1•22 years ago
|
||
The PPC manual has an example of this. E.5 List Insertion The following example shows how the lwarx and stwcx. instructions can be used to implement simple LIFO (last-in-first-out) insertion into a singly-linked list. (Complicated list insertion, in which multiple values must be changed atomically, or in which the correct order of insertion depends on the contents of the elements, cannot be implemented in the manner shown below, and requires a more complicated strategy such as using locks.) The next element pointer from the list element after which the new element is to be inserted, here called the parent element, is stored into the new element, so that the new element points to the next element in the list—this store is performed unconditionally. Then the address of the new element is conditionally stored into the parent element, thereby adding the new element to the list. In this example, it is assumed that the address of the parent element is in GPR3, the address of the new element is in GPR4, and the next element pointer is at offset zero from the start of the element. It is also assumed that the next element pointer of each list element is in a reservation granule separate from that of the next element pointer of all other list elements. loop: lwarx r2,0,r3 #get next pointer stw r2,0(r4) #store in new element sync #let store settle (can omit if not MP) stwcx. r4,0,r3 #add new element to list bne- loop #loop if stwcx. failed In the preceding example, if two list elements have next element pointers in the same reservation granule in a multiprocessor system, livelock can occur. If it is not possible to allocate list elements such that each element’s next element pointer is in a different reservation granule, livelock can be avoided by using the following sequence: lwz r2,0(r3) #get next pointer loopl: mr r5,r2 #keep a copy stw r2,0(r4) #store in new element sync #let store settle loop2: lwarx r2,0,r3 #get it again cmpw r2,r5 #loop if changed (someone bne- loopl #else progressed) stwcx. r4,0,r3 #add new element to list bne- loop2 #loop if failed
Reporter | ||
Comment 2•22 years ago
|
||
PR_StackPush is exactly the List Insertion programming example in The PowerPC Architecture, Book I, Appendix E, Section E.1.3, page 255. I wrote PR_StackPop. PR_StackPop needs to be reviewed by an expert. The use of the sync and isync instructions in PR_StackPush and PR_StackPop also need to be reviewed by an expert.
Comment 3•22 years ago
|
||
Do we have any tests that exercise this code?
Reporter | ||
Comment 4•22 years ago
|
||
Yes. mozilla/nsprpub/pr/tests/stack.c exercises this code. In optimized builds, NSPR uses this code to implement the PRFileDesc freelist. So anything that opens and closes PRFileDesc's (files and sockets) will exercise this code in optimized builds. PRStack is a thread-safe singly-linked list. Elements are inserted and removed at one end (the head). The reason PR_StackPush and PR_StackPop need memory barriers is that this sequence of events may happen: Thread A on CPU 1 writes to an element Thread A on CPU 1 pushes the element to the stack Thread B on CPU 2 pops the element off the stack Thread B on CPU 2 reads from the element So thread B on CPU 2 must be able to read the value in the element written by thread A on CPU 1. On a multiprocessor system with a weak memory consistency model, memory barriers are needed to ensure that. The instructions I'm using will cause this: Thread A on CPU 1 writes to an element sync Thread A on CPU 1 pushes the element to the stack Thread B on CPU 2 pops the element off the stack isync Thread B on CPU 2 reads from the element I am not sure if isync is the right instruction to use there, but that's the instruction used in the "lock" procedure sample code in Section E.1.2 Lock Acquisition and Release, p. 254.
Reporter | ||
Comment 5•22 years ago
|
||
Attachment #115204 -
Attachment is obsolete: true
I was just wondering if anyone has tested this patch. Any idea how much of a performance boost Mac users would see with this?
Updated•18 years ago
|
QA Contact: wtchang → nspr
Updated•2 years ago
|
Severity: normal → S3
Comment 7•1 year ago
|
||
The bug assignee is inactive on Bugzilla, so the assignee is being reset.
Assignee: wtc → nobody
You need to log in
before you can comment on or make changes to this bug.
Description
•