Open Bug 194339 Opened 22 years ago Updated 1 year ago

Implement atomic PRStack for Darwin PowerPC

Categories

(NSPR :: NSPR, enhancement)

4.2.1
PowerPC
macOS
enhancement

Tracking

(Not tracked)

People

(Reporter: wtc, Unassigned)

Details

Attachments

(1 file, 1 obsolete file)

We should implement atomic PRStack for Darwin PowerPC
in PowerPC assembly language.
The PPC manual has an example of this.

E.5 List Insertion
The following example shows how the lwarx and stwcx. instructions can be used to
implement simple LIFO (last-in-first-out) insertion into a singly-linked list.
(Complicated
list insertion, in which multiple values must be changed atomically, or in which
the correct
order of insertion depends on the contents of the elements, cannot be
implemented in the
manner shown below, and requires a more complicated strategy such as using locks.)
The next element pointer from the list element after which the new element is to
be inserted,
here called the parent element, is stored into the new element, so that the new
element
points to the next element in the list—this store is performed unconditionally.
Then the
address of the new element is conditionally stored into the parent element,
thereby adding
the new element to the list.
In this example, it is assumed that the address of the parent element is in
GPR3, the address
of the new element is in GPR4, and the next element pointer is at offset zero
from the start
of the element. It is also assumed that the next element pointer of each list
element is in a
reservation granule separate from that of the next element pointer of all other
list elements.

loop:   lwarx r2,0,r3       #get next pointer
        stw r2,0(r4)        #store in new element
        sync                #let store settle (can omit if not MP)
        stwcx. r4,0,r3      #add new element to list
        bne- loop           #loop if stwcx. failed

In the preceding example, if two list elements have next element pointers in the
same
reservation granule in a multiprocessor system, livelock can occur.
If it is not possible to allocate list elements such that each element’s next
element pointer
is in a different reservation granule, livelock can be avoided by using the
following
sequence:

        lwz r2,0(r3)    #get next pointer
loopl:  mr r5,r2        #keep a copy
        stw r2,0(r4)    #store in new element
        sync            #let store settle
loop2: lwarx r2,0,r3    #get it again
        cmpw r2,r5      #loop if changed (someone
        bne- loopl      #else progressed)
        stwcx. r4,0,r3  #add new element to list
        bne- loop2      #loop if failed
PR_StackPush is exactly the List Insertion programming
example in The PowerPC Architecture, Book I, Appendix E,
Section E.1.3, page 255.

I wrote PR_StackPop.  PR_StackPop needs to be reviewed
by an expert.  The use of the sync and isync instructions
in PR_StackPush and PR_StackPop also need to be reviewed
by an expert.
Do we have any tests that exercise this code?
Yes.  mozilla/nsprpub/pr/tests/stack.c exercises this code.

In optimized builds, NSPR uses this code to implement the
PRFileDesc freelist.  So anything that opens and closes
PRFileDesc's (files and sockets) will exercise this code
in optimized builds.

PRStack is a thread-safe singly-linked list.  Elements
are inserted and removed at one end (the head).

The reason PR_StackPush and PR_StackPop need memory barriers
is that this sequence of events may happen:

Thread A on CPU 1 writes to an element
Thread A on CPU 1 pushes the element to the stack
Thread B on CPU 2 pops the element off the stack
Thread B on CPU 2 reads from the element

So thread B on CPU 2 must be able to read the
value in the element written by thread A on CPU 1.
On a multiprocessor system with a weak memory consistency
model, memory barriers are needed to ensure that.

The instructions I'm using will cause this:

Thread A on CPU 1 writes to an element
sync
Thread A on CPU 1 pushes the element to the stack
Thread B on CPU 2 pops the element off the stack
isync
Thread B on CPU 2 reads from the element

I am not sure if isync is the right instruction to
use there, but that's the instruction used in the
"lock" procedure sample code in Section E.1.2
Lock Acquisition and Release, p. 254.
Attached patch Proposed patchSplinter Review
Attachment #115204 - Attachment is obsolete: true
I was just wondering if anyone has tested this patch. Any idea how much of a performance boost 
Mac users would see with this?
QA Contact: wtchang → nspr
Severity: normal → S3

The bug assignee is inactive on Bugzilla, so the assignee is being reset.

Assignee: wtc → nobody
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: