Closed Bug 71718 Opened 23 years ago Closed 23 years ago

Carbonized Mac NSPR needs architectural work to run on dual-CPU machines

Categories

(NSPR :: NSPR, defect, P1)

PowerPC
macOS
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mikepinkerton, Assigned: sfraser_bugs)

References

Details

Attachments

(7 files, 9 obsolete files)

16.91 KB, patch
Details | Diff | Splinter Review
13.78 KB, patch
Details | Diff | Splinter Review
22.59 KB, text/html
Details
4.96 KB, text/html
Details
9.82 KB, patch
sdagley
: review+
Details | Diff | Splinter Review
5.98 KB, text/html
Details
2.12 KB, text/html
Details
Run the fizzilla build on a dual-cpu g4 (like the one in my cube) and the app 
stalls loading most pages. Here is the email from Apple (vlubet@apple.com):

Mike,

I looked at the code on macsockotpt.c and do not see any call to
OTEnterNotifier() and OTLeaveNotifier().

On Mac OS X, Open Transport notifiers, timer tasks and deferred task do not
run to completion. That means that code from the main event loop or another
thread may preempt code running from OT notifiers, OT timer tasks and OT
deferred tasks. The solution is to always protect your data by using either
atomic operations or by using OTEnterNotifier()/OTLeaveNotifier().

Bracketing your critical sections with OTEnterNotifier()/OTLeaveNotifier()
is usually the most convenient solution as this ensures that either the
notifier or the main event loop / cooperative thread is accessing common
data.

For example here is what I would do:

1369             do {
                     Boolean entered = OTEnterNotifier(endpoint);
1370                 fd->secret->md.read.thread = me;
1371                 fd->secret->md.readReady = PR_FALSE;                //
expect the worst   
1372                 result = OTRcv(endpoint, buf, bytesLeft, NULL);
1373                 if (fd->secret->nonblocking) {
1374                     fd->secret->md.readReady = (result !=
kOTNoDataErr);
                         if (entered)
                             OTLeaveNotifier(endpoint);
1375                     break;
1376                 } else {
1377                     if (result != kOTNoDataErr) {
1378                         // If we successfully read a blocking socket,
check for more data.
1379                         // According to IM:OT, we should be able to
rely on OTCountDataBytes
1380                         // to tell us whether there is a nonzero amount
of data pending.
1381                         size_t count;
1382                         OSErr tmpResult;
1383                         tmpResult = OTCountDataBytes(endpoint, &count);
1384                         fd->secret->md.readReady = ((tmpResult ==
kOTNoError) && (count > 0));
1385         
                             if (entered)
                                 OTLeaveNotifier(endpoint);
1386                         break;
1387                     }
1388 
1389                     // Blocking read, but no data available. Wait for
an event
1390                     // on this endpoint, and hope that we get a T_DATA
event.
                         if (entered)
                                 OTLeaveNotifier(endpoint);
1391                     WaitOnThisThread(me, PR_INTERVAL_NO_TIMEOUT);
1392                     result = me->md.osErrCode;
1393                     if (result != kOTNoError) // interrupted thread,
etc.
1394                         break;
1395 
1396                     // Prepare to loop back and try again
1397                     PrepareForAsyncCompletion(me, fd->secret->md.osfd);
1398                 }
1399             }
1400             // Retry read if we had to wait for data to show up.
1401             while(1);

It certainly behaves like our endpoints are being clobbered in a race-condition-
ish situation.
Apple has shipped us a dual CPU G4 so we don't have to camp in pink's cube trying 
to debug this
Status: NEW → ASSIGNED
Keywords: mozilla0.8.1
Target Milestone: --- → mozilla0.9.1
They have? Can I have it?  ;)
sfraser has the dual CPU loaner Mac, now he has the bug too :-)
Assignee: sdagley → sfraser
Status: ASSIGNED → NEW
Priority: -- → P1
moving to 9.2
Target Milestone: mozilla0.9.1 → mozilla0.9.2
I'll just point out that the dual-CPU loaner system is due back to Apple on 6/1 
so that, while this fix may not be critical for 0.9.1, it needs to land well 
before 0.9.2
Whiteboard: OSX+
mass move, v2.
qa to me.
QA Contact: tever → benc
The Mac NSPR code has serious issues on Mac OS X, which are particularly visible
on machines with two processors.

The main issue here is that OT notifiers, async file I/O callbacks, and timer
tasks each use pthreads under the hood, and are thus interruptible. This breaks
assumptions that the mac code currently makes relating to the switching on and
off of interrupts, and lock consistency.

It would appear that the only good solution to this problem is to build NSPR as
a Mach-O binary, and have it use the Unix implementation. If we do that, we need
to ensure that the implementation permits clients to call any Carbon Toolbox
functions on threads, as they can now. It would be bad to move to pthreads, and
to find, suddenly, that bad things happen when trying to do, say, drawing, or
memory allocation, on non-main threads.
Assignee: sfraser → wtc
Component: Networking → NSPR
Product: Browser → NSPR
QA Contact: benc → larryh
Summary: Most pages stall on dual-cpu G4 → Carbonized Mac NSPR needs architectural work to run on dual-CPU machines
Target Milestone: mozilla0.9.2 → ---
Steve, could you please work on this bug?  We already
have an implementation of NSPR for Mac OS X based on
pthreads.
Assignee: wtc → sdagley
wtc, can you tell me which nspr defines need to be lit to do a nspr for mac os x 
build?
The only way to build NSPR for Mac OS X is to use the Unix build
instructions, i.e., configure and make.

I don't know how to integrate this with Code Warrior.
I spoke to Quinn at Apple's WWDC about calling into Mach-O from CFM and he 
pointed me to <http://developer.apple.com/samplecode/Sample_Code/
Runtime_Architecture/CallMachOFramework.htm>.  It's intended for calling 
frameworks but I'm hoping it can be adapted to general Mach-O libs.
Thanks.  As it happens I found that on my own.  And I got that much working... I 
can load the System framework at _MD_EARLY_INIT or whatever it's called.  But 
then I'm stuck.  I have no pthread headers that will work with CW.  In fact, just 
to get the first part working I had to copy over the Universal folder from CW7EA.  
There was needed stuff missing from the headers otherwise.  

I tried to handroll my own pthread declararions, because I need only a few calls.  
But unfortunately initializing a pthread mutex requires a *structure* as one of 
the params (what the heck were those posix folks thinking?) so it gets 
complicated to get it all correct.  I really need posix headers that work with 
CodeWarrior.

Or I could try porting nspr over to ProjectBuilder.  Then I could make a knock-
off nspr lib that had pthreads locks in the right places to get dual cpu macs 
happy.
oops.  Sorry Steve, I thought you were replying to a question I asked in another 
forum.  You are actually looking at the opposite solution from what I am 
thinking.  You want to make a mach-o nspr and clal it from cfm fizzilla, right?  
I want to make the current cfm nspr use pthread locks from the system framework 
make the INTS_OFF/INTS_ON stuff in nspr actually acquire a real lock that the 
underlying os will honor.

I'm worried about using real pthreads in mac nspr because I think you can only 
make most system calls from the main thread and I don't know that we can enforce 
that restriction on the current code base.  So I'm thinking keep the nspr threads 
and just enforce syncronization with the callbacks/timers/notifiers using a real 
lock.
ok, new idea.  make the INTS_ON/INTS_OFF stuff enter and leave a critical
region.  That way everything turning on/off nspr ints is competing on the same
critical region, acting just like a lock.  The reason this is good is that the
mac MP api supports this, and on mac os x it is just a wrapper for posix stuff,
so the underlying os threads running the callbacks/timers/notifiers will respect
this.  I'll wager the MP headers don't pose any problems for our build system,
so it should be easy to do.  i'll try it later today and see.
two thoughts:
- pro7 can build mach-o, so why aren't all the pthread headers there?
- aren't the correct pthread headers on the OSX developer install for you to
look at?
Yes, you will be able to call MP stuff. This might be one way to go.
Target Milestone: --- → 4.2
beta stopper
Whiteboard: OSX+ → OSX++
I've played around with this some.  My first attempt seems to just lock all the
threads up.  The chief difficulty is that _PR_INTSOFF retreives a value that
indicates whether the region is already entered.  But the MP api does not expose
any way to get that info for a critical region.  The result is that the nspr
code triggers more entries into the critical region than exits.

I think it may be possible to make _PR_INTSOFF & _PR_INTSON not care about that
value.  The number of places in the code that actually examine that value are
few, and are almost entirely in mac only code.  I think this value is only
needed because _PR_INTSOFF doesn't currently acquire a "real" lock of any kind. 

I'll play with this some more over the weekend.
How about keeping a counter for the number of times _PR_INTSOFF has been called, 
and only enter/leave the critical section on 0/1 transitions, sort of like 
destructors and reference counting.  That should have similar semantics to the 
current usage of _PR_INTSOFF.

Or it may be even simpler than that.  You may just need to keep a flag indicating 
whether you're in the critical section, and return that flag.

If you're around this afternoon, I'd love to talk with you about it.
you would probably need a separate lock on this counter too, as you could be
pre-empted between checking the value and then acting on it causing a
race-condition.
re gordon: i was already trying that.
re mike: i was entering my critical section before using the counter.

there is some code around that saves the counter, and then enters the intsOff
state setting the counter to 1, and then restores the counter when it's done. 
This will have the effect of not turning ints back on if they weren't on before,
but it totally demolishes the concept of the counter being a reentrancy count.  
i think i can just not use a counter at all, and tweak the code in the handful
of places that use it.  Almost all are in mac only code, and most of it is to
detect when a notification might have been dropped becasue it came while ints
were off.  Under osx the notification wont be dropperd in that situation: if
someone else is in the critical section when the notification comes in, the
notification will block on the critical section, and then fire when it can enter.  
at request of macdev, attaching my patches.  they  DO NOT work.  But they 
show the lines I am thinking along.  
Attached patch diffs of nsprpb (obsolete) — Splinter Review
a second attempt.  not sure yet why this one doesn't work.
Attached patch nspr patch (obsolete) — Splinter Review
sdagley - are any of the machines which we'll get from Apple for testing a
dual-CPU machine?  If dual-CPU machine seems to have it's own set of problems,
we may need to test on that.
We should *definately* test on dual-CPU machines, and it would be good for QA to 
invest in a few of these. Note that, currently, our builds are pretty much dead 
on such machines running Mac OS X because of this bug.
Whiteboard: OSX++ → OSX++, PDT+
lchiang, we aren't getting any dual CPUs in the new batch but Joe Francis has 
voluntereed his dual-cpu system for testing while he's on sabbatical next month
Keywords: nsBranch
Steve, this is PDT+ bug. Tomorrow, on Tuesday, we'll try to build the first RTM
candidate. It would be good, if this could be resolved ASAP.
what going on with this bug?  getting close to the time where it may have to be
latered to future milestone/release if we are not there, or past there already...
This bug is specific to Mac OS X and does not impact the 'Classic' builds for Mac 
OS 8.6/9.x.  As to where we're at with it, as an AOL employee I can't use the 
appropriate profanity in this public forum to describe it so I'll just say we 
don't expect to have a fix this month.
*** Bug 90941 has been marked as a duplicate of this bug. ***
we're not gonna have this fixed by beta, removing pdt+
Whiteboard: OSX++, PDT+ → OSX++
Whiteboard: OSX++ → OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture
*** Bug 93824 has been marked as a duplicate of this bug. ***
*** Bug 93720 has been marked as a duplicate of this bug. ***
The following Apple tech note may be helpful in resolving this bug, as well as
giving Fizzilla better OS X performance.

http://developer.apple.com/technotes/tn/tn2028.html

- Adam
Added keyword: nsenterprise+
Keywords: nsenterprise+
Simon is now working on this so reassigning bug
Assignee: sdagley → sfraser
Whiteboard: OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture → OSX++
Attachment #38164 - Attachment is obsolete: true
Attachment #38876 - Attachment is obsolete: true
Attached patch Final patch for review (obsolete) — Splinter Review
Attachment #49135 - Attachment is obsolete: true
That last patch works. On single CPU machines, it's performance impact is below
the noise threshold of the page loading tests I used to measure it. It uses MP
critical sections to fix thread synchronization issues, blocking multiple MP
threads from running while NSPR interrupts are turned off.

The code is currently active if TARGET_CARBON is defined to be 1. It might be
necessary to detect the OS version using Gestalt if this affects Carbon under 9
performance (which is unknown).

I need some strong reviews, please.
Status: NEW → ASSIGNED
I have some further changes to the patch, to conditionalize the critical region 
code for Mac OS X only, and to add some explanatory comments. Please wait...
Blocks: 97866
is this going to make it soon?  putting on the "+"
Keywords: nsbranchnsbranch+
Attachment #49168 - Attachment is obsolete: true
Attached patch Really final patch (obsolete) — Splinter Review
The latest patch does two new things:
1. Return non-Carbon builds to their previous behaviour (just some macro changes).
2. In carbon builds, conditionally uses critical sections only if we are running
on Mac OS X. So Carbon-on-classic builds are not penalized.
Attachment #49391 - Attachment is obsolete: true
The new patch:
* Replaces the array of critical sections with just one (since I only used one),
  which now matches the single critical region entry count.
* Removes a bogus WalkTheStack prototype.
I reviewed Simon's "really final patch" (attachment 49391 [details] [diff] [review]).
I already sent him some suggested changes after a quick
look.  Then I stared at the patch for a good hour.  After
I internalized the patch, I found that Simon's solution is
actually very simple.  (It is all those macros that makes
the patch look complicated.) It is exactly what jfrancis
outlined in his comments back in June.  It surprises me that
the simple solution is enough to solve this problem.
Congratulations to all of you who worked on this bug!

I will go over the patch with Simon next week.  I just
wanted to point out that the patch also contains a fix
for bug 97866.  That bug is marked as depending on this
bug, but I think it is not related and its fix should
be reviewed and checked in independently.
(This looks close to a fix - great!  Who at Netscape has a dual-processor
machine besides kin?  Once this is fixed, we need to do some testing.)
PDT+ per PDT. Let's get one.
Whiteboard: OSX++ → OSX+,PDT+
Sorry for the somewhat basic quizzing but:

1- This bug discusses fixing the MacOS X builds only right? So it is completely
irrelevant to running MacOS X w/ classic and classic build.

2- What happened if you ran MacOS X builds on dual-processor systems before this
fix? (Boom?!). Did this affect dual processor systems in pre-X?

Need this info to figure out how much upgrading is needed for testing.
ben:

1) correct, this does not affect the classic bits (non-carbon) running on osx in 
the classic environment

2) pages would stall, and never finish loading. sometimes the app would crash 
when you quit after failing to load pages.
Attached patch Super-final patch (obsolete) — Splinter Review
Attachment #49408 - Attachment is obsolete: true
The super-final patch:
* removes the async file I/O changes, which are now in a patch in bug 97866
* replaces separate intsoff/intson implemenations with a single, simplified
intsoff routine, and removes some macro code duplication.
* Uses intsoff/fastintson rather than setintsoff(1)/setintsoff(0) in a couple of
callback routines.
* Improves the comments in _MD_PauseCPU and DoneWaitingOnThisThread().

gordon, please review.
wtc, if you'll then bless this, I can check it in.
r=gordon, unless you want to clean up _macos.h or add a wrapper to
DoneWaitingOnThisThread().

wtc, the #ifdefs in primpl.h give me headaches.  Are you sure they're correct? 
There seem to be redundant nesting of global and local thread cases.
And now the bad news. Adding the critical regions on Mac OS X does have some 
perceptible impact on performance in some areas. Page loading is not affected, 
but some UI operations are (I noticed, for example, that loading the buddy list 
in a commercial build got slower), and the browser feels someone less responsive 
on X. It may be that parts of the code that rely a lot on PR_Lock/PR_Unlock, or 
atomic operations, are slowed down by this patch.
This is not on Tian Tian Must Fix list for this release. Should we consider
minusing it, and changing the priority for this release?
sdagley commenting here - If this isn't on tiantian's list it damn sure should
be - this is most DEFINITELY a stop ship bug.  Otherwise explain to Marty Fisher
hy we don't run on his new Mac :-)
Whiteboard: OSX+,PDT+ → [OSX+],PDT+
Seconding Steve's comments.  As someone involved with the evolution of this bug 
fix I feel we are much better off with it than without it.  Both MacOS X and dual 
CPU macs are catching on pretty quickly.  Lets not leave those folks stranded.
Hi:
 This bug is the FIRST one listed on my stop ship bug list compiled yesterday.
Please check again the list on
http://bugscape.netscape.com/show_bug.cgi?id=9485. And it is the first bug on
the 8/21 list as well. 

Check it into the trunk, and let's get some testing numbers (New Window,
Startup, etc.). Tian Tian and Hong to coordinate test efforts.
Attachment #49782 - Attachment is obsolete: true
Chnages in this patch: 
* removed redundant duplicate of _MD_INTSOFF macro in _macos.h
* In code blocks like this:

    if (_PR_MD_GET_INTSOFF()) {
        thread->md.missedIONotify = PR_TRUE;
        cpu->u.missed[cpu->where] |= _PR_MISSED_IO;
        return;
    }

which are outside of the critical region, reversed the order of setting
missedIONotify and the _PR_MISSED_IO to mitigate the effects of preemption at
this point.

Note that the patch is a diff -b (ignoring space change), so spacing looks worse
than it really is.
Comment on attachment 49998 [details] [diff] [review]
Patch with minor revisions

r=gordon
I was unable to edit patch status to specify "has-review". Does that have
something to do with the Product field?
I will coordinate the testing for the perf numbers.  Results to be posted when 
they come in.  Per simon:

1.  Tests run on single processor CPU (we don't have dual handy)
2.  Run on today's trunk build vs. tomorrow's trunk build
3.  Run:  window open, startup, IM, prefs, mail performance tests
4.  Results posted back here as attachments

Thanks
Most of the changes in my patch from Simon's last patch
are a matter of style.
- I make sure no lines are longer than 80 characters.
- I do not make whitespace changes.  This is to make the
  diffs cleaner.
- Whether tabs or spaces are used follows the local
  convention.  Again, this is to make the diffs easier
  to read.
- "Regions" and "REGIONS" are changed to "Region" and
  "REGION" because there is only one critical region.
- "TermCriticalRegions" is renamed "DestroyCriticalRegion".
- Some macro definitions were moved between primpl.h and
  _macos.h and the intermediate macro _MD_INTSOFF() was
  deleted.
- gUseCriticalRegion and gCriticalRegion are declared static
  because they are only used in macthr.c.

The real changes are:
- In _PR_INTSOFF(), I leave the critical region immediately
  so that the entry count is always 1.  This way, in
  _PR_FAST_INTSON() and _PR_INTSON(), I only need to leave
  the critical region once if we need to leave the critical
  region completely.  As a result, the global variable
  gCriticalRegionEntryCount can be deleted.
- In macio.c, AsyncIOCompletion(), _PR_INTSOFF() and
  _PR_FAST_INTSON() are not necessary because they are called
  in DoneWaitingOnThisThread().  I also moved the setting of
  thread->md.osErrCode before the if block.

Simon, please review my patch and test it.  If it doesn't
work and it takes too long to fix it, you can go ahead and
check in your patch on the NSPRPUB_PRE_4_2_CLIENT_BRANCH.
I will check email from home tonight.
The sentence
  In _PR_INTSOFF(), I leave the critical region immediately
  so that the entry count is always 1.
should read
  In _PR_INTSOFF(), I leave the critical region immediately
  if we are already in the critical region so that the entry
  count is always 1.

wtc, your final patch does not work, alas. As-is, it deadlocks on launch. I 
noticed that this seems wrong:

 #define _PR_INTSOFF(_is) \
     PR_BEGIN_MACRO \
+        ENTER_CRITICAL_REGION(); \
         (_is) = _PR_MD_GET_INTSOFF(); \
         _PR_MD_SET_INTSOFF(1); \
+        if (_is) \
+            LEAVE_CRITICAL_REGION(); \
     PR_END_MACRO
 

and should probably be

         _PR_MD_SET_INTSOFF(1); \
+        if (_is == 0) \
+            LEAVE_CRITICAL_REGION(); \

but with that fixed, we get a slew of assertions about leaving non-existent 
critical sections.
I checked my previous patch into the NSPRPUB_PRE_4_2_CLIENT_BRANCH at 8:45pm, so 
it will show up in trunk builds tomorrow (9/20).
trunk testing [BEFORE checkin] using:

* 2001.09.18.20-trunk commercial
* on Mac OS 10.0.4, 450 Mhz G3, 384Mhz
* modern theme [note: separate profiles used for prefs and page load tests]

Preferences dialog open: 2.50sec
Switching from Navigator to Mail & Newsgroups panel: 0.70sec

will soon attach page load results [side note: crashed several times after page
load completed --see bug 100852].
Comment on attachment 50169 [details]
2001.09.18.20-trunk comm page load results

re-ran the pre-checkin page  load tests, thus invalidating this run. will reattach new set.
Attachment #50169 - Attachment is obsolete: true
tried running the page load tests using the 2001.09.20.16-trunk bits: alas, it
would not complete. it hung [but not on the same page] each of the three times i
tried --even after rebooting the machine after the second hang. :(

will try to run the preferences perf tests. hopefully that will go more smoothly...
alas, i'm blocked with the preferences testing as well. :(

i need to restart the app btwn each dialog-open timing [to get four timings and
report their average]. i was able to launch the app a first time with a new
profile...but with the second launch the browser window didn't even appear: only
got the menubar and the rainbow spinning cursor.

restarting the machine kinda "works around" the hang, but it's a rather harsh
workaround.

another note: simon mentioned that this might be a 10.0.4 issue --since he was
recently able to run the page load tests [same 2001.09.20.16-trunk build] on his
dual-cpu G4 [which has a later version of 10.x].
recent email rec'd from todds@bigfoot.com:

"...I've been following this bug since I want to use mozilla on my dual 800 with
OS X 10.0.4.

"I just noticed your most recent comment at 22:17, and wanted to let you know
that I am experiencing the same spinning rainbow cursor, and the browser never
starts when using the 2001.09.20 build.

"Hope that helps in any way, let me know if I can help otherwise."
Hold off on testing here folks. It looks like my changes causes deadlock on 
single and dual-CPU machines running 10.0.x builds. Damn. Running a new build 
now, more tomorrow.
Running 10.1 would not affect the deadlock?
Let's find out if this will ever work on 10.04? Simon is researching ...
I think 10.0.x is a lost cause. I just got a new dual-800, and was having 
MAJOR stability problems with 10.0.4 4S10. The machine would randomly 
crash. I posted to the Darwin list, and was told 10.0.4 has major SMP 
issues and to wait for 10.1. Let's just say this was good advice.

See http://www.darwinfo.org/devlist.php3?number=11117 for the full 
message.
Looking at Stephen's mail results for OS X for 9-18 vs. 9-21 builds, there is
some amount of slow down in the 9-21 build.
Can we get those tests re-run with  Mac OS X 10.abouttobereleased?
Do you need a dual CPU or does a single CPU work for the speed tests?  If a
single CPU works, lemme know exactly which files to grab for the tests to be
consistent.
Brice - thanks for the offer to help.  For the speed tests, here is one of them
(Mail).  All the test files and methodology are there. The important thing is to
use the same machine for the before/after builds with simon's fix.   A single
CPU is fine for this speed test.   Do you have version 10.1 of the OS?   If so,
that would be great.

sdagley - can you give me a CD with the updated OS?
I looked at the perf results attachment and I think I can do all that ... gotta
see about the stopwatch, though :)

If I have any questions, I'll let ya know.

I assume grabbing today's 'nightly' is fine enough for a build w/ the patch
incorporated?
Here's the deal. The MP APIs that I'm using for dual CPU protection are broken 
under 10.0.x, but fixed for 10.1. I have code that works around this problem, 
that will make things work on 10.0.x. So I need to check that in, and we then 
have to do another round of performance testing.
Oh - and I can't confirm anything about having 10.1 - to my knowledge, there is
no such release yet. :)
Keywords: nsenterprise+
Brice, yes.  Thanks for helping with this.  So, to summarize:

Folder loading speed: done with local folders
Thread pane scrolling: local or IMAP
Message display speed: local folders
Thread pane sorting: local or imap
Message Reply Speed: IMAP, see
http://www.mozilla.org/mailnews/performance/msgcompose_loggin.html for details
on how to generate a performance logfile for this.
Batch delete 5 (10kb each) messages: local
Mail module loading time: I wouldn't bother with this, it's too complex
currently (using IMAP, non-isolated servers, etc.) to be truly reliable.
Keywords: nsenterprise+
marking nsenterprise- to get off the enterprise stop ship list.
Keywords: nsenterprise-
Keywords: nsenterprise+
I tested the lastest changes on 10.1 single- and dual- CPU machines, and both 
work just fine.
I have the 10.1 update to Mac OS X.  Stephen or Sairuh, pls stop by to install
for testing.

Simon - do you expect to see a timing difference between OS X and 10.1?
Attachment #50595 - Attachment is obsolete: true
spoke with lisa briefly: stephend upgraded to 10.1, but i'll stick with 10.0.4
for  the nonce --so that we can test with smfr's latest patch [after
review/check-in] on both OS versions.
Comment on attachment 50724 [details] [diff] [review]
Tidied up MDEnter/Leave critical section code

r=sdagley
Attachment #50724 - Flags: review+
I checked in the working critical region code to the NSPR branch, which is used 
by the mozilla TRUNK. So this bug is now good to go on for testing.
holy crap.  So that's why I couldn't get this to work before - I was doing the 
same thing under 10.0.4
Simon - Pls get the SR=, so we can plus this one, and check it in today.
Whiteboard: [OSX+],PDT+ → [OSX+],PDT
Whiteboard: [OSX+],PDT → [OSX+],PDT,[ETA 09.26]
This has all the reviews it needs to be checked in to the branch. It's already on 
the trunk.
trunk testing of Preferences panel: compared with my 2001-09-20 16:31 results.
same configuration, with a new profile [modern, no sidebar].

AFTER checkin, using 2001.09.25.20-trunk
----------------------------------------
Open Preferences dialog open: 3.39sec
Switching from Navigator to Mail & Newsgroups panel: 0.97sec


BEFORE checkin, using 2001.09.18.20-trunk
-----------------------------------------
Preferences dialog open: 2.50sec
Switching from Navigator to Mail & Newsgroups panel: 0.70sec

looks like this checkin might've slowed thing down a bit.

currently running page load tests. will attach when finished.
I thought we were supposed to test on the 26th builds:

AFTER build:

Trunk build from 9/26 (when it shows up).

If the fix was committed on 9/25 at 7:38pm, how does it show up in the 25-20
builds, which are produced at 8 pm (and presumably take more than ~20 minutes to
pull and build).

Just checking to make sure I test the right build.
I just wanted to point out the the focus of the QA
testing should be on the stability of this patch.
That is, whether the Mozilla client crashes or
deadlocks with the patch applied.

The current code is broken on Mac OS X.  No matter
how much faster it is, it is broken.

If Simon's patch fixes the thread safety problem
on Mac OS X, it should be checked in.  There is not
much point in comparing its performance with the
performance of broken code on Mac OS X.  Any
performance comparision should be done on classic
Mac OS, where the current code works.
check it in - PDT+
Whiteboard: [OSX+],PDT,[ETA 09.26] → [OSX+],PDT+,[ETA 09.26]
i've created bug 101838 for the performance slowdown i've noticed in the
2001.09.25.20-trunk builds.
Startup Results
I take an average of three for launch and relaunch on both builds. I tested 
these manually via a stopwatch. Times are in seconds. 
Build 2001-09-18-20-trunk
Launch: 19.25 
Re-launch: 12.49
Build 2001-09-25-20-trunk
Launch: 24.17
Re-launch: 16.62

So, as you can see.. a fairly large jump in times in the 9-25 build :( 
There were no peaks or weird jumps, all times on both builds were fairly 
consistent. 
OS X v10.1, iMac DV 400MHz, 768MB RAM

Test/Build|--9/18--|--9/27--|
Startup 1 |  24.45 |  18.40 |
Startup 2 |  10.31 |  11.78 |

Startup 1 indicates that I rebooted and launched Mozilla for the first time,
Startup 2 indicates that I launched Mozilla directly after quitting the same
build of Mozilla, after a clean reboot.

Basically, I rebooted, launched build 9/18, quit, launched again - recorded
times, rebooted, launched build 9/27, quit, launched again - recorded times. 
I'll try to get some Mail Perfs now that I have a stopwatch.
I'm really sorry that I have to report this ... but, I believe I just
experienced deadlock in Mozilla, build 2001092620 - I had the spinning pizza for
at least 5 minutes and when I checked out top, Moz was hovering right around 50%
usage.  I had to forcibly kill Moz :(

I don't really know what 'caused' it ... I was viewing a web page when my wife
sent me an email, so I command-2'd over to Mail/News, hit command-t to get new
mail, then clicked in the 'Denise' folder that her messages get filtered into
and that's when the pizza started spinning.

I don't have a dual-proc machine, this is the same machine I posted the perf
results from (iMac DV 400 w/ 768MB).  If this *wasn't* deadlock, what could it
have been?
Checked in on 0.9.4 branch. Keeping open for NSPR tip.
Whiteboard: [OSX+],PDT+,[ETA 09.26]
Checked in on the NSPR tip.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
kin - I heard you have a dual processor machine.  Can you verify for QA that
that the branch build starts on your machine?
I have a G4 533 dual and Mozilla build nr. 2001093021 works great. Startup time
could be less (I hope).
lchiang - I have no dual processor machines ... I believe both jfrancis and
sfraser do.
Thank you.

I think I will mark this verified (for startup of application on dual processor
Mac) per sharky_k_@mac.com's comment.

I have also asked Marty F. at AOL to confirm for me on his dual processor for
the commercial build.  I will also inquire with one other source for the
commercial build.
Status: RESOLVED → VERIFIED
I will also test on fkeeney's dual processor which he has graciously lent to me.
 If anyone wants to come by and use it, you are more than welcome to (B20-1)
Some results:

On sdagley's G4/733 it took around 5 minutes to download 2220 headers (using
yesterday's trunk)
On fkeeney's dualG4/500 it took around 5 minutes to download 3482 headers (using
today's branch)

On sdagley's G4/733 it took around 30 seconds to download 2220 headers (using a
debug build from a few weeks ago, pre Simon's fix).  This on a non-optimized build.

As you can see this is a HUGE performance degradation.  It is literally 10 times
slower.
posted new bug for mail performance degradation: 
http://bugzilla.mozilla.org/show_bug.cgi?id=102797

lrg tested major windows and browsed around on dual CPU machine for about 3
hours and didn't find any problems (other than generally sluggish performance).
You need to log in before you can comment on or make changes to this bug.