Closed Bug 71718 Opened 24 years ago Closed 23 years ago

Carbonized Mac NSPR needs architectural work to run on dual-CPU machines

Tracking

(Not tracked)

Status:

VERIFIED FIXED

Milestone:

4.2

People

(Reporter: mikepinkerton, Assigned: sfraser_bugs)

References

Details

Attachments

(7 files, 9 obsolete files)

diffs of nsprpb 23 years ago Joe Francis 8.35 KB, patch		Details \| Diff \| Splinter Review
nspr patch 23 years ago Joe Francis 30.48 KB, patch		Details \| Diff \| Splinter Review
work in progress patch. needs some cleanup. 23 years ago Simon Fraser [no longer active] 21.28 KB, patch		Details \| Diff \| Splinter Review
Final patch for review 23 years ago Simon Fraser [no longer active] 11.25 KB, patch		Details \| Diff \| Splinter Review
Really final patch 23 years ago Simon Fraser [no longer active] 14.67 KB, patch		Details \| Diff \| Splinter Review
New patch addressing wtc's email comments 23 years ago Simon Fraser [no longer active] 13.48 KB, patch		Details \| Diff \| Splinter Review
Super-final patch 23 years ago Simon Fraser [no longer active] 16.17 KB, patch		Details \| Diff \| Splinter Review
Patch with minor revisions 23 years ago Simon Fraser [no longer active] 16.91 KB, patch		Details \| Diff \| Splinter Review
Proposed patch, based on sfraser's last patch. 23 years ago Wan-Teh Chang 13.78 KB, patch		Details \| Diff \| Splinter Review
2001.09.18.20-trunk comm page load results 23 years ago sairuh (rarely reading bugmail) 8.92 KB, text/html		Details
two runs of page load results using 2001.09.18.20-trunk 23 years ago sairuh (rarely reading bugmail) 22.59 KB, text/html		Details
OS X perf results for Mail 23 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 4.96 KB, text/html		Details
Patch to fix MPEnter/LeaveCriticalRegion on 10.0.x 23 years ago Simon Fraser [no longer active] 6.33 KB, patch		Details \| Diff \| Splinter Review
Tidied up MDEnter/Leave critical section code 23 years ago Simon Fraser [no longer active] 9.82 KB, patch	sdagley : review+	Details \| Diff \| Splinter Review
Mail performance results, take 2 (10.1, this time) 23 years ago stephend@netscape.com (gone - use stephen.donner@gmail.com instead) 5.98 KB, text/html		Details
IM results of Performance testing on OSX. Everything is slower here also. 23 years ago scalkins 2.12 KB, text/html		Details

Mike Pinkerton (not reading bugmail)

Reporter

Description

•

24 years ago

Run the fizzilla build on a dual-cpu g4 (like the one in my cube) and the app stalls loading most pages. Here is the email from Apple (vlubet@apple.com): Mike, I looked at the code on macsockotpt.c and do not see any call to OTEnterNotifier() and OTLeaveNotifier(). On Mac OS X, Open Transport notifiers, timer tasks and deferred task do not run to completion. That means that code from the main event loop or another thread may preempt code running from OT notifiers, OT timer tasks and OT deferred tasks. The solution is to always protect your data by using either atomic operations or by using OTEnterNotifier()/OTLeaveNotifier(). Bracketing your critical sections with OTEnterNotifier()/OTLeaveNotifier() is usually the most convenient solution as this ensures that either the notifier or the main event loop / cooperative thread is accessing common data. For example here is what I would do: 1369 do { Boolean entered = OTEnterNotifier(endpoint); 1370 fd->secret->md.read.thread = me; 1371 fd->secret->md.readReady = PR_FALSE; // expect the worst 1372 result = OTRcv(endpoint, buf, bytesLeft, NULL); 1373 if (fd->secret->nonblocking) { 1374 fd->secret->md.readReady = (result != kOTNoDataErr); if (entered) OTLeaveNotifier(endpoint); 1375 break; 1376 } else { 1377 if (result != kOTNoDataErr) { 1378 // If we successfully read a blocking socket, check for more data. 1379 // According to IM:OT, we should be able to rely on OTCountDataBytes 1380 // to tell us whether there is a nonzero amount of data pending. 1381 size_t count; 1382 OSErr tmpResult; 1383 tmpResult = OTCountDataBytes(endpoint, &count); 1384 fd->secret->md.readReady = ((tmpResult == kOTNoError) && (count > 0)); 1385 if (entered) OTLeaveNotifier(endpoint); 1386 break; 1387 } 1388 1389 // Blocking read, but no data available. Wait for an event 1390 // on this endpoint, and hope that we get a T_DATA event. if (entered) OTLeaveNotifier(endpoint); 1391 WaitOnThisThread(me, PR_INTERVAL_NO_TIMEOUT); 1392 result = me->md.osErrCode; 1393 if (result != kOTNoError) // interrupted thread, etc. 1394 break; 1395 1396 // Prepare to loop back and try again 1397 PrepareForAsyncCompletion(me, fd->secret->md.osfd); 1398 } 1399 } 1400 // Retry read if we had to wait for data to show up. 1401 while(1); It certainly behaves like our endpoints are being clobbered in a race-condition- ish situation.

Mike Pinkerton (not reading bugmail)

Reporter

Updated

•

24 years ago

Keywords: correctness, mozilla0.8.1, nsbeta1, nsmac1

Steve Dagley

Comment 1

•

24 years ago

Apple has shipped us a dual CPU G4 so we don't have to camp in pink's cube trying to debug this

Status: NEW → ASSIGNED

Keywords: mozilla0.8.1

Target Milestone: --- → mozilla0.9.1

Simon Fraser [no longer active]

Assignee

Comment 2

•

24 years ago

They have? Can I have it? ;)

Steve Dagley

Comment 3

•

24 years ago

sfraser has the dual CPU loaner Mac, now he has the bug too :-)

Assignee: sdagley → sfraser

Status: ASSIGNED → NEW

Simon Fraser [no longer active]

Assignee

Updated

•

24 years ago

Priority: -- → P1

rubydoo123

Comment 4

•

24 years ago

moving to 9.2

Target Milestone: mozilla0.9.1 → mozilla0.9.2

Steve Dagley

Comment 5

•

24 years ago

I'll just point out that the dual-CPU loaner system is due back to Apple on 6/1 so that, while this fix may not be critical for 0.9.1, it needs to land well before 0.9.2

Steve Dagley

Updated

•

24 years ago

Whiteboard: OSX+

benc

Comment 6

•

24 years ago

mass move, v2. qa to me.

QA Contact: tever → benc

Simon Fraser [no longer active]

Assignee

Comment 7

•

23 years ago

The Mac NSPR code has serious issues on Mac OS X, which are particularly visible on machines with two processors. The main issue here is that OT notifiers, async file I/O callbacks, and timer tasks each use pthreads under the hood, and are thus interruptible. This breaks assumptions that the mac code currently makes relating to the switching on and off of interrupts, and lock consistency. It would appear that the only good solution to this problem is to build NSPR as a Mach-O binary, and have it use the Unix implementation. If we do that, we need to ensure that the implementation permits clients to call any Carbon Toolbox functions on threads, as they can now. It would be bad to move to pthreads, and to find, suddenly, that bad things happen when trying to do, say, drawing, or memory allocation, on non-main threads.

Assignee: sfraser → wtc

Component: Networking → NSPR

Product: Browser → NSPR

QA Contact: benc → larryh

Summary: Most pages stall on dual-cpu G4 → Carbonized Mac NSPR needs architectural work to run on dual-CPU machines

Target Milestone: mozilla0.9.2 → ---

Wan-Teh Chang

Comment 8

•

23 years ago

Steve, could you please work on this bug? We already have an implementation of NSPR for Mac OS X based on pthreads.

Assignee: wtc → sdagley

Joe Francis

Comment 9

•

23 years ago

wtc, can you tell me which nspr defines need to be lit to do a nspr for mac os x build?

Wan-Teh Chang

Comment 10

•

23 years ago

The only way to build NSPR for Mac OS X is to use the Unix build instructions, i.e., configure and make. I don't know how to integrate this with Code Warrior.

Steve Dagley

Comment 11

•

23 years ago

I spoke to Quinn at Apple's WWDC about calling into Mach-O from CFM and he pointed me to <http://developer.apple.com/samplecode/Sample_Code/ Runtime_Architecture/CallMachOFramework.htm>. It's intended for calling frameworks but I'm hoping it can be adapted to general Mach-O libs.

Joe Francis

Comment 12

•

23 years ago

Thanks. As it happens I found that on my own. And I got that much working... I can load the System framework at _MD_EARLY_INIT or whatever it's called. But then I'm stuck. I have no pthread headers that will work with CW. In fact, just to get the first part working I had to copy over the Universal folder from CW7EA. There was needed stuff missing from the headers otherwise. I tried to handroll my own pthread declararions, because I need only a few calls. But unfortunately initializing a pthread mutex requires a *structure* as one of the params (what the heck were those posix folks thinking?) so it gets complicated to get it all correct. I really need posix headers that work with CodeWarrior. Or I could try porting nspr over to ProjectBuilder. Then I could make a knock- off nspr lib that had pthreads locks in the right places to get dual cpu macs happy.

Joe Francis

Comment 13

•

23 years ago

oops. Sorry Steve, I thought you were replying to a question I asked in another forum. You are actually looking at the opposite solution from what I am thinking. You want to make a mach-o nspr and clal it from cfm fizzilla, right? I want to make the current cfm nspr use pthread locks from the system framework make the INTS_OFF/INTS_ON stuff in nspr actually acquire a real lock that the underlying os will honor. I'm worried about using real pthreads in mac nspr because I think you can only make most system calls from the main thread and I don't know that we can enforce that restriction on the current code base. So I'm thinking keep the nspr threads and just enforce syncronization with the callbacks/timers/notifiers using a real lock.

Joe Francis

Comment 14

•

23 years ago

ok, new idea. make the INTS_ON/INTS_OFF stuff enter and leave a critical region. That way everything turning on/off nspr ints is competing on the same critical region, acting just like a lock. The reason this is good is that the mac MP api supports this, and on mac os x it is just a wrapper for posix stuff, so the underlying os threads running the callbacks/timers/notifiers will respect this. I'll wager the MP headers don't pose any problems for our build system, so it should be easy to do. i'll try it later today and see.

Mike Pinkerton (not reading bugmail)

Reporter

Comment 15

•

23 years ago

two thoughts: - pro7 can build mach-o, so why aren't all the pthread headers there? - aren't the correct pthread headers on the OSX developer install for you to look at?

Simon Fraser [no longer active]

Assignee

Comment 16

•

23 years ago

Yes, you will be able to call MP stuff. This might be one way to go.

Wan-Teh Chang

Updated

•

23 years ago

Target Milestone: --- → 4.2

Mike Pinkerton (not reading bugmail)

Reporter

Comment 17

•

23 years ago

beta stopper

Whiteboard: OSX+ → OSX++

Joe Francis

Comment 18

•

23 years ago

I've played around with this some. My first attempt seems to just lock all the threads up. The chief difficulty is that _PR_INTSOFF retreives a value that indicates whether the region is already entered. But the MP api does not expose any way to get that info for a critical region. The result is that the nspr code triggers more entries into the critical region than exits. I think it may be possible to make _PR_INTSOFF & _PR_INTSON not care about that value. The number of places in the code that actually examine that value are few, and are almost entirely in mac only code. I think this value is only needed because _PR_INTSOFF doesn't currently acquire a "real" lock of any kind. I'll play with this some more over the weekend.

gordon

Comment 19

•

23 years ago

How about keeping a counter for the number of times _PR_INTSOFF has been called, and only enter/leave the critical section on 0/1 transitions, sort of like destructors and reference counting. That should have similar semantics to the current usage of _PR_INTSOFF. Or it may be even simpler than that. You may just need to keep a flag indicating whether you're in the critical section, and return that flag. If you're around this afternoon, I'd love to talk with you about it.

Mike Pinkerton (not reading bugmail)

Reporter

Comment 20

•

23 years ago

you would probably need a separate lock on this counter too, as you could be pre-empted between checking the value and then acting on it causing a race-condition.

Joe Francis

Comment 21

•

23 years ago

re gordon: i was already trying that. re mike: i was entering my critical section before using the counter. there is some code around that saves the counter, and then enters the intsOff state setting the counter to 1, and then restores the counter when it's done. This will have the effect of not turning ints back on if they weren't on before, but it totally demolishes the concept of the counter being a reentrancy count. i think i can just not use a counter at all, and tweak the code in the handful of places that use it. Almost all are in mac only code, and most of it is to detect when a notification might have been dropped becasue it came while ints were off. Under osx the notification wont be dropperd in that situation: if someone else is in the critical section when the notification comes in, the notification will block on the critical section, and then fire when it can enter.

Joe Francis

Comment 22

•

23 years ago

at request of macdev, attaching my patches. they DO NOT work. But they show the lines I am thinking along.

Joe Francis

Comment 23

•

23 years ago

Attached patch diffs of nsprpb (obsolete) — Details — Splinter Review

Joe Francis

Comment 24

•

23 years ago

a second attempt. not sure yet why this one doesn't work.

Joe Francis

Comment 25

•

23 years ago

Attached patch nspr patch (obsolete) — Details — Splinter Review

lchiang

Comment 26

•

23 years ago

sdagley - are any of the machines which we'll get from Apple for testing a dual-CPU machine? If dual-CPU machine seems to have it's own set of problems, we may need to test on that.

Simon Fraser [no longer active]

Assignee

Comment 27

•

23 years ago

We should *definately* test on dual-CPU machines, and it would be good for QA to invest in a few of these. Note that, currently, our builds are pretty much dead on such machines running Mac OS X because of this bug.

Mike Pinkerton (not reading bugmail)

Reporter

Updated

•

23 years ago

Whiteboard: OSX++ → OSX++, PDT+

Steve Dagley

Comment 28

•

23 years ago

lchiang, we aren't getting any dual CPUs in the new batch but Joe Francis has voluntereed his dual-cpu system for testing while he's on sabbatical next month

Keywords: nsBranch

Marek Z. Jeziorek

Comment 29

•

23 years ago

Steve, this is PDT+ bug. Tomorrow, on Tuesday, we'll try to build the first RTM candidate. It would be good, if this could be resolved ASAP.

chris hofmann

Comment 30

•

23 years ago

what going on with this bug? getting close to the time where it may have to be latered to future milestone/release if we are not there, or past there already...

Steve Dagley

Comment 31

•

23 years ago

This bug is specific to Mac OS X and does not impact the 'Classic' builds for Mac OS 8.6/9.x. As to where we're at with it, as an AOL employee I can't use the appropriate profanity in this public forum to describe it so I'll just say we don't expect to have a fix this month.

Boris Zbarsky [:bzbarsky]

Comment 32

•

23 years ago

*** Bug 90941 has been marked as a duplicate of this bug. ***

Mike Pinkerton (not reading bugmail)

Reporter

Comment 33

•

23 years ago

we're not gonna have this fixed by beta, removing pdt+

Whiteboard: OSX++, PDT+ → OSX++

Steve Dagley

Updated

•

23 years ago

Whiteboard: OSX++ → OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture

Mike Pinkerton (not reading bugmail)

Reporter

Comment 34

•

23 years ago

*** Bug 93824 has been marked as a duplicate of this bug. ***

Mike Pinkerton (not reading bugmail)

Reporter

Comment 35

•

23 years ago

*** Bug 93720 has been marked as a duplicate of this bug. ***

Adam Masri

Comment 36

•

23 years ago

The following Apple tech note may be helpful in resolving this bug, as well as giving Fizzilla better OS X performance. http://developer.apple.com/technotes/tn/tn2028.html - Adam

Tiantian Kong

Comment 37

•

23 years ago

Added keyword: nsenterprise+

Keywords: nsenterprise+

Steve Dagley

Comment 38

•

23 years ago

Simon is now working on this so reassigning bug

Assignee: sdagley → sfraser

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Whiteboard: OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture → OSX++

Simon Fraser [no longer active]

Assignee

Comment 39

•

23 years ago

Attached patch work in progress patch. needs some cleanup. (obsolete) — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #38164 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #38876 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 40

•

23 years ago

Attached patch Final patch for review (obsolete) — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #49135 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 41

•

23 years ago

That last patch works. On single CPU machines, it's performance impact is below the noise threshold of the page loading tests I used to measure it. It uses MP critical sections to fix thread synchronization issues, blocking multiple MP threads from running while NSPR interrupts are turned off. The code is currently active if TARGET_CARBON is defined to be 1. It might be necessary to detect the OS version using Gestalt if this affects Carbon under 9 performance (which is unknown). I need some strong reviews, please.

Status: NEW → ASSIGNED

Simon Fraser [no longer active]

Assignee

Comment 42

•

23 years ago

I have some further changes to the patch, to conditionalize the critical region code for Mac OS X only, and to add some explanatory comments. Please wait...

Jussi-Pekka Mantere

Updated

•

23 years ago

Blocks: 97866

chris hofmann

Comment 43

•

23 years ago

is this going to make it soon? putting on the "+"

Keywords: nsbranch → nsbranch+

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #49168 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 44

•

23 years ago

Attached patch Really final patch (obsolete) — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Comment 45

•

23 years ago

The latest patch does two new things: 1. Return non-Carbon builds to their previous behaviour (just some macro changes). 2. In carbon builds, conditionally uses critical sections only if we are running on Mac OS X. So Carbon-on-classic builds are not penalized.

Simon Fraser [no longer active]

Assignee

Comment 46

•

23 years ago

Attached patch New patch addressing wtc's email comments (obsolete) — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #49391 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 47

•

23 years ago

The new patch: * Replaces the array of critical sections with just one (since I only used one), which now matches the single critical region entry count. * Removes a bogus WalkTheStack prototype.

Wan-Teh Chang

Comment 48

•

23 years ago

I reviewed Simon's "really final patch" (attachment 49391 [details] [diff] [review]). I already sent him some suggested changes after a quick look. Then I stared at the patch for a good hour. After I internalized the patch, I found that Simon's solution is actually very simple. (It is all those macros that makes the patch look complicated.) It is exactly what jfrancis outlined in his comments back in June. It surprises me that the simple solution is enough to solve this problem. Congratulations to all of you who worked on this bug! I will go over the patch with Simon next week. I just wanted to point out that the patch also contains a fix for bug 97866. That bug is marked as depending on this bug, but I think it is not related and its fix should be reviewed and checked in independently.

lchiang

Comment 49

•

23 years ago

(This looks close to a fix - great! Who at Netscape has a dual-processor machine besides kin? Once this is fixed, we need to do some testing.)

Jaime Rodriguez, Jr.

Comment 50

•

23 years ago

PDT+ per PDT. Let's get one.

Whiteboard: OSX++ → OSX+,PDT+

benc

Comment 51

•

23 years ago

Sorry for the somewhat basic quizzing but: 1- This bug discusses fixing the MacOS X builds only right? So it is completely irrelevant to running MacOS X w/ classic and classic build. 2- What happened if you ran MacOS X builds on dual-processor systems before this fix? (Boom?!). Did this affect dual processor systems in pre-X? Need this info to figure out how much upgrading is needed for testing.

Mike Pinkerton (not reading bugmail)

Reporter

Comment 52

•

23 years ago

ben: 1) correct, this does not affect the classic bits (non-carbon) running on osx in the classic environment 2) pages would stall, and never finish loading. sometimes the app would crash when you quit after failing to load pages.

Simon Fraser [no longer active]

Assignee

Comment 53

•

23 years ago

Attached patch Super-final patch (obsolete) — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #49408 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 54

•

23 years ago

The super-final patch: * removes the async file I/O changes, which are now in a patch in bug 97866 * replaces separate intsoff/intson implemenations with a single, simplified intsoff routine, and removes some macro code duplication. * Uses intsoff/fastintson rather than setintsoff(1)/setintsoff(0) in a couple of callback routines. * Improves the comments in _MD_PauseCPU and DoneWaitingOnThisThread(). gordon, please review. wtc, if you'll then bless this, I can check it in.

gordon

Comment 55

•

23 years ago

r=gordon, unless you want to clean up _macos.h or add a wrapper to DoneWaitingOnThisThread(). wtc, the #ifdefs in primpl.h give me headaches. Are you sure they're correct? There seem to be redundant nesting of global and local thread cases.

Simon Fraser [no longer active]

Assignee

Comment 56

•

23 years ago

And now the bad news. Adding the critical regions on Mac OS X does have some perceptible impact on performance in some areas. Page loading is not affected, but some UI operations are (I noticed, for example, that loading the buddy list in a commercial build got slower), and the browser feels someone less responsive on X. It may be that parts of the code that rely a lot on PR_Lock/PR_Unlock, or atomic operations, are slowed down by this patch.

Jaime Rodriguez, Jr.

Comment 57

•

23 years ago

This is not on Tian Tian Must Fix list for this release. Should we consider minusing it, and changing the priority for this release?

Brian Nesse (gone)

Comment 58

•

23 years ago

sdagley commenting here - If this isn't on tiantian's list it damn sure should be - this is most DEFINITELY a stop ship bug. Otherwise explain to Marty Fisher hy we don't run on his new Mac :-)

Steve Dagley

Updated

•

23 years ago

Whiteboard: OSX+,PDT+ → [OSX+],PDT+

Joe Francis

Comment 59

•

23 years ago

Seconding Steve's comments. As someone involved with the evolution of this bug fix I feel we are much better off with it than without it. Both MacOS X and dual CPU macs are catching on pretty quickly. Lets not leave those folks stranded.

Tiantian Kong

Comment 60

•

23 years ago

Hi: This bug is the FIRST one listed on my stop ship bug list compiled yesterday. Please check again the list on http://bugscape.netscape.com/show_bug.cgi?id=9485. And it is the first bug on the 8/21 list as well.

Jaime Rodriguez, Jr.

Comment 61

•

23 years ago

Check it into the trunk, and let's get some testing numbers (New Window, Startup, etc.). Tian Tian and Hong to coordinate test efforts.

Simon Fraser [no longer active]

Assignee

Comment 62

•

23 years ago

Attached patch Patch with minor revisions — Details — Splinter Review

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #49782 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 63

•

23 years ago

Chnages in this patch: * removed redundant duplicate of _MD_INTSOFF macro in _macos.h * In code blocks like this: if (_PR_MD_GET_INTSOFF()) { thread->md.missedIONotify = PR_TRUE; cpu->u.missed[cpu->where] |= _PR_MISSED_IO; return; } which are outside of the critical region, reversed the order of setting missedIONotify and the _PR_MISSED_IO to mitigate the effects of preemption at this point. Note that the patch is a diff -b (ignoring space change), so spacing looks worse than it really is.

gordon

Comment 64

•

23 years ago

Comment on attachment 49998 [details] [diff] [review] Patch with minor revisions r=gordon

gordon

Comment 65

•

23 years ago

I was unable to edit patch status to specify "has-review". Does that have something to do with the Product field?

lchiang

Comment 66

•

23 years ago

I will coordinate the testing for the perf numbers. Results to be posted when they come in. Per simon: 1. Tests run on single processor CPU (we don't have dual handy) 2. Run on today's trunk build vs. tomorrow's trunk build 3. Run: window open, startup, IM, prefs, mail performance tests 4. Results posted back here as attachments Thanks

Wan-Teh Chang

Comment 67

•

23 years ago

Attached patch Proposed patch, based on sfraser's last patch. — Details — Splinter Review

Wan-Teh Chang

Comment 68

•

23 years ago

Most of the changes in my patch from Simon's last patch are a matter of style. - I make sure no lines are longer than 80 characters. - I do not make whitespace changes. This is to make the diffs cleaner. - Whether tabs or spaces are used follows the local convention. Again, this is to make the diffs easier to read. - "Regions" and "REGIONS" are changed to "Region" and "REGION" because there is only one critical region. - "TermCriticalRegions" is renamed "DestroyCriticalRegion". - Some macro definitions were moved between primpl.h and _macos.h and the intermediate macro _MD_INTSOFF() was deleted. - gUseCriticalRegion and gCriticalRegion are declared static because they are only used in macthr.c. The real changes are: - In _PR_INTSOFF(), I leave the critical region immediately so that the entry count is always 1. This way, in _PR_FAST_INTSON() and _PR_INTSON(), I only need to leave the critical region once if we need to leave the critical region completely. As a result, the global variable gCriticalRegionEntryCount can be deleted. - In macio.c, AsyncIOCompletion(), _PR_INTSOFF() and _PR_FAST_INTSON() are not necessary because they are called in DoneWaitingOnThisThread(). I also moved the setting of thread->md.osErrCode before the if block. Simon, please review my patch and test it. If it doesn't work and it takes too long to fix it, you can go ahead and check in your patch on the NSPRPUB_PRE_4_2_CLIENT_BRANCH. I will check email from home tonight.

Wan-Teh Chang

Comment 69

•

23 years ago

The sentence In _PR_INTSOFF(), I leave the critical region immediately so that the entry count is always 1. should read In _PR_INTSOFF(), I leave the critical region immediately if we are already in the critical region so that the entry count is always 1.

Simon Fraser [no longer active]

Assignee

Comment 70

•

23 years ago

wtc, your final patch does not work, alas. As-is, it deadlocks on launch. I noticed that this seems wrong: #define _PR_INTSOFF(_is) \ PR_BEGIN_MACRO \ + ENTER_CRITICAL_REGION(); \ (_is) = _PR_MD_GET_INTSOFF(); \ _PR_MD_SET_INTSOFF(1); \ + if (_is) \ + LEAVE_CRITICAL_REGION(); \ PR_END_MACRO and should probably be _PR_MD_SET_INTSOFF(1); \ + if (_is == 0) \ + LEAVE_CRITICAL_REGION(); \ but with that fixed, we get a slew of assertions about leaving non-existent critical sections.

Simon Fraser [no longer active]

Assignee

Comment 71

•

23 years ago

I checked my previous patch into the NSPRPUB_PRE_4_2_CLIENT_BRANCH at 8:45pm, so it will show up in trunk builds tomorrow (9/20).

sairuh (rarely reading bugmail)

Comment 72

•

23 years ago

trunk testing [BEFORE checkin] using: * 2001.09.18.20-trunk commercial * on Mac OS 10.0.4, 450 Mhz G3, 384Mhz * modern theme [note: separate profiles used for prefs and page load tests] Preferences dialog open: 2.50sec Switching from Navigator to Mail & Newsgroups panel: 0.70sec will soon attach page load results [side note: crashed several times after page load completed --see bug 100852].

sairuh (rarely reading bugmail)

Comment 73

•

23 years ago

Attached file 2001.09.18.20-trunk comm page load results (obsolete) — Details

sairuh (rarely reading bugmail)

Comment 74

•

23 years ago

Comment on attachment 50169 [details] 2001.09.18.20-trunk comm page load results re-ran the pre-checkin page load tests, thus invalidating this run. will reattach new set.

Attachment #50169 - Attachment is obsolete: true

sairuh (rarely reading bugmail)

Comment 75

•

23 years ago

Attached file two runs of page load results using 2001.09.18.20-trunk — Details

sairuh (rarely reading bugmail)

Comment 76

•

23 years ago

tried running the page load tests using the 2001.09.20.16-trunk bits: alas, it would not complete. it hung [but not on the same page] each of the three times i tried --even after rebooting the machine after the second hang. :( will try to run the preferences perf tests. hopefully that will go more smoothly...

sairuh (rarely reading bugmail)

Comment 77

•

23 years ago

alas, i'm blocked with the preferences testing as well. :( i need to restart the app btwn each dialog-open timing [to get four timings and report their average]. i was able to launch the app a first time with a new profile...but with the second launch the browser window didn't even appear: only got the menubar and the rainbow spinning cursor. restarting the machine kinda "works around" the hang, but it's a rather harsh workaround. another note: simon mentioned that this might be a 10.0.4 issue --since he was recently able to run the page load tests [same 2001.09.20.16-trunk build] on his dual-cpu G4 [which has a later version of 10.x].

sairuh (rarely reading bugmail)

Comment 78

•

23 years ago

recent email rec'd from todds@bigfoot.com: "...I've been following this bug since I want to use mozilla on my dual 800 with OS X 10.0.4. "I just noticed your most recent comment at 22:17, and wanted to let you know that I am experiencing the same spinning rainbow cursor, and the browser never starts when using the 2001.09.20 build. "Hope that helps in any way, let me know if I can help otherwise."

Simon Fraser [no longer active]

Assignee

Comment 79

•

23 years ago

Hold off on testing here folks. It looks like my changes causes deadlock on single and dual-CPU machines running 10.0.x builds. Damn. Running a new build now, more tomorrow.

Jussi-Pekka Mantere

Comment 80

•

23 years ago

Running 10.1 would not affect the deadlock?

Jaime Rodriguez, Jr.

Comment 81

•

23 years ago

Let's find out if this will ever work on 10.04? Simon is researching ...

Brian Bergstrand

Comment 82

•

23 years ago

I think 10.0.x is a lost cause. I just got a new dual-800, and was having MAJOR stability problems with 10.0.4 4S10. The machine would randomly crash. I posted to the Darwin list, and was told 10.0.4 has major SMP issues and to wait for 10.1. Let's just say this was good advice. See http://www.darwinfo.org/devlist.php3?number=11117 for the full message.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 83

•

23 years ago

Attached file OS X perf results for Mail — Details

lchiang

Comment 84

•

23 years ago

Looking at Stephen's mail results for OS X for 9-18 vs. 9-21 builds, there is some amount of slow down in the 9-21 build.

Steve Dagley

Comment 85

•

23 years ago

Can we get those tests re-run with Mac OS X 10.abouttobereleased?

Brice Ruth

Comment 86

•

23 years ago

Do you need a dual CPU or does a single CPU work for the speed tests? If a single CPU works, lemme know exactly which files to grab for the tests to be consistent.

lchiang

Comment 87

•

23 years ago

Brice - thanks for the offer to help. For the speed tests, here is one of them (Mail). All the test files and methodology are there. The important thing is to use the same machine for the before/after builds with simon's fix. A single CPU is fine for this speed test. Do you have version 10.1 of the OS? If so, that would be great. sdagley - can you give me a CD with the updated OS?

Brice Ruth

Comment 88

•

23 years ago

I looked at the perf results attachment and I think I can do all that ... gotta see about the stopwatch, though :) If I have any questions, I'll let ya know. I assume grabbing today's 'nightly' is fine enough for a build w/ the patch incorporated?

Simon Fraser [no longer active]

Assignee

Comment 89

•

23 years ago

Here's the deal. The MP APIs that I'm using for dual CPU protection are broken under 10.0.x, but fixed for 10.1. I have code that works around this problem, that will make things work on 10.0.x. So I need to check that in, and we then have to do another round of performance testing.

Brice Ruth

Comment 90

•

23 years ago

Oh - and I can't confirm anything about having 10.1 - to my knowledge, there is no such release yet. :)

Simon Fraser [no longer active]

Assignee

Comment 91

•

23 years ago

Attached patch Patch to fix MPEnter/LeaveCriticalRegion on 10.0.x (obsolete) — Details — Splinter Review

grega

Updated

•

23 years ago

Keywords: nsenterprise+

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 92

•

23 years ago

Brice, yes. Thanks for helping with this. So, to summarize: Folder loading speed: done with local folders Thread pane scrolling: local or IMAP Message display speed: local folders Thread pane sorting: local or imap Message Reply Speed: IMAP, see http://www.mozilla.org/mailnews/performance/msgcompose_loggin.html for details on how to generate a performance logfile for this. Batch delete 5 (10kb each) messages: local Mail module loading time: I wouldn't bother with this, it's too complex currently (using IMAP, non-isolated servers, etc.) to be truly reliable.

Keywords: nsenterprise+

grega

Comment 93

•

23 years ago

marking nsenterprise- to get off the enterprise stop ship list.

Keywords: nsenterprise-

grega

Updated

•

23 years ago

Keywords: nsenterprise+

Simon Fraser [no longer active]

Assignee

Comment 94

•

23 years ago

I tested the lastest changes on 10.1 single- and dual- CPU machines, and both work just fine.

lchiang

Comment 95

•

23 years ago

I have the 10.1 update to Mac OS X. Stephen or Sairuh, pls stop by to install for testing. Simon - do you expect to see a timing difference between OS X and 10.1?

Simon Fraser [no longer active]

Assignee

Updated

•

23 years ago

Attachment #50595 - Attachment is obsolete: true

Simon Fraser [no longer active]

Assignee

Comment 96

•

23 years ago

Attached patch Tidied up MDEnter/Leave critical section code — Details — Splinter Review

sairuh (rarely reading bugmail)

Comment 97

•

23 years ago

spoke with lisa briefly: stephend upgraded to 10.1, but i'll stick with 10.0.4 for the nonce --so that we can test with smfr's latest patch [after review/check-in] on both OS versions.

Steve Dagley

Comment 98

•

23 years ago

Comment on attachment 50724 [details] [diff] [review] Tidied up MDEnter/Leave critical section code r=sdagley

Attachment #50724 - Flags: review+

Simon Fraser [no longer active]

Assignee

Comment 99

•

23 years ago

I checked in the working critical region code to the NSPR branch, which is used by the mozilla TRUNK. So this bug is now good to go on for testing.

Joe Francis

Comment 100

•

23 years ago

holy crap. So that's why I couldn't get this to work before - I was doing the same thing under 10.0.4

Jaime Rodriguez, Jr.

Comment 101

•

23 years ago

Simon - Pls get the SR=, so we can plus this one, and check it in today.

Whiteboard: [OSX+],PDT+ → [OSX+],PDT

Jaime Rodriguez, Jr.

Updated

•

23 years ago

Whiteboard: [OSX+],PDT → [OSX+],PDT,[ETA 09.26]

Simon Fraser [no longer active]

Assignee

Comment 102

•

23 years ago

This has all the reviews it needs to be checked in to the branch. It's already on the trunk.

sairuh (rarely reading bugmail)

Comment 103

•

23 years ago

trunk testing of Preferences panel: compared with my 2001-09-20 16:31 results. same configuration, with a new profile [modern, no sidebar]. AFTER checkin, using 2001.09.25.20-trunk ---------------------------------------- Open Preferences dialog open: 3.39sec Switching from Navigator to Mail & Newsgroups panel: 0.97sec BEFORE checkin, using 2001.09.18.20-trunk ----------------------------------------- Preferences dialog open: 2.50sec Switching from Navigator to Mail & Newsgroups panel: 0.70sec looks like this checkin might've slowed thing down a bit. currently running page load tests. will attach when finished.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 104

•

23 years ago

I thought we were supposed to test on the 26th builds: AFTER build: Trunk build from 9/26 (when it shows up). If the fix was committed on 9/25 at 7:38pm, how does it show up in the 25-20 builds, which are produced at 8 pm (and presumably take more than ~20 minutes to pull and build). Just checking to make sure I test the right build.

Wan-Teh Chang

Comment 105

•

23 years ago

I just wanted to point out the the focus of the QA testing should be on the stability of this patch. That is, whether the Mozilla client crashes or deadlocks with the patch applied. The current code is broken on Mac OS X. No matter how much faster it is, it is broken. If Simon's patch fixes the thread safety problem on Mac OS X, it should be checked in. There is not much point in comparing its performance with the performance of broken code on Mac OS X. Any performance comparision should be done on classic Mac OS, where the current code works.

Jaime Rodriguez, Jr.

Comment 106

•

23 years ago

check it in - PDT+

Whiteboard: [OSX+],PDT,[ETA 09.26] → [OSX+],PDT+,[ETA 09.26]

sairuh (rarely reading bugmail)

Comment 107

•

23 years ago

i've created bug 101838 for the performance slowdown i've noticed in the 2001.09.25.20-trunk builds.

stephend@netscape.com (gone - use stephen.donner@gmail.com instead)

Comment 108

•

23 years ago

Attached file Mail performance results, take 2 (10.1, this time) — Details

Jennifer Kobayashi

Comment 109

•

23 years ago

Startup Results I take an average of three for launch and relaunch on both builds. I tested these manually via a stopwatch. Times are in seconds. Build 2001-09-18-20-trunk Launch: 19.25 Re-launch: 12.49 Build 2001-09-25-20-trunk Launch: 24.17 Re-launch: 16.62 So, as you can see.. a fairly large jump in times in the 9-25 build :( There were no peaks or weird jumps, all times on both builds were fairly consistent.

Brice Ruth

Comment 110

•

23 years ago

OS X v10.1, iMac DV 400MHz, 768MB RAM Test/Build|--9/18--|--9/27--| Startup 1 | 24.45 | 18.40 | Startup 2 | 10.31 | 11.78 | Startup 1 indicates that I rebooted and launched Mozilla for the first time, Startup 2 indicates that I launched Mozilla directly after quitting the same build of Mozilla, after a clean reboot. Basically, I rebooted, launched build 9/18, quit, launched again - recorded times, rebooted, launched build 9/27, quit, launched again - recorded times. I'll try to get some Mail Perfs now that I have a stopwatch.

scalkins

Comment 111

•

23 years ago

Attached file IM results of Performance testing on OSX. Everything is slower here also. — Details

Brice Ruth

Comment 112

•

23 years ago

I'm really sorry that I have to report this ... but, I believe I just experienced deadlock in Mozilla, build 2001092620 - I had the spinning pizza for at least 5 minutes and when I checked out top, Moz was hovering right around 50% usage. I had to forcibly kill Moz :( I don't really know what 'caused' it ... I was viewing a web page when my wife sent me an email, so I command-2'd over to Mail/News, hit command-t to get new mail, then clicked in the 'Denise' folder that her messages get filtered into and that's when the pizza started spinning. I don't have a dual-proc machine, this is the same machine I posted the perf results from (iMac DV 400 w/ 768MB). If this *wasn't* deadlock, what could it have been?

Simon Fraser [no longer active]

Assignee

Comment 113

•

23 years ago

Checked in on 0.9.4 branch. Keeping open for NSPR tip.

Keywords: nsbeta1, nsbranch+, nsenterprise-, nsmac1

Whiteboard: [OSX+],PDT+,[ETA 09.26]

Simon Fraser [no longer active]

Assignee

Comment 114

•

23 years ago

Checked in on the NSPR tip.

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

lchiang

Comment 115

•

23 years ago

kin - I heard you have a dual processor machine. Can you verify for QA that that the branch build starts on your machine?

sharky_k_

Comment 116

•

23 years ago

I have a G4 533 dual and Mozilla build nr. 2001093021 works great. Startup time could be less (I hope).

kinmoz

Comment 117

•

23 years ago

lchiang - I have no dual processor machines ... I believe both jfrancis and sfraser do.

lchiang

Comment 118

•

23 years ago

Thank you. I think I will mark this verified (for startup of application on dual processor Mac) per sharky_k_@mac.com's comment. I have also asked Marty F. at AOL to confirm for me on his dual processor for the commercial build. I will also inquire with one other source for the commercial build.

Status: RESOLVED → VERIFIED

Hong Kwon

Comment 119

•

23 years ago

I will also test on fkeeney's dual processor which he has graciously lent to me. If anyone wants to come by and use it, you are more than welcome to (B20-1)

Hong Kwon

Comment 120

•

23 years ago

Some results: On sdagley's G4/733 it took around 5 minutes to download 2220 headers (using yesterday's trunk) On fkeeney's dualG4/500 it took around 5 minutes to download 3482 headers (using today's branch) On sdagley's G4/733 it took around 30 seconds to download 2220 headers (using a debug build from a few weeks ago, pre Simon's fix). This on a non-optimized build. As you can see this is a HUGE performance degradation. It is literally 10 times slower.

Hong Kwon

Comment 121

•

23 years ago

posted new bug for mail performance degradation: http://bugzilla.mozilla.org/show_bug.cgi?id=102797 lrg tested major windows and browsed around on dual CPU machine for about 3 hours and didn't find any problems (other than generally sluggish performance).

You need to log in before you can comment on or make changes to this bug.