Closed Bug 71718 Opened 24 years ago Closed 23 years ago

Carbonized Mac NSPR needs architectural work to run on dual-CPU machines

Categories

(NSPR :: NSPR, defect, P1)

PowerPC
macOS
defect

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: mikepinkerton, Assigned: sfraser_bugs)

References

Details

Attachments

(7 files, 9 obsolete files)

16.91 KB, patch
Details | Diff | Splinter Review
13.78 KB, patch
Details | Diff | Splinter Review
22.59 KB, text/html
Details
4.96 KB, text/html
Details
9.82 KB, patch
sdagley
: review+
Details | Diff | Splinter Review
5.98 KB, text/html
Details
2.12 KB, text/html
Details
Run the fizzilla build on a dual-cpu g4 (like the one in my cube) and the app stalls loading most pages. Here is the email from Apple (vlubet@apple.com): Mike, I looked at the code on macsockotpt.c and do not see any call to OTEnterNotifier() and OTLeaveNotifier(). On Mac OS X, Open Transport notifiers, timer tasks and deferred task do not run to completion. That means that code from the main event loop or another thread may preempt code running from OT notifiers, OT timer tasks and OT deferred tasks. The solution is to always protect your data by using either atomic operations or by using OTEnterNotifier()/OTLeaveNotifier(). Bracketing your critical sections with OTEnterNotifier()/OTLeaveNotifier() is usually the most convenient solution as this ensures that either the notifier or the main event loop / cooperative thread is accessing common data. For example here is what I would do: 1369 do { Boolean entered = OTEnterNotifier(endpoint); 1370 fd->secret->md.read.thread = me; 1371 fd->secret->md.readReady = PR_FALSE; // expect the worst 1372 result = OTRcv(endpoint, buf, bytesLeft, NULL); 1373 if (fd->secret->nonblocking) { 1374 fd->secret->md.readReady = (result != kOTNoDataErr); if (entered) OTLeaveNotifier(endpoint); 1375 break; 1376 } else { 1377 if (result != kOTNoDataErr) { 1378 // If we successfully read a blocking socket, check for more data. 1379 // According to IM:OT, we should be able to rely on OTCountDataBytes 1380 // to tell us whether there is a nonzero amount of data pending. 1381 size_t count; 1382 OSErr tmpResult; 1383 tmpResult = OTCountDataBytes(endpoint, &count); 1384 fd->secret->md.readReady = ((tmpResult == kOTNoError) && (count > 0)); 1385 if (entered) OTLeaveNotifier(endpoint); 1386 break; 1387 } 1388 1389 // Blocking read, but no data available. Wait for an event 1390 // on this endpoint, and hope that we get a T_DATA event. if (entered) OTLeaveNotifier(endpoint); 1391 WaitOnThisThread(me, PR_INTERVAL_NO_TIMEOUT); 1392 result = me->md.osErrCode; 1393 if (result != kOTNoError) // interrupted thread, etc. 1394 break; 1395 1396 // Prepare to loop back and try again 1397 PrepareForAsyncCompletion(me, fd->secret->md.osfd); 1398 } 1399 } 1400 // Retry read if we had to wait for data to show up. 1401 while(1); It certainly behaves like our endpoints are being clobbered in a race-condition- ish situation.
Apple has shipped us a dual CPU G4 so we don't have to camp in pink's cube trying to debug this
Status: NEW → ASSIGNED
Keywords: mozilla0.8.1
Target Milestone: --- → mozilla0.9.1
They have? Can I have it? ;)
sfraser has the dual CPU loaner Mac, now he has the bug too :-)
Assignee: sdagley → sfraser
Status: ASSIGNED → NEW
Priority: -- → P1
moving to 9.2
Target Milestone: mozilla0.9.1 → mozilla0.9.2
I'll just point out that the dual-CPU loaner system is due back to Apple on 6/1 so that, while this fix may not be critical for 0.9.1, it needs to land well before 0.9.2
Whiteboard: OSX+
mass move, v2. qa to me.
QA Contact: tever → benc
The Mac NSPR code has serious issues on Mac OS X, which are particularly visible on machines with two processors. The main issue here is that OT notifiers, async file I/O callbacks, and timer tasks each use pthreads under the hood, and are thus interruptible. This breaks assumptions that the mac code currently makes relating to the switching on and off of interrupts, and lock consistency. It would appear that the only good solution to this problem is to build NSPR as a Mach-O binary, and have it use the Unix implementation. If we do that, we need to ensure that the implementation permits clients to call any Carbon Toolbox functions on threads, as they can now. It would be bad to move to pthreads, and to find, suddenly, that bad things happen when trying to do, say, drawing, or memory allocation, on non-main threads.
Assignee: sfraser → wtc
Component: Networking → NSPR
Product: Browser → NSPR
QA Contact: benc → larryh
Summary: Most pages stall on dual-cpu G4 → Carbonized Mac NSPR needs architectural work to run on dual-CPU machines
Target Milestone: mozilla0.9.2 → ---
Steve, could you please work on this bug? We already have an implementation of NSPR for Mac OS X based on pthreads.
Assignee: wtc → sdagley
wtc, can you tell me which nspr defines need to be lit to do a nspr for mac os x build?
The only way to build NSPR for Mac OS X is to use the Unix build instructions, i.e., configure and make. I don't know how to integrate this with Code Warrior.
I spoke to Quinn at Apple's WWDC about calling into Mach-O from CFM and he pointed me to <http://developer.apple.com/samplecode/Sample_Code/ Runtime_Architecture/CallMachOFramework.htm>. It's intended for calling frameworks but I'm hoping it can be adapted to general Mach-O libs.
Thanks. As it happens I found that on my own. And I got that much working... I can load the System framework at _MD_EARLY_INIT or whatever it's called. But then I'm stuck. I have no pthread headers that will work with CW. In fact, just to get the first part working I had to copy over the Universal folder from CW7EA. There was needed stuff missing from the headers otherwise. I tried to handroll my own pthread declararions, because I need only a few calls. But unfortunately initializing a pthread mutex requires a *structure* as one of the params (what the heck were those posix folks thinking?) so it gets complicated to get it all correct. I really need posix headers that work with CodeWarrior. Or I could try porting nspr over to ProjectBuilder. Then I could make a knock- off nspr lib that had pthreads locks in the right places to get dual cpu macs happy.
oops. Sorry Steve, I thought you were replying to a question I asked in another forum. You are actually looking at the opposite solution from what I am thinking. You want to make a mach-o nspr and clal it from cfm fizzilla, right? I want to make the current cfm nspr use pthread locks from the system framework make the INTS_OFF/INTS_ON stuff in nspr actually acquire a real lock that the underlying os will honor. I'm worried about using real pthreads in mac nspr because I think you can only make most system calls from the main thread and I don't know that we can enforce that restriction on the current code base. So I'm thinking keep the nspr threads and just enforce syncronization with the callbacks/timers/notifiers using a real lock.
ok, new idea. make the INTS_ON/INTS_OFF stuff enter and leave a critical region. That way everything turning on/off nspr ints is competing on the same critical region, acting just like a lock. The reason this is good is that the mac MP api supports this, and on mac os x it is just a wrapper for posix stuff, so the underlying os threads running the callbacks/timers/notifiers will respect this. I'll wager the MP headers don't pose any problems for our build system, so it should be easy to do. i'll try it later today and see.
two thoughts: - pro7 can build mach-o, so why aren't all the pthread headers there? - aren't the correct pthread headers on the OSX developer install for you to look at?
Yes, you will be able to call MP stuff. This might be one way to go.
Target Milestone: --- → 4.2
beta stopper
Whiteboard: OSX+ → OSX++
I've played around with this some. My first attempt seems to just lock all the threads up. The chief difficulty is that _PR_INTSOFF retreives a value that indicates whether the region is already entered. But the MP api does not expose any way to get that info for a critical region. The result is that the nspr code triggers more entries into the critical region than exits. I think it may be possible to make _PR_INTSOFF & _PR_INTSON not care about that value. The number of places in the code that actually examine that value are few, and are almost entirely in mac only code. I think this value is only needed because _PR_INTSOFF doesn't currently acquire a "real" lock of any kind. I'll play with this some more over the weekend.
How about keeping a counter for the number of times _PR_INTSOFF has been called, and only enter/leave the critical section on 0/1 transitions, sort of like destructors and reference counting. That should have similar semantics to the current usage of _PR_INTSOFF. Or it may be even simpler than that. You may just need to keep a flag indicating whether you're in the critical section, and return that flag. If you're around this afternoon, I'd love to talk with you about it.
you would probably need a separate lock on this counter too, as you could be pre-empted between checking the value and then acting on it causing a race-condition.
re gordon: i was already trying that. re mike: i was entering my critical section before using the counter. there is some code around that saves the counter, and then enters the intsOff state setting the counter to 1, and then restores the counter when it's done. This will have the effect of not turning ints back on if they weren't on before, but it totally demolishes the concept of the counter being a reentrancy count. i think i can just not use a counter at all, and tweak the code in the handful of places that use it. Almost all are in mac only code, and most of it is to detect when a notification might have been dropped becasue it came while ints were off. Under osx the notification wont be dropperd in that situation: if someone else is in the critical section when the notification comes in, the notification will block on the critical section, and then fire when it can enter.
at request of macdev, attaching my patches. they DO NOT work. But they show the lines I am thinking along.
Attached patch diffs of nsprpb (obsolete) — Splinter Review
a second attempt. not sure yet why this one doesn't work.
Attached patch nspr patch (obsolete) — Splinter Review
sdagley - are any of the machines which we'll get from Apple for testing a dual-CPU machine? If dual-CPU machine seems to have it's own set of problems, we may need to test on that.
We should *definately* test on dual-CPU machines, and it would be good for QA to invest in a few of these. Note that, currently, our builds are pretty much dead on such machines running Mac OS X because of this bug.
Whiteboard: OSX++ → OSX++, PDT+
lchiang, we aren't getting any dual CPUs in the new batch but Joe Francis has voluntereed his dual-cpu system for testing while he's on sabbatical next month
Keywords: nsBranch
Steve, this is PDT+ bug. Tomorrow, on Tuesday, we'll try to build the first RTM candidate. It would be good, if this could be resolved ASAP.
what going on with this bug? getting close to the time where it may have to be latered to future milestone/release if we are not there, or past there already...
This bug is specific to Mac OS X and does not impact the 'Classic' builds for Mac OS 8.6/9.x. As to where we're at with it, as an AOL employee I can't use the appropriate profanity in this public forum to describe it so I'll just say we don't expect to have a fix this month.
*** Bug 90941 has been marked as a duplicate of this bug. ***
we're not gonna have this fixed by beta, removing pdt+
Whiteboard: OSX++, PDT+ → OSX++
Whiteboard: OSX++ → OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture
*** Bug 93824 has been marked as a duplicate of this bug. ***
*** Bug 93720 has been marked as a duplicate of this bug. ***
The following Apple tech note may be helpful in resolving this bug, as well as giving Fizzilla better OS X performance. http://developer.apple.com/technotes/tn/tn2028.html - Adam
Added keyword: nsenterprise+
Keywords: nsenterprise+
Simon is now working on this so reassigning bug
Assignee: sdagley → sfraser
Whiteboard: OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture → OSX++
Attachment #38164 - Attachment is obsolete: true
Attachment #38876 - Attachment is obsolete: true
Attached patch Final patch for review (obsolete) — Splinter Review
Attachment #49135 - Attachment is obsolete: true
That last patch works. On single CPU machines, it's performance impact is below the noise threshold of the page loading tests I used to measure it. It uses MP critical sections to fix thread synchronization issues, blocking multiple MP threads from running while NSPR interrupts are turned off. The code is currently active if TARGET_CARBON is defined to be 1. It might be necessary to detect the OS version using Gestalt if this affects Carbon under 9 performance (which is unknown). I need some strong reviews, please.
Status: NEW → ASSIGNED
I have some further changes to the patch, to conditionalize the critical region code for Mac OS X only, and to add some explanatory comments. Please wait...
Blocks: 97866
is this going to make it soon? putting on the "+"
Keywords: nsbranchnsbranch+
Attachment #49168 - Attachment is obsolete: true
Attached patch Really final patch (obsolete) — Splinter Review
The latest patch does two new things: 1. Return non-Carbon builds to their previous behaviour (just some macro changes). 2. In carbon builds, conditionally uses critical sections only if we are running on Mac OS X. So Carbon-on-classic builds are not penalized.
Attachment #49391 - Attachment is obsolete: true
The new patch: * Replaces the array of critical sections with just one (since I only used one), which now matches the single critical region entry count. * Removes a bogus WalkTheStack prototype.
I reviewed Simon's "really final patch" (attachment 49391 [details] [diff] [review]). I already sent him some suggested changes after a quick look. Then I stared at the patch for a good hour. After I internalized the patch, I found that Simon's solution is actually very simple. (It is all those macros that makes the patch look complicated.) It is exactly what jfrancis outlined in his comments back in June. It surprises me that the simple solution is enough to solve this problem. Congratulations to all of you who worked on this bug! I will go over the patch with Simon next week. I just wanted to point out that the patch also contains a fix for bug 97866. That bug is marked as depending on this bug, but I think it is not related and its fix should be reviewed and checked in independently.
(This looks close to a fix - great! Who at Netscape has a dual-processor machine besides kin? Once this is fixed, we need to do some testing.)
PDT+ per PDT. Let's get one.
Whiteboard: OSX++ → OSX+,PDT+
Sorry for the somewhat basic quizzing but: 1- This bug discusses fixing the MacOS X builds only right? So it is completely irrelevant to running MacOS X w/ classic and classic build. 2- What happened if you ran MacOS X builds on dual-processor systems before this fix? (Boom?!). Did this affect dual processor systems in pre-X? Need this info to figure out how much upgrading is needed for testing.
ben: 1) correct, this does not affect the classic bits (non-carbon) running on osx in the classic environment 2) pages would stall, and never finish loading. sometimes the app would crash when you quit after failing to load pages.
Attached patch Super-final patch (obsolete) — Splinter Review
Attachment #49408 - Attachment is obsolete: true
The super-final patch: * removes the async file I/O changes, which are now in a patch in bug 97866 * replaces separate intsoff/intson implemenations with a single, simplified intsoff routine, and removes some macro code duplication. * Uses intsoff/fastintson rather than setintsoff(1)/setintsoff(0) in a couple of callback routines. * Improves the comments in _MD_PauseCPU and DoneWaitingOnThisThread(). gordon, please review. wtc, if you'll then bless this, I can check it in.
r=gordon, unless you want to clean up _macos.h or add a wrapper to DoneWaitingOnThisThread(). wtc, the #ifdefs in primpl.h give me headaches. Are you sure they're correct? There seem to be redundant nesting of global and local thread cases.
And now the bad news. Adding the critical regions on Mac OS X does have some perceptible impact on performance in some areas. Page loading is not affected, but some UI operations are (I noticed, for example, that loading the buddy list in a commercial build got slower), and the browser feels someone less responsive on X. It may be that parts of the code that rely a lot on PR_Lock/PR_Unlock, or atomic operations, are slowed down by this patch.
This is not on Tian Tian Must Fix list for this release. Should we consider minusing it, and changing the priority for this release?
sdagley commenting here - If this isn't on tiantian's list it damn sure should be - this is most DEFINITELY a stop ship bug. Otherwise explain to Marty Fisher hy we don't run on his new Mac :-)
Whiteboard: OSX+,PDT+ → [OSX+],PDT+
Seconding Steve's comments. As someone involved with the evolution of this bug fix I feel we are much better off with it than without it. Both MacOS X and dual CPU macs are catching on pretty quickly. Lets not leave those folks stranded.
Hi: This bug is the FIRST one listed on my stop ship bug list compiled yesterday. Please check again the list on http://bugscape.netscape.com/show_bug.cgi?id=9485. And it is the first bug on the 8/21 list as well.
Check it into the trunk, and let's get some testing numbers (New Window, Startup, etc.). Tian Tian and Hong to coordinate test efforts.
Attachment #49782 - Attachment is obsolete: true
Chnages in this patch: * removed redundant duplicate of _MD_INTSOFF macro in _macos.h * In code blocks like this: if (_PR_MD_GET_INTSOFF()) { thread->md.missedIONotify = PR_TRUE; cpu->u.missed[cpu->where] |= _PR_MISSED_IO; return; } which are outside of the critical region, reversed the order of setting missedIONotify and the _PR_MISSED_IO to mitigate the effects of preemption at this point. Note that the patch is a diff -b (ignoring space change), so spacing looks worse than it really is.
Comment on attachment 49998 [details] [diff] [review] Patch with minor revisions r=gordon
I was unable to edit patch status to specify "has-review". Does that have something to do with the Product field?
I will coordinate the testing for the perf numbers. Results to be posted when they come in. Per simon: 1. Tests run on single processor CPU (we don't have dual handy) 2. Run on today's trunk build vs. tomorrow's trunk build 3. Run: window open, startup, IM, prefs, mail performance tests 4. Results posted back here as attachments Thanks
Most of the changes in my patch from Simon's last patch are a matter of style. - I make sure no lines are longer than 80 characters. - I do not make whitespace changes. This is to make the diffs cleaner. - Whether tabs or spaces are used follows the local convention. Again, this is to make the diffs easier to read. - "Regions" and "REGIONS" are changed to "Region" and "REGION" because there is only one critical region. - "TermCriticalRegions" is renamed "DestroyCriticalRegion". - Some macro definitions were moved between primpl.h and _macos.h and the intermediate macro _MD_INTSOFF() was deleted. - gUseCriticalRegion and gCriticalRegion are declared static because they are only used in macthr.c. The real changes are: - In _PR_INTSOFF(), I leave the critical region immediately so that the entry count is always 1. This way, in _PR_FAST_INTSON() and _PR_INTSON(), I only need to leave the critical region once if we need to leave the critical region completely. As a result, the global variable gCriticalRegionEntryCount can be deleted. - In macio.c, AsyncIOCompletion(), _PR_INTSOFF() and _PR_FAST_INTSON() are not necessary because they are called in DoneWaitingOnThisThread(). I also moved the setting of thread->md.osErrCode before the if block. Simon, please review my patch and test it. If it doesn't work and it takes too long to fix it, you can go ahead and check in your patch on the NSPRPUB_PRE_4_2_CLIENT_BRANCH. I will check email from home tonight.
The sentence In _PR_INTSOFF(), I leave the critical region immediately so that the entry count is always 1. should read In _PR_INTSOFF(), I leave the critical region immediately if we are already in the critical region so that the entry count is always 1.
wtc, your final patch does not work, alas. As-is, it deadlocks on launch. I noticed that this seems wrong: #define _PR_INTSOFF(_is) \ PR_BEGIN_MACRO \ + ENTER_CRITICAL_REGION(); \ (_is) = _PR_MD_GET_INTSOFF(); \ _PR_MD_SET_INTSOFF(1); \ + if (_is) \ + LEAVE_CRITICAL_REGION(); \ PR_END_MACRO and should probably be _PR_MD_SET_INTSOFF(1); \ + if (_is == 0) \ + LEAVE_CRITICAL_REGION(); \ but with that fixed, we get a slew of assertions about leaving non-existent critical sections.
I checked my previous patch into the NSPRPUB_PRE_4_2_CLIENT_BRANCH at 8:45pm, so it will show up in trunk builds tomorrow (9/20).
trunk testing [BEFORE checkin] using: * 2001.09.18.20-trunk commercial * on Mac OS 10.0.4, 450 Mhz G3, 384Mhz * modern theme [note: separate profiles used for prefs and page load tests] Preferences dialog open: 2.50sec Switching from Navigator to Mail & Newsgroups panel: 0.70sec will soon attach page load results [side note: crashed several times after page load completed --see bug 100852].
Comment on attachment 50169 [details] 2001.09.18.20-trunk comm page load results re-ran the pre-checkin page load tests, thus invalidating this run. will reattach new set.
Attachment #50169 - Attachment is obsolete: true
tried running the page load tests using the 2001.09.20.16-trunk bits: alas, it would not complete. it hung [but not on the same page] each of the three times i tried --even after rebooting the machine after the second hang. :( will try to run the preferences perf tests. hopefully that will go more smoothly...
alas, i'm blocked with the preferences testing as well. :( i need to restart the app btwn each dialog-open timing [to get four timings and report their average]. i was able to launch the app a first time with a new profile...but with the second launch the browser window didn't even appear: only got the menubar and the rainbow spinning cursor. restarting the machine kinda "works around" the hang, but it's a rather harsh workaround. another note: simon mentioned that this might be a 10.0.4 issue --since he was recently able to run the page load tests [same 2001.09.20.16-trunk build] on his dual-cpu G4 [which has a later version of 10.x].
recent email rec'd from todds@bigfoot.com: "...I've been following this bug since I want to use mozilla on my dual 800 with OS X 10.0.4. "I just noticed your most recent comment at 22:17, and wanted to let you know that I am experiencing the same spinning rainbow cursor, and the browser never starts when using the 2001.09.20 build. "Hope that helps in any way, let me know if I can help otherwise."
Hold off on testing here folks. It looks like my changes causes deadlock on single and dual-CPU machines running 10.0.x builds. Damn. Running a new build now, more tomorrow.
Running 10.1 would not affect the deadlock?
Let's find out if this will ever work on 10.04? Simon is researching ...
I think 10.0.x is a lost cause. I just got a new dual-800, and was having MAJOR stability problems with 10.0.4 4S10. The machine would randomly crash. I posted to the Darwin list, and was told 10.0.4 has major SMP issues and to wait for 10.1. Let's just say this was good advice. See http://www.darwinfo.org/devlist.php3?number=11117 for the full message.
Looking at Stephen's mail results for OS X for 9-18 vs. 9-21 builds, there is some amount of slow down in the 9-21 build.
Can we get those tests re-run with Mac OS X 10.abouttobereleased?
Do you need a dual CPU or does a single CPU work for the speed tests? If a single CPU works, lemme know exactly which files to grab for the tests to be consistent.
Brice - thanks for the offer to help. For the speed tests, here is one of them (Mail). All the test files and methodology are there. The important thing is to use the same machine for the before/after builds with simon's fix. A single CPU is fine for this speed test. Do you have version 10.1 of the OS? If so, that would be great. sdagley - can you give me a CD with the updated OS?
I looked at the perf results attachment and I think I can do all that ... gotta see about the stopwatch, though :) If I have any questions, I'll let ya know. I assume grabbing today's 'nightly' is fine enough for a build w/ the patch incorporated?
Here's the deal. The MP APIs that I'm using for dual CPU protection are broken under 10.0.x, but fixed for 10.1. I have code that works around this problem, that will make things work on 10.0.x. So I need to check that in, and we then have to do another round of performance testing.
Oh - and I can't confirm anything about having 10.1 - to my knowledge, there is no such release yet. :)
Keywords: nsenterprise+
Brice, yes. Thanks for helping with this. So, to summarize: Folder loading speed: done with local folders Thread pane scrolling: local or IMAP Message display speed: local folders Thread pane sorting: local or imap Message Reply Speed: IMAP, see http://www.mozilla.org/mailnews/performance/msgcompose_loggin.html for details on how to generate a performance logfile for this. Batch delete 5 (10kb each) messages: local Mail module loading time: I wouldn't bother with this, it's too complex currently (using IMAP, non-isolated servers, etc.) to be truly reliable.
Keywords: nsenterprise+
marking nsenterprise- to get off the enterprise stop ship list.
Keywords: nsenterprise-
Keywords: nsenterprise+
I tested the lastest changes on 10.1 single- and dual- CPU machines, and both work just fine.
I have the 10.1 update to Mac OS X. Stephen or Sairuh, pls stop by to install for testing. Simon - do you expect to see a timing difference between OS X and 10.1?
Attachment #50595 - Attachment is obsolete: true
spoke with lisa briefly: stephend upgraded to 10.1, but i'll stick with 10.0.4 for the nonce --so that we can test with smfr's latest patch [after review/check-in] on both OS versions.
Comment on attachment 50724 [details] [diff] [review] Tidied up MDEnter/Leave critical section code r=sdagley
Attachment #50724 - Flags: review+
I checked in the working critical region code to the NSPR branch, which is used by the mozilla TRUNK. So this bug is now good to go on for testing.
holy crap. So that's why I couldn't get this to work before - I was doing the same thing under 10.0.4
Simon - Pls get the SR=, so we can plus this one, and check it in today.
Whiteboard: [OSX+],PDT+ → [OSX+],PDT
Whiteboard: [OSX+],PDT → [OSX+],PDT,[ETA 09.26]
This has all the reviews it needs to be checked in to the branch. It's already on the trunk.
trunk testing of Preferences panel: compared with my 2001-09-20 16:31 results. same configuration, with a new profile [modern, no sidebar]. AFTER checkin, using 2001.09.25.20-trunk ---------------------------------------- Open Preferences dialog open: 3.39sec Switching from Navigator to Mail & Newsgroups panel: 0.97sec BEFORE checkin, using 2001.09.18.20-trunk ----------------------------------------- Preferences dialog open: 2.50sec Switching from Navigator to Mail & Newsgroups panel: 0.70sec looks like this checkin might've slowed thing down a bit. currently running page load tests. will attach when finished.
I thought we were supposed to test on the 26th builds: AFTER build: Trunk build from 9/26 (when it shows up). If the fix was committed on 9/25 at 7:38pm, how does it show up in the 25-20 builds, which are produced at 8 pm (and presumably take more than ~20 minutes to pull and build). Just checking to make sure I test the right build.
I just wanted to point out the the focus of the QA testing should be on the stability of this patch. That is, whether the Mozilla client crashes or deadlocks with the patch applied. The current code is broken on Mac OS X. No matter how much faster it is, it is broken. If Simon's patch fixes the thread safety problem on Mac OS X, it should be checked in. There is not much point in comparing its performance with the performance of broken code on Mac OS X. Any performance comparision should be done on classic Mac OS, where the current code works.
check it in - PDT+
Whiteboard: [OSX+],PDT,[ETA 09.26] → [OSX+],PDT+,[ETA 09.26]
i've created bug 101838 for the performance slowdown i've noticed in the 2001.09.25.20-trunk builds.
Startup Results I take an average of three for launch and relaunch on both builds. I tested these manually via a stopwatch. Times are in seconds. Build 2001-09-18-20-trunk Launch: 19.25 Re-launch: 12.49 Build 2001-09-25-20-trunk Launch: 24.17 Re-launch: 16.62 So, as you can see.. a fairly large jump in times in the 9-25 build :( There were no peaks or weird jumps, all times on both builds were fairly consistent.
OS X v10.1, iMac DV 400MHz, 768MB RAM Test/Build|--9/18--|--9/27--| Startup 1 | 24.45 | 18.40 | Startup 2 | 10.31 | 11.78 | Startup 1 indicates that I rebooted and launched Mozilla for the first time, Startup 2 indicates that I launched Mozilla directly after quitting the same build of Mozilla, after a clean reboot. Basically, I rebooted, launched build 9/18, quit, launched again - recorded times, rebooted, launched build 9/27, quit, launched again - recorded times. I'll try to get some Mail Perfs now that I have a stopwatch.
I'm really sorry that I have to report this ... but, I believe I just experienced deadlock in Mozilla, build 2001092620 - I had the spinning pizza for at least 5 minutes and when I checked out top, Moz was hovering right around 50% usage. I had to forcibly kill Moz :( I don't really know what 'caused' it ... I was viewing a web page when my wife sent me an email, so I command-2'd over to Mail/News, hit command-t to get new mail, then clicked in the 'Denise' folder that her messages get filtered into and that's when the pizza started spinning. I don't have a dual-proc machine, this is the same machine I posted the perf results from (iMac DV 400 w/ 768MB). If this *wasn't* deadlock, what could it have been?
Checked in on 0.9.4 branch. Keeping open for NSPR tip.
Whiteboard: [OSX+],PDT+,[ETA 09.26]
Checked in on the NSPR tip.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
kin - I heard you have a dual processor machine. Can you verify for QA that that the branch build starts on your machine?
I have a G4 533 dual and Mozilla build nr. 2001093021 works great. Startup time could be less (I hope).
lchiang - I have no dual processor machines ... I believe both jfrancis and sfraser do.
Thank you. I think I will mark this verified (for startup of application on dual processor Mac) per sharky_k_@mac.com's comment. I have also asked Marty F. at AOL to confirm for me on his dual processor for the commercial build. I will also inquire with one other source for the commercial build.
Status: RESOLVED → VERIFIED
I will also test on fkeeney's dual processor which he has graciously lent to me. If anyone wants to come by and use it, you are more than welcome to (B20-1)
Some results: On sdagley's G4/733 it took around 5 minutes to download 2220 headers (using yesterday's trunk) On fkeeney's dualG4/500 it took around 5 minutes to download 3482 headers (using today's branch) On sdagley's G4/733 it took around 30 seconds to download 2220 headers (using a debug build from a few weeks ago, pre Simon's fix). This on a non-optimized build. As you can see this is a HUGE performance degradation. It is literally 10 times slower.
posted new bug for mail performance degradation: http://bugzilla.mozilla.org/show_bug.cgi?id=102797 lrg tested major windows and browsed around on dual CPU machine for about 3 hours and didn't find any problems (other than generally sluggish performance).
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: