Closed
Bug 71718
Opened 24 years ago
Closed 23 years ago
Carbonized Mac NSPR needs architectural work to run on dual-CPU machines
Categories
(NSPR :: NSPR, defect, P1)
Tracking
(Not tracked)
VERIFIED
FIXED
4.2
People
(Reporter: mikepinkerton, Assigned: sfraser_bugs)
References
Details
Attachments
(7 files, 9 obsolete files)
16.91 KB,
patch
|
Details | Diff | Splinter Review | |
13.78 KB,
patch
|
Details | Diff | Splinter Review | |
22.59 KB,
text/html
|
Details | |
4.96 KB,
text/html
|
Details | |
9.82 KB,
patch
|
sdagley
:
review+
|
Details | Diff | Splinter Review |
5.98 KB,
text/html
|
Details | |
2.12 KB,
text/html
|
Details |
Run the fizzilla build on a dual-cpu g4 (like the one in my cube) and the app
stalls loading most pages. Here is the email from Apple (vlubet@apple.com):
Mike,
I looked at the code on macsockotpt.c and do not see any call to
OTEnterNotifier() and OTLeaveNotifier().
On Mac OS X, Open Transport notifiers, timer tasks and deferred task do not
run to completion. That means that code from the main event loop or another
thread may preempt code running from OT notifiers, OT timer tasks and OT
deferred tasks. The solution is to always protect your data by using either
atomic operations or by using OTEnterNotifier()/OTLeaveNotifier().
Bracketing your critical sections with OTEnterNotifier()/OTLeaveNotifier()
is usually the most convenient solution as this ensures that either the
notifier or the main event loop / cooperative thread is accessing common
data.
For example here is what I would do:
1369 do {
Boolean entered = OTEnterNotifier(endpoint);
1370 fd->secret->md.read.thread = me;
1371 fd->secret->md.readReady = PR_FALSE; //
expect the worst
1372 result = OTRcv(endpoint, buf, bytesLeft, NULL);
1373 if (fd->secret->nonblocking) {
1374 fd->secret->md.readReady = (result !=
kOTNoDataErr);
if (entered)
OTLeaveNotifier(endpoint);
1375 break;
1376 } else {
1377 if (result != kOTNoDataErr) {
1378 // If we successfully read a blocking socket,
check for more data.
1379 // According to IM:OT, we should be able to
rely on OTCountDataBytes
1380 // to tell us whether there is a nonzero amount
of data pending.
1381 size_t count;
1382 OSErr tmpResult;
1383 tmpResult = OTCountDataBytes(endpoint, &count);
1384 fd->secret->md.readReady = ((tmpResult ==
kOTNoError) && (count > 0));
1385
if (entered)
OTLeaveNotifier(endpoint);
1386 break;
1387 }
1388
1389 // Blocking read, but no data available. Wait for
an event
1390 // on this endpoint, and hope that we get a T_DATA
event.
if (entered)
OTLeaveNotifier(endpoint);
1391 WaitOnThisThread(me, PR_INTERVAL_NO_TIMEOUT);
1392 result = me->md.osErrCode;
1393 if (result != kOTNoError) // interrupted thread,
etc.
1394 break;
1395
1396 // Prepare to loop back and try again
1397 PrepareForAsyncCompletion(me, fd->secret->md.osfd);
1398 }
1399 }
1400 // Retry read if we had to wait for data to show up.
1401 while(1);
It certainly behaves like our endpoints are being clobbered in a race-condition-
ish situation.
Reporter | ||
Updated•24 years ago
|
Comment 1•24 years ago
|
||
Apple has shipped us a dual CPU G4 so we don't have to camp in pink's cube trying
to debug this
Assignee | ||
Comment 2•24 years ago
|
||
They have? Can I have it? ;)
Comment 3•24 years ago
|
||
sfraser has the dual CPU loaner Mac, now he has the bug too :-)
Assignee: sdagley → sfraser
Status: ASSIGNED → NEW
Assignee | ||
Updated•24 years ago
|
Priority: -- → P1
Comment 5•24 years ago
|
||
I'll just point out that the dual-CPU loaner system is due back to Apple on 6/1
so that, while this fix may not be critical for 0.9.1, it needs to land well
before 0.9.2
Updated•24 years ago
|
Whiteboard: OSX+
Assignee | ||
Comment 7•23 years ago
|
||
The Mac NSPR code has serious issues on Mac OS X, which are particularly visible
on machines with two processors.
The main issue here is that OT notifiers, async file I/O callbacks, and timer
tasks each use pthreads under the hood, and are thus interruptible. This breaks
assumptions that the mac code currently makes relating to the switching on and
off of interrupts, and lock consistency.
It would appear that the only good solution to this problem is to build NSPR as
a Mach-O binary, and have it use the Unix implementation. If we do that, we need
to ensure that the implementation permits clients to call any Carbon Toolbox
functions on threads, as they can now. It would be bad to move to pthreads, and
to find, suddenly, that bad things happen when trying to do, say, drawing, or
memory allocation, on non-main threads.
Assignee: sfraser → wtc
Component: Networking → NSPR
Product: Browser → NSPR
QA Contact: benc → larryh
Summary: Most pages stall on dual-cpu G4 → Carbonized Mac NSPR needs architectural work to run on dual-CPU machines
Target Milestone: mozilla0.9.2 → ---
Comment 8•23 years ago
|
||
Steve, could you please work on this bug? We already
have an implementation of NSPR for Mac OS X based on
pthreads.
Assignee: wtc → sdagley
Comment 9•23 years ago
|
||
wtc, can you tell me which nspr defines need to be lit to do a nspr for mac os x
build?
Comment 10•23 years ago
|
||
The only way to build NSPR for Mac OS X is to use the Unix build
instructions, i.e., configure and make.
I don't know how to integrate this with Code Warrior.
Comment 11•23 years ago
|
||
I spoke to Quinn at Apple's WWDC about calling into Mach-O from CFM and he
pointed me to <http://developer.apple.com/samplecode/Sample_Code/
Runtime_Architecture/CallMachOFramework.htm>. It's intended for calling
frameworks but I'm hoping it can be adapted to general Mach-O libs.
Comment 12•23 years ago
|
||
Thanks. As it happens I found that on my own. And I got that much working... I
can load the System framework at _MD_EARLY_INIT or whatever it's called. But
then I'm stuck. I have no pthread headers that will work with CW. In fact, just
to get the first part working I had to copy over the Universal folder from CW7EA.
There was needed stuff missing from the headers otherwise.
I tried to handroll my own pthread declararions, because I need only a few calls.
But unfortunately initializing a pthread mutex requires a *structure* as one of
the params (what the heck were those posix folks thinking?) so it gets
complicated to get it all correct. I really need posix headers that work with
CodeWarrior.
Or I could try porting nspr over to ProjectBuilder. Then I could make a knock-
off nspr lib that had pthreads locks in the right places to get dual cpu macs
happy.
Comment 13•23 years ago
|
||
oops. Sorry Steve, I thought you were replying to a question I asked in another
forum. You are actually looking at the opposite solution from what I am
thinking. You want to make a mach-o nspr and clal it from cfm fizzilla, right?
I want to make the current cfm nspr use pthread locks from the system framework
make the INTS_OFF/INTS_ON stuff in nspr actually acquire a real lock that the
underlying os will honor.
I'm worried about using real pthreads in mac nspr because I think you can only
make most system calls from the main thread and I don't know that we can enforce
that restriction on the current code base. So I'm thinking keep the nspr threads
and just enforce syncronization with the callbacks/timers/notifiers using a real
lock.
Comment 14•23 years ago
|
||
ok, new idea. make the INTS_ON/INTS_OFF stuff enter and leave a critical
region. That way everything turning on/off nspr ints is competing on the same
critical region, acting just like a lock. The reason this is good is that the
mac MP api supports this, and on mac os x it is just a wrapper for posix stuff,
so the underlying os threads running the callbacks/timers/notifiers will respect
this. I'll wager the MP headers don't pose any problems for our build system,
so it should be easy to do. i'll try it later today and see.
Reporter | ||
Comment 15•23 years ago
|
||
two thoughts:
- pro7 can build mach-o, so why aren't all the pthread headers there?
- aren't the correct pthread headers on the OSX developer install for you to
look at?
Assignee | ||
Comment 16•23 years ago
|
||
Yes, you will be able to call MP stuff. This might be one way to go.
Updated•23 years ago
|
Target Milestone: --- → 4.2
Comment 18•23 years ago
|
||
I've played around with this some. My first attempt seems to just lock all the
threads up. The chief difficulty is that _PR_INTSOFF retreives a value that
indicates whether the region is already entered. But the MP api does not expose
any way to get that info for a critical region. The result is that the nspr
code triggers more entries into the critical region than exits.
I think it may be possible to make _PR_INTSOFF & _PR_INTSON not care about that
value. The number of places in the code that actually examine that value are
few, and are almost entirely in mac only code. I think this value is only
needed because _PR_INTSOFF doesn't currently acquire a "real" lock of any kind.
I'll play with this some more over the weekend.
Comment 19•23 years ago
|
||
How about keeping a counter for the number of times _PR_INTSOFF has been called,
and only enter/leave the critical section on 0/1 transitions, sort of like
destructors and reference counting. That should have similar semantics to the
current usage of _PR_INTSOFF.
Or it may be even simpler than that. You may just need to keep a flag indicating
whether you're in the critical section, and return that flag.
If you're around this afternoon, I'd love to talk with you about it.
Reporter | ||
Comment 20•23 years ago
|
||
you would probably need a separate lock on this counter too, as you could be
pre-empted between checking the value and then acting on it causing a
race-condition.
Comment 21•23 years ago
|
||
re gordon: i was already trying that.
re mike: i was entering my critical section before using the counter.
there is some code around that saves the counter, and then enters the intsOff
state setting the counter to 1, and then restores the counter when it's done.
This will have the effect of not turning ints back on if they weren't on before,
but it totally demolishes the concept of the counter being a reentrancy count.
i think i can just not use a counter at all, and tweak the code in the handful
of places that use it. Almost all are in mac only code, and most of it is to
detect when a notification might have been dropped becasue it came while ints
were off. Under osx the notification wont be dropperd in that situation: if
someone else is in the critical section when the notification comes in, the
notification will block on the critical section, and then fire when it can enter.
Comment 22•23 years ago
|
||
at request of macdev, attaching my patches. they DO NOT work. But they
show the lines I am thinking along.
Comment 23•23 years ago
|
||
Comment 24•23 years ago
|
||
a second attempt. not sure yet why this one doesn't work.
Comment 25•23 years ago
|
||
Comment 26•23 years ago
|
||
sdagley - are any of the machines which we'll get from Apple for testing a
dual-CPU machine? If dual-CPU machine seems to have it's own set of problems,
we may need to test on that.
Assignee | ||
Comment 27•23 years ago
|
||
We should *definately* test on dual-CPU machines, and it would be good for QA to
invest in a few of these. Note that, currently, our builds are pretty much dead
on such machines running Mac OS X because of this bug.
Reporter | ||
Updated•23 years ago
|
Whiteboard: OSX++ → OSX++, PDT+
Comment 28•23 years ago
|
||
lchiang, we aren't getting any dual CPUs in the new batch but Joe Francis has
voluntereed his dual-cpu system for testing while he's on sabbatical next month
Keywords: nsBranch
Comment 29•23 years ago
|
||
Steve, this is PDT+ bug. Tomorrow, on Tuesday, we'll try to build the first RTM
candidate. It would be good, if this could be resolved ASAP.
Comment 30•23 years ago
|
||
what going on with this bug? getting close to the time where it may have to be
latered to future milestone/release if we are not there, or past there already...
Comment 31•23 years ago
|
||
This bug is specific to Mac OS X and does not impact the 'Classic' builds for Mac
OS 8.6/9.x. As to where we're at with it, as an AOL employee I can't use the
appropriate profanity in this public forum to describe it so I'll just say we
don't expect to have a fix this month.
Comment 32•23 years ago
|
||
*** Bug 90941 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 33•23 years ago
|
||
we're not gonna have this fixed by beta, removing pdt+
Whiteboard: OSX++, PDT+ → OSX++
Updated•23 years ago
|
Whiteboard: OSX++ → OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture
Reporter | ||
Comment 34•23 years ago
|
||
*** Bug 93824 has been marked as a duplicate of this bug. ***
Reporter | ||
Comment 35•23 years ago
|
||
*** Bug 93720 has been marked as a duplicate of this bug. ***
Comment 36•23 years ago
|
||
The following Apple tech note may be helpful in resolving this bug, as well as
giving Fizzilla better OS X performance.
http://developer.apple.com/technotes/tn/tn2028.html
- Adam
Comment 38•23 years ago
|
||
Simon is now working on this so reassigning bug
Assignee: sdagley → sfraser
Assignee | ||
Updated•23 years ago
|
Whiteboard: OSX++, meeting w/Apple in August to attempt fix w/o requiring new architecture → OSX++
Assignee | ||
Comment 39•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #38164 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #38876 -
Attachment is obsolete: true
Assignee | ||
Comment 40•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #49135 -
Attachment is obsolete: true
Assignee | ||
Comment 41•23 years ago
|
||
That last patch works. On single CPU machines, it's performance impact is below
the noise threshold of the page loading tests I used to measure it. It uses MP
critical sections to fix thread synchronization issues, blocking multiple MP
threads from running while NSPR interrupts are turned off.
The code is currently active if TARGET_CARBON is defined to be 1. It might be
necessary to detect the OS version using Gestalt if this affects Carbon under 9
performance (which is unknown).
I need some strong reviews, please.
Status: NEW → ASSIGNED
Assignee | ||
Comment 42•23 years ago
|
||
I have some further changes to the patch, to conditionalize the critical region
code for Mac OS X only, and to add some explanatory comments. Please wait...
Comment 43•23 years ago
|
||
is this going to make it soon? putting on the "+"
Assignee | ||
Updated•23 years ago
|
Attachment #49168 -
Attachment is obsolete: true
Assignee | ||
Comment 44•23 years ago
|
||
Assignee | ||
Comment 45•23 years ago
|
||
The latest patch does two new things:
1. Return non-Carbon builds to their previous behaviour (just some macro changes).
2. In carbon builds, conditionally uses critical sections only if we are running
on Mac OS X. So Carbon-on-classic builds are not penalized.
Assignee | ||
Comment 46•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #49391 -
Attachment is obsolete: true
Assignee | ||
Comment 47•23 years ago
|
||
The new patch:
* Replaces the array of critical sections with just one (since I only used one),
which now matches the single critical region entry count.
* Removes a bogus WalkTheStack prototype.
Comment 48•23 years ago
|
||
I reviewed Simon's "really final patch" (attachment 49391 [details] [diff] [review]).
I already sent him some suggested changes after a quick
look. Then I stared at the patch for a good hour. After
I internalized the patch, I found that Simon's solution is
actually very simple. (It is all those macros that makes
the patch look complicated.) It is exactly what jfrancis
outlined in his comments back in June. It surprises me that
the simple solution is enough to solve this problem.
Congratulations to all of you who worked on this bug!
I will go over the patch with Simon next week. I just
wanted to point out that the patch also contains a fix
for bug 97866. That bug is marked as depending on this
bug, but I think it is not related and its fix should
be reviewed and checked in independently.
Comment 49•23 years ago
|
||
(This looks close to a fix - great! Who at Netscape has a dual-processor
machine besides kin? Once this is fixed, we need to do some testing.)
Comment 51•23 years ago
|
||
Sorry for the somewhat basic quizzing but:
1- This bug discusses fixing the MacOS X builds only right? So it is completely
irrelevant to running MacOS X w/ classic and classic build.
2- What happened if you ran MacOS X builds on dual-processor systems before this
fix? (Boom?!). Did this affect dual processor systems in pre-X?
Need this info to figure out how much upgrading is needed for testing.
Reporter | ||
Comment 52•23 years ago
|
||
ben:
1) correct, this does not affect the classic bits (non-carbon) running on osx in
the classic environment
2) pages would stall, and never finish loading. sometimes the app would crash
when you quit after failing to load pages.
Assignee | ||
Comment 53•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #49408 -
Attachment is obsolete: true
Assignee | ||
Comment 54•23 years ago
|
||
The super-final patch:
* removes the async file I/O changes, which are now in a patch in bug 97866
* replaces separate intsoff/intson implemenations with a single, simplified
intsoff routine, and removes some macro code duplication.
* Uses intsoff/fastintson rather than setintsoff(1)/setintsoff(0) in a couple of
callback routines.
* Improves the comments in _MD_PauseCPU and DoneWaitingOnThisThread().
gordon, please review.
wtc, if you'll then bless this, I can check it in.
Comment 55•23 years ago
|
||
r=gordon, unless you want to clean up _macos.h or add a wrapper to
DoneWaitingOnThisThread().
wtc, the #ifdefs in primpl.h give me headaches. Are you sure they're correct?
There seem to be redundant nesting of global and local thread cases.
Assignee | ||
Comment 56•23 years ago
|
||
And now the bad news. Adding the critical regions on Mac OS X does have some
perceptible impact on performance in some areas. Page loading is not affected,
but some UI operations are (I noticed, for example, that loading the buddy list
in a commercial build got slower), and the browser feels someone less responsive
on X. It may be that parts of the code that rely a lot on PR_Lock/PR_Unlock, or
atomic operations, are slowed down by this patch.
Comment 57•23 years ago
|
||
This is not on Tian Tian Must Fix list for this release. Should we consider
minusing it, and changing the priority for this release?
Comment 58•23 years ago
|
||
sdagley commenting here - If this isn't on tiantian's list it damn sure should
be - this is most DEFINITELY a stop ship bug. Otherwise explain to Marty Fisher
hy we don't run on his new Mac :-)
Updated•23 years ago
|
Whiteboard: OSX+,PDT+ → [OSX+],PDT+
Comment 59•23 years ago
|
||
Seconding Steve's comments. As someone involved with the evolution of this bug
fix I feel we are much better off with it than without it. Both MacOS X and dual
CPU macs are catching on pretty quickly. Lets not leave those folks stranded.
Comment 60•23 years ago
|
||
Hi:
This bug is the FIRST one listed on my stop ship bug list compiled yesterday.
Please check again the list on
http://bugscape.netscape.com/show_bug.cgi?id=9485. And it is the first bug on
the 8/21 list as well.
Comment 61•23 years ago
|
||
Check it into the trunk, and let's get some testing numbers (New Window,
Startup, etc.). Tian Tian and Hong to coordinate test efforts.
Assignee | ||
Comment 62•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #49782 -
Attachment is obsolete: true
Assignee | ||
Comment 63•23 years ago
|
||
Chnages in this patch:
* removed redundant duplicate of _MD_INTSOFF macro in _macos.h
* In code blocks like this:
if (_PR_MD_GET_INTSOFF()) {
thread->md.missedIONotify = PR_TRUE;
cpu->u.missed[cpu->where] |= _PR_MISSED_IO;
return;
}
which are outside of the critical region, reversed the order of setting
missedIONotify and the _PR_MISSED_IO to mitigate the effects of preemption at
this point.
Note that the patch is a diff -b (ignoring space change), so spacing looks worse
than it really is.
Comment 64•23 years ago
|
||
Comment on attachment 49998 [details] [diff] [review]
Patch with minor revisions
r=gordon
Comment 65•23 years ago
|
||
I was unable to edit patch status to specify "has-review". Does that have
something to do with the Product field?
Comment 66•23 years ago
|
||
I will coordinate the testing for the perf numbers. Results to be posted when
they come in. Per simon:
1. Tests run on single processor CPU (we don't have dual handy)
2. Run on today's trunk build vs. tomorrow's trunk build
3. Run: window open, startup, IM, prefs, mail performance tests
4. Results posted back here as attachments
Thanks
Comment 67•23 years ago
|
||
Comment 68•23 years ago
|
||
Most of the changes in my patch from Simon's last patch
are a matter of style.
- I make sure no lines are longer than 80 characters.
- I do not make whitespace changes. This is to make the
diffs cleaner.
- Whether tabs or spaces are used follows the local
convention. Again, this is to make the diffs easier
to read.
- "Regions" and "REGIONS" are changed to "Region" and
"REGION" because there is only one critical region.
- "TermCriticalRegions" is renamed "DestroyCriticalRegion".
- Some macro definitions were moved between primpl.h and
_macos.h and the intermediate macro _MD_INTSOFF() was
deleted.
- gUseCriticalRegion and gCriticalRegion are declared static
because they are only used in macthr.c.
The real changes are:
- In _PR_INTSOFF(), I leave the critical region immediately
so that the entry count is always 1. This way, in
_PR_FAST_INTSON() and _PR_INTSON(), I only need to leave
the critical region once if we need to leave the critical
region completely. As a result, the global variable
gCriticalRegionEntryCount can be deleted.
- In macio.c, AsyncIOCompletion(), _PR_INTSOFF() and
_PR_FAST_INTSON() are not necessary because they are called
in DoneWaitingOnThisThread(). I also moved the setting of
thread->md.osErrCode before the if block.
Simon, please review my patch and test it. If it doesn't
work and it takes too long to fix it, you can go ahead and
check in your patch on the NSPRPUB_PRE_4_2_CLIENT_BRANCH.
I will check email from home tonight.
Comment 69•23 years ago
|
||
The sentence
In _PR_INTSOFF(), I leave the critical region immediately
so that the entry count is always 1.
should read
In _PR_INTSOFF(), I leave the critical region immediately
if we are already in the critical region so that the entry
count is always 1.
Assignee | ||
Comment 70•23 years ago
|
||
wtc, your final patch does not work, alas. As-is, it deadlocks on launch. I
noticed that this seems wrong:
#define _PR_INTSOFF(_is) \
PR_BEGIN_MACRO \
+ ENTER_CRITICAL_REGION(); \
(_is) = _PR_MD_GET_INTSOFF(); \
_PR_MD_SET_INTSOFF(1); \
+ if (_is) \
+ LEAVE_CRITICAL_REGION(); \
PR_END_MACRO
and should probably be
_PR_MD_SET_INTSOFF(1); \
+ if (_is == 0) \
+ LEAVE_CRITICAL_REGION(); \
but with that fixed, we get a slew of assertions about leaving non-existent
critical sections.
Assignee | ||
Comment 71•23 years ago
|
||
I checked my previous patch into the NSPRPUB_PRE_4_2_CLIENT_BRANCH at 8:45pm, so
it will show up in trunk builds tomorrow (9/20).
Comment 72•23 years ago
|
||
trunk testing [BEFORE checkin] using:
* 2001.09.18.20-trunk commercial
* on Mac OS 10.0.4, 450 Mhz G3, 384Mhz
* modern theme [note: separate profiles used for prefs and page load tests]
Preferences dialog open: 2.50sec
Switching from Navigator to Mail & Newsgroups panel: 0.70sec
will soon attach page load results [side note: crashed several times after page
load completed --see bug 100852].
Comment 73•23 years ago
|
||
Comment 74•23 years ago
|
||
Comment on attachment 50169 [details]
2001.09.18.20-trunk comm page load results
re-ran the pre-checkin page load tests, thus invalidating this run. will reattach new set.
Attachment #50169 -
Attachment is obsolete: true
Comment 75•23 years ago
|
||
Comment 76•23 years ago
|
||
tried running the page load tests using the 2001.09.20.16-trunk bits: alas, it
would not complete. it hung [but not on the same page] each of the three times i
tried --even after rebooting the machine after the second hang. :(
will try to run the preferences perf tests. hopefully that will go more smoothly...
Comment 77•23 years ago
|
||
alas, i'm blocked with the preferences testing as well. :(
i need to restart the app btwn each dialog-open timing [to get four timings and
report their average]. i was able to launch the app a first time with a new
profile...but with the second launch the browser window didn't even appear: only
got the menubar and the rainbow spinning cursor.
restarting the machine kinda "works around" the hang, but it's a rather harsh
workaround.
another note: simon mentioned that this might be a 10.0.4 issue --since he was
recently able to run the page load tests [same 2001.09.20.16-trunk build] on his
dual-cpu G4 [which has a later version of 10.x].
Comment 78•23 years ago
|
||
recent email rec'd from todds@bigfoot.com:
"...I've been following this bug since I want to use mozilla on my dual 800 with
OS X 10.0.4.
"I just noticed your most recent comment at 22:17, and wanted to let you know
that I am experiencing the same spinning rainbow cursor, and the browser never
starts when using the 2001.09.20 build.
"Hope that helps in any way, let me know if I can help otherwise."
Assignee | ||
Comment 79•23 years ago
|
||
Hold off on testing here folks. It looks like my changes causes deadlock on
single and dual-CPU machines running 10.0.x builds. Damn. Running a new build
now, more tomorrow.
Comment 80•23 years ago
|
||
Running 10.1 would not affect the deadlock?
Comment 81•23 years ago
|
||
Let's find out if this will ever work on 10.04? Simon is researching ...
Comment 82•23 years ago
|
||
I think 10.0.x is a lost cause. I just got a new dual-800, and was having
MAJOR stability problems with 10.0.4 4S10. The machine would randomly
crash. I posted to the Darwin list, and was told 10.0.4 has major SMP
issues and to wait for 10.1. Let's just say this was good advice.
See http://www.darwinfo.org/devlist.php3?number=11117 for the full
message.
Comment 84•23 years ago
|
||
Looking at Stephen's mail results for OS X for 9-18 vs. 9-21 builds, there is
some amount of slow down in the 9-21 build.
Comment 85•23 years ago
|
||
Can we get those tests re-run with Mac OS X 10.abouttobereleased?
Comment 86•23 years ago
|
||
Do you need a dual CPU or does a single CPU work for the speed tests? If a
single CPU works, lemme know exactly which files to grab for the tests to be
consistent.
Comment 87•23 years ago
|
||
Brice - thanks for the offer to help. For the speed tests, here is one of them
(Mail). All the test files and methodology are there. The important thing is to
use the same machine for the before/after builds with simon's fix. A single
CPU is fine for this speed test. Do you have version 10.1 of the OS? If so,
that would be great.
sdagley - can you give me a CD with the updated OS?
Comment 88•23 years ago
|
||
I looked at the perf results attachment and I think I can do all that ... gotta
see about the stopwatch, though :)
If I have any questions, I'll let ya know.
I assume grabbing today's 'nightly' is fine enough for a build w/ the patch
incorporated?
Assignee | ||
Comment 89•23 years ago
|
||
Here's the deal. The MP APIs that I'm using for dual CPU protection are broken
under 10.0.x, but fixed for 10.1. I have code that works around this problem,
that will make things work on 10.0.x. So I need to check that in, and we then
have to do another round of performance testing.
Comment 90•23 years ago
|
||
Oh - and I can't confirm anything about having 10.1 - to my knowledge, there is
no such release yet. :)
Assignee | ||
Comment 91•23 years ago
|
||
Keywords: nsenterprise+
Brice, yes. Thanks for helping with this. So, to summarize:
Folder loading speed: done with local folders
Thread pane scrolling: local or IMAP
Message display speed: local folders
Thread pane sorting: local or imap
Message Reply Speed: IMAP, see
http://www.mozilla.org/mailnews/performance/msgcompose_loggin.html for details
on how to generate a performance logfile for this.
Batch delete 5 (10kb each) messages: local
Mail module loading time: I wouldn't bother with this, it's too complex
currently (using IMAP, non-isolated servers, etc.) to be truly reliable.
Keywords: nsenterprise+
Comment 93•23 years ago
|
||
marking nsenterprise- to get off the enterprise stop ship list.
Keywords: nsenterprise-
Keywords: nsenterprise+
Assignee | ||
Comment 94•23 years ago
|
||
I tested the lastest changes on 10.1 single- and dual- CPU machines, and both
work just fine.
Comment 95•23 years ago
|
||
I have the 10.1 update to Mac OS X. Stephen or Sairuh, pls stop by to install
for testing.
Simon - do you expect to see a timing difference between OS X and 10.1?
Assignee | ||
Updated•23 years ago
|
Attachment #50595 -
Attachment is obsolete: true
Assignee | ||
Comment 96•23 years ago
|
||
Comment 97•23 years ago
|
||
spoke with lisa briefly: stephend upgraded to 10.1, but i'll stick with 10.0.4
for the nonce --so that we can test with smfr's latest patch [after
review/check-in] on both OS versions.
Comment 98•23 years ago
|
||
Comment on attachment 50724 [details] [diff] [review]
Tidied up MDEnter/Leave critical section code
r=sdagley
Attachment #50724 -
Flags: review+
Assignee | ||
Comment 99•23 years ago
|
||
I checked in the working critical region code to the NSPR branch, which is used
by the mozilla TRUNK. So this bug is now good to go on for testing.
Comment 100•23 years ago
|
||
holy crap. So that's why I couldn't get this to work before - I was doing the
same thing under 10.0.4
Comment 101•23 years ago
|
||
Simon - Pls get the SR=, so we can plus this one, and check it in today.
Whiteboard: [OSX+],PDT+ → [OSX+],PDT
Updated•23 years ago
|
Whiteboard: [OSX+],PDT → [OSX+],PDT,[ETA 09.26]
Assignee | ||
Comment 102•23 years ago
|
||
This has all the reviews it needs to be checked in to the branch. It's already on
the trunk.
Comment 103•23 years ago
|
||
trunk testing of Preferences panel: compared with my 2001-09-20 16:31 results.
same configuration, with a new profile [modern, no sidebar].
AFTER checkin, using 2001.09.25.20-trunk
----------------------------------------
Open Preferences dialog open: 3.39sec
Switching from Navigator to Mail & Newsgroups panel: 0.97sec
BEFORE checkin, using 2001.09.18.20-trunk
-----------------------------------------
Preferences dialog open: 2.50sec
Switching from Navigator to Mail & Newsgroups panel: 0.70sec
looks like this checkin might've slowed thing down a bit.
currently running page load tests. will attach when finished.
I thought we were supposed to test on the 26th builds:
AFTER build:
Trunk build from 9/26 (when it shows up).
If the fix was committed on 9/25 at 7:38pm, how does it show up in the 25-20
builds, which are produced at 8 pm (and presumably take more than ~20 minutes to
pull and build).
Just checking to make sure I test the right build.
Comment 105•23 years ago
|
||
I just wanted to point out the the focus of the QA
testing should be on the stability of this patch.
That is, whether the Mozilla client crashes or
deadlocks with the patch applied.
The current code is broken on Mac OS X. No matter
how much faster it is, it is broken.
If Simon's patch fixes the thread safety problem
on Mac OS X, it should be checked in. There is not
much point in comparing its performance with the
performance of broken code on Mac OS X. Any
performance comparision should be done on classic
Mac OS, where the current code works.
Comment 106•23 years ago
|
||
check it in - PDT+
Whiteboard: [OSX+],PDT,[ETA 09.26] → [OSX+],PDT+,[ETA 09.26]
Comment 107•23 years ago
|
||
i've created bug 101838 for the performance slowdown i've noticed in the
2001.09.25.20-trunk builds.
Comment 109•23 years ago
|
||
Startup Results
I take an average of three for launch and relaunch on both builds. I tested
these manually via a stopwatch. Times are in seconds.
Build 2001-09-18-20-trunk
Launch: 19.25
Re-launch: 12.49
Build 2001-09-25-20-trunk
Launch: 24.17
Re-launch: 16.62
So, as you can see.. a fairly large jump in times in the 9-25 build :(
There were no peaks or weird jumps, all times on both builds were fairly
consistent.
Comment 110•23 years ago
|
||
OS X v10.1, iMac DV 400MHz, 768MB RAM
Test/Build|--9/18--|--9/27--|
Startup 1 | 24.45 | 18.40 |
Startup 2 | 10.31 | 11.78 |
Startup 1 indicates that I rebooted and launched Mozilla for the first time,
Startup 2 indicates that I launched Mozilla directly after quitting the same
build of Mozilla, after a clean reboot.
Basically, I rebooted, launched build 9/18, quit, launched again - recorded
times, rebooted, launched build 9/27, quit, launched again - recorded times.
I'll try to get some Mail Perfs now that I have a stopwatch.
Comment 111•23 years ago
|
||
Comment 112•23 years ago
|
||
I'm really sorry that I have to report this ... but, I believe I just
experienced deadlock in Mozilla, build 2001092620 - I had the spinning pizza for
at least 5 minutes and when I checked out top, Moz was hovering right around 50%
usage. I had to forcibly kill Moz :(
I don't really know what 'caused' it ... I was viewing a web page when my wife
sent me an email, so I command-2'd over to Mail/News, hit command-t to get new
mail, then clicked in the 'Denise' folder that her messages get filtered into
and that's when the pizza started spinning.
I don't have a dual-proc machine, this is the same machine I posted the perf
results from (iMac DV 400 w/ 768MB). If this *wasn't* deadlock, what could it
have been?
Assignee | ||
Comment 113•23 years ago
|
||
Checked in on 0.9.4 branch. Keeping open for NSPR tip.
Keywords: nsbeta1,
nsbranch+,
nsenterprise-,
nsmac1
Whiteboard: [OSX+],PDT+,[ETA 09.26]
Assignee | ||
Comment 114•23 years ago
|
||
Checked in on the NSPR tip.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 115•23 years ago
|
||
kin - I heard you have a dual processor machine. Can you verify for QA that
that the branch build starts on your machine?
Comment 116•23 years ago
|
||
I have a G4 533 dual and Mozilla build nr. 2001093021 works great. Startup time
could be less (I hope).
Comment 117•23 years ago
|
||
lchiang - I have no dual processor machines ... I believe both jfrancis and
sfraser do.
Comment 118•23 years ago
|
||
Thank you.
I think I will mark this verified (for startup of application on dual processor
Mac) per sharky_k_@mac.com's comment.
I have also asked Marty F. at AOL to confirm for me on his dual processor for
the commercial build. I will also inquire with one other source for the
commercial build.
Status: RESOLVED → VERIFIED
Comment 119•23 years ago
|
||
I will also test on fkeeney's dual processor which he has graciously lent to me.
If anyone wants to come by and use it, you are more than welcome to (B20-1)
Comment 120•23 years ago
|
||
Some results:
On sdagley's G4/733 it took around 5 minutes to download 2220 headers (using
yesterday's trunk)
On fkeeney's dualG4/500 it took around 5 minutes to download 3482 headers (using
today's branch)
On sdagley's G4/733 it took around 30 seconds to download 2220 headers (using a
debug build from a few weeks ago, pre Simon's fix). This on a non-optimized build.
As you can see this is a HUGE performance degradation. It is literally 10 times
slower.
Comment 121•23 years ago
|
||
posted new bug for mail performance degradation:
http://bugzilla.mozilla.org/show_bug.cgi?id=102797
lrg tested major windows and browsed around on dual CPU machine for about 3
hours and didn't find any problems (other than generally sluggish performance).
You need to log in
before you can comment on or make changes to this bug.
Description
•