Closed Bug 127150 Opened 22 years ago Closed 21 years ago

Mac crash in NSPR eating up CPU cycles in Talkback system.

Categories

(NSPR :: NSPR, defect)

PowerPC
Mac System 9.x
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: greer, Assigned: greer)

Details

Attachments

(2 files, 1 obsolete file)

In recent months the Talkback system has experienced intermittent episodes in 
which >50% of the system's CPU time is devoted to a single process. Despite the 
increased CPU cycles, the process stops processing data altogether and must 
be killed and restarted.
 
  We have discovered that the problem is caused by Mac crashes whose stack 
traces go through _PR_UserRunThread.  If more than one such crash (using the 
NSPR20.xSYM symbols library) occurs, the system keeps dividing 100% of the 
system processing time among the processes involved, eventually shutting down 
the entire digestion of TB incidents.

  We can work around the problem by renaming the NSPR20.xSYM symbols in Mac 
builds so that the system does not see them. Obviously that is not a long term 
solution and reduces the resolution of stack traces in Talkback reporting in 
the meantime.

  We need to understand why the symbols in NSPR20.xSYM are creating a conflict 
with the digestion of bbx files in the Talkback system.
cc'ing JJ, Simon and Steve in hopes that they might have some insight on this 
issue.
Hardware: PC → Macintosh
Bug 126482 includes a crash like those that have been troubling the Talkback
system. Attaching steps and stack.

This bug is *not about the crashes*, but about their effects on the TB system. 

I am including details for future testing in the event a solution is found.
My guess is that this is happening after I turned on symbol generation in NSPR 
(bug 119329). Talkback's stack-walking code probably doesn't work correctly for 
routines like _PR_UserRunThread() that don't have a blr instruction at the end, 
because they have no exit points:

{
  while (1)
  {
    ...
  }
}

I've had to fix this to make stack walking work in the client; the fix is to put 
in a bogus exit point (that is never hit).
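
A minimal sketch of what such a bogus exit point could look like follows. It is not the actual NSPR patch: WorkQueue, gForceExit, run_one_job and user_run_thread_sketch are hypothetical names made up for illustration. The assumption is that a volatile flag keeps the compiler from proving the exit branch dead, so the routine gets a reachable return (and so, presumably, a blr on PowerPC).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for per-thread state; not an NSPR type. */
typedef struct WorkQueue {
    int remaining;  /* jobs still waiting to run */
    int processed;  /* jobs completed so far */
} WorkQueue;

/*
 * Hypothetical flag, never set in normal operation. It is volatile so the
 * compiler has to re-read it on every pass and cannot prove the exit
 * branch dead.
 */
static volatile int gForceExit = 0;

static void run_one_job(WorkQueue *q)
{
    q->remaining--;
    q->processed++;
}

/* Loop shaped like a thread run routine that normally never returns. */
static void user_run_thread_sketch(WorkQueue *q)
{
    assert(q != NULL);

    while (1)
    {
        if (q->remaining > 0)
            run_one_job(q);

        /* Bogus exit point: not taken in normal operation, but it gives
           the routine a reachable return for the compiler to emit. */
        if (gForceExit)
            break;
    }
}
```

Setting gForceExit to 1 before the call makes the loop return after a single pass; that is only useful for exercising the sketch, since in normal operation the flag stays 0 and the exit is never hit.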
Assigned the bug to Simon.
Assignee: wtc → sfraser
Simon, if the fix is a simple exit point, would you please add one? That would
be a big help and I'm betting we can push the change through for an a=. Thanks.   
chofmann tells me we can get approval for this in M099 if it gets an r=/sr=
*soon*. Who can give the r=/sr= for this patch?
We just need wtc to review the patch. Because it's NSPR code, it doesn't need sr.
Status: NEW → ASSIGNED
Oh, and we should probably check that this does actually fix the problem, since
I'm working on hunches here. How can we do that?
I would figure that we would need to get a one-off Mac build and push the
symbols to Twister, then see if we could crash that build. Ideally, Talkback
would handle the incident submission correctly.

JJ, how difficult would that be to do?
Using the most recent release build, I can rebuild NSPR with Simon's patch, then
substitute the corresponding sym file on the Talkback server and do a test run
with the crash to see what happens.
Assuming we get wtc's review, this can be done any day after OS9 smoketests
pass; all we need is to coordinate/schedule the test.
If this patch works, it's fine to check it in.

Would be nice to add a few more words to the comment.
JJ, now that wtc has weighed in, do you have time to build NSPR with the patch
this afternoon? And I'll need one of you to try the crash in comment #2. I
appreciate the help.
Attachment #71817 - Attachment is obsolete: true
I've done what I can here.
Assignee: sfraser → greer
Status: ASSIGNED → NEW
Mozilla CFM build is dead.

See bug 116795 for OS X Talkback.
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → WONTFIX