Closed Bug 127150 Opened 23 years ago Closed 22 years ago

Mac crash in NSPR eating up CPU cycles in Talkback system.

Categories

(NSPR :: NSPR, defect)

PowerPC
Mac System 9.x
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: greer, Assigned: greer)

Details

Attachments

(2 files, 1 obsolete file)

In recent months the Talkback system has experienced intermittent episodes in which more than 50% of the system's CPU time is devoted to a single process. Despite consuming those CPU cycles, the process stops processing data and must be killed and restarted. We have discovered that the problem is caused by Mac crashes whose stack traces go through _PR_UserRunThread. If more than one such crash (using the NSPR20.xSYM symbols library) occurs, the affected processes divide 100% of the system's processing time among themselves, eventually shutting down the entire digestion of TB incidents. We can work around the problem by renaming the NSPR20.xSYM symbols in Mac builds so that the system does not see them. Obviously that is not a long-term solution, and it reduces the resolution of stack traces in Talkback reporting in the meantime. We need to understand why the symbols in NSPR20.xSYM conflict with the digestion of bbx files in the Talkback system.
cc'ing JJ, Simon and Steve in hopes that they might have some insight on this issue.
Hardware: PC → Macintosh
Bug 126482 includes a crash like those that have been troubling the Talkback system. Attaching steps and stack. This bug is *not about the crashes*, but about their effects on the TB system. I am including details for future testing in the event a solution is found.
My guess is that this has been happening since I turned on symbol generation in NSPR (bug 119329). Talkback's stack-walking code probably doesn't work correctly for routines like _PR_UserRunThread() that don't have a blr instruction at the end, because they have no exit points: { while (1) { ... } }. I've had to fix this to make stack walking work in the client; the fix is to put in a bogus exit point (that is never hit).
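For illustration only, here is a minimal C sketch of the "bogus exit point" idea described above. It is not the attached patch and not NSPR code: the names g_bogus_exit, work_fn, run_thread_forever, run_thread_with_exit, demo_work and run_demo are all hypothetical. It assumes a volatile flag is enough to stop the compiler from proving the exit branch dead and dropping the return sequence; whether that actually produces a trailing blr depends on the compiler and its settings.

```c
#include <stddef.h>

/* Hypothetical flag. In real use it would never be set; it is volatile so
 * the compiler cannot assume the exit branch below is unreachable. */
static volatile int g_bogus_exit = 0;

typedef void (*work_fn)(void *arg);

/* Shape described in the comment above: the loop has no exit point, so the
 * compiler is free to emit no return sequence for the function. */
void run_thread_forever(work_fn work, void *arg)
{
    while (1) {
        work(arg);
    }
}

/* Same loop with a bogus exit point. The break gives the function a real
 * return path, so a normal epilogue has to be generated. */
void run_thread_with_exit(work_fn work, void *arg)
{
    while (1) {
        work(arg);
        if (g_bogus_exit)
            break;
    }
}

/* Demo-only work item: it sets the flag after three iterations so that this
 * sketch terminates when run. Production code would never do this. */
static void demo_work(void *arg)
{
    int *count = arg;
    if (++*count == 3)
        g_bogus_exit = 1;
}

/* Runs the patched loop with the demo work item and returns how many
 * iterations were executed before the exit was taken. */
int run_demo(void)
{
    int count = 0;
    g_bogus_exit = 0;
    run_thread_with_exit(demo_work, &count);
    return count;
}
```

Calling run_demo() exercises only the patched variant. run_thread_forever is included for contrast and should not be called, since it never returns.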
Assigned the bug to Simon.
Assignee: wtc → sfraser
Simon, if the fix is a simple exit point, would you please add one? That would be a big help and I'm betting we can push the change through for an a=. Thanks.
chofmann tells me we can get approval for this in M099 if it gets an r=/sr= *soon*. Who can give the r=/sr= for this patch?
We just need wtc to review the patch. Because it's NSPR code, it doesn't need sr.
Status: NEW → ASSIGNED
Oh, and we should probably check that this does actually fix the problem, since I'm working on hunches here. How can we do that?
I would figure that we would need to get a one-off Mac build and push the symbols to Twister, then see if we could crash that build. Ideally, Talkback would handle the incident submission correctly. JJ, how difficult would that be to do?
Using the most recent release build, I can rebuild NSPR with Simon's patch, then substitute the corresponding sym file on the Talkback server and do a test run with the crash to see what happens. Assuming we get wtc's review, this can be done any day after the OS9 smoketests pass; all we need is to coordinate/schedule the test.
If this patch works, it's fine to check it in. Would be nice to add a few more words to the comment.
JJ, now that wtc has weighed in, do you have time to build NSPR with the patch this afternoon? I'll also need one of you to try the crash in comment #2. I appreciate the help.
Attachment #71817 - Attachment is obsolete: true
I've done what I can here.
Assignee: sfraser → greer
Status: ASSIGNED → NEW
Mozilla CFM build is dead. See bug 116795 for OS X Talkback.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WONTFIX