
Mac crash in NSPR eating up CPU cycles in Talkback system.



17 years ago
16 years ago


(Reporter: greer, Assigned: greer)





(2 attachments, 1 obsolete attachment)



17 years ago
In recent months the Talkback system has experienced intermittent episodes in 
which more than 50% of the system's CPU time is consumed by a single process. 
Despite consuming those cycles, the process stops processing data and must 
be killed and restarted.
  We have discovered that the problem is caused by Mac crashes whose stack 
traces go through _PR_UserRunThread.  If more than one such crash (using the 
NSPR20.xSYM symbols library) occurs, the processes involved divide 100% of the 
system's processing time among themselves, eventually shutting down 
the entire digestion of TB incidents.

  We can work around the problem by renaming the NSPR20.xSYM symbols in Mac 
builds so that the system does not see them. Obviously that is not a long term 
solution and reduces the resolution of stack traces in Talkback reporting in 
the meantime.

  We need to understand why the symbols in NSPR20.xSYM are creating a conflict 
with the digestion of bbx files in the Talkback system.

Comment 1

17 years ago
cc'ing JJ, Simon and Steve in hopes that they might have some insight on this.
Hardware: PC → Macintosh

Comment 2

17 years ago
Created attachment 70849 [details]
Steps to reproduce a crash passing through NSPR20

Bug 126482 includes a crash like those that have been troubling the Talkback
system. Attaching steps and stack.

This bug is *not about the crashes*, but about their effects on the TB system. 

I am including details for future testing in the event a solution is found.

Comment 3

17 years ago
My guess is that this started happening after I turned on symbol generation in 
NSPR (bug 119329). Talkback's stack-walking code probably doesn't work correctly 
for routines like _PR_UserRunThread() that don't have a blr instruction at the 
end, because they have no exit points:

  while (1)

I've had to fix this to make stack walking work in the client; the fix is to put 
in a bogus exit point (that is never hit).
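
To illustrate the idea of a bogus exit point, here is a minimal sketch in C. It is not the attached patch, and every name in it (demo_user_run_thread, demo_bogus_exit, demo_count_job, demo_run_until) is hypothetical rather than NSPR API. The assumption is that a loop with a reachable break gives the compiler a reason to emit a function epilogue ending in a return instruction (blr on PowerPC); whether a particular compiler does so for a given function is not verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job callback type; stands in for the thread's real work. */
typedef void (*demo_job_fn)(void *arg);

/* Never set in normal operation. It is volatile so the compiler must
 * re-read it on every pass and cannot prove the exit path dead. */
static volatile int demo_bogus_exit = 0;

/* Hypothetical thread body: loops forever unless the bogus exit fires.
 * Returns the number of passes so the exit path can be exercised. */
int demo_user_run_thread(demo_job_fn job, void *arg)
{
    int passes = 0;

    assert(job != NULL);
    while (1) {
        job(arg);
        passes++;
        if (demo_bogus_exit) {
            break;  /* the bogus exit point: gives the function a return */
        }
    }
    return passes;
}

/* Harness only: counts calls and trips the exit flag at a limit. */
typedef struct {
    int calls;
    int limit;
} demo_counter;

void demo_count_job(void *arg)
{
    demo_counter *c = (demo_counter *)arg;

    c->calls++;
    if (c->calls >= c->limit) {
        demo_bogus_exit = 1;
    }
}

/* Harness only: runs the loop until `limit` jobs have executed. */
int demo_run_until(int limit)
{
    demo_counter c = { 0, limit };

    demo_bogus_exit = 0;
    return demo_user_run_thread(demo_count_job, &c);
}
```

In real use the flag would simply never be set, so the loop behaves exactly like while (1); the harness trips it only to show that the exit path is real, reachable code.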

Comment 4

17 years ago
Assigned the bug to Simon.
Assignee: wtc → sfraser

Comment 5

17 years ago
Simon, if the fix is a simple exit point, would you please add one? That would
be a big help and I'm betting we can push the change through for an a=. Thanks.   

Comment 6

17 years ago
Created attachment 71817 [details] [diff] [review]
Proposed fix: cause the compiler to generate 'blr' instructions that Talkback stack tracing probably relies on

Comment 7

17 years ago
chofmann tells me we can get approval for this in M099 if it gets an r=/sr=
*soon*. Who can give the r=/sr= for this patch?

Comment 8

17 years ago
We just need wtc to review the patch. Because it's NSPR code, it doesn't need sr.

Comment 9

17 years ago
Oh, and we should probably check that this does actually fix the problem, since
I'm working on hunches here. How can we do that?

Comment 10

17 years ago
I would figure that we would need to get a one-off Mac build and push the
symbols to Twister, then see if we could crash that build. Ideally, Talkback
would handle the incident submission correctly.

JJ, how difficult would that be to do?

Comment 11

17 years ago
Using the most recent release build, I can rebuild NSPR with Simon's patch, then
substitute the corresponding sym file on the Talkback server and do a test run
with the crash to see what happens.
Assuming we get wtc's review, this can be done any day after the OS9 smoketests
pass; all we need is to coordinate and schedule the test.

Comment 12

17 years ago
If this patch works, it's fine to check it in.

Would be nice to add a few more words to the comment.

Comment 13

17 years ago
JJ, now that wtc has weighed in, do you have time to build NSPR with the patch
this afternoon? I'll also need one of you to try the crash in comment #2. I
appreciate the help.

Comment 14

17 years ago
Created attachment 71992 [details] [diff] [review]
Patch that actually compiles
Attachment #71817 - Attachment is obsolete: true

Comment 15

17 years ago
I've done what I can here.
Assignee: sfraser → greer

Comment 16

16 years ago
Mozilla CFM build is dead.

See bug 116795 for OS X Talkback.
Last Resolved: 16 years ago
Resolution: --- → WONTFIX