Closed Bug 148453 Opened 23 years ago Closed 21 years ago

Mozilla no longer dumps core on SEGV signal!

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mozilla-bugs, Assigned: ccarlen)

References

Details

(Keywords: dataloss, regression)

Attachments

(2 files, 3 obsolete files)

When old action was SIG_DFL, make sure that default handler is actually executed. 22 years ago Aleksey Nogin 1.92 KB, patch		Details \| Diff \| Splinter Review
When original action was SIG_DFL, make sure the default handler is still executed before we exit. 22 years ago Aleksey Nogin 1.98 KB, patch		Details \| Diff \| Splinter Review
When original action was SIG_DFL, make sure the default handler is still executed before we exit. 22 years ago Aleksey Nogin 1.43 KB, patch	ccarlen : review- dbaron : superreview-	Details \| Diff \| Splinter Review
compiles on solaris and linux 21 years ago timeless 1.46 KB, patch	bryner : review- dbaron : superreview+	Details \| Diff \| Splinter Review
this is what i had in mind 21 years ago Brian Ryner (not reading) 1.30 KB, patch	brendan : review+ dbaron : superreview+	Details \| Diff \| Splinter Review

Aleksey Nogin

Reporter

Description

•

23 years ago

I've noticed that Mozilla have recently stopped dumping the core when it crashes; instead it just exits with exit status 11. I believe that this was caused by the bug 76431 checkin - the SEGV signal gets intercepted by the profile manager which then deletes the lock file and exits and core is never dumped! This is pretty bad. It makes it significantly harder to investigate unexpected (e.g. those that I didn't cause intentially when investigating a previous crash) crashes.

Aleksey Nogin

Reporter

Comment 1

•

23 years ago

This is a recent regression. It's also a dataloss - important stack trace information gets lost instead of being saved in a core file.

Keywords: dataloss, regression

Andrew Hagen

Updated

•

23 years ago

Keywords: mozilla1.0, nsbeta1

Bradley Baetz (:bbaetz)

Comment 2

•

23 years ago

This will also presumably affect the stack-trace-on-crash hack which dbaron will check in one of these days. The code chains signal handlers, so as long as we run after talkback is initialied, talkback should still work. If the profile code runs before, then the lock files probably aren't cleaned up - anyone with a talkback build (or the comm source) want to test that? Can we reset the sigaction to SIG_DFL, and then raise() from a signal handler? Or should the final call be to abort(), not _exit ?

Keywords: mozilla1.0.1

Christopher Aillon (sabbatical, not receiving bugmail)

Comment 3

•

23 years ago

re-assign and cc some people from bug 76431

Assignee: Matti → ccarlen

Component: Browser-General → Profile Manager BackEnd

QA Contact: imajes-qa → ktrina

Dan Mosedale (:dmosedale, :dmose)

Comment 4

•

23 years ago

I wonder whether this affects talkback as well?

Andrew Schultz

Comment 5

•

23 years ago

this doesn't seem to affect talkback, but it prevents platforms without talkback from getting stacktraces on unreproducible crashes.

Aleksey Nogin

Reporter

Comment 6

•

23 years ago

Can bug 14989 check-ins help here?

Aleksey Nogin

Reporter

Comment 7

•

22 years ago

Any progress on this? A recent Mozilla build (2002122113) crashed twice on me over the last two days without me doing anything distinctive, and I have no idea what's wrong. :-( It would be *really* nice to get the core dumps back!

Depends on: 76431

Keywords: mozilla1.3

Aleksey Nogin

Reporter

Comment 8

•

22 years ago

Attached patch When old action was SIG_DFL, make sure that default handler is actually executed. (obsolete) — Details — Splinter Review

This patch ammends the current nsProfileAccess signal handler to check whether the old handler was SIG_DFL and if so, then restore that SIG_DFL sigaction and re-raise the signal to make sure the default action is executed. This ensures that the core is actually dumped on SIGSEGV/SIGABRT/etc.

Aleksey Nogin

Reporter

Comment 9

•

22 years ago

Comment on attachment 110048 [details] [diff] [review] When old action was SIG_DFL, make sure that default handler is actually executed. Please review. Thanks!

Attachment #110048 - Flags: superreview?

Attachment #110048 - Flags: review?

Aleksey Nogin

Reporter

Comment 10

•

22 years ago

Comment on attachment 110048 [details] [diff] [review] When old action was SIG_DFL, make sure that default handler is actually executed. Oops, made a typo, fixing.

Attachment #110048 - Attachment is obsolete: true

Attachment #110048 - Flags: superreview?

Attachment #110048 - Flags: review?

Aleksey Nogin

Reporter

Comment 11

•

22 years ago

Attached patch When original action was SIG_DFL, make sure the default handler is still executed before we exit. (obsolete) — Details — Splinter Review

Fixed a small typo in the patch.

Aleksey Nogin

Reporter

Updated

•

22 years ago

Attachment #110116 - Flags: superreview?

Attachment #110116 - Flags: review?

timeless

Updated

•

22 years ago

Attachment #110116 - Flags: superreview?(dveditz)

Attachment #110116 - Flags: superreview?

Attachment #110116 - Flags: review?(blizzard)

Attachment #110116 - Flags: review?

Christian :Biesinger (don't email me, ping me on IRC)

Comment 12

•

22 years ago

does this maybe fix bug 176886?

Aleksey Nogin

Reporter

Comment 13

•

22 years ago

I doubt it (although I do not really know how the talkback works). My code only makes sure that the default handler is executed properly when there was no non-default handler specified before the profile was locked. If talkback is implemented as a signal handler, then profile locking should start with non-default handler and my code shouldn't change anything... But if something really strange is going with signal handling, it might help, I do not know.

Tom Mraz

Updated

•

22 years ago

Flags: blocking1.3b?

Brant Gurganus

Comment 14

•

22 years ago

By the definitions on <http://bugzilla.mozilla.org/bug_status.html#severity> and <http://bugzilla.mozilla.org/enter_bug.cgi?format=guided>, crashing and dataloss bugs are of critical or possibly higher severity. Only changing open bugs to minimize unnecessary spam. Keywords to trigger this would be crash, topcrash, topcrash+, zt4newcrash, dataloss.

Severity: major → critical

Asa Dotzler [:asa]

Updated

•

22 years ago

Flags: blocking1.3b? → blocking1.3b-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

22 years ago

Blocks: 176886

Grace Bush

Updated

•

22 years ago

QA Contact: ktrina → gbush

Aleksey Nogin

Reporter

Comment 15

•

22 years ago

Attached patch When original action was SIG_DFL, make sure the default handler is still executed before we exit. (obsolete) — Details — Splinter Review

Updating for bitrot.

Attachment #110116 - Attachment is obsolete: true

Aleksey Nogin

Reporter

Comment 16

•

22 years ago

Comment on attachment 113656 [details] [diff] [review] When original action was SIG_DFL, make sure the default handler is still executed before we exit. It's a pretty simple fix, please review.

Attachment #113656 - Flags: superreview?(dveditz)

Attachment #113656 - Flags: review?(blizzard)

Aleksey Nogin

Reporter

Updated

•

22 years ago

Attachment #110116 - Flags: superreview?(dveditz)

Attachment #110116 - Flags: review?(blizzard)

Aleksey Nogin

Reporter

Comment 17

•

22 years ago

Comment on attachment 113656 [details] [diff] [review] When original action was SIG_DFL, make sure the default handler is still executed before we exit. Blizzard thinks dbaron should look at it.

Attachment #113656 - Flags: review?(blizzard) → review?(dbaron)

Daniel Veditz [:dveditz]

Comment 18

•

22 years ago

removing obsolete mozilla1.0 keywords and the misleading "dataloss" (which applies to losing user data not an inability to get debugging core dumps). Accordingly bumping severity back down to normal, but it would be nice to get fixed.

Severity: critical → normal

Keywords: dataloss, mozilla1.0, mozilla1.0.1

Samir Gehani

Comment 19

•

22 years ago

adt: nsbeta1-

Keywords: nsbeta1 → nsbeta1-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 20

•

22 years ago

Comment on attachment 113656 [details] [diff] [review] When original action was SIG_DFL, make sure the default handler is still executed before we exit. > struct sigaction act, oldact; > act.sa_handler = FatalSignalHandler; >- act.sa_flags = 0; >+ act.sa_flags = SA_NOMASK; > sigfillset(&act.sa_mask); I'm surprised this does anything useful, given that |man sigaction| seems to suggest that, with the full |sa_mask|, it won't, since "In addition [to sa_mask], the signal which triggered the handler will be blocked, unless the SA_NODEFER or SA_NOMASK flags are used." But I'm guessing that one of them is undone by the new |sigaction| call and the other isn't, or something like that? But it looks like this code was moved in revision 1.80. Could you attach a new patch? Sorry to take so long to get to this...

Attachment #113656 - Flags: superreview?(dveditz)

Attachment #113656 - Flags: superreview+

Attachment #113656 - Flags: review?(dbaron)

Attachment #113656 - Flags: review?(ccarlen)

Conrad Carlen (not reading bugmail)

Assignee

Comment 21

•

22 years ago

Comment on attachment 113656 [details] [diff] [review] When original action was SIG_DFL, make sure the default handler is still executed before we exit. >@@ -2131,7 +2135,7 @@ > // because mozilla is run via nohup. > struct sigaction act, oldact; > act.sa_handler = FatalSignalHandler; >- act.sa_flags = 0; >+ act.sa_flags = SA_NOMASK; I moved the patch to where this code is now. Problem is, SA_NOMASK doesn't exist on OS X. Then probably not BSD either? If this is really needed, you need to figure out the #ifdefs.

Attachment #113656 - Flags: review?(ccarlen) → review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

22 years ago

Attachment #113656 - Flags: superreview+ → superreview-

Martins Krikis

Comment 22

•

21 years ago

I disagree with comment #18---"dataloss" is not necessarily misleading, it can be right on the point because there is _no_ autosave (see bug 16359 and bug 16360). My wife lost several hours of work yesterday and I failed to find any of it in kernel memory (not sure why, probably destroyed it while making a copy of it; I've been successful before with other programs crashing). A coredump, I believe, would have allowed me to recover her text fairly easily. My Mozilla 1.4 for GNU/Linux does not dump core regardless of whether the talkback agent is or is not in use. If people want coredumps (which they show by raising the ulimit for it from the normally default 0), then they should get coredumps. Please fix this. It may be a weird way to recover data but it will be something. Autosave hasn't happened for 3+ years already. This should be easier to fix, I think.

Andrew Hagen

Updated

•

21 years ago

Severity: normal → critical

Flags: blocking1.6a?

Flags: blocking1.5?

Flags: blocking1.4.2?

Flags: blocking1.4.1?

Keywords: mozilla1.3 → dataloss

Brendan Eich [:brendan]

Comment 23

•

21 years ago

There's no need to set blocking1.6a yet, the trunk is still open for checkins. /be

Status: NEW → ASSIGNED

Flags: blocking1.6a?

Asa Dotzler [:asa]

Updated

•

21 years ago

Flags: blocking1.5?

Flags: blocking1.4.1?

timeless

Comment 24

•

21 years ago

Attached patch compiles on solaris and linux — Details — Splinter Review

Attachment #113656 - Attachment is obsolete: true

timeless

Comment 25

•

21 years ago

Comment on attachment 133812 [details] [diff] [review] compiles on solaris and linux With the patch: $ ./run-mozilla.sh ./xpcshell js> Components.classes['@mozilla.org/generic-factory;1'].createInstance() Segmentation Fault - core dumped $ uname -a SunOS DEATHSTARII 5.9 Generic_112233-08 sun4u sparc SUNW,Ultra-80 Yes I have a patch for generic factory too, but it's known broken today, so it was convenient :)

Attachment #133812 - Flags: superreview?(dbaron)

Attachment #133812 - Flags: review?(bryner)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 26

•

21 years ago

Comment on attachment 133812 [details] [diff] [review] compiles on solaris and linux Sure, sr=dbaron, but don't invent new bracing styles. ... } else if (...) { ... isn't normal. Prevailing style here suggests adding a break before the else and a break before the { following the if.

Attachment #133812 - Flags: superreview?(dbaron) → superreview+

Brendan Eich [:brendan]

Comment 27

•

21 years ago

Comment on attachment 133812 [details] [diff] [review] compiles on solaris and linux And how about space after , in argument lists (sigaction)? /be

Brian Ryner (not reading)

Comment 28

•

21 years ago

Comment on attachment 133812 [details] [diff] [review] compiles on solaris and linux I'm a little concerned about signal handling recursion should we ever crash in the signal handler, due to NODEFER being used. Could you instead just unblock the signal before calling raise() by calling sigprocmask() ?

Brian Ryner (not reading)

Comment 29

•

21 years ago

Attached patch this is what i had in mind — Details — Splinter Review

unblock the signal in question just before invoking the default handler

Brendan Eich [:brendan]

Comment 30

•

21 years ago

Comment on attachment 134344 [details] [diff] [review] this is what i had in mind >+ if (oldact->sa_handler == SIG_DFL) { >+ // Make sure the default sig handler is executed period at end of comment sentence. >+ // We need it to get Mozilla to dump core. >+ sigaction(signo,oldact,NULL); >+ >+ // Now that we've restored the default handler, unmask the >+ // signal and invoke it. >+ Why the extra blank line here, but not previously. I'd say axe this one, but in any event be consistent. >+ sigset_t unblock_sigs; >+ sigemptyset(&unblock_sigs); >+ sigaddset(&unblock_sigs, signo); >+ >+ sigprocmask(SIG_UNBLOCK, &unblock_sigs, NULL); >+ >+ raise(signo); >+ } else if (oldact->sa_handler && oldact->sa_handler != SIG_IGN) >+ { >+ oldact->sa_handler(signo); >+ } > } > > // Backstop exit call, just in case. Sorry for the drive-by nit-picking, it looks good to me otherwise. Timeless, darin? /be

Darin Fisher

Comment 31

•

21 years ago

Comment on attachment 134344 [details] [diff] [review] this is what i had in mind this patch makes good sense to me. to be nit picky, the brace style is not consistent near the "else" and there should be spaces in the parameter list of sigaction ;-) r/sr=darin fwiw

Brian Ryner (not reading)

Comment 32

•

21 years ago

Comment on attachment 133812 [details] [diff] [review] compiles on solaris and linux Minusing, I prefer the patch I attached that doesn't use SA_NODEFER...

Attachment #133812 - Flags: review?(bryner) → review-

Brian Ryner (not reading)

Comment 33

•

21 years ago

Comment on attachment 134344 [details] [diff] [review] this is what i had in mind Brendan, what do you think?

Attachment #134344 - Flags: review?(brendan)

Brendan Eich [:brendan]

Comment 34

•

21 years ago

Comment on attachment 134344 [details] [diff] [review] this is what i had in mind Fix the nit mentioned in comment 27 and r/sr=me. /be

Attachment #134344 - Flags: review?(brendan) → review+

Brian Ryner (not reading)

Updated

•

21 years ago

Attachment #134344 - Flags: superreview?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 35

•

21 years ago

Comment on attachment 134344 [details] [diff] [review] this is what i had in mind >+ } else if (oldact->sa_handler && oldact->sa_handler != SIG_IGN) >+ { Put the |else| on a new line after the }, make sure someone's tested this, and sr=dbaron.

Attachment #134344 - Flags: superreview?(dbaron) → superreview+

Brian Ryner (not reading)

Comment 36

•

21 years ago

checked in.

Status: ASSIGNED → RESOLVED

Closed: 21 years ago

Resolution: --- → FIXED

Mike Kaply [:mkaply]

Comment 37

•

21 years ago

Brian, Can you check the correct version of this patch into 1.4.2 and mark fixed1.4.2 please?

Flags: blocking1.4.2? → blocking1.4.2+

Nobody; OK to take it and work on it

Updated

•

9 years ago

Product: Core → Core Graveyard

You need to log in before you can comment on or make changes to this bug.