Closed Bug 486580 Opened 11 years ago Closed 7 years ago

[Mac] some talos Tp runs crash without leaving a crash report

Categories

(Toolkit :: Crash Reporting, defect, critical)

x86
macOS

RESOLVED WORKSFORME

People

(Reporter: dbaron, Unassigned)

Details

(Keywords: crash, intermittent-failure)

There have been some talos Tp runs on Mac mozilla-central tinderboxes that are crashing without leaving a crash report.  Since apparently we do successfully get crash reports some of the time, it seems like this may mean that there are some types of crashes for which we're not generating crash reports.  (There could also be some problem in the talos automation, Mac only, that causes the crash reporting to only work some of the time, but, nevertheless, filing this as a Breakpad bug for now.)

Example runs are:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1238667257.1238675704.17939.gz
which ended with:

NOISE: Cycle 10: loaded http://localhost/page_load_test/pages/www.chinaren.com/www.chinaren.com/index.html (next: http://localhost/page_load_test/pages/www.sourceforge.net/sourceforge.net/index.php.html)
../Minefield.app/Contents/MacOS/run-mozilla.sh: line 399:   983 Terminated              "$prog" ${1+"$@"}
Failed tp: 
		Stopped Thu, 02 Apr 2009 05:34:54
FAIL: Busted: tp
FAIL: browser crash
program finished with exit code 0

and http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1238670844.1238673451.6349.gz
which ended with:

NOISE: Cycle 6: loaded http://localhost/page_load_test/pages/www.elmundo.es/www.elmundo.es/index.html (next: http://localhost/page_load_test/pages/www.google.ro/www.google.ro/index.html)
../Minefield.app/Contents/MacOS/run-mozilla.sh: line 399:   958 Terminated              "$prog" ${1+"$@"}
Failed tp: 
		Stopped Thu, 02 Apr 2009 04:58:21
FAIL: Busted: tp
FAIL: browser crash
program finished with exit code 0


These crashes should be generating minidumps so that we get data on what is causing the crash.
Whiteboard: [orange]
"Terminated" sounds like the harness killed it. AFAIK Breakpad can't catch that on OS X. It registers a Mach exception handler, and the Mach exception/POSIX signal mapping is lossy or something.
CCing alice for comment.
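To illustrate the distinction drawn above, here is a minimal sketch of how a Python harness on a POSIX system could tell "killed by SIGTERM" apart from "died on a crash signal" by inspecting the child's return code. The function name `classify_exit` and the set of crash signals are assumptions made for illustration; this is not Talos code and says nothing about what Breakpad itself can intercept.

```python
import signal
import subprocess
import sys

# Assumption: signals that usually point at a genuine crash (not exhaustive).
CRASH_SIGNALS = {
    signal.SIGSEGV,
    signal.SIGBUS,
    signal.SIGABRT,
    signal.SIGILL,
    signal.SIGFPE,
}


def classify_exit(returncode):
    """Classify a subprocess.Popen return code on a POSIX system."""
    if returncode is None:
        return "still running"
    if returncode >= 0:
        return "exited"
    # Popen reports death-by-signal as the negated signal number.
    signum = -returncode
    if signum in (signal.SIGTERM, signal.SIGKILL):
        return "killed externally"
    if signum in CRASH_SIGNALS:
        return "crashed"
    return "terminated by other signal"


if __name__ == "__main__":
    # Stand-in for the browser: a child that just sleeps.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"]
    )
    child.terminate()  # sends SIGTERM, as a harness timeout might
    child.wait()
    print(classify_exit(child.returncode))
```

A harness that records this classification alongside "FAIL: browser crash" would make logs like the ones above easier to triage, since a SIGTERM death is not expected to produce a minidump.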
I have observed Mac crashes that generate no crash report, and this is with no interference from talos: i.e., the browser crashes and disappears and talos makes no attempt to send a TERM signal to anything.

In this case, it appears that a TERM signal was sent, but I do believe there is a type of Mac browser crash that leaves behind no crash report.
Right. In the general case, I have seen crashes in the past that just completely foil our crash reporting code, resulting in no minidump or a zero-byte minidump. I've even seen this on Windows, where we simply call MinidumpWriteDump() from a Microsoft DLL to produce the dump. I guess sometimes the process just gets in a bad enough state that you can't do anything.
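As a rough sketch of how a harness could surface the "no minidump or a zero-byte minidump" cases described above, the following scans a dump directory after a run. The function name `inspect_minidumps`, the directory argument, and the `.dmp` extension are assumptions for illustration, not a description of the existing automation.

```python
from pathlib import Path


def inspect_minidumps(dump_dir):
    """Return (usable, empty) lists of minidump file names in dump_dir.

    Assumption: minidumps are files ending in .dmp directly inside dump_dir.
    A zero-byte file counts as "empty", i.e. the dump writer started
    but produced nothing usable.
    """
    usable, empty = [], []
    for path in sorted(Path(dump_dir).glob("*.dmp")):
        if path.stat().st_size == 0:
            empty.append(path.name)
        else:
            usable.append(path.name)
    return usable, empty
```

If both lists come back empty after the browser disappeared, the run falls into the "crashed without leaving a dump" bucket this bug is about; a non-empty `empty` list points at the zero-byte case instead.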
Duplicate of this bug: 487222
Alice: could you modify Talos to indicate when it forcibly kills the browser? Per comment 1 I suspect these "Terminated" logs are Talos killing the browser, which we can't catch with Breakpad. I have a separate bug on file on handling browser hangs more gracefully (bug 483968), but as it stands it's currently conflated here with another issue whereby the browser actually crashes without leaving a dump.
Filed bug 488298 re: my last comment
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1239828414.1239831614.26543.gz&fulltext=1
MacOSX Darwin 8.8.1 talos mozilla-central nochrome qm-pmac-trunk10
That's different, that says "browser frozen", which means it's a hang.
(In reply to comment #14)
> That's different, that says "browser frozen", which means it's a hang.

Thanks, I filed bug 522662.
Talos will now use different crash messaging - once bug 672192 is fixed.

Crash with stack:
"crash during run (stack found)"

Browser frozen, process terminated, stack created:
"stack found after process termination"

This should clearly differentiate between a stack generated by a 'natural' crash and one created by talos process termination.
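A minimal sketch of the message selection described above. The two "stack found" strings are taken from this comment; the function name `crash_message`, its parameters, and the two "no stack" strings are hypothetical and only there to make the example complete.

```python
def crash_message(harness_terminated, stack_found):
    """Pick a log message for a failed run.

    harness_terminated: True if the harness killed a frozen browser.
    stack_found: True if a stack (minidump) was recovered afterwards.
    """
    if stack_found:
        if harness_terminated:
            return "stack found after process termination"
        return "crash during run (stack found)"
    # Hypothetical wording: the comment above does not cover the no-stack cases.
    if harness_terminated:
        return "process terminated (no stack found)"
    return "crash during run (no stack found)"
```

Keeping "who ended the process" and "was a stack recovered" as separate inputs is what lets the log distinguish a 'natural' crash from a termination by the harness.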
Mass marking whiteboard:[orange] bugs WFM (to clean up TBPL bug suggestions) that:
* Haven't changed in > 6 months
* Whose whiteboard contains none of the strings: {disabled,marked,random,fuzzy,todo,fails,failing,annotated,leave open,time-bomb}
* Passed a (quick) manual inspection of bug summary/whiteboard to ensure they weren't a false positive.

I've also gone through and searched for cases where the whiteboard wasn't labelled correctly after test disabling, by using attachment description & basic comment searches. However, if the test this bug was about has in fact been disabled/annotated/..., please accept my apologies & reopen/mark the whiteboard appropriately so this doesn't get re-closed in the future (and please ping me via IRC or email so I can try to tweak the saved searches to avoid more edge cases).

Sorry for the spam! Filter on: #FFA500
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Whiteboard: [orange]