Closed
Bug 880285
Opened 11 years ago
Closed 10 years ago
Intermittent b2g18* crashtest,reftest timeout followed by crash (Fatal signal 11 (SIGSEGV) at 0x42e00000)
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: RyanVM, Unassigned)
References
Details
(Keywords: crash, intermittent-failure)
Attachments
(1 file)
504 bytes, patch (ted: review+)
Looks similar to bug 818103, but this still occurs on the b2g18 branches even after updating the emulator with the fix for bug 867996. We hit this very frequently (at least once per push), primarily on the crashtests. Under our normal tree rules, the crash rate is high enough to warrant hiding the tests. https://tbpl.mozilla.org/php/getParsedLog.php?id=23856504&tree=Mozilla-B2g18
Comment 1•11 years ago
Any idea why we're not getting stack dumps on this crash?
Flags: needinfo?(ahalberstadt)
Reporter | ||
Comment 2•11 years ago
Also, this has been happening for a long time, but we'd been starring the failures as bug 818103. To avoid cluttering this bug up like that one, I will not be copying/pasting every log link into here when I star. Unless you hear otherwise, it's safe to assume this is still happening with high frequency :)
Comment 3•11 years ago
(In reply to Mike Habicher [:mikeh] from comment #1)
> Any idea why we're not getting stack dumps on this crash?

I see:

05:57:02 INFO - checking for crashes in '/data/local/tests/reftest/profile/minidumps'

followed by no other output. This usually indicates that there are no minidumps being generated (otherwise there would be a "minidump found" message or something similar). The mechanism that generates the minidumps is kind of a black box to me. Ted, do you know what might be going on?
Flags: needinfo?(ahalberstadt) → needinfo?(ted)
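The check described in comment 3 can be sketched roughly as follows. This is an illustration only, not the harness's actual implementation: `findMinidumps` and `describeCrashCheck` are hypothetical names, the message text is invented, and the sketch assumes Breakpad's usual convention of a `.dmp` file accompanied by an `.extra` file. It takes a directory listing as input so that it does not depend on any particular file API.

```javascript
// Hypothetical helper: given the file names found in a minidump directory,
// pair each .dmp file with its .extra annotation file (null if missing).
function findMinidumps(fileNames) {
  const names = new Set(fileNames);
  return fileNames
    .filter((name) => name.endsWith(".dmp"))
    .sort()
    .map((dump) => {
      const extra = dump.slice(0, -".dmp".length) + ".extra";
      return { dump, extra: names.has(extra) ? extra : null };
    });
}

// Hypothetical reporting step. The message wording is illustrative and is
// not what the real harness prints.
function describeCrashCheck(dir, fileNames) {
  const dumps = findMinidumps(fileNames);
  if (dumps.length === 0) {
    return `no minidumps found in '${dir}'`;
  }
  return dumps.map((d) => `minidump found: ${dir}/${d.dump}`).join("\n");
}
```

An empty result here corresponds to the situation in the log above: the directory was checked, but nothing had been written to it.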
Comment 4•11 years ago
I don't really know how to read the logcat tea leaves, but it strikes me that maybe this crash isn't actually being caught by Breakpad. Compare the end of the logcat from comment 0:
https://tbpl.mozilla.org/php/getParsedLog.php?id=23856504&tree=Mozilla-B2g18
---------------------------------
05:57:03 INFO - I/Gecko ( 948): REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/generic/crashtests/683702-1.xhtml | 36 / 633 (5%)
05:57:03 ERROR - F/libc ( 948): Fatal signal 11 (SIGSEGV) at 0x42e00000 (code=2)
05:57:03 ERROR - This usually indicates the B2G process has crashed
---------------------------------

to the end of the logcat from a crash in bug 818103:
https://tbpl.mozilla.org/php/getParsedLog.php?id=23052012&full=1&branch=birch#error2
---------------------------------
15:10:32 WARNING - E/GeckoConsole( 766): [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://10.0.2.2:8888/tests/layout/reftests/bugs/269908-4-ref.html" line: 0}]
15:10:32 INFO - Return code: 0
---------------------------------
Flags: needinfo?(ted)
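For comparing log endings like the two in comment 4, a small parser for the libc fatal-signal line may help. This is a sketch under assumptions: the regular expression is derived only from the single line quoted in this bug, and `parseFatalSignal` and `lastFatalSignal` are hypothetical names, not part of any harness.

```javascript
// Pattern based on the one line quoted in this bug, for example:
//   F/libc ( 948): Fatal signal 11 (SIGSEGV) at 0x42e00000 (code=2)
const FATAL_SIGNAL_RE =
  /F\/libc\s*\(\s*(\d+)\):\s*Fatal signal (\d+) \((\w+)\) at (0x[0-9a-fA-F]+) \(code=(\d+)\)/;

// Returns the parsed fields for one log line, or null if it does not match.
function parseFatalSignal(line) {
  const m = FATAL_SIGNAL_RE.exec(line);
  if (m === null) return null;
  return {
    pid: Number(m[1]),
    signal: Number(m[2]),
    name: m[3],
    address: m[4],
    code: Number(m[5]),
  };
}

// Returns the last fatal-signal record in a list of log lines, or null.
function lastFatalSignal(lines) {
  for (let i = lines.length - 1; i >= 0; i--) {
    const parsed = parseFatalSignal(lines[i]);
    if (parsed !== null) return parsed;
  }
  return null;
}
```

Run over the first excerpt, this returns a record for pid 948; run over the second, it returns null, which is the difference comment 4 points at.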
Comment 5•11 years ago
I also note that there's a string that shows up in the logcat in both logs:

05:50:53 INFO - exception: [Exception... "Component returned failure code: 0xc1f30001 (NS_ERROR_NOT_INITIALIZED) [nsICrashReporter.annotateCrashReport]" nsresult: "0xc1f30001 (NS_ERROR_NOT_INITIALIZED)" location: "JS frame :: chrome://browser/content/shell.js :: <TOP_LEVEL> :: line 224" data: no]creating 1!

However, this only shows up twice in the log where we get a minidump, but 4 times in the one where we didn't. Perhaps we're not actually enabling Breakpad properly sometimes?
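The count mentioned in comment 5 could be reproduced with something like the sketch below. `countOccurrences` and `ANNOTATE_FAILURE` are hypothetical names. Note that each exception line contains `NS_ERROR_NOT_INITIALIZED` twice (once in the message and once in the `nsresult` field), so the marker includes the method name to count one hit per exception.

```javascript
// Counts non-overlapping occurrences of `marker` in `logText`.
function countOccurrences(logText, marker) {
  if (marker.length === 0) return 0;
  let count = 0;
  let index = logText.indexOf(marker);
  while (index !== -1) {
    count += 1;
    index = logText.indexOf(marker, index + marker.length);
  }
  return count;
}

// Marker taken from the exception text quoted in comment 5. Including the
// method name avoids double-counting the repeated error name on each line.
const ANNOTATE_FAILURE =
  "(NS_ERROR_NOT_INITIALIZED) [nsICrashReporter.annotateCrashReport]";
```

Comparing `countOccurrences(log, ANNOTATE_FAILURE)` between a log with a minidump and one without would give the "twice versus 4 times" comparison made above.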
Reporter | ||
Comment 6•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=24075221&tree=Mozilla-B2g18
Reporter | ||
Comment 7•11 years ago
Out of sight, out of mind? Is there anyone who can look at this please?
Comment 8•11 years ago
How frequently is this crash happening? We are only running reftest-sanity on b2g18, I wonder if it wouldn't just be more worthwhile to turn them off (though I guess crashtests are a different matter).

I guess it depends on:
a) how much longer b2g18 is going to be around
b) how many commits are going to be pushed there now that the first couple releases are wrapping up

Another option would be to enable the full stack emulator builds like we have on m-c, but this may require a fair amount of work for releng (and wouldn't be a guaranteed fix).
Reporter | ||
Comment 9•11 years ago
This hits crashtests too, and I think turning off what few reftests we run on b2g18 to fix an intermittent crash will not go over well. Also, b2g18 will be around until at least the end of the year into early next year. It's not going away any time soon. Alex will have to answer about how much activity we expect on it going forward.
Flags: needinfo?(akeybl)
Reporter | ||
Comment 10•11 years ago
Oh, and it happens roughly once every 1-2 pushes. It hits in spurts, so we might see multiple crashes on one push, then 2 pushes without, etc. It's intermittent, but quite frequent.
Comment 11•11 years ago
Ok I didn't realize it was going to be around for that long. I don't know how much value reftest sanity is providing, but we are at least running a fair amount of crashtests, so I agree we don't want to turn them off. Maybe it would be worth getting full stack emulator builds going on b2g18 then.
Comment 12•11 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #8)
> How frequently is this crash happening? We are only running reftest-sanity
> on b2g18, I wonder if it wouldn't just be more worthwhile to turn them off
> (though I guess crashtests are a different matter)..
>
> I guess it depends on:
> a) how much longer b2g18 is going to be around
> b) how many commits are going to be pushed there now that the first couple
> releases are wrapping up
>
> Another option would be to enable the full stack emulator builds like we
> have on m-c, but this may require a fair amount of work for releng (and
> wouldn't be a guaranteed fix).

It'll be around till March 2014, and we can expect at least 30-40 landings a cycle until then. Agreed we shouldn't turn these off.
Flags: needinfo?(akeybl)
Reporter | ||
Comment 16•11 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #11)
> Maybe it would be worth getting full stack emulator builds going on b2g18
> then.

Who needs to make this call?
Flags: needinfo?(ahalberstadt)
Comment 17•11 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #16)
> (In reply to Andrew Halberstadt [:ahal] from comment #11)
> > Maybe it would be worth getting full stack emulator builds going on b2g18
> > then.
>
> Who needs to make this call?

The people who make that call are me, Joduinn, and Jgriffin. It represents the same serious amount of work that adding a new platform always creates (you might as well treat it like a new platform). That means lots of strange intermittents, then a period of fixing them, and then they will be running green. Not a silver bullet. And if we still crash and we aren't actually setting up breakpad properly like Ted hypothesizes above, then we're back at square 1.

Ahal, would there be a way to run the same set of crashtests on a full stack emulator build using a b2g18 build of gecko/gaia/gonk? That way we can see how much it buys us.

Ted, is there a way to determine if breakpad isn't being enabled? Perhaps something we can dump out to the log from the build in order to let us know whether that is a red herring or not?
Flags: needinfo?(ted)
Comment 18•11 years ago
(In reply to Clint Talbert ( :ctalbert ) from comment #17)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #16)
> > (In reply to Andrew Halberstadt [:ahal] from comment #11)
> > > Maybe it would be worth getting full stack emulator builds going on b2g18
> > > then.
> >
> > Who needs to make this call?
> The people who make that call are me, Joduinn, and Jgriffin. It represents
> the same serious amount of work that adding a new platform always creates
> (you might as well treat it like a new platform). That means lots of strange
> intermittents, then a period of fixing them, and then they will be running
> green. Not a silver bullet.

FWIW, we are already running the full builds on b2g18, it would just be a matter of changing some config files to enable the tests. Worst case scenario there will be a bunch of intermittents and we'll have to hide, fix, or disable them again. Though, yes, this worst case could be likely.

> Ahal, would there be a way to run the same set of crashtests on a full stack
> emulator build using a b2g18 build of gecko/gaia/gonk? That way we can see
> how much it buys us.

I'll look into it when I have some time (might not be for a few days). Though I wouldn't expect to be able to reproduce the problem in question locally. Our success reproducing this bug in the past has been zero.
Flags: needinfo?(ahalberstadt)
Comment 19•11 years ago
We could stick a line in reftest.js to dump the crashreporter status, it's pretty simple. Something like:

var cr = Components.classes["@mozilla.org/toolkit/crash-reporter;1"]
           .getService(Components.interfaces.nsICrashReporter);
dump("crashreporter enabled: " + cr.enabled + "\n");
Flags: needinfo?(ted)
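A slightly more defensive variant of the snippet in comment 19 might look like the sketch below. It is not the attached patch. `crashReporterStatus` is a hypothetical name, and passing the `Components` object in as a parameter is an assumption made here so the function can be exercised outside chrome-privileged code. The `try`/`catch` only guards against the service lookup throwing for any reason.

```javascript
// Sketch only: `components` is expected to be the chrome-privileged
// `Components` object. Taking it as a parameter is an assumption made so the
// function can also be called with a stub.
function crashReporterStatus(components) {
  try {
    const cr = components.classes["@mozilla.org/toolkit/crash-reporter;1"]
      .getService(components.interfaces.nsICrashReporter);
    return "crashreporter enabled: " + cr.enabled;
  } catch (e) {
    // Report the failure instead of aborting the caller.
    return "crashreporter unavailable: " + e;
  }
}
```

In reftest.js this would presumably be called as `dump(crashReporterStatus(Components) + "\n");`, which produces the same line as the original snippet when the service is available.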
Reporter | ||
Comment 21•11 years ago
Clint, ahal says he probably won't have time to look into this any time soon. Is there someone else who might? We still hit it pretty consistently on b2g18*, even with the full-stack builds.
Flags: needinfo?(ctalbert)
Comment 22•11 years ago
Ryan, we are chasing this issue down at the b2g work week. Right now, Jonas is trying to find out who would be the right person to dig into this for us. So, I'm going to transition my needinfo request to him.
Flags: needinfo?(jonas)
Comment 23•11 years ago
With 1.1 coming out, b2g 18 (1.0.1) is about to be deprecated. If this doesn't show up on other trees, then I think we just keep moving forward and focus our resources on 1.1, 1.2 and 1.3.
Flags: needinfo?(ctalbert)
Andrew is your guy
Flags: needinfo?(jonas)
Comment 28•11 years ago
So now this is "just" finding someone to investigate?
Flags: needinfo?(ryanvm)
Reporter | ||
Comment 29•11 years ago
Correct. Switching to full-stack builds & tests did not make the problem go away, so someone needs to investigate.
Flags: needinfo?(ryanvm)
Comment 30•11 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #19)
> We could stick a line in reftest.js to dump the crashreporter status, it's
> pretty simple. Something like:
> var cr = Components.classes["@mozilla.org/toolkit/crash-reporter;1"]
> .getService(Components.interfaces.nsICrashReporter);
> dump("crashreporter enabled: " + cr.enabled + "\n");

How's the attached (untested)?
Attachment #826007 - Flags: review?(ted)
Comment 31•11 years ago
Comment on attachment 826007
testBug880285.patch

Review of attachment 826007:
-----------------------------------------------------------------

Plausible.
Attachment #826007 - Flags: review?(ted) → review+
Comment 32•11 years ago
ahal: do you think this is another symptom of the stuff we were discussing in bug 866937?
Flags: needinfo?(ahalberstadt)
Comment 33•11 years ago
Yeah, could be.

> 05:57:03 ERROR - F/libc ( 948): Fatal signal 11 (SIGSEGV) at 0x42e00000 (code=2)

looks like it originates from libc, and :jrmuizel says that we are missing libc symbols in bug 866937. If we did get the symbols, I'd probably still need to hunt down and backport some of the check_for_crashes patches, but that should be fairly straightforward. It's the getting the symbols part that is a mystery to me.
Flags: needinfo?(ahalberstadt)
Comment 34•11 years ago
The symbols are irrelevant here; what matters is whether we're catching the crash and producing a minidump or not. Your analysis in the other bug showed that we were crashing before we restarted with Breakpad enabled, and I was wondering if this was something similar.
Comment 35•11 years ago
It could be, I can't see the logfile anymore and there don't seem to be any recent failures like this on b2g18, so I'm not exactly sure where the crash is happening. Just for the record, in the month of October we ran B2G Emulator ICS builds 5 times on b2g18 (and this crash didn't happen in any of them). Ryan, are you still noticing this crash? If so is this still something that should be a high priority to fix? It seems like b2g18 is winding down and the return on fixing this is getting smaller and smaller.
Flags: needinfo?(ryanvm)
Reporter | ||
Comment 36•11 years ago
The frequency certainly seems to be diminished. Recent retriggers show it ~10% of the time. This branch has approximately 4 months of support left and a diminishing number of checkins to it. I guess I'd be OK with leaving it as-is at this point, but I do get worried about setting an "ignore it long enough and we can eventually WONTFIX it" precedent. This bug has been on file for 5 months now.
Flags: needinfo?(ryanvm)
Reporter | ||
Comment 37•11 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=30152672&tree=Mozilla-B2g18

12/76 runs = 16%
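For reference, the arithmetic behind the 12/76 figure, plus a rough sense of how uncertain a rate from 76 runs is, can be sketched as below. The Wilson score interval is an addition for illustration and was not part of the original measurement. `failureRate` and `wilsonInterval` are hypothetical names, and the interval assumes runs are independent, which may not hold given that the crash reportedly hits in spurts.

```javascript
// Fraction of runs that hit the crash.
function failureRate(failures, runs) {
  if (runs <= 0) throw new RangeError("runs must be positive");
  return failures / runs;
}

// Wilson score interval for a binomial proportion (z = 1.96 for about 95%).
// Assumes independent runs, which is only an approximation here.
function wilsonInterval(failures, runs, z = 1.96) {
  const p = failureRate(failures, runs);
  const z2 = z * z;
  const denom = 1 + z2 / runs;
  const center = (p + z2 / (2 * runs)) / denom;
  const margin =
    (z * Math.sqrt((p * (1 - p)) / runs + z2 / (4 * runs * runs))) / denom;
  return { lower: center - margin, upper: center + margin };
}

// Whole-percent rate for the figures quoted above.
const percent = Math.round(failureRate(12, 76) * 100);
```

With 12 failures in 76 runs, `percent` rounds to 16, and the interval spans roughly 9% to 26%, so the earlier "~10%" estimate and this figure are not clearly different.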
Comment 38•11 years ago
I'd explicitly like to not consider this as setting any precedent, but rather allocating resources to where they are most useful; we're currently working with developers to identify and fix a number of asserts and crashes on m-c, and that seems likely to have a significant future payback compared to diverting resources to work on this bug. But, I agree it's definitely sub-optimal. Ahal, can you bring this up on Friday's B2G engineering meeting? Let's get jonas et al to make the final call.
Flags: needinfo?(ctalbert)
Comment 39•11 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #37)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=30152672&tree=Mozilla-B2g18
>
> 12/76 runs = 16%

And thanks for this statistic, btw!
Comment 40•11 years ago
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #34)
> The symbols are irrelevant here, it's whether we're catching the crash and
> producing a minidump or not. Your analysis in the other bug showed that we
> were crashing before we restarted with Breakpad enabled, I was wondering if
> this was something similar.

So to answer the question, no it isn't because it is crashing before we restart the b2g process with crashreporting enabled. It could be because we aren't checking for crashes when we should be.
Comment 41•11 years ago
Thanks. Just trying to make sure we're asking the right questions so we know what's actually wrong.
Reporter | ||
Comment 42•10 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=32584189&tree=Mozilla-B2g18-v1.1.0hd
Reporter | ||
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX