Closed Bug 818103 Opened 12 years ago Closed 12 years ago

Intermittent B2G emulator reftest, crashtest timeout | application crashed [@ libc.so + 0xdc04][@ libc.so + 0xdc00]

Categories

(Firefox OS Graveyard :: General, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: mikeh)

References

Details

(Keywords: crash, intermittent-failure)

Crash Data

Attachments

(1 obsolete file)

b2g_ics_armv7a_gecko_emulator mozilla-inbound opt test reftest-6 on 2012-12-01 22:09:15 PST for push 8868ca286572
slave: talos-r3-fed-070
https://tbpl.mozilla.org/php/getParsedLog.php?id=17529403&tree=Mozilla-Inbound
{
22:23:24 INFO - REFTEST TEST-START | http://10.0.2.2:8888/tests/layout/reftests/reftest-sanity/font-default.html | 7 / 20 (35%)
22:23:24 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/reftest-sanity/font-default.html | application timed out after 330 seconds with no output
22:23:24 WARNING - This is a harness error.
22:23:24 INFO - INFO | automation.py | Application ran for: 0:06:32.180782
22:23:24 INFO - INFO | automation.py | Reading PID log: /tmp/tmpfDMxl5pidlog
22:23:24 INFO - WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!
22:23:24 INFO - REFTEST INFO | runreftest.py | Running tests: end.
22:23:25 ERROR - Return code: 1
}
I don't think this is specific to a particular test: https://tbpl.mozilla.org/php/getParsedLog.php?id=18632898&tree=Cedar I've seen it with other tests too, though it doesn't happen very often.
Summary: Intermittent B2G emulator reftest-sanity/font-default.html | application timed out after 330 seconds with no output → Intermittent B2G emulator reftest timeout | application timed out after 330 seconds with no output
Removing the test name from the summary will mean TBPL can't star it. Whilst we do have failure modes on some platforms that fall into the "happens on any test, so not worth trying to put them in the summary" category - the fact that we had another instance that occurred in comment 1 makes me think we should at least try and list the most common test names comma separated in the summary, so they can be more easily starred (this is how intermittent failure bugs are used for all other platforms). Sound reasonable? :-)
If you want to start out doing that, then sure. But my inclination is that this is completely unrelated to the tests and that something is happening in B2G which causes a timeout in whatever test happens to be running at the moment. I've seen this maybe 4-5 times on Cedar and none of them were with the same test (note that on Cedar we are running ~400 tests per chunk as opposed to the 13 running on the other branches).
Ah the small number of tests explains why the same one has appeared in the summary twice. In which case, let's forget that :-)
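The reasoning in the two comments above can be checked with a little arithmetic. This is a rough sketch, assuming the crash is independent of the test and that every test in a chunk is equally likely to be running when it hits (a simplification, since tests differ in duration). The chunk sizes 13 and 400 come from the comment above ("~400" is approximate); the function name is made up for illustration.

```python
from fractions import Fraction


def same_test_probability(tests_per_chunk: int) -> Fraction:
    """Chance that a second crash lands on the same test as the first.

    Assumes each test in the chunk is equally likely to be the one
    running when the (test-independent) crash happens.
    """
    return Fraction(1, tests_per_chunk)


small_chunk = same_test_probability(13)   # branches running 13 tests per chunk
large_chunk = same_test_probability(400)  # Cedar, roughly 400 tests per chunk

# How much more likely a repeated test name is on the small chunks.
ratio = small_chunk / large_chunk

print(f"13-test chunk:  {float(small_chunk):.4f}")
print(f"400-test chunk: {float(large_chunk):.4f}")
print(f"ratio: {float(ratio):.1f}x")
```

Under these assumptions a repeated test name is about 30 times more likely with 13-test chunks, so seeing font-default.html twice says little about that test.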
Of course, if we can't star it automatically, we'll just leave it unstarred, and the constant presence of unstarred orange will cause people to just completely ignore failures on this platform, the same way that they still do for Android failures. https://tbpl.mozilla.org/php/getParsedLog.php?id=18689026&tree=Mozilla-Inbound
This seems to be a regression: reftests have been running since November, and the first time I saw this was in the last week.
It's possible this intermittent crash is due to how we are starting the reftests on B2G. Bug 807970 was filed to ensure we are running them properly. I'll work on that and hopefully it will fix this problem.
Just want to clarify comment 27. My theory is that by replacing shell.xul with reftest.xul the code that is causing the crash will never execute in the first place. So it will get rid of these intermittent oranges, but won't actually fix the root of the problem. Also want to clarify that it's a theory ;)
So it looks like this is still happening even with my patch to bug 807970: https://tbpl.mozilla.org/php/getParsedLog.php?id=18876069&tree=Cedar The patch is preliminary and there are some issues with it I need to fix, but at this point it looks like it does not prevent the B2G process from crashing like I had hoped. I'm not really sure what else I can do about this bug :(
Only real difference between a crashtest and a reftest is that you don't care what a crashtest looks like, as long as it doesn't look like it crashed. There's also bug 821420 for the same thing in the mochitest harness, which describes it as a crash causing the appearance of a timeout. https://tbpl.mozilla.org/php/getParsedLog.php?id=19380484&tree=Mozilla-Inbound
Depends on: 821420
Summary: Intermittent B2G emulator reftest timeout | application timed out after 330 seconds with no output → Intermittent B2G emulator reftest, crashtest timeout | application timed out after 330 seconds with no output
I'm moving this to B2G/General as I don't think it is a problem with the reftest harness (though the harness could certainly do a better job at providing info). If you are looking at this for the first time, the important takeaways are:
1) The timeout happens because the b2g process is crashing
2) This is a regression, we never saw this until sometime early-mid January
3) It happens on both m-c and b2g18
4) Mochitests have similar problems (see bug 821420)
5) It seems to have gotten worse recently (in the last week or two), but this is just my casual observation
Let me know if there's anything I can do to provide more information. I'll try my best, though this is kind of difficult to reproduce as is.
Component: Reftest → General
Product: Testing → Boot2Gecko
Version: Trunk → unspecified
Summary: Intermittent B2G emulator reftest, crashtest timeout | application timed out after 330 seconds with no output → Intermittent B2G emulator reftest, crashtest timeout | application timed out after 330 seconds with no output ("This usually indicates the B2G process has crashed")
Depends on: 843296
ahal, whilst this may in fact be a platform issue, the harness doesn't provide enough info here - and either way, this bug and bug 821420 are soon going to result in B2G testsuites being hidden by default again. Please can you be point for improving the harness and/or banging B2G people's heads together to take a look at this? (or can you find someone more appropriate to do so) Thank you :-)
Flags: needinfo?(ahalberstadt)
(In reply to Ed Morley [:edmorley UTC+0] from comment #202)
> ahal, whilst this may in fact be a platform issue, the harness doesn't
> provide enough info here - and either way, this bug and bug 821420 are soon
> going to result in B2G testsuites being hidden by default again. Please can
> you be point for improving the harness and/or banging B2G people's heads
> together to take a look at this? (or can you find someone more appropriate
> to do so) Thank you :-)

See https://bugzilla.mozilla.org/show_bug.cgi?id=821420#c112
Flags: needinfo?(ahalberstadt)
There are now crash stacks available for this bug which show up in the log: https://tbpl.mozilla.org/php/getParsedLog.php?id=21057455&tree=Mozilla-Inbound&full=1
Marshall, I believe you've been signed up for this. :) You can see all the relevant test failures in TBPL on inbound using: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=emulator&showall=1 All of the recent failures have a crash stack in the log, e.g., https://tbpl.mozilla.org/php/getParsedLog.php?id=21187122&tree=Mozilla-Inbound&full=1#error1 Most of them look like they occur during a call to jemalloc.
Assignee: nobody → marshall
Severity: normal → critical
Crash Signature: [@ libc.so@0xdc04]
Keywords: crash
Summary: Intermittent B2G emulator reftest, crashtest timeout | application timed out after 330 seconds with no output ("This usually indicates the B2G process has crashed") → Intermittent B2G emulator reftest, crashtest timeout | application crashed [@ libc.so + 0xdc04]
marshall, any update on this? The failure rate on this is really crazy!
I haven't been able to reproduce this locally, but the symbols are giving me some hints. I added some debugging messages to where I think the problem is, and have a try run going: https://tbpl.mozilla.org/?tree=Try&rev=7987171f4226
So I've been able to narrow down the problem, but it looks like I'll need more debugging in libui, which lives in the Android 'toolchain' and can't be changed in a TBPL push :( For now I've downloaded the TBPL b2g binaries + symbols, and I'm running the reftests locally with my libui changes. Hopefully I'll eventually be able to repro with the right binaries.
No luck reproducing this locally, and without being able to update the toolchain I can't easily add more debugging for the try servers.. Is there an easy way for me to get a custom toolchain onto TBPL?
Flags: needinfo?
(In reply to Marshall Culpepper [:marshall_law] from comment #344)
> No luck reproducing this locally, and without being able to update the
> toolchain I can't easily add more debugging for the try servers..
>
> Is there an easy way for me to get a custom toolchain onto TBPL?

(An empty needinfo request doesn't notify anyone, and gets cleared by the next commenter whomever that may be - in this case tbplbot). Picking jgriffin at random, please redirect if needed :-)
Flags: needinfo?(jgriffin)
(In reply to Marshall Culpepper [:marshall_law] from comment #344)
> No luck reproducing this locally, and without being able to update the
> toolchain I can't easily add more debugging for the try servers..
>
> Is there an easy way for me to get a custom toolchain onto TBPL?

I'll leave the needinfo up on jgriffin in case I'm wrong, but I don't think there is. One thing we might be able to do is set you up with your own project branch and make it use an emulator package that you have access to. Another option would be to give you a loaner slave, though you might have the same problem in that reproducing locally is difficult. Not sure which of the two makes more sense at this point.
If you want a custom toolchain on TBPL, I think we can do this. If you build an emulator locally that has the changes you want and then invoke the scripts/package-emulator.sh script, it will generate an emulator package. We can then ask rel-eng to upload that, and you can put a pointer to it in b2g/test/emulator.manifest as part of a try push. Ping me if you need help with this.
Flags: needinfo?(jgriffin)
Depends on: 866937
No longer depends on: 866937
Depends on: 866937
Depends on: 867996
Assignee: marshall → mhabicher
Depends on: 870863
https://tbpl.mozilla.org/php/getParsedLog.php?id=22874263&tree=Mozilla-Inbound
b2g_ics_armv7a_gecko_emulator mozilla-inbound opt test reftest-3 on 2013-05-12 00:52:25 PDT for push 39f9c42b2668
slave: talos-r3-fed-060
01:10:23 WARNING - TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/bugs/385569-1b.html | application timed out after 330 seconds with no output
01:10:39 WARNING - PROCESS-CRASH | http://10.0.2.2:8888/tests/layout/reftests/bugs/385569-1b.html | application crashed [@ libc.so + 0xdc04]
01:10:40 ERROR - Return code: 1
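Since the timeout line and the PROCESS-CRASH line name the same test, the two can be paired up when triaging logs like the one above. The following is only an illustrative sketch, not part of TBPL or the reftest harness: the regular expressions are modeled on the two lines quoted in this comment, and the function and variable names are hypothetical. Other log variants may not match.

```python
import re

# Patterns modeled on the two WARNING lines quoted above (assumption:
# real logs may contain variants these do not cover).
TIMEOUT_RE = re.compile(
    r"TEST-UNEXPECTED-FAIL \| (\S+) \| "
    r"application timed out after (\d+) seconds with no output"
)
CRASH_RE = re.compile(
    r"PROCESS-CRASH \| (\S+) \| application crashed \[@ ([^\]]+)\]"
)


def pair_timeouts_with_crashes(lines):
    """Map each test URL to its timeout length and crash signature, if seen."""
    results = {}
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            entry = results.setdefault(
                timeout.group(1), {"timeout_seconds": None, "signature": None}
            )
            entry["timeout_seconds"] = int(timeout.group(2))
            continue
        crash = CRASH_RE.search(line)
        if crash:
            entry = results.setdefault(
                crash.group(1), {"timeout_seconds": None, "signature": None}
            )
            entry["signature"] = crash.group(2)
    return results


TEST_URL = "http://10.0.2.2:8888/tests/layout/reftests/bugs/385569-1b.html"
SAMPLE_LOG = [
    "01:10:23 WARNING - TEST-UNEXPECTED-FAIL | " + TEST_URL
    + " | application timed out after 330 seconds with no output",
    "01:10:39 WARNING - PROCESS-CRASH | " + TEST_URL
    + " | application crashed [@ libc.so + 0xdc04]",
    "01:10:40 ERROR - Return code: 1",
]

summary = pair_timeouts_with_crashes(SAMPLE_LOG)
for test, info in summary.items():
    print(test, info)
```

A test that ends up with both a timeout and a signature is a timeout caused by a crash, as opposed to a genuine hang with no crash report.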
<3 :-D
Blocks: 872167
Comment on attachment 749412 [details] Link to PR to fix emulator out-of-bounds read Migrated patch to bug 867996, which has a cleaner comment history.
Attachment #749412 - Attachment is obsolete: true
Attachment #749412 - Flags: review?(mwu)
No longer blocks: 872167
Depends on: 872282
Summary: Intermittent B2G emulator reftest, crashtest timeout | application crashed [@ libc.so + 0xdc04] → Intermittent B2G emulator reftest, crashtest timeout | application crashed [@ libc.so + 0xdc04][@ libc.so + 0xdc00]
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED