Closed Bug 808410 Opened 12 years ago Closed 12 years ago

Android shutdown crashes on mochitest/crashtest/reftest/jsreftest + all crashes on robocop are green on TBPL

Tracking

(firefox17 wontfix, firefox18 affected, firefox19 affected, firefox-esr10 wontfix, firefox-esr17 affected)

Status:

RESOLVED FIXED

Milestone:

mozilla20

Tracking Flags:

Tracking

Status

firefox17

---

wontfix

firefox18

---

affected

firefox19

---

affected

firefox-esr10

---

wontfix

firefox-esr17

---

affected

People

(Reporter: emorley, Assigned: emorley)

References

(Blocks 1 open bug)

Details

Attachments

(6 files)

Part 1: remotereftest.py should use runTests return value 12 years ago Ed Morley [:emorley] 1.09 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
Part 2: Robocop - non-zero return values shouldn't be overwritten by later success 12 years ago Ed Morley [:emorley] 1.39 KB, patch	gbrown : review+	Details \| Diff \| Splinter Review
Part 3: checkForCrashes should return true for cases where we hit an exception during minidump stackwalk 12 years ago Ed Morley [:emorley] 1.73 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
Part 4: Make peptest's checkForCrashes more consistent with that in automationutils.py 12 years ago Ed Morley [:emorley] 2.23 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review
Part 5: Callers of checkForCrashes should use its return value 12 years ago Ed Morley [:emorley] 7.02 KB, patch	jmaher : review+ gps : feedback+	Details \| Diff \| Splinter Review
Part 6: B2G mochitests - Ensure retVal is set even when cleanup did not generate an exception 12 years ago Ed Morley [:emorley] 805 bytes, patch	ahal : review+	Details \| Diff \| Splinter Review

Ed Morley [:emorley]

Assignee

Description

•

12 years ago

Yes, you read that correctly \o/ The fix for this will need backporting to all branches, since this has been present since the beginning of remote reftest (2010). Examples: Android Tegra 250 mozilla-release opt test reftest-1 on 2012-10-23 10:46:30 PDT for push 774491e9f207 slave: tegra-149 results: success (0) ...but shutdown crash in [@ nssCertificate_Destroy] ...remotereftest.py finished with exit code 0 https://tbpl.mozilla.org/php/getParsedLog.php?id=16386302&full=1&branch=mozilla-release Android Armv6 Tegra 250 mozilla-inbound opt test crashtest-3 on 2012-11-04 03:50:10 PST for push 58b4bd7b5065 slave: tegra-274 results: success (0) ...but shutdown crash in [@ nssCertificate_Destroy] ...remotereftest.py finished with exit code 0 https://tbpl.mozilla.org/php/getParsedLog.php?id=16736472&full=1&branch=mozilla-inbound Android Armv6 Tegra 250 mozilla-inbound opt test crashtest-3 on 2012-11-03 20:46:11 PDT for push 38b97d2a1bc5 slave: tegra-277 results: success (0) ...but shutdown crash in [@ nssCertificate_Destroy] ...remotereftest.py finished with exit code 0 https://tbpl.mozilla.org/php/getParsedLog.php?id=16733514&full=1&branch=mozilla-inbound Background... Buildbotcustom's unittest.py does: > # Assume that having the "Unexpected: 0" line > # means the tests run completed (successfully). > # Also check for "^TEST-UNEXPECTED-" for harness errors. > if not re.search(r"^REFTEST INFO \| Unexpected: 0 \(", log, re.MULTILINE) or \ > re.search("^TEST-UNEXPECTED-", log, re.MULTILINE): > return worst_status(superResult, WARNINGS) However, reftest.js' summary appended to the log doesn't specifically look for crashes, which means that in the case of a shutdown crash, we get a summary like: > REFTEST INFO | Result summary: > REFTEST INFO | Successful: 704 (1 pass, 703 load only) > REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception) > REFTEST INFO | Known problems: 10 (0 known fail, 0 known asserts, 0 random, 10 skipped, 0 slow) > REFTEST INFO | Total canvas count = 2 > REFTEST TEST-START | Shutdown > > INFO | automation.py | Application ran for: 0:05:16.468885 > INFO | automation.py | Reading PID log: /tmp/tmpBwFVijpidlog > getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/' > Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-armv6/1352026305/fennec-19.0a1.en-US.android-arm-armv6.crashreporter-symbols.zip > PROCESS-CRASH | Shutdown | application crashed (minidump found) ...which means the unittest.py snippet above relies on superResult (~the exit code) only. However remotereftest.py unfortunately does: > retVal = 0 > try: > cmdlineArgs = ["-reftest", manifest] > if options.bootstrap: > cmdlineArgs = [] > dm.recordLogcat() > reftest.runTests(manifest, options, cmdlineArgs) > except: > print "Automation Error: Exception caught while running tests" > traceback.print_exc() > retVal = 1 > > reftest.stopWebServer(options) > try: > logcat = dm.getLogcat() > print ''.join(logcat[-500:-1]) > print dm.getInfo() > except devicemanager.DMError: > print "WARNING: Error getting device information at end of test" > > return retVal ...so unless we get an exception (which we don't in the case of crashes) we'll exit with code 0; causing buildbot (and thus TBPL) to treat the run as a pass. Compare with... runreftest.py: line 288 -- sys.exit(reftest.runTests(args[0], options)) runreftestb2g.py line 548 -- retVal = reftest.runTests(manifest, options, cmdlineArgs) I've checked the other harnesses and none of the others seem to need fixing (http://mxr.mozilla.org/mozilla-central/search?string=.runTests&case=1&find=\.py&findi=\.js&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central).

Ed Morley [:emorley]

Assignee

Comment 1

•

12 years ago

> However, reftest.js' summary appended to the log doesn't specifically look for > crashes, which means that in the case of a shutdown crash, we get a summary like: Bah brain fail, that's an orthogonal issue; meant to say something like: However, in the case of a shutdown crash, we get a summary like: > REFTEST INFO | Result summary: > REFTEST INFO | Successful: 704 (1 pass, 703 load only) > REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception) ...

Ed Morley [:emorley]

Assignee

Comment 2

•

12 years ago

Attached patch Part 1: remotereftest.py should use runTests return value — Details — Splinter Review

Assignee: nobody → bmo

Status: NEW → ASSIGNED

Attachment #678132 - Flags: review?(jmaher)

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Blocks: 761987

Ed Morley [:emorley]

Assignee

Comment 4

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=de234f1e54fb

Ed Morley [:emorley]

Assignee

Comment 5

•

12 years ago

Hmm I'm still getting exit code 0: https://tbpl.mozilla.org/php/getParsedLog.php?id=16743181&full=1&branch=try

Joel Maher ( :jmaher ) (UTC -8)

Comment 6

•

12 years ago

Comment on attachment 678132 [details] [diff] [review] Part 1: remotereftest.py should use runTests return value Review of attachment 678132 [details] [diff] [review]: ----------------------------------------------------------------- good catch.

Attachment #678132 - Flags: review?(jmaher) → review+

Ed Morley [:emorley]

Assignee

Comment 7

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #5) > Hmm I'm still getting exit code 0: > https://tbpl.mozilla.org/php/getParsedLog.php?id=16743181&full=1&branch=try Due to bug 810405 comment 1. I'll attach further patches here.

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Attachment #678132 - Attachment description: Patch v1 → Part 1: remotereftest.py should use runTests return value

Ed Morley [:emorley]

Assignee

Comment 9

•

12 years ago

Ok, so as the duped bugs have shown, this doesn't just affect reftest variants. We miss Android shutdown crashes on everything but talos, and on robocop we even miss non-shutdown crashes too \o/ This is due to a combination of: * remotereftest.py not using the return value from runTests (dealt with by patch part 1). * The return value for robocop tests being overwritten after each test run, so unless the last test failed, we miss earlier failures. (For non-crashes this doesn't matter, since we do an end-of-overall-run log parse + summary (printLog), where failures can still change the return value. However printLog does not look for PROCESS-CRASH lines, but I'll deal with this in another bug, since with the other fixes here, it doesn't affect buildbo/TBPL job status.) * checkForCrashes returning false when we hit an exception partway through, since we don't set foundCrash to True until the very end. * Callers of checkForCrashes() don't use its return value as part of determining the overall exit code. In addition, I spotted: * mochitest/runtestsb2g.py possibly doesn't always set the retVal when the final mochitest.cleanup succeeds without exception. * We fail to print the overall robocop run summary unless the return value was 0. * checkForZombies() doesn't return the result (unlike checkForCrashes) & callers don't use it to determine the final exit value.

Summary: remotereftest.py ignores runTests() exit code, so Android crashtest/reftest/jsreftest shutdown crashes are green on TBPL → Android shutdown crashes on mochitest/crashtest/reftest/jsreftest + all crashes on robocop are green on TBPL

Ed Morley [:emorley]

Assignee

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #11) > If there were do dumps found we return early, That should be "no dumps".

Ed Morley [:emorley]

Assignee

Comment 13

•

12 years ago

Attached patch Part 4: Make peptest's checkForCrashes more consistent with that in automationutils.py — Details — Splinter Review

Whilst peptest's checkForCrashes sets foundCrash earlier (and so doesn't have the same issue as in comment 11), we might as well do similar cleanup here.

Ed Morley [:emorley]

Assignee

Comment 14

•

12 years ago

Attached patch Part 5: Callers of checkForCrashes should use its return value — Details — Splinter Review

...to ensure shutdown crashes actually make the run fail. Note: The remoteautomation.py changes effectively make the "WARNING: unable to remove directory: <minidumps>" fatal, since the comment alongside implied that it should be. Whilst testing this patch, I have seen robocop runs that hit this, but I suspect they are a real issue that needs to be looked at separately (ie startup crashes that we've been missing until now?). gps, please may I have feedback on the runxpcshelltests.py part (notably the xunitResult format).

Attachment #681926 - Flags: feedback?(gps)

Ed Morley [:emorley]

Assignee

Comment 15

•

12 years ago

Attached patch Part 6: B2G mochitests - Ensure retVal is set even when cleanup did not generate an exception — Details — Splinter Review

If for some reason we hit an exception during runTests, but the later mochitest.cleanup call doesn't cause an exception, we may not set retVal. I can't see us hitting this very often, but just in case.

Ed Morley [:emorley]

Assignee

Comment 16

•

12 years ago

Try run looks good: https://tbpl.mozilla.org/?tree=Try&rev=679262f1068d Example crashes on the Try run, that we now correctly catch... Reftest shutdown crash: https://tbpl.mozilla.org/php/getParsedLog.php?id=17068582&tree=Try Robocop crash during testBookmarksTab: https://tbpl.mozilla.org/php/getParsedLog.php?id=17068026&tree=Try Example "Automation Error: No crash directory (/mnt/sdcard/tests/profile/minidumps/) on remote device" (see comment 14; I think these are startup crashes - either way I believe the right thing to do is make these fatal and deal in another bug): https://tbpl.mozilla.org/php/getParsedLog.php?id=17067851&tree=Try https://tbpl.mozilla.org/php/getParsedLog.php?id=17067872&tree=Try https://tbpl.mozilla.org/php/getParsedLog.php?id=17067832&tree=Try

Ed Morley [:emorley]

Assignee

Comment 17

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #16) > I think these are startup crashes - either way I believe the right thing to > do is make these fatal (which is what the part 5 patch does)

Ed Morley [:emorley]

Assignee

Comment 18

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #16) > Example "Automation Error: No crash directory > (/mnt/sdcard/tests/profile/minidumps/) on remote device" (see comment 14; I > think these are startup crashes - either way I believe the right thing to do > is make these fatal and deal in another bug): > https://tbpl.mozilla.org/php/getParsedLog.php?id=17067851&tree=Try > https://tbpl.mozilla.org/php/getParsedLog.php?id=17067872&tree=Try > https://tbpl.mozilla.org/php/getParsedLog.php?id=17067832&tree=Try In fact, these look like bug 722166 perhaps.

Geoff Brown [:gbrown]

Comment 19

•

12 years ago

Comment on attachment 681916 [details] [diff] [review] Part 2: Robocop - non-zero return values shouldn't be overwritten by later success Review of attachment 681916 [details] [diff] [review]: ----------------------------------------------------------------- LGTM ::: testing/mochitest/runtestsremote.py @@ +497,5 @@ > try: > dm.recordLogcat() > + result = mochitest.runTests(options) > + # Ensure failures aren't obscured by subsequent passes > + if not retVal: nit: "if retVal is None", to be consistent with the check just below

Attachment #681916 - Flags: review+

Ed Morley [:emorley]

Assignee

Comment 20

•

12 years ago

(In reply to Geoff Brown [:gbrown] from comment #19) > Comment on attachment 681916 [details] [diff] [review] > Part 2: Robocop - non-zero return values shouldn't be overwritten by later > success > > nit: "if retVal is None", to be consistent with the check just below That unfortunately won't fix the problem being addressed by that patch - along the lines of comment 10, we should only overwrite retVal if the former value was either None (ie we haven't run any tests yet) or 0.

Gregory Szorc [:gps]

Comment 21

•

12 years ago

Comment on attachment 681926 [details] [diff] [review] Part 5: Callers of checkForCrashes should use its return value Review of attachment 681926 [details] [diff] [review]: ----------------------------------------------------------------- Patch looks good as is. Just some minor style nits. ::: build/automation.py.in @@ +1069,5 @@ > + > + crashed = self.checkForCrashes(profileDir, symbolsPath) > + > + if crashed: > + status = 1 No need for "crashed" variable. @@ +1075,5 @@ > if os.path.exists(processLog): > os.unlink(processLog) > > if self.IS_TEST_BUILD and runSSLTunnel: > ssltunnelProcess.kill() This script looks like it could use some context manager love! ::: build/mobile/b2gautomation.py @@ +103,5 @@ > try: > shutil.rmtree(dumpDir) > except: > print "WARNING: unable to remove directory: %s" % (dumpDir) > + return crashed See comment in remoteautomation.py regarding lingering temp directories and try..finally. ::: build/mobile/remoteautomation.py @@ +100,5 @@ > + try: > + shutil.rmtree(dumpDir) > + except: > + print "WARNING: unable to remove directory: %s" % dumpDir > + return crashed If getDirectory() or checkForCrashes() raises, the temporary directory will linger. I'd use a try..finally to ensure dumpDir is deleted. ::: testing/xpcshell/runxpcshelltests.py @@ +870,5 @@ > self.todoCount += 1 > xunitResult["todo"] = True > > + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name) > + if crashed: This is the only use of "created" so I wouldn't bother with a variable.

Attachment #681926 - Flags: feedback?(gps) → feedback+

Ed Morley [:emorley]

Assignee

Comment 22

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #21) > ::: build/automation.py.in > @@ +1069,5 @@ > > + > > + crashed = self.checkForCrashes(profileDir, symbolsPath) > > + > > + if crashed: > > + status = 1 > > No need for "crashed" variable. This is added to in bug 812135 (whose patch applies after this), and whilst the variable is still only used once after the patches there, I think it reads more clearly like this. > @@ +1075,5 @@ > > if os.path.exists(processLog): > > os.unlink(processLog) > > > > if self.IS_TEST_BUILD and runSSLTunnel: > > ssltunnelProcess.kill() > > This script looks like it could use some context manager love! These lines weren't changed by the patch. I'll file another bug for this. > ::: build/mobile/b2gautomation.py > @@ +103,5 @@ > > try: > > shutil.rmtree(dumpDir) > > except: > > print "WARNING: unable to remove directory: %s" % (dumpDir) > > + return crashed > > See comment in remoteautomation.py regarding lingering temp directories and > try..finally. > > ::: build/mobile/remoteautomation.py > @@ +100,5 @@ > > + try: > > + shutil.rmtree(dumpDir) > > + except: > > + print "WARNING: unable to remove directory: %s" % dumpDir > > + return crashed > > If getDirectory() or checkForCrashes() raises, the temporary directory will > linger. I'd use a try..finally to ensure dumpDir is deleted. Other than indentation, these weren't changed by this patch, so I'll file another bug. > ::: testing/xpcshell/runxpcshelltests.py > @@ +870,5 @@ > > self.todoCount += 1 > > xunitResult["todo"] = True > > > > + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name) > > + if crashed: > > This is the only use of "created" so I wouldn't bother with a variable. Since bug 812135 doesn't apply to runxpcshelltests.py (or at least, runxpcshelltests.py doesn't have an equivalent at the moment), I'll make this change.

Ed Morley [:emorley]

Assignee

Comment 23

•

12 years ago

(In reply to Ed Morley [:edmorley UTC+0] from comment #20) > (In reply to Geoff Brown [:gbrown] from comment #19) > > Comment on attachment 681916 [details] [diff] [review] > > Part 2: Robocop - non-zero return values shouldn't be overwritten by later > > success > > > > nit: "if retVal is None", to be consistent with the check just below > > That unfortunately won't fix the problem being addressed by that patch - > along the lines of comment 10, we should only overwrite retVal if the former > value was either None (ie we haven't run any tests yet) or 0. gbrown, instead of: "if not retVal:" ...would you rather: "if retVal is None or retVal == 0:" ...to be more explicit, and also consistent with the style lower down?

Ed Morley [:emorley]

Assignee

Comment 24

•

12 years ago

Comment on attachment 681918 [details] [diff] [review] Part 3: checkForCrashes should return true for cases where we hit an exception during minidump stackwalk (For all of these, see the comment where the attachment was added, for reasoning. [datetime link by attachment name should jump you to it])

Attachment #681918 - Flags: review?(jmaher)

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Attachment #681919 - Flags: review?(jmaher)

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Attachment #681926 - Flags: review?(jmaher)

Ed Morley [:emorley]

Assignee

•

12 years ago

Blocks: 813132

Andrew Halberstadt [:ahal]

Comment 29

•

12 years ago

Comment on attachment 681927 [details] [diff] [review] Part 6: B2G mochitests - Ensure retVal is set even when cleanup did not generate an exception Review of attachment 681927 [details] [diff] [review]: ----------------------------------------------------------------- Taking over trivial review as per request on irc

Attachment #681927 - Flags: review?(jmaher) → review+

Ed Morley [:emorley]

Assignee

Comment 30

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/f0a6c0c3cd8d https://hg.mozilla.org/integration/mozilla-inbound/rev/0ed85ba08942 https://hg.mozilla.org/integration/mozilla-inbound/rev/55ea556dffb9 https://hg.mozilla.org/integration/mozilla-inbound/rev/966b4616dfac https://hg.mozilla.org/integration/mozilla-inbound/rev/470669b03c2f https://hg.mozilla.org/integration/mozilla-inbound/rev/5cdd053b2788

status-firefox19: affected → fixed

Target Milestone: --- → mozilla19

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

status-firefox16: wontfix → ---

status-firefox17: affected → wontfix

status-firefox19: fixed → affected

status-firefox-esr17: --- → affected

Target Milestone: mozilla19 → mozilla20

Ed Morley [:emorley]

Assignee

Comment 31

•

12 years ago

(In reply to Geoff Brown [:gbrown] from comment #28) > Yes, I would prefer that. But it's a very minor issue -- change only if you > feel like it. Done :-) (In reply to Joel Maher (:jmaher) from comment #27) > to that note, we should have a try/except/finall around the entire > getDirectory/checkForCrashes code. Right now we only have it around > shutil.rmtree. Filed bug 813577. > ::: testing/xpcshell/runxpcshelltests.py > @@ +870,5 @@ > > self.todoCount += 1 > > xunitResult["todo"] = True > > > > + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name) > > + if crashed: > > likewise 'if checkForCrashes(testdir, self.symbolsPath, testName=name):' Done. (In reply to Gregory Szorc [:gps] from comment #21) > @@ +1075,5 @@ > > if os.path.exists(processLog): > > os.unlink(processLog) > > > > if self.IS_TEST_BUILD and runSSLTunnel: > > ssltunnelProcess.kill() > > This script looks like it could use some context manager love! Filed bug 813576.

Ryan VanderMeulen [:RyanVM]

Comment 32

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/f0a6c0c3cd8d https://hg.mozilla.org/mozilla-central/rev/0ed85ba08942 https://hg.mozilla.org/mozilla-central/rev/55ea556dffb9 https://hg.mozilla.org/mozilla-central/rev/966b4616dfac https://hg.mozilla.org/mozilla-central/rev/470669b03c2f https://hg.mozilla.org/mozilla-central/rev/5cdd053b2788

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Ed Morley [:emorley]

Assignee

Comment 33

•

12 years ago

Added bonus, this made bug 722166 starrable :-)

Blocks: 778688

Ed Morley [:emorley]

Assignee

Updated

•

12 years ago

Blocks: 816952

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

Blocks: 830046

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

No longer blocks: 830046

You need to log in before you can comment on or make changes to this bug.