Closed
Bug 808410
Opened 12 years ago
Closed 12 years ago
Android shutdown crashes on mochitest/crashtest/reftest/jsreftest + all crashes on robocop are green on TBPL
Categories
(Testing :: General, defect)
Tracking
(firefox17 wontfix, firefox18 affected, firefox19 affected, firefox-esr10 wontfix, firefox-esr17 affected)
RESOLVED
FIXED
mozilla20
People
(Reporter: emorley, Assigned: emorley)
References
(Blocks 1 open bug)
Details
Attachments
(6 files)
1.09 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
1.39 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
1.73 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
2.23 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
7.02 KB,
patch
|
jmaher
:
review+
gps
:
feedback+
|
Details | Diff | Splinter Review |
805 bytes,
patch
|
ahal
:
review+
|
Details | Diff | Splinter Review |
Yes, you read that correctly \o/
The fix for this will need backporting to all branches, since this has been present since the beginning of remote reftest (2010).
Examples:
Android Tegra 250 mozilla-release opt test reftest-1 on 2012-10-23 10:46:30 PDT for push 774491e9f207
slave: tegra-149
results: success (0)
...but shutdown crash in [@ nssCertificate_Destroy]
...remotereftest.py finished with exit code 0
https://tbpl.mozilla.org/php/getParsedLog.php?id=16386302&full=1&branch=mozilla-release
Android Armv6 Tegra 250 mozilla-inbound opt test crashtest-3 on 2012-11-04 03:50:10 PST for push 58b4bd7b5065
slave: tegra-274
results: success (0)
...but shutdown crash in [@ nssCertificate_Destroy]
...remotereftest.py finished with exit code 0
https://tbpl.mozilla.org/php/getParsedLog.php?id=16736472&full=1&branch=mozilla-inbound
Android Armv6 Tegra 250 mozilla-inbound opt test crashtest-3 on 2012-11-03 20:46:11 PDT for push 38b97d2a1bc5
slave: tegra-277
results: success (0)
...but shutdown crash in [@ nssCertificate_Destroy]
...remotereftest.py finished with exit code 0
https://tbpl.mozilla.org/php/getParsedLog.php?id=16733514&full=1&branch=mozilla-inbound
Background...
Buildbotcustom's unittest.py does:
> # Assume that having the "Unexpected: 0" line
> # means the tests run completed (successfully).
> # Also check for "^TEST-UNEXPECTED-" for harness errors.
> if not re.search(r"^REFTEST INFO \| Unexpected: 0 \(", log, re.MULTILINE) or \
> re.search("^TEST-UNEXPECTED-", log, re.MULTILINE):
> return worst_status(superResult, WARNINGS)
However, reftest.js' summary appended to the log doesn't specifically look for crashes, which means that in the case of a shutdown crash, we get a summary like:
> REFTEST INFO | Result summary:
> REFTEST INFO | Successful: 704 (1 pass, 703 load only)
> REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
> REFTEST INFO | Known problems: 10 (0 known fail, 0 known asserts, 0 random, 10 skipped, 0 slow)
> REFTEST INFO | Total canvas count = 2
> REFTEST TEST-START | Shutdown
>
> INFO | automation.py | Application ran for: 0:05:16.468885
> INFO | automation.py | Reading PID log: /tmp/tmpBwFVijpidlog
> getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
> Downloading symbols from: http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-inbound-android-armv6/1352026305/fennec-19.0a1.en-US.android-arm-armv6.crashreporter-symbols.zip
> PROCESS-CRASH | Shutdown | application crashed (minidump found)
...which means the unittest.py snippet above relies on superResult (~the exit code) only. However remotereftest.py unfortunately does:
> retVal = 0
> try:
> cmdlineArgs = ["-reftest", manifest]
> if options.bootstrap:
> cmdlineArgs = []
> dm.recordLogcat()
> reftest.runTests(manifest, options, cmdlineArgs)
> except:
> print "Automation Error: Exception caught while running tests"
> traceback.print_exc()
> retVal = 1
>
> reftest.stopWebServer(options)
> try:
> logcat = dm.getLogcat()
> print ''.join(logcat[-500:-1])
> print dm.getInfo()
> except devicemanager.DMError:
> print "WARNING: Error getting device information at end of test"
>
> return retVal
...so unless we get an exception (which we don't in the case of crashes) we'll exit with code 0; causing buildbot (and thus TBPL) to treat the run as a pass.
Compare with...
runreftest.py:
line 288 -- sys.exit(reftest.runTests(args[0], options))
runreftestb2g.py
line 548 -- retVal = reftest.runTests(manifest, options, cmdlineArgs)
I've checked the other harnesses and none of the others seem to need fixing (http://mxr.mozilla.org/mozilla-central/search?string=.runTests&case=1&find=\.py&findi=\.js&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central).
Assignee | ||
Comment 1•12 years ago
|
||
> However, reftest.js' summary appended to the log doesn't specifically look for
> crashes, which means that in the case of a shutdown crash, we get a summary like:
Bah brain fail, that's an orthogonal issue; meant to say something like:
However, in the case of a shutdown crash, we get a summary like:
> REFTEST INFO | Result summary:
> REFTEST INFO | Successful: 704 (1 pass, 703 load only)
> REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
...
Assignee | ||
Comment 2•12 years ago
|
||
Assignee | ||
Comment 4•12 years ago
|
||
Assignee | ||
Comment 5•12 years ago
|
||
Hmm I'm still getting exit code 0:
https://tbpl.mozilla.org/php/getParsedLog.php?id=16743181&full=1&branch=try
Comment 6•12 years ago
|
||
Comment on attachment 678132 [details] [diff] [review]
Part 1: remotereftest.py should use runTests return value
Review of attachment 678132 [details] [diff] [review]:
-----------------------------------------------------------------
good catch.
Attachment #678132 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 7•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #5)
> Hmm I'm still getting exit code 0:
> https://tbpl.mozilla.org/php/getParsedLog.php?id=16743181&full=1&branch=try
Due to bug 810405 comment 1.
I'll attach further patches here.
Assignee | ||
Updated•12 years ago
|
Attachment #678132 -
Attachment description: Patch v1 → Part 1: remotereftest.py should use runTests return value
Assignee | ||
Comment 9•12 years ago
|
||
Ok, so as the duped bugs have shown, this doesn't just affect reftest variants.
We miss Android shutdown crashes on everything but talos, and on robocop we even miss non-shutdown crashes too \o/
This is due to a combination of:
* remotereftest.py not using the return value from runTests (dealt with by patch part 1).
* The return value for robocop tests being overwritten after each test run, so unless the last test failed, we miss earlier failures. (For non-crashes this doesn't matter, since we do an end-of-overall-run log parse + summary (printLog), where failures can still change the return value. However printLog does not look for PROCESS-CRASH lines, but I'll deal with this in another bug, since with the other fixes here, it doesn't affect buildbo/TBPL job status.)
* checkForCrashes returning false when we hit an exception partway through, since we don't set foundCrash to True until the very end.
* Callers of checkForCrashes() don't use its return value as part of determining the overall exit code.
In addition, I spotted:
* mochitest/runtestsb2g.py possibly doesn't always set the retVal when the final mochitest.cleanup succeeds without exception.
* We fail to print the overall robocop run summary unless the return value was 0.
* checkForZombies() doesn't return the result (unlike checkForCrashes) & callers don't use it to determine the final exit value.
Summary: remotereftest.py ignores runTests() exit code, so Android crashtest/reftest/jsreftest shutdown crashes are green on TBPL → Android shutdown crashes on mochitest/crashtest/reftest/jsreftest + all crashes on robocop are green on TBPL
Assignee | ||
Updated•12 years ago
|
Component: Reftest → General
Assignee | ||
Comment 10•12 years ago
|
||
Only sets retVal if it is at its initial 'None' value or else 0. If we have multiple failures during the robocop run, it will remain at the first non-zero return value.
Assignee | ||
Comment 11•12 years ago
|
||
If there were do dumps found we return early, so no need to use foundCrash (and fail to set it if we hit an exception).
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #11)
> If there were do dumps found we return early,
That should be "no dumps".
Assignee | ||
Comment 13•12 years ago
|
||
Whilst peptest's checkForCrashes sets foundCrash earlier (and so doesn't have the same issue as in comment 11), we might as well do similar cleanup here.
Assignee | ||
Comment 14•12 years ago
|
||
...to ensure shutdown crashes actually make the run fail.
Note: The remoteautomation.py changes effectively make the "WARNING: unable to remove directory: <minidumps>" fatal, since the comment alongside implied that it should be. Whilst testing this patch, I have seen robocop runs that hit this, but I suspect they are a real issue that needs to be looked at separately (ie startup crashes that we've been missing until now?).
gps, please may I have feedback on the runxpcshelltests.py part (notably the xunitResult format).
Attachment #681926 -
Flags: feedback?(gps)
Assignee | ||
Comment 15•12 years ago
|
||
If for some reason we hit an exception during runTests, but the later mochitest.cleanup call doesn't cause an exception, we may not set retVal. I can't see us hitting this very often, but just in case.
Assignee | ||
Comment 16•12 years ago
|
||
Try run looks good:
https://tbpl.mozilla.org/?tree=Try&rev=679262f1068d
Example crashes on the Try run, that we now correctly catch...
Reftest shutdown crash:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17068582&tree=Try
Robocop crash during testBookmarksTab:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17068026&tree=Try
Example "Automation Error: No crash directory (/mnt/sdcard/tests/profile/minidumps/) on remote device" (see comment 14; I think these are startup crashes - either way I believe the right thing to do is make these fatal and deal in another bug):
https://tbpl.mozilla.org/php/getParsedLog.php?id=17067851&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=17067872&tree=Try
https://tbpl.mozilla.org/php/getParsedLog.php?id=17067832&tree=Try
Assignee | ||
Comment 17•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #16)
> I think these are startup crashes - either way I believe the right thing to
> do is make these fatal
(which is what the part 5 patch does)
Assignee | ||
Comment 18•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #16)
> Example "Automation Error: No crash directory
> (/mnt/sdcard/tests/profile/minidumps/) on remote device" (see comment 14; I
> think these are startup crashes - either way I believe the right thing to do
> is make these fatal and deal in another bug):
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17067851&tree=Try
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17067872&tree=Try
> https://tbpl.mozilla.org/php/getParsedLog.php?id=17067832&tree=Try
In fact, these look like bug 722166 perhaps.
Comment 19•12 years ago
|
||
Comment on attachment 681916 [details] [diff] [review]
Part 2: Robocop - non-zero return values shouldn't be overwritten by later success
Review of attachment 681916 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM
::: testing/mochitest/runtestsremote.py
@@ +497,5 @@
> try:
> dm.recordLogcat()
> + result = mochitest.runTests(options)
> + # Ensure failures aren't obscured by subsequent passes
> + if not retVal:
nit: "if retVal is None", to be consistent with the check just below
Attachment #681916 -
Flags: review+
Assignee | ||
Comment 20•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #19)
> Comment on attachment 681916 [details] [diff] [review]
> Part 2: Robocop - non-zero return values shouldn't be overwritten by later
> success
>
> nit: "if retVal is None", to be consistent with the check just below
That unfortunately won't fix the problem being addressed by that patch - along the lines of comment 10, we should only overwrite retVal if the former value was either None (ie we haven't run any tests yet) or 0.
Comment 21•12 years ago
|
||
Comment on attachment 681926 [details] [diff] [review]
Part 5: Callers of checkForCrashes should use its return value
Review of attachment 681926 [details] [diff] [review]:
-----------------------------------------------------------------
Patch looks good as is. Just some minor style nits.
::: build/automation.py.in
@@ +1069,5 @@
> +
> + crashed = self.checkForCrashes(profileDir, symbolsPath)
> +
> + if crashed:
> + status = 1
No need for "crashed" variable.
@@ +1075,5 @@
> if os.path.exists(processLog):
> os.unlink(processLog)
>
> if self.IS_TEST_BUILD and runSSLTunnel:
> ssltunnelProcess.kill()
This script looks like it could use some context manager love!
::: build/mobile/b2gautomation.py
@@ +103,5 @@
> try:
> shutil.rmtree(dumpDir)
> except:
> print "WARNING: unable to remove directory: %s" % (dumpDir)
> + return crashed
See comment in remoteautomation.py regarding lingering temp directories and try..finally.
::: build/mobile/remoteautomation.py
@@ +100,5 @@
> + try:
> + shutil.rmtree(dumpDir)
> + except:
> + print "WARNING: unable to remove directory: %s" % dumpDir
> + return crashed
If getDirectory() or checkForCrashes() raises, the temporary directory will linger. I'd use a try..finally to ensure dumpDir is deleted.
::: testing/xpcshell/runxpcshelltests.py
@@ +870,5 @@
> self.todoCount += 1
> xunitResult["todo"] = True
>
> + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name)
> + if crashed:
This is the only use of "created" so I wouldn't bother with a variable.
Attachment #681926 -
Flags: feedback?(gps) → feedback+
Assignee | ||
Comment 22•12 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #21)
> ::: build/automation.py.in
> @@ +1069,5 @@
> > +
> > + crashed = self.checkForCrashes(profileDir, symbolsPath)
> > +
> > + if crashed:
> > + status = 1
>
> No need for "crashed" variable.
This is added to in bug 812135 (whose patch applies after this), and whilst the variable is still only used once after the patches there, I think it reads more clearly like this.
> @@ +1075,5 @@
> > if os.path.exists(processLog):
> > os.unlink(processLog)
> >
> > if self.IS_TEST_BUILD and runSSLTunnel:
> > ssltunnelProcess.kill()
>
> This script looks like it could use some context manager love!
These lines weren't changed by the patch. I'll file another bug for this.
> ::: build/mobile/b2gautomation.py
> @@ +103,5 @@
> > try:
> > shutil.rmtree(dumpDir)
> > except:
> > print "WARNING: unable to remove directory: %s" % (dumpDir)
> > + return crashed
>
> See comment in remoteautomation.py regarding lingering temp directories and
> try..finally.
>
> ::: build/mobile/remoteautomation.py
> @@ +100,5 @@
> > + try:
> > + shutil.rmtree(dumpDir)
> > + except:
> > + print "WARNING: unable to remove directory: %s" % dumpDir
> > + return crashed
>
> If getDirectory() or checkForCrashes() raises, the temporary directory will
> linger. I'd use a try..finally to ensure dumpDir is deleted.
Other than indentation, these weren't changed by this patch, so I'll file another bug.
> ::: testing/xpcshell/runxpcshelltests.py
> @@ +870,5 @@
> > self.todoCount += 1
> > xunitResult["todo"] = True
> >
> > + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name)
> > + if crashed:
>
> This is the only use of "created" so I wouldn't bother with a variable.
Since bug 812135 doesn't apply to runxpcshelltests.py (or at least, runxpcshelltests.py doesn't have an equivalent at the moment), I'll make this change.
Assignee | ||
Comment 23•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #20)
> (In reply to Geoff Brown [:gbrown] from comment #19)
> > Comment on attachment 681916 [details] [diff] [review]
> > Part 2: Robocop - non-zero return values shouldn't be overwritten by later
> > success
> >
> > nit: "if retVal is None", to be consistent with the check just below
>
> That unfortunately won't fix the problem being addressed by that patch -
> along the lines of comment 10, we should only overwrite retVal if the former
> value was either None (ie we haven't run any tests yet) or 0.
gbrown, instead of:
"if not retVal:"
...would you rather:
"if retVal is None or retVal == 0:"
...to be more explicit, and also consistent with the style lower down?
Assignee | ||
Comment 24•12 years ago
|
||
Comment on attachment 681918 [details] [diff] [review]
Part 3: checkForCrashes should return true for cases where we hit an exception during minidump stackwalk
(For all of these, see the comment where the attachment was added, for reasoning. [datetime link by attachment name should jump you to it])
Attachment #681918 -
Flags: review?(jmaher)
Assignee | ||
Updated•12 years ago
|
Attachment #681919 -
Flags: review?(jmaher)
Assignee | ||
Updated•12 years ago
|
Attachment #681926 -
Flags: review?(jmaher)
Assignee | ||
Updated•12 years ago
|
Attachment #681927 -
Flags: review?(jmaher)
Comment 25•12 years ago
|
||
Comment on attachment 681918 [details] [diff] [review]
Part 3: checkForCrashes should return true for cases where we hit an exception during minidump stackwalk
Review of attachment 681918 [details] [diff] [review]:
-----------------------------------------------------------------
this simplifies the code a bit, thanks!
Attachment #681918 -
Flags: review?(jmaher) → review+
Comment 26•12 years ago
|
||
Comment on attachment 681919 [details] [diff] [review]
Part 4: Make peptest's checkForCrashes more consistent with that in automationutils.py
Review of attachment 681919 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm!
Attachment #681919 -
Flags: review?(jmaher) → review+
Comment 27•12 years ago
|
||
Comment on attachment 681926 [details] [diff] [review]
Part 5: Callers of checkForCrashes should use its return value
Review of attachment 681926 [details] [diff] [review]:
-----------------------------------------------------------------
::: build/mobile/remoteautomation.py
@@ +100,5 @@
> + try:
> + shutil.rmtree(dumpDir)
> + except:
> + print "WARNING: unable to remove directory: %s" % dumpDir
> + return crashed
to that note, we should have a try/except/finall around the entire getDirectory/checkForCrashes code. Right now we only have it around shutil.rmtree.
::: testing/xpcshell/runxpcshelltests.py
@@ +870,5 @@
> self.todoCount += 1
> xunitResult["todo"] = True
>
> + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name)
> + if crashed:
likewise 'if checkForCrashes(testdir, self.symbolsPath, testName=name):'
Attachment #681926 -
Flags: review?(jmaher) → review+
Comment 28•12 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #23)
> (In reply to Ed Morley [:edmorley UTC+0] from comment #20)
> > (In reply to Geoff Brown [:gbrown] from comment #19)
> > > nit: "if retVal is None", to be consistent with the check just below
> >
> > That unfortunately won't fix the problem being addressed by that patch -
> > along the lines of comment 10, we should only overwrite retVal if the former
> > value was either None (ie we haven't run any tests yet) or 0.
>
> gbrown, instead of:
> "if not retVal:"
> ...would you rather:
> "if retVal is None or retVal == 0:"
> ...to be more explicit, and also consistent with the style lower down?
Yes, I would prefer that. But it's a very minor issue -- change only if you feel like it.
Comment 29•12 years ago
|
||
Comment on attachment 681927 [details] [diff] [review]
Part 6: B2G mochitests - Ensure retVal is set even when cleanup did not generate an exception
Review of attachment 681927 [details] [diff] [review]:
-----------------------------------------------------------------
Taking over trivial review as per request on irc
Attachment #681927 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 30•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/f0a6c0c3cd8d
https://hg.mozilla.org/integration/mozilla-inbound/rev/0ed85ba08942
https://hg.mozilla.org/integration/mozilla-inbound/rev/55ea556dffb9
https://hg.mozilla.org/integration/mozilla-inbound/rev/966b4616dfac
https://hg.mozilla.org/integration/mozilla-inbound/rev/470669b03c2f
https://hg.mozilla.org/integration/mozilla-inbound/rev/5cdd053b2788
Target Milestone: --- → mozilla19
Assignee | ||
Updated•12 years ago
|
status-firefox16:
wontfix → ---
status-firefox-esr17:
--- → affected
Target Milestone: mozilla19 → mozilla20
Assignee | ||
Comment 31•12 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #28)
> Yes, I would prefer that. But it's a very minor issue -- change only if you
> feel like it.
Done :-)
(In reply to Joel Maher (:jmaher) from comment #27)
> to that note, we should have a try/except/finall around the entire
> getDirectory/checkForCrashes code. Right now we only have it around
> shutil.rmtree.
Filed bug 813577.
> ::: testing/xpcshell/runxpcshelltests.py
> @@ +870,5 @@
> > self.todoCount += 1
> > xunitResult["todo"] = True
> >
> > + crashed = checkForCrashes(testdir, self.symbolsPath, testName=name)
> > + if crashed:
>
> likewise 'if checkForCrashes(testdir, self.symbolsPath, testName=name):'
Done.
(In reply to Gregory Szorc [:gps] from comment #21)
> @@ +1075,5 @@
> > if os.path.exists(processLog):
> > os.unlink(processLog)
> >
> > if self.IS_TEST_BUILD and runSSLTunnel:
> > ssltunnelProcess.kill()
>
> This script looks like it could use some context manager love!
Filed bug 813576.
Comment 32•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f0a6c0c3cd8d
https://hg.mozilla.org/mozilla-central/rev/0ed85ba08942
https://hg.mozilla.org/mozilla-central/rev/55ea556dffb9
https://hg.mozilla.org/mozilla-central/rev/966b4616dfac
https://hg.mozilla.org/mozilla-central/rev/470669b03c2f
https://hg.mozilla.org/mozilla-central/rev/5cdd053b2788
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•