Closed Bug 917252 Opened 11 years ago Closed 11 years ago

B2G mochitests should output a failure message when bug 906793 occurs

Categories

(Testing :: Mochitest, defect)

ARM
Gonk (Firefox OS)
defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
mozilla27

People

(Reporter: emorley, Assigned: ahal)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

In the case of bug 906793, the log output is:

https://tbpl.mozilla.org/php/getParsedLog.php?id=27960997&tree=Mozilla-Inbound

{
03:31:54     INFO -  12794 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementReplaceWithToHTMLMethod - 1 assertions, 0 failures, 0 errors
03:31:54     INFO -  12795 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementSelectorMethod - 6 assertions, 0 failures, 0 errors
03:31:54     INFO -  12796 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementAdjacent - 4 assertions, 0 failures, 0 errors
03:31:54     INFO -  12797 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementIdentify - 5 assertions, 0 failures, 0 errors
03:31:54     INFO -  12798 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementClassNameMethod - 4 assertions, 0 failures, 0 errors
03:37:26     INFO -  12799 INFO TEST-PASS | /tB2GRunner TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | application timed out after None seconds with no output
03:38:29     INFO -  Mochitest INFO | runtestsb2g.py | Running tests: end.
03:38:46     INFO -  ests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementAncestors - 3 assertions, 0 failures, 0 errors
03:38:46     INFO -  12800 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementDescendants - 3 assertions, 0 failures, 0 errors
03:38:46     INFO -  12801 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementFirstDescendant - 2 assertions, 0 failures, 0 errors
03:38:46     INFO -  12802 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementChildElements - 4 assertions, 0 failures, 0 errors
03:38:46     INFO -  12803 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementImmediateDescendants - 1 assertions, 0 failures, 0 errors
03:38:46     INFO -  12804 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementPreviousSiblings - 3 assertions, 0 failures, 0 errors
03:38:46     INFO -  12805 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementNextSiblings - 3 assertions, 0 failures, 0 errors
03:38:46     INFO -  12806 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementSiblings - 2 assertions, 0 failures, 0 errors
03:38:46     INFO -  12807 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementUp - 10 assertions, 0 failures, 0 errors
03:38:46     INFO -  12808 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementDown - 9 assertions, 0 failures, 0 errors
03:38:46     INFO -  12809 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementPrevious - 7 assertions, 0 failures, 0 errors
03:38:46     INFO -  12810 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementNext - 8 assertions, 0 failures, 0 errors
03:38:46     INFO -  12811 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementInspect - 3 assertions, 0 failures, 0 errors
03:38:46     INFO -  12812 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementMakeClipping - 15 assertions, 0 failures, 0 errors
03:38:46     INFO -  12813 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementExtend - 151 assertions, 0 failures, 0 errors
03:38:46     INFO -  12814 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementExtendReextendsDiscardedNodes - 2 assertions, 0 failures, 0 errors
03:38:46     INFO -  12815 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementCleanWhitespace - 9 assertions, 0 failures, 0 errors
03:38:46     INFO -  12816 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementEmpty - 3 assertions, 0 failures, 0 errors
03:38:46     INFO -  12817 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testDescendantOf - 22 assertions, 0 failures, 0 errors
03:38:46     INFO - Return code: 0
03:38:46     INFO - dumping logcat
03:38:46     INFO - Running command: ['cat', '/builds/slave/test/build/emulator-5554.log']
03:38:46     INFO - Copy/paste: cat /builds/slave/test/build/emulator-5554.log
03:38:46     INFO -  09-17 10:10:57.495    35    35 I qemu-props: connected to 'boot-properties' qemud service.
03:38:46     INFO -  09-17 10:10:57.495    35    35 I qemu-props: receiving..
...
<snip>
...
03:38:53     INFO -  09-17 10:31:56.035   687   687 I GeckoDump: 12816 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testElementEmpty - 3 assertions, 0 failures, 0 errors
03:38:53     INFO -  09-17 10:31:56.035   687   687 I GeckoDump:
03:38:53     INFO -  09-17 10:31:56.135   687   687 I GeckoDump: 12817 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testDescendantOf - 22 assertions, 0 failures, 0 errors
03:38:53     INFO -  09-17 10:31:56.135   687   687 I GeckoDump:
03:38:53     INFO -  09-17 10:37:26.345    33    33 I ServiceManager: service 'media.resource_manager' died
03:38:53     INFO - Return code: 0
03:38:53     INFO - TinderboxPrint: mochitest<br/><em class="testfail">T-FAIL</em>
03:38:53  WARNING - # TBPL WARNING #
}

Other than the logcat "ServiceManager: service 'media.resource_manager' died", there is no other parsable indication as to why the job failed.

We need to:
* Make runtestsb2g.py (or similar) output a sensible error message.
* Make runtestsb2g.py not return code 0 (see just before logcat dump starts)
From runtestsb2g.py:
{
        try:
            runner_args = { 'profile': self.profile,
                            'devicemanager': self._dm,
                            'marionette': self.marionette,
                            'remote_test_root': self.remote_test_root,
                            'test_script': self.test_script,
                            'test_script_args': self.test_script_args }
            self.runner = B2GRunner(**runner_args)
            self.runner.start(outputTimeout=timeout)
            self.runner.wait()
        except KeyboardInterrupt:
            log.info("runtests.py | Received keyboard interrupt.\n");
            status = -1
        except:
            traceback.print_exc()
            log.error("Automation Error: Received unexpected exception while running application\n")
            status = 1

        self.stopWebServer(options)
        self.stopWebSocketServer(options)

        log.info("runtestsb2g.py | Running tests: end.")

        if manifest is not None:
            self.cleanup(manifest, options)
        return status
}

Unlike the other harnesses, we're relying on an exception to return anything other than zero (as opposed to returning the exit code directly). We need B2GRunner (mozrunner) to raise when we hit bug 906793 - and then either mozrunner or runtestsb2g.py to output a TBPL parsable error when this occurs.

jgriffin, could you find someone to own this? :-)
Flags: needinfo?(jgriffin)
Andrew, would you mind looking at this?
Flags: needinfo?(jgriffin) → needinfo?(ahalberstadt)
Ok so if I understand correctly.. B2GRunner needs to raise an exception when any generic timeout happens (as it can't detect what the cause of the timeout is at the moment).. or else the mochitest runner can be modified to look at the return code and B2GRunner can return 1.

Then if you want a specific message to this bug, rather than a generic timeout one, it'll either have to live in mozharness, or we'll need to modify the test runner to parse logcat (not too crazy about this approach.. though I don't know that logcat parsing really belongs in mozharness either).
Assignee: nobody → ahalberstadt
Status: NEW → ASSIGNED
Flags: needinfo?(ahalberstadt)
Any exception will be better than what we have at present (which is a failed run, but with nothing parsable in stdout, which results in TBPL's "no summary found") :-)
It seems like the real problem here is that we're exiting with return code 0, which I don't understand.  It's not actually timing out:

03:38:46     INFO -  12817 INFO TEST-PASS | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | testDescendantOf - 22 assertions, 0 failures, 0 errors
03:38:46     INFO - Return code: 0

But it's terminating before it finishes the tests.
We do get:
03:38:29     INFO -  Mochitest INFO | runtestsb2g.py | Running tests: end.

It's just higher up the log... seems like another case of stdout being interleaved.
(In reply to Ed Morley [:edmorley UTC+1] from comment #4)
> Any exception will be better than what we have at present (which is a failed
> run, but with nothing parsable in stdout, which results in TBPL's "no
> summary found") :-)

So currently there is an error being printed, search for "application timed out" in your log dump in comment 0. The problem is that it doesn't seem to wait for the process_handler to finish processing output from the test first. I'll take a look into this.

(In reply to Jonathan Griffin (:jgriffin) from comment #5)
> It seems like the real problem here is that we're exiting with return code
> 0, which I don't understand.  It's not actually timing out:
> 
> 03:38:46     INFO -  12817 INFO TEST-PASS |
> /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html |
> testDescendantOf - 22 assertions, 0 failures, 0 errors
> 03:38:46     INFO - Return code: 0
> 
> But it's terminating before it finishes the tests.

The return code 0 thing is a bug, if it times out then it should return 1. I think Mihnea actually fixed this already, but it hasn't been synced to m-c yet. If you look at bug 906793 though, you'll see a bunch of test runs where the timeout does happen in the middle of a run, e.g:
https://tbpl.mozilla.org/php/getParsedLog.php?id=26722508&tree=B2g-Inbound
(In reply to Andrew Halberstadt [:ahal] from comment #7)
> So currently there is an error being printed, search for "application timed
> out" in your log dump in comment 0. The problem is that it doesn't seem to
> wait for the process_handler to finish processing output from the test
> first. I'll take a look into this.

Ah sorry missed that, the parser didn't highlight it, since it exempts any lines containing TEST-PASS...

03:37:26     INFO -  12799 INFO TEST-PASS | /tB2GRunner TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html | application timed out after None seconds with no output
I don't understand why the "application timed out" message is being interleaved with the process output. Both the on_output and on_timeout methods are called from within the same thread, and on_timeout is only ever called after all the on_output processing has been finished..

on_output called here: https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/processhandler.py#L701
on_timeout called here: https://github.com/mozilla/mozbase/blob/master/mozprocess/mozprocess/processhandler.py#L708

There can only ever be one thread in that function. Furthermore, I tried creating an artificial timeout to reproduce locally, but no luck. The output is in sequential order as I'd expect it to be.

Jeff, I vaguely remember you working on (or talking about?) a similar issue back in the day. Do you understand how this behaviour might be possible? If not, don't worry about it, I'll keep digging.
Flags: needinfo?(jhammel)
For context the specific on_output and on_timeout callbacks in this case are in mozrunner: https://github.com/mozilla/mozbase/blob/master/mozrunner/mozrunner/remote.py#L184
There are several interleaving issues ticketed in the Testing::Mozbase component, but they're mostly windows so I believe they don't apply here (sorry, I can't dig through https://bugzilla.mozilla.org/buglist.cgi?resolution=---&component=Mozbase&product=Testing&list_id=7920704 right now). I'm not familiar enough end-to-end of how b2g testing is done to have a good guess, but there is to consider stdout/err flushing on the device and host OS, a missing '\n' ...somewhere in the chain, mozdevice, certainly mozprocess (the threading worries me, but I agree with Andrew and don't see anything obviously wrong) and maybe more I can't think of.

> 03:37:26 INFO - 12799 INFO TEST-PASS | /tB2GRunner TEST-UNEXPECTED-FAIL |
> /tests/dom/tests/mochitest/ajax/prototype/test_Prototype.html |
> application timed out after None seconds with no output

Timing out after None seconds confuses me....am I wrong to think that `None` means "never time out"?
Flags: needinfo?(jhammel)
(In reply to Jeff Hammel [:jhammel] from comment #11)
> Timing out after None seconds confuses me....am I wrong to think that `None`
> means "never time out"?

That's another bug that I've fixed in my local patch, unrelated to the issue at hand.

I don't think I'm going to be able to fix the interleaving problem without figuring out how to reproduce it locally. My only theory is that it has something to do with running through mozharness or not (in the mozharness case we have a subprocess within a subprocess.. though don't ask me why that would cause problems).

In the meantime I can at least make the return code non-zero if there was a timeout, and fix the None in the timeout message.
I still haven't been able to figure out the interleaving, but figured I might as well upload the fix to the timeout error message while I was thinking about it.
Attachment #809422 - Flags: review?(jhammel)
Comment on attachment 809422 [details] [diff] [review]
Patch 1.0 - fix timeout error message

wfm
Attachment #809422 - Flags: review?(jhammel) → review+
So like I said, I don't know what to do about the message interleaving, but I can at least fix the status code. This will give you the option of either looking for status code 124 or you could try searching the "application timed out" message (though it won't be at the start of the line).

I chose 124 because that is what the Unix 'timeout' command returns when a program times out. After quick googling, other programs seem to use 124 for timeouts as well.

Will this return value be sufficient to close out this bug? I'll file another one for the interleaving issue if that's the case.
Attachment #809432 - Flags: review?(emorley)
Note, the previous patch will depend on https://github.com/mozilla/mozbase/commit/968f70bf99f6a4a82630569cc4d9dab0f11d17db getting synced to m-c.
(In reply to Andrew Halberstadt [:ahal] from comment #16)
> Note, the previous patch will depend on
> https://github.com/mozilla/mozbase/commit/
> 968f70bf99f6a4a82630569cc4d9dab0f11d17db getting synced to m-c.

I'm working on bug 917750 fwiw
Excellent, thanks!
Depends on: 917750
Comment on attachment 809432 [details] [diff] [review]
Patch 1.1 (m-c) - return status code from b2g mochitests

Thank you :-)
Attachment #809432 - Flags: review?(emorley) → review+
Blocks: 919353
I just cherry picked the necessary changes into m-c in bug 921029 instead of waiting for the full sync.

https://hg.mozilla.org/integration/mozilla-inbound/rev/4e93e321909d
Depends on: 921029
No longer depends on: 917750
https://hg.mozilla.org/mozilla-central/rev/4e93e321909d
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Depends on: 937684
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: