Closed Bug 809753 Opened 7 years ago Closed 7 years ago

Intermittent reftest shutdown Automation Error: Exception caught while running tests

Categories

(Testing :: General, defect)

ARM
Android
defect
Not set

Tracking

(firefox19 fixed)

RESOLVED FIXED
mozilla20
Tracking Status
firefox19 --- fixed

People

(Reporter: philor, Assigned: gbrown)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #799334 +++

https://tbpl.mozilla.org/php/getParsedLog.php?id=16847588&tree=Firefox
Android Armv6 Tegra 250 mozilla-central opt test reftest-4 on 2012-11-07 18:56:55 PST for push 00cd00ba0ac2
slave: tegra-262

REFTEST TEST-START | Shutdown

Automation Error: Exception caught while running tests
Meant to mention: we crash in NSS on shutdown in reftests a lot without noticing it, and what may have been a previous version of this same message was frequently associated with uncaught and barely caught ("crashed, can't be bothered to even try to show you a stack") shutdown crashes.
The log in comment 1 has:

  File "reftest/remotereftest.py", line 445, in main
    reftest.runTests(manifest, options, cmdlineArgs)
  File "/builds/tegra-262/test/build/tests/reftest/runreftest.py", line 135, in runTests
    timeout=options.timeout + 30.0)
  File "/builds/tegra-262/test/build/tests/reftest/automation.py", line 1050, in runApp
    status = self.waitForFinish(proc, utilityPath, timeout, maxTime, startTime, debuggerInfo, symbolsPath)
  File "/builds/tegra-262/test/build/tests/reftest/remoteautomation.py", line 77, in waitForFinish
    "allowed maximum time of %d seconds" % (self.lastTestSeen, int(maxTime))
TypeError: int() argument must be a string or a number, not 'NoneType'
36 (98%)
Caused by the patch for bug 808419. 

For remote reftests, maxTime is normally None.
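
For reference, a minimal sketch of the failure mode, assuming only that maxTime reaches the formatting code as None. The function names here are hypothetical stand-ins, not the real remoteautomation.py code, and only the tail of the message that is visible in the traceback is reproduced:

```python
def format_max_time(max_time):
    # Mirrors the int(maxTime) call shown in the traceback. The start of
    # the real message is not visible in the log, so it is left out here.
    return "allowed maximum time of %d seconds" % int(max_time)


def fails_with_type_error(max_time):
    # Returns True when formatting raises TypeError, as it does for None.
    try:
        format_max_time(max_time)
    except TypeError:
        return True
    return False
```

With a numeric value the message formats normally; with None, int() raises the TypeError before the message is ever built, so the time-out is reported as an automation exception instead.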
(In reply to Geoff Brown [:gbrown] from comment #2)
> The log in comment 1 has:
> 
>   File "reftest/remotereftest.py", line 445, in main
>     reftest.runTests(manifest, options, cmdlineArgs)
>   File "/builds/tegra-262/test/build/tests/reftest/runreftest.py", line 135,
> in runTests
>     timeout=options.timeout + 30.0)
>   File "/builds/tegra-262/test/build/tests/reftest/automation.py", line
> 1050, in runApp
>     status = self.waitForFinish(proc, utilityPath, timeout, maxTime,
> startTime, debuggerInfo, symbolsPath)
>   File "/builds/tegra-262/test/build/tests/reftest/remoteautomation.py",
> line 77, in waitForFinish
>     "allowed maximum time of %d seconds" % (self.lastTestSeen, int(maxTime))
> TypeError: int() argument must be a string or a number, not 'NoneType'
> 36 (98%)

It looks like maxTime is set to None here, but we're trying to convert it to an integer. I did some digging, and it looks like the whole notion of timeouts in remote tests like these is a big mess (we define timeouts both in the high-level automation.py.in and in the process abstraction we use to wrap around the remote device interactions).

The easy fix for better error reporting here would be to check if maxTime is None and report a different error message if so.

The more complicated fix for better error reporting would be to fix mochitest so that we only define things in one place (maybe not worth it in and of itself).

In either case, I guess we still have the timeouts to worry about...
As a very-easy fix, I think this might work:

>     "allowed maximum time of %s seconds" % (self.lastTestSeen, str(maxTime))
(In reply to Geoff Brown [:gbrown] from comment #5)
> As a very-easy fix, I think this might work:
> 
> >     "allowed maximum time of %s seconds" % (self.lastTestSeen, str(maxTime))

The cast to str is not necessary.

>>> "%s" % None
'None'
>>> "%s" % 1
'1'

Also, in the case where maxTime is None, you'll get a message like "allowed maximum time of None seconds", which is pretty confusing. I think it would be better to do something like:

if maxTime:
  print "... allowed maximum time of %s seconds"
else:
  print "... allowed maximum time"
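
A runnable sketch of that idea, under a couple of assumptions: describe_max_time is a hypothetical helper (not the patch that landed), it returns the string rather than printing so it works on both Python 2 and 3, and it tests `is not None` instead of plain truthiness so that a maxTime of 0 would still be reported as a number:

```python
def describe_max_time(max_time):
    # Only mention a number of seconds when a limit was actually set.
    if max_time is not None:
        return "allowed maximum time of %d seconds" % int(max_time)
    # Remote reftests normally pass None, so fall back to a generic tail.
    return "allowed maximum time"
```

Whether 0 should count as "no limit" is a judgment call; with `if maxTime:` as written above, 0 and None would both take the generic branch.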
(In reply to William Lachance (:wlach) from comment #4)
> I did some digging, and it looks like the whole notion of
> timeouts in remote tests like these is a big mess (we define timeouts both
> in the high-level automation.py.in and in the process abstraction we use to
> wrap around the remote device interactions).

I've spent the last few days coming to the same conclusion; I'm collating a bunch of stuff and intend to file bugs on straightening everything out, when I'm back from PTO.
Whiteboard: [orange]
I agree with comment 7 -- there's a bunch of work to do regarding timeouts. While that is being sorted out, let's fix the message...

This uses :wlach's suggestion from comment 6.
Assignee: nobody → gbrown
Attachment #686181 - Flags: review?(edmorley.bugzilla)
Attachment #686181 - Flags: review?(edmorley.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/99216975f48b
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Blocks: 816501