Closed
Bug 875814
Opened 12 years ago
Closed 11 years ago
Hung/long-running android xpcshell tests fail with "AttributeError: 'NoneType' object has no attribute 'pid'"
Categories
(Testing :: XPCShell Harness, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
mozilla30
People
(Reporter: gbrown, Assigned: gbrown)
References
Details
Attachments
(1 file, 1 obsolete file)
5.28 KB,
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
We probably need to set proc.pid or change runxpcshelltests.py to not reference it, at least for remote tests.
TEST-INFO | /builds/tegra-092/test/build/tests/xpcshell/tests/toolkit/mozapps/extensions/test/xpcshell/test_backgroundupdate.js | running test ...
Exception in thread Thread-294:
Traceback (most recent call last):
File "/tools/python27/lib/python2.7/threading.py", line 551, in __bootstrap_inner
self.run()
File "/tools/python27/lib/python2.7/threading.py", line 755, in run
self.function(*self.args, **self.kwargs)
File "/builds/tegra-092/test/build/tests/xpcshell/runxpcshelltests.py", line 895, in <lambda>
testTimer = Timer(HARNESS_TIMEOUT, lambda: self.testTimeout(name, proc.pid))
AttributeError: 'NoneType' object has no attribute 'pid'
Traceback (most recent call last):
File "xpcshell/remotexpcshelltests.py", line 503, in <module>
main()
File "xpcshell/remotexpcshelltests.py", line 498, in main
**options.__dict__):
File "/builds/tegra-092/test/build/tests/xpcshell/runxpcshelltests.py", line 800, in runTests
pStderr=pStderr, keep_going=keepGoing)
File "/builds/tegra-092/test/build/tests/xpcshell/runxpcshelltests.py", line 907, in run_test
stdout=pStdout, stderr=pStderr, env=self.env, cwd=test_dir)
File "xpcshell/remotexpcshelltests.py", line 309, in launchProcess
self.shellReturnCode = self.device.shell(cmd, f)
File "/builds/tegra-092/test/build/tests/xpcshell/devicemanagerSUT.py", line 339, in shell
raise DMError("Automation Error: Error finding end of line/return value when running '%s'" % cmdline)
devicemanager.DMError: Automation Error: Error finding end of line/return value when running '/data/local/xpcb/xpcw /mnt/sdcard/tests/xpcshell/toolkit/mozapps/extensions/test/xpcshell -r /mnt/sdcard/tests/xpcshell/c/httpd.manifest --greomni /data/local/xpcb/fennec-24.0a1.en-US.android-arm.apk -m -n -s -e 'const _HTTPD_JS_PATH = "/mnt/sdcard/tests/xpcshell/c/httpd.js";' -e 'const _HEAD_JS_PATH = "/mnt/sdcard/tests/xpcshell/head.js";' -e 'const _TESTING_MODULES_DIR = "/mnt/sdcard/tests/xpcshell/m";' -f /mnt/sdcard/tests/xpcshell/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/mnt/sdcard/tests/xpcshell/toolkit/mozapps/extensions/test/xpcshell/head_addons.js"];' -e 'const _TAIL_FILES = [];' -e 'const _TEST_FILE = ["test_backgroundupdate.js"];' -e '_execute_test(); quit(0);''
program finished with exit code 1
Assignee | ||
Comment 2•11 years ago
|
||
This is promising in that it allows us to avoid the proc.pid failure on android without affecting the desktop timeout handling. However, it exposes more problems for android timeout handling: The harness timeout is the same as the device manager timeout; relying on the device manager timeout leaves the process executing; killing the process from the harness timeout handling results in a device manager exception when launchProcess tries to parse the return code from sut -- resulting in infinite retries, just as bad as the current situation.
Comment 3•11 years ago
|
||
I assume you just want to do something like we do with Mochitest, where we have two layers of timeouts:
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#589
The JS harness has its own internal timeout, and the Python harness has that + 30s to fall back to.
If device manager times out, can we detect that and then kill the process from Python?
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → gbrown
Assignee | ||
Comment 4•11 years ago
|
||
It was good to step away from this for a bit -- when I came back, I saw how I was confusing myself.
Once the mechanics of passing proc vs proc.PID were worked out, hung xpcshell tests continued to fail with DMErrors...but that was just because the xpcshell process never completed the test. What happens is that the dm.shell() issues a SUT 'exec' command, and SUTagent waits for xpcshell to finish for 5 minutes -- so launchProcess is still running when the Python timer expires and kills the process. devicemanager may see no return code because the xpcshell process is still running when sut or devicemanager times out or because the xpcshell process is killed from the timer handler...whatever the reason, devicemanager will raise an exception. In other circumstances, that DMError is likely appropriate, so I didn't change that. Instead, I catch the DMError and ignore it in the case of a harness timeout -- seems to work fine and delivers a nice tidy and descriptive failure message.
Here's a try run without any changes except the introduction of a hanging test:
https://tbpl.mozilla.org/?tree=Try&rev=8e2ab989bd16
Notice that Linux reports a simple "Test timed out" while Android reports "AttributeError: 'NoneType' object has no attribute 'pid'
raise DMError("Automation Error: Error finding end of line/return value when running '%s'" % cmdline)"
Here's the same test + this patch applied:
https://tbpl.mozilla.org/?tree=Try&rev=30a45d0be5ad
Now we are all on the same page: "Test timed out."
Attachment #799155 -
Attachment is obsolete: true
Attachment #8372302 -
Flags: review?(jmaher)
Comment 5•11 years ago
|
||
Comment on attachment 8372302 [details] [diff] [review]
fix xpcshell hang/timeout handling
Review of attachment 8372302 [details] [diff] [review]:
-----------------------------------------------------------------
I am really not sure about the self.timedout variable. The try server run does give me proof of this working- but this patch as a standalone patch doesn't help me understand it.
::: testing/xpcshell/remotexpcshelltests.py
@@ +137,5 @@
> + if self.timedout:
> + # If the test timed out, there is a good chance the SUTagent also
> + # timed out and failed to return a return code, generating a
> + # DMError. Ignore the DMError to simplify the error report.
> + self.shellReturnCode = None
who sets self.timedout? is there a thread that does it?
@@ +140,5 @@
> + # DMError. Ignore the DMError to simplify the error report.
> + self.shellReturnCode = None
> + pass
> + else:
> + raise e
one would have to assume that raising an exception inside of a with statement will properly close the file.
@@ +159,5 @@
> + # (first) starts, so its lack of presence is a hint that
> + # something went wrong.
> + print "Automation Error: No crash directory (%s) found on remote device" % self.remoteMinidumpDir
> + # Whilst no crash was found, the run should still display as a failure
> + return True
this is cut/paste from the mochitest runner?
Attachment #8372302 -
Flags: review?(jmaher) → review-
Assignee | ||
Comment 6•11 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #5)
> ::: testing/xpcshell/remotexpcshelltests.py
> @@ +137,5 @@
> > + if self.timedout:
> > + # If the test timed out, there is a good chance the SUTagent also
> > + # timed out and failed to return a return code, generating a
> > + # DMError. Ignore the DMError to simplify the error report.
> > + self.shellReturnCode = None
>
> who sets self.timedout? is there a thread that does it?
It is set here:
+ def testTimeout(self, test_file, proc):
+ self.timedout = True
testTimeout() is called from existing code -- http://hg.mozilla.org/mozilla-central/annotate/d05c721ea1b0/testing/xpcshell/runxpcshelltests.py#l611 -- in a threading.Timer object, http://docs.python.org/2/library/threading.html#timer-objects. So testTimeout is called in a separate thread, typically while devicemanager.shell() is running, waiting to hear back from sut.
> one would have to assume that raising an exception inside of a with
> statement will properly close the file.
Just so: http://effbot.org/zone/python-with-statement.htm.
> @@ +159,5 @@
> > + # (first) starts, so its lack of presence is a hint that
> > + # something went wrong.
> > + print "Automation Error: No crash directory (%s) found on remote device" % self.remoteMinidumpDir
> > + # Whilst no crash was found, the run should still display as a failure
> > + return True
>
> this is cut/paste from the mochitest runner?
It is derived from http://hg.mozilla.org/mozilla-central/annotate/d05c721ea1b0/build/mobile/remoteautomation.py#l145. The issue here is that when xpcshell is killed, it occasionally does not create its minidump directory. That is a good thing to report, but without this code, we call devicemanager.getDirectory() which throws a DMError, bringing the whole test run to an end.
Comment 7•11 years ago
|
||
Comment on attachment 8372302 [details] [diff] [review]
fix xpcshell hang/timeout handling
thanks for clarifying everything here. I appreciate the context and feel comfortable with this patch.
Attachment #8372302 -
Flags: review- → review+
Assignee | ||
Comment 8•11 years ago
|
||
Comment 9•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
You need to log in
before you can comment on or make changes to this bug.
Description
•