Closed Bug 875814 Opened 12 years ago Closed 11 years ago

Hung/long-running android xpcshell tests fail with "AttributeError: 'NoneType' object has no attribute 'pid'"

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

mozilla30

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(1 file, 1 obsolete file)

wip - avoid proc.pid exception on android 11 years ago Geoff Brown [:gbrown] 3.28 KB, patch		Details \| Diff \| Splinter Review
fix xpcshell hang/timeout handling 11 years ago Geoff Brown [:gbrown] 5.28 KB, patch	jmaher : review+	Details \| Diff \| Splinter Review

Geoff Brown [:gbrown]

Assignee

Description

•

12 years ago

We probably need to set proc.pid or change runxpcshelltests.py to not reference it, at least for remote tests. TEST-INFO | /builds/tegra-092/test/build/tests/xpcshell/tests/toolkit/mozapps/extensions/test/xpcshell/test_backgroundupdate.js | running test ... Exception in thread Thread-294: Traceback (most recent call last): File "/tools/python27/lib/python2.7/threading.py", line 551, in __bootstrap_inner self.run() File "/tools/python27/lib/python2.7/threading.py", line 755, in run self.function(*self.args, **self.kwargs) File "/builds/tegra-092/test/build/tests/xpcshell/runxpcshelltests.py", line 895, in <lambda> testTimer = Timer(HARNESS_TIMEOUT, lambda: self.testTimeout(name, proc.pid)) AttributeError: 'NoneType' object has no attribute 'pid' Traceback (most recent call last): File "xpcshell/remotexpcshelltests.py", line 503, in <module> main() File "xpcshell/remotexpcshelltests.py", line 498, in main **options.__dict__): File "/builds/tegra-092/test/build/tests/xpcshell/runxpcshelltests.py", line 800, in runTests pStderr=pStderr, keep_going=keepGoing) File "/builds/tegra-092/test/build/tests/xpcshell/runxpcshelltests.py", line 907, in run_test stdout=pStdout, stderr=pStderr, env=self.env, cwd=test_dir) File "xpcshell/remotexpcshelltests.py", line 309, in launchProcess self.shellReturnCode = self.device.shell(cmd, f) File "/builds/tegra-092/test/build/tests/xpcshell/devicemanagerSUT.py", line 339, in shell raise DMError("Automation Error: Error finding end of line/return value when running '%s'" % cmdline) devicemanager.DMError: Automation Error: Error finding end of line/return value when running '/data/local/xpcb/xpcw /mnt/sdcard/tests/xpcshell/toolkit/mozapps/extensions/test/xpcshell -r /mnt/sdcard/tests/xpcshell/c/httpd.manifest --greomni /data/local/xpcb/fennec-24.0a1.en-US.android-arm.apk -m -n -s -e 'const _HTTPD_JS_PATH = "/mnt/sdcard/tests/xpcshell/c/httpd.js";' -e 'const _HEAD_JS_PATH = "/mnt/sdcard/tests/xpcshell/head.js";' -e 'const _TESTING_MODULES_DIR = "/mnt/sdcard/tests/xpcshell/m";' -f /mnt/sdcard/tests/xpcshell/head.js -e 'const _SERVER_ADDR = "localhost"' -e 'const _HEAD_FILES = ["/mnt/sdcard/tests/xpcshell/toolkit/mozapps/extensions/test/xpcshell/head_addons.js"];' -e 'const _TAIL_FILES = [];' -e 'const _TEST_FILE = ["test_backgroundupdate.js"];' -e '_execute_test(); quit(0);'' program finished with exit code 1

Geoff Brown [:gbrown]

Assignee

Comment 1

•

12 years ago

Already reported for b2g - bug 873053.

Updated

•

11 years ago

Updated

•

11 years ago

Comment 2

•

11 years ago

Attached patch wip - avoid proc.pid exception on android (obsolete) — Details — Splinter Review

This is promising in that it allows us to avoid the proc.pid failure on android without affecting the desktop timeout handling. However, it exposes more problems for android timeout handling: The harness timeout is the same as the device manager timeout; relying on the device manager timeout leaves the process executing; killing the process from the harness timeout handling results in a device manager exception when launchProcess tries to parse the return code from sut -- resulting in infinite retries, just as bad as the current situation.

(not currently active) Ted Mielczarek

Comment 3

•

11 years ago

I assume you just want to do something like we do with Mochitest, where we have two layers of timeouts: http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/runtests.py#589 The JS harness has its own internal timeout, and the Python harness has that + 30s to fall back to. If device manager times out, can we detect that and then kill the process from Python?

Ed Morley [:emorley]

Updated

•

11 years ago

Blocks: 873053

Ed Morley [:emorley]

Updated

•

11 years ago

Updated

•

11 years ago

Blocks: 967704

Geoff Brown [:gbrown]

Assignee

Updated

•

11 years ago

Assignee: nobody → gbrown

Geoff Brown [:gbrown]

Assignee

Comment 4

•

11 years ago

Attached patch fix xpcshell hang/timeout handling — Details — Splinter Review

It was good to step away from this for a bit -- when I came back, I saw how I was confusing myself. Once the mechanics of passing proc vs proc.PID were worked out, hung xpcshell tests continued to fail with DMErrors...but that was just because the xpcshell process never completed the test. What happens is that the dm.shell() issues a SUT 'exec' command, and SUTagent waits for xpcshell to finish for 5 minutes -- so launchProcess is still running when the Python timer expires and kills the process. devicemanager may see no return code because the xpcshell process is still running when sut or devicemanager times out or because the xpcshell process is killed from the timer handler...whatever the reason, devicemanager will raise an exception. In other circumstances, that DMError is likely appropriate, so I didn't change that. Instead, I catch the DMError and ignore it in the case of a harness timeout -- seems to work fine and delivers a nice tidy and descriptive failure message. Here's a try run without any changes except the introduction of a hanging test: https://tbpl.mozilla.org/?tree=Try&rev=8e2ab989bd16 Notice that Linux reports a simple "Test timed out" while Android reports "AttributeError: 'NoneType' object has no attribute 'pid' raise DMError("Automation Error: Error finding end of line/return value when running '%s'" % cmdline)" Here's the same test + this patch applied: https://tbpl.mozilla.org/?tree=Try&rev=30a45d0be5ad Now we are all on the same page: "Test timed out."

Attachment #799155 - Attachment is obsolete: true

Attachment #8372302 - Flags: review?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

•

11 years ago

Comment on attachment 8372302 [details] [diff] [review] fix xpcshell hang/timeout handling Review of attachment 8372302 [details] [diff] [review]: ----------------------------------------------------------------- I am really not sure about the self.timedout variable. The try server run does give me proof of this working- but this patch as a standalone patch doesn't help me understand it. ::: testing/xpcshell/remotexpcshelltests.py @@ +137,5 @@ > + if self.timedout: > + # If the test timed out, there is a good chance the SUTagent also > + # timed out and failed to return a return code, generating a > + # DMError. Ignore the DMError to simplify the error report. > + self.shellReturnCode = None who sets self.timedout? is there a thread that does it? @@ +140,5 @@ > + # DMError. Ignore the DMError to simplify the error report. > + self.shellReturnCode = None > + pass > + else: > + raise e one would have to assume that raising an exception inside of a with statement will properly close the file. @@ +159,5 @@ > + # (first) starts, so its lack of presence is a hint that > + # something went wrong. > + print "Automation Error: No crash directory (%s) found on remote device" % self.remoteMinidumpDir > + # Whilst no crash was found, the run should still display as a failure > + return True this is cut/paste from the mochitest runner?

Attachment #8372302 - Flags: review?(jmaher) → review-

Geoff Brown [:gbrown]

Assignee

Comment 6

•

11 years ago

(In reply to Joel Maher (:jmaher) from comment #5) > ::: testing/xpcshell/remotexpcshelltests.py > @@ +137,5 @@ > > + if self.timedout: > > + # If the test timed out, there is a good chance the SUTagent also > > + # timed out and failed to return a return code, generating a > > + # DMError. Ignore the DMError to simplify the error report. > > + self.shellReturnCode = None > > who sets self.timedout? is there a thread that does it? It is set here: + def testTimeout(self, test_file, proc): + self.timedout = True testTimeout() is called from existing code -- http://hg.mozilla.org/mozilla-central/annotate/d05c721ea1b0/testing/xpcshell/runxpcshelltests.py#l611 -- in a threading.Timer object, http://docs.python.org/2/library/threading.html#timer-objects. So testTimeout is called in a separate thread, typically while devicemanager.shell() is running, waiting to hear back from sut. > one would have to assume that raising an exception inside of a with > statement will properly close the file. Just so: http://effbot.org/zone/python-with-statement.htm. > @@ +159,5 @@ > > + # (first) starts, so its lack of presence is a hint that > > + # something went wrong. > > + print "Automation Error: No crash directory (%s) found on remote device" % self.remoteMinidumpDir > > + # Whilst no crash was found, the run should still display as a failure > > + return True > > this is cut/paste from the mochitest runner? It is derived from http://hg.mozilla.org/mozilla-central/annotate/d05c721ea1b0/build/mobile/remoteautomation.py#l145. The issue here is that when xpcshell is killed, it occasionally does not create its minidump directory. That is a good thing to report, but without this code, we call devicemanager.getDirectory() which throws a DMError, bringing the whole test run to an end.

Joel Maher ( :jmaher ) (UTC -8)

Comment 7

•

11 years ago

Comment on attachment 8372302 [details] [diff] [review] fix xpcshell hang/timeout handling thanks for clarifying everything here. I appreciate the context and feel comfortable with this patch.

Attachment #8372302 - Flags: review- → review+

Geoff Brown [:gbrown]

Assignee

Comment 8

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/e4622166d378

Ryan VanderMeulen [:RyanVM]

Comment 9

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/e4622166d378

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla30

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Hung/long-running android xpcshell tests fail with "AttributeError: 'NoneType' object has no attribute 'pid'"

Categories

(Testing :: XPCShell Harness, defect)

Tracking

(Not tracked)

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file, 1 obsolete file)

Description

Comment 1

Updated

Updated

Comment 2

Comment 3

Updated

Updated

Updated

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Attachment

General

Description

File Name

Content Type