Closed Bug 1293309 Opened 4 years ago Closed 4 years ago

cleardir throws DMError following xpcshell failures | DMError Attempted to push a file to a directory

Categories

(Firefox for Android :: Testing, defect, P2)

49 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1280570

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(1 file)

Some recent xpcshell and cppunit failures contain:

10:58:57     INFO -  mozdevice.devicemanager.DMError: Non-zero return code for command: ['/data/local/xpcb/cleardir', '/storage/sdcard/tests/xpc/minidumps'] (output: '', retval: 'None')

Following a test failure, the failure is reported, then the harnesses check for minidumps and delete the minidump directory - the cleardir call fails and throws.

It is unclear whether this happens on any failure, or if this is intermittent.

I wonder if this is a long-standing issue that has traditionally been hidden by buildbot's retry-on-DMError policy.
https://public-artifacts.taskcluster.net/K2BN9iecRd2ogFDYy5KOsQ/0/public/logs/live_backing.log

0:58:57     INFO -  PROCESS | toolkit/components/extensions/test/xpcshell/test_ext_alarms_does_not_fire.js | JavaScript strict warning: chrome://global/content/bindings/browser.xml, line 369: ReferenceError: reference to undefined property tabBrowser.localNam
10:58:57     INFO -  <<<<<<<
10:58:57     INFO -  Following exceptions were raised:
10:58:57     INFO -  Traceback (most recent call last):
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 169, in run
10:58:57     INFO -      self.run_test()
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 775, in run_test
10:58:57     INFO -      if self.checkForCrashes(self.tempDir, self.symbolsPath, test_name=name):
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 177, in checkForCrashes
10:58:57     INFO -      self.clearRemoteDir(self.remoteMinidumpDir)
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 206, in clearRemoteDir
10:58:57     INFO -      self.device.shellCheckOutput([self.remoteClearDirScript, remoteDir])
10:58:57     INFO -    File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/mozdevice/devicemanager.py", line 405, in shellCheckOutput
10:58:57     INFO -      raise DMError("Non-zero return code for command: %s (output: '%s', retval: '%s')" % (cmd, output, retval))
10:58:57     INFO -  DMError: Non-zero return code for command: ['/data/local/xpcb/cleardir', '/storage/sdcard/tests/xpc/minidumps'] (output: '', retval: 'None')
10:58:57     INFO -  Traceback (most recent call last):
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 628, in <module>
10:58:57     INFO -      main()
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 623, in main
10:58:57     INFO -      **vars(options)):
10:58:57     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1448, in runTests
10:58:57     INFO -      raise exceptions[0]
10:58:57     INFO -  mozdevice.devicemanager.DMError: Non-zero return code for command: ['/data/local/xpcb/cleardir', '/storage/sdcard/tests/xpc/minidumps'] (output: '', retval: 'None')
10:58:57    ERROR - Return code: 1
10:58:57    ERROR - No tests run or test summary not found
10:58:57     INFO - TinderboxPrint: xpcshell<br/><em class="testfail">T-FAIL</em>
1
See Also: → 1293225, 1291775
I cannot find any cppunit examples -- maybe this is just xpcshell.
Summary: cleardir throws DMError following xpcshell, cppunit failures → cleardir throws DMError following xpcshell failures
Priority: -- → P2
I tried introducing a simple xpcshell test failure, but was unable to reproduce this issue.

I tried introducing a simple xpcshell test crash, but was unable to reproduce this issue.

There are many calls to cleardir over the course of an xpcshell job, but this always seems to happen on minidumps, so I suspect this has something to do with a crash, but no minidumps are ever retrieved. Maybe the crashreporter is hung and locking the minidumps directory? But then shouldn't cleardir's rm or mkdir calls just report an error (instead of seeing cleardir's output: '')?
I do not know what is causing these infrequent failures. I wonder if it might be a temporary condition (busy file?), so I'm adding a retry here.
Attachment #8781265 - Flags: review?(jmaher)
Comment on attachment 8781265 [details] [diff] [review]
retry clearRemoteDir failures

Review of attachment 8781265 [details] [diff] [review]:
-----------------------------------------------------------------

consider the sheriff case, it might not be critical- but if you like the idea change it and r=me.

::: testing/xpcshell/remotexpcshelltests.py
@@ +213,5 @@
> +            time.sleep(10)
> +            try:
> +                out = self.device.shellCheckOutput([self.remoteClearDirScript, remoteDir])
> +            except mozdevice.DMError:
> +                self.log.error("unable to delete %s: '%s'" % (remoteDir, str(out)))

would it make sense to have a different message here to search on?  I think we could detect via 'retrying after 10 seconds' when inspecting the log, but...what about the sheriffs- could we have 'second time unable to delete %s: %s'
Attachment #8781265 - Flags: review?(jmaher) → review+
Thanks. I'll change the wording of the 2nd message. Also note that only the 2nd message is an error.
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/03bd46e4cf1d
Improve error handling for Android xpcshell minidump clear; r=jmaher
Keywords: leave-open
With that change in place, now we are seeing:

https://public-artifacts.taskcluster.net/QsgCwfT_QkqM3d5lkLsP2w/0/public/logs/live_backing.log

12:31:27  WARNING -  TEST-UNEXPECTED-FAIL | xpcom/tests/unit/test_streams.js | xpcshell return code: -1
12:31:27     INFO -  TEST-INFO took 1181ms
12:31:28     INFO -  unable to delete /storage/sdcard/tests/xpc/minidumps: ''
12:31:28     INFO -  retrying after 10 seconds...
12:31:38     INFO -  failed to delete /storage/sdcard/tests/xpc/minidumps: ''
12:31:38     INFO -  unable to delete /storage/sdcard/tests/xpc/p: ''
12:31:38     INFO -  retrying after 10 seconds...
12:31:48     INFO -  failed to delete /storage/sdcard/tests/xpc/p: ''
12:31:48     INFO -  unable to delete /storage/sdcard/tests/xpc/tmp: ''
12:31:48     INFO -  retrying after 10 seconds...
12:31:58     INFO -  failed to delete /storage/sdcard/tests/xpc/tmp: ''
12:31:58     INFO -  Following exceptions were raised:
12:31:58     INFO -  Traceback (most recent call last):
12:31:58     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 169, in run
12:31:58     INFO -      self.run_test()
12:31:58     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 620, in run_test
12:31:58     INFO -      self.mozInfoJSPath = self.setupMozinfoJS()
12:31:58     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 81, in setupMozinfoJS
12:31:58     INFO -      self.device.pushFile(local, mozInfoJSPath)
12:31:58     INFO -    File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/mozdevice/devicemanagerADB.py", line 229, in pushFile
12:31:58     INFO -      (localname, destname))
12:31:58     INFO -  DMError: Attempted to push a file (/tmp/tmpP8Z2FR) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)!
12:31:58     INFO -  Traceback (most recent call last):
12:31:58     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 639, in <module>
12:31:58     INFO -      main()
12:31:58     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 634, in main
12:31:58     INFO -      **vars(options)):
12:31:58     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1448, in runTests
12:31:58     INFO -      raise exceptions[0]
12:31:58     INFO -  mozdevice.devicemanager.DMError: Attempted to push a file (/tmp/tmpP8Z2FR) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)!
12:31:58    ERROR - Return code: 1
12:31:58    ERROR - No tests run or test summary not found

Similarly:

https://public-artifacts.taskcluster.net/LCqdcoo6RaCwI798cWhmng/0/public/logs/live_backing.log

00:29:17  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | xpcshell return code: 14100
00:29:17     INFO -  TEST-INFO took 4584ms
00:29:17     INFO -  >>>>>>>
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | xpcw: cd /storage/sdcard/tests/xpc/toolkit/components/osfile/tests/xpcshell
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | xpcw: xpcshell -r /storage/sdcard/tests/xpc/c/httpd.manifest --greomni /data/local/xpcb/target.apk -m -s -e const _HEAD_JS_PATH = "/storage/sdcard/tests/xpc/head.js"; -e const _MOZINFO_JS_PATH = "/storage/sdcard/tests/xpc/p/mozinfo.json"; -e const _TESTING_MODULES_DIR = "/storage/sdcard/tests/xpc/m"; -f /storage/sdcard/tests/xpc/head.js -e const _SERVER_ADDR = "localhost" -e const _HEAD_FILES = ["/storage/sdcard/tests/xpc/toolkit/components/osfile/tests/xpcshell/head.js"]; -e const _TAIL_FILES = []; -e const _JSDEBUGGER_PORT = 0; -e const _TEST_FILE = ["test_osfile_kill.js"]; -e const _TEST_NAME = "toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js" -e _execute_test(); quit(0);
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 171
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | Warning: MOZILLA_FIVE_HOME not set.
00:29:17     INFO -  PROCESS | toolkit/components/osfile/tests/xpcshell/test_osfile_kill.js | [14100] WARNING: Couldn't get the user appdata directory. Crash events may not be produced.: file /home/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 272
00:29:17     INFO -  <<<<<<<
00:29:17     INFO -  unable to delete /storage/sdcard/tests/xpc/minidumps: ''
00:29:17     INFO -  retrying after 10 seconds...
00:29:27     INFO -  failed to delete /storage/sdcard/tests/xpc/minidumps: ''
00:29:27     INFO -  TEST-START | toolkit/components/osfile/tests/xpcshell/test_osfile_async_setPermissions.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/osfile/tests/xpcshell/test_osfile_async_setPermissions.js | took 1ms
00:29:27     INFO -  TEST-START | toolkit/components/osfile/tests/xpcshell/test_osfile_win_async_setPermissions.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/osfile/tests/xpcshell/test_osfile_win_async_setPermissions.js | took 0ms
00:29:27     INFO -  TEST-START | toolkit/components/passwordmgr/test/unit/test_module_LoginImport.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_module_LoginImport.js | took 1ms
00:29:27     INFO -  TEST-START | toolkit/components/passwordmgr/test/unit/test_module_LoginStore.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_module_LoginStore.js | took 1ms
00:29:27     INFO -  TEST-START | toolkit/components/passwordmgr/test/unit/test_removeLegacySignonFiles.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_removeLegacySignonFiles.js | took 0ms
00:29:27     INFO -  TEST-START | toolkit/components/passwordmgr/test/unit/test_storage_mozStorage.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_storage_mozStorage.js | took 0ms
00:29:27     INFO -  TEST-START | toolkit/components/passwordmgr/test/unit/test_context_menu.js
00:29:27     INFO -  TEST-SKIP | toolkit/components/passwordmgr/test/unit/test_context_menu.js | took 0ms
00:29:27     INFO -  unable to delete /storage/sdcard/tests/xpc/p: ''
00:29:27     INFO -  retrying after 10 seconds...
00:29:37     INFO -  failed to delete /storage/sdcard/tests/xpc/p: ''
00:29:37     INFO -  unable to delete /storage/sdcard/tests/xpc/tmp: ''
00:29:37     INFO -  retrying after 10 seconds...
00:29:47     INFO -  failed to delete /storage/sdcard/tests/xpc/tmp: ''
00:29:47     INFO -  Following exceptions were raised:
00:29:47     INFO -  Traceback (most recent call last):
00:29:47     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 169, in run
00:29:47     INFO -      self.run_test()
00:29:47     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 620, in run_test
00:29:47     INFO -      self.mozInfoJSPath = self.setupMozinfoJS()
00:29:47     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 81, in setupMozinfoJS
00:29:47     INFO -      self.device.pushFile(local, mozInfoJSPath)
00:29:47     INFO -    File "/home/worker/workspace/build/venv/local/lib/python2.7/site-packages/mozdevice/devicemanagerADB.py", line 229, in pushFile
00:29:47     INFO -      (localname, destname))
00:29:47     INFO -  DMError: Attempted to push a file (/tmp/tmph1Axzk) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)!
00:29:47     INFO -  Traceback (most recent call last):
00:29:47     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 639, in <module>
00:29:47     INFO -      main()
00:29:47     INFO -    File "/home/worker/workspace/build/tests/xpcshell/remotexpcshelltests.py", line 634, in main
00:29:47     INFO -      **vars(options)):
00:29:47     INFO -    File "/home/worker/workspace/build/tests/xpcshell/runxpcshelltests.py", line 1448, in runTests
00:29:47     INFO -      raise exceptions[0]
00:29:47     INFO -  mozdevice.devicemanager.DMError: Attempted to push a file (/tmp/tmph1Axzk) to a directory (/storage/sdcard/tests/xpc/p/mozinfo.json)!
00:29:47    ERROR - Return code: 1
00:29:47    ERROR - No tests run or test summary not found
I read this as "during testing, an adb command fails and then all adb communication is lost".

The mozharness cleanup code sees the same thing:

12:38:28     INFO - >> Verify emulator visible to adb: Attempt #4 of 4
12:38:28     INFO - Running timeout 30 /home/worker/workspace/build/android-sdk18/platform-tools/adb devices
12:38:28     INFO - List of devices attached
12:38:28  WARNING - Unable to communicate with emulator via adb

Now it all makes sense: This is the xpcshell variant of bug 1157948, where the emulator becomes unresponsive in the middle of a test. We haven't noticed this on buildbot because the DMError triggers buildbot's automatic retry; on taskcluster, it is treated as a regular error because of bug 1280570.
Depends on: 1280570
See Also: → 1157948
Duplicate of this bug: 1296126
I can't find the bug about it (probably closed because it used Panda in the summary), but we did have a longstanding, as long as we've retried on DMError, buildbot problem where we infinitely retried xpcshell crashes on Android.

This actually blocks bug 1280570 rather than depending on it - with buildbot, we had tools that would (awkwardly) let me see, eventually, that someone had pushed an xpcshell crash to try including Android, which would have retried a few dozen times before I saw it and killed it; with taskcluster, we have no such tools. I'm quite surprised you couldn't reproduce this with an xpcshell crash, because it really should be 100%: we have no intermittent xpcshell crashes on Android (other than the new ones on TC), and we've always infinitely retried non-intermittent crashes.
I think I know what you mean, but I thought common xpcshell-infinite-retry problems were fixed. I also cannot find the bugs. Bug 875814 should have resolved some cases. Also, I imagine that our migration from sutagent to adb (or maybe panda -> emulator??) changed the way xpcshell crashes are handled.

An Android xpcshell crash was recently introduced on inbound, and it was consistently handled well, with crash reports and no DMError: https://public-artifacts.taskcluster.net/TJMZx5DlTAqmt7GqlMaGow/0/public/logs/live_backing.log.

Of course there is a risk that permanent failures like crashes will somehow trigger DMError and produce infinite retries. I would like to have the number of retries limited by taskcluster; I've just commented to that effect in bug 1280570.
See Also: → 1300272
Summary: cleardir throws DMError following xpcshell failures → cleardir throws DMError following xpcshell failures | DMError Attempted to push a file to a directory
Duplicate of this bug: 1298175
Duplicate of this bug: 1295803
Duplicate of this bug: 1299297
Duplicate of this bug: 1299342
Blocks: 1303634
Duplicate of this bug: 1300913
Duplicate of this bug: 1300335
Duplicate of this bug: 1299737
Duplicate of this bug: 1299005
Duplicate of this bug: 1298176
Duplicate of this bug: 1296803
Duplicate of this bug: 1296415
Duplicate of this bug: 1293525
Duplicate of this bug: 1293225
Duplicate of this bug: 1291907
Duplicate of this bug: 1291775
Duplicate of this bug: 1291767
Duplicate of this bug: 1291586
Duplicate of this bug: 1289104
Duplicate of this bug: 1288403
Duplicate of this bug: 1287449
Duplicate of this bug: 1286401
Duplicate of this bug: 1285743
Duplicate of this bug: 1285436
Duplicate of this bug: 1285435
Duplicate of this bug: 1301277
These are now treated as retries on taskcluster, as they have been on buildbot.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=1215be5477392bdc60bd4da85e174d5c6417a096

I think that's the best we can do when the emulator becomes unresponsive in the middle of a test.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1280570
Duplicate of this bug: 1296590
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.