Closed Bug 1558177 Opened 6 years ago Closed 6 years ago

Intermittent raptor-main Error: No crash directory (/sdcard/raptor/profile/minidumps) found on remote device

Categories

(Testing :: Raptor, defect, P5)

Version 3
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Assigned: alexandru.irimovici)

References

Details

(Keywords: intermittent-failure, regression, Whiteboard: [comment 10])

Filed by: rmaries [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=250908832&repo=mozilla-central
Full log: https://queue.taskcluster.net/v1/task/Avft_R_IQtCIVivRPVDR8A/runs/0/artifacts/public/logs/live_backing.log


11:56:43 ERROR - raptor-main Error: No crash directory (/sdcard/raptor/profile/minidumps) found on remote device
11:56:43 INFO - raptor-main Info: removing webext /builds/task_1560167707/workspace/build/tests/raptor/raptor/../webext/raptor
11:56:43 INFO - raptor-main Info: removing test folder for raptor: /sdcard/raptor
11:56:43 INFO - adb shell_output: adb -s FA83T1A00347 wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, timeout: None, root: False, timedout: None, exitcode: 0, output:
11:56:43 INFO - raptor-control-server Info: shutting down control server
11:56:43 INFO - raptor-main Info: finished
11:56:43 ERROR - Traceback (most recent call last):
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 1244, in <module>
11:56:44 INFO - main()
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 1212, in main
11:56:44 INFO - success = raptor.run_tests(raptor_test_list, raptor_test_names)
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 975, in run_tests
11:56:44 INFO - return super(RaptorAndroid, self).run_tests(tests, test_names)
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 182, in run_tests
11:56:44 INFO - self.run_test(test, timeout=int(test['page_timeout']))
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 994, in run_test
11:56:44 INFO - self.run_test_cold(test, timeout)
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 1049, in run_test_cold
11:56:44 INFO - self.start_playback(test)
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/tests/raptor/raptor/raptor.py", line 372, in start_playback
11:56:44 INFO - self.playback.start()
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 125, in start
11:56:44 INFO - self.setup()
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 384, in setup
11:56:44 INFO - self.install_mitmproxy_cert(self.browser_path)
11:56:44 INFO - File "/builds/task_1560167707/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 457, in install_mitmproxy_cert
11:56:44 INFO - )[0]
11:56:44 INFO - IndexError: list index out of range
11:56:44 ERROR - Return code: 1
11:56:44 WARNING - setting return code to 1

This seems to be a failure to find the host-utils archive when it is not in the cache (missing from the excerpt above):

11:56:39     INFO -  mozproxy INFO - File host-utils-68.0a1.en-US.linux-x86_64.tar.gz not present in local cache folder /builds/tooltool_cache
11:56:39     INFO -  mozproxy INFO - Attempting to fetch from 'https://tooltool.mozilla-releng.net/'...
11:56:43     INFO -  mozproxy INFO - ...failed to fetch 'host-utils-68.0a1.en-US.linux-x86_64.tar.gz' from https://tooltool.mozilla-releng.net/
11:56:43    ERROR -  mozproxy Traceback (most recent call last):
11:56:43     INFO -  mozproxy   File "/builds/task_1560167685/workspace/mozharness/external_tools/tooltool.py", line 694, in fetch_file
11:56:43     INFO -  mozproxy     f = urllib2.urlopen(req)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
11:56:43     INFO -  mozproxy     return opener.open(url, data, timeout)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 435, in open
11:56:43     INFO -  mozproxy     response = meth(req, response)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
11:56:43     INFO -  mozproxy     'http', request, response, code, msg, hdrs)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 467, in error
11:56:43     INFO -  mozproxy     result = self._call_chain(*args)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
11:56:43     INFO -  mozproxy     result = func(*args)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
11:56:43     INFO -  mozproxy     return self.parent.open(new, timeout=req.timeout)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 429, in open
11:56:43     INFO -  mozproxy     response = self._open(req, data)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 447, in _open
11:56:43     INFO -  mozproxy     '_open', req)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
11:56:43     INFO -  mozproxy     result = func(*args)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 1241, in https_open
11:56:43     INFO -  mozproxy     context=self._context)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
11:56:43     INFO -  mozproxy     raise URLError(err)
11:56:43     INFO -  mozproxy URLError: <urlopen error [Errno -2] Name or service not known>
11:56:43     INFO -  mozproxy ERROR - The following files failed: 'host-utils-68.0a1.en-US.linux-x86_64.tar.gz'
11:56:43     INFO -  mozproxy Stopping mitmproxy playback, killing process 707
11:56:43     INFO -  mozproxy Successfully killed the mitmproxy playback process
11:56:43     INFO -  raptor-main Info: removing reverse socket connections
11:56:43     INFO -  adb command_output: adb -s FA83T1A00347 wait-for-device reverse --remove-all, timeout: None, timedout: None, exitcode: 0, output:
11:56:43    ERROR -  raptor-main Error: No crash directory (/sdcard/raptor/profile/minidumps) found on remote device

Since this was filed 8 days ago there are 94 failures across android-hw-7-0 and android-hw-8-0 pgo & opt builds: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-05-19&endday=2019-06-18&tree=trunk&bug=1558177
Robert, could you please take a look over this? Thank you.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252269883&repo=autoland&lineNumber=1269
07:42:35 INFO - raptor-main Info: removing reverse socket connections
07:42:35 INFO - adb command_output: adb -s ZY3223F92P wait-for-device reverse --remove-all, timeout: None, timedout: None, exitcode: 0, output:
07:42:35 ERROR - raptor-main Error: No crash directory (/sdcard/raptor/profile/minidumps) found on remote device
07:42:35 INFO - raptor-main Info: removing webext /builds/task_1560843603/workspace/build/tests/raptor/raptor/../webext/raptor
07:42:35 INFO - raptor-main Info: removing test folder for raptor: /sdcard/raptor
07:42:35 INFO - adb shell_output: adb -s ZY3223F92P wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, timeout: None, root: False, timedout: None, exitcode: 0, output:
07:42:36 INFO - raptor-control-server Info: shutting down control server
07:42:36 INFO - raptor-main Info: finished

Flags: needinfo?(rwood)

(In reply to Greg Mierzwinski [:sparky] from comment #1)

This seems to be a failure to find the host-utils archive when it is not in the cache (missing from the excerpt above):

11:56:39     INFO -  mozproxy INFO - File host-utils-68.0a1.en-US.linux-x86_64.tar.gz not present in local cache folder /builds/tooltool_cache
11:56:39     INFO -  mozproxy INFO - Attempting to fetch from 'https://tooltool.mozilla-releng.net/'...
11:56:43     INFO -  mozproxy INFO - ...failed to fetch 'host-utils-68.0a1.en-US.linux-x86_64.tar.gz' from https://tooltool.mozilla-releng.net/
11:56:43    ERROR -  mozproxy Traceback (most recent call last):
11:56:43     INFO -  mozproxy   File "/builds/task_1560167685/workspace/mozharness/external_tools/tooltool.py", line 694, in fetch_file
11:56:43     INFO -  mozproxy     f = urllib2.urlopen(req)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
11:56:43     INFO -  mozproxy     return opener.open(url, data, timeout)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 435, in open
11:56:43     INFO -  mozproxy     response = meth(req, response)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
11:56:43     INFO -  mozproxy     'http', request, response, code, msg, hdrs)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 467, in error
11:56:43     INFO -  mozproxy     result = self._call_chain(*args)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
11:56:43     INFO -  mozproxy     result = func(*args)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
11:56:43     INFO -  mozproxy     return self.parent.open(new, timeout=req.timeout)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 429, in open
11:56:43     INFO -  mozproxy     response = self._open(req, data)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 447, in _open
11:56:43     INFO -  mozproxy     '_open', req)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
11:56:43     INFO -  mozproxy     result = func(*args)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 1241, in https_open
11:56:43     INFO -  mozproxy     context=self._context)
11:56:43     INFO -  mozproxy   File "/usr/lib/python2.7/urllib2.py", line 1198, in do_open
11:56:43     INFO -  mozproxy     raise URLError(err)
11:56:43     INFO -  mozproxy URLError: <urlopen error [Errno -2] Name or service not known>
11:56:43     INFO -  mozproxy ERROR - The following files failed: 'host-utils-68.0a1.en-US.linux-x86_64.tar.gz'
11:56:43     INFO -  mozproxy Stopping mitmproxy playback, killing process 707
11:56:43     INFO -  mozproxy Successfully killed the mitmproxy playback process
11:56:43     INFO -  raptor-main Info: removing reverse socket connections
11:56:43     INFO -  adb command_output: adb -s FA83T1A00347 wait-for-device reverse --remove-all, timeout: None, timedout: None, exitcode: 0, output:
11:56:43    ERROR -  raptor-main Error: No crash directory (/sdcard/raptor/profile/minidumps) found on remote device

:gbrown, any ideas here?

Flags: needinfo?(rwood) → needinfo?(gbrown)

The url from comment 8 looks correct to me. Possibly a temporary failure? I have a feeling that bc and/or aerickson looked at similar issues in the past. Let's see if Andrew recalls seeing something like this before....

Flags: needinfo?(gbrown) → needinfo?(aerickson)

Most of the recent failures look different: the host-utils download completes successfully, but then other things fail, like:

https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=252516100&repo=mozilla-central&lineNumber=1236

22:38:44 INFO - certutil: function failed: SEC_ERROR_BAD_DATABASE: security library: bad database.
...
22:39:15 ERROR - Traceback (most recent call last):
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 1281, in <module>
22:39:15 INFO - main()
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 1249, in main
22:39:15 INFO - success = raptor.run_tests(raptor_test_list, raptor_test_names)
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 1004, in run_tests
22:39:15 INFO - return super(RaptorAndroid, self).run_tests(tests, test_names)
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 183, in run_tests
22:39:15 INFO - self.run_test(test, timeout=int(test['page_timeout']))
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 1025, in run_test
22:39:15 INFO - self.run_test_warm(test, timeout)
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 1139, in run_test_warm
22:39:15 INFO - self.start_playback(test)
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/tests/raptor/raptor/raptor.py", line 369, in start_playback
22:39:15 INFO - self.playback.start()
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 125, in start
22:39:15 INFO - self.setup()
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 384, in setup
22:39:15 INFO - self.install_mitmproxy_cert(self.browser_path)
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 535, in install_mitmproxy_cert
22:39:15 INFO - if not self.is_mitmproxy_cert_installed():
22:39:15 INFO - File "/builds/task_1560983832/workspace/build/venv/lib/python2.7/site-packages/mozproxy/backends/mitm.py", line 552, in is_mitmproxy_cert_installed
22:39:15 INFO - cmd_output = subprocess.check_output(command, env=os.environ.copy())
22:39:15 INFO - File "/usr/lib/python2.7/subprocess.py", line 574, in check_output
22:39:15 INFO - raise CalledProcessError(retcode, cmd, output=output)
22:39:15 INFO - subprocess.CalledProcessError: Command '['/builds/task_1560983832/workspace/testing/mozproxy/host-utils-68.0a1.en-US.linux-x86_64/certutil', '-d', 'sql:/tmp/tmpo3HV6W.mozrunner/', '-L']' returned non-zero exit status 255
22:39:15 INFO - certutil: could not add certificate to token or database: SEC_ERROR_ADDING_CERT: Error adding certificate to database.

SEC_ERROR_BAD_DATABASE is present in many/all passing tests too...but that still seems wrong!

Regarding the more general issue of this bug: Why are we reporting "No crash directory..."? The harness checks that the minidumps directory is present on teardown; the directory should have been created by the test app. In most/all recent failures reported here, the test app has not been launched. Successful raptor runs have something like:

23:13:51     INFO -  raptor-main Info: starting geckoview
...
23:13:53     INFO -  adb launch_application: am start -W -n org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity -a android.intent.action.MAIN --ez use_multiprocess True --es args "-profile /sdcard/raptor/profile --es env0 LOG_VERBOSE=1 --es env1 R_LOG_LEVEL=6" -d about:blank
23:13:53     INFO -  adb shell_output: adb -s HT86B1A00744 wait-for-device shell am start -W -n org.mozilla.geckoview_example/org.mozilla.geckoview_example.GeckoViewActivity -a android.intent.action.MAIN --ez use_multiprocess True --es args "-profile /sdcard/raptor/profile --es env0 LOG_VERBOSE=1 --es env1 R_LOG_LEVEL=6" -d about:blank; echo adb_returncode=$?, timeout: None, root: False, timedout: None, exitcode: 0, output: Starting: Intent { act=android.intent.action.MAIN dat=about:blank cmp=org.mozilla.geckoview_example/.GeckoViewActivity (has extras) }
23:13:53     INFO -  Status: ok
23:13:53     INFO -  Activity: org.mozilla.geckoview_example/.GeckoViewActivity

The failures here do not. Possibly launch_firefox_android_app():
https://searchfox.org/mozilla-central/rev/da14c413ef663eb1ba246799e94a240f81c42488/testing/raptor/raptor/raptor.py#940

should set a flag "android_app_launched" and check_for_crashes should no-op if !android_app_launched.

(In reply to Geoff Brown [:gbrown] (pto until June 26) from comment #9)

SEC_ERROR_BAD_DATABASE is present in many/all passing tests too...but that still seems wrong!

Yes - that error message is expected the first time Raptor (mozproxy actually) checks to see if the cert db exists already and it doesn't so then it is created. I have filed follow-up: Bug 1560974 - [mozproxy] Indicate in logging that the SEC_ERROR_BAD_DATABASE error is expected the first time mozproxy checks for the nss ca db.

So looks to me that there are a few issues here being grouped into this one bug description.

Let's take this bug to clean-up the situation as noted by :gbrown in comment 10. :bebe can someone on SV take this, thanks!

Flags: needinfo?(fstrugariu)

I remember having issues finding certutil when my upgrade of tooltool caused a glob failure (Bug 1548008), but not "SEC_ERROR_BAD_DATABASE" exactly.

Flags: needinfo?(aerickson)

Joel any thoughts here?

Flags: needinfo?(jmaher)
Assignee: nobody → airimovici
Status: NEW → ASSIGNED

this is primarily on geckoview-tp6-cold tests- I suspect with cold we do many more profile pushes and browser shutdowns, therefore increasing the chance of failing to pull minidumps.

I am glad to see that Alexandru picked this up, please keep this bug updated with progress.

Flags: needinfo?(jmaher)
Depends on: 1563229

Bug 1563229 will avoid "No crash directory" in most of these cases, which may help to focus this bug on the root cause.

It looks like most of these failures originate in tooltool failures, especially network failures during tooltool downloads originated from mozproxy. One issue then is, why do we see these network failures on bitbar? Another perspective, perhaps mozproxy should retry failed tooltool downloads? Consider https://searchfox.org/mozilla-central/rev/11712bd3ce7454923e5931fa92eaf9c01ef35a0a/testing/mozharness/mozharness/mozilla/tooltool.py#106 - tooltool retry logic in mozharness.

There are 113 total failures in the last 7 days on android-hw-p2-8-0-arm7-api-16 and android-hw-p2-8-0-android-aarch64 opt and pgo

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=255168134&repo=autoland&lineNumber=1267

4:58:57 INFO - mozproxy checking if the nss cert db already exists in the android browser profile
14:58:57 INFO - certutil: function failed: SEC_ERROR_BAD_DATABASE: security library: bad database.
14:58:57 INFO - mozproxy nss cert db doesn't exist yet
14:59:07 INFO - mozproxy creating nss cert database using command: /builds/task_1562511419/workspace/testing/mozproxy/host-utils-68.0a1.en-US.linux-x86_64/certutil -N -v -d sql:/tmp/tmpqrZibN.mozrunner/ --empty-password
14:59:07 INFO - certutil: function failed: SEC_ERROR_BAD_DATABASE: security library: bad database.
14:59:17 INFO - mozproxy importing mitmproxy cert into db using command: /builds/task_1562511419/workspace/testing/mozproxy/host-utils-68.0a1.en-US.linux-x86_64/certutil -A -d sql:/tmp/tmpqrZibN.mozrunner/ -n mitmproxy-cert -t TC,, -a -i /builds/worker/.mitmproxy/mitmproxy-ca-cert.cer
14:59:27 INFO - mozproxy command to import mitmproxy cert into cert db failed to complete
14:59:27 INFO - mozproxy verifying that the mitmproxy ca cert is installed on android
14:59:27 INFO - mozproxy getting the list of certs in the nss cert db in the android browser profile
14:59:28 INFO - certutil: function failed: SEC_ERROR_BAD_DATABASE: security library: bad database.
14:59:28 INFO - mozproxy certutil command failed
14:59:28 INFO - mozproxy Stopping mitmproxy playback, killing process 710
14:59:28 INFO - mozproxy Successfully killed the mitmproxy playback process
14:59:28 INFO - raptor-main Info: removing reverse socket connections
14:59:29 INFO - adb command_output: adb -s ZY322BVC2F wait-for-device reverse --remove-all, timeout: None, timedout: None, exitcode: 0, output:
14:59:29 ERROR - raptor-main Error: No crash directory (/sdcard/raptor/profile/minidumps) found on remote device
14:59:29 INFO - raptor-main Info: removing webext /builds/task_1562511419/workspace/build/tests/raptor/raptor/../webext/raptor
14:59:29 INFO - raptor-main Info: removing test folder for raptor: /sdcard/raptor
14:59:29 INFO - adb shell_output: adb -s ZY322BVC2F wait-for-device shell rm -r /sdcard/raptor; echo adb_returncode=$?, timeout: None, root: False, timedout: None, exitcode: 0, output:
14:59:29 INFO - raptor-control-server Info: shutting down control server
14:59:29 INFO - raptor-main Info: finished

See Also: → 1549209

(In reply to Cosmin Sabou [:CosminS] from comment #19)

This has failed 111 in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-06-21&endday=2019-06-28&tree=all&bug=1558177
Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=253863545&repo=mozilla-inbound&lineNumber=1249
Robert, could you take over this as Florin has not responded yet?

:gbrown is taking care of the issue in comment 10 (thanks!)

Flags: needinfo?(rwood)
Flags: needinfo?(fstrugariu)
Whiteboard: [stockwell disable-recommended] → [comment 10]
Whiteboard: [comment 10][stockwell disable-recommended] → [comment 10]
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.