Closed Bug 1406564 Opened 7 years ago Closed 7 years ago

Intermittent PROCESS-CRASH | damp | application crashed [unknown top frame]

Categories

(Testing :: Talos, defect, P5)

Version 3
defect

Tracking

(firefox58 fixed)

RESOLVED FIXED
mozilla58
Tracking Status
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: ochameau)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell fixed:other])

Attachments

(1 file)

Filed by: csabou [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=135418852&repo=mozilla-central

https://queue.taskcluster.net/v1/task/Fgl-kQXNQaWSbPzlfhEAng/runs/0/artifacts/public/logs/live_backing.log

14:46:39     INFO -  mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/EyRcUEZ1QVa2ZaKPSkVDRA/artifacts/public/build/target.crashreporter-symbols.zip
14:46:39     INFO -  PID 1651 | ** Unknown exception behavior: -2147483647
14:46:39     INFO -  PID 1651 | ** Unknown exception behavior: -2147483647
14:46:47     INFO -  mozcrash Copy/paste: /Users/cltbld/tasks/task_1507317996/build/macosx64-minidump_stackwalk /var/folders/vk/pwfthbw16fq0qjvw5y2fxgmr00000w/T/tmpcIv54N/profile/minidumps/3C2E8E58-B13A-4A21-8BF6-37F7D3CCEB0B.dmp /var/folders/vk/pwfthbw16fq0qjvw5y2fxgmr00000w/T/tmpLOmyxO
14:46:47     INFO -  mozcrash Saved minidump as /Users/cltbld/tasks/task_1507317996/build/blobber_upload_dir/3C2E8E58-B13A-4A21-8BF6-37F7D3CCEB0B.dmp
14:46:47     INFO -  PROCESS-CRASH | damp | application crashed [unknown top frame]
14:46:47     INFO -  Crash dump filename: /var/folders/vk/pwfthbw16fq0qjvw5y2fxgmr00000w/T/tmpcIv54N/profile/minidumps/3C2E8E58-B13A-4A21-8BF6-37F7D3CCEB0B.dmp
14:46:47     INFO -  stderr from minidump_stackwalk:
14:46:47     INFO -  2017-10-06 14:46:47: minidump.cc:4359: INFO: Minidump opened minidump /var/folders/vk/pwfthbw16fq0qjvw5y2fxgmr00000w/T/tmpcIv54N/profile/minidumps/3C2E8E58-B13A-4A21-8BF6-37F7D3CCEB0B.dmp
14:46:47     INFO -  2017-10-06 14:46:47: minidump.cc:4466: ERROR: Minidump header signature mismatch: (0x0, 0x0) != 0x504d444d
14:46:47     INFO -  2017-10-06 14:46:47: stackwalk.cc:133: ERROR: Minidump /var/folders/vk/pwfthbw16fq0qjvw5y2fxgmr00000w/T/tmpcIv54N/profile/minidumps/3C2E8E58-B13A-4A21-8BF6-37F7D3CCEB0B.dmp could not be read
14:46:47     INFO -  2017-10-06 14:46:47: minidump.cc:4331: INFO: Minidump closing minidump
14:46:47     INFO -  minidump_stackwalk exited with return code 1
14:46:47     INFO -  TEST-UNEXPECTED-ERROR | damp | Found crashes after test run, terminating test
14:46:47    ERROR -  Traceback (most recent call last):
14:46:47     INFO -    File "/Users/cltbld/tasks/task_1507317996/build/tests/talos/talos/run_tests.py", line 282, in run_tests
14:46:47     INFO -      talos_results.add(mytest.runTest(browser_config, test))
14:46:47     INFO -    File "/Users/cltbld/tasks/task_1507317996/build/tests/talos/talos/ttest.py", line 61, in runTest
14:46:47     INFO -      return self._runTest(browser_config, test_config, setup)
14:46:47     INFO -    File "/Users/cltbld/tasks/task_1507317996/build/tests/talos/talos/ttest.py", line 213, in _runTest
14:46:47     INFO -      test_config['name'])
14:46:47     INFO -    File "/Users/cltbld/tasks/task_1507317996/build/tests/talos/talos/ttest.py", line 45, in check_for_crashes
14:46:47     INFO -      raise TalosCrash("Found crashes after test run, terminating test")
14:46:47     INFO -  TalosCrash: Found crashes after test run, terminating test
14:46:47     INFO -  TEST-INFO took 3611878ms
14:46:47     INFO -  SUITE-END | took 3611s
14:46:48    ERROR - Return code: 2
14:46:48  WARNING - setting return code to 2
14:46:48    ERROR - # TBPL FAILURE #
14:46:48     INFO - Running pre test command check_screen_resolution with 'bash -c screenresolution get && screenresolution list && system_profiler SPDisplaysDataType'
14:46:48     INFO - Running command: ('bash', '-c', 'screenresolution get && screenresolution list && system_profiler SPDisplaysDataType') in /Users/cltbld/tasks/task_1507317996/build
14:46:48     INFO - Copy/paste: bash -c "screenresolution get && screenresolution list && system_profiler SPDisplaysDataType"
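For context on the "header signature mismatch" error above: 0x504d444d is the four ASCII bytes "MDMP" read as a little-endian 32-bit integer, and the log reports zeros instead, which suggests the dump file was empty of real data or zero-filled. A minimal sketch for checking a saved .dmp file before handing it to minidump_stackwalk; the helper names are illustrative and are not part of mozcrash:

```python
import struct

# "MDMP" read as a little-endian uint32; this is the value that the
# minidump_stackwalk error in the log compares against.
MINIDUMP_SIGNATURE = 0x504D444D


def read_minidump_signature(path):
    """Return the first 4 bytes of the file as a little-endian uint32.

    Returns None when the file is shorter than 4 bytes.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    if len(header) < 4:
        return None
    return struct.unpack("<I", header)[0]


def looks_like_minidump(path):
    """True only if the file starts with the minidump signature."""
    return read_minidump_signature(path) == MINIDUMP_SIGNATURE
```

A file for which `looks_like_minidump` returns False cannot be symbolicated, which would explain the "[unknown top frame]" in the summary line.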
Should this be duplicated to the original bug https://bugzilla.mozilla.org/show_bug.cgi?id=1398576?
Flags: needinfo?(poirot.alex)
It seems to be a new and different issue, so I would keep this new bug. But feel free to mark it as a duplicate if that helps sheriffing...

This new crash has this signature in the logs:
----------------------------------------------------------------------------
12:54:10     INFO -  TEST-INFO | started process 1708 (/Users/cltbld/tasks/task_1507491859/build/application/Nightly.app/Contents/MacOS/firefox -foreground -profile /var/folders/9k/_6nv_24n2k5dmqkfc1dxc1zh00000w/T/tmpbBqIUH/profile -tp file:/Users/cltbld/tasks/task_1507491859/build/tests/talos/talos/tests/devtools/damp.manifest.develop -tpchrome -tpnoisy -tploadnocache -tpcycles 1 -tppagecycles 5)
12:54:10     INFO -  PID 1708 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
12:54:11     INFO -  PID 1708 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
12:54:11     INFO -  PID 1708 | 2017-10-08 12:54:11.759 plugin-container[1710:22134] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9a3b, name = 'com.apple.tsm.portname'
12:54:11     INFO -  PID 1708 | See /usr/include/servers/bootstrap_defs.h for the error codes.
12:54:12     INFO -  PID 1708 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
12:54:13     INFO -  PID 1708 | 2017-10-08 12:54:13.013 plugin-container[1711:22204] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9d57, name = 'com.apple.tsm.portname'
12:54:13     INFO -  PID 1708 | See /usr/include/servers/bootstrap_defs.h for the error codes.
12:54:13     INFO -  PID 1708 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
12:54:13     INFO -  PID 1708 | 2017-10-08 12:54:13.889 plugin-container[1712:22248] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9843, name = 'com.apple.tsm.portname'
12:54:13     INFO -  PID 1708 | See /usr/include/servers/bootstrap_defs.h for the error codes.
13:54:10     INFO -  Timeout waiting for test completion; killing browser...
----------------------------------------------------------------------------

Whereas a successful run shows this:
----------------------------------------------------------------------------
05:49:12     INFO -  TEST-INFO | started process 1642 (/Users/cltbld/tasks/task_1507553082/build/application/Nightly.app/Contents/MacOS/firefox -foreground -profile /var/folders/jc/7v003mhd31d1mm9tm93nzcc000000w/T/tmpzodE4c/profile -tp file:/Users/cltbld/tasks/task_1507553082/build/tests/talos/talos/tests/devtools/damp.manifest.develop -tpchrome -tpnoisy -tploadnocache -tpcycles 1 -tppagecycles 5)
05:49:12     INFO -  PID 1642 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
05:49:13     INFO -  PID 1642 | Unable to read VR Path Registry from /Users/cltbld/Library/Application Support/OpenVR/.openvr/openvrpaths.vrpath
05:49:13     INFO -  PID 1642 | 2017-10-09 05:49:13.989 plugin-container[1643:8915] *** CFMessagePort: bootstrap_register(): failed 1100 (0x44c) 'Permission denied', port = 0x9937, name = 'com.apple.tsm.portname'
05:49:13     INFO -  PID 1642 | See /usr/include/servers/bootstrap_defs.h for the error codes.
05:49:14     INFO -  PID 1642 | Open toolbox on 'cold.inspector'
----------------------------------------------------------------------------

It is OSX-specific. Both the successful and the failing runs include the same errors, so it is hard to say what is going on.
I recently added some dump messages like "Open toolbox on 'cold.inspector'" to help debug DAMP,
but on failure we don't even get that first log, which is emitted early in the test run.
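The comparison between the two logs can be sketched as a small script: look for the "started process" line, the first "Open toolbox on" marker, and the "Timeout waiting for test completion" line. This is only an illustration that assumes the log line layout shown in the excerpts; `summarize_damp_run` is a hypothetical helper, not something in Talos:

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "12:54:10     INFO -  TEST-INFO | started process ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+\w+\s+-\s+(.*)$")


def summarize_damp_run(log_text):
    """Report whether the first DAMP log marker was reached and whether the run timed out."""
    started = first_test = timed_out = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if "started process" in message and started is None:
            started = stamp
        elif "Open toolbox on" in message and first_test is None:
            first_test = stamp
        elif "Timeout waiting for test completion" in message and timed_out is None:
            timed_out = stamp

    seconds = None
    if started is not None and timed_out is not None:
        # Timestamps carry no date, so wrap around midnight with a modulo.
        seconds = ((timed_out - started) % timedelta(days=1)).total_seconds()
    return {
        "reached_first_test": first_test is not None,
        "timed_out": timed_out is not None,
        "seconds_to_timeout": seconds,
    }
```

On the failing excerpt this would report that the marker was never reached and that the browser was killed one hour after it started; on the successful excerpt the marker appears two seconds after startup.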
Flags: needinfo?(poirot.alex)
We have 87 failures in the last week:
https://brasstacks.mozilla.com/orangefactor/index.html?display=Bug&bugid=1406564&startday=2017-10-04&endday=2017-10-11&tree=trunk

This is in the territory where we would disable the test. I am not sure what to do here; :ochameau, do you have ideas?
Flags: needinfo?(poirot.alex)
Absolutely no idea. Let's see what try says with some additional logging:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b14bd61ed2f335d014116a9f76fc1380fe92788e

If we end up disabling this, could we disable it only on OSX?
Flags: needinfo?(poirot.alex)
Yes, I would disable it on the fewest configs possible, in this case OSX only.
The main intermittent seems to be related to this line, which never resolves:
http://searchfox.org/mozilla-central/rev/31606bbabc50b08895d843b9f5f3da938ccdfbbf/testing/talos/talos/tests/devtools/addon/content/damp.js#318
    await this.testSetup(url);

`url` is unexpectedly undefined. I'm wondering if that is why the call never resolves.

Here is another try push to see if defining it correctly fixes this:
  https://treeherder.mozilla.org/#/jobs?repo=try&revision=c80398b25b87a740c3c3856c51e0cc1cc3340b41

Note that there is at least one other kind of intermittent, like this run:
https://treeherder.mozilla.org/logviewer.html#?job_id=136289884&repo=try&lineNumber=6866
14:19:05     INFO -  PID 832 | Loaded: http://www.cnn.com/?refresh=1
14:19:05     INFO -  PID 832 | Loaded: http://localhost:49199/tests/tp5n/aljazeera.net/aljazeera.net/portal.html
14:43:45     INFO -  Timeout waiting for test completion; killing browser...
14:43:45     INFO -  Terminating psutil.Process(pid=832, name='firefox')
14:43:45     INFO -  PID 832 | ** Unknown exception behavior: -2147483647
...
14:43:55     INFO -  PROCESS-CRASH | tps | application crashed [unknown top frame]

This kind of crash is different. I think it crashes only because the whole g2 test suite takes too long to complete. Maybe we should bump the timeout, or split DAMP and tps into their own chunks? Also, I'm not sure whether the timeout is global and applies to both tps *and* DAMP, or whether it is specific to tps...
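One way to get more data on this kind of timeout is to measure how long the browser was silent before the harness killed it. The sketch below assumes the same log line layout as the excerpt above and uses a hypothetical helper name; it does not by itself answer whether the timeout is global or specific to tps:

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "14:19:05     INFO -  PID 832 | Loaded: ..."
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\s+\w+\s+-\s+(.*)$")


def silence_before_timeout(log_text):
    """Seconds between the last browser output line and the harness timeout.

    Returns None if there is no timeout line or no browser output before it.
    """
    last_output = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        message = match.group(2)
        if "Timeout waiting for test completion" in message:
            if last_output is None:
                return None
            # Timestamps carry no date, so wrap around midnight with a modulo.
            return ((stamp - last_output) % timedelta(days=1)).total_seconds()
        if message.startswith("PID "):
            last_output = stamp
    return None
```

For the tps excerpt, the last "Loaded:" line is at 14:19:05 and the timeout is at 14:43:45, so the browser produced no output for 24 minutes and 40 seconds before it was killed.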
Assignee: nobody → poirot.alex
Comment on attachment 8917603 [details]
Bug 1406564 - Open the right document when running cold toolbox open.

https://reviewboard.mozilla.org/r/188562/#review193818

How was this working in any scenario?
Attachment #8917603 - Flags: review?(bgrinstead) → review+
(In reply to Brian Grinstead [:bgrins] from comment #11)
> How was this working in any scenario?

I imagine we were catching some "load" event from about:blank or something.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1ade1d82e619
With this patch, I only see the "tps" timeout.
Shouldn't we slightly increase its timeout (on OSX)?
Pushed by apoirot@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9560b3be1516
Open the right document when running cold toolbox open. r=bgrins
https://hg.mozilla.org/mozilla-central/rev/9560b3be1516
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Whiteboard: [stockwell fixed:other]