Closed Bug 738888 Opened 12 years ago Closed 12 years ago

Talos crashes end with "error executing: ... minidump_stackwalk ....dmp ../symbols"

Categories

(Testing :: Talos, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure)

Lots of moving parts that could be at fault, but inbound was still handling crashes okay as of https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=66223f04fb55 at 16:30. The Talos zip update was merged seven pushes later, and the next crashes, eight and nine pushes after that, were:

https://tbpl.mozilla.org/php/getParsedLog.php?id=10338558&tree=Mozilla-Inbound
Rev3 Fedora 12 mozilla-inbound talos tpr_responsiveness on 2012-03-23 19:49:08 PDT for push 9c463a882b6f

NOISE: Found crashdump: /tmp/tmpK5shjy/profile/minidumps/73097215-3e75-40b1-6da3e474-423ef3e2.dmp
Failed tp5r: 
		Stopped Fri, 23 Mar 2012 20:51:56
FAIL: Busted: tp5r
FAIL: error executing: '/home/cltbld/talos-slave/talos-data/talos/breakpad/linux/minidump_stackwalk /tmp/tmpK5shjy/profile/minidumps/73097215-3e75-40b1-6da3e474-423ef3e2.dmp ../symbols'
Traceback (most recent call last):
  File "run_tests.py", line 681, in <module>
    main()
  File "run_tests.py", line 678, in main
    test_file(arg, options, parser.parsed)
  File "run_tests.py", line 619, in test_file
    raise e
utils.talosError: "error executing: '/home/cltbld/talos-slave/talos-data/talos/breakpad/linux/minidump_stackwalk /tmp/tmpK5shjy/profile/minidumps/73097215-3e75-40b1-6da3e474-423ef3e2.dmp ../symbols'"
program finished with exit code 1

https://tbpl.mozilla.org/php/getParsedLog.php?id=10337944&tree=Mozilla-Inbound
Rev3 Fedora 12x64 mozilla-inbound talos dirty on 2012-03-23 20:00:10 PDT for push 6bbe864b5162

NOISE: Found crashdump: /tmp/tmpuu9Rf4/profile/minidumps/3c2b99ec-d0aa-34b0-0ca64a8d-2a17987a.dmp
Failed ts_places_generated_med: 
		Stopped Fri, 23 Mar 2012 20:28:25
FAIL: Busted: ts_places_generated_med
FAIL: error executing: '/home/cltbld/talos-slave/talos-data/talos/breakpad/linux64/minidump_stackwalk /tmp/tmpuu9Rf4/profile/minidumps/3c2b99ec-d0aa-34b0-0ca64a8d-2a17987a.dmp ../symbols'
Traceback (most recent call last):
  File "run_tests.py", line 681, in <module>
    main()
  File "run_tests.py", line 678, in main
    test_file(arg, options, parser.parsed)
  File "run_tests.py", line 619, in test_file
    raise e
utils.talosError: "error executing: '/home/cltbld/talos-slave/talos-data/talos/breakpad/linux64/minidump_stackwalk /tmp/tmpuu9Rf4/profile/minidumps/3c2b99ec-d0aa-34b0-0ca64a8d-2a17987a.dmp ../symbols'"
program finished with exit code 1

Could be Linux-only, could be that we just crash a lot more in Talos on Linux than elsewhere, and we haven't gotten around to crashing on any other platform yet.
Not sure if this is normal or not but I see a lot of these [1]:
NOISE: Could not read chrome manifest 'file:///home/cltbld/talos-slave/talos-data/firefox/extensions/%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D/chrome.manifest'.
NOISE: [JavaScript Warning: "Use of enablePrivilege is deprecated.  Please use code that runs with the system principal (e.g. an extension) instead." {file: "file:///home/cltbld/talos-slave/talos-data/talos/startup_test/startup_test.html?begin=1332558487108" line: 0}]
NOISE: __startTimestamp1332558487717__endTimestamp
NOISE: __startBeforeLaunchTimestamp1332558487108__endBeforeLaunchTimestamp
NOISE: __startAfterTerminationTimestamp1332558488184__endAfterTerminationTimestamp

Just for the record, talos.zip had not changed since Mar. 3rd (bug 732835).

[1]
http://mxr.mozilla.org/mozilla-central/source/xpcom/components/nsComponentManager.cpp#512
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #7)
> Not sure if this is normal or not but I see a lot of these [1]:
> NOISE: Could not read chrome manifest
> 'file:///home/cltbld/talos-slave/talos-data/firefox/extensions/%7B972ce4c6-
> 7e08-4474-a285-3208198ce6fd%7D/chrome.manifest'.
> NOISE: [JavaScript Warning: "Use of enablePrivilege is deprecated.  Please
> use code that runs with the system principal (e.g. an extension) instead."
> {file:
> "file:///home/cltbld/talos-slave/talos-data/talos/startup_test/startup_test.
> html?begin=1332558487108" line: 0}]
> NOISE: __startTimestamp1332558487717__endTimestamp
> NOISE: __startBeforeLaunchTimestamp1332558487108__endBeforeLaunchTimestamp
> NOISE:
> __startAfterTerminationTimestamp1332558488184__endAfterTerminationTimestamp
> 
> Just for the record, talos.zip had not changed since Mar. 3rd (bug 732835).
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/xpcom/components/
> nsComponentManager.cpp#512

I also get a lot of these testing locally.  We should fix the enablePrivilege bug.  I actually don't understand the chrome.manifest bug (though I haven't looked into it). That said, I see it all the time without this failure.

It looks like the underlying binary fails (/home/cltbld/talos-slave/talos-data/talos/breakpad/linux/minidump_stackwalk), though I don't know why.  If https://bugzilla.mozilla.org/show_bug.cgi?id=734163 got fixed, we could at least see where it was actually failing and hopefully instrument this.  I haven't seen this error in practice, so I don't have much feeling for why the failure occurs.  I would blindly guess that the resultant stack dumps are somehow corrupt (not flushed to disk?), but it's purely a guess.
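One cheap way to test the "corrupt or unflushed dump" guess is to look at the first bytes of the .dmp file, since minidump files begin with the 4-byte signature "MDMP". This is only a rough sketch: looks_like_minidump is a hypothetical helper, not something in Talos, and a matching signature does not prove the rest of the dump is intact.

```python
def looks_like_minidump(path):
    """Return True if the file starts with the minidump signature b"MDMP".

    Hypothetical helper for illustration only. An empty or truncated file
    (for example, one that was never flushed to disk) returns False.
    """
    with open(path, "rb") as f:
        return f.read(4) == b"MDMP"
```

Calling this on the path reported by "Found crashdump" before invoking minidump_stackwalk would at least separate "bad dump" from "bad binary".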
Does that "program finished with exit code 1" mean that minidump_stackwalk finished with exit code 1, or is that the run_tests.py script finishing with exit code 1?
(In reply to Ted Mielczarek [:ted] from comment #12)
> Does that "program finished with exit code 1" mean that minidump_stackwalk
> finished with exit code 1, or is that the run_tests.py script finishing with
> exit code 1?

The traceback seems to imply the latter
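To make that distinction visible in the log, the harness could report whether the child process failed to start at all or started and exited non-zero. The following is a minimal sketch under assumptions: run_and_report is a hypothetical name, and this is not how run_tests.py actually invokes minidump_stackwalk.

```python
import subprocess


def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr).

    Hypothetical helper for illustration. If the process cannot be started
    (missing file, no execute permission, ...), returncode is None and the
    third element carries the OS error text instead of the child's stderr.
    """
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
    except OSError as e:
        # The child never ran, so there is no exit code to report.
        return None, "", "could not start %r: %s" % (cmd[0], e)
    out, err = proc.communicate()
    return (
        proc.returncode,
        out.decode("utf-8", "replace"),
        err.decode("utf-8", "replace"),
    )
```

With something like this, a "could not start" message in the log would point at the binary itself rather than at the dump or the symbols.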
jmaher gave me a copy of this talos.zip, and unzipping it on my Linux system shows that the minidump_stackwalk executables are apparently not zipped with executable permissions. That could definitely cause this error.
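The same check can be done without unzipping, by reading the Unix mode bits that a zip entry stores in the high 16 bits of external_attr. A rough sketch, assuming the archive was written on a Unix-like system; entries_missing_exec_bit and its suffix parameter are hypothetical names chosen for illustration.

```python
import stat
import zipfile


def entries_missing_exec_bit(zip_path, suffix="minidump_stackwalk"):
    """List archive members ending in suffix that lack the owner execute bit.

    Hypothetical helper. The Unix permission bits, when present, live in
    the upper 16 bits of ZipInfo.external_attr; a value of 0 there means
    no mode was recorded for that entry.
    """
    missing = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if not info.filename.endswith(suffix):
                continue
            mode = info.external_attr >> 16
            if not mode & stat.S_IXUSR:
                missing.append(info.filename)
    return missing
```

A non-empty result for talos.zip would be consistent with the failure seen here.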
So the issue is found: running create_talos_zip.py with Python 2.4 (as installed on people) will not preserve permissions properly.  Running on 2.7 will.  Casual googling has not told me when exactly this changed, but for now create_talos_zip.py should be run on Python 2.7.
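If relying on the interpreter version feels fragile, the mode can also be recorded explicitly when each file is added. This is only a sketch of the idea, not the code in create_talos_zip.py; add_file_with_mode is a hypothetical helper, and it reads the whole file into memory, which is fine for small binaries but not for large ones.

```python
import os
import zipfile


def add_file_with_mode(zf, path, arcname):
    """Add path to the open ZipFile zf as arcname, storing its Unix mode.

    Hypothetical helper for illustration. The file's st_mode is written
    into the upper 16 bits of external_attr so that tools such as unzip
    can restore the execute bit on extraction.
    """
    info = zipfile.ZipInfo(arcname)
    info.compress_type = zipfile.ZIP_DEFLATED
    info.create_system = 3  # 3 marks the entry as created on Unix
    info.external_attr = (os.stat(path).st_mode & 0xFFFF) << 16
    with open(path, "rb") as f:
        zf.writestr(info, f.read())
```

Note that this sketch leaves the entry timestamp at the ZipInfo default rather than copying the file's modification time.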
Not that the stack is useful, but we fixed the problem as described:
https://tbpl.mozilla.org/php/getParsedLog.php?id=10410744&tree=Mozilla-Inbound&full=1
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [orange]