linux64: After a crash, OSError / "MINIDUMP_STACKWALK binary not found" trying to call stackwalkbin

RESOLVED FIXED

Status

P4
normal
RESOLVED FIXED
9 years ago
5 years ago

People

(Reporter: jruderman, Assigned: bhearsum)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [linux64][automation])

Attachments

(1 attachment)

(Reporter)

Description

9 years ago
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271051562.1271052569.1838.gz&fulltext=1

Firefox crashed while running dromaeo_css test, then Talos fell over because it couldn't find the minidump stackwalk binary.

NOISE: Found crashdump: /tmp/tmpz16xi2/profile/minidumps/41aa8332-0cf5-63b0-114497e2-5e570d8c.dmp
Traceback (most recent call last):
  File "run_tests.py", line 462, in <module>
    test_file(arg)
  File "run_tests.py", line 424, in test_file
    browser_dump, counter_dump, print_format = mytest.runTest(browser_config, test)
  File "/home/cltbld/talos-slave/talos-data/talos/ttest.py", line 382, in runTest
    self.checkForCrashes(browser_config, profile_dir)
  File "/home/cltbld/talos-slave/talos-data/talos/ttest.py", line 160, in checkForCrashes
    subprocess.call([stackwalkbin, dump, browser_config['symbols_path']], stderr=nullfd)
  File "/home/cltbld/lib/python2.5/subprocess.py", line 444, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/home/cltbld/lib/python2.5/subprocess.py", line 594, in __init__
    errread, errwrite)
  File "/home/cltbld/lib/python2.5/subprocess.py", line 1097, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
This seems to be affecting unit tests too:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1271077375.1271077759.17014.gz&fulltext=1#err0
MINIDUMP_STACKWALK binary not found: /home/cltbld/talos-slave/mozilla-central-fedora64-opt-u-mochitests-3/tools/breakpad/linux64/minidump_stackwalk

which makes debugging random crash oranges kind of hard. How hard is this to fix?
This is because we are not producing linux64 symbol packages for talos to download, so when it finds a crash it attempts to run stackwalk but there isn't any symbol file.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 554138
This is still happening, and bug 554138 claims to be fixed. Looking at the Talos code, I don't believe that not having symbols would cause this error anyway. If the symbols were missing, minidump_stackwalk would still work, you would just get a stack without function names. The fact that the subprocess module is raising an OSError here means that the minidump_stackwalk binary can't be found for some reason. The Talos code appears to be looking in the same place for Linux64 that it would for Linux32 (unless platform.system() is something other than 'Linux', which it shouldn't be), so I don't know what the real problem is here.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
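To illustrate the point in the comment above (the OSError comes from the binary itself failing to launch, not from missing symbols), here is a minimal sketch of a guarded call. It is not the actual ttest.py code: the helper name `run_stackwalk` and its return convention are hypothetical, and it is written for Python 3 rather than the Python 2.5 shown in the traceback.

```python
import os
import subprocess


def run_stackwalk(stackwalkbin, dump, symbols_path):
    """Run the stack walker; return its exit code, or None if it cannot run.

    Hypothetical helper for illustration only, not taken from ttest.py.
    """
    # Check up front that the path is an executable regular file, so the
    # harness can report a clear message instead of dying with a traceback.
    if not (os.path.isfile(stackwalkbin) and os.access(stackwalkbin, os.X_OK)):
        print("MINIDUMP_STACKWALK binary not found: %s" % stackwalkbin)
        return None
    try:
        with open(os.devnull, "w") as nullfd:
            return subprocess.call(
                [stackwalkbin, dump, symbols_path], stderr=nullfd
            )
    except OSError as err:
        # The file can exist and still fail to launch, for example when the
        # loader it needs is not installed on the machine.
        print("Could not execute %s: %s" % (stackwalkbin, err))
        return None
```

With a guard like this, a missing or unlaunchable stack walker would produce a log line and let the test run continue, rather than aborting the whole harness.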
(Assignee)

Updated

9 years ago
Duplicate of this bug: 562011
Created attachment 441907 [details] [diff] [review]
[checked in]enable talos to download linux64 symbols

We initially forced talos to skip downloading linux64 symbols because they did not exist. Now that they do, we need to turn the download on.
Assignee: nobody → anodelman
Attachment #441907 - Flags: review?(bhearsum)
Duplicate of this bug: 562090
Priority: -- → P2
(Assignee)

Updated

9 years ago
Attachment #441907 - Flags: review?(bhearsum) → review+
Comment on attachment 441907 [details] [diff] [review]
[checked in]enable talos to download linux64 symbols

changeset:   707:b97482cf0ba0
Attachment #441907 - Attachment description: enable talos to download linux64 symbols → [checked in]enable talos to download linux64 symbols
Attachment #441907 - Flags: checked-in+
I'm seeing linux64 symbols being downloaded to talos now.  We really need a crash to ensure that this is working right.
Duplicate of this bug: 561017
See also bug 561017 comment #1: the stack walker binary is missing.
Copying nthomas' comment from bug 561017:
> [cltbld@talos-r3-fed64-005 talos]$ pwd
> /home/cltbld/talos-slave/talos-data/talos
> [cltbld@talos-r3-fed64-005 talos]$ ls -l breakpad/
> total 24
> drwxrwxr-x 2 cltbld cltbld 4096 2010-04-24 15:11 CVS
> drwxrwxr-x 3 cltbld cltbld 4096 2010-04-24 15:11 linux
> -rw-rw-r-- 1 cltbld cltbld    5 2010-03-29 09:32 linux64
> drwxrwxr-x 3 cltbld cltbld 4096 2010-04-24 15:11 maemo
> drwxrwxr-x 3 cltbld cltbld 4096 2010-04-24 15:11 osx
> drwxrwxr-x 3 cltbld cltbld 4096 2010-04-24 15:11 win32
> 
> Looks like linux64 is supposed to be a symlink but is a plain file instead. A
> quick google search indicates CVS may suck at symlinks.
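The check nthomas did by eye with `ls -l` could be scripted roughly as follows. This is only a sketch: the function name `describe_path` is made up for illustration, and the idea that `breakpad/linux64` should be a symlink is taken from the comment above, not verified here.

```python
import os


def describe_path(path):
    """Report what kind of filesystem entry `path` is (illustrative helper)."""
    # islink must be tested first: isdir/isfile follow symlinks.
    if os.path.islink(path):
        return "symlink -> %s" % os.readlink(path)
    if os.path.isdir(path):
        return "directory"
    if os.path.isfile(path):
        return "plain file (%d bytes)" % os.path.getsize(path)
    return "missing"
```

Run against the checkout listed above, `describe_path("breakpad/linux64")` would be expected to report a 5-byte plain file, matching the `ls -l` output.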
So, not actually a talos code issue but a machine configuration issue:

breakpad/linux/minidump_stackwalk: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory

breakpad/linux/minidump_stackwalk: error while loading shared libraries: libstdc++.so.6: cannot open shared object file: No such file or directory

fixed with
yum install ld-linux.so.2
yum install libstdc++.so.6

Now to figure out how to get this on all the fed64 talos slaves...
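The "bad ELF interpreter" message for /lib/ld-linux.so.2 suggests that the bundled minidump_stackwalk is a 32-bit binary running on a 64-bit machine without the 32-bit loader installed. A quick way to confirm a binary's word size is to read the class byte of its ELF header. The sketch below assumes nothing about Talos; `elf_class` is a hypothetical helper name.

```python
def elf_class(path):
    """Return 32 or 64 for an ELF file, or None if `path` is not ELF."""
    with open(path, "rb") as f:
        ident = f.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(ident) < 5 or ident[:4] != b"\x7fELF":
        return None
    # Byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
    return {1: 32, 2: 64}.get(ident[4])
```

If `elf_class("breakpad/linux/minidump_stackwalk")` returned 32 on a fed64 slave, that would be consistent with the errors above and with the need for the 32-bit loader and libstdc++ packages.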
armen is going to be working on the dependency roll out.
Assignee: anodelman → armenzg

Comment 17

9 years ago
I don't have time to work on this.
This should be an easy puppet patch but I don't have head-space for this.
Assignee: armenzg → nobody
Priority: P2 → P4
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1273607037.1273607395.969.gz
Summary: linux64: After a crash, "OSError: [Errno 2] No such file or directory" trying to call stackwalkbin → linux64: After a crash, "MINIDUMP_STACKWALK binary not found" trying to call stackwalkbin
Whiteboard: [linux64][automation]

Updated

9 years ago
Duplicate of this bug: 566330

Updated

9 years ago
Summary: linux64: After a crash, "MINIDUMP_STACKWALK binary not found" trying to call stackwalkbin → linux64: After a crash, OSError / "MINIDUMP_STACKWALK binary not found" trying to call stackwalkbin
(Reporter)

Comment 22

9 years ago
It might be easier to put a linux64 version of minidump_stackwalk in the build tools repo than to put 32-bit libraries on all the machines that use it.

FWIW, on Ubuntu I had to "sudo apt-get install ia32-libs".
(Reporter)

Comment 23

9 years ago
And I'm still not getting useful stacks for my crashes.  What am I doing wrong?

> Thread 0 (crashed)
>  0  0x7f16ec8cae98
>     rbx = 0xf09b577e   r12 = 0x032314c0   r13 = 0xf0764efa   r14 = 0x025d4950
>     r15 = 0x00000000   rip = 0xec8cae98   rsp = 0xf6be81b8   rbp = 0xf6be81e0
(Reporter)

Comment 24

9 years ago
Ted helped me partially figure out comment 23, and I filed bug 571578.
I've got a working minidump_stackwalk for 64-bit linux to land.
Assignee: nobody → bhearsum
landed it in 96ab87545e81
This fixed the issue with 64-bit Linux stack traces.
Status: REOPENED → RESOLVED
Last Resolved: 9 years ago → 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering