mozharness talos has issues with minidump_stackwalk

RESOLVED FIXED

Status

Release Engineering
Mozharness
P3
normal
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: aki, Assigned: jyeo)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozharness][talos])

Attachments

(6 attachments)

(Reporter)

Description

5 years ago
e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=16357091&tree=Cedar&full=1

15:27:33     INFO -  NOISE: exception getting privileged access, defaulting to XUL_FENNEC
15:27:38     INFO -  NOISE: Found crashdump: /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp
15:27:39     INFO -  Failed tp5n:
15:27:39     INFO -  		Stopped Mon, 22 Oct 2012 15:27:39
15:27:39    ERROR -  Traceback (most recent call last):
15:27:39     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 250, in run_tests
15:27:39     INFO -      talos_results.add(mytest.runTest(browser_config, test))
15:27:39     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 412, in runTest
15:27:39     INFO -      self.cleanupAndCheckForCrashes(browser_config, profile_dir)
15:27:39     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 182, in cleanupAndCheckForCrashes
15:27:39 CRITICAL -      raise talosError("error executing: '%s'" % subprocess.list2cmdline(cmd))
15:27:39 CRITICAL -  talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux64/minidump_stackwalk /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:27:39 CRITICAL -  FAIL: Busted: tp5n
15:27:39     INFO -  FAIL: error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux64/minidump_stackwalk /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp /home/cltbld/talos-slave/test/build/symbols'
15:27:39    ERROR -  Traceback (most recent call last):
15:27:39     INFO -    File "/home/cltbld/talos-slave/test/build/venv/bin/talos", line 9, in <module>
15:27:39     INFO -      load_entry_point('talos==0.0', 'console_scripts', 'talos')()
15:27:39     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 295, in main
15:27:39     INFO -      run_tests(parser)
15:27:39     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 259, in run_tests
15:27:39     INFO -      raise e
15:27:39 CRITICAL -  talos.utils.talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux64/minidump_stackwalk /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:27:39    ERROR - Return code: 1
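The CRITICAL lines show only the command that was run, not why it failed. The traceback (ttest.py line 182) shows the message is built by joining the argument list with `subprocess.list2cmdline`. Below is a simplified sketch of that pattern. It is not the talos source, which is not quoted here. It assumes the failure is detected from a non-zero exit status or a launch error, and `TalosError` and `run_stackwalk` are hypothetical stand-ins:

```python
import subprocess


class TalosError(Exception):
    """Hypothetical stand-in for talos.utils.talosError."""


def run_stackwalk(stackwalk, dump_path, symbols_path):
    # Assumed argv layout, matching the command quoted in the log:
    # minidump_stackwalk <dump file> <symbols dir>
    cmd = [stackwalk, dump_path, symbols_path]
    try:
        returncode = subprocess.call(cmd)
    except OSError:
        # Binary missing or not executable: the real cause is swallowed.
        returncode = None
    if returncode != 0:
        # Only the joined command line survives; the exit status and stderr
        # are dropped, so the log gives no hint of what went wrong.
        raise TalosError("error executing: '%s'" % subprocess.list2cmdline(cmd))
```

With this shape, a missing binary, a bad dump, and an unreadable symbols directory all produce the same one-line error.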
(Reporter)

Comment 1

5 years ago
Same for linux32:

15:12:02     INFO -  DEBUG: created profile
15:32:04     INFO -  NOISE: exception getting privileged access, defaulting to XUL_FENNEC
15:32:09     INFO -  NOISE: Found crashdump: /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp
15:32:09     INFO -  Failed tscrollr:
15:32:09     INFO -  		Stopped Mon, 22 Oct 2012 15:32:09
15:32:09    ERROR -  Traceback (most recent call last):
15:32:09     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 250, in run_tests
15:32:09     INFO -      talos_results.add(mytest.runTest(browser_config, test))
15:32:09     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 412, in runTest
15:32:09     INFO -      self.cleanupAndCheckForCrashes(browser_config, profile_dir)
15:32:09     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 182, in cleanupAndCheckForCrashes
15:32:09 CRITICAL -      raise talosError("error executing: '%s'" % subprocess.list2cmdline(cmd))
15:32:09 CRITICAL -  talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux/minidump_stackwalk /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:32:09 CRITICAL -  FAIL: Busted: tscrollr
15:32:09     INFO -  FAIL: error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux/minidump_stackwalk /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp /home/cltbld/talos-slave/test/build/symbols'
15:32:09    ERROR -  Traceback (most recent call last):
15:32:09     INFO -    File "/home/cltbld/talos-slave/test/build/venv/bin/talos", line 9, in <module>
15:32:09     INFO -      load_entry_point('talos==0.0', 'console_scripts', 'talos')()
15:32:09     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 295, in main
15:32:09     INFO -      run_tests(parser)
15:32:09     INFO -    File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 259, in run_tests
15:32:09     INFO -      raise e
15:32:09 CRITICAL -  talos.utils.talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux/minidump_stackwalk /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:32:09    ERROR - Return code: 1
Any idea why this is failing? Do we need a different environment for launching this process? Are we not running these tests on Linux? The minidump_stackwalk being called is the Linux version.
(Reporter)

Comment 3

5 years ago
I don't know why it's failing.
I do know that it's calling the linux minidump_stackwalk on linux, and linux64 on linux64, etc. so that's not it.

I'm thinking about parsing for this in mozharness, and if I detect it, doing some debugging later (verifying the binary and directory are there, permissions, etc)
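The parse-then-debug idea can be sketched as follows. This sketch assumes the log wording quoted above ("error executing: '...minidump_stackwalk ...'") and POSIX-style paths, because `shlex.split` would mangle Windows backslashes. `find_stackwalk_failure` and `diagnose` are hypothetical helpers, not mozharness APIs:

```python
import os
import re
import shlex

# Matches the talos failure line seen in the logs above; the wording may
# differ in other talos versions.
STACKWALK_ERROR = re.compile(r"error executing: '(?P<cmd>[^']*minidump_stackwalk[^']*)'")


def find_stackwalk_failure(log_lines):
    """Return the argv of the first failing minidump_stackwalk call, or None."""
    for line in log_lines:
        match = STACKWALK_ERROR.search(line)
        if match:
            return shlex.split(match.group("cmd"))
    return None


def diagnose(argv):
    """Check that the binary, dump and symbols dir exist and are usable."""
    stackwalk, dump_path, symbols_path = argv
    findings = []
    if not os.path.isfile(stackwalk):
        findings.append("missing binary: %s" % stackwalk)
    elif not os.access(stackwalk, os.X_OK):
        findings.append("binary not executable: %s" % stackwalk)
    if not os.path.isfile(dump_path):
        findings.append("missing dump: %s" % dump_path)
    if not os.path.isdir(symbols_path):
        findings.append("missing symbols dir: %s" % symbols_path)
    return findings
```

The checks run only after a match, so normal runs do no extra work.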
(Reporter)

Updated

5 years ago
Assignee: nobody → aki
(Reporter)

Comment 4

5 years ago
Created attachment 676422 [details] [diff] [review]
try to figure out what's going on here

We should remove this code when we figure out what's going on and fix it.
But this should help us figure out what's going on.
Attachment #676422 - Flags: review?(jhammel)
Comment on attachment 676422 [details] [diff] [review]
try to figure out what's going on here

Review of attachment 676422 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with one nit.

::: mozharness/mozilla/testing/talos.py
@@ +454,5 @@
>          self.return_code = self.run_command(command, cwd=self.workdir,
>                                              output_parser=parser)
> +        if parser.minidump_output:
> +            for item in parser.minidump_output:
> +                self.run_command(["ls", "-l", item])

I would like to qualify this output so we know what it is and why we see it.
Attachment #676422 - Flags: review?(jhammel) → review+
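The review nit asks for the `ls -l` output to be labelled. The sketch below uses plain Python, not the mozharness `run_command` API. It prints a header line saying why the listing appears, then one `ls -l`-like line per path, built with `os.stat` and `stat.filemode`. `describe_paths` and its label text are hypothetical:

```python
import os
import stat


def describe_paths(paths, reason="minidump_stackwalk failed; checking inputs"):
    """Return labelled, ls -l style lines for each path (hypothetical helper)."""
    lines = ["DEBUG (%s):" % reason]
    for path in paths:
        try:
            st = os.stat(path)
        except OSError as exc:
            # Report missing or unreadable paths instead of aborting.
            lines.append("  %s: cannot stat (%s)" % (path, exc.strerror))
            continue
        # Mode string, size in bytes, path, e.g. "-rwxr-xr-x 1234 /path"
        lines.append("  %s %d %s" % (stat.filemode(st.st_mode), st.st_size, path))
    return lines
```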
(Reporter)

Comment 6

5 years ago
Comment on attachment 676422 [details] [diff] [review]
try to figure out what's going on here

http://hg.mozilla.org/build/mozharness/rev/8de53d8a6437
Attachment #676422 - Flags: checked-in+
(Reporter)

Updated

5 years ago
See Also: → bug 745193

Comment 7

5 years ago
It would probably be a nice thing to have Talos be a little more informative about this as well.  I'll work on a patch here, though this will likely be a multi-step process unless I can reproduce (FWIW, I haven't seen this error either with mozharness or otherwise).

Comment 8

5 years ago
Created attachment 685312 [details] [diff] [review]
be a little more verbose about a few things

I can't say if this is sufficient to really diagnose the failure, but it's probably not a bad change to take
Attachment #685312 - Flags: review?(jmaher)
Comment on attachment 685312 [details] [diff] [review]
be a little more verbose about a few things

Review of attachment 685312 [details] [diff] [review]:
-----------------------------------------------------------------

this isn't much more verbose, but it cleans a lot of little things up.
Attachment #685312 - Flags: review?(jmaher) → review+
Since this is rare, I'm not sure what testing is needed; maybe a sanity check on Windows.

Comment 11

5 years ago
pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=ce2120177cde

While the patch isn't a ton more verbose, it should at least give some clue as to why it fails, and it ensures that the minidump_stackwalk binary and the symbols path given actually exist on disk.
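Comment 11 describes checking the binary and symbols path before the call. The sketch below takes that approach and also keeps the exit status and stderr if the process still fails. It is not the contents of e00b0636f9f3, which are not shown here. `stackwalk_or_explain` is hypothetical, and the argument order follows the command line quoted in the logs:

```python
import os
import subprocess


def stackwalk_or_explain(stackwalk, dump_path, symbols_path):
    """Run minidump_stackwalk, raising an error that says *why* it failed."""
    # Pre-flight checks: fail early with a specific message.
    for label, path, exists in (
        ("minidump_stackwalk binary", stackwalk, os.path.isfile),
        ("minidump file", dump_path, os.path.isfile),
        ("symbols path", symbols_path, os.path.isdir),
    ):
        if not exists(path):
            raise RuntimeError("%s not found: %s" % (label, path))
    proc = subprocess.run(
        [stackwalk, dump_path, symbols_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode != 0:
        # Keep the exit status and the tail of stderr for the log.
        raise RuntimeError(
            "minidump_stackwalk exited with %d: %s"
            % (proc.returncode, proc.stderr.strip()[-500:])
        )
    return proc.stdout
```

Truncating stderr to its last 500 characters is an arbitrary choice that keeps the log line readable.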

Comment 12

5 years ago
Try run for ce2120177cde is complete.
Detailed breakdown of the results available here:
    https://tbpl.mozilla.org/?tree=Try&rev=ce2120177cde
Results (out of 5 total builds):
    success: 5
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-ce2120177cde

Comment 13

5 years ago
pushed: http://hg.mozilla.org/build/talos/rev/e00b0636f9f3 ; hopefully when this hits production/cedar this will at least give us the next clue
(Reporter)

Comment 14

5 years ago
We haven't seen a crash recently.

We can try to force a crash (ted's crash page in a page load manifest?) or decide whether this actually blocks rollout.

Comment 15

5 years ago
This probably shouldn't block deployment if it is rare.
My impression is that what's rare are ongoing intermittent talos crashes and hangs, but that this is what happens every time a mozharness talos run crashes or hangs.

As bug 814698 says, talos-only crashes caused by newly landed patches happen around once a month. It would be an interesting gamble to see whether we would tell someone that they just can't land a patch because it crashes, though we cannot tell them where, or whether we would just hide some or all of talos on some or all platforms on some or all branches. I'd bet heavily on the latter for a frequent intermittent crash, especially in just one suite, but for permared? An interesting gamble.
(Reporter)

Comment 17

5 years ago
Not actively working on this.
Assignee: aki → nobody

Updated

5 years ago
Assignee: nobody → yshun

Comment 18

5 years ago
It seems that this is one of the last blockers to enabling talos via mozharness in production.
I was thinking of switching things on for the try branch first, once we verify the numbers on Cedar and fix this bug.
(Assignee)

Updated

5 years ago
Depends on: 892524
(Assignee)

Comment 19

5 years ago
Created attachment 774206 [details]
linux talos mozharness log

minidump stackwalk output on linux(64)
(Assignee)

Comment 20

5 years ago
Created attachment 774213 [details]
windows talos mozharness log

output for minidump stackwalk on winxp
(Assignee)

Comment 21

5 years ago
Created attachment 774217 [details]
mac talos mozharness log

I have posted 3 logs. I can see the minidump_stackwalk output when Firefox crashes on Linux and Windows, but not on Mac.

I commented out lines 209-210 of http://mxr.mozilla.org/build/source/talos/talos/ttest.py#209 so that I wouldn't get the error in bug 892524.
getting closer!

Updated

5 years ago
Attachment #774213 - Attachment mime type: text/x-log → text/plain

Updated

5 years ago
Attachment #774217 - Attachment mime type: text/x-log → text/plain

Updated

5 years ago
Attachment #774206 - Attachment mime type: text/x-log → text/plain
(Assignee)

Comment 23

5 years ago
Created attachment 774661 [details]
talos mozharness minidump_stackwalk output on mac when firefox crashes

I managed to crash firefox with https://code.google.com/p/crashme/ by executing this hack in the browser console:

    Components.utils.import("resource://crashme/modules/Crasher.jsm");
    Crasher.crash(0);

minidump_stackwalk was able to find the dmp file and printed the output in the log.

I wasn't able to get the output in https://bugzilla.mozilla.org/attachment.cgi?id=774217&action=edit because I was using kill -SEGV to crash Firefox. When Firefox is killed that way, no .dmp files are generated in the minidumps folder.
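Because a kill -SEGV crash leaves no dump behind, it helps to check whether a dump was written at all before suspecting minidump_stackwalk. Below is a minimal sketch that assumes the `<profile>/minidumps/*.dmp` layout seen in the "Found crashdump" log lines. `find_minidumps` is hypothetical:

```python
import glob
import os


def find_minidumps(profile_dir):
    """Return the sorted .dmp paths under <profile_dir>/minidumps (may be empty)."""
    pattern = os.path.join(profile_dir, "minidumps", "*.dmp")
    return sorted(glob.glob(pattern))
```

An empty result after a crash means there is nothing for minidump_stackwalk to process, which is what happened with kill -SEGV.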
(Assignee)

Updated

5 years ago
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
(Assignee)

Comment 24

5 years ago
We have the minidump output on mac, win and linux. Resolved fixed.
(Reporter)

Comment 25

5 years ago
\o/
Product: mozilla.org → Release Engineering

Updated

4 years ago
Component: General Automation → Mozharness