Closed
Bug 804385
Opened 12 years ago
Closed 11 years ago
mozharness talos has issues with minidump_stackwalk
Categories
(Release Engineering :: Applications: MozharnessCore, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: jyeo)
References
Details
(Whiteboard: [mozharness][talos])
Attachments
(6 files)
1.88 KB, patch | jmaher: review+ | mozilla: checked-in+
7.44 KB, patch | jmaher: review+
184.72 KB, text/plain
774.96 KB, text/plain
199.52 KB, text/plain
398.82 KB, text/plain
e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=16357091&tree=Cedar&full=1
15:27:33 INFO - NOISE: exception getting privileged access, defaulting to XUL_FENNEC
15:27:38 INFO - NOISE: Found crashdump: /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp
15:27:39 INFO - Failed tp5n:
15:27:39 INFO - Stopped Mon, 22 Oct 2012 15:27:39
15:27:39 ERROR - Traceback (most recent call last):
15:27:39 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 250, in run_tests
15:27:39 INFO - talos_results.add(mytest.runTest(browser_config, test))
15:27:39 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 412, in runTest
15:27:39 INFO - self.cleanupAndCheckForCrashes(browser_config, profile_dir)
15:27:39 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 182, in cleanupAndCheckForCrashes
15:27:39 CRITICAL - raise talosError("error executing: '%s'" % subprocess.list2cmdline(cmd))
15:27:39 CRITICAL - talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux64/minidump_stackwalk /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:27:39 CRITICAL - FAIL: Busted: tp5n
15:27:39 INFO - FAIL: error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux64/minidump_stackwalk /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp /home/cltbld/talos-slave/test/build/symbols'
15:27:39 ERROR - Traceback (most recent call last):
15:27:39 INFO - File "/home/cltbld/talos-slave/test/build/venv/bin/talos", line 9, in <module>
15:27:39 INFO - load_entry_point('talos==0.0', 'console_scripts', 'talos')()
15:27:39 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 295, in main
15:27:39 INFO - run_tests(parser)
15:27:39 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 259, in run_tests
15:27:39 INFO - raise e
15:27:39 CRITICAL - talos.utils.talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux64/minidump_stackwalk /tmp/tmp4OAlyL/profile/minidumps/4dbd15ec-1f53-8926-382c5348-0a6a5952.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:27:39 ERROR - Return code: 1
Reporter | ||
Comment 1•12 years ago
|
||
Same for linux32:
15:12:02 INFO - DEBUG: created profile
15:32:04 INFO - NOISE: exception getting privileged access, defaulting to XUL_FENNEC
15:32:09 INFO - NOISE: Found crashdump: /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp
15:32:09 INFO - Failed tscrollr:
15:32:09 INFO - Stopped Mon, 22 Oct 2012 15:32:09
15:32:09 ERROR - Traceback (most recent call last):
15:32:09 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 250, in run_tests
15:32:09 INFO - talos_results.add(mytest.runTest(browser_config, test))
15:32:09 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 412, in runTest
15:32:09 INFO - self.cleanupAndCheckForCrashes(browser_config, profile_dir)
15:32:09 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/ttest.py", line 182, in cleanupAndCheckForCrashes
15:32:09 CRITICAL - raise talosError("error executing: '%s'" % subprocess.list2cmdline(cmd))
15:32:09 CRITICAL - talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux/minidump_stackwalk /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:32:09 CRITICAL - FAIL: Busted: tscrollr
15:32:09 INFO - FAIL: error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux/minidump_stackwalk /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp /home/cltbld/talos-slave/test/build/symbols'
15:32:09 ERROR - Traceback (most recent call last):
15:32:09 INFO - File "/home/cltbld/talos-slave/test/build/venv/bin/talos", line 9, in <module>
15:32:09 INFO - load_entry_point('talos==0.0', 'console_scripts', 'talos')()
15:32:09 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 295, in main
15:32:09 INFO - run_tests(parser)
15:32:09 INFO - File "/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/run_tests.py", line 259, in run_tests
15:32:09 INFO - raise e
15:32:09 CRITICAL - talos.utils.talosError: "error executing: '/home/cltbld/talos-slave/test/build/venv/lib/python2.6/site-packages/talos/breakpad/linux/minidump_stackwalk /tmp/tmpORRyKf/profile/minidumps/752b6547-0d26-efa5-61862122-6aff5a42.dmp /home/cltbld/talos-slave/test/build/symbols'"
15:32:09 ERROR - Return code: 1
Comment 2•12 years ago
|
||
Any idea why this is failing? Do we need a different environment for launching this process? Are we not running these tests on Linux? The minidump_stackwalk in the log is the linux version.
Reporter | ||
Comment 3•12 years ago
|
||
I don't know why it's failing. I do know that it's calling the linux minidump_stackwalk on linux, the linux64 one on linux64, and so on, so that's not it. I'm thinking about parsing for this error in mozharness and, if I detect it, doing some debugging afterwards (verifying that the binary and directory are there, checking permissions, etc.).
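A rough sketch of the idea in comment 3, for illustration only: scan the log for the "error executing" message, pull out the paths in the failed minidump_stackwalk command, and report whether each exists and what its permissions are. The names `ERROR_RE`, `find_minidump_paths` and `describe_path` are hypothetical and are not part of mozharness or talos; the pattern assumes the message format shown in the logs above and POSIX-style paths without spaces.

```python
import os
import re
import shlex

# Hypothetical pattern, based only on the "error executing: '<cmd>'" text in the logs above.
ERROR_RE = re.compile(r"error executing: '([^']*minidump_stackwalk[^']*)'")


def find_minidump_paths(log_lines):
    """Return the unique paths named in failed minidump_stackwalk commands."""
    paths = []
    for line in log_lines:
        match = ERROR_RE.search(line)
        if not match:
            continue
        # The command is: <binary> <dump file> <symbols dir>
        for item in shlex.split(match.group(1)):
            if item not in paths:
                paths.append(item)
    return paths


def describe_path(path):
    """Summarize existence, type, mode and executability of one path."""
    if not os.path.exists(path):
        return "%s: missing" % path
    kind = "dir" if os.path.isdir(path) else "file"
    mode = oct(os.stat(path).st_mode & 0o777)
    return "%s: %s, mode %s, executable=%s" % (
        path, kind, mode, os.access(path, os.X_OK))
```

Calling `describe_path` on each result of `find_minidump_paths` gives one line per path, which is roughly the information `ls -l` would provide without depending on an external command.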
Reporter | ||
Updated•12 years ago
|
Assignee: nobody → aki
Reporter | ||
Comment 4•12 years ago
|
||
We should remove this code when we figure out what's going on and fix it. But this should help us figure out what's going on.
Attachment #676422 -
Flags: review?(jhammel)
Comment 5•12 years ago
|
||
Comment on attachment 676422
try to figure out what's going on here

Review of attachment 676422:
-----------------------------------------------------------------

r+ with one nit.

::: mozharness/mozilla/testing/talos.py
@@ +454,5 @@
>          self.return_code = self.run_command(command, cwd=self.workdir,
>                                              output_parser=parser)
> +        if parser.minidump_output:
> +            for item in parser.minidump_output:
> +                self.run_command(["ls", "-l", item])

I would like to qualify this output so we know what it is and why we see it.
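One way the nit could be addressed, sketched as standalone code rather than as the actual mozharness change: log a line explaining why each listing is produced before running `ls -l`. The function name `list_minidump_items`, the logger name and the `run` parameter are assumptions made for this example; the real patch uses `self.run_command` inside the talos script class.

```python
import logging
import subprocess

log = logging.getLogger("talos_debug")  # hypothetical logger name


def list_minidump_items(items, run=subprocess.call):
    """Explain each listing in the log, then run `ls -l` on the item.

    `run` is injectable so the function can be exercised without a shell.
    Returns the commands that were issued.
    """
    commands = []
    for item in items:
        log.info("minidump_stackwalk failed; listing %s to check that it "
                 "exists and has sane permissions", item)
        cmd = ["ls", "-l", item]
        run(cmd)
        commands.append(cmd)
    return commands
```

With a message like this in front of each listing, someone reading the log later can tell that the `ls -l` output is debugging information for this bug and not part of the test itself.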
Attachment #676422 -
Flags: review?(jhammel) → review+
Reporter | ||
Comment 6•12 years ago
|
||
Comment on attachment 676422
try to figure out what's going on here

http://hg.mozilla.org/build/mozharness/rev/8de53d8a6437
Attachment #676422 -
Flags: checked-in+
Comment 7•12 years ago
|
||
It would probably be a nice thing to have Talos be a little more informative about this as well. I'll work on a patch here, though this will likely be a multi-step process unless I can reproduce (FWIW, I haven't seen this error either with mozharness or otherwise).
Comment 8•12 years ago
|
||
I can't say if this is sufficient to really diagnose the failure, but it's probably not a bad change to take.
Attachment #685312 -
Flags: review?(jmaher)
Comment 9•12 years ago
|
||
Comment on attachment 685312
be a little more verbose about a few things

Review of attachment 685312:
-----------------------------------------------------------------

This isn't much more verbose, but it cleans a lot of little things up.
Attachment #685312 -
Flags: review?(jmaher) → review+
Comment 10•12 years ago
|
||
Since this is rare, I am not sure what testing is needed; maybe a sanity check on Windows.
Comment 11•12 years ago
|
||
Pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=ce2120177cde
While the patch isn't a ton more verbose, it should at least give some sort of clue as to why it fails, and it ensures that the minidump_stackwalk binary and the symbols path given are actually found on disk.
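The on-disk check described here might look something like the sketch below. It is not the code from the talos patch; `check_stackwalk_inputs` and its messages are made up for illustration, and the only assumption taken from the thread is that the binary is a file and the symbols path is a directory.

```python
import os


def check_stackwalk_inputs(stackwalk_binary, symbols_path):
    """Return a list of human-readable problems; empty if nothing looks wrong."""
    problems = []
    if not os.path.isfile(stackwalk_binary):
        problems.append("minidump_stackwalk not found: %s" % stackwalk_binary)
    elif not os.access(stackwalk_binary, os.X_OK):
        problems.append("minidump_stackwalk not executable: %s" % stackwalk_binary)
    if not os.path.isdir(symbols_path):
        problems.append("symbols path not found: %s" % symbols_path)
    return problems
```

Running a check like this before invoking the binary turns a bare "error executing" failure into a message that names the missing or unusable path.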
Comment 12•12 years ago
|
||
Try run for ce2120177cde is complete. Detailed breakdown of the results available here: https://tbpl.mozilla.org/?tree=Try&rev=ce2120177cde Results (out of 5 total builds): success: 5 Builds (or logs if builds failed) available at: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jhammel@mozilla.com-ce2120177cde
Comment 13•12 years ago
|
||
Pushed: http://hg.mozilla.org/build/talos/rev/e00b0636f9f3 ; hopefully when this hits production/Cedar it will at least give us the next clue.
Reporter | ||
Comment 14•11 years ago
|
||
We haven't seen a crash recently. We can try to force a crash (ted's crash page in a page load manifest?) or decide whether this actually blocks rollout.
Comment 15•11 years ago
|
||
This probably shouldn't block deployment if it is rare.
Comment 16•11 years ago
|
||
My impression is that what's rare are ongoing intermittent talos crashes and hangs, but that this is what happens every time a mozharness talos run crashes or hangs. As bug 814698 says, talos-only crashes caused by newly landed patches happen around once a month. It would be an interesting gamble to see whether we would tell someone that they just can't land a patch because it crashes, though we cannot tell them where, or whether we would just hide some or all of talos on some or all platforms on some or all branches. I'd bet heavily on the latter for a frequent intermittent crash, especially in just one suite, but for permared? An interesting gamble.
Updated•11 years ago
|
Assignee: nobody → yshun
Comment 18•11 years ago
|
||
It seems that this is one of the last things to enable talos mozharness in production. I was thinking of aiming for switching things on the try branch first, once we verify the numbers on Cedar and fix this bug.
Assignee | ||
Comment 19•11 years ago
|
||
minidump stackwalk output on linux(64)
Assignee | ||
Comment 20•11 years ago
|
||
output for minidump stackwalk on winxp
Assignee | ||
Comment 21•11 years ago
|
||
I have posted 3 logs. I can see the output for minidump_stackwalk when Firefox crashes on Linux and Windows, but not on Mac. I commented out lines 209 to 210 of http://mxr.mozilla.org/build/source/talos/talos/ttest.py#209 so that I wouldn't get the error in bug 892524.
Comment 22•11 years ago
|
||
getting closer!
Updated•11 years ago
|
Attachment #774213 -
Attachment mime type: text/x-log → text/plain
Updated•11 years ago
|
Attachment #774217 -
Attachment mime type: text/x-log → text/plain
Updated•11 years ago
|
Attachment #774206 -
Attachment mime type: text/x-log → text/plain
Assignee | ||
Comment 23•11 years ago
|
||
I managed to crash Firefox with https://code.google.com/p/crashme/ by executing this hack in the browser console:
Components.utils.import("resource://crashme/modules/Crasher.jsm");
Crasher.crash(0);
minidump_stackwalk was able to find the .dmp file and printed the output in the log. I wasn't able to get the output in https://bugzilla.mozilla.org/attachment.cgi?id=774217&action=edit because I was using kill -SEGV to crash Firefox. When I kill Firefox that way, no .dmp files are generated in the minidumps folder.
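To confirm whether a given way of crashing the browser actually produced dumps, one could list the .dmp files in the profile's minidumps folder before minidump_stackwalk is run. A small sketch, assuming the `<profile>/minidumps/*.dmp` layout visible in the logs earlier in this bug; `find_minidumps` is a hypothetical helper, not a talos function.

```python
import glob
import os


def find_minidumps(profile_dir):
    """Return the sorted .dmp paths under <profile_dir>/minidumps.

    An empty list means no crash dump was written, as with kill -SEGV above.
    """
    pattern = os.path.join(profile_dir, "minidumps", "*.dmp")
    return sorted(glob.glob(pattern))
```

An empty result after a forced crash points at dump generation, not at minidump_stackwalk, as the step that needs attention.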
Assignee | ||
Updated•11 years ago
|
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 24•11 years ago
|
||
We have the minidump output on mac, win and linux. Resolved fixed.
Reporter | ||
Comment 25•11 years ago
|
||
\o/
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Updated•10 years ago
|
Component: General Automation → Mozharness