Closed
Bug 542952
Opened 16 years ago
Closed 8 years ago
leak test build reports 'No symbols path given, can't process dump.' when a crash happens
Categories
(Testing :: General, defect, P5)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: sgautherie, Unassigned)
References
Details
(Keywords: regression, Whiteboard: [unittest])
Attachments
(1 file, 2 obsolete files)
879 bytes, patch | coop: review+
Moved here from bug 520707 comment 13:
{
Serge Gautherie (:sgautherie) 2010-01-21 06:49:12 PST
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1264014698.1264015053.2511.gz
Linux comm-central-trunk leak test build on 2010/01/20 11:11:38
No symbols path given, can't process dump.
I'm not sure what/where we need to do to get the symbol path.
}
And I'm not sure how to check this now that bug 520707 is fixed :-/
Flags: in-testsuite-
Comment 1•16 years ago
Ted, do you have an idea what we could be missing here?
Comment 2•16 years ago
This is crashing while running leaktest.py? You'll probably have to append --symbols-path=dist/crashreporter-symbols to the command line.
Comment 3•16 years ago
It's running leaktest.py through our buildbot harness, yes.
Is --symbols-path always good to be passed, or is it only needed if the symbols are in a non-default location?
Comment 4•16 years ago
It always needs to be passed; there's no default for it.
Comment 5•16 years ago
Interesting, as that means that Firefox leak test builds ought to show the same problem when they crash, and we probably need a general fix in buildbotcustom for that.
Comment 6•16 years ago
Moving to RelEng as it seems that the issue is that we generally always need to pass --symbols-path to leaktest.py to get useful output in case the build crashes.
Component: Build Config → Release Engineering
Flags: in-testsuite-
Product: SeaMonkey → mozilla.org
QA Contact: build-config → release
Version: Trunk → other
Updated•16 years ago
Assignee: nobody → ccooper
Status: NEW → ASSIGNED
Priority: -- → P2
Comment 7•16 years ago
Probably a dupe of bug 519195.
Comment 8•16 years ago
Ted,
Please correct me if I'm wrong.
--symbols-path should point to the _directory_ (dist/crashreporter-symbols) where firefox-*-symbols.txt exists, not the _file_.
Coop,
It seems the only place where leaktest is being called is the AliveTest step. I've added another optional parameter for symbolsPath and changed the factories accordingly. Please review.
Assignee: ccooper → raliiev
Attachment #428875 -
Flags: review?(ccooper)
Comment 9•16 years ago
Comment on attachment 428875 [details] [diff] [review]
Proposed fix
> workdir='build/%s/_leaktest' % self.mozillaObjdir,
>+ symbolsPath='build/%s/dist/crashreporter-symbols',
You're missing the actual variable to replace %s with, and that should be mozillaObjdir like in the line above, so make it that:
symbolsPath='build/%s/dist/crashreporter-symbols' % self.mozillaObjdir,
Same for the other symbolsPath= lines.
Comment 10•16 years ago
Yes, you want the directory.
Comment 11•16 years ago
(In reply to comment #9)
> (From update of attachment 428875 [details] [diff] [review])
> > workdir='build/%s/_leaktest' % self.mozillaObjdir,
> >+ symbolsPath='build/%s/dist/crashreporter-symbols',
Shouldn't that also be:
symbolsPath='../dist/crashreporter-symbols'
...given that the workdir is already under the objdir?
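A minimal sketch of the path arithmetic behind this question. The objdir name `objdir` is a placeholder for illustration and is not taken from the patch. It only shows that a path relative to the `_leaktest` workdir resolves to the same directory as the form rooted at `build/`:

```python
import posixpath

# Hypothetical objdir name, used only for illustration.
mozilla_objdir = "objdir"

workdir = "build/%s/_leaktest" % mozilla_objdir
rooted_form = "build/%s/dist/crashreporter-symbols" % mozilla_objdir
relative_form = "../dist/crashreporter-symbols"

# Resolve the relative form against the directory the step runs in.
resolved = posixpath.normpath(posixpath.join(workdir, relative_form))
same_directory = resolved == rooted_form
```

Since the step already runs with `workdir` as its current directory, the rooted form would be interpreted relative to `_leaktest` and point at the wrong place, which is presumably why the relative form is suggested.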
Comment 12•16 years ago
Coop, you're probably right. It looks to me like Rail didn't actually test this patch; it should probably run through staging to make sure it basically works.
The real test will probably be when it's in production and hits a crash, which seems to still happen intermittently on SeaMonkey trunk, even if the frequency has been greatly reduced recently.
Comment 13•16 years ago
Comment on attachment 428875 [details] [diff] [review]
Proposed fix
A few things:
* AFAICT --symbols-path is an arg to automation.py rather than leaktest.py, so it needs to go in the extraArgs, i.e. after the '--'
* we probably want to set the symbolsPath every time we run leaktest.py, so I would prefer having it default to being set so that we would have to explicitly turn it off when adding steps rather than turning it on every time.
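The two points above can be sketched as follows. This is not the buildbotcustom code: `build_leaktest_command` and `DEFAULT_SYMBOLS_PATH` are hypothetical names, and the sketch only illustrates placing the flag after the `--` separator and making it opt-out rather than opt-in:

```python
# Assumed default, relative to the _leaktest workdir.
DEFAULT_SYMBOLS_PATH = "../dist/crashreporter-symbols"


def build_leaktest_command(extra_args=None, symbols_path=DEFAULT_SYMBOLS_PATH):
    """Return the argv for a leaktest.py run (hypothetical helper)."""
    # Everything after '--' is forwarded as extra arguments.
    command = ["python", "leaktest.py", "--"]
    command.extend(extra_args or [])
    # Set by default; a caller must pass symbols_path=None to turn it off.
    if symbols_path is not None:
        command.extend(["--symbols-path", symbols_path])
    return command


default_command = build_leaktest_command(["--trace-malloc", "malloc.log"])
opted_out_command = build_leaktest_command(symbols_path=None)
```

With this shape, adding a new step gets the symbols path without any extra work, and only a step that must not have it needs to say so.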
Attachment #428875 -
Flags: review?(ccooper) → review-
Comment 14•16 years ago
If you want to test this in staging, you could run a leaktest run on Linux, then kill -SEGV the browser process, which should trigger Breakpad and give you a stack trace.
Comment 15•16 years ago
At least I'm on the right track. :)
I had no idea how to test it. Ted, thanks for the tip.
I'll test it on staging tomorrow and be back with the results.
Thanks for the comments.
Updated•16 years ago
Summary: [SeaMonkey] 'Linux comm-central-trunk leak test build' reports 'No symbols path given, can't process dump.' when a crash happens → leak test build reports 'No symbols path given, can't process dump.' when a crash happens
Comment 16•16 years ago
Coop, could you review the following patch?
Still trying to reproduce, but the only message I get after "kill -SEGV pid" is
TEST-UNEXPECTED-FAIL | automation.py | Exited with code -11 during test run
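For reference, the "-11" in that message is how Python's subprocess module reports a child killed by signal 11 (SIGSEGV on Linux). A minimal POSIX-only sketch, using a sleeping Python child as a stand-in for the browser process:

```python
import signal
import subprocess
import sys

# Stand-in for the browser: a child process that just sleeps.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Equivalent of `kill -SEGV <pid>` from a shell.
child.send_signal(signal.SIGSEGV)
exit_code = child.wait()

# On POSIX, subprocess reports "killed by signal N" as -N.
killed_by_segv = exit_code == -signal.SIGSEGV
```

So the exit code alone confirms the signal was delivered; whether a stack trace follows depends on the crash dump being found and processed, which this sketch does not cover.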
Attachment #428875 -
Attachment is obsolete: true
Attachment #429198 -
Flags: review?(ccooper)
Comment 17•16 years ago
Comment on attachment 429198 [details] [diff] [review]
Proposed fix
Hrmmm, sorry. On second inspection, we always want to set --symbols-path even when extraArgs is None.
Maybe we should make extraArgs=[] the default? We'll have to check any AliveTest consumers to make sure they conform if we do.
Attachment #429198 -
Flags: review?(ccooper) → review-
Comment 18•16 years ago
> Hrmmm, sorry. On second inspection, we always want to set --symbols-path even
> when extraArgs is None.
Argh... Should have slept more. :)
> Maybe we should make extraArgs=[] the default? We'll have to check any
> AliveTest consumers to make sure they conform if we do.
I'd prefer to use None, [] may become stateful in some cases.
Please take a look at this version.
Attachment #429198 -
Attachment is obsolete: true
Attachment #429475 -
Flags: review?(ccooper)
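The "stateful" concern about a `[]` default is the usual Python mutable-default pitfall: the default list is created once, when the function is defined, and shared by every call that relies on it. A small sketch with hypothetical function names (not the AliveTest code):

```python
def add_symbols_shared(extra_args=[]):
    # The default list is created once and reused across calls.
    extra_args.append("--symbols-path")
    return extra_args


def add_symbols_safe(extra_args=None):
    # A fresh list is built on every call; the caller's list is not mutated.
    args = list(extra_args or [])
    args.append("--symbols-path")
    return args


first = add_symbols_shared()
second = add_symbols_shared()  # same list object, now holds the flag twice

safe_first = add_symbols_safe()
safe_second = add_symbols_safe()  # independent list with a single flag
```

Using `None` as the default and building the list inside the function avoids the accumulation, at the cost of one extra line.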
Comment 19•16 years ago
Comment on attachment 429475 [details] [diff] [review]
Proposed fix
Looks good.
Attachment #429475 -
Flags: review?(ccooper) → review+
Comment 20•16 years ago
Comment on attachment 429475 [details] [diff] [review]
Proposed fix
http://hg.mozilla.org/build/buildbotcustom/rev/027e10fcbf66
Attachment #429475 -
Flags: checked-in+
Comment 21•16 years ago
pm* reconfig-ed with this change.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 22•16 years ago
Am I right that the end of http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1267465601.1267466624.27337.gz&fulltext=1 is the output we actually want to see when it crashes, and that this bug was fixed successfully?
Comment 23•16 years ago
No, that's just our internal stack walking code from hitting an assertion.
Comment 24•16 years ago
OK, then we'll need to wait on a real crash that isn't from an assertion to be able to confirm this fix.
Comment 25•16 years ago
I think the change here is causing the win32 debug builds to time out doing the tracemalloc alive test (bug 549422). Rather than trying to guess a new timeout, I'm going to back out the change and we can revisit in staging (including figuring out if spending more than an hour on this test is a good use of resources).
Comment 26•16 years ago
Comment on attachment 429475 [details] [diff] [review]
Proposed fix
http://hg.mozilla.org/build/buildbotcustom/rev/d4aa624ca025
Attachment #429475 -
Flags: checked-in+ → checked-in-
Comment 27•16 years ago
(In reply to comment #25)
> I think the change here is causing the win32 debug builds to timeout doing the
> tracemalloc alive test (bug 549422). Rather than trying to guess a new timeout
> I'm going to back out the change and we can revisit in staging (including
> figuring out if spending more than an hour on this test is a good use of
> resources).
Rail: can you try to reproduce this in staging tomorrow, please?
Comment 28•16 years ago
As a note, SeaMonkey doesn't show any timeouts on trunk boxes, not even on Windows.
Comment 29•16 years ago
Another interesting observation. Trying to reproduce this timeout I've got this:
MINIDUMP_STACKWALK binary not found: /e/builds/moz2_slave/mozilla-central-win32-debug/tools/breakpad/win32/minidump_stackwalk.exe
minidump_stackwalk.exe exists in that directory.
Should we use Windows style paths for MINIDUMP_STACKWALK env variable?
Comment 30•16 years ago
Rail: that's an MSYS path, which is ok as long as you execute things via a msys shell. If you run directly from a Windows shell, or execute a native Windows program (python) that executes the command without going through a msys shell, then it won't translate that path for you.
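If the variable had to be usable from a native Windows program, the path would need translating first. A minimal sketch under the assumption that only the simple drive-letter form (`/e/...`) matters; `msys_to_windows_path` is a hypothetical helper, and real MSYS installations also map other mount points that this does not handle:

```python
import re


def msys_to_windows_path(path):
    """Translate a '/<drive>/rest' MSYS path to a native Windows path.

    Hypothetical helper for illustration. Paths that do not start with a
    single drive-letter component are returned unchanged.
    """
    match = re.match(r"^/([A-Za-z])(/.*)?$", path)
    if not match:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")


native = msys_to_windows_path("/e/builds/moz2_slave")
```

Going through an MSYS shell, as described above, avoids the need for any such translation.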
Comment 31•16 years ago
(In reply to comment #27)
> Rail: can you try to reproduce this in staging tomorrow, please?
Yes, I can reproduce this. Alive test #5 times out.
I changed the timeout value to 5400 and the test looks fine now. It takes more than 4000 sec on Windows now. :(
Should I just adjust the timeout or should we investigate the root cause?
Comment 32•16 years ago
(In reply to comment #31)
> Yes, I can reproduce this. Alive test #5 times out.
>
> I changed the timeout value to 5400 and the test looks fine now. It takes more
> than 4000 sec on Windows now. :(
>
> Should I just adjust the timeout or should we investigate the root cause?
How long does AliveTest #5 take to complete on Linux and Mac? 4000 sec seems long, but if it's in the same ballpark as the others I'm less worried.
Also, does AliveTest #5 take 4000 sec regardless of whether the symbols-path is specified, i.e. was there an unrelated code change that has caused AliveTest to start taking longer on Windows?
Comment 33•16 years ago
(In reply to comment #32)
> How long does AliveTest #5 take to complete on Linux and Mac?
Usually AliveTest #5 takes 1-2 minutes on Linux and Mac.
> Also, does AliveTest #5 take 4000 sec regardless of whether the symbols-path is
> specified, i.e. was there an unrelated code change that has caused AliveTest to
> start taking longer on Windows?
Not sure (can test). Additionally, it looks like the given parameter is not passed to the right place; see bug 549897.
Comment 34•16 years ago
(In reply to comment #32)
> Also, does AliveTest #5 take 4000 sec regardless of whether the symbols-path is
> specified, i.e. was there an unrelated code change that has caused AliveTest to
> start taking longer on Windows?
Just tested: elapsedTime=6214.438000.
Looks like the long wait time isn't closely connected to the patch.
Comment 35•16 years ago
(In reply to comment #32)
> How long does AliveTest #5 take to complete on Linux and Mac? 4000 sec seems
> long, but if it's in the same ballpark as the others I'm less worried.
I filed bug 549561 because Windows is much much slower than the other platforms.
(In reply to comment #34)
> Looks like the long wait time isn't closely connected to the patch.
Have to disagree here. All branches started timing out, and went green again, in close correlation to landing and backing out attachment 429475 [details] [diff] [review]. That includes relatively inactive branches like 1.9.1 and 1.9.2.
Comment 36•16 years ago
(In reply to comment #35)
> Have to disagree here. All branches started timing out, and went green again,
> in close correlation to landing and backing out attachment 429475 [details] [diff] [review]. That
> includes relatively inactive branches like 1.9.1 and 1.9.2.
Will test with 1.9.2 tomorrow.
Comment 37•16 years ago
SeaMonkey trunk needs about 1000s for alive test 5, but we're doing shared builds. Might the fact that Firefox does libxul builds instead make a difference here?
Comment 38•16 years ago
1.9.2 Win result:
python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks sdleak.log --symbols-path ../dist/crashreporter-symbols
.....
program finished with exit code 0
elapsedTime=1309.438000
Comment 39•16 years ago
1.9.1 Win result:
python leaktest.py -- --trace-malloc malloc.log --shutdown-leaks sdleak.log --symbols-path ../dist/crashreporter-symbols
...
program finished with exit code 0
elapsedTime=774.328000
Comment 40•16 years ago
Are those results all from VMs or hardware? I.e., are we comparing apples with apples?
Comment 41•16 years ago
SeaMonkey result is from an ESX VM, but I guess the FF ones should be more interesting.
Comment 42•16 years ago
(In reply to comment #40)
> Are those results all from VMs or hardware? I.e., are we comparing apples with
> apples?
All of the builds were built on the same slave (win32-slave04).
Comment 43•16 years ago
Here are some timings from staging-master.
== Without the patch ==
mozilla-central
---------------
slave: win32-04, 6288 sec, 6303 sec
slave: win32-38, 4068 sec, 3976 sec
mozilla-1.9.2
-------------
slave: win32-04, 1281 sec
slave: win32-38, 907 sec, 898 sec
places
------
slave: win32-38, 4009 sec
== With the patch ==
mozilla-central
---------------
slave: win32-38, 4151 sec, 4357 sec, 4626 sec
mozilla-1.9.2
-------------
slave: win32-04, 1321 sec
slave: win32-38, 910 sec
places
------
slave: win32-04, 5841 sec
Looks like the difference between runs with and without the patch is not notable.
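A rough way to put numbers on that comparison, using only the timings listed above and only the branch/slave pairs that were measured both with and without the patch (places and mozilla-central on win32-04 are left out because they lack a matching run). This is an illustrative sketch, not a statistical analysis; the sample sizes are far too small for one:

```python
from statistics import mean

# (branch, slave) -> elapsed seconds, copied from the listing above.
without_patch = {
    ("mozilla-central", "win32-38"): [4068, 3976],
    ("mozilla-1.9.2", "win32-04"): [1281],
    ("mozilla-1.9.2", "win32-38"): [907, 898],
}
with_patch = {
    ("mozilla-central", "win32-38"): [4151, 4357, 4626],
    ("mozilla-1.9.2", "win32-04"): [1321],
    ("mozilla-1.9.2", "win32-38"): [910],
}

# Relative change of the mean run time for each comparable configuration.
relative_change = {
    key: mean(with_patch[key]) / mean(without_patch[key]) - 1.0
    for key in without_patch
    if key in with_patch
}

for (branch, slave), change in sorted(relative_change.items()):
    print("%-16s %-9s %+.1f%%" % (branch, slave, change * 100))
```

The per-configuration changes stay in the single-digit percent range, which is small next to the difference between the two slaves on the same branch.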
Comment 44•16 years ago
Guess we should try landing this again, and be ready with a timeout bump just in case we need it. Rail says he used 7200 for the runs in comment #43, thinking that it controlled the total step length rather than how long to wait after the last bit of output.
Comment 45•16 years ago
After some investigation playing with timeouts I can get the following situation:
http://img697.imageshack.us/img697/5260/screenshotinm.png
(browser window with URL of --symbols-path).
On my local machine I've tried to run the same steps by hand and found that leaktest.py doesn't pass --symbols-path parameter properly.
leaktest.py passes all extra arguments (extraArgs) to Automation.runApp
http://hg.mozilla.org/mozilla-central/file/5e9d5bbf7596/build/leaktest.py.in#l83
while Automation.runApp expects symbolsPath as a separate parameter:
http://hg.mozilla.org/mozilla-central/file/5e9d5bbf7596/build/automation.py.in#l727
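One possible shape for a fix, sketched under assumptions: `split_symbols_path` is a hypothetical helper, and the real leaktest.py and Automation.runApp signatures are not reproduced here. The idea is to pull --symbols-path out of the extra arguments so it can be handed over as its own parameter instead of being forwarded to the application (which would explain the browser opening it as a URL):

```python
def split_symbols_path(extra_args):
    """Separate --symbols-path from the other extra arguments.

    Hypothetical helper: returns (symbols_path, remaining_args). Both the
    '--symbols-path PATH' and '--symbols-path=PATH' spellings are accepted.
    """
    symbols_path = None
    remaining = []
    args = iter(extra_args)
    for arg in args:
        if arg == "--symbols-path":
            # The value is the next argument, if there is one.
            symbols_path = next(args, None)
        elif arg.startswith("--symbols-path="):
            symbols_path = arg.split("=", 1)[1]
        else:
            remaining.append(arg)
    return symbols_path, remaining


symbols_path, app_args = split_symbols_path([
    "--trace-malloc", "malloc.log",
    "--shutdown-leaks", "sdleak.log",
    "--symbols-path", "../dist/crashreporter-symbols",
])
```

The caller would then pass `app_args` on as the application's extra arguments and supply `symbols_path` separately to whatever runs the application.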
Updated•16 years ago
Priority: P2 → P4
Updated•16 years ago
Assignee: rail → nobody
Priority: P4 → P5
Comment 46•16 years ago
(In reply to comment #45)
> On my local machine I've tried to run the same steps by hand and found that
> leaktest.py doesn't pass --symbols-path parameter properly.
What's the takeaway here? Should we be reassigning to someone to fix leaktest.py?
Whiteboard: [unittest]
Comment 47•16 years ago
Yeah, it needs to be fixed for this to work.
Component: Release Engineering → General
Product: mozilla.org → Testing
QA Contact: release → general
Version: other → Trunk
Comment 49•8 years ago
Mass closing bugs with no activity in 2+ years. If this bug is important to you, please re-open.
Status: REOPENED → RESOLVED
Closed: 16 years ago → 8 years ago
Resolution: --- → WONTFIX