Closed
Bug 483111
Opened 16 years ago
Closed 15 years ago
Get stacks for try server unittest crashes
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: lsblakk, Assigned: nthomas)
References
Details
Attachments
(1 file)
8.75 KB,
patch
|
bhearsum
:
review+
nthomas
:
checked-in+
|
Details | Diff | Splinter Review |
This is a follow-up bug for bug 445611
Reporter | ||
Updated•16 years ago
|
Priority: -- → P3
Reporter | ||
Updated•15 years ago
|
Assignee: lsblakk → nobody
Component: Release Engineering → Release Engineering: Future
Reporter | ||
Comment 1•15 years ago
|
||
Looking for someone to take this.
Status: ASSIGNED → NEW
Component: Release Engineering: Future → Release Engineering
Assignee | ||
Comment 2•15 years ago
|
||
I've got other things to work on in try server land so will have a look.
Assignee: nobody → nthomas
Status: NEW → ASSIGNED
Priority: P3 → P2
Comment 3•15 years ago
|
||
Nick, do you have any news on this?
Unfortunatly i have a persistent crash on tryservers and i can't debug it :(
Assignee | ||
Comment 4•15 years ago
|
||
We have a few options here. We can go the quick and dirty approach, like this
http://pastebin.mozilla.org/674526
which is to adapt the exist factories to clone the repo with the stack walkers, build the symbols, and define MINIDUMP_STACKWALK in the environment. That gets us reasonably close to the unit tests on the Firefox tree, but leaves out leak thresholds (and possibly other stuff). There's a factory for each platform to modify, just linux is done in the pastebin.
I also tried to write a TryUnittestBuildFactory based on buildbotcustom.process.factory.UnittestBuildFactory. That wasn't passing the env changes on properly and I didn't get a chance to dig into it more. This approach is nicer because we use the one factory for all three platforms, and get leak thresholds, and generally sync up with the main unit testers. Still needs some work to make it function.
But in the meantime UnittestBuildFactory has been gutted of tests in favour of packaged unit tests. Given bug 493228 and bug 520227 it may better to take the quick and dirty fix now, and jump to
* optimized build also produces symbols and test packages, and maybe 'make check'
* unit test run is triggered by sendchange, using packaged unit test factory
* the time we save by not doing a unit compile could be spent doing a debug build, to get us leak checks on debug tests (overall a load increase)
* talos as usual
catlee and bhearsum, what say you ?
Assignee | ||
Comment 5•15 years ago
|
||
The quick and dirty approach is running on try staging now. Mak, if you push to try again and watch the "try" columns on the MozillaTest tree then you should hopefully get a stack. Staging gets behind quite quickly because it only has 1 slave per platform, so there may be a delay in producing a build.
Comment 6•15 years ago
|
||
(In reply to comment #4)
> But in the meantime UnittestBuildFactory has been gutted of tests in favour of
> packaged unit tests. Given bug 493228 and bug 520227 it may better to take the
> quick and dirty fix now, and jump to
> * optimized build also produces symbols and test packages, and maybe 'make
> check'
> * unit test run is triggered by sendchange, using packaged unit test factory
> * the time we save by not doing a unit compile could be spent doing a debug
> build, to get us leak checks on debug tests (overall a load increase)
> * talos as usual
That sounds great to me.
Assignee | ||
Comment 7•15 years ago
|
||
For each platform, this pulls the build/tools repo for the stackwalker, calls 'make buildsymbols' to create objdir/dist/crashreporter-symbols, sets MINIDUMP_STACKWALK in a copy of the unittest environment, and uses that environment for all the test suites. Also fixes up some use of OBJDIR along the way. I've verified that the repo is pulled OK, buildsymbols succeeds, and MINIDUMP_STACKWALK is being set for all unit suite invocations (even though it's probably useless on make check).
Having a problem with resolving the symbols in the stacks though. bz suggested a null-pointer dereference as an easy way to guarantee a crash (by inserting "*(char*)0 = 'x';" at http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsGenericElement.cpp#1782). After pushing that to try staging the resulting xpcshell crashes are correctly resolved (except on mac which misses the topmost call of the crashing thread). But all of ref/crashtest, and the mochi suites return output like this:
TEST-UNEXPECTED-FAIL | automation.py | application crashed (minidump found)
...
Crash reason: SIGSEGV
Crash address: 0x404eb3f0
Thread 0 (crashed)
0 libxul.so + 0x4db3f0
eip = 0x404eb3f0 esp = 0xbf875854 ebp = 0xbf87585c ebx = 0x40e37d5c
esi = 0x42848c60 edi = 0x41a7b840 eax = 0x42848c70 ecx = 0x41a7b840
edx = 0x00200001 efl = 0x00010202
1 libxul.so + 0x5a2b0d
eip = 0x405b2b0e esp = 0xbf875864 ebp = 0xbf87587c
2 libxul.so + 0x5a2c04
eip = 0x405b2c05 esp = 0xbf875884 ebp = 0xbf87589c
....
There's no complaint from build/automationutils.py::checkForCrashes about undefined SYMBOLS_PATH. I'm going to try not eating stderr when calling the stack walker to try to get some clues.
Comment 8•15 years ago
|
||
There was (is?) a bug in the ref/crashtest harness where the symbols path wasn't being converted to an absolute path to be passed along to the stack walker, which results in failure since the harness also changes its own working directory, invalidating any relative directory references.
Ted, would this explain the behaviour above? One way to check if you can run the ref/crashtest manually is to pass an absolute path to the symbols directory to the test harness.
Comment 9•15 years ago
|
||
There was a bug, but I believe I fixed all instances of it. It's possible there's a related bug, or I missed a spot.
Assignee | ||
Comment 10•15 years ago
|
||
For linux reftest we call 'make reftest'
in /builds/slave/sendchange-linux-unittest/mozilla/objdir
On crashing, we're looking for symbols at
/builds/slave/sendchange-linux-unittest/dist/crashreporter-symbols/libxul.so/F8CC3D6D8F1AF9564377BDAD9B4B367B0/libxul.so.sym
which is missing a mozilla/objdir before the dist/.
For mochitest-plain it was
/builds/slave/dist/crashreporter-symbols/libxul.so/F8CC3D6D8F1AF9564377BDAD9B4B367B0/libxul.so.sym
so that's missing sendchange-linux-unittest/mozilla/objdir before dist. Same problem on mac and windows.
Comment 11•15 years ago
|
||
The default path is preprocessed into automation.py:
http://mxr.mozilla.org/mozilla-central/source/build/automation.py.in#98
From here:
http://mxr.mozilla.org/mozilla-central/source/build/automation-build.mk#24
TARGET_DEPTH comes from the makefiles that include automation-build.mk, like here for reftest:
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/Makefile.in#61
We do try to normalize paths into a full path here (to fix that bug catlee mentioned):
http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/runreftest.py#127
So maybe we're mucking up the path somehow? Could you look in automation.py (it gets preprocessed to a variety of locations, but $objdir/_tests/reftest/ is the reftest copy) and see if that path looks good? (It should be a path relative to the directory it sits in.)
Comment 12•15 years ago
|
||
Of course, after typing that out and looking at the code, I realized the problem. When I landed a fix for bug 519194, ( http://hg.mozilla.org/mozilla-central/rev/0f0239688a76 ), I made us resolve the symbols path relative to the cwd. However, the SYMBOLS_PATH that gets preprocessed into automation.py is...a path relative to the *script dir*, so I fixed the packaged case and broke the in-tree case. Nice.
I filed bug 523330 on fixing that, so if your work here gets you stacks (even with bogus symbols), you're probably ok.
Assignee | ||
Comment 13•15 years ago
|
||
Comment on attachment 406985 [details] [diff] [review]
Get stacks, v1
My notes about this patch are in comment #7. Can't wait to rip all this duplication out.
Attachment #406985 -
Attachment description: WIP → Get stacks, v1
Attachment #406985 -
Flags: review?(bhearsum)
Comment 14•15 years ago
|
||
Comment on attachment 406985 [details] [diff] [review]
Get stacks, v1
>+ s(SetProperty, name="get toolsdir",
>+ command=['bash', '-c', 'pwd'],
>+ property='toolsdir',
>+ workdir='tools',
>+ ),
Does this work here? We discovered that we have to use 'pwd -W' for packaged unittests, because we run python directly, which doesn't convert the MSYS style path this provides to a windows one. I _think_ this is ok, just want to be sure.
r=bhearsum if it works, or if 'pwd -W' is needed.
Attachment #406985 -
Flags: review?(bhearsum) → review+
Assignee | ||
Comment 15•15 years ago
|
||
That's a good question. I'll do another run with my crash patch + the change from bug 523330.
Would wait to roll out this for 523330.
Depends on: 523330
Assignee | ||
Comment 16•15 years ago
|
||
the pwd is fine as is, I think because we're still calling the make target in this build.
Priority: P2 → P3
Whiteboard: waiting on blocker
Assignee | ||
Comment 17•15 years ago
|
||
Comment on attachment 406985 [details] [diff] [review]
Get stacks, v1
http://hg.mozilla.org/build/buildbot-configs/rev/5eb1dc43cd59
Attachment #406985 -
Flags: checked-in+
Assignee | ||
Comment 18•15 years ago
|
||
prod and staging servers reconfig'd.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: waiting on blocker
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•