Closed
Bug 1276011
Opened 8 years ago
Closed 8 years ago
Marionette crashes aren't being properly symbolicated
Categories
(Testing :: Marionette Client and Harness, defect)
Tracking
(firefox47 unaffected, firefox48 unaffected, firefox49 fixed, firefox-esr45 unaffected, firefox50 fixed)
RESOLVED
FIXED
mozilla50
Tracking | Status | |
---|---|---|
firefox47 | --- | unaffected |
firefox48 | --- | unaffected |
firefox49 | --- | fixed |
firefox-esr45 | --- | unaffected |
firefox50 | --- | fixed |
People
(Reporter: RyanVM, Assigned: impossibus)
References
Details
(Keywords: intermittent-failure, regression)
Attachments
(1 file)
I'm pretty sure this used to work, so that's swell. I wanted to throw Taskcluster under the bus for it, but then OSX had to go and ruin that theory. https://treeherder.mozilla.org/logviewer.html#?job_id=3950116&repo=mozilla-central#L1312 https://treeherder.mozilla.org/logviewer.html#?job_id=28754506&repo=mozilla-inbound#L1314
Flags: needinfo?(dburns)
Comment 1•8 years ago
|
||
Ted, I don't even know where to start to fix this, some guidance please.
Flags: needinfo?(dburns) → needinfo?(ted)
Reporter | ||
Updated•8 years ago
|
Summary: Mn-e10s crashes aren't being properly symbolicated → Marionette crashes aren't being properly symbolicated
Comment 3•8 years ago
|
||
So for the mozilla-inbound case as Ryan filed above I cannot see that we unzip the downloaded symbols in the marionette harness script. Instead we pass the zip file path to mozcrash. But mozcrash does only unzip archives in case it downloads those itself: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#174
Comment 4•8 years ago
|
||
mozharness looks like it's downloading and unzipping the symbols properly: ``` 01:14:31 INFO - Running command: ['unzip', '-q', '-o', '/home/worker/workspace/build/target.crashreporter-symbols.zip', '-d', '/home/worker/workspace/build/symbols'] 01:14:31 INFO - Copy/paste: unzip -q -o /home/worker/workspace/build/target.crashreporter-symbols.zip -d /home/worker/workspace/build/symbols ``` And the marionette runtests.py is being invoked with `--symbols-path=/home/worker/workspace/build/symbols`, which looks right. However! Something is not passing that symbols path down to mozcrash properly, if you look at the minidump_stackwalk invocation it puts in the log: ``` 01:19:39 INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpEAKbwF.mozrunner/minidumps/2ee0ace2-b7d1-8fab-65952a72-05b2565e.dmp /tmp/tmpIc4Mhu ``` It's passing a temp directory there, not the symbols path. mozcrash will do that if you pass `None` down as the symbols path: https://dxr.mozilla.org/mozilla-central/rev/d1102663db10b3d4b9358f3cf4e16b7c56902352/testing/mozbase/mozcrash/mozcrash/mozcrash.py#167 I tried tracing through the Marionette code, and it *looks* like it's persisting the symbols_path all the way down through mozrunner, but it's pretty complicated and I'm not at all confident of that.
Flags: needinfo?(ted)
Comment 5•8 years ago
|
||
whimboo: this is the kind of thing I was worried about with making `symbols_path` optional. :-/
Comment 6•8 years ago
|
||
Maja, can you have a look that we are doing things properly based on what ted has documented in comment 4 when you have a moment please.
Flags: needinfo?(mjzffr)
Comment 7•8 years ago
|
||
So the builds as pasted earlier on this bug are from Taskcluster and Buildbot. Here the taskcluster one: 01:13:59 INFO - Run as /home/worker/workspace/mozharness/scripts/marionette.py --config-file mozharness/configs/marionette/prod_config.py --config-file mozharness/configs/remove_executables.py --no-read-buildbot-config --installer-url=https://queue.taskcluster.net/v1/task/KnQrzy0rRVONLRnC82z6OA/artifacts/public/build/target.tar.bz2 --test-packages-url=https://queue.taskcluster.net/v1/task/KnQrzy0rRVONLRnC82z6OA/artifacts/public/build/target.test_packages.json --e10s --download-symbols=true We use the following symbols-url setting from the prod_config.py: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/marionette/prod_config.py#80 Because of download-symbols=True we download and extract. Both work fine and symbols should be at the location of symbols_path. Also all is fine for the run command call for Marionette: 01:15:55 INFO - Copy/paste: /home/worker/workspace/build/venv/bin/python -u /home/worker/workspace/build/tests/marionette/marionette/runtests.py --gecko-log=/home/worker/workspace/build/blobber_upload_dir/gecko.log --log-raw=/home/worker/workspace/build/blobber_upload_dir/marionette_raw.log --log-errorsummary=/home/worker/workspace/build/blobber_upload_dir/marionette_errorsummary.log --binary=/home/worker/workspace/build/application/firefox/firefox --address=localhost:2828 --symbols-path=/home/worker/workspace/build/symbols /home/worker/workspace/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit-tests.ini Maybe that is an issue in mozcrash here? Here for the buildbot command: /tools/buildbot/bin/python scripts/scripts/marionette.py --cfg marionette/prod_config.py --e10s --blob-upload-branch mozilla-inbound in dir /builds/slave/test/. (timeout 1800 secs) (maxTime 7200 secs) The same config file is used, but with ondemand for download. As result we do not download and extract the archive via mozharness but pass it directly to mozcrash: 22:18:56 INFO - Calling ['/builds/slave/test/build/venv/bin/python', '-u', '/builds/slave/test/build/tests/marionette/marionette/runtests.py', '--gecko-log=/builds/slave/test/build/blobber_upload_dir/gecko.log', '--log-raw=/builds/slave/test/build/blobber_upload_dir/marionette_raw.log', '--log-errorsummary=/builds/slave/test/build/blobber_upload_dir/marionette_errorsummary.log', '--binary=/builds/slave/test/build/application/NightlyDebug.app/Contents/MacOS/firefox', '--address=localhost:2828', '--symbols-path=https://queue.taskcluster.net/v1/task/faovEIEtTvyN6xaRWYI_xw/artifacts/public/build/firefox-49.0a1.en-US.mac64.crashreporter-symbols.zip', '/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit-tests.ini'] with output_timeout 1000 So in both cases we have a symbols-path specified, so none of my changes should have an impact here: https://hg.mozilla.org/mozilla-central/rev/7ba4a5c7bc0e#l1.30 Maybe it might be good to get more logging into mozcrash? Also in case of symbols being not found or such.
Comment 8•8 years ago
|
||
Btw I can see some similar things for our firefox-ui-tests (bug 1279490). Given that crashes were properly symbolicated before, I wonder if that could be related a build system change?
Comment 9•8 years ago
|
||
Maybe the task definition changes for --download-symbols on bug 1267825 caused this problem? We have ondemand now for opt and pgo builds. I thought that minidump-stackwalk can only fetch nightly and release builds but not inbound. Or am I wrong, Ted?
Flags: needinfo?(ted)
Comment 10•8 years ago
|
||
I think the following comment in mozharness answers it: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#206 > # As of now this only works for Nightly and release builds. Maja, you may want to check mozcrash how it handles the ondemand case with a known symbols url.
Flags: needinfo?(ted)
Comment 11•8 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #9) > Maybe the task definition changes for --download-symbols on bug 1267825 > caused this problem? We have ondemand now for opt and pgo builds. I thought > that minidump-stackwalk can only fetch nightly and release builds but not > inbound. Or am I wrong, Ted? Per my investigation in comment 4, I don't think mozharness is to blame here. It seems to be downloading and unzipping the symbols correctly, and passing that path to the Marionette harness correctly. Somewhere inside the harness that symbols_path is getting lost and mozcrash winds up using a temp directory.
Assignee | ||
Comment 12•8 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/62548/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/62548/
Attachment #8768219 -
Flags: review?(dburns)
Assignee | ||
Comment 13•8 years ago
|
||
I traced the runner code and it seems that the `symbols_path` arg was not being passed on to the Marionette constructor, and therefore the runner class, etc. With the fix I get the following locally when testing with the crashme addon: 0:09.80 LOG: MainThread mozcrash INFO Copy/paste: /Users/mozilla/dev/temp/macosx64-minidump_stackwalk /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpRXUgV7/minidumps/2853752E-97E4-4027-B3FA-0B4491596CFE.dmp /Users/mozilla/dev/temp/firefox-49.0a1.en-US.mac64.crashreporter-symbols.zip 0:11.67 CRASH: MainThread pid:4447. Test:runner.py. Minidump anaylsed:False. Signature:[@ libcrashme.dylib + 0xe50] Crash dump filename: /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpRXUgV7/minidumps/2853752E-97E4-4027-B3FA-0B4491596CFE.dmp Operating system: Mac OS X 10.11.5 15F34 CPU: amd64 family 6 model 69 stepping 1 4 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0x0 Process uptime: 9 seconds Thread 0 (crashed) 0 libcrashme.dylib + 0xe50 rax = 0x000000010e494e50 rdx = 0x000000010e494e34 rcx = 0x000000012e1e88c0 rbx = 0x0000000000000001 rsi = 0x00000001093d59b0 rdi = 0x0000000000000000 rbp = 0x00007fff5b7aff60 rsp = 0x00007fff5b7aff50 r8 = 0x00007fff5b7aff00 r9 = 0x00000001098893c8 r10 = 0x00007fff5b7afec0 r11 = 0x000000010e494e10 r12 = 0x0000000126009c18 r13 = 0x0000000109911644 r14 = 0x000000010e494e10 r15 = 0xb700817702326750 rip = 0x000000010e494e50 Found by: given as instruction pointer in context 1 XUL + 0x4bbbc44 rbp = 0x00007fff5b7aff70 rsp = 0x00007fff5b7aff70 rip = 0x0000000109910c44 Found by: previous frame's frame pointer 2 XUL + 0x4bbc620 rsp = 0x00007fff5b7aff90 rip = 0x0000000109911620 Found by: stack scanning 3 libcrashme.dylib + 0xe10 rsp = 0x00007fff5b7affb8 rip = 0x000000010e494e10 Found by: stack scanning 4 XUL + 0x468bf57 rsp = 0x00007fff5b7b0010 rip = 0x00000001093e0f57 Found by: stack scanning 5 libcrashme.dylib + 0xe10 rsp = 0x00007fff5b7b0020 rip = 0x000000010e494e10 Found by: stack scanning 6 XUL + 0x4132fe2 rsp = 0x00007fff5b7b0050 rip = 0x0000000108e87fe2 Found by: stack scanning ...
Flags: needinfo?(mjzffr)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → mjzffr
Assignee | ||
Comment 14•8 years ago
|
||
I don't often look at crash dumps, but the above in Comment 13 doesn't look any better to me than Comment 0 (aside from the crash symbols being found correctly by minidump_stackwalk). Ryan or anyone -- what do you think?
Flags: needinfo?(ryanvm)
Comment 15•8 years ago
|
||
It looks like you passed a path to a symbols.zip--mozcrash is written to either take a path to an *unzipped* set of symbols, or a URL to a symbols.zip (which it will download and unzip). It doesn't have any code to handle a path to a local zip file.
Assignee | ||
Comment 16•8 years ago
|
||
Ah! Thanks for clearing that up. Better: 0:10.95 LOG: MainThread mozcrash INFO Copy/paste: /Users/mozilla/dev/temp/macosx64-minidump_stackwalk /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpfK8KHv/minidumps/B4113599-5462-4790-B2E3-D281FDFC8F81.dmp /Users/mozilla/dev/temp/firefox-49.0a1.en-US.mac64.crashreporter-symbols 0:20.39 CRASH: MainThread pid:6293. Test:runner.py. Minidump anaylsed:False. Signature:[@ Crash] Crash dump filename: /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpfK8KHv/minidumps/B4113599-5462-4790-B2E3-D281FDFC8F81.dmp Operating system: Mac OS X 10.11.5 15F34 CPU: amd64 family 6 model 69 stepping 1 4 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0x0 Process uptime: 10 seconds Thread 0 (crashed) 0 libcrashme.dylib!Crash [nsCrasher.cpp : 56 + 0x0] rax = 0x0000000110870e50 rdx = 0x0000000110870e34 rcx = 0x0000000163911500 rbx = 0x0000000000000001 rsi = 0x000000010b8d59b0 rdi = 0x0000000000000000 rbp = 0x00007fff592b9f60 rsp = 0x00007fff592b9f50 r8 = 0x00007fff592b9f00 r9 = 0x000000010bd893c8 r10 = 0x00007fff592b9ec0 r11 = 0x0000000110870e10 r12 = 0x0000000128381240 r13 = 0x000000010be11644 r14 = 0x0000000110870e10 r15 = 0x42005606fb6299f4 rip = 0x0000000110870e50 Found by: given as instruction pointer in context 1 XUL!ffi_call_unix64 + 0x4c rbx = 0x0000000000000001 rbp = 0x00007fff592b9f70 rsp = 0x00007fff592b9f70 r12 = 0x0000000128381240 r13 = 0x000000010be11644 r14 = 0x0000000110870e10 r15 = 0x42005606fb6299f4 rip = 0x000000010be10c44 Found by: call frame info 2 XUL!ffi_call + 0x420 rbx = 0x0000000000000001 rbp = 0x00007fff592ba040 rsp = 0x00007fff592b9f90 r12 = 0x0000000128381240 r13 = 0x000000010be11644 r14 = 0x0000000110870e10 r15 = 0x42005606fb6299f4 rip = 0x000000010be11620 Found by: call frame info
Flags: needinfo?(ryanvm)
Comment 17•8 years ago
|
||
Comment on attachment 8768219 [details] Bug 1276011 - Fix symbols_path arg not being passed on to runner class. https://reviewboard.mozilla.org/r/62548/#review59392
Attachment #8768219 -
Flags: review?(dburns) → review+
Comment 18•8 years ago
|
||
Pushed by mjzffr@gmail.com: https://hg.mozilla.org/integration/autoland/rev/f564cb1eb300 Fix symbols_path arg not being passed on to runner class. r=automatedtester
Comment 19•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f564cb1eb300
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 20•8 years ago
|
||
Maja, I think it would be good to check also aurora and at least beta if those branches are affected to.
Assignee | ||
Comment 21•8 years ago
|
||
David, looks like we have an aurora uplift for you.
status-firefox47:
--- → unaffected
status-firefox-esr45:
--- → unaffected
Flags: needinfo?(mjzffr) → needinfo?(dburns)
Comment 22•8 years ago
|
||
Thanks Maja. So the sheriffs can land this test framework only change when we add the correct whiteboard entry.
Flags: needinfo?(dburns)
Whiteboard: [checkin-needed-aurora]
Reporter | ||
Comment 24•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/1986988c99dd
Whiteboard: [checkin-needed-aurora]
Updated•1 year ago
|
Product: Testing → Remote Protocol
Comment 25•1 year ago
|
||
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in
before you can comment on or make changes to this bug.
Description
•