|Submitter||Diff||Changes||Open Issues||Last Updated|
|Error loading review requests:|
58 bytes, text/x-review-board-request
|Details | Review|
I'm pretty sure this used to work, so that's swell. I wanted to throw Taskcluster under the bus for it, but then OSX had to go and ruin that theory. https://treeherder.mozilla.org/logviewer.html#?job_id=3950116&repo=mozilla-central#L1312 https://treeherder.mozilla.org/logviewer.html#?job_id=28754506&repo=mozilla-inbound#L1314
Ted, I don't even know where to start to fix this, some guidance please.
Flags: needinfo?(dburns) → needinfo?(ted)
Summary: Mn-e10s crashes aren't being properly symbolicated → Marionette crashes aren't being properly symbolicated
So for the mozilla-inbound case as Ryan filed above I cannot see that we unzip the downloaded symbols in the marionette harness script. Instead we pass the zip file path to mozcrash. But mozcrash does only unzip archives in case it downloads those itself: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#174
mozharness looks like it's downloading and unzipping the symbols properly: ``` 01:14:31 INFO - Running command: ['unzip', '-q', '-o', '/home/worker/workspace/build/target.crashreporter-symbols.zip', '-d', '/home/worker/workspace/build/symbols'] 01:14:31 INFO - Copy/paste: unzip -q -o /home/worker/workspace/build/target.crashreporter-symbols.zip -d /home/worker/workspace/build/symbols ``` And the marionette runtests.py is being invoked with `--symbols-path=/home/worker/workspace/build/symbols`, which looks right. However! Something is not passing that symbols path down to mozcrash properly, if you look at the minidump_stackwalk invocation it puts in the log: ``` 01:19:39 INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpEAKbwF.mozrunner/minidumps/2ee0ace2-b7d1-8fab-65952a72-05b2565e.dmp /tmp/tmpIc4Mhu ``` It's passing a temp directory there, not the symbols path. mozcrash will do that if you pass `None` down as the symbols path: https://dxr.mozilla.org/mozilla-central/rev/d1102663db10b3d4b9358f3cf4e16b7c56902352/testing/mozbase/mozcrash/mozcrash/mozcrash.py#167 I tried tracing through the Marionette code, and it *looks* like it's persisting the symbols_path all the way down through mozrunner, but it's pretty complicated and I'm not at all confident of that.
whimboo: this is the kind of thing I was worried about with making `symbols_path` optional. :-/
Maja, can you have a look that we are doing things properly based on what ted has documented in comment 4 when you have a moment please.
So the builds as pasted earlier on this bug are from Taskcluster and Buildbot. Here the taskcluster one: 01:13:59 INFO - Run as /home/worker/workspace/mozharness/scripts/marionette.py --config-file mozharness/configs/marionette/prod_config.py --config-file mozharness/configs/remove_executables.py --no-read-buildbot-config --installer-url=https://queue.taskcluster.net/v1/task/KnQrzy0rRVONLRnC82z6OA/artifacts/public/build/target.tar.bz2 --test-packages-url=https://queue.taskcluster.net/v1/task/KnQrzy0rRVONLRnC82z6OA/artifacts/public/build/target.test_packages.json --e10s --download-symbols=true We use the following symbols-url setting from the prod_config.py: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/marionette/prod_config.py#80 Because of download-symbols=True we download and extract. Both work fine and symbols should be at the location of symbols_path. Also all is fine for the run command call for Marionette: 01:15:55 INFO - Copy/paste: /home/worker/workspace/build/venv/bin/python -u /home/worker/workspace/build/tests/marionette/marionette/runtests.py --gecko-log=/home/worker/workspace/build/blobber_upload_dir/gecko.log --log-raw=/home/worker/workspace/build/blobber_upload_dir/marionette_raw.log --log-errorsummary=/home/worker/workspace/build/blobber_upload_dir/marionette_errorsummary.log --binary=/home/worker/workspace/build/application/firefox/firefox --address=localhost:2828 --symbols-path=/home/worker/workspace/build/symbols /home/worker/workspace/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit-tests.ini Maybe that is an issue in mozcrash here? Here for the buildbot command: /tools/buildbot/bin/python scripts/scripts/marionette.py --cfg marionette/prod_config.py --e10s --blob-upload-branch mozilla-inbound in dir /builds/slave/test/. (timeout 1800 secs) (maxTime 7200 secs) The same config file is used, but with ondemand for download. As result we do not download and extract the archive via mozharness but pass it directly to mozcrash: 22:18:56 INFO - Calling ['/builds/slave/test/build/venv/bin/python', '-u', '/builds/slave/test/build/tests/marionette/marionette/runtests.py', '--gecko-log=/builds/slave/test/build/blobber_upload_dir/gecko.log', '--log-raw=/builds/slave/test/build/blobber_upload_dir/marionette_raw.log', '--log-errorsummary=/builds/slave/test/build/blobber_upload_dir/marionette_errorsummary.log', '--binary=/builds/slave/test/build/application/NightlyDebug.app/Contents/MacOS/firefox', '--address=localhost:2828', '--symbols-path=https://queue.taskcluster.net/v1/task/faovEIEtTvyN6xaRWYI_xw/artifacts/public/build/firefox-49.0a1.en-US.mac64.crashreporter-symbols.zip', '/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit-tests.ini'] with output_timeout 1000 So in both cases we have a symbols-path specified, so none of my changes should have an impact here: https://hg.mozilla.org/mozilla-central/rev/7ba4a5c7bc0e#l1.30 Maybe it might be good to get more logging into mozcrash? Also in case of symbols being not found or such.
Btw I can see some similar things for our firefox-ui-tests (bug 1279490). Given that crashes were properly symbolicated before, I wonder if that could be related a build system change?
Maybe the task definition changes for --download-symbols on bug 1267825 caused this problem? We have ondemand now for opt and pgo builds. I thought that minidump-stackwalk can only fetch nightly and release builds but not inbound. Or am I wrong, Ted?
I think the following comment in mozharness answers it: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#206 > # As of now this only works for Nightly and release builds. Maja, you may want to check mozcrash how it handles the ondemand case with a known symbols url.
(In reply to Henrik Skupin (:whimboo) from comment #9) > Maybe the task definition changes for --download-symbols on bug 1267825 > caused this problem? We have ondemand now for opt and pgo builds. I thought > that minidump-stackwalk can only fetch nightly and release builds but not > inbound. Or am I wrong, Ted? Per my investigation in comment 4, I don't think mozharness is to blame here. It seems to be downloading and unzipping the symbols correctly, and passing that path to the Marionette harness correctly. Somewhere inside the harness that symbols_path is getting lost and mozcrash winds up using a temp directory.
Created attachment 8768219 [details] Bug 1276011 - Fix symbols_path arg not being passed on to runner class. Review commit: https://reviewboard.mozilla.org/r/62548/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/62548/
Attachment #8768219 - Flags: review?(dburns)
I traced the runner code and it seems that the `symbols_path` arg was not being passed on to the Marionette constructor, and therefore the runner class, etc. With the fix I get the following locally when testing with the crashme addon: 0:09.80 LOG: MainThread mozcrash INFO Copy/paste: /Users/mozilla/dev/temp/macosx64-minidump_stackwalk /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpRXUgV7/minidumps/2853752E-97E4-4027-B3FA-0B4491596CFE.dmp /Users/mozilla/dev/temp/firefox-49.0a1.en-US.mac64.crashreporter-symbols.zip 0:11.67 CRASH: MainThread pid:4447. Test:runner.py. Minidump anaylsed:False. Signature:[@ libcrashme.dylib + 0xe50] Crash dump filename: /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpRXUgV7/minidumps/2853752E-97E4-4027-B3FA-0B4491596CFE.dmp Operating system: Mac OS X 10.11.5 15F34 CPU: amd64 family 6 model 69 stepping 1 4 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0x0 Process uptime: 9 seconds Thread 0 (crashed) 0 libcrashme.dylib + 0xe50 rax = 0x000000010e494e50 rdx = 0x000000010e494e34 rcx = 0x000000012e1e88c0 rbx = 0x0000000000000001 rsi = 0x00000001093d59b0 rdi = 0x0000000000000000 rbp = 0x00007fff5b7aff60 rsp = 0x00007fff5b7aff50 r8 = 0x00007fff5b7aff00 r9 = 0x00000001098893c8 r10 = 0x00007fff5b7afec0 r11 = 0x000000010e494e10 r12 = 0x0000000126009c18 r13 = 0x0000000109911644 r14 = 0x000000010e494e10 r15 = 0xb700817702326750 rip = 0x000000010e494e50 Found by: given as instruction pointer in context 1 XUL + 0x4bbbc44 rbp = 0x00007fff5b7aff70 rsp = 0x00007fff5b7aff70 rip = 0x0000000109910c44 Found by: previous frame's frame pointer 2 XUL + 0x4bbc620 rsp = 0x00007fff5b7aff90 rip = 0x0000000109911620 Found by: stack scanning 3 libcrashme.dylib + 0xe10 rsp = 0x00007fff5b7affb8 rip = 0x000000010e494e10 Found by: stack scanning 4 XUL + 0x468bf57 rsp = 0x00007fff5b7b0010 rip = 0x00000001093e0f57 Found by: stack scanning 5 libcrashme.dylib + 0xe10 rsp = 0x00007fff5b7b0020 rip = 0x000000010e494e10 Found by: stack scanning 6 XUL + 0x4132fe2 rsp = 0x00007fff5b7b0050 rip = 0x0000000108e87fe2 Found by: stack scanning ...
I don't often look at crash dumps, but the above in Comment 13 doesn't look any better to me than Comment 0 (aside from the crash symbols being found correctly by minidump_stackwalk). Ryan or anyone -- what do you think?
It looks like you passed a path to a symbols.zip--mozcrash is written to either take a path to an *unzipped* set of symbols, or a URL to a symbols.zip (which it will download and unzip). It doesn't have any code to handle a path to a local zip file.
Ah! Thanks for clearing that up. Better: 0:10.95 LOG: MainThread mozcrash INFO Copy/paste: /Users/mozilla/dev/temp/macosx64-minidump_stackwalk /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpfK8KHv/minidumps/B4113599-5462-4790-B2E3-D281FDFC8F81.dmp /Users/mozilla/dev/temp/firefox-49.0a1.en-US.mac64.crashreporter-symbols 0:20.39 CRASH: MainThread pid:6293. Test:runner.py. Minidump anaylsed:False. Signature:[@ Crash] Crash dump filename: /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpfK8KHv/minidumps/B4113599-5462-4790-B2E3-D281FDFC8F81.dmp Operating system: Mac OS X 10.11.5 15F34 CPU: amd64 family 6 model 69 stepping 1 4 CPUs Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS Crash address: 0x0 Process uptime: 10 seconds Thread 0 (crashed) 0 libcrashme.dylib!Crash [nsCrasher.cpp : 56 + 0x0] rax = 0x0000000110870e50 rdx = 0x0000000110870e34 rcx = 0x0000000163911500 rbx = 0x0000000000000001 rsi = 0x000000010b8d59b0 rdi = 0x0000000000000000 rbp = 0x00007fff592b9f60 rsp = 0x00007fff592b9f50 r8 = 0x00007fff592b9f00 r9 = 0x000000010bd893c8 r10 = 0x00007fff592b9ec0 r11 = 0x0000000110870e10 r12 = 0x0000000128381240 r13 = 0x000000010be11644 r14 = 0x0000000110870e10 r15 = 0x42005606fb6299f4 rip = 0x0000000110870e50 Found by: given as instruction pointer in context 1 XUL!ffi_call_unix64 + 0x4c rbx = 0x0000000000000001 rbp = 0x00007fff592b9f70 rsp = 0x00007fff592b9f70 r12 = 0x0000000128381240 r13 = 0x000000010be11644 r14 = 0x0000000110870e10 r15 = 0x42005606fb6299f4 rip = 0x000000010be10c44 Found by: call frame info 2 XUL!ffi_call + 0x420 rbx = 0x0000000000000001 rbp = 0x00007fff592ba040 rsp = 0x00007fff592b9f90 r12 = 0x0000000128381240 r13 = 0x000000010be11644 r14 = 0x0000000110870e10 r15 = 0x42005606fb6299f4 rip = 0x000000010be11620 Found by: call frame info
Comment on attachment 8768219 [details] Bug 1276011 - Fix symbols_path arg not being passed on to runner class. https://reviewboard.mozilla.org/r/62548/#review59392
Attachment #8768219 - Flags: review?(dburns) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/autoland/rev/f564cb1eb300 Fix symbols_path arg not being passed on to runner class. r=automatedtester
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox50: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Maja, I think it would be good to check also aurora and at least beta if those branches are affected to.
status-firefox48: --- → ?
status-firefox49: --- → ?
David, looks like we have an aurora uplift for you.
status-firefox47: --- → unaffected
status-firefox48: ? → unaffected
status-firefox49: ? → affected
status-firefox-esr45: --- → unaffected
Flags: needinfo?(mjzffr) → needinfo?(dburns)
Thanks Maja. So the sheriffs can land this test framework only change when we add the correct whiteboard entry.
Actually this is a regression from bug 1274408.
status-firefox49: affected → fixed
You need to log in before you can comment on or make changes to this bug.