Closed Bug 1276011 Opened 8 years ago Closed 8 years ago

Marionette crashes aren't being properly symbolicated

Categories

(Testing :: Marionette Client and Harness, defect)

Version 3
defect
Not set
critical

Tracking

(firefox47 unaffected, firefox48 unaffected, firefox49 fixed, firefox-esr45 unaffected, firefox50 fixed)

RESOLVED FIXED
mozilla50
Tracking Status
firefox47 --- unaffected
firefox48 --- unaffected
firefox49 --- fixed
firefox-esr45 --- unaffected
firefox50 --- fixed

People

(Reporter: RyanVM, Assigned: impossibus)

References

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

I'm pretty sure this used to work, so that's swell. I wanted to throw Taskcluster under the bus for it, but then OSX had to go and ruin that theory.

https://treeherder.mozilla.org/logviewer.html#?job_id=3950116&repo=mozilla-central#L1312
https://treeherder.mozilla.org/logviewer.html#?job_id=28754506&repo=mozilla-inbound#L1314
Flags: needinfo?(dburns)
Ted,

I don't even know where to start to fix this, some guidance please.
Flags: needinfo?(dburns) → needinfo?(ted)
Summary: Mn-e10s crashes aren't being properly symbolicated → Marionette crashes aren't being properly symbolicated
So for the mozilla-inbound case as Ryan filed above I cannot see that we unzip the downloaded symbols in the marionette harness script. Instead we pass the zip file path to mozcrash. But mozcrash does only unzip archives in case it downloads those itself:

https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozcrash/mozcrash/mozcrash.py#174
mozharness looks like it's downloading and unzipping the symbols properly:
```
01:14:31     INFO - Running command: ['unzip', '-q', '-o', '/home/worker/workspace/build/target.crashreporter-symbols.zip', '-d', '/home/worker/workspace/build/symbols']
01:14:31     INFO - Copy/paste: unzip -q -o /home/worker/workspace/build/target.crashreporter-symbols.zip -d /home/worker/workspace/build/symbols
```

And the marionette runtests.py is being invoked with `--symbols-path=/home/worker/workspace/build/symbols`, which looks right. However! Something is not passing that symbols path down to mozcrash properly, if you look at the minidump_stackwalk invocation it puts in the log:
```
01:19:39     INFO -  mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpEAKbwF.mozrunner/minidumps/2ee0ace2-b7d1-8fab-65952a72-05b2565e.dmp /tmp/tmpIc4Mhu
```

It's passing a temp directory there, not the symbols path. mozcrash will do that if you pass `None` down as the symbols path:
https://dxr.mozilla.org/mozilla-central/rev/d1102663db10b3d4b9358f3cf4e16b7c56902352/testing/mozbase/mozcrash/mozcrash/mozcrash.py#167

I tried tracing through the Marionette code, and it *looks* like it's persisting the symbols_path all the way down through mozrunner, but it's pretty complicated and I'm not at all confident of that.
Flags: needinfo?(ted)
whimboo: this is the kind of thing I was worried about with making `symbols_path` optional. :-/
Maja, can you have a look that we are doing things properly based on what ted has documented in comment 4 when you have a moment please.
Flags: needinfo?(mjzffr)
So the builds as pasted earlier on this bug are from Taskcluster and Buildbot.

Here the taskcluster one:

01:13:59     INFO - Run as /home/worker/workspace/mozharness/scripts/marionette.py --config-file mozharness/configs/marionette/prod_config.py --config-file mozharness/configs/remove_executables.py --no-read-buildbot-config --installer-url=https://queue.taskcluster.net/v1/task/KnQrzy0rRVONLRnC82z6OA/artifacts/public/build/target.tar.bz2 --test-packages-url=https://queue.taskcluster.net/v1/task/KnQrzy0rRVONLRnC82z6OA/artifacts/public/build/target.test_packages.json --e10s --download-symbols=true

We use the following symbols-url setting from the prod_config.py:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/marionette/prod_config.py#80

Because of download-symbols=True we download and extract. Both work fine and symbols should be at the location of symbols_path. Also all is fine for the run command call for Marionette:

01:15:55     INFO - Copy/paste: /home/worker/workspace/build/venv/bin/python -u /home/worker/workspace/build/tests/marionette/marionette/runtests.py --gecko-log=/home/worker/workspace/build/blobber_upload_dir/gecko.log --log-raw=/home/worker/workspace/build/blobber_upload_dir/marionette_raw.log --log-errorsummary=/home/worker/workspace/build/blobber_upload_dir/marionette_errorsummary.log --binary=/home/worker/workspace/build/application/firefox/firefox --address=localhost:2828 --symbols-path=/home/worker/workspace/build/symbols /home/worker/workspace/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit-tests.ini

Maybe that is an issue in mozcrash here?


Here for the buildbot command:

/tools/buildbot/bin/python scripts/scripts/marionette.py --cfg marionette/prod_config.py --e10s --blob-upload-branch mozilla-inbound
 in dir /builds/slave/test/. (timeout 1800 secs) (maxTime 7200 secs)

The same config file is used, but with ondemand for download. As result we do not download and extract the archive via mozharness but pass it directly to mozcrash:

22:18:56     INFO - Calling ['/builds/slave/test/build/venv/bin/python', '-u', '/builds/slave/test/build/tests/marionette/marionette/runtests.py', '--gecko-log=/builds/slave/test/build/blobber_upload_dir/gecko.log', '--log-raw=/builds/slave/test/build/blobber_upload_dir/marionette_raw.log', '--log-errorsummary=/builds/slave/test/build/blobber_upload_dir/marionette_errorsummary.log', '--binary=/builds/slave/test/build/application/NightlyDebug.app/Contents/MacOS/firefox', '--address=localhost:2828', '--symbols-path=https://queue.taskcluster.net/v1/task/faovEIEtTvyN6xaRWYI_xw/artifacts/public/build/firefox-49.0a1.en-US.mac64.crashreporter-symbols.zip', '/builds/slave/test/build/tests/marionette/tests/testing/marionette/harness/marionette/tests/unit-tests.ini'] with output_timeout 1000

So in both cases we have a symbols-path specified, so none of my changes should have an impact here:
https://hg.mozilla.org/mozilla-central/rev/7ba4a5c7bc0e#l1.30

Maybe it might be good to get more logging into mozcrash? Also in case of symbols being not found or such.
Btw I can see some similar things for our firefox-ui-tests (bug 1279490). Given that crashes were properly symbolicated before, I wonder if that could be related a build system change?
Maybe the task definition changes for --download-symbols on bug 1267825 caused this problem? We have ondemand now for opt and pgo builds. I thought that minidump-stackwalk can only fetch nightly and release builds but not inbound. Or am I wrong, Ted?
Flags: needinfo?(ted)
I think the following comment in mozharness answers it:

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/testing/testbase.py#206

> # As of now this only works for Nightly and release builds.

Maja, you may want to check mozcrash how it handles the ondemand case with a known symbols url.
Flags: needinfo?(ted)
Blocks: 1281998
(In reply to Henrik Skupin (:whimboo) from comment #9)
> Maybe the task definition changes for --download-symbols on bug 1267825
> caused this problem? We have ondemand now for opt and pgo builds. I thought
> that minidump-stackwalk can only fetch nightly and release builds but not
> inbound. Or am I wrong, Ted?

Per my investigation in comment 4, I don't think mozharness is to blame here. It seems to be downloading and unzipping the symbols correctly, and passing that path to the Marionette harness correctly. Somewhere inside the harness that symbols_path is getting lost and mozcrash winds up using a temp directory.
I traced the runner code and it seems that the `symbols_path` arg was not being passed on to the Marionette constructor, and therefore the runner class, etc.

With the fix I get the following locally when testing with the crashme addon:

0:09.80 LOG: MainThread mozcrash INFO Copy/paste: /Users/mozilla/dev/temp/macosx64-minidump_stackwalk /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpRXUgV7/minidumps/2853752E-97E4-4027-B3FA-0B4491596CFE.dmp /Users/mozilla/dev/temp/firefox-49.0a1.en-US.mac64.crashreporter-symbols.zip
 0:11.67 CRASH: MainThread pid:4447. Test:runner.py. Minidump anaylsed:False. Signature:[@ libcrashme.dylib + 0xe50]
Crash dump filename: /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpRXUgV7/minidumps/2853752E-97E4-4027-B3FA-0B4491596CFE.dmp
Operating system: Mac OS X
                  10.11.5 15F34
CPU: amd64
     family 6 model 69 stepping 1
     4 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x0
Process uptime: 9 seconds

Thread 0 (crashed)
 0  libcrashme.dylib + 0xe50
    rax = 0x000000010e494e50   rdx = 0x000000010e494e34
    rcx = 0x000000012e1e88c0   rbx = 0x0000000000000001
    rsi = 0x00000001093d59b0   rdi = 0x0000000000000000
    rbp = 0x00007fff5b7aff60   rsp = 0x00007fff5b7aff50
     r8 = 0x00007fff5b7aff00    r9 = 0x00000001098893c8
    r10 = 0x00007fff5b7afec0   r11 = 0x000000010e494e10
    r12 = 0x0000000126009c18   r13 = 0x0000000109911644
    r14 = 0x000000010e494e10   r15 = 0xb700817702326750
    rip = 0x000000010e494e50
    Found by: given as instruction pointer in context
 1  XUL + 0x4bbbc44
    rbp = 0x00007fff5b7aff70   rsp = 0x00007fff5b7aff70
    rip = 0x0000000109910c44
    Found by: previous frame's frame pointer
 2  XUL + 0x4bbc620
    rsp = 0x00007fff5b7aff90   rip = 0x0000000109911620
    Found by: stack scanning
 3  libcrashme.dylib + 0xe10
    rsp = 0x00007fff5b7affb8   rip = 0x000000010e494e10
    Found by: stack scanning
 4  XUL + 0x468bf57
    rsp = 0x00007fff5b7b0010   rip = 0x00000001093e0f57
    Found by: stack scanning
 5  libcrashme.dylib + 0xe10
    rsp = 0x00007fff5b7b0020   rip = 0x000000010e494e10
    Found by: stack scanning
 6  XUL + 0x4132fe2
    rsp = 0x00007fff5b7b0050   rip = 0x0000000108e87fe2
    Found by: stack scanning
...
Flags: needinfo?(mjzffr)
Assignee: nobody → mjzffr
I don't often look at crash dumps, but the above in Comment 13 doesn't look any better to me than Comment 0 (aside from the crash symbols being found correctly by minidump_stackwalk). 

Ryan or anyone -- what do you think?
Flags: needinfo?(ryanvm)
It looks like you passed a path to a symbols.zip--mozcrash is written to either take a path to an *unzipped* set of symbols, or a URL to a symbols.zip (which it will download and unzip). It doesn't have any code to handle a path to a local zip file.
Ah! Thanks for clearing that up.

Better:
0:10.95 LOG: MainThread mozcrash INFO Copy/paste: /Users/mozilla/dev/temp/macosx64-minidump_stackwalk /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpfK8KHv/minidumps/B4113599-5462-4790-B2E3-D281FDFC8F81.dmp /Users/mozilla/dev/temp/firefox-49.0a1.en-US.mac64.crashreporter-symbols
 0:20.39 CRASH: MainThread pid:6293. Test:runner.py. Minidump anaylsed:False. Signature:[@ Crash]
Crash dump filename: /var/folders/5k/xmn_fndx0qs2jcpcwhzl86wm0000gn/T/tmpfK8KHv/minidumps/B4113599-5462-4790-B2E3-D281FDFC8F81.dmp
Operating system: Mac OS X
                  10.11.5 15F34
CPU: amd64
     family 6 model 69 stepping 1
     4 CPUs

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0x0
Process uptime: 10 seconds

Thread 0 (crashed)
 0  libcrashme.dylib!Crash [nsCrasher.cpp : 56 + 0x0]
    rax = 0x0000000110870e50   rdx = 0x0000000110870e34
    rcx = 0x0000000163911500   rbx = 0x0000000000000001
    rsi = 0x000000010b8d59b0   rdi = 0x0000000000000000
    rbp = 0x00007fff592b9f60   rsp = 0x00007fff592b9f50
     r8 = 0x00007fff592b9f00    r9 = 0x000000010bd893c8
    r10 = 0x00007fff592b9ec0   r11 = 0x0000000110870e10
    r12 = 0x0000000128381240   r13 = 0x000000010be11644
    r14 = 0x0000000110870e10   r15 = 0x42005606fb6299f4
    rip = 0x0000000110870e50
    Found by: given as instruction pointer in context
 1  XUL!ffi_call_unix64 + 0x4c
    rbx = 0x0000000000000001   rbp = 0x00007fff592b9f70
    rsp = 0x00007fff592b9f70   r12 = 0x0000000128381240
    r13 = 0x000000010be11644   r14 = 0x0000000110870e10
    r15 = 0x42005606fb6299f4   rip = 0x000000010be10c44
    Found by: call frame info
 2  XUL!ffi_call + 0x420
    rbx = 0x0000000000000001   rbp = 0x00007fff592ba040
    rsp = 0x00007fff592b9f90   r12 = 0x0000000128381240
    r13 = 0x000000010be11644   r14 = 0x0000000110870e10
    r15 = 0x42005606fb6299f4   rip = 0x000000010be11620
    Found by: call frame info
Flags: needinfo?(ryanvm)
Comment on attachment 8768219 [details]
Bug 1276011 - Fix symbols_path arg not being passed on to runner class.

https://reviewboard.mozilla.org/r/62548/#review59392
Attachment #8768219 - Flags: review?(dburns) → review+
Pushed by mjzffr@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/f564cb1eb300
Fix symbols_path arg not being passed on to runner class. r=automatedtester
https://hg.mozilla.org/mozilla-central/rev/f564cb1eb300
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Maja, I think it would be good to check also aurora and at least beta if those branches are affected to.
Flags: needinfo?(mjzffr)
David, looks like we have an aurora uplift for you.
Flags: needinfo?(mjzffr) → needinfo?(dburns)
Thanks Maja. So the sheriffs can land this test framework only change when we add the correct whiteboard entry.
Flags: needinfo?(dburns)
Whiteboard: [checkin-needed-aurora]
Blocks: 1274408
No longer blocks: 1281998
Actually this is a regression from bug 1274408.
Keywords: regression
https://hg.mozilla.org/releases/mozilla-aurora/rev/1986988c99dd
Whiteboard: [checkin-needed-aurora]
Product: Testing → Remote Protocol
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: