Closed
Bug 1221661
Opened 9 years ago
Closed 9 years ago
task cluster test results seem to fail on crash reporter tests- probably a common cause
Categories
(Testing :: Mochitest, defect)
Testing
Mochitest
Tracking
(firefox46 fixed)
RESOLVED
FIXED
mozilla46
Tracking | Status | |
---|---|---|
firefox46 | --- | fixed |
People
(Reporter: jmaher, Assigned: dustin)
References
Details
Attachments
(2 files)
from this try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=603b29218a37 we have some failures that look suspicious: 10:02 < jmaher> ted: look at m-3: dom/plugins/test/mochitest/test_CrashService_crash.html | 10:03 < jmaher> ted: and bc2: browser/base/content/test/plugins/browser_CTP_crashreporting.j 10:03 < jmaher> ted: and bc6: toolkit/components/thumbnails/test/browser_thumbnails_bg_crash_during_capture.js | dumpID is present and not an empty string and test recommends: 10:11 <@ted> so probably the first thing i would try here (i don't have time *right* at the moment) 10:12 <@ted> is running that same docker image, run Xvfb, and run any old firefox build in it 10:12 <@ted> and kill -ABRT it 10:12 < jmaher> ah, got it 10:12 <@ted> with MOZ_CRASHREPORTER=1 MOZ_CRASHREPORTER_NO_REPORT=1 in the env 10:12 <@ted> and see if you get a minidump in $profile/minidumps ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── 10:12 <@ted> actually, these tests are crashing the plugin/content process, so maybe kill -ABRT plugin-container 10:14 <@ted> if that works, maybe run as much of https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/tester/test-linux.sh as possible 10:14 <@ted> since it sets up the environment a bit
Updated•9 years ago
|
Blocks: tc-linux64-debug
Reporter | ||
Comment 1•9 years ago
|
||
while running some tests on a production docker image locally, I see dumps being created- one less headache to figure out.
Reporter | ||
Comment 2•9 years ago
|
||
doing kill -ABRT works; I guess we need to figure out where the missing glue is for getting the actual dump out of there.
Reporter | ||
Comment 3•9 years ago
|
||
trying to reproduce this on the above mentioned tests, I run into errors while waiting for focus.
Reporter | ||
Comment 4•9 years ago
|
||
this might be specific to the lack of a window manager (think dialogs not working, etc.)- will update this when we test with a proper window manager on try. locally it seems to fix the problem.
Comment 5•9 years ago
|
||
Hm, that's interesting, I wouldn't expect that to have any effect here. We don't launch the crash reporter client during these tests. I would expect lots of other tests to fail without a window manager though, honestly. I know we're running some sort of window manager under Xvfb on the buildbot Linux testers.
Comment 6•9 years ago
|
||
I was poking through open Breakpad issues and came across https://bugs.chromium.org/p/google-breakpad/issues/detail?id=644. I did some local testing and it looks like Breakpad is actually just broken under Docker. I'll see if it's something fixable.
Comment 7•9 years ago
|
||
Okay, so it turns out this is due to Ubuntu's AppArmor profiles. If I pass `--cap-add SYS_PTRACE --security-opt apparmor:unconfined` to `docker run` on my machine I can run the Breakpad unit tests and they pass inside Docker. We'll have to get Taskcluster's docker-worker to do that.
Assignee | ||
Comment 8•9 years ago
|
||
--security-opt apparmor:unconfined seems .. unconfined 14:41:55 <ted> ¯\_(ツ)_/¯ 14:42:07 <ted> that's the best i could get out of that docker issue 14:42:44 <ted> i can give you a super simple command to test and you can try to find something that's more restrictive but still works 14:43:28 <ted> `od -x /proc/$$/auxv` 14:43:35 <ted> if that prints an error, breakpad won't work 14:43:40 <ted> if it prints output, it will 14:44:15 <ted> $ docker run --rm -t -i ubuntu:14.04 /bin/sh -c 'od -x /proc/$$/auxv' 14:44:15 <ted> od: /proc/1/auxv: read error: Permission denied 14:44:51 <ted> $ docker run --rm -t -i --cap-add SYS_PTRACE --security-opt apparmor:unconfined ubuntu:14.04 /bin/sh -c 'od -x /proc/$$/auxv' 14:44:51 <ted> 0000000 0021 0000 0000 0000 c000 84fe 7ffe 0000 14:44:54 <ted> <...>
Comment 9•9 years ago
|
||
I'm trying to figure out where to append --cap-add SYS_PTRACE --security-opt apparmor:unconfined to test jobs and I've failed to find where this could be changed. Any suggestions? Is starting the docker image part of one of the TC repos?
Comment 10•9 years ago
|
||
dustin mentioned on IRC that it is too risky to use --security-opt apparmor:unconfined Are we OK to having green TC jobs running side by side as tier2 *across* the board? What would the value of running tests without having a solution here?
Reporter | ||
Comment 11•9 years ago
|
||
have we determined on try that adding --security-opt will fix the tests? If we had that data, then we can determine which tests pass/fail and document it. Maybe a mozinfo flag and manifest condition could help out?
Assignee | ||
Comment 12•9 years ago
|
||
This isn't something we can test in try -- it requires changes to the docker-worker which invokes docker. From the little bit I've been able to read, the proposal is the equivalent of removing your front door and doorframe because your spare key doesn't work in the lock. Effective? yes. A good idea? maybe not. Ted gave a pretty good reproduction recipe that we could use to test other, more refined solutions. Someone -- and it might be me, just not now -- needs to dig into this and look at how AppArmor works, how docker uses it, how that might be configured, and so on, then devise the minimal solution with a full understanding of the security implications. I suspect that might start by putting AppArmor in complain mode and running Ted's recipe in a docker container. Then probably duplicate the docker-default profile with the necessary changes (after understanding them -- I'd guess it's just allowing ptrace and/or reading auxv). https://github.com/docker/docker/blob/master/docs/security/apparmor.md https://github.com/docker/docker/blob/master/daemon/execdriver/native/apparmor.go#L32 It looks like the profile is pretty basic. Docker rewrites its own profile on startup, but we could easily copy/paste it to another profile named `tc-docker` and invoke that with `--security-opt apparmor:tc-docker`.
Comment 13•9 years ago
|
||
As I understand it, this is a tier1 blocker and the top one atm. I assume we can continue adding more tier2 jobs for now without this.
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → dustin
Comment 14•9 years ago
|
||
Docker looks to have code to work around this on trunk: https://github.com/docker/docker/blob/master/daemon/execdriver/native/apparmor.go#L67 Adding `ptrace (trace,read) peer=docker-default,` to /etc/apparmor.d/docker on my local machine and restarting apparmor with `service apparmor restart` makes the simple test (`docker run --rm -t -i ubuntu:14.04 /bin/sh -c 'od -x /proc/$$/auxv'`) work locally. However, these changes don't survive a docker daemon restart. Doing what dustin suggested in comment 12 seems to work fine though: ``` cp /etc/apparmor.d/docker /etc/apparmor.d/docker-ptrace patch /etc/apparmor.d/docker-ptrace <<EOF 6c6 < profile docker-default flags=(attach_disconnected,mediate_deleted) { --- > profile docker-ptrace flags=(attach_disconnected,mediate_deleted) { 14a15 > ptrace (trace,read) peer=docker-ptrace, EOF service apparmor restart ``` Running my test command with this apparmor profile works fine then: `docker run --rm -t -i --security-opt apparmor:docker-ptrace ubuntu:14.04 /bin/sh -c 'od -x /proc/$$/auxv'`. Running a simple Breakpad test works as well, so I think this should fix the unittest problems.
Comment 15•9 years ago
|
||
Just as clarification (because I wasn't sure), Docker containers use PID namespaces for each container: http://man7.org/linux/man-pages/man7/pid_namespaces.7.html So processes running inside a container cannot inspect processes outside of that container, even with ptrace enabled. I think that enabling ptrace shouldn't actually weaken security.
Assignee | ||
Comment 16•9 years ago
|
||
So basically this new profile would allow processes in docker containers to ptrace one another (since they're all in the docker-ptrace profile), but due to PID namespaces they can practically only ptrace other processes in the same container. That seems pretty reasonable. :kang, is that about right? I'll start in on a patch to accomplish this as a docker-worker feature, preferably controlled by workerType configuration, so we can enable it only for test jobs' workerTypes, where inter-container interference isn't such a big deal.
Flags: needinfo?(gdestuynder)
Assignee | ||
Comment 17•9 years ago
|
||
Oh, hey, we can isolate ptrace to within the single container by creating an apparmor profile per task..
Some disclaimer: ptrac'ing a process is the equivalent of full control of the said process (both memory and execution). The namespaces in kernel have been "patched" in over time so that all kernel code would be namespace aware and properly separate resources. However, that also means there's a lower confidence in how well separated these namespaces are / higher probability for bugs. Due to that, apparmor (or selinux or else) are used to ensure all potentially dangerous calls are forbidden (its much easier to audit the system call table, than what the code behind the calls actually do). It's a little safer that way and ptrace() is always among the calls to be denied when making such lists. Likewise, there might be privileged processes inside a container that you don't want to be ptraced even as root - obviously, removing the limitation will also allow that. With that in mind, it should otherwise work (i.e. can't ptrace the other PID namespace). Basically, a test case would be: unshare -f -p /bin/bash ps aux #pick a process from parent - you can see them because you're not remounting proc. gdb -p 999 #gdb does a ptrace() call on 999 <No such process 999> #yay!
Flags: needinfo?(gdestuynder)
Assignee | ||
Comment 19•9 years ago
|
||
Attachment #8695305 -
Flags: review?(garndt)
Comment 20•9 years ago
|
||
Comment on attachment 8695305 [details] [review] https://github.com/taskcluster/docker-worker/pull/192 After reviewing some failures with Dustin, he was able to get it running in his environment and we also tested on an actual deployed ami of the worker. For some reason the test fails in the vagrant environment but since we tested by other means, we should let this PR merge and figure out the test later.
Attachment #8695305 -
Flags: review?(garndt) → review+
Assignee | ||
Comment 21•9 years ago
|
||
Merged! We'll deploy tomorrow, only to the desktop-test workerType.
Comment 22•9 years ago
|
||
\o/ What do I need to do to make use of it? I will try it Tuesday evening.
Assignee | ||
Comment 23•9 years ago
|
||
Add task: payload: features: allowPtrace: true
Comment 24•9 years ago
|
||
I've pushed with ptrace enabled (let's see what the results look like): https://hg.mozilla.org/try/rev/c3cb854e7fa0
Comment 25•9 years ago
|
||
New amis were added for desktop-test. There's some existing pending jobs that will keep the old machines around for a bit. Once those complete and die off the new ones will rollout. We can kill old instances to speed up with the process if you need it. us-east-1: ami-2e397344 us-west-1: ami-461e7526 us-west-2: ami-c94f50a8
Assignee | ||
Comment 26•9 years ago
|
||
Using the proper AMIs: https://treeherder.mozilla.org/#/jobs?repo=try&revision=199520ba8bfe
Assignee | ||
Comment 28•9 years ago
|
||
Argh, I lied -- we also need to add docker-worker:feature:allowPtrace to the scopes.
Assignee | ||
Comment 29•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2945b0340d53
Assignee | ||
Comment 30•9 years ago
|
||
Looks promsing?
Assignee | ||
Comment 31•9 years ago
|
||
Bug 1221661: enable ptrace for desktop tests; r?jmaher
Attachment #8698518 -
Flags: review?(jmaher)
Reporter | ||
Comment 32•9 years ago
|
||
Comment on attachment 8698518 [details] MozReview Request: Bug 1221661: enable ptrace for desktop tests; r?jmaher https://reviewboard.mozilla.org/r/28017/#review25095 awesome, thanks!
Attachment #8698518 -
Flags: review?(jmaher) → review+
Assignee | ||
Comment 33•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/40a4725f311a9ebc1f8aead3872f20efce52febc Bug 1221661: enable ptrace for desktop tests; r=jmaher
Comment 34•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/40a4725f311a
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in
before you can comment on or make changes to this bug.
Description
•