Closed Bug 945498 Opened 11 years ago Closed 10 years ago

Use breakpad to handle SIGSYS like other kinds of crash

Categories

(Toolkit :: Crash Reporting, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: jld, Assigned: jld)

References

Details

Attachments

(1 file, 3 obsolete files)

In support of developers finding out why their changes broke tests due to system call filtering, which is a thing that will certainly happen once it's live on Tinderbox, we need crash dumps from SIGSYS (generated for sandbox violations on --enable-content-sandbox-reporter builds).

This will be a little more complicated than it sounds, because (1) you can't return from a SIGSYS to re-raise it, as with a SIGSEGV/SIGBUS; and (2) the existing signal handler in Sandbox.cpp (which should remain) does not play well with others.

Depends on bug 942407 in that the patches will conflict if applied out of order.
In addition to making breakpad handle SIGSYS, I also had to modify the sandbox's handler to deal with the existence of the other handler, and the change wound up being nontrivial.  And there's a one-line change in IPC, to treat SIGSYS as a crash; I'm not sure if that needs an IPC peer or not.
Attachment #8342165 - Flags: review?(ted)
Attachment #8342165 - Flags: review?(gdestuynder)
Comment on attachment 8342165 [details] [diff] [review]
bug945498-breakpad-sigsys-hg0.diff

Review of attachment 8342165 [details] [diff] [review]:
-----------------------------------------------------------------

Looks ok.. however, if we're going to do this, it might be nicer to use SECCOMP_RET_TRAP instead of SECCOMP_RET_KILL.
This will inhibit the syscall (=not executed), let you emulate the syscall return, and will not kill the process. That means if the sigsys isn't handled things still run (even thus forbidden syscalls will not be executed, even in the stacktracers of course).

::: security/sandbox/linux/Sandbox.cpp
@@ +66,5 @@
> +extern "C" {
> +  // To avoid including <sys/syscall.h>, so that we're using only the
> +  // syscall numbers from our tree.  GNU has this in unistd.h, but
> +  // Android seems to do otherwise.
> +  int syscall(int, ...);

have you checked this is really necessary? (iirc bionic generates his own NR_* list and has this)

@@ +87,5 @@
> +
> +  // Try to invoke the next handler.  (This is typically the crash reporter.)
> +  // FIXME: should we also reset the signal mask based on sa_mask and sa_flags?
> +  //   What do we want to happen if the next handler isn't SA_NODEFER and
> +  //   invokes a forbidden syscall?

it probably dies from sig kill

@@ +101,3 @@
>  
> +  // Try to reraise, so the parent sees that this process crashed.
> +  // (If tgkill is forbidden, then that works too.)

s/that works too./this will call the reporter again due to the whitelist violation./
Attachment #8342165 - Flags: review?(gdestuynder) → review+
(In reply to Guillaume Destuynder [:kang] (use NEEDINFO!) from comment #2)
> Looks ok.. however, if we're going to do this, it might be nicer to use
> SECCOMP_RET_TRAP instead of SECCOMP_RET_KILL.
> This will inhibit the syscall (=not executed), let you emulate the syscall
> return, and will not kill the process. That means if the sigsys isn't
> handled things still run (even thus forbidden syscalls will not be executed,
> even in the stacktracers of course).

I'm a little confused here.  In the MOZ_CONTENT_SANDBOX_REPORTER case we're already using SECCOMP_RET_TRAP and raising SIGSYS and so on.  In the non-reporter case, we're using SECCOMP_RET_KILL (which, ignoring my SIGKILL hack for 3.0.x, doesn't involve signals at all) because we don't want a compromised process to be able to keep probing the filter after the first failure.

> ::: security/sandbox/linux/Sandbox.cpp
> @@ +66,5 @@
> > +extern "C" {
> > +  // To avoid including <sys/syscall.h>, so that we're using only the
> > +  // syscall numbers from our tree.  GNU has this in unistd.h, but
> > +  // Android seems to do otherwise.
> > +  int syscall(int, ...);
> 
> have you checked this is really necessary? (iirc bionic generates his own
> NR_* list and has this)

It's not strictly necessary, because I could include sys/syscall.h instead, but the advantage of using only the syscall list in the Gecko tree is that we can't accidentally break the build on some Linux distributions but not others (by referencing a new syscall that's not in our copy).

At the very least, it looks like I need to reword that comment so that it's clearer.

> @@ +87,5 @@
> > +
> > +  // Try to invoke the next handler.  (This is typically the crash reporter.)
> > +  // FIXME: should we also reset the signal mask based on sa_mask and sa_flags?
> > +  //   What do we want to happen if the next handler isn't SA_NODEFER and
> > +  //   invokes a forbidden syscall?
> 
> it probably dies from sig kill

It would have died from SIGKILL if we hadn't intervened, because it would have blocked SIGSYS.  But we've cleared SA_NODEFER, so it will recursively invoke the sandbox's SIGSYS handler.  We could just do a simple recursion check — it would be nice to at least get the syscall number and a useful log message in that case, instead of silent failure to dump core.

Also, I probably meant to fix that FIXME before sending this patch.

> @@ +101,3 @@
> >  
> > +  // Try to reraise, so the parent sees that this process crashed.
> > +  // (If tgkill is forbidden, then that works too.)
> 
> s/that works too./this will call the reporter again due to the whitelist
> violation./

Agreed; will fix.
Comment on attachment 8342165 [details] [diff] [review]
bug945498-breakpad-sigsys-hg0.diff

Review of attachment 8342165 [details] [diff] [review]:
-----------------------------------------------------------------

I don't feel like I know enough here to do an informed review. Can you do a patch against upstream SVN and submit it for review? Some of the Google folks are likely to know more here given Chromium/ChromeOS' use of sandboxing. I can point the right people at it for review.
Attachment #8342165 - Flags: review?(ted)
Breakpad upstream had concerns about the previous patch, and raised the question of whether it was actually the best way to accomplish this.  So I considered alternatives, and found what seems to be a simpler approach (which, conveniently, doesn't alter breakpad itself).
Attachment #8342165 - Attachment is obsolete: true
Attachment #8367707 - Flags: review?(ted)
Attachment #8367707 - Flags: review?(gdestuynder)
Comment on attachment 8367707 [details] [diff] [review]
bug945498-sandbox-crashreport-hg1.diff

Review of attachment 8367707 [details] [diff] [review]:
-----------------------------------------------------------------

Looking good to me, woot :)
Attachment #8367707 - Flags: review?(gdestuynder) → review+
Comment on attachment 8367707 [details] [diff] [review]
bug945498-sandbox-crashreport-hg1.diff

Review of attachment 8367707 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1727,5 @@
> +                              const struct ucontext *context,
> +                              pid_t tid)
> +{
> +  google_breakpad::CrashGenerationClient *client =
> +    google_breakpad::CrashGenerationClient::TryCreate(kMagicChildCrashReportFd);

I think you ought to be able to replace this all with a call to ExceptionHandler::SimulateSignalDelivery:
http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-breakpad/src/client/linux/handler/exception_handler.cc#417

...except it doesn't currently take a ucontext parameter. Do you think it'd be worthwhile to add an overload for that?
Attachment #8367707 - Flags: review?(ted)
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #8)
> I think you ought to be able to replace this all with a call to
> ExceptionHandler::SimulateSignalDelivery:
> http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/google-
> breakpad/src/client/linux/handler/exception_handler.cc#417
> 
> ...except it doesn't currently take a ucontext parameter. Do you think it'd
> be worthwhile to add an overload for that?

That makes sense... but it would be equivalent to making ExceptionHandler::HandleSignal public.  Is there a reason it's currently private?
^
Flags: needinfo?(ted)
I suspect there's no overarching reason, simply "there was no need to provide that as a public API". Changing the visibility of that API because we have a need for it ought to be non-controversial.
Flags: needinfo?(ted)
New version.  Breakpad patch filed as https://breakpad.appspot.com/1114003/.  Also, now logs a message if breakpad returns failure.
Attachment #8367707 - Attachment is obsolete: true
Attachment #8368418 - Flags: review?(ted)
Attachment #8368418 - Flags: review?(gdestuynder)
Comment on attachment 8368418 [details] [diff] [review]
bug945498-sandbox-crashreport-hg2.diff

Review of attachment 8368418 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, that's much simpler.

::: toolkit/crashreporter/nsExceptionHandler.cpp
@@ +1739,5 @@
>  }
>  #endif
>  
> +#ifdef XP_LINUX
> +nsresult WriteMinidumpForSigInfo(int signo, siginfo_t* info, void* uc)

I'd just make this return bool. (I'm not sure why these other not-really-XPCOM methods return a nsresult.)
Attachment #8368418 - Flags: review?(ted) → review+
Attachment #8368418 - Flags: review?(gdestuynder) → review+
Carrying over r=ted r=kang.
Attachment #8368418 - Attachment is obsolete: true
Attachment #8370496 - Flags: review+
C-N note: b2g-inbound probably makes the most sense.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e2a5bb78e8a5
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla29
Target Milestone: mozilla29 → mozilla30
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: