Open Bug 1710940 Opened 4 years ago Updated 8 days ago

Crash event is not sent in isolated process

Categories

(GeckoView :: General, defect, P2)

Unspecified
All
defect

Tracking

(firefox103 disabled, firefox104 disabled, firefox105 disabled, firefox106 disabled, firefox107 disabled)

Tracking Status
firefox103 --- disabled
firefox104 --- disabled
firefox105 --- disabled
firefox106 --- disabled
firefox107 --- disabled

People

(Reporter: owlish, Unassigned, NeedInfo)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [sandboxing] [geckoview:2022q3] [geckoview:m106] [geckoview:m107] [geckoview:2022q4] [fxdroid] [geckoview])

Affected tests:

  • ContentDelegateTest#crashContent
  • ContentDelegateTest.#rashContent_tapAfterCrash
  • GeckoSessionTestRuleTest#contentCrashIgnored
  • ContentCrashTest#crashContent
Severity: -- → S3
Priority: -- → P2
Whiteboard: [geckoview:m91?]
Whiteboard: [geckoview:m91?]
Whiteboard: [geckoview:2022q3?]
Whiteboard: [geckoview:2022q3?] → [sandboxing] [geckoview:2022q3]

Moving isolated process bugs to the new GeckoView::Sandboxing component.

Component: General → Sandboxing
Assignee: nobody → ohall
Priority: P2 → P1
Whiteboard: [sandboxing] [geckoview:2022q3] → [sandboxing] [geckoview:2022q3] [geckoview:m106]

107

Whiteboard: [sandboxing] [geckoview:2022q3] [geckoview:m106] → [sandboxing] [geckoview:2022q3] [geckoview:m106] [geckoview:m107]
Depends on: 1793784
See Also: → 1644486

This bug is related to issues reading the auxiliary vector at /proc/<crashing-pid>/auxv. Currently tested using the breakpad minidump writer.

The rust minidump writer bug of Improving handling of the auxiliary vector on Linux will likely become related to this bug on the change.

P2, waiting for gsvelto's new rust minidump writer to land.

Priority: P1 → P2
Whiteboard: [sandboxing] [geckoview:2022q3] [geckoview:m106] [geckoview:m107] → [sandboxing] [geckoview:2022q3] [geckoview:m106] [geckoview:m107] [geckoview:2022q4]
Blocks: 1698178
Type: task → defect
Assignee: ohall → nobody

FYI I've briefly investigated what would be needed to generate crash reports from isolated processes and it might require quite a bit of work. Going through Google documentation I discovered that it shouldn't be possible to use ptrace() like we currently do to inspect the child process. Specifically we need a broker process for ptrace()-ing the crashed child. That is the parent cannot ptrace_attach() to the child but must instead have the child fork a broker process that will do the ptrace() calls on its behalf.

The architecture of minidump-writer doesn't currently support this and will require a fairly significant refactoring.

Component: Sandboxing → General
Whiteboard: [sandboxing] [geckoview:2022q3] [geckoview:m106] [geckoview:m107] [geckoview:2022q4] → [sandboxing] [geckoview:2022q3] [geckoview:m106] [geckoview:m107] [geckoview:2022q4] [fxdroid] [geckoview]

Hi Gabriele,

We're looking at moving forward with isolated processes on Android.

I did some preliminary testing to see what the current state is of these junit failures. (A lot has changed since last we looked at this a couple of years ago with the rust minidump writer and bug 1620998!)

Here is a try from a couple of days ago running these tests.

In a failing isolated process run we get:

MOZ_CRASH(Crash via about:crashcontent)
...
04-25 14:55:05.682 31343 31343 I Binder:31343_3: type=1400 audit(0.0:3922): avc: denied { signal } for scontext=u:r:untrusted_app:s0:c512,c768 tcontext=u:r:isolated_app:s0:c512,c768 tclass=process permissive=1

// (GeckoCrashHelper never appears in the logs)

In a passing non-isolated process run, we see a lot less avc: denied calls overall, and a stack that looks like this:

MOZ_CRASH(Crash via about:crashcontent)
...
04-24 15:04:07.582 17499 17516 D GeckoCrashHelper: minidump_writer::linux::sections::mappings: retrieving build id for MappingInfo { start_address: 481648033792, size: 24576, system_mapping_info: SystemMappingInfo { start_address: 481648033792, end_address: 481648058368 }, offset: 0, permissions: MMPermissions(READ | WRITE | EXECUTE | PRIVATE), name: Some("/system/bin/app_process64") }

I'm still tracing and understanding how CrashHelper service works.

Could you take a look at this when you have some time?

Flags: needinfo?(gsvelto)

This is a known problem that needed bug 1620998 to be addressed. In short, isolated processes need the ptrace() calls to be proxied, which is something that we don't do yet. Before bug 1620998 we had no way of doing it, but now we should be able to. I need some time to figure this out but first I have to deal with main process crashes so I won't be able to tackle this before 1-2 months.

Flags: needinfo?(gsvelto)

Olivia, is there a list of restrictions placed on isolated processes with regards to system calls? I was trying to design the mechanism that the child processes will use to rendez-vous with the crash helper and let it generate the crash report, and I want to be sure my design does not involve bits that aren't required. Among those things I was wondering if an isolated process would be able to connect() to an AF_UNIX socket. My guess is that it shouldn't be able to, given even our sandbox prevents that unless the destination Unix address is explicitly allowed by the security policy.

Flags: needinfo?(ohall)

Thanks for looking into this!

is there a list of restrictions placed on isolated processes with regards to system calls?

I'll try to find some documentation with some specific information. I've found this short video helpful for providing some general expectations.

My guess is that it shouldn't be able to, given even our sandbox prevents that unless the destination Unix address is explicitly allowed by the security policy.

Yeah, that seems like a reasonable assumption to me too.

I'll see if I can find out some more information and leave my ni?.

:owlish, do you have some additional resources or thoughts here?

Flags: needinfo?(bugzeeeeee)

A further bit of clarification regarding comment 7 and 8. From the information I could find about isolated processes it seems like we wouldn't be able to call ptrace() directly from another non-isolated process. Right now the crash generation flow involves the crash helper process ptrace()-ing a crashed child process to extract the data that will go into the crash report. If this is not possible then we'd need an IPC channel to have been established between the crash helper and the isolated child process (because we can't connect to an external process from the isolated process). When the child process crashes it should fork() itself, call ptrace() on the forked process and then communicate back the results across the IPC channel. If this is the case then it's gonna be quite a bit of work and a dedicated abstraction in the minidump-writer crate for this particular case.

is there a list of restrictions placed on isolated processes with regards to system calls?

You want to look into the AOSP source, e.g. the SELinux policy for isolated process sits in there.

(In reply to Gian-Carlo Pascutto [:gcp] from comment #11)

is there a list of restrictions placed on isolated processes with regards to system calls?

You want to look into the AOSP source, e.g. the SELinux policy for isolated process sits in there.

From what I can tell, the main problem with ptrace is that each isolated process is assigned a unique uid/gid which are different from any regular app or any other isolated service, so ptrace fails the old-fashioned same-uid check (and so do accesses to /proc/%d/{stat,maps,…}, which may cause some other problems outside the scope of this bug; and see also comment #3 about the auxv, I think).

But, I believe that the SELinux policy would also block this, even if there would otherwise be a way for the child process to opt into being ptraced by a different-uid tracer. The only rule I can see that allows an untrusted_app to ptrace applies only if the tracee is also an untrusted_app, not an isolated_app (and the same for isolated_app: it can only ptrace another isolated process). There's also a neverallow rule in the app_domain macro which would cause a policy compilation error if such an allow rule existed.

Thanks for looking into this bug!

Just adding some more docs to add to the reference pile:

Flags: needinfo?(ohall)
You need to log in before you can comment on or make changes to this bug.