Closed Bug 1564107 Opened 5 years ago Closed 3 years ago

Thunderbird Crash in [@ mozilla::ipc::PrincipalInfoToPrincipal]

Categories

(Core :: DOM: Service Workers, defect, P2)

64 Branch
x86
All
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
thunderbird_esr68 ? affected

People

(Reporter: wsmwk, Unassigned)

References

Details

(Keywords: crash, topcrash-thunderbird, Whiteboard: [tbird topcrash])

Crash Data

#5 ranked crash for beta.

Might have been a big upswing in the beta 65 era, but hard to say, because we don't have crash data from 2018 https://crash-stats.mozilla.com/signature/?product=Thunderbird&signature=mozilla%3A%3Aipc%3A%3APrincipalInfoToPrincipal&date=%3E%3D2019-01-08T13%3A23%3A00.000Z&date=%3C2019-07-08T13%3A23%3A00.000Z#summary example: bp-d426a576-f8f3-43a2-a98b-5ec220190208 build ID 20190123180341

This bug is for crash report bp-e4dd235a-703d-4109-bd80-02d3a0190708.

Top 10 frames of crashing thread:

0 xul.dll mozilla::ipc::PrincipalInfoToPrincipal ipc/glue/BackgroundUtils.cpp:98
1 xul.dll mozilla::dom::ClientInfo::GetPrincipal dom/clients/manager/ClientInfo.cpp:96
2 xul.dll nsGlobalWindowInner::EnsureClientSource dom/base/nsGlobalWindowInner.cpp:1721
3 xul.dll nsGlobalWindowInner::ExecutionReady dom/base/nsGlobalWindowInner.cpp:1802
4 xul.dll nsGlobalWindowOuter::SetNewDocument dom/base/nsGlobalWindowOuter.cpp:2224
5 xul.dll nsDocumentViewer::InitInternal layout/base/nsDocumentViewer.cpp:966
6 xul.dll nsDocumentViewer::Init layout/base/nsDocumentViewer.cpp:717
7 xul.dll nsDocShell::SetupNewViewer docshell/base/nsDocShell.cpp:8446
8 xul.dll nsDocShell::Embed docshell/base/nsDocShell.cpp:6355
9 xul.dll nsDocShell::CreateContentViewer docshell/base/nsDocShell.cpp:8261

bp-1239c33e-94c5-414b-967a-a2f390190707 another beta

Heaps of Firefox crashes in the reports. Why don't we ship this off to Firefox and ask people who work on DOM IPC like Nika?

Flags: needinfo?(nika)

Hopefully that will help. But note, the Firefox crashes don't an increase at the same time as the Thunderbird crashes. And the Firefox crash rate on a per user basis is nowhere near as high as Thunderbird.

The specific case here is a deserialization error when decoding a codebase principal. In effect what this code is doing is something like this (pseudocode):

// sending side
// true == aPrincipal->GetIsCodebasePrincipal();
nsCString spec = aPrincipal->GetURI()->GetSpec();
nsCString originNoSuffix = aPrincipal->GetOriginNoSuffix();
OriginAttributes attrs = aPrincipal->GetOriginAttributes();

https://searchfox.org/mozilla-central/rev/f711552789d6308796e34170ddaabead044adc1e/ipc/glue/BackgroundUtils.cpp#316-369

// receiving side
nsCOMPtr<nsIPrincipal> principal =
  BasePrincipal::CreateCodebasePrincipal(NS_NewURI(spec), attrs);
MOZ_ASSERT(principal->GetOriginNoSuffix() == originNoSuffix);

https://searchfox.org/mozilla-central/rev/f711552789d6308796e34170ddaabead044adc1e/ipc/glue/BackgroundUtils.cpp#79-123

This assertion can fail if a couple of invariants aren't held, such as:

  1. Round-tripping a URI to its spec is lossy - if this is the case, a different URI may arrive at the other side, and thus the wrong origin is selected
  2. Getting the originNoSuffix value for a URI can vary over time.

In this code's case, the time when the PrincipalInfo is deserialized can be far in time from when it was serialized, meaning that it's possible that dependencies on ambient state could change things. So anything which depends on ambient mutable state could change here: https://searchfox.org/mozilla-central/rev/da3f3eaaacb6fb344fd21ac29ace2da0e33f12d3/caps/ContentPrincipal.cpp#101

I see a few places where we depend on local state there (beyond the more direct issues like protocols changing their ORIGIN_IS_FULL_SPEC flag bit):

  1. If a protocol dynamically changes information (such as the ORIGIN_IS_FULL_SPEC flag), the origin can change.
  2. If the security.fileuri.strict_origin_policy pref changes, it will change the principals of file URIs.
  3. Loading blob principal information. This requires reading from the ambient blob URL state. If the blob has been revoked this method may change the principal which is returned from a content principal to a null principal, causing an assertion. https://searchfox.org/mozilla-central/rev/da3f3eaaacb6fb344fd21ac29ace2da0e33f12d3/caps/ContentPrincipal.cpp#182-186

Not sure which issue thunderbird is running into in particular, but ni? :baku to take a look at the blob URI situation.

Flags: needinfo?(nika) → needinfo?(amarchesini)

Thanks, Nika. I didn't see that we MOZ_CRASH() on "Origin must be available when deserialized" here:
https://searchfox.org/mozilla-central/rev/da3f3eaaacb6fb344fd21ac29ace2da0e33f12d3/ipc/glue/BackgroundUtils.cpp#98

I guess this needs closer looking at by a backend guy.

Flags: needinfo?(benc)

Still #5 crash for Thunderbird

For Thunderbird this goes back to at least version 66 (but I say this without having compared stacks across versions)
bp-f26af196-4f89-4afd-8e23-de0dd0190318 win10 Crash Address 0x10376263
bp-fedbe70c-6d58-4187-974b-68d510190313 mac Crash Address 0x0
bp-463ca1a3-2d5a-4251-a893-155700190225 win10 Crash Address 0x52e85d93

More recent - version 69
bp-5af27bfd-58e3-4564-8b07-0616d0190822 win7 Crash Address 0x7fece1b810e
bp-43575dee-a6c7-4d46-9618-3a5030190807 Mac

See Also: → tb68found
Version: 68 → 64
See Also: → 1578458
Flags: needinfo?(benc)
Flags: needinfo?(amarchesini)

Do we really expect bug 1578458 to help in the Thunderbird case?

Flags: needinfo?(benc)
Flags: needinfo?(amarchesini)

(In reply to Wayne Mery (:wsmwk) from comment #6)

Do we really expect bug 1578458 to help in the Thunderbird case?

bug 1578458 is set at P5 (not very encouraging), plus bug 1578458 comment 13 "Looking at crash-stats, the crash is hit in the field from a reasonable variety of callers - indicating the calling code here is buggy in quite a few places."

Given that comment, are we not at fault?

Flags: needinfo?(benc) → needinfo?(mkmelin+mozilla)

Looking at e.g. bp-e5752ece-4367-438e-9b45-a88bf0200107 probably not c-c at fault. We have an m_channelListener and go from there. If something's wrong in c-c, I don't know what it is.

Flags: needinfo?(mkmelin+mozilla)
Component: General → DOM: Core & HTML
Product: Thunderbird → Core
Summary: Crash in [@ mozilla::ipc::PrincipalInfoToPrincipal] → Thunderbird Crash in [@ mozilla::ipc::PrincipalInfoToPrincipal]
Whiteboard: [tbird topcrash]
Version: 64 → 64 Branch

In the Firefox crashes, such as https://crash-stats.mozilla.org/report/index/2cb1e2b9-ce2a-41f9-8953-fbf030200309 , I see a HTTP cookie related signature, where the crash comes from
https://searchfox.org/mozilla-central/source/netwerk/cookie/CookieJarSettings.cpp#281

That doesn't seem like a blob URL concern, which is what the present needinfo is about. Maybe it's a different bug.

Either way, it's unclear to me why the bug is in this component. The Thunderbird crashes implicate code that was added for Service Workers and the Firefox crashes implicate cookie code. Sending over to Service Workers for now.

Component: DOM: Core & HTML → DOM: Service Workers

(In reply to Henri Sivonen (:hsivonen) from comment #9)

The Thunderbird crashes implicate code that was added for Service Workers.

I cannot see this, at least in the 4-5 crashes I clicked on.

And we have several bugs talking about PrincipalInfoToPrincipal crashes, all assigned to different components. Probably someone should just dig into it regardless of component ownership.

See Also: → 1609951

Maybe mozilla::ipc::PrincipalInfoToPrincipal should be added to the Socorro prefix list? However, I have no idea at all how the crash reporting works for Thunderbird. In the crash stats table above, Thunderbird isn't mentioned at all.

Following the link from "Signature" in the bug metadata, the resulting list is mostly Thunderbird.

(In reply to Jens Stutte [:jstutte] from comment #10)

(In reply to Henri Sivonen (:hsivonen) from comment #9)

The Thunderbird crashes implicate code that was added for Service Workers.

I cannot see this, at least in the 4-5 crashes I clicked on.

I was referring to mozilla::dom::ClientInfo as code added for Service Workers. E.g.:
https://crash-stats.mozilla.org/report/index/9b7eb840-e904-45ab-9758-d76450200309

(In reply to Simon Giesecke [:sg] [he/him] from comment #11)

Maybe mozilla::ipc::PrincipalInfoToPrincipal should be added to the Socorro prefix list?

If doing so presents a more accurate crash signature I would support doing this.

However, I have no idea at all how the crash reporting works for Thunderbird.

Works the same as Firefox

(In reply to Henri Sivonen (:hsivonen) from comment #13)

I was referring to mozilla::dom::ClientInfo as code added for Service Workers. E.g.:
https://crash-stats.mozilla.org/report/index/9b7eb840-e904-45ab-9758-d76450200309

I see. Digging a bit further I stumbled over:

#ifdef FUZZING
        return nullptr;
#else
        MOZ_CRASH("Origin must be available when deserialized");
#endif /* FUZZING */

introduced by bug 1598472. So if we can safely return nullptr in case of fuzzing, probably we should do so also normally? Or remove this and try to reduce a testcase with the fuzzer if we want to enforce this invariant ?

Flags: needinfo?(jkratzer)
Priority: -- → P2

(In reply to Jens Stutte [:jstutte] from comment #15)

introduced by bug 1598472. So if we can safely return nullptr in case of fuzzing, probably we should do so also normally?

I think it doesn't mean that returning nullptr is safe--just a level of unsafety that fuzzing is willing to overlook in order to find out things. It could be bad to return nullptr in the wild.

(In reply to Henri Sivonen (:hsivonen) from comment #16)

(In reply to Jens Stutte [:jstutte] from comment #15)

introduced by bug 1598472. So if we can safely return nullptr in case of fuzzing, probably we should do so also normally?

I think it doesn't mean that returning nullptr is safe--just a level of unsafety that fuzzing is willing to overlook in order to find out things. It could be bad to return nullptr in the wild.

We've hit this before in fuzzing, that is why the code is as it is, but recent discussions have concluded that this code should not MOZ_CRASH at all (via IPC, because it causes instabilities like this one and essentially undos the process separation). So yes, we probably want to return nullptr all the time, but we need to check that callers handle this properly.

Flags: needinfo?(jkratzer)

(In reply to Henri Sivonen (:hsivonen) from comment #16)
this code should not MOZ_CRASH at all (via IPC, because it causes instabilities like this one and essentially undos the process separation).

Perry, what do you think? I totally second that code called via IPC should never crash regardless of outer order invariants violated or not. Do you suspect side effects?

Flags: needinfo?(amarchesini) → needinfo?(perry)

That makes sense to me if it's the case that de-serialization of the PrincipalInfo is failing. Not crashing to maintain process separation is a good point too. I don't suspect side effects as long as the callers check if PrincipalInfoToPrincipal fails, and from a quick spot check it looks like most but not all do check this.

Flags: needinfo?(perry)

I think the rationale for crashing was that a mismatch in origins coming from a content process suggests the content process has been compromised (see https://bugzilla.mozilla.org/show_bug.cgi?id=1340163#c37 and https://bugzilla.mozilla.org/show_bug.cgi?id=1340163#c55). In general, the strategy for such cases has been to terminate the potentially rogue content process, not the parent process. However, the code in question didn't/doesn't have easy access to the actor that provided the PrincipalInfo, so it just opted to crash the parent process.

I should also note that with bug 922464 fixed via bug 1536744, it's not clear we ever need to ship the origin separately in PrincipalInfo anymore since it should be able to be derived on any thread. (A motivation for PBackground APIs previously was that they couldn't parse URIs off the main thread and so needed to have the origin pre-computed, but this also needed to be enforced by fatal assertions for security reasons, as it would be very nice for an attacker to say the URI was evil.com but the origin was good.com.) It's mainly just a question of actually performing the cleanup at this point.

Blocks: 1634062

Would it surprise anyone if the signature is gone for Thunderbird after 77beta buildid 20200511021338, i.e. 77.0b2? Half the crashes via nsImapCacheStreamListener and half via nsMailboxProtocol::ReadMessageResponse

No crashes with 77.0b3, built 5/18/2020 - comm-beta range https://hg.mozilla.org/releases/comm-beta/pushloghtml?fromchange=THUNDERBIRD_77_0b2_RELEASE&tochange=THUNDERBIRD_77_0b3_RELEASE&full=1

(In reply to Wayne Mery (:wsmwk) from comment #22)

Would it surprise anyone if the signature is gone for Thunderbird after 77beta buildid 20200511021338, i.e. 77.0b2? Half the crashes via nsImapCacheStreamListener and half via nsMailboxProtocol::ReadMessageResponse

Not if those builds include the fix from bug 1635399 which removed all calls to MOZ_CRASH from PrincipalInfoToPrincipal in favor of propagating errors to the callers and for them to handle things.

Status: NEW → RESOLVED
Closed: 3 years ago
Depends on: 1635399
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.