Closed Bug 1459655 Opened 2 years ago Closed 2 years ago

Crash in mozilla::ipc::PrincipalInfoToPrincipal

Categories

(Core :: DOM: Core & HTML, defect, P1, blocker)

defect

Tracking

()

VERIFIED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 blocking verified
firefox62 + verified

People

(Reporter: marcia, Assigned: Mardak)

References

Details

(Keywords: crash, regression)

Crash Data

User Story

https://github.com/mozilla/activity-stream/compare/firefox-61b3...66b23ff4695a0edd97dda9db0efa62138b001d56

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-0519cdd9-340c-41e4-8b54-f32000180506.
=============================================================

Seen while looking at nightly crash stats, crashes started on Windows and Mac using 20180504220115: https://bit.ly/2IhHmlz (Windows). Mac crashes: https://bit.ly/2jAoT5A. Moz Crash reason: MOZ_CRASH(Origin must be available when deserialized)

Possible regression range based on Build ID: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a2d1d4158bb4718d8bb31e1284e133aa99273600&tochange=5207b1392b11db534550a5eb801302e6dbb58f95

Top 10 frames of crashing thread:

0 xul.dll mozilla::ipc::PrincipalInfoToPrincipal ipc/glue/BackgroundUtils.cpp:106
1 xul.dll mozilla::ipc::LoadInfoArgsToLoadInfo ipc/glue/BackgroundUtils.cpp:464
2 xul.dll mozilla::net::HttpChannelParent::DoAsyncOpen netwerk/protocol/http/HttpChannelParent.cpp:521
3 xul.dll mozilla::net::HttpChannelParent::Init netwerk/protocol/http/HttpChannelParent.cpp:131
4 xul.dll mozilla::net::NeckoParent::RecvPHttpChannelConstructor netwerk/ipc/NeckoParent.cpp:323
5 xul.dll mozilla::net::PNeckoParent::OnMessageReceived ipc/ipdl/PNeckoParent.cpp:920
6 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:3497
7 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage ipc/glue/MessageChannel.cpp:2141
8 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW ipc/glue/MessageChannel.cpp:2071
9 xul.dll mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1917

=============================================================
OS: Windows 10 → All
Hardware: x86 → All
Christoph, any ideas? Nothing in the files in the top few stack frames seems to have changed on 5-4.
Flags: needinfo?(ckerschb)
Priority: -- → P2
(In reply to Andrew McCreight [:mccr8] from comment #1)
> Christoph, any ideas? Nothing in the files in the top few stack frames seems
> to have changed on 5-4.

I browsed through PrincipalInfoToPrincipal with particular focus around line 106, but nothing obvious seems wrong here to me. Would be curious to see what's going on. I suppose we don't have STR handy, right? Otherwise I would get right to it.
Flags: needinfo?(ckerschb)
Not sure it explains this, but bug 1456986 is in the range and I just backed it out.  Maybe that will help here.
Looks like this is still crashing on Nightlies with that backout included.
I've look at the push list in comment 2 and I don't really see anything that explains this.  If the optional principal value was garbage on the sending side I would expect it to crash in the child process.

Maybe we should add a similar MOZ_CRASH() on missing origin in the PrincipalToPrincipalInfo() serialization side of things so we can get crash reports closer to the place where the bad value is being produced.

Andrea, what do you think?
Flags: needinfo?(amarchesini)
(In reply to Ben Kelly [:bkelly] from comment #5)
> Maybe we should add a similar MOZ_CRASH() on missing origin in the
> PrincipalToPrincipalInfo() serialization side of things so we can get crash
> reports closer to the place where the bad value is being produced.

FWIW, that sounds like a good path forward to me - thanks.
> Andrea, what do you think?

Yes, this seems a excellent starting point. I would also convert MOZ_ASSERT to MOZ_DIAGNOSTIC_ASSERT(mInitialized) in BasePrincipal::GetOrigin and in BasePrincipal::GetOriginNoSuffix
Flags: needinfo?(amarchesini)
This is spiking big-time on 61.0b3. If we can get a diagnostic patch created quickly, we can uplift to Beta as well ASAP.
Severity: critical → blocker
I'll work on creating a diagnostic patch this morning.  Not sure if I should do it in a dependent bug or here, though.
the crash signature is accounting for 70% of browser crashes in 61.0b3. 

this would be a more narrow pushlog than in comment #0 from build 20180504100129 to build 20180504220115:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8994f3&tochange=5207b1
Actually, I see an old bug that could explain why this is triggering a crash instead of failing further upstream.  I'll fix that.
The return value of the triggering principal PrincipalToPrincipalInfo() is not checked here:

https://searchfox.org/mozilla-central/rev/eb6c5214a63e20a3fff455e92c876287a8d2e188/ipc/glue/BackgroundUtils.cpp#316
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Priority: P2 → P1
We should, like, check for error and stuff.
Attachment #8974701 - Flags: review?(amarchesini)
The missed error checking was introduced in bug 1179505.  Not sure what changed recently to make the principal fail to serialize, though.
Blocks: 1179505
Attachment #8974701 - Flags: review?(amarchesini) → review+
I'm going to wait for try in case this causes the error condition to show as an orange somehow.  Although thats probably unlikely given we don't see the crash in automation.
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0e0e6097f282
Check for error when serializing the LoadInfo triggering principal. r=baku
Comment on attachment 8974701 [details] [diff] [review]
Check for error when serializing the LoadInfo triggering principal. r=baku

Approval Request Comment
[Feature/Bug causing the regression]: Partially bug 1179505 for removing some error checking, but also some newer regression which causes PrincipalToPrincipalInfo() return an error code.  (I think).
[User impact if declined]: Not catching the error causes us to hit a MOZ_CRASH() on the parent side.  This is a top crasher in beta.
[Is this code covered by automated tests?]: This problem does not trigger in automation.
[Has the fix been verified in Nightly?]: Not yet.  I only pushed this to inbound just now.  Since this is a top crash, though, we want to get this potential fix in the next beta if possible.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Minimal
[Why is the change risky/not risky?]: One line change to check for a failed error code.
[String changes made/needed]: None
Attachment #8974701 - Flags: approval-mozilla-beta?
Comment on attachment 8974701 [details] [diff] [review]
Check for error when serializing the LoadInfo triggering principal. r=baku

Given the frequency of this crash, let's take the speculative fix now for today's 61.0b4 build.
Attachment #8974701 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
https://hg.mozilla.org/mozilla-central/rev/0e0e6097f282
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
We've still got crashes with the latest nightly including this change, e.g. https://crash-stats.mozilla.com/report/index/c3034268-869c-451f-a0c9-f3a770180511
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla62 → ---
So we're crashing in this code:

      nsAutoCString originNoSuffix;
      rv = principal->GetOriginNoSuffix(originNoSuffix);
      if (NS_WARN_IF(NS_FAILED(rv)) ||
          !info.originNoSuffix().Equals(originNoSuffix)) {
        MOZ_CRASH("Origin must be available when deserialized");
      }

It seems the principal is serializing fine, but its failing this conditional:

  info.originNoSuffix().Equals(originNoSuffix)

So maybe its not that the origin is missing, but that it does not match the origin contained in the spec string.
I'll make a diagnostic patch to assert that the origin and spec match.
I wonder if its something like the principal URL is "about:blank" and the origin string is something like "http://example.com".
Ted, is there any chance you have time to grab a mini dump from one of these crashes and look at the strings being compared just before the crash?  I don't have access to the dumps and I'm also not familiar with the tools to inspect them after the fact.
Flags: needinfo?(ted)
i can reproduce this crash with a particular profile in a portable firefox version & *think* this was introdcued by bug 1438367.
Blocks: 1438367
Flags: needinfo?(edilee)
This might fallout from removing nsIAboutModule::MAKE_LINKABLE.  It seems that impact how the nsIURI is formed:

https://searchfox.org/mozilla-central/rev/3f17a234769d25fca5144ebb8abc8e1cb3c56c16/netwerk/protocol/about/nsAboutProtocolHandler.cpp#44
https://searchfox.org/mozilla-central/rev/3f17a234769d25fca5144ebb8abc8e1cb3c56c16/netwerk/protocol/about/nsAboutProtocolHandler.cpp#139-161

And then the principal code is expecting certain behavior for the about:blank URI:

https://searchfox.org/mozilla-central/rev/3f17a234769d25fca5144ebb8abc8e1cb3c56c16/caps/ContentPrincipal.cpp#107-113

But that is a DEBUG-only assert, so we might be blowing right past that and hitting the MOZ_CRASH instead.

Dropping NI on Ted since we have a reproducing profile.
Flags: needinfo?(ted)
(In reply to Ben Kelly [:bkelly] from comment #25)
> I wonder if its something like the principal URL is "about:blank" and the
> origin string is something like "http://example.com".

Riffing on this given comment #27 and complete lack of STR and so on (so can't really see exactly what's going on), I wonder if this is about about:home vs. moz-safe-about:home or something?
FWIW, if this is broken on about:home today I don't see why it wouldn't also be broken on about:newtab even if we back out the patch for bug 1438367, given that about:newtab essentially loads exactly the same stuff and was already using the same flags and thus type of nested (or not) URI.
I can confirm that this is fixed by backing out bug 1438367.  The following patch adding back MAKE_LINKABLE also fixes it.

We should probably backout for now and re-land with a test that would catch this, though.  We clearly don't have adequate test coverage for this case.  Not sure what the trigger is, but this profile has two blank "highlight" panels.

--- a/browser/components/about/AboutRedirector.cpp
+++ b/browser/components/about/AboutRedirector.cpp
@@ -79,17 +79,17 @@ static const RedirEntry kRedirMap[] = {
     nsIAboutModule::HIDE_FROM_ABOUTABOUT },
   { "sessionrestore", "chrome://browser/content/aboutSessionRestore.xhtml",
     nsIAboutModule::ALLOW_SCRIPT |
     nsIAboutModule::HIDE_FROM_ABOUTABOUT },
   { "welcomeback", "chrome://browser/content/aboutWelcomeBack.xhtml",
     nsIAboutModule::ALLOW_SCRIPT |
     nsIAboutModule::HIDE_FROM_ABOUTABOUT },
   // Actual activity stream URL for home and newtab are set in channel creation
-  { "home", "about:blank", ACTIVITY_STREAM_FLAGS },
+  { "home", "about:blank", ACTIVITY_STREAM_FLAGS | nsIAboutModule::MAKE_LINKABLE },
   { "newtab", "about:blank", ACTIVITY_STREAM_FLAGS },
   { "library", "chrome://browser/content/aboutLibrary.xhtml",
     nsIAboutModule::URI_MUST_LOAD_IN_CHILD |
     nsIAboutModule::URI_SAFE_FOR_UNTRUSTED_CONTENT },
   { "preferences", "chrome://browser/content/preferences/in-content/preferences.xul",
     nsIAboutModule::ALLOW_SCRIPT },
   { "downloads", "chrome://browser/content/downloads/contentAreaDownloadsView.xul",
     nsIAboutModule::ALLOW_SCRIPT },
Assignee: bkelly → edilee
No longer depends on: 1460881
There were some changes that were dependent on MAKE_LINKABLE going away, so if we revert bug 1438367, I'll also need to revert some other things too at the same time. I'm pretty sure we haven't had any other changes to beta, so it should uplift cleanly although it will touch ~100 files.
Flags: needinfo?(edilee)
Blocks: 1460119
User Story: (updated)
Depends on: 1459296
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.

https://reviewboard.mozilla.org/r/243490/#review249348

Looks good to me
Attachment #8975121 - Flags: review?(khudson) → review+
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.

https://reviewboard.mozilla.org/r/243490/#review249386

grumbly rs=me - I'd still like to understand how this is causing crashes, and I still don't see STR in the bug... anyway, I guess that's something for a follow-up.

Also, I'm not sure why all the JS needs wrapping in an IIFE / self-executing function expression. But I mean, it'll work for the purposes of what we need it to do...
Attachment #8975121 - Flags: review?(gijskruitbosch+bugs) → review+
The STR are to open a browser using Philip's profile.  Maybe he can send it to you.  (Or I can when I'm back to my computer.)
Pushed by edilee@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/ad8535a8e153
Restore about flag for about:home. r=k88hudson
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1438367 regressed
[User impact if declined]: Top crasher in beta
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: Still on autoland right now
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Shouldn't be
[Why is the change risky/not risky?]: This is reverting two changes (the regressing bug and a dependent bug) resulting in a state that was previously good on Nightly.
[String changes made/needed]: None
Attachment #8975121 - Flags: approval-mozilla-beta?
https://hg.mozilla.org/mozilla-central/rev/ad8535a8e153
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.

It would appear that we're still seeing crashes with this signature on Nightly builds containing the fix, but possibly at lower rates than before. It's really hard to say, though, given that it's the weekend and usage might just be lower.
Flags: needinfo?(edilee)
Comment on attachment 8975121 [details]
Bug 1459655 - Restore about flag for about:home.

I'm approving this for 61.0b5 because it does seem to have made a difference on Nightly, but I'm not entirely convinced we're out of the woods here yet.
Attachment #8975121 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
OK, so the reason this crashes is that we start an http request where the triggering principal is about:home, and the triggering principal info that HttpChannelParent gets has originNoSuffix of moz-safe-about:home , but uri of about:home, which means the codebase principal that gets constructed gets originNoSuffix of about:home, which mismatches and therefore blows up.

Doing a JS stack dump in the content process for this load, it looks like it's triggered off a restoreTabContent call (see ContentRestore.jsm and content-sessionStore.js) . This makes me believe that what's happening is the following:

0. set session to autorestore
1. with Firefox version X, where about:home is non-nested, the user opens about:home and clicks a link that goes to foo.com
2. get an update ready to version X+1 where about:home *is* nested.
3. restart to update
4. crash, because about:home and moz-safe-about:home are not the same.

Notably, this means that now that we've switched back, people will still hit this crash but for the inverse process, where they opened a link from about:home and the meaning of the about:home origin changed again, so there's a mismatch again, so it's crashy again, which explains comment #40 (obviously there's fewer people with a session with the "new" version of about:home's principal than there are people with the "old" version of about:home's principal).
Would it make sense to just special case that specific about:home case for some time?
Depends on: 1461407
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #44)
> Would it make sense to just special case that specific about:home case for
> some time?

Maybe, but it'd be nice not to get caught out like this again. Further discussion is in bug 1461407.
I think we have enough crash data to call this verified at this point. Note that there's still an open question about uplifting bug 1461407 to Beta, but the crash rates are definitely tailing off now and we shouldn't see this on release at all after 61 ships.
Status: RESOLVED → VERIFIED
Flags: needinfo?(edilee)
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.