Closed Bug 1512356 Opened 10 months ago Closed 10 months ago

Massive Mozmill test failure on 2018-12-06: PROCESS-CRASH | attachment | application crashed [@ IPC::ParamTraits<nsIPrincipal>::Write(IPC::Message*, nsIPrincipal*)] - TB crashes when displaying message

Categories

(Thunderbird :: General, defect, blocker)

defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 65.0

People

(Reporter: darktrojan, Assigned: jorgk)

References

Details

Attachments

(2 files, 5 obsolete files)

Lots of MozMill tests are failing with this:

> PROCESS-CRASH | attachment | application crashed [@ IPC::ParamTraits<nsIPrincipal>::Write(IPC::Message*, nsIPrincipal*)]

Here's what landed on mozilla-central, I think it's highly likely to be in the later merge: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=371779a8067c91a5122047682bd359e17c&tochange=643a4a6bbfe98aacb7f285b8f9e0a00179

I've started a Try run to prove or disprove that theory: https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=3f5f10f1f27dff9bbcfd984a99a58ee018936eec
Try run's green. This is well outside my area, so I'm going to leave it at that.
Thanks, so coming from here:
f3c4afc3c780 Nika Layzell — Bug 1512085 - Don't overlap IDs between content and middleman process, r=bhackett
dc7076b4b610 Nika Layzell — Bug 1511237 - Store a TabChild reference in cached docshells, r=bzbarsky, r=smaug
625c8d800f59 Nika Layzell — Bug 1510460 - Record the current WindowGlobal on ChromeBrowsingContext, r=farre
1c1f445a9b6a Nika Layzell — Bug 1509362 - Don't crash when constructing actor during content shutdown, r=jld
64cab8bb3dc7 Nika Layzell — Bug 1500950 - Expose rootFrameLoader on WindowGlobalParent, r=farre
f824e2415aa6 Nika Layzell — Bug 1500949 - Include innerWindowId/outerWindowId in PWindowGlobal, r=farre
5bab2966c5c6 Nika Layzell — Bug 1500948 - Expose BrowsingContext on nsFrameLoader objects, r=farre
6351599e53d4 Nika Layzell — Bug 1500944 - Part 2: Expose WindowGlobal actors to Chrome JS, r=farre
80ae3a4f8e44 Nika Layzell — Bug 1500944 - Part 1: Store the set of active WindowGlobalParent objects in ChromeBrowsingContext, r=farre
20554da7b303 Nika Layzell — Bug 1487249 - Part 3: Add the WindowGlobal actor representing a single window global, r=bzbarsky
222e2bcb25df Nika Layzell — Bug 1487249 - Part 2: Add a new PInProcess actor to manage intra-thread actors, r=mccr8
8976adedb5cb Nika Layzell — Bug 1487249 - Part 1: Allow MessageChannel objects to be created within a thread, r=mccr8
TB is pretty useless like that, if you click a message, it crashes :-(
Severity: critical → blocker
Hit MOZ_CRASH(Unable to serialize principal.) at c:/mozilla-source/comm-central/dom/ipc/PermissionMessageUtils.cpp:23
#01: mozilla::ipc::PInProcessChild::SendPWindowGlobalConstructor (c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\ipc\ipdl\pinprocesschild.cpp:96)
#02: mozilla::dom::WindowGlobalChild::Create (c:\mozilla-source\comm-central\dom\ipc\windowglobalchild.cpp:55)
#03: nsGlobalWindowInner::InnerSetNewDocument (c:\mozilla-source\comm-central\dom\base\nsglobalwindowinner.cpp:1651)
#04: nsGlobalWindowOuter::SetNewDocument (c:\mozilla-source\comm-central\dom\base\nsglobalwindowouter.cpp:1942)
#05: nsDocumentViewer::InitInternal (c:\mozilla-source\comm-central\layout\base\nsdocumentviewer.cpp:969)
#06: nsDocumentViewer::Init (c:\mozilla-source\comm-central\layout\base\nsdocumentviewer.cpp:715)
#07: nsDocShell::SetupNewViewer (c:\mozilla-source\comm-central\docshell\base\nsdocshell.cpp:8467)
#08: nsDocShell::Embed (c:\mozilla-source\comm-central\docshell\base\nsdocshell.cpp:6358)
#09: nsDocShell::CreateContentViewer (c:\mozilla-source\comm-central\docshell\base\nsdocshell.cpp:8277)
#10: nsDSURIContentListener::DoContent (c:\mozilla-source\comm-central\docshell\base\nsdsuricontentlistener.cpp:182)
#11: nsDocumentOpenInfo::TryContentListener (c:\mozilla-source\comm-central\uriloader\base\nsuriloader.cpp:750)
#12: nsDocumentOpenInfo::DispatchContent (c:\mozilla-source\comm-central\uriloader\base\nsuriloader.cpp:422)
#13: nsDocumentOpenInfo::OnStartRequest (c:\mozilla-source\comm-central\uriloader\base\nsuriloader.cpp:301)
#14: nsMsgProtocol::OnStartRequest (c:\mozilla-source\comm-central\comm\mailnews\base\util\nsmsgprotocol.cpp:323)
#15: nsInputStreamPump::OnStateStart (c:\mozilla-source\comm-central\netwerk\base\nsinputstreampump.cpp:489)
#16: nsInputStreamPump::OnInputStreamReady (c:\mozilla-source\comm-central\netwerk\base\nsinputstreampump.cpp:413)
#17: mozilla::SlicedInputStream::OnInputStreamReady (c:\mozilla-source\comm-central\xpcom\io\slicedinputstream.cpp:419)
#18: nsInputStreamReadyEvent::Run (c:\mozilla-source\comm-central\xpcom\io\nsstreamutils.cpp:93)
#19: nsThread::ProcessNextEvent (c:\mozilla-source\comm-central\xpcom\threads\nsthread.cpp:1161)

Debug build crashes on:

  nsCString principalString;
  nsresult rv = NS_SerializeToString(aParam, principalString);
  if (NS_FAILED(rv)) {
    MOZ_CRASH("Unable to serialize principal.");
    return;
  }

Seems like opt build crash around the same area, too:
PROCESS-CRASH | attachment | application crashed [@ IPC::ParamTraits<nsIPrincipal>::Write(IPC::Message*, nsIPrincipal*)]

Sigh. I'll just NI some people and see who replies first.
Flags: needinfo?(nika)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)
Flags: needinfo?(afarre)
Summary: Massive Mozmill test failure on 2018-12-06 → Massive Mozmill test failure on 2018-12-06: PROCESS-CRASH | attachment | application crashed [@ IPC::ParamTraits<nsIPrincipal>::Write(IPC::Message*, nsIPrincipal*)]
Sorry, I didn't read that correctly, it's a MOZ_CRASH() not a MOZ_ASSERT(), so of course that will crash opt builds, too. That code in PermissionMessageUtils.cpp has been there for a while, so it in itself is not the cause of the problem.

We had other issues with crashing on not being able to serialise URIs in bug 1483043:
https://hg.mozilla.org/mozilla-central/rev/92c10215a3f6

Maybe it's something similar.
Duplicate of this bug: 1512358
Summary: Massive Mozmill test failure on 2018-12-06: PROCESS-CRASH | attachment | application crashed [@ IPC::ParamTraits<nsIPrincipal>::Write(IPC::Message*, nsIPrincipal*)] → Massive Mozmill test failure on 2018-12-06: PROCESS-CRASH | attachment | application crashed [@ IPC::ParamTraits<nsIPrincipal>::Write(IPC::Message*, nsIPrincipal*)] - TB crashes when displaying message
Poking around the code at the lines in the stack, I have the impression that his is mostly related to e10s since there are a few XRE_IsParentProcess() calls.
Just a tip for the TB team wishing to work locally: hg qbackout -s -r 8976adedb5cb:f3c4afc3c780 should get them going:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=aed587b82a6189ff613c02c8e9a4c15aff620559
I had a chat with Nika on IRC. This is what I understood:

Documents opened via our various mailnews URLs, most derived from nsMsgMailNewsUrl, somehow get a "codebase" principal (also called content principal?). Since the underlying URL can't be serialised, that is, doesn't implement
https://searchfox.org/mozilla-central/rev/adec563403271e78d1a057259b3e17fe557dfd91/xpcom/ds/nsISerializable.idl#11-32
the principal can't be serialised, which now appears to be a requirement, and we fail.

Some quotes from the chat:
21:05:16 - nika: jorgk: I'm guessing we have a content principal here which is failing to serialize because its URI isn't serializing
21:48:44 - nika: jorgk: ... if you create a document from a URI, we create a principal for it
21:53:39 - nika: jorgk: ... We try to create a codebase principal for every document load

Apparently this code allows for the content principal creation:
https://searchfox.org/mozilla-central/rev/adec563403271e78d1a057259b3e17fe557dfd91/caps/ContentPrincipal.cpp#115-116
Attached patch make-url-serialisable.patch (obsolete) — Splinter Review
OK, here's an attempt to make nsMsgMailNewsUrl serialisable. Doesn't work, still crashes here:
Hit MOZ_CRASH(Unable to serialize principal.) at c:/mozilla-source/comm-central/
dom/ipc/PermissionMessageUtils.cpp:23

Reading
https://searchfox.org/mozilla-central/rev/adec563403271e78d1a057259b3e17fe557dfd91/xpcom/ds/nsISerializable.idl#28
we also must implement nsIClassInfo which I haven't done.
Attached patch make-url-serialisable.patch (v2) (obsolete) — Splinter Review
nsIClassInfo needed to be implemented. Now we get a bit further. Still crashing though.
Attachment #9029890 - Attachment is obsolete: true
Now it crashes since nsMsgMailNewsUrl::Read() finds m_baseURL==null. Just returning a read error doesn't help since then we get to:

[13788, Main Thread] ###!!! ASSERTION: IPDL error: "Error deserializing 'principal' (nsIPrincipal) member of 'WindowGlobalInit'". Intentionally crashing.: 'Error', file c:/mozilla-source/comm-central/ipc/glue/ProtocolUtils.cpp, line 254
Hit MOZ_CRASH(IPC FatalError in the parent process!) at c:/mozilla-source/comm-central/ipc/glue/ProtocolUtils.cpp:259
#01: mozilla::ipc::IPDLParamTraits<mozilla::dom::WindowGlobalInit>::Read (c:\mozilla-source\comm-central\obj-x86_64-pc-mingw32\ipc\ipdl\domtypes.cpp:1962)

So looks like we need to read and write this stuff properly as per IRC:
21:08:41 - nika: Given that you don't have e10s you could also do the simpler thing and just put them in a global table
21:08:42 - jorgk: the URI objects are extensions of nsIURL and have some other stuff tagged on the side
21:08:49 - nika: And then store the ID to look up
21:09:10 - nika: We just need to be able to write it to a channel & pull it out of one.
21:09:29 - nika: The channel will stay on the main thread, we just need to be able to serialize to it
Digging through bug 1447272 (bug 1447272 comment #5 and further down), the Read/Write stuff needs a different boilerplate which I've added here. For some reason, we're getting into nsMsgMailNewsUrl::Read() and crash.
Here's another version which doesn't work since nsStandardURL::Read() can't be called directly.

Maybe Valentin can lend a hand as to how that Read()/Write() needs to be implemented.
Assignee: nobody → jorgk
Flags: needinfo?(valentin.gosu)
Attachment #9029893 - Attachment description: make-url-serialisable.patch → make-url-serialisable.patch (v2)
Attachment #9029901 - Attachment description: make-url-serialisable.patch → make-url-serialisable.patch (v4)
Following Nika's suggestion of a "global table" I've just used memory in general as my global table. This is a giant hack, but at least messages display again.

Let's see:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=afac1cc2d6c612f3d0f61406e7bb8cbd850e0434
OK, the hacky approach crashes, most likely due to the incorrect ref-counting. So here's something based on:

21:07:44 - nika: Even if it's just by stringifying them to their spec and re-parsing them

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=d57a8d2f4869744afcd0e18a29629ec2ad1470d4
With that patch mail in local folders displays OK, also news and feed messages. IMAP with messages in non-offline folders crashes on some MOZ_ASSERT() in IMAP caching, that's most likely due to the fact that the code only partly writes/reads the nsMsgMailNewsUrl members like m_memCacheEntry are ignored so far. Looks like the try is turning green, since there are no IMAP MozMill tests, so I'll land this and I'll have to follow-up with more.

Clearing a few NIs here. Leaving Nika, Boris and Valentin. I'd just like to know whether we can drop the requirement of having the URIs serialisable if they are associated with a codebase principal:
https://searchfox.org/mozilla-central/rev/adec563403271e78d1a057259b3e17fe557dfd91/caps/ContentPrincipal.cpp#543-549

And if so, how can we do the serialisation better, Valentin?
Flags: needinfo?(bugs)
Flags: needinfo?(afarre)
Keywords: leave-open
Attachment #9029893 - Attachment is obsolete: true
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e993eb9d7c1b
Make nsMsgMailNewsUrl serialisable in a rudimentary way. rs=bustage-fix
Comment on attachment 9029937 [details] [diff] [review]
make-url-serialisable.patch (v6) - rudimentary serialisation [landed in comment #19]

Valentin, can you please comment on this. Also on (v3) and (v4).
Attachment #9029937 - Attachment description: make-url-serialisable.patch (v6) → make-url-serialisable.patch (v6) - rudimentary serialisation [landed in comment #19]
Flags: needinfo?(valentin.gosu)
Attachment #9029937 - Flags: feedback?(valentin.gosu)
Attachment #9029896 - Attachment description: make-url-serialisable.patch (v3) → make-url-serialisable.patch (v3) - with mutator boilerplate
Attachment #9029901 - Attachment description: make-url-serialisable.patch (v4) → make-url-serialisable.patch (v4) - creating new about:blank to read into
Comment on attachment 9029930 [details] [diff] [review]
make-url-serialisable.patch (v5) - hack "global table"

This is a hacky "global table". How could that be done correctly?
Attachment #9029930 - Attachment description: make-url-serialisable.patch (v5) → make-url-serialisable.patch (v5) - hack "global table"
Flags: needinfo?(nika)
Attachment #9029930 - Flags: feedback?(nika)
(In reply to Jorg K (GMT+1) from comment #14)
> Created attachment 9029896 [details] [diff] [review]
> make-url-serialisable.patch (v3) - with mutator boilerplate
> 
> Digging through bug 1447272 (bug 1447272 comment #5 and further down), the
> Read/Write stuff needs a different boilerplate which I've added here. For
> some reason, we're getting into nsMsgMailNewsUrl::Read() and crash.

This patch is mostly correct - similar to what we do in nsStandardURL.
It's the deserialization code that should use a mutator, instead of instantiating nsMsgMailNewsUrl and reading it.
Do you have a stack trace?
Comment on attachment 9029937 [details] [diff] [review]
make-url-serialisable.patch (v6) - rudimentary serialisation [landed in comment #19]

Review of attachment 9029937 [details] [diff] [review]:
-----------------------------------------------------------------

This seems mostly correct, with the note that nsMsgMailNewsUrl::Read should not be called directly, as nsIURIs are expected to be immutable - it should be called via the mutator, as in v3.

I think the reason for the crash was that nsBinaryInputStream::ReadObject is calling do_CreateInstance(cid) - the cid for nsMsgMailNewsUrl should create a mutator, not a nsIURI - just as we did for m-c in bug 1442239.

Also, I haven't looked too carefully, but things that extend nsMsgMailNewsUrl such as nsSmtpUrl will probably to override the implementation of Write and Read/ReadPrivate.
Attachment #9029937 - Flags: feedback?(valentin.gosu) → feedback+
(In reply to Valentin Gosu [:valentin] from comment #22)
> It's the deserialization code that should use a mutator, instead of
> instantiating nsMsgMailNewsUrl and reading it.
> Do you have a stack trace?
This is about (v3):

You mean, no one should actually be calling nsMsgMailNewsUrl::Read(). Yes, we hit the MOZ_ASSERT_UNREACHABLE() here:

Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Use nsIURIMutator.read() instead), at c:/mozilla-source/comm-central/comm/mailnews/base/util/nsMsgMailNewsUrl.cpp:91
#01: mozilla::ContentPrincipal::Read (c:\mozilla-source\comm-central\caps\contentprincipal.cpp:544)
#02: nsBinaryInputStream::ReadObject (c:\mozilla-source\comm-central\xpcom\io\nsbinarystream.cpp:907)
#03: NS_DeserializeObject (c:\mozilla-source\comm-central\netwerk\base\nsserializationhelper.cpp:40)
#04: IPC::ParamTraits<nsIPrincipal>::Read (c:\mozilla-source\comm-central\dom\ipc\permissionmessageutils.cpp:49)

So looks like we have two problems here:
1) The "read technicalities", via mutator, etc.
   I'm looking through https://hg.mozilla.org/mozilla-central/rev/61aa0247279a from bug 1442239
   and basically you've switched various CIDs to creating a mutator.
2) How much of the nsMsgMailNewsUrl needs to be written and read for the application logic to function.
   There are many not-so-easy-to-serialise members, see:
   https://searchfox.org/comm-central/rev/16003487aabb9202fe21e4fc43777478d0dd916e/mailnews/base/util/nsMsgMailNewsUrl.h#105-130
   For a reason we have so far avoided implementing serialisation for those URLs.

That's why instead of really serialising it, storing it away in a "global table" has some appeal. Even the hacky (v5) works somewhat, modulo some memory management issues.

I was just looking into the GetClassIDNoAlloc() since so far that always returns kNS_MAILBOXURL_CID when we have the following derived classes: mailbox, imap, nntp, smtp, pop3 and the JS account URL.
Forgot the link:
https://searchfox.org/comm-central/search?q=public+nsMsgMailNewsUrl&case=false&regexp=false&path=
and those classes are even harder to serialise, just look at:
https://searchfox.org/comm-central/rev/16003487aabb9202fe21e4fc43777478d0dd916e/mailnews/imap/src/nsImapUrl.h#68-123
https://searchfox.org/comm-central/rev/16003487aabb9202fe21e4fc43777478d0dd916e/mailnews/compose/src/nsSmtpUrl.h#123-142

So really think we need to determine which parts of the URL need to be maintained in what Nika phrased this way:
21:09:10 - nika: We just need to be able to write it to a channel & pull it out of one.
21:09:29 - nika: The channel will stay on the main thread, we just need to be able to serialize to it

So what exactly is used from the object being pulled out?
> 1) The "read technicalities", via mutator, etc.
>    I'm looking through
> https://hg.mozilla.org/mozilla-central/rev/61aa0247279a from bug 1442239
>    and basically you've switched various CIDs to creating a mutator.

So, the main cause for all these issues is that nsIURIs have certain properties in m-c, that c-c URIs currently don't. m-c nsIURIs are completely immutable, and may be used off-main-thread. Sooner or later we will get to a point where a c-c implemented nsIURI is used off-main-thread on m-c code. 

> 2) How much of the nsMsgMailNewsUrl needs to be written and read for the
> application logic to function.
>    There are many not-so-easy-to-serialise members, see:

We had a similar issue with nsHostObjectURI (bug 1416791) which we fixed by removing the mutable/non-serializable members from the URI implementations - and moving them to another hashtable. So instead of uri.m_imapServerSink you'd have something like GetAssociatedInfo(uri)->m_imapServerSink and you might not even need to have any nsIURI implementations in c-c code.
I think we don't stand a chance to serialise all our six URLs, so here some infrastructure that will allow us to just create a generic nsMsgMailNewsUrl object and pass its class ID back. It's more correct than what we had using kNS_MAILBOXURL_CID, which is one of the six.

As far as I can see, that also fixes the IMAP problems I mentioned.

Valentin, how much do you insist on doing things right?

Maybe Neil can take a look since he's the last "real" Mailnews man standing ;-)

Ben, we had some more trouble with M-C now requiring to be able to serialise our URLs as part of serialising codebase principals.

You should try whether JS Account and OWL still work after these changes.

Try:
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=bbf42a591b9213b06c4f40d6ed3fcdca94dcb17e
Attachment #9029896 - Attachment is obsolete: true
Attachment #9029901 - Attachment is obsolete: true
Attachment #9029930 - Attachment is obsolete: true
Attachment #9029930 - Flags: feedback?(nika)
Flags: needinfo?(bzbarsky) → needinfo?(ben.bucksch)
Attachment #9029968 - Flags: review?(neil)
Comment on attachment 9029968 [details] [diff] [review]
1512356-NS_MSGMAILNEWSURL_CID.patch

Review of attachment 9029968 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/util/nsMsgMailNewsUrl.cpp
@@ -144,2 @@
>  NS_IMETHODIMP nsMsgMailNewsUrl::GetClassIDNoAlloc(nsCID *aClassIDNoAlloc) {
> -  *aClassIDNoAlloc = kNS_MAILBOXURL_CID;  // XXX TODO: May need to vary based on type.

This removes the hasty hack from this morning.
(In reply to Jorg K (GMT+1) from comment #27)
> As far as I can see, that also fixes the IMAP problems I mentioned.
> Valentin, how much do you insist on doing things right?

I don't have any issues with hacks, as long as they don't make it even harder to do the right thing in the future. :)
So, if you're fine with it for now, that works for me.
It would be nice to have it done right, but I suppose you might have higher priority issues on your hands. I'll take a look this weekend, maybe I could help.
Thanks for the comment. I filed a follow-up, bug 1512698. I will land the second patch here to remove the dependence on mailbox (since there are five more derived classes).
Status: NEW → ASSIGNED
Keywords: leave-open
> Ben, ... You should try whether JS Account and OWL still work after these changes.

Thanks for the heads-up :-)
Flags: needinfo?(ben.bucksch)
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/e0c27feaf097
Follow-up: Create NS_MSGMAILNEWSURL_CID and related items. rs=bustage-fix
Status: ASSIGNED → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 65.0
You need to log in before you can comment on or make changes to this bug.