Closed Bug 1801067 Opened 1 year ago Closed 1 year ago

Port bug 1791633: separate TLS socket control from transport security info - comm/mailnews/imap/test/unit/test_starttlsFailure.js failing on debug

Categories

(Thunderbird :: General, defect)

defect

Tracking

(thunderbird_esr102 unaffected, thunderbird109 wontfix, thunderbird111 affected)

RESOLVED FIXED
112 Branch
Tracking Status
thunderbird_esr102 --- unaffected
thunderbird109 --- wontfix
thunderbird111 --- affected

People

(Reporter: benc, Assigned: KaiE)

References

Details

(Keywords: intermittent-failure, Whiteboard: [TM:111.0b3])

Attachments

(7 files, 9 obsolete files)

48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
12.80 KB, text/plain
Details
22.55 KB, patch
Details | Diff | Splinter Review
48 bytes, text/x-phabricator-request
Details | Review
No description provided.
Assignee: nobody → benc
Status: NEW → ASSIGNED

Late here and I won't be awake to see if it builds with this patch, so I kicked off a try run:
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=0a7527c83e45e96c57d5b80b077c3f3c533d9d4e

Pushed by martin@humanoids.be:
https://hg.mozilla.org/comm-central/rev/618f8ad3da6f
Handle changes from Bug 1791633, renameing nsISSLSocketControl to nsITLSSocketControl. r=freaktechnik

Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED

Even with my patch there's still one debug test failure left from this: mailnews/imap/test/unit/test_starttlsFailure.js. See for example https://treeherder.mozilla.org/logviewer?job_id=396973492&repo=try-comm-central&lineNumber=4952

This failure happens inside the native IMAP protocol implementation where it tries to call IsAlive on the socket from the wrong thread from what I can tell. However I couldn't quite follow what thread that call happens in vs. what thread it should be in. As explained by :keeler in phabricator:

[...] debug builds [...] should abort with an assertion failure if the current thread isn't the same thread the socket control was created on.

I also wouldn't assert that all our calls are correct with my patch, there might still be untested code paths that don't behave correctly under the new restrictions.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/comm-central/rev/59629a2d1298
followup - Pass security info from JS protocol clients. r=mkmelin

Right, mailnews/imap/test/unit/test_starttlsFailure.js is failing on debug.
https://treeherder.mozilla.org/logviewer?job_id=396973492&repo=try-comm-central&lineNumber=4966

I'm assuming m_transport was created on the main thread, and now it's being checked in the imap thread?

Could we do without the isAlive check? https://searchfox.org/comm-central/rev/5901d59728b25f72b55397ce192ae1adfb5d4057/mailnews/imap/src/nsImapProtocol.cpp#1330

Summary: Port Bug 1791633: separate TLS socket control from transport security info → Port bug 1791633: separate TLS socket control from transport security info - comm/mailnews/imap/test/unit/test_starttlsFailure.js failing on debug

(In reply to Magnus Melin [:mkmelin] from comment #7)

Right, mailnews/imap/test/unit/test_starttlsFailure.js is failing on debug.
https://treeherder.mozilla.org/logviewer?job_id=396973492&repo=try-comm-central&lineNumber=4966

I'm assuming m_transport was created on the main thread, and now it's being checked in the imap thread?

Certainly looks like m_transport is being created in the main thread, via SetupWithUrlCallback(), which called by SetupWithUrl(), either directly, or as the result of a proxy lookup callback.

And yes, it's stupid that the sockettransport for the IMAP thread is being created on the main thread (the transport lifetime is more-or-less synced to the IMAP thread lifetime, right?)... but it seems to be done like this so it can get the proxy info from a mockchannel. Or something.
It basic boils down to the crazy way we deal with channels/requests and have stateful urls, and also the confusion of our nsImapProtocol (which is really the IMAP connection interface, but also used for requests) and nsImapMockChannel.

BUT, nsISocketTransport claims that it can be shared across threads:

* NOTE: This is a free-threaded interface, meaning that the methods on
* this interface may be called from any thread.

So there's a M-C bug there? Either the same-thread assertion we're hitting, or the comment is wrong.

Could we do without the isAlive check? https://searchfox.org/comm-central/rev/5901d59728b25f72b55397ce192ae1adfb5d4057/mailnews/imap/src/nsImapProtocol.cpp#1330

I don't know... there are a bunch of other isAlive checks called from the IMAP thread, which do seem important. I.e. "is the connection still up?"
Although these don't seem to cause problems? Maybe it's only an issue with a socket which has a failed TLS attempt?

Assuming it's not an M-C bug, I think a more "proper" solution would be to get the the transport created on the IMAP thread in the first place, instead of on-demand at SetupWithUrl() time. But there's the issue of the proxy setup. A mock channel is being used to set up the proxy (the MsgExamineForProxyAsync() call in SetupWithUrl()). And I've not idea what's going on there. (I'd have to do a lot of digging)
Alternatively, we could try dispatching the SetupWithUrlCallback() calls synchonously (ugh) to the IMAP thread... but I don't know how to do that. We've got a PR_Thread for the IMAP thread, but I don't know how (or if) to dispatch calls to that....

Is the documentation wrong (see comment 8)?

Flags: needinfo?(dkeeler)

... and the patch from comment #10 doesn't fix that. Please compile a debug version and run it with a few IMAP accounts.
Crash is:

Assertion failure: mOwningThread == PR_GetCurrentThread(), at C:/mozilla-source/mozilla-central/security/manager/ssl/CommonSocketControl.cpp:455
#01: CommonSocketControl::GetSecurityInfo (C:\mozilla-source\mozilla-central\security\manager\ssl\CommonSocketControl.cpp:455)
#02: nsImapProtocol::LoadImapUrlInternal (C:\mozilla-source\mozilla-central\comm\mailnews\imap\src\nsImapProtocol.cpp:2353)
#03: nsImapProtocol::SetupWithUrl (C:\mozilla-source\mozilla-central\comm\mailnews\imap\src\nsImapProtocol.cpp:958)
#04: nsImapProtocol::LoadImapUrl (C:\mozilla-source\mozilla-central\comm\mailnews\imap\src\nsImapProtocol.cpp:2308)
#05: nsImapIncomingServer::LoadNextQueuedUrl (C:\mozilla-source\mozilla-central\comm\mailnews\imap\src\nsImapIncomingServer.cpp:541)
#06: `anonymous namespace'::SyncRunnable2<nsIImapServerSink,nsIImapProtocol *,bool *>::Run (C:\mozilla-source\mozilla-central\comm\mailnews\imap\src\nsSyncRunnableHelpers.cpp:121)

(In reply to Ben Campbell from comment #8)

BUT, nsISocketTransport claims that it can be shared across threads:

* NOTE: This is a free-threaded interface, meaning that the methods on
* this interface may be called from any thread.

So there's a M-C bug there? Either the same-thread assertion we're hitting, or the comment is wrong.

Yeah, that might be a bug. nsSocketTransport::IsAlive is calling PR_Recv, so that should really be on the socket thread, if my understanding is correct.

Assuming it's not an M-C bug, I think a more "proper" solution would be to get the the transport created on the IMAP thread in the first place, instead of on-demand at SetupWithUrl() time. But there's the issue of the proxy setup. A mock channel is being used to set up the proxy (the MsgExamineForProxyAsync() call in SetupWithUrl()). And I've not idea what's going on there. (I'd have to do a lot of digging)

Can the IMAP thread be the same as the socket thread? That might solve all these issues at once (or it might introduce dealocks, if that's not a sound thing to do).

Flags: needinfo?(dkeeler)

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #14)

Yeah, that might be a bug. nsSocketTransport::IsAlive is calling PR_Recv, so that should really be on the socket thread, if my understanding is correct.
Can the IMAP thread be the same as the socket thread? That might solve all these issues at once (or it might introduce dealocks, if that's not a sound thing to do).

Sorry, my shakey understanding of the threading and IO restrictions is showing here :-(
What do you mean by "socket" thread?

  • the thread the socket was opened on?
  • the background thread pool used for dispatching blocking IO via NS_DispatchBackgroundTask()?
  • something else?

If the first one, then yes - it seems that it'd be prudent to open the socket on the thread that's using it (it's currently opened on the main thread, then passed over to the IMAP thread).
Is there any particular documentation on this stuff that I should be reading?

The IMAP thread is a bit annoying, TBH - there's a new thread opened up for each IMAP connection, and it just loops, running a (messy) state machine and doing blocking IO in a loop. It should be using NS_DispatchBackgroundTask() instead of dedicated IMAP threads, I think (Bug 1618934). But that'd be a big job - the IMAP code is pretty hairy and brittle, and has loads of hard-won real-world-server hacks in there which nobody really has a good grasp on :-) A rewrite in JS is in the works, but isn't ready for prime time yet.

Flags: needinfo?(dkeeler)

(In reply to Ben Campbell from comment #15)

Sorry, my shakey understanding of the threading and IO restrictions is showing here :-(
What do you mean by "socket" thread?

  • the thread the socket was opened on?
  • the background thread pool used for dispatching blocking IO via NS_DispatchBackgroundTask()?
  • something else?

Or The Socket Transport Service (STS) thread? (That's the one I was trying to think of!)

Yes, the socket transport service thread. My understanding is any and all networking should happen on that thread.

Flags: needinfo?(dkeeler)

I'm out of ideas. I think we should disable on debug until we can fix it, if we ever fix it.

Please note comment #12. Any debug build with an IMAP account is unusable since it constantly runs into this MOZ_ASSERT:
https://searchfox.org/mozilla-central/rev/ce78234f5e653a5d3916813ff990f053510227bc/security/manager/ssl/CommonSocketControl.h#21
Developers wishing to work with a debug build will forever need to remove this MOZ_ASSERT.

Sure, but with no fix in sight, having the test on is just polluting the tree making it harder to notice other new problems that may come up.

Maybe M-C would accept a patch similar to this one to avoid MOZ_ASSERT() in a debug build. However, the patch creates very noisy output.

(In reply to Ben Campbell from comment #8)

Alternatively, we could try dispatching the SetupWithUrlCallback() calls synchonously (ugh)
to the IMAP thread... but I don't know how to do that. We've got a PR_Thread for the IMAP thread,
but I don't know how (or if) to dispatch calls to that....

We didn't find a way to dispatch to a PRThread, but one can dispatch to an nsIThread. The attached patch compiles but doesn't work. It's unclear show to spin the event loop on the IMAP thread, the dispatch call with NS_DISPATCH_SYNC never returns.

Ben, do you know whether this is totally wrong or whether this approach can be made to work?

Flags: needinfo?(benc)
Comment on attachment 9306938 [details] [diff] [review]
1801067-proxy-transport-creation.patch

Review of attachment 9306938 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/imap/src/nsImapProtocol.cpp
@@ +1004,5 @@
> +    printf("=== nsImapTransportProxy::Run()\n");
> +    nsresult rv = NS_OK;
> +    nsCOMPtr<nsISocketTransportService> socketService =
> +        do_GetService(NS_SOCKETTRANSPORTSERVICE_CONTRACTID, &rv);
> +    rv = socketService->CreateTransport(mConnectionTypeArray, mHostname, mPort,

I don't think *creating* the transport off the socket thread is the problem. *Using* the transport off the socket thread is the problem. Did you try dispatching the use of it to the socket transport thread?

Comment on attachment 9306938 [details] [diff] [review]
1801067-proxy-transport-creation.patch

Thanks for the comment. Apologies, we didn't read the bug carefully enough, in particular comment #17. We're also not familiar with the socket thread. Looks like this search gives some pointers to code using the sts thread. Which "networking" needs to be proxied there? Opening input/output streams and reading/writing to them?

The protocols (POP3, SMTP, NNTP) recently re-implemented in JS all use TCPSocket. Looks like the code is here which must be using the correct thread already. Maybe that's a good example to study. Overall, making things happen on the right thread in the legacy C++ code may be a big task, so depending on the timing of the new IMAP implementation in JS it might not be worth the effort. Maybe in the meantime just take a patch like in attachment 9306930 [details] [diff] [review].

Attachment #9306938 - Attachment is obsolete: true
Flags: needinfo?(benc)
Duplicate of this bug: 1805111

The test works in imap-js because ImapClient checks capability first and doesn't send STARTTLS if no capability found. But ImapProtocol sends STARTTLS directly.

It crashes when creating a new account with imap-js or pop3-js at https://searchfox.org/mozilla-central/rev/abcee8d2c97a5c8a1fbeaf84607ea427be72497a/security/manager/ssl/NSSSocketControl.cpp#277

Attachment 9310585 [details] is an example of something that could be done as a temporary workaround while IMAP gets rewritten in JS.

(In reply to Ping Chen (:rnons) from comment #28)

The test works in imap-js because ImapClient checks capability first and doesn't send STARTTLS if no capability found. But ImapProtocol sends STARTTLS directly.

It crashes when creating a new account with imap-js or pop3-js at https://searchfox.org/mozilla-central/rev/abcee8d2c97a5c8a1fbeaf84607ea427be72497a/security/manager/ssl/NSSSocketControl.cpp#277

So it's still a problem for the protocols rewritten in js, that use TCPSocket? In that case, a bug in TCPSocket?

Flags: needinfo?(remotenonsense)

(In reply to Magnus Melin [:mkmelin] from comment #31)

So it's still a problem for the protocols rewritten in js, that use TCPSocket? In that case, a bug in TCPSocket?

TCPSocket uses the same SocketTransportService under the hood.

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #30)

Attachment 9310585 [details] is an example of something that could be done as a temporary workaround while IMAP gets rewritten in JS.

Thanks, tlsSocketControl->StartTLS() needs to be updated as well https://searchfox.org/comm-central/rev/0fefd9ea7a9c8e805c833ce935cb54df597453df/mailnews/imap/src/nsImapProtocol.cpp#1820

smtp-js/nttp-js/pop3-js/ldap-js also use TCPSocket, so I think it's better to update TCPSocket.cpp. In attachment 9310605 [details] I tried to dispatch to the socket thread, but it seems to make no difference, do you know why? Thanks

Flags: needinfo?(remotenonsense)

Dana, any pointers?

Flags: needinfo?(dkeeler)

Yeah, TCPSocket should probably be fixed. Looking at https://searchfox.org/mozilla-central/source/dom/webidl/TCPSocket.webidl, many (all?) of those methods/attributes should probably actually return promises instead of directly doing whatever operation on the main thread (as opposed to the socket thread).

Flags: needinfo?(dkeeler)
Depends on: 1809755

Thanks, I filed bug 1809755 to the best of my understanding of the problem.

Dana, thank you very much for your patch (comment 29), it works well for me. No more assertions when using a debug build.

Ping, regarding comment 32, I guess you were testing a StartTLS configuration (I haven't yet).
I'll attach an additional patch for the place you've pointed out, copying Dana's code.

Comment on attachment 9306930 [details] [diff] [review]
1801067-no-moz-assert-for-TB.patch

I'm marking this patch comment 24 as obsolete, I don't think we want to disable the integrity/correctness assertions for Thunderbird. We need a real fix.

Attachment #9306930 - Attachment is obsolete: true
Attachment #9305218 - Attachment is obsolete: true
Attachment #9306357 - Attachment is obsolete: true

With D165915 and D168111 applied, test_starttlsFailure.js works again in a debug build, so we should undo commit from comment 22, I've attached revision D168114 to do so.

I think we should get these three revisions r+'ed and landed, as a partial fix.

(In reply to Ping Chen (:rnons) from comment #32)

Doesn't work yet, please help.

Ping, indeed your revision D165926 doesn't work yet.
That code apparently creates some temporary variables, and attempts to destroy them on the wrong thread, with the following output:

Hit MOZ_CRASH(DOMEventTargetHelper not thread-safe) at /home/user/moz/comm-beta/mozilla/xpcom/base/nsISupportsImpl.cpp:43

I made another patch which seems to work, I'll attach it to bug 1809755.

Attachment #9310585 - Attachment description: Bug 1801067 - run GetSecurityInfo and IsAlive synchronously on the socket thread in imap → Bug 1801067 - run GetSecurityInfo and IsAlive synchronously on the socket thread in imap. r=kaie
Attachment #9314551 - Attachment description: WIP: Bug 1801067 - run StartTLS synchronously on the socket thread in imap → WIP: Bug 1801067 - run StartTLS synchronously on the socket thread in imap. r=keeler
Attachment #9314560 - Attachment description: WIP: Bug 1801067 - Re-enable imap test test_starttlsFailure.js on cpp debug. → WIP: Bug 1801067 - Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin

(In reply to Kai Engert (:KaiE:) from comment #41)

I'm marking this patch comment 24 as obsolete, I don't think we want to disable the integrity/correctness assertions for Thunderbird. We need a real fix.

Kai, with the three patches applied, can yon run a debug TB with an IMAP/SSL account or does it still crash? Note that attachment 9306930 [details] [diff] [review] was intended to allow developers to work (crashes made any work impossible, see for bug 1805111 comment #2).

(In reply to BB from comment #45)

Kai, with the three patches applied, can yon run a debug TB with an IMAP/SSL account or does it still crash?

Yes it runs fine. No crash.

Duplicate of this bug: 1814054
Attachment #9310605 - Attachment is obsolete: true

The three accepted patches are ready for landing together - remaining work will be done in bug 1809755.

Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/5904645b6d72
run GetSecurityInfo and IsAlive synchronously on the socket thread in imap. r=kaie
https://hg.mozilla.org/comm-central/rev/2276f3710694
run StartTLS synchronously on the socket thread in imap. r=keeler
https://hg.mozilla.org/comm-central/rev/93e86bff6c50
Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin

Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 111 Branch
Pushed by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/1c4f5f102b6b
follow-up - Fix C++ formatting. rs=clang-format DONTBUILD

I can confirm, the landed patches fix the startup crash observed in Bug 1814054.

Backout by geoff@darktrojan.net:
https://hg.mozilla.org/comm-central/rev/3d90da3f05c2
Backed out 4 patches for causing frequent hangs, bug 1816633.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I think there is one other place where the getSecurityInfo() changes was not fixed. I haven't seen a crash or lockup since I changed this but not 100% sure this is the only problem.

See Also: → 1814748
Attachment #9318065 - Attachment description: Needs on more "dispatch to socket thread" fix, I think.diff → Needs one more "dispatch to socket thread" fix, I think.diff

Given the risk of deadlocks, I think we need a more fundamental change.

I think we should stop running IMAP code on the main thread.

nsSyncRunnableHelpers has code to dispatch many functions to the main thread.
I think that should be changed to dispatch to the socket thread instead.

I'll make that change locally and test what happens.

Well, the ideal solution from comment 54 seems very difficult to do. In my experiment I immediately run into problems.
Some code must run on the main thread, and we're reaching assertions when running it from either the IMAP thread or the socket thread.
The IMAP code has a lot of assumptions baked in about the thread it is running on...

So, let's try harder with the pragmatic approach (minimal changes).
I've made more experiments.

Both Emilio and Magnus had suggested to skip the call to IsAlive.

I agree we could remove that from nsImapProtocol::TellThreadToDie()

In nsImapProtocol::ImapThreadMainLoop() the result of the call is used to decide about retrying. I don't have a great idea how to change that logic. However, that code runs on the IMAP thread, and the call to IsAlive should be simple and quick. I'm hoping it's safe to dispatch IsAlive to the socket thread in that place.

Then we have the calls to GetSecurityInfo. When Emilio hit the deadlock in bug 1816633 comment 1, it was inside nsImapProtocol::LoadImapUrlInternal(), running on the main thread, dispatching GetSecurityInfo() to the socket thread.

As a wild experiment I simply disabled that code, which skips adding that info to the mock channel.
In my manual testing so far I didn't find a negative side effect yet.
Let me allow to cross fingers and hope that we indeed could disable it, and maybe follow up later with a solution, should it be necessary.

Gene pointed out another place in comment 53, which we weren't yet handling. That's true, I tested a connection that requires a certificate exception, and Gene's addition is necessary in that place. Luckily, that place runs on the IMAP thread, so again (hopefully) it's safe to proxy to the socket thread from there. In my manual testing, adding the exception worked.

Let's see if that combination of changes survives a try build.
https://treeherder.mozilla.org/jobs?repo=try-comm-central&revision=41769cab39516687f216bd8a537ffa6aede17b24

Try build looks good (no test failures related to imap).

I've been running a local m-c build with the patches (from that try build) for the last 12 hours, and it didn't hang yet.

The code that obtains the security from the socket (and stores it into the m_mockChannel),
which had caused the deadlock in (bug 1816633),
and which these patches disable,
seems to be unnecessary.

In an experiment, I've reenabled the above code (risking the deadlock, which is rare),
and I've set a breakpoint on nsImapMockChannel::GetSecurityInfo().
Whenever that function was reached, mSecurityInfo was still null !

The callers are the document, which attempt to obtain securityInfo early, and keep a copy "for potentially using it later", and those always get a nullptr.

So apparently it isn't a problem that this returns null,
and setting the mockChannel's securityInfo later on seemed useless.

I'm not surprised, because our mail window doesn't have any place to report the TLS status of the server connection, which means, we probably don't have a reason to query/use the securityInfo for good connections anywhere.

I cannot tell if there are other scenarios, which I haven't covered in my interactive testing, in which it would be necessary.

We could either remove it now completely, and hope there won't be any regression reports, or we could use a different strategy for obtaining the securityInfo.

If we want to do a latter (try to be carefuly, and obtain and set it anyway), here is a potential approach:

The securityInfo is related to the server connection.
I don't know if one IMAP thread is used for connections to more than one server.

@Gene: Do you know that?

If an IMAP thread is limited to one specific server, then we could cache the security info at an earlier time, and reuse it for every URL that is processed on that thread.

Gene, do you know the answer to the question from the previous comment?

Flags: needinfo?(gds)

Updating attached patches/phab:

Test patch: Remains unchanged.

Call start TLS: Applied linting, and added an assertion for not being on the main thread, carrying forward r+

Old patch for GetSecurityInfo/IsAlive: Dropping

New patch for GetSecurityInfo/IsAlive: As explained in previous comments, will ask Gene for review.

Attachment #9310585 - Attachment description: Bug 1801067 - run GetSecurityInfo and IsAlive synchronously on the socket thread in imap. r=kaie → obsolete - Bug 1801067 - run GetSecurityInfo and IsAlive synchronously on the socket thread in imap. r=kaie
Attachment #9314560 - Attachment description: WIP: Bug 1801067 - Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin → Bug 1801067 - Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin
Attachment #9314551 - Attachment description: WIP: Bug 1801067 - run StartTLS synchronously on the socket thread in imap. r=keeler → Bug 1801067 - run StartTLS synchronously on the socket thread in imap. r=keeler

Emilio, because you ran into the deadlock, would you like to apply revisions D168111 and D170206 and test if it works for you?

The securityInfo is related to the server connection.
I don't know if one IMAP thread is used for connections to more than one server.

@Gene: Do you know that?

If an IMAP thread is limited to one specific server, then we could cache the security info at an earlier time, and reuse it for every URL that is processed on that thread.

Each imap thread handles a connection to only one server. There may be up to 5 connections (threads) per server with each thread imap selected to a single mailbox (a.k.a., folder).

Flags: needinfo?(gds)
Attached file bt-moz-crash.txt (obsolete) —

So far, mostly looks ok with Kai's latest changes applied.
Don't know is this is relevant but I often see this forced crash:
Hit MOZ_CRASH(nsWeakReference not thread-safe) at /home/gene/mozilla/xpcom/base/nsISupportsImpl.cpp:43
Attached is the backtrace from gdb. Doesn't look like it's caused by any comm code.
Since I run with a debug build I usually have this MOZ_CRASH commented out. I've seen this (maybe since Nov 2022) with and without the recent changes and churn to imap code. However, not sure if I saw it before the fundamental changes were made to moz code, maybe here: bug 1793841 ??

Edit: FYI, on console output I see this when this crash occurs:

Hit MOZ_CRASH(nsWeakReference not thread-safe) at /home/gene/mozilla/xpcom/base/nsISupportsImpl.cpp:43
Attachment #9318065 - Attachment is obsolete: true

I also see this quite a bit. I change it from MOZ_ASSERT to NS_ASSERTION so it doesn't crash. No idea what a "mutated DOM" is.

[Parent 120379, Main Thread] ###!!! ASSERTION: gds: Don't mutate the DOM while using a ShadowIncludingTreeIterator: '!mMutationGuard.Mutated(0)', file /home/gene/mozilla/obj-x86_64-pc-linux-gnu/dist/include/mozilla/dom/ShadowIncludingTreeIterator.h:45

Anyhow, without this and the item in comment 62 ignored I'm unable to run a debug build without constant crashes. Don't know if this is related to the the recent moz network changes.

Yes, I have also seen DOM crash with a debug build when loading HTML email. I switched to plain text (menu / view /message body as / plain text).

Hmm those are both problematic assertions... The stack from comment 62 is not of the right thread, but those seem worth at least looking into

(In reply to Kai Engert (:KaiE:) from comment #64)

Yes, I have also seen DOM crash with a debug build when loading HTML email. I switched to plain text (menu / view /message body as / plain text).

Ok, haven't tried that yet. I'll check that I only see the DOM assert on emails opened with HTML.

(In reply to Emilio Cobos Álvarez (:emilio) from comment #65)

Hmm those are both problematic assertions... The stack from comment 62 is not of the right thread, but those seem worth at least looking into

I didn't think the stack looked right either. The crash output from tb just said to run the command shown to attach gdb to the thread. How do you tell which thread the crash is for and then bt it's stack?

(In reply to gene smith from comment #66)

I didn't think the stack looked right either. The crash output from tb just said to run the command shown to attach gdb to the thread. How do you tell which thread the crash is for and then bt it's stack?

You can do thread apply all bt to get a backtrace of all threads, then see which thread is asserting

Attachment #9318473 - Attachment is obsolete: true
Attached file dom-worker.txt

Emilio, thanks for thread apply all bt hint. Here's the crashing thread "DOM Worker" due to "nsWeakReference not thread-safe". When this happened I was opening a huge PDF into a tab. It appears the crash happened during the imap URL destructor. Before opening the PDF, the message had been successfully imap fetched and stored in cache.
To me, this doesn't seem to have anything to do with the threading issues for the current bug.
Edit: I see this assert for a pdf of any size opened to a tab. I didn't try to open the pdf into an external app but opening other attachment types into external apps doesn't cause the assert/crash.

Changes at https://phabricator.services.mozilla.com/D170206 look OK to me and work when I tested them. After testing this diff some I made a few changes and tested again to see if it still works without any lock-ups or crashes.
I made each of the separate in-line dispatching code chunks a separate private nsImapProtocol member function. I think it makes the changes easier to understand.
I also put back in the "is alive" determinations in the nsImapProtocol::TellThreadToDie() since it runs on imap thread.
Also, FWIW, I see non-null m_securityInfo placed into mock channel.
I tested extensively with this diff and didn't seen any crashes or deadlocks.

(In reply to gene smith from comment #69)

Also, FWIW, I see non-null m_securityInfo placed into mock channel.

Yes, I also saw that. But it happened after the other code had already queried it (when it was still null). (At least in the scenarios I have tested.)

Attachment #9318773 - Attachment description: WIP: Bug 1801067 - Restructured patch by gds → Bug 1801067 - Use safe calling to GetSecurityInfo and IsAlive in IMAP code. r=gds
Attachment #9314551 - Attachment is obsolete: true
Attachment #9318414 - Attachment is obsolete: true

Pushed by kaie@kuix.de:
https://hg.mozilla.org/comm-central/rev/0cbcf924b4a1
Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin
https://hg.mozilla.org/comm-central/rev/9856e8bc702c
Use safe calling to GetSecurityInfo and IsAlive in IMAP code. r=gds

Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Assignee: benc → kaie

Wayne, because imap-js is being enabled by default on Daily, it means we won't get test coverage for this on Daily.

From now on, I think we should uplift all changes to the IMAP/C++ code immediately to beta.

Flags: needinfo?(vseerror)

Comment on attachment 9314560 [details]
Bug 1801067 - Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin

need beta uplift to test imap/c++ patches

Attachment #9314560 - Flags: approval-comm-beta?

Comment on attachment 9318773 [details]
Bug 1801067 - Use safe calling to GetSecurityInfo and IsAlive in IMAP code. r=gds

need beta uplift to test imap/c++ patches

Attachment #9318773 - Flags: approval-comm-beta?

IMAP-JS will not be permanently enabled on Daily, FYI.

(In reply to Kai Engert (:KaiE:) from comment #73)

Wayne, because imap-js is being enabled by default on Daily, it means we won't get test coverage for this on Daily.

Daily has test coverage for both CPP and JS IMAP.

test_starttlsFailure.js is in xpcshell-shared.ini. That file gets included by xpcshell-cpp.ini which runs the tests with mailnews.imap.jsmodule=false as well as xpcshell.ini where mailnews.imap.jsmodule=true.

Well, I'm sorry if I used "test coverage" in the wrong way.

My point is, the code won't be tested by users of Daily (for an unknown amount of time, see comment 76).

Flags: needinfo?(vseerror)
Whiteboard: [TM:111.0b3]
Target Milestone: 111 Branch → 112 Branch

Comment on attachment 9314560 [details]
Bug 1801067 - Re-enable imap test test_starttlsFailure.js on cpp debug. r=mkmelin

[Triage Comment]
Let's revisit this in a few days.

(In reply to Kai Engert (:KaiE:) from comment #78)

My point is, the code won't be tested by users of Daily (for an unknown amount of time, see comment 76).

Quite correct. However this is not a zero risk change, so perhaps instead we should consider again disabling imap-js (just as easy as having enabled it)

Other factors:

  • 111.0b1 when it does release (it hasn't yet), it is going to be a very low rate.
  • beta is already overloading users with supernova changes
  • with Rob's comment there is less urgency to test via beta
Attachment #9314560 - Flags: approval-comm-beta? → approval-comm-beta-

Comment on attachment 9318773 [details]
Bug 1801067 - Use safe calling to GetSecurityInfo and IsAlive in IMAP code. r=gds

[Triage Comment]
We can revisit this in a few days.

Attachment #9318773 - Flags: approval-comm-beta? → approval-comm-beta-

FWIW, I do think the patch is working. I had very frequent lockups earlier, but nothing while running dailies with the latest with the newest patch.

imap-js was disabled on daily as of two days ago.
I'd still feel better letting this ride to next Monday's merge, unless someone has objections.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: