Crash in IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID

VERIFIED FIXED in Firefox 59

Status

()

Core
Printing: Output
--
critical
VERIFIED FIXED
2 months ago
25 days ago

People

(Reporter: philipp, Assigned: bobowen)

Tracking

({crash, regression, topcrash})

59 Branch
mozilla60
crash, regression, topcrash
Points:
---

Firefox Tracking Flags

(relnote-firefox 59+, firefox-esr52 unaffected, firefox58 unaffected, firefox59+ verified, firefox60+ verified)

Details

(crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

2 months ago
This bug was filed from the Socorro interface and is
report bp-41302054-6f8f-443c-afe3-eca7f0180224.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::ContentChild::ProcessingError dom/ipc/ContentChild.cpp:2373
1 xul.dll mozilla::ipc::MessageChannel::MaybeHandleError ipc/glue/MessageChannel.cpp:2513
2 xul.dll mozilla::ipc::MessageChannel::DispatchAsyncMessage ipc/glue/MessageChannel.cpp:2112
3 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW ipc/glue/MessageChannel.cpp:2040
4 xul.dll mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1886
5 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1919
6 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1040
7 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:517
8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:125
9 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run ipc/glue/MessagePump.cpp:301

=============================================================

content crashes with this signature are regressing in firefox 59 on windows and linux. reports come with a generic MOZ_CRASH(Content child abort due to IPC error).

we don't have much crash data available for 59.0a1 because crash reports that were submitted before december 26 got purged, but nightly build 20171130220131 is the earliest recorded crashing build.
there weren't many printing related patches landing in the first weeks of the 59.0a1 cycle & because it looks related, i'd be suspicious of bug 1409971 as the source of the regression & i'm tentatively setting this bug as blocking it.
This is content crash #4 (crash #8 overall).

Transferring over tracking flags from bug 1432409.
tracking-firefox59: --- → +
tracking-firefox60: --- → +
Keywords: topcrash
One of the comments says something like "Crash cancelling printing of Skarabee showcase poster" ("Plantage en voulant annuler une impression d'affiche vitrine sur Skarabee") and has the URL:

https://www.skarabee.net/property/basic/basic.aspx
(Assignee)

Comment 3

2 months ago
The problem here appears to be that we are relying on the Send__delete__ in ~PrintProgressDialogChild to prevent any more messages, but obviously we can still attempt to send them before the Recv__delete__ gets processed by the parent.
This is fairly high volume for beta so I expect it will show up more heavily on 59 release. 

I could still take a patch for the 59 RC build if you are able to come up with a fix, but I won't consider it a 59 blocker.
(Assignee)

Comment 5

2 months ago
I can reproduce this fairly consistently when cancelling a one page print, just as it completes.
(There is also a problem where if you cancel very early you break printing for that content process, but that is a different bug.)

I have what I believe to be a safe change which seems to fix the issue for me.
It may not fix all instances of this race, but I think a more comprehensive fix would carry quite a lot of risk on its own.
Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
(Assignee)

Comment 6

2 months ago
Created attachment 8955603 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete
Attachment #8955603 - Flags: review?(jwatt)
Comment on attachment 8955603 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete

Review of attachment 8955603 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/printingui/nsPrintProgress.cpp
@@ +181,5 @@
>  NS_IMETHODIMP nsPrintProgress::OnStateChange(nsIWebProgress *aWebProgress, nsIRequest *aRequest, uint32_t aStateFlags, nsresult aStatus)
>  {
> +  if (XRE_IsE10sParentProcess() &&
> +      aStateFlags & nsIWebProgressListener::STATE_STOP) {
> +    MOZ_ASSERT(m_observer);

The assert seems a bit redundant since we'll deref and crash on the very next line. It would be helpful to have a comment here noting that we're notifying our PrintProgressDialogParent to avoid a race.

@@ +202,5 @@
>  
>  NS_IMETHODIMP nsPrintProgress::OnProgressChange(nsIWebProgress *aWebProgress, nsIRequest *aRequest, int32_t aCurSelfProgress, int32_t aMaxSelfProgress, int32_t aCurTotalProgress, int32_t aMaxTotalProgress)
>  {
> +  if (XRE_IsE10sParentProcess() && aCurSelfProgress >= aMaxSelfProgress) {
> +    MOZ_ASSERT(m_observer);

Ditto.
Attachment #8955603 - Flags: review?(jwatt) → review+
(Assignee)

Comment 10

2 months ago
(In reply to Bob Owen (:bobowen) from comment #5)
...
> (There is also a problem where if you cancel very early you break printing
> for that content process, but that is a different bug.)

Another try push, this time with a fix for this bug as well, because I just realised what must be going on.
This bug is in current release and probably has been for long time.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e6c7ce7721b4ac3d3c7a7c0b6ef957c234c86b8c
(Assignee)

Comment 11

2 months ago
Apparently there's a JS implemented nsIObserver for the progress dialog as well:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dd0c5830fc816fc5399ec982720304c5d4f3925e
(Assignee)

Comment 12

2 months ago
Created attachment 8955930 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete
Attachment #8955930 - Flags: review?(jwatt)
(Assignee)

Updated

2 months ago
Attachment #8955603 - Attachment is obsolete: true
(Assignee)

Comment 13

2 months ago
Created attachment 8955931 [details] [diff] [review]
Part 2: When cancelling a print make sure the DialogOpened message has been sent
Attachment #8955931 - Flags: review?(jwatt)

Updated

2 months ago
Attachment #8955930 - Flags: review?(jwatt) → review+

Updated

2 months ago
Attachment #8955931 - Flags: review?(jwatt) → review+

Comment 14

2 months ago
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b47e93718912
Don't try and send messages to PrintProgressDialogChild when printing is complete. r=jwatt
https://hg.mozilla.org/integration/mozilla-inbound/rev/0036938db4fe
Part 2: When cancelling a print make sure the DialogOpened message has been sent. r=jwatt

Comment 15

2 months ago
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb6db24a20dd
Backed out changeset 0036938db4fe - for accidental printf.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2219e0ec9d4f
Part 2: When cancelling a print make sure the DialogOpened message has been sent. r=jwatt

Comment 16

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b47e93718912
https://hg.mozilla.org/mozilla-central/rev/2219e0ec9d4f
Status: ASSIGNED → RESOLVED
Last Resolved: 2 months ago
status-firefox60: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
(Assignee)

Comment 17

a month ago
Comment on attachment 8955930 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1409971

[User impact if declined]:
User's trying to cancel a print around the time it completes will continue to get intermittent crashes.

[Is this code covered by automated tests?]:
Some of the code paths must be covered, because of failures on try pushes.
These are probably around print preview, I don't think there are any currently running for actual printing.

[Has the fix been verified in Nightly?]:
I can no longer reproduce in latest Nightly.

[Needs manual test from QE? If yes, steps to reproduce]:
Cancelling a one page print as it completes, for example of http://example.com, reliably reproduces for me.
Some general printing and print preview manual regression tests should be run.

[List of other uplifts needed for the feature/fix]:
Other patch from bug, fixes a related issue.

[Is the change risky?]:
Somewhat.

[Why is the change risky/not risky?]:
The change is relatively simple, but the printing code is fairly fragile.

[String changes made/needed]:
None.
Attachment #8955930 - Flags: approval-mozilla-beta?
(Assignee)

Comment 18

a month ago
Comment on attachment 8955931 [details] [diff] [review]
Part 2: When cancelling a print make sure the DialogOpened message has been sent

See comment 17.
Attachment #8955931 - Flags: approval-mozilla-beta?
Comment on attachment 8955930 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete

This would need to land on m-r at this point and would end up in our RC2 build if taken. Which seems awful late :(. If nothing else, maybe we can keep it on the radar for dot-release consideration.
Attachment #8955930 - Flags: approval-mozilla-beta? → approval-mozilla-release?
Attachment #8955931 - Flags: approval-mozilla-beta? → approval-mozilla-release?
Comment on attachment 8955930 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete

This is just a little too late for 59 for me. Since we have already built RC1, this could make it to RC2 but we don't have time for RC3 in case of needing backout or seeing regressions.  I'll keep this open for 59 and keep it tracked for a potential dot release where we would have a little more breathing room.
Attachment #8955930 - Flags: approval-mozilla-release? → approval-mozilla-release-
Attachment #8955931 - Flags: approval-mozilla-release? → approval-mozilla-release-
Flags: qe-verify+
I reproduced the crash on Windows 10 x64 and Windows 7 x64 using Nightly 60.0a1(2018-03-01).

Verified fixed on Windows 10 x64 and Windows 7 x64 using the latest Nightly (2018-03-11).

I didn't manage to reproduce the crash on Ubuntu 16.04 x64 using Nightly 60.0a1(2018-03-01).
Status: RESOLVED → VERIFIED
status-firefox60: fixed → verified
Comment on attachment 8955930 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete

This is the #10 content process crash on 59 so far. It's been landed for a couple weeks now with no known regressions and has been verified as fixing the crash. I think we should consider this as a dot release ride-along.
Attachment #8955930 - Flags: approval-mozilla-release- → approval-mozilla-release?
Attachment #8955931 - Flags: approval-mozilla-release- → approval-mozilla-release?
Comment on attachment 8955930 [details] [diff] [review]
Don't try and send messages to PrintProgressDialogChild when printing is complete

High-volume crash fix, verified on nightly, A+
Attachment #8955930 - Flags: approval-mozilla-release? → approval-mozilla-release+

Updated

28 days ago
Attachment #8955931 - Flags: approval-mozilla-release? → approval-mozilla-release+
Added to 59.0.2 release notes
relnote-firefox: --- → 59+
I reproduced the crash on Windows 10 x64 and Windows 7 x64 using Firefox 59.0.1

Verified fixed on Windows 10 x64 and Windows 7 x64 using Firefox 59.0.2.

I didn't manage to reproduce the crash on Ubuntu 16.04 x64 on Firefox 59.0.1.
status-firefox59: fixed → verified
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.