Closed Bug 1432409 Opened 2 years ago Closed 2 years ago

Crash in IPCError-content | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID

Categories

(Core :: Printing: Output, defect, P3, critical)

59 Branch
All
Windows
defect

Tracking

()

VERIFIED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 + verified
firefox60 + verified

People

(Reporter: philipp, Assigned: jwatt)

References

Details

(Keywords: crash, regression, topcrash, Whiteboard: sb-)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-3e0cdfc6-6add-4435-8bb1-0af9d0180123.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll mozilla::dom::ContentChild::ProcessingError dom/ipc/ContentChild.cpp:2372
1 xul.dll mozilla::ipc::MessageChannel::MaybeHandleError ipc/glue/MessageChannel.cpp:2513
2 xul.dll mozilla::ipc::MessageChannel::DispatchMessageW ipc/glue/MessageChannel.cpp:2040
3 xul.dll mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1886
4 xul.dll mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1919
5 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1040
6 xul.dll NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:517
7 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:125
8 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run ipc/glue/MessagePump.cpp:301
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:319

=============================================================

these printing related content crashes are newly showing up in firefox 59.
Whiteboard: sb?
We don't think this is related to sandboxing changes for printing.
Whiteboard: sb? → sb-
Crash Signature: [@ IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID] → [@ IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID] [@ IPCError-browser | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID]
Ah, I can produce this bug with extension using by saveAsPDF().
STR is similar to bug 1429707.

1) Install the addon which uses saveAsPDF() function.
2) Load content
3) Call saveAsPDF() function before loading content is finished.
4) Click print button on the native print dialog which is displayed after loading content is finished.
is it possible to get to a regression range with these steps?
(i'm not able to reproduce it this way but maybe i'm missing something)
Flags: needinfo?(mantaroh)
I pinged mantaroh on IRC for more information, and I can reproduce with the Print to PDF add-on loading https://www.bbc.co.uk/ if I throttle my bandwidth to make the page load more slowly.
There seem to be at least four bugs contributing to this issue:

1. When a user tries to save-as-PDF during page load, we
   prompt them for a file name and location, generate the
   PDF and save it, all before the page has even loaded.

2. After the page loads we open a second print dialog
   asking the user if they want to print to PDF (if the
   user goes ahead, this is what we crash under). We
   should not be prompting the user with this particular
   dialog window.

3. The save-as-PDF print settings object from the second
   dialog has an empty file name.

4. When handling an error initializing the print due to
   the file name being empty, our e10s printing code
   destroys the RemotePrintJobChild then tries to send a
   notification to it (which is what is resulting in the
   crash).
Some more information on issue #4 from comment 5:

We try to initialize a print job under:

  RemotePrintJobChild::InitializePrint
  nsDeviceContextSpecProxy::BeginDocument    
  nsDeviceContext::BeginDocument    
  nsPrintJob::SetupToPrintContent
  nsPrintJob::DocumentReadyForPrinting
  nsPrintJob::AfterNetworkPrint    
  nsPrintJob::OnStateChange    
  nsDocLoader::DoFireOnStateChange    
  nsDocLoader::FireOnStateChange    
  nsDocLoader::doStopURLLoad    
  nsDocLoader::OnStopRequest    
  mozilla::net::nsLoadGroup::RemoveRequest    
  imgRequestProxy::RemoveFromLoadGroup()  
  imgRequestProxy::OnLoadComplete    
  mozilla::image::SyncNotifyInternal::__l30::<lambda>    
  mozilla::image::ImageObserverNotifier::operator() 
  mozilla::image::SyncNotifyInternal    
  mozilla::image::ProgressTracker::SyncNotifyProgress::__l2::<lambda>    
  mozilla::image::CopyOnWrite<mozilla::image::ObserverTable>::Read 
  mozilla::image::ProgressTracker::SyncNotifyProgress    
  mozilla::image::VectorImage::OnSVGDocumentLoaded()  
  mozilla::image::SVGLoadEventListener::HandleEvent    
  mozilla::EventListenerManager::HandleEventSubType    
  mozilla::EventListenerManager::HandleEventInternal    
  mozilla::EventListenerManager::HandleEvent    
  mozilla::EventTargetChainItem::HandleEvent    
  mozilla::EventTargetChainItem::HandleEventTargetChain    
  mozilla::EventDispatcher::Dispatch    
  mozilla::EventDispatcher::DispatchDOMEvent    
  nsINode::DispatchEvent    
  mozilla::AsyncEventDispatcher::Run()  
  nsThread::ProcessNextEvent    

In the parent we fail to make a PrintTarget because we return NS_ERROR_FAILURE from:

  nsLocalFile::InitWithPath
  nsDeviceContextSpecWin::MakePrintTarget
  nsDeviceContext::InitForPrinting
  mozilla::layout::RemotePrintJobParent::InitializePrintDevice
  mozilla::layout::RemotePrintJobParent::RecvInitializePrint

This happens because the passed path is empty, which is because when MakePrintTarget called mPrintSettings->GetToFileName(filename) the returned file name was empty.

When the error is returned up to RemotePrintJobParent::RecvInitializePrint, that function does two things:

 1. sends the error code to the child
 2. then calls Send__delete__ to tell the child process to
    delete its RemotePrintJobChild

A failure result is sent back to the child process resulting in mInitializationResult having the value NS_ERROR_FAILURE in the RemotePrintJobChild::InitializePrint call above, which returns that error up the stack.

We return up the stack to nsPrintJob::AfterNetworkPrint which handles the failure by calling CleanupOnFailure, under which we dispatch an nsPrintCompletionEvent:

  nsPrintCompletionEvent::nsPrintCompletionEvent 
  nsPrintJob::FirePrintCompletionEvent
  nsPrintJob::CleanupOnFailure
  nsPrintJob::AfterNetworkPrint 

After returning to the event loop we handle the second message from the parent process telling the child to delete the RemotePrintJobChild:

  mozilla::layout::RemotePrintJobChild::ActorDestroy
  mozilla::layout::PRemotePrintJobChild::DestroySubtree
  mozilla::layout::PRemotePrintJobChild::OnMessageReceived
  mozilla::dom::PContentChild::OnMessageReceived
  mozilla::dom::ContentChild::OnMessageReceived
  mozilla::ipc::MessageChannel::DispatchAsyncMessage
  mozilla::ipc::MessageChannel::DispatchMessageW
  mozilla::ipc::MessageChannel::RunMessage
  mozilla::ipc::MessageChannel::MessageTask::Run
  nsThread::ProcessNextEvent

After returning to the event loop again we handle the nsPrintCompletionEvent and try to use the destroyed RemotePrintJobChild in nsDeviceContextSpecProxy::EndDocument, causing us to crash:

  MOZ_CrashOOL    
  mozilla::ipc::LogicError    
  mozilla::layout::PRemotePrintJob::Transition    
  mozilla::layout::PRemotePrintJobChild::SendFinalizePrint()  
  nsDeviceContextSpecProxy::EndDocument()  
  nsDeviceContext::EndDocument()  
  nsPrintData::~nsPrintData()  
  [External Code]		Annotated Frame
  nsPrintData::Release()  
  mozilla::RefPtrTraits<nsPrintData>::Release    
  RefPtr<nsPrintData>::ConstRemovingRefPtrTraits<nsPrintData>::Release    
  RefPtr<nsPrintData>::assign_assuming_AddRef    
  RefPtr<nsPrintData>::operator=    
  nsPrintJob::Destroy()  
  nsDocumentViewer::OnDonePrinting()  
  nsPrintCompletionEvent::Run()  
  mozilla::SchedulerGroup::Runnable::Run()  
  nsThread::ProcessNextEvent    

needinfo'ing Bob since he may have some thoughts on this.
Flags: needinfo?(bobowencode)
(In reply to Jonathan Watt [:jwatt] (needinfo? me) from comment #6)
>   mozilla::layout::PRemotePrintJobChild::SendFinalizePrint()  

Although that seems to be the signature for bug 1429707.

mantaroh, did you really get crashes with the signatures covered by this bug using the steps to reproduce you gave? Or do the steps to reproduce belong in bug 1429707?
Currently on beta this is the #4 content process crash and the #8 crash across all processes.
Keywords: topcrash
I'm going to put up a patch to fix the problem described in comment 6 by making nsDeviceContextSpecProxy's BeginDocument,
EndDocument and AbortDocument no-ops.

However, there's yet another issue. AfterNetworkPrint not only dispatches an nsPrintCompletionEvent, but we also end up creating an event under it to post a status change event to the parent:

  MessageLoop::PostTask_Helper
  MessageLoop::PostTask
  mozilla::ipc::ProcessLink::SendMessageW
  mozilla::ipc::MessageChannel::SendMessageToLink
  mozilla::ipc::MessageChannel::Send
  mozilla::layout::PRemotePrintJobChild::SendStatusChange
  mozilla::layout::RemotePrintJobChild::OnStatusChange
  nsPrintData::DoOnStatusChange
  nsPrintJob::FirePrintingErrorEvent
  nsPrintJob::CleanupOnFailure
  nsPrintJob::AfterNetworkPrint

At the time RemotePrintJobChild::OnStatusChange is called mDestroyed is false so we don't detect the problem at that point. Once we try to process it we crash under the following stack though since there's nothing to dispatch to:

  mozilla::ipc::MessageChannel::MaybeHandleError
  mozilla::ipc::MessageChannel::DispatchAsyncMessage
  mozilla::ipc::MessageChannel::DispatchMessageW
  mozilla::ipc::MessageChannel::RunMessage
  mozilla::ipc::MessageChannel::MessageTask::Run
  nsThread::ProcessNextEvent

Although we can't use mDestroyed to detect this scenario, we can check that mInitializedResult is a success value. I'll do that in a separate patch though.
And to correlate the patches with the signatures, part 1 should fix:

[@ IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID]

although for me I couldn't get a crash for cancelling, only for "OK"ing. Making both EndDocument and AbortDocument no-ops should prevent crashes in both cases though.

Part 2 should fix:

[@ IPCError-browser | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID]
Assignee: nobody → jwatt
Requesting tracking given the high crash rate.
Comment on attachment 8952264 [details]
Bug 1432409 part 1 - Prevent nsDeviceContextSpecProxy using RemotePrintJobChild if initialization fails.

https://reviewboard.mozilla.org/r/221504/#review227318

::: commit-message-679f0:5
(Diff revision 1)
> +waits for a reply.  If the parent fails to initialize printing it will
> +send back an error message followed immediately by a second message
> +telling the child process to delete its RemotePrintJobParent.  The error

I think you mean "delete its RemotePrintJobChild" (not Parent), right?

(based on comment 6)

::: commit-message-679f0:12
(Diff revision 1)
> +telling the child process to delete its RemotePrintJobParent.  The error
> +message causes the nested event loop to terminate and blocks
> +RemotePrintJobChild::InitializePrint.  We then do various async things
> +to clean up, some of which can try to post messages to the parent
> +process's RemotePrintJobParent.  This is a problem since the delete
> +message is pending in the content process's event loop resulting in a

s/content/child/ for consistency.

(This is the only mention of "content process" here -- the rest of this message says "child process" & "parent process")

::: widget/nsDeviceContextSpecProxy.cpp:167
(Diff revision 1)
> +  if (mRemotePrintJob) {
> -  Unused << mRemotePrintJob->SendAbortPrint(NS_OK);
> +    Unused << mRemotePrintJob->SendAbortPrint(NS_OK);
> +  }
>    return NS_OK;
>  }
>  
>  NS_IMETHODIMP
>  nsDeviceContextSpecProxy::BeginPage()
>  {
>    mRecorder->OpenFD(mRemotePrintJob->GetNextPageFD());

Looks like you're guarding all of the usages of  mRemotePrintJob, *except* for this one in nsDeviceContextSpecProxy::BeginPage() & the one in ::EndPage.

Could you mention in the commit message why these ones are safe? (Maybe they're called later in the process after we're sure that the parent didn't prompt us to delete our mRemotePrintJob?)
Attachment #8952264 - Flags: review?(dholbert) → review+
Comment on attachment 8952265 [details]
Bug 1432409 part 2 - Make RemotePrintJobChild::OnStatusChange no-op if initialization failed.

https://reviewboard.mozilla.org/r/221506/#review227322

Seems reasonable. r=me
Attachment #8952265 - Flags: review?(dholbert) → review+
Thanks, Daniel. I'll fix up the comments and push.
Pushed by jwatt@jwatt.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd244079d11a
part 1 - Prevent nsDeviceContextSpecProxy using RemotePrintJobChild if initialization fails. r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/9cd1539c04f9
part 2 - Make RemotePrintJobChild::OnStatusChange no-op if initialization failed. r=dholbert
Blocks: 499025
Comment on attachment 8952264 [details]
Bug 1432409 part 1 - Prevent nsDeviceContextSpecProxy using RemotePrintJobChild if initialization fails.

Approval Request Comment
[Feature/Bug causing the regression]: unknown
[User impact if declined]: high beta crash rate will uplift to release
[Is this code covered by automated tests?]: there are some printing tests
[Has the fix been verified in Nightly?]: it's stuck on m-i (requesting approval early to try to make b12)
[Needs manual test from QE? If yes, steps to reproduce]: probably useful - STR in comment 2 using https://www.bbc.co.uk/ 
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: low/medium
[Why is the change risky/not risky?]: stops us trying to access a destroyed object and crashing. As best I can tell the rest of the code flow after that point should be safe.
[String changes made/needed]: none
Attachment #8952264 - Flags: approval-mozilla-beta?
Attachment #8952265 - Flags: approval-mozilla-beta?
FWIW the changes made here may fix some of our other print crashes (the stacks for some other bugs seem related).
https://hg.mozilla.org/mozilla-central/rev/bd244079d11a
https://hg.mozilla.org/mozilla-central/rev/9cd1539c04f9
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
[ Triage 2017/02/20: P3 ]
Priority: -- → P3
Comment on attachment 8952264 [details]
Bug 1432409 part 1 - Prevent nsDeviceContextSpecProxy using RemotePrintJobChild if initialization fails.

Fix for a top crash in beta, let's take this for beta 12.
Attachment #8952264 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8952265 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Duplicate of this bug: 1429707
Crash Signature: [@ IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID] [@ IPCError-browser | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID] → [@ IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID] [@ IPCError-browser | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID] [@ mozilla::ipc::LogicError | mozil…
Flags: needinfo?(jwatt)
Flags: needinfo?(mantaroh)
From the 59b12 reports so far, it looks like the patches here fixed the PRemotePrintJob::Msg_StatusChange crash but not the PPrintProgressDialog::Msg_CancelledCurrentJob crash. :-/
Hello,
I've tried reproducing this issue but to no avail. The report here (https://goo.gl/TWrxeT) shows us that most of the crashes were reported on Windows 7 using a Beta 59.0b11 build and the STRs say that I should use an "addon which uses saveAsPDF() function". I've installed "Print to PDF" and tried saving the www.bbc.com page as a pdf while it was loading multiple times but couldn't reproduce the crash. Also tried on Windows 10 x64 on a an older 58.0a1 build.

Mantaroh, can you please help to verify this issue?
Flags: needinfo?(mantaroh)
(In reply to Jonathan Watt [:jwatt] (needinfo? me) from comment #26)
> From the 59b12 reports so far, it looks like the patches here fixed the
> PRemotePrintJob::Msg_StatusChange crash but not the
> PPrintProgressDialog::Msg_CancelledCurrentJob crash. :-/

i've filed bug 1441598 as a followup for the remaining signature.
Thanks!
Crash Signature: [@ IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID] [@ IPCError-browser | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID] [@ mozilla::ipc::LogicError | mozil… → [@ IPCError-browser | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID] [@ mozilla::ipc::LogicError | mozilla::layout::PRemotePrintJobChild::SendFinalizePrint]
Flags: needinfo?(jwatt)
Summary: Crash in IPCError-content | PPrintProgressDialog::Msg_CancelledCurrentJob Route error: message sent to unknown actor ID → Crash in IPCError-content | PRemotePrintJob::Msg_StatusChange Route error: message sent to unknown actor ID
(In reply to Alexandru Simonca, QA (:asimonca) from comment #27)
> Hello,
> I've tried reproducing this issue but to no avail. The report here
> (https://goo.gl/TWrxeT) shows us that most of the crashes were reported on
> Windows 7 using a Beta 59.0b11 build and the STRs say that I should use an
> "addon which uses saveAsPDF() function". I've installed "Print to PDF" and
> tried saving the www.bbc.com page as a pdf while it was loading multiple
> times but couldn't reproduce the crash. Also tried on Windows 10 x64 on a an
> older 58.0a1 build.
> 
> Mantaroh, can you please help to verify this issue?

Hi Alexandru,
I can reproduce on my environment(win10 + Beta 59.0b11). 
STR:
 1) Install the 'Print to PDF'.(https://addons.mozilla.org/en-US/firefox/addon/print-to-pdf-document/)
 2) Go to the 'bbc.com'
 3) Push 'Print to PDF' while loading the document and click the 'save' button of the dialog.
 4) Push 'OK' button on the print dialog. (This dialog is shown after step 3))

Capture is as follow:
https://drive.google.com/file/d/1L4VRhwOZlU-aUw6ftY8h_aYdlSXsPUyl/view?usp=sharing

After updating the beta 59.0b13, I confirmed that this bug is fixed.(i.e. gecko doesn't crash after step 4)
Flags: needinfo?(mantaroh)
Hello,

Based on Comment #30 and my own assessment of the issue and the 60.0a1 build I can say that this is VERIFIED FIXED in:
- 59.0b13
- 60.0a1 (id: 20180227220155)
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Flags: needinfo?(bobowencode)
See Also: → 1449120
You need to log in before you can comment on or make changes to this bug.