Closed Bug 1141756 Opened 9 years ago Closed 7 years ago

Print-preview crashes Firefox on Windows, when the default printer is an unavailable networked printer

Categories

(Core :: Print Preview, defect)

36 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: benjamin.lerner, Assigned: bobowen)

Details

(Keywords: crash, qawanted, regression)

Crash Data

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0
Build ID: 20150305021524

Steps to reproduce:

On Win7x64, I have a networked printer installed.  When I'm on the network where the printer exists, if I go to File -> Print Preview, everything works fine.  However, when I take the laptop home, the printer is no longer available.  When I go to File -> Print Preview, Firefox crashes.



Expected results:

I haven't triaged the system carefully to see whether setting a local printer (e.g. Adobe PDF) as default would avoid this problem yet.  I also don't see where the dependency is between actual printers and print preview (maybe in getting page dimensions?), but it shouldn't be enough to crash Firefox!
Do you have a crash ID? https://developer.mozilla.org/en-US/docs/How_to_get_a_stacktrace_for_a_bug_report
Flags: needinfo?(benjamin.lerner)
I just triggered another crash, and got a crash id of bp-5a5419fc-cd51-42f1-a9a2-0bf192150404.
Flags: needinfo?(benjamin.lerner)
BTW, this triggers on Fx37.0 now, as well as the 36.x that I had when initially reporting this bug.
Thanks!

Mike (you fixed the most bugs in printing components in the last months), can you figure anything from the stacktrace? It looks useful.
Severity: normal → critical
Crash Signature: [@ nsPrintEngine::GetSeqFrameAndCountPagesInternal(nsPrintObject*, nsIFrame*&, int&)]
Component: Untriaged → Print Preview
Flags: needinfo?(mconley)
Keywords: crash
Product: Firefox → Core
Reading this, I suspect mPresShell is nullptr at [1], so we're doing a null dereference.

Can somebody from QA confirm this?

[1]: http://hg.mozilla.org/releases/mozilla-release/annotate/29182ac68a26/layout/printing/nsPrintEngine.cpp#l363
Flags: needinfo?(mconley)
Keywords: qawanted
I've tried to reproduce this crash using a Microsoft Surface Pro 2 device running Windows 8.1 64-bit.
I could not crash it but Firefox freezes for about 20 seconds after entering "Print Preview" with a secondary connection, where the network Printer is not installed.
I've used Wireless connections.
Tested on:
- latest Firefox Nightly, build ID: 20150504030210;
- Firefox 38 beta 8, build ID: 20150426174329;
- Firefox 37.0.2, build ID: 20150415140819.
Crash Signature: [@ nsPrintEngine::GetSeqFrameAndCountPagesInternal(nsPrintObject*, nsIFrame*&, int&)] → [@ nsPrintEngine::GetSeqFrameAndCountPagesInternal(nsPrintObject*, nsIFrame*&, int&)] [@ nsPrintEngine::GetSeqFrameAndCountPagesInternal]
this crash signature is spiking up again since the 52.0 cycle as a crash in the content process.
could you take a look?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(bobowencode)
OS: Windows 7 → Windows
Hardware: x86_64 → All
(In reply to [:philipp] from comment #7)
> this crash signature is spiking up again since the 52.0 cycle as a crash in
> the content process.
> could you take a look?

I think that this has spiked a little, because we are now accessing the printer in the parent for print preview as well (because of an issue with a Print Driver that was attempting to copy files and being blocked by the sandbox bug 1324064).
This means we are spinning the event loop at this point and getting a similar crash in the child.
Looks like it was still happening before that change just very rarely, presumably normally things are getting set up before the Printing:Preview:UpdatePageCount message.

The original crash looks like it was happening as a modal dialog was displayed (probably a print error one I guess) and we were spinning the event loop underneath that, which was trying to process this Printing:Preview:UpdatePageCount message.

I can't reproduce, but I could put a null check on mPressShell as identified by mconley in comment 5.
I'm not sure what the knock effect of that failure would be though.

One question is, why are we calling for Printing:Preview:UpdatePageCount too early?
mconley - any idea on the above question.
Flags: needinfo?(bobowencode) → needinfo?(mconley)
One other thing I just noticed is that we're not removing the message listener correctly [1].
That's the wrong name.
It's this listener that ends up sending the Printing:Preview:UpdatePageCount message, so I wonder if that could be a possible cause.
Although I can't work out exactly how. :-(

[1] https://hg.mozilla.org/mozilla-central/file/7ef1e9abd296/toolkit/components/printing/content/printUtils.js#l564
I'm going to add that null check and fix the message listener removal.
Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
(In reply to Bob Owen (:bobowen) from comment #9)
> One other thing I just noticed is that we're not removing the message
> listener correctly [1].
> That's the wrong name.

Good find!  Chronologically, that totally makes sense as a cause for this bug, I think.  Bug 1082575 added that faulty line of source code, back in October 2014 for Firefox 36.  And then several months later, this bug here was filed, *for Firefox 36*.  Very suspicious. :)

(In reply to Bob Owen (:bobowen) from comment #8)
> One question is, why are we calling for Printing:Preview:UpdatePageCount too
> early?

I would actually bet we're not calling it too *early* -- rather, we're calling it too *late*!

We do eventually set this mPresShell pointer to null, here:
https://dxr.mozilla.org/mozilla-central/rev/7ef1e9abd296a8edc39b7efc8d637767ba2f77ed/layout/printing/nsPrintObject.cpp#107
Given your discovery about failure-to-unregister, I'll bet that our null-deref might be a result of that clearing. (combined with an unexpected late-breaking call into GetSeqFrameAndCountPagesInternal)

I suspect we shouldn't bother with Part 1, and we should just optimistically take part 2 and watch for this crash signature to disappear, to confirm that it's really fixed.  (And then if needed, we could take part 1 later on, and/or do some more investigation to find out why the crash might be hypothetically persisting at that point.)

What do you think?
Flags: needinfo?(bobowencode)
(In reply to Daniel Holbert [:dholbert] from comment #14)
> (In reply to Bob Owen (:bobowen) from comment #9)

> I would actually bet we're not calling it too *early* -- rather, we're
> calling it too *late*!

I'm not sure.
In this crash [1], we're in the middle of the print preview setup in DoCommonPrint (where we're now using the ShowPrintDialog call as a hack to get the print settings in the parent and spinning the event loop as a result).
As far as I can see we shouldn't have fired the printPreviewUpdate event by this point, although trying to trace routes through this code is not at all easy.
So my thoughts were maybe we were picking up one caused by a previous print preview or something, but I don't see how that could happen either.

Of course that might be a special case and as you say we're hitting this after the pres shell has been destroyed in most cases.
 
> I suspect we shouldn't bother with Part 1, and we should just optimistically
> take part 2 and watch for this crash signature to disappear, to confirm that
> it's really fixed.  (And then if needed, we could take part 1 later on,
> and/or do some more investigation to find out why the crash might be
> hypothetically persisting at that point.)
> 
> What do you think?

I'm happy to go with that, if you're fairly convinced it will fix this.

[1] https://crash-stats.mozilla.com/report/index/3968adc7-8055-458d-9045-6406a2170226
Flags: needinfo?(bobowencode)
(In reply to Bob Owen (:bobowen) from comment #15)
> (In reply to Daniel Holbert [:dholbert] from comment #14)
> > (In reply to Bob Owen (:bobowen) from comment #9)

> Of course that might be a special case and as you say we're hitting this
> after the pres shell has been destroyed in most cases.

Thinking about it that doesn't make sense if, as I'm fairly sure, this has spiked because of that change to get the print settings in the parent and therefore the event loop spin.
Comment on attachment 8841648 [details] [diff] [review]
Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal

Review of attachment 8841648 [details] [diff] [review]:
-----------------------------------------------------------------

It sounds like you're not-at-all-sure that Part 2 will fix this on its own -- if that's the case, then I'm probably-OK r+'ing Part 1, with these nits.

First, a commit message nit:
> Bug 1141756 Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal.

Typo: s/mPressShell/mPresShell/
(there should only be one lowercase "s", not 2)

::: layout/printing/nsPrintEngine.cpp
@@ +334,5 @@
> +  // This is sometimes incorrectly called before the pres shell has been created
> +  // (bug 1141756). MOZ_DIAGNOSTIC_ASSERT so we'll still see the crash in
> +  // Nightly/Aurora in case the other patch fixes this.
> +  if (!aPO->mPresShell) {
> +    MOZ_DIAGNOSTIC_ASSERT(false);

Ideally, assertions should always be accompanied with some explanatory text about what and/or why we're asserting, so that we print something more useful than the condition ("false" in this case) to the terminal when it fails.

So please change this to:
  MOZ_DIAGNOSTIC_ASSERT(false,
                        "GetSeqFrameAndCountPages needs a non-null pres shell");

@@ +335,5 @@
> +  // (bug 1141756). MOZ_DIAGNOSTIC_ASSERT so we'll still see the crash in
> +  // Nightly/Aurora in case the other patch fixes this.
> +  if (!aPO->mPresShell) {
> +    MOZ_DIAGNOSTIC_ASSERT(false);
> +    return NS_ERROR_FAILURE;

Do you know what happens in the callers when you return NS_ERROR_FAILURE from this method?

Before landing this, please double check that some reasonable fallback behavior happens when you do this -- i.e. let's be sure you're not just making release users trip over an even-worse problem.  (This is hard to test reliably since IIUC we can't reproduce this problem, but you could approximate the experience by just adding a new unconditional "return NS_ERROR_FAILURE" in this method, and see what happens when you trigger this code.)
Attachment #8841648 - Flags: review?(dholbert) → review+
Comment on attachment 8841649 [details] [diff] [review]
Part 2: Correct removal of Printing:Preview:Entered message listener

Review of attachment 8841649 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/printing/content/printUtils.js
@@ +560,5 @@
>        mm.addMessageListener("Printing:Preview:ProgressChange", this);
>      }
>  
>      let onEntered = (message) => {
> +      mm.removeMessageListener("Printing:Preview:Entered", onEntered);

Ooof. :/ Good catch!
Attachment #8841649 - Flags: review?(mconley) → review+
I'm afraid I don't have much to add about Printing:Preview:UpdatePageCount. I'm as mystified as you are - but let's see what happens when we land these changes.
Flags: needinfo?(mconley)
Thanks both for reviews.

(In reply to Daniel Holbert [:dholbert] from comment #17)
> Comment on attachment 8841648 [details] [diff] [review]

> First, a commit message nit:
> > Bug 1141756 Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal.
> 
> Typo: s/mPressShell/mPresShell/
> (there should only be one lowercase "s", not 2)

Fixed locally, thanks.

> So please change this to:
>   MOZ_DIAGNOSTIC_ASSERT(false,
>                         "GetSeqFrameAndCountPages needs a non-null pres
> shell");

Added locally.

> @@ +335,5 @@
> > +  // (bug 1141756). MOZ_DIAGNOSTIC_ASSERT so we'll still see the crash in
> > +  // Nightly/Aurora in case the other patch fixes this.
> > +  if (!aPO->mPresShell) {
> > +    MOZ_DIAGNOSTIC_ASSERT(false);
> > +    return NS_ERROR_FAILURE;
> 
> Do you know what happens in the callers when you return NS_ERROR_FAILURE
> from this method?

The page count in the print preview doesn't get updated at all and so the page navigation buttons in the toolbar don't work, but the preview is displayed and can be scrolled fine.
It also prints properly.

If I change it so just the first one fails, which is sort of closer to what I think we might be seeing, then if you do anything that changes the page count like scaling, orientation or page setup, the correct value gets set and the buttons start working.
In theory if we are getting the message too early, then another will be sent when we enter print preview proper and so the correct value will get set.

Even in the worst case above, I think this is better than crashing the content process.
Pushed by bobowencode@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb5e539b6624
Part 1: Add null check for mPresShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal. r=dholbert
https://hg.mozilla.org/integration/mozilla-inbound/rev/467c4bdf5110
Part 2: Correct removal of Printing:Preview:Entered message listener. r=mconley
Too late for Fx52, but we should probably consider uplifting this for Fx53 at least.
Flags: needinfo?(bobowencode)
Comment on attachment 8841648 [details] [diff] [review]
Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal

As we're very close to merge day this will probably turn into a Beta uplift request.

Approval Request Comment
[Feature/Bug causing the regression]:
Possibly bug 1082575, but we're not sure of the root cause.
It has been made worse by the fix for bug 1324064.

[User impact if declined]:
Users sometimes experiencing crashes when entering print preview.

[Is this code covered by automated tests?]:
No

[Has the fix been verified in Nightly?]:
We don't have specific STR, so we'll have to watch crash-stats to see if it is fixed.
It will still crash in Nightly and Aurora, so we'll be able to see if the second patch fixes the root cause.

[Needs manual test from QE? If yes, steps to reproduce]:
No STR unfortunately.

[List of other uplifts needed for the feature/fix]:
Just the other patch in this bug.

[Is the change risky?]:
Very slightly.

[Why is the change risky/not risky?]:
The changes are simple, but we are returning a failure for the page count call, so in some circumstances this could cause the page number navigation buttons on the print preview toolbar to not work correctly.
The alternative is to crash the content process, so this is not really much of a risk.
This is explained in a little more detail in comment 20, as well as the simulated failure testing I did, even though we can't reproduce.

[String changes made/needed]:
No
Flags: needinfo?(bobowencode)
Attachment #8841648 - Flags: approval-mozilla-aurora?
Comment on attachment 8841649 [details] [diff] [review]
Part 2: Correct removal of Printing:Preview:Entered message listener

See comment 24.
Attachment #8841649 - Flags: approval-mozilla-aurora?
Comment on attachment 8841648 [details] [diff] [review]
Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal

Let's see if we can avoid this crash, uplift to aurora 53.
Attachment #8841648 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8841649 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8841648 [details] [diff] [review]
Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
This is a simple fix and the crash seems to be quite common on ESR52.

User impact if declined: 
Users sometimes experiencing crashes when entering print preview.

Fix Landed on Version:
Fx54 - uplifted to Fx53.

Risk to taking this patch (and alternatives if risky):
Low.

String or UUID changes made by this patch:
None.
Attachment #8841648 - Flags: approval-mozilla-esr52?
Comment on attachment 8841649 [details] [diff] [review]
Part 2: Correct removal of Printing:Preview:Entered message listener

See comment 28.
Attachment #8841649 - Flags: approval-mozilla-esr52?
Comment on attachment 8841648 [details] [diff] [review]
Part 1: Add null check for mPressShell in nsPrintEngine::GetSeqFrameAndCountPagesInternal

fix a crash in print-preview, esr52+
Attachment #8841648 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Attachment #8841649 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression
You need to log in before you can comment on or make changes to this bug.