Open Bug 1273765 Opened 4 years ago Updated 2 years ago

GetTextMetrics failure in _cairo_win32_scaled_font_set_metrics, when printing specific PDF

Categories

(Core :: Printing: Output, defect, critical)

x86
Windows 8
defect
Not set
critical

Tracking

()

Tracking Status
firefox46 --- affected
firefox47 --- affected
firefox48 --- affected
firefox49 --- affected

People

(Reporter: njn, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-246a432d-f35c-4593-9945-054542160517.
=============================================================

This is a low-volume crash. I see 22 occurrences in Firefox this year, mostly in 47--49.

https://crash-stats.mozilla.com/signature/?product=Firefox&platform=Windows&date=%3E%3D2016-01-01&signature=mozilla%3A%3Agfx%3A%3ARecordedSetTransform%3A%3APlayEvent&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1

I looked at a minidump and confirmed that the problem is that aTranslator->LookupDrawTarget(mDT) is returning nullptr. This clearly isn't supposed to happen, because PrintTranslator::LookupDrawTarget() has MOZ_ASSERT(result), as do all the other Lookup methods in that class. But it is, so somehow the mDrawTargets.GetWeak() call is returning null.

bobowen, you've touched printing code recently. Can you please take a look?
Flags: needinfo?(bobowen.code)
Thanks njn.
Printing via the parent, which uses this code, was briefly in Aurora 47, but is now Nightly only until we roll out the low integrity sandbox on content for Windows.
So, those crash versions make sense.

I'll try and take a look soon.
Whiteboard: sb?
Whiteboard: sb? → sbwc1
This seems really low level, but I've actually managed to reproduce it, so I'll take a closer look soon.

https://crash-stats.mozilla.com/report/index/7896e4c3-0718-4477-8215-2976a2160630
https://crash-stats.mozilla.com/report/index/907a63ed-3953-455f-b622-ffea22160630

One of these is a duplicate I think, not sure how I managed that.
Assignee: nobody → bobowen.code
Flags: needinfo?(bobowen.code)
This is failing because in the creation event for that DrawTarget the check at [1] is failing.

The status for mCairoSurface is CAIRO_STATUS_NO_MEMORY (1), which is a bit odd because although the child process seems to be eating lots of memory (a known problem with printing PDFs I think), the parent process looks OK.

[1] http://searchfox.org/mozilla-central/rev/a7c8e9f3cc323fd707659175a46826ad12899cd1/gfx/thebes/PrintTarget.cpp#55
Status: NEW → ASSIGNED
This crash can be reproduced by printing pages 109 to 110 of:
http://www.cisco.com/c/en/us/td/docs/routers/7200/install_and_upgrade/7200vxr_install_config/72vxicg.pdf

The root cause of the CAIRO_STATUS_NO_MEMORY is from [1], which seems like an odd choice of default error to me.
This is called at [2] (I've also seen it called at [3] while debugging).

If I change this to return CAIRO_INT_STATUS_UNSUPPORTED and update _cairo_font_face_set_error and _cairo_scaled_font_set_error to ignore this, then I think the fallback code at [4] gets used and certainly the crash does not occur.

Here's a try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=deb328082d000c2d6bb9b99afc126738ef5156a6

This problem exists on release, but the printing code there can pick up on the error and it fails with an error dialog saying "There is not enough free memory to print.".

I'll file a separate bug over the issue that RecordedEvent::PlayEvent doesn't pass back a result to the Translator, so that errors like these could be handled without a crash.

[1] https://hg.mozilla.org/mozilla-central/file/1bee8d2da23e/gfx/cairo/cairo/src/cairo-win32-surface.c#l141
[2] https://hg.mozilla.org/mozilla-central/file/1bee8d2da23e/gfx/cairo/cairo/src/cairo-win32-font.c#l976
[3] https://hg.mozilla.org/mozilla-central/file/1bee8d2da23e/gfx/cairo/cairo/src/cairo-win32-font.c#l458
[4] https://hg.mozilla.org/mozilla-central/file/1bee8d2da23e/gfx/cairo/cairo/src/cairo-surface.c#l2812
See Also: → 1285942
Also:
* changed cairo font *_set_error functions to ignore CAIRO_INT_STATUS_UNSUPPORTED.
* fixed error print format in _cairo_win32_print_gdi_error.

MozReview-Commit-ID: 759hEFLd0vX
Attachment #8769686 - Flags: review?(jfkthame)
(In reply to Bob Owen (:bobowen) from comment #4)
> This crash can be reproduced by printing pages 109 to 110 of:
> http://www.cisco.com/c/en/us/td/docs/routers/7200/install_and_upgrade/
> 7200vxr_install_config/72vxicg.pdf
> 
> The root cause of the  is from [1], which seems like
> an odd choice of default error to me.
> This is called at [2] (I've also seen it called at [3] while debugging).

It does seem odd to return CAIRO_STATUS_NO_MEMORY for whatever GDI errors are happening; but what *are* the GDI failures? ISTM we should figure out why they're happening, and whether they indicate some deeper problem. If you can reproduce this in a debug build, you should be seeing the GDI errors reported to stderr by _cairo_win32_print_gdi_error(), afaict... so what are those errors?

> 
> If I change this to return CAIRO_INT_STATUS_UNSUPPORTED and update
> _cairo_font_face_set_error and _cairo_scaled_font_set_error to ignore this,
> then I think the fallback code at [4] gets used and certainly the crash does
> not occur.

Another concern here is that surely we shouldn't be crashing even if PrintTarget::MakeDrawTarget() returns null; it looks like it has several failure modes that return null, and whoever calls it needs to be prepared to handle that safely.


(I'm not saying the patch here is necessarily the wrong thing to do; the current failure handling does look weird. But before we do this, I'd like to understand better what's actually happening, as otherwise I fear we might just be papering over the real problem.)
(In reply to Jonathan Kew (:jfkthame) from comment #6)

> > The root cause of the  is from [1], which seems like
> > an odd choice of default error to me.
> > This is called at [2] (I've also seen it called at [3] while debugging).
> 
> It does seem odd to return CAIRO_STATUS_NO_MEMORY for whatever GDI errors
> are happening; but what *are* the GDI failures? ISTM we should figure out
> why they're happening, and whether they indicate some deeper problem. If you
> can reproduce this in a debug build, you should be seeing the GDI errors
> reported to stderr by _cairo_win32_print_gdi_error(), afaict... so what are
> those errors?

These fprintfs are not guarded, so they come out in opt builds as well ... perhaps they shouldn't.

In this particular crash the error we get is 6 - "The handle is invalid."
I couldn't work out why this was happening.
It is something to do with when the DWrite fonts get substituted for GDI fonts (to get PDF printing to work properly).
Following that those GDI fonts are substituted again, because the surface has a ctm at [1].

I can try to investigate this a bit further, but I thought that once I'd stopped it causing an issue in those other *_set_error functions, returning CAIRO_INT_STATUS_UNSUPPORTED would be no worse than CAIRO_STATUS_NO_MEMORY and in some cases is definitely an improvement.

Weirdly, once I fixed the error printing statement, in the crashtest logs for the try push in comment 4, there are a load of "The operation completed successfully.", which is 0.
Seems like some of these functions don't set a last error even when they fail.

> Another concern here is that surely we shouldn't be crashing even if
> PrintTarget::MakeDrawTarget() returns null; it looks like it has several
> failure modes that return null, and whoever calls it needs to be prepared to
> handle that safely.

Agreed, I've filed bug 1285942 for this.

[1] https://hg.mozilla.org/mozilla-central/file/214884d507ee/gfx/cairo/cairo/src/cairo-win32-printing-surface.c#l1522
(In reply to Bob Owen (:bobowen) from comment #7)
> In this particular crash the error we get is 6 - "The handle is invalid."
> I couldn't work out why this was happening.
> It is something to do with when the DWrite fonts get substituted for GDI
> fonts (to get PDF printing to work properly).
> Following that those GDI fonts are substituted again, because the surface
> has a ctm at [1].

Here's a guess... might it be related to use of fonts that were loaded via @font-face? The PDF has embedded fonts; when we render it, I presume we use @font-face to load those resources and make them available for rendering.

But if we load a user font resource through the DWrite font backend (i.e. via gfxDWriteFontList::MakePlatformFont, which works via the IDWriteFontFileLoader interface), I doubt that will make it available to the GDI font subsystem; that would require us to activate the resource with AddFontMemResource, as in gfxGDIFontList::MakePlatformFont.

In other words, I'm suspicious that it simply doesn't work to substitute GDI fonts for DWrite fonts on the fly when we're dealing with @font-face resources. Maybe that's how we're ending up with invalid resource handles?
Comment on attachment 8769686 [details] [diff] [review]
Return CAIRO_INT_STATUS_UNSUPPORTED from _cairo_win32_print_gdi_error to allow for fallback

As far as the actual cairo patch here is concerned, jrmuizel might be a better reviewer; I think he has much more in-depth cairo background, and could assess whether this is the appropriate cairo-esque approach to handling the failures.

And as for the root cause of the problems, can you determine whether it is in fact related to user font resources failing to "cross over" between dwrite and gdi, as suggested above? If that's the case, then even if we prevent the crash (which is a good start, obviously) we may still not be getting correct output, as we'll probably still end up using a wrong font.
Attachment #8769686 - Flags: review?(jfkthame) → review?(jmuizelaar)
(In reply to Jonathan Kew (:jfkthame) from comment #9)

> And as for the root cause of the problems, can you determine whether it is
> in fact related to user font resources failing to "cross over" between
> dwrite and gdi, as suggested above? If that's the case, then even if we
> prevent the crash (which is a good start, obviously) we may still not be
> getting correct output, as we'll probably still end up using a wrong font.

This might be part of the problem, but wouldn't any web fonts fail to print in that case.

I've investigated a bit further and it seems that the original failure is at [1].
We don't check the return value here, so we're left with 0 for metrics.tmPitchAndFamily, which means the is_bitmap flag gets set further down.

I notice that the latest cairo code has a check on GetTextMetrics (although not the one below), but if I just add that bit of code it crashes somewhere else in cairo, because another bit of the code doesn't check to see if the scaled_font is valid.

If we always want to use GDI and not DWrite fonts for printing, then I suppose I might be able to make that switch in the PrintTranslator.

[1] https://hg.mozilla.org/mozilla-central/file/94c926911767/gfx/cairo/cairo/src/cairo-win32-font.c#l895
(In reply to Bob Owen (:bobowen) from comment #10)
> 
> I've investigated a bit further and it seems that the original failure is at
> [1].
> We don't check the return value here, so we're left with 0 for
> metrics.tmPitchAndFamily, which means the is_bitmap flag gets set further
> down.

Can we find out what font GetTextMetrics is failing on?
Flags: needinfo?(bobowen.code)
I'll look into this again when I get back.

Now that bug 1285942 has landed this shouldn't crash any more, so I'll remove the sandbox tracking.
Keywords: crash
Summary: Crash in mozilla::gfx::RecordedSetTransform::PlayEvent → GetTextMetrics failure in _cairo_win32_scaled_font_set_metrics, when printing specific PDF
Whiteboard: sbwc1
The crash was resolved separately, but as I'd got a fair way towards finding the underlying (possibly long-standing) bug, I'd like to re-look at this at some point.
De-assigning myself for the moment as I don't have time right now.
Assignee: bobowencode → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(bobowencode)
Comment on attachment 8769686 [details] [diff] [review]
Return CAIRO_INT_STATUS_UNSUPPORTED from _cairo_win32_print_gdi_error to allow for fallback

Clearing this request until I (or someone else) has had time to look at the underlying failure.
Attachment #8769686 - Flags: review?(jmuizelaar)
It's intermittent for me, so finding a regression range is quite hard.
the signature is spiking up again in the past few days on 55.0.3 - some comments talk about printing from irs.gov...
Duplicate of this bug: 1414936
I can reproduce the crash on Nightly59.0a1 x64 windows10.
bp-0b61c4cb-8ee1-4d06-9369-002700171222

Reproducible: always

Steps To Reproduce:
1. Open twitter 
        e.g. https://twitter.com/d_toybox  not need login
2. Print with Microsoft Print to PDF. 
3. Input file name, then click on Save button

Actual Results:
Browser crashes
Keywords: crash, reproducible
(In reply to Alice0775 White from comment #19)
> I can reproduce the crash on Nightly59.0a1 x64 windows10.
> bp-0b61c4cb-8ee1-4d06-9369-002700171222
> 
> Reproducible: always
> 
> Steps To Reproduce:
> 1. Open twitter 
>         e.g. https://twitter.com/d_toybox  not need login
> 2. Print with Microsoft Print to PDF. 
> 3. Input file name, then click on Save button
> 
> Actual Results:
> Browser crashes


I filed a separate a new Bug 1426807. Because the regression range is different.
You need to log in before you can comment on or make changes to this bug.