Closed Bug 1922063 Opened 2 months ago Closed 2 months ago

Text layer and canvas don't always match in the pdf viewer on Windows

Categories

(Firefox :: PDF Viewer, defect)

defect

Tracking

()

RESOLVED FIXED
133 Branch
Tracking Status
firefox-esr128 --- unaffected
firefox131 --- unaffected
firefox132 --- unaffected
firefox133 --- fixed

People

(Reporter: calixte, Assigned: jfkthame)

References

(Regression)

Details

(Keywords: regression)

Attachments

(3 files)

Today the pdf.js windows tests shew some failures (see [1]) because of some mismatches between the text layer and the canvas where a page is rendered.
This text layer, on top of the canvas, is here for the text selection and for highlighting search results, so it's really a bad bug.

In using mozregression on the bot, I found that:
https://bugzilla.mozilla.org/show_bug.cgi?id=1919906
caused this regression.
The bot is running on Windows Server 2016 Datacenter.

The text layer is composed of spans and they're rescaled in order to fit the corresponding text bounding box. In order to compute the scale factor, the text is measured thanks to canvas::measureText:
https://github.com/mozilla/pdf.js/blob/3cdc3258d94a06af658e50290bf709426adf44b8/src/display/text_layer.js#L403-L420

[1] http://54.193.163.58:8877/5e6e7c3a9150063/reftest-analyzer.html#web=eq.log

Flags: needinfo?(jfkthame)

Set release status flags based on info from the regressing bug 1919906

Attached image image.png

Finally I can reproduce the issue locally on Windows 11: it depends on the zoom level.
It's reproducible in selecting some text https://github.com/mozilla/pdf.js/blob/master/test/pdfs/tracemonkey.pdf and with a zoom level set at 70% (my devicePixelRatio is 1.5).
Would it be possible to enforce allowForceGDIClassic set to true in the pdf.js case ?

(In reply to Calixte Denizet (:calixte) from comment #2)

Finally I can reproduce the issue locally on Windows 11: it depends on the zoom level.
It's reproducible in selecting some text https://github.com/mozilla/pdf.js/blob/master/test/pdfs/tracemonkey.pdf and with a zoom level set at 70% (my devicePixelRatio is 1.5).

That's as expected, because the force-gdi pref has a size threshold, so at larger (device-pixel) sizes it has never been involved.

Would it be possible to enforce allowForceGDIClassic set to true in the pdf.js case ?

Yeah, I guess we could do that; we have code to check whether we're dealing with a pdf.js document. This whole thing is kinda hackish anyhow, but we can have the canvas context check whether it's being used for pdf.js, and if so, continue to allow the force-gdi setting. That should restore the previous behavior; and pdf.js doesn't care about the potential canvas2d measurement vs rendering-time glyph spacing mismatch (as seen in bug 1919906) because it places each glyph individually anyhow.

Flags: needinfo?(jfkthame)
Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

(In reply to Jonathan Kew [:jfkthame] from comment #3)

Would it be possible to enforce allowForceGDIClassic set to true in the pdf.js case ?

Yeah, I guess we could do that; we have code to check whether we're dealing with a pdf.js document.

It seems to me that that won't actually help when PDF.js is used "outside" of the Firefox PDF Viewer though (which applies even to the PDF.js test-suite that we use at GitHub)?

Yes, there are many websites, prominent ones too, using pdf.js as a library.

Hmm -- do we have any way to detect within Gecko when it's being used like that?

Could we disable GDI Classic on transparent HTML text (like the text layers in pdf.js)?

(In reply to Gregory Pappas [:gregp] from comment #8)

Could we disable GDI Classic on transparent HTML text (like the text layers in pdf.js)?

That would break the (reasonable) assumption that simply modifying color does not affect layout.

(In reply to Jonathan Kew [:jfkthame] from comment #7)

Hmm -- do we have any way to detect within Gecko when it's being used like that?

And if not, what other options do we have? A few possibilities come to mind:

(a) We could "fix" bug 1919906 in the opposite way: instead of preventing force-gdi-classic being applied to canvas text, we could allow it to take effect for both measurement and rendering. (The issue in bug 1919906 was that it was being applied during measurement, but not during rendering, resulting in glyph spacing that doesn't work for the actual rendered glyphs.)

The risk here is that we may regress the Google Docs issues that originally caused us to disable forced-gdi-classic glyph rendering in canvas (bug 1730772).

(b) Eventually, we'd like to completely remove the force-gdi-classic "hack" for certain legacy Windows fonts; see bug 1920733. That should resolve the problem here, because then neither the canvas measureText operation nor the HTML text rendering will be forcing a rendering mode, so they should both behave the same.

The risk here is that we're not sure how quickly we'll be able to deploy such a removal; the change to the rendering of certain (common) legacy fonts may generate a significant backlash.

(c) Here's an idea: I see that the (invisible) text layer in pdf.js is using font-family: sans-serif. That explains why it is affected by the bug 1730772 change: sans-serif on Windows resolves to Arial, and Arial is one of the "classic" Windows fonts listed in gfx.font_rendering.cleartype_params.force_gdi_classic_for_families, so that's why its metrics are affected by the change, and there is now a canvas vs HTML text mismatch. Maybe pdf.js could prefer a different Windows font for the text layer (with sans-serif remaining as fallback, or for other platforms). If it were to use, for instance, font-family: Calibri, sans-serif, then any modern Windows system should use Calibri, which isn't affected by the force-gdi-classic setting, and so the metrics and scaling computation ought to work as intended.

Calixte, WDYT about option (c) here -- would that be a reasonable workaround, at least until such time as we're able to drop the force-gdi-classic behavior altogether? Given that the text layer is invisible, it shouldn't really matter exactly which font is used, right?

Flags: needinfo?(cdenizet)

(In reply to Jonathan Kew [:jfkthame] from comment #10)

(c) Here's an idea: I see that the (invisible) text layer in pdf.js is using font-family: sans-serif.

Please note that it's just one of the possible font-families that can be used in the textLayer: https://github.com/mozilla/pdf.js/blob/567df4214983a12da9f3d02a2ab1ea646f0dfcf1/src/core/fonts.js#L996-L1002

(In reply to Jonas Jenwald [:Snuffleupagus] from comment #11)

(In reply to Jonathan Kew [:jfkthame] from comment #10)

(c) Here's an idea: I see that the (invisible) text layer in pdf.js is using font-family: sans-serif.

Please note that it's just one of the possible font-families that can be used in the textLayer: https://github.com/mozilla/pdf.js/blob/567df4214983a12da9f3d02a2ab1ea646f0dfcf1/src/core/fonts.js#L996-L1002

There shouldn't be an issue with serif, as Times isn't affected by the gdi-classic pref. Presumably monospace could be an issue, though, as both Courier New and Consolas are listed in the pref.

Calixte, WDYT about option (c) here -- would that be a reasonable workaround, at least until such time as we're able to drop the force-gdi-classic behavior altogether? Given that the text layer is invisible, it shouldn't really matter exactly which font is used, right?

What could be a replacement for monospace ?
Do we know how many users have a Windows version without Calibri ?

Flags: needinfo?(cdenizet)

(In reply to Calixte Denizet (:calixte) from comment #13)

Calixte, WDYT about option (c) here -- would that be a reasonable workaround, at least until such time as we're able to drop the force-gdi-classic behavior altogether? Given that the text layer is invisible, it shouldn't really matter exactly which font is used, right?

What could be a replacement for monospace ?

"Lucida Console" should be OK.

Do we know how many users have a Windows version without Calibri ?

Virtually none, I suspect; I believe Calibri has been a standard font since Windows 7 (or Vista?).

Note if we work around this in pdf.js, we'd still see the bug in external users until they upgrade to the version with the workaround (and they're are often slow in upgrading).

We don't have any exact idea about how many people are using pdf.js in an other context that the built-in viewer, but it's probably a lot.

Maybe it's a stupid idea, but on Windows, when the font is sans-serif, why not choosing Calibri before Arial ? Are we "obliged" to select Arial ?
I mean in general and not only in pdf.js.

(In reply to Calixte Denizet (:calixte) from comment #18)

Maybe it's a stupid idea, but on Windows, when the font is sans-serif, why not choosing Calibri before Arial ? Are we "obliged" to select Arial ?
I mean in general and not only in pdf.js.

I don't think we're obliged to use Arial, it's just that it has been the default for a very long time, and changing a default like that will no doubt cause a certain amount of disruption. Also I think Edge and Chrome still use Arial (I'm not on Windows right now to check...), in which case changing our sans-serif to a different font might add some minor interop pain.

(Personally, I like Calibri better than Arial for general use. Still, I'd want to tread carefully if we consider changing such a fundamental default.)

On Windows 11, Chrome and Edge are using Arial when the font is sans-serif.

Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Component: Graphics: Text → PDF Viewer
Product: Core → Firefox
Resolution: --- → FIXED
Target Milestone: --- → 133 Branch
See Also: → 1923150
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: