Closed Bug 837461 Opened 7 years ago Closed 7 years ago

pdf.js: Black squares appear in some PDF documents even though the original PDFs do not have them.


(Core :: Canvas: 2D, defect)

19 Branch
Windows Vista
Not set



Tracking Status
firefox18 --- unaffected
firefox19 + verified
firefox20 + verified
firefox21 + verified


(Reporter: rshimazu, Assigned: jfkthame)


(Blocks 1 open bug)


(Keywords: regression)


(3 files)

User Agent: Mozilla/5.0 (Windows NT 6.0; rv:19.0) Gecko/20100101 Firefox/19.0
Build ID: 20130130080006

Steps to reproduce:

I accessed the following URLs:

Actual results:

Black squares (boxes) that are not in the original document, and that look obviously wrong, appear in the rendered document.

Expected results:

Squares should not appear.
WFM using Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0 ID:20130130080006 and

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20130202 Firefox/21.0 ID:20130202030955 + PDF.js 0.7.142
Could you take a screenshot of these black boxes, please.

In addition, try with hardware acceleration disabled:
Flags: needinfo?(rshimazu)
Flags: needinfo?(rshimazu)
I disabled hardware acceleration, but nothing changed. I still see the same obvious rendering problems.
Component: Untriaged → PDF Viewer
Could you try with a fresh profile, please:
Flags: needinfo?(rshimazu)
I created a new profile and tested again, but nothing changed; the result is the same.
Flags: needinfo?(rshimazu)
I saw this on the latest Nightly, but not on Firefox 18.
Ever confirmed: true
Confirmed too, but only with HWA disabled (on Win 7).
Could you confirm that HWA is disabled when you're seeing the black/gray rectangles? (type about:support > graphics)

Regression range:
Regression window(m-c)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121104012747
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121104152246

Regression window(m-i)
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121103200445
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/19.0 Firefox/19.0 ID:20121104025144

	58b4bd7b5065	Jonathan Kew — bug 808288 - render missing glyphs as hexboxes in <canvas> text. r=bas
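The build IDs in the regression windows above are 14-digit timestamps (YYYYMMDDHHMMSS), so the width of each window can be estimated by parsing them. The following is a minimal sketch; `parseBuildId` and `windowHours` are hypothetical helper names, and treating the fields as UTC is an assumption made only so that two IDs can be subtracted (it ignores time zone and daylight-saving effects, so the results are approximate).

```typescript
// Parse a 14-digit build ID (YYYYMMDDHHMMSS) into milliseconds since the epoch.
// Assumption: the fields are interpreted as UTC purely for comparison purposes.
function parseBuildId(id: string): number {
  if (!/^\d{14}$/.test(id)) {
    throw new Error("not a 14-digit build ID: " + id);
  }
  const year = Number(id.slice(0, 4));
  const month = Number(id.slice(4, 6));
  const day = Number(id.slice(6, 8));
  const hour = Number(id.slice(8, 10));
  const minute = Number(id.slice(10, 12));
  const second = Number(id.slice(12, 14));
  return Date.UTC(year, month - 1, day, hour, minute, second);
}

// Approximate number of hours between two builds.
function windowHours(earlier: string, later: string): number {
  return (parseBuildId(later) - parseBuildId(earlier)) / 3600000;
}

// The two windows quoted in this comment.
const mozillaCentralHours = windowHours("20121104012747", "20121104152246");
const mozillaInboundHours = windowHours("20121103200445", "20121104025144");
```

Under these assumptions both windows come out at well under a day, which is consistent with the window being narrowed to a single changeset.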
I'm not able to reproduce this on Win7 or OS X; will try WinXP shortly.

However, I do notice a number of font-related errors in the Web Console when loading these PDFs, including some fonts that are rejected; this implies that the rendering is probably falling back to a locally-installed font instead of using the embedded font from the PDF document.

This suggests that the root of the problem here may well be pdf.js font handling, and the change in bug 808288 just caused the "junk" that it is drawing to become visible instead of invisible. But it shouldn't be drawing text to the canvas without the proper font (or is it an encoding problem?) in the first place.
(In reply to Jonathan Kew (:jfkthame) from comment #11)
> I'm not able to reproduce this on Win7 or OS X; will try WinXP shortly.
See comment #9; it only happens when HWA is disabled.
Ah, right - sorry, I overlooked that. Also reproduced on my WinXP VM, which makes sense, as we use the GDI font backend there.

I notice that with Firefox 18 on XP, I see missing-glyph "boxes" where there are spaces in the English fragments of text. With Nightly, I see those ugly gray bars instead, but those spaces are the trigger for it. I'm not sure yet why they appear so long, but I think there's some kind of font-handling or encoding problem in pdf.js that is causing it to try to draw an undefined character for those spaces.
Jonathan, can you tell us whether a backout of bug 808288 is feasible here, and what state that would leave us in when we ship 19? The black boxes are very undesirable; can we still ship pdf.js with the former missing-glyph behaviour?
Completely backing out 808288 isn't a very attractive option, as the old canvas code was broken to the extent that it rendered some arbitrary ("random") glyph in place of any missing character. Much of the time, that might produce nothing visible, but it also had the potential to draw a totally -wrong- glyph. (See the testcase and screenshot in bug 808288.)

But as it happens, bug 808288 was implemented in several logical steps, so one option would be to back out the second and third patches there, leaving only the first stage. That was a very small patch to make canvas text simply leave missing characters blank, not attempt to draw a glyph at all.

I think that would be a very safe option, and the result for pdf.js should be as good as it can be short of actually fixing pdf.js such that it renders the -correct- glyph from the correct font.

I'll push a tryserver job of mozilla-beta with such a backout (i.e. back out bug 808288 parts 2 and 3) to confirm that it works as expected.
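The three behaviours discussed in this comment (the old code painting an arbitrary glyph, the first patch leaving missing characters blank, and the later patches drawing hexboxes) can be modelled as a policy switch. This is only an illustrative sketch: the policy names, `planText`, and the `DrawOp` type are hypothetical and are not taken from the Gecko canvas code.

```typescript
// Hypothetical labels for the three behaviours discussed in this bug;
// none of these names come from the Gecko source.
type MissingCharPolicy = "arbitrary" | "skip" | "hexbox";

type DrawOp =
  | { kind: "glyph"; codePoint: number }     // character the font supports
  | { kind: "arbitrary"; codePoint: number } // some unrelated glyph gets painted
  | { kind: "hexbox"; label: string };       // box showing the hex code point

// Format a code point as at least four uppercase hex digits.
function hexLabel(cp: number): string {
  const hex = cp.toString(16).toUpperCase();
  return hex.length >= 4 ? hex : "0000".slice(hex.length) + hex;
}

// Decide what would be drawn for each character of `text`, given the set of
// code points the active font supports and the missing-character policy.
function planText(text: string, supported: Set<number>, policy: MissingCharPolicy): DrawOp[] {
  const ops: DrawOp[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp === undefined) {
      continue;
    }
    if (supported.has(cp)) {
      ops.push({ kind: "glyph", codePoint: cp });
    } else if (policy === "hexbox") {
      ops.push({ kind: "hexbox", label: hexLabel(cp) });
    } else if (policy === "arbitrary") {
      ops.push({ kind: "arbitrary", codePoint: cp });
    }
    // policy === "skip": emit nothing, so the character is simply left blank
  }
  return ops;
}

// Pretend the fallback font only covers "A" and "B".
const fallbackCoverage = new Set<number>([0x41, 0x42]);
const sampleText = "A\uE001B";
const skippedOps = planText(sampleText, fallbackCoverage, "skip");
const boxedOps = planText(sampleText, fallbackCoverage, "hexbox");
```

With the "skip" policy the unsupported character produces no draw operation at all, which corresponds to the partial backout proposed here.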
This is the partial backout, as described above. A tryserver run is in progress. I'm expecting it to fail a reftest on some OS X systems; if we land this, we'll need to readjust the manifest accordingly.
Attachment #710914 - Flags: review?(bas)
Component: PDF Viewer → Canvas: 2D
Product: Firefox → Core
The tryserver job above was pushed from a mozilla-beta tree, so some builds such as b2g may fail.

Also pushed a job with the same partial backout based on mozilla-central, to confirm desktop reftest results there:
From tracing the behavior when rendering the PDF documents from comment #0, it's clear that the underlying issue here is that pdf.js is failing to create a usable font resource from one or more of the embedded fonts in the document.

The font(s) are either rejected by the OTS sanitizer, or fail to activate on the platform; this shows up in the Web Console, as we log error messages there for font failures.

Then pdf.js proceeds to draw text using Private Use Area codepoints (U+Exxx, at least in the cases I traced), which were intended to map to specific glyphs in the embedded font but are not supported by whatever standard font we fall back to, due to the font-loading failure. And so we hit the missing-glyph code path, which on current trunk would usually be expected to draw hexboxes. I'm guessing that the ugly black bars we're seeing are a fragment clipped from those hexboxes, which are being drawn at a huge size due to some quirk of how pdf.js is dealing with font scaling, though I'm not 100% sure of this.

The suggested partial-backout here can't solve the problem of pdf.js failing to create a usable font resource, but by making missing glyphs entirely invisible in canvas, it will at least make the result less glaringly bad. (It's not really a good solution, though, as users might not even realize that some of the text in their document isn't showing up.)
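To illustrate the failure mode described above: text that relies on Private Use Area code points only renders correctly when the embedded font is active, so those code points are the ones at risk once the font is rejected. The sketch below flags them using the Unicode PUA ranges; `isPrivateUse` and `privateUseCodePoints` are hypothetical helper names, not part of pdf.js.

```typescript
// Unicode Private Use Area ranges: the BMP block plus planes 15 and 16.
const PRIVATE_USE_RANGES: ReadonlyArray<[number, number]> = [
  [0xe000, 0xf8ff],
  [0xf0000, 0xffffd],
  [0x100000, 0x10fffd],
];

// True if the code point falls in any Private Use Area range.
function isPrivateUse(cp: number): boolean {
  return PRIVATE_USE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);
}

// Collect the PUA code points in a string. If the embedded font failed to
// load, a standard fallback font is unlikely to cover any of these, so each
// one would hit the missing-glyph code path.
function privateUseCodePoints(text: string): number[] {
  const found: number[] = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (cp !== undefined && isPrivateUse(cp)) {
      found.push(cp);
    }
  }
  return found;
}

// Example: one PUA character standing in for a space between two words.
const sampleRun = "PDF\uE020viewer";
const atRisk = privateUseCodePoints(sampleRun);
```

A check like this could, in principle, be used to log a warning when a font fails to load and the text to be drawn contains PUA characters; whether pdf.js does anything similar is not stated in this bug.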
Tryserver results are looking ok... I assume the android test failures are because of pushing mozilla-beta (rather than trunk) to tryserver, and things have changed too much for this to work. The canvas/text-emoji.html reftest failure makes perfect sense, as most platforms don't have a font for the character used there and so this hits the canvas missing-character code, which is exactly what we're proposing to back out here. If we take this backout patch, I'll mark this in the reftest manifest.
Attachment #710914 - Flags: review?(bas) → review+
Pushed to m-c, including reftest manifest update to mark text-emoji.html as random on most platforms.
Comment on attachment 710914 [details] [diff] [review]
backout parts 2 and 3 of bug 808288 because pdf.js may paint missing chars to canvas if it failed to load fonts, which looks really ugly; better to skip them for now.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): not strictly a regression, but bug 808288 results in visually worse results due to pdf.js font-loading issues

User impact if declined: potential for really ugly rendering of some PDFs on Windows GDI systems, when loading of document's embedded fonts fails

Testing completed (on m-c, etc.): verified locally that mozilla-beta tryserver build (comment 16) renders as expected - missing glyphs are simply absent, no black blobs; landed on mozilla-central

Risk to taking this patch (and alternatives if risky): minimal risk, this just removes the code that was added to draw missing-glyph hexboxes in canvas text

String or UUID changes made by this patch: none
Attachment #710914 - Flags: approval-mozilla-beta?
Attachment #710914 - Flags: approval-mozilla-aurora?
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Comment on attachment 710914 [details] [diff] [review]
backout parts 2 and 3 of bug 808288 because pdf.js may paint missing chars to canvas if it failed to load fonts, which looks really ugly; better to skip them for now.

Low risk partial backout that will improve the rendering of non-English PDFs. Approving for Aurora/Beta.
Attachment #710914 - Flags: approval-mozilla-beta?
Attachment #710914 - Flags: approval-mozilla-beta+
Attachment #710914 - Flags: approval-mozilla-aurora?
Attachment #710914 - Flags: approval-mozilla-aurora+
I confirm the fix is verified on Latest Aurora on Windows Vista x64:

Mozilla/5.0 (Windows NT 6.0; WOW64; rv:20.0) Gecko/20130210 Firefox/20.0(20130210042017)

I will run the same checks on FF 19b6, because the fix had not yet landed in FF 19b5.
I confirm the fix is verified on FF 19b6 and the latest Nightly too, on Windows Vista x64:

Mozilla/5.0 (Windows NT 6.0; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0 (20130212082553)

Mozilla/5.0 (Windows NT 6.0; WOW64; rv:21.0) Gecko/20130212 Firefox/21.0(20130212031120)
Keywords: verifyme
I am the original reporter of this bug. I also confirmed the fix on Firefox 19b6 with pdf.js 0.7.234 addon. Thank you.
(In reply to rshimazu from comment #26)
> I am the original reporter of this bug. I also confirmed the fix on Firefox
> 19b6 with pdf.js 0.7.234 addon. Thank you.

Thank you for confirming too.
Would anyone care to see if this bug is related to
(In reply to RonB from comment #28)
> Would anyone care to see if this bug is related to

No, bug 858128 is still present in FF23.