Closed Bug 829435 Opened 11 years ago Closed 11 years ago

Japanese characters appear completely garbled in PDF viewer

Categories

(Firefox :: PDF Viewer, defect, P1)

19 Branch
x86
Windows 7
defect

Tracking

VERIFIED FIXED
Firefox 21
Tracking Status
firefox19 + verified
firefox20 + verified
firefox21 + verified

People

(Reporter: alice0775, Assigned: emk)

Details

(Keywords: jp-critical, Whiteboard: [pdfjs-c-rendering][pdfjs-d-font-conversion])

Attachments

(4 files)

Build Identifier:
http://hg.mozilla.org/mozilla-central/rev/0a6e5a67c4e8
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20130110 Firefox/21.0 ID:20130110030939

COMPLETELY BROKEN in Firefox.

The problem does not happen in Google Chrome 23.0.1271.97 m.

Steps to reproduce:
1. Open Firefox with a newly created profile
2. Open the attached PDF

Actual results:
The characters were garbled.

Expected results:
Should be rendered with proper characters
Attached file sample pdf
The problem happens in Firefox 19 and later.
http://hg.mozilla.org/releases/mozilla-beta/rev/222e6877be4b
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0 ID:20130109111322

However, the problem does not happen in Firefox 18.
http://hg.mozilla.org/releases/mozilla-release/rev/8efe34fa2289
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0 ID:20130104151925
Keywords: regression
Version: Trunk → 19 Branch
I saw this bug even on Firefox 18 when I installed the pdf.js add-on.
I reported this bug long ago, but it has been abandoned.
https://github.com/mozilla/pdf.js/issues/696
Although I personally agree we should not ship pdf.js without fixing this bug, it looks like they are not interested in fixing problems specific to Japanese.
(In reply to Alice0775 White from comment #3)
> The problem happens in Firefox 19 and later.
> http://hg.mozilla.org/releases/mozilla-beta/rev/222e6877be4b
> Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
> ID:20130109111322
> 
> However, the problem does not happen in Firefox 18.
> http://hg.mozilla.org/releases/mozilla-release/rev/8efe34fa2289
> Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
> ID:20130104151925
Depends on: 815718
Keywords: regression
Summary: The characters were garbled in PDF viewer → The characters were garbled in PDF viewer in Firefox19Beta and later
(In reply to Alice0775 White from comment #5)
> (In reply to Alice0775 White from comment #3)
> > The problem happens in Firefox 19 and later.
> > http://hg.mozilla.org/releases/mozilla-beta/rev/222e6877be4b
> > Mozilla/5.0 (Windows NT 6.1; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0
> > ID:20130109111322
> > 
> > However, the problem does not happen in Firefox 18.
> > http://hg.mozilla.org/releases/mozilla-release/rev/8efe34fa2289
> > Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0
> > ID:20130104151925

It's just because the PDF viewer is not enabled in Firefox 18, no?
I bet this bug will not be fixed unless someone in Japan writes a patch and sends a pull request.
(In reply to Masatoshi Kimura [:emk] from comment #4)
> I saw this bug even on Firefox 18 when I installed the pdf.js add-on.
> I reported this bug long ago, but it has been abandoned.
> https://github.com/mozilla/pdf.js/issues/696
> Although I personally agree we should not ship pdf.js without fixing this
> bug, it looks like they are not interested in fixing problems specific to
> Japanese.

OK, in that case, I request that PDF.js be disabled in FF19 Beta and beyond.
Summary: The characters were garbled in PDF viewer in Firefox19Beta and later → Disable PDF.js In FF19 Beta and Aurora and Nightly
I set "pdfjs.disabled" to false and tried to view the PDF with Firefox 18 internal viewer, but it was not displayed at all with the following error: "TypeError: currentPage is undefined @ resource://pdf.js/web/viewer.js:361".
You must have used something other than the internal PDF viewer on Firefox 18.
(In reply to Alice0775 White from comment #8)
> OK, in that case, I request that PDF.js be disabled in FF19 Beta and beyond.

It should be disabled on release and beta unless this bug is fixed, but there is no reason to disable it on other channels. You can easily select other viewers from Options > Applications > Portable Document Format (PDF).
(In reply to Masatoshi Kimura [:emk] from comment #9)
> I set "pdfjs.disabled" to false and tried to view the PDF with Firefox 18
> internal viewer, but it was not displayed at all with the following error:
> "TypeError: currentPage is undefined @ resource://pdf.js/web/viewer.js:361".
> You must have used something other than the internal PDF viewer on Firefox 18.

My bad.
Yes, you are correct.
PDF.js is disabled by default in Firefox 18.
(In reply to Masatoshi Kimura [:emk] from comment #10)
> (In reply to Alice0775 White from comment #8)
> > OK, in that case, I request that PDF.js be disabled in FF19 Beta and beyond.
> 
> It should be disabled on release and beta unless this bug is fixed, but
> there is no reason to disable it on other channels. You can easily select
> other viewers from Options > Applications > Portable Document Format (PDF).

OK.
Summary: Disable PDF.js In FF19 Beta and Aurora and Nightly → Disable PDF.js In FF19 Beta
Nominating this for a release blocker.
Blocks: 748923
PDF.js is in the release notes of Firefox 19: http://www.mozilla.org/en-US/firefox/19.0beta/releasenotes/
It was also listed in the release notes of Firefox 18, but was eventually disabled.
This patch uses the RELEASE_BUILD build-time flag introduced by bug 820148.
Attachment #701004 - Flags: review?(mak77)
Is this issue appearing on all Japanese PDF documents? If not, what portion of the documents looks bad vs looks good?
An example of a non-mathematical paper:
http://jp.xlsoft.com/documents/windriver/wdman10J.pdf
You can use different PDF viewers with Firefox instead of using PDF.js, and PDF.js works with a large proportion of PDFs found in the wild, so this is a case of a missing support feature in PDF.js, rather than something that makes it impossible to view certain things on the web anymore. Asking for its removal or even suggesting blocking a release on a missing feature in an optional library is going a little far =)

Constructively speaking, if PDF.js is having problems with a subset of PDF files such as files that use SHIFT_JIS encoding (used in many older Japanese PDFs, before Unicode became more popular) then it's a good idea to file a bug on this over at the PDF.js GitHub tracker (http://github.com/mozilla/pdf.js/issues) so that it can be fixed, along the lines of "pdf.js does not support PDF files with SHIFT_JIS encoded text". This'll let the PDF.js dev team make sure the issue gets tracked by the people who work on the code, including non-Mozilla volunteers who contribute to its codebase (they only look at the GitHub issues, not Mozilla's Bugzilla).
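To illustrate the symptom in general terms: text stored in a legacy multibyte encoding such as Shift_JIS turns into garbage ("mojibake") when its bytes are interpreted with the wrong encoding, because each two-byte character is split into unrelated single characters. The following Python sketch shows only this general effect using the standard `shift_jis` and `latin-1` codecs; the helper name `shift_jis_roundtrip` is made up for illustration, and this is not how pdf.js itself decodes PDF text.

```python
# Sketch: Shift_JIS bytes decoded with the wrong codec produce mojibake.
# Only Python's built-in codecs are used; the helper name is hypothetical.


def shift_jis_roundtrip(text: str) -> tuple[bytes, str, str]:
    """Encode text as Shift_JIS, then decode it with the right and a wrong codec."""
    raw = text.encode("shift_jis")     # kanji/kana take two bytes each
    correct = raw.decode("shift_jis")  # matching codec recovers the text
    garbled = raw.decode("latin-1")    # wrong codec: one character per byte
    return raw, correct, garbled


raw, correct, garbled = shift_jis_roundtrip("日本語")
print(raw.hex())     # 93fa967b8cea
print(correct)       # 日本語
print(len(garbled))  # 6 characters instead of 3
```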
(In reply to Mike "Pomax" Kamermans from comment #20)
> Constructively speaking, if PDF.js is having problems with a subset of PDF
> files such as files that use SHIFT_JIS encoding (used in many older Japanese
> PDFs, before Unicode became more popular) then it's a good idea to file a
> bug on this over at the PDF.js github tracker
> (http://github.com/mozilla/pdf.js/issues) so that it can be fixed, along the
> lines of "pdf.js does not support PDF files with SHIFT_JIS encoded text".
> This'll let the PDF.js dev team make sure the issue gets tracked by the
> people who work on the code, including non-mozilla volunteers who contribute
> to its codebase (they only look at the github issues, not mozilla's bugzilla)

Could you please read comment #4?
(In reply to Mike "Pomax" Kamermans from comment #20)
> You can use different PDF viewers with Firefox instead of using PDF.js,

I know (see comment #10), but many novice users have never even seen the Options dialog. Even the bug reporter, a very experienced user, didn't know.

> and
> PDF.js works with a large proportion of PDFs found in the wild, so this is a
> case of a missing support feature in PDF.js, rather than something that
> makes it impossible to view certain things on the web anymore. Asking for
> its removal or even suggesting blocking a release on a missing feature in an
> optional library is going a little far =)

I didn't propose removing the PDF viewer. You can use PDF viewer by flipping "pdfjs.disabled", as you probably know :)
Masatoshi, I'm not sure why the "The characters were garbled in PDF viewer" issue was escalated to "Disable PDF.js In FF19 Beta". As far as I understand, it should just stay as "The characters were garbled in PDF viewer" and block the FF19 release bug.
(In reply to Yury (:yury) from comment #23)
> Masatoshi, I'm not sure why the "The characters were garbled in PDF viewer"
> issue was escalated to "Disable PDF.js In FF19 Beta". As far as I
> understand, it should just stay as "The characters were garbled in PDF
> viewer" and block the FF19 release bug.

The reporter changed the title, not me.
(In reply to Yury (:yury) from comment #23)
> Masatoshi, I'm not sure why the "The characters were garbled in PDF viewer"
> issue was escalated to "Disable PDF.js In FF19 Beta". As far as I
> understand, it should just stay as "The characters were garbled in PDF
> viewer" and block the FF19 release bug.

I thought that PDF.js had not reached the level required for a beta release, or even for aurora.
(In reply to Masatoshi Kimura [:emk] from comment #24)
> (In reply to Yury (:yury) from comment #23)
> > Masatoshi, I'm not sure why "The characters were garbled in PDF viewer"
> > issue was escalated to "Disable PDF.js In FF19 Beta". As far as I
> > understand, it shall just stay as "The characters were garbled in PDF
> > viewer" and be a blocker to ff19 release bug.
> 
> The reporter changed the title, not me.

The patch was submitted by you, so I thought you had some information that made this patch the only solution for the issue.

(In reply to Alice0775 White from comment #25)
> I thought that PDF.js had not reached the level required for a beta
> release, or even for aurora.

What criteria were used to determine that PDF.js has not reached the level required for a beta or aurora release?
(In reply to Yury (:yury) from comment #26)
> (In reply to Masatoshi Kimura [:emk] from comment #24)
> > (In reply to Yury (:yury) from comment #23)
> > > Masatoshi, I'm not sure why the "The characters were garbled in PDF viewer"
> > > issue was escalated to "Disable PDF.js In FF19 Beta". As far as I
> > > understand, it should just stay as "The characters were garbled in PDF
> > > viewer" and block the FF19 release bug.
> > 
> > The reporter changed the title, not me.
> 
> The patch was submitted by you, so I thought you had some information that
> made this patch the only solution for the issue.
> 
> (In reply to Alice0775 White from comment #25)
> > I thought that PDF.js had not reached the level required for a beta
> > release, or even for aurora.
> 
> What criteria were used to determine that PDF.js has not reached the level
> required for a beta or aurora release?

Hello,

The PDF.js team has been working hard over the last six months to address rendering bugs based on their severity and breadth of impact. It is very unfortunate that some Japanese language documents won't render properly, but in my opinion this symptom in no way justifies disabling the PDF viewer in FF19.

I propose that we discuss this bug in our regularly scheduled bug triage meeting on Monday, January 14th. Details will be published on the dev-pdf-js mailing list; please join us! Until then, I'd like to revert this bug back to a description of its symptom, which is how we can make best use of Bugzilla.
Summary: Disable PDF.js In FF19 Beta → Japanese characters appear completely garbled in PDF viewer
Actually there were two different reasons for the garbled text. Filed https://github.com/mozilla/pdf.js/issues/2559 for the second.
Comment on attachment 701004 [details] [diff] [review]
Disable pdf.js on release and beta channels automatically

While technically I could review this patch, I can't make such a decision; that's up to the module owner and release management.

I wonder whether there is a "safe" fix that could be taken in Beta to address this problem.
Attachment #701004 - Flags: review?(mak77) → feedback?(gavin.sharp)
Flags: needinfo?(akeybl)
I'm sending a pull request upstream.
https://github.com/mozilla/pdf.js/pull/2561
This patch will fix only the second problem.
Comment on attachment 701004 [details] [diff] [review]
Disable pdf.js on release and beta channels automatically

This seems premature. Let's try to address this problem before disabling pdf.js wholesale.
Attachment #701004 - Flags: feedback?(gavin.sharp)
Flags: needinfo?(akeybl)
One option may be to automatically disable PDF.js for Japanese locale builds, if we can't find a low risk fix in time for FF19. Users on other localized builds can always choose to open the PDF in an external reader.
Priority: -- → P1
Whiteboard: [pdfjs-c-rendering][pdfjs-d-font-conversion]
Whiteboard: [pdfjs-c-rendering][pdfjs-d-font-conversion] → [pdfjs-c-rendering][pdfjs-d-font-conversion][pdfjs-f-fixed-upstream] https://github.com/mozilla/pdf.js/pull/2562
The attachment from the original report is still garbled; that is
https://github.com/mozilla/pdf.js/issues/696
The PDF uses the '90ms' encoding.
Whiteboard: [pdfjs-c-rendering][pdfjs-d-font-conversion][pdfjs-f-fixed-upstream] https://github.com/mozilla/pdf.js/pull/2562 → [pdfjs-c-rendering][pdfjs-d-font-conversion]
[Approval Request Comment]
Bug caused by (feature/regressing bug #): N/A
User impact if declined: Some PDFs are not readable for Japanese users.
Testing completed (on m-c, etc.): upstream
Risk to taking this patch (and alternatives if risky): Medium. An obvious alternative is disabling PDF viewer (at least on Japanese localization).
String or UUID changes made by this patch: None
Attachment #703068 - Flags: approval-mozilla-beta?
Attachment #703068 - Flags: approval-mozilla-aurora?
Adding qawanted to get some testing around this fix so we can confirm whether it is lower risk to disable pdf.js on jp builds or land this to branches.  Can qa confirm if this patch makes the feature usable on jp builds?
Keywords: qawanted
(In reply to Masatoshi Kimura [:emk] from comment #35)
> Risk to taking this patch (and alternatives if risky): Medium. An obvious
> alternative is disabling PDF viewer (at least on Japanese localization).

Out of curiosity, is this medium risk evaluation for PDFs with Japanese characters, or for all PDFs? How do you expect a regression to manifest itself?
(In reply to Lukas Blakk [:lsblakk] from comment #36)
> Adding qawanted to get some testing around this fix so we can confirm
> whether it is lower risk to disable pdf.js on jp builds or land this to
> branches.  Can qa confirm if this patch makes the feature usable on jp
> builds?

Can QA read Japanese and tell whether the characters are garbled?

(In reply to Alex Keybl [:akeybl] from comment #37)
> Out of curiosity, is this medium risk evaluation for PDFs with Japanese
> characters, or for all PDFs? How do you expect a regression to manifest
> itself?

The display of some other PDFs may regress (though I haven't encountered that so far). Non-CJK PDFs are less likely to regress because the patched code paths involve multibyte character encodings.
The following are examples of PDFs that pdf.js cannot render as expected.

http://kanji.zinbun.kyoto-u.ac.jp/~yasuoka/publications/JST2007-5.pdf
http://www.jsce.or.jp/committee/amc/jam/DOC/jam_makepdf.pdf
http://www.nii.ac.jp/userdata/shimin/documents/H23/111005_4thlec02.pdf
http://www.cyflex.jp/knowhow/f_know/f_mojibake.pdf
http://asciimw.jp/info/release/pdf/20080908b.pdf
http://www.kyotanabe.ed.jp/jouhou_k/pdf/firefox.pdf
http://www.unisys.co.jp/tec_info/tr93/9306.pdf
http://www.jasst.jp/archives/jasst07k/pdf/T1.pdf
http://www.cqpub.co.jp/dwm/contents/0110/dwm011000820.pdf
http://www.jasst.jp/archives/jasst05w/pdf/S4-1.pdf
http://www.marubun-arita.co.jp/to/rabachi/rabachi.pdf
http://www.jst.go.jp/kisoken/crest/research_area/ongoing/21senryaku_1.pdf
http://repository.kulib.kyoto-u.ac.jp/dspace/bitstream/2433/57579/1/takahashi1.pdf
http://www.alpha-web.jp/service/internet/roaming/pdf/wlan-ap04.pdf
http://www.yaf.or.jp/mmh/recommend/McSeat.pdf
http://www.kanebo-cosmetics.co.jp/company/pdf/20120926-02.pdf

The following are also examples that pdf.js cannot render as expected.
They may look fine to you, but to Japanese readers they look like Chinese text.

http://www.ztv.co.jp/support/column/2008/pdf/giju_column051.pdf
http://wl.emit-japan.com/past/tips/pdf/tips_character_change.pdf
http://www.jkn.auecc.aichi-edu.ac.jp/for_user/tip/firefox/firefox.pdf
http://www.mew.org/~kazu/material/2008-malloc.pdf
http://www.mext.go.jp/a_menu/sports/stamina/05030101/001.pdf
http://www.apu.ac.jp/~makita/pdf/2.pdf
http://www.isc.meiji.ac.jp/~pkchoi/gelpaper.pdf

http://azusa.shinshu-u.ac.jp/~haru/miracle.pdf
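Text that looks Chinese to Japanese readers is a typical side effect of Han unification: Japanese and Chinese share Unicode code points for many ideographs, so when the font is not embedded, the glyph shapes depend on which substitute font the viewer picks. A hedged sketch of one possible mitigation, not necessarily what pdf.js does: derive a language hint from the Registry and Ordering in the font's CIDSystemInfo (for example Adobe-Japan1) and prefer a font for that language. The function name and the choice of BCP 47 tags are assumptions for illustration.

```python
# Hypothetical sketch: turn a CID font's CIDSystemInfo (Registry, Ordering)
# into a BCP 47 language hint that a viewer could use when choosing a
# substitute font for non-embedded CJK text. Font lookup itself is omitted.

ORDERING_TO_LANG = {
    "Japan1": "ja",      # Adobe-Japan1: Japanese
    "GB1": "zh-Hans",    # Adobe-GB1: Simplified Chinese
    "CNS1": "zh-Hant",   # Adobe-CNS1: Traditional Chinese
    "Korea1": "ko",      # Adobe-Korea1: Korean
}


def cjk_lang_hint(registry: str, ordering: str, default: str = "und") -> str:
    """Return a language tag for fallback-font selection ("und" if unknown)."""
    if registry != "Adobe":
        return default
    return ORDERING_TO_LANG.get(ordering, default)


# A unified ideograph such as U+76F4 has the same code point in every case;
# only the hint (and thus the chosen font) differs.
print(cjk_lang_hint("Adobe", "Japan1"))    # ja
print(cjk_lang_hint("Adobe", "GB1"))       # zh-Hans
print(cjk_lang_hint("Adobe", "Identity"))  # und
```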

From a very brief investigation, roughly 1 in every 8 to 12 Japanese PDFs on the web seems to have problems with pdf.js. I searched for PDFs with the query "filetype:pdf arbitrary-keywords" on Yahoo or Google.

I wish you would disable pdf.js by default in the Japanese version of Firefox 19 unless this bug is fixed.
All the above examples render without garbled text with the patched pdf.js 0.7.82.
(In reply to Lukas Blakk [:lsblakk] from comment #36)
> Adding qawanted to get some testing around this fix so we can confirm
> whether it is lower risk to disable pdf.js on jp builds or land this to
> branches.  Can qa confirm if this patch makes the feature usable on jp
> builds?

Is there any build with this fix?
(In reply to Mihaela Velimiroviciu [QA] (:mihaela) from comment #41)
> (In reply to Lukas Blakk [:lsblakk] from comment #36)
> > Adding qawanted to get some testing around this fix so we can confirm
> > whether it is lower risk to disable pdf.js on jp builds or land this to
> > branches.  Can qa confirm if this patch makes the feature usable on jp
> > builds?
> 
> Is there any build with this fix?

I made tryserver builds:
aurora: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/VYV03354@nifty.ne.jp-47ea74f63363/
beta: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/VYV03354@nifty.ne.jp-83dec4f3c354/
The fix for m-c will land in bug 832995.
Depends on: 832995
No longer depends on: 815718
(Tested on Win 7 x86 and Ubuntu 12.10 x86, both beta and aurora try builds)

The Japanese character painting issue seems to be solved by this fix. There are still a few issues, though:
1. The spacing of non-Japanese characters is incorrect in most documents. See for example:
  - http://www.unisys.co.jp/tec_info/tr93/9306.pdf
  - http://www.cqpub.co.jp/dwm/contents/0110/dwm011000820.pdf
  - http://asciimw.jp/info/release/pdf/20080908b.pdf
2. Sometimes non-Japanese characters are displayed incorrectly or not displayed at all:
  - http://www.nii.ac.jp/userdata/shimin/documents/H23/111005_4thlec02.pdf - numbers are not displayed correctly (other characters are painted instead of numbers)
  - http://www.alpha-web.jp/service/internet/roaming/pdf/wlan-ap04.pdf - the "alpha" character is missing
3. 「 and 」 are not displayed correctly in most of the PDFs (the vertical lines are too short). This seems to happen only on Ubuntu.
4. Where Japanese text is written vertically, it is displayed horizontally; see page 3 in http://www.jasst.jp/archives/jasst05w/pdf/S4-1.pdf
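On item 4: predefined CJK CMaps come in horizontal (`-H`) and vertical (`-V`) variants, such as `90ms-RKSJ-H` and `90ms-RKSJ-V`, and the vertical ones select writing mode 1, in which glyphs advance down the page instead of to the right. A minimal Python sketch of that difference, assuming a constant 1000-unit (1 em) advance per glyph; real viewers take vertical metrics from the font (e.g. the `W2`/`DW2` entries), and the function names are hypothetical:

```python
# Hypothetical sketch: choose the writing mode from a predefined CMap name
# and compute glyph origins in text space. Assumes every glyph advances by
# 1000 glyph units (1 em); real viewers read vertical metrics from the font.

ADVANCE_UNITS = 1000  # assumed constant advance, in 1/1000 em


def writing_mode(cmap_name: str) -> int:
    """Return 0 (horizontal) or 1 (vertical) from the -H / -V name suffix."""
    return 1 if cmap_name.endswith("-V") else 0


def glyph_origins(count: int, cmap_name: str, font_size: float) -> list[tuple[float, float]]:
    """Return (x, y) origins for `count` consecutive glyphs."""
    advance = ADVANCE_UNITS / 1000 * font_size
    vertical = writing_mode(cmap_name) == 1
    origins = []
    for i in range(count):
        if vertical:
            origins.append((0.0, 0.0 - i * advance))  # y grows upward, so move down
        else:
            origins.append((i * advance, 0.0))        # move right
    return origins


print(glyph_origins(3, "90ms-RKSJ-H", 10))  # [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0)]
print(glyph_origins(3, "90ms-RKSJ-V", 10))  # [(0.0, 0.0), (0.0, -10.0), (0.0, -20.0)]
```

A viewer that ignored the `-V` suffix would take the horizontal branch for every CMap, which would lay vertical text out horizontally as described in item 4.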

There are some documents in which Japanese characters are still not displayed correctly - see chapter names and spacing in http://repository.kulib.kyoto-u.ac.jp/dspace/bitstream/2433/57579/1/takahashi1.pdf

For most documents that are not displayed correctly, the "The PDF document might not be displayed correctly" message pops up (on Windows).
Keywords: qawanted
(In reply to Mihaela Velimiroviciu [QA] (:mihaela) from comment #44)
> (Tested on Win 7 x86 and Ubuntu 12.10 x86, both beta and aurora try builds)
> 
> The Japanese character painting issue seems to be solved by this fix.
> There are still a few issues, though:
> 1. The spacing of non-Japanese characters is incorrect in most documents.
> See for example:
>   - http://www.unisys.co.jp/tec_info/tr93/9306.pdf
>   - http://www.cqpub.co.jp/dwm/contents/0110/dwm011000820.pdf
>   - http://asciimw.jp/info/release/pdf/20080908b.pdf
> 2. Sometimes non-Japanese characters are displayed incorrectly or not
> displayed at all:
>   - http://www.nii.ac.jp/userdata/shimin/documents/H23/111005_4thlec02.pdf -
> numbers are not displayed correctly (other characters are painted instead
> of numbers)
>   - http://www.alpha-web.jp/service/internet/roaming/pdf/wlan-ap04.pdf -
> the "alpha" character is missing
> 3. 「 and 」 are not displayed correctly in most of the PDFs (the vertical
> lines are too short). This seems to happen only on Ubuntu.
> 4. Where Japanese text is written vertically, it is displayed horizontally;
> see page 3 in http://www.jasst.jp/archives/jasst05w/pdf/S4-1.pdf
> 
> There are some documents in which Japanese characters are still not
> displayed correctly - see chapter names and spacing in
> http://repository.kulib.kyoto-u.ac.jp/dspace/bitstream/2433/57579/1/takahashi1.pdf
> 
> For most documents that are not displayed correctly, the "The PDF document
> might not be displayed correctly" message pops up (on Windows).

Yeah, but
- Those problems are not regressions (not caused by this patch).
- Those problems are out of scope of this bug.
- It's much better than completely unreadable text.
Comment on attachment 703068 [details] [diff] [review]
Support some CMap encodings without embedded fonts

Given the fact that none of the issues found are thought to be regressions, let's land now and ensure that this fix gets into beta 3 of FF19.
Attachment #703068 - Flags: approval-mozilla-beta?
Attachment #703068 - Flags: approval-mozilla-beta+
Attachment #703068 - Flags: approval-mozilla-aurora?
Attachment #703068 - Flags: approval-mozilla-aurora+
pdf.js 0.7.82 update:
https://hg.mozilla.org/mozilla-central/rev/5fae1032ef3b
Assignee: nobody → VYV03354
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 21
Verified on Firefox 19 beta 3 (Ubuntu 12.10, Win 7 32bit and Mac 10.8.2): Japanese characters are displayed correctly.
Some (not all) characters in this PDF aren't displayed correctly:

http://navi.hamabus.jp/dia/bustime/pdfdata/510_7813_14.pdf
Verified with Firefox 20 beta 1.

User Agent: Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0
Build ID: 20130220104816
Verified on latest Aurora 21 version: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:21.0) Gecko/20130305 Firefox/21.0
Status: RESOLVED → VERIFIED