Closed Bug 1666824 Opened 5 years ago Closed 3 years ago

pdfjs does not properly handle non-ASCII characters in forms when saving PDFs

Categories

(Firefox :: PDF Viewer, defect, P1)

76 Branch
defect

RESOLVED FIXED
109 Branch
Tracking Status
relnote-firefox --- 108+
firefox-esr91 --- wontfix
firefox93 --- wontfix
firefox94 --- wontfix
firefox95 --- wontfix
firefox96 --- wontfix
firefox107 --- wontfix
firefox108 + verified
firefox109 --- verified

People

(Reporter: ilusha.paschuk, Assigned: calixte, NeedInfo)

Details

(Whiteboard: [pdfjs-form-acroform])

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; rv:76.0) Gecko/20100101 Firefox/76.0

Steps to reproduce:

Open a fillable PDF, fill in some fields with Cyrillic letters, press the download button in the viewer, and open the just-downloaded file in Firefox again.

Actual results:

Got unreadable characters instead of my Cyrillic text.

Expected results:

The Cyrillic text is handled correctly.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → PDF Viewer
Severity: -- → S2
Whiteboard: [pdfjs-c-forms]

Hi,

I used https://campustecnologicoalgeciras.es/wp-content/uploads/2017/07/OoPdfFormExample.pdf as a fillable PDF sample and https://www.lexilogos.com/keyboard/russian.htm to fill some fields with Cyrillic letters, then clicked download and chose the "with your changes" option. I'm not able to reproduce the issue (Firefox shows the Cyrillic letters just as Chrome does).

Let me know if I missed any steps. I checked on Windows 10 Pro, Firefox release 84.0.2 (64-bit).
Clara

Flags: needinfo?(ilusha.paschuk)

The original PDF doesn't contain any font that can display Cyrillic characters, so we need to find a suitable font, take a subset of it that covers those characters, and then write this subset into the PDF file.
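
A minimal sketch of the first step described above: working out which characters a font subset would have to cover. It assumes, for illustration only, that the form's existing font handles ASCII and nothing else. `chars_needing_glyphs` is a hypothetical helper, not a pdf.js function.

```python
from typing import Iterable, List


def chars_needing_glyphs(field_values: Iterable[str]) -> List[str]:
    """Collect the distinct non-ASCII characters used across all field values."""
    needed = set()
    for value in field_values:
        # Assumption: anything above U+007F is not covered by the existing font.
        needed.update(ch for ch in value if ord(ch) > 0x7F)
    return sorted(needed)
```

For example, `chars_needing_glyphs(["abc", "été"])` returns `["é"]`, so a subset for that form would only need one extra glyph.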

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(ilusha.paschuk)
Whiteboard: [pdfjs-c-forms] → [pdfjs-form-acroform]

This affects not "just" Cyrillic characters but apparently everything outside the ASCII range as well.
I'm attaching my reduced test case from the dup'ed bug here too, as it makes the problem fairly easy to reproduce:

Entering any non-ASCII character (I've tried umlauts ä ö ü and accented characters like é) in the lower field and opening print preview shows these characters as missing (in the case of umlauts) or misrepresented (é turns into Ø). When entering the same characters in the top field, the characters are displayed correctly in print preview and printed correctly.

The attached file is a word document saved as PDF. I've used the official Adobe Acrobat Pro to auto-detect form fields. The lower, broken field is a result of that automatic detection.
I've then deleted the auto-detected top field and copied the bottom one to result in the now working top field.
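
One reading consistent with these symptoms, offered as an assumption rather than a diagnosis confirmed in this bug: the field's text is written as Latin-1 bytes but displayed through a font that uses the PDF StandardEncoding. In that encoding, code 0xE9 (é in Latin-1) is Oslash, while 0xE4, 0xF6 and 0xFC (ä, ö, ü) have no glyph assigned. The sketch below covers only those four codes; the table and function names are illustrative.

```python
# Sample of PDF StandardEncoding for the four codes discussed here.
# None means the code has no glyph assigned in StandardEncoding.
STANDARD_ENCODING_SAMPLE = {
    0xE4: None,  # Latin-1 "ä"
    0xE9: "Ø",   # Latin-1 "é" lands on Oslash
    0xF6: None,  # Latin-1 "ö"
    0xFC: None,  # Latin-1 "ü"
}


def show_as_standard_encoding(text: str) -> str:
    """Encode text as Latin-1, then read the bytes back via the sample table."""
    shown = []
    for byte in text.encode("latin-1"):
        if byte < 0x80:
            # ASCII letters and digits sit at the same codes in both encodings.
            shown.append(chr(byte))
            continue
        # Raises KeyError for codes outside the four sampled above.
        glyph = STANDARD_ENCODING_SAMPLE[byte]
        if glyph is not None:
            shown.append(glyph)
    return "".join(shown)
```

With this table, `show_as_standard_encoding("café")` gives `"cafØ"` and `show_as_standard_encoding("äöü")` gives an empty string, which matches the misrepresented and missing characters seen in the lower field.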

Summary: pdfjs incorrect cirilic letters handling in form saving → pdfjs does not properly handle non-ASCII characters in forms when saving PDFs

:Snuffleupagus, what do you think about adding an entry for a missing font with the correct Unicode mapping and letting the PDF viewers deal with that?
I know it isn't ideal at all, and it's likely the least exciting idea I've had in recent months, but it'd help to at least fix the printing issue and likely help to fix the saving issues too.
For the future I think we could try to get a font from the system itself, then subset it or not and include the stream in the resulting PDF. To be honest, I'm not super excited by the idea of adding a font (even a subset) in an incremental save, but in the meantime I don't feel like writing a PDF from scratch.
So my feeling is that adding a missing font is maybe the best of the worst solutions to fix that.

Flags: needinfo?(jonas.jenwald)

This also affects PDFs downloaded from ceskekormidlo.cz, as reported at https://github.com/webcompat/web-bugs/issues/108433

Jonas, ping for the question from Calixte in comment 10. Any thoughts?

(In reply to Calixte Denizet (:calixte) from comment #10)

:Snuffleupagus, what do you think about adding an entry for a missing font with the correct Unicode mapping and letting the PDF viewers deal with that?

Assuming that something this "simple" works out, then that definitely sounds like the best/easiest way forward here in my opinion.

For the future I think we could try to get a font from the system itself, then subset it or not and include the stream in the resulting PDF.

That sounds like it could introduce all kinds of problems, given the different fonts available on different computers. (Maybe if we used the standard fonts that we ship with the library, but still probably more trouble than it's worth.)

To be honest, I'm not super excited by the idea of adding a font (even a subset) in an incremental save, but in the meantime I don't feel like writing a PDF from scratch.

Completely agreed, on all points.

So my feeling is that adding a missing font is maybe the best of the worst solutions to fix that.

Adding a "dummy" font with appropriate /ToUnicode data seems like a good approach; sorry about overlooking the need-info previously!
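
For illustration, a rough sketch of what the /ToUnicode data for such a dummy font could look like: a small CMap that maps each 2-byte character code to its Unicode value. The code assignment (1, 2, 3, ... in sorted order) and both function names are assumptions made for this example; this is not the patch that landed.

```python
from typing import Dict, List


def assign_codes(text: str) -> Dict[int, str]:
    """Give each distinct character a 2-byte code, starting at 1 (arbitrary choice)."""
    return {code: ch for code, ch in enumerate(sorted(set(text)), start=1)}


def build_to_unicode_cmap(code_to_char: Dict[int, str]) -> str:
    """Return the text of a ToUnicode CMap mapping each code to its character."""
    entries: List[str] = [
        # Destination values are UTF-16BE, so non-BMP characters become surrogate pairs.
        f"<{code:04X}> <{ch.encode('utf-16-be').hex().upper()}>"
        for code, ch in sorted(code_to_char.items())
    ]
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # bfchar sections are conventionally limited to 100 entries each.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines)
```

Calling `build_to_unicode_cmap(assign_codes("Жé"))` produces a CMap whose bfchar section contains `<0001> <00E9>` and `<0002> <0416>`. A viewer reading the saved file could then recover the typed text from the codes even though the font itself carries no glyph data.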

Flags: needinfo?(jonas.jenwald)

:Snuffleupagus, since you're the expert in everything related to encoding, would you have time to write a patch to fix this issue?

Assignee: nobody → cdenizet
Status: NEW → ASSIGNED
Priority: P3 → P1
Duplicate of this bug: 1799171
Duplicate of this bug: 1800369
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Flags: qe-verify+
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch
Depends on: 1800694

[Tracking Requested - why for this release]: The bug prevents filling forms in languages that are not fully representable by ASCII. This is also the case for text annotations that we introduced recently (bug 1784272). If it was only affecting forms, I wouldn't suggest uplifting because it is a long-standing bug, but it is affecting text annotations too and we have many duplicates, so I think it's worth considering an uplift (the many duplicates also help us with verifying the fix).

In addition to the above, 108 is also the first version in which we will have a callout for the PDF editing features (bug 1793636).

I could not reproduce the issue from the description, but I could reproduce the issue mentioned in comment #8 (when I try to print the document attached there, characters are not displayed in the broken field) using Beta 84.0.2 (20210105180113). Verified that the same issue does not reproduce on Win 10 using Firefox build 109.0a1 (20221116182402).

Since I was not able to reproduce the initial issue, I am asking the reporter whether he can still reproduce it on the latest Nightly build (https://archive.mozilla.org/pub/firefox/nightly/2022/11/2022-11-17-09-39-01-mozilla-central/). Thank you so much.

Flags: needinfo?(ilusha.paschuk)
No longer duplicate of this bug: 1669097
See Also: → 1669097

Release Note Request (optional, but appreciated)
[Why is this notable]:

  • Any PDF to which users had added non-Latin characters rendered incorrectly when printed or saved.
  • Some forms couldn't be printed or saved correctly.

So it's a real improvement for a lot of non-English users.
[Affects Firefox for Android]:
No
[Suggested wording]:
[Links (documentation, blog post, etc)]:
None

relnote-firefox: --- → ?

(In reply to Monica Chiorean from comment #25)

I could not reproduce the issue from the description, but I could reproduce the issue mentioned in comment #8 (when I try to print the document attached there, characters are not displayed in the broken field) using Beta 84.0.2 (20210105180113). Verified that the same issue does not reproduce on Win 10 using Firefox build 109.0a1 (20221116182402).

Since I was not able to reproduce the initial issue, I am asking the reporter whether he can still reproduce it on the latest Nightly build (https://archive.mozilla.org/pub/firefox/nightly/2022/11/2022-11-17-09-39-01-mozilla-central/). Thank you so much.

Thanks Monica for verifying the fix to this bug and all its duplicates!

Comment on attachment 9304429 [details]
Bug 1666824 - Fix printing/saving annotations containing non-ascii chars r=#pdfjs-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Users of non-English alphabets could have issues when printing or saving forms or other PDFs they edited themselves.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Follow the STR in the different dups
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Well tested in pdf.js CI, verified in nightly and pdf.js is self-contained.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9304429 - Flags: approval-mozilla-beta?

Comment on attachment 9304429 [details]
Bug 1666824 - Fix printing/saving annotations containing non-ascii chars r=#pdfjs-reviewers

Approved for 108.0b5

Attachment #9304429 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

Verified that the issue does not reproduce on Win 10/Ubuntu 20.04/Mac 10.13 using Firefox Nightly build 109.0a1 (20221122214324) and Beta 108.0b5 (20221122190120). I used the same steps as described in comment #8. We'll add a comment on each duplicate once verified.

QA Whiteboard: [qa-triaged]
Flags: qe-verify+