pdfjs does not properly handle non-ASCII characters in forms when saving PDFs
Categories
(Firefox :: PDF Viewer, defect, P1)
People
(Reporter: ilusha.paschuk, Assigned: calixte, NeedInfo)
Details
(Whiteboard: [pdfjs-form-acroform])
Attachments
(3 files)
User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; rv:76.0) Gecko/20100101 Firefox/76.0
Steps to reproduce:
Open a fillable PDF, fill in some fields with Cyrillic letters, press the download button in the viewer, and open the just-downloaded file in Firefox again.
Actual results:
Got unreadable characters instead of my Cyrillic text.
Expected results:
The Cyrillic text is saved and displayed correctly.
Comment 1•5 years ago
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Comment 2•4 years ago
Hi,
I used https://campustecnologicoalgeciras.es/wp-content/uploads/2017/07/OoPdfFormExample.pdf as a sample fillable PDF and https://www.lexilogos.com/keyboard/russian.htm to fill some fields with Cyrillic letters, then clicked Download and chose the "with your changes" option. I'm not able to reproduce the issue (Firefox shows the Cyrillic letters just as Chrome does).
Let me know if I missed any steps. I checked on Windows 10 Pro with Firefox release 84.0.2 (64-bit).
Clara
Comment 3•4 years ago (Assignee)
The original PDF doesn't contain any font that can display Cyrillic characters, so we need to find a suitable font, take the subset needed to display the characters, and then write this subset into the PDF file.
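As an illustration of the first step, the sketch below lists the characters of a field value that a single-byte font encoding cannot represent, i.e. the characters a fallback font subset would have to cover. It rests on assumptions: `chars_needing_fallback` is a hypothetical helper, and Python's `cp1252` codec is only an approximate stand-in for a PDF font's WinAnsiEncoding. It is not how pdf.js does this.

```python
def chars_needing_fallback(value: str, encoding: str = "cp1252") -> list[str]:
    """Return the distinct characters of `value` that `encoding` cannot represent.

    `encoding` is any Python codec name; cp1252 is used here as a rough
    approximation of WinAnsiEncoding (an assumption made for illustration).
    """
    missing = set()
    for ch in value:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # This character has no code in the encoding, so a font that
            # uses the encoding cannot show it.
            missing.add(ch)
    return sorted(missing)
```

Calling `chars_needing_fallback("Привет")` returns every distinct Cyrillic letter of the word, while a plain ASCII value yields an empty list.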
Comment 8•3 years ago
This affects not "just" Cyrillic characters but apparently everything outside the ASCII range as well.
I'm attaching my reduced test case from the dup'ed bug here as well, as it makes the problem fairly easy to reproduce:
Entering any non-ASCII character (I've tried umlauts ä ö ü and accented characters like é) in the lower field and then entering print preview shows that these characters are missing (in the case of umlauts) or misrepresented (é turns into Ø). When the same characters are entered in the top field, they are displayed correctly in print preview and printed correctly.
The attached file is a Word document saved as PDF. I used the official Adobe Acrobat Pro to auto-detect form fields. The lower, broken field is a result of that automatic detection.
I then deleted the auto-detected top field and copied the bottom one, which resulted in the now-working top field.
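One plausible explanation for this exact pattern (an assumption, not something confirmed in this bug) is that each character is written as its Latin-1 byte while the broken field's font interprets that byte using Adobe StandardEncoding. In StandardEncoding, code 0xE9 is Ø, whereas 0xE4, 0xF6 and 0xFC are unassigned. The sketch below reproduces the effect with a hand-written table covering only codes 0xE0 to 0xFF; `STANDARD_ENCODING_E0_FF` and `render_with_standard_encoding` are illustrative names.

```python
# Assigned codes of Adobe StandardEncoding in the range 0xE0-0xFF.
# Every other code in that range is unassigned (no glyph).
STANDARD_ENCODING_E0_FF = {
    0xE1: "Æ", 0xE3: "ª", 0xE8: "Ł", 0xE9: "Ø", 0xEA: "Œ", 0xEB: "º",
    0xF1: "æ", 0xF5: "ı", 0xF8: "ł", 0xF9: "ø", 0xFA: "œ", 0xFB: "ß",
}


def render_with_standard_encoding(text: str) -> str:
    """Show what Latin-1 bytes look like when read as StandardEncoding.

    Raises UnicodeEncodeError if `text` is not representable in Latin-1.
    """
    out = []
    for byte in text.encode("latin-1"):
        if byte < 0x80:
            # Simplification: the ASCII range is treated as identical, although
            # StandardEncoding maps the two quote codes 0x27 and 0x60 differently.
            out.append(chr(byte))
        elif byte >= 0xE0:
            # Unassigned codes produce no glyph, so the character vanishes.
            out.append(STANDARD_ENCODING_E0_FF.get(byte, ""))
        else:
            raise ValueError(f"byte 0x{byte:02X} is outside the range this sketch covers")
    return "".join(out)
```

Under these assumptions, "é" comes out as "Ø" and "äöü" comes out empty, which matches the behavior described for the lower field.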
Comment 10•3 years ago (Assignee)
:Snuffleupagus, what do you think about adding an entry for a missing font with the correct Unicode mapping and letting the PDF viewers deal with that?
I know it isn't ideal at all, and it's likely the least exciting idea I've had in recent months, but it would at least help fix the printing issue and likely the saving issues too.
For the future, I think we could try to get a font from the system itself, then subset it (or not) and include the stream in the resulting PDF. To be honest, I'm not super excited by the idea of adding a font (even a subset) in an incremental save, but at the same time I don't feel like writing a PDF from scratch.
So my feeling is that adding a missing font is maybe the best of the worst solutions to fix that.
Comment 11•3 years ago
This is also affecting the PDFs downloaded from ceskekormidlo.cz, as reported at https://github.com/webcompat/web-bugs/issues/108433
Comment 12•3 years ago
Jonas, ping for the question from Calixte in comment 10. Any thoughts?
Comment 13•3 years ago
(In reply to Calixte Denizet (:calixte) from comment #10)
> :Snuffleupagus, what do you think about adding an entry for a missing font with the correct Unicode mapping and letting the PDF viewers deal with that?
Assuming that something this "simple" works out, then that definitely sounds like the best/easiest way forward here in my opinion.
> For the future, I think we could try to get a font from the system itself, then subset it (or not) and include the stream in the resulting PDF.
That sounds like it could introduce all kinds of problems, given the different fonts available on different computers. (Maybe if we used the standard fonts that we ship with the library, but still probably more trouble than it's worth.)
> To be honest, I'm not super excited by the idea of adding a font (even a subset) in an incremental save, but at the same time I don't feel like writing a PDF from scratch.
Completely agreed, on all points.
> So my feeling is that adding a missing font is maybe the best of the worst solutions to fix that.
Adding a "dummy" font with appropriate /ToUnicode data seems like a good approach; sorry about overlooking the need-info previously!
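To make the /ToUnicode idea concrete, here is a minimal sketch of generating such a CMap stream for the characters a user typed. It is an illustration under stated assumptions, not the patch that landed in pdf.js: `build_to_unicode_cmap` is a hypothetical helper, and the code assignment (the n-th distinct character gets the 2-byte code n) is made up for the example.

```python
def build_to_unicode_cmap(text: str) -> str:
    """Build the body of a /ToUnicode CMap for the distinct characters in `text`.

    Codes are assigned sequentially starting at 0x0001 (a hypothetical scheme).
    """
    mappings = []
    for code, ch in enumerate(sorted(set(text)), start=1):
        # The destination is the character in UTF-16BE, written as hex;
        # characters outside the BMP become a surrogate pair.
        unicode_hex = ch.encode("utf-16-be").hex().upper()
        mappings.append(f"<{code:04X}> <{unicode_hex}>")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # A single bfchar section may hold at most 100 entries, so split into chunks.
    for start in range(0, len(mappings), 100):
        chunk = mappings[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines)
```

With a mapping like this attached to the font dictionary, a viewer that cannot find the named font can still recover the Unicode text and pick its own substitute glyphs, which is the "let the PDF viewers deal with that" part of the proposal.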
Comment 14•3 years ago (Assignee)
:Snuffleupagus, since you're the expert in everything encoding-related, would you have time to write a patch to fix this issue?
Comment 18•3 years ago (Assignee)
Comment 23•3 years ago
[Tracking Requested - why for this release]: The bug prevents filling forms in languages that are not fully representable by ASCII. This is also the case for text annotations that we introduced recently (bug 1784272). If it was only affecting forms, I wouldn't suggest uplifting because it is a long-standing bug, but it is affecting text annotations too and we have many duplicates, so I think it's worth considering an uplift (the many duplicates also help us with verifying the fix).
Comment 24•3 years ago
In addition to the above, 108 is also the first version in which we will have a callout for the PDF editing features (bug 1793636).
Comment 25•3 years ago
I could not reproduce the issue from the description, but I could reproduce the issue mentioned in comment #8 (if I try to print the document attached there, characters are not displayed in the broken field) using Beta 84.0.2 (20210105180113). I verified that the same issue does not reproduce on Win 10 using Firefox build 109.0a1 (20221116182402).
Since I was not able to reproduce the initial issue, I am asking the reporter whether they can still reproduce it on the latest Nightly build (https://archive.mozilla.org/pub/firefox/nightly/2022/11/2022-11-17-09-39-01-mozilla-central/). Thank you so much.
Comment 26•3 years ago (Assignee)
Release Note Request (optional, but appreciated)
[Why is this notable]:
- All PDFs in which users added non-Latin characters were rendered incorrectly when printed or saved.
- Some forms couldn't be printed or saved correctly.
So it's a real improvement for a lot of non-English users.
[Affects Firefox for Android]:
No
[Suggested wording]:
[Links (documentation, blog post, etc)]:
None
Comment 27•3 years ago
(In reply to Monica Chiorean from comment #25)
> I could not reproduce the issue from the description, but I could reproduce the issue mentioned in comment #8 (if I try to print the document attached there, characters are not displayed in the broken field) using Beta 84.0.2 (20210105180113). I verified that the same issue does not reproduce on Win 10 using Firefox build 109.0a1 (20221116182402).
> Since I was not able to reproduce the initial issue, I am asking the reporter whether they can still reproduce it on the latest Nightly build (https://archive.mozilla.org/pub/firefox/nightly/2022/11/2022-11-17-09-39-01-mozilla-central/). Thank you so much.
Thanks Monica for verifying the fix to this bug and all its duplicates!
Comment 28•3 years ago (Assignee)
Comment 29•3 years ago (Assignee)
Comment on attachment 9304429 [details]
Bug 1666824 - Fix printing/saving annotations containing non-ascii chars r=#pdfjs-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: Users of some non-English alphabets could have issues when printing/saving some forms or other PDFs they edited themselves.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: Follow the STR in the different dups
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Well tested in pdf.js CI, verified in nightly and pdf.js is self-contained.
- String changes made/needed:
- Is Android affected?: No
Comment 30•3 years ago
Comment on attachment 9304429 [details]
Bug 1666824 - Fix printing/saving annotations containing non-ascii chars r=#pdfjs-reviewers
Approved for 108.0b5
Comment 31•3 years ago
bugherder uplift
Comment 32•3 years ago
Verified that the issue does not reproduce on Win 10/Ubuntu 20.04/Mac 10.13 using Firefox Nightly build 109.0a1 (20221122214324) and Beta 108.0b5 (20221122190120). I used the same steps as described in comment #8. We'll add a comment on each duplicate once verified.