PDF Viewer displays BOM in bookmarks as "þÿ"
Categories
(Firefox :: PDF Viewer, defect)
People
(Reporter: vincent-moz, Unassigned)
Details
Attachments
(3 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0
Steps to reproduce:
Open http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2731.pdf
Actual results:
In the bookmarks (on the left), "þÿ" is displayed before the actual text.
Expected results:
"þÿ" actually corresponds to BOM and should not be displayed.
Comment 4•4 years ago
(In reply to Vincent Lefevre from comment #0)
> "þÿ" actually corresponds to BOM and should not be displayed.
Please note that a byte order mark (BOM) needs to be placed at the beginning of a string in order to be valid, and that it appearing in the middle of a piece of text has no special significance.
Unfortunately, this is a bug in the PDF document itself rather than in Firefox, and similarly broken behaviour can be observed in other PDF viewers as well; please refer to the attached screenshots from Adobe Reader and PDFium (in Google Chrome).
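To illustrate why the position of the BOM matters, here is a minimal sketch of how a viewer might decode a bookmark title. It assumes the usual PDF rule that a text string is UTF-16BE only when its first two bytes are FE FF. The function name `decode_pdf_text_string` is hypothetical, Latin-1 is used as a stand-in for PDFDocEncoding, and the byte layout of the broken title is an assumption based on the report, not taken from the file.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode as UTF-16BE only if the BOM is the very first two bytes."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Well-formed case: drop the BOM and decode the rest as UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 approximates PDFDocEncoding for this illustration.
    return raw.decode("latin-1")


# A well-formed title: BOM first, then UTF-16BE text.
well_formed = codecs.BOM_UTF16_BE + "Scope".encode("utf-16-be")

# Assumed shape of the broken title: ASCII section number prepended
# in front of an already-encoded UTF-16BE string.
broken = b"1 " + well_formed

print(decode_pdf_text_string(well_formed))
print(repr(decode_pdf_text_string(broken)))
```

In the second case the BOM is no longer at the start, so the whole string is decoded byte by byte and the bytes FE FF show up as the two visible characters "þÿ".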
Reporter
Comment 5•4 years ago
OK, I've reported the issue to the author of the PDF file.
I suspect that the section number was prepended to the bookmark string at some point, hence the issue, e.g. "BookmarkTitle: 1 þÿScope" from the pdftk output. But I would have thought that the BOM would be rendered as nothing (since it is U+FEFF ZERO WIDTH NO-BREAK SPACE) instead of being interpreted as ISO-8859-1. I actually suspect that none of these viewers supports UTF-16 (which is the encoding used for these bookmarks - perhaps non-standard?) and that they handle it differently: Adobe and Atril both treat byte 00 as an end of string, thus stopping after the BOM, while Firefox and Chrome ignore it, which appears OK since UTF-16 and ASCII are equivalent for ASCII characters except for these 00 bytes.
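The two behaviours described above can be reproduced with a short sketch. The byte layout of the title is an assumption (ASCII "1 " followed by a BOM and UTF-16BE text), and the helper names `truncate_at_nul` and `drop_nuls` are hypothetical; they only mimic what the viewers appear to do and are not taken from any viewer's source code.

```python
import codecs

# Assumed raw bookmark title: section number + BOM + UTF-16BE "Scope".
raw = b"1 " + codecs.BOM_UTF16_BE + "Scope".encode("utf-16-be")

# Byte-wise decoding as ISO-8859-1 turns FE FF into two visible characters.
text = raw.decode("latin-1")


def truncate_at_nul(s: str) -> str:
    """Treat the first NUL as end of string (as Adobe and Atril seem to do)."""
    return s.split("\x00", 1)[0]


def drop_nuls(s: str) -> str:
    """Ignore NUL characters (as Firefox and Chrome seem to do)."""
    return s.replace("\x00", "")


print(repr(truncate_at_nul(text)))
print(repr(drop_nuls(text)))

# For comparison: decoded as UTF-16BE, the same two bytes are U+FEFF.
print(repr(codecs.BOM_UTF16_BE.decode("utf-16-be")))
```

Dropping the NULs yields the same "1 þÿScope" string that pdftk reports, which is consistent with the suspicion that the title is never decoded as UTF-16 at all.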