Incorrect rendering of accented characters in tab title for PDF hotlinked from Dropbox
Categories
(Firefox :: PDF Viewer, defect)
People
(Reporter: microtherion, Unassigned)
Attachments
(1 file)
4.91 KB,
image/jpeg
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:71.0) Gecko/20100101 Firefox/71.0
Steps to reproduce:
Load https://www.dropbox.com/s/iu30xpltgqw7mdo/Bomba%20de%20coraz%C3%B3n.pdf?raw=1
Actual results:
The accented ó in the document title is rendered as a missing character in the tab title (see attached image)
Expected results:
Expected "corazón" to be displayed correctly.
Comment 1•5 years ago
Hi Matthias,
Thanks for the details. I was able to reproduce this on macOS 10.14.5 with Firefox Nightly 73.0a1 (2019-12-27) (64-bit), Release 71.0, and Beta 72.0b5.
I've chosen a component so that the issue is reviewed.
Best regards, Clara.
Comment 2•5 years ago
This is a text-encoding error in the PDF file (presumably caused by a bug in LilyPond or Ghostscript, which generated it), not a Firefox bug.
Looking into the PDF, there is a block of metadata that includes the document title. An extract from a hex dump shows:
00022960 2f 70 64 66 27 3e 3c 64 63 3a 74 69 74 6c 65 3e |/pdf'><dc:title>|
00022970 3c 72 64 66 3a 41 6c 74 3e 3c 72 64 66 3a 6c 69 |<rdf:Alt><rdf:li|
00022980 20 78 6d 6c 3a 6c 61 6e 67 3d 27 78 2d 64 65 66 | xml:lang='x-def|
00022990 61 75 6c 74 27 3e 42 6f 6d 62 61 20 64 65 20 63 |ault'>Bomba de c|
000229a0 6f 72 61 7a ef bf bd 6e 3c 2f 72 64 66 3a 6c 69 |oraz...n</rdf:li|
000229b0 3e 3c 2f 72 64 66 3a 41 6c 74 3e 3c 2f 64 63 3a |></rdf:Alt></dc:|
000229c0 74 69 74 6c 65 3e 3c 2f 72 64 66 3a 44 65 73 63 |title></rdf:Desc|
Note that where the ó in "corazón" should be, we have the three bytes ef bf bd, which is the UTF-8 representation of the Unicode code point U+FFFD REPLACEMENT CHARACTER. And that's what shows up in the Firefox tab title.
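The byte-level claim is easy to check independently. The following is a minimal Python sketch (not part of the original analysis) that decodes the three bytes taken from the hex dump and, for comparison, prints what a correct UTF-8 encoding of ó would have looked like:

```python
# The three bytes found in the XMP title where "ó" was expected.
raw = bytes.fromhex("ef bf bd")
decoded = raw.decode("utf-8")
print(f"U+{ord(decoded):04X}")  # prints U+FFFD

# For comparison, a correct UTF-8 encoding of "ó" (U+00F3) is two bytes.
expected = "\u00f3".encode("utf-8")
print(expected.hex(" "))  # prints c3 b3
```

So a correctly generated metadata block would contain c3 b3 at that position rather than ef bf bd.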
A bit later in the file, we find the title in a different form:
00022b10 33 30 5a 30 30 27 30 30 27 29 0a 2f 43 72 65 61 |30Z00'00')./Crea|
00022b20 74 6f 72 28 4c 69 6c 79 50 6f 6e 64 20 32 2e 31 |tor(LilyPond 2.1|
00022b30 39 2e 38 33 29 0a 2f 54 69 74 6c 65 28 5c 33 37 |9.83)./Title(\37|
00022b40 36 5c 33 37 37 5c 30 30 30 42 5c 30 30 30 6f 5c |6\377\000B\000o\|
00022b50 30 30 30 6d 5c 30 30 30 62 5c 30 30 30 61 5c 30 |000m\000b\000a\0|
00022b60 30 30 20 5c 30 30 30 64 5c 30 30 30 65 5c 30 30 |00 \000d\000e\00|
00022b70 30 20 5c 30 30 30 63 5c 30 30 30 6f 5c 30 30 30 |0 \000c\000o\000|
00022b80 72 5c 30 30 30 61 5c 30 30 30 7a 5c 30 30 30 5c |r\000a\000z\000\|
00022b90 33 36 33 5c 30 30 30 6e 29 0a 2f 53 75 62 74 69 |363\000n)./Subti|
00022ba0 74 6c 65 28 29 0a 2f 43 6f 6d 70 6f 73 65 72 28 |tle()./Composer(|
00022bb0 45 64 64 69 65 20 50 61 6c 6d 69 65 72 69 29 0a |Eddie Palmieri).|
Here, the /Title entry in the PDF dictionary is encoded as UTF-16BE, indicated by the \376\377 prefix, and the ó is (correctly) encoded as \000\363, or U+00F3. But apparently Firefox is relying on the XMP metadata block to provide the tab title, and there the ó has been replaced by U+FFFD.
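To confirm that the /Title entry really does hold the correct text, the octal-escaped string from the hex dump can be decoded by hand. This is a simplified sketch: the helper name `decode_pdf_text_string` is made up for illustration, it handles only `\ddd` octal escapes (not `\n`, `\(`, and the other escapes a real PDF parser must support), and it uses Latin-1 as a stand-in for PDFDocEncoding when no byte-order mark is present.

```python
import re


def decode_pdf_text_string(literal: str) -> str:
    """Decode the body of a PDF literal string that uses \\ddd octal escapes.

    Hypothetical helper for illustration; not a full PDF string parser.
    """
    # Replace each \ddd octal escape with the single byte it denotes.
    raw = re.sub(
        rb"\\([0-7]{1,3})",
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        literal.encode("latin-1"),
    )
    # A leading FE FF byte-order mark signals UTF-16BE text.
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 as a rough stand-in for PDFDocEncoding.
    return raw.decode("latin-1")


# The /Title value reassembled from the hex dump above.
title_literal = (
    r"\376\377\000B\000o\000m\000b\000a\000 \000d\000e\000 "
    r"\000c\000o\000r\000a\000z\000\363\000n"
)
decoded_title = decode_pdf_text_string(title_literal)
print(decoded_title)  # prints Bomba de corazón
```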
Besides hex-dumping the PDF file, this can also be confirmed using an XMP metadata-viewing tool such as https://www.get-metadata.com; uploading the file there shows the incorrect "Title: Bomba de corazn" field among the displayed metadata.
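The same check can also be scripted. The sketch below assumes the XMP packet is stored uncompressed in the file, as it is in the hex dump above, and uses a plain regular expression rather than a real XML or PDF parser. The function name `xmp_title` and the sample bytes are illustrative only; for an actual file, the bytes would come from reading the PDF in binary mode.

```python
import re
from typing import Optional

# Matches the first rdf:li entry inside dc:title, as laid out in the dump.
TITLE_RE = re.compile(rb"<dc:title>.*?<rdf:li[^>]*>(.*?)</rdf:li>", re.DOTALL)


def xmp_title(pdf_bytes: bytes) -> Optional[str]:
    """Return the XMP dc:title text, or None if no such element is found."""
    match = TITLE_RE.search(pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace")


# Sample bytes reconstructed from the hex dump of the affected file.
sample = (
    b"<dc:title><rdf:Alt><rdf:li xml:lang='x-default'>"
    b"Bomba de coraz\xef\xbf\xbdn</rdf:li></rdf:Alt></dc:title>"
)
title = xmp_title(sample)
has_replacement_char = title is not None and "\ufffd" in title
print(has_replacement_char)  # prints True
```

A title containing U+FFFD means the damage was already present when the file was written, before any viewer touched it.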