Closed Bug 1122280 Opened 5 years ago Closed 5 years ago

PDF.js parses links within PDF file incorrectly and results in file not found errors

Categories

(Firefox :: PDF Viewer, defect, P3)

x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
Firefox 41

People

(Reporter: alex_mayorga, Assigned: hellemar)

References

Details

(Whiteboard: [pdfjs-c-ux][pdfjs-d-annotations][pdfjs-f-fixed-upstream] https://github.com/mozilla/pdf.js/pull/5999)

Steps:
- Load http://sadm.gob.mx/PortalSadm/Docs/Proceso_de_Licitacion.pdf
- Click any of the links within the PDF file

Result:
All the links are broken on Nightly.

Expected result:
All the links work as they do in IE.
Looks like something is failing with certain Unicode characters.
Priority: -- → P3
Whiteboard: [pdfjs-c-ux][pdfjs-d-annotations][good first bug]
This is a Windows-only issue, as Nightly and the current release on OS X work fine.
This is not a Windows-only issue. It happens with both nightly (39.0a1) and stable (36.0) on Ubuntu 14.04 too.

Apparently the example PDF file is malformed. The 'ó' character present in the action URI of most of the links is UTF-8 encoded, while the PDF standard (https://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf) states in section 12.6.4.7 that the URI needs to be encoded in 7-bit ASCII.
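
To illustrate the 7-bit ASCII requirement, here is a minimal JavaScript sketch. The helper name `isSevenBitAscii` and the sample path are made up for illustration and are not part of PDF.js or the example file; `encodeURI` is the standard built-in, which percent-encodes non-ASCII characters as their UTF-8 bytes.

```javascript
// Hypothetical helper (not a PDF.js API): true if every char code is <= 0x7F.
function isSevenBitAscii(str) {
  return /^[\x00-\x7F]*$/.test(str);
}

// Sample path chosen for illustration only.
const rawUri = "Actos de la Licitación/anexo.pdf";
// encodeURI percent-encodes "ó" as its UTF-8 bytes (%C3%B3) and each space as %20.
const encodedUri = encodeURI(rawUri);

console.log(isSevenBitAscii(rawUri));     // false
console.log(isSevenBitAscii(encodedUri)); // true
console.log(encodedUri);                  // Actos%20de%20la%20Licitaci%C3%B3n/anexo.pdf
```

A URI written this way would satisfy the ASCII requirement regardless of which byte encoding the viewer assumes.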

It appears that PDF.js assumes the URI to be WinAnsi (CP-1252) encoded, thus producing the characters Ã³ from the UTF-8 byte sequence C3 B3.
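
The misreading can be reproduced, and one possible remedy sketched, in a few lines of JavaScript. This is only a sketch under assumptions: `decodeUriBytes` is a hypothetical name, not the PDF.js implementation and not necessarily what any patch does. It tries a strict UTF-8 decode first and falls back to the one-character-per-byte reading (which matches CP-1252 except for bytes 0x80 to 0x9F) when the bytes are not valid UTF-8.

```javascript
// Hypothetical helper (not the PDF.js implementation): treat each char code of
// a PDF "byte string" as one byte and try a strict UTF-8 decode.
function decodeUriBytes(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0) & 0xff);
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently substituting U+FFFD.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    // Not valid UTF-8: keep the one-char-per-byte reading unchanged.
    return byteString;
  }
}

// Bytes C3 B3 read one character per byte show up as "Ã³".
const misread = "Licitaci\xC3\xB3n";
console.log(misread);                   // LicitaciÃ³n
console.log(decodeUriBytes(misread));   // Licitación
// A lone CD byte is not valid UTF-8, so the input comes back unchanged.
console.log(decodeUriBytes("GU\xCDA")); // GUÍA
```

The fallback matters because a URI that really is single-byte encoded must not be corrupted by a forced UTF-8 decode.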

Other viewers (Evince 3.10.3 and Chromium 40.0.2214.111) are able to produce the intended links.


Is the above described behaviour intentional? If not, I should be able to provide a patch to fix this.
I have created a pull request https://github.com/mozilla/pdf.js/pull/5999 to fix this issue.

It seems that by now, the linked PDF http://sadm.gob.mx/PortalSadm/Docs/Proceso_de_Licitacion.pdf has been changed and most of the problematic links have been fixed (those with the word Licitación causing trouble).
There are still a few others remaining, e.g. Actos de la Licitación\3.2.-ANEXO 2 GUÍA DE DOCUMENTOS ANEXOS in the Abril 14 row.
Assignee: nobody → hellemar
Whiteboard: [pdfjs-c-ux][pdfjs-d-annotations][good first bug] → [pdfjs-c-ux][pdfjs-d-annotations][pdfjs-f-fixed-upstream] https://github.com/mozilla/pdf.js/pull/5999
Duplicate of this bug: 1023808
Depends on: 1168547
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Depends on: 1308362