Closed Bug 1724182 Opened 3 months ago Closed 3 months ago

Save to PDF output broken due to missing destination

Categories

(Core :: Printing: Output, defect, P2)

Firefox 92
defect

Tracking

()

RESOLVED FIXED
92 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox-esr91 --- unaffected
firefox90 --- unaffected
firefox91 --- unaffected
firefox92 --- fixed

People

(Reporter: ajohnson, Assigned: jfkthame)

References

(Regression)

Details

(Keywords: regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0

Steps to reproduce:

  1. Open https://en.wikipedia.org/wiki/Boeing_CH-47_Chinook_in_Australian_service
  2. Save As PDF

Actual results:

The PDF file is broken (missing xref) and can not be opened in a PDF viewer

Expected results:

A valid PDF file should be generated.

The reason the PDF file is incomplete is because the cairo pdf generation aborted with an error status. Setting CAIRO_TAG_DEBUG=1 before running firefox and repeating the same steps revealed the problem:

TAG ERROR: Link to dest="cite_note-FOOTNOTEStephens1995421%E2%80%93423-1" not found

Status: UNCONFIRMED → NEW
Component: Untriaged → Printing: Output
Ever confirmed: true
Flags: needinfo?(jfkthame)
Product: Firefox → Core
Regressed by: 1722300

So is this a broken link in the page, and we're generating it even there's no destination? I guess we could check in the document name / id map before emitting it.

The link within the page works fine; this'll be an encoding issue of some kind, because the ID includes a non-ASCII character.

Flags: needinfo?(jfkthame)
Severity: -- → S2
Priority: -- → P2

Would it help if I add additional options to the CAIRO_DEBUG_TAG variable. eg print all tags and attributes or print a specific tag?

FWIW, on macOS (where we generate the PDF through Quartz rather than cairo's PDF surface) the file is generated successfully; the links with non-ASCII destinations don't work but at least the file is usable. So this primarily affects Linux & Windows, though fixing the non-working links on macOS should be nice a side-effect.

(In reply to ajohnson@redneon.com from comment #3)

Would it help if I add additional options to the CAIRO_DEBUG_TAG variable. eg print all tags and attributes or print a specific tag?

I don't think it's needed here, thanks (though of course more debugging info is always useful in general). In this case it's clear what is causing the problem, we just need to determine how such characters should be handled, and at what level.

So the bug here is triggered by the link to

<li id="cite_note-FOOTNOTEStephens1995421–423-1">

where the key issue is that the dash in "421–423" is not an ASCII hyphen but an en-dash (U+2013).

The error

TAG ERROR: Link to dest="cite_note-FOOTNOTEStephens1995421%E2%80%93423-1" not found

suggests that when generating the link, this character is getting percent-encoded; so presumably we need to ensure the same encoding is applied when generating the destination.

Checking the cairo docs I don't think it says anything about the encoding of the dests. I would have assumed utf8. For internal links it shouldn't matter as they are only used by strcmp(). For non internal dests the code the emits the name into the PDF is assuming ASCII. I will fix that allow utf8 dest names.

The fact that the link also doesn't work on macOS (where we generate it through a CGPDFContext) suggests we need to handle this in some way on the Gecko side too, though.

Oh, I suspect I know what's wrong -- we retrieve the link from the element in URI form (which will have been subject to %-encoding of "unsafe" characters), and then we just use its hash. But the destination doesn't get that treatment, it just uses the id or name attribute directly.

So probably we just need to undo the %-encoding of the hash we extract from the URI when generating the link. I'll check....

This appears to fix things on Linux: the PDF is successfully generated, and links (including the first few footnotes, which have the problematic dash) work when opened in evince.

I want to double-check the behavior on macOS as well before asking for review. If the non-ASCII characters seem to be troublesome, an alternative would be to apply the same URL-encoding when generating the destination, though it makes for slightly less nice-looking links (e.g. if a tooltip displays the destination).

Adrian, I do wonder if the cairo backend should penalize a missing destination so severely? Could it simply be ignored, rather than resulting in broken output?

Flags: needinfo?(ajohnson)

The cairo policy has generally been garbage in garbage out. But this is one case where bailing out may not be the best option. Unfortunately cairo does not provide a "warning" mechanism to notify the application of a problem and keep going. The error handling policy is "shutdown the surface at the first error".

One option would be to add a link tag option such as "ignoremissingdest" (or something a bit shorter) to tell cairo not to bail out if any dests are missing. Cairo still needs to create a destination for a link already emitted so it would have to point it somewhere, eg the first or last page, but this is better than a broken pdf.

Flags: needinfo?(ajohnson)

I could make the destination of links with a missing dest the link itself. This would comply with the principle of least surprise.

That sounds reasonable to me. With the fix in Gecko, we shouldn't need it to resolve this case, but in general I think it'd be preferable behavior.

OK, confirmed that applying NS_UnescapeURL to the destinations gives us working links here on macOS as well. So non-ASCII destination names don't seem to be a problem for either the cairo or cgpdfcontext backends.

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/06751b9dbd15
Unescape the fragment from the URL when using as an internal PDF destination. r=dholbert
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.