Closed Bug 1896076 Opened 1 year ago Closed 1 year ago

Better handling of PDF destinations for document-internal links

Categories

(Core :: Printing: Output, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
128 Branch
Tracking Status
firefox-esr140 --- fixed
firefox128 --- fixed

People

(Reporter: jfkthame, Assigned: jfkthame)

References

Details

Attachments

(3 files)

On thinking about bug 1895872, arguably the "ideal" behavior for document-internal links would be to generate a link that points to the PDF-internal destination if it exists in the resulting PDF, but points back to the online resource if the destination is not present in the generated document.

For example, consider a document with a table of contents that points down to various sections of the doc. If we save a PDF of the whole document, those should be internal destinations. But if we save a PDF of just the first page, it would be nice of the ToC links now pointed back at the online doc, rather than simply being non-functional.

To achieve this, I think we'd need a cairo modification, such that when generating the CAIRO_TAG_LINK, we could specify both a URL destination and an internal name, as it may be a forward link whose CAIRO_TAG_DEST hasn't yet been written. (Currently _cairo_tag_parse_link_attributes would reject this as an invalid combination of attributes.)

This would result in a new type of cairo_tag_link_type_t such as DEST_AND_URI, to be resolved when written out. Then when resolving the indirect destinations of forward links, cairo would use the internal destination if known in the document, and the URL otherwise.

@ajohnson, does that seem feasible/worthwhile as a change to cairo behavior, or are there reasons it would be undesirable?

I'm working on implementing this.

Severity: -- → S3
Type: defect → enhancement

I've implemented DEST with fallback to URI in this cairo MR.

https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/547

I also fixed the missing DEST issue by setting the destination to [<current-page> /XYZ null null 0]. This is a link to the current location. This should be a better fix as it is fully compliant with the PDF spec (ie it doesn't try to reference anything that doesn't exist). It does nothing in Adobe Reader, Okular, Evince, and mupdf.

Looking at DrawTargetCairo::Link, the comment doesn't make sense to me. I'm not sure why you needed double backslashes on some platforms. The code in cario-pdf-interchange for unescaping backslashes is very simple and I can't see how it could work differently on different platforms. I tried a simple cairo test case of a URL with a backslash and it worked fine.

The test case used:

   cairo_tag_begin (cr, CAIRO_TAG_LINK, "uri='http://example.com/a\\\\b'");

and setting CAIRO_DEBUG_PDF=1 to disable compression and object streams resulted in a PDF that contained:

<< /Type /Annot
   /Subtype /Link
   /StructParent 3
   /Rect [ 149 789 227 802 ]
   /A <<
      /Type /Action
      /S /URI
      /URI (http://example.com/a\\b)
   >>
   /BS << /W 0 >>
>>

the backslash is escaped as required for PDF strings. Checking with pdfinfo:

$ pdfinfo -url test.pdf
Page  Type          URL
   1  Annotation    http://example.com/a\b

This is on Linux.

The comment also says "Encoding of non-ASCII chars etc gets handled later by the PDF backend" . The PDF standard requires the URI to be 7-bit ASCII. Cairo will throw an error if it is not ASCII. The PDF standard also notes that URIs are to comply with RFC 2396 which specifies that backslash must be % escaped.

I tried testing a URI with a backslash in Firefox but it converted it to a slash.

Hmm. I don't remember details of that offhand, but it looks like that originated in bug 1748077, to avoid generating invalid PDFs. But then I filed https://gitlab.freedesktop.org/cairo/cairo/-/issues/526, and you applied a fix upstream, so probably the code in Gecko should be adjusted in the light of that fix. I'll try to take a look; thanks for the heads-up.

I'm not sure why you needed double backslashes on some platforms. The code in cario-pdf-interchange for unescaping backslashes is very simple and I can't see how it could work differently on different platforms.

I think that's a red herring. We were (are) doing the backslash-doubling any time we're using cairo-pdf-interchange (but this may be unnecessary/wrong since cairo issue 526 was resolved); the platform that's different is macOS, but that's because on macOS we don't go through cairo-pdf-interchange at all; pdf is generated through the cairo-quartz backend.

Regarding the backslashes, I confirmed that since we updated cairo, we now get incorrectly-doubled backslashes in pdf output if given a URL with backslash in it, so I spun off bug 1896892 to fix this. Thanks for noticing this!

URL: 1896892
URL: 1896892
See Also: → 1896892

This extends the cairo PDF backend to allow a link tag to specify both
DEST and URI attributes; the internal destination will be used if it
exists in the resulting PDF, and the external URI otherwise.

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

To take advantage of the new cairo capability, extend the link APIs to support
passing both a local destination name and a URI from the display list to the
rendering backend.

With this change, links to parts of the document that aren't included in the
print-to-pdf output will point back to the online resource, instead of just being
dead links.

(Note that this won't work the same on macOS at present, as we don't use
the cairo pdf backend there.)

Now that we provide both DEST and URI attributes to the cairo tag API, this is necessary
to keep internal destinations working when generating PDF on macOS. Unfortunately, the
new feature of falling back to the URI when the named destination doesn't exist in the
resulting document won't work here, because we have to decide which kind of link to
generate (whether to use CGPDFContextSetURLForRect or CGPDFContextSetDestinationForRect)
before we know whether the destination will be defined.

So this patch merely preserves the existing behavior on macOS; to get the improved
functionality, we'd need to pre-flight the whole print job to find out what destinations
are going to be defined, before actually doing the PDF generation. (Or switch over fully
to the cairo pdf backend.) This is left for a possible future enhancement.

Pushed by jkew@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b8582b9465ea patch 1 - Apply changes from https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/547. r=gfx-reviewers,lsalzman https://hg.mozilla.org/integration/autoland/rev/4ce8e4713fec patch 2 - Support passing both a local destination name and a URI when generating a hyperlink. r=gfx-reviewers,lsalzman https://hg.mozilla.org/integration/autoland/rev/dd6c6e0bef33 patch 3 - In cairo-quartz-surface, prefer to use link's DEST rather than URI if both are present. r=gfx-reviewers,lsalzman
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: