Better handling of PDF destinations for document-internal links
Categories
(Core :: Printing: Output, enhancement)
Tracking
()
People
(Reporter: jfkthame, Assigned: jfkthame)
References
Details
Attachments
(3 files)
On thinking about bug 1895872, arguably the "ideal" behavior for document-internal links would be to generate a link that points to the PDF-internal destination if it exists in the resulting PDF, but points back to the online resource if the destination is not present in the generated document.
For example, consider a document with a table of contents that points down to various sections of the doc. If we save a PDF of the whole document, those should be internal destinations. But if we save a PDF of just the first page, it would be nice of the ToC links now pointed back at the online doc, rather than simply being non-functional.
To achieve this, I think we'd need a cairo modification, such that when generating the CAIRO_TAG_LINK, we could specify both a URL destination and an internal name, as it may be a forward link whose CAIRO_TAG_DEST hasn't yet been written. (Currently _cairo_tag_parse_link_attributes would reject this as an invalid combination of attributes.)
This would result in a new type of cairo_tag_link_type_t
such as DEST_AND_URI
, to be resolved when written out. Then when resolving the indirect destinations of forward links, cairo would use the internal destination if known in the document, and the URL otherwise.
@ajohnson, does that seem feasible/worthwhile as a change to cairo behavior, or are there reasons it would be undesirable?
Comment 1•1 year ago
|
||
I'm working on implementing this.
Assignee | ||
Updated•1 year ago
|
Comment 2•1 year ago
|
||
I've implemented DEST
with fallback to URI
in this cairo MR.
https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/547
I also fixed the missing DEST
issue by setting the destination to [<current-page> /XYZ null null 0]
. This is a link to the current location. This should be a better fix as it is fully compliant with the PDF spec (ie it doesn't try to reference anything that doesn't exist). It does nothing in Adobe Reader, Okular, Evince, and mupdf.
Comment 3•1 year ago
|
||
Looking at DrawTargetCairo::Link
, the comment doesn't make sense to me. I'm not sure why you needed double backslashes on some platforms. The code in cario-pdf-interchange for unescaping backslashes is very simple and I can't see how it could work differently on different platforms. I tried a simple cairo test case of a URL with a backslash and it worked fine.
The test case used:
cairo_tag_begin (cr, CAIRO_TAG_LINK, "uri='http://example.com/a\\\\b'");
and setting CAIRO_DEBUG_PDF=1 to disable compression and object streams resulted in a PDF that contained:
<< /Type /Annot
/Subtype /Link
/StructParent 3
/Rect [ 149 789 227 802 ]
/A <<
/Type /Action
/S /URI
/URI (http://example.com/a\\b)
>>
/BS << /W 0 >>
>>
the backslash is escaped as required for PDF strings. Checking with pdfinfo:
$ pdfinfo -url test.pdf
Page Type URL
1 Annotation http://example.com/a\b
This is on Linux.
The comment also says "Encoding of non-ASCII chars etc gets handled later by the PDF backend" . The PDF standard requires the URI to be 7-bit ASCII. Cairo will throw an error if it is not ASCII. The PDF standard also notes that URIs are to comply with RFC 2396 which specifies that backslash must be % escaped.
I tried testing a URI with a backslash in Firefox but it converted it to a slash.
Assignee | ||
Comment 4•1 year ago
|
||
Hmm. I don't remember details of that offhand, but it looks like that originated in bug 1748077, to avoid generating invalid PDFs. But then I filed https://gitlab.freedesktop.org/cairo/cairo/-/issues/526, and you applied a fix upstream, so probably the code in Gecko should be adjusted in the light of that fix. I'll try to take a look; thanks for the heads-up.
I'm not sure why you needed double backslashes on some platforms. The code in cario-pdf-interchange for unescaping backslashes is very simple and I can't see how it could work differently on different platforms.
I think that's a red herring. We were (are) doing the backslash-doubling any time we're using cairo-pdf-interchange (but this may be unnecessary/wrong since cairo issue 526 was resolved); the platform that's different is macOS, but that's because on macOS we don't go through cairo-pdf-interchange at all; pdf is generated through the cairo-quartz backend.
Assignee | ||
Comment 5•1 year ago
|
||
Regarding the backslashes, I confirmed that since we updated cairo, we now get incorrectly-doubled backslashes in pdf output if given a URL with backslash in it, so I spun off bug 1896892 to fix this. Thanks for noticing this!
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 6•1 year ago
|
||
This extends the cairo PDF backend to allow a link tag to specify both
DEST and URI attributes; the internal destination will be used if it
exists in the resulting PDF, and the external URI otherwise.
Updated•1 year ago
|
Assignee | ||
Comment 7•1 year ago
|
||
To take advantage of the new cairo capability, extend the link APIs to support
passing both a local destination name and a URI from the display list to the
rendering backend.
With this change, links to parts of the document that aren't included in the
print-to-pdf output will point back to the online resource, instead of just being
dead links.
(Note that this won't work the same on macOS at present, as we don't use
the cairo pdf backend there.)
Assignee | ||
Comment 8•1 year ago
|
||
Now that we provide both DEST and URI attributes to the cairo tag API, this is necessary
to keep internal destinations working when generating PDF on macOS. Unfortunately, the
new feature of falling back to the URI when the named destination doesn't exist in the
resulting document won't work here, because we have to decide which kind of link to
generate (whether to use CGPDFContextSetURLForRect or CGPDFContextSetDestinationForRect)
before we know whether the destination will be defined.
So this patch merely preserves the existing behavior on macOS; to get the improved
functionality, we'd need to pre-flight the whole print job to find out what destinations
are going to be defined, before actually doing the PDF generation. (Or switch over fully
to the cairo pdf backend.) This is left for a possible future enhancement.
Comment 10•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b8582b9465ea
https://hg.mozilla.org/mozilla-central/rev/4ce8e4713fec
https://hg.mozilla.org/mozilla-central/rev/dd6c6e0bef33
Updated•2 months ago
|
Description
•