Closed Bug 1722300 Opened 3 years ago Closed 3 years ago

Save as PDF links to same pages are external links in PDF

Categories

(Core :: Printing: Output, enhancement)

Firefox 90
enhancement

Tracking

()

VERIFIED FIXED
92 Branch
Tracking Status
firefox92 --- verified

People

(Reporter: ajohnson, Assigned: jfkthame, NeedInfo)

References

Details

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

Open any wikipedia page. eg https://en.wikipedia.org/wiki/Firefox. Select Print and destination "Save as PDF".

Actual results:

When I open the PDF and click on a link in the contents box of the wikipedia page it opens the link in a browser.

Expected results:

LInks in the PDF file that link to the same page should cause the PDF viewer to go to the target position in the displayed PDF.

Cairo supports internal linking so the bug is in the way Firefox is using the cairo API.

https://www.cairographics.org/manual/cairo-Tags-and-Links.html#internal-link

The Bugbug bot thinks this bug should belong to the 'Core::Printing: Output' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Printing: Output
Product: Firefox → Core

Yeah, nsDisplayListBuilder::Linkfier doesn't deal with internal links at all and just uses uri links. Seems fixable, though not the most trivial of the fixes.

Status: UNCONFIRMED → NEW
Type: defect → enhancement
Ever confirmed: true

Right, this is a limitation of the link-generating code as implemented so far.

We could do internal links using the page and pos attributes, but that seems hard because at the point where the link is to be generated, we may not have seen the destination yet and have no idea what page/position it's going to end up on.

A better fix would be to generate the appropriate CAIRO_TAG_DEST tag for the link destination, but that's also tricky given that we don't know at the time of processing the destination element (which could be anything with an id attribute, presumably) that it's going to be used as a link target. We could generate CAIRO_TAG_DEST for every element with an id, so that internal destinations are defined, but (a) that seems a bit wasteful, and (b) there's no guarantee that id attributes are unique. I'm not sure what either cairo or eventual PDF viewers would make of duplicate destinations. (But I suppose we could keep track of the set of destinations generated, and just discard any duplicates.)

Hmm, actually it looks like if we use the internal attribute on the destinations, that should avoid excessively bloating the PDF.

What does Firefox do when a page has multiple id attributes that are not unique?

You can creating your own naming convention for the destinations to ensure they are unique. They are not visible to the user. I would not worry about bloating the PDF. The internal links to a named destination take up very little space.

eg the annotation entry for an external link:

<< /Type /Annot
   /Subtype /Link
   /StructParent 1
   /Rect [ 199 166 280 179 ]
   /A <<
      /Type /Action
      /S /URI
      /URI (https://www.cairographics.org/)
   >>
   /BS << /W 0 >>
>>

the annotation entry for an named destination link:

<< /Type /Annot
   /Subtype /Link
   /StructParent 6
   /Rect [ 50 788 128 805 ]
   /Dest (Chapter 1)
   /BS << /W 0 >>
>>

and the Names table, generated from all the CAIRO_TAG_DEST tags, emitted at the end of the document looks like this

<< /Names [
   (Chapter 1) [129 0 R /XYZ 50 805 0]
   (Chapter 2) [199 0 R /XYZ 50 805 0]
   ...

Use the internal attribute, names should be omitted from the Names table if they are not referenced (if cairo doesn't already do this it is an easy fix).

My object stream compression changes will further reduce the size.

@ajohnson, in experimenting with this I'm not having good success with the internal attribute; it appears to work only if the destination is on the current page, not if it occurs on some later page of the document.

I think the trouble is that when cairo_pdf_interchange_write_dest tries to look it up in &surface->interchange, it doesn't find the destination, so it uses the name; but because the destination was marked as internal when it was defined, _cairo_pdf_interchange_write_document_dests skips outputting it.

So in reference to...

Use the internal attribute, names should be omitted from the Names table if they are not referenced (if cairo doesn't already do this it is an easy fix).

...it appears to me that names are always omitted from the Names table if they have the internal attribute, which is a problem if there was a cross-page forward reference to them (as that generates a named-dest link rather than a direct reference).

Am I doing something wrong here, or is this actually a cairo pdf-interchange bug?

Flags: needinfo?(ajohnson)

That sounds like a bug.

A link to a specific page needs to know the page object id which is not known until the page is emitted. So I think when I wrote the code the idea was links to previous pages used the page object and links to future pages used the Dest name and then at the end of the document when the Names table is emitted we know all the page object ids. So the code should keep track of whether it needs to emit a name even if it is marked as internal.

I've been thinking about whether it would be better to not emit Names for internal links to future pages and use indirect objects instead. I calculated that when a Name is > 16 bytes, using an indirect name uses less bytes.

I will work on a fix for cairo that used indirect objects for forward links. This will also fix this bug.

Here's the cairo fix. I'll merge it in a day or two.
https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/220

I've already merged to object stream compression feature and cairo tag debug feature.

Flags: needinfo?(ajohnson)

Thanks so much! I'll try to find time to pull this in and test it shortly.

In particular this includes the changes from cairo MRs

https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/197 (pdf-object-streams)
https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/213 (floating-point glyph widths)

The object-stream support gives us significantly more compact PDF output.

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

This is https://gitlab.freedesktop.org/cairo/cairo/-/merge_requests/220, required so that
we can use the 'internal' attribute when generating link destinations.

Depends on D121238

Like the earlier support for the Link tag, this is a minimal version
that just implements the features we actually use.

Depends on D121239

Attachment #9233287 - Attachment description: Bug 1722300 - Implement internal destinations when generating PDF output through cairo. → Bug 1722300 - Implement internal destinations when generating PDF output through cairo. r=mattwoodrow

I've merged the pdf-links-fixes MR to master.

Attachment #9233287 - Attachment description: Bug 1722300 - Implement internal destinations when generating PDF output through cairo. r=mattwoodrow → Bug 1722300 - patch 4 - Implement internal destinations when generating PDF output through cairo. r=mattwoodrow
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/13f609eeff54
patch 1 - Pull recent PDF fixes/enhancements from upstream cairo (master branch). r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/be1dd7d93bba
patch 2 - Pull in cairo MR "pdf links: fix forward references to dest names with 'internal' flag". r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/b11f7f6bff04
patch 3 - Add basic support for internal named destinations when generating PDF output via cairo-quartz-surface. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/a79d9a152a6d
patch 4 - Implement internal destinations when generating PDF output through cairo. r=mattwoodrow

This is creating broken PDF files in some cases. https://bugzilla.mozilla.org/show_bug.cgi?id=1724182

Regressions: 1724182
Flags: qe-verify+

I've reproduced this bug using the steps form comment 0, on an affected Nightly build 92.0a1 (20210724215146).

The issue is verified as fixed on Beta 92.0b2, under macOS 11, Win 11 x64 and Ubuntu 21.04 x64.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Regressions: 1725743
See Also: → 1726347

Adrian, unfortunately we had to disable this feature for now, because we started hitting assertions/crashes within the pdf backend; see bug 1725743.

I have filed https://gitlab.freedesktop.org/cairo/cairo/-/issues/508 about the issue that seems to be at the root of this. If you have a chance to look into that, we'd be hugely thankful, as it would be great to be able to re-enable this feature.

Flags: needinfo?(ajohnson)
See Also: → 1729276
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: