Closed Bug 1657973 Opened 5 years ago Closed 21 days ago

Support generation of tagged (accessible) PDFs

Categories

(Core :: Printing: Output, defect, P3)

defect

Tracking

()

RESOLVED FIXED
150 Branch
Accessibility Severity s2
Tracking Status
firefox150 --- fixed

People

(Reporter: svoisen, Assigned: Jamie, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: access, Whiteboard: [print2020])

Attachments

(6 files)

When saving a document as PDF, either via "Save as PDF" or through the print user interface, we currently don't support the generation of tagged PDFs which contain the metadata necessary for screen reader use.

Keywords: access
Priority: -- → P3
Whiteboard: [print2020]
See Also: → tagged-pdf
Accessibility Severity: --- → s2

:jfkthame, could you have a look on this bug please ?

Flags: needinfo?(jfkthame)

We generate the PDF output via cairo, and cairo in principle has APIs for generating the tagged-pdf annotations (cairo_tag_begin, cairo_tag_end) that would describe document structure. (See https://www.cairographics.org/manual/cairo-Tags-and-Links.html#doc-struct) However, at the point where we're generating the PDF, we're quite far removed from the DOM structure, and dealing instead with a display list created from the frame tree. Doing a good job of tagged PDF generation looks like a pretty big and complex task... even more so given that there's no guarantee that the original HTML document was itself well structured.

So while I think this would be a great feature, I don't think implementing it is currently on anyone's radar, and it'd be a major undertaking if we do ever tackle it.

Flags: needinfo?(jfkthame)

FYI, Chrome is able to generate a tagged pdf when saving to pdf.

We have some capabilities to go from frame tree to a11y role and so on. The main question would be whether there's an easy way to get the role of a frame (or something that loosely maps to this) that doesn't involve instantiating the whole a11y engine...

Jamie, do you know if such an API exists off hand (I don't think so?) or how easy would it be to implement it?

Flags: needinfo?(jteh)

(In reply to Emilio Cobos Álvarez (:emilio) from comment #4)

We have some capabilities to go from frame tree to a11y role and so on. The main question would be whether there's an easy way to get the role of a frame

Once we're at the display list stage, do we have the frame any more though?

(or something that loosely maps to this)

For various reasons, The a11y engine has been moving further and further away from using the frame tree to determine roles in the last few years. Markup is really a much better purveyor of these kinds of semantics, especially given that ARIA is not just for interactive controls these days. You can have ARIA headings, ARIA tables, etc.

That said, to answer the question, where the a11y engine does use frames to determine roles, that happens in nsAccessibilityService::CreateAccessibleByFrameType, which in turn uses nsIFrame::AccessibleType. So, there is no direct API, but it probably wouldn't be that hard to add one, with the caveats above that it would be far from ideal.

that doesn't involve instantiating the whole a11y engine...

Assuming there were benefits (and there probably aren't since we have lost the semantic markup based on comment 2), would this be such a bad thing? We're trying to create an accessible PDF after all. We might even be able to instantiate the a11y engine only in the content process where it is needed, though that might require some slight IPC tweaks. Note that we can choose to shut down the accessibility engine after we're done. We do support that for internal (XPCOM) instantiation, just not (currently) for platform API instantiation.

Flags: needinfo?(jteh)

We do still have the frame and dom trees around yeah, that's how we e.g do pdf links. My gut feeling is that the information we can expose to pdf is relatively simple so starting up then shutting off a11y would be a lot of work compared to what we'd get out of it... But may be wrong there.

I mean, sure, we could write a completely new thing to map from HTML tag names, ARIA attributes, etc., but in time, I worry we'd be duplicating massive chunks of the a11y engine. PDF does allow you to expose much of what you can expose in HTML and ARIA, even making fillable forms accessible. It's just that most authors/generation tools don't bother.

I guess a lot of the complexity in the a11y engine is dealing with dynamic content and client queries, which we don't need to deal with here, so I do take your point. But equally, I worry about anything that would introduce an entirely new layer of code we need to maintain every time we need to add a new HTML element, ARIA role/property, etc.

Changing the type of this bug from the "Enhancement" to "Defect" because this is, in fact, a bug that Firefox does not tag PDFs.

Type: enhancement → defect
Depends on: 1950656

I'm not familiar with the Gecko code associated with this, let alone the Cairo code, so please forgive any inaccuracies. Even so, I did a bit of digging into this in an attempt to figure out how much work this might be and what questions we need to answer. Some findings:

  1. It looks like we'd need to store accessibility semantics on the nsDisplayListBuilder, then have a special nsPaintedDisplayItem subclass which generates the tags. At least, that's how we currently handle links. I guess we could just use an nsTHashMap or nsTArray or similar.
  2. The only tagging we currently support is for links. However, even the way we handle link tagging right now is a bit problematic for accessibility. We create a begin/end pair with no content in between, passing a rect. The problem with that is that semantically, the link contains no text! We could solve that by putting the drawing operations inside the begin/end pair, but I'm not sure if you can pass an overriding rect when you do that, and bug 454059 comment 32 suggests we really do want to provide our own rect.
    • :jfkthame, in bug 454059 comment 32, you noted that you had experimented with putting drawing commands inside cairo_tag_begin and cairo_tag_end. Putting aside the question of whether we can also pass a rect in that case (we could presumably "fix" that in Cairo), do you recall whether it was at least feasible to do tag pairs containing content with our display list code without massive refactoring? I ask because pairs containing content are significantly different to how links currently work. I guess we'd need a start nsPaintedDisplayItem and an end nsPaintedDisplayItem or something like that?
  3. From what I can tell, Cairo doesn't support any attributes for structure tags. This mailing list post seems to confirm that. There aren't a lot of attributes we need, but, for example, NVDA queries: Alt (image alt text); ActualText; Lang (text language); Layout->TextDecorationType (underline/strikethrough); and Summary, Headers, ColSpan, RowSpan and Scope (tables). We'd need to add support for those to Cairo.
    • :jfkthame, what's the situation with contributing patches to Cairo? I see we have quite a few patches in our repo. Do we generally try to contribute these upstream or do we just consider our Cairo a divergent fork?
  4. Aside from just the semantics of individual pieces of content, we need to consider the logical structure (reading order) of the document. The display list order might not be the reading order. Even if the display order follows DOM order (which I suspect it sometimes doesn't, but I'm not sure), there are things like aria-owns (and eventually CSS reading-flow) which can change the reading order. The accessibility engine knows about this, so this isn't a problem in terms of the input. However, from what I can determine, Cairo just supports pairs of tags, with drawing commands inside each pair. There doesn't seem to be any way to specify the order of tags inside the PDF StructTreeRoot other than the order in which they were added using cairo_tag_begin/end, which will be the display list order. If I'm correct, fixing that would involve quite some work in Cairo, including an API change of some sort.
    • This seems like the hardest problem to solve. However, I think tagged output, even with incorrect reading order, is a great deal better than the nothing we have now.
Flags: needinfo?(jfkthame)

(In reply to James Teh [:Jamie] from comment #9)

I'm not familiar with the Gecko code associated with this, let alone the Cairo code, so please forgive any inaccuracies. Even so, I did a bit of digging into this in an attempt to figure out how much work this might be and what questions we need to answer. Some findings:

  1. It looks like we'd need to store accessibility semantics on the nsDisplayListBuilder ...

I think that would be a possible approach. Alternatively, maybe we could map back from display list items via the frame tree to DOM content at the point of needing a11y semantics for tag generation. That might add less overhead to the display list itself?

  1. The only tagging we currently support is for links. However, even the way we handle link tagging right now is a bit problematic for accessibility. We create a begin/end pair with no content in between, passing a rect. The problem with that is that semantically, the link contains no text! We could solve that by putting the drawing operations inside the begin/end pair, but I'm not sure if you can pass an overriding rect when you do that, and bug 454059 comment 32 suggests we really do want to provide our own rect.

If I remember right, I think it would be possible to pass our own rect to the link-begin tag, while having the drawing operations for the link content between the begin/end pair. I don't recall exactly how I went about doing links-with-content back when I was experimenting with this, but there's probably a way to generate the end-tag based on scoping or something like that.

Providing our own rect is necessary because it's not uncommon for pages to have links that don't actually contain rendered content that cairo detects and automatically accumulates into a suitable area. E.g. there could be a transparent link element placed on top of some kind of background that provides the visual representation, but is not structurally "inside" the link in the HTML page. If we don't provide the rect explicitly, such a link would end up zero-sized in the PDF.

  1. From what I can tell, Cairo doesn't support any attributes for structure tags.

The post you linked is pretty old and I don't think it reflects the current state of things. It should be possible to generate structure tags; see https://www.cairographics.org/manual/cairo-Tags-and-Links.html.

One thing I worry about with generating document structure tags is how this interacts with pagination, and more particularly with PDF generation if a subset of pages are being saved -- e.g. if we're saving a random page from the middle of a large structured document, how do we determine what structure tags to wrap around it, and ensure everything is properly balanced?

  1. Aside from just the semantics of individual pieces of content, we need to consider the logical structure (reading order) of the document.

Agreed, I think this is a tough one. I'd expect the display list order to largely reflect the frame tree, and in the simplest cases the frame tree will have a fairly straightforward relationship to the DOM, but handling more complex cases (e.g. positioned elements, floats, reordered grid items, reading-flow, etc) seems likely to be a challenge.

Flags: needinfo?(jfkthame)

(In reply to Jonathan Kew [:jfkthame] from comment #10)

maybe we could map back from display list items via the frame tree to DOM content at the point of needing a11y semantics for tag generation. That might add less overhead to the display list itself?

Ah. I didn't realise you could get the layout frame from a display list item, but now I see there's a simple getter for that. That is probably easier, yes.

  1. From what I can tell, Cairo doesn't support any attributes for structure tags.

The post you linked is pretty old and I don't think it reflects the current state of things. It should be possible to generate structure tags; see https://www.cairographics.org/manual/cairo-Tags-and-Links.html.

You can indeed generate structure tags, but Cairo doesn't support the attributes for those tags such as the ones I listed. See here where various types of attributes are handled; note that other attribute types parse attributes, but structure tags don't. Also see the various attribute_spec_t definitions here; note that there is no spec for structure attributes. I don't expect this will be particularly difficult to add, but I'm almost certain it will require changes to Cairo.

One thing I worry about with generating document structure tags is how this interacts with pagination, and more particularly with PDF generation if a subset of pages are being saved -- e.g. if we're saving a random page from the middle of a large structured document, how do we determine what structure tags to wrap around it, and ensure everything is properly balanced?

That's a very good question. The simplest thing to do is probably to write opening tags for the semantics for the entire ancestry at the start of every page, then close them all at the end of the page. While not entirely ideal - you get repeated semantics at the end of a page - I still think that's far better than nothing. I actually think that might be the only way to do it in PDF, but I'm not certain of that.

(In reply to James Teh [:Jamie] from comment #11)

You can indeed generate structure tags, but Cairo doesn't support the attributes for those tags such as the ones I listed. ...

Ah, sorry -- I missed that this was specifically about the attributes for them, not merely the ability to generate structure tags. So yes, that support appears to be lacking, but probably not too difficult to add. If we do that, we should also try to get it upstreamed so that it doesn't become a local patch we have to carry indefinitely. (I know we maintain some local patches, but we try to avoid this as much as we can. There used to be a lot more of them, actually, but we made an effort a while back to reduce our divergence.)

The severity field is not set for this bug.
:dshin, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(dshin)
Severity: -- → S3
Flags: needinfo?(dshin)

The severity field for this bug is set to S3. However, the accessibility severity is higher, .
:jwatt, could you consider increasing the severity?

For more information, please visit BugBot documentation.

Flags: needinfo?(mozmail)

(In reply to James Teh [:Jamie] from comment #9)

  1. Aside from just the semantics of individual pieces of content, we need to consider the logical structure (reading order) of the document. The display list order might not be the reading order. Even if the display order follows DOM order (which I suspect it sometimes doesn't, but I'm not sure), there are things like aria-owns (and eventually CSS reading-flow) which can change the reading order. The accessibility engine knows about this, so this isn't a problem in terms of the input. However, from what I can determine, Cairo just supports pairs of tags, with drawing commands inside each pair. There doesn't seem to be any way to specify the order of tags inside the PDF StructTreeRoot other than the order in which they were added using cairo_tag_begin/end, which will be the display list order. If I'm correct, fixing that would involve quite some work in Cairo, including an API change of some sort.

FWIW, Chromium uses Skia for PDF tagging. Rather than beginning tags, rendering content and then ending tags, Skia has an API which lets you specify the tag tree separately. Quoting the relevant commit:

Adds an interface for the document creator to pass in a tree of tags indicating the structure of the document, each with a type (from a predetermined enum of possible types) and a node ID. It also adds a setNodeId function to SkCanvas so that page content can be associated with a particular tag.

This is the kind of API I envisaged we'd want in Cairo in order to support proper logical order.

Interestingly, having an API like this would avoid the need to faf about with trying to open and close tags from inside the display list item. So it would make things much simpler in the display list code. Unfortunately, I still think implementing this in Cairo would be a pretty big lift.

I agree with Jamie, it should be a way simpler like this.
The idea would be to be able to just add some marks around each drawn elements:

/Tag <<MCID anIntegerId >>
  BMC
    .... draw whatever you want
  EMC

If an element has to be split because of a new page for example, then the MCID (Marked Content IDentifier) is incremented and new marks are added on the next page.
Finally we've to maintain a mapping between elements and the MCIDs used to represent them.
Once the drawing of the pages is finished we just have to create a StructTree which is reflecting the semantics and the contents are just "pointing" through MCIDs on what has been drawn.
For example, if we have something like:

<div><p>Hello <img ...> World <p></div>

it'll become

/P <<MCID 0>>
  BMC
    BT /F1 10 Tf 1 0 0 1 100 100 Tm (Hello) Tj ET
  EMC
/Img <<MCID 1>>
  BMC
    /X1 Do % Draw image X1
  EMC
/P <<MCID 0>>
  BMC
    BT /F1 10 Tf 1 0 0 1 150 100 Tm (World) Tj ET
  EMC

And then we must create a struct tree:

<< /Type /StructElem
    S: /Div,
    K: [
     << /Type /StructElem
          S: /P,
          K: [<< /Type /MCR
                   /Pg 1 0 R % ref on first page
                   /MCID 0 >>,
                << /Type /StructElem
                    S: /Img,
                    K: << /Type /MCR
                         /Pg 1 0 R % ref on first page
                         /MCID 1 >> >>,
                << /Type /MCR
                   /Pg 1 0 R % ref on first page
                   /MCID 2 >> ] >> ] >>

We don't have to create a "structure" in the content stream itself, the names used just before BMC give the role of the marked content.
On the Cairo's side, I'd say it's only a matter of adding some public api to just add the BMC/EMC things on the surface like it's already possible to add tags (which add the BMC/EMC things + what's required to create the struct tree at the end). Then we should add what's needed in order to write the struct tree ourselves.
In the struct tree it's possible to add some semantics about pages but it isn't required. For example if we've a <p>...</p> on two pages then the struct tree will just contain /Document -> /P -> [ MCID-0, MCID-1 ].
We've to create a ParentTree too but it'll be pretty straightforward.
That said, I'm just speaking from my pdf point of view and so maybe I'm overlooking a lot of things.

another thought I had is that it might be possible to put the MCIDs (marked content ids) in the PDF initially using Cairo, then use a separate, self-written tool (not Cairo) to rewrite the structure tree based on our accessibility tree, referencing the MCIDs placed by Cairo. That would effectively mean generating the PDF in two passes, but it might (that's a big might) be simpler than tweaking Cairo. It's slightly less efficient than doing it in a single pass, though, and certainly more ugly from an architectural perspective. But the output (and benefit to the user) would ultimately be the same.

:jfkthame, would this be remotely feasible?
:calixte, do you know of existing C++ or Rust libraries that would allow replacing the struct tree of an existing document like this? Totally fine if you don't; I just thought I'd ask given your familiarity with PDF.

Flags: needinfo?(jfkthame)
Flags: needinfo?(cdenizet)

:Jamie, the problem of doing that is it can take some time:

  • the pdf needs to be parsed in order to get the pages and their contents.
  • we need to add an entry in each page dictionary
  • we need to write a struct tree
  • we need to rewrite the xref table with the new entries for the struct tree

Looking for the MCIDs in the content stream can be a bit painful: we could use regex or whatever but if the text of the pdf contains "/MCID..." (for example if we print this bug report) then we're stuck, so it'd require to correctly parse the contents, which can take some time too.
So I'm a bit so-so with such a solution.
To answer to your question, I don't know if such a C++/Rust lib for rewriting a struct tree is existing, but for sure it's doable with a js one in using pdf.js :).

Out of curiosity, I had a look on the Cairo code (for the pdf part) and it I think it should be easy to patch.
I don't know if Cairo people would accept few patchs in order add some kind of low-level api in order to create the struct tree ourselves.

Flags: needinfo?(cdenizet)

I think Calixte has a much better sense of this than I do, but FWIW I share the hesitancy in comment 19 about this approach.

In view of the API notes in comment 16, I'm wondering if we should look again at Skia as the PDF-generating backend. This was being worked on several years ago (see bug 1372506 & related issues) but set aside, and AFAIK hasn't really been touched recently, but perhaps it could be resurrected.

Flags: needinfo?(jfkthame)

Jonathan, do we ever consider using for example https://github.com/LaurenzV/krilla ?
For example, when I read:
https://github.com/LaurenzV/krilla/blob/f22a9365bafa46bbe26c8c56c88208eb50063e63/crates/krilla-tests/src/tagging.rs#L65-L87
I think it should be possible to add MCIDs in the content stream while updating the struct tree separately.
It uses skrifa for the font stuff which is a pretty good idea I think.

That said, I've no idea if the library is mature enough to be used in Firefox.

Flags: needinfo?(jfkthame)

(In reply to Calixte Denizet (:calixte) from comment #19)

  • the pdf needs to be parsed in order to get the pages and their contents.
    ...
    Looking for the MCIDs in the content stream can be a bit painful:

Do we need to look for them? That is, if we already know the MCIDs we wrote when we wrote the content, can we just use them? Or do you have to link directly to nodes rather than using ids?

(In reply to Jonathan Kew [:jfkthame] from comment #20)

In view of the API notes in comment 16, I'm wondering if we should look again at Skia as the PDF-generating backend.

Switching to Skia feels like a much bigger lift with a lot more risks, though. That potentially changes the output for everyone, as opposed to just changing the tagging output if we tweak the Cairo solution. Thus, the potential for noticeable regressions (and thus the subsequent long tail of bugs) is much higher. Also, those bugs are likely to require layout/graphics expertise (which we're not likely to get in the near term), whereas the accessibility team has a much greater chance of being able to handle things related to tagging.

Switching to skia is the right long-term solution tho... It'd be nice to get rid of mozcairo. There was some code in the tree already for skpdf, see bug 1372506 and related... Most of PrintTargetSkPdf is in the tree already, I think it's worth at least exploring finishing that work.

Depends on: 2001909

FWIW hooking SkPdf to our save-to-pdf code seems somewhat straight-forward, patches up in bug 2001909 (behind a default-off-pref for now of course).

(In reply to Calixte Denizet (:calixte) from comment #21)

Jonathan, do we ever consider using for example https://github.com/LaurenzV/krilla ?
For example, when I read:
https://github.com/LaurenzV/krilla/blob/f22a9365bafa46bbe26c8c56c88208eb50063e63/crates/krilla-tests/src/tagging.rs#L65-L87
I think it should be possible to add MCIDs in the content stream while updating the struct tree separately.
It uses skrifa for the font stuff which is a pretty good idea I think.

That said, I've no idea if the library is mature enough to be used in Firefox.

I don't know that we've ever looked into krilla as a potential backend (well, it's only been around for a year). From a brief look, it seems pretty interesting: we could create a DrawTargetKrilla or something, and render entirely through krilla (rather than either cairo or skia).

Seems like a bigger undertaking than either incremental enhancements to our current cairo codepath or resurrecting the SkPDF code that we experimented with some years ago. But if it's mature enough and well maintained, it might be an attractive goal. Note that there's also a proposal to migrate Gecko's font handling to skrifa (bug 1987940), which might tie in well with this.

(One thing that would be interesting -- but may be hard to evaluate without quite a lot of work -- would be to compare the PDF output from the various possible backends. There could potentially be substantial differences in quality and file size, depending how the library goes about representing the various graphical constructs in PDF.)

Flags: needinfo?(jfkthame)

(In reply to Emilio Cobos Álvarez [:emilio] from comment #24)

FWIW hooking SkPdf to our save-to-pdf code seems somewhat straight-forward, patches up in bug 2001909 (behind a default-off-pref for now of course).

What is left to make it the default other than the jpeg decoder bit and extensive testing?

Well, I think mostly the extensive testing is the big part. I can look into hooking up the image decoder if we want to get that ready asap :)

We could ask QA to test with a bunch of random web pages, and then turn it on by default in Nightly for more extensive testing by the Nightly population.

(In reply to Marco Castelluccio [:marco] from comment #26)

What is left to make it the default other than the jpeg decoder bit and extensive testing?

As far as I can tell, DrawTargetSkia doesn't support Link(). I think this means hyperlinks won't work at all right now. Looking briefly, I think we can support this using SkAnnotateRectWithURL, but I'm not certain.

Now that we potentially might be able to use SkPDF, I've done a bit more digging into what might be involved. As I understand it, printing flow goes something like this:

  1. In the content process, a request is made to print the document. That gets handled by nsGlobalWindowOuter::Print.
  2. That makes a static clone of the document.
    • Accessibility currently ignores static clones, so we'll need to override that for this case.
    • It's unclear to me when the layout frame tree is built. We may need that in order to build the accessibility tree and send it anywhere.
  3. It creates a DrawEventRecorderPRFileDesc to record draw target events.
  4. RemotePrintJobChild::InitializePrint is called, which calls RemotePrintJobChild::SendInitializePrint.
  5. In the parent process, when RemotePrintJobParent::RecvInitializePrint is called, PrintTargetSkPDF is constructed. That calls SkPDF::MakeDocument.
    • We have to pass the structure tree to SkPDF::MakeDocument. That means we need the accessibility tree in some form at this point, even though we haven't painted the pages yet.
    • In contrast, Chromium seems to paint all the pages before it calls SkPDF::MakeDocument. This is an architectural difference that is probably going to make life a bit painful for us using SkPDF; see below.
    • I guess we could pass the accessibility tree (or some form of it) to RemotePrintJobChild::SendInitializePrint. We already have PDocAccessible to serialise an accessibility tree from content to parent, but this might be a bit tricky to use here, as it would need to be sent before RemotePrintJobChild::SendInitializePrint.
    • SkPDF needs global int ids, but Gecko accessibility uses document specific uint64_t ids. For this purpose, we'll need to generate ids for each Accessible and maintain a temporary map somewhere. If we want any chance of supporting out-of-process iframes, that map should probably be in the parent process.
  6. For each page, the content process paints to the recorder, then the parent process paints the recorded events to the SkPDF canvas. In the parent process, the page handling starts with RemotePrintJobParent::RecvProcessPage.
    • Before the drawing instructions for a node, we need to get the Accessible id so we can set it for the following instructions using SkPDF::SetNodeId when we replay in the parent process. The draw target would probably need a SetAccessibleId method or similar, taking a uint64_t Accessible id, as well as some way of identifying the document.
  7. If there are out-of-process iframes, these are handled using CrossProcessPaint.
    • This is potentially problematic for accessibility because we need to prepare the entire struct tree before SkPDF::MakeDocument is called. However, at that point, we haven't done CrossProcessPaint, which means we haven't processed the iframes yet.
    • Are the iframe documents static clones? I haven't been able to figure this out.
    • Are the iframe documents ready by the time RemotePrintJobChild::InitializePrint is called? if so, perhaps we could somehow tell the parent process to request and/or wait for the accessibility trees from those processes before creating the print target.
    • Iframe documents will have their own document specific Accessible ids. This is why the draw target probably needs to take both an Accessible id and some way of identifying the document when setting an Accessible id.
    • We need some way to determine which Accessible embeds the document for an out-of-process iframe. For normal accessibility, we use PBrowserBridge to establish this link, but I'm not sure if we have PBrowserBridge in this case.
    • Interestingly, Chromium doesn't yet tag PDF correctly for iframes.

For reference, I found this document which explains the design of tagged PDF export in Chromium.

Okay. I have a working (but somewhat ugly) proof of concept for generating tagged PDF with SkPDF. It can tag basic headings, paragraphs, lists, links and tables and correctly associate the text therein. It's in my wip/genTaggedPdf branch on GitHub.

Obviously, there's still a lot of work to do. Major design issues:

  1. In order to build the struct tree and maintain the id mapping, we need a builder instance in the parent process which is constructed with the document Accessible. However, the printing code in the parent process doesn't have any concept of the document being printed, only the print info and drawing instructions. That makes it very difficult to plumb the document Accessible through the printing code. So right now, constructing the builder is handled by a11y::DocAccessibleParent.
  2. Because of this plumbing challenge, the builder is currently a global singleton. Aside from being kinda ugly, I'm not sure if this could cause problems. Is it possible to print two documents simultaneously? Ideally, the builder instance would be held by some printing object, probably PrintTargetSkPDF, but that requires plumbing the Accessible through multiple other layers (nsPrintJob, nsDeviceContext, nsIDeviceSpec, nsDeviceSpecProxy, RemotePrintJobChild, RemotePrintJobParent, etc.).
  3. PrintTargetSkPDF and DrawTargetSkia both need to be able to access this builder. However, even if we could hold the builder instance in PrintTargetSkPDF, I don't think a draw target can access the print target that created it.
  4. Ideally, we would only build the accessibility tree and set Accessible ids for PDF documents. Currently, we do it for all print jobs. Is there some way to detect that we're printing to a PDF from within the content process?
  5. This doesn't currently handle iframes at all, though the high level design allows for it. Figuring out how to get content processes to send the accessibility trees for iframe documents and then making the parent process wait for those to arrive before PrintTargetSkPDF::BeginDocument runs will be tricky. Also, linking the iframes into the correct place in the accessibility tree might be tricky if we don't have PBrowserBridge. That said, this seems solvable without boiling the ocean.
  6. This doesn't currently support parent process (LocalAccessible) documents. Again, the high level design allows for it though.
  7. This will only work if the accessibility engine is running. To support the case where it isn't running, we will need to figure out how to spin up the accessibility engine just for PDF print documents and shut it down immediately after we're done. Again, this will be tricky, but I think it's doable; we already have some limited support for shutting down the engine if it wasn't spun up by an OS client.

Curious, why doing that tagging work in the parent process? I would've expected the content process to generate the right info, and pass it around via RemotePrintJob to the parent in an easily digestible way.

That was my original thought too. However, that's not feasible if we ever want to support OOP iframes.

  1. OOP iframes are handled by CrossProcessPaint, but only after SkPDF::MakeDocument has already been called. We need the whole tree before that point. So, we're already outside the scope of what PRemotePrintJob can handle.
  2. Even assuming we could get the tag trees from all the content processes into the parent process and merge them somehow, SkPDF needs global 32 bit ids, but Gecko accessibility uses document specific uint64_t ids. I guess the tag tree serialisation could handle the mapping from Gecko accessibility id to SkPDF id somehow, but that mapping would still need to survive through the entire printing process, including all sub-documents.
  3. Right now, I'm only setting tags. Eventually, we'll need to set attributes as well. That means quite a bit of serialisation code. We already have accessibility code to handle tree serialisation and maintaining two serialisers doesn't seem ideal.

It looks like we might also be able to use this to generate a PDF outline/table of contents from headings. See SkPDF::Metadata::fOutline.

I now have this working for both in-process and out-of-process iframes.

One thing that seems odd to me is that nsDisplayList::IsForPrinting returns false for an OOP iframe inside a document being printed. So it seems CrossProcessPaint doesn't use the printing mode for the nsDisplayList. That's problematic because it means we end up calculating Accessible ids for screen rendering, which is pointless and wasteful. I think this will also already cause problems for links inside OOP iframes, since we only do those if IsForPrinting is true.

(In reply to James Teh [:Jamie] from comment #35)

One thing that seems odd to me is that nsDisplayList::IsForPrinting returns false for an OOP iframe inside a document being printed. ... That's problematic because it means we end up calculating Accessible ids for screen rendering

I ended up finding a solution to this problem, but I think the fact that IsForPrinting returns false is still problematic for links.

Answering some of my own questions for future reference:

(In reply to James Teh [:Jamie] from comment #30)

- It's unclear to me when the layout frame tree is built. We may need that in order to build the accessibility tree and send it anywhere.

It seems to be built in time for us to create the DocAccessible.

- Are the iframe documents static clones?

Yes.

- Are the iframe documents ready by the time RemotePrintJobChild::InitializePrint is called?

Possibly not, but it doesn't really matter because we have to ask and wait for their accessibility trees anyway. The BrowsingContext tree is ready, though.

- We need some way to determine which Accessible embeds the document for an out-of-process iframe. For normal accessibility, we use PBrowserBridge to establish this link, but I'm not sure if we have PBrowserBridge in this case.

We do.

(In reply to James Teh [:Jamie] from comment #31)

  1. This doesn't currently support parent process (LocalAccessible) documents.

This is now fixed too, though it doesn't support out-of-process iframes hosted in parent process documents.

Blocks: 1950656
No longer depends on: 1950656

In order to generate tagged PDF later, we need the ability to associate printed content with semantic document content.
The BrowsingContext id is the easiest way to do this without too much coupling of modules.

Assignee: nobody → jteh
Status: NEW → ASSIGNED
  1. Allow DocAccessibles to be created for static documents when explicitly requested, since print documents are static clones.
  2. Changes in a couple places to support static documents correctly.
  3. To generate a tagged PDF, we need the accessibility tree to be built for the document and any iframes, and we need to initialize PdfStructTreeBuilder. Add DocManager::NotifyOfPrintDocument to trigger this.
  4. PDF generation happens in the parent process, so if the document is remote, we send the content via IPDl and use the RemoteAccessible tree. Add a PDocAccessible message to notify the parent process that this document is being printed.
  5. We need to request the accessibility trees for OOP iframes as well. Add a PBrowser message to facilitate this.
  1. Notify accessibility about a document that is being printed so it can build the accessibility tree and initialize PdfStructTreeBuilder.
  2. If there are OOP iframes, wait for their accessibility trees to arrive async before proceeding with printing. Unfortunately, SkPDF requires the struct tree when it is initialised, so we can't do this during printing the way we do with CrossProcessPaint.
  3. PrintTargetSkPDF uses PdfStructTreeBuilder to build the struct tree and pass it to SkPDF.
  4. When printing is finished (done or aborted), PrintTargetSkPDF notifies PdfStructTreeBuilder that it is done so the builder can be cleaned up.

Although the struct tree is generated, it isn't associated with any of the printed content yet.
We'll do that in the next patch.

Note that this requires that the accessibility engine is enabled, which means an accessibility client needs to be running or the engine needs to be force enabled.
If it isn't, tagged PDF output will be disabled.
Fixing this will require adjustments to the accessibility engine which will be handled in a separate bug.

In order for the struct tree to be useful, we need to associate printed content with the various nodes in the struct tree.

  1. Add an AccessibleId command to DrawTarget.
  2. Add a nsDisplayAccessibleId display list item.
  3. Make layout frames output nsDisplayAccessibleId items to associate painted content with accessibility tree nodes.
  4. Support plumbing the AccessibleId DrawTarget command from the content process to the parent process.
  5. Implement AccessibleId for DrawTargetSkia, associating subsequent content drawn to the canvas with the relevant node in the struct tree built by PdfStructTreeBuilder.
See Also: → 2021813

This uses pdf.js to get information about the PDF struct tree and content stream.
pdf.js doesn't support attributes yet, so we can't test those.

Attachment #9549794 - Attachment description: Bug 1657973 part 1: Plumb BrowsingContext id to PrintTargetSkPDF::BeginPrinting and make it available to PrintTargetSkPDF::Finish. r?emilio! → Bug 1657973 part 1: Plumb BrowsingContext id to PrintTargetSkPDF::BeginPrinting and make it available to nsDeviceContext::End/AbortDocument. r?emilio!
Pushed by jteh@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/cc79f1945a0f https://hg.mozilla.org/integration/autoland/rev/3785033ceae8 part 1: Plumb BrowsingContext id to PrintTargetSkPDF::BeginPrinting and make it available to nsDeviceContext::End/AbortDocument. r=emilio,geckoview-reviewers,win-reviewers,layout-reviewers,gstoll,m_kato https://github.com/mozilla-firefox/firefox/commit/8d98b7bed14d https://hg.mozilla.org/integration/autoland/rev/8367531b11f0 part 2: Add PdfStructTreeBuilder. r=eeejay https://github.com/mozilla-firefox/firefox/commit/5a036cc26ef0 https://hg.mozilla.org/integration/autoland/rev/b6acf8307bd7 part 3: Accessibility engine changes required for PdfStructTreeBuilder. r=eeejay https://github.com/mozilla-firefox/firefox/commit/f600fda75a97 https://hg.mozilla.org/integration/autoland/rev/db25a4a04260 part 4: Hook up PdfStructTreeBuilder to the printing and PDF code. r=layout-reviewers,dshin https://github.com/mozilla-firefox/firefox/commit/50abf9d6c337 https://hg.mozilla.org/integration/autoland/rev/c935457d4ed2 part 5: Hook up PdfStructTreeBuilder to the layout and gfx code. r=emilio,lsalzman,layout-reviewers https://github.com/mozilla-firefox/firefox/commit/864e3ed1cada https://hg.mozilla.org/integration/autoland/rev/6820c5e8164c part 6: Add tests for tagged PDF generation. r=eeejay
Pushed by abutkovits@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/18fc13f54e56 https://hg.mozilla.org/integration/autoland/rev/db1369c0e3b7 Revert "Bug 1657973 part 6: Add tests for tagged PDF generation. r=eeejay" for causing bustages in SkPDF.
Flags: needinfo?(jteh)
Pushed by jteh@mozilla.com: https://github.com/mozilla-firefox/firefox/commit/6ef328dbd5a9 https://hg.mozilla.org/integration/autoland/rev/83f55b04aca3 part 1: Plumb BrowsingContext id to PrintTargetSkPDF::BeginPrinting and make it available to nsDeviceContext::End/AbortDocument. r=emilio,geckoview-reviewers,win-reviewers,layout-reviewers,gstoll,m_kato https://github.com/mozilla-firefox/firefox/commit/ca1f4218444f https://hg.mozilla.org/integration/autoland/rev/45a64c540611 part 2: Add PdfStructTreeBuilder. r=eeejay https://github.com/mozilla-firefox/firefox/commit/34025870b2d0 https://hg.mozilla.org/integration/autoland/rev/80772bc695e2 part 3: Accessibility engine changes required for PdfStructTreeBuilder. r=eeejay https://github.com/mozilla-firefox/firefox/commit/362b9f27e45e https://hg.mozilla.org/integration/autoland/rev/c030e90ba9fb part 4: Hook up PdfStructTreeBuilder to the printing and PDF code. r=layout-reviewers,dshin https://github.com/mozilla-firefox/firefox/commit/66110424ce48 https://hg.mozilla.org/integration/autoland/rev/24e8377639bf part 5: Hook up PdfStructTreeBuilder to the layout and gfx code. r=emilio,lsalzman,layout-reviewers https://github.com/mozilla-firefox/firefox/commit/4d3001bfdfaa https://hg.mozilla.org/integration/autoland/rev/174805eaf351 part 6: Add tests for tagged PDF generation. r=eeejay
Regressions: 2024862
Regressions: 2025119
QA Whiteboard: [qa-triage-done-c151/b150]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: