Closed Bug 1705139 Opened 4 years ago Closed 1 year ago

Accessibility Review for Tagged PDFs

Categories

(Firefox :: PDF Viewer, task)

RESOLVED FIXED
a11y-review requested

People

(Reporter: bdahl, Assigned: Jamie)

References

Details

(Whiteboard: [pdfjs-accessibility])

Description:
Adds support for the tagged PDF feature, which exposes more structure information about the document, such as headings, tables, and lists. This allows for easier navigation with screen readers. Note: This is just the first implementation of the feature. I plan to do more work to integrate form elements and improve the text content of PDFs so that paragraphs are read without pauses.

How do we test this?
Use a current nightly with a PDF that has tags. Some example tagged PDFs found crawling:

When will this ship?
Tracking bug/issue: 1704661
Design documents (e.g. Product Requirements Document, UI spec):
Engineering lead: bdahl
Product manager:

The accessibility team has developed the Mozilla Accessibility Release Guidelines which outline what is needed to make user interfaces accessible:
https://wiki.mozilla.org/Accessibility/Guidelines
Please describe the accessibility guidelines you considered and what steps you've taken to address them:
Not sure this question applies; everything added was to improve accessibility.

Describe any areas of concern to which you want the accessibility team to give special attention:

a11y-review: --- → requested

Thanks so much for this amazing work. It is a huge leap forward. Here is some initial feedback, but I still have a lot more testing/investigation to do.

Broken blocks

At present, blocks tend to get "broken up", resulting in weird line lengths, inability to read by full paragraph, etc. When reading long-form text, this can become quite problematic. Sometimes, these breaks occur according to the visual line length in the document. Other times, the chunks are smaller than that. Ideally, blocks should be based on paragraph, heading, etc., rather than visual text chunk.

This is tricky because we're aria-owning stuff from the text layer, which necessarily gets display: block due to its absolute positioning.

One thing I think can safely be done is to mark spans inside .markedContent as role="presentation". For example:

<span class="markedContent" id="page370R_mcid1"><span style="left: 81.5563px; top: 226.895px; font-size: 40px; font-family: serif; transform: scaleX(1.05114);" dir="ltr" role="presentation">Social Security </span><span style="left: 98.5163px; top: 270.215px; font-size: 40px; font-family: serif; transform: scaleX(1.08913);" dir="ltr" role="presentation">Numbers for </span><span style="left: 133.476px; top: 313.535px; font-size: 40px; font-family: serif; transform: scaleX(1.09171);" dir="ltr"  role="presentation">Children</span></span>

This causes them to be stripped from the a11y tree, so the display: block isn't picked up, but the text is preserved.
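
A minimal sketch of that change, assuming the text layer markup looks like the snippet above. The function name `markChunksPresentational` is made up for illustration and is not part of PDF.js; `root` is any object with `querySelectorAll`, such as the text layer element.

```javascript
// Sketch only: marks the positioned text chunks inside .markedContent as
// presentational so they drop out of the a11y tree while their text stays.
// Nested .markedContent spans are skipped because they may be aria-owned.
function markChunksPresentational(root) {
  const chunks = root.querySelectorAll(".markedContent > span:not(.markedContent)");
  let changed = 0;
  for (const span of chunks) {
    // Leave any explicitly assigned role alone.
    if (!span.hasAttribute("role")) {
      span.setAttribute("role", "presentation");
      changed++;
    }
  }
  return changed;
}
```

In a page this would be called once per rendered text layer, for example with the text layer's container element as `root`.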

Unfortunately, there are other cases that are much harder to solve. For example, consider this link:

<span role="link"><span aria-owns="77R"></span><span aria-owns="page7R_mcid22"></span><span aria-owns="75R"></span><span aria-owns="page7R_mcid23"></span><span aria-owns="84R"></span><span aria-owns="page7R_mcid24"></span></span>

Here, the text consists of separate chunks of marked content. We can't put role="presentation" on .markedContent to avoid display: block because we need it to be aria-owned, so it must be in the a11y tree. Ideally, .markedContent would be display: inline too, but that doesn't seem to work, so I'm guessing there's a styling reason that can't be done. I guess that means an intermediate inline span (which we put the id on) would have the same styling issue?

Internal links

When a document has links to sections (e.g. a table of contents), activating the links doesn't move the screen reader cursor to the page or section. This can be seen in the PDF specification.

The way this is handled on the web normally is that the page URL gets "#target" appended, the a11y engine gets notified that the page is scrolling to the element with id target and the a11y engine notifies a11y clients. I notice that activating an internal link doesn't append a "#target" to the page location. However, I see the links do have suffixes like "#G4.776894", though the id "G4.776894" doesn't exist in the document. In order for this to work, there would need to be an element with that id to which the screen reader should scroll.
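
To find affected links, something like the following could list link fragments that have no matching element id. This is a rough sketch; `findDanglingFragments` is a hypothetical helper, and the hrefs and ids are passed in as plain arrays so it does not depend on how the viewer builds its DOM.

```javascript
// Sketch only: returns the fragments (the part after "#") that do not
// correspond to any known element id.
function findDanglingFragments(hrefs, ids) {
  const known = new Set(ids);
  const dangling = [];
  for (const href of hrefs) {
    const hashIndex = href.indexOf("#");
    // Skip links without a fragment, or with an empty one.
    if (hashIndex === -1 || hashIndex === href.length - 1) {
      continue;
    }
    let fragment = href.slice(hashIndex + 1);
    try {
      fragment = decodeURIComponent(fragment);
    } catch {
      // Malformed percent-encoding: fall back to the raw fragment.
    }
    if (!known.has(fragment)) {
      dangling.push(fragment);
    }
  }
  return dangling;
}

// In a page, the inputs could be gathered like this:
//   const hrefs = Array.from(document.querySelectorAll("a[href*='#']"),
//                            a => a.getAttribute("href"));
//   const ids = Array.from(document.querySelectorAll("[id]"), e => e.id);
```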

Images

I saw you mention this as an open issue on the PR and we've discussed images on Slack, but I thought I'd document my thoughts here so we can track this better.

Looking at Acrobat, it seems to map Figure to the image role in OS a11y APIs. I think it'd probably make sense for us to do the same; i.e. role="img" instead of role="figure". That will provide a more familiar experience and will allow screen readers to read it even when there's no content inside, except...

An additional problem is that screen readers ignore leaf elements with a width or height of 0, which is currently the case for pdf.js images even with role="img". So, these somehow need to get a non-0 width and height without breaking visual layout. Perhaps we could position them with the same coordinates as the actual image in the canvas? Alternatively, we could position them off-screen, but that's not ideal because the a11y coordinates will be wrong (impacts mouse routing, hit testing, etc.).
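
A sketch of the first option, under several assumptions: the image's bounding box is available as a PDF rectangle `[x1, y1, x2, y2]` in points with the origin at the bottom-left of the page, the page is not rotated, and `scale` is the number of CSS pixels per point. Both function names are hypothetical.

```javascript
// Sketch only: converts a PDF-space rectangle to a CSS box measured from the
// top-left corner of the page container.
function pdfRectToCssBox([x1, y1, x2, y2], pageHeight, scale) {
  return {
    left: Math.min(x1, x2) * scale,
    // PDF y grows upward, CSS y grows downward, so flip against page height.
    top: (pageHeight - Math.max(y1, y2)) * scale,
    width: Math.abs(x2 - x1) * scale,
    height: Math.abs(y2 - y1) * scale,
  };
}

// Builds an inline style string for the role="img" element.
function toInlineStyle(box) {
  return (
    `position: absolute; left: ${box.left}px; top: ${box.top}px; ` +
    `width: ${box.width}px; height: ${box.height}px;`
  );
}
```

With a box like this the element gets a non-0 size and its a11y coordinates line up with the image drawn on the canvas, which is what mouse routing and hit testing rely on.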

Assignee: nobody → jteh

(In reply to James Teh [:Jamie] from comment #1)

Unfortunately, there are other cases that are much harder to solve. For example, consider this link:

<span role="link"><span aria-owns="77R"></span><span aria-owns="page7R_mcid22"></span><span aria-owns="75R"></span><span aria-owns="page7R_mcid23"></span><span aria-owns="84R"></span><span aria-owns="page7R_mcid24"></span></span>

Thanks for the initial feedback! I'm still going through it, but do you still have the URL or PDF that generates the above harder to solve case?

Flags: needinfo?(jteh)

The examples I cited concerning broken blocks came from the social security numbers for children document you linked in comment 0. Specifically concerning the link, I was looking at the "Social Security Numbers for Noncitizens (Publication No. 05-10096)" link.

Flags: needinfo?(jteh)
See Also: → 1706814

Hi Brendan! Jamie is out of town for a bit, so I wanted to reach out and touch base so we don't lose track of the status of this project. Any updates since you and Jamie last spoke? Anything you need from us?
Thanks again for your work on this 😀

Flags: needinfo?(bdahl)

Hey,
Not much of an update yet and I'm not blocked on anything (besides finding time). I've been playing around with improving the link handling, but the way Acrobat structures links in tagged PDFs is not very compatible with the current way we generate the tree in PDF.js. We currently generate the tree in one pass, but I think we'll need to have a way of doing another pass to collect the text that should be announced for links.
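
Roughly, the second pass could look like the sketch below. The tree shape is an assumption for illustration (`{ role, children }` for structure elements, `{ type: "content", id }` for marked-content leaves), `textById` stands in for whatever maps marked-content ids to their text, and both function names are hypothetical rather than existing PDF.js code.

```javascript
// Sketch only: concatenates the text of every marked-content leaf below `node`.
function gatherText(node, textById) {
  if (node.type === "content") {
    return textById.get(node.id) ?? "";
  }
  return (node.children ?? []).map((child) => gatherText(child, textById)).join("");
}

// Second pass: walks the finished tree and records the text for each link.
function collectLinkText(node, textById, out = []) {
  if (node.role === "Link") {
    out.push({ node, text: gatherText(node, textById) });
    return out; // Links are not expected to nest, so stop descending here.
  }
  for (const child of node.children ?? []) {
    collectLinkText(child, textById, out);
  }
  return out;
}
```

The collected text could then be used as the announced name for the link instead of relying on the separately owned chunks.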

One thing I think would still be nice to have from the accessibility team is an easy way to get the accessibility tree from Firefox to make writing automated tests easier. In our case getting it through Puppeteer would be ideal, but we could also do mochitests.

Flags: needinfo?(bdahl)

Not sure if this is exactly what you're looking for, but we have a few functions in our mochitest support files that might be helpful. The ones I highlighted there allow you to dump the tree to the console, and also let you compare trees.
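
For illustration only, a tree comparison of that kind can be sketched in a few lines. This is not the mochitest helper itself; `diffRoleTrees` is a hypothetical function, and it assumes both trees have already been reduced to plain `{ role, children }` objects.

```javascript
// Sketch only: returns a list of human-readable differences between an
// expected and an actual role tree. An empty list means the trees match.
function diffRoleTrees(expected, actual, path = "root") {
  const diffs = [];
  if (expected.role !== actual.role) {
    diffs.push(`${path}: expected role "${expected.role}", got "${actual.role}"`);
  }
  const expectedChildren = expected.children ?? [];
  const actualChildren = actual.children ?? [];
  if (expectedChildren.length !== actualChildren.length) {
    diffs.push(
      `${path}: expected ${expectedChildren.length} children, got ${actualChildren.length}`
    );
  }
  // Compare the children both trees have in common, position by position.
  const shared = Math.min(expectedChildren.length, actualChildren.length);
  for (let i = 0; i < shared; i++) {
    diffs.push(...diffRoleTrees(expectedChildren[i], actualChildren[i], `${path}/${i}`));
  }
  return diffs;
}
```

A test would then assert that the returned list is empty and print the entries when it is not.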

See Also: → 1722740
Whiteboard: [pdfjs-accessibility]

We can close this: the review was done, and all identified problems have been split into separate bugs that block bug 861157.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED