Closed Bug 1894849 Opened 1 month ago Closed 19 days ago

Semantic information in tagged PDF not exposed to screen readers

Categories

(Firefox :: PDF Viewer, defect, P1)

Firefox 125
Desktop
All
defect

Tracking


VERIFIED FIXED
128 Branch
Accessibility Severity s2
Tracking Status
firefox-esr115 --- unaffected
firefox125 --- wontfix
firefox126 + verified
firefox127 + verified
firefox128 --- verified

People

(Reporter: tmthywynn8, Assigned: calixte)

References

(Regression)

Details

(Keywords: access, regression)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:125.0) Gecko/20100101 Firefox/125.0

Steps to reproduce:

This bug is reproducible in a new Firefox profile, though I do not know when it started. According to this article, though, Firefox's PDF viewer is supposed to expose the tags in a PDF to screen readers.

  1. With NVDA (Windows screen reader), open a tagged PDF in Firefox, e.g.:
    https://www.ssa.gov/pubs/EN-05-10023.pdf
  2. Press D to move to the first page, where the first piece of text should be "Social Security Numbers for Children".

Actual results:

This heading is broken into two elements, i.e.:

Social SecurityNumbers for

Children

Expected results:

NVDA should have announced that this was a level-one heading.

Confirmed. I originally wondered whether this might be a problem with the specific PDF, but: 1) Acrobat doesn't warn about an untagged PDF and it exposes the headings correctly; and 2) the simple test PDF doesn't expose headings correctly either. I'm pretty sure it used to; certainly, this is severe enough that it's highly unlikely that I missed this in the initial review of this work (bug 1705139).

I'm marking this as s3 for now, as you can still read the text, but it's bordering on an s2.

(In reply to Timothy Wynn from comment #0)

This heading is broken into two elements, i.e.:

Social SecurityNumbers for

Children

We were aware of this bug at least. It's bug 1708035. But I'm fairly sure we still exposed the fact that this was a heading previously, despite the broken block.

Status: UNCONFIRMED → NEW
Accessibility Severity: --- → s3
Component: Untriaged → PDF Viewer
Ever confirmed: true
Keywords: access, regression

5:55.60 INFO: Last good revision: 18720cd9e180a84f2440b6cfd6bc283e5bc63613
5:55.61 INFO: First bad revision: cbafc7ee44a8630659515b0fa2b87e1dd5465168
5:55.61 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=18720cd9e180a84f2440b6cfd6bc283e5bc63613&tochange=cbafc7ee44a8630659515b0fa2b87e1dd5465168
This implicates bug 1886872. There were a few a11y-related fixes in that pdf.js merge, but I'm not sure which one of them would have broken this specifically.

Regressed by: 1886872

The problem is that .canvasWrapper now has aria-hidden="true", but .structTree is inside .canvasWrapper, which means it is also aria-hidden. aria-owns can't apply on aria-hidden subtrees, so all the structure information from tags is lost.
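
To illustrate the nesting problem described above, here is a minimal sketch that uses a simplified node model rather than the real DOM. The `SketchNode` type and the `isHiddenFromA11y` helper are hypothetical names made up for this example; only the `canvasWrapper` and `structTree` class names and the `aria-hidden` attribute come from the comment. The "sibling" layout is just one conceivable arrangement, not necessarily what the landed fix does.

```typescript
// Simplified stand-in for a DOM element; this is NOT the real DOM API.
interface SketchNode {
  className: string;
  ariaHidden: boolean;
  parent: SketchNode | null;
}

// aria-hidden="true" hides an element and all of its descendants,
// so a node is hidden if it or any ancestor carries the attribute.
function isHiddenFromA11y(node: SketchNode): boolean {
  for (let cur: SketchNode | null = node; cur !== null; cur = cur.parent) {
    if (cur.ariaHidden) {
      return true;
    }
  }
  return false;
}

const page: SketchNode = { className: "page", ariaHidden: false, parent: null };
const canvasWrapper: SketchNode = {
  className: "canvasWrapper",
  ariaHidden: true,
  parent: page,
};

// Regressed layout: structTree sits inside the hidden wrapper.
const nestedStructTree: SketchNode = {
  className: "structTree",
  ariaHidden: false,
  parent: canvasWrapper,
};

// Hypothetical alternative layout (an assumption for illustration only):
// structTree as a sibling of the hidden wrapper.
const siblingStructTree: SketchNode = {
  className: "structTree",
  ariaHidden: false,
  parent: page,
};

console.log(isHiddenFromA11y(nestedStructTree)); // true
console.log(isHiddenFromA11y(siblingStructTree)); // false
```

The point of the sketch is only that the hidden state is inherited from ancestors: the struct tree never sets `aria-hidden` itself, yet it is pruned along with its parent.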

Elevating this to s2, since this means all semantic info is lost.

Accessibility Severity: s3 → s2

Set release status flags based on info from the regressing bug 1886872

:calixte, since you are the author of the regressor, bug 1886872, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

This is really odd. The regression range suggests Firefox 126, but the reporter and I can both reproduce the problem in Firefox 125.0.3.

I can't reproduce it in 124 though, so it only started in 125. That still suggests I might have misidentified the regressing bug... somehow.

[Tracking Requested - why for this release]: All accessibility semantics are lost when reading tagged PDF documents.

Aha, bug 1889118 merged this in beta.

It seems like we might need some a11y browser tests which do some spot checks using the a11y tree to catch regressions like this. I know pdf.js has tests, but they're not going to catch something like this because it relates to a11y tree rules.
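
A spot check of the kind suggested here could look roughly like the sketch below. It walks a simplified, hypothetical accessibility-tree model and fails if no level-1 heading is exposed. The `A11yNode` type and both helper functions are assumptions made for illustration; this is not Gecko's a11y test harness or any real pdf.js API.

```typescript
// Hypothetical, simplified accessibility-tree node; not a real Gecko or pdf.js type.
interface A11yNode {
  role: string;
  name: string;
  level?: number;
  children: A11yNode[];
}

// Collect every heading of the requested level, in document order.
function findHeadings(root: A11yNode, level: number): A11yNode[] {
  const own: A11yNode[] =
    root.role === "heading" && root.level === level ? [root] : [];
  return own.concat(...root.children.map((child) => findHeadings(child, level)));
}

// Spot check: throw if the expected semantics are missing from the tree.
function assertHasHeading(root: A11yNode, level: number): void {
  if (findHeadings(root, level).length === 0) {
    throw new Error(`no level-${level} heading exposed in the accessibility tree`);
  }
}

// Tree shaped like the fixed behavior: the title is exposed as a heading.
const fixedTree: A11yNode = {
  role: "region",
  name: "Page 1",
  children: [
    { role: "heading", name: "Social Security Numbers for Children", level: 1, children: [] },
  ],
};

// Tree shaped like the regressed behavior: only plain text is exposed.
const regressedTree: A11yNode = {
  role: "region",
  name: "Page 1",
  children: [{ role: "text", name: "Social Security", children: [] }],
};

console.log(findHeadings(fixedTree, 1).length); // 1
console.log(findHeadings(regressedTree, 1).length); // 0
```

Calling `assertHasHeading(regressedTree, 1)` throws, which is the failure a browser-level test would need to surface; a real test would obtain the tree from the browser's accessibility layer instead of building it by hand.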

Severity: -- → S2

The bug is marked as tracked for firefox126 (beta) and tracked for firefox127 (nightly). We have limited time to fix this, the soft freeze is in 6 days. However, the bug still isn't assigned.

:marco, could you please find an assignee for this tracked bug? Given that it is a regression and we know the cause, we could also simply back out the regressor. If you disagree with the tracking decision, please talk with the release managers.


Flags: needinfo?(mcastelluccio)
Assignee: nobody → cdenizet
Status: NEW → ASSIGNED
Flags: needinfo?(cdenizet)
Priority: -- → P1

We will definitely fix this for 127. I don't know if we'll have enough time to fix it for 126 unfortunately, but we could uplift it for the 126 planned dot release.

Flags: needinfo?(mcastelluccio)

:calixte, do you plan on cherry-picking a fix for Fx126 that is safe to uplift?
Do you also plan on landing the pdf update in time for 127, which goes to beta next week?

Flags: needinfo?(cdenizet)

Calixte is off this week. The fix landed but was backed out because of a timeout (bug 1895324 comment 4).

No longer depends on: 1895324
Status: ASSIGNED → RESOLVED
Closed: 19 days ago
Depends on: 1896448
Resolution: --- → FIXED
Target Milestone: --- → 128 Branch
Attachment #9401638 - Flags: approval-mozilla-beta?

beta Uplift Approval Request

  • User impact if declined: Users with screen readers won't be able to read tagged PDFs
  • Code covered by automated testing: yes
  • Fix verified in Nightly: no
  • Needs manual QE test: yes
  • Steps to reproduce for manual QE testing: See comment#0
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Very small change
  • String changes made/needed: No
  • Is Android affected?: yes
Flags: qe-verify+
Attachment #9401641 - Flags: approval-mozilla-release?

release Uplift Approval Request

  • User impact if declined: Users with screen readers won't be able to read tagged PDFs
  • Code covered by automated testing: yes
  • Fix verified in Nightly: no
  • Needs manual QE test: yes
  • Steps to reproduce for manual QE testing: See comment#0
  • Risk associated with taking this patch: Low
  • Explanation of risk level: Very small change
  • String changes made/needed: No
  • Is Android affected?: yes
No longer depends on: 1896534
Attachment #9401638 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]

I can reproduce this issue in Release v125.0.3 and v126.0 and in Beta v127.0b1; NVDA verbalizes:

  • "page 1 region clickable, social security number for"
  • "page 1 region clickable, children"

I can verify the fix in Nightly v128.0a1; NVDA verbalizes:

  • "page 1 region, social security numbers for, heading level 1"
  • "heading level 1, children"

I have to mention that the last word of the title is read as a separate heading, by itself.

VoiceOver in the affected build does not recognize the title as heading level 1 and reads each line of the title separately:

  • "Social Security, clickable" and then "You are currently on a text element"
  • "Numbers For, clickable" and then "You are currently on a text element"
  • "Children, clickable" and then "You are currently on a text element".

VoiceOver in the fixed build recognizes the title as heading level 1 and reads the whole title in one go while also announcing the link:

  • "Heading level 1, 4 items, Social Security Numbers for Children https://www.ssa.gov/" and then automatically says "You are currently on a heading level 1."

Orca in the affected build reads each row of the title separately as plain text.
Orca in the fixed build reads each row of the title separately, but announces "heading level 1" each time.

It will be verified in Beta 127 when it becomes available.

OS: Unspecified → Windows
Hardware: Unspecified → Desktop
OS: Windows → All

Verification was also performed in Beta v127.0b2 on Windows 10 + NVDA, macOS 11 + VoiceOver, and Ubuntu 22 + Orca. The behavior matches what is described for the fixed builds in the comment above.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Flags: needinfo?(cdenizet)
Attachment #9401641 - Flags: approval-mozilla-release? → approval-mozilla-release+

I can confirm that the behavior in Release Candidate v126.0.1 is the same as the "fixed behavior" described in comment 19. The first page's title elements are properly recognized as heading level 1 on Windows 10 + NVDA, macOS 11 + VoiceOver, and Ubuntu 22 + Orca.

I don't know the etiquette for this sort of thing, so please feel free to delete this if I am just generating noise instead of contributing anything useful. I can confirm that on Windows, updating to 126.0.1 restores the semantic information. I have tested with both the test document from the description and the far simpler test PDF linked in comment 1.
