Closed Bug 1823296 Opened 2 years ago Closed 2 years ago

PDF editor removes tags from tagged PDFs

Categories

(Firefox :: PDF Viewer, defect, P1)

Firefox 111

RESOLVED FIXED
113 Branch
Accessibility Severity s2

People

(Reporter: aroselli, Assigned: calixte)

References

Details

(Keywords: access)

Attachments

(3 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0

Steps to reproduce:

When I opened Firefox 111 it prompted me to try editing a PDF. I did so using a PDF that I had tagged (a PDF/UA-conformant accessible PDF).

  1. Open a tagged PDF in Firefox;
  2. Make any edit (text box, drawing);
  3. Save PDF;
  4. Open the newly-saved PDF in Adobe Acrobat.

Actual results:

All pre-existing tags are gone.

Expected results:

No tags should have been removed.

Possibly related (for background on PDF/UA if nothing else): https://bugzilla.mozilla.org/show_bug.cgi?id=861157

The attached image shows how a PDF that had been tagged has had all its tags removed after adding a drawing and text comment on the first page.

Component: Untriaged → PDF Viewer
Keywords: access
Whiteboard: [access-s2]

I tried with this PDF: https://accessinghigherground.org/wp/wp-content/uploads/2015/09/The-BasicsOfTaggedPDF20161.pdf and all the tags are still there after I modified and saved it in Firefox Nightly (113); the same is true in release (111).
Would it be possible to share the PDF, or even just create a basic one containing a few tags?

I used the PDF you linked, edited it, and opened the edited file in Acrobat Pro and the tags were retained.

I created a new Microsoft Word document, dumped content from Wikipedia in there as plain text (used Notepad as an intermediary), formatted it, exported as PDF (using Create PDF/XPS Document), confirmed the tags in Acrobat Pro, opened in Firefox 111, edited, saved, and then opened the new file in Acrobat Pro and all the tags were removed.

My original PDF: https://adrianroselli.com/files/xfr/PDF-UA.pdf
My edited PDF: https://adrianroselli.com/files/xfr/PDF-UA_edited.pdf

To speculate, Firefox may be struggling with certain PDF files as a function of how they are encoded.

Quick follow-up: I noted the AHG PDF (which retained its tags) was created in Word 2016 and mine (which lost its tags) was created in Word 2021. No idea if/how that factors.

I can confirm that tags are removed when editing and saving the PDF-UA.pdf problem file linked above in Comment 2.

I did some further fiddling, and found that if I simply added the PDF/UA metadata identifier to that PDF, and then edited the file in Firefox 111, the tags would be preserved this time.

Unfortunately, the "Basics of Tagged PDF" file linked in Comment 1 does not have the PDF/UA metadata either, so the PDF/UA identifier cannot be the only issue causing this bug.

If I understand the PDF specs correctly, PDF-UA.pdf is a "hybrid-reference" file: it contains an xref table with some deleted entries and an xref stream that references those deleted entries.
When we write data to the PDF, we use an xref stream, but its Prev entry points to the previous xref table rather than to the previous xref stream:
https://github.com/mozilla/pdf.js/blob/b1e0253f29176751c9762f88b5b9765fcf9fc07c/src/core/writer.js#L285

But the specification for xref streams describes the Prev entry as:

The byte offset in the decoded stream from the beginning of the file to the
beginning of the previous cross-reference stream. This entry has the same
function as the Prev entry in the trailer dictionary (Table 15).

Consequently, the fix is to reference the previous xref stream instead of the previous xref table.
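
To illustrate the reasoning above, here is a minimal Python sketch of how a writer might pick the Prev value for a new xref stream appended to an existing file. It is not the pdf.js code: the function names (`last_startxref`, `xrefstm_offset`, `choose_prev`) are hypothetical, and the byte-level scanning is a simplification that assumes a well-formed file whose last trailer is uncompressed. The idea is that if the section pointed to by the last startxref is a classic table whose trailer carries an /XRefStm entry (the hybrid-reference case), Prev points at that xref stream; otherwise it points at the section itself.

```python
import re
from typing import Optional


def last_startxref(data: bytes) -> int:
    """Return the byte offset recorded after the final 'startxref' keyword."""
    matches = re.findall(rb"startxref\s+(\d+)", data)
    if not matches:
        raise ValueError("no startxref keyword found")
    return int(matches[-1])


def xrefstm_offset(data: bytes, section_offset: int) -> Optional[int]:
    """Return the /XRefStm offset if the section is a hybrid-reference table.

    Returns None when the section at section_offset is not a classic
    'xref' table, or when its trailer has no /XRefStm entry.
    """
    if not data.startswith(b"xref", section_offset):
        return None  # the section is already an xref stream object
    trailer_pos = data.find(b"trailer", section_offset)
    if trailer_pos == -1:
        return None
    end = data.find(b"startxref", trailer_pos)
    trailer = data[trailer_pos:end if end != -1 else len(data)]
    match = re.search(rb"/XRefStm\s+(\d+)", trailer)
    return int(match.group(1)) if match else None


def choose_prev(data: bytes) -> int:
    """Pick the Prev offset for a new xref stream appended to `data`."""
    section = last_startxref(data)
    stream = xrefstm_offset(data, section)
    # Hybrid-reference file: reference the previous xref stream,
    # not the previous xref table.
    return stream if stream is not None else section
```

For a file like PDF-UA.pdf, `choose_prev` would return the /XRefStm offset from the last trailer, so the previous xref stream is the one referenced.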

Assignee: nobody → cdenizet
Severity: -- → S2
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P1
Attached file PDF-UA.pdf
No longer depends on: 1826399
No longer depends on: 1824983
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
Accessibility Severity: --- → s2
Whiteboard: [access-s2]