Tagged PDFs are accessible PDFs that carry semantic information so that screen readers for visually impaired users can read them more reliably. Headings, form fields, tables, and the general flow of the text are determined by a very specific tag hierarchy, much like HTML. Currently, pdf.js does not support tagged PDFs: it ignores the tags and derives the text only from the general text information. Information about PDF accessibility can be found starting from this entry page: http://www.adobe.com/accessibility/ There is also PDF/UA, an initiative to help user agents such as pdf.js make the most of accessible PDFs.
Let's continue info and discussion here.

Jamie wrote in bug 727819 comment #14:
> Two things worth noting:
> * Support for tagged PDF (and guessing where there aren't tags) will very much change the structure of the HTML representation of the content. Aside from headings, tables, etc., text should also flow better. That is, a single block of content (e.g. a paragraph) should appear in a single block element instead of multiple block elements. Right now, text breaks in awkward places.
> * Tagged PDF can specify the reading order of the content. In extreme cases, the reading order can actually mix content from different pages. There are valid use cases for this; e.g. a 2-page brochure where you are meant to read some parts across both pages instead of reading all of one page and then the other.
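
To make the two points above concrete, here is a rough sketch of how a structure tree could drive the HTML output. It is not how pdf.js works today and not a proposed implementation. `StructElem`, `TextRun`, `TAG_TO_HTML`, `to_html` and `pages_in_reading_order` are hypothetical names invented for this illustration, and the tag-to-element mapping is an assumption. Only the tag names themselves (`Document`, `H1`, `P`, ...) are standard structure types from the PDF specification. Python is used only to keep the sketch short.

```python
from dataclasses import dataclass, field
from html import escape

# Assumed mapping from PDF standard structure types to HTML elements.
# The choice of HTML element for each tag is illustrative only.
TAG_TO_HTML = {
    "Document": "div", "H1": "h1", "H2": "h2", "P": "p",
    "Table": "table", "TR": "tr", "TH": "th", "TD": "td",
    "L": "ul", "LI": "li",
}


@dataclass
class TextRun:
    """A piece of text together with the page it is drawn on (hypothetical)."""
    page: int
    text: str


@dataclass
class StructElem:
    """A node of the structure tree; children are StructElem or TextRun."""
    tag: str
    children: list = field(default_factory=list)


def to_html(node):
    """Emit one HTML element per structure element, in tree order."""
    if isinstance(node, TextRun):
        return escape(node.text)
    element = TAG_TO_HTML.get(node.tag, "span")  # unknown tags fall back to span
    inner = " ".join(to_html(child) for child in node.children)
    return f"<{element}>{inner}</{element}>"


def pages_in_reading_order(node):
    """Yield the page number of every text run, following the tree."""
    if isinstance(node, TextRun):
        yield node.page
    else:
        for child in node.children:
            yield from pages_in_reading_order(child)


# A 2-page brochure where one paragraph continues across both pages.
brochure = StructElem("Document", [
    StructElem("H1", [TextRun(1, "Brochure")]),
    StructElem("P", [TextRun(1, "Read this part"),
                     TextRun(2, "across both pages.")]),
    StructElem("P", [TextRun(1, "Then return to page one.")]),
])

print(to_html(brochure))
print(list(pages_in_reading_order(brochure)))
```

Each paragraph ends up in a single block element even though its text runs come from different pages, and the page sequence follows the tree rather than the page order.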