Open Bug 1635353 Opened 5 years ago Updated 5 years ago

Reader mode omits text on ncurses manpage style documents when deeply nested nodes have a lot of text

Categories

(Toolkit :: Reader Mode, defect, P5)

75 Branch

People

(Reporter: alkersh.omar, Unassigned)

Details

(Whiteboard: [reader-mode-readability-algorithm])

Attachments

(1 file)

Attached image bug_img.png

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0

Steps to reproduce:

Entered reader mode on a site. A specific example is http://tldp.org/HOWTO/NCURSES-Programming-HOWTO/windows.html

Actual results:

Some paragraphs are not showing in reader mode, but they show up in normal mode.

Expected results:

The same text that shows up in normal mode should show up in reader mode.

I noticed that the page is made up of three divs. The content shown in reader mode comes only from the very last div nested inside div 2 (more precisely, the rendered div is the last div within the last div of div 2). All other elements are ignored.

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → Reader Mode
Product: Firefox → Toolkit

This happens because the algorithm gives weight to large amounts of text, and the code example has by far the most text of any node on the page. Ancestor nodes are scored at fractions of their child nodes' scores (otherwise we'd just always pick all of <body> as the container of the article text), and clearly here nothing else scores high enough for the overall container to end up with a higher score. There are also no class names or anything else to help clue reader mode into what's happening, and there's some div soup going on in the article structure, which isn't helping either.
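
As a rough illustration of the scoring behaviour described above, here is a toy Python model. It is not the actual reader mode code: the `Node` class, the length-based score, the ancestor dividers (1, 2, then level * 3), and the example page layout are all assumptions chosen only to show how a deeply nested node with a lot of text can outscore the real article container.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical, minimal stand-in for a DOM element."""
    name: str
    text: str = ""
    score: float = 0.0
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def walk(root: Node) -> Iterator[Node]:
    """Yield every node in the tree, root first."""
    yield root
    for child in root.children:
        yield from walk(child)


def score_tree(root: Node) -> None:
    """Credit each text-bearing node's score to its ancestors.

    Assumed rule: the parent gets the full score, the grandparent half,
    and more distant ancestors score / (level * 3).
    """
    for node in walk(root):
        if not node.text:
            continue
        own = len(node.text) / 100.0  # assumed: score grows with text length
        ancestor, level = node.parent, 0
        while ancestor is not None:
            divider = 1 if level == 0 else 2 if level == 1 else level * 3
            ancestor.score += own / divider
            ancestor, level = ancestor.parent, level + 1


def best_candidate(root: Node) -> Node:
    """Return the node with the highest accumulated score."""
    return max(walk(root), key=lambda n: n.score)


def build_example_page() -> Node:
    """Assumed layout: three divs, with the code sample nested deep in div 2."""
    body = Node("body")
    body.add(Node("div1")).add(Node("nav", text="x" * 40))
    div2 = body.add(Node("div2"))
    div2.add(Node("p1", text="x" * 300))
    div2.add(Node("p2", text="x" * 300))
    code_div = div2.add(Node("wrapper")).add(Node("code_div"))
    code_div.add(Node("pre", text="x" * 5000))
    body.add(Node("div3")).add(Node("footer", text="x" * 40))
    return body


if __name__ == "__main__":
    page = build_example_page()
    score_tree(page)
    for n in walk(page):
        print(f"{n.name}: {n.score:.2f}")
    print("picked:", best_candidate(page).name)
```

In this toy page the container directly around the code sample collects the full score of the long text, while div 2 only receives a fraction of it plus the small scores of its paragraphs, so the nested container is picked and the paragraphs are left out.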

I'm not sure how best to fix this, and given that this kind of page is not the primary target for reader mode (that's news articles and other frequently visited webpages, which have very different DOM structures), marking P5.

Severity: normal → S3
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Unspecified → All
Priority: -- → P5
Hardware: Unspecified → All
Summary: reader mode skipping text → Reader mode omits text on ncurses manpage style documents when deeply nested nodes have a lot of text
Whiteboard: [reader-mode-readability-algorithm]

Perhaps an option to use whole of <body> as input without any score or algorithms. This is not a blanket solution, but if allowed to exist with the current setup it can provide a brute force workaround to this 'bug' while a more sophisticated solution is developed.

(In reply to alkersh.omar from comment #4)

> Perhaps an option to use whole of <body> as input without any score or algorithms. This is not a blanket solution, but if allowed to exist with the current setup it can provide a brute force workaround to this 'bug' while a more sophisticated solution is developed.

Options aren't solutions; most users will have no idea whether they should use this option, plus then we'd have to provide UI for the option, or the option would be unused by the vast majority of users and they'd still get a broken experience.
