Reader mode omits text on ncurses manpage style documents when deeply nested nodes have a lot of text
Categories
(Toolkit :: Reader Mode, defect, P5)
People
(Reporter: alkersh.omar, Unassigned)
Details
(Whiteboard: [reader-mode-readability-algorithm])
Attachments
(1 file: image/png, 150.43 KB)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:75.0) Gecko/20100101 Firefox/75.0
Steps to reproduce:
Enter reader mode on an affected site. A specific example is http://tldp.org/HOWTO/NCURSES-Programming-HOWTO/windows.html
Actual results:
Some paragraphs are not showing in reader mode, but they show up in normal mode.
Expected results:
The same text that shows up in normal mode also shows up in reader mode.
Comment 1•5 years ago (Reporter)
I noticed that the page is made up of three divs. The content in reader mode shows up from the very last div inside div 2. All other elements are ignored. NOTE: The rendered div is inside the last div in div 2 and is the last div.
Comment 2•5 years ago
Bugbug thinks this bug should belong to this component, but please revert this change in case of error.
Comment 3•5 years ago
This is a result of the algorithm giving weight to large amounts of text, and the code example has by far the most text of any node on the page. Ancestor nodes get scored at fractions of their child nodes (otherwise we'd just always pick all of <body> as the container of the article text), and clearly here we do not score anything else high enough to lead to the overall container having a higher score. There are also no class names or anything else to help clue reader mode into what's happening, and there's some div soup going on in terms of the article structure, which isn't helping either.
I'm not sure how best to fix this, and given the fact that it's not the primary target for reader mode (that's news articles and other frequently visited webpages, which have very different DOM structures), marking P5.
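To make the scoring behavior described in this comment concrete, here is a minimal toy sketch in Python. It is not the actual reader mode code: the `Node` class, the `text_score` heuristic, the divider values, and the example tree (`div2`, `sect1`, `code_div`, and so on) are all assumptions chosen for illustration. It only shows how a deeply nested node with a lot of text can outscore the container that holds the whole article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity-based hashing so nodes can be used as dict keys
class Node:
    """Hypothetical stand-in for a DOM element; not a real browser API."""
    name: str
    text: str = ""
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def text_score(text: str) -> float:
    # Assumed toy heuristic: 1 base point plus 1 point per 100 characters.
    return 1.0 + len(text) // 100


def score_candidates(text_nodes: List[Node]) -> Dict[Node, float]:
    """Credit each text node's score to its ancestors at decreasing fractions."""
    scores: Dict[Node, float] = {}
    for node in text_nodes:
        base = text_score(node.text)
        ancestor, level = node.parent, 0
        while ancestor is not None:
            # Assumed dividers: the parent gets the full score, the
            # grandparent half, and more distant ancestors less still.
            divider = 1 if level == 0 else 2 if level == 1 else level * 3
            scores[ancestor] = scores.get(ancestor, 0.0) + base / divider
            ancestor, level = ancestor.parent, level + 1
    return scores


# A "div soup" page: two short paragraphs and one very long code block,
# each wrapped in its own div inside a shared container (div2).
body = Node("body")
div2 = body.add(Node("div2"))
p1 = div2.add(Node("sect1")).add(Node("p1", "x" * 150))
p2 = div2.add(Node("sect2")).add(Node("p2", "x" * 150))
code_div = div2.add(Node("code_div"))
pre = code_div.add(Node("pre", "x" * 3000))

scores = score_candidates([p1, p2, pre])
top = max(scores, key=scores.get)
print(top.name)  # prints "code_div"
```

In this toy model the wrapper around the code block scores 31.0, while `div2`, which contains all of the article text, only reaches 17.5 because it receives half of each grandchild's score. The highest-scoring candidate is therefore the code block's wrapper, and the short paragraphs next to it are dropped, which matches the symptom reported here.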
Comment 4•5 years ago (Reporter)
Perhaps an option to use whole of <body> as input without any score or algorithms. This is not a blanket solution, but if allowed to exist with the current setup it can provide a brute force workaround to this 'bug' while a more sophisticated solution is developed.
Comment 5•5 years ago
(In reply to alkersh.omar from comment #4)
> Perhaps an option to use whole of <body> as input without any score or algorithms. This is not a blanket solution, but if allowed to exist with the current setup it can provide a brute force workaround to this 'bug' while a more sophisticated solution is developed.
Options aren't solutions; most users will have no idea whether they should use this option, plus then we'd have to provide UI for the option, or the option would be unused by the vast majority of users and they'd still get a broken experience.