Closed Bug 782423 Opened 12 years ago Closed 3 years ago

Intro paragraph missing from http://en.m.wikipedia.org pages

Categories

(Toolkit :: Reader Mode, defect, P4)

RESOLVED WORKSFORME

People

(Reporter: lucasr, Unassigned)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [reader-mode-readability-algorithm])

Readability.js picks the one div that has more text than the others. We should probably add code to handle pages where the text is split across different sections/divs.
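To illustrate the behavior described above, here is a minimal sketch. It is not the actual Readability.js code: the block objects (`{ id, text }`), the function names `pickLargestBlock` and `pickBlocksAboveRatio`, and the `0.2` ratio are all assumptions made for illustration. The first function keeps only the block with the most text; the second shows one possible way to also keep other blocks that carry a meaningful share of the text.

```javascript
// Hypothetical model: each block is { id, text }. This is NOT the data
// structure Readability.js uses; it only illustrates the idea.

// Keep only the single block with the most text (the behavior reported here).
function pickLargestBlock(blocks) {
  if (blocks.length === 0) {
    return null;
  }
  return blocks.reduce((best, block) =>
    block.text.length > best.text.length ? block : best
  );
}

// Assumed alternative: also keep every block whose text length is at least
// `ratio` times the length of the largest block. The default 0.2 is arbitrary.
function pickBlocksAboveRatio(blocks, ratio = 0.2) {
  const top = pickLargestBlock(blocks);
  if (top === null) {
    return [];
  }
  const threshold = top.text.length * ratio;
  return blocks.filter((block) => block.text.length >= threshold);
}
```

With an intro block of 300 characters, a navigation block of 20, and a section block of 900, the first function would return only the section, while the second would keep both the intro and the section and drop the navigation block.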
OS: Linux → Android
Hardware: x86 → All
Priority: -- → P2
Two pages that I have been unable to add to reader mode:

* http://www.w3.org/TR/components-intro/
* https://leanpub.com/javascript-alonge/read
Component: Reader Mode → Readability
Component: Readability → Reader Mode
Product: Firefox for Android → Toolkit
Hi, 

I would like to work on this bug. How should I go about it?

Thanks in advance
shreyas
Flags: needinfo?(margaret.leibovic)
It looks like this bug has changed slightly, or Lucas misstated what he saw. For example, visit http://en.m.wikipedia.org/wiki/Chitrakoot_Falls in Android Firefox or desktop Nightly with the reader mode preference enabled in about:config. The reading view removes several of the headings for the paragraphs.

I would start by saving a copy of http://en.m.wikipedia.org/wiki/Chitrakoot_Falls, making a copy of that file, and then modifying the copy to simplify the page as much as possible while still retaining the problem that this bug reports. Bug 809724 has an example of going from a complex NY Times page to a simple test case. Once you have a reduced test case, attach it to this bug.
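The reduction described above is usually done by hand, but the idea can be sketched as a greedy loop. This is only an illustration under assumptions: `reduceTestCase` and the `stillReproduces` callback are hypothetical names, the page is modeled as an array of chunks, and in practice "still reproduces" means loading the simplified page in reader mode and checking by eye.

```javascript
// Greedy test case reduction (a sketch, not an existing tool):
// try dropping each chunk of the page, and keep the drop whenever the
// problem still reproduces without that chunk.
function reduceTestCase(chunks, stillReproduces) {
  let current = chunks.slice();
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i < current.length; i++) {
      // Candidate page with chunk i removed.
      const candidate = current.slice(0, i).concat(current.slice(i + 1));
      if (stillReproduces(candidate)) {
        current = candidate;
        changed = true;
        i--; // re-check the chunk that shifted into position i
      }
    }
  }
  return current;
}
```

The loop repeats until a full pass removes nothing, so the result is a page where every remaining chunk is needed to trigger the problem.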

The readability code lives at https://github.com/mozilla/readability. Margaret would be better at speaking about interacting with the code there.
Thanks for jumping in, Kevin.

Kevin is correct that we're developing this readability script in a standalone library on github, but unfortunately right now we're in a transition state where we don't have a system in place to test easily outside of Firefox.

I would follow Kevin's advice to try to make a reduced testcase that lets you isolate this problem, and then you should try modifying Readability.js. You could do this with either a desktop Firefox or Firefox for Android build (desktop has a quicker build time, so that's probably easier).

It looks like _grabArticle is probably where you want to look:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/reader/content/Readability.js#404

bnicholson would also be a good person to advise on the Readability algorithm details, since he spent time working on that at one point.
Flags: needinfo?(margaret.leibovic)
Updating stale priorities to reflect priority in current reading list / reader view work.
OS: Android → All
Priority: P2 → P3
I'm updating this bug to be about this specific test case. It doesn't look like it's a problem on the desktop version of wikipedia, so it's more of an edge case.
Priority: P3 → P4
Summary: Reader Mode: Improve readability.js to handle pages like wikipedia → Intro paragraph missing from http://en.m.wikipedia.org pages
(In reply to :Margaret Leibovic from comment #7)
> I'm updating this bug to be about this specific test case. It doesn't look
> like it's a problem on the desktop version of wikipedia, so it's more of an
> edge case.
"(Mar 2015: mobile traffic represents 34.9% of total traffic)"
source: http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm

I'm not sure over a third of visits can be considered an edge case.
(In reply to David Bruant from comment #8)
> (In reply to :Margaret Leibovic from comment #7)
> > I'm updating this bug to be about this specific test case. It doesn't look
> > like it's a problem on the desktop version of wikipedia, so it's more of an
> > edge case.
> "(Mar 2015: mobile traffic represents 34.9% of total traffic)"
> source: http://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm
> 
> I'm not sure over a third of visits can be considered an edge case.

I'm just saying this is an edge case because it's only on a specific version of a site (it may even be considered an edge case if it's something that happens on a single site).

Also, I imagine our percentage of mobile Firefox traffic (relative to all Firefox traffic) is much smaller than that.
Blocks: 1222993
Whiteboard: [reader-mode-readability-algorithm]
Blocks: 1329358

(In reply to :Gijs (he/him) from comment #11)

recent example from the dupe: https://en.m.wikipedia.org/wiki/Domesticated_plants_and_animals_of_Austronesia .

Seems to be fixed now...

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME