Closed
Bug 881091
Opened 11 years ago
Closed 10 years ago
Article showing twice in reader mode
Categories
(Firefox for Android Graveyard :: Reader View, defect)
Tracking
(firefox23 affected, firefox24 affected, firefox33 verified, fennec+)
RESOLVED
FIXED
Firefox 33
People
(Reporter: daniel.nr01, Assigned: eedens)
References
Details
(Keywords: testcase, Whiteboard: [lang=js])
Attachments
(3 files, 3 obsolete files)
713.90 KB,
image/jpeg
|
Details | |
2.80 KB,
text/html
|
Details | |
2.44 KB,
patch
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20130608 Firefox/24.0 (Nightly/Aurora) Build ID: 20130608031212 Steps to reproduce: Visit http://infrequently.org/2013/06/sfuturepromiseg/ Enter reader mode. Actual results: The text of the article is duplicated as shown in the screenshot (The tiny font size is only to fit the whole article). The first instance has no paragraphs and smaller font size than usual. I'm using an Asus Transformer TF300.
Updated•11 years ago
|
Status: UNCONFIRMED → NEW
tracking-fennec: --- → ?
status-firefox23:
--- → affected
status-firefox24:
--- → affected
Ever confirmed: true
Flags: needinfo?(bnicholson)
Comment 1•11 years ago
|
||
Sounds like an issue with the Readability.js parse, but I don't really have any information outside of that.
Flags: needinfo?(bnicholson)
Updated•11 years ago
|
Flags: needinfo?(lucasr.at.mozilla)
Comment 2•11 years ago
|
||
Minimumish test case at http://people.mozilla.org/~kbrosnan/tmp/881091-edit.html still need to check what classes cause it.
Comment 3•11 years ago
|
||
Updated•11 years ago
|
Assignee: nobody → bnicholson
tracking-fennec: ? → 23+
Comment 4•11 years ago
|
||
This is happening because of the div with a class called "author". We're picking it as the author blob for the article. We should probably improve our author-fetching logic in the parser to cover this case.
Flags: needinfo?(lucasr.at.mozilla)
This bug affects https://blog.mozilla.org/javascript/2013/07/18/clawing-our-way-back-to-precision/ as well. There is a "show-author" class but no plain "author" class on the article element.
Updated•11 years ago
|
Whiteboard: [mentor=lucasr][lang=js]
Updated•11 years ago
|
tracking-fennec: ? → +
Comment 7•11 years ago
|
||
Mmmm ... not sure what to suggest here ... Readability.js makes a best guess as to the node to use for the author information based on whether the class has a string that regex matches |/byline|author|dateline|writtenby/i| ... (it uses the first one found) The example http://infrequently.org/2013/06/sfuturepromiseg/ has multiple matches for |author| ... the body has |s-author-alex| (which it uses) and also there's a div class match |author-alex|, a span class |author|, a div class |comment-author| and etc. The Minimumish test case at http://people.mozilla.org/~kbrosnan/tmp/881091-edit.html matches div class |author-alex| as designed ... The example https://blog.mozilla.org/javascript/2013/07/18/clawing-our-way-back-to-precision/ matches an article class |show-author|, an address class |author fn|, and somewhere around 8 other variations.
Comment 8•11 years ago
|
||
Relying on class names is, well, not reliable. Some possible ideas to improve it: - Fetch author information from meta tags if present e.g. <meta name="author" content="Lucas Rocha"> - Fetch author information from twitter cards if present. See: https://dev.twitter.com/docs/cards/markup-reference - Fetch author information from well-known/widely used metadata schemas/conversions in major news sites. See: http://schema.org/CreativeWork, http://microformats.org/wiki/hnews#Schema, http://dublincore.org/documents/dcmi-terms/, http://ogp.me. The idea is to prioritize proper metadata over a random element in the page with an author-like class name. We should at least limit the length of the author blob. If it's too long, ignore it. The same idea probably applies to article publish date.
Comment 9•11 years ago
|
||
Should probably unassign myself if Lucas wants to take this as a mentor bug!
Assignee: bnicholson → nobody
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → eedens
Assignee | ||
Comment 10•10 years ago
|
||
If the byline is really big (over 40 char, or bigger than 1/10 of article), then it is rejected and an empty string is returned. lucasr, I like your idea for adding extractors based on meta tags, html5 tags, etc. I'll create a bug to manage those, since they're more of an enhancement, rather than a fix for this particular issue.
Attachment #8423381 -
Flags: feedback?(lucasr.at.mozilla)
Comment 11•10 years ago
|
||
Comment on attachment 8423381 [details] [diff] [review] WIP - Detect whether byline is a huge chunk of text. Review of attachment 8423381 [details] [diff] [review]: ----------------------------------------------------------------- For this particular issue, I think we can just do the simplest thing and just check for a maximum acceptable length before setting articleByline (something like 100 is probably good enough?) i.e. no need to compare with the article's length ratio. I'd rather focus on fetching the right data from meta tags, twitter cards, and the like.
Attachment #8423381 -
Flags: feedback?(lucasr.at.mozilla) → feedback+
Assignee | ||
Comment 12•10 years ago
|
||
Simplified check of byline length: simply checks whether less than 100 chars.
Attachment #8423381 -
Attachment is obsolete: true
Attachment #8424045 -
Flags: review?(lucasr.at.mozilla)
Comment 13•10 years ago
|
||
Comment on attachment 8424045 [details] [diff] [review] REV - Detect whether byline is a huge chunk of text. Review of attachment 8424045 [details] [diff] [review]: ----------------------------------------------------------------- ::: mobile/android/chrome/content/Readability.js @@ +1530,5 @@ > // } > > let excerpt = this._getExcerpt(articleContent); > > + let byline = this._getByline(articleContent); If you do this here, you might be missing an opportunity to fetch the 'right' node while parsing the page. In other words, maybe we shouldn't even assign a value to this._articleByline if the length doesn't seem right. So, maybe this check should be done here? http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#449
Attachment #8424045 -
Flags: review?(lucasr.at.mozilla)
Assignee | ||
Comment 14•10 years ago
|
||
I like your suggestion. This version accepts bylines that are less than 100 chars. If a byline fails, it is tried again on the next pass.
Attachment #8424045 -
Attachment is obsolete: true
Attachment #8426559 -
Flags: review?(lucasr.at.mozilla)
Comment 15•10 years ago
|
||
Comment on attachment 8426559 [details] [diff] [review] REV - Detect whether byline is a huge chunk of text Review of attachment 8426559 [details] [diff] [review]: ----------------------------------------------------------------- Looks good, please test on a few other sites looking for any regressions. Unfortunately, the integration tests I wrote in bug 786638 got blocked on some infrastructure issues... ::: mobile/android/chrome/content/Readability.js @@ +723,5 @@ > /** > + * Check whether the input string could be a byline. > + * This verifies that the input is a string, and that the length > + * is less than 100 chars. > + * nit: remove trailing space.
Attachment #8426559 -
Flags: review?(lucasr.at.mozilla) → review+
Updated•10 years ago
|
Mentor: lucasr.at.mozilla
Whiteboard: [mentor=lucasr][lang=js] → [lang=js]
Updated•10 years ago
|
Mentor: lucasr.at.mozilla
Assignee | ||
Comment 18•10 years ago
|
||
Fixed nit (removed 8 trailing whitespace characters)
Attachment #8426559 -
Attachment is obsolete: true
Assignee | ||
Comment 19•10 years ago
|
||
Lucas, no reason -- this just fell off the radar. I fixed the trailing whitepace, and also tested on a fresh MC pull. +checkin-needed
Flags: needinfo?(eedens)
Keywords: checkin-needed
Comment 20•10 years ago
|
||
Can we please run this through Try first? Thanks :)
Keywords: checkin-needed
Assignee | ||
Comment 21•10 years ago
|
||
Thanks Ryan :) All passing: https://tbpl.mozilla.org/?tree=Try&rev=cf9b0f97e987
Keywords: checkin-needed
Comment 22•10 years ago
|
||
https://hg.mozilla.org/integration/fx-team/rev/64fb83a88fac
Keywords: checkin-needed
Whiteboard: [lang=js] → [lang=js][fixed-in-fx-team]
Comment 23•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/64fb83a88fac
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [lang=js][fixed-in-fx-team] → [lang=js]
Target Milestone: --- → Firefox 33
Comment 24•10 years ago
|
||
Verified as fixed in: Build: Firefox for Android 33.0a2 (2014-06-30) Device: Asus Transformer Pad TF300T (Android 4.2.1) and Nexus 4 (Android 4.4.2)
status-firefox33:
--- → verified
Updated•3 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•