Closed
Bug 1173548
Opened 9 years ago
Closed 8 years ago
Reader View picks wrong direction for some RTL pages
Categories
(Toolkit :: Reader Mode, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla53
Tracking | Status | |
---|---|---|
firefox53 | --- | fixed |
People
(Reporter: mehdi, Assigned: evanxd)
References
(Blocks 1 open bug)
Details
(Keywords: rtl, Whiteboard: [reader-mode-readability-algorithm])
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0 Iceweasel/38.0.1
Build ID: 20150525043631
Steps to reproduce:
Go to this page: http://mehdix.ir/rtl-feed.html
click the reader icon. The ouput's direction is left to right. If it shows correctly (rtl) for you, don't stop. Try this page: http://www.bbc.com/persian/sport/2015/06/150610_world_cup_2026_bidding_delayed and press the reader icon, the direction is false. However on this page the direction is showing correctly: https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%87%D8%A8%D9%88%D8%B7_%D8%B9%D9%84%D9%89_%D8%B3%D8%B7%D8%AD_%D8%A7%D9%84%D9%82%D9%85%D8%B1
Actual results:
Just browsing different pages with my updated iceweasel 38.0.1 on debian jessie that I see the direction is wrong for most RTL pages that I visit.
Expected results:
The content should appear RTL if the main page is also RTL. I change direction in css having `body { direction: rtl; }` not `dir="rtl"` as html attributed.
Comment 1•9 years ago
|
||
(In reply to Mehdi Sadeghi from comment #0)
> I change
> direction in css having `body { direction: rtl; }` not `dir="rtl"` as html
> attributed.
This is not a good idea -- both the HTML spec [1] and the CSS spec [2] recommend setting direction with the HTML dir attribute and not the CSS direction property.
That said, I don't know exactly how Reader View chooses a direction, and we may well be able to improve things here.
[1] https://html.spec.whatwg.org/multipage/dom.html#the-dir-attribute:the-dir-attribute-18
[2] http://www.w3.org/TR/css-writing-modes-3/#bidi
Reporter | ||
Comment 2•9 years ago
|
||
(In reply to Simon Montagu :smontagu from comment #1)
> (In reply to Mehdi Sadeghi from comment #0)
> > I change
> > direction in css having `body { direction: rtl; }` not `dir="rtl"` as html
> > attributed.
>
> This is not a good idea -- both the HTML spec [1] and the CSS spec [2]
> recommend setting direction with the HTML dir attribute and not the CSS
> direction property.
>
> That said, I don't know exactly how Reader View chooses a direction, and we
> may well be able to improve things here.
>
> [1]
> https://html.spec.whatwg.org/multipage/dom.html#the-dir-attribute:the-dir-
> attribute-18
> [2] http://www.w3.org/TR/css-writing-modes-3/#bidi
Setting direction with the HTML dir attribute solved my problem. I change the status to RESOLVED.
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Comment 3•9 years ago
|
||
I still want to investigate whether we can improve behaviour on sites like http://www.bbc.com/persian/sport/2015/06/150610_world_cup_2026_bidding_delayed where there is no dir attribute on the body, but all the content is contained in a <div> with dir="rtl".
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---
Comment 4•9 years ago
|
||
This page also doesn't support RTL in desktop reader mode:
http://hwzone.co.il/%D7%94%D7%A6%D7%A6%D7%94-%D7%9C%D7%9E%D7%95%D7%A6%D7%A8%D7%99-%D7%94%D7%A2%D7%99%D7%93%D7%9F-%D7%94%D7%97%D7%93%D7%A9-%D7%A9%D7%9C-amd-%D7%A9%D7%99%D7%92%D7%99%D7%A2%D7%95-%D7%91%D7%A9%D7%A0%D7%AA-201/
Kubuntu 15.04 64bit, Firefox 39.
Comment 5•9 years ago
|
||
Web designers should use dir attribute instead of CSS direction property, but there are sites out there not conforming to the recommendation. We can still do better on this by preserving the text direction.
A very safe implementation is to get computed style of the extracted elements, and set this to the dir attribute. This would guarantee the text direction is correct, but takes more time. DOM distiller is a similar project, but for Chromium. Their implementation is here:
https://github.com/chromium/dom-distiller/search?utf8=%E2%9C%93&q=cloneSubtreeRetainDirection
Comment 6•9 years ago
|
||
(In reply to Wei-Yin Chen from comment #5)
> A very safe implementation is to get computed style of the extracted
> elements, and set this to the dir attribute. This would guarantee the text
> direction is correct, but takes more time. DOM distiller is a similar
> project, but for Chromium. Their implementation is here:
> https://github.com/chromium/dom-distiller/
> search?utf8=%E2%9C%93&q=cloneSubtreeRetainDirection
This sounds sensible to me.
This is broken on both desktop and Android Firefox.
From the opening bug description, http://mehdix.ir/rtl-feed.html is broken for me and http://www.bbc.com/persian/sport/2015/06/150610_world_cup_2026_bidding_delayed works correctly.
Another site that is broken for me: http://www.nrg.co.il/online/1/ART2/749/875.html?hp=1&cat=402&loc=1
Comment 9•8 years ago
|
||
Felipe, can CLD or maybe Intl help and determine directionality of text (or maybe just language, and can we make an inference based on that, if no direction is set and we're confident we're looking at Arabic/Hebrew/something-else-we-know-is-rtl ?)
Flags: needinfo?(felipc)
Comment 10•8 years ago
|
||
The charset autodetection code could help find the page language, but I am not sure if this component is not obsolete in Firefox. It should be better to set direction by checking the page direction and not the language, even that it will break the directionality of (very!) old webpages that have never used the direction directive.
Since the problem is about pages that are written with RTL directionality, I guess that such webpages should have most of its content marked with dir=rtl or direction:rtl. RTL mode should be triggered if there is RTL content on the page, or if more than a given percentage of the page content is actually RTL.
Comment 11•8 years ago
|
||
(In reply to Amir Aharoni from comment #6)
> (In reply to Wei-Yin Chen from comment #5)
> > A very safe implementation is to get computed style of the extracted
> > elements, and set this to the dir attribute. This would guarantee the text
> > direction is correct, but takes more time. DOM distiller is a similar
> > project, but for Chromium. Their implementation is here:
> > https://github.com/chromium/dom-distiller/
> > search?utf8=%E2%9C%93&q=cloneSubtreeRetainDirection
>
> This sounds sensible to me.
>
> This is broken on both desktop and Android Firefox.
>
> From the opening bug description, http://mehdix.ir/rtl-feed.html is broken
> for me and
> http://www.bbc.com/persian/sport/2015/06/
> 150610_world_cup_2026_bidding_delayed works correctly.
I'm not sure how it was back when I posted this comment, but now it's definitely the other way:
* correct: http://mehdix.ir/rtl-feed.html
* broken: http://www.bbc.com/persian/sport/2015/06/150610_world_cup_2026_bidding_delayed
Comment 12•8 years ago
|
||
(In reply to Amir Aharoni from comment #11)
> * correct: http://mehdix.ir/rtl-feed.html
Direction set on the HTML tag. <html dir="rtl" lang="fa">
> * broken:
> http://www.bbc.com/persian/sport/2015/06/
> 150610_world_cup_2026_bidding_delayed
Direction set on the content (and on <head> - why?!) <div class="direction" dir="rtl">
Broken: http://www.nrg.co.il/online/1/ART2/749/875.html?hp=1&cat=402&loc=1
- Direction set on CSS: #page-wrap {direction: rtl;}
Correct: https://hwzone.co.il/%D7%94%D7%A6%D7%A6%D7%94-%D7%9C%D7%9E%D7%95%D7%A6%D7%A8%D7%99-%D7%94%D7%A2%D7%99%D7%93%D7%9F-%D7%94%D7%97%D7%93%D7%A9-%D7%A9%D7%9C-amd-%D7%A9%D7%99%D7%92%D7%99%D7%A2%D7%95-%D7%91%D7%A9%D7%A0%D7%AA-201/
- Direction set on HTML: <html dir="rtl" lang="he-IL">
Comment 13•8 years ago
|
||
Yeah, the CLD (LanguageDetector.jsm) should be able to provide a locale from a given chunk of text, which could them be run through nsIChromeRegistry.isLocaleRTL, which checks the "ui.direction.*" prefs to determine if it's in the list of RTL locales.
The problem is that there could be text with mixed languages which will never look right one way or the other. I think that Reader Mode should get the computed style for the `direction` style on each chunk of text that it decided to keep, and use that information.
Does Reader Mode have a direct knowledge of the DOM elements it's keeping? Or does the Readability API only give back the filtered plain text?
Flags: needinfo?(felipc)
Comment 14•8 years ago
|
||
(In reply to :Felipe Gomes (needinfo me!) from comment #13)
> Does Reader Mode have a direct knowledge of the DOM elements it's keeping?
No. It doesn't even have more than the HTML of the original page in some cases, so "computed style" isn't available, because there's no CSS except inline styles. So really, attempting to rely on the direction: style is hard/impossible without completely rearchitecturing the whole thing. We might be able to make the common case of clicking to enter reader mode slightly better by storing directionality from the loaded document, but even that is non-trivial. And even if we do that it won't always work because e.g. from session restore / refresh / back/fwd navigation, we won't necessarily have this information (we just XHR the HTML). Hence asking for other options.
Comment 15•8 years ago
|
||
(In reply to :Gijs Kruitbosch from comment #14)
> No. It doesn't even have more than the HTML of the original page in some
> cases, so "computed style" isn't available, because there's no CSS except
> inline styles.
Does it means that the reader mode is not aware of stylesheets at all? Because it may make it difficult to guess direction that is set on the stylesheets.
Comment 16•8 years ago
|
||
(In reply to Tomer Cohen :tomer from comment #15)
> (In reply to :Gijs Kruitbosch from comment #14)
> > No. It doesn't even have more than the HTML of the original page in some
> > cases, so "computed style" isn't available, because there's no CSS except
> > inline styles.
> Does it means that the reader mode is not aware of stylesheets at all?
Yes.
> Because it may make it difficult to guess direction that is set on the
> stylesheets.
Not just "may" - it *does* make it *impossible* to know. There's a good reason the specs suggest using the "dir" attribute instead of CSS, see comment #1. It's a semantics rather than just a styling change anyway.
Anyway, this is the web we get and so we should try and deal with it, somehow.
Reporter | ||
Comment 17•8 years ago
|
||
(In reply to Amir Aharoni from comment #11)
> I'm not sure how it was back when I posted this comment, but now it's
> definitely the other way:
> * correct: http://mehdix.ir/rtl-feed.html
It is correct because I have updated the website to reflect changes described in comment #1.
Updated•8 years ago
|
Priority: -- → P2
Updated•8 years ago
|
Status: REOPENED → NEW
Whiteboard: [reader-mode-readability-algorithm]
Assignee | ||
Comment 19•8 years ago
|
||
(In reply to :Gijs Kruitbosch from comment #16)
> (In reply to Tomer Cohen :tomer from comment #15)
> > (In reply to :Gijs Kruitbosch from comment #14)
> > > No. It doesn't even have more than the HTML of the original page in some
> > > cases, so "computed style" isn't available, because there's no CSS except
> > > inline styles.
> > Does it means that the reader mode is not aware of stylesheets at all?
>
> Yes.
>
> > Because it may make it difficult to guess direction that is set on the
> > stylesheets.
>
> Not just "may" - it *does* make it *impossible* to know. There's a good
> reason the specs suggest using the "dir" attribute instead of CSS, see
> comment #1. It's a semantics rather than just a styling change anyway.
Gijs,
Since it deals with the "Direction set on CSS" case impossibly, should we set "WONTFIX" here? Or we should focus on the "Direction set on HTML" cases here?
If we want to deal with the "Direction set on HTML" cases, somehow we might need to find a good way to get possible "dir" attribute in a web page to instead of current solution, `doc.documentElement.getAttribute("dir")`. For example, check the "dir" attribute of result HTML text selected by Reader Mode algorithm from a web page, parent(parent's parent...) node of the result HTML text, head tag, and body tag.
Any thought?
[1]: http://searchfox.org/mozilla-central/source/toolkit/components/reader/Readability.js#666
Flags: needinfo?(gijskruitbosch+bugs)
Comment 20•8 years ago
|
||
Is it possible for the reader mode to read two blocks with different directionality?
<div dir="ltr">foo</div>
<div dir="rtl">foo2</div>
Also, it may be valuable to recognize directionality of text without markup (tags like RLI/FSI/PDI may indicate that), but then you'd need similar code to one layout uses to interpret the direction from the chunk.
Comment 21•8 years ago
|
||
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #19)
> Since it deals with the "Direction set on CSS" case impossibly, should we
> set "WONTFIX" here? Or we should focus on the "Direction set on HTML" cases
> here?
>
> If we want to deal with the "Direction set on HTML" cases, somehow we might
> need to find a good way to get possible "dir" attribute in a web page to
> instead of current solution, `doc.documentElement.getAttribute("dir")`. For
> example, check the "dir" attribute of result HTML text selected by Reader
> Mode algorithm from a web page, parent(parent's parent...) node of the
> result HTML text, head tag, and body tag.
We should do two things:
1) look for a "dir" attribute on the ancestors of the elements we take out of the original DOM, not just on the document element.
2) for cases where that does not obtain a dir attribute, in the reader page JS/HTML (not in the readability code itself, where we can do the other stuff), abstract the language detection that we're doing for narrate anyway out, and make it more accessible, then use that to determine a fallback rtl/ltr-ness as per comment #13 .
Flags: needinfo?(gijskruitbosch+bugs)
Comment 22•8 years ago
|
||
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #20)
> Is it possible for the reader mode to read two blocks with different
> directionality?
>
> <div dir="ltr">foo</div>
> <div dir="rtl">foo2</div>
This question does not make sense to me. What does "read" mean?
If the dir attribute is included in the markup reader mode extracts from a page, we don't touch it, and things should display fine. The problem is with dir attributes that are in some non-extracted ancestor of the content that we do extract, but not on the <body>/<html> tags, and pages that use CSS only to indicate directionality. So, what you're talking about is not currently a problem.
> Also, it may be valuable to recognize directionality of text without markup
> (tags like RLI/FSI/PDI may indicate that), but then you'd need similar code
> to one layout uses to interpret the direction from the chunk.
Why would we need to do layout's job for it? We don't specifically set dir=ltr if we don't find any indication to do that, and the content still has the same text in it. If layout already does not automatically determine it should be displayed RTL then I don't see why we would be able to do any better.
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to :Gijs Kruitbosch (PTO Nov. 17-20 inclusive) from comment #21)
> (In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #19)
> > Since it deals with the "Direction set on CSS" case impossibly, should we
> > set "WONTFIX" here? Or we should focus on the "Direction set on HTML" cases
> > here?
> >
> > If we want to deal with the "Direction set on HTML" cases, somehow we might
> > need to find a good way to get possible "dir" attribute in a web page to
> > instead of current solution, `doc.documentElement.getAttribute("dir")`. For
> > example, check the "dir" attribute of result HTML text selected by Reader
> > Mode algorithm from a web page, parent(parent's parent...) node of the
> > result HTML text, head tag, and body tag.
>
> We should do two things:
> 1) look for a "dir" attribute on the ancestors of the elements we take out
> of the original DOM, not just on the document element.
Got it.
> 2) for cases where that does not obtain a dir attribute, in the reader page
> JS/HTML (not in the readability code itself, where we can do the other
> stuff), abstract the language detection that we're doing for narrate anyway
> out, and make it more accessible, then use that to determine a fallback
> rtl/ltr-ness as per comment #13 .
One question here, what do you mean "make it more accessible"?
In my understanding, we could do language detection for the `article.content` variable in the `_showContent` method[1]. And the language detection sample code is like below[2]
```
LanguageDetector.detectLanguage(sampleText).then(result => {
resolve(result.confident ? result.language : null);
});
```
Looks like we can just reuse `LanguageDetector.jsm` without changing code for it, right?
[1]: http://searchfox.org/mozilla-central/source/toolkit/components/reader/AboutReader.jsm#769-805
[2]: http://searchfox.org/mozilla-central/source/toolkit/components/narrate/Narrator.jsm#39-41
Flags: needinfo?(gijskruitbosch+bugs)
Comment 24•8 years ago
|
||
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #23)
> (In reply to :Gijs Kruitbosch (PTO Nov. 17-20 inclusive) from comment #21)
> > 2) for cases where that does not obtain a dir attribute, in the reader page
> > JS/HTML (not in the readability code itself, where we can do the other
> > stuff), abstract the language detection that we're doing for narrate anyway
> > out, and make it more accessible, then use that to determine a fallback
> > rtl/ltr-ness as per comment #13 .
> One question here, what do you mean "make it more accessible"?
>
> In my understanding, we could do language detection for the
> `article.content` variable in the `_showContent` method[1]. And the language
> detection sample code is like below[2]
> ```
> LanguageDetector.detectLanguage(sampleText).then(result => {
> resolve(result.confident ? result.language : null);
> });
> ```
>
> Looks like we can just reuse `LanguageDetector.jsm` without changing code
> for it, right?
>
> [1]:
> http://searchfox.org/mozilla-central/source/toolkit/components/reader/
> AboutReader.jsm#769-805
> [2]:
> http://searchfox.org/mozilla-central/source/toolkit/components/narrate/
> Narrator.jsm#39-41
I mean that we should call LanguageDetector.jsm in reader mode code instead of in narrator code, and the result be saved somewhere where it is accessible to the narrator code - the narrator code can depend on the reader mode code but not vice versa.
Flags: needinfo?(gijskruitbosch+bugs)
Assignee | ||
Comment 25•8 years ago
|
||
Wrote WIP patch[1] to fixed the RTL issue of the BBC webpage[2] and add tests for it.
[1]: https://github.com/evanxd/readability/commit/f7b50429e00350da65dcdc770bb31e47e94c3861
[2]: http://www.bbc.com/persian/sport/2015/06/150610_world_cup_2026_bidding_delayed
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → evan
Assignee | ||
Comment 26•8 years ago
|
||
> 1) look for a "dir" attribute on the ancestors of the elements we take out of the original DOM, not just on the document element.
Fixed the first problem: https://github.com/evanxd/readability/commit/d2bc22aa775f7d6e3bc512bb7991e09f9cb71910
Assignee | ||
Comment 27•8 years ago
|
||
Fixed some issues of the patch and sent a pull request for fixing problem one: https://github.com/mozilla/readability/pull/320
Assignee | ||
Comment 28•8 years ago
|
||
> 2) for cases where that does not obtain a dir attribute, in the reader page JS/HTML (not in the readability code itself, where we can do the other stuff), abstract the language detection that we're doing for narrate anyway out, and make it more accessible, then use that to determine a fallback rtl/ltr-ness as per comment #13 .
Since we need to write another patch to fix problem two, we should file a new bug and fix it there.
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Comment 29•8 years ago
|
||
Updated patch for comments: https://github.com/mozilla/readability/pull/320/commits/bdbeb18c45bce0ef57e9b66203d3fd61a8aced55
Assignee | ||
Comment 30•8 years ago
|
||
Updated patch for comments: https://github.com/mozilla/readability/pull/320/commits/9ddcd14378ad38a31e186c81c6e5e859cb3f04e4
Assignee | ||
Comment 31•8 years ago
|
||
Updated patch for comments: https://github.com/mozilla/readability/pull/320/commits/cf6e6708bf349942f9e7e0712db6a15f9e4a3499
Assignee | ||
Comment 32•8 years ago
|
||
Patch is landed on GitHub repo: https://github.com/mozilla/readability/commit/af0aa5c59fe26484ea444204a9275f05d713a038
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 35•8 years ago
|
||
Hi Gijs,
I've updated readability from github repo, includes fix for Bug 1173548 and Bug 1255978. Could you take a look and help land it? Thanks.
Comment 36•8 years ago
|
||
mozreview-review |
Comment on attachment 8814827 [details]
No bug - update readability from github repo, includes fix for Bug 1173548 and Bug 1255978,
https://reviewboard.mozilla.org/r/95940/#review95918
Attachment #8814827 -
Flags: review?(gijskruitbosch+bugs) → review+
Assignee | ||
Comment 37•8 years ago
|
||
Pushed a new try[1] to ensure `tc-M-V` test is good. In previous run, it is all failed for unknown reason. Once it's good, let's land it.
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fdcbe4c14af
Assignee | ||
Comment 38•8 years ago
|
||
Pushed a new try[1] to debug the `tc-M-V` failures[2].
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1df7241e32054b3048b08922c7f8e6d42267b01
[2]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fdcbe4c14af&filter-resultStatus=testfailed
Comment 39•8 years ago
|
||
(In reply to Evan Tseng [:evanxd][:愛聞插低] from comment #37)
> Pushed a new try[1] to ensure `tc-M-V` test is good. In previous run, it is
> all failed for unknown reason. Once it's good, let's land it.
>
> [1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8fdcbe4c14af
Those are tier-2 and those oranges are not your fault.
Comment 40•8 years ago
|
||
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e9ff40069326
No bug - update readability from github repo, includes fix for Bug 1173548 and Bug 1255978, r=Gijs
Comment 41•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago → 8 years ago
status-firefox53:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Updated•8 years ago
|
QA Whiteboard: [good first verify]
Comment 42•8 years ago
|
||
I have reproduced this bug with Nightly 41.0a1 (2015-06-10) on WIndows 7, 64 Bit!
This bg's fix is verified with latest Beta!
Build ID : 20170403072723
User Agent : Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:53.0) Gecko/20100101 Firefox/53.0
[bugday-20170412]
You need to log in
before you can comment on or make changes to this bug.
Description
•