Closed Bug 785549 Opened 11 years ago Closed 9 years ago
Possible problems if byline contains <'s or >'s
We're using textContent from the byline node which might contain < or > and we use the set the byline text with innerHTML setter. We need to protect against this.
I guess the place to fix this is: http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Readability.js#446
I made a webpage --- http://www.ginformations.heliohost.org/Home/fennec.html --- in it i made a div with class="byline" and in that div i putted some text. The result is that in Reader Mode that content is listed twice, if i delete the class name from that div, the content is shown only once. At the bottom of the div i putted this: --- <hello world ! --- and in Reader Mode it's not shown. 1)why the content of the div with class="byline" is shown twice? 2)i don't know if i did correctly ( <hello world ! ) to reproduce the bug? Hmm... what i am doing wrong?
Hi alex, could you let us know if you are still working on this bug? Thanks!
Hi Henry, I am not working on this bug, whoever want may take and start to work on it.
Assignee: alexbio2008 → nobody
Did you fix this bug yet? Can you please elaborate on what is the problem you are facing since I don't see anything shown twice as you specified.
Lucas, Could you tell me what exactly is the problem we are facing here. I tried rendering the page which Alex had made. I found this issue: Although the text in the div with byline is shown only once, the text beneath the div inside a <p> tag is not being rendered. So is this the issue which needs to be fixed unlike what alex said of the div being rendered twice.
(In reply to Sushant Hiray [:sushant] from comment #6) > Lucas, Could you tell me what exactly is the problem we are facing here. > I tried rendering the page which Alex had made. > I found this issue: > Although the text in the div with byline is shown only once, the text > beneath the div inside a <p> tag is not being rendered. > > So is this the issue which needs to be fixed unlike what alex said of the > div being rendered twice. Just wondering: are you using latest nightly?
Umm, I'm not sure but I was using the latest build for Firefox for Android. Do we get a nightly version for Android as well?
(In reply to Sushant Hiray [:sushant] from comment #8) > Umm, I'm not sure but I was using the latest build for Firefox for Android. > Do we get a nightly version for Android as well? Do you mean the released Firefox in the Play store? Similarly to desktop, yes, we have Nightly builds that you can download here: http://nightly.mozilla.org/ That's what Lucas meant when he asked about the latest nightly.
Whiteboard: [mentor=lucasr][lang=js] → [lang=js]
Mentor: lucasr.at.mozilla → margaret.leibovic
Hmm, having just looked at bug 794958, what's the security exposure here? .innerHTML + mismatched tag delimiters classically lead to being able to inject <script> and such into a page... Can that happen here?
Hard to say...I haven't been able to exploit it with some light testing myself, but I definitely can't assert that it's impossible. I'm not really sure what case this bug was originally filed for. Here's an article  with a "<" (surrounded by spaces), which doesn't give us any problems. Here's another article  without the surrounding spaces, which confuses our parser since it thinks we're starting a new element. That doesn't display correctly for the original page either, though, so that's not a reader issue. But I also don't know why we wouldn't use textContent for the byline like we do with the title . If we make that change, we should have nothing to worry about in terms of security or correctness.  https://people.mozilla.org/~bnicholson/test/reader3.html  https://people.mozilla.org/~bnicholson/test/reader4.html  http://mxr.mozilla.org/mozilla-central/source/toolkit/components/reader/AboutReader.jsm?rev=d819aa9d47f6#695
(In reply to Brian Nicholson (:bnicholson) from comment #11) > But I also don't know why we wouldn't use textContent for the byline like we > do with the title . If we make that change, we should have nothing to > worry about in terms of security or correctness. > >  https://people.mozilla.org/~bnicholson/test/reader3.html I tried using .textContent for the byline on this test page, and we end up with "A thoughtful piece by Brian < Nicholson" Talking about this with Gijs, it looks like this is a problem with our JSDOMParser .textContent implementation (it's returning this encoded value), and the easiest way forward here would be to just continue to use .innerHTML for about:reader. I'm temped to just WORKSFORME this bug.
It is slightly concerning that we are using innerHTML here since we don't sanitize the byline like we do with the article content. If want to be extra careful, we could: 1) sanitize the byline, or 2) unencode the HTML: let tempDiv = document.createElement("div"); tempDiv.innerHTML = rawByline; let byline = tempDiv.textContent;
When we looked at this during the work week it seemed that because JSDOMParser doesn't actually implement textContent/innerHTML's distinction in terms of having real HTML / unescaped text. I have a pull request for fixing that ( https://github.com/mozilla/readability/pull/80 ) which is blocked on some perf questions, which I'm working on in another PR ( https://github.com/mozilla/readability/pull/88 )
While making a testcase for this, I found we really do have problems using innerHTML. With this testcase, it looks like we interpret the second < character as the start of a tag, and we don't render the rest of the byline: http://people.mozilla.org/~mleibovic/test/brackets.html Readability does return "A thoughtful piece by Foo < <Bar" as the byline, so it's definitely just a problem with using innerHTML, and this patch fixes that.
Attachment #8591917 - Flags: review?(gijskruitbosch+bugs)
Attachment #8591917 - Flags: review?(gijskruitbosch+bugs) → review+
Comment on attachment 8591917 [details] [diff] [review] Use textContent instead of innerHTML to set domain and credits in reader view Approval Request Comment [Feature/regressing bug #]: reader view [User impact if declined]: Users may see mis-formatted domain and byline content in reader view, since we'll render those fields as HTML instead of text. [Describe test coverage new/current, TreeHerder]: Tested locally, landed on nightly, no automated test coverage. [Risks and why]: Low-risk swap to use textContent instead of innerHTML for reader view domain and byline. Possible risk would include showing HTML markup as text, but we fixed readability such that this shouldn't be a problem. [String/UUID change made/needed]: none
Do we still need to uplift this if we're only keeping Reader View without Reading List, for now?
(In reply to Liz Henry (:lizzard) from comment #19) > Do we still need to uplift this if we're only keeping Reader View without > Reading List, for now? Yes, see comment #18 .
Should be in 38 beta 6 but disabled...
(In reply to Sylvestre Ledru [:sylvestre] from comment #21) > Should be in 38 beta 6 but disabled... This is for reader view, so it will be enabled. Only 'reading list' is being disabled.
Ah sorry, I didn't know you were differentiating between 38 and 38.1. Anyways, there is nothing that this patch needs to do to disable its changes. Bug 1155191 will handle doing all of the disabling.
You need to log in before you can comment on or make changes to this bug.