Possible problems if byline contains <'s or >'s

RESOLVED FIXED in Firefox 38

Status

()

defect
P3
normal
RESOLVED FIXED
7 years ago
4 years ago

People

(Reporter: lucasr, Assigned: Margaret, Mentored)

Tracking

(Blocks 1 bug)

unspecified
mozilla40
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox38 fixed, firefox39 fixed, firefox40 fixed)

Details

(Whiteboard: [lang=js])

Attachments

(1 attachment)

Reporter

Description

7 years ago
We're using textContent from the byline node which might contain < or > and we use the set the byline text with innerHTML setter. We need to protect against this.
Reporter

Updated

7 years ago
Priority: -- → P3
Reporter

Updated

7 years ago
Whiteboard: [mentor=lucasr][lang=js]
Reporter

Updated

7 years ago
Assignee: lucasr.at.mozilla → alexbio2008

Comment 2

7 years ago
I made a webpage --- http://www.ginformations.heliohost.org/Home/fennec.html --- in it i made a div with class="byline" and in that div i putted some text. The result is that in Reader Mode that content is listed twice, if i delete the class name from that div, the content is shown only once.
At the bottom of the div i putted this: --- <hello world ! --- and in Reader Mode it's not shown.
1)why the content of the div with class="byline" is shown twice?
2)i don't know if i did correctly ( <hello world ! ) to reproduce the bug?
Hmm... what i am doing wrong?
Hi alex, could you let us know if you are still working on this bug? Thanks!
Flags: needinfo?(alexbio2008)

Comment 4

6 years ago
Hi Henry, I am not working on this bug, whoever want may take and start to work  on it.
Assignee: alexbio2008 → nobody
Flags: needinfo?(alexbio2008)
Did you fix this bug yet?
Can you please elaborate on what is the problem you are facing since I don't see anything shown twice as you specified.
Lucas, Could you tell me what exactly is the problem we are facing here.
I tried rendering the page which Alex had made.
I found this issue:
Although the text in the div with byline is shown only once, the text beneath the div inside a <p> tag is not being rendered.

So is this the issue which needs to be fixed unlike what alex said of the div being rendered twice.
Flags: needinfo?(lucasr.at.mozilla)
Reporter

Comment 7

6 years ago
(In reply to Sushant Hiray [:sushant] from comment #6)
> Lucas, Could you tell me what exactly is the problem we are facing here.
> I tried rendering the page which Alex had made.
> I found this issue:
> Although the text in the div with byline is shown only once, the text
> beneath the div inside a <p> tag is not being rendered.
> 
> So is this the issue which needs to be fixed unlike what alex said of the
> div being rendered twice.

Just wondering: are you using latest nightly?
Flags: needinfo?(lucasr.at.mozilla)
Umm, I'm not sure but I was using the latest build for Firefox for Android.
Do we get a nightly version for Android as well?
Assignee

Comment 9

6 years ago
(In reply to Sushant Hiray [:sushant] from comment #8)
> Umm, I'm not sure but I was using the latest build for Firefox for Android.
> Do we get a nightly version for Android as well?

Do you mean the released Firefox in the Play store?

Similarly to desktop, yes, we have Nightly builds that you can download here:
http://nightly.mozilla.org/

That's what Lucas meant when he asked about the latest nightly.
Mentor: lucasr.at.mozilla
Whiteboard: [mentor=lucasr][lang=js] → [lang=js]
Assignee

Updated

5 years ago
Mentor: lucasr.at.mozilla → margaret.leibovic
Assignee

Updated

5 years ago
Product: Firefox for Android → Toolkit
Hmm, having just looked at bug 794958, what's the security exposure here?

.innerHTML + mismatched tag delimiters classically lead to being able to inject <script> and such into a page... Can that happen here?
Flags: needinfo?(bnicholson)
Hard to say...I haven't been able to exploit it with some light testing myself, but I definitely can't assert that it's impossible.

I'm not really sure what case this bug was originally filed for. Here's an article [1] with a "<" (surrounded by spaces), which doesn't give us any problems. Here's another article [2] without the surrounding spaces, which confuses our parser since it thinks we're starting a new element. That doesn't display correctly for the original page either, though, so that's not a reader issue.

But I also don't know why we wouldn't use textContent for the byline like we do with the title [3]. If we make that change, we should have nothing to worry about in terms of security or correctness.

[1] https://people.mozilla.org/~bnicholson/test/reader3.html
[2] https://people.mozilla.org/~bnicholson/test/reader4.html
[3] http://mxr.mozilla.org/mozilla-central/source/toolkit/components/reader/AboutReader.jsm?rev=d819aa9d47f6#695
Flags: needinfo?(bnicholson)
Assignee

Updated

4 years ago
Assignee: nobody → margaret.leibovic
Assignee

Comment 12

4 years ago
(In reply to Brian Nicholson (:bnicholson) from comment #11)

> But I also don't know why we wouldn't use textContent for the byline like we
> do with the title [3]. If we make that change, we should have nothing to
> worry about in terms of security or correctness.
> 
> [1] https://people.mozilla.org/~bnicholson/test/reader3.html

I tried using .textContent for the byline on this test page, and we end up with "A thoughtful piece by Brian &lt; Nicholson"

Talking about this with Gijs, it looks like this is a problem with our JSDOMParser .textContent implementation (it's returning this encoded value), and the easiest way forward here would be to just continue to use .innerHTML for about:reader.

I'm temped to just WORKSFORME this bug.
It is slightly concerning that we are using innerHTML here since we don't sanitize the byline like we do with the article content. If want to be extra careful, we could:

1) sanitize the byline, or

2) unencode the HTML:

let tempDiv = document.createElement("div");
tempDiv.innerHTML = rawByline;
let byline = tempDiv.textContent;

Comment 14

4 years ago
When we looked at this during the work week it seemed that because JSDOMParser doesn't actually implement textContent/innerHTML's distinction in terms of having real HTML / unescaped text. I have a pull request for fixing that ( https://github.com/mozilla/readability/pull/80 ) which is blocked on some perf questions, which I'm working on in another PR ( https://github.com/mozilla/readability/pull/88 )
Assignee

Comment 15

4 years ago
While making a testcase for this, I found we really do have problems using innerHTML. With this testcase, it looks like we interpret the second < character as the start of a tag, and we don't render the rest of the byline:

http://people.mozilla.org/~mleibovic/test/brackets.html

Readability does return "A thoughtful piece by Foo < <Bar" as the byline, so it's definitely just a problem with using innerHTML, and this patch fixes that.
Attachment #8591917 - Flags: review?(gijskruitbosch+bugs)

Updated

4 years ago
Attachment #8591917 - Flags: review?(gijskruitbosch+bugs) → review+
https://hg.mozilla.org/mozilla-central/rev/225457512400
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Assignee

Comment 18

4 years ago
Comment on attachment 8591917 [details] [diff] [review]
Use textContent instead of innerHTML to set domain and credits in reader view

Approval Request Comment
[Feature/regressing bug #]: reader view

[User impact if declined]: Users may see mis-formatted domain and byline content in reader view, since we'll render those fields as HTML instead of text. 

[Describe test coverage new/current, TreeHerder]: Tested locally, landed on nightly, no automated test coverage.

[Risks and why]: Low-risk swap to use textContent instead of innerHTML for reader view domain and byline. Possible risk would include showing HTML markup as text, but we fixed readability such that this shouldn't be a problem.

[String/UUID change made/needed]: none
Attachment #8591917 - Flags: approval-mozilla-beta?
Attachment #8591917 - Flags: approval-mozilla-aurora?
Do we still need to uplift this if we're only keeping Reader View without Reading List, for now?
Flags: needinfo?(margaret.leibovic)

Comment 20

4 years ago
(In reply to Liz Henry (:lizzard) from comment #19)
> Do we still need to uplift this if we're only keeping Reader View without
> Reading List, for now?

Yes, see comment #18 .
Attachment #8591917 - Flags: approval-mozilla-beta?
Attachment #8591917 - Flags: approval-mozilla-beta+
Attachment #8591917 - Flags: approval-mozilla-aurora?
Attachment #8591917 - Flags: approval-mozilla-aurora+
Should be in 38 beta 6 but disabled...
(In reply to Sylvestre Ledru [:sylvestre] from comment #21)
> Should be in 38 beta 6 but disabled...

This is for reader view, so it will be enabled. Only 'reading list' is being disabled.
Jared, this contradicts bug 1155191 (comment 2)
Ah sorry, I didn't know you were differentiating between 38 and 38.1. Anyways, there is nothing that this patch needs to do to disable its changes. Bug 1155191 will handle doing all of the disabling.
Assignee

Updated

4 years ago
Flags: needinfo?(margaret.leibovic)
You need to log in before you can comment on or make changes to this bug.