794958 - Use whitelist element and attribute filtering for reader mode

Reporter

Description

•

13 years ago

Issue: Currently, reader mode uses a blacklist for the element and attribute filtering. In the interest of making the protection afforded by this filtering more robust, it'd be desirable to replace this with a whitelist (list of known good) filter. See https://wiki.mozilla.org/Security/Reviews/ReaderMode for more informaion.

Lucas Rocha (:lucasr)

Updated

•

13 years ago

OS: Mac OS X → Android

Priority: -- → P1

Hardware: x86 → All

Mark Goodwin [:mgoodwin]

Reporter

Updated

•

13 years ago

Blocks: 785077

Mark Goodwin [:mgoodwin]

Reporter

Updated

•

13 years ago

Blocks: reader

Lucas Rocha (:lucasr)

Updated

•

12 years ago

Assignee: lucasr.at.mozilla → nobody

Lucas Rocha (:lucasr)

Updated

•

12 years ago

Blocks: readerv2
No longer blocks: reader, 785077

:Margaret Leibovic

Updated

•

11 years ago

Blocks: fix-readability
No longer blocks: readerv2

Component: Reader View → Reader Mode

Product: Firefox for Android → Toolkit

Marco Mucci [:MarcoM]

Updated

•

11 years ago

Flags: qe-verify?

Flags: firefox-backlog+

Marco Mucci [:MarcoM]

Updated

•

11 years ago

Blocks: 1132074

:Margaret Leibovic

Comment 1

•

11 years ago

mgoodwin/bnicholson, can you help me understand what would go into fixing this bug? Is the main concern here that pages could invent their own tags with garbage, and we wouldn't end up removing those? Looking at our current Readability.js, I see us stripping out tags that we definitely don't want (e.g. "object", "iframe"), and then conditionally stripping other types of tags (e.g. "p", "img") depending on the result of that first pass, as well as some other heuristics about those elements. To implement a whitelist approach, I imagine we would keep that first pass to get rid of definitely unwanted junk, and then we would look for known good elements, such as "p" and "img", etc. I'm worried that this change would be a large fundamental change to how we do our readability parsing, and it seems like this could be risky for Fx38. I'm wondering how much this is actually an issue in practice. Yes, we don't want pages working against us to "fool" the parser, but couldn't a page just put its "unwanted" content inside an element that we think looks good? I think that no matter what we do, we still need heuristics to analyze what's actually inside these elements.

Flags: needinfo?(mgoodwin)

Flags: needinfo?(bnicholson)

Marco Mucci [:MarcoM]

Updated

•

11 years ago

Points: --- → 3

Mark Goodwin [:mgoodwin]

Reporter

Comment 2

•

11 years ago

(In reply to :Margaret Leibovic from comment #1) > mgoodwin/bnicholson, can you help me understand what would go into fixing > this bug? > Is the main concern here that pages could invent their own tags with > garbage, and we wouldn't end up removing those? Yes, or new tags, or existing things in ways that we don't currently envisage (it's difficult to enumerate badness). > To implement a whitelist approach, I imagine we would keep that first pass > to get rid of definitely unwanted junk, and then we would look for known > good elements, such as "p" and "img", etc. Yes, that sounds sensible. > > I'm worried that this change would be a large fundamental change to how we > do our readability parsing, and it seems like this could be risky for Fx38. > > I'm wondering how much this is actually an issue in practice. Yes, we don't > want pages working against us to "fool" the parser, but couldn't a page just > put its "unwanted" content inside an element that we think looks good? I > think that no matter what we do, we still need heuristics to analyze what's > actually inside these elements. This bug was less about reader mode being fooled into displaying different content and more about preventing reader mode XSS; we've had bugs of this kind before (see bug 778582) so the more we can do to reduce the likelihood and impact, the better. I think freddy has (casually) been looking into reader mode stuff recently; perhaps he has more to add?

Flags: needinfo?(mgoodwin) → needinfo?(fbraun)

Brian Nicholson (:bnicholson)

Comment 3

•

11 years ago

It's worth noting here that we already use nsIParserUtils.sanitize() to strip unsafe elements. The filtering that happens in Readability.js maybe helps to remove any other safe elements we don't want people to see in the article, but I'm not sure it's valuable for much else. I wouldn't think this would be a high priority bug unless we don't trust our built-in sanitization for some reason. And if that were the case, I think we'd have bigger problems!

Flags: needinfo?(bnicholson)

Justin Dolske [:Dolske]

Comment 4

•

11 years ago

Updating stale priorities to reflect priority in current reading list / reader view work. Comment 3 makes sense to me. I'd further posit that we shouldn't be relying on Readability to handle security issues, as it's a complex piece of very heuristicy code. The transform it's doing should be orthogonal to security (ie, security should happen at another layer or in a sandbox).

OS: Android → All

Priority: P1 → P5

Marco Mucci [:MarcoM]

Updated

•

11 years ago

No longer blocks: 1132074

Nicolas Perriault (:NiKo`) — needinfo me if you need my attention

Comment 5

•

11 years ago

(In reply to Brian Nicholson (:bnicholson) from comment #3) > It's worth noting here that we already use nsIParserUtils.sanitize() to > strip unsafe elements. (In reply to Justin Dolske [:Dolske] from comment #4) > The transform it's doing should be orthogonal to security Note that if we ever officially plan to create a lib reusable outside FFx, that could become an issue. Eg. my readable-proxy[0] had to implement its own sanitization mechanism, and I bet other potential consumers would expect that we ensure sanitization natively as well. It's a little hard to ask lib users to handle basic security on their own, though that's still an option. At the very least that should he stated in blinking bold =) That raises the question of what are our plans atm for Readability now, and for the longer term. I'd personally tend to favor federating contributions from the outside world, and to support more use cases that our very owns, but that's obviously just an opinion. [0]: https://github.com/n1k0/readable-proxy/

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 6

•

11 years ago

(In reply to Nicolas Perriault (:NiKo`) — needinfo me if you need my attention from comment #5) > That raises the question of what are our plans atm for Readability now, and > for the longer term. I'd personally tend to favor federating contributions > from the outside world, and to support more use cases that our very owns, > but that's obviously just an opinion. I agree 100% with that long term plan. We may not be able to fully invest in that in the very near term, given the need to get this feature shipped, though.

Frederik Braun [:freddy]

Comment 7

•

11 years ago

Comment 3 answers the requested needinfo: We should use nsIParserUtils.sanitize() on top of what Readability gives us.

Flags: needinfo?(fbraun)

:Gijs (he/him)

Comment 8

•

9 years ago

(In reply to Frederik Braun [:freddyb] [unavailable June 4th to August 29th] from comment #7) > Comment 3 answers the requested needinfo: We should use > nsIParserUtils.sanitize() on top of what Readability gives us. We already do this now. Sandboxing is bug 1204818.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → WORKSFORME

Bugzilla

Use whitelist element and attribute filtering for reader mode

Categories

(Toolkit :: Reader Mode, defect, P5)

Tracking

()

People

(Reporter: mgoodwin, Unassigned)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Description

Updated

Updated

Updated

Updated

Updated

Updated

Updated

Updated

Comment 1

Updated

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8