Closed
Bug 794958
Opened 12 years ago
Closed 8 years ago
Use whitelist element and attribute filtering for reader mode
Categories
(Toolkit :: Reader Mode, defect, P5)
Toolkit
Reader Mode
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: mgoodwin, Unassigned)
References
(Blocks 1 open bug)
Details
Issue:
Currently, reader mode uses a blacklist for the element and attribute filtering. In the interest of making the protection afforded by this filtering more robust, it'd be desirable to replace this with a whitelist (list of known good) filter.
See https://wiki.mozilla.org/Security/Reviews/ReaderMode for more informaion.
Updated•12 years ago
|
OS: Mac OS X → Android
Priority: -- → P1
Hardware: x86 → All
Updated•12 years ago
|
Assignee: lucasr.at.mozilla → nobody
Updated•11 years ago
|
Updated•10 years ago
|
Component: Reader View → Reader Mode
Product: Firefox for Android → Toolkit
Updated•10 years ago
|
Flags: qe-verify?
Flags: firefox-backlog+
Comment 1•10 years ago
|
||
mgoodwin/bnicholson, can you help me understand what would go into fixing this bug?
Is the main concern here that pages could invent their own tags with garbage, and we wouldn't end up removing those?
Looking at our current Readability.js, I see us stripping out tags that we definitely don't want (e.g. "object", "iframe"), and then conditionally stripping other types of tags (e.g. "p", "img") depending on the result of that first pass, as well as some other heuristics about those elements.
To implement a whitelist approach, I imagine we would keep that first pass to get rid of definitely unwanted junk, and then we would look for known good elements, such as "p" and "img", etc.
I'm worried that this change would be a large fundamental change to how we do our readability parsing, and it seems like this could be risky for Fx38.
I'm wondering how much this is actually an issue in practice. Yes, we don't want pages working against us to "fool" the parser, but couldn't a page just put its "unwanted" content inside an element that we think looks good? I think that no matter what we do, we still need heuristics to analyze what's actually inside these elements.
Flags: needinfo?(mgoodwin)
Flags: needinfo?(bnicholson)
Updated•10 years ago
|
Points: --- → 3
Reporter | ||
Comment 2•10 years ago
|
||
(In reply to :Margaret Leibovic from comment #1)
> mgoodwin/bnicholson, can you help me understand what would go into fixing
> this bug?
> Is the main concern here that pages could invent their own tags with
> garbage, and we wouldn't end up removing those?
Yes, or new tags, or existing things in ways that we don't currently envisage (it's difficult to enumerate badness).
> To implement a whitelist approach, I imagine we would keep that first pass
> to get rid of definitely unwanted junk, and then we would look for known
> good elements, such as "p" and "img", etc.
Yes, that sounds sensible.
>
> I'm worried that this change would be a large fundamental change to how we
> do our readability parsing, and it seems like this could be risky for Fx38.
>
> I'm wondering how much this is actually an issue in practice. Yes, we don't
> want pages working against us to "fool" the parser, but couldn't a page just
> put its "unwanted" content inside an element that we think looks good? I
> think that no matter what we do, we still need heuristics to analyze what's
> actually inside these elements.
This bug was less about reader mode being fooled into displaying different content and more about preventing reader mode XSS; we've had bugs of this kind before (see bug 778582) so the more we can do to reduce the likelihood and impact, the better.
I think freddy has (casually) been looking into reader mode stuff recently; perhaps he has more to add?
Flags: needinfo?(mgoodwin) → needinfo?(fbraun)
Comment 3•10 years ago
|
||
It's worth noting here that we already use nsIParserUtils.sanitize() to strip unsafe elements. The filtering that happens in Readability.js maybe helps to remove any other safe elements we don't want people to see in the article, but I'm not sure it's valuable for much else.
I wouldn't think this would be a high priority bug unless we don't trust our built-in sanitization for some reason. And if that were the case, I think we'd have bigger problems!
Flags: needinfo?(bnicholson)
Comment 4•10 years ago
|
||
Updating stale priorities to reflect priority in current reading list / reader view work.
Comment 3 makes sense to me. I'd further posit that we shouldn't be relying on Readability to handle security issues, as it's a complex piece of very heuristicy code. The transform it's doing should be orthogonal to security (ie, security should happen at another layer or in a sandbox).
OS: Android → All
Priority: P1 → P5
(In reply to Brian Nicholson (:bnicholson) from comment #3)
> It's worth noting here that we already use nsIParserUtils.sanitize() to
> strip unsafe elements.
(In reply to Justin Dolske [:Dolske] from comment #4)
> The transform it's doing should be orthogonal to security
Note that if we ever officially plan to create a lib reusable outside FFx, that could become an issue. Eg. my readable-proxy[0] had to implement its own sanitization mechanism, and I bet other potential consumers would expect that we ensure sanitization natively as well. It's a little hard to ask lib users to handle basic security on their own, though that's still an option. At the very least that should he stated in blinking bold =)
That raises the question of what are our plans atm for Readability now, and for the longer term. I'd personally tend to favor federating contributions from the outside world, and to support more use cases that our very owns, but that's obviously just an opinion.
[0]: https://github.com/n1k0/readable-proxy/
Comment 6•10 years ago
|
||
(In reply to Nicolas Perriault (:NiKo`) — needinfo me if you need my attention from comment #5)
> That raises the question of what are our plans atm for Readability now, and
> for the longer term. I'd personally tend to favor federating contributions
> from the outside world, and to support more use cases that our very owns,
> but that's obviously just an opinion.
I agree 100% with that long term plan. We may not be able to fully invest in that in the very near term, given the need to get this feature shipped, though.
Comment 7•10 years ago
|
||
Comment 3 answers the requested needinfo: We should use nsIParserUtils.sanitize() on top of what Readability gives us.
Flags: needinfo?(fbraun)
Comment 8•8 years ago
|
||
(In reply to Frederik Braun [:freddyb] [unavailable June 4th to August 29th] from comment #7)
> Comment 3 answers the requested needinfo: We should use
> nsIParserUtils.sanitize() on top of what Readability gives us.
We already do this now. Sandboxing is bug 1204818.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•