Need better system for controlling content loading
Categories: Core :: DOM: Security, defect
People: Reporter: shaver; Unassigned
References: Blocks 1 open bug
Attachments: 2 obsolete files
We need a better way to do content-policy stuff than the system currently employed by the cookie (!) manager, including smaller, sharper interfaces and removal of the silly libimg->extension/cookie dependency. This bug is about the first part; once it's done we should file bugs regarding the conversion of existing image-blocking code to the new interfaces (assuming that the interfaces are adequate, or waiting until they're made so).

The patch attached below actually works, but wants fixing as per jband's comments in the .layout thread:
- use wstring for URLs
- lose the inline helper, because it ``hides'' NS_WITH_SERVICE, which will make some kinds of service-interface changes harder (I don't completely understand that, but I'll ponder it some more).

Steve thinks that cookie and image management are two faces of the same coin, but I disagree. I'll let him make his case here, though, before I make mine.
Reporter
Comment 1•24 years ago
Comment 2•24 years ago
I think we should take this patch, and use these interfaces, instead of making libraries that should not depend on one another at compile or link time do so. Steve, I'm willing to help if you're overloaded. What do you say? /be
Comment 3•24 years ago
I'm overloaded. The fix is not just to change the interface, but then to move all the cookie-testing code and image-testing code over to the new interface. That's the bulk of the work.
Reporter
Comment 4•24 years ago
Cookie stuff has nothing to do with whether a given object is loaded (rather, it's _how_ it's loaded, or what headers should be sent on an HTTP request), which is why I don't think it should be in the same interface. Having the cookie manager implement both interfaces seems like a way to share implementation details, if you think they have a lot in common.
Comment 5•24 years ago
Steve -- understood. I'll come up with some more patches for everyone to review and try out. /be
Reporter
Comment 6•24 years ago
Reporter
Comment 7•24 years ago
So if brendan is willing to pick up the conversion bits, can I get an r= for my existing patches?
Comment 8•24 years ago
I can give an r= as mozilla.org bigshot -- what else is needed? Are module owners being shy about granting or denying r=? /be
Reporter
Comment 9•24 years ago
Nobody has ``officially'' said approve or deny about the actual interfaces, though various people have mumbled encouragement, or deferred to Steve. I'll take your bigshot r=, though, since r= seems the exception rather than the rule these days anyway. =)
Comment 10•24 years ago
Hey, most of your patch is layout. And none of those owners are Cced in the bug. Approval if any is what you need from them. As I "mumbled" earlier, tons of goodness here.
Reporter
Comment 11•24 years ago
(Actually, I wasn't referring to you when I was talking about the mumbling, but no mind.) I posted to .layout, and nobody seemed to mind, so I'm going in. Thanks, everyone!
Comment 12•24 years ago
Here are my thoughts on how user-control of content loading could work, from the perspective of a user who values his privacy and bandwidth. (I'm posting here after I mailed some of this to Blake and he suggested I open a bugzilla account.)

First of all, unwanted content is not just images. With the advent of layers, frames and off-site javascript and stylesheets, there are many forms of embedded content that load automatically from a page, and it'd be a power user's dream to be able to configure his browser to explicitly allow or deny such content, by URL. And there, domain names are not enough; you may want to allow some URLs and disallow others within a single domain. So here's my idea:

* Have an ordered list of expressions to match URLs against; they can be regular expressions, or plain *-is-a-wildcard ones. Each list entry is associated with a number of flags with 3 possible values: "yes", "no" and "default". One of the flags would mean "don't load", but the system should be extensible so that later on someone can implement more precise features like "don't send referers to this server" or "restrict parts of the DOM from JS access" or whatever.
* Before queueing a URL for loading, see which expressions it matches, in order, collecting the flag values: find the first list entry that matches the URL and where the value for that flag is not "default". If no entries match, then we just use the global default for that flag. (Am I explaining myself..??)
* These flags then affect how mozilla loads and interprets the content; we could start with just one flag meaning "don't load".
* On the GUI side, configuration could just look like your average sorted list, with add, move-up, move-down, delete and properties buttons, where properties gives you all the flags associated with the pattern.

And here's a twist: I'd like this to work on the referer as well as on the URL; i.e. I can set the "do load" flag for an ad server when the referer matches a certain string, and then later set the "don't load" flag for that ad server with all referers... and the net result is that I'm *choosing* which sites I support by seeing ads, and which I don't. The existing cookie-blocking support could possibly be reimplemented as a flag under this scheme.
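The first-match flag lookup described above is easy to sketch. This is a hypothetical illustration only: the names (`decideFlag`, `globToRegExp`) and the rule shape are mine, not from any Mozilla interface; it assumes the plain *-is-a-wildcard pattern variant.

```javascript
// Turn a *-is-a-wildcard pattern into an anchored RegExp.
function globToRegExp(pattern) {
  // Escape regex metacharacters, then let '*' mean "any run of characters".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Walk the ordered rule list; the first rule that matches the URL and
// has a non-"default" value for the requested flag wins. If no rule
// decides, fall back to the global default for that flag.
function decideFlag(rules, url, flag, globalDefault) {
  for (const rule of rules) {
    if (globToRegExp(rule.pattern).test(url)) {
      const value = rule.flags[flag];
      if (value !== undefined && value !== "default") return value;
    }
  }
  return globalDefault;
}
```

With a rule list like `[{ pattern: "http://ads.example.com/*", flags: { load: "no" } }, { pattern: "http://*.example.com/*", flags: { load: "yes" } }]`, an ad-server URL is denied while other subdomains fall through to the second rule, matching the "ordered, first non-default match wins" behavior in the proposal.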
Reporter
Comment 13•24 years ago
People really seem to be missing this point, so I'll try and state it very clearly: This bug is about providing the _ability_ for Mozilla to control whether a given piece of content is loaded, not how the should-load-p policy is implemented. Mechanism, not policy, and all that.

The interface certainly supports other kinds of content blocking (and scripts, style sheets, applets and plugins are listed in the interface, though the hooks in the code haven't been added yet -- they're pretty trivial though, and I invite you to send a patch atop of mine).

Cookie: and Referer: blocking aren't part of this issue: those aren't about controlling whether a piece of content is loaded, they're about controlling what kinds of data are shared in the protocol exchange, and you can apparently use nsINetMgr hooks for filtering headers. Why do you need to know about the referrer, when you have the element and can walk up the document tree to get the base URL or what have you?
Comment 14•24 years ago
Ok Mike, ignore my previous e-mail message. I (think I) understand this now.
Comment 15•24 years ago
I like this, but are the APIs for this being considered in light of the scope of the broader bug 7380 issue?
Reporter
Comment 16•24 years ago
Well, given that there are no prefs involved, I think the answer to that question is ``no''. But then, I don't really know what it would mean to consider these APIs in that light. Matt, if you want to provide a policy component that does whatever it is you're thinking of, you'll be able to do that very shortly (as soon as I purge the last of the changes from my local tree).
Reporter
Comment 17•24 years ago
The last of the stuff in my posted patch is in the tree now, but I want to go and get my hands on <script>, <style>, <object>, <embed> and <applet> too. Others should feel free to run ahead, especially those who have chimed in telling me to do it. =)
Comment 18•24 years ago
Adding bug 36484 as dependent. If you think bug 28327 should also depend on this one, please add it.
Reporter
Comment 21•24 years ago
<script>/<embed>/<object>/<applet> are in the tree. Style is left, but I'm probably not going to get it soon. Someone could ape my patches and whip something up for <style> (don't forget <link>!), and I'd check it in.
Comment 22•24 years ago
Reassigning to brendan at his request
Updated•24 years ago
Comment 23•24 years ago
From a quick look at the interface in the second patch, it is not at all clear to me how to use it. Who would call it and where? Why the difference between load and process?
Reporter
Comment 24•24 years ago
It's called by layout already. You need to register in the appropriate category to be put in the veto-chain for content -- n.b. rayw's progid/catid whacking -- and then your implementation gets called for all embedded content. shouldLoad is for determining if out-of-band content should be fetched from the network. shouldProcess is as yet unused, but is intended to control processing of inline <script> tags and the like. (It's unused because I haven't found a good way to make it work with event handlers without strangling performance.)
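A toy model may help readers unfamiliar with the veto-chain idea described above. This is purely illustrative: the real mechanism is the nsIContentPolicy XPCOM interface registered through the category manager, and its methods take more (and differently typed) arguments than this sketch.

```javascript
// Toy model of a content-policy veto chain (NOT the real XPCOM API).
// Each registered policy's shouldLoad() may veto a load; the load
// proceeds only if every policy in the chain agrees.
function shouldLoadThroughChain(policies, contentType, url, context) {
  return policies.every(p => p.shouldLoad(contentType, url, context));
}

// A hypothetical policy object that blocks images from one host.
function makeImageBlocker(blockedHost) {
  return {
    shouldLoad(contentType, url, context) {
      if (contentType !== "IMAGE") return true; // only veto image loads
      return new URL(url).host !== blockedHost;
    },
  };
}
```

The design point the comment makes survives in the model: layout calls the chain before fetching out-of-band content, and any implementor that has registered itself gets a chance to veto, without layout knowing anything about the policy's storage or logic.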
Comment 25•24 years ago
I realize I'm coming in late in the game, but we already have several different mechanisms for storing per-site security/privacy/content control decisions: Steve's cookie/image manager, configurable DOM security policies, and now this. At some point, I think it would make sense to unify how we store this information.
Reporter
Comment 26•24 years ago
``This'' has no storage of privacy/security/content-control data. It is simply a mechanism by which something that _does_ store that sort of data can use its data to influence the loading of content. The hope was that we could remove some of the insane dependencies (imglib depending on the cookie manager, yay) related to content-policy decisions, but I don't know that it's helped. Brendan, are you really going to whack the cookie manager code to use these interfaces? (I've given up all hope for ever understanding what cookie management has to do with image loading policy.)
Comment 27•24 years ago
Shaver's whole point in designing this interface and implementing it for certain tags was to unify policy interfaces. I promised to help morse clean up the bad imglib=>cookie-manager dependency and I will get to it soon, after some major JS GC whackage (see mh buglist). Shaver, can you say more about event handler shouldProcess performance probs and possible solutions? /be
Comment 28•24 years ago
Mike, am I understanding your 2000-05-25 comment correctly in thinking that your fixes are element-dependent? That is, content policy stuff implemented separately for <script>, <style>, <object>, <embed>, <applet>, etc? If so, that's not going to be much use to me in an XML world of <foo>s and <bar>s which I want to filter out (or filter in, depending on whether I'm using a blacklist or a whitelist).
Reporter
Comment 29•24 years ago
(From 18352, where it don't belong:
> We also talked
> about the very thing Brendan mentioned in 28327 as part of Shaver's content
> policy hooks: a parameter which gives the type of URI load. I'm probably going
> to have to add this to CheckLoadURI to do some of the things i want to do with
> it.
I object to a URL-load-type parameter, if it's what I'm thinking of: ``this was
loaded in a mail window'' vs. ``this was loaded in a browser window''. It
shouldn't matter where I view the content -- imap: URLs should work just the
same in browser windows, for example. We need to check based on _how_ the
content was received, which I guess means the protocol. [Adding mscott to the
fray, for his URI-dispatching experience.] We might want to let protocols
signal their ``type'', within broad groups (mailnews, file transfer, (instant)
messaging, RPC, etc.) to avoid enumerating the types in the guts of a content
policy or script-security object.
Matt: I don't really understand how your <foo> and <bar> elements will be
loading external content. Can you describe the process to me? It's possible
that the implementations of those tag-elements will use existing code, like
nsHTMLImageFrame, or they might provide their own systems through XBL, in which
case they will need to check with the content policy system as well.
Comment 30•24 years ago
Shaver: I disagree. We had a mail/news security review a couple of months ago and the consensus was that the security model should be able to differentiate content loaded from mail or a browser window. Mail may be more security-sensitive because content arrives unsolicited, as opposed to a browser where the user must request pages or click links. In mail, the user has a slight degree less control over what gets loaded. Yes, we can split hairs about how users can simply not open mail from unknown addresses, et cetera, but the point is, we should have the ability to assign different security/privacy policies to mail and browsing, and that requires a load-type parameter.
Comment 31•24 years ago
Mike, if some future strain of XML has the ability to load external resources <foo data="bar://smeg.org/index.foo" />, like HTML (or XHTML, come to that) does now with IMG, OBJECT, STYLE, APPLET etc, then running those future elements through the content policies shouldn't require any new code on your part. And if that's the case, I can't understand why you're having to implement content policy support for HTML's IMG, OBJECT, STYLE, APPLET etc separately from each other in the first place. That's all.
Comment 32•24 years ago
Matthew, any code in layout that does the load (i.e. tells netwerk to fetch the resource) -- the code that knows your fictional attribute "data" is supposed to load that resource -- is where we'd have to add the call to the security/privacy checks. Ideally, the programmer writing the loader would also add that call. The problem that I am slightly concerned about is that he might forget it.
Reporter
Comment 33•24 years ago
Mitchell: you're talking about two different things, but I don't think you realize it.

1) "should be able to differentiate content loaded from mail or a browser window" (The use of ``loaded from'' is confusing: do you mean displayed in? If I click a link in a mail message, and that opens a browser window, is it loaded from a mail or browser window?)

2) "Mail may be more security-sensitive because content arrives unsolicited, as opposed to a browser where the user must request pages or click links."

My assertion is that the mechanism used to display content (point 1) shouldn't matter. If I middle-click on an attachment icon, the content should be handled, WRT security and privacy, in exactly the same way as it would be if I displayed it inline in my message pane, or single-clicked to put it there.

As far as point 2 goes, I agree completely. But I believe that ``how the content arrived'' is a function of protocol, not of the display mechanism. If you got the document -- perhaps indirectly, in the case of attached links -- via mailbox: or imap: or pop: or what-have-you, then it should be treated as ``mail'' content. If some site puts <iframe src="imap://foo/bar/baz"> in their web page, that imap-loaded content had damned well better be treated exactly the same as if I had opened the message.

I'll repeat myself, in the interests of clarity: I think that how you get the content is more important than where you're displaying it. ``mail'' is distinguished by the mechanisms used to fetch content, not the chrome: URL of the application containing the <browser> XUL element.

Matt: I still don't understand how the XML stuff magically causes content to be fetched over the network, in ways that aren't present now. I understand how HTML <img> (and XHTML <img>, you'll be happy to know) load their images, and we do the right thing with that code. Can you explain to me -- please use small words and the present tense, for I am a bear of very little brain -- how the mere sight of <foo> in a document will cause Mozilla to initiate network activity? If there are paths through the CSS or other layout code that can cause a URL load, and they're not content-policy-aware, please let me know so that I can fix them.
Comment 34•24 years ago
Shaver, how would you, concretely, check, if the content should be loaded? Say, you have the msg <imap_message://myserver/INBOX/msg123> with the body-content <html><body>bla bla <img src="http://tracker.evilcorp.com/img.png" alt="an image"></body></html> and the user loads the msg in the Mailnews msg pane by clicking on the item in the thread pane. Please note that a msg must not be able to access another msg on the same server. See also <news://news.mozilla.org/39DBE44F.5971FB09@bucksch.org> and followups.
Reporter
Comment 35•24 years ago
My DOM-fu isn't what it used to be, but the following pseudocode would suffice to stop the server hit:

function shouldLoadContent(element, URL) {
  var doc = element.getDocument();
  if (isMail(doc.location.protocol) && !isMail(URL.protocol))
    return false;
  return true;
}

The message-can't-access-other-message problem is already solved, I would hope, by the cross-domain access checks -- mail messages from different sources should have different principals. Mitch, am I on crack?
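The isMail() helper is left undefined in the pseudocode. Assuming, as the comment argues, that "mail-ness" is purely a function of the URL scheme, one plausible definition looks like this; the scheme list is my guess for illustration, not something from the patch:

```javascript
// Hypothetical isMail() helper: classifies a protocol as "mail" purely
// by its scheme, per the argument above. The scheme list is illustrative.
const MAIL_SCHEMES = new Set(["mailbox:", "imap:", "pop:", "pop3:", "news:"]);

function isMail(protocol) {
  return MAIL_SCHEMES.has(protocol.toLowerCase());
}
```

Under this definition, the pseudocode blocks the tracking image in the scenario from the previous comment: the document lives at an imap: URL (mail), the image at an http: URL (not mail), so shouldLoadContent returns false.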
Comment 36•24 years ago
Shaver, if I understood Mitch on .security correctly, the "same-origin" function is web-oriented in that it returns true if the protocol/server of the 2 URLs match. However, in the mail case, <imap://myserver/INBOX/msg123> has a different origin than <imap://myserver/INBOX/msg124>.
Reporter
Comment 37•24 years ago
Well, that seems like a pretty big, but separate, security issue. I'll have to catch up in .security to be sure, I guess.
Comment 38•24 years ago
Ben is correct - for mail protocols, each message is considered a distinct trust domain, so messages do not have access to other messages, or to http pages. See nsBasePrincipal::Equals() for the implementation.
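A toy model of that rule: web origins compare by scheme and host, while each mail message is its own trust domain. The real check is nsBasePrincipal::Equals() in C++; this sketch and its scheme list are illustrative assumptions, not the actual implementation.

```javascript
// Toy model of per-message trust domains: web URLs share an origin when
// scheme+host match; for mail-like URLs only an identical URL is "equal",
// so one message can never reach another (or an http page).
const MAIL_LIKE = new Set(["mailbox:", "imap:", "pop:", "news:"]);

function principalsEqual(urlA, urlB) {
  const a = new URL(urlA), b = new URL(urlB);
  if (MAIL_LIKE.has(a.protocol) || MAIL_LIKE.has(b.protocol)) {
    return urlA === urlB; // each message is its own trust domain
  }
  return a.protocol === b.protocol && a.host === b.host;
}
```

So msg123 and msg124 on the same server compare unequal, exactly the distinction Ben raised two comments up.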
Reporter
Comment 39•24 years ago
Right, and I misread what Ben wrote. All clear on this end, I think. =)
Updated•24 years ago
Comment 40•23 years ago
This is missing 0.9, and I've had my head down elsewhere for too long. Should someone take this away from me? If not, I'll try to get to it in 0.9.1. It'll get fixed for 1.0 or else. /be
Comment 41•23 years ago
Load-balancing. I'll still do this if no one else stands up and takes it away, but don't let me stop anyone! /be
Updated•23 years ago
Comment 42•23 years ago
I doubt I'll get to this during 0.9.4 -- anyone who wants to take it away, please jump in. /be
Comment 43•23 years ago
So what, exactly, is this bug waiting for? I would be glad to help out on it, but it isn't really clear what it is that needs to be done.
Comment 44•23 years ago
I still have no time for this. Someone please take it and do right. /be
Comment 45•23 years ago
Performance/footprint work has priority. Anyone motivated to take this bug from me, please step up. Otherwise I'll work on it in 0.9.8. /be
Comment 46•23 years ago
I am not going to have time for this before 1.0. Anyone motivated and able, please feel free to steal it from me, cc'ing me. /be
Updated•23 years ago
Comment 47•22 years ago
As a user, I find it extremely annoying, and frustrating, that this bug is blocking bug 28327. I think I'm not alone on this; look at the difference in votes. I don't care about some super-duper ultimately precise way of selectively disabling content loading. I just want to disable all automatic content loading from my mail reader. I don't consider the mail tool to be "just chrome". In fact, it bothers me that the mail tool is combined with the news tool. I have nothing against sharing code, but the tools (or tasks, if you insist) should be treated separately. I don't want my mail settings to be globbed together with my browser settings, for example. If this means I can't follow links from my email, then so be it.
Comment 48•22 years ago
Mail and news client readers should have the option of only automatically viewing the plain text portion of the message. Over 99% of the HTML news and e-mail I receive is spam, and the cycles used in rendering the images are a complete waste. The same is true for following links to download images: all such links received by me since I started using MOZILLA have been for spam. The addition of webbugs to the spam makes this a security hole.

Another problem is that of the few HTML e-mails I get that are not spam, these are generated by products that do not set the font or color information correctly, so the result is hard to read. (It seems the "other" browser automagically changes the bad fonts, so the problem does not occur as much.) HTML for web pages is fine; allow it to be totally turned off for news and mail.

By default, a mail and news reader should never communicate with anything other than the mail or news server. The end user should also be able to disable the automatic displaying of images or opening of any attachments from e-mail or news. It is not good to have some of the porn spam automatically displayed. It is convenient to use MOZILLA for reading plain text news and mail, but if this can not be fixed, I will be forced to use a different mail and news client. This should be a simple matter of providing some check boxes to inhibit some functions.
Comment 49•22 years ago
John, The feature you describe is in development. See bug 30888.
Comment 50•22 years ago
Yes, bug 30888 has almost got it right for the rendering of messages. The only thing is that there is nothing to let the receiver know that there is HTML content in the e-mail, should they choose to render it in that mode. HTML appears to be an attachment, so it seems logical that its presence should be indicated in the attachment box.
Comment 51•22 years ago
I'm a terrible owner for this bug. So I'm giving it to nobody@mozilla.org, and leaving TM set to Future. No one can say I was squatting on it, scaring off new volunteers.... Sorry I didn't have time or energy for it. /be
Comment 52•21 years ago
ccing people who've been working on this area of code lately. Brendan, what's the state of this bug? What remains to be done here? How does mvl's recent work affect this bug?
Comment 53•21 years ago
Bug 191839 is touching some of content policy. One of my goals in that bug is to add new callers as appropriate. Comments/direction welcome. Oh, and if anyone can comment in there (or privately to me) what CONTROL_TAG is supposed to cover besides meta-refresh, please do. I'd also like to know if RAW_URL is necessary, given that we have OTHER.
Comment 54•20 years ago
Shaver, how's the api just landed for bug 191839 look? Mailnews should have its own content policy impl that can block all sorts of things based on prefs as desired, at this point.
Updated•15 years ago
Comment 55•6 years ago
This bug appears to be about Content Policy, which seems off-topic for Core/Layout.
Assignee
Updated•5 years ago
Comment 56•5 years ago
Should block bug 565512.
Comment 57•4 years ago
Moving this content security related bug to DOM:Security.
Comment 58•4 years ago
(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #0)
> We need a better way to do content-policy stuff than the system currently
> employed by the cookie (!) manager, including smaller, sharper interfaces
> and removal of the silly libimg->extension/cookie dependency. This bug is
> about the first part;

Not reading everything here, but I am left wondering: has this "first part" been fixed yet?

> once it's done we should file bugs regarding the conversion of existing
> image-blocking code to the new interfaces (assuming that the interfaces are
> adequate, or waiting until they're made so). The patch attached below
> actually works, but which wants fixing as per jband's comments in the
> .layout thread:

So, there is a working patch? Can this bug be closed?
Comment 59•4 years ago
Looking at the first patch, this is when Shaver invented nsIContentPolicy, which has existed for years and has been used by all the original legacy ad-blockers and things like NoScript.