Closed Bug 37983 Opened 24 years ago Closed 4 years ago

Need better system for controlling content loading

Categories

(Core :: DOM: Security, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME
Future

People

(Reporter: shaver, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 obsolete files)

We need a better way to do content-policy stuff than the system currently
employed by the cookie (!) manager, including smaller, sharper interfaces and
removal of the silly libimg->extension/cookie dependency.

This bug is about the first part; once it's done we should file bugs regarding
the conversion of existing image-blocking code to the new interfaces (assuming
that the interfaces are adequate, or waiting until they're made so).

The patch attached below actually works, but which wants fixing as per jband's
comments in the .layout thread:
- use wstring for URLs
- lose the inline helper, because it ``hides'' NS_WITH_SERVICE, which will make
some kinds of service-interface changes harder (I don't understand that,
completely, but I'll ponder it some more).

Steve thinks that cookie and image management are two faces of the same coin,
but I disagree.  I'll let him make his case here, though, before I make mine.
I think we should take this patch, and use these interfaces, instead of making 
libraries that should not depend on one another at compile or link time do so.  
Steve, I'm willing to help if you're overloaded.  What do you say?

/be
I'm overloaded.  The fix is not just to change the interface, but then to move 
all the cookie-testing code and image-testing code over to the new interface.  
That's the bulk of the work.
Cookie stuff has nothing to do with whether a given object is loaded (rather,
it's _how_ it's loaded, or what headers should be sent on an HTTP request),
which is why I don't think it should be in the same interface.  Having the
cookie manager implement both interfaces seems a way to share implementation
details, if you think they share a lot in common.
Steve -- understood.  I'll come up with some more patches for everyone to review
and try out.

/be
So if brendan is willing to pick up the conversion bits, can I get an r= for my
existing patches?
Status: NEW → ASSIGNED
I can give an r= as mozilla.org bigshot -- what else is needed?  Are module
owners being shy about granting or denying r=?

/be
Nobody has ``officially'' said approve or deny about the actual interfaces,
though various people have mumbled encouragement, or deferred to Steve.

I'll take your bigshot r=, though, since r= seems the exception rather than the
rule these days anyway. =)
Hey, most of your patch is layout. And none of those owners are Cced in the 
bug. Approval if any is what you need from them. As I "mumbled" earlier, tons 
of goodness here.
(Actually, I wasn't referring to you when I was talking about the mumbling, but
no mind.)

I posted to .layout, and nobody seemed to mind, so I'm going in.  Thanks,
everyone!
Here are my thoughts on how user-control of content loading could work, from the
perspective of a user who values his privacy and bandwith.  (I'm posting here
after I mailed some of this to Blake and he suggested I open a bugzilla
account).

First of all, unwanted content is not just images.  With the
advent of layers, frames and off-site javascript and stylesheets, there
are many forms of embedded content that load automatically from a
page, and it'd be a power user's dream to be able to configure his
browser to explicitly allow or deny such content, by URL.  And there,
domain names are not enough; you may want to allow some URLs and
disallow others within a single domain.

So here's my idea:

* have an ordered list of expressions to match URLs against; they can be regular
expressions, or plain *-is-a-wildcard ones.  Each list entry is associated with
a number of flags with 3 possible values: "yes", "no" and "default". one of the
flags would mean "don't load", but the system should be extensible so that later
on someone can implement more precise features like "don't send referers to this
server" or "restrict parts of the DOM from JS access" or whatever.

* before queueing a URL for loading, see which expressions it matches, in order,
collecting the flags values, by finding the first list entry that matches the
URL and where the value for that flag is not 'default'.  If no URLs match, then 
we just use the global default for that flag.  (am I explaining myself..??)

* these flags then affect how mozilla loads and interprets the content; we could
start with just one flag meaning "don't load".

* on the GUI side, configuration could just look like your average sorted list,
with add, move-up, move-down, delete and properties buttons, where properties
gives you all the flags associated with the pattern.

and here's a twist: I'd like this to work on the referer as well as on the URL;
i.e I can set the "do load" flag for an ad server when the referer matches a
certain string, and then later set the "don't load" flag for that ad server with
all referers... and the net result is that I'm *choosing* which sites I support
by seeing ads, and which I don't.

The existing cookie-blocking support could possibly be reimlplemented as a flag
under this scheme.
People really seem to be missing this point, so I'll try and state it very
clearly:

This bug is about providing the _ability_ for Mozilla to control whether a given
piece of content is loaded, not how the should-load-p policy is implemented. 
Mechanism, not policy, and all that.

The interface certainly supports other kinds of content blocking (and scripts,
style sheets, applets and plugins are listed in the interface, though the hooks
in the code haven't been added yet -- they're pretty trivial though, and I
invite you to send a patch atop of mine).

Cookie: and Referer: blocking aren't part of this issue: those aren't about
controlling whether a piece of content is loaded, they're about controlling what
kinds of data is shared in the protocol exchange, and you can apparently use
nsINetMgr hooks for filtering headers.

Why do you needed to know about the referrer, when you have the element and can
walk up the document tree to get the base URL or what have you?
Ok Mike, ignore my previous e-mail message. I (think I) understand this now.
OS: Linux → All
Hardware: PC → All
I like this, but are the APIs for this being considered in light scope of the
broader bug #7380 issue?
Well, given that there are no prefs involved, I think the answer to that
question is ``no''.  But then, I don't really know what it would mean to
consider these APIs in that light.  Matt, if you want to provide a policy
component that does whatever it is you're thinking of, you'll be able to do that
very shortly (as soon as I purge the last of the changes from my local tree).
The last of the stuff in my posted patch is in the tree now, but I want to go
and get my hands on <script>, <style>, <object>, <embed> and <applet> too.

Others should feel free to run ahead, especially those who have chimed in
telling me to do it. =)
Adding bug 36484 as dependant.
If you think, bug 28327 should also depend on this one, please add it.
Depends on: 36484
One moment please, fixing dependencies ...
No longer depends on: 36484
... done
Blocks: ImgInMail, 36484
<script>/<embed>/<object>/<applet> are in the tree.

Style is left, but I'm probably not going to get it soon.  Someone could ape my
patches and whip something up for <style> (don't forget <link>!), and I'd check
it in.
Reassigning to brendan at his request
Assignee: shaver → brendan
Status: ASSIGNED → NEW
Priority: P3 → P2
Target Milestone: --- → M17
Status: NEW → ASSIGNED
Target Milestone: M17 → M18
From a quick look at the interface in the second patch, it is not at all clear
to me how to use it. Who would call it and where? Why the difference between
load and process?
It's called by layout already.

You need to register in the appropriate category to be put in the veto-chain for
content -- n.b. rayw's progid/catid whacking -- and then your implementation
gets called for all embedded content.

shouldLoad is for determining if out-of-band content should be fetched from the
network.  shouldProcess is as yet unused, but is intended to control processing
of inline <script> tags and the like.  (It's unused because I haven't found a
good way to make it work with event handlers without strangling performance.)
I realize I'm coming in late in the game, but we already have several different 
mechanisms for storing per-site security/privacy/content control decisions: 
Steve's cookie/image manager, configurable DOM security policies, and now this. 
At some point, I think it would make sense to unify how we store this 
information.
``This'' has no storage of privacy/security/content-control data.  It is simply
a mechanism by which something that _does_ store that sort of data can use its
data to influence the loading of content.

The hope was that we could remove some of the insane dependencies (imglib
depending on the cookie manager, yay) related to content-policy decisions, but I
don't know that it's helped.  Brendan, are you really going to whack the cookie
manager code to use these interfaces?  (I've given up all hope for ever
understanding what cookie management has to do with image loading policy.)

Shaver's whole point in designing this interface and implementing it for certain
tags was to unify policy interfaces.  I promised to help morse clean up the bad
imglib=>cookie-manager dependency and I will get to it soon, after some major JS
GC whackage (see mh buglist).

Shaver, can you say more about event handler shouldProcess performance probs and
possible solutions?

/be
Mike, am I understanding your 2000-05-25 comment correctly in thinking that your 
fixes are element-dependent? That is, content policy stuff implemented separately 
for <script>, <style>, <object>, <embed>, <applet>, etc?

If so, that's not going to be much use to me in an XML world of <foo>s and <bar>s 
which I want to filter out (or filter in, depending on whether I'm using a 
blacklist or a whitelist).
(From 18352, where it don't belong:
> We also talked
> about the very thing Brendan mentioned in 28327 as part of Shaver's content
> policy hooks: a parameter which gives the type of URI load. I'm probably going
> to have to add this to CheckLoadURI to do some of the things i want to do with
> it.

I object to a URL-load-type parameter, if it's what I'm thinking of: ``this was
loaded in a mail window'' vs. ``this was loaded in a browser window''.  It
shouldn't matter where I view the content -- imap: URLs should work just the
same in browser windows, for example.  We need to check based on _how_ the
content was received, which I guess means the protocol.  [Adding mscott to the
fray, for his URI-dispatching experience.]  We might want to let protocols
signal their ``type'', within broad groups (mailnews, file transfer, (instant)
messaging, RPC, etc.) to avoid enumerating the types in the guts of a content
policy or script-security object.

Matt: I don't really understand how your <foo> and <bar> elements will be
loading external content.  Can you describe the process to me?  It's possible
that the implementations of those tag-elements will use existing code, like
nsHTMLImageFrame, or they might provide their own systems through XBL, in which
case they will need to check with the content policy system as well.
Shaver: I disagree. We had a mail/news security review a couple of months ago
and the consensus was that the security model should be able to differentiate
content loaded from mail or a browser window. Mail may be more
security-sensitive because content arrives unsolicited, as opposed to a browser
where the user must request pages or click links. In mail, the user has a slight
degree less control over what gets loaded. Yes, we can split hairs about how
users can simply not open mail from unknown addresses, et cetera, but the point
is, we should have the ability to assign different security/privacy policies to
mail and browsing, and that requires a load-type parameter.
Mike, if some future strain of XML has the ability to load external resources
<foo data="bar://smeg.org/index.foo" />, like HTML (or XHTML, come to that) does 
now with IMG, OBJECT, STYLE, APPLET etc, then running those future elements 
through the content policies shouldn't require any new code on your part. And if 
that's the case, I can't understand why you're having to implement content policy 
support for HTML's IMG, OBJECT, STYLE, APPLET etc separately from each other in 
the first place. That's all.
Matthew, any code in layout does the load (tells netwerk to fetch the resource),
i.e. knows that your fictional attribute "data" is supposed to load that
resource, and this code is where we'd have to add the call to the
security/privacy checks. Ideally, the programmer writing the loader would also
add that call. The problem that I am slightly concerned about is that he might
forget it.
Mitchell: you're talking about two different things, but I don't think you
realize it.

1) "should be able to differentiate content loaded from mail or a browser
window"

(The use of ``loaded from'' is confusing: do you mean displayed in?  If I click
a link in a mail message, and that opens a browser window, is it loaded from a
mail or browser window?)

2) "Mail may be more security-sensitive because content arrives unsolicited, 
   as opposed to a browser where the user must request pages or click links."

My assertion is that the mechanism used to display content (point 1) shouldn't
matter.  If I middle-click on an attachment icon, the content should be handled,
WRT security and privacy, in exactly the same way as it would be if I displayed
it inline in my message pane, or single clicked to put it there.

As far as point 2 goes, I agree completely.  But I believe that ``how the
content arrived'' is a function of protocol, not of the display mechanism.  If
you got the document -- perhaps indirectly, in the case of attached links -- via
mailbox: or imap: or pop: or what-have-you, then it should be treated as
``mail'' content.  If some site puts <iframe src="imap://foo/bar/baz"> in their
web page, that imap-loaded content had damned well better be treated exactly the
same as if I had opened the message.

I'll repeat myself, in the interests of clarity: I think that how you get the
content is more important than where you're displaying it.  ``mail'' is
distinguished by the mechanisms used to fetch content, not the chrome: URL of
the application containing the <browser> XUL element.

Matt: I still don't understand how the XML stuff magically causes content to be
fetched over the network, in ways that aren't present now.  I understand how
HTML <img> (and XHTML <img>, you'll be happy to know) load their images, and we
do the right thing with that code.  Can you explain to me -- please use small
words and the present tense, for I am a bear of very little brain -- how the
mere sight of <foo> in a document will cause Mozilla to initiate network
activity?  If there are paths through the CSS or other layout code that can
cause a URL load, and they're not content-policy-aware, please let me know so
that I can fix them.
Shaver, how would you, concretely, check, if the content should be loaded? Say,
you have the msg <imap_message://myserver/INBOX/msg123> with the body-content

<html><body>bla bla <img src="http://tracker.evilcorp.com/img.png"
alt="an image"></body></html>

and the user loads the msg in the Mailnews msg pane by clicking on the item in
the thread pane. Please note that a msg must not be able to access another msg
on the same server.

See also <news://news.mozilla.org/39DBE44F.5971FB09@bucksch.org> and followups.
My DOM-fu isn't what it used to be, but the following pseudocode would suffice
to stop the server hit:

function shouldLoadContent(element, URL) {
  var doc = element.getDocument();
  if (isMail(doc.location.protocol) && !isMail(URL.protocol))
     return false;
  return true;
}

The message-can't-access-other-message problem is already solved, I would hope,
by the cross-domain access checks -- mail messages from different sources should
have different principals.  Mitch, am I on crack?
Shaver, if I understood Mitch on .security correctly, the "same-origin" function
is web-oriented in that returns true, if the protocol/server of the 2 URLs
match. However, in the mail case, <imap://myserver/INBOX/msg123> has a different
origin than <imap://myserver/INBOX/msg124>.
Well, that seems like a pretty big, but separate, security issue.

I'll have to catch up in .security to be sure, I guess.
Ben is correct - for mail protocols, each message is considered a distinct trust
domain, so messages do not have access to other messages, or to http pages. See
nsBasePrincipal::Equals() for the implementation.
Right, and I misread what Ben wrote.  All clear on this end, I think. =)
Target Milestone: M18 → mozilla0.9
This is missing 0.9, and I've had my head down elsewhere for too long.  Should
someone take this away from me?  If not, I'll try to get to it in 0.9.1.  It'll
get fixed for 1.0 or else.

/be
Keywords: mozilla1.0
Target Milestone: mozilla0.9 → mozilla0.9.1
Load-balancing.  I'll still do this if no one else stands up and takes it away,
but don't let me stop anyone!

/be
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Target Milestone: mozilla0.9.2 → mozilla0.9.3
I doubt I'll get to this during 0.9.4 -- anyone who wants to take it away,
please jump in.

/be
Target Milestone: mozilla0.9.3 → mozilla0.9.5
So what, exactly, is this bug waiting for? I would be glad to help out on it,
but it isn't really clear what it is that needs to be done.
Keywords: patch
I still have no time for this.  Someone please take it and do right.

/be
Target Milestone: mozilla0.9.5 → mozilla0.9.7
Performance/footprint work has priority.  Anyone motivated to take this bug from
me, please step up.  Otherwise I'll work on it in 0.9.8.

/be
Target Milestone: mozilla0.9.7 → mozilla0.9.8
I am not going to have time for this before 1.0.  Anyone motivated and able,
please feel free to steal it from me, cc'ing me.

/be
Keywords: mozilla1.0
Target Milestone: mozilla0.9.8 → Future
Attachment #8232 - Attachment is obsolete: true
Attachment #8268 - Attachment is obsolete: true
Keywords: patch
As a user, I find it extremely annoying, and frustrating, that this bug is
blocking bug 28327.  I think I'm not alone on this. Look at the difference in
votes.  I don't care about some super-duper ultimately precise way of
selectively disabling content loading.

I just want to disable all automatic content loading from my mail reader.
I don't consider the mail tool to be "just chrome". In fact, it bothers me that
the mail tool is combined with the news tool. I have nothing against sharing
code, but the tools (or tasks, if you insist) should be treated separately.

I don't want my mail settings to be globbed together with the my browser
settings, for example. If this means I can't follow links from my email, then so
be it.
Mail and news client readers should have the option of only automatically
viewing the plain text portion of the message.

Over 99% of the HTML news and e-mail I receive is spam, and the cycles used in
rendering the images are a complete waste.  The same is true for following links
to download images.  All such links received by me since starting using MOZILLA
have been for spam.

The addition of webbugs to the spam makes this a security hole.

Another problem is that of the few HTML e-mails I get that are not spam, these
are generated by products that do not set the Font or color information
correctly, so the result is hard to read.  (It seems the "other" browser
automagically changes the bad fonts, so the problem does not occur as much 

HTML for web pages is fine.  Allow it to be totally turned off for news and
mail.  And by default, a mail and news reader should never communicate with
anything other than the mail or news server.

The end user should also be able to disable the automatic displaying of images
or opening any attachments from  e-mail or news.

It is not good to have some of the porn spam automatically displayed.

It is convenient to use MOZILLA for reading plain text news and mail, but if
this can not be fixed, I will be forced to use a different mail and news client.

This should be a simple matter of providing some check boxes to inhibit some
functions.
John,
    The feature you describe is in development. See bug 30888.
Blocks: 188476
Yes bug 30888, it has almost got it right for the rendering of messages.

The only thing is that there is nothing to let the receiver know that there is
HTML content in the e-mail, should they choose to render it in that mode.

HTML appears to be an attachment, so it seems logical that it's presence should
be indicated in the attachment box.

I'm a terrible owner for this bug.  So I'm giving it to nobody@mozilla.org, and
leaving TM set to Future.  No one can say I was squatting on it, scaring off new
volunteers....  Sorry I didn't have time or energy for it.

/be
Assignee: brendan → nobody
Status: ASSIGNED → NEW
ccing people who've been working on this area of code lately.

Brendan, what's the state of this bug?  What remains to be done here?  How does
mvl's recent work affect this bug?
Bug 191839 is touching some of content policy.  One of my goals in that bug is
to add new callers as appropriate.  Comments/direction welcome.

Oh, and if anyone can comment in there (or privately to me) what CONTROL_TAG is
supposed to cover besides meta-refresh, please do.  I'd also like to know if
RAW_URL is necessary, given that we have OTHER.
Depends on: 191839
Shaver, how's the api just landed for bug 191839 look?

Mailnews should have its own content policy impl that can block all sorts of
things based on prefs as desired, at this point.
QA Contact: chrispetersen → layout
This bug appears to be about Content Policy which seems off-topic for Core/Layout.
Component: Layout → DOM
Component: DOM → DOM: Core & HTML

Should block bug 565512.

Moving this content security related bug to DOM:Security.

Component: DOM: Core & HTML → DOM: Security
Priority: P2 → --

(In reply to Mike Shaver (:shaver -- probably not reading bugmail closely) from comment #0)

We need a better way to do content-policy stuff than the system currently
employed by the cookie (!) manager,

including smaller, sharper interfaces and

removal of the silly libimg->extension/cookie dependency.

This bug is about the first part;

Not reading every thing here, but I am left wondering:

Has this "first part" been fixed yet?

once it's done we should file bugs

regarding
the conversion of existing image-blocking code to the new interfaces
(assuming
that the interfaces are adequate, or waiting until they're made so).

The patch attached below actually works, but which wants fixing as per
jband's
comments in the .layout thread:

So, there is a working patch?

Can this bug be closed?

Looking at the first patch this is when Shaver invented nsIContentPolicy, which has existed for years and has been used by all the original legacy ad-blockers and things like NoScript.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: