Open Bug 78104 Opened 23 years ago Updated 2 years ago

[RFE]block images by directory or by regexp/pattern

Categories: Core :: Graphics: Image Blocking (enhancement)
Tracking: Future
People: Reporter: the_great_spam_bin; Assignee: Unassigned
References: Blocks 1 open bug
Details: Keywords: helpwanted
Attachments: 3 files, 3 obsolete files

It would be really cool if, in addition to being able to block all images
originating from a given server, Mozilla could block all images from URLs
matching a given pattern. On many sites, ad images originate from the same
server as the non-ad images, so when the user tries to block the ad images, he
or she ends up blocking the non-ad images as well. It would be nice if Mozilla,
like iCab, could be set to block images originating from, for example, all URLs
containing the pattern "/Ads/", thereby allowing the user to block only the ad images on
the image server. This feature could also be useful if the ad banners are using
Akamai to serve their images because although the originating server is an
Akamai server, the URL still often contains a pattern matching the name of the
ad provider. It would be even cooler, though not absolutely necessary, if this
matching supported regular expressions.
*** Bug 78105 has been marked as a duplicate of this bug. ***
Marking NEW.
Assignee: mstoltz → pavlov
Status: UNCONFIRMED → NEW
Component: Security: General → ImageLib
Ever confirmed: true
OS: Mac System 9.x → All
QA Contact: ckritzer → tpreston
Hardware: Macintosh → All
Summary: Pattern-matching based image blocking → [RFE] Pattern-matching based image blocking
I agree that this would be a cool feature and it can probably be worked into
nsIContentPolicy, which we will hopefully be using for configurable content
filtering.
is there a bug for making nsIContentPolicy actually work anywhere?  There is
code in the imageframe that is #if 0'd out since I had no way of testing it...
but it should allow image blocking to work.
Target Milestone: --- → Future
What about having this read an external file, say like those used by
Junkbusters? Might save on the UI work if it was a parse/load/match issue as
opposed to that plus maintaining the associated files and UI support. An "import
block file" or something, maybe?


 
I would love to add that feature. It's on my list. I'll take this bug.
Assignee: pavlov → mstoltz
While it'd be cool if one could block a domain (or a directory of it), certain
machines under it should still be allowed to display images. 

Block "*.akamai.net" (99% ****), but allow "a1964.g.akamai.net" (the 5th wave)
and "a772.g.akamai.net" (apple) to show.
This should be implemented with regexp matching, preferably.

BLOCK ^ad\.
BLOCK /ads/
ALLOW foo.com/ads/
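The BLOCK/ALLOW list above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: the comment does not say what the patterns are matched against or how conflicts resolve, so the sketch assumes they are ECMAScript regexps searched in the URL with its scheme removed, and that any matching ALLOW rule overrides every matching BLOCK rule. `Rule`, `IsBlocked` and `ExampleRules` are hypothetical names.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical representation of the BLOCK/ALLOW syntax proposed above.
enum class Action { Block, Allow };

struct Rule {
  Action action;
  std::regex pattern;  // ECMAScript grammar, the same dialect as JavaScript
};

// Assumption: `target` is the URL without its scheme ("host/path"), so an
// anchored pattern such as ^ad\. applies to the start of the host name.
// Assumption: a matching ALLOW rule always wins over a matching BLOCK rule.
bool IsBlocked(const std::vector<Rule>& rules, const std::string& target) {
  bool blocked = false;
  for (const Rule& r : rules) {
    if (!std::regex_search(target, r.pattern)) continue;
    if (r.action == Action::Allow) return false;  // explicit exception
    blocked = true;
  }
  return blocked;
}

// The three example rules from the comment. The dot in foo.com is escaped
// here so that it matches only a literal dot.
std::vector<Rule> ExampleRules() {
  return {
      {Action::Block, std::regex(R"(^ad\.)")},
      {Action::Block, std::regex(R"(/ads/)")},
      {Action::Allow, std::regex(R"(foo\.com/ads/)")},
  };
}
```

With these rules, "ad.example.com/banner.gif" and "example.com/ads/x.gif" are blocked, while "foo.com/ads/x.gif" is let through by the ALLOW exception. A "last matching rule wins" policy would be an equally plausible reading of the proposal; it would simply make the order of the lines significant.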

Cookie blocking/allowing also needs a regexp hookup. I think there's a bug for
that somewhere. Adding this to junkbuster feature tracker. No chance of this
pre-1.0?
Blocks: 91783
Keywords: helpwanted
Does this approach have to be specific to blocking images, or could it also be
used to block any http request matching a given expression?
Blocks: 91785
There are plenty of URLs for JavaScript code that loads ad banners; if you block
the URL of the JavaScript, then you block the ad banners completely (much
cleaner than blocking just the images).  If a list of URLs to block is
maintained, this lets you keep JavaScript on but block (much of) the stuff you
don't want.
A very nice implementation would be something like AdShield for IE.
See http://www.adshield.org/guide/Guide.htm#suppress for screenshots.
Right now, image and cookie permissions are stored in the same file in the
profile directory -- cookperm.txt. According to nsPermissions.cpp, which reads
from and writes to that file, the current format of the file is:

host \t permission \t permission ... \n

When an image or cookie is blocked, the host name is extracted from the URL and
entries in the file are updated. Instead of creating a new file that contains
image blocking patterns, I propose that we modify this format to be:

pattern \t permission \t permission ... \n

where 'pattern' is matched against the entire URL, not just the host name.

Under this new scheme, we may even be able to support entries of the old form by
simply treating the 'host' string as a pattern to be found in the URL. If we did
it this way, we would retain backward compatibility with everyone's cookperm.txt
and we would be able to put off changing the current cookie/image permissions ui.
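To make the proposed format concrete, here is a minimal C++ sketch of reading one line of it. The permission encoding is inferred from sample entries posted later in this bug (such as "x10.com 0F 1F") and is an assumption: one type digit followed by T or F. `PermEntry` and `ParsePermLine` are hypothetical names, not the ones used in nsPermissions.cpp.

```cpp
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one line of cookperm.txt.
struct PermEntry {
  std::string pattern;                      // host, wildcard, or /regexp/
  std::vector<std::pair<int, bool>> perms;  // (permission type, allowed?)
};

// Parses "pattern \t permission \t permission ...". The trailing newline is
// expected to be stripped already. Assumes each permission is a type digit
// followed by 'T' or 'F' (e.g. "1F"); tokens of any other shape are skipped.
std::optional<PermEntry> ParsePermLine(const std::string& line) {
  std::istringstream in(line);
  PermEntry entry;
  if (!std::getline(in, entry.pattern, '\t') || entry.pattern.empty())
    return std::nullopt;
  std::string tok;
  while (std::getline(in, tok, '\t')) {
    if (tok.size() == 2 && tok[0] >= '0' && tok[0] <= '9' &&
        (tok[1] == 'T' || tok[1] == 'F'))
      entry.perms.emplace_back(tok[0] - '0', tok[1] == 'T');
  }
  return entry;
}
```

Because the first field is kept as an opaque string, old host entries and new pattern entries parse identically, which is what makes the backward-compatibility idea cheap: only the matching step has to change.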

This idea seems to require some very doable changes to nsPermissions.cpp. The
questions that I have for knowledgeable mozilla people are:

1. What should the format of the patterns be? UNIX-like regexps? Shell-like
wildcards (e.g. *.foo.com)? The choice affects whether we can stay backward
compatible with old entries.
2. Can we use pattern matching code from anywhere else in the tree? I wouldn't
want to reinvent the wheel. I know js has regular expression processing.
How about we use JavaScript regular expressions, and just use the JavaScript
engine directly? [A quick look shows extensions\cookies\makefile.win still lists
the JS engine as a .LIB, but nothing currently uses it in this tree]

One possibility would be to enclose regular expressions in "/" characters.  This
would help denote them as "new style patterns" rather than "old style host
names", and would also help reinforce they are regular expressions to the casual
viewer of the .txt file (as JS source code wraps regexs in "/")

Thus, you may expect to see:

/\S*:\/\/ads\..*/

which would block all protocols from all hosts starting with "ads.".  Of course,
the UI would implement a nicer layer on this that would provide a simple way to
partially specify the host, as per previous comments in this bug.
*** Bug 136575 has been marked as a duplicate of this bug. ***
I've done some looking around at the source, and plan to tackle this. I'm
planning on using the JavaScript RegExp for handling URLs. I'm also thinking
about allowing simple UNIX-like wildcards (*.foo.com), which would be easy to
translate into regexps.
I am glad this is getting addressed.  I think the solution discussed here is a
great idea.  However, I humbly request that you include, in whatever solution
you adopt, a fix for the issue discussed in Bug 136091 (Port number in image
source disables image blocking)?  Hopefully it will be solved naturally by your
enhancement.  This is a problem for those of us who are interested in blocking
images from servers mounted on ports other than 80, or from any server when the
link in the HTML source includes any port number at all.
In fact, I have just discovered that one can circumvent Moz's image blocking
entirely by simply attaching the :80 to the host-name portion of the URL (sort
of expected from what I already knew about incorrect image blocking for URLs
including any port number).
Maybe you could also look into how this solution might (or might not)
incorporate solutions to Bug 64068/Bug 133114 (images load even though all
images are blocked or certain server is blocked).  Also, be careful of Bug
83047/Bug 140172, because USEMAP attributes seem to sometimes break the image
blocking functionality.
I hope I know what I'm talking about.  Cheers.
*** Bug 141562 has been marked as a duplicate of this bug. ***
i completely and totally am for this feature.  what are the chances this feature
will make it into rc3 or the final 1.0?  also i think not only images should be
blocked.. it seems the advertising media is moving to flash and right clicking
on flash applets brings up a macromedia context menu :-(
Chance of it being in 1.0 is zero; no one has coded anything and it's not a
feature anyone is working on at the moment. Sorry, but if you can code C++ you
can help us out.
Following #18, an option to determine what objects to block would be nice (by
tabs: <img>, <object>, <script> etc.)
I have started working on this, and got it fundamentally working with images and
cookies. Someone pointed out (#16) that port numbers in URLs defeat the current
system, so I have to look into that as well. Once I get it working down low, I
was going to attempt to figure out an improved UI to allow support for regexps,
as right now all you can do is edit the text file. I should be able to get to
this now, as I'm done with classes.
Blocking any type of object is bug 94035.
iCab (MacOS web browser) has a powerful pref for this...
That screenshot looks really nice!  Might be a good model to follow.  I think
regular expressions might be better than the simple grep-like model they have,
but it might be a good default mode for those who don't know reg exps.
Blocks: useragent
Please do not make this feature dependent on Javascript being on. (Why isn't it
called ECMAscript in the browser UI - isn't Mozilla's implementation compliant
to the standard?)  Doing that would be remedying an annoyance (i.e. ad banners,
slow loading) with an abomination (i.e. stupid browser tricks, security holes
galore).  I am not trying to start a religious war, only to add some qualifiers
to my vote for this bug.
This is my initial implementation of regexp-based blocking. It uses the
JavaScript engine for regexp support. Currently I have not focused on speed or
other optimizations (e.g. storing compiled regexps with the host, so compilation
doesn't need to be done each time). I have used it under Linux; I hope it works
on other platforms. To actually make use of regexps, the cookperm.txt file
(located in the profile directory) must be edited by hand.

The standard host match is still there, and is the only available option from
within the GUI. A sample may look like this (the spaces should be tabs in the
file!):

x10.com 0F 1F
*.doubleclick.net 0F 1F
/images.*.slashdot.org\/banner/ 1F

The first is normal host blocking. The second will block any host that is
part of the doubleclick.net domain. The last is a regexp to block ad banners
served from slashdot's own image servers.

Most modifications occurred in nsPermissions.cpp. In nsImgManager I modified the
code to send the image URL instead of just the host.
Comment on attachment 88351
Patch for regexp based blocking of images and cookies.

Updating MIME type, as this patch is gzipped.

So these are just wildcards, not regexps? Regexps might be preferable... UI can
always let users enter them as wildcards and convert * to .* and ? to .
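The wildcard-to-regexp conversion suggested here takes only a few lines. In this C++ sketch (function names hypothetical) every character other than * and ? is escaped if it is a regexp metacharacter, because otherwise the dots in *.foo.com would match any character. Anchoring the result with ^...$ so that the whole host must match is our own assumption.

```cpp
#include <regex>
#include <string>

// '*' -> ".*", '?' -> ".", regexp metacharacters escaped, result anchored.
std::string GlobToRegex(const std::string& glob) {
  static const std::string kSpecial = R"(\^$.|+()[]{})";
  std::string re = "^";
  for (char c : glob) {
    if (c == '*') {
      re += ".*";
    } else if (c == '?') {
      re += '.';
    } else {
      if (kSpecial.find(c) != std::string::npos) re += '\\';
      re += c;
    }
  }
  re += '$';
  return re;
}

// Host names are compared case-insensitively.
bool GlobMatches(const std::string& glob, const std::string& host) {
  return std::regex_match(
      host, std::regex(GlobToRegex(glob),
                       std::regex::ECMAScript | std::regex::icase));
}
```

Note that *.doubleclick.net does not match the bare host doubleclick.net; a UI would have to decide whether to cover that case separately.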
Bah, can't change a patch's MIME type to application/g-zip. You'll have to
download it and manually unzip.
Sorry about the gzip. It does support regexps, as the third line is an example
of it. For the code to recognize it as such, it must have a leading and trailing
'/' ... just like normal javascript or vim. If it does not, then it will be
treated as a normal hostname, or a wildcard if it has '*' in it.
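The classification rule just described can be written as a small function. This is a sketch with hypothetical names; note that the regexp test has to run first, since a regexp such as the slashdot example also contains a '*'.

```cpp
#include <string>

// Entry kinds as described in the comment above.
enum class EntryKind { Host, Wildcard, Regexp };

EntryKind Classify(const std::string& entry) {
  // Leading and trailing '/' mark a regexp (checked first: regexps may
  // contain '*' too).
  if (entry.size() >= 2 && entry.front() == '/' && entry.back() == '/')
    return EntryKind::Regexp;
  // Otherwise a '*' anywhere makes it a wildcard.
  if (entry.find('*') != std::string::npos) return EntryKind::Wildcard;
  // Anything else is a literal host name.
  return EntryKind::Host;
}

// The regexp source between the slashes. Precondition: Classify(entry)
// returned EntryKind::Regexp.
std::string RegexpBody(const std::string& entry) {
  return entry.substr(1, entry.size() - 2);
}
```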
I believe the feature should be designed (and marketed) toward network
admin, parents, etc. and should not be marketed as an ad-blocking
feature (too easy for advertisers to circumvent this, and we should be
careful about image-making of mozilla.org).

What we could do is separating the feature into backend (read regex
from a .js file) and front-end (accessible from Preference, using the
simpler *? expression) (see bug 94797).
Bah!  Screw the corporates.  They can take away ad-blockers from Tivo, but they
can't touch open-source software.  Put it in the front.  Besides, with enough
people using JunkBuster, Window Washer, and other anti-ad programs, it seems to
me that it's what people want anyway.  It's fine if network admins and parents
like it too, but it seems to me that ad-blocking is the primary function
here.
Attached patch: Non-gzipped unified diff (obsolete)
Here's the same patch, but as a unified diff with context, capable of being
applied from the mozilla directory (with -p1) or its parent (-p0).  I've
changed none of the code whatsoever.
Regarding the patch (attachment 88351, aka attachment 88849): It would be
worthwhile to call into the JS only for globs and regexes, assuming it's
expensive to do so. Here's a small patch (applies only after original patch) to
only call the javascript comparison function on special entries (I have over
1000 records in cookperm.txt, all but two being old-style entries; thus,
visiting hosts late in the list results in massive numbers of calls per
image/cookie).
I left out the spacing changes in the else block to make the patch smaller.

--- nsPermissions.cpp~	Sun Jun 23 08:27:36 2002
+++ nsPermissions.cpp	Sun Jun 23 08:31:04 2002
@@ -294,11 +294,25 @@
       /* Try using some JavaScript here now.... */ 
       //fprintf( stderr, "\t%s\n", hostStruct->host );
 
+      /* Use the javascript as a last resort (only if glob or RE) */
+      if( ( *(hostStruct->host) != '/' )
+          && ( PL_strchr( hostStruct->host, '*' ) == NULL ) ) {
+
+        PRInt32 hostlen = PL_strchr(hostStruct->host, '/') - hostStruct->host;
+        if( !PL_strncasecmp( hostname, hostStruct->host, hostlen - 1 ) ) {
+          ret = JS_TRUE;
+          rval = STRING_TO_JSVAL( "true" );
+        } else {
+          continue;
+        }
+      } else {
+
       argv[1] = STRING_TO_JSVAL( 
           JS_NewStringCopyN(jscx,hostStruct->host, PL_strlen(hostStruct->host ) ) );
       ret = JS_CallFunction( jscx, glob, jsCompareFunc, 2, argv, &rval );
       /*fprintf ( stderr, "Called compare, ret = %d, rval = %d\n",
           JSVAL_TO_BOOLEAN( ret ), JSVAL_TO_BOOLEAN( rval ) );*/
+      }
 
       if ( ret == JS_TRUE && (JSVAL_TO_BOOLEAN(rval)==JS_TRUE) ) {
         /* search for type in the permission list for this host */
The one thing I had thought about to speed it up was to 'compile' the regexp
when loading the file and then save it within the hoststruct. The main problem I
can see with this is large memory usage... especially if you have 1000 items.
However, comparing the items could then easily be done by checking whether that
member of the struct is null. If it is, do a basic host match; otherwise call a
one-line JS function.
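The first idea (compile once at load time, keep a null pointer for plain host entries) might look like this in C++. `HostEntry`, `MakeEntry` and `Matches` are hypothetical stand-ins for the hoststruct and its comparison code, and std::regex is swapped in for the SpiderMonkey regexp engine that the patch actually uses.

```cpp
#include <algorithm>
#include <cctype>
#include <memory>
#include <regex>
#include <string>

// Hypothetical stand-in for the hoststruct: a /regexp/ entry is compiled once
// when the file is loaded; plain host entries leave `compiled` null.
struct HostEntry {
  std::string host;                      // entry text as read from the file
  std::unique_ptr<std::regex> compiled;  // null => literal host entry
};

HostEntry MakeEntry(const std::string& text) {
  HostEntry e{text, nullptr};
  if (text.size() >= 2 && text.front() == '/' && text.back() == '/')
    e.compiled = std::make_unique<std::regex>(
        text.substr(1, text.size() - 2),
        std::regex::ECMAScript | std::regex::icase);
  return e;
}

bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
  return a.size() == b.size() &&
         std::equal(a.begin(), a.end(), b.begin(),
                    [](unsigned char x, unsigned char y) {
                      return std::tolower(x) == std::tolower(y);
                    });
}

// Null check first: the common case never touches the regexp engine, and a
// regexp entry is matched without being recompiled on every lookup.
bool Matches(const HostEntry& e, const std::string& host,
             const std::string& url) {
  if (!e.compiled) return EqualsIgnoreCase(e.host, host);
  return std::regex_search(url, *e.compiled);
}
```

The memory cost raised above applies only to the regexp entries; in a file made up mostly of plain hosts, most of the pointers stay null.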

Another possible solution would be to keep a second list of actively used
hoststructs, and compare against it first, moving on to the main list if a match
wasn't found. This would be beneficial when browsing a handful of sites
constantly... sporadic web browsing wouldn't be much better off. One could
take this idea to an extreme and make it act like a multi-level feedback queue...
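The second idea, keeping recently used entries near the front, is essentially a move-to-front list. Here is a generic C++ sketch (the class name is hypothetical); the matcher is passed in so the sketch does not depend on how entries are compared.

```cpp
#include <functional>
#include <list>
#include <utility>

// Move-to-front list: every successful lookup moves the matching entry to
// the head, so entries for constantly visited sites are found after only a
// few comparisons.
template <typename Entry>
class MoveToFrontList {
 public:
  void Add(Entry e) { entries_.push_back(std::move(e)); }

  // Returns the first entry for which `matches` is true (now moved to the
  // front), or nullptr if none matches.
  const Entry* Find(const std::function<bool(const Entry&)>& matches) {
    for (auto it = entries_.begin(); it != entries_.end(); ++it) {
      if (matches(*it)) {
        entries_.splice(entries_.begin(), entries_, it);  // O(1), no copy
        return &entries_.front();
      }
    }
    return nullptr;
  }

  const Entry& Front() const { return entries_.front(); }

 private:
  std::list<Entry> entries_;
};
```

One caveat: reordering changes which entry matches first. If several entries can match the same URL with different permissions (the ordering question discussed later in this bug), a cache like this has to be kept separate from the authoritative, ordered list.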

A third option would be to use the JS_RegExp function calls. I can't find them in
the online docs for SpiderMonkey, but they are defined within jsapi.h (line
1611). Then, instead of having a JS function, just call the C API directly. This
could be combined with the first idea, saving the compiled regexp.

I think I will tinker with the third option, as it should be the easiest to
implement. As for the politics, I think it should be in the front end. Simple
host-name blocking already is. I think the wildcards and regexps should be
modifiable from within the preferences view. If someone wants to use it as a
censoring tool... I don't particularly want to be involved. I think censoring is
anti-opensource.
Maybe it could borrow figures from the new URL sorting engine for auto-complete.
 I'm pretty sure it has frequency figures for each URL now.  (I used to have a
bug #, but it was complete, so I removed my vote.)  The higher URLs get faster
regexp matching, and the rarer ones get slower ones.

However, will this catch blocked URLs as well?  For something like
".+\/ads\/.+", it's going to be hit quite a few times, and all of them blocked.
I've done some hacking on the patch.  An important detail that needs to be
addressed is that with inexact host representations, the same URL can match
multiple times.  Currently, my thought is to leave it matching the first entry
hit.  To avoid problems, I've come up with the following recommendations (would
go in AddHost in nsPermissions.cpp):
Preferred order would be from least to most general:
  - Explicit hosts first, sorted alphabetically;
  - Globbed hosts next, sorted alphabetically;
  - Regular expressions last, sorted in descending order of length
      (longest first).

The last point needs addressing, because I don't know of an easy (i.e., fast)
way to calculate a generality index for a regex.  This works for me and allows
my /./ rule to come last.

Note that having a "/./ [01]F" rule at the end of the list allows a whitelist
effect (any URL not matched by an earlier rule always matches this rule).
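The proposed ordering can be expressed as a comparator for a stable sort, as in this C++ sketch (names hypothetical; entries are classified by the rules described earlier in this bug). std::stable_sort keeps regexps of equal length in file order, since length is only a rough proxy for generality.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// 0 = explicit host, 1 = glob (contains '*'), 2 = regexp (wrapped in '/').
int Rank(const std::string& p) {
  if (p.size() >= 2 && p.front() == '/' && p.back() == '/') return 2;
  if (p.find('*') != std::string::npos) return 1;
  return 0;
}

// Least to most general: hosts (alphabetical), globs (alphabetical), then
// regexps in descending order of length.
void SortEntries(std::vector<std::string>& entries) {
  std::stable_sort(entries.begin(), entries.end(),
                   [](const std::string& a, const std::string& b) {
                     int ra = Rank(a), rb = Rank(b);
                     if (ra != rb) return ra < rb;
                     if (ra == 2) return a.size() > b.size();  // longest first
                     return a < b;                              // alphabetical
                   });
}
```

The catch-all /./ entry is the shortest useful regexp, so it sorts after every longer one, which is what the whitelist trick relies on.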

I will try to make a patch with just the sorting change, but it'll take me some
time to clean out the extra cruft I'm messing with.
This patch incorporates the original patch because I'm too lazy to hang on to
an original-patch copy of the affected sources.

The main change it contains is the aforementioned sorting logic.
Attachment #88849 - Attachment is obsolete: true
Blocks: 33576
OK, re-did it using a C++ function for most of the logic, directly calling the
JS RegExp functions (not listed in the JS API docs, but they are in jsapi.h).
This also includes the sorting patch from Tim. This needs to be applied to the
original file (unpatched).
Attachment #88351 - Attachment is obsolete: true
For future reference, (cvs -z3) diff -u is the preferred way for patches to be
submitted.  Regardless, nice work. =)
As before, I haven't changed anything in the patch--just the format.
Attachment #88966 - Attachment is obsolete: true
-> Image Blocking
Assignee: mstoltz → morse
Component: ImageLib → Image Blocking
QA Contact: tpreston → tever
After discussion in #mozilla, I'm ripping host-based matching apart from the
other types and into a hashtable-based implementation.  That should speed things
up considerably for lists with a lot of hosts (and probably should land
independently of regular expression blocking; if anyone here wants to spin off a
bug dedicated to that change, please go ahead).  However, I'm currently trying
to do a couple other things as well, so I can't easily produce a patch isolating
that change.
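A hashtable for the literal host entries might look like the following C++ sketch (names hypothetical; host names are assumed to be lowercased before insertion). Walking up the parent domains lets one entry cover its subdomains and makes the most specific entry win; whether host entries should behave that way is a design choice this comment does not settle.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Literal host entries in a hash map: a lookup costs a handful of hash probes
// (one per domain level) instead of a scan over every line of the file.
class HostTable {
 public:
  void Set(const std::string& host, bool allowed) { table_[host] = allowed; }

  // Tries "a.b.example.com", then "b.example.com", "example.com", "com".
  // Returns the permission of the most specific entry found, or nullopt.
  std::optional<bool> Lookup(std::string host) const {
    for (;;) {
      auto it = table_.find(host);
      if (it != table_.end()) return it->second;
      auto dot = host.find('.');
      if (dot == std::string::npos) return std::nullopt;
      host.erase(0, dot + 1);
    }
  }

 private:
  std::unordered_map<std::string, bool> table_;
};
```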
*** Bug 130685 has been marked as a duplicate of this bug. ***
*** Bug 156280 has been marked as a duplicate of this bug. ***
*** Bug 159648 has been marked as a duplicate of this bug. ***
Summary: [RFE] Pattern-matching based image blocking → [RFE] Pattern-matching based (url-based) image blocking
*** Bug 165805 has been marked as a duplicate of this bug. ***
*** Bug 166037 has been marked as a duplicate of this bug. ***
so... what's the final status?  does it work for anyone?  i tried it and regexp
doesn't work in Mozilla 1.1.  do i need to build it my own self?
*** Bug 167630 has been marked as a duplicate of this bug. ***
Blocks: 52168
I have added bug 52168 (Provide UI for regexp cookie blocking) as being
dependent on this one.
*** Bug 172403 has been marked as a duplicate of this bug. ***
*** Bug 172373 has been marked as a duplicate of this bug. ***
Blocks: 69758
Summary: [RFE] Pattern-matching based (url-based) image blocking → Pattern-matching based (url-based) image blocking
*** Bug 175592 has been marked as a duplicate of this bug. ***
*** Bug 175572 has been marked as a duplicate of this bug. ***
I don't think that last patch works for the simple host-based matching.  It
looks like it tries to match the entire url against the hostname.  I had to
change:

  } else {
    // Simple host-based matching
    return !PL_strcasecmp( url, hoststruct->host );	  
  }

to:

  } else {
    // Simple host-based matching
    nsCAutoString str(url);
    PRInt32 pos = str.FindChar('/', 0);
    if (pos > 0)
      str.Cut(pos, str.Length());
    
    return !PL_strcasecmp( str.get(), hoststruct->host );	  
  }
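A slightly more defensive host extraction is sketched below in C++ (the function name is hypothetical). Besides the path, it strips a "scheme://" prefix, any "user@" credentials and a ":port" suffix; the port handling matters because, as noted earlier in this bug, appending :80 to a host name was enough to get around host-based blocking. IPv6 literals such as [::1] are deliberately not handled, so this is a simplification, not a full URL parser.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

std::string ExtractHost(const std::string& url) {
  std::string s = url;
  // Treat "://" as a scheme separator only if it precedes the first '/'.
  const auto scheme = s.find("://");
  if (scheme != std::string::npos && scheme < s.find('/'))
    s.erase(0, scheme + 3);
  s = s.substr(0, s.find_first_of("/?#"));  // drop path, query, fragment
  const auto at = s.rfind('@');
  if (at != std::string::npos) s.erase(0, at + 1);  // drop user@ credentials
  const auto colon = s.find(':');
  if (colon != std::string::npos) s.erase(colon);  // drop :port
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}
```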
Is there any perf issue preventing this from being checked in?
Summary: Pattern-matching based (url-based) image blocking → Pattern-matching (regexp) based (url-based) image blocking
Blocks: 147866
Summary: Pattern-matching (regexp) based (url-based) image blocking → block images by directory or by regexp/pattern
Blocks: majorbugs
I don't see any comments here regarding the 2nd half of what is shown in the
iCab example someone posted. That is, the ability to filter out by "object link"
or "link target" as it is sometimes called.

So for example, if an image was part of a hyperlink, you could block all images
that link to "*/signup.asp" or "?referrer=", which are typical links for many
ads (I'm sure you can think of others too.)

In the patch mentioned here (which I can't really try right now), is this part
being addressed?

Rob
At the moment I do not see why this couldn't be checked in. I have added the
modification for literal matching, and I will try to roll another patch (busy
with finals...). I have not heard anything about the hashtable implementation
(see comment 42).

I am not sure what the best way is to pursue filtering based on links, as
opposed to the source URL. To pull it off, I think there would need to be
changes in nsImages to provide not only the source but also a possible link URL.
The easy part is handling it in the checking routines; all that is needed is
another permission type.

Recently I have dabbled with creating some form of a simple UI for this (a menu
option pops up a dialog). I am also wondering if it would be good to create a
pref to enable/disable 'advanced image (etc.) blocking'?
what does the owner have to say?
well you should ask someone to review, then

that's probably morse because he owns cookies.
This feature has been available since Netscape 2.0 via automatic proxy
configuration.  See http://www.schooner.com/~loverso/no-ads/ for information on
how to configure this.  I believe that this provides a complete solution without
any need for code changes.
Preston, you are right.  But I have the following comments:

1. With automatic proxy configuration you will have to use a blackhole proxy
server.  You suggest a simple way with inetd and a shell script.  For me this
resulted in the following:

inetd[10430]: 3421/tcp server failing (looping or being flooded), service
terminated for 10 min

The *CORRECT WAY* would be if Mozilla would not even try to make a connection if
an image is an ad.


2. Ads are on webpages so the author gets money for his work.  IMHO Mozilla
should treat ads in a way so authors don't starve but the ads occupy as few
resources in the browser as necessary.


3. As previously mentioned in this bug there are other attributes on images
which could help determining its state for blocking: the object link.

Another decision rule could be whether the image is hosted in another domain
(there is already an option for this, but it is not flexible enough).
*** Bug 187940 has been marked as a duplicate of this bug. ***
Mass reassigning of Image manager bugs to mstoltz@netscape.com, and futuring.
Most of these bugs are enhancement requests or are otherwise low priority at
this time.
Assignee: morse → mstoltz
is there a way to change the priority on this other than voting for it?
ad image blocking is one of the few reasons i'm using Mozilla and not some
lightweight browser.
You could always submit enough dupes for it to get onto the mostfreq list.
i don't really want to subvert bugzilla for a feature that might be important to
me and not many others.
i think there are four separate user needs that the mozilla community needs to
address.  once policies have been set or a way for users to state their
preferences has been created, features that claim to meet these needs in various
ways will iron themselves out.

1. user control of bandwidth they use.

2. need to minimize page load time, e.g. by making some content optional

3. discouraging providers of unwanted content (advertising, objectionable stuff,
etc), e.g. by not providing hits to their server

4. user control (show/block/alter) of presentation of downloaded content.
Krishna, I appreciate your concern and I agree that this would be "a good thing
to have." Unfortunately, no one at Netscape has the time to work on this right
now. The best way to get this addressed will probably be for you to find someone
interested in implementing it, maybe by asking around on the newsgroups. If
someone comes up with a patch, I'll make sure it gets reviewed.
While it's not a patch for the core code of Mozilla, an extension
(http://adblock.mozdev.org) has been made, which contains a lot of the
functionality requested in this bugreport. At the moment it merely hides ads,
but I expect that when bug #162044 is resolved, it will be possible to perform
true blocking. 
Mitchell: Is the patch attached here so rotted as to be invalid now?
http://bugzilla.mozilla.org/attachment.cgi?id=89643&action=view
It can probably be updated with a minimum of work.
It should be possible to block not just images, but everything that can be
transcluded from the original html (that includes CSS files, javascript brought
in through the src attribute (as in "<script src=whatever.js>"), and so forth. 
You'd just get the HTML file the same way Lynx does.

One motivation for this is looking at a page through its Google cache link when
the page's actual host server is down.  Currently, if you click the cache link,
sometimes the HTML file is retrieved from the Google cache and then the browser
hangs for a long time trying to retrieve a CSS file that the HTML file asks for.
Another reason for using the Google cache might be if you want to look at a page
without leaving an HTTP hit in the actual host's server log.  You can turn off
images, but there's still all these other things the page can try to load.  So
there should be a way to say, "don't load ANYTHING!".
I would especially like to block flash ads, for the same reason I want to block
some images - as mentioned in #18.

Max.
The AdBlock plugin is, indeed, nice.  But AdShield for IE is even nicer -
blocking Flash too!  Here's hoping for something just as good in Mozilla
(actually, Phoenix :-)
*** Bug 194529 has been marked as a duplicate of this bug. ***
QA Contact: tever → nobody
*** Bug 203625 has been marked as a duplicate of this bug. ***
*** Bug 210114 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
Summary: block images by directory or by regexp/pattern → [RFE]block images by directory or by regexp/pattern
*** Bug 90634 has been marked as a duplicate of this bug. ***
I recommend this be moved to mozdev - it's a good idea, but it's way more than
most people need. The number of users who could benefit from this does not
justify the added bloat and complexity in the core browser. This would make a
fine add-on. Tim, still interested in working on this?
*** Bug 211250 has been marked as a duplicate of this bug. ***
Could somebody implement this on a base level, like wildcard (* and ?) support
for the servers only?  I can understand if putting this in for the directory
level would be a challenge.

*looks up*  Oh, what's the status of the above patches?

> it's a good idea, but it's way more than most people need. The number of users 
> who could benefit from this does not justify the added bloat and complexity in 
> the core browser.

Votes = 100, Dupes = 19, CCs = 62.  The number of users have spoken...
I think that this does belong in the core, not as an addon (at mozdev or
anywhere else).  There are a large number of people who know and like regular
expressions.  They are showing up in many end-user programs.  The success of
programs that use regular expressions should prove that there is a place for them.
...but the ability to switch between regular expressions and standard wildcards
could be valuable.
cow
The status of the above patch is that it's not working. The last time I worked
on this (November 2002), there had been changes to NSPR or something which
affected this code. It was a minor fix, but I never re-released a patch as I was
playing around with some other enhancements.

The current implementation supports 3 modes: (1) the current strict textual
match, (2) a simple wildcard via * and ?, and (3) RegExp via the JavaScript
engine. Truthfully, (2) is implemented as a regexp under the hood as well.

The discussion was moving towards how to speed up the whole process in a
situation with, say, 1000 entries. Hashing was suggested by Tim for host-based
matching. Ideas for regexps? I don't really know any. My suggestion was to
lazy-load entries from the file and only keep around so many (like paging in an
OS). I also thought of pre-compiling the regexps, but then they need to be
stored in memory, which could be expensive.

The UI for this "bug" was a different issue as well, a different bug 52168. I
played with an idea but never got too far. In doing so, I figured the best bet
was to have a config option to enable or disable regexp blocking. This would
turn off the additional features in the UI for those who don't care or don't
understand.

Personally I have not followed through with this because for the last six months
I was busy with my senior project and other classes. Now that I am done, and am
waiting to start school in a few months, I have some free time to work on this
again.
> The discussion was moving towards how to speed up the hole processes in a
> situation with say 1000 entries.

If this is implemented, you won't need 1000 entries, unless you're trying to 
recreate the Great Firewall of China, and that would demand a separate daemon 
process, anyway.
> If this is implemented, you won't need 1000 entries, unless you're trying to 
> recreate the Great Firewall of China, and that would demand a seperate daemon 
> process, anyway.

Not so. We're talking about ad blocking, and the number of ad servers on the
Internet is quickly approaching infinity :)
One of the most popular ad server lists is more than 1000 entries long:
http://pgl.yoyo.org/adservers/
Another list (in the form of a proxy autoconfig .pac file) is nearly 500 entries
long (hard to say actually, there are too many comments inside):
http://www.schooner.com/~loverso/no-ads/
Both use wildcards extensively; the second list also uses regexps.

BTW, Mozilla's performance with .pac ad blocking is really horrible. It locks up
for several seconds on every URL request. As .pac files use the JS engine, that
might be applicable to modes (2) and (3) proposed by David.
Javascript engine running isn't very nice for something that needs to run
"quickly"... Why not make a "byte-code" compilation of sorts to speed what the
browser needs to run to process them, and at the same time, allow the nicer user
land language for making the rules...
I'm still interested in this, mitchell. I'll be helping David out however I can.
My preference would be to have this in mozilla/extensions, since it's definitely
something that embedders wouldn't want to drag along, but it seems desirable
enough to be appropriate for the default build.

To address some of the comments made, I don't expect this to become an all-out
generic filtering mechanism (like mail/news's filters), so that should be a
separate RFE (and more likely, a mozdev project, IMO). Obviously, it's David's
call. As for people who want block lists of 1000 or more, I don't personally
intend to cater to that crowd first (again, I'm not David, and I don't speak for
him). That is, performance in the insanely-large-list category will not be my
personal main concern (initially, anyway).
*** Bug 214869 has been marked as a duplicate of this bug. ***
I have been getting spam with a gif image like

http://xuphekuwisudohuti.miracleproductline.com/healthnews/

with a serial number at hotmail. So they know when I look at the email. I have
started blocking images from the server, but the guy varies the first part
before the domain. So the blocking doesn't work.  

http://[VARIES].miracleproductline.com

The Manage Image Permissions dialog only allows removal of the whole address,
not editing it down to just the server address, so I am stuck. It would be great
to be able to block just
miracleproductline.com

Though he seems to have a plethora of domains, it'd be a start.
== OT spam ==
Re: comment 91 - go to http://miracleproductline.com and choose tools -> image
manager -> block images from this site. Image blocking nowadays blocks the
domain _and_ all its subdomains. This was fixed by bug 176950.
Manually adding domains is bug 33467, but for now you could also just edit
cookperm.txt directly. Adding the following line
miracleproductline.com 1F
to cookperm.txt would do the same.
The list at http://pgl.yoyo.org/adservers/ is available in lots of different 
formats, including as a PAC file:

http://pgl.yoyo.org/adservers/serverlist.php?hostformat=proxyautoconfig&showintro=1

and as a Mozilla cookie permissions file:

http://pgl.yoyo.org/adservers/serverlist.php?hostformat=cookperm&showintro=1

More information is on the main page.
*** Bug 218202 has been marked as a duplicate of this bug. ***
*** Bug 221535 has been marked as a duplicate of this bug. ***
*** Bug 226501 has been marked as a duplicate of this bug. ***
*** Bug 228592 has been marked as a duplicate of this bug. ***
*** Bug 217115 has been marked as a duplicate of this bug. ***
I am a little confused.

Could someone please clarify whether this bug includes cookie whitelisting, as
called for in duplicate bug 217115, or does this bug just cover cookie (and
image) blacklisting? 

TIA.
No longer blocks: 52168
*** Bug 176539 has been marked as a duplicate of this bug. ***
I think for now, it just includes blacklisting.  If whitelisting is easy enough
to include, so be it, but it's not a requirement.  This is mostly for catching
the word "ad" in a directory or server name, such as:

ad(s|server)?\..+
[\w\.\-]+\/ads?\/

A better list of examples could be found by just looking at a Junkbuster regexp
blocker file.
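The two example patterns can be tried directly with std::regex, whose ECMAScript grammar is the same dialect as JavaScript regexps. The C++ sketch below (hypothetical names) assumes the patterns are searched, unanchored, in the URL with its scheme removed. It also shows why anchoring matters: unanchored, the first pattern fires on "download.com/file.zip", because "ad.com" occurs inside it.

```cpp
#include <regex>
#include <string>

// The two example patterns from the comment above, case-insensitive.
const std::regex kAdHost(R"(ad(s|server)?\..+)",
                         std::regex::ECMAScript | std::regex::icase);
const std::regex kAdDir(R"([\w\.\-]+\/ads?\/)",
                        std::regex::ECMAScript | std::regex::icase);

// Unanchored search: a pattern may match anywhere in `target`.
bool LooksLikeAd(const std::string& target) {
  return std::regex_search(target, kAdHost) ||
         std::regex_search(target, kAdDir);
}

// Anchored variant of the first pattern: "ad", "ads" or "adserver" must be
// the first label of the host.
bool HostStartsWithAd(const std::string& target) {
  static const std::regex kAnchored(R"(^ad(s|server)?\.)",
                                    std::regex::ECMAScript | std::regex::icase);
  return std::regex_search(target, kAnchored);
}
```

A real filter list would need the same care with anchoring that the Junkbuster files take, or innocent hosts get blocked.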
Blocks: 110363
*** Bug 264219 has been marked as a duplicate of this bug. ***
Blocks: 273416
Why should AdBlock be hardcoded into Mozilla core? Keeping them independent
increases AdBlock's versatility (it's updated independently, etc.) and keeps
Mozilla more lightweight.
No longer blocks: majorbugs
*** Bug 126635 has been marked as a duplicate of this bug. ***
Status: ASSIGNED → NEW
QA Contact: nobody
Please note that wildcard support must also cover (or throttle) cookie changes,
renewals and so on. Otherwise you get all these nagging prompts if you're using
FF 2.0 with "ask me for cookies".

thx
~Marcel
Assignee: security-bugs → nobody
QA Contact: image-blocking
No longer blocks: 273416
Severity: normal → S3