Last Comment Bug 16913 - Filter news based on any headers
: Filter news based on any headers
Status: RESOLVED FIXED
:
Product: MailNews Core
Classification: Components
Component: Backend (show other bugs)
: Trunk
: All All
: P1 enhancement with 52 votes (vote)
: mozilla1.9.1a2
Assigned To: Joshua Cranmer [:jcranmer]
:
:
Mentors:
: 75096 166986 198492 257504 277003 360221 392248 454873 (view as bug list)
Depends on: 16904 killfile 424607 428729 496346
Blocks: 66425
  Show dependency treegraph
 
Reported: 1999-10-20 14:44 PDT by Phil Peterson
Modified: 2009-12-13 14:26 PST (History)
60 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
First, buggy implementation (16.44 KB, patch)
2007-10-06 18:29 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
More complete implementation (17.83 KB, patch)
2007-10-09 18:35 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
(Almost) fully working patch (21.40 KB, patch)
2007-10-13 09:06 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Latest patch (30.94 KB, patch)
2007-10-13 14:40 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Version 5 patch (34.46 KB, patch)
2007-10-14 14:52 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Final patch, version 1.0 (37.83 KB, patch)
2007-10-16 16:55 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Version 2 patch (37.70 KB, patch)
2007-11-09 17:17 PST, Joshua Cranmer [:jcranmer]
mozilla: review-
Details | Diff | Splinter Review
Version 3 patch (38.60 KB, patch)
2007-11-14 18:35 PST, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Version 4 patch (37.91 KB, patch)
2007-11-28 15:31 PST, Joshua Cranmer [:jcranmer]
mozilla: review-
Details | Diff | Splinter Review
Now supports local searching (38.54 KB, patch)
2007-12-31 10:37 PST, Joshua Cranmer [:jcranmer]
mozilla: review+
neil: superreview+
Details | Diff | Splinter Review
Patch with all nits fixed. (49.74 KB, patch)
2008-02-05 15:32 PST, Joshua Cranmer [:jcranmer]
Pidgeot18: review+
neil: superreview+
Details | Diff | Splinter Review
Patch to check in (49.75 KB, patch)
2008-02-06 16:09 PST, Joshua Cranmer [:jcranmer]
Pidgeot18: review+
Pidgeot18: superreview+
Details | Diff | Splinter Review
Better patch, WIP (68.31 KB, patch)
2008-06-23 19:52 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
Brand new patch, with tests (91.58 KB, patch)
2008-06-27 13:51 PDT, Joshua Cranmer [:jcranmer]
no flags Details | Diff | Splinter Review
This one works without filters! (91.71 KB, patch)
2008-06-30 06:46 PDT, Joshua Cranmer [:jcranmer]
mozilla: review-
Details | Diff | Splinter Review
Updated patch (95.88 KB, patch)
2008-07-17 11:56 PDT, Joshua Cranmer [:jcranmer]
mozilla: review+
standard8: review+
neil: superreview+
Details | Diff | Splinter Review

Description Phil Peterson 1999-10-20 14:44:58 PDT
This is the news version of 16902. It would be nice to allow filtering news
messages based on user-customizable headers.
Comment 1 John_David_Galt 1999-10-20 16:05:59 PDT
Will be very useful for spam-filtering (on NNTP-Posting-Host:, Approved:, etc.);
content-filtering in groups that use Keywords: ; filtering against character
sets I can't read, or messages with attachments; etc.

Also want to be able to filter out messages crossposted to more than N groups
(which are nearly always spam), or to certain groups (which often tells
something about the subject matter).
Comment 2 Scott MacGregor 2000-05-24 16:11:59 PDT
Moving way out there.
Comment 3 Garth Wallace 2000-06-27 18:40:25 PDT
This should be fairly trivial. A list could be maintained of additional headers
besides the usual ones that are referenced in filters. When downloading the
normal headers for a group, Mozilla could also use the XHDR command to download
these as well. Filtering proceeds normally.
Comment 4 Keyser Sose 2001-04-09 13:46:56 PDT
*** Bug 75096 has been marked as a duplicate of this bug. ***
Comment 5 Jeremy Browne (no email) 2001-10-16 20:00:40 PDT
Going by some comments by naving@netscape.com on bug 16902, filtering news by
any header was included in the fix for doing so in mail.
Comment 6 timeless 2001-11-23 09:35:43 PST
*** Bug 111560 has been marked as a duplicate of this bug. ***
Comment 7 David Brittain 2001-11-24 03:32:25 PST
As noted by timeless in Bug 111560 , why does the UI provide no indication that
this doesn't work. Should I re-open 111560 and change it's name to something
like "Create news filters UI should indicate that feature is not implemented"?
Comment 8 R.K.Aa. 2002-09-05 22:29:45 PDT
*** Bug 166986 has been marked as a duplicate of this bug. ***
Comment 9 lloyd 2002-12-03 16:06:06 PST
For newsgroups, what's needed is a filter action that HIDES messages that match
a filter condition.

Most people (?) use message filters in newsgroups to filter out posts by
assorted idiots, trolls, and other trouble-makers; but you can't select "delete"
action in newsgroup because it tries to CANCEL the post.

So, you need an additional action - "Hide message".
Comment 10 Garth Wallace 2002-12-03 20:41:51 PST
lloyd - I think that a "mark as read" option would work just fine
Comment 11 Jonadab the Unsightly One (Nathan Eady) 2002-12-23 13:53:08 PST
It's very useful to filter on the Followup-To: header; a lot of hipcrime's 
markov chains can be tossed out that way, for example.  Very handy.  But
I imagine users can think of all kinds of headers to filter on.
(X-Eric-Conspiracy: There Is No Conspiracy, or who only knows what.)

As far as what these headers ought to be able to do...  anything a normal
usenet filter can do, how about?  Then extending what usenet filters can
do could be other bugs, and this one could just be about another way to
trigger them.  Yes, mark as read should be adequate for hiding, until such
time as Mozilla gets full scoring.  
Comment 12 Sander 2003-03-21 13:29:37 PST
*** Bug 198492 has been marked as a duplicate of this bug. ***
Comment 13 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2004-02-04 14:16:18 PST
Nothing happens on this bug. Does anybody work on it? It's a missing feature for
a lot of users.
Comment 14 Frankie 2004-12-16 14:04:18 PST
*** Bug 257504 has been marked as a duplicate of this bug. ***
Comment 15 Frankie 2004-12-16 14:16:13 PST
I vehemently disagree with calling this an Enhancement. Attempting to read a
high-noise group (such as news.admin.net-abuse.*) without robust killfile
support is an exercise in masochism.

Especially since bug 16902#c68 said the "any headers" patch for mail was also
supposed to cover news.

But we need a coder. Someone want to set up a bounty?
Comment 16 Vidar Haarr (not reading bugmail) 2004-12-16 23:50:47 PST
By definition, this bug is an enhancement.
Comment 17 Frankie 2005-01-05 20:44:16 PST
*** Bug 277003 has been marked as a duplicate of this bug. ***
Comment 18 blujay 2005-10-16 00:27:09 PDT
Ok, so it's been SIX YEARS.  I still can't enter custom header filters for
newsgroup messages, but for email I can do whatever I want.  Why?!  Six years! 
This makes it impossible to use Tb for reading GMANE newsgroups because you
can't filter out the already-detected spam!
Comment 19 John David Galt 2005-10-18 17:36:02 PDT
Agree.  I fail to see why the code from the fix to bug 16902 (and subsequent
bugs that added features to mail-filtering) can't be reused for news-filtering.
 Indeed, I originally thought that the same functions <i>were</i> used for both
(otherwise why is the "Delete" action -- which implies rogue cancellation of
news messages, which Mozilla can't actually do because of GNKSA -- still allowed
as an option in news filters)?
Comment 20 Chuck Hamilton 2006-01-05 12:00:04 PST
Put my name on the list of folks wanting this feature too. I subscribe to a newsgroup that's under constant flood attach. Without custom header filtering, the NG is useless. Please fix this soon.
Comment 21 Cesar Eduardo Barros 2006-01-12 12:00:25 PST
Related bug: bug 178397
Comment 22 John Corliss 2006-03-31 07:15:42 PST
Better filtering of spam would be possible if filtering based on any header were possible. For example:

Google Groups is a source of usenet spamming that can't be dealt with in any other fashion. They refuse to respond to usenet abuse complaints, so it would be very nice to be able to screen messages out that have the follwoing header:

"Complaints-To: groups-abuse@google.com"

Sending complaints to that address only gets a reply that Google isn't responsible for the content of posts, but the way.
Comment 23 Chris Ilias [:cilias] 2006-11-09 23:46:15 PST
*** Bug 360221 has been marked as a duplicate of this bug. ***
Comment 24 Tony Mechelynck [:tonymec] 2006-11-10 00:01:48 PST
(In reply to comment #9)
> For newsgroups, what's needed is a filter action that HIDES messages that match
> a filter condition.
> 
> Most people (?) use message filters in newsgroups to filter out posts by
> assorted idiots, trolls, and other trouble-makers; but you can't select "delete"
> action in newsgroup because it tries to CANCEL the post.
> 
> So, you need an additional action - "Hide message".

For my needs, the ability to "Mark as Read" is good enough: if I had the option to specify:

if "NNTP-Posting-Host" "is" "123.45.67.89" then "Mark as Read"

or maybe even

if  "NNTP-Posting-Host" "is greater than" "123.45.67.80"
and "NNTP-Posting-Host" "is less than"    "123.45.67.96"
etc.

that would IMHO be a powerful weapon against one category of trolls (alas, not all of them, since it's powerless against dynamically-addressed lines in countries from where we want to see posts by other people).
Comment 25 lal_truckee 2007-06-27 20:09:24 PDT
Decent filters (filtering on particular newsgroups in the distribution list and/or in the followup distribution list, and on number of groups in distribution lists) would go a long way to making Thunderbird a strong entry in the newsreader world.
Comment 26 Nelson Bolyard (seldom reads bugmail) 2007-07-12 13:59:25 PDT
I have been told that our news client is limited to the headers on which 
it can filter because the technique it uses to fetch headers only returns
a small subset of the headers - basically the ones on which the news client
presently allows us to filter.  

But there are other news clients that do presently allow filtering on any
and all headers.  Those other clients are effective for use in newsgroups
that are undergoing "hipcrime" attacks (wherein the group is flooded with
messages containing sheer nonsense of random words, but from a limited 
number of sources), and mozilla's news client is ineffective in those groups.

sci.crypt is undering a prolonged hipcrime attack right now, and has been
unusable by mozilla news client users for weeks, but users of other readers
can use it OK.  This makes me consider using another news reader. :-/

One sci.crypt reader has published a set of filter rules that effectively
eliminate the hipcrime noise from the group.  See 
http://fragrieu.free.fr/sci_crypt_kill_rules.txt
It would be superb if our mailnews client could handle those rules (or 
equivalent, in our rule syntax).

Because of the way that downloading news headers works, it MAY be necessary
to download the entire messages, and perform the header filtering during
the download.  I would be happy with that.  

It seems to me that IF the user can configured his news client to download
the entire newsgroup for offline reading, then it certainly ought to be 
possible to apply the filters while the messages are being downloaded.

I would be willing to mark sci.crypt to be downloaded for offline reading
IF that would enable me to filter out the hipcrime noise.

Seth, you implemented the existing news filters, IINM.  Can you offer any
insight into how massive a project it would be to filter during message 
download, or any other ideas about how to filter the way that other news
readers do?  
Comment 27 John David Galt 2007-07-12 15:37:51 PDT
I fail to understand why the code that already exists to filter mail messages on any headers (bug 16904) can't simply be reused for news.  I've been using it to find mailing-list messages by filtering on "Reply-To:" for years now.
Comment 28 Ronald Killmer 2007-08-15 07:59:08 PDT
*** Bug 392248 has been marked as a duplicate of this bug. ***
Comment 29 Joshua Cranmer [:jcranmer] 2007-08-29 05:53:03 PDT
According to RFC 3977, HEAD returns all headers of an article. Since we are already getting all of the headers (presumably: RFC 3977 is only about a year old), why can't we filter on that?

Barring that, XHDR (see RFC 2980) can get any header and is probably widely supported, why can't that be used as well?

Finally, if neither of those work, why can't the message be downloaded and stored somewhere else or only filtered when the message itself is downloaded?
Comment 30 Robert Harvey 2007-08-29 10:31:27 PDT
I have now downloaded Pan, with the intention of overcoming these problems, and no doub t it will when I work out how to handle the "rules".

I think 8 years is long enough to wait for an important enhancement.  I've mainly used Google Groups rather than Thunderbird for some time now.
Comment 31 Joshua Cranmer [:jcranmer] 2007-08-31 19:59:13 PDT
Cursory examination shows that Pan uses only XOVER, LIST NEWSGROUPS, LIST, and ARTICLE for getting articles.

Based on that information, my current plan of action is to get article headers in one of three ways:

XHDR
XPAT
XOVER
HEAD

optimally using XHDR and falling back on HEAD if nothing else works...
Relevant RFCs: 977 (original), 2980 (common extensions), 3977 (modern).

Is it just a coincidence that RFC 977 was obsoleted by RFC 3977?
Comment 32 Blinky the Shark 2007-09-09 21:33:58 PDT
(In reply to comment #31)
> Cursory examination shows that Pan uses only XOVER, LIST NEWSGROUPS, LIST, and
> ARTICLE for getting articles.
> 
> Based on that information, my current plan of action is to get article headers
> in one of three ways:
> 
> XHDR
> XPAT
> XOVER
> HEAD
> 
> optimally using XHDR and falling back on HEAD if nothing else works...
> Relevant RFCs: 977 (original), 2980 (common extensions), 3977 (modern).
> 
> Is it just a coincidence that RFC 977 was obsoleted by RFC 3977?

I believe that for filtering purposes Pan also uses XHDR, insofar as one can score on headers that appear there but not in the XOVER database.  Besides Pan, two other news clients that I use daily that will filter on any headers are Xnews (not open source) and slrn (open source).  Note that those are *not* offline readers, so the downloading of article *bodies* is *not* required for the ability to filter on all headers (nor does Pan require downloading of article bodies to do its all-headers filtering.  Someone upthread mentioned the idea of article body downloads perhaps being required for such filtering cababilities; these three clients (Pan, Xnews slrn)(and I don't know how many others) are proof that that is not a requirement.  Commonly, XOVER is accessed then XHDR if a filter requires a headers not found in XOVER.  That makes scoring on XHDR headers "expensive" in that it takes two actions instead of one.
 

Comment 33 Joshua Cranmer [:jcranmer] 2007-09-10 18:36:14 PDT
Roughly speaking, the solution for this would involve using more than just XOVER to grab the headers. Where supported, XHDR would grab the needed headers from the filters; if XHDR is not supported HEAD could be used to grab (all) headers of a message.

(If XPAT is supported, that might help doing pattern-matching filters. But that is much more complex logic and would involve messing with how filters work.)
Comment 34 Joshua Cranmer [:jcranmer] 2007-09-30 19:08:54 PDT
I finally had time to actually figure out how the NNTP protocol stuff is working in mailnews. No wonder people don't want to touch it at all...

The solution here is to add a few more states to nsNNTPProtocol:
NNTP_XHDR_PROCESS   iterate over the headers to filter on, going through:
NNTP_XHDR_SEND      send the command
NNTP_XHDR_RESPONSE  same as NNTP_XOVER_RESPONSE
NNTP_XHDR           same as NNTP_XHDR: actually process the data.

My idea is to change the transition NNTP_XOVER -> NNTP_FIGURE_NEXT_CHUNK to:
NNTP_XOVER -> NNTP_XHDR_PROCESS -> NNTP_XHDR_SEND
                |           ^             |
                V           |             V
NNTP_FIGURE_NEXT_CHUNK  NNTP_XHDR  <- NNTP_XHDR_RESPONSE
(View diagram in fixed width font.)
(Sorry about the diagram, but it is easier to read in many circumstances than dot diagrams.)

Comments/questions/concerns?
Comment 35 Joshua Cranmer [:jcranmer] 2007-10-06 18:29:20 PDT
Created attachment 283875 [details] [diff] [review]
First, buggy implementation

This is my current implementation that visibly works. Complete list of known bugs:
1. If XHDR doesn't work (which it doesn't work on news.mozilla.org), then filtering doesn't happen. At least it no longer goes into an infinite loop if XOVER doesn't work.
2. It does not recognize the headers needed from server-side filtering.
3. `Size:' is not yet supported. `Date:' may be problematic
4. Filters are applied twice for each message.

I would be happy if people could look to make sure that I am not leaking any strings. There are too many conversions between char * and nsCString for me to feel comfortable.
Comment 36 JohnWoo 2007-10-09 01:00:58 PDT
Thank You for remembering this bug !!! I'm not able to test Your implementation, but I want You to understand our problem.
At least in our country, we have a spammer/troll who is cross-posting/flooding news-groups with troll-messages, and adding inapproriate groups to good threads. He has been doing it for many years. He is always using some anonymous news-host and these we would like to filter away.
Comment 37 Joshua Cranmer [:jcranmer] 2007-10-09 18:35:44 PDT
Created attachment 284252 [details] [diff] [review]
More complete implementation

This patch adds three more components for support:
* Full server-side filtering support (as opposed to accidental support)
* Delete message now works properly (UNTESTED)
* Faster! (only applies filters once per message)

Known problems:
1. Size filtering doesn't work
2. Fallback for HEAD still doesn't work
3. Code needs cleanup: XOVER support in nsNNTPNewsgroupList needs to be overhauled and nsNNTPNewsgroupList/nsNNTPProtocol need some more communication.
4. I might be leaking strings (my nsStringStats vary so much with and without this patch that I can't tell anything from it).
5. The UI progress meter doesn't update for the XHDR stuff (it is 100% when it finishes the XOVER when it should be at most 50%).

To anyone using sci.crypt, I believe that this patch should be sufficient to block out the hipcrime spam, although my tests have been as yet limited to news.mozilla.org's mozilla.test (which is broken in XHDR support) and alt.test on a news server without broken XHDR support.

This bug is being actively worked on, so there is no need to put advocacy here. That should be relegated to IRC or the newsgroups. Besides, filtering based on User-Agent headers that minimally catches spam probably requires much more comprehensive UI support.

My last point is more directed to the TB peers: what would be the permissibility of applying a final version of this patch to the TB2 branch?
Comment 38 Henrik Skupin (:whimboo) [away 09/30 - 10/06] 2007-10-10 10:16:37 PDT
(In reply to comment #37)
> My last point is more directed to the TB peers: what would be the
> permissibility of applying a final version of this patch to the TB2 branch?

I'm not a peer or module owner but I can give you an answer. Before the patch will be available for Tb 2 it has to bake on trunk for a while. If all work well the release drivers have to decide if it's worth to check into 1.8 branch.
Comment 39 Joshua Cranmer [:jcranmer] 2007-10-11 18:45:58 PDT
Er, my latest patch is broken in that it doesn't add any header to the database. Move my additional `#endif' up before the last if statement to fix that problem. I would attach another patch, but I am currently in the middle of adding HEAD support and removing my various attempts to debug why the header was broken, so the patch is not in a preview-level state yet.

I expect to have a working beta by the end of tomorrow, complete with full HEAD fall-back support, and hopefully with a UI progress meter update. By the end of the next week, I should have a review-ready patch that additionally cleans out the current XOVER code and documents the entire XOVER/XHDR/HEAD process.
Comment 40 Joshua Cranmer [:jcranmer] 2007-10-13 09:06:52 PDT
Created attachment 284766 [details] [diff] [review]
(Almost) fully working patch

This patch now has almost-full HEAD support (although it is, er, slow). The still unresolved issues:

1. Size filtering doesn't work (I might just take that out: filtering on Line doesn't work).
2. Filtering on Subject/Date/Author doesn't work if HEAD is used.
3. I don't know if I have any string leakage.
4. Code needs more documentation
5. UI progress meter doesn't quite reflect what is going on here properly.
Comment 41 Joshua Cranmer [:jcranmer] 2007-10-13 14:40:36 PDT
Created attachment 284787 [details] [diff] [review]
Latest patch

Fixed issues:
1. Size works properly now
2. HEAD can now deal with standard headers better

Unresolved issues:
1. Am I leaking anything ?
2. Documentation is still lacking, although improving
3. HEAD isn't up to XOVER par yet:
   a) Status bar updates prematurely if XHDR isn't supported, not at all if XOVER isn't supported.
   b) Unread counts won't work properly if XOVER isn't supported.
Comment 42 Joshua Cranmer [:jcranmer] 2007-10-14 14:52:55 PDT
Created attachment 284881 [details] [diff] [review]
Version 5 patch

My, how the patches grow in size. :-)
Fixed:
1. Improved communication between nsNewsgroupList/nsNNTPProtocol. Clamped down on the possible leakage points, have some memory savings to boot, and ripped out a state as well.
2. UI for XOVER/XHDR reflects what is going on more accurately.

Unresolved:
1. HEAD still needs massive UI updates.
2. The modified UI is going to need some l10n.
3. Documentation is still lacking.
4. Some methods are going to need to be renamed.
5. Am I leaking ???
Comment 43 Nelson Bolyard (seldom reads bugmail) 2007-10-16 02:05:03 PDT
Joshua, I wonder if your patch will have any effect on bug 387337
Comment 44 Joshua Cranmer [:jcranmer] 2007-10-16 16:55:03 PDT
Created attachment 285171 [details] [diff] [review]
Final patch, version 1.0

First fully-working patch, including full HEAD support.

In reply to comment 43:
It shouldn't: the only thing this has touched with respect to the standard XOVER headers is when the filters get applied.
Comment 45 Nicolas Alvarez 2007-10-28 16:58:59 PDT
Latest NNTP spec (RFC 3977) renamed XOVER to OVER and XHDR to HDR, since it's now part of the core protocol instead of an extension. You may want to try both. As far as I know, the syntax for the command arguments and reply remain totally compatible.
Comment 46 Nicolas Alvarez 2007-10-28 17:08:12 PDT
Also, LIST HEADERS (if supported by the server) returns a list of fields that may be retrieved using the HDR command. May be useful so it knows whether to filter server-side with HDR or grab all with HEAD and filter locally, depending on the headers the user wants to filter, without having to "try it and see" (do the HDR command and handle failure).
Comment 47 David :Bienvenu 2007-11-07 15:53:12 PST
Comment on attachment 285171 [details] [diff] [review]
Final patch, version 1.0

Thx for doing this, Joshua.

the frozen linkage strings don't support the + operator, as I understand it, so code like this needs to get rewritten without the '+' operator.

+    nsCString fullHeaders;
+    if (!(author.IsEmpty()))
+      fullHeaders += NS_LITERAL_CSTRING(FROM_HEADER) + author + NULL_CHAR;
+

+  for (PRInt32 i=1; i<m_filterHeaders.Count(); i++)
+  {
+    if (m_filterHeaders[i]->Equals(m_filterHeaders[i-1]->get()))
+      m_filterHeaders.RemoveCStringAt(i--);
+    else
+      allHeaders += NS_LITERAL_CSTRING(" ") + *(m_filterHeaders[i]);

Do you need to RemoveCStringAt? Couldn't you just decrement i, and simply not add the header to allHeaders? Or just use two separate CStringArrays, add all ngHeaders to allHeaders, and then only add elements of servHeader that can't be found in ngHeaders, using IndexOf : http://mxr.mozilla.org/seamonkey/source/xpcom/glue/nsVoidArray.cpp#992



If you can fix those and attach a new patch, I'll try running with it.

a few nits:

I don't like the extra braces here:
+  if (LL_CMP(elapsedTime, >, MIN_STATUS_UPDATE_INTERVAL))
+  {
+    UpdateStatus(PR_TRUE, numDownloaded, totalToDownload);
+  }

+  while (*line == ' ') line++;

I prefer the line++ to be on it's own line :-) makes the code easier to read...
Comment 48 Joshua Cranmer [:jcranmer] 2007-11-09 17:17:08 PST
Created attachment 288078 [details] [diff] [review]
Version 2 patch

Fixed everything you pointed out.
Comment 49 David :Bienvenu 2007-11-13 14:28:11 PST
Comment on attachment 288078 [details] [diff] [review]
Version 2 patch

No need to copy the string here, and unless I'm missing something, this leaks as well:

+  line = PL_strdup(line);
+  rv = header->SetStringProperty(m_filterHeaders[m_currentHeader]->get(), line);
+  NS_ENSURE_SUCCESS(rv, rv);

+  for (PRInt32 i=1; i<servArray.Count(); i++)
+  {
+    if (m_filterHeaders.IndexOf(*(servArray[i])) == -1)

I think kNotFound is better than -1.


+  value = PL_strdup(value);
+
+  char *temp = header;
+  while (*temp)
+  {
+    // We need to match the headers in lowercase
+    // Since 33 <= *temp <= 126, no need for i18n here!
+    if ('A' <= *temp && *temp <= 'Z')
+      *temp |= 0x20;
+    ++temp;
+  }
+
+  m_lastHeader = header;
+  m_thisLine = value;

this leaks value. There's no need to strdup value. m_thisLine/nsCString will make a copy anyway. You might audit your patch for that kind of error - I think you're doing the same thing with m_lastHeader.

+  char *header;
+  m_newsgroupList->InitXHDR(&header);
+  if (!header)
+  {
+    m_nextState = NNTP_FIGURE_NEXT_CHUNK;
+    return 0;
+  }
+  char outputBuffer[OUTPUT_BUFFER_SIZE];
+  PRInt32 status = 0;
+
+  PR_snprintf(outputBuffer, OUTPUT_BUFFER_SIZE,
+        "XHDR %s %d-%d" CRLF,
+	header, m_firstArticle, m_lastArticle);
+  NNTP_LOG_WRITE(outputBuffer);
+  PL_strfree(header);

Here, you could make header an nsCString, pass it in by reference to InitXHDR, and avoid the PL_strfree call, and the strdups in InitXHDR.

+    if (!(author.IsEmpty()))
+      fullHeaders += NS_LITERAL_CSTRING(FROM_HEADER) + author + NULL_CHAR;
+
+    if (!(subject.IsEmpty()))
+      fullHeaders += NS_LITERAL_CSTRING(SUBJECT_HEADER) + subject + NULL_CHAR;
+
+    for (PRInt32 header = 0; header < m_filterHeaders.Count(); header++)
+    {
+      char *retValue;
+      m_newMsgHdr->GetStringProperty(m_filterHeaders[header]->get(), &retValue);
+      if (*retValue != 0)
+      {
+        fullHeaders += *(m_filterHeaders[header]) + NS_LITERAL_CSTRING(": ");
+        fullHeaders += nsCString(retValue) + NULL_CHAR;
+      }

these still use the + operator - I don't think that will work with frozen linkages. Please see http://developer.mozilla.org/en/docs/Migrating_from_Internal_Linkage_to_Frozen_Linkage - you can use += but not +
Comment 50 David :Bienvenu 2007-11-13 14:49:46 PST
Comment on attachment 288078 [details] [diff] [review]
Version 2 patch

+# The differences between these two is that the second one is shown when processing XHDR commands

should this be just "The difference between..."?

minusing based on prev comments...
Comment 51 Joshua Cranmer [:jcranmer] 2007-11-14 18:35:19 PST
Created attachment 288782 [details] [diff] [review]
Version 3 patch

Removed most of the char * code except where unfeasible.
Comment 52 David :Bienvenu 2007-11-21 12:08:31 PST
Joshua, thx for the patch, sorry for the delay.

a few nits:

+  rv = m_newsDB->AddNewHdrToDB(newMsgHdr, PR_TRUE);
+  return rv;

just return directly w/o setting rv...

just remove the commented out code:
+  /*const char * token = line;
+  char * next;
+  next = (line ? PL_strchr (line, ' ') : 0);
+  if (next)
+    *next++ = 0;
+  else
+    return NS_OK;
+  line = next;*/


don't need the braces here - you could move the comment about the if() clause, and save a couple lines.
+  if (key.CharAt(0) < '0' || key.CharAt(0) > '9')
+  {
+    // According to RFC 2980, some will send (none) instead.
+    // So we don't treat this is an error.
+    return NS_OK;
+  }

// remove this commented out code:
+  //while (*line == ' ')
+  //  line++;


+  nsCOMPtr <nsIMsgDBHdr> header;
+  nsresult rv;
+  rv = m_newsDB->GetMsgHdrForKey(number, getter_AddRefs(header));

do nsresult rv = m_newsDB...

+  //char * value = PL_strchr(line, ':');
+  PRInt32 colon = line.FindChar(':');

more commented out code to remove - actually, can you just remove all the commented out code? If you want to replace the commented out code with comments about what the code is doing, even better :-)

should just be 

if (*retValue)

+      if (*retValue != 0)
+      {
+        fullHeaders += *(m_filterHeaders[header]);
+        fullHeaders.AppendLiteral(": ");
+        fullHeaders += nsCString(retValue);
+        fullHeaders += NULL_CHAR;
+      }

we prefer declaring rv closer to where it's used:

+  nsresult rv;
+  double ratio = filtering ? (m_currentHeader+1.0)/(m_filterHeaders.Count()+1)
+                           : 1.0/(m_filterHeaders.Count()+1);
+  PRInt32 percent = (PRInt32)(ratio * numDLed / totToDL);
+  
+  nsAutoString numDownloadedStr;
+  numDownloadedStr.AppendInt(numDLed);
+
+  nsAutoString totalToDownloadStr;
+  totalToDownloadStr.AppendInt(totToDL);
+
+  nsString statusString;
+  nsCOMPtr<nsIStringBundleService> bundleService = do_GetService(NS_STRINGBUNDLE_CONTRACTID, &rv);


indentation looks funny here:
+
+    /* almost correct
+  */
+  if(status > 1)


the var name m_currentHeader is a bit misleading, since it's not actually a header, but a count or index... maybe call it m_currentXHDRIndex?

Comment 53 Joshua Cranmer [:jcranmer] 2007-11-28 15:31:44 PST
Created attachment 290607 [details] [diff] [review]
Version 4 patch

Updated to take account of bienvenu's nits.

Sorry about all of the comments, I had left that code there to verify that the char * code was identical in function the nsCString code and forgot to delete the comments before posting the patch...
Comment 54 David :Bienvenu 2007-12-05 10:31:37 PST
I'm rebuilding with this patch now.
Comment 55 David :Bienvenu 2007-12-05 13:14:05 PST
Comment on attachment 290607 [details] [diff] [review]
Version 4 patch

I tried this patch. It works for incoming new newsgroup messages - Yay! But it breaks in online search of a newsgroup - i.e., edit | find | search messages, and then pick something other than subject or sender from the drop down. We try to construct a search term we can send to the server, but fail. If I execute the search while offline, it does work, which is nice. I don't remember if we have a different validity manager for offline news searches, so that you can enable that ui while offline, but not while online...probably not.
Comment 56 Nelson Bolyard (seldom reads bugmail) 2007-12-27 22:16:34 PST
David,
I've used a lot of news servers over the years, but I've never used 
one that supported online searches.

If I had to choose between improved filtration and online searches, 
I'd choose improved filtration every time.  I'd choose improved filtering
even if it caused a regression in online searches.  Let's get this feature
going forward ASAP.
Comment 57 Joshua Cranmer [:jcranmer] 2007-12-31 10:37:59 PST
Created attachment 294972 [details] [diff] [review]
Now supports local searching

I had this patch as of 12/07, but actually posting it slipped my mind. It seems that news.mozilla.org could use with some more functionality for testing...
Comment 58 David :Bienvenu 2008-01-03 11:26:51 PST
Nelson, we don't want to add UI that doesn't work - it's better to get the code right before landing it because it can be harder to get things fixed after landing.

Thx, Joshua, I'll try to get to running with this patch soon.
Comment 59 Nelson Bolyard (seldom reads bugmail) 2008-01-03 13:17:21 PST
The volume of "sporge" and other crud in Usenet newsgroups has increased 
several fold in the last 12 months.  Many Usenet newsgroups are now unusable 
with Thunderbird and SeaMonkey because those products cannot adequately 
filter out the crud.  Other products are able to adequately filter it.

The ability to filter on other headers than From and Subject will help a
lot, but I think we will also need regular expression filtering to really
get on top of this problem.  
Comment 60 David :Bienvenu 2008-01-12 20:03:36 PST
Joshua, I'm building with your patch, but for future patches, it would be nice to have cvs diffs so I can apply the patch more easily ;-)
Comment 61 David :Bienvenu 2008-01-13 16:21:36 PST
Comment on attachment 294972 [details] [diff] [review]
Now supports local searching

thx, Joshua. A few nits before you ask for sr (maybe from Neil?):

+  // servArray may have duplicates already in m_filterHeaders.
+  for (PRInt32 i=1; i<servArray.Count(); i++)

can you put spaces around the '=' and '<'?

When we go to frozen linkage, apparently we can't use kNotFound (which I find highly annoying) so you could make the life of the person that makes news use frozen linkage by replacing kNotFound with -1.

Could you fix the braces style here:?

+  if (percent != m_lastPercent) {
+    SetProgressBarPercent(percent);
+    m_lastPercent = percent;
+  }

remove the commented out line:
+  //m_headerIndex = 0;
+
Comment 62 Rich Gray (:rbgray) 2008-01-27 20:06:42 PST
Tried it out in Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9b3pre) Gecko/2008012720 SeaMonkey/2.0a1pre, accessing SuperNews but it did not seem to work.  Joshua and I spent a long remote gdb session without getting anywhere.   Will try to respin without optimization and go at it again.
Comment 63 Rich Gray (:rbgray) 2008-01-28 21:43:33 PST
By monitoring the NNTP traffic with WireShark (formerly Ethereal), I discovered that SM was only making XHDR requests for the second and any additional headers, but not the first.  (Which of course was *totally* useless if one was only filtering on a single extra header as I was last night...)  In IRC, Joshua suggested I change the `1' in nsNNTPNewsgroupList::Initialize to a `0'.  (That fragment is above in comment #61.)   And the change fixed it.   I saw the requested XHDRs come down the wire and filter hits on the matches.  YAY!
Comment 64 neil@parkwaycc.co.uk 2008-01-29 06:48:19 PST
Comment on attachment 294972 [details] [diff] [review]
Now supports local searching

>+  for (PRInt32 i=1; i<servArray.Count(); i++)
You already know this should be 0, but I would like spaces around = and <

>+  nsACString::iterator temp, end;
>+  header.BeginWriting(temp);
>+  header.EndWriting(end);
>+  while (temp != end)
>+  {
>+    // We need to match the headers in lowercase
>+    // Since 33 <= *temp <= 126, no need for i18n here!
>+    if ('A' <= *temp && *temp <= 'Z')
>+      *temp |= 0x20;
>+    ++temp;
>+  }
I guess you copied this, but I think you can use header.ToLowerCase() here.

>+    char * strippedId = PL_strdup(value);
>+    char * freeable = strippedId;
>+    if (strippedId[0] == '<')
>+      strippedId++;
>+    char * lastChar = strippedId + PL_strlen(strippedId) -1;
>+
>+    if (*lastChar == '>')
>+      *lastChar = '\0';
>+
>+    rv = m_newMsgHdr->SetMessageId(strippedId);
>+    PL_strfree(freeable);
-1 does not look like you're subtracting 1.
(Eww, bogus string code, SetMessageId needs to take a const nsACString&)

>+  NS_NAMED_LITERAL_CSTRING(NULL_CHAR, "\0");
Eww, overkill, try #define NULL_CHAR '\0'

>+    const PRUnichar *formatStrings[3] = { NS_ConvertUTF8toUTF16(m_filterHeaders[m_currentXHDRIndex]->get()).get(),
Can't use the result of .get() after this line, it's unsafe. Instead
NS_ConvertUTF8toUTF16 header(m_filterHeaders[m_currentXHDRIndex]);
and use header.get(), sr=me with this fixed.
Comment 65 neil@parkwaycc.co.uk 2008-01-29 07:02:13 PST
Comment on attachment 294972 [details] [diff] [review]
Now supports local searching

>--- /cvsroot/mozilla/mail/locales/en-US/chrome/messenger/news.properties
>+++ /cvsroot/mozilla/mail/locales/en-US/chrome/messenger/news.properties
Any chance you can patch /cvsroot/mozilla/suite/locales/en-US/chrome/messenger.newsproperties too?

It looks like you save all the news headers in the database. Was this intentional? Normally we only save the ones in the customDBHeeaders pref.
Comment 66 Joshua Cranmer [:jcranmer] 2008-02-05 15:32:17 PST
Created attachment 301592 [details] [diff] [review]
Patch with all nits fixed.

This is the latest version of my patch, fixing the awkward bug I missed (cited above), and now back in a CVS diff format. It also adds in the file for properties in suite, a few nits fixed, and doesn't add all headers to the message DB.

r=bienvenu carried over
sr not carried over per Neil's request
Comment 67 neil@parkwaycc.co.uk 2008-02-06 06:15:09 PST
Comment on attachment 301592 [details] [diff] [review]
Patch with all nits fixed.

>+  else if (m_filterHeaders.IndexOf(nsCString(header)) >= 0)
I think that does a copy; instead I'd write
m_filterHeaders.IndexOf(nsDependentCString(header)) != -1

>+    NS_ConvertUTF8toUTF16 header(m_filterHeaders[m_currentXHDRIndex]->get());
Don't need ->get(); just * will do.
Comment 68 Joshua Cranmer [:jcranmer] 2008-02-06 16:09:27 PST
Created attachment 301774 [details] [diff] [review]
Patch to check in

This is the patch fixing Neil's last two nits.
Comment 69 Reed Loden [:reed] (use needinfo?) 2008-02-06 21:34:20 PST
Checking in mailnews/news/public/nsINNTPNewsgroupList.idl;
/cvsroot/mozilla/mailnews/news/public/nsINNTPNewsgroupList.idl,v  <--  nsINNTPNewsgroupList.idl
new revision: 1.22; previous revision: 1.21
done
Checking in mailnews/news/src/nsNNTPNewsgroupList.cpp;
/cvsroot/mozilla/mailnews/news/src/nsNNTPNewsgroupList.cpp,v  <--  nsNNTPNewsgroupList.cpp
new revision: 1.138; previous revision: 1.137
done
Checking in mailnews/news/src/nsNNTPNewsgroupList.h;
/cvsroot/mozilla/mailnews/news/src/nsNNTPNewsgroupList.h,v  <--  nsNNTPNewsgroupList.h
new revision: 1.26; previous revision: 1.25
done
Checking in mailnews/news/src/nsNNTPProtocol.cpp;
/cvsroot/mozilla/mailnews/news/src/nsNNTPProtocol.cpp,v  <--  nsNNTPProtocol.cpp
new revision: 1.396; previous revision: 1.395
done
Checking in mailnews/news/src/nsNNTPProtocol.h;
/cvsroot/mozilla/mailnews/news/src/nsNNTPProtocol.h,v  <--  nsNNTPProtocol.h
new revision: 1.105; previous revision: 1.104
done
Checking in mail/locales/en-US/chrome/messenger/news.properties;
/cvsroot/mozilla/mail/locales/en-US/chrome/messenger/news.properties,v  <--  news.properties
new revision: 1.4; previous revision: 1.3
done
Checking in suite/locales/en-US/chrome/mailnews/news.properties;
/cvsroot/mozilla/suite/locales/en-US/chrome/mailnews/news.properties,v  <--  news.properties
new revision: 1.26; previous revision: 1.25
done
Comment 70 Reed Loden [:reed] (use needinfo?) 2008-02-07 00:25:17 PST
I somehow missed two files. oops.

Checking in mailnews/base/search/src/nsMsgSearchAdapter.cpp;
/cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchAdapter.cpp,v  <--  nsMsgSearchAdapter.cpp
new revision: 1.96; previous revision: 1.95
done
Checking in mailnews/base/search/src/nsMsgSearchNews.cpp;
/cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchNews.cpp,v  <--  nsMsgSearchNews.cpp
new revision: 1.44; previous revision: 1.43
done
Comment 71 Nelson Bolyard (seldom reads bugmail) 2008-02-07 13:05:57 PST
I eagerly downloaded and tried a nightly trunk build.
Happily, the news message filters now let me define filters base on headers
other thah From and Subject, such as Path and NNTP-Posting-Host.
But sadly, when I tried to use them, I got this alert dialog

    A News (NNTP) error occurred: xpat not supported

Is this expected?  
Will this new feature do me no good because giganews doesn't support xpat?
Comment 72 Chris Ilias [:cilias] 2008-02-07 14:45:06 PST
I tried it earlier today on Mac 10.5.1:
http://groups.google.com/group/mozilla.test/browse_frm/thread/597795ec228eb524

Filter was based on Complaints-To.
If [Complaints-To] is equal to [groups-abuse@google.com]
[add star]

It worked fine for me. Are you sure you pulled up the filter window, and not the search window? :-)
Comment 73 Nelson Bolyard (seldom reads bugmail) 2008-02-07 18:16:12 PST
Here's a problem that I think will be common to many users trying to use 
this new feature.
1. They visit a newsgroup, with few or no filters in place.
2. The news reader downloads a large number of message headers, including 
a very large number (hundreds or even thousands) of junk messages.  
3. The user studies some of the junk messages, and concludes that one or 
two simple additional filters would eliminate all the junk, such as filtering
out a certain NNTP-POSTING-HOST or a certain server name in the PATH.
4. The user creates filters for those things.

At this point, the user is stuck.  He has no way to apply those filters to 
the thousands of junk headers downloaded in step 2.  He can take a little
comfort in knowing that if the same spammer spams the group again from the 
same source, his filters will catch that, but that doesn't help him with 
all the spam he now sees in the newsgroup.  

An attempt to search for all the messages in the group that meet the new 
filter criteria doesn't have the desired effect either.  (I thought that 
search might search through the headers already downloaded, but no joy.)

So, here I am, with thousands of junk messages in my newsgroups, and no way
to get rid of them, despite all this new filter capability.  :(
Comment 74 Ronald Killmer 2008-02-07 18:32:36 PST
(In reply to comment #73)
> Here's a problem that I think will be common to many users trying to use 
> this new feature.
> 1. They visit a newsgroup, with few or no filters in place.
> 2. The news reader downloads a large number of message headers, including 
> a very large number (hundreds or even thousands) of junk messages.  

If these a novice Tb or Sm users, then they are likely to have the 500 header cap still set in the account setting preference.

> 3. 
> 4. 
> 
> At this point, the user is stuck.  He has no way to apply those filters to 
> the thousands of junk headers downloaded in step 2.  He can take a little
> comfort in knowing that if the same spammer spams the group again from the 
> same source, his filters will catch that, but that doesn't help him with 
> all the spam he now sees in the newsgroup.  
> 
> An attempt to search for all the messages in the group that meet the new 
> filter criteria doesn't have the desired effect either.  (I thought that 
> search might search through the headers already downloaded, but no joy.)
> 
> So, here I am, with thousands of junk messages in my newsgroups, and no way
> to get rid of them, despite all this new filter capability.  :(
> 

Bugs are filed for manual activation of filtering.  This bug's fix just highlights the need for a menu choice to activate filtering of the headers list.
See Bug https://bugzilla.mozilla.org/show_bug.cgi?id=178870
Comment 75 Nicolas Alvarez 2008-02-07 18:36:31 PST
9 years waiting for a fix that isn't useful in the most common case?
Comment 76 Chris Ilias [:cilias] 2008-02-07 20:48:02 PST
Please, if your comments are not helpful in fixing/verifying the bug, it doesn't belong in bugzilla. If you're sick of waiting for bug different than this one, stop waiting, and start coding.
Please read <https://bugzilla.mozilla.org/page.cgi?id=etiquette.html>.
Comment 77 Nelson Bolyard (seldom reads bugmail) 2008-02-09 19:23:52 PST
Additional observations about the effects of this patch:

1. When filter rules that use previously unavailable headers cause messages
to be marked read, with the result that the entire thread is read, the thread
still shows up in the message list pane, even when you've selected viewing 
only "threads with unread".  This is a difference between threads marked read
by the new filters vs threads marked read by the old ones, and it seems like
a bug.

2. When visiting a newsgroup that has any of these new filters in place, 
filters that use previously unavailable headers (is there a name for such 
filters?) the filtration rate is MUCH slower.  With the old filters, it could
easily filter 500 messages per second.  With the new filters, it's about 2-3
at best.  For newsgroups that gets LOTS of new messages daily (e.g. sci.crypt)
this can take MINUTES to filter the messages now.)  </whine> 
Comment 78 Dan Mosedale (:dmose) 2008-04-29 18:38:06 PDT
Reopening, as this patch has been backed out; see bug 424607 for details.
Comment 79 Joshua Cranmer [:jcranmer] 2008-06-23 19:52:26 PDT
Created attachment 326427 [details] [diff] [review]
Better patch, WIP

Coming back to this after a few months, I've added some stuff.

Most notable additions:
* Cleaned up some of the protocol stuff to be a little neater.
* Added test_filter.js to test the filters...
* ... fixed nntpd.js to actually do group stuff right ...
* ... got around to actually starting work on an RFC 2980 handler ...
* ... fixed up head_server_setup to handle stuff a tad better ...
* ... made test_server.js a little less fragile ...
* ... and added several messages to test for posting.

Still to come:
* There's a bug in my XHDR handling. See if you can spot it :-)
** A need a test to not hit this bug.
* There's a bigger what-may-or-may-not-be-a-bug in HEAD handling. It may actually be a big in nsMsgKeySet itself. Hacked around it for testing purposes for now.
* I need a Giganews-style server.
* Test-filters only verifies that the headers can be found.

Possibly, mailnews/test/resources may want the filter stuff instead of having news hog it all.
Comment 80 Joshua Cranmer [:jcranmer] 2008-06-23 19:53:31 PDT
I can manage my own bugs, I swear.
Comment 81 Joshua Cranmer [:jcranmer] 2008-06-27 13:51:10 PDT
Created attachment 327170 [details] [diff] [review]
Brand new patch, with tests

Did I mention I have tests on this one?

Some differences from old patch:
* I modified the IDL of nsNNTPNewsgroupList.cpp a tad to suit my needs better
* We filter before adding to the database. That resolves the regressions I can reproduce on bug 424607, and (if there are any more regressions) should resolve them as well.
* Tests. test_filter comprehensively tests filters.
* Fakeserver improvements.
* I fixed (hopefully) a problem with HEAD in that it looks like HEAD-only would cause some unread count problems that are hopefully fixed by this patch.
* Removed invalid assertion in db/msgdb with respect to marking headers read via filters.

By the way, the tests will only pass if bug 419143 is fixed.
Comment 82 Alfred Peters 2008-06-29 08:20:39 PDT
When I insert attachment (id=327170) in my TB source, I have problems subscribing to new newsgroups.

TB asks the the server for the XOVER, but then reports:
  "There are no new messages on the server."
and displays an empty newsgroup.

After undoing the Patch, the Problem is gone.
Comment 83 Joshua Cranmer [:jcranmer] 2008-06-30 06:46:52 PDT
Created attachment 327397 [details] [diff] [review]
This one works without filters!

As the title says... when I changed from adding before filters to adding after filters, I forgot to remove the "don't call filters if we don't have any" optimization.
Comment 84 David :Bienvenu 2008-07-03 13:29:23 PDT
Comment on attachment 327397 [details] [diff] [review]
This one works without filters!

Thx for doing this - it's a lot of work!

Maybe it would be clearer to say that one is used for XOVER and one for XHDR?
+# The difference between these two is that the second one is shown when processing XHDR commands
 downloadingHeaders=Downloading %S of %S headers
+downloadingFilterHeaders=Getting filter headers: %S (%S/%S)


Why this additional check?

+    
+    PRBool inDB;
+    (void)ContainsKey(msgKey, &inDB);
+
+    if (inDB)

Are we now trying to mark hdrs read that aren't in a db? Why would we still call MarkHdrReadInDB?

+  void processXHDRLine(in ACString line);
+
+  void initHEAD(in long message);
+  void processHEADLine(in ACString line);
+  void HEADFailed(in long message);

these should be aMessage, aLine, etc.

+  ++m_currentXHDRIndex;
+  if (m_currentXHDRIndex >= m_filterHeaders.Count())
+    header.Truncate();

Should move the ++ to the if statement, if (++currentXHDRIndex ..)

I've heard kNotFound is not available with frozen linkages :-( so we use -1 instead:

We don't need to use these 64 bit macros anymore:

+    LL_SUB(elapsedTime, PR_Now(), m_lastStatusUpdate);
+
+    if (LL_CMP(elapsedTime, >, MIN_STATUS_UPDATE_INTERVAL) ||




+  if (middle == kNotFound)
+    return NS_OK;


The two returns here could be avoided by moving the if inside the else clause - does that look nicer?

+  else if (line.CharAt(0) == ' ' || line.CharAt(0) == '\t') // We are continuing the header
+  {
+    m_thisLine += header; // Preserve whitespace (should we?)
+    return NS_OK;
+  }
+  else
+    return NS_OK; // We are malformed. Just ignore and hope for the best...

if any clause has brackets, we tend to make all of them have brackets, even if it's just a single line clause - it's not my favorite rule, but it's good to be consistent.
+  if (PL_strcmp(header, "from") == 0)
+    rv = m_newMsgHdr->SetAuthor(value);
+  else if (PL_strcmp(header, "date") == 0)
+  {

I believe the comment about agreeing with newsrc needs to move down a line
+    if (NS_MsgStripRE(&subject, &subjectLen, getter_Copies(modifiedSubject)))
+      // this will make sure read flags agree with newsrc
+     (void) m_newMsgHdr->OrFlags(MSG_FLAG_HAS_RE, &flags);
+
+    if (! (flags & MSG_FLAG_READ))
+      rv = m_newMsgHdr->OrFlags(MSG_FLAG_NEW, &flags);

nsMsgHdr::SetMessageId handles stripping the <>, so you don't have to:

+  else if (PL_strcmp(header, "message-id") == 0)
+  {
+    char * strippedId = PL_strdup(value);
+    char * freeable = strippedId;
+    if (strippedId[0] == '<')
+      strippedId++;
+    char * lastChar = strippedId + PL_strlen(strippedId) - 1;
+
+    if (*lastChar == '>')
+      *lastChar = '\0';
+
+    rv = m_newMsgHdr->SetMessageId(strippedId);
+    PL_strfree(freeable);
+  }

i < count here:

+  for (PRUint32 i = 0; i<count; i++)

we tend to prefer !filterCount, !serverFilterCount:

+    if (filterCount == 0 && serverFilterCount == 0)
+    {
+      m_newsDB->AddNewHdrToDB(m_newHeaders[i], PR_TRUE);
+      continue;

This leaks. Better to use an nsCString and getter_Copies:

+      char *retValue;
+      m_newMsgHdr->GetStringProperty(m_filterHeaders[header]->get(), &retValue);


No need to use a double here. Elsewhere, we just multiply the numerator by 100 and then divide by the denominator to get a %.

+  double ratio = filtering ? (m_currentXHDRIndex+1.0)/(m_filterHeaders.Count()+1)
+                           : 1.0/(m_filterHeaders.Count()+1);
+  PRInt32 percent = (PRInt32)(ratio * numDLed / totToDL);
+  

You can tighten this up a little. After the QI, just do

return (mailnewsurl) ? SendData(mailnewsurl, outputBuffer) : 0;

+  PRInt32 status = 0;
+  nsCOMPtr<nsIMsgMailNewsUrl> mailnewsurl = do_QueryInterface(m_runningURL);
+  if (mailnewsurl)
+    status = SendData(mailnewsurl, outputBuffer);
+
+  return status;

I haven't stared at the test code very much yet, but:

don't need the braces here:
+     var response = info[1]+'\n';
+     for (var header in info[0].headers) {
+       response += header + ": " + info[0].headers[header] + "\n";
      }
Comment 85 Robert Kaiser 2008-07-03 14:47:22 PDT
(In reply to comment #84)
> Maybe it would be clearer to say that one is used for XOVER and one for XHDR?
> +# The difference between these two is that the second one is shown when
> processing XHDR commands
>  downloadingHeaders=Downloading %S of %S headers
> +downloadingFilterHeaders=Getting filter headers: %S (%S/%S)

If you're changing this comment, please take into account that localizers don't know what either of XOVER or XHDR is. And maybe even go as far as using the syntax used elsewhere for "LOCALIZATION NOTE" (search MXR for it) so that L10n tools might be able to parse it as belonging to those values and display this accordingly in their UI (not that I currently know available tools that actually do it).
Comment 86 David :Bienvenu 2008-07-03 14:51:56 PDT
Robert's quite right. And maybe the second string should be something like "Getting headers for filters:
Comment 87 David :Bienvenu 2008-07-16 14:21:30 PDT
I looked through the tests. They look OK, but they could use some comments!

Maybe some comments about what flags you're checking here (kill and watch)?:

+  "3@regular.invalid" : function (header, folder) {
+    var db = folder.getMsgDatabase(null);
+    var flags = db.GetThreadContainingMsgHdr(header).flags;
+    return (flags & 0x40000) == 0x40000;
+  },
+  "4@regular.invalid" : function (header, folder) {
+    var db = folder.getMsgDatabase(null);
+    var flags = db.GetThreadContainingMsgHdr(header).flags;
+    return (flags & 0x100) == 0x100;

"run the folder" could be clearer - do you mean "get headers from the fake server for the group"? And/Or run filters on the new headers? Or perhaps just "Update the folder"?
+  // Run the folder
+  var folder = localserver.rootFolder.getChildNamed("test.filter");
+  folder.getNewMessages(null, {
Comment 88 Joshua Cranmer [:jcranmer] 2008-07-17 11:56:09 PDT
Created attachment 330078 [details] [diff] [review]
Updated patch

Updated. Also noticed that I had a few stuff added in to debug some of the fakeserver stuff, which was excised.
Comment 89 David :Bienvenu 2008-07-17 17:28:53 PDT
Comment on attachment 330078 [details] [diff] [review]
Updated patch

+NS_IMETHODIMP
+nsNNTPNewsgroupList::InitHEAD(PRInt32 number)

+nsNNTPNewsgroupList::ProcessHEADLine(const nsACString &line)

etc...

all these implementations should also use the "aNumber" convention, not "number"

You don't need to add a null terminator to nsCStrings - sorry, didn't notice that before:

+      fullHeaders += '\0';
+    }
+
+    if (!(subject.IsEmpty()))
+    {
+      fullHeaders.AppendLiteral(SUBJECT_HEADER);
+      fullHeaders += subject;
+      fullHeaders += '\0';
+

sr=me, with those addressed.
Comment 90 Mark Banner (:standard8) 2008-07-22 03:23:40 PDT
Comment on attachment 330078 [details] [diff] [review]
Updated patch

I've only taken a quick look at the test code, but I think it looks ok.
Comment 91 Joshua Cranmer [:jcranmer] 2008-07-22 14:18:29 PDT
Comment on attachment 330078 [details] [diff] [review]
Updated patch

Since the patch adds new tests, it's been bitrotted by the hg move. I'll get a fixed version up shortly.
Comment 92 neil@parkwaycc.co.uk 2008-07-25 13:04:48 PDT
Comment on attachment 330078 [details] [diff] [review]
Updated patch

>+NS_IMETHODIMP
>+nsNNTPNewsgroupList::InitXHDR(nsACString &header)
>+{
>+  if (++m_currentXHDRIndex >= m_filterHeaders.Count())
>+    header.Truncate();
>+  else
>+    header.Assign(*m_filterHeaders[m_currentXHDRIndex]);
>+  // Don't include these in our XHDR bouts, as they are already provided through
>+  // XOVER. 
>+  if (header.EqualsLiteral("message-id") ||
>+      header.EqualsLiteral("references"))
>+    return InitXHDR(header);
>+  return NS_OK;
>+}
This is not obviously a loop. Please try to make it more obvious.
Comment 93 Joshua Cranmer [:jcranmer] 2008-07-28 16:40:11 PDT
Pushed onto comm-central, changeset 23da5cab793a.
Comment 94 MickK 2008-08-19 19:25:32 PDT
Any chance of getting a "Doesn't Contain" option added?  It would be useful in filtering out forgeries.
Comment 95 Magnus Melin 2008-09-08 08:24:51 PDT
That's not this bug, but bug 146676.
Comment 96 Joshua Cranmer [:jcranmer] 2008-09-11 14:54:00 PDT
*** Bug 454873 has been marked as a duplicate of this bug. ***
Comment 97 Julien ÉLIE 2008-09-13 06:31:48 PDT
(In reply to comment #45)
> Latest NNTP spec (RFC 3977) renamed XOVER to OVER and XHDR to HDR, since it's
> now part of the core protocol instead of an extension. You may want to try
> both. As far as I know, the syntax for the command arguments and reply remain
> totally compatible.

No, they are not the same commands.  HDR for instance returns one line per article whereas XHDR returns only matching articles.
And XOVER replies are sometimes different (especially when there is no match too).
Comment 98 Julien ÉLIE 2008-09-13 06:37:49 PDT
By the way, why not try to use CAPABILITIES?  If it does not answer 500 and advertises OVER or HDR, use them; otherwise, use XOVER and XHDR.  Then HEAD if nothing is available.
Comment 99 Joshua Cranmer [:jcranmer] 2008-09-13 07:07:34 PDT
(In reply to comment #98)
> By the way, why not try to use CAPABILITIES?  If it does not answer 500 and
> advertises OVER or HDR, use them; otherwise, use XOVER and XHDR.  Then HEAD if
> nothing is available.

There is a bug on that, but it's low priority since the number of major servers supporting CAPABILITIES or OVER, etc. is 0. See bug 244682.
Comment 100 neil@parkwaycc.co.uk 2008-09-15 04:33:07 PDT
(In reply to comment #92)
> This is not obviously a loop. Please try to make it more obvious.
Pretty please?
Comment 101 NoOp 2009-11-07 17:31:48 PST
I see the subject bug is "Filter news based on any headers", however the filters only work on the _first_ such header. For example; if you have multiple 'Reply-To', multiple 'Delivered-To' or 'Original-Received' headers, etc., you are unable to filter on the second (or third) header of the same type. Sample:

Delivered-To: mailing list users@openoffice.org
Delivered-To: moderator for users@openoffice.org

Only the first header is possible, the 'moderator' filter is ignored. This may be related to 
https://bugzilla.mozilla.org/show_bug.cgi?id=387337
[news server filters not working in SM 2.0a1]

Please reopen, or resolve 387337 - thanks.
Comment 102 Joshua Cranmer [:jcranmer] 2009-11-07 17:46:10 PST
(In reply to comment #101)
> I see the subject bug is "Filter news based on any headers", however the
> filters only work on the _first_ such header. For example; if you have multiple
> 'Reply-To', multiple 'Delivered-To' or 'Original-Received' headers, etc., you
> are unable to filter on the second (or third) header of the same type. Sample:

We filter on the headers that the server returns to us when we get XHDR results. Please file a bug with your news provider.
Comment 103 NoOp 2009-11-07 19:19:04 PST
So just to be clear, an nntp header such as the following is a news
provider issue? Note: not being confrontational, just wish to be clear regarding the issue so that I understand the limitations of the filters. The following header is public & is typical of a gmane.org header from the openoffice.org users list.
====
Newsgroups: gmane.comp.openoffice.questions
Subject: CANCELLATION
Date: Sat, 31 Oct 2009 21:02:04 -0500
Lines: 16
Approved: news@gmane.org
Message-ID: <000c01ca5a97$4f7bdec0$86f666ae@Janet>
Reply-To: users@openoffice.org
NNTP-Posting-Host: lo.gmane.org
Mime-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_NextPart_000_0009_01CA5A6D.662C39F0"
X-Trace: ger.gmane.org 1257103312 19582 80.91.229.12 (1 Nov 2009 19:21:52 GMT)
X-Complaints-To: usenet@ger.gmane.org
NNTP-Posting-Date: Sun, 1 Nov 2009 19:21:52 +0000 (UTC)
To: <users@openoffice.org>
Original-X-From: users-return-203102-opof-users=m.gmane.org@openoffice.org Sun Nov 01 20:21:45 2009
Return-path: <users-return-203102-opof-users=m.gmane.org@openoffice.org>
Envelope-to: opof-users@lo.gmane.org
Original-Received: from s006.sjc.collab.net ([204.16.104.2] helo=openoffice.org)
	by lo.gmane.org with smtp (Exim 4.50)
	id 1N4fzk-0000PT-0h
	for opof-users@lo.gmane.org; Sun, 01 Nov 2009 20:21:44 +0100
Original-Received: (qmail 10964 invoked by uid 5000); 1 Nov 2009 19:21:38 -0000
Mailing-List: contact users-help@openoffice.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
list-help: <mailto:users-help@openoffice.org>
list-unsubscribe: <mailto:users-unsubscribe@openoffice.org>
list-post: <mailto:users@openoffice.org>
Delivered-To: mailing list users@openoffice.org
Delivered-To: moderator for users@openoffice.org
Original-Received: (qmail 17574 invoked from network); 1 Nov 2009 02:02:11 -0000
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Av0DAH+I7EpHSjh7kGdsb2JhbACCJiyGUSaBF4MWgzCKLAEBAQEJCQwHEwNCr28IjhaCSQiBaAQ
X-IronPort-AV: E=Sophos;i="4.44,660,1249282800"; 
   d="scan'208,217";a="33595114"
X-IRONPORT: SCANNED
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.3598
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3350
X-Gmane-Expiry: 2009-11-15
Xref: news.gmane.org gmane.comp.openoffice.questions:201827
Archived-At: <http://permalink.gmane.org/gmane.comp.openoffice.questions/201827>
====

Note the 
Delivered-To: mailing list users@openoffice.org
Delivered-To: moderator for users@openoffice.org

Currently there is no way to filter on 'moderator' in the second 'Delivered-To' header. "Filter news based on any headers" does not seem to work.
Comment 104 Joshua Cranmer [:jcranmer] 2009-11-07 19:30:20 PST
(In reply to comment #103)
> So just to be clear, an nntp header such as the following is a news
> provider issue? Note: not being confrontational, just wish to be clear
> regarding the issue so that I understand the limitations of the filters. The
> following header is public & is typical of a gmane.org header from the
> openoffice.org users list.

This is what the news server tells us:
XHDR Delivered-To 201827
221 Delivered-To matches follow (art)
201827 mailing list users@openoffice.org
.

We only see that one, sole, single header. Indeed, the spec requires the news server to only show us that one header.
Comment 105 NoOp 2009-11-07 19:41:25 PST
Thanks & I'll follow your advice to continue to pursue this with openoffice.org (collabnet). Sorry for the added noise on the bug.
Comment 106 John David Galt 2009-11-08 10:50:29 PST
I beg to differ with comment #102.  If XHDR only returns the first header of each type, then filters should not be using XHDR.  They should be examining the entire contents of "View Message Source" up to the first blank line, and the problem is with Firefox.
Comment 107 Steuard Jensen 2009-11-08 13:01:02 PST
Personally, I'd be quite frustrated if I had to wait for Thunderbird to download the full text of every single article (that is, the full content of "View Message Source") before I could do any filtering.  As I see it, much of the purpose of filtering is to decide which messages to download in full, and that's what the XHDR news server interface is supposed to facilitate.  So count this as a vote in favor of the current approach.
Comment 108 NoOp 2009-11-08 18:42:13 PST
OK. Here's one where a list user has posted to the seamonkey support group:

====
Path: Xl.tags.giganews.com!border1.nntp.dca.giganews.com!nntp.giganews.com!local2.nntp.dca.giganews.com!nntp.mozilla.org!news.mozilla.org.POSTED!not-for-mail
NNTP-Posting-Date: Thu, 05 Nov 2009 21:13:11 -0600
Return-Path: <res075oh@gte.net>
X-Original-To: support-seamonkey@lists.mozilla.org
Delivered-To: support-seamonkey@lists.mozilla.org
X-Virus-Scanned: amavisd-new at mozilla.org
Date: Thu, 05 Nov 2009 22:13:30 -0500
From: James <snipped>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.4)
	Gecko/20091017 SeaMonkey/2.0
MIME-version: 1.0
To: support-seamonkey@lists.mozilla.org
Subject: WTF IS WRONG WITH YOU?
References: <mailman.65.1257451217.11662.support-seamonkey@lists.mozilla.org>
In-reply-to: <mailman.65.1257451217.11662.support-seamonkey@lists.mozilla.org>
Content-type: text/plain; charset=ISO-8859-1; format=flowed
Content-transfer-encoding: 7bit
X-BeenThere: support-seamonkey@lists.mozilla.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Support for the SeaMonkey Application Suite
	<support-seamonkey.lists.mozilla.org>
List-Unsubscribe: <https://lists.mozilla.org/options/support-seamonkey>,
	<mailto:support-seamonkey-request@lists.mozilla.org?subject=unsubscribe>
List-Post: <mailto:support-seamonkey@lists.mozilla.org>
List-Help: <mailto:support-seamonkey-request@lists.mozilla.org?subject=help>
List-Subscribe: <https://lists.mozilla.org/listinfo/support-seamonkey>,
	<mailto:support-seamonkey-request@lists.mozilla.org?subject=subscribe>
Newsgroups: mozilla.support.seamonkey
Message-ID: <mailman.441.1257477226.526.support-seamonkey@lists.mozilla.org>
Lines: 23
X-Usenet-Provider: http://www.giganews.com
NNTP-Posting-Host: 63.245.208.166
X-AuthenticatedUsername: NoAuthUser
X-Trace: sv3-Aq9YiAo9XWf56rBbtf2iP2e40kpy9uaq83mZt8KjmGxpIAV2aA440GUbf1PI7efXignriBAusWPTO51!0GsMdvEmSpRL6/DJ89cto5qKh0OOuWaj0/lgaN/UC3cqDEKj/ra9N21bhseQXiJTi4nycgGxGCFP!XDQJyJoXgpCy5sm8o9ahIc0wOSfb96R1jqAwnQ==
X-Complaints-To: abuse@mozilla.org
X-DMCA-Complaints-To: abuse@mozilla.org
X-Abuse-and-DMCA-Info: Please be sure to forward a copy of ALL headers
X-Abuse-and-DMCA-Info: Otherwise we will be unable to process your complaint properly
X-Postfilter: 1.3.40
Bytes: 2809
Xref: number.nntp.dca.giganews.com mozilla.support.seamonkey:43690
====

If I set a filter for: Delivered-To: support-seamonkey@lists.mozilla.org
nothing happens. However, If I set a filter on NNTP-Posting-Host: 63.245.208.166
and set to tag and run a filter on the message it works just fine. 

Can someone please explain why a filter on 'NNTP-Posting-Host' works and 'Delivered-To' does not?
Comment 109 NoOp 2009-11-09 20:17:32 PST
Perhaps this is related to:
news server filters not working in SM 2.0a1
https://bugzilla.mozilla.org/show_bug.cgi?id=387337
Comment 110 John David Galt 2009-11-21 14:42:12 PST
Re. comment 107 - Would you consider splitting the difference by allowing the user to specify whether they want to filter on the full headers?  Maybe just have "full headers" as a dropdown item, which would work like "Body" but only cover the part of "View Source" up to the first blank line, and allow regexp matching.  That would cover everything that can be done with a killfile in trn (which is the point).
Comment 111 dennis bloodnok 2009-12-13 10:08:27 PST
although it's appreciated that tb3 has an additional headers filter, it now needs a NOT function in its filters so that one can set a "newsgroup is NOT my.news.group" filter to eliminate all the retards who insist on cross-posting to everywhere.
Comment 112 NoOp 2009-12-13 14:26:22 PST
Better asked and answered on mozilla.support.seamonkey/thunderbird, but:

Filter name: Crossposts 
o Match all of the following
Newsgroups|contains|,

You can then flag, tag etc from there.

Note You need to log in before you can comment on or make changes to this bug.