Closed Bug 414179 Opened 13 years ago Closed 13 years ago

Add junkpercent support to nsIMsgSearchTerm

Categories

(MailNews Core :: Filters, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla1.9.1a1

People

(Reporter: rkent, Assigned: rkent)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 4 obsolete files)

The new variable bayesjunkscore from bug 366491 will permit a filter view that shows emails which the bayes filter is uncertain about. We need to add that variable to the filter terms to allow the functionality desired by bug 209890.

Desired results: users should be able to write a filter which shows emails within a certain range of bayesjunkscore
Standard8, I added you to the cc: list so we can continue the discussion here that we started in m.d.a.thunderbird. I hope that is considered proper.

(The issue is, can we simply add additional terms to nsMsgSearchAttribValue, or do we need to somehow add additional capacity? Existing space seems to be reserved for LDAP).

I spent a few hours going over the filter code, and I could not find any reason that the attribute values are limited. The filters are stored in files by text descriptions, not by numbered values. There was quite a bit of discussion in bug 168553 about how to future proof the filter files, and maybe they succeeded.

I also tried adding an extra 100 terms to the attribute list by simply changing, in nsMsgSearchCore.idl, this:

    //49 is for showing customize... in ui headers start from 50 onwards up until 99.
    const nsMsgSearchAttribValue OtherHeader = 49;  /* for mail and news. MUST ALWAYS BE LAST attribute since we can have an arbitrary # of these... */
    const nsMsgSearchAttribValue kNumMsgSearchAttributes = 100;      /* must be last attribute */

to this:

    //149 is for showing customize... in ui headers start from 150 onwards up until 199.
    const nsMsgSearchAttribValue OtherHeader = 149;  /* for mail and news. MUST ALWAYS BE LAST attribute since we can have an arbitrary # of these... */
    const nsMsgSearchAttribValue kNumMsgSearchAttributes = 200;      /* must be last attribute */

My code compiled and ran fine, including using existing filters and virtual folders. But doing that just wastes memory, as we can do this at any time when we need additional attributes. It seems to me that the easiest way to add a search attribute for bayesjunkscore is still to take one from those "reserved for LDAP", that is value 45. If at any point in time LDAP needs additional terms, then we can increase the values of OtherHeader and kNumMsgSearchAttributes to use the additional memory.

I anyone reading this knows differently, please correct me. Perhaps there are historical reasons for leaving this alone that I am unaware of.
Status: NEW → ASSIGNED
Attached patch Filtering on junkpercent (obsolete) — Splinter Review
Here is a work-in-progress patch for this. This requires the latest patch for bug 366491 (https://bugzilla.mozilla.org/attachment.cgi?id=312215). Note this filters on the revised name for the Bayes filter result variable, which is junkpercent.

Here's my current issue. When I try to use this filter to setup a virtual folder for displaying uncertain spam status, it works but shows me the uncertain bugs that I have already processed and marked manually. I'd like to eliminate those from the display, but that requires filtering on junkstatusorigin. That's because in bug 366491 I decided to leave the junkpercent at its bayes value when I manually mark, rather than set to 0 or 100. I still think that is what I want to do (at least for someone like me who is monitoring spam performance) but it does complicate this filter.

So I think for my next step, I will add a filter term for junkstatusorigin.
This patch now finally lets me do what I want.

So now I can setup two virtual folders. My main virtual folder (which serves as my inbox), which I call NotFiled, has as one of its filters JunkPercent <75. I also have an Uncertain folder, which has filters for 10 < JunkPercent < 90 Relatively certain junk (with score>90) is moved automatically to the Junk folder. Uncertain mail is not moved, but does not show up in my virtual inbox.

Although I still store junkpercent in the database as the true result from the bayes filter plugin, in the search I modify is slightly to make it more search friendly. That is, if the junk status did not come from plugin, then junkpercent is set at 0 or 100 according to the junkstatus.

I'm supposed to check the Unread count in the Uncertain virtual folder, and decide if that mail is ham or spam. Unfortunately there are unrelated bugs in maintaining the unread count of the virtual folder in the folder pane, so this does not work as well as it could. But the junkpercent filtering is working fine.

In the end, I don't need the junkscoreorigin filter to accomplish the uncertain folder. But since I did the work, I left it in.

I'm going to let this bake a little longer on my system before I ask for review. Note this patch, like the previous, requires the most recent patch from bug 366491.
Attachment #312778 - Attachment is obsolete: true
Summary: Add bayesjunkscore support to nsIMsgSearchTerm → Add junkpercent support to nsIMsgSearchTerm
Status: I've been running this patch for awhile. There's a problme though in maintaining the unread count on the virtual folders created with these filters, which limits its usefulness. I've finally decided to file that separately as bug 428427 and get on with this patch. I need to review it again and check for bitrot before I ask for peer review (and wait for bug 366491 to land, which has now cleared all reviews).
Attachment #313274 - Attachment is obsolete: true
Attachment #316368 - Flags: review?(bugzilla)
I'm not sure about the use of the term "plugin" in this patch, would add-on or extension be better as that's what we typically call them from the user perspective.

Also, with this patch you can't actually just do a normal search within folders for junk origin/percent - something I think a user may want.

Finally, would the user want to be easily able to see the junk percentage actualy value?
(In reply to comment #6)
> I'm not sure about the use of the term "plugin" in this patch, would add-on or
> extension be better as that's what we typically call them from the user
> perspective.
>

Add-on or extension implies to me the extension capability, but the junk filter is not part of this. How about "Junk Analyzer"? BTW this only applies to the localization strings, not the internal variable which should stay "plugin". Note I have reverted internally to plugin instead of the my change bayesjunkscore after discussions in bug 366491

> Also, with this patch you can't actually just do a normal search within folders
> for junk origin/percent - something I think a user may want.
>

By "normal search" I think you mean to say "filter after the fact" The existing patch works for search, but not for filters. 

There is an architectural problem in the current filter scopes here as I understand them. Junk processing doesn't really work during mail filtering, because the junk analyzer is run after the filters. But the same scope offlineMailFilterTable is used for both normal filtering and after-the-fact filtering. (I'm in the process of restoring the after-the-fact filtering to tags currently in bug 217034 which was originally removed because of this same general issue). The related junkstatus filter issue is the subject of several high vote bugs such as bug 198100, bug 223591, and specifically bug 196036. The issues here are identical to bug 196036, that is if we enable junkpercent and junkscoreorigin filtering than many users will try to use it, and fail, in normal filtering.

I believe that after-the-fact filtering for junkpercent and junkscoreorigin would be useful in an after-the-fact filter. But because of the issues, may I suggest that we deal with them as part of bug 196036?

> Finally, would the user want to be easily able to see the junk percentage
> actualy value?

See bug 209890

So, if I understand your intentions correctly, we should get extra options added to the dialogs:

1) Search Messages
2) Saved Search

The additional options being:

a) Junk Score Origin
b) Junk Score Percent

On TB I've rebuilt in mailnews and mail, but I am only seeing these extra options in the Saved Search dialog. Am I doing something wrong?

So I think I can see why Junk Score Origin may be useful to the some users, but how useful would Junk Score Percent be? Would something like fuzziness be better? e.g. not junk, may be junk, most likely junk, definitely junk.
(In reply to comment #8)
> So, if I understand your intentions correctly, we should get extra options
> added to the dialogs:
> 
> 1) Search Messages
> 2) Saved Search
> 
> 
> On TB I've rebuilt in mailnews and mail, but I am only seeing these extra
> options in the Saved Search dialog. Am I doing something wrong?
>
I'm not sure what you mean by each of these dialogs. Let me use different terminology and see if we agree.

There is the search editor dialog SearchDialog.xul, reachable for example by right clicking on a folder and selecting Search. The new terms appear there.

There is also the quick search toolbar, which displays a subset of the available search terms defined in quick-search-menupopup. The new terms do not appear there.

Are these the two locations you are talking about? If so then I agree the new terms appear in one, and not in the other. So far, the work I have been doing in the filter area, including this bug, has been backend work. I have added the minimal user interface necessary to use the features at all. It may be appropriate to add additional UI to better utilize these new terms, but that should be another bug. In any case, I doubt if these new terms are appropriate for the quick search toolbar. The main target is virtual folders.
> So I think I can see why Junk Score Origin may be useful to the some users, but
> how useful would Junk Score Percent be? Would something like fuzziness be
> better? e.g. not junk, may be junk, most likely junk, definitely junk.
> 
I would actually argue the opposite. The main target here is virtual folders. In your Main viewing folder, you aggressively hide items that may be junk. In an Uncertain folder, you show items in the middle range. This approach greatly increases the effectiveness of junk processing - at the cost of adding the additional requirement on the user that she occasionally check the Uncertain folder, and train any items located there. (You also really need the fix for bug  428427 or the count bugs will make this unusable). Now you may decide that the correct UI is to present this as your fuzziness categories, but that is a UI decision. The backend still needs this filtering capability by junk percent. If you move it upstream into the bayesian filter, then you really are back at the current situation, where you have little control of the process. I would still in any case argue for showing junk percent. It is important for developing user confidence in the junk filters. And you need confidence to be willing to reduce the junk percent cutoff points, and risking more false positives.

The junkscoreorigin terms were added at a time when I thought I needed them for the uncertain virtual folder. As it turns out I don't, but I left them in anyway. (BTW ultimately I expect to propose that we change the UI so that the appearance of the junk/nonjunk icon changes depending on the junkscoreorigin).

I've been running this patch for many weeks. It's pretty clear that lowering the junk percent threshold from our default of 90 to a much lower value, say 50-70, almost completely eliminates spam (though I am using a SpamAssassin server-based filter as well). But you risk the occasional false positive, hence the need for the uncertain folder.
(In reply to comment #9)
> (In reply to comment #8)
> > So, if I understand your intentions correctly, we should get extra options
> > added to the dialogs:
> > 
> > 1) Search Messages
> > 2) Saved Search
> > 
> > 
> > On TB I've rebuilt in mailnews and mail, but I am only seeing these extra
> > options in the Saved Search dialog. Am I doing something wrong?
> >
> I'm not sure what you mean by each of these dialogs. Let me use different
> terminology and see if we agree.
> 
> There is the search editor dialog SearchDialog.xul, reachable for example by
> right clicking on a folder and selecting Search. The new terms appear there.

Or by doing Edit -> Find -> Search Messages. I did not see the new terms appear in this dialog.

> There is also the quick search toolbar, which displays a subset of the
> available search terms defined in quick-search-menupopup. The new terms do not
> appear there.

That wasn't the other one I was on about. The one I was on about was the File -> New -> Saved Search dialog. That did have the two new terms appear.

I think I'm happy with the other comments now, so thanks for those.
(In reply to comment #10)
> Or by doing Edit -> Find -> Search Messages. I did not see the new terms appear
> in this dialog.
This does work for my setup, please check again. Note that the new terms are not enabled for online IMAP, so if your selected folder is online IMAP the terms will not appear.
 

Comment on attachment 316368 [details] [diff] [review]
After some small corrections, ready for review

Apologies for the delay in reviewing this, I'd had it in my mind there was still a question to answer, but you'd already answered it.

>Index: mailnews/base/resources/content/mailWidgets.xml
>+          <xul:menuitem value="plugin" stringTag="junkScoreOriginPlugin" class="search-value-menuitem"/>
>+          <xul:menuitem value="user" stringTag="junkScoreOriginUser" class="search-value-menuitem"/>
>+          <xul:menuitem value="filter" stringTag="junkScoreOriginFilter" class="search-value-menuitem"/>
>+          <xul:menuitem value="whitelist" stringTag="junkScoreOriginWhitelist" class="search-value-menuitem"/>
>+          <xul:menuitem value="imapflag" stringTag="junkScoreOriginImapFlag" class="search-value-menuitem"/>

nit: please put the class on the next line for each of these (align with the start of value).

>Index: mailnews/base/search/public/nsIMsgSearchTerm.idl
>Index: mailnews/base/search/public/nsIMsgSearchValue.idl

Please add documentation to the new functions/attributes you've defined in these files.

>Index: mailnews/base/search/public/nsMsgSearchAdapter.h
>-    a == nsMsgSearchAttrib::FolderInfo || a == nsMsgSearchAttrib::JunkStatus);
>+    a == nsMsgSearchAttrib::FolderInfo || a == nsMsgSearchAttrib::JunkStatus || 
>+    a == nsMsgSearchAttrib::JunkPercent);

nit: unnecessary space on the end of the first line changed

>Index: mailnews/base/search/src/nsMsgLocalSearch.cpp

>+      case nsMsgSearchAttrib::JunkPercent:
>+      {
>+        // When the junk status is set by the plugin, use junkpercent (if available)
>+        // Otherwise, use the limits (0 or 100) depending on the junkscore.
>+        PRUint32 junkPercent;
>+        PRInt32 err;

nit: please make this variable rv not err.

>Index: mailnews/base/search/src/nsMsgSearchTerm.cpp
>@@ -1282,10 +1292,59 @@ nsresult nsMsgSearchTerm::MatchJunkStatu
> 
>   *pResult = matches;
>   return rv;
> }
> 
>+nsresult nsMsgSearchTerm::MatchJunkScoreOrigin(const char *aJunkScoreOrigin, PRBool *pResult)
>+{
>+  NS_ENSURE_ARG_POINTER(pResult);

You should arg check aJunkScoreOrigin as well.

r=me with those fixed.
Attachment #316368 - Flags: review?(bugzilla) → review+
This patch adds the unit tests, and fixes issues in the comments.

The patch is designed to work stand-alone, however it includes duplicates of testing files that are already included in patches for bug 217034 and bug 228675. The theory is that those bugs will land first, and then this patch will be revised to eliminate the duplicate files. Standard8, I thought you might want to see more examples of my attempts to use unit tests before deciding on overall infrastructure issues, so I pushed out this patch instead of waiting for the others. If you would rather wait to review until those bugs have landed, and the duplicates are removed from this patch, that would also be OK with me.

The local mail account unit testing infrastructure from bug 434810 is reimplemented here as well, as in the other unit tests I've done. It's pretty clear the standard local mail account setup needs to be agreed on, and put into a central location.

I added a test also for bug 366491 since it was easy to do at the same time (Actually I started writing the test for the current bug, then realized I had instead written a test for the previous bug!) But that brought up the question of where to place the test. I ended up placing it under mailnews/base, but you could easily argue it should go under mailnews/extensions/bayesian-spam-filter The argument here is that most of the issues in bug 366491 dealt with all of the infrastructure changes to support junkpercent, with rather small changes in the bayes filter code. But we are going to face that issue a lot since the mailnews code is very interdependent.

Concerning implementing the comments, there are a couple of places I deviated slightly. Concerning:

> >Index: mailnews/base/search/src/nsMsgLocalSearch.cpp
> 
> >+      case nsMsgSearchAttrib::JunkPercent:
> >+      {
> >+        // When the junk status is set by the plugin, use junkpercent (if available)
> >+        // Otherwise, use the limits (0 or 100) depending on the junkscore.
> >+        PRUint32 junkPercent;
> >+        PRInt32 err;
> 
> nit: please make this variable rv not err.

I did this partially, and not without reservations. The existing declaration of err was a mistake, as it redeclared "err" which is intended to be used outside of the case. I declared "rv" only for the the local use with .ToInteger, but that routine doesn't really use an nsresult, causing the ugly explicit cast. But that's been done by others as well.

> >+nsresult nsMsgSearchTerm::MatchJunkScoreOrigin(const char *aJunkScoreOrigin, PRBool *pResult)
> >+{
> >+  NS_ENSURE_ARG_POINTER(pResult);
> 
> You should arg check aJunkScoreOrigin as well.

Existing similar code was designed to work with null values, so that is what I did instead.
Attachment #316368 - Attachment is obsolete: true
Attachment #322560 - Flags: review?(bugzilla)
(In reply to comment #13)
> Created an attachment (id=322560) [details]
> Fixed issues in comments, added unit tests
...
> I added a test also for bug 366491 since it was easy to do at the same time
> (Actually I started writing the test for the current bug, then realized I had
> instead written a test for the previous bug!) But that brought up the question
> of where to place the test. I ended up placing it under mailnews/base, but you
> could easily argue it should go under mailnews/extensions/bayesian-spam-filter
> The argument here is that most of the issues in bug 366491 dealt with all of
> the infrastructure changes to support junkpercent, with rather small changes in
> the bayes filter code. But we are going to face that issue a lot since the
> mailnews code is very interdependent.

Yes I think we are going to come up with that sort of issue a lot, and we're going to have to make a best judgement.

> > >Index: mailnews/base/search/src/nsMsgLocalSearch.cpp
> > 
> > >+      case nsMsgSearchAttrib::JunkPercent:
> > >+      {
> > >+        // When the junk status is set by the plugin, use junkpercent (if available)
> > >+        // Otherwise, use the limits (0 or 100) depending on the junkscore.
> > >+        PRUint32 junkPercent;
> > >+        PRInt32 err;
> > 
> > nit: please make this variable rv not err.
> 
> I did this partially, and not without reservations. The existing declaration of
> err was a mistake, as it redeclared "err" which is intended to be used outside
> of the case. I declared "rv" only for the the local use with .ToInteger, but
> that routine doesn't really use an nsresult, causing the ugly explicit cast.
> But that's been done by others as well.

Ok, so I didn't quite make myself clear enough, what I intended here was for you to declare rv as a PRInt32, but could you actually make this section:

#ifdef MOZILLA_INTERNAL_API
   PRInt32 rv;
#else
   nsresult rv;
#endif

(with appropriate indentation). Then it'll be ready for the frozen-linkage migration.
Comment on attachment 322560 [details] [diff] [review]
Fixed issues in comments, added unit tests

Index: mailnews/base/test/unit/test_bug366491.js

When we check this in, we should set in-testsuite+ on bug 366491 (and add a comment as to the bug that added it).

Index: mailnews/base/test/unit/test_bug414179.js

test_searchJunk.js maybe?

+var fileName = "mailnews/test/resources/bugmail1";

This can be a const.

As you know, please fix the tests to work with the latest setup per bug 434810.

r=me with the comments fixed.
Attachment #322560 - Flags: review?(bugzilla) → review+
In addition to fixing remaining minor nits, this version modified the unit tests to support the current framework, plus forced async operations to run sequentially, as we've discovered in other bugs that is important on certain platforms. Carrying over Standard8's r, ready for sr.
Attachment #322560 - Attachment is obsolete: true
Attachment #325790 - Flags: superreview?(bienvenu)
Attachment #325790 - Flags: review+
Comment on attachment 325790 [details] [diff] [review]
Fixed more nits, unit test async operations forced sequential

The code looks good. I'm not convinced that this should be part of the UI by default but the backend work is all required even if the UI is simply an extension.
Attachment #325790 - Flags: superreview?(bienvenu) → superreview+
The issue of starting to include this in the default UI starts with bug 209890. As you say, this current bug is backend work.

There are additional issues still to be addressed before this can be used effectively. Bug 428427 addresses problems in maintaining accurate unread counts for virtual folders, which also affects tags but is more acute for junk-related virtual folders. There also needs to be some sort of method for junk data to survive when the folder databases are reindexed. Still not sure what to do about that.

I'm currently planning to do an extension that provides expanded spam management capabilities after the backend work is completed.
Keywords: checkin-needed
Checking in mail/locales/en-US/chrome/messenger/messenger.properties;
/cvsroot/mozilla/mail/locales/en-US/chrome/messenger/messenger.properties,v  <--  messenger.properties
new revision: 1.57; previous revision: 1.56
done
Checking in mail/locales/en-US/chrome/messenger/search-attributes.properties;
/cvsroot/mozilla/mail/locales/en-US/chrome/messenger/search-attributes.properties,v  <--  search-attributes.properties
new revision: 1.7; previous revision: 1.6
done
Checking in mailnews/base/resources/content/mailWidgets.xml;
/cvsroot/mozilla/mailnews/base/resources/content/mailWidgets.xml,v  <--  mailWidgets.xml
new revision: 1.129; previous revision: 1.128
done
Checking in mailnews/base/search/public/nsIMsgSearchTerm.idl;
/cvsroot/mozilla/mailnews/base/search/public/nsIMsgSearchTerm.idl,v  <--  nsIMsgSearchTerm.idl
new revision: 1.18; previous revision: 1.17
done
Checking in mailnews/base/search/public/nsIMsgSearchValue.idl;
/cvsroot/mozilla/mailnews/base/search/public/nsIMsgSearchValue.idl,v  <--  nsIMsgSearchValue.idl
new revision: 1.12; previous revision: 1.11
done
Checking in mailnews/base/search/public/nsMsgSearchAdapter.h;
/cvsroot/mozilla/mailnews/base/search/public/nsMsgSearchAdapter.h,v  <--  nsMsgSearchAdapter.h
new revision: 1.38; previous revision: 1.37
done
Checking in mailnews/base/search/public/nsMsgSearchCore.idl;
/cvsroot/mozilla/mailnews/base/search/public/nsMsgSearchCore.idl,v  <--  nsMsgSearchCore.idl
new revision: 1.33; previous revision: 1.32
done
Checking in mailnews/base/search/src/nsMsgImapSearch.cpp;
/cvsroot/mozilla/mailnews/base/search/src/nsMsgImapSearch.cpp,v  <--  nsMsgImapSearch.cpp
new revision: 1.55; previous revision: 1.54
done
Checking in mailnews/base/search/src/nsMsgLocalSearch.cpp;
/cvsroot/mozilla/mailnews/base/search/src/nsMsgLocalSearch.cpp,v  <--  nsMsgLocalSearch.cpp
new revision: 1.84; previous revision: 1.83
done
Checking in mailnews/base/search/src/nsMsgSearchTerm.cpp;
/cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchTerm.cpp,v  <--  nsMsgSearchTerm.cpp
new revision: 1.163; previous revision: 1.162
done
Checking in mailnews/base/search/src/nsMsgSearchValue.cpp;
/cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchValue.cpp,v  <--  nsMsgSearchValue.cpp
new revision: 1.31; previous revision: 1.30
done
RCS file: /cvsroot/mozilla/mailnews/base/test/unit/test_bug366491.js,v
done
Checking in mailnews/base/test/unit/test_bug366491.js;
/cvsroot/mozilla/mailnews/base/test/unit/test_bug366491.js,v  <--  test_bug366491.js
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/mailnews/base/test/unit/test_searchJunk.js,v
done
Checking in mailnews/base/test/unit/test_searchJunk.js;
/cvsroot/mozilla/mailnews/base/test/unit/test_searchJunk.js,v  <--  test_searchJunk.js
initial revision: 1.1
done
Checking in suite/locales/en-US/chrome/mailnews/messenger.properties;
/cvsroot/mozilla/suite/locales/en-US/chrome/mailnews/messenger.properties,v  <--  messenger.properties
new revision: 1.146; previous revision: 1.145
done
Checking in suite/locales/en-US/chrome/mailnews/search-attributes.properties;
/cvsroot/mozilla/suite/locales/en-US/chrome/mailnews/search-attributes.properties,v  <--  search-attributes.properties
new revision: 1.22; previous revision: 1.21
done
Flags: in-testsuite+
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.1a1
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.