Closed Bug 506397 Opened 15 years ago Closed 15 years ago

Support multiple spam/junk corpus files

Categories

(MailNews Core :: Filters, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0b4

People

(Reporter: rkent, Assigned: rkent)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 4 obsolete files)

As I thought about how to improve spam processing in TB, it occurred to me that the sharing of the spam database, such as requested in Bug 181471, could best be accomplished if the backend supported multiple spam corpus files. That is, the spam corpus used by the bayesian filter would be a combination of the local user's training file, and some sort of static file that is distributed either on a local LAN, or using an internet service.

This is part of what is needed for bug 181471, hence marking the blocking. The rest of that bug could then be implemented in an extension.
Attached patch WIP but no known issues (obsolete) — Splinter Review
Here's a basically working patch.
Whiteboard: [has patch]
Attached patch minor tweaks (obsolete) — Splinter Review
Let's get this reviewed.
Attachment #392313 - Attachment is obsolete: true
Attachment #392562 - Flags: superreview?(bienvenu)
Attachment #392562 - Flags: review?(bugzilla)
Whiteboard: [has patch] → [needs r standard8, sr bienvenu]
Comment on attachment 392562 [details] [diff] [review]
minor tweaks

probably worth changing the member name to something like mJunkPlugin:
-    nsCOMPtr<nsISupports> mSupports;
+    nsCOMPtr<nsIJunkMailPlugin> mSupports;

should you do more to handle this error, e.g., continue to go back to the start of the loop?

+        rv = traitService->GetAliases(antiTrait, &antiAliasesLength, &antiAliases);
+        if (NS_FAILED(rv))
+          NS_ERROR("trait service failed to get aliases");
+      }
(In reply to comment #3)
> (From update of attachment 392562 [details] [diff] [review])
> probably worth changing the member name to something like mJunkPlugin:
> -    nsCOMPtr<nsISupports> mSupports;
> +    nsCOMPtr<nsIJunkMailPlugin> mSupports;
>
I left this name because, as I understood the code, the only use of this variable was to generate a reference, which just uses the nsISupports functionality. I had to specify though one of the interfaces because the compiler complained about the ambiguity. I can change it if you want though.

> should you do more to handle this error, e.g., continue to go back to the start
> of the loop?
> 
> +        rv = traitService->GetAliases(antiTrait, &antiAliasesLength,
> &antiAliases);
> +        if (NS_FAILED(rv))
> +          NS_ERROR("trait service failed to get aliases");
> +      }

My rational in not handling it is this. Aliases are likely to be optional additions to the accuracy of classification (think of an external junk corpus that is used to enhance the accuracy of the local training) that are added by extensions (which could be unreliable), so it was best to continue mail processing even without the alias. So I thought about the error handling, and this is what I thought was best. I'm not real happy with the error reporting using NS_ERROR, but that is a larger problem in Mozilla mailnews that has no simple answers here. So I'm not convinced that anything else makes sense, but you can try to convince me.
Is there any hope of getting this reviewed soon?
I meant to say that NS_ERROR does nothing in release builds, so there's no handling that case; wouldn't adding a "continue;" at least skip the garbage data?
I think calling it mSupports would make people think it's an nsISupports, and it's not, so it would just make the code clearer, I think.
Attached patch Fixed nits, carrying over r/sr (obsolete) — Splinter Review
Patch to checkin.
Attachment #392562 - Attachment is obsolete: true
Attachment #394343 - Flags: superreview+
Attachment #394343 - Flags: review+
Attachment #392562 - Flags: superreview?(bienvenu)
Attachment #392562 - Flags: review?(bugzilla)
Comment on attachment 394343 [details] [diff] [review]
Fixed nits, carrying over r/sr

Oops, wrong bug.
Attachment #394343 - Attachment is obsolete: true
Attachment #394343 - Flags: superreview+
Attachment #394343 - Flags: review+
Comment on attachment 392562 [details] [diff] [review]
minor tweaks

Sorry about the bugspam.
Attachment #392562 - Attachment is obsolete: false
Attachment #392562 - Flags: superreview?(bienvenu)
Attachment #392562 - Flags: review?(bugzilla)
(In reply to comment #6)
> I meant to say that NS_ERROR does nothing in release builds, so there's no
> handling that case; wouldn't adding a "continue;" at least skip the garbage
> data?

I wish there was a simple way to report this error. I think what I should do is to add an entry to the log at least, since we have one.

As to the continue, that would skip the current trait. But that trait is likely to be JUNK, and skipping it would completely skip JUNK processing. Instead, what I did was to make sure that the alias variables were initialized to values that skip the bad data:

  PRUint32 proAliasesLength = 0;
  PRUint32* proAliases = nsnull;

Isn't that what you were trying to accomplish with your continue?
(In reply to comment #11)

> 
> As to the continue, that would skip the current trait. But that trait is likely
> to be JUNK, and skipping it would completely skip JUNK processing. Instead,
> what I did was to make sure that the alias variables were initialized to values
> that skip the bad data.

ah, ok, as long as it's ok to have null elements and 0 lengths in pro/Anti Aliases and pro/AntiAliasesLengths, then that looks OK.
Attachment #392562 - Flags: review?(bugzilla) → review+
Comment on attachment 392562 [details] [diff] [review]
minor tweaks


>+   * @param aTraitIndex:  the internal identifier for the aliased trait
>+   * @param aTraitAlias:  the internal identifier for the alias to add

nit: we don't normally use : after parameter names (several places in the one file).

>-    nsCOMPtr<nsISupports> mSupports;
>+    nsCOMPtr<nsIJunkMailPlugin> mSupports;

I think as David suggested it would be better to change the name of this variable.
Carrying over review.
Attachment #392562 - Attachment is obsolete: true
Attachment #395327 - Flags: superreview?(bienvenu)
Attachment #395327 - Flags: review+
Attachment #392562 - Flags: superreview?(bienvenu)
Whiteboard: [needs r standard8, sr bienvenu] → [needs sr bienvenu]
Comment on attachment 395327 [details] [diff] [review]
Fixed Standard8's and Bienvenu's nits

This hasn't changed from the original code, from what I can tell, but why is it OK not to close the Stream/file here?

+  FILE* stream;
+  nsresult rv = aFile->OpenANSIFileDesc("rb", &stream);

What led me to ask is that instead of doing if (error = ...)
  break;

you could return NS_ERROR_FAILURE directly. But if it turns out you want to close the file before returning, then you can't...
(In reply to comment #15)
>
>  why is it OK not to close the Stream/file here?
>

You're right, the closing of traits.dat needs to be fixed. I'll be happy to do that. Thanks for catching that!
Carrying over r=Standard8
Attachment #395327 - Attachment is obsolete: true
Attachment #396001 - Flags: superreview?(bienvenu)
Attachment #396001 - Flags: review+
Attachment #395327 - Flags: superreview?(bienvenu)
Attachment #396001 - Flags: superreview?(bienvenu) → superreview+
Keywords: checkin-needed
Whiteboard: [needs sr bienvenu] → [needs checkin]
Checked in: http://hg.mozilla.org/comm-central/rev/5de0a596f65c
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Whiteboard: [needs checkin]
Summary: Support multiple spam corpus files → Support multiple spam/junk corpus files
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: