Bug 482617: add junk tab to preferences
Status: Open (opened 15 years ago, updated 2 years ago)
Category: Thunderbird :: Preferences, enhancement
Tracking: Not tracked; Thunderbird 3.0b4
People: Reporter: clarkbw, Unassigned
References: Depends on 1 open bug, Blocks 1 open bug
Whiteboard: [m6][needs updated patch]
Attachments: 4 files

Junk affects most mail users almost every day. The current location of the junk preferences under Privacy does not make them easy to find, and it does not give enough space to accurately describe the preferences or to show where the other per-account preferences exist.

This bug is for creating a 'Junk' tab which removes the junk sub-tab from the current 'Privacy' tab.  We can do a wholesale copy as a first pass but we need to rethink the Junk tab from scratch as well.

Our interests are this:
 * help find the junk preferences
 * help find the junk account settings
 * explain the junk preferences clearly
 * hide the advanced junk settings out of the main preferences

attachment 366669 [details] of bug 482610 has the icon we need for this pref pane
Flags: blocking-thunderbird3+
I'm sort of accidentally adding backend junk support for news (and even RSS if we want it) in bug 471833 and friends. If we are going to touch the junkmail UI, we should really make sure that we add whatever is necessary to support news as well in the FE. For example, "move" doesn't make sense for junk messages in news, so we will probably use kill subthread as the equivalent. Also, news bodies are not normally downloaded, but junk needs them. We will need to allow this to be enabled/disabled on a per-folder basis (which inheritedStringProperties supports) as people may not want to download bodies on some high volume news lists.
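
A per-folder switch of this kind could behave like an inherited property: a folder uses its own value if one is set, otherwise the nearest ancestor's value, otherwise a global default. The following is a minimal Python sketch of that lookup idea only, not Thunderbird code. The `Folder` class, the `download_bodies_for_junk` property name, and the server and group names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Folder:
    """Hypothetical stand-in for a news server or newsgroup folder."""
    name: str
    parent: Optional["Folder"] = None
    properties: Dict[str, str] = field(default_factory=dict)


def inherited_property(folder: Folder, key: str, default: str) -> str:
    """Return the nearest value set on the folder or on any ancestor."""
    node: Optional[Folder] = folder
    while node is not None:
        if key in node.properties:
            return node.properties[key]
        node = node.parent
    return default


# Hypothetical property name: the server opts in, one busy group opts out.
KEY = "download_bodies_for_junk"
server = Folder("news.example.org", properties={KEY: "true"})
quiet_group = Folder("quiet.group", parent=server)
busy_group = Folder("busy.group", parent=server, properties={KEY: "false"})
```

With this shape, `quiet_group` picks up the server's setting while `busy_group` keeps its own override, which is the behavior a high-volume list would want.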
(In reply to comment #0)
> attachment 366669 [details] of bug 482610 has the icon we need for this pref pane

Will this icon replace the old junk icon all over the app? If not, why do we need a different junk icon for the junk tab? IMHO this would be confusing.
(In reply to comment #1)
> For example, "move" doesn't make sense for junk messages in
> news, so we will probably use kill subthread as the equivalent. 

Maybe we can find more generic language to describe the situation.  This is part of a constant effort to keep the preferences about the behavior level of things and the account settings about the lower level details of the behavior.  So I think that change makes sense.

> Also, news
> bodies are not normally donwloaded, but junk needs them. We will need to allow
> this to be enabled/disabled on a per-folder basis (which
> inheritedStringProperties supports) as people may not want to download bodies
> on some high volume news lists.

Sounds like we should at least make a special note for the user to look at the account settings for newsgroups and possibly rss.

(In reply to comment #2)
> Will this icon replace the old junk icon all over the app? If not, why do we
> need a different junk icon for the junk tab? IMHO this would be confusing.

Yes, this is part of an effort to get a better junk icon for the rest of Thunderbird.  Two different icons for the same thing probably don't make sense with this interface.
With regard to the news/email issue, would moving

 [x] When I mark messages as junk
     (o) Move them to the account's "Junk" folder
     ( ) Delete them

out of the Junk tab and into the Junk Account Settings be a solution?

Inside the News account settings it could then be:

 [x] Delete threads I mark as junk
Looking at the Junk Account Settings, do people want to be able to customize these per account?

Do not mark mail as junk if the sender is:
  [x] Personal Address Book
  [ ] Collected Addresses

[x] Trust junk mail headers sent by: [SpamAssassin|v]

Wouldn't it make sense to have those in the Preferences Junk Tab, so they affect all accounts?


Right now the Junk Tab only has 3 options, so I am wondering, do we really need a Junk Tab?

Is this option worth keeping?
[ ] Mark messages determined to be junk as read

And perhaps
[ ] Enable junk filter logging [Show log] [Reset Training Data]

would be more useful on a per-account basis?

In that case the Junk tab would have no options left =)
There are a few hidden junk preferences that we could consider elevating to visible in the junk tab. These are currently shown in my JunQuilla extension.

mailnews.bayesian_spam_filter.junk_maxtokens limits the growth of the token database. The default value is 100000 - but I think that is too small, I would recommend to people 300000. It was limited to minimize memory use.

mail.adaptivefilters.junk_threshold is the setpoint for when a message is considered spam. It defaults to 90, but I think that is too conservative if you are diligent with your training. I use 75.
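
To make the effect of the threshold concrete, here is a minimal Python sketch of the decision it controls. It is an illustration only, not the actual filter code. The function name is hypothetical, and whether the real comparison is "greater than" or "greater than or equal to" is an assumption.

```python
# Default of mail.adaptivefilters.junk_threshold, as stated in the comment above.
DEFAULT_JUNK_THRESHOLD = 90


def is_junk(score_percent: float, threshold: int = DEFAULT_JUNK_THRESHOLD) -> bool:
    """Assumption: a message counts as junk once its score reaches the threshold."""
    if not 0 <= score_percent <= 100:
        raise ValueError("score must be a percentage between 0 and 100")
    return score_percent >= threshold
```

Under these assumptions a message scored at 80 is left alone with the default of 90 but is marked as junk with the more aggressive setting of 75.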
Setting for m3.  We already have the necessary icon, we just need a design for both the Security and Junk tab.
Whiteboard: [m3]
Right now I'm going to put together this patch using the icon in attachment 368264 [details] from bug 484199 for every theme.  Initially I'm only going to port over the current Junk preferences without any changes and then will assess any additions.
Whiteboard: [m3] → [m4]
As another issue to consider, in bug 486420 I am asking the question of the correct default and UI for the control of whitelisting when the "from" matches your address or domain. The answer will probably be that your new pane will need to provide an interface to the two preferences that control that.
I haven't tested this patch in Mac or Windows so I'm just taking a guess at what is needed.  I also know that this icon is going to look awkward on the Mac, but until we get a Mac icon I think we can rely on this one.

I went a little further than I originally said; I couldn't just leave the tab panels in place.  Will upload screenshots of the changes next.
Blocks: 487132
I'm moving this to m5 since I don't have this finished or reviewed yet.  I think I can get something in this week.

My plan is to continue with the current iteration as seen in attachment 371553 [details] but with some better grouping.  The preferences pane is going to continue to be about general behavior, with an account settings button that takes the user to the account settings page.

I see this as a behavior question, either keep the message until the person purges or move the message out of the view (to the junk folder).  It could be better worded and I believe should default to the move option on.

 [x] When I mark messages as junk
     (o) Move them to the account's "Junk" folder
     ( ) Delete them

This does seem to be an odd option.  I understand its use but it doesn't seem like an upfront control; however I'm not really looking to fight that fight now.

[ ] Mark messages determined to be junk as read

I believe these options are global and not per account. I don't really like having them so up front but I don't want to try to hide them in advanced.

[ ] Enable junk filter logging [Show log] [Reset Training Data]

bug 486420 seems like it's a per-account pref, likely with the defaults as you suggested.

Here are some other bugs I've been looking at while working on this.

Some global options need to be moved into account settings
bug 232486

Junk controls split between prefs / account options
bug 352428

Trust Junk Mail headers
bug 323159

Group Junk Items
bug 397197
Whiteboard: [m4] → [m5]
I have a concern that this UI seems to conflate the concepts of "Advanced" with "Per Account". It makes it very difficult to add global advanced options.

What I would prefer is if the "Advanced" section of the main tab ("more advanced and account-specific ...") instead focus on "Per Account", and add an Advanced tab to the global preferences junk pane.

In the Advanced tab, I would have the following features:

  Shall we skip whitelisting on emails that may spoof:
    [x] Your email address
    [ ] Your domain

  [90] Percent score to mark messages as junk

  [100000] Maximum number of tokens allowed in the junk database

I would also move "Reset training data" and "Enable Junk Logging" into the advanced tab. (I really don't consider either of these as un-advanced options, and I specifically do not want to encourage casual resetting of the training data. Note there has been a bug fixed in TB 3.0 that reduces the need to reset the training data.)

There has been discussion in the past about global versus per-account settings. My recent backend work has generally tried to make it possible for features to be enabled on a very specific and/or very general level. But that does not mean that in normal usage we expect to show UI for all of these levels. Preferences are very easy to add in extensions, and that is where unusual options belong.

So while "Shall we skip whitelisting on emails that may spoof" can be set on a per-account basis, I think under normal usage it is better as a global option.
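
The interaction between address-book whitelisting and the two spoofing options can be sketched as follows. This is a minimal Python illustration of the proposed behavior, not the backend implementation. The function and parameter names are hypothetical, and the defaults simply mirror the checkbox states in the mockup above.

```python
from typing import Set


def is_whitelisted(sender: str, address_book: Set[str], own_address: str,
                   skip_own_address: bool = True,
                   skip_own_domain: bool = False) -> bool:
    """Return True if the sender should be exempt from junk marking."""
    sender = sender.strip().lower()
    own = own_address.strip().lower()
    # Everything after the last "@" is treated as the domain.
    sender_domain = sender.rpartition("@")[2]
    own_domain = own.rpartition("@")[2]

    # Mail that claims to come from you may be spoofed, so whitelisting is skipped.
    if skip_own_address and sender == own:
        return False
    if skip_own_domain and sender_domain == own_domain:
        return False
    return sender in {entry.lower() for entry in address_book}
```

Keeping the two switches as plain booleans means the same function works whether the values come from a global preference or from a per-account override.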
Blocks: 486420
Whiteboard: [m5] → [m6][needs updated patch]
Blocks: 487277
Just to be clear, what I'm reading you suggest is that we have two sub-tabs in the Junk tab for General and Advanced junk options.  e.g.

+-------------------------------------------------------------------------+
|                              +------+                                   |
| Main | Display | Composition | Junk | Security | Attachments | Advanced |
|------------------------------+------+-----------------------------------|
| General  | Advanced |                                                   |


I think I agree with the break down of advanced vs. per account you gave.

I'm not sure about the Max number of tokens option, that seems beyond advanced to me at least.
Yes, two subtabs makes sense. The link to account settings, which are not really advanced, should be in the General tab.

Here's why I am promoting the visibility of the "Max number of tokens" option, at least in advanced. Junk processing is very memory intensive, and the current default value of 100,000 was set to make sure that the memory usage by junk is small compared to the rest of TB. But I don't believe that the current default value is really sufficient for effective junk processing. However, increasing it will cause a noticeable (though not unreasonable) increase in the memory usage of TB. My own experience is that a better value is 300,000 and I will be recommending at some point to people who are seriously relying on the bayes filter that they change it. I would really rather that not be a hidden preference, as that makes it seem like a dangerous setting.
I see.  I think if this option were translated into memory usage I would be more inclined to agree to including it.  Perhaps there are ways to soften it up at least with a combo box that gives low, high, and recommended values?  As it is now I feel like I'd need to have your comment inlined in the prefs to explain what that number would do if they changed it. :)  Any ideas?
For reference, the original data and discussion used to set the token limit was in bug 437098, which referenced a graph at http://wiki.mozilla.org/User:Rkentjames:Bug228675

I think that the idea of a low, medium, and high value for the memory usage is a good one. The values that I would recommend for low, medium, and high are 50,000/150,000/450,000 maximum tokens. I would set the default at medium. At the medium value, memory usage for junk analysis increases the overall memory footprint of TB by about 20%. When doing this preference, you will need to handle the case where the preference was previously set to a value that is not one of those three.
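
Handling a previously stored value that is not one of the three presets could look roughly like this. It is a Python sketch of the matching logic only. The preset values are the ones proposed above, while the function names and label format are hypothetical.

```python
from typing import Optional

# Values proposed in the comment above; "medium" is the suggested default.
TOKEN_PRESETS = {"low": 50_000, "medium": 150_000, "high": 450_000}
DEFAULT_PRESET = "medium"


def preset_for(max_tokens: int) -> Optional[str]:
    """Return the preset name for a stored limit, or None for a custom value."""
    for name, value in TOKEN_PRESETS.items():
        if value == max_tokens:
            return name
    return None


def describe_limit(max_tokens: int) -> str:
    """Build a label for the current limit, flagging values outside the presets."""
    name = preset_for(max_tokens)
    if name is None:
        return f"{max_tokens:,} (custom)"
    return f"{max_tokens:,} ({name})"
```

A profile still carrying the old default of 100,000 would then show up as a custom value rather than being silently rewritten to one of the presets.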
rkent, Sounds useful. 

I looked at the doc and bug - % footprint means nothing to me and I don't have a clear understanding of what types of usage one finds a larger value helping. Is the memory usage roughly the same as training.dat?  Is there an average of how that translates into # tokens?

Would a larger value be more relevant to someone who gets a ton of (varied) email, and who may be more cognizant of the implications of changing the setting?

Where the current or proposed default value is sufficient or even too high - how would one know?  For example I have SpamAssassin as a front end and my 900k training.dat for 2 mail accounts is quite satisfactory. Perhaps it's even overkill.

Playing devil's advocate about exposing this - a less clued in user may increase the value with the expectation of improving spam recognition, and not get it because they aren't doing the proper training.

Is a slider too complex a presentation? 

Regardless, you should clearly state the expected impact/memory usage as the setting changes.
(In reply to comment #19)
> 
> I looked at the doc and bug - % footprint means nothing to me.

In bug 437098 I gave the correlation formula from some experiments that I did:

(TB Memory) = 58.2 MB + (8.83E-5)(Total Counts)

so 100,000 tokens added 8.8 MB of memory - and the results should be roughly linear. Bug 437098 comment 1 also shows the relationship between training.dat size, token count, and memory use. There are also significant startup times involved in loading the larger training.dat files, though I did not measure that.
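
The linear fit above is easy to turn into a quick estimate. The Python sketch below only restates that formula, on the assumption that "Total Counts" is the total token count. The function names are hypothetical, and the result is a rough estimate rather than a measurement.

```python
BASELINE_MB = 58.2      # intercept of the fitted formula above
MB_PER_TOKEN = 8.83e-5  # slope of the fitted formula above


def junk_memory_mb(total_tokens: int) -> float:
    """Estimated memory attributable to the token database."""
    return MB_PER_TOKEN * total_tokens


def estimated_tb_memory_mb(total_tokens: int) -> float:
    """Estimated total TB memory according to the linear fit."""
    return BASELINE_MB + junk_memory_mb(total_tokens)


for tokens in (100_000, 300_000):
    print(f"{tokens:,} tokens: +{junk_memory_mb(tokens):.1f} MB, "
          f"total about {estimated_tb_memory_mb(tokens):.1f} MB")
```

Because the fit is linear, tripling the token limit triples the junk-related memory while the baseline stays fixed.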

> I don't have
> a clear understanding of what types of usage one finds a larger value helping.

Neither do I. I think the general rule is the more, the better, so the issue largely comes down to how much memory and CPU you are willing to give.

> Is the memory usage roughly the same as training.dat?  Is there an average of
> how that translates into # tokens?

The memory usage is proportional to training.dat, so you would need a factor. See the data in bug 437098.

> 
> Is a larger value be more relevant to someone who gets a ton of (varied) email,
> and who may be more cognizant of the implications of changing the setting?  

I don't have any data to back this up, but my guess is that the larger values are probably not very valuable for someone who does not have a plan to train some good mail as well as junk. As I have complained before, the current UI provides users with no clue at all of what good messages need training. There is also confusion between the concepts of marking as junk/notjunk, and training as junk/notjunk.

> Where the current or proposed default value is sufficient or even too high -
> how would one know?  For example I have spamassasin as a front end and my 900k
> training.dat for 2 mail accounts is quite satisfactory. Perhaps it's even
> overkill.

I think you are getting at the correct concepts, that is if you are happy with your junk performance and overall responsiveness, leave it alone. If junk is a problem, increase it. If performance is a problem, decrease it.

This is after all an "advanced" setting, which means that we do not want users changing it unless they have put some effort into understanding it.

A point worth making is that increasing the limit has no effect until training occurs. Similarly, decreasing it will throw away some tokens that cannot be recovered. You can't just put it up and down and see an immediate effect. It might be useful to add a counter that shows the current number of tokens in use to help people. I could add an interface easily to the junk filter to return that.

SpamAssassin or some other server-based front end should hopefully be pretty standard for most people. We should consider that to be the normal configuration. I also have a SpamAssassin front end.  Our junk mail filter is capable of rejecting 80 - 95% of junk, which is not enough if you have a relatively stable, active email address that collects a lot of junk.
> 
> Playing devil's advocate about exposing this - a less clued in user may
> increase the value with the expectation of improving spam recognition, and not
> get it because they aren't doing the proper training.
>
Maybe the text to change it should also encourage more training - of both good and junk email.

> Is a slider too complex a presentation? 
>

I think that it is. 

> Regardless, you should clearly state the expected impact/memory usage as the
> setting changes.

We could, but it would only be a +/- 50% approximation.
Perhaps both junk and non-junk token counts could be presented.
(In reply to comment #18)
> I think that the idea of a low, medium, and high value for the memory usage is
> a good one. The values that I would recommend for low, medium, and high are
> 50,000/150,000/450,000 maximum tokens. I would set the default at medium. At
> the medium value, memory usage for junk analysis increases the overall memory
> footprint of TB by about 20%. When doing this preference, you will need to
> handle the case where the preference was previously set to a value that is not
> one of those three.

Sorry for having no clue about this system.  Do we know how many tokens are currently being used by the person's profile?  If so, we could show that number beside the choice of where they want to cap it.
The relevant idl for the junk information is nsIMsgFilterPlugin.idl. That interface does not currently have any method to return information about the training corpus, but the information is readily available at the C++ level, and I would be happy to add the necessary calls to the interface. I would add a call to return the total number of training tokens, and at the same time I would probably want to add a call to return the number of messages trained with both Junk and Good emails (though I would add a more generalized form that returned the number of trained "Pro" trait messages and "Anti" trait messages). That information would be very useful, and I would certainly want to make it available in my JunQuilla extension.

Just ask and I can probably have a patch ready in a few hours. Getting it reviewed though is another story, but it will be pretty simple so should not cause much grief to the reviewer (who would be bienvenu or standard8).
Ok, this interface seems a little complex but I'm ok with what we have here.  I'm thinking of doing something along the lines of these pieces:

I'd like to offer a link to the mozilla knowledge base where someone like you, hopefully, would have written a small summary of what tokens are and what changing the values accomplishes.  Your graph would work really well there too.

--

You're currently using 100,000 tokens which consumes an estimated 5.8MB of memory.  [Learn more about junk tokens]

Maximum number of tokens: [ 150,000 = 8.8MB ( Recommended ) | v ]

.---------------------------------.
| 50,000 = 2.9MB ( Minimum )      |
| 150,000 = 8.8MB ( Recommended ) |
| 450,000 = 26.4MB ( Maximum )    |
| ( Custom... )                   |
'---------------------------------'

This last custom entry allows us to show a number of tokens that doesn't match the numbers we've recommended in this drop down.  We could display it in a similar manner when a value exists:

283,929 ( Custom...) 

If the custom entry is selected I think we could use a simple dialog for manually entering the number of tokens to use.  The input would default to the custom value.
+-----------------------------------+
|  Enter a custom number of tokens  |
|  [                              ] | 
|                 ( OK ) ( Cancel ) |
+-----------------------------------+

If a person sets the max tokens to a number below the number of tokens they currently use, we could use a dialog as a speed bump.

+-------------------------------------+
|  /!\ This will destroy your tokens  |
|                                     |
|  Changing the max from 150,000 to   |
|  50,000 will destroy 22,000 of your |
|  tokens.                            |
|                                     |
|                 ( OK ) ( Cancel )   |
+-------------------------------------+

I really wanted to offer more in this dialog but it was getting difficult.  Essentially offering 3 options to cap at the current value, continue with the change or cancel.  Something like this:
( Cap at Current Tokens ) ( Continue and discard tokens ) ( Cancel )

But I'm not really sure that capping at the current value is a common use case; it makes the dialog difficult to design correctly, and there is a workaround: copy and paste your current value into the custom dialog.
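
The speed-bump dialog only needs two facts: whether lowering the cap would drop tokens, and how many. A minimal Python sketch of that check follows. The function names are hypothetical, and it assumes that lowering the cap discards exactly the excess over the new maximum, which the actual pruning may not do.

```python
def tokens_discarded(current_tokens: int, new_max: int) -> int:
    """Tokens lost if the cap drops to new_max (assumes only the excess is dropped)."""
    return max(0, current_tokens - new_max)


def needs_confirmation(current_tokens: int, new_max: int) -> bool:
    """Show the speed-bump dialog only when something would actually be lost."""
    return tokens_discarded(current_tokens, new_max) > 0


def warning_text(current_tokens: int, old_max: int, new_max: int) -> str:
    """Build the dialog text for a cap change that discards tokens."""
    lost = tokens_discarded(current_tokens, new_max)
    return (f"Changing the max from {old_max:,} to {new_max:,} "
            f"will discard {lost:,} of your tokens.")
```

With 72,000 tokens in use, dropping the cap from 150,000 to 50,000 reports 22,000 discarded tokens, matching the numbers in the mockup.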

--

Let me know if this looks like a reasonable set of interactions.  If so I think we could continue and start iterating on it and the other pieces.  

I might work up most of the UI in this patch and then leave the tougher bits to you if that's ok.  So when you have the idl patch finished / in review you could wire it up to the UI patch.  We'll see how it goes...
I'm not sure that the extra work for the Custom form is justified, as the effect of the precise number of allowed tokens is very hard to see and measure. If someone really wanted to set this in a custom manner (and that someone would be a real geek), they could always set the preference directly.

In the destroy dialog, I would tone it down a bit. "Destroy" is a strong word. "Prune" might be better, though I don't know if that translates well. Keep in mind that the process of automatically pruning tokens is a normal part now of the junk processing, so when you hit the limit and try to train some more, the number of tokens is silently cut in half ("pruned").

I see you want the changes to the idl, so I will work on that.
Depends on: 496453
I don't think we'd hold B3 for this, moving out.
Target Milestone: Thunderbird 3.0b3 → Thunderbird 3.0b4
As much as it would be great to get this for Tb3, I think if push came to shove, we could wait for 3.1.  Marking blocking-, wanted+.
Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3-
Flags: blocking-thunderbird3+
Blocks: junktracker
Depends on: 531773
I shouldn't be the assignee for these bugs.  Filter against clarkbfilter to delete all these from your emails.
Assignee: clarkbw → nobody
This is an updated draft based on the previous patch. Many things have changed, so a new patch was needed that could be applied to the repository.

What is included: removal of the Junk mail tab from the Security pane, with its content moved into a newly created Junk pane; a new icon for Windows and Linux builds (Mac OS X will reuse the Chat pane icon in this draft); and a change in the Junk section of Account Settings so that pressing the button opens this new pane.

The OS X Preferences dialog panes use CSS sprites for their icons, so this part would need some extra work.

This is a draft that new work can start from, so it is not intended for review and does not need one.

About Tokens
------------
That part could be implemented by adding a new informative label and an "Advanced..." button that would open a new sub-dialog allowing the user to change those preferences.
No longer blocks: 487277
Depends on: 487277
See Also: → 752460
Depends on: 505530
Type: defect → enhancement

Can someone update and "unrot" this patch? Thank you.

Flags: needinfo?(clarkbw)

Bryan hasn't been active for many years, i.e. it's normally not helpful to randomly poke people in long dormant bug reports.
Maybe Alex can take a look in a few months.

Flags: needinfo?(clarkbw)
Severity: normal → S3