Closed Bug 320351 Opened 19 years ago Closed 5 years ago

Scam detector should allow a user to train it so duplicate/similar emails are not marked as a scam.

Categories

(Thunderbird :: Security, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: mikel, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [gs])

Newsletters from some companies are repeatedly flagged as a Scam attempt.  There is no indication of why, and no apparent way to prevent this happening for every future newsletter.  I find this rather annoying.

It would be nice if there were some documentation of what constituted a scam (other than the source code).

It would also be nice if Thunderbird eventually learned that a certain scam-like quality is not a scam (e.g. the token that caused the alert could be added to a database of non-scam tokens similar to Bayesian classification) or at least build a reputation for a certain sender and after some time learn that these messages are not scams.
It affects all versions and is on the trunk too.  It is a problem.
Severity: enhancement → normal
OS: Windows XP → All
Hardware: PC → All
Version: 1.5 → Trunk
*** Bug 320352 has been marked as a duplicate of this bug. ***
Related to bug 318916?
I receive a lot of HTML-based newsletters, and the latest release of T-bird is marking all of them as possible scams. I click on the "NOT A SCAM" button, and if I re-open that particular e-mail, then it won't come up again as a possible scam. However, if I open another e-mail from the same sender (or a similar sender at the same domain), then T-bird thinks it's a scam. I would like to think that a quick work-around to this problem would be to create a whitelist, inasmuch the same way that the junk mail whitelist works. Since all of my mail runs through filters, I could just add a selection that "message is NOT a scam if address [is listed in [PERSONAL] Address Book] [contains xyz.com domain] [etc.]"

Thanks
I have found this flaw with "scam flags" to be a real problem.  Not only are perfectly fine messages flagged each and every time they are sent from the same source no matter how many times I tell the program it is not a scam, but the same message is flagged upon subsequent sessions if still in the inbox. And messages that are almost certainly a scam are never flagged - penny stock deals for example!  There does not seem to be any learning curve nor the ability to specifically exclude addresses by creating lists and parameters like with the normal message filters for spam.  While this could be a great feature, this filter appears to have no value as it presently stands and I would like to see the ability to simply turn it off until it is improved.

I wonder if the same process that seems to learn what is spam could simply be applied to scams as well?  A button to confirm it is a scam (besides the existing one to deny it is a scam) could at least let the user mark that address for filtering which would help a lot.

H. Beitman
*** Bug 330039 has been marked as a duplicate of this bug. ***
I have a livejournal account, and it is set to send me an email when
someone comments on a journal entry of mine. The mail comes from
lj_notify@livejournal.com, and it contains a reply form.

"lj_notify@livejournal.com" is in my address book,
therefore messages from it should not be marked as scams. But they are,
every last one. Worse, no matter how many messages I mark as "not a scam", new messages are marked as scams all over again. There should be a way
around this. A whitelist or learning procedure is needed. 

Under related bug 320352, it states that the detection algorithm tests
"if an e-mail contains a form element, then assume the message is a
phishing attack. Legitimate sites should not be using forms inside of
e-mail."

Livejournal is a legitimate site, and it uses forms in these emails in ways
that are not a phishing attack. Phishing implies deception, and forms
are not always deceptive. The code's assumptions are therefore
wrong.
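For illustration, the form-element rule quoted above can be sketched in a few lines. This is a minimal Python sketch using only the standard library, not Thunderbird's actual implementation; the names FormFinder and contains_form are made up for the example.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Records whether an HTML body contains any <form> element."""

    def __init__(self):
        super().__init__()
        self.has_form = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        if tag == "form":
            self.has_form = True


def contains_form(html_body: str) -> bool:
    """Hypothetical helper: True if the message body has a form element."""
    finder = FormFinder()
    finder.feed(html_body)
    finder.close()
    return finder.has_form
```

A rule this coarse flags every message with a reply form, which is exactly why legitimate notification mails like the ones described here would trip it.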
This bug would really be best fixed by bug 328749, which would allow us to integrate with online phishing URL blacklists.
I whole-heartedly agree with this.  I have turned off this feature in Thunderbird because I have a number of subscriptions that Thunderbird deemed to be scams, but actually weren't, and even though I informed Thunderbird they were not scams, the next time these e-mails came in, they were again determined to be scams.

As it works now, the feature is not useful to me. Although this is unrelated to the core functionality of the Thunderbird program, the way it now works is annoying, and if you choose to disable it, as I have, you lose what would be a good feature *if it worked properly*. I therefore think the severity level should be increased.
I have been downloading nightly builds every few days and testing them to see how the scam warning has changed.  I was pleased to see the ability to add the e-mail address to the "collected addresses" and have the program remember the sender and not bug me each time the same party sent me an e-mail.

However, the program continues to warn me that it thinks the e-mail is a scam just like before, no matter the sender's status in my "collected addresses."  Why do we need the scam warning in addition to the option to add the address to the address book?  If you have the address in your address book, the e-mail should just open fully and no warning should be needed.  

I'd drop the "SeaMonkey thinks this may be scam" warning and the "NOT A SCAM" button since they really do nothing that appears to be helpful.  If they somehow put the address in a file of known scam sources that would filter further e-mails from that address (just like spam filters) that would be worthwhile, but short of that I'd just eliminate the scam button.
Thunderbird has a 97% (my guesstimate) false positive rate on scam detection AND no way to "whitelist" sites.

It's not unusual for well known sites to feed some html from other domains. I assume that's why Thunderbird thinks said emails are scams.

[A version of this also posted to 392804]

I don't see why it's a good policy not to have the scam-marking function learn
from your clicking on "Not a Scam"; otherwise, what purpose does it serve? The
fact that it doesn't learn makes me ignore it, and that reduces the potential
effectiveness of the marking. I'm even getting the flag on email that I
create myself; what's that all about? Am I being identified as a scammer?

Every day headlines come to me from the Boston Globe, and every day they're marked as a possible scam. Is the Boston Globe really perpetrating a lot of scams? Get a clue, or give us some way to turn it off.

I assume that this function is in its early stage of development, and am
hoping that sometime soon these issues will be addressed. As it stands now, the
scam-marking function is not much more than a constant source of irritation.
I'm using v2.0.0.12(20080213)

As of this version Thunderbird is *still* marking mails as scams despite being repeatedly told they are not.  I would echo all of the above sentiments that it *needs* a learning capability.  As of now I have turned it off as I'm fed up of telling it the same senders/mails are *not* scams.

I've just read a forum where the majority of people suggest turning it off, which begs the question why have it at all?
I too found myself annoyed that Thunderbird has no way of learning which certain daily HTML newsletters I subscribe to are NOT scams (and ALWAYS warning me before I opened links they contained), and finally I just turned scam detection off.

It would be nice to have an option in the addressbook similar to the "Allow remote images" checkbox whereby one could indicate that a given sender is NOT sending scams.

Currently using version v2.0.0.16.
Assignee: mscott → nobody
I am also currently using v2.0.0.16.  I've been a Thunderbird user since 1.0, and don't remember when this scam warning started -- but it's been an irritation since day 1.  

PLEASE!  Either make the thing learn from user reactions, or remove the 'feature' altogether.
At the risk of spamming this bug: I received a plain text message containing a single URL (http://www.cs.berkeley.edu/~bh/ss-toc2.html) and "Thunderbird thinks this message might be an email scam."
(In reply to comment #19)
> At the risk of spamming this bug: I received a plain text message containing a
> single URL (http://www.cs.berkeley.edu/~bh/ss-toc2.html) and "Thunderbird
> thinks this message might be an email scam."

There is a reason I called this "feature" worse than useless on another bug report, so don't worry about spamming this thread. You are absolutely right.
Thunderbird is convinced that Digg is perpetrating a scam. This is a nuisance. Confirm comments above, especially the need for a white list.
"This message may be a scam" is attached to several messages I receive every day.  The types of messages are political, updates from sources i regularly shop at, and once in a while friends.  There needs to be a way to permanently identify known safe messages as NOT a scam.  They come through the server with no problem.
(In reply to comment #24)
> The types of messages are political, updates from sources i regularly
> shop at, and once in a while friends.

See comment #8 for one possible reason for this warning; another reason I often see in my own mail is a link that looks like a straightforward URL, but actually goes to another URL first -- like http://www.example.com/cool-article actually is set up to send me first to http://link-tracking.privacy-invading.net/track/everything or something similar before redirecting to the original link.
Obviously, these are usually false positives, so this should be fixed, but they could in fact go somewhere else entirely.
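To make the rule described above concrete, here is a rough Python sketch of a check for links whose visible text is a URL on one host while the underlying href points to another. It is an illustration only, not the code Thunderbird uses; LinkCollector and mismatched_links are hypothetical names, and it compares hosts only.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only collect text while we are inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html_body):
    """Return (text, href) pairs whose text is a URL on a different host."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    suspicious = []
    for href, text in collector.links:
        if not text.lower().startswith(("http://", "https://")):
            continue  # the text is not written as a URL, nothing to compare
        if urlparse(text).hostname != urlparse(href).hostname:
            suspicious.append((text, href))
    return suspicious
```

With the example above, a link reading http://www.example.com/cool-article that actually points at link-tracking.privacy-invading.net would be reported, while a link labelled "click here" would not.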

However, the thing preventing this bug from moving forward is not so much lack of annoyed users as that no one is available/volunteering to work on it. And for some reason, volunteers are scarcer when they have to read a lot of upset comments before they're able to start work on the bug. So it might be best to leave well enough alone, or at most try to find someone to work on it. (Or, of course, dig through the source code to figure out how the fix might go, and post any observations here.)
Component: General → Message Reader UI
QA Contact: general → message-reader
(In reply to comment #24)
> "This message may be a scam" is attached to several messages I receive every
> day.  The types of messages are political, updates from sources i regularly
> shop at, and once in a while friends.  There needs to be a way to permanently
> identify known safe messages as NOT a scam.  They come through the server with
> no problem.
In a recent thread, Ron K wrote:

>Off hand I don't know if this has a bug report or not. However I can say
>that it has been discussed in another group to find a means to adopt a
>notifier more like the one used by Firefox. As with many other things
>wanted for Tbird, manpower shortfall is what keeps many enhancements from
>being realized.

>I suggest writing the enhancement as an address book integration that
>performs a lookup there as a whitelisting resource. That approach may fit
>with some planned work that is targeted for later in the Tbird 3
>development cycle.

>-- 
>Ron K.
>Who is General Failure, and why is he searching my HDD?
>Kernel Restore reported Major Error used BSOD to msg the enemy! 

I wish I were capable of helping to fix this.  Perhaps the white list function that Ron suggests could be the poor man's (or resource-strapped) solution.

Peter Clough
Component: Message Reader UI → Security
QA Contact: message-reader → thunderbird
Bug #623198 is a duplicate of this bug.
I'm the reporter of bug #623198. I created it to argue for the disabling by default of the scam detection functionality.

My argument is that, being so unreliable, it actually makes the net safety of Thunderbird users worse, not better. It almost exclusively generates false positives. Before I turned it off I got several warnings a day, and *never* has one been an actual scam. In the meantime, many actual scams were not flagged at all.

So, many savvy users will turn it off, in which case it contributes nothing to their safety. People who leave it on will quickly learn to ignore it (the "boy who cried wolf" syndrome), in which case it again contributes nothing to their safety. Or they may trust it too much and not be wary enough of the emails not flagged by Thunderbird. I don't see a situation in which the safety of the user is actually increased by the scam filter.

Furthermore, this problem is so hard that I don't see how the algorithm could ever be improved enough to become useful. It's the false positive paradox. The false positive rate may be relatively low, but because there are so many legitimate emails compared to scam emails, the number of false positives will still be greater than the number of true positives. The only way to improve the situation is to alter the algorithm so that it has an *extremely* low false positive rate, but also a low enough false negative rate to still catch (some) actual scams. Given the difficulty of detecting the intentions of the sender of an email, I don't see how that could ever be achieved...
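
The false positive paradox can be shown with a small calculation. The numbers below are purely hypothetical (they are not measurements of Thunderbird's detector), and warning_precision is a made-up helper name; the point is only how a rare event interacts with a modest false positive rate.

```python
def warning_precision(total_mails, scam_fraction, false_positive_rate, true_positive_rate):
    """Fraction of raised warnings that point at an actual scam."""
    scams = total_mails * scam_fraction
    legitimate = total_mails - scams
    true_positives = scams * true_positive_rate
    false_positives = legitimate * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Hypothetical numbers: 10,000 mails, 1 in 1,000 is a scam,
# 2% of legitimate mail is flagged, half of the scams are caught.
precision = warning_precision(10_000, 0.001, 0.02, 0.5)
print(f"{precision:.1%} of warnings are real scams")
```

Under these assumed rates, about 5 true warnings are buried among roughly 200 false ones, so only around 2.4% of warnings point at a real scam even though the false positive rate looks low.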

So, given that the scam detection currently, in my opinion, does not contribute to the safety of the users (and may actually decrease it), and there is little hope of improving it much any time soon, I think it should be turned off by default.
(In reply to comment #30)
> So, given that the scam detection currently, in my opinion, does not contribute
> to the safety of the users (and may actually decrease it), and there is little
> hope of improving it much any time soon, I think it should be turned off by
> default.

I agree, I have just turned it off in 3.1.9 and wish I had turned it off years ago.  It has never generated anything but false positives for me, for newsletters that I receive every week.

If nobody is volunteering to work on it then turn it off by default until someone does volunteer.
Blocks: 623198
I've confirmed bug 623198 for switching the default per previous comments, so let's keep that discussion over there. It would be an interim solution until the algorithm has been redesigned to be actually useful again, at which time it can be activated by default again.
Just want to add my vote to turn it off by default, please!
Some thoughts on a possible implementation of a whitelist feature:

(In reply to comment #5 and others)
> However, if I open another e-mail from the same sender (or a similar
> sender at the same domain), then T-bird thinks it's a scam. I would like to
> think that a quick work-around to this problem would be to create a whitelist,

While in general keeping a whitelist to identify "trustworthy" mails is a good idea, basing it on the sender alone wouldn't work. Let's say you get regular mails from "news@yourbank.com" which contains a text "http://bank.example.com" with an underlying link "http://promo.counter.something" and thus trigger the scam warning. If you whitelist "news@yourbank.com" a scam artist may come and abuse that well-known e-mail, replacing "http://promo.counter.something" with "http://phishme.example.info" in an otherwise identical message. Now, the scam warning would stay quiet even though it is an actual scam with the potential to get your credentials as you consider "news@yourbank.com" trustworthy, right?

(In reply to comment #12)
> Thunderbird has a 97% (my guesstimate) false positive rate on scam detection
> AND no way to "whitelist" sites.

Sounds almost identical, but whitelisting sites rather than e-mail addresses would be better. Thus, if you associate the address "news@yourbank.com" with "http://promo.counter.something" as a trustworthy domain used by that sender, the problem would be solved and "http://phishme.example.info" now flagged. Maybe even just whitelisting domains as such might do the job already, but the question always is if you would consider "promo.counter.something" to include "else.counter.something" and to which level, which may cause issues with some countries having two top levels (e.g., ".co.uk") and you don't want to overdo it by whitelisting most of a specific country.
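
A minimal sketch of this idea, assuming a per-sender store of approved link hosts (trusted_link_hosts and link_is_whitelisted are hypothetical names, and the data is the example from this comment). Matching the exact host, rather than a parent domain, avoids having to decide how far up the hierarchy trust should extend.

```python
from urllib.parse import urlparse

# Hypothetical store: sender address -> set of link hosts the user has approved.
trusted_link_hosts = {
    "news@yourbank.com": {"promo.counter.something"},
}


def link_is_whitelisted(sender, href):
    """True only if this exact host was approved for this exact sender."""
    host = urlparse(href).hostname
    if host is None:
        return False
    return host in trusted_link_hosts.get(sender.lower(), set())
```

Here a mail from news@yourbank.com linking to promo.counter.something stays quiet, while the same mail linking to phishme.example.info (or even else.counter.something) would still be flagged.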
I agree - white-listing sites is not the answer.  The quickest, easiest and most effective answer is change the default setting to "Off".  That puts us in a much better place, because the current functionality is in fact worse than nothing.
Steve, tsuchan - please read comment #32, this is now a different bug. I'm trying to be constructive here towards a long-term solution, whereas switching it off by default is just an interim fix...

> I've confirmed bug 623198 for switching the default per previous comments, so
> let's keep that discussion over there. It would be an interim solution until
> the algorithm has been redesigned to be actually useful again, at which time it
> can be activated by default again.
(In reply to comment #35)
> I agree - white-listing sites is not the answer.

Just for clarification as you may have misunderstood my comment #34:
White-listing *senders* isn't useful given that they likely are spoofed,
whereas white-listing *sites* would indeed help solving part of the issue.
Blocks: mail-scam
This bug issue has been open for five and a half YEARS, and yet it's still assigned to NOBODY.  Is anybody even LISTENING?  

White-listing sites is an excellent suggestion.  But if the manpower isn't there to do anything else, then AT LEAST set the default for this 'feature' to OFF.
(In reply to comment #39)
> This bug issue has been open for five and a half YEARS, and yet it's still
> assigned to NOBODY.  Is anybody even LISTENING?  
> 
> White-listing sites is an excellent suggestion.  But if the manpower isn't
> there to do anything else, then AT LEAST set the default for this 'feature'
> to OFF.

See https://bugzilla.mozilla.org/show_bug.cgi?id=623198
(In reply to comment #40)
> See https://bugzilla.mozilla.org/show_bug.cgi?id=623198

Thanks, A.P.  I've added a vote and comment there as well, now.
No longer blocks: 623198
It would be good to prioritise this more as it is triggered on notifications from Facebook and other high volume alert senders. As is commented above, if it is in the "WONT FIX" list then I concur the default should be off.
Blocks: 682715
Whiteboard: [gs]
This issue seems to have been around for a long time. It is still causing problems, but in my case it is that my own replies get marked as a scam. The filter has caught several legitimate scam emails, but for my own to be marked as such is quite disturbing. I can dismiss the alert myself, but I am more worried that the recipient is seeing the same alert. I have checked the content, but I cannot find anything that could possibly trigger the filter.
(In reply to Steve from comment #43)
> This issue seems to have been around for a long time. It is still causing
> problems, but in my case it is that my own replies get marked as a scam. The
> filter has caught several legitimate scam emails, but for my own to be
> marked as such is quite disturbing. I can dismiss the alert myself, but I
> more worried that the recipient is seeing the same alert. I have checked the
> content, but I cannot find anything that could possibly trigger the filter.

One question is: how should the filter tell apart the mail which you actually sent from the mail just pretending to be from you? Of the spam messages I get, a large number has my own email address on the From: line.

When a message comes in marked as scam, which actually isn't, don't forget to click "Not a Scam" on the right side of the headers in the Preview pane or in the message-read window, or else to hit Shift+P. This ought to teach the filters.
(In reply to Tony Mechelynck [:tonymec] from comment #44)
> This ought to teach the filters.

Unfortunately, the scam-filter is incapable of learning, so teaching it anything is wasted effort. That is the core of the problem.
(In reply to Tony Mechelynck [:tonymec] from comment #44)
> One question is: how should the filter tell apart the mail which you
> actually sent from the mail just pretending to be from you? Of the spam
> messages I get, a large number has my own email address on the From: line.

You are confusing the scam detector with the junk-filtering system. The scam detection only cares about /links/ in the message received, regardless of where it came from. Only if those trigger one of the given rules does the scam warning appear. The "learning" in this case would be a white-listing of all link-target domains of a message that's considered authentic and where no phishy links are present by the assessment of the user.
(In reply to Tony Mechelynck [:tonymec] from comment #44)
> One question is: how should the filter tell apart the mail which you
> actually sent from the mail just pretending to be from you? Of the spam
> messages I get, a large number has my own email address on the From: line.
> 
A message that has truly originated from you should have a corresponding sent mail, so any spoofed messages would be caught where none exist in the SENT folder. The same goes for replied messages - which is the problem I'm having.

> When a message comes in marked as scam, which actually isn't, don't forget
> to click "Not a Scam" on the right side of the headers in the Preview pane
> or in the message-read window, or else to hit Shift+P. This ought to teach
> the filters.
This apparently does nothing more than remove the warning from that message.
(In reply to Steve from comment #47)
> This apparently does nothing more than remove the warning from that message.

Correct - at this time, "Not a Scam" simply sets a flag in the message to disable the scam warning for this specific message (and only this message).
I'd just like to remind everyone about the existence of bug 623198, in which I request the scam detection be turned off by default, arguing that it performs so badly (with little prospect of improvement due to the fundamental difficulty of the problem) that the net effect on users' security is negative. If you agree, please vote on that bug.
I still have not found any value in the scam warnings.  It seldom finds what appears to be true scams and often flags legit messages (over and over).  It is a wonderful idea that could be very helpful if it worked properly.  However, since there appears to be no progress in the benefit of this feature, I agree that it should be off by default until these issues are solved.  Could this become an add-on that users could decide they want or don't want?
I've wanted such an add-on for years.  See bug 398875, comments 10 and 11.
Blocks: 849694
+1 for this. The scam detector rarely seems to detect genuine scams, but persistently marks one regular email newsletter which I signed up for as a scam, even though the address it comes from is in my address book. The scam detector is effectively useless and it probably does more harm than good.
(In reply to tmw from comment #52)
> The scam detector is effectively useless and it probably does more harm than good.

You can scratch that "probably" and replace it with "certainly".
I agree with all of you on the desirability of this fix -- but I suggest marking this as a dup of 398875 and raising that bug from the dead.
(In reply to John David Galt from comment #54)
> I agree with all of you on the desirability of this fix -- but I suggest
> marking this as a dup of 398875 and raising that bug from the dead.

Judging by the higher number, that bug is a dup of this one.
I receive an email newsletter from the Rocky Mountain Synod of the Evangelical Lutheran Church in America, and it is ALWAYS flagged as a possible scam.  They are using Constant Contact to send these emails, which looks scammish to TB, but is definitely not in this case.  A way to whitelist a particular newsletter would be most welcome, especially after 10 YEARS of this bug being open.

Kind Regards, 
c.
(In reply to Chris from comment #56)
> I receive an email newsletter from the Rocky Mountain Synod of the
> Evangelical Lutheran Church in America, and it is ALWAYS flagged as a
> possible scam.  They are using Constant Contact to send these emails, which
> looks scammish to TB, but is definitely not in this case.  A way to
> whitelist a particular newsletter would be most welcome, especially after 10
> YEARS of this bug being open.
> 
> Kind Regards, 
> c.

Just turn the scam detection off; it hasn't ever worked properly and has been worse than useless ever since its introduction more than ten years ago.
In reply to comment #34:

> While in general keeping a whitelist to identify "trustworthy" mails is a good idea, basing
> it on the sender alone wouldn't work. Let's say you get regular mails from
> "news@yourbank.com" which contains a text "http://bank.example.com" with an underlying link
> "http://promo.counter.something" and thus trigger the scam warning. If you whitelist
> "news@yourbank.com" a scam artist may come and abuse that well-known e-mail, replacing
> "http://promo.counter.something" with "http://phishme.example.info" in an otherwise
> identical message. Now, the scam warning would stay quiet even though it is an actual scam
> with the potential to get your credentials as you consider "news@yourbank.com" trustworthy,
> right?

It seems obvious to me that the solution to this is to whitelist, not the From address
"news@yourbank.com", but the actual destination domain, "http://promo.counter.something".

Aside: Bugzilla won't allow me to make bug 398875 a dup of this one.
(In reply to John David Galt from comment #58)
> In reply to comment #34:
> 
> > While in general keeping a whitelist to identify "trustworthy" mails is a good idea, basing
> > it on the sender alone wouldn't work. Let's say you get regular mails from
> > "news@yourbank.com" which contains a text "http://bank.example.com" with an underlying link
> > "http://promo.counter.something" and thus trigger the scam warning. If you whitelist
> > "news@yourbank.com" a scam artist may come and abuse that well-known e-mail, replacing
> > "http://promo.counter.something" with "http://phishme.example.info" in an otherwise
> > identical message. Now, the scam warning would stay quiet even though it is an actual scam
> > with the potential to get your credentials as you consider "news@yourbank.com" trustworthy,
> > right?
> 
> It seems obvious to me that the solution to this is to whitelist, not the
> From address
> "news@yourbank.com", but the actual destination domain,
> "http://promo.counter.something".
> 
> Aside: Bugzilla won't allow me to make bug 398875 a dup of this one.

Did you really need to reply to a post of 4 years and 8 months ago?
It's still an important problem.  And of course for a large part of that time, the presumptuous nanny who wrote bug 398875 comment 11 was preventing a fix.
"Learning" what is or isn't a scam doesn't make sense. We mark messages as scams when they contain suspicious links. Usually, messages trip our scam detector when they contain a link whose text is a URL, but the link itself points to a *different* URL. This is suspicious at best, and commonly used as a way to track individual users (by redirecting them through a tracking URL).

Probably the best way to resolve the underlying issue (that the scam detector has too many false positives) is to fix bug 938902. Another thing we could do would be to change the wording of the scam detector to make it clear why we marked a message as a scam and to give users a better idea of how severe the warning is; this is bug 324820.
Mozilla's definition of a scam is a simplistic notion of a link to site A disguised as site B.  The major problem with that is that it always "detects" as scams some very common and desired messages such as (in my case) political appeals that send the reader to capwiz.com or salsalabs.com.  So long as we are not allowed to whitelist those sites, the only answer is to disable scam detection.  That's a stupid design.
Those sites might be whitelisted as part of bug 938902; however, even so, those links are very likely a privacy issue, so I think it makes sense to warn the user beforehand. My only real issue is that the wording in the scam detector is overly vague. It should say something more like, "Be careful! Thunderbird detected suspicious links in this message," possibly with a link to learn more. I don't see much issue with us providing a short warning about the links in a message, so long as we're not overstating things (we really have no idea if a message is a true "scam" or not).

Since Thunderbird doesn't analyze links (yet), we can't say whether a link is "safe" or not; even if it's from a site that might generally be recognized as safe, many of them (e.g. Facebook) contain redirectors that can send you off to another site. I don't think most users are in a position to make this decision either; even if you trust a particular site, the presence of a redirector on that domain could let actual scammers disguise their links. I doubt most users of a site can say with confidence whether that site has a redirector.

Ideally, bug 938902 will let us strip tracking from links and then we can just warn people in the cases where we can't protect their privacy. I could also see us adding hooks for add-ons to add support for more domains so that we can verify that links are actually going where the text says they will.
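
As a rough illustration of what stripping tracking from a link could look like in the simplest case, the sketch below pulls the destination out of a tracking URL when it is carried in a query parameter. This is an assumption-heavy example: the parameter names in DESTINATION_PARAMS and the helper embedded_destination are invented here, real tracking services vary, and many encode the destination opaquely, in which case nothing can be recovered this way.

```python
from urllib.parse import urlparse, parse_qs

# Assumed parameter names; these are guesses for illustration only.
DESTINATION_PARAMS = ("url", "u", "redirect", "target")


def embedded_destination(tracking_url):
    """Return the destination URL carried in the query string, or None."""
    query = parse_qs(urlparse(tracking_url).query)
    for name in DESTINATION_PARAMS:
        for value in query.get(name, []):
            # parse_qs has already percent-decoded the value.
            if value.startswith(("http://", "https://")):
                return value
    return None
```

When this returns None, the warning would still be needed, which matches the idea of only warning in the cases where the user's privacy cannot be protected.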
While it would be nice to have Thunderbird identify all scams, etc. correctly, we are left with the terrible choice of a high error rate (false positives) or no identification at all (turning it off).  I would much prefer an imperfect system that I can "teach" by whitelisting sites that I do not consider scammers, risking some errors if those sites are used inappropriately, and still have the benefit of some screening of actual scams.  A common database from sites that users indicate are the source of scams might strengthen the system.  The Ghostery addon apparently does something like this.  Its controls would, in my opinion, work well - 1) pause checking or 2) whitelist the address.  That way users would have the option to benefit from scanning for scams, but not be constantly pestered by the same flag every time they get a recurring newsletter, etc.
The Bayes token analyzer was extended a few years ago to allow other traits beyond "Junk" to be processed at the same time as the Junk calculations. It would not be difficult in the backend to add some sort of Scam Bayes analysis as well. The issues are all UI; that is, the user has to have some mechanism to train some messages as Scams, and some as Not Scams.
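To illustrate what training a separate "Scam" trait could mean, here is a toy naive Bayes classifier in Python. It is a generic textbook sketch, not Thunderbird's Bayes token analyzer; TraitClassifier and its methods are hypothetical names, and train() stands in for whatever "Scam" / "Not a Scam" UI would feed it.

```python
import math
from collections import Counter


class TraitClassifier:
    """Tiny naive Bayes classifier for one trait (e.g. scam vs. not scam)."""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.message_counts = {True: 0, False: 0}

    def train(self, tokens, has_trait):
        """Record one user-labelled message."""
        self.token_counts[has_trait].update(tokens)
        self.message_counts[has_trait] += 1

    def _log_score(self, tokens, has_trait):
        counts = self.token_counts[has_trait]
        vocabulary = set(self.token_counts[True]) | set(self.token_counts[False])
        total = sum(counts.values())
        # Smoothed prior so an untrained class never yields log(0).
        prior = (self.message_counts[has_trait] + 1) / (sum(self.message_counts.values()) + 2)
        score = math.log(prior)
        for token in tokens:
            # Add-one smoothing; the extra +1 reserves mass for unseen tokens.
            score += math.log((counts[token] + 1) / (total + len(vocabulary) + 1))
        return score

    def probability(self, tokens):
        """Estimated probability that a message with these tokens has the trait."""
        yes = self._log_score(tokens, True)
        no = self._log_score(tokens, False)
        return 1 / (1 + math.exp(no - yes))
```

Each click on a "Scam" or "Not a Scam" button would call train() with the message's tokens, and probability() would then be compared against some threshold before showing the warning.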
I'm not sure that training will actually help much in this space, since it assumes that the traits which would imply "scam" (but not "junk") are predictable. It might be possible to run links through the safebrowsing API as a way of determining their safety, but I still think the basic principle of showing a message for "links whose text is a URL, but not the URL the link points to" is the right move. However, I don't think we should imply that all links of that form are scams. They're not. They're usually just trying to track who clicked the link.
Thunderbird maintainers, please listen to H Beitman!

(In reply to H Beitman from comment #64)
> While it would be nice to have Thunderbird identify all scams, etc.
> correctly, we are left with the terrible choice of a high error rate (false
> positives) or no identification at all (turning it off).

He is absolutely right, the current functionality is totally useless and I believe it is either switched off by those who know how to, or ignored by those who don't.

Something that works imperfectly (as proposed by H Beitman) would be way better than the current functionality.
Bug 778611 covers the Safe Browsing API. This would be a great way for us to have some idea of the severity of a phishing attempt. Then, we could have a few levels of phishing detection:

* Totally disabled
* Enabled only for known-evil links
* Enabled for known-evil links + raw IP addresses
* Enabled for all suspicious links
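The levels listed above could be modelled roughly as follows. This is a Python sketch only: the enum names and the `should_warn` signature are hypothetical, and the "known-evil" flag is assumed to come from some reputation lookup that is not shown here.

```python
import ipaddress
from enum import IntEnum


class DetectionLevel(IntEnum):
    DISABLED = 0
    KNOWN_EVIL = 1
    KNOWN_EVIL_AND_RAW_IP = 2
    ALL_SUSPICIOUS = 3


def is_raw_ip(host):
    """True if the link host is a literal IPv4/IPv6 address rather than a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def should_warn(level, known_evil, raw_ip, suspicious):
    """Decide whether to warn about one link at the configured level.

    known_evil: a reputation lookup (assumed, not shown) reported the link as bad.
    raw_ip:     the link host is a raw IP address.
    suspicious: any other heuristic fired, e.g. a text/target mismatch.
    """
    if level >= DetectionLevel.KNOWN_EVIL and known_evil:
        return True
    if level >= DetectionLevel.KNOWN_EVIL_AND_RAW_IP and raw_ip:
        return True
    if level >= DetectionLevel.ALL_SUSPICIOUS and suspicious:
        return True
    return False
```

Each level is a superset of the one before it, so a single ordered setting is enough.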
I would love to see added functionality here, but I do have to say... of the emails that I get a false flag on, I can point out that there should be some responsibility on the sender. Having "link to site A disguised as site B", in my opinion, can and should be avoided by the sender of a legitimate email. If the email is truly not a scam, there should be no issues with the sender following best practices in content as well as SPF, DKIM, DMARC, etc.
"Best practices" are BS if they imply that political committees shouldn't be allowed to use aggregating sites such as capwiz.com.  They are huge labor-savers, and there is no real added risk if recipients are expecting their use.  Which is why we should be allowed to whitelist such links by domain name.
Yes, for your specific example, whitelisting is a great idea, and I would welcome it as well. I am looking at the larger scope, not just my interests or your interests. If the "BS best practices", as you say, can fix even 100 out of 1000, that is great, is it not? I would not call sender responsibility "BS" because it does not solve your specific example.

My point is simply that filters are great, but it is also a "labor-saver" if senders do their part as well. Not only do they pass this filter, but they also help prevent spam as a whole, which is one of the main points of best practices.
"Political committees" shouldn't get any special treatment; we should warn users that the email they're viewing contains misleading links. I do think we should change the text of the warning to be clearer to users that we're (usually) just telling them that a link doesn't go where it says it will, *but* I also think we should be warning by default in *more* instances, not fewer. We should warn any time we can detect that an email is sending users through a unique tracking URL and ideally allow them to avoid the tracking (e.g. by figuring out where the tracking link redirects to and sending users there directly).

If an organization would like their messages to be treated as trustworthy, they shouldn't be tracking users' behaviors without their explicit consent.
So, everything sent by MailChimp or other e-mail newsletter providers (that also track link clickthrough rates) is going to get this vague warning?  People are just going to get habituated to it and start ignoring it altogether.  If that's going to be the policy, the warning should be very specific, e.g. "All links in this message go to/through ____" or "this message contains an address at ___ but the link actually points to ___" or something like that.
There are other bugs pending for all those things. Please follow the meta-bug 654502 dependency list:
https://bugzilla.mozilla.org/showdependencytree.cgi?id=654502&hide_resolved=1
(In reply to Peter C. Frank from comment #5)
> I receive a lot of HTML-based newsletters, and the latest release of T-bird
> is marking all of them as possible scams. I click on the "NOT A SCAM"
> button, and if I re-open that particular e-mail, then it won't come up again
> as a possible scam. However, if I open another e-mail from the same sender
> (or a similar sender at the same domain), then T-bird thinks it's a scam. I
> would like to think that a quick work-around to this problem would be to
> create a whitelist, inasmuch the same way that the junk mail whitelist
> works. Since all of my mail runs through filters, I could just add a
> selection that "message is NOT a scam if address [is listed in [PERSONAL]
> Address Book] [contains xyz.com domain] [etc.]"
> 
> Thanks

I have the same problem, I normally just click the Not a scam button on every newsletter.
I understand that there could be problems with learning on these kinds of messages but one option would be to allow whitelisting of DKIM signatures.

If we choose Not Scam for an email, that email should get a flag so it does not get listed as a possible scam. If the email also contains a valid DKIM signature, Thunderbird should be able to whitelist the signature and thereby remember which signatures not to flag as scam.
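A minimal Python sketch of what remembering a signature could look like, assuming the thing remembered is the signing domain (the d= tag of the DKIM-Signature header). The class and method names are hypothetical. The sketch does not verify the signature itself; that needs a DNS key lookup and crypto, so the caller is assumed to pass in the verification result.

```python
import email


def dkim_domains(raw_message):
    """Return the set of d= (signing domain) values from DKIM-Signature headers."""
    msg = email.message_from_string(raw_message)
    domains = set()
    for header in msg.get_all("DKIM-Signature", []):
        for tag in header.split(";"):
            name, sep, value = tag.partition("=")
            if sep and name.strip() == "d":
                # drop any folding whitespace inside the value
                domains.add("".join(value.split()).lower())
    return domains


class DkimAllowList:
    """Remember signing domains the user has marked as Not Scam."""

    def __init__(self):
        self._trusted = set()

    def mark_not_scam(self, raw_message, signature_is_valid):
        # Only learn from messages whose signature the caller has verified.
        if signature_is_valid:
            self._trusted |= dkim_domains(raw_message)

    def suppress_warning(self, raw_message, signature_is_valid):
        """True if a verified signing domain of this message is already trusted."""
        return signature_is_valid and bool(self._trusted & dkim_domains(raw_message))
```

Unsigned messages, or messages whose signature fails verification, would still get the normal warning.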

The fact that it flags more and more legitimate email is really making the warning useless.

I almost never see an email flagged as possible scam that is not already flagged as Spam anyway, so for me false positives outnumber correct positives 10 to 1 at least.
This "bug" is 12 years old and still people are not happy with the way the scam detection mechanism works. I am one of those who have nearly 100% false positive scam detection on messages that are not marked as spam by the Gmail and TB spam filters.

I know there is a "This is not SCAM" button. But the client does not learn anything from pressing it. I recently had an order from a certain online shop that was marked as a scam, as was all the communication with their customer support team. It was not nice checking whether all the messages were legit and marking them as not scam, especially when they asked for some personal information.

Perhaps this feature needs some love from the developers? Or at least a white-list/black-list learning system?

As for recommending that users turn the scam detection feature off: if you implemented the feature for the security benefit of users, then that is not really a real recommendation, is it?
I am about ready to disable the scam detection.  It adds no value, and is merely an irritant in its current form.  Whitelisting would at least let me get past the message once instead of clicking the not a scam button every time.  A true learning system would be even better.
(In reply to Luka from comment #82)
> This "bug" is 12 years old and still people are not happy with the way the
> scam detection mechanism doesn't work.

FTFY.

(In reply to Jerry Quinn from comment #83)
> I am about ready to disable the scam detection.  It adds no value, and is
> merely an irritant in its current form.  Whitelisting would at least let me
> get past the message once instead of clicking the not a scam button every
> time.  A true learning system would be even better.

Just disable it, that will save you a world of trouble.
Summary: should learn what is not a scam → Scam detector should allow a user to train it so duplicate/similiar emails are not marked as a scam.

With bug 1476428 now fixed, I think there is very little to do here. That should basically cover the initial problem.
The premise of this bug is false though. You can't "train" what is a scam or not.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WONTFIX

(In reply to Magnus Melin [:mkmelin] from comment #85)

> With bug 1476428 now fixed, I think there is very little to do here. That should basically cover the initial problem.
> The premise of this bug is false though. You can't "train" what is a scam or not.

Perhaps you cannot train for it, but I think the reality is that users want to flag what they consider not a scam. In almost all instances where users want the product to learn, what they really want is a whitelist and they don't know how to ask for it. Almost all the reports in support start with "Thunderbird thinks mail from XXXXX is a scam. How do I tell it that it is not?" This is especially the case with US political email. Folks get a little testy when their party of choice is considered spam because of the tracking URLs in the email.

Spot on @Matt, of course it could be trained. Deciding something is a scam is a heuristic or algorithm, and it's almost always flawed thinking by developers to think their algorithm is always right. In this case the algorithm decided that links via common tracking sites were scams, when in the experience of many users, most of the time they weren't scams - for example many if not most company newsletters went back through certain tracking sites, which could be, but very rarely is, a scam. It would be perfectly possible for TB to learn from user feedback that a message from domain a.a.a with links to b.b.b wasn't a scam, in the same way that I can tell it to allow remote content from certain email addresses.
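As a rough illustration of that kind of learning, a Python sketch that remembers (sender domain, link host) pairs the user has marked as not a scam. All names are hypothetical, and it trusts the From header, which can be forged, so a real implementation would presumably need a stronger notion of sender identity.

```python
from email.utils import parseaddr


def sender_domain(from_header):
    """Extract the lowercased domain part of the address in a From header."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()


class PairAllowList:
    """Remember which link hosts the user accepted for which sender domain."""

    def __init__(self):
        self._pairs = set()

    def learn_not_scam(self, from_header, link_hosts):
        """Called when the user clicks 'not a scam' on a message."""
        sender = sender_domain(from_header)
        for host in link_hosts:
            self._pairs.add((sender, host.lower()))

    def unapproved_hosts(self, from_header, link_hosts):
        """Return the link hosts that still deserve a warning for this sender."""
        sender = sender_domain(from_header)
        return [h for h in link_hosts if (sender, h.lower()) not in self._pairs]
```

Because approval is tied to the pair, the same tracking host in mail from an unknown sender would still be flagged.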

So ... if you mean that you don't have time to fix this very common problem then fine - we know the TB team is small and too over-stretched to fix multi-year bugs - but please don't try to convince us that you "can't train what is a scam or not".

Magnus, Firefox has recently introduced a tracking cookie blocker that blocks some thousands of domains from dropping tracking cookies. What do you think of the idea of using that same database in the scam detection to determine known trackers and automatically whitelist them? That would, I am sure, reduce the number of false positives to quite low levels.

The JSON source is hosted here: https://github.com/mozilla-services/shavar-prod-lists

If you think it might be worth pursuing, I will file an appropriate bug.

Flags: needinfo?(mkmelin+mozilla)

I think there's a misunderstanding about the scam detection here.

First, scam is not the same as spam (though scam can be spam too). A scam is when you get an email trying to trick you into something, like following the link in the mail to update your bank details... If it's to be effective (for the scammer) it will look very similar to a real mail, but maybe the site name can be slightly different, but still similar enough for you to think it's legit. Therefore, you can't "train" such detection, or set whitelists - it would make the detection attempt completely pointless - you're not going to give details to the prince of Nigeria, but you're likely to give them to your mom (who you'd whitelist).

Second, the problem with wrong detections due to the use of a tracker link is solved by bug 1476428 (this would also cover the complaints about political bias, of which there of course was none).

Let's also not bring tracking (on the web) into the mix. We're not using any such database, and tracking is not a scam. Either way, tracking requires remote content, for which we have the other warning bar. And please try Thunderbird 68: bug 1476428 will kick in and give you a choice if you get such a link.

Flags: needinfo?(mkmelin+mozilla)

Isn't one of the benefits of using a computer that it is very good at spotting differences of a single char?
Actually a single char difference should be a big clue that it is spam or scam :)
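For what it's worth, a single-character check is cheap to express. The Python sketch below tests whether two host names are exactly one insertion, deletion, or substitution apart; it illustrates the idea only and is not a description of Thunderbird's code. The list of known hosts is assumed to come from somewhere else, e.g. the address book.

```python
def within_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion, or substitution."""
    if a == b:
        return False
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) string
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]  # one substituted character
    return a[i:] == b[i + 1:]  # one extra character in b


def lookalike_of(host, known_hosts):
    """Return the first known host that `host` is one edit away from, else None."""
    host = host.lower()
    for known in known_hosts:
        if within_one_edit(host, known.lower()):
            return known
    return None
```

An exact match returns None on purpose: only near misses are treated as a clue.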

Also, do not say it is impossible; we have already seen computers do things that were considered impossible just a few years ago.

It might be very difficult and result in an unacceptable number of false positives or negatives, but that's not the same as impossible.

As for how to determine a white list, could you use the full Received list in the headers? If the sender, all Received headers, and any server and other origin pattern headers are the same, you can be pretty sure the email was from the same sender.

Another option: could Thunderbird send an HTTP HEAD request to check whether it gets a redirect response matching the visible domain? That could also be a way to white list a link.

Detecting the one-char differences is already working.
Checking headers wouldn't do much good. You'd get many false positives.
As for checking HEAD - no need, since the code in 68 already handles it. Would also violate privacy.

Before commenting further, please try out 68, and if you have any specific improvement requests after that, file specific bugs.

How do we update to 68 when it's not an update yet?

I don't want to have huge problems with TB using a BETA version.

Thanks

68 is already officially out. Auto-upgrades from 60.x are coming soon. You should be able to do a manual update directly: just go to Help | About Thunderbird and it will download and install for you.

(In reply to Magnus Melin [:mkmelin] from comment #89)

> First, scam is not the same as spam (though scam can be spam too). A scam is when you get an email trying to trick you into something, like following the link in the mail to update your bank details... If it's to be effective (for the scammer) it will look very similar to a real mail, but maybe the site name can be slightly different, but still similar enough for you to think it's legit. Therefore, you can't "train" such detection, or set whitelists - it would make the detection attempt completely pointless - you're not going to give details to the prince of Nigeria, but you're likely to give them to your mom (who you'd whitelist).

Yes, scam is not the same as spam, and that’s why failures in scam detection are even more problematic. Disclaimer: I haven’t used Thunderbird since shortly after I submitted my original bug (so unfortunately I can no longer give more details) because Thunderbird’s false positive rate was way beyond my threshold of tolerance. When you get virtually no actual (or only easily spottable) scams but important emails keep getting flagged as “scam”, the user is going to feel Thunderbird is a piece of garbage.

Right - and that's why it makes sense to have something that learns. Getting a single "scam" detection in a message is no big deal; getting them repeatedly on something you know is fine is the problem. Being able to do something similar to "Allow remote content", so that Thunderbird then understands that this particular combination is not actually a scam, is exactly what is needed.

Once again, use 68. You'll find that is not a problem.

People on Linux LTS editions have only 60 or slightly newer.

It's pretty ridiculous that emails from Google calendar are marked as scam and warning is shown for google.com links (!)

(In reply to Daniel from comment #97)

> People on Linux LTS editions have only 60 or slightly newer.
>
> It's pretty ridiculous that emails from Google calendar are marked as scam and warning is shown for google.com links (!)

Which version of Thunderbird your distribution offers is an issue you might want to raise with the folks who manage the distribution; it has nothing to do with the Thunderbird release process. Or you could install the latest version offered on the Thunderbird.net web site.
