Closed Bug 370141 Opened 17 years ago Closed 5 years ago

Scam detector body/content algorithm is overly simple

Categories

Product/Component: Thunderbird :: Mail Window Front End
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: BenB, Unassigned)

References

(Blocks 1 open bug)

Details

Currently, the scam detector is excessively simple. It flags a whole lot of legitimate mail as scam. This has 3 implications:
- It's annoying during normal use
- It makes people pay less attention to the warning (the "crying wolf" problem)
- It prevents us from using more dramatic restrictions for the message
Also, there are quite a number of scams which are not detected.

A lot of mails, including some of my own mails and those from marketing companies, get marked as scam. For example, every mail with an IP address is considered scam. That's clearly excessive.


Basically, the algorithm for detection needs a complete rewrite.

Maybe something more along the lines of comparing link text with link address, and if they don't match, see whether the URL text is a valid URL, and if so, consider it a scam. To trigger even less, the link could *additionally* be required to have an IP address as host.
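A minimal Python sketch of the heuristic proposed above, assuming each link arrives as a (visible text, href) pair. It compares hosts rather than full strings (my assumption, so that a trailing "/" alone does not trigger), treats only absolute http(s) text as a "valid URL", and the function names are hypothetical, not Thunderbird's code.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit


def url_host(text: str) -> Optional[str]:
    """Return the host of an absolute http(s) URL, or None if text is not one."""
    try:
        parts = urlsplit(text.strip())
    except ValueError:  # e.g. a malformed IPv6 literal
        return None
    if parts.scheme.lower() not in ("http", "https"):
        return None
    return parts.hostname  # lowercased by urlsplit; None if there is no host


def is_ip_host(host: str) -> bool:
    """True if host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False
    return True


def looks_like_scam(link_text: str, href: str, require_ip_host: bool = False) -> bool:
    """Hypothetical sketch of the proposed link-text vs. link-address check."""
    text_host = url_host(link_text)
    if text_host is None:
        # Visible text such as "Click here" is not a URL, so it cannot mislead.
        return False
    real_host = url_host(href)
    if real_host == text_host:
        # Text and target point at the same host: no mismatch.
        return False
    if require_ip_host:
        # Stricter variant: only flag mismatches whose real target is an IP.
        return real_host is not None and is_ip_host(real_host)
    return True
```

With this sketch, a tracking link to registerclicks.trackercompany.com is flagged when its visible text is "http://www.playdoo.com", but not when the text reads "Playdoo website".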


A real problem is marketing mail that tries to track links using external sites, e.g. http://registerclicks.trackercompany.com/track?companyid=7856785675&recipientid=435447476?url=http://www.playdoo.com/
This is really bad behaviour, but I'm not sure whether we should care much. It is even worse if the link *text* then says "http://www.playdoo.com", but luckily only maybe 10% of these links do that, and I wouldn't mind yelling for those. The marketeers should really stop doing that; they know people don't like it anyway.
Depends on: 320739
Blocks: 370138
No longer blocks: 370138
No longer depends on: 320739
Blocks: 370138
Depends on: 320739
Summary: Scam detector is really stupid → Scam detector is overly simple
Summary: Scam detector is overly simple → Scam detector algorithm is overly simple
No longer blocks: 370138
> more dramatic restrictions for the message

E.g. bug 370138
Blocks: 392804
related bug 347218
No longer blocks: 392804
Depends on: 370138
Assignee: mscott → nobody
and bug 320351 (learning what is a scam, similar to the junk filter)
I am finding that my political mail is being marked as a scam, even though tons of nonpolitical scam mail that I get is not! I've read that the corporate/government powers that be are encouraging this in email AND browsers to cut down on dissent, and I hope you guys are not doing it!
Are you talking about scam or spam?
The scam detection is indeed pretty simple: it marks a mail as a scam mainly if the mail has links that appear to go to one site but actually lead to another.
Carol, if your mails are detected as scam, then maybe (just a guess, but a common error) it's because you have links whose visible text is like "http://propaganda.com", but which actually go to "http://tracking.net/track?id=224&url=http://propaganda.com". This triggers the scam detector, and IMHO rightfully so; this case is not covered by this bug, see the initial description.
The solution is simple: fix your mails so that the links don't appear to go somewhere other than where they actually go, i.e. either remove the tracker (best) or change the visible link text (not the target) to say, in the example above, "Propaganda website" instead of "http://propaganda.com".
Thanks! That makes sense. I will forward the info to anyone else I run into on various lists who makes the same complaint or theory!
While this makes sense, and while I admit that it's overly simple, it is still a bother that I can have someone in my personal address book AND have authorized loading of images from that sender, but still get those same emails marked as "possible scam" every single day on receipt. I think that if the sender is in my personal address book and I've authorized loading of images from that sender, the scam processing should be skipped. Just my opinion.
Dennis, the From is not verified. The scammer may well send you mail "From: news@facebook.com" yet the body has a scammer link. You'd make the whole scam protection useless with that feature. Forget From, assume it's forged.
And, BTW, it's off-topic in this bug. This bug is about the algorithm applied to the body.
Maybe a local whitelist could be included in the algorithm and used? Or (shudder) some variant of "I've already clicked Not a Scam over 500 times" so it is good to go?
Dennis, I said this is off-topic here. Please file a new bug.
Summary: Scam detector algorithm is overly simple → Scam detector body/content algorithm is overly simple
Blocks: mail-scam
(In reply to Ben Bucksch (:BenB) from comment #0)
> Maybe something more along the lines of comparing link text with link
> address, and if they don't match, see whether the URL text is a valid URL,
> and if so, consider it a scam.

At the time of this comment, the above is what the phishing detector does.

> A real problem is marketing mail that tries to track links using external
> sites, e.g.
> http://registerclicks.trackercompany.com/track?companyid=7856785675&recipientid=435447476?url=http://www.playdoo.com/
> This is really bad behaviour, but I'm not sure whether we should care much.
> It is even worse if the link *text* then says "http://www.playdoo.com", but
> luckily only maybe 10% of these links do that, and I wouldn't mind yelling
> for those. The marketeers should really stop doing that; they know people
> don't like it anyway.

Unfortunately, links like that seem to be happening more and more with "legitimate" services (e.g. Facebook). We should probably consider some ways to avoid warning about those, or at least reduce the severity of the warning. If we could parse the tracking URL to see where it redirects and then check that against the link text, we could still keep things safe without annoying people quite as much.
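One way to read "parse the tracking URL" is the Python sketch below: it pulls the final target out of the tracker's query string and compares its host with the link text. The parameter names in REDIRECT_PARAMS are assumptions (real trackers vary) and the function names are hypothetical. Note that the embedded target is written by whoever sent the mail, which is exactly the weakness raised in the next reply.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# Assumed names of query parameters that carry the final redirect target.
REDIRECT_PARAMS = {"url", "u", "target", "redirect"}


def embedded_target(href: str) -> Optional[str]:
    """Return the http(s) URL embedded in a tracking link's query, if any."""
    query = urlsplit(href.strip()).query
    # Some trackers (like the example in comment 0) separate parameters with a
    # stray "?", so treat it like "&". This breaks targets that themselves
    # contain an unencoded "?"; acceptable for a sketch.
    for key, value in parse_qsl(query.replace("?", "&")):
        if key.lower() in REDIRECT_PARAMS and value.startswith(("http://", "https://")):
            return value
    return None


def text_matches_embedded_target(link_text: str, href: str) -> bool:
    """True if the visible URL text has the same host as the embedded target."""
    target = embedded_target(href)
    if target is None:
        return False
    text_host = urlsplit(link_text.strip()).hostname
    return text_host is not None and text_host == urlsplit(target).hostname
```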
> http://registerclicks.trackercompany.com/

We should warn about these, but as "tracking", not as "phishing".
> If we could parse the tracking URL to see where it redirects and then check that against the link text

If you mean: "intended target appears anywhere in the URL", then that's not a safe check, because it would be trivial for a phisher to make that happen. If anything, we need to query the URL and check the actual redirect target. But at that moment, we have already triggered the tracking and might just as well not warn at all. The only solution I can see is a hardcoded list of known tracking domains.
Tracking was explicitly out of scope here (see comment 0), so I filed bug 938902 about it.
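A Python sketch of why the "target appears anywhere in the URL" check is unsafe, and of the hardcoded-list alternative, also separating "tracking" from "phishing" as suggested above. The KNOWN_TRACKERS entries are placeholders taken from example URLs in this bug, not a real list, and all function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Placeholder list of known link-tracking domains; a real one would need upkeep.
KNOWN_TRACKERS = {"trackercompany.com", "tracking.net"}


def naive_target_check(link_text: str, href: str) -> bool:
    """UNSAFE: trusts the link if the shown URL appears anywhere in the href."""
    return link_text.strip() in href


def _host(url: str) -> Optional[str]:
    return urlsplit(url.strip()).hostname


def _is_known_tracker(host: Optional[str]) -> bool:
    """Match a listed domain or any of its subdomains."""
    return host is not None and any(
        host == domain or host.endswith("." + domain) for domain in KNOWN_TRACKERS
    )


def classify_link(link_text: str, href: str) -> str:
    """Return "ok", "tracking" or "phishing" for one link."""
    text_host = _host(link_text)
    real_host = _host(href)
    if text_host is None or text_host == real_host:
        return "ok"  # text is not a URL, or it matches the real host
    if _is_known_tracker(real_host):
        return "tracking"  # mismatch, but via a listed tracker
    return "phishing"
```

For a phisher's href such as the hypothetical "http://evil.example/login?u=http://propaganda.com" with text "http://propaganda.com", naive_target_check returns True even though the link goes to evil.example, while classify_link returns "phishing".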
> For example, every mail with an IP address is considered scam. That's clearly excessive.

I don't think this is fixed, is it?
(In reply to Ben Bucksch (:BenB) from comment #19)
> > For example, every mail with an IP address is considered scam. That's clearly excessive.
> 
> I don't think this is fixed, is it?

Anything with an IP address as the URL has to have matching link text. There's a bug where we trigger the phishing alert if the URL and the text match *except* for a trailing "/", but that's being fixed in bug 937265.
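A Python sketch of the IP rule as described here: a link whose real host is an IP address is acceptable only if its visible text shows the same URL, ignoring a single trailing "/" (the mismatch that bug 937265 addresses). This is an illustration, not the actual Thunderbird code; the names are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def _strip_one_slash(url: str) -> str:
    """Drop one trailing '/' so "http://1.2.3.4" equals "http://1.2.3.4/"."""
    url = url.strip()
    return url[:-1] if url.endswith("/") else url


def ip_link_is_suspicious(link_text: str, href: str) -> bool:
    """True if href points at an IP address that the link text does not show."""
    host = urlsplit(href.strip()).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP-address link; other checks apply
    return _strip_one_slash(link_text) != _strip_one_slash(href)
```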
Depends on: 937265

I think it's reasonable for IPs to trigger the scam warning. But if you don't like it, there's the mail.phishing.detection.ipaddresses preference that you can use to turn it off.

With bug 1476428 done, I don't see anything concrete to do left in this bug.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME