370141 - Scam detector body/content algorithm is overly simple

Reporter

Description

•

17 years ago

Currently, the scam detector is excessively simple. It flags a lot of legitimate mail as scam. This has three consequences:
- It's annoying during normal use
- It makes people listen less ("Crying wolf" problem)
- It prevents us from using more dramatic restrictions for the message
Also, quite a number of actual scams are not detected.

A lot of mails, including some of my own mails and those from marketing companies, get marked as scam. For example, every mail with an IP address is considered scam. That's clearly excessive.


Basically, the algorithm for detection needs a complete rewrite.

Maybe something more along the lines of comparing link text with link address, and if they don't match, see whether the URL text is a valid URL, and if so, consider it scam. To trigger even less often, the link could *additionally* be required to have an IP address as host.
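
The heuristic proposed above can be sketched roughly as follows. This is only an illustration in Python, not the detector's actual code: the function names are made up, "match" is assumed to mean "same host", and link text without an http(s) scheme is assumed not to count as a URL.

```python
import ipaddress
from urllib.parse import urlsplit


def _host(url):
    """Return the lowercased host of an http(s) URL, or None if it is not one."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return None
    if parts.scheme not in ("http", "https"):
        return None
    return parts.hostname


def host_is_ip(host):
    """Return True if `host` is a literal IPv4 or IPv6 address."""
    if not host:
        return False
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def is_suspicious_link(text, href, require_ip_host=False):
    """Flag a link whose visible text is a URL for a different host than href.

    With require_ip_host=True, the link is flagged only if the real target
    additionally uses an IP address as host (the stricter variant).
    """
    text_host = _host(text)
    if text_host is None:
        return False  # descriptive text such as "Click here" never triggers
    href_host = _host(href)
    if text_host == href_host:
        return False  # text and target agree
    if require_ip_host and not host_is_ip(href_host):
        return False
    return True
```

With these assumptions, a link showing "http://www.playdoo.com" but pointing to the tracker URL is flagged, while the same target with the text "Playdoo website" is not. In the stricter variant only mismatching links to an IP host are flagged.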


A real problem is marketing mails which try to track links using external sites, e.g. http://registerclicks.trackercompany.com/track?companyid=7856785675&recipientid=435447476?url=http://www.playdoo.com/
This is really bad behaviour, but I'm not sure whether we should care much. It is even worse if the link *text* then says "http://www.playdoo.com", but luckily that's done with only maybe 10% of these links, and I wouldn't mind yelling for that. The marketeers should really stop that - they know people don't like it anyway.

Ben Bucksch (:BenB)

Reporter

Updated

•

17 years ago

Depends on: 320739

Ben Bucksch (:BenB)

Reporter

Updated

•

17 years ago

Blocks: 370138

Tuukka Tolvanen (sp3000)

Updated

•

17 years ago

No longer blocks: 370138

No longer depends on: 320739

Tuukka Tolvanen (sp3000)

Updated

•

17 years ago

Blocks: 370138

Depends on: 320739

Ben Bucksch (:BenB)

Reporter

Updated

•

17 years ago

Summary: Scam detector is really stupid → Scam detector is overly simple

Ben Bucksch (:BenB)

Reporter

Updated

•

17 years ago

Summary: Scam detector is overly simple → Scam detector algorithm is overly simple

Ben Bucksch (:BenB)

Reporter

Updated

•

17 years ago

No longer blocks: 370138

Ben Bucksch (:BenB)

Reporter

Comment 1

•

17 years ago

> more dramatic restrictions for the message

E.g. bug 370138

Daniel Veditz [:dveditz]

Updated

•

17 years ago

Blocks: 392804

Wayne Mery (:wsmwk)

Comment 4

•

17 years ago

related bug 347218

No longer blocks: 392804

Depends on: 370138

Dan Mosedale (:dmosedale, :dmose)

Updated

•

16 years ago

Assignee: mscott → nobody

rsx11m

Comment 5

•

16 years ago

and bug 320351 (learning what is a scam, similar to junk filter)

CarolDC

Comment 6

•

15 years ago

I am finding that my political mail is being marked as a scam, even though I get tons of nonpolitical scam mail that is not! I've read that corporate/govt powers that be are encouraging this on email AND browsers to cut down on dissent and I hope you guys are not doing it!

Magnus Melin [:mkmelin]

Comment 7

•

15 years ago

Are you talking about scam or spam?
The scam detection is indeed pretty simple: it marks a mail as scam mainly if the mail has links that are shown to go to one site but actually lead to another site.

Ben Bucksch (:BenB)

Reporter

Comment 8

•

15 years ago

Carol, if your mails are detected as scam, then maybe (just a guess, but a common error) it is because you have links whose visible text is something like "http://propaganda.com", but which actually go to "http://tracking.net/track?id=224&url=http://propaganda.com". This triggers the scam detector, and IMHO rightfully so - this is not covered by this bug, see initial description.
The solution is simple: Fix your mails so that the links don't appear to go somewhere other than where they actually go. I.e. either remove the tracker (best) or change the visible link text (not target) to (in the example above) say "Propaganda website" instead of "http://propaganda.com".

CarolDC

Comment 9

•

15 years ago

Thanks! That makes sense. I will forward the info to anyone else I run into on various lists who makes the same complaint/theorizing!

Dennis

Comment 10

•

14 years ago

While this makes sense, and while admitting that it's overly simple, it is still a bother that I can have someone in my personal address book AND have authorized loading of images from that sender, but still get those same emails marked as "possible scam" every single day, on every receipt. I think if the sender is in my personal address book and I've authorized the loading of images from that sender, the scam processing should be skipped. Just my opinion.

Ben Bucksch (:BenB)

Reporter

Comment 11

•

14 years ago

Dennis, the From is not verified. The scammer may well send you mail "From: news@facebook.com" yet the body has a scammer link. You'd make the whole scam protection useless with that feature. Forget From, assume it's forged.

Ben Bucksch (:BenB)

Reporter

Comment 12

•

14 years ago

And, BTW, it's offtopic here in this bug. This bug is about the algorithm in the body.

Dennis

Comment 13

•

14 years ago

Maybe a local whitelist could be included in the algorithm and used? Or (shudder) some variant of "I've already clicked Not a Scam over 500 times" so it is good to go?

Ben Bucksch (:BenB)

Reporter

Comment 14

•

14 years ago

Dennis, I said this is offtopic here. Please file a new bug.

Ben Bucksch (:BenB)

Reporter

Updated

•

14 years ago

Summary: Scam detector algorithm is overly simple → Scam detector body/content algorithm is overly simple

rsx11m

Updated

•

13 years ago

Blocks: mail-scam

Jim Porter (:squib)

Comment 15

•

11 years ago

(In reply to Ben Bucksch (:BenB) from comment #0)
> Maybe something more along the lines of comparing link text with link
> address, and if they don't match, see whether the URL text is a valid URL,
> and if so, consider it scam.

At the time of this comment, the above is what the phishing detector does.

> A real problem is marketing mails which try to track links using external
> sites, e.g.
> http://registerclicks.trackercompany.com/track?companyid=7856785675&recipientid=435447476?url=http://www.playdoo.com/
> This is really bad behaviour, but I'm not sure whether we should care much.
> It is even worse if the link *text* then says "http://www.playdoo.com", but
> luckily that's done with only maybe 10% of these links, and I wouldn't mind
> yelling for that. The marketeers should really stop that - they know people
> don't like it anyway.

Unfortunately, links like that seem to be happening more and more with "legitimate" services (e.g. Facebook). We should probably consider some ways to avoid warning about those, or at least reduce the severity of the warning. If we could parse the tracking URL to see where it redirects and then check that against the link text, we could still keep things safe without annoying people quite as much.

Ben Bucksch (:BenB)

Reporter

Comment 16

•

11 years ago

> http://registerclicks.trackercompany.com/

We should warn about these, not as "phishing" but as "tracking".

Ben Bucksch (:BenB)

Reporter

Comment 17

•

11 years ago

> If we could parse the tracking URL to see where it redirects and then check that against the link text

If you mean: "intended target appears anywhere in the URL", then that's not a safe check, because it would be trivial for a phisher to make that happen. If anything, we need to query the URL and check the actual redirect target. But at that moment, we have already triggered the tracking and might just as well not warn at all. The only solution I can see is a hardcoded list of known tracking domains.
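
To illustrate both points, here is a small Python sketch. It is not the detector's code: the function names are invented, and the tracker list contains only the example domain from comment 0, since no real list is given in this bug.

```python
from urllib.parse import urlsplit

# Hypothetical hardcoded list; only the example domain from comment 0.
KNOWN_TRACKING_DOMAINS = {"registerclicks.trackercompany.com"}


def naive_target_in_url(text_url, href):
    """Unsafe check: accepts href if the displayed URL appears anywhere in it."""
    return text_url in href


def classify_link(text_url, href):
    """Return "ok", "tracking" or "phishing" without contacting any server."""
    text_host = urlsplit(text_url).hostname
    href_host = urlsplit(href).hostname
    if text_host == href_host:
        return "ok"
    if href_host in KNOWN_TRACKING_DOMAINS:
        return "tracking"  # known tracker: warn, but not as phishing
    return "phishing"


# A phisher can trivially put the displayed URL into the query string,
# so the naive substring check lets this link through.
phish = "http://evil.example/steal?url=http://www.playdoo.com"
print(naive_target_in_url("http://www.playdoo.com", phish))
print(classify_link("http://www.playdoo.com", phish))
```

The list-based check never fetches the URL, so it does not trigger the tracking itself. The price is that the list has to be maintained by hand.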

Ben Bucksch (:BenB)

Reporter

Comment 18

•

11 years ago

Tracking was explicitly out of scope here (see comment 0), so I filed bug 938902 about it.

Ben Bucksch (:BenB)

Reporter

Comment 19

•

11 years ago

> For example, every mail with an IP address is considered scam. That's clearly excessive.

I don't think this is fixed, is it?

Jim Porter (:squib)

Comment 20

•

11 years ago

(In reply to Ben Bucksch (:BenB) from comment #19)
> > For example, every mail with an IP address is considered scam. That's clearly excessive.
> 
> I don't think this is fixed, is it?

Anything with an IP address as the URL has to have matching link text. There's a bug where we trigger the phishing alert if the URL and the text match *except* for a trailing "/", but that's being fixed in bug 937265.
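
One way to avoid the trailing "/" false positive is to normalize both strings before comparing them. The Python sketch below only illustrates that idea under this assumption. It is not the patch from bug 937265, and the function names are hypothetical.

```python
def normalize_for_comparison(url):
    """Strip surrounding whitespace and a single trailing slash."""
    url = url.strip()
    return url[:-1] if url.endswith("/") else url


def text_matches_href(text, href):
    """Compare link text and target, ignoring a trailing "/" on either side."""
    return normalize_for_comparison(text) == normalize_for_comparison(href)
```

With this, the text "http://192.0.2.7" matches the target "http://192.0.2.7/", while a different address still counts as a mismatch.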

Ben Bucksch (:BenB)

Reporter

Updated

•

11 years ago

Depends on: 937265

Magnus Melin [:mkmelin]

Comment 21

•

5 years ago

I think it's reasonable for IPs to trigger the scam warning. But if you don't like it, there's the mail.phishing.detection.ipaddresses pref that you can use to turn it off.

With bug 1476428 done, I don't see anything concrete left to do in this bug.

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → WORKSFORME