Open Bug 187044 Opened 22 years ago Updated 2 years ago

[RFE] Add Challenge/Response system to counter spam

Categories

(MailNews Core :: Filters, enhancement)

enhancement

Tracking

(Not tracked)

People

(Reporter: dwheeler, Unassigned)

Details

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020830

I'd like to see another anti-spam technique also implemented in Mozilla:
a "challenge-response" system.  Here's how challenge-response works:
1. If the sender is on the whitelist, accept the email.
   (Spammers can forge their addresses, but they then have to figure out
   who to forge as... and anti-fraud measures make this dangerous).
2. If the subject line includes a "password" set by the receiver,
   accept the message.
3. Otherwise, reply back to the sender a message that's configurable by
   the receiver-to-be, saying that they need to include the password in
   the subject line & here's how to figure it out.  Spammers won't get the
   message, or won't read the responses.  Real users will include the password.
4. Include various measures to prevent email loops: detect null senders,
   vacation messages, and remember who you sent replies to (and after a few
   tries, start dropping them).
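
A minimal sketch of the four steps above, written in Python for illustration rather than as Mozilla code. The function name `triage`, the three return values, the `max_challenges` limit, and the use of the `Auto-Submitted` header to spot automatic replies are all assumptions; the proposal itself only describes the behaviour.

```python
from email.message import EmailMessage
from email.utils import parseaddr


def triage(msg, whitelist, password, challenges_sent, max_challenges=3):
    """Classify a message as 'accept', 'challenge', or 'hold' (no reply sent)."""
    sender = parseaddr(str(msg.get("From", "")))[1].lower()

    # Step 1: known correspondents pass straight through.
    if sender in whitelist:
        return "accept"

    # Step 2: the receiver's password appears in the subject line.
    if password.lower() in str(msg.get("Subject", "")).lower():
        return "accept"

    # Step 4: never answer a missing sender or an automatic reply.
    # (Assumption: automatic mail is marked with an Auto-Submitted header.)
    auto_submitted = str(msg.get("Auto-Submitted", "no")).strip().lower()
    if not sender or auto_submitted != "no":
        return "hold"

    # Step 4: stop replying after a few challenges to the same address.
    if challenges_sent.get(sender, 0) >= max_challenges:
        return "hold"

    # Step 3: otherwise the caller sends the receiver's challenge text.
    challenges_sent[sender] = challenges_sent.get(sender, 0) + 1
    return "challenge"


msg = EmailMessage()
msg["From"] = "Stranger <stranger@example.org>"
msg["Subject"] = "hello"
print(triage(msg, {"friend@example.org"}, "chess", {}))  # prints "challenge"
```

What happens to a message classified as "hold" (junk folder, quarantine, deletion) is left to the caller; the description above does not settle it.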

Information about this approach is at:
 http://www.uwasa.fi/~ts/info/spamfoil.html

Another implementation (more recent) is at:
 http://sourceforge.net/projects/whitelight



Reproducible: Always

Steps to Reproduce:
By the way - if you SEND or REPLY to an address, or even save that
email as a non-junk email, you should add the sender to the whitelist
for purposes of this approach. That way, if you talk to someone, they
no longer need to include the password.
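
The auto-whitelisting on send or reply could look roughly like this. It is only a sketch: the function name `whitelist_recipients` and the choice to read the To and Cc headers are assumptions, and a real mail client would hook this into its own send path.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def whitelist_recipients(outgoing, whitelist):
    """Add every To/Cc address of a message the user sends to the whitelist."""
    fields = outgoing.get_all("To", []) + outgoing.get_all("Cc", [])
    for _name, addr in getaddresses([str(f) for f in fields]):
        if addr:
            whitelist.add(addr.lower())
    return whitelist


reply = EmailMessage()
reply["To"] = "Bob <Bob@Example.org>, carol@example.org"
reply["Cc"] = "dave@example.org"
known = whitelist_recipients(reply, set())
```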

This sounds very similar to TMDA.  Is it bug 156744?
No, this isn't the same as TMDA (bug 156744), though there are some similarities.

TMDA also sends a challenge back, but it encodes a password in the
email address to reply to.  That makes it simpler to reply to initially,
but it also has a fundamental weakness: it presumes that spammers
never have a valid return address.  A spammer with a valid return address
can automatically slip the email through TMDA-like systems.
ASK has the same problem.

In contrast, if you ask for a password, and give instructions that require
human processing, you foil automatic bulk systems.
The challenge could be "The password is 'c' followed by 'hess'".
Handling that in general requires human processing, and that raises the
costs sufficiently that spammers are unlikely to do so.
Also, TMDA as they've currently implemented it requires significant
cooperation from the mail server.  That's because TMDA encodes codes in
the "To" address itself.  Many users don't have that kind of control over
their mail server.  It also causes complications when two TMDA
users talk to each other.
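
As a small illustration of the "'c' followed by 'hess'" style of challenge, here is a sketch that turns a password into instructions a person can follow but a simple pattern match on the literal password cannot. The function name `make_challenge` and the `split_at` parameter are hypothetical.

```python
def make_challenge(password, split_at=1):
    """Describe the password in two pieces so it never appears verbatim."""
    if not 0 < split_at < len(password):
        raise ValueError("split_at must fall strictly inside the password")
    head, tail = password[:split_at], password[split_at:]
    return f"The password is '{head}' followed by '{tail}'"


print(make_challenge("chess"))  # The password is 'c' followed by 'hess'
```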

However, requiring senders to include a password in the subject line is
really easy to do - everybody can do that, without fancy reconfigurations.
And while it requires senders to actually figure out the password, the
benefit is that even spammers who have valid return addresses can't
create automated tools to bash through password-protected systems.
If sending or replying automatically adds them to the whitelist - or
even sending the password adds them to the whitelist - then
this becomes very easy to do.

Confirmed enhancement.
Hw/OS -> All/All.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Summary: Add Challenge/Response system to counter spam → [RFE] Add Challenge/Response system to counter spam
Which parts of this are not possible with Mozilla now?

OK, part 1 and 2: Use normal message filters, using the available "sender isn't
in address book X" option, and "Subject doesn't contain password Y" to move
email to the "probably junk" folder.

Part 3: Cannot auto-reply to email using Mozilla right now.
Part 4: No email loop problem, since there are no auto-reply functions in Mozilla.

So, this bug is probably best simplified to a request that auto-reply functions
be added to Mozilla.

For an example of how NOT to do auto-reply, see valimail.com.  Personally, I
think the challenge-response method will not help the junk email problem, since
at least some spammers use the "I got a response" method to validate email
addresses, and this won't stop spam from taking network resources.  If anything,
it might increase the overall bandwidth needed to handle spam (plus challenge,
plus response, plus argument over whether or not a challenge-response system is
needed)

I'm against adding auto-reply or challenge-response to Mozilla.
>Part 3: Cannot auto-reply to email using Mozilla right now.
>Part 4: No email loop problem, since there are no auto-reply functions in
>Mozilla.
>So, this bug is probably best simplified to a request that auto-reply functions
>be added to Mozilla.

No, not quite.  Imagine Spammer Spud, who forges an email to Bob
claiming to be from Alice.  Alice is on vacation, and sends out an
"I'm on vacation" message to anyone who sends her a message:
* Bob replies to Alice with a password message
* Alice replies with the vacation message
* Bob replies to Alice with a password message
* Alice replies with the vacation message
and so on.

This is easily dealt with using some simple email loop checks.
See Timo's notes for more.  Many loops can be eliminated
simply by setting the SMTP FROM envelope value to <>,
noted below.
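
A sketch of what "simple email loop checks" might look like. The header names used here (`Auto-Submitted`, `Precedence`, `Return-Path`) are common conventions for marking automatic mail, but treating them as the loop test is an assumption, as are the function names and the challenge subject text. Not every vacation responder sets these headers, so a per-sender reply counter would still be needed as a backstop.

```python
from email.message import EmailMessage


def is_automatic(msg):
    """Heuristic: True if msg looks like a bounce, vacation note, or list mail."""
    if str(msg.get("Auto-Submitted", "no")).strip().lower() != "no":
        return True
    if str(msg.get("Precedence", "")).strip().lower() in {"bulk", "junk", "list"}:
        return True
    # A null return path is how bounces and other error messages arrive.
    return str(msg.get("Return-Path", "")).strip() == "<>"


def build_challenge(to_addr, from_addr, instructions):
    """Build a challenge that other auto-responders can recognise and ignore."""
    reply = EmailMessage()
    reply["From"] = from_addr
    reply["To"] = to_addr
    reply["Subject"] = "Please resend with the password in the subject"
    reply["Auto-Submitted"] = "auto-replied"  # marks this as machine-generated
    reply.set_content(instructions)
    return reply


challenge = build_challenge("alice@example.org", "bob@example.org",
                            "The password is 'c' followed by 'hess'")
```

In the Spud/Alice/Bob example, Alice's vacation responder would skip Bob's challenge if it honours `Auto-Submitted`, and Bob's side would skip Alice's vacation note if `is_automatic` returns True for it. Sending the challenge with a null envelope sender is a separate step at SMTP time and is not shown here.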

>Personally, I think the challenge-response method will not help the
>junk email problem, since at least some spammers use the
>"I got a response" method to validate email addresses, and this
>won't stop spam from taking network resources.  If anything,
>it might increase the overall bandwidth needed to handle spam
>(plus challenge, plus response, plus argument over whether or not
>a challenge-response system is needed)

I don't agree, for several reasons.
First of all, the address spammers use for "I got a response" validation
is rarely the "From" address.  Today's spammers usually throw away any
replies to the From address, because so many are error messages.
This is PARTICULARLY true if the "here's how to make the password"
message has an SMTP envelope with the FROM value <> (null), which looks just
like other error messages and eliminates many loops.

Today's "sucker" lists are often created from the
"please unsubscribe me" email address, which is part of the email
message and NOT the From address.

Now, spammers could modify their code to also detect challenges and
add those users to their list.  But there's no incentive for
spammers to do so.  Since spammers have to READ the messages
individually to get the password, it won't be worth it to them.

Besides, if a spammer already has your email address, they'll
sell it to others, whether or not they "confirm".  You're going to
get MORE spam, and clog more network bandwidth, whether you
challenge or not.

Besides, think long-term.  If EVERYONE had good spam protection,
it would eventually be not worth it to spam.  Then the network
bandwidth wastage would be nearly zero.  And THAT would be worth
it for everyone.  Challenge-response would help.

I've written a paper to discuss the idea in more detail,
analyzing alternatives, etc.  Please take a look at:

http://www.dwheeler.com/guarded-email
Challenge response is not as useful for business addresses, since every new
customer would have to jump through a second hoop to send email to the business.
 The first hoop is figuring out the email address.  If everyone had "good spam
protection", then spammers would simply figure out a way around the most popular
spam protection, and use that.  People sometimes forget that there are some
smart spammers out there, and all it takes is one to write a good "get around
spam protection X" system and sell it to the rest.  The other problem is that
this system increases the network load (at least temporarily) while 10000 users
(or however many turn on the c-r system) send back a challenge to a non-existent
email address.  On most open relays, SMTP-FROM doesn't have to be a real
address, or may be set to the receiver's email address to get around other
filters...  If the sending address is a real address, the spammer could easily
(and perhaps stupidly) use the responses as verification of reading, and just
keep right on sending junk email to that address.

I think the most common spammer trick is to send junk email to every short word
at each well-known ISP, and claim that they just sent "X thousand emails to
verified addresses".  I have one address that has not been used for anything
other than receiving spam.  After I set it up, it remained spam free for about a
month.  It's up to one spam a day.  The user name is only three letters long, at
a well-known ISP.  I have another that gets about 33 a day, which is a
dictionary word 8 letters long, at a not particularly well known ISP.  The 8
letter address can be found on google on two web sites (neither of them my own),
and has been in use for 5 years.

If, as you suggest, the challenge comes with SMTP-FROM: <>, most mail servers
these days won't deliver it.  If they do, they could easily be classed as open
relays by some of the relay testers.

The way email works ("accept email to local users" for the majority of email
servers today) makes it difficult to block all spam, and still let through those
good messages from new addresses (such as people who've read something you
wrote).  Challenge-response makes it less likely that you'll get the good messages.
I envision challenge-response as an option, off by default.
If you don't think it will help you, by all means don't enable it.
If you think it WILL help you, then enable it (by setting the password).

A few counters to your comments:
* "If everyone had "good spam protection", then spammers would
  simply figure out a way around the most popular
  spam protection, and use that."
  I agree that this is true for filters - which is why I think
  filters in the long run won't work.
  But I haven't seen any evidence that challenge-response systems
  have this problem.  If someone broadcasts your password, you can
  change it faster than spammers can afford to keep re-finding it.

* "The other problem is that this system increases the network
  load (at least temporarily) while 10000 users
  (or however many turn on the c-r system) send back a challenge
  to a non-existent email address..."
  Spammers are already using a torrent of network load.  If most people
  used a challenge-response system, there's hope it would dry up.
  But while network load is important, it's READING TIME that's more
  important.  Besides, if a spammer gets that many challenges, it'll
  hurt the spammer's source more than anyone else - aiding in finding them.

* "If the sending address is a real address, the spammer could easily
  (and perhaps stupidly) use the responses as verification of reading, and just
  keep right on sending junk email to that address."
  A simple alternative would be to challenge email sent to a non-address,
  I suppose.

* "If, as you suggest, the challenge comes with SMTP-FROM: <>, most mail servers
these days won't deliver it.  If they do, they could easily be classed as open
relays by some of the relay testers."

  Hmm, that's a problem.  Is that really true?
  The simple solution would be to change the
  SMTP-FROM address to be the guarded user's email address.
  Thanks for the note.

* "The way email works ("accept email to local users" for the majority of email
servers today) makes it difficult to block all spam, and still let through those
good messages from new addresses (such as people who've read something you
wrote)."

  Yes.  We agree on the problem.

* "Challenge-response makes it less likely that you'll get the good messages."

  Increasingly, I can't get the good messages BECAUSE of spam.
  I've already had two separate email outages, where DAYS of email were
  lost, because of spam.  And I have to classify on-the-fly what's left,
  and sometimes I'll make a mistake & delete legitimate email.
  I have to run filters, and sometimes those
  filters will misclassify my email & I'll never see it.
  Remember, filters ALSO make it less likely that I'll get the good messages.

  There _IS_ a danger that a user won't bother to respond to a challenge.
  Please, let _ME_ (the receiver!) make that determination.



I'm not sure of the truth about blank MAIL-FROM: in the SMTP protocol.  I
believe it is left to the implementation to decide what to do.  I may be wrong
about the "most servers discard", it might be only "many servers discard".  I
know there are some out there that don't seem to care much about what they get,
as long as it has a "To:" address, they'll send it.

I've seen c-r systems fail more often than email servers.  Many of them leave
you guessing as to whether the receiver will ever see the email, or if your
response was correct enough to pass the challenge.  'c' plus 'hess' appears to
be a reasonable challenge.  Anything requiring a user to come up with a good
challenge may be doomed.  I've been a sysadmin, and I know that there are many
people who just want their email to work, and not require such difficult
thought.  I suppose that if you wanted to avoid receiving email from
mathematically challenged people, you could say "Respond with an 11 digit prime
number to send email to this address".  I'm not actually in favor of that sort
of thing, since I've got some friends who have difficulty figuring out what the
Subject: line does, and that they could type something into that field.  In
other matters than computers, they do very well.  Don't forget the
computer-challenged in the race to stop spam.  (A sysadmin saying "be nice"?  I
must be tired.)

The idea of a client-based c-r is a little better than the web-based ones I've
seen.  The client idea requires fewer third-party systems, which is a good
thing.  Some people, myself included, will never trust a third-party challenge.

The other difficulty is, where do you store the challenged email until the
response comes back, and how long do you wait in case someone missed the
challenge (or it got lost in the net).  Two weeks of spam, for me recently, is
about 1 MB.  Any of those could, in theory, be a human trying to contact me.
In practice, it's mostly wasted space, or "historical documents".  If you go by
the dates on the spam, it could be 1.3 MB per year, but that's only because
spammers can't set clocks either.

I don't like the idea of having to send a message twice (the original email, and
the response to challenge), and still having no idea whether my response to the
challenge got back before the other client deleted the message I sent.  Perhaps
you need to have a check for whether the message still exists on Bob's machine,
and send a "please resend your message" notice to Alice in response to late
responses.  Then a spammer could have more time to work out the proper response,
AND have something to point to that he could claim was a request for more email.
 Shoot.  That doesn't work, does it.  Perhaps requiring another c-r cycle for
late responses would handle that, but if the response was late in the first
place, it'll probably be late again unless you increase the amount of challenged
email you keep temporarily.
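
To make the storage question concrete, here is a rough sketch of a holding area for challenged mail with a time limit. Everything in it is an assumption for illustration: the class name `PendingStore`, the two-week default, and keeping messages in memory keyed by sender address. A real client would presumably use a mail folder on disk.

```python
import time


class PendingStore:
    """Holds challenged mail per sender until a response arrives or it expires."""

    def __init__(self, ttl_seconds=14 * 24 * 3600):
        self.ttl = ttl_seconds
        self._held = {}  # sender address -> list of (arrival_time, message)

    def hold(self, sender, message, now=None):
        now = time.time() if now is None else now
        self._held.setdefault(sender, []).append((now, message))

    def release(self, sender, now=None):
        """Return the sender's unexpired messages and forget the sender."""
        now = time.time() if now is None else now
        entries = self._held.pop(sender, [])
        return [m for t, m in entries if now - t <= self.ttl]

    def purge(self, now=None):
        """Delete expired messages; return how many were deleted."""
        now = time.time() if now is None else now
        removed = 0
        for sender in list(self._held):
            fresh = [(t, m) for t, m in self._held[sender] if now - t <= self.ttl]
            removed += len(self._held[sender]) - len(fresh)
            if fresh:
                self._held[sender] = fresh
            else:
                del self._held[sender]
        return removed
```

An empty list from `release` is the late-response case described above: the password arrived, but the original message is already gone, so the sender would have to be asked to resend.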

By the way, if you wanted to cut down on the spam that you see, you could try
popfile (popfile.sf.net).  After a few months of occasionally retraining it on
errors, I have a 97% classification accuracy with about 17 classes.  It can
usually distinguish between useful messages from friends, and chain letters from
the same friends.  Very few of the classification errors are "ham mismarked as
spam".  Since it's basically a trainable probability based filter, the only way
a spammer can get through it is by not sending email that looks like spam, or
occasionally, by changing to a different product from anything seen in a long time.

Too long, I know, but I ran out of time for editing.
mass re-assign.
Assignee: naving → sspitzer
Product: MailNews → Core
*** Bug 194792 has been marked as a duplicate of this bug. ***
sorry for the spam.  making bugzilla reflect reality as I'm not working on these bugs.  filter on FOOBARCHEESE to remove these in bulk.
Assignee: sspitzer → nobody
Filter on "Nobody_NScomTLD_20080620"
QA Contact: laurel → filters
Product: Core → MailNews Core
Severity: normal → S3