Closed Bug 229686 Opened 21 years ago Closed 15 years ago

Request : Support for HashCash type of SPAM protection

Categories

(MailNews Core :: Composition, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ivarBZ, Unassigned)

Details

(Keywords: helpwanted)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6b) Gecko/20031218
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6b) Gecko/20031218

Microsoft has again found something new to sell (
http://news.bbc.co.uk/2/hi/technology/3324883.stm ). As usual, this type of
feature has been around for quite some time ( http://hashcash.org ). The main
difference seems to be the algorithm used. Is there enough interest in this kind
of functionality to warrant implementation?

Reproducible: Always

Steps to Reproduce:



Expected Results:  
Put a hashcash tag into mail sent by Mozilla Mail
Any volunteers to write an Authentication Plug-in ? See bug 180049.
I would also very much appreciate this. The idea of "paying" for 
E-Mail with CPU time is just super-elegant.

The BIG advantages would be:

* Absolutely no more false positives in junk mail detection for mails
  sent by another mozilla user.
* A big promotion for the open source solution. Perhaps other mail 
  clients would also include hashcash support.
* Another triumph over the big company announcing something as new, 
  while the OS community has had a solution since 1997. :)
Now that SpamAssassin has added support for Hashcash hash tokens, it would be
really good if we could get some MUA support for the technology.

Discussion thread on SpamAssassin bugzilla
http://bugzilla.spamassassin.org/show_bug.cgi?id=796

Hashcash FAQ
http://www.hashcash.org/faq/

I would really like to see this feature in Mozilla soon!
This is also a KDE KMail wishlist entry:
http://bugs.kde.org/show_bug.cgi?id=78816
The other obvious place for hashcash is on the incoming side -- to dig the
sender out of false positives in mozilla's built-in spam filter.

I've been using mozilla mail's spam filter for a while now, and I've noticed
that, at least for me, it now and then marks as spam stuff which isn't spam.
Hashcash could help with this if mail with hashcash were exempted.
Personally, I would prefer seeing a hybrid sender-pays system like the one
implemented in camram.  Additionally, stamp generation should take place in the
background so that it won't impact the user experience.

free registration required.
http://www.technologyreview.com/articles/wo_johansson032604.asp
Re. Eric's comment about generating stamps in the background: Hubert Chan
suggested starting to create the tokens as soon as the user addresses the
message, or clicks reply-to.  With an integrated implementation this would be
possible.

Another approach suggested was to generate tokens with low priority on receipt
of mail in preparation for a potential reply.
(In reply to comment #7)
> re. Eric's comment about generating stamps in the background Hubert Chan
> suggested  start creating the tokens as soon as the user addresses the message,
> or clicks reply-to.  With an integrated implementation this would be possible.
> 

Adam, just a slightly contrary perspective, :-)  

a fair number of my messages are not addressed until I actually send them. 
Keeps me from making embarrassing mistakes.

> Another approach suggested was to generate tokens with low priority on receipt
> of mail in preparation for a potential reply.
 
I get something like 200-500 messages per day.  pre generate all of them?  if I
have some spare time over the next few days, I will run a test and see if it
makes sense.

---eric
(In reply to comment #1)
> Any volunteers to write an Authentication Plug-in ? See bug 180049.

Not sure how an authentication plug-in would fit in this context, but if
someone can help me climb the learning curve I will integrate camram into
Mozilla as a proof of concept.

Starting point: camram is written in Python.  it is best treated as a
self-contained unit as far as filtering and stamping is concerned.  From the
input filter perspective, all I need is a message in something I can convert to
a string.  for outbound stamp generation, either give me a message in the same
form (i.e. effectively a string) plus all of the message recipients as would be
seen as part of the SMTP protocol.  Or, I can use the camram background stamping
process and deliver messages that way.

There is a question of what to do with the existing content filter.  Camram is
currently using the CRM114 package.  It's not hard to substitute, as long as the
filter can be called on demand by camram for scoring and training.

Last is the issue of how to deal with a spamtrap.  Camram does have a different
spamtrap model but it's not horribly difficult or different.  For the purposes
of this proof of concept, we can use the existing Web based interface.
> a fair number of my messages are not addressed until I actually 
> send them. Keeps me from making embarrassing mistakes.

If you hit reply-to, and most mails are replies rather than 1st posts, your mail
will be addressed before you type any of the message body.

There are counterexamples, but we're only talking about speeding up the general
case where possible; where it's not possible, it's not possible.

> > Another approach suggested was to generate tokens with
> > low priority on receipt of mail in preparation for a potential reply.
> 
> I get something like 200-500 messages per day.  pre generate 
> all of them?  if I have some spare time over the next few days, 
> I will run a test and see if it makes sense.

As Gerhard Blab said in more detail, what you would want to do to make this
efficient is to be predictive.  Look at how many messages you have sent to that
recipient in the past, and use that as a weighting for how much effort to put
into generating stamps optimistically for that recipient.
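
The predictive weighting described here could be sketched roughly as below. This is only an illustration in Python: the function name `plan_precompute`, the `sent_counts` bookkeeping, and the `budget` parameter are all invented for the example and are not part of hashcash or camram.

```python
def plan_precompute(sent_counts, budget):
    """Split a budget of precomputed stamps across likely recipients.

    sent_counts: mapping of recipient address -> number of mails sent to
                 that address in the past (hypothetical bookkeeping).
    budget:      total number of stamps we are willing to mint ahead of time.
    Returns a dict of recipient -> number of stamps to precompute.
    """
    total = sum(sent_counts.values())
    if total == 0 or budget <= 0:
        return {}
    plan = {}
    # Walk recipients from most to least frequent; each gets a share
    # proportional to past traffic, rounded down.
    for addr, count in sorted(sent_counts.items(), key=lambda kv: -kv[1]):
        share = (budget * count) // total
        if share > 0:
            plan[addr] = share
    return plan
```

Rounding down means rarely-mailed recipients get nothing, which matches the intent of not spending effort on unlikely replies.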

(these "Gerhard and Hubert said" references are from the hashcash list:
http://www.hashcash.org/list/ where this discussion started)
(in reply to comment #9)
IMHO it is unnecessary to include another whole hybrid spam detection system
like camram into mozilla. 

If I understand correctly, camram not only produces the hashcash stamps on
sending and verifies them on receiving mail, but also does some kind of text
filtering. Mozilla now already includes a bayes-filter-based junk classification. 

Perhaps it would be better if only the stamp creation and verification
functionality was added (and by this means false positives in junk mail
detection were omitted) rather than including a second content classification
system.
(In reply to comment #11)
> (in reply to comment #9)
> IMHO it is unnecessary to include another whole hybrid spam detection system
> like camram into mozilla. 

In the long run, yes.  I was only offering to work with the camram code as a
proof of concept.  After all, it is some 6000-odd lines of Python, not
including CRM114, emailrelay, and hashcash.  I just thought that reusing
existing code and pulling all of the anti-spam functionality out into a totally
separate process would have some benefits, especially since I figure it
might take some time to translate all of the camram functionality into whatever
language mozilla uses.

As a side note, when I added up the numbers, I was astounded.  I'm disabled, I
use speech recognition and the only language I can write any significant amount
of code in is Python.  I was amazed to see how my efforts have added up.

This is also fair warning that weird text happens sometimes and just chalk it up
to a recognition failure that I did not catch.  Ask if it seems important.

> If I understand correctly, the camram not only produces the hashcrash-stamps on
> sending and verifies them on receiving mail, but also does some kind of text
> filtering. Mozilla now already includes a bayes-filter-based junk classification. 
> 
> Perhaps it would be better if only the stamp creation and verification
> functionality was added (and by this means false positives in junk mail
> detection were omitted) rather than including a second content classification
> system.

that's a very bare summary but it's essentially correct.  To elaborate, on
inbound messages camram uses a chain of predicates to evaluate e-mail.  If a
predicate should yield true, then camram passes it on for delivery.  If a
predicate yields false, then it is passed on to the next filter.  The last
filter is CRM114.  Effectively a Bayesian filter from the end-user perspective.
 The last stage scores each message into one of three categories.  If it's in
the good category, it is delivered (i.e. filter predicate yields true).  The
other two categories are effectively a false and the message is placed into a
spamtrap or dumpster.  The difference between the two is that dumpster messages
will expire after some time (i.e. five days).  
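
A rough sketch of the chain just described, in Python. The names `run_chain` and `Verdict` and the two score thresholds are invented for illustration; which of the two non-good categories goes to the spamtrap and which to the dumpster is also an assumption here (indeterminate to the spamtrap, clearly bad to the dumpster), and `content_score` merely stands in for the final content filter.

```python
from enum import Enum


class Verdict(Enum):
    DELIVER = "deliver"
    SPAMTRAP = "spamtrap"   # held for the user to sort through
    DUMPSTER = "dumpster"   # held, but expires after some time


def run_chain(message, predicates, content_score,
              good_below=0.3, junk_above=0.9):
    """Evaluate one inbound message against a chain of predicates.

    predicates:    callables taking the message; True means deliver now,
                   False means fall through to the next predicate.
    content_score: callable returning a spam score in [0, 1]; it plays the
                   role of the last stage. Thresholds are made up.
    """
    for predicate in predicates:
        if predicate(message):
            return Verdict.DELIVER
    score = content_score(message)
    if score < good_below:
        return Verdict.DELIVER
    if score > junk_above:
        return Verdict.DUMPSTER
    return Verdict.SPAMTRAP
```

A stamp check or a white list lookup would simply be one more callable in `predicates`.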

The user interface (if you want a screen dump, just ask) is very focused on what
the user's task is, namely sorting through the indeterminate messages.  The
sorting process also trains the Bayesian filter.

One of the filters, the fast white list, is a simple white list which matches on
exact addresses of who you send e-mail to or who you have approved from the
spamtrap.  It turns out the white list also helps in Bayesian filter training. 
If you make the assumption that the white list is very clean and you always get
good e-mail through it, then if you score each message approved by the white
list, any message that comes up "spam" can be used to correct the Bayesian
filter.  Depending on how your filter operates, you may also need to throw in a
random number of good messages as well, but this is something to be determined.

so, in the end you could use your Bayesian filter as an element in a camram
style chain.  There's nothing special about that stage except in the
interpretation of its results and the handling of the messages afterwards.  

Some of the important issues are a new user interface to accommodate the three
category output, background processing of stamps (be it triggered on addresses
or deferred until sending), and limit setting for boundaries defining the
indeterminate region.  I've wrestled with these for a while and have finally
come up with some solutions that seem to make people happy, and you are free to
use them if you wish.  

---eric
Keywords: helpwanted, mail6
I would also like to see a Hashcash implementation, both adding a Hashcash token
when sending email, and checking a Hashcash token when determining if a message
is spam.

As far as making things speed up, precomputing hashcash at low priority makes
sense.  I think you should prioritize email addresses that least appear to be
spam (since if it's spam, you're unlikely to reply if you're smart). Why not
also do that for any address in your address book if you've sent email to them
before?
Since SpamAssassin takes Hashcash values into account, there's now a valid
reason to send HashCash even to people who don't use Mozilla mail.
I put up a page with some details how mozilla integration would ideally work here:

http://www.hashcash.org/mail/mua/mozilla/

Particularly:

what features would make sense?

receive support - Mozilla mail and thunderbird include a bayesian learning
anti-spam feature. In my personal experience it makes a fair number of false
positives. It would help if mozilla supported hashcash on received mail.

send support - It would help if mozilla supported sending hashcash so that
mozilla mail users could benefit from the spamassassin support, TMDA support and
other anti-spam systems which are starting to support hashcash as a way to
bypass their filters to make sure your mail gets delivered and does not get
falsely classified.

stamp dialog - Hashcash can be a somewhat lengthy process on slow hardware (1
20-bit stamp per second on a 600Mhz CPU; 3 20-bit stamps per second on a 3.06Ghz
P4 (see here for more benchmarks)). It would be useful if some progress
indicator were given.

background stamp creation - in offline mode mozilla could compute stamps queued
ready for sending.

precomputed stamp pool - mozilla could precompute in the background stamps for
recipients you frequently communicate with.

early start to stamp creation - mozilla could start to compute stamps as soon as
you click on reply, or enter recipients into To, Cc, Bcc. 

minimal support

Actually receive support alone to avoid false positive from mozilla's own spam
filter would I think be rather easy to implement as verification is very cheap
and simple.

However it would seem a bit one sided to try to benefit from receive support
without participating in send support. Minimal could be a progress dialog on
send. Or maybe not even. How slow a machine can you comfortably run an app the
size of mozilla on? Note even rather slow machines by current standards such as
300Mhz P2 can create a stamp in 2 seconds.
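
To make the cost asymmetry concrete, here is a minimal sketch of minting and verifying a version 1 stamp (`1:bits:date:resource:ext:rand:counter`, where the SHA-1 of the whole string must start with `bits` zero bits). It is written in Python for illustration only; the function names are made up, the extension field is left empty, and the random and counter fields use hex digits for simplicity. Pure Python is far slower than the native benchmarks quoted above.

```python
import hashlib
import secrets
import time


def leading_zero_bits(digest):
    """Count the leading zero bits of a bytes digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def stamp_value(stamp):
    """Number of leading zero bits in the SHA-1 of the stamp string."""
    return leading_zero_bits(hashlib.sha1(stamp.encode("ascii")).digest())


def mint(resource, bits=20, date=None):
    """Try counters until the stamp hashes to `bits` leading zero bits."""
    date = date or time.strftime("%y%m%d", time.gmtime())
    prefix = f"1:{bits}:{date}:{resource}::{secrets.token_hex(8)}:"
    counter = 0
    while True:
        stamp = prefix + format(counter, "x")
        if stamp_value(stamp) >= bits:
            return stamp
        counter += 1


def verify(stamp, resource, min_bits=20):
    """Cheap check: one hash plus a few field comparisons."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1":
        return False
    claimed = int(fields[1]) if fields[1].isdigit() else -1
    return (fields[3] == resource
            and claimed >= min_bits
            and stamp_value(stamp) >= claimed)
```

Minting needs about 2^bits hashes on average while verifying needs exactly one, which is why receive support is the cheap half. A real verifier would also check the date field and reject stamps it has seen before.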
for the baseline hashcash implementation, I would vote for 

1: background generation of stamps.
2: tracking who you have sent stamps to
3: receive filter bypassing the Bayesian filter if the message comes with a
stamp or from someone you have sent a stamp to
background stamp generation is defined as stamp generation after you push the
send button but before you initiate the SMTP connection.  I would counsel
against any foreground stamp generation because of the negative user impact
which in turn creates a high probability of discouraging users from using
hashcash.  Stamp generation in background will make for a better user experience
and ease adoption.  

It may be worthwhile lighting some sort of an indicator somewhere saying that
stamp generation is going on but that's the most I would tell the user at this time.

tracking who you e-mail allows minimization of stamp generation (better user
experience) and minimizes false positive errors on e-mail from people you have
already communicated with.

These simple items will make hashcash utilization virtually painless.
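
Items 2 and 3 of the list above might look something like this. It is a sketch under assumptions: the class name `StampLedger` and its methods are hypothetical, addresses are compared case-insensitively, and checking the stamp itself is left to whatever verifier is in use.

```python
class StampLedger:
    """Remembers which addresses we have already sent a stamped mail to."""

    def __init__(self):
        self._stamped = set()

    @staticmethod
    def _key(address):
        return address.strip().lower()

    def record_sent(self, recipient):
        """Call after a stamped mail to `recipient` has gone out."""
        self._stamped.add(self._key(recipient))

    def needs_stamp(self, recipient):
        """Item 2: only mint for people we have not stamped before."""
        return self._key(recipient) not in self._stamped

    def bypass_bayes(self, sender, has_valid_stamp):
        """Item 3: skip the Bayesian filter for stamped mail, or for mail
        from someone we have previously sent a stamp to."""
        return has_valid_stamp or self._key(sender) in self._stamped
```

One caveat with the second half of item 3: a From address is easy to forge, so the ledger is only as trustworthy as the sender address it is matched against.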
Instead of a fixed number of bits (say 20) in the hash collision, a variable
number with a default and suggested value of 20 would allow for greater
paranoia, faster CPUs, and more efficient SHA1 algorithms being developed.

In a point-based filtering algorithm, for example, a match of 24 bits could be
worth more points than a match of 18 bits.
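
As a sketch of what such a point scale could look like (all numbers here are invented, and `stamp_points` is a hypothetical helper, not how any existing filter scores stamps):

```python
def stamp_points(bits, default_bits=20, base=2.0, per_bit=0.5, cap=5.0):
    """Map the bit count of an already verified stamp to not-spam points.

    A stamp at the default size earns `base` points; each bit above or
    below the default moves the score by `per_bit`, clamped to [0, cap].
    """
    points = base + per_bit * (bits - default_bits)
    return max(0.0, min(cap, points))
```

With these made-up defaults an 18-bit stamp earns 1.0 point, a 20-bit stamp 2.0, and a 24-bit stamp 4.0. The cap keeps one very expensive stamp from overriding every other signal.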

Re: Adam Back's suggestion of a precomputed stamp pool (comment #14,
http://bugzilla.mozilla.org/show_bug.cgi?id=229686#c14), I think
that this is in fact a bad idea.  A spammer could also do this, and sell the
stamp.  It would be better to do the opposite, and insist upon uniqueness of
stamp with every email.  This could be achieved by using the from address and
time since the beginning of the epoch (http://en.wikipedia.org/wiki/Unix_epoch)
at the point when the calculation is begun to seed the search for hash
collisions.
In reply to comment 16.

The variable sized stamp is the approach used by spamassassin.  Your mail is
less likely to be classified as spam the larger the stamp is.  A default is
still useful though and users can adjust from there.

On the precomputed pool: the pool stamps would still be used only once.  It is
just a way to hide the delay involved in creating them from the user to improve
his user experience.  Stamps must be unique as the recipient rejects stamps
which he has seen before.  The stamp includes a date stamp.  The from address is
intentionally omitted from the stamp as users often choose to hide it
(remailers, from mangling to avoid spam, and some MTAs also change it).  The
from address may however be added in an extension field.
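
The "used only once" rule can be sketched as a small spent-stamp store. This is illustrative Python only: the class name and the 28-day and 2-day windows are assumptions for the example, and the hash check on the stamp is deliberately left out.

```python
from datetime import date, datetime, timedelta


class SpentStamps:
    """Reject stamps that are too old, post-dated, or already seen."""

    def __init__(self, max_age_days=28, max_future_days=2):
        self.max_age = timedelta(days=max_age_days)
        self.max_future = timedelta(days=max_future_days)
        self._seen = set()

    def accept(self, stamp, today=None):
        today = today or date.today()
        fields = stamp.split(":")
        if len(fields) != 7:
            return False
        try:
            # The date field starts with YYMMDD; ignore any finer time part.
            minted = datetime.strptime(fields[2][:6], "%y%m%d").date()
        except ValueError:
            return False
        if minted < today - self.max_age or minted > today + self.max_future:
            return False
        if stamp in self._seen:
            return False  # double spend
        self._seen.add(stamp)
        return True
```

Because stamps expire, the store only has to remember stamps for as long as the validity window, so it does not grow without bound. A stamp taken from a precomputed pool passes this check exactly once, like any other.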

Product: MailNews → Core
Just a thought on the user-experience impact of foreground stamp calculations,
balanced against having the functionality in the first place.

I would imagine that implementing hashcash with foreground stamp calculations is
easier than implementing it with background calculations.  If this actually is
the case, and the foreground code doesn't make future implementation of
background calculations more difficult, then I propose this:

1. Implement hashcash stamps on send, *properly*, with foreground calculations.
2. Disable the feature by default.  This allows those who are interested to use
it but will not cost user experience troubles to those that don't.
3. Provide some way of an interested user to:
  a) Enable hashcash stamps on outgoing mail.
  b) Set the cost of the stamps to be created.
  NOTE: Some way could be just knowing what bits to set in a config file like
user.js.
4. Implement hashcash stamp verification on receive with some way of specifying
how much to score junk/non-junk based on cost, see #3.

Then work on improving user experience if it's a problem.  It may be that people
don't mind an extra 2 seconds per recipient when sending an email.  After all
sometimes it takes me 2-4 seconds to contact my ISP's outgoing SMTP server...

Just a few cents to throw into the pot.
Adding Hashcash send support to Thunderbird would be very useful.

_Lots_ of people use Thunderbird, and we all have worries and experiences with
our mail recipients "magically" not getting our emails.  :-/

This would be a step in the right direction. :)

Regards and best wishes,

Justin Clift
Thinking about this even further, a possible "simple" approach to first
implementation/prototype would be to grab the Enigmail code, cut the guts out of
that, change it to use an external hashcash generator (they already exist), and
change the header message from the PGP one to the Hashcash one.

http://enigmail.mozdev.org

It actually sounds like most of the needed code is already written, and someone
that knows how to remove the unneeded Enigmail bits + compile the results is
required.

?

Regards and best wishes,

Justin Clift
Receive support is easiest to add: it should at the very least generate a
special pseudo-token which can be used as an input by the Bayesian spam
detection algorithm, so Thunderbird users can take advantage of other people's
hashcash sending systems: thus creating a "pull" for other implementations.

Then, the more difficult task of integrating hashcash signing with the mailer
should be addressed.
 
(In reply to comment #21)
And by the way, I like the idea of having variable-length hashcash, so things
can be scaled over time as machines get faster. More zeros == more
anti-spamminess. In which case, if we are sending pseudo-tokens to the Bayesian
classifier, we should send a number of different tokens to the classifier as N
gets bigger:

so for 20 zeros, we send HASHCASH_OK_20
for (say) 22 zeros, we send HASHCASH_OK_20, HASHCASH_OK_21, HASHCASH_OK_22
...and so on.

This way, learned weights for lower levels of hashcash are still useful, whilst
the system starts to learn about the higher levels.
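
In code, the cumulative scheme is a one-liner. A sketch in Python, where `hashcash_tokens` is a hypothetical helper and `verified_bits` is assumed to come from an actual stamp check:

```python
def hashcash_tokens(verified_bits, floor=20):
    """Return one pseudo-token per level from `floor` up to the verified
    stamp size, so weights learned for lower levels still apply."""
    if verified_bits < floor:
        return []
    return [f"HASHCASH_OK_{n}" for n in range(floor, verified_bits + 1)]
```

For a 22-bit stamp this yields the three tokens listed above; anything under the floor yields none.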
I think comment#21 from (usenet@tonal.clara.co.uk) is a key breakthrough
as far as receiving goes.  No need to create fancy infrastructure
("how many bits do you need?"); just create tokens that the Bayesian filter uses.

HOWEVER, there's a catch: the pseudo-tokens MUST NOT BE things
that can be included in the sent message (header or body).
Otherwise, there's a simple trick: the attacker (spammer) doesn't
need to compute a hashcash value, they just include "HASHCASH_OK_25"
in the mail header or body.  Suddenly you've got a 25-bit hashcash value,
without the work!!

That doesn't mean comment #21 is bad; it's a very good idea.
We just need to make sure that the pseudo-tokens HASHCASH_OK_nn
can ONLY be derived from the hashcash check, and NOT from the
message itself.
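
One way to enforce that, sketched in Python: drop anything in the message's own token stream that looks like a pseudo-token before appending the ones the verifier produced. The function name and the token list interface are assumptions; Mozilla's real tokenizer is not modelled here.

```python
import re

# Anything shaped like a verifier token is reserved, in any letter case.
_RESERVED = re.compile(r"^HASHCASH_OK_\d+$", re.IGNORECASE)


def classifier_tokens(message_tokens, verified_bits, floor=20):
    """Build the token list handed to the Bayesian classifier.

    message_tokens: tokens parsed from the headers and body (untrusted).
    verified_bits:  bit count from the hashcash check, 0 if there was no
                    valid stamp.
    """
    cleaned = [t for t in message_tokens if not _RESERVED.match(t)]
    trusted = [f"HASHCASH_OK_{n}" for n in range(floor, verified_bits + 1)]
    return cleaned + trusted
```

With this, a spammer who writes HASHCASH_OK_25 into the body gains nothing, because only the verifier can put such a token in front of the classifier.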
Would it not be feasible just to have the bits that check the hashcash to strip
out such headers?
I don't know too much about the internal machinery of the bayesian filter, but
if you operate on a token stream, it should be possible to generate a token that
could never come from parsing the body, e.g. a token containing a space character. 
Bayesian filters ignore the space char because it's in roughly 100% of the spam
AND ham. Also, you can't guarantee the same token stream between apps. Some may
mix in higher-order Markov chains and such.
Anyone on here familiar with netscape mail code base to add receive support?

I tried to do it myself, and did manage to exempt a stamp from moz mail's
bayesian spam filter based on presence of a hashcash stamp; however it felt a
hack given I was not at all familiar with the code base.
Adam - Post the patch.  Let someone who is more familiar with it make it
pretty...  It will give them something to go on without having to start from
scratch.
Blocks: majorbugs
No longer blocks: majorbugs
Assignee: sspitzer → nobody
QA Contact: esther → composition
Support for Hashcash in Mozilla Thunderbird is now in place.
An open-source extension is available for download from http://pennypost.sourceforge.net

Cheers,
Ali
It is not an extension.
Have you ever seen a 16 MB extension? (Thunderbird is 6 MB.)

Given that the most complex parts are calculating SHA-1 and managing dialog boxes in Thunderbird and whitelists, the extension should amount to 200 KB of code at most.

So, I think that it's a virus, or it's a microsoft program (maybe including Outlook :p)
The extension size is large because it bundles another algorithm besides hashcash called memory-bound. This is a pricing algorithm similar to hashcash. You can find details of this algorithm here.
http://pennypost.sourceforge.net/MBound

This algorithm requires a 16MB fixed table to work and taxes the memory rather than the CPU. Hashcash taxes the CPU alone.

Do you really believe I would release a virus as open-source!
(In reply to comment #29)
> Support for Hashcash in Mozilla Thunderbird is now in place.

Okay, I've installed it and sent an email to myself. It generated a stamp; it's in the header (munged here) as x-hashcash: 1:20:070905:br????@????om.co.nz::c1674b3898b10dcc:12c9c599
But when I open it, I see no indication that the hashcash is valid. Yes, I DO have the recipient email addr as the 3rd of 3 in the email setup CSV list. Am I missing something?
BTW I have about 500 email addrs I need to add, 99% for one domain. How do I do this quickly? *@mydomain maybe? Perhaps it's already there, but until I get it vaguely working I'll keep it simple.
On the surface the pennypost extension looks very fancy (and large).  However, I could not mail using it either & posted a bug on their sourceforge site.  Is it appropriate to move further pennypost discussion off this list and onto their own?

Thanks,

Lance
(In reply to comment #32)
> (In reply to comment #29)
> > Support for Hashcash in Mozilla Thunderbird is now in place.
> 
> Okay, I've installed it & and sent an email to myself. It generated a stamp
> ... But when I open it, I see no indication that the hashcash is valid. 
After stopping & starting TB, & checking (but not changing) settings, it now works!
But it is also brittle/fragile in other ways too - in the Inbox, when jumping from a stamped-to-me email to a stamped-but-not-to-me email (BEFORE validation display complete?), it shows the stamped-but-not-to-me email as valid with the stamped-to-me info showing!
> ...
> BTW I have about 500 email addrs I need to add, 99% for one domain. 
> How do I do this?
*@my.domain does not do this - how do I set a mask rather than email-by-email?
(In reply to comment #32)
> But when I open it, I see no indication that the hashcash is valid. Yes, I DO
> have the recipient email addr as the 3rd of 3 email setup csv list.

Same here, I filed a bug about it:
http://sf.net/tracker/index.php?func=detail&aid=1788340&group_id=201935&atid=979567
THIS IS NOT THE PENNYPOST BUG TRACKER!
This is the Thunderbird bug tracker
While you'd think this obvious it appears to need to be pointed out that comments in this ticket should be restricted to the discussion of adding hashcash to Thunderbird. It's nice to have a note here about the existence of PennyPost, maybe someone can use that as source to build something directly into Thunderbird, or make a smaller extension, BUT you should not be discussing your problems with, or questions about, PennyPost here.
Hello,
Sorry, I need to put this comment into this list as many of you CCed this list tried out pennypost and this message concerns them.

Many of you who downloaded pennypost were disappointed due to a bug in header detection. This is now resolved (thanks to Lance & Martin). You will need to update manually to the newer version, as autoupdates are not yet supported. Further, please keep checking manually for newer releases. I'll not be posting any such updates to this list again.

Please DO NOT post your feedback, bugs, comments about pennypost in this list.

I've created a new feedback tracker http://sourceforge.net/tracker/?group_id=201935&atid=992785 where you can leave your feedback. This feedback is important for pushing this extension into Mozilla's public extensions download site at http://addons.mozilla.org or eventually getting it integrated with thunderbird.

In re to kate, I fully agree this is not the place to post bugs or feedback for pennypost. However, AFAIK there is no way to build things 'into thunderbird'. Most of thunderbird code is extensions to the core mail client. It is only that many extensions (like junk mail filters and talkback) are shipped integrated with thunderbird by default.
Nobody is working on this? This feature was requested 6 years ago. Is there a reason why it can't be done or is nobody interested?

At a rudimentary level you could just ship the hashcash binary (276KB) and run `hashcash -mb20 $RECIPIENT`. Surely a Mozilla developer can do better than this within a reasonable time.
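
For what it's worth, shelling out to the binary as suggested could be sketched like this in Python. The command line is taken verbatim from the comment above; the function names are made up, and the exact format of what the binary prints is not described in this thread, so the wrapper just returns standard output as-is.

```python
import subprocess


def mint_command(recipient, bits=20):
    """Build the argv for the invocation quoted above."""
    if recipient.startswith("-"):
        # Keep an address from being parsed as a command-line option.
        raise ValueError("recipient must not look like an option")
    return ["hashcash", f"-mb{bits}", recipient]


def mint_with_cli(recipient, bits=20, timeout=120):
    """Run the external hashcash binary (assumed to be on PATH) and return
    whatever it writes to standard output, stripped of whitespace."""
    result = subprocess.run(
        mint_command(recipient, bits),
        capture_output=True, text=True, check=True, timeout=timeout,
    )
    return result.stdout.strip()
```

Passing an argument list rather than a shell string means the recipient address is never interpreted by a shell.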
search pennypost on sourceforge, and download newest version.
Pennypost is a 16MB download. That is not a solution. I'm not going to go around asking people to download a 16MB addon for a 6MB mail client. Why can't hashcash be added to Thunderbird?
It's 16 Meg to support Mbound, which uses lots of RAM, so it has to be 16 Meg.
chad writes:
> its 16 Meg to support Mbound which uses lots of ram so it has to be 16Meg.

We know that; the problem is that for users who don't plan to use Mbound, it bloats the download enormously.

Also, even though it never changes, that 16 MB dataset will need to be downloaded again every time the Pennypost plugin is updated with bugfixes or features.

To avoid these problems, I suggest that the Mbound dataset be moved into a separate plugin that contains nothing but the dataset (no code), so it should never need to be updated. Users who only plan to use hashcash won't need to download it, and whenever any part of the Pennypost plugin needs updating there won't be any need for a big download. Users who wish to use Mbound stamps will need both plugins.
Post your suggestion on the sourceforge page, I bet the developer(s) is(are) over there and never check here.
(In reply to comment #42)
> its 16 Meg to support Mbound which uses lots of ram so it has to be 16Meg.
> 

The present feature request is for Hashcash in Thunderbird. I couldn't care less about Mbound. If I did I would be filing a feature request for Mbound instead of commenting on a feature request for Hashcash. Please add hashcash support to Thunderbird and don't confuse the issue with an unrelated algorithm. Thank you.
> The present feature request is for Hashcash in Thunderbird. I couldn't care
> less about Mbound. ... Please add hashcash support to Thunderbird.

+1

+1
Injecting my opinions into this bug:

First off, I am dubious of the efficacy of the Hashcash spam-prevention measure. The central idea is that spammers would have to pay with CPU to send messages. As all readers are quite well aware, modern technology has improved to the point where there is a glut of CPU power; projects like Folding@Home use this glut to work. But so too do spammers recognize this; most emails are sent from infected computers (80% as of June 2006, if Wikipedia is to be believed), so spammers are not really losing any CPU power. Given that, for them, the cost of CPU is insanely low (how do you think they make money?), this payment is more of a burden on email clients and regular users than spammers.

Even supposing that this would actually impact spammers in any meaningful way, the second problem relates to another core idea. In order to force the spammers to pay, the idea has to be supported widely enough that "everybody has to have this" (or everybody modulo an acceptable whitelist) becomes enforceable; this cannot happen until major MUAs, most notably Outlook, support it. A simple Google search shows that Outlook has a similar feature, but not the same one, and Microsoft claims it has a patent on it (unverified by myself, but an issue that is definitely worth resolving before looking further). Even looking at the comments on the relevant blog post shows that many other developers feel the same way: it fails to address the issue at heart.

About the only way to achieve the necessary widespread support is an RFC standard or the ilk. But should it reach RFC-type support, we return to the problem that the spam prevention relies on the fallacy that CPU usage is expensive. But also, spammers operate based on stealing others' computers; they can just as easily steal Outlook itself and spam people using forgeries and the automatic-whitelisting path.

A third issue should strike a little closer to home. There are some systems which, by necessity, need to send mass emails to people; one of these is bugzilla. So bugzilla would have to be included in people's whitelists, which both limits the utility of bugzilla somewhat and makes people more susceptible to spam, since, as the creator of hashcash points out, whitelists have known problems.

But in the end, does it stop spam? Not really: its salient points are that it taxes the CPU, the algorithm cannot be defeated (neither could MD5, if you recall), and that people with said header can be fast-tracked as not-spam until it reaches critical mass. Let me address the last point some more.

Ultimately, the problem we are seeking to solve is that current spam filtering does not work. Looking at the pages, hashcash essentially went defunct in late 2004. I believe it was after that time when naïve Bayesian classifiers became popular. In fact, when properly trained (which Thunderbird's setup makes difficult, but there are already other bugs on this), these have rates of false positives on the order of 1 in 10000 or so.

Finally, I have not investigated the library, but integrating the 16M extension into a 6M client does not make it much different from asking someone to install a 16M extension to a 6M client. Even assuming at best an additional 6M, that's 6M of code for a solution which taxes regular users unwittingly, doesn't stem the flood of spam, replicates an already excellent feature, and aggravates the image of a bloated email client.

In short: I feel that the nature of this request warrants a WONTFIX since it takes too much to do so little.

Since I am not sufficiently qualified to make this decision myself, I am CC'ing module owners and drivers for their input.
The Hashcash mailing list at http://news.gmane.org/gmane.mail.spam.hashcash last had activity in Dec 2007, so it's not 'defunct' just yet!

To address a couple of your points; Bayesian filters are not a solution to spam because of the false positive issue. Any amount of false positives are unacceptable. I currently receive about 9,000 spams a month. A false positive rate of 1-in-10000 (which seems generous) would mean I'm potentially losing one real business-related email per month. (I spend a lot of time digging around my spam folder recovering the false positives which end up in there!)
 
Hashcash is not a standalone solution to spam. Some discussions on the Hashcash list mention using Hashcash as part of a hybrid solution (to include content filters and whitelists). The problem here is not that the spammers can beat Hashcash by using zombie PCs, as increasing the hashcash stamp size will get past this. The issue is that users with older PCs or mobile devices are negatively impacted as the stamp size increases (a spammer with a powerful CPU might spend 30 seconds generating a stamp, while a user with a Pentium II might spend 3 hours on the same stamp).
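
The scaling behind this trade-off is simple to work out: finding a stamp with n leading zero bits takes 2^n hash trials on average, so every extra bit doubles the cost for everyone, fast or slow. A small Python sketch (the helper names are made up, and the hash rate is whatever you measure on a given machine):

```python
def expected_hashes(bits):
    """Average number of SHA-1 trials needed for a stamp of `bits` bits."""
    return 2 ** bits


def expected_seconds(bits, hashes_per_second):
    """Average minting time on a machine with the given hash rate."""
    return expected_hashes(bits) / hashes_per_second
```

Taking the figure quoted earlier in this thread (one 20-bit stamp per second on a 600 MHz CPU, so roughly 2^20 hashes per second), `expected_seconds(25, 2**20)` comes to 32 seconds and `expected_seconds(30, 2**20)` to about 17 minutes on that same machine. The ratio between a fast and a slow machine stays the same at every stamp size; only the absolute wait grows.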

I am interested in the Outlook 'hash' header you mentioned. If MS have implemented a hashed stamp solution, then this could indicate that at least MS Exchange servers will (if they're not already) implement an Outlook hashed stamp filter. Which could be a way of introducing a widespread hash-based email stamping solution.
(In reply to comment #48)

Hashcash is certainly not a complete spam solution, but it can be a part of one (I'm sure this has been said already ...) As mentioned above, SpamAssassin applies a negative (not-spam) score when it sees a valid hashcash token:

http://spamassassin.apache.org/tests_3_2_x.html

Really all I want is for Thunderbird (or an extension) to be able to generate these tokens for me. And no, I don't consider that enormous Pennypost extension to be a solution ...
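
For what it's worth, generating a token is not much code. Below is a minimal sketch in Python, assuming the version 1 stamp layout (`1:bits:date:resource:ext:rand:counter`) with SHA-1, an empty extension field, and a hex counter. The function names are mine, and this is an illustration rather than anything Thunderbird or the hashcash tool actually ships.

```python
import hashlib
import secrets
from datetime import datetime, timezone


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def mint(resource: str, bits: int = 20) -> str:
    """Search for a counter so that SHA-1(stamp) has `bits` leading zero bits."""
    date = datetime.now(timezone.utc).strftime("%y%m%d")
    rand = secrets.token_hex(8)          # random field to keep stamps unique
    prefix = f"1:{bits}:{date}:{resource}::{rand}:"
    counter = 0
    while True:
        stamp = f"{prefix}{counter:x}"
        digest = hashlib.sha1(stamp.encode("utf-8")).digest()
        if leading_zero_bits(digest) >= bits:
            return stamp
        counter += 1


if __name__ == "__main__":
    # 16 bits keeps the demo quick; real stamps typically use 20 or more.
    print("X-Hashcash: " + mint("someone@example.org", bits=16))
```

One stamp is needed per recipient, since the recipient's address is the resource field.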
Product: Core → MailNews Core
Hashcash may not have been widely adopted (it does suffer from the issue that until it's commonplace its usefulness is limited). However, 'every little helps'.

The overhead to the user is limited (tokens could be calculated at low priority in the background as mail is received, 'just in case', and generated on demand as mail is sent).

It'd be a positive for received mail, helping avoid false positives.

Yes, it's not the be-all and end-all, but it is rather elegant, and would make a nice supplement.

Much more useful though would be if Thunderbird had something like SpamAssassin 'built in'.
A solution that requires near universal adoption before it's practical isn't something for which the cost is worth considering.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
(In reply to comment #52)
> a solution that requires near universal adoption before it's practical
> isn't something for which the cost is worth considering.

I agree wholeheartedly.

But hashcash is not something that requires near universal adoption, so the premise for your status-change isn't fulfilled.
i'd like to see hashcash in the client, and i'm going to give some counter arguments to the post made by Joshua Cranmer

1 "spammers have zombie PCs and have unlimited CPU power, so they're not slowed down"

this argument is countered by looking at what *every single* zombie PC can do.
currently, a zombie PC may be able to send 100 spam mails per second, mostly depending on the internet connection. producing a hashcash stamp might take on average 1 second on a PC, now this same zombie PC can only send 1 spam mail per second. as spammers have a finite number of zombie PCs, a spammer can send proportionally fewer emails.

2 "everyone must support it before it's of any value"

spamassassin, widely deployed, counts a negative score for valid hashcash stamps. hashcash can take an incremental "every bit helps" approach. people are encouraged to send emails with hashcash stamps because it reduces false positives.

3 "CPU being expensive is a fallacy"

see point 1.

4 "MD5 was defeated, thus, so can SHA1, obsoleting hashcash"

while MD5 is broken for collisions, hashcash is based on a partial preimage. these are still not broken (made easier) for MD5.

5 "only 1 in 10000 false positives with bayesian filtering"

this is still one too many. a scheme like hashcash helps to further increase the signal to noise ratio of other filters.

6 "hashcash is defunct as of 2004"

it has mailing list activity as of now (2011). due to its simplicity, hashcash does not need constant active involvement from its inventor. the scheme's basic premises are still valid. it's still supported by spamassassin.

7 "adding a 16 MB library to a 6 MB client is unreasonable"

this request is not about adding the 16 MB library, but about adding hashcash, which is simple and can be implemented in little size.
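
to give an idea of the size, here is a rough sketch of the receiving-side work check in Python. it assumes the version 1 stamp layout (`1:bits:date:resource:ext:rand:counter`) and SHA-1, and the function names are made up for this example. it also shows point 4 in practice: the check is a partial preimage test (leading zero bits of the hash), not a collision test.

```python
import hashlib


def zero_bits(stamp: str) -> int:
    """Leading zero bits of SHA-1 over the whole stamp string."""
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()
    return 160 - int.from_bytes(digest, "big").bit_length()


def is_valid_work(stamp: str, min_bits: int = 20) -> bool:
    """True if the stamp claims at least `min_bits` and the hash backs the claim."""
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1":
        return False
    try:
        claimed = int(fields[1])
    except ValueError:
        return False
    return claimed >= min_bits and zero_bits(stamp) >= claimed
```

this only checks the proof of work. the recipient address, date and replay checks would sit on top of it, and none of it needs a large library.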
(In reply to comment #54)

> 7 "adding a 16 MB library to a 6 MB client is unreasonable"
> 
> this request is not about adding the 16 MB library, but about adding hashcash,
> which is simple and can be implemented in little size.

Please provide a patch so we can figure out how much this would add and then decide if we want it in the product or not.
tl;dr

"Additionally, stamped generation should take place in background
so that it won't impact the user experience"

Quoting Wikipedia>Hashcash
"The header contains: the recipient's email address, the date, and information proving the required computation has been performed. The presence of the recipient's email address requires that a new header be computed for each recipient, and the date allows the recipient to record headers received recently and make sure the header is unique to this email."

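The checks described in that quote (recipient address, date, uniqueness) could look roughly like the sketch below. It assumes the version 1 stamp layout `1:bits:date:resource:ext:rand:counter` with a `YYMMDD[hhmm[ss]]` date field. The class name, the 28-day expiry and the 2-day clock skew allowance are assumptions for illustration, and the proof-of-work check itself is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def parse_stamp_date(field: str) -> datetime:
    """Parse a YYMMDD, YYMMDDhhmm or YYMMDDhhmmss date field as UTC."""
    formats = {6: "%y%m%d", 10: "%y%m%d%H%M", 12: "%y%m%d%H%M%S"}
    parsed = datetime.strptime(field, formats[len(field)])
    return parsed.replace(tzinfo=timezone.utc)


class StampChecker:
    """Hypothetical receiver-side bookkeeping for one mailbox."""

    def __init__(self, my_address, max_age=timedelta(days=28),
                 clock_skew=timedelta(days=2)):
        self.my_address = my_address.lower()
        self.max_age = max_age          # assumed expiry period
        self.clock_skew = clock_skew    # assumed tolerance for future dates
        self.seen = set()               # stamps already spent

    def accept(self, stamp, now=None):
        now = now or datetime.now(timezone.utc)
        fields = stamp.split(":")
        if len(fields) != 7:
            return False
        if fields[3].lower() != self.my_address:
            return False                # minted for someone else
        try:
            stamped = parse_stamp_date(fields[2])
        except (KeyError, ValueError):
            return False                # malformed date field
        age = now - stamped
        if age > self.max_age or age < -self.clock_skew:
            return False                # expired or dated in the future
        if stamp in self.seen:
            return False                # replayed stamp
        self.seen.add(stamp)
        return True
```

Because stamps expire, the set of seen stamps only has to cover the expiry window, so it stays bounded.
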
That means that during composition of a mail (which is not very CPU intensive, I guess) you could start working on the hashcash only if the receiver accepts a certain delay. Alternatively, the receiver could downgrade older hashcash: the older the worse. This would make pregenerated hashcash more CPU expensive. With the values set properly, you have a relatively short time (e.g. half an hour) for finishing your mail or you have to redo your hashcash.
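
One way to picture the "older is worse" idea is the sketch below. It is a made-up policy, not part of hashcash: the receiver subtracts one bit of credit for every half hour of stamp age. Since each bit doubles the minting work, a sender who pregenerates a stamp one hour early would need four times the work to still meet the threshold.

```python
from datetime import timedelta


def effective_bits(bits: int, age: timedelta,
                   penalty_period: timedelta = timedelta(minutes=30)) -> float:
    """Credit given to a stamp: one bit lost per `penalty_period` of age."""
    if age < timedelta(0):
        age = timedelta(0)              # treat future-dated stamps as fresh
    return bits - age / penalty_period


def meets_threshold(bits: int, age: timedelta, required: float = 20) -> bool:
    """True if the aged stamp is still worth at least `required` bits."""
    return effective_bits(bits, age) >= required


if __name__ == "__main__":
    print(meets_threshold(20, timedelta(minutes=0)))    # fresh 20-bit stamp
    print(meets_threshold(20, timedelta(hours=1)))      # same stamp, 1 hour old
    print(meets_threshold(22, timedelta(hours=1)))      # 4x the work, 1 hour old
```

The date field would need minute resolution for this to work, and the penalty period and threshold are values the receiver would have to tune.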

To speed things up, you could add a hash digest of the message itself. The receiver then knows that you started your hashcashing only after finishing the message. However this would be true also for spammers! Therefore, ideally you would add some random salt to your hash digest that everybody can agree on: e.g. an hourly RSS feed with randomness?