Bug 483022 (Closed). Opened 15 years ago; closed 15 years ago.

Bayes processing dies without calling listeners on UTF conversion errors

Categories

Component: MailNews Core :: Filters
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: Thunderbird 3.0b3

People

(Reporter: rkent, Assigned: rkent)

Attachments

(1 file)

In testing Bayes processing with newsgroups, I ran across posts that trigger assertions in string processing: "ASSERTION: not a UTF8 string" and "ASSERTION: Input wasn't UTF8 or incorrect length was calculated".
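Both assertions come from string code that expects valid UTF-8. As a rough illustration of what "not a UTF8 string" means, here is a minimal sketch of a UTF-8 well-formedness check in C++. It is not Mozilla's actual validator; the name `IsWellFormedUtf8` is hypothetical, and the function simply applies the RFC 3629 rules (no stray continuation bytes, no truncated or overlong sequences, no surrogates, nothing above U+10FFFF).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper: returns true if `s` is well-formed UTF-8 per RFC 3629.
bool IsWellFormedUtf8(std::string_view s) {
  // Smallest code point each sequence length may encode (rejects overlongs).
  static const std::uint32_t kMinCodePoint[5] = {0, 0, 0x80, 0x800, 0x10000};
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80) {  // plain ASCII byte
      ++i;
      continue;
    }
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if ((lead & 0xE0) == 0xC0) {
      len = 2;
      cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
      len = 3;
      cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
      len = 4;
      cp = lead & 0x07;
    } else {
      return false;  // stray continuation byte, or 0xF8..0xFF
    }
    assert(len >= 2 && len <= 4);  // keeps the kMinCodePoint lookup in bounds
    if (s.size() - i < len) return false;  // sequence cut off by end of input
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char next = static_cast<unsigned char>(s[i + k]);
      if ((next & 0xC0) != 0x80) return false;  // expected 10xxxxxx
      cp = (cp << 6) | (next & 0x3F);
    }
    if (cp < kMinCodePoint[len]) return false;       // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return false;                 // beyond Unicode range
    i += len;
  }
  return true;
}
```

For example, Latin-1 text such as `"caf\xE9"` that was never converted to UTF-8 fails this check, because 0xE9 announces a three-byte sequence that the string does not contain. Whether that is what actually happens with these newsgroup posts is not established here.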

But then classifyMessage silently returns without calling the listeners, so routines that loop through messages by waiting on the listener callbacks simply stop. This may be the same issue that led me to file bug 472272 when I was testing with a spam corpus.
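To see why a silent return is harmful, consider a caller that processes a batch by starting the next message only from inside the listener callback. The sketch below is hypothetical (`SequentialDriver`, `Classifier` and `OnClassified` are illustrative names, not Mozilla interfaces); it shows that a single classification that never calls back halts the rest of the batch without any error being raised.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical listener: receives (message URI, spam probability).
using OnClassified = std::function<void(const std::string&, double)>;
// Hypothetical classifier: reports its result only by calling `done`.
using Classifier = std::function<void(const std::string&, OnClassified)>;

// Classifies messages one at a time. The next message is started only from
// inside the listener, so a classifier that returns without calling `done`
// silently halts the whole batch.
class SequentialDriver {
 public:
  SequentialDriver(std::vector<std::string> uris, Classifier classify)
      : uris_(std::move(uris)), classify_(std::move(classify)) {}

  void Start() { Next(); }
  std::size_t Completed() const { return results_.size(); }
  bool Finished() const { return results_.size() == uris_.size(); }

 private:
  void Next() {
    if (next_ >= uris_.size()) return;  // batch done
    const std::string& uri = uris_[next_++];
    classify_(uri, [this](const std::string&, double probability) {
      // Guard against more callbacks than there are messages.
      assert(results_.size() < uris_.size());
      results_.push_back(probability);
      Next();  // advance only when the listener fires
    });
  }

  std::vector<std::string> uris_;
  Classifier classify_;
  std::size_t next_ = 0;
  std::vector<double> results_;
};
```

Constructing the driver with a classifier that skips the callback for one message leaves `Finished()` false and `Completed()` stuck at the number of messages processed before it; nothing tells the caller that the batch was abandoned.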

I've got a simple patch that I'm testing; I'll run with it for a while to make sure I don't see any unexpected side effects. This could also just be a deficiency in how newsgroup internationalization is handled, or at least in what the bayes filter sees.

If the same thing happens with email, spammers could get around bayes processing by inserting a malformed character, or anything else that causes the tokenizer to fail.

Rather than silently returning, this patch lets the bayes module continue: it gives the result for inadequate data (a 50% probability) and still calls the listeners. I'm logging the error, but of course the vast majority of users would see nothing.
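The control flow of this approach can be sketched as follows, assuming a classifier that reports results only through a listener. The names (`ClassifyMessage`, `Tokenizer`, `Scorer`, `Listener`) are hypothetical and this is not the actual patch; `std::fprintf` to stderr stands in for the PR_LOG call. The point is only that a tokenization failure is logged and reported as the neutral 50% result, so the listener is called exactly once either way.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Result reported when there is too little data to classify: 50%.
constexpr double kNeutralProbability = 0.5;

using Tokens = std::vector<std::string>;
// Hypothetical listener: (message URI, spam probability).
using Listener = std::function<void(const std::string&, double)>;
// Hypothetical tokenizer: returns false if the text cannot be tokenized
// (standing in for the UTF-8 conversion failure described in this bug).
using Tokenizer = std::function<bool(const std::string&, Tokens*)>;
// Hypothetical scorer: turns a token list into a spam probability.
using Scorer = std::function<double(const Tokens&)>;

// Calls `listener` exactly once per message. On tokenization failure it
// logs (stderr stands in for PR_LOG) and reports the neutral result
// instead of returning silently.
void ClassifyMessage(const std::string& uri, const std::string& text,
                     const Tokenizer& tokenize, const Scorer& score,
                     const Listener& listener) {
  assert(listener && "caller must supply a listener");
  Tokens tokens;
  if (!tokenize(text, &tokens)) {
    std::fprintf(stderr, "bayes: tokenization failed for %s\n", uri.c_str());
    listener(uri, kNeutralProbability);
    return;
  }
  listener(uri, score(tokens));
}
```

Because the listener has no error parameter, a caller cannot tell a tokenization failure from a genuinely uncertain message; the log entry is the only trace of the failure.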

I don't know how important this is in the long run. I'm seeing it on newsgroups, but that is probably because I have not completed the steps necessary for newsgroups to be properly tokenized by the bayes code. I'm pretty sure that my current issue will go away when I do that. I don't know how common this is outside of my current issue.

But still, callers to classifyMessage should be able to assume that the callback listeners will be called; otherwise they need to implement some complicated protection against that case. But how to indicate the error? There is no clear way for the listeners to receive an error on callback, and I don't think this problem is serious enough to warrant adding callback error handling. Hence the simple PR_LOG. Originally I used an NS_ERROR() call instead, but I feared there could be circumstances in which this error occurs frequently, so I chose a weaker error response.
Attachment #367685 - Flags: superreview?(bienvenu)
Attachment #367685 - Flags: review?(bienvenu)
Whiteboard: [needs r/sr bienvenu]
Comment on attachment 367685
Call listeners even on tokenization failure

thx, Kent, I'll land this.
Attachment #367685 - Flags: superreview?(bienvenu)
Attachment #367685 - Flags: superreview+
Attachment #367685 - Flags: review?(bienvenu)
Attachment #367685 - Flags: review+
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [needs r/sr bienvenu]
Target Milestone: --- → Thunderbird 3.0b3