Closed Bug 240788 Opened 21 years ago Closed 19 years ago

Marking Message as Junk Hangs Thunderbird

Categories

(Thunderbird :: Mail Window Front End, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3

People

(Reporter: mozilla, Assigned: jeongkyu.kim)

References

Details

(Keywords: dataloss, verified1.8.1.3)

Attachments

(1 file)

20040405 through 20040417 trunk builds. Attempting to mark any message as junk hangs Thunderbird with 99% CPU. Deleting training.dat fixes the problem. This is a VERY large training.dat that has been learning for several years. I would rather not just delete it, but instead figure out why it broke. The training.dat file is available in 7zip format at http://testing.bakerweb.biz/mozilla/training.7z
This error corrupted my Inbox. Luckily I had a backup that was run this morning.
Keywords: dataloss
Further details: I was having no problem with Thunderbird until this morning, when I downloaded and ran the 20040417 trunk build. It proceeded to download some messages to the Inbox that were obviously junk. At the end of the download process it froze, displaying "downloading 7 of 7 messages", and the junk mail stayed in the Inbox. After waiting about 10 minutes I killed the task and tried marking the junk messages as Junk with an older build - same problem. I tried deleting training.dat and it worked OK. If I restore training.dat from a backup made just a few hours before this hang, everything seems to be OK.
I have a similar "complaint", so let me add comments here instead of creating a new bug. The issue I have is that when a lot of messages are selected (in my case over 600) and I want to mark them all as not junk (I just fetched my mail for the first time from the remote account, and I know all of the messages to be valid), the email client takes a long time to process all of those messages as not junk. I realize that there may be some expensive underlying processes at work in marking 600 messages as not junk -- that's fine. But if a process is going to take a long time, it's good defensive UI design to tell the user so. I would have liked to see a message to the effect that Thunderbird was "working..." and that the delay (i.e. freeze) was temporary. I'm on Windows XP, running on a dual-proc 2.6GHz machine with 1GB RAM, and this process took about 20 seconds of freeze time for me. I can imagine that many Thunderbird users will have significantly slower machines and will benefit even more from proactive notification about a long process. Thanks for all the good work!
Same problem here as in comment 3: marking as junk (or as not junk) takes a long time, sometimes several minutes or more. It also takes a huge amount of memory. For example, marking 60K spams as such took an hour and 1.6GB of virtual memory (Athlon 3000+, 1GB of RAM; due to swapping, CPU was at about 30%). During this time, Thunderbird was completely unresponsive, and didn't even properly refresh its display.
(In reply to comment #0)
I got the same problem 2 days ago, after 3 months without any problem. Thunderbird hangs with 99% or 100% CPU, indefinitely, just after it has finished downloading mail from one of my mailboxes (this is when it should apply all user-defined filters and anti-spam filters). Nothing I tried worked:
- reinstalling Thunderbird (1.0.2 20050317)
- compacting all mail folders
- deactivating all extensions
And just now I came across the comment from Jerry Baker saying that deleting training.dat fixed the problem. I deleted my own training.dat file, and it's working fine now. Thanks :-) My training.dat file is 1.19MB in size; I kept a copy if anyone needs it.
I found that the training file (from the original reporter) seems to be somewhat corrupted and caused an infinite loop in TB's Bayesian filter. The format of the file can be found in nsBayesianFilter.cpp:

/* Format of the training file for version 1:
   [0xFEEDFACE]
   [number good messages][number bad messages]
   [number good tokens]
   [count][length of word]word
   ...
   [number bad tokens]
   [count][length of word]word
   ...
*/

The problem was that the posted training file had a huge 'length of word' value (2,217,919,599) in one of the bad tokens. When Thunderbird reads tokens, it doubles the suggested buffer size starting from 4096 bytes, which caused an overflow in this case. The overflow turned the buffer size into 0, and further doubling had no effect other than an infinite loop (see code below).

static PRBool readTokens(...)
{
    ...
    PRUint32 size;
    if (readUInt32(stream, &size) != 1) break;
    if (size >= bufferSize) {
        delete[] buffer;
        PRUint32 newBufferSize = 2 * bufferSize;
        while (size >= newBufferSize)
            newBufferSize *= 2;    // <-- infinite loop here
        ...
    }

One possible solution is to ignore the rest of the file when there is an overflow. A side effect of this is losing the training information in the rest of the file, but that is better than an infinite loop. :-) I will post my patch to handle this. By the way, is there any specific reason to use a binary format for the training data? In my opinion, it would be better to use a text format, considering cases such as this one.
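[Editorial note: the following is a minimal standalone sketch of the wrap-around described in this comment and of one possible guard against it. It is not the patch attached to this bug and not Mozilla code; the helper name growBuffer and the use of plain uint32_t instead of PRUint32 are illustration-only choices.]

#include <cstdint>
#include <cstdio>

// Mimics the buffer-growth loop in readTokens(): keep doubling until the
// buffer can hold `size` bytes. With the corrupt length 2,217,919,599 the
// doubling passes 2^31 and wraps to 0, after which the original loop never
// terminates. Returning false here corresponds to the "ignore the rest of
// the file" behaviour proposed above.
static bool growBuffer(uint32_t size, uint32_t &bufferSize)
{
    if (size < bufferSize)
        return true;                        // current buffer is already big enough
    uint32_t newBufferSize = 2 * bufferSize;
    while (size >= newBufferSize) {
        uint32_t doubled = newBufferSize * 2;
        if (doubled <= newBufferSize)       // multiplication wrapped around 2^32
            return false;                   // corrupt length field: stop reading tokens
        newBufferSize = doubled;
    }
    bufferSize = newBufferSize;
    return true;
}

int main()
{
    uint32_t bufferSize = 4096;             // initial token buffer size in readTokens()
    if (!growBuffer(2217919599u, bufferSize))
        std::printf("corrupt token length, truncating training data\n");
    else
        std::printf("buffer grown to %u bytes\n", bufferSize);
    return 0;
}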
see comment #6 seeking r?
Attachment #184221 - Flags: review?(mscott)
Thx for tracking this down Jeongkyu. One reason to use a binary file is that this file can get very large, and a binary format saves some space, which should speed up loading of the file. With your patch, what happens going forward after we throw away the rest of the file? Does new training information get remembered after that? I doubt we can recover from a corruption like this w/o throwing away the rest of the data, so this is probably the best we can do.
(In reply to comment #8)
Thanks for your comment, David. I also agree that a binary format is good for performance: besides saving space, we don't need to convert strings into numbers. However, one drawback is the lack of human readability. Because of this, it is very hard to make a diagnosis or fix things when problems happen. Actually, the effects of corrupted training data can vary. In the case of the original reporter, a size of more than 2G leads to an overflow and an infinite loop; if it had been 1G, TB would have tried to allocate 2GB of heap memory. So I believe it is worth thinking about using a text format. The performance hit from a text format could be compensated by other improvements. For example, TB currently uses at least 4KB of heap for each token, while tokens typically occupy at most a few bytes. I guess this situation could be improved a lot if we allocated a thread-safe common heap for all tokens.

Regarding the patch, I haven't had a chance to make sure that new training information is saved correctly. After applying the patch, I found that the problematic data file was shrunk as expected and there is no more infinite loop. I will try to verify the effect of the patch by writing some code to display the tokens inside the training data file. :-)

Jeongkyu
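[Editorial note: a hedged sketch of the kind of diagnostic dumper mentioned at the end of the previous comment. It is a standalone program invented for illustration, not part of Thunderbird; it follows the version-1 layout quoted in comment 6 and assumes the counts and lengths are stored as native-endian 32-bit integers, which is an assumption rather than something confirmed in this bug.]

#include <cstdint>
#include <cstdio>
#include <vector>

// Reads one 32-bit value in the machine's native byte order (assumed to match
// how Thunderbird wrote training.dat on the same machine).
static bool readUInt32(std::FILE *f, uint32_t *v)
{
    return std::fread(v, sizeof(*v), 1, f) == 1;
}

// Prints "count<TAB>word" for each token; rejects absurd lengths such as the
// corrupt 2,217,919,599 value instead of trying to allocate gigabytes.
static bool dumpTokens(std::FILE *f, uint32_t tokenCount)
{
    for (uint32_t i = 0; i < tokenCount; ++i) {
        uint32_t count, length;
        if (!readUInt32(f, &count) || !readUInt32(f, &length))
            return false;
        if (length > (1u << 20))
            return false;                    // token length looks corrupt
        std::vector<char> word(length + 1, '\0');
        if (length && std::fread(word.data(), 1, length, f) != length)
            return false;
        std::printf("%u\t%s\n", count, word.data());
    }
    return true;
}

int main(int argc, char **argv)
{
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s training.dat\n", argv[0]);
        return 1;
    }
    std::FILE *f = std::fopen(argv[1], "rb");
    if (!f)
        return 1;
    uint32_t magic, good, bad, goodTokens, badTokens;
    bool ok = readUInt32(f, &magic) && magic == 0xFEEDFACE &&
              readUInt32(f, &good) && readUInt32(f, &bad) &&
              readUInt32(f, &goodTokens) && dumpTokens(f, goodTokens) &&
              readUInt32(f, &badTokens) && dumpTokens(f, badTokens);
    if (!ok)
        std::fprintf(stderr, "training.dat looks corrupt or truncated\n");
    std::fclose(f);
    return ok ? 0 : 1;
}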
If you can confirm that the training file grows again after we shrink it down, then I'll review the patch. I.e., after triggering this truncation we should make sure future attempts at training get reflected back into the training data file.
Is bug 295843 a dupe of this?
Comment on attachment 184221 [details] [diff] [review] a patch to check overflow in reading tokens of training data clearing the review request pending an answer to an old question of mine in the last comment.
Attachment #184221 - Flags: review?(mscott)
Hi Scott,

> If you can confirm that the training file grows again after we shrink it
> down, then I'll review the patch.

Sorry for being super late. :-) I just confirmed that the training file grows again. Here is what I've done:
1. Cleaned up the Thunderbird profile and set up a test account.
2. Turned on junk mail controls and closed Thunderbird.
3. Copied the problematic training.dat into my profile. The data size at this point was 2,162,378 bytes.
4. Opened Thunderbird and received a test mail.
5. Marked the mail as junk.
6. Confirmed that Thunderbird hung.
7. Applied my patch and rebuilt Thunderbird.
8. Opened the patched Thunderbird.
9. Marked the test mail as junk.
10. Confirmed that Thunderbird did not hang.
11. Closed Thunderbird.
12. Checked the data file. It was shrunken as expected and its size was 524,664 bytes.
13. Opened Thunderbird again and marked several test mails as junk.
14. Closed Thunderbird and checked the data file. It had grown as expected and its size was 525,042 bytes.
Attachment #184221 - Flags: review?(mscott)
Comment on attachment 184221 [details] [diff] [review] a patch to check overflow in reading tokens of training data cool. Although I don't think this issue needs to have the bug cited in a comment, folks can use lxr for that.
Attachment #184221 - Flags: review?(mscott) → review+
Comment on attachment 184221 [details] [diff] [review] a patch to check overflow in reading tokens of training data on the one hand, we could try incrementing the newBufferSize in the situation where doubling it produces 0, but on the other hand, when it's that big, might as well prune it.
Attachment #184221 - Flags: superreview+
(In reply to comment #14)
> (From update of attachment 184221 [details] [diff] [review] [edit])
> cool. Although I don't think this issue needs to have the bug cited in
> a comment, folks can use lxr for that.

Got it. Would you take the comment out when you commit the patch?

(In reply to comment #15)
> (From update of attachment 184221 [details] [diff] [review] [edit])
> on the one hand, we could try incrementing the newBufferSize in the situation
> where doubling it produces 0, but on the other hand, when it's that big, might
> as well prune it.

I believe that the initial buffer size (4KB) is already big enough for each token. The actual problem here is not how we handle the buffer size but why this problematic case happens at all. The data file was somehow damaged, but I could not think of any possible scenario.
Blocks: 295843
Jeongkyu Kim in comment #16:
> ...
> The data file was somehow damaged, but I could not think of any possible
> scenario.

Jeongkyu, there are testcases in bug 215701 and friends (perhaps this might lead to a fix for them).
Severity: normal → critical
QA Contact: front-end
Whiteboard: [checkin needed]
Assignee: mscott → jeongkyu.kim
mozilla/mailnews/extensions/bayesian-spam-filter/src/nsBayesianFilter.cpp 1.57
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Whiteboard: [checkin needed]
Target Milestone: --- → Thunderbird 3
Version: unspecified → Trunk
*** Bug 317798 has been marked as a duplicate of this bug. ***
*** Bug 295843 has been marked as a duplicate of this bug. ***
No longer blocks: 295843
Will this be picked up in TB 2 and/or SM 1.1?
Comment on attachment 184221 [details] [diff] [review] a patch to check overflow in reading tokens of training data Wayne: Ask for "approval1.8.1" on the attachment rather than just posting a comment, it'll be harder to miss that way.
Attachment #184221 - Flags: approval1.8.1?
Attachment #184221 - Flags: approval1.8.1? → approval-thunderbird2?
Comment on attachment 184221 [details] [diff] [review] a patch to check overflow in reading tokens of training data per a discussion with ajschultz
Attachment #184221 - Flags: approval-thunderbird2? → approval-thunderbird2+
landed on branch
Keywords: fixed1.8.1.2
Verified fixed for 1.8.1.3 using Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.3) Gecko/20070326 Thunderbird/2.0.0.0 ID:2007032620 (Thunderbird 2 RC 1) Marking Messages as Junk is working fine, no hang on several tests.