Closed Bug 240819 Opened 21 years ago Closed 20 years ago

Crash in mail.dll when checking mail - TB073 [@ nsTransform2D::SetToIdentity ]

Categories

(Thunderbird :: Mail Window Front End, defect)

x86
Windows 98
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird0.8

People

(Reporter: jay, Assigned: mscott)

Details

(Keywords: crash, topcrash, Whiteboard: fixed-aviary1.0)

Crash Data

Attachments

(1 file)

Mozilla Thunderbird 0.6+ (20040417) Win98SE Crash in MAIL.DLL when checking mail, manual or auto-check. No error message, just the crash. Not 100% reproducible, 75% would be a more accurate figure.
Severity: normal → critical
Keywords: crash
(In reply to comment #0)
> Mozilla Thunderbird 0.6+ (20040417) Win98SE
>
> Crash in MAIL.DLL when checking mail, manual or auto-check. No error message,
> just the crash.
>
> Not 100% reproducible, 75% would be a more accurate figure.
I have the same issue. It appears to be caused by the adaptive Junk Mail Controls.
jay garcia: Could you reproduce with Thunderbird 0.7.x? Could you provide TalkBack id in such case?
I've now disabled junk mail detection for a week or so; in that time, I have not had a single crash. Certainly seems related...
I forgot to mention... I'm running 0.7.2 on Win2k.
can someone please turn on talkback, turn on junk mail filtering, let it crash and send the talkback and post the talkback number here pretty please? ;)
Of course, now I can't seem to get it to crash. However, here are some talkback IDs, in reverse chronological order:
    TB451242H (30 July)
    TB450929Q (30 July)
    TB436895X (29 July)
    TB436804Z (29 July)
    TB435573G (28 July)
    TB435141X (28 July)
    TB433148M (28 July)
    TB420253X (26 July)
    TB368206Z (19 July)
    TB360832E (18 July)
    TB354886M (18 July)
    TB351673Z (17 July)
(In reply to comment #7) Well, the only crash I've seen has been caused by downloading mail. However, I did have some follow-on crashes related to Talkback (where Thunderbird would crash, Talkback would hang, lots of task zaniness ensues requiring Task Manager to kill things off). Besides, I would be surprised if things like nsTransform2D::SetToIdentity, nsViewManager::DispatchEvent, or TimerThread::UpdateFilter had segv-like bugs in their implementation -- most of TB wouldn't work. This makes me suspect the Bayesian filter is causing memory corruption (I'd need to run a purified version to test this hypothesis).
Wait a minute. These are all "invalid operation" exceptions, and they're all occurring in methods which use floating-point operations. (The line numbers in the stack traces are off, not sure why...) I'm running on an Athlon XP machine, nothing terribly out of the ordinary (except that it's old). Not overclocking, overtweaking, overanything. I've seen similar errors (floating point exceptions) in Acrobat Reader. Hmm.
Ok, just got it again. Here's the talkback: http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=548292 Again, this shows the error happening in nsTransform2D::SetToIdentity(). Very weird. Win2k (5.00.2195), sp 4, Athlon XP 1800+, 1GB of RAM.
Hmm. I see 308 talkbacks with the SetToIdentity() trace, and a lot of them mention "downloading mail." http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=begins&searchfor=nstransform2d%3A%3ASetToIdentity However, I believe that downloading mail isn't the crux of the problem; it's something in the way we're doing floating-point. To dig any deeper, I'm going to need to explore this in a debugger (which means updating my version of MSVC from the ancient 5.0...).
Okay, this bug is about a crash in nsTransform2D::SetToIdentity. Chris, do you have any idea what situation would crash nsTransform2D? TB548292:
    nsTransform2D::SetToIdentity [../../../dist/include/gfx/nsTransform2D.h, line 89]
    nsRenderingContextWinConstructor [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/gfx/src/windows/nsGfxFactoryWin.cpp, line 63]
    nsComponentManager::CreateInstance [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/xpcom/components/nsComponentManagerObsolete.cpp, line 103]
    nsWindow::OnPaint [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 5039]
    nsWindow::ProcessMessage [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 3825]
    nsWindow::WindowProc [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 1349]
    USER32.DLL + 0x1ef0 (0x77e11ef0)
    USER32.DLL + 0x3869 (0x77e13869)
    USER32.DLL + 0x38ab (0x77e138ab)
    ntdll.dll + 0x1ff57 (0x77f9ff57)
    USER32.DLL + 0x21af (0x77e121af)
    nsAppShellService::Run [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 495]
    main [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/mail/app/nsMailApp.cpp, line 58]
    KERNEL32.DLL + 0x11af6 (0x7c581af6)
Summary: Crash in mail.dll when checking mail → Crash in mail.dll when checking mail [@ nsTransform2D::SetToIdentity ]
(In reply to comment #12) Argh, no. This bug is about invalid floating point state that is being triggered by the Bayesian filter, not about one specific stack trace. It just so happens that nsTransform2D::SetToIdentity is one of the more common floating point functions being called after we leave the Bayesian filter. Tagging this as a nsTransform2D::SetToIdentity bug is a red herring. I have a hunch this is related to a floating-point optimization in the build flags that is failing on, say, AMD vs. Intel chips. Note that the failure is NOT a GPF but an Invalid Operation and *always* on FP code. This is key.
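The "Invalid Operation" reading can be illustrated in isolation. The sketch below does not reproduce the crash and is not taken from the Thunderbird source; it only shows, using the standard `<cfenv>` interface, that a 0.0 / 0.0 division records an invalid-operation condition in the floating-point status flags. The function name is hypothetical. Whether such a condition turns into a trap, and at which later floating-point instruction, depends on the exception mask the process is running with, which is an assumption the thread does not confirm.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: divides 0.0 by 0.0 and reports whether the FPU
// recorded an "invalid operation" condition for it.
bool zeroOverZeroSetsInvalidFlag() {
    std::feclearexcept(FE_ALL_EXCEPT);

    // volatile keeps the compiler from folding the division at compile time.
    volatile double numerator = 0.0;
    volatile double denominator = 0.0;
    volatile double result = numerator / denominator;  // NaN under IEEE 754

    const bool invalid = std::fetestexcept(FE_INVALID) != 0;
    std::feclearexcept(FE_ALL_EXCEPT);  // leave the flags clean for callers

    return invalid && std::isnan(result);
}
```

With exceptions masked (the usual default), the division quietly yields NaN and only the flag is set; code that later runs with the invalid-operation exception unmasked is where a fault could become visible.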
Adding TB073 and topcrash keyword. This is a topcrasher for Thunderbird 0.7.3 (currently #7): http://talkback-public.mozilla.org/reports/thunderbird/TB073/TB073-topcrashers.html
Keywords: topcrash
Summary: Crash in mail.dll when checking mail [@ nsTransform2D::SetToIdentity ] → Crash in mail.dll when checking mail - TB073 [@ nsTransform2D::SetToIdentity ]
Hopefully we can get some traction on this for 0.8.
Status: NEW → ASSIGNED
Target Milestone: --- → Thunderbird0.8
Here's the best breakdown I could come up with for "nsTransform2D::SetToIdentity" crashes...by processor type and brand/vendor:
    80x86 (1 Subgroup)        124 Incidents
        GenuineIntel          124 Incidents
    Pentium (2 Subgroups)     230 Incidents
        AuthenticAMD          149 Incidents
        GenuineIntel           81 Incidents
    Pentium II (2 Subgroups)    7 Incidents
        AuthenticAMD            5 Incidents
        GenuineIntel            2 Incidents
Doesn't look like Talkback collects any more details about the type of processor.
There's a pretty good chance the patch in Bug #244357 will fix this crash but I haven't had time to regression test it on the junk scores it generates.
I'm not sure bug 244357 will fix this problem. For the incident involving nsBayesianFilter.cpp I would look at this code.
        /* this part is similar to the Graham algorithm with some adjustments. */
        PRUint32 i, goodclues=0, count = tokenizer.countTokens();
    --> double ngood = mGoodCount, nbad = mBadCount, prob;
        for (i = 0; i < count; ++i) {
            Token& token = tokens[i];
            const char* word = token.mWord;
            Token* t = mGoodTokens.get(word);
            double hamcount = ((t != NULL) ? t->mCount : 0);
            t = mBadTokens.get(word);
            double spamcount = ((t != NULL) ? t->mCount : 0);
    -->     prob = (spamcount / nbad) / ( hamcount / ngood + spamcount / nbad);
            double n = hamcount + spamcount;
            prob = (0.225 + n * prob) / (.45 + n);
            ...
How do you know ngood and nbad are non-zero? Also, the second marked line should probably be written to eliminate some of the divisions
    prob = (spamcount * ngood)/(hamcount *nbad + spamcount * ngood)
I think.
I missed the more obvious issue. If t is null then both hamcount and spamcount are zero and you have a problem.
Maybe something like
    double denom = hamcount * nbad + spamcount * ngood;
    if (denom == 0.0) {
        // do something useful, but I don't know what
        continue;
    }
    else
        prob = (spamcount * ngood) / denom;
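A minimal standalone sketch of the guard being discussed, assuming C++17. It is not the code from attachment 156822 and not the actual nsBayesianFilter.cpp logic; the function name and the choice to return "no value" when the denominator is zero are assumptions made for illustration. It relies on the rewrite suggested in the thread: multiplying the numerator and denominator of (spamcount / nbad) / (hamcount / ngood + spamcount / nbad) by ngood * nbad gives (spamcount * ngood) / (hamcount * nbad + spamcount * ngood), which leaves a single division to protect.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper: per-token spam probability with the zero-denominator
// case reported to the caller instead of being divided through.
std::optional<double> tokenSpamProbability(double hamcount, double spamcount,
                                           double ngood, double nbad) {
    assert(hamcount >= 0 && spamcount >= 0 && ngood >= 0 && nbad >= 0);

    const double numerator = spamcount * ngood;
    const double denominator = hamcount * nbad + numerator;

    // Zero when the token was never seen (both counts are 0) or when the
    // corpus counts make every term vanish; dividing here would be 0/0.
    if (denominator == 0.0) {
        return std::nullopt;
    }

    const double prob = numerator / denominator;
    const double n = hamcount + spamcount;
    // Same smoothing step as in the snippet quoted in the thread.
    return (0.225 + n * prob) / (0.45 + n);
}
```

A caller in the token loop could skip any token for which this returns no value, which corresponds to the `continue` in the suggestion above; what the filter should really do for such tokens is left open in the thread.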
David Cuthbert, how easy is it for you to run into this crash? If we checked in some potential fixes can you use the build and say within a day or two that the crash is gone? Or is it not that frequent?
(In reply to comment #21) Oh, easily. I get enough spam that repeatedly testing this is trivial. Hm, a good test might be to back up the profile and download the same set of mail between the two versions (the idea being that the old one crashes, new one doesn't).
Here's a possible patch based on some comments by tenthumbs to avoid a possible division by zero situation.
Comment on attachment 156822 [details] [diff] [review] possible fix to protect against a division by zero tenthumbs, what do you think of this?
Attachment #156822 - Flags: review?(tenthumbs)
David C, I just checked in this potential fix into the 0.8 branch in the hopes that you can grab a build with the fix and see if it does indeed address the problem. Can you please look for a 0.8 test build here: http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.8/ You'll need to wait until builds for August 24th come out. Thanks!
Whiteboard: fixed-aviary1.0
Yep; I'll check it when it comes out.
Are you absolutely positive that ngood and nbad can never simultaneously be zero? I can't really see it from the code. The original Graham algorithm actually does this.
    n1 = min(1, spamcount / nbad);
    d1 = min(1, hamcount / ngood);
    d2 = min(1, spamcount / nbad);
    prob = n1 / (d1 + d2);
which would catch ngood or nbad being zero. That's inefficient and could throw exceptions but maybe it's useful. I'm not sure, though.
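The "could throw exceptions" caveat can be made concrete with a small sketch, again assuming standard C++ and IEEE 754 arithmetic rather than anything from the Thunderbird tree. The struct and function names are hypothetical. It shows that clamping with `std::min(1.0, ratio)` yields 1.0 when the total is zero, but the division itself has already raised a floating-point flag by then.

```cpp
#include <algorithm>
#include <cfenv>

struct ClampResult {
    double value;       // min(1, count / total)
    bool raisedFpFlag;  // true if the division set FE_INVALID or FE_DIVBYZERO
};

// Hypothetical helper mirroring one term of the clamped formula above.
ClampResult clampedRatio(double count, double total) {
    std::feclearexcept(FE_ALL_EXCEPT);

    // volatile keeps the division from being folded or reordered.
    volatile double c = count;
    volatile double t = total;
    volatile double ratio = c / t;  // inf for x/0, NaN for 0/0

    const bool raised = std::fetestexcept(FE_INVALID | FE_DIVBYZERO) != 0;
    std::feclearexcept(FE_ALL_EXCEPT);

    // std::min(a, b) returns a unless b < a; comparisons with NaN are false,
    // so both inf and NaN clamp to 1.0 here.
    const double r = ratio;
    return {std::min(1.0, r), raised};
}
```

Clamping alone also does not cover a token whose hamcount and spamcount are both zero while the totals are non-zero: d1 and d2 are then both 0 and the final division is 0 / 0 again.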
Ok... no good idea whether the fix helped or not yet. Here's what I did:
--- Verify that we have a proper testcase ---
1. Downloaded my mail (93 messages, ~500k, mostly spam) to a Linux host. Copied the mail so we can fool the POP3 server into hosting it multiple times.
2. In the buggy TB, I enabled junk mail filtering, and then closed the application.
3. Copied my Thunderbird profile to a backup so we could restore the state (C:\Documents and Settings\dacut\Application Data\Thunderbird -> Thunderbird_orig)
4. Start up the buggy TB. Grabbed mail from the POP3 server. It crashed with the same ol' talkback trace (TB645214 if you're curious).
Ok, at this point we know that we have a testcase which causes the bug.
--- Verify the fix ---
5. Installed the nightly into a different directory (C:\Program Files\thundertest).
6. Delete my Thunderbird profile, restore from Thunderbird_orig.
7. Restore my mail on the POP3 server.
8. Start up the nightly (from the command line to ensure I'm not starting the buggy version).
9. Download POP3 mail. No crash, lots of spam identified and properly filtered. Ooh, ok, this looks promising!
10. Shut down the nightly.
--- Sanity check: make sure buggy version still fails ---
11. Restore TB profile.
12. Restore mail.
13. Start up the buggy version.
14. Download mail. This time, however, no crash, and spam is again identified, just as if I had run the nightly. Hm. Puzzling. Perhaps it's picking up a component from the nightly?
15. Shut TB down.
16. Delete nightly install.
17. Restore TB profile.
18. Restore mail.
19. Start up the buggy version again.
20. Download mail. Again, no crash, spam identified properly.
I'm... stumped. Is there anything stochastic about the spam classifier (e.g., using a random variable seeded by the timer)? If I rerun the buggy version multiple times (from the restored mail+profile), could I encounter the crash? Does the spam classifier store any state in the registry (in addition to training.dat)?
Also, the nightly reports its version as 0.7.0. I'm assuming this is because the number simply hasn't been updated... if I grabbed a bum build, let me know.
I think we fixed this. Optimistically marking this fixed as it's now on the trunk and the branch.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Attachment #156822 - Flags: review?(tenthumbs)
Crash Signature: [@ nsTransform2D::SetToIdentity ]