Closed Bug 240819 Opened 20 years ago Closed 20 years ago

Crash in mail.dll when checking mail - TB073 [@ nsTransform2D::SetToIdentity ]

Categories

(Thunderbird :: Mail Window Front End, defect)

x86
Windows 98
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird0.8

People

(Reporter: jay, Assigned: mscott)

Details

(Keywords: crash, topcrash, Whiteboard: fixed-aviary1.0)

Attachments

(1 file)

Mozilla Thunderbird 0.6+ (20040417) Win98SE

Crash in MAIL.DLL when checking mail, manual or auto-check. No error message,
just the crash.

Not 100% reproducible, 75% would be a more accurate figure.
Severity: normal → critical
Keywords: crash
(In reply to comment #0)
> Mozilla Thunderbird 0.6+ (20040417) Win98SE
> 
> Crash in MAIL.DLL when checking mail, manual or auto-check. No error message,
> just the crash.
> 
> Not 100% reproducible, 75% would be a more accurate figure.

I have the same issue.  It appears to be caused by the adaptive Junk Mail 
Controls.
jay garcia: Could you reproduce this with Thunderbird 0.7.x? If so, could you
provide a Talkback ID?
I've now disabled junk mail detection for a week or so; in that time, I have not
had a single crash.  Certainly seems related...
I forgot to mention... I'm running 0.7.2 on Win2k.
Can someone please turn on Talkback, turn on junk mail filtering, let it crash,
send the Talkback report, and post the Talkback ID here, pretty please? ;)
Of course, now I can't seem to get it to crash.

However, here are some talkback IDs, in reverse chronological order:
TB451242H  (30 July)
TB450929Q  (30 July)
TB436895X  (29 July)
TB436804Z  (29 July)
TB435573G  (28 July)
TB435141X  (28 July)
TB433148M  (28 July)
TB420253X  (26 July)
TB368206Z  (19 July)
TB360832E  (18 July)
TB354886M  (18 July)
TB351673Z  (17 July)
(In reply to comment #7)
Well, the only crash I've seen has been caused by downloading mail.  However, I
did have some follow-on crashes related to Talkback (Thunderbird would crash,
Talkback would hang, and lots of task zaniness ensued, requiring Task Manager
to kill things off).

Besides, I would be surprised if things like nsTransform2D::SetToIdentity,
nsViewManager::DispatchEvent, or TimerThread::UpdateFilter had segv-like bugs in
their implementation -- most of TB wouldn't work.  This makes me suspect the
Bayesian filter is causing memory corruption (I'd need to run a purified version
to test this hypothesis).
Wait a minute.  These are all "invalid operation" exceptions, and they're all
occurring in methods which use floating-point operations.  (The line numbers in
the stack traces are off, not sure why...)
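For reference: IEEE 754 classifies 0/0 (and inf/inf) as an "invalid operation". A nonzero value divided by zero raises only the separate divide-by-zero flag. Below is a minimal C++ sketch using the standard <cfenv> status flags. It assumes a GCC/Clang-style x86 build where `volatile` is enough to keep the division at run time; strictly, `#pragma STDC FENV_ACCESS ON` is required, and not every compiler honors it. `DivFlags` and `divide_and_test` are hypothetical names used only for illustration.

```cpp
#include <cfenv>

// Which IEEE 754 status flags a single division raised.
struct DivFlags {
    bool invalid;      // FE_INVALID, e.g. 0/0 or inf/inf
    bool div_by_zero;  // FE_DIVBYZERO, a finite nonzero value divided by zero
};

// Performs num / den at run time and reports the flags it raised.
// `volatile` keeps the compiler from constant-folding the division away.
DivFlags divide_and_test(double num, double den) {
    volatile double n = num;
    volatile double d = den;
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double q = n / d;
    (void)q;
    return DivFlags{std::fetestexcept(FE_INVALID) != 0,
                    std::fetestexcept(FE_DIVBYZERO) != 0};
}
```

With this sketch, `divide_and_test(0.0, 0.0)` should report `invalid`, while `divide_and_test(1.0, 0.0)` should report only `div_by_zero`.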

I'm running on an Athlon XP machine, nothing terribly out of the ordinary
(except that it's old).  Not overclocking, overtweaking, overanything.  I've
seen similar errors (floating point exceptions) in Acrobat Reader.

Hmm.
Ok, just got it again.  Here's the talkback:
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=548292

Again, this shows the error happening in nsTransform2D::SetToIdentity().

Very weird.

Win2k (5.00.2195), sp 4, Athlon XP 1800+, 1GB of RAM.
Hmm.  I see 308 talkbacks with the SetToIdentity() trace, and a lot of them
mention "downloading mail."

http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=begins&searchfor=nstransform2d%3A%3ASetToIdentity

However, I believe that downloading mail isn't the crux of the problem; it's
something in the way we're doing floating-point.  To dig any deeper, I'm going
to need to explore this in a debugger (which means updating my version of MSVC
from the ancient 5.0...).
Okay, this bug is about a crash in nsTransform2D::SetToIdentity.
Chris, do you have any idea what situation could crash nsTransform2D?

TB548292:
nsTransform2D::SetToIdentity  [../../../dist/include/gfx/nsTransform2D.h, line 89]
nsRenderingContextWinConstructor  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/gfx/src/windows/nsGfxFactoryWin.cpp, line 63]
nsComponentManager::CreateInstance  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/xpcom/components/nsComponentManagerObsolete.cpp, line 103]
nsWindow::OnPaint  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 5039]
nsWindow::ProcessMessage  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 3825]
nsWindow::WindowProc  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 1349]
USER32.DLL + 0x1ef0 (0x77e11ef0)
USER32.DLL + 0x3869 (0x77e13869)
USER32.DLL + 0x38ab (0x77e138ab)
ntdll.dll + 0x1ff57 (0x77f9ff57)
USER32.DLL + 0x21af (0x77e121af)
nsAppShellService::Run  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 495]
main  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/mail/app/nsMailApp.cpp, line 58]
KERNEL32.DLL + 0x11af6 (0x7c581af6)
Summary: Crash in mail.dll when checking mail → Crash in mail.dll when checking mail [@ nsTransform2D::SetToIdentity ]
(In reply to comment #12)
Argh, no.  This bug is about invalid floating point state that is being
triggered by the Bayesian filter, not about one specific stack trace.  It just
so happens that nsTransform2D::SetToIdentity is one of the more common floating
point functions being called after we leave the Bayesian filter.  Tagging this
as a nsTransform2D::SetToIdentity bug is a red herring.
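The floating-point status flags are sticky. Once an invalid operation sets FE_INVALID, the flag stays set until something clears it, no matter which function runs next. The sketch below illustrates this with two hypothetical stand-ins, `hypothetical_filter_step` and `hypothetical_later_fp_code`; neither is the real classifier or nsTransform2D code. It does not reproduce the crash itself, which would also require the invalid-operation exception to be unmasked through platform-specific calls such as MSVC's `_controlfp`. It only shows why a check made later can point at code that did nothing wrong. The same `volatile` assumption as any <cfenv> experiment applies.

```cpp
#include <cfenv>

// Hypothetical stand-in for a classifier step: 0/0 when both counts are zero,
// which yields NaN and raises FE_INVALID.
double hypothetical_filter_step(double spam, double ham) {
    volatile double s = spam;
    volatile double h = ham;
    return s / (s + h);
}

// Hypothetical stand-in for unrelated floating-point code that runs afterwards.
// The operation itself is perfectly valid.
double hypothetical_later_fp_code() {
    volatile double one = 1.0;
    return one * 2.0;
}

// Clears the flags, runs the "filter" step and then the later code, and reports
// whether FE_INVALID is still set once the later code has finished.
bool invalid_flag_survives() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double p = hypothetical_filter_step(0.0, 0.0);
    (void)p;
    volatile double q = hypothetical_later_fp_code();
    (void)q;
    return std::fetestexcept(FE_INVALID) != 0;
}
```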

I have a hunch this is related to a floating-point optimization in the build
flags that is failing on, say, AMD vs. Intel chips.  Note that the failure is
NOT a GPF but an Invalid Operation and *always* on FP code.  This is key.
Adding TB073 and topcrash keyword.  This is a topcrasher for Thunderbird 0.7.3
(currently #7):
http://talkback-public.mozilla.org/reports/thunderbird/TB073/TB073-topcrashers.html
Keywords: topcrash
Summary: Crash in mail.dll when checking mail [@ nsTransform2D::SetToIdentity ] → Crash in mail.dll when checking mail - TB073 [@ nsTransform2D::SetToIdentity ]
Hopefully we can get some traction on this for 0.8.
Status: NEW → ASSIGNED
Target Milestone: --- → Thunderbird0.8
Here's the best breakdown I could come up with for
"nsTransform2D::SetToIdentity" crashes...by processor type and brand/vendor:

Processor type (subgroups)   Vendor         Incidents
80x86 (1)                    GenuineIntel         124
                             Total                124
Pentium (2)                  AuthenticAMD         149
                             GenuineIntel          81
                             Total                230
Pentium II (2)               AuthenticAMD           5
                             GenuineIntel           2
                             Total                  7

Doesn't look like Talkback collects any more details about the type of processor.
There's a pretty good chance the patch in Bug #244357 will fix this crash but I
haven't had time to regression test it on the junk scores it generates.
I'm not sure bug 244357 will fix this problem. For the incident involving 
nsBayesianFilter.cpp I would look at this code.

    /* this part is similar to the Graham algorithm with some adjustments. */
    PRUint32 i, goodclues=0, count = tokenizer.countTokens();
--> double ngood = mGoodCount, nbad = mBadCount, prob;

    for (i = 0; i < count; ++i)
    {
        Token& token = tokens[i];
        const char* word = token.mWord;
        Token* t = mGoodTokens.get(word);
        double hamcount = ((t != NULL) ? t->mCount : 0);
        t = mBadTokens.get(word);
        double spamcount = ((t != NULL) ? t->mCount : 0);
-->     prob = (spamcount / nbad) / (hamcount / ngood + spamcount / nbad);
        double n = hamcount + spamcount;
        prob = (0.225 + n * prob) / (.45 + n);
        ...

How do you know ngood and nbad are non-zero?
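This matters because of what the marked per-token line evaluates to when one of the totals is zero. Below is a minimal sketch, assuming the counts are plain non-negative doubles as in the quoted loop. `token_prob_as_written` is a hypothetical wrapper, not a function in nsBayesianFilter.cpp. Note that nbad == 0 yields NaN even when the token has a nonzero spam count, because inf / inf is itself an invalid operation.

```cpp
// The marked per-token line from the quoted loop, wrapped in a hypothetical function.
//   nbad == 0, spamcount > 0:  spamcount / nbad is +inf, the denominator is then
//                              +inf as well (given ngood > 0), and inf / inf is NaN.
//   nbad == 0, spamcount == 0: spamcount / nbad is 0 / 0, already NaN.
//   ngood == 0, hamcount == 0: hamcount / ngood is 0 / 0, NaN again.
double token_prob_as_written(double hamcount, double spamcount,
                             double ngood, double nbad) {
    return (spamcount / nbad) / (hamcount / ngood + spamcount / nbad);
}
```

With ngood = nbad = 10, hamcount = 1 and spamcount = 3, the function returns 0.75 (up to rounding). Set nbad to 0 and the same call returns NaN.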

Also, the second marked line should probably be rewritten to eliminate some of
the divisions:

  prob = (spamcount * ngood) / (hamcount * nbad + spamcount * ngood)

I think.
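The rewrite comes from multiplying the numerator and denominator by ngood * nbad. This turns (spamcount / nbad) / (hamcount / ngood + spamcount / nbad) into (spamcount * ngood) / (hamcount * nbad + spamcount * ngood), which is valid for positive ngood and nbad. Below is a quick C++ check that the two forms agree; the function names are hypothetical. Both forms still evaluate 0 / 0 when hamcount and spamcount are both zero, so the rewrite removes divisions but not that case.

```cpp
#include <cmath>

// Form used in the quoted loop: spamcount / nbad is computed twice.
double prob_divided(double hamcount, double spamcount, double ngood, double nbad) {
    return (spamcount / nbad) / (hamcount / ngood + spamcount / nbad);
}

// Suggested rewrite: numerator and denominator multiplied by ngood * nbad,
// leaving a single division. Equivalent only when ngood and nbad are positive.
double prob_single_division(double hamcount, double spamcount, double ngood, double nbad) {
    return (spamcount * ngood) / (hamcount * nbad + spamcount * ngood);
}

// True when both forms give the same value (or are both NaN).
bool forms_agree(double hamcount, double spamcount, double ngood, double nbad) {
    double a = prob_divided(hamcount, spamcount, ngood, nbad);
    double b = prob_single_division(hamcount, spamcount, ngood, nbad);
    if (std::isnan(a) || std::isnan(b)) {
        return std::isnan(a) && std::isnan(b);
    }
    return std::fabs(a - b) <= 1e-12;
}
```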
I missed the more obvious issue. If both lookups return null (the token is in
neither table), then hamcount and spamcount are both zero and you have a problem.
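For a token found in neither table, the marked division becomes 0 / 0. The resulting NaN then survives the smoothing step, because 0 * NaN is still NaN. Below is a minimal sketch of one unguarded loop iteration. The function name is hypothetical; the constants 0.225 and 0.45 are the ones in the quoted code.

```cpp
// One iteration of the quoted loop body with no guard: the marked division
// followed by the smoothing step prob = (0.225 + n * prob) / (.45 + n).
double smoothed_prob_unguarded(double hamcount, double spamcount,
                               double ngood, double nbad) {
    double prob = (spamcount / nbad) / (hamcount / ngood + spamcount / nbad);
    double n = hamcount + spamcount;
    // For a token in neither table, n == 0 and prob is NaN (0 / 0).
    // 0 * NaN is still NaN, so smoothing cannot rescue it, even though
    // (0.225 + 0) / (0.45 + 0) would otherwise be a neutral 0.5.
    return (0.225 + n * prob) / (0.45 + n);
}
```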
Maybe something like:

  double denom = hamcount * nbad + spamcount * ngood;
  if (denom == 0.0)
  {
      // do something useful, but I don't know what
      continue;
  }
  else
      prob = (spamcount * ngood) / denom;
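The guard idea here can be sketched as a standalone C++17 function. It takes the denominator from the single-division form, hamcount * nbad + spamcount * ngood, computes it once, and skips the token when it is zero. `token_prob_guarded` is a hypothetical name. Returning an empty `std::optional` is just one way to express "skip this token"; what a skipped token should contribute was left open in the comment.

```cpp
#include <optional>

// Hypothetical helper: per-token probability with a zero-denominator guard,
// followed by the same smoothing step as the quoted loop.
// Returns std::nullopt where the loop would `continue`.
std::optional<double> token_prob_guarded(double hamcount, double spamcount,
                                         double ngood, double nbad) {
    double denom = hamcount * nbad + spamcount * ngood;
    if (denom == 0.0) {
        return std::nullopt;  // caller skips this token
    }
    double prob = (spamcount * ngood) / denom;
    double n = hamcount + spamcount;
    return (0.225 + n * prob) / (0.45 + n);
}
```

For finite, non-negative counts, a non-zero denominator is strictly positive and at least spamcount * ngood. The per-token value therefore stays in [0, 1], and neither the division nor the smoothing step can produce NaN or infinity.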
David Cuthbert, how easy is it for you to run into this crash? If we checked in
some potential fixes can you use the build and say within a day or two that the
crash is gone? Or is it not that frequent?
(In reply to comment #21)
Oh, easily.  I get enough spam that repeatedly testing this is trivial.

Hm, a good test might be to back up the profile and download the same set of
mail between the two versions (the idea being that the old one crashes, new one
doesn't).
Here's a possible patch based on some comments by tenthumbs to avoid a possible
division by zero situation.
Comment on attachment 156822 [details] [diff] [review]
possible fix to protect against a division by zero

tenthumbs, what do you think of this?
Attachment #156822 - Flags: review?(tenthumbs)
David C, I just checked this potential fix into the 0.8 branch in the hopes
that you can grab a build with the fix and see if it does indeed address the
problem.

Can you please look for a 0.8 test build here:

http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.8/

You'll need to wait until builds for August 24th come out. Thanks!
Whiteboard: fixed-aviary1.0
Yep; I'll check it when it comes out.
Are you absolutely positive that ngood and nbad can never simultaneously be
zero? I can't really see it from the code.

The original Graham algorithm actually does this:

  n1 = min(1, spamcount / nbad);
  d1 = min(1, hamcount / ngood);
  d2 = min(1, spamcount / nbad);
  prob = n1 / (d1 + d2);

which would catch ngood or nbad being zero. That's inefficient and could throw 
exceptions but maybe it's useful. I'm not sure, though.
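Below is a direct C++ transcription of the clamped form quoted in this comment, using `std::min`; the function name is hypothetical. It keeps the result finite when a total is zero, although the divisions inside `std::min` still raise the floating-point flags, as the comment warns. Running it also exposes two caveats:

- A token with zero counts and positive totals still computes 0 / 0 in the final division.
- A NaN inside `std::min(1.0, x)` is replaced by 1.0 only because `std::min` returns its first argument when the comparison is false. Swapping the arguments would let the NaN through.

```cpp
#include <algorithm>

// Clamped ratio as quoted above: each term is capped at 1 before combining.
double graham_style_prob(double hamcount, double spamcount,
                         double ngood, double nbad) {
    // std::min(1.0, x) returns 1.0 when x is NaN, because NaN < 1.0 is false.
    double n1 = std::min(1.0, spamcount / nbad);
    double d1 = std::min(1.0, hamcount / ngood);
    double d2 = std::min(1.0, spamcount / nbad);
    // Still 0 / 0 when both counts are zero and both totals are positive.
    return n1 / (d1 + d2);
}
```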
FYI David, the builds are now out:
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.8/
Ok... no good idea whether the fix helped or not yet.  Here's what I did:

--- Verify that we have a proper testcase ---
1. Downloaded my mail (93 messages, ~500k, mostly spam) to a Linux host.  Copied
the mail so we can fool the POP3 server into hosting it multiple times.
2. In the buggy TB, I enabled junk mail filtering, and then closed the application.
3. Copied my Thunderbird profile to a backup so we could restore the state
(C:\Documents and Settings\dacut\Application Data\Thunderbird -> Thunderbird_orig)
4. Started up the buggy TB.  Grabbed mail from the POP3 server.  It crashed with
the same ol' talkback trace (TB645214 if you're curious).

Ok, at this point we know that we have a testcase which causes the bug.

--- Verify the fix ---
5. Installed the nightly into a different directory (C:\Program Files\thundertest).
6. Delete my Thunderbird profile, restore from Thunderbird_orig.
7. Restore my mail on the POP3 server.
8. Start up the nightly (from the command line to ensure I'm not starting the
buggy version).
9. Download POP3 mail.  No crash, lots of spam identified and properly filtered.

Ooh, ok, this looks promising!

10. Shut down the nightly.

--- Sanity check: make sure buggy version still fails ---
11. Restore TB profile.
12. Restore mail.
13. Start up the buggy version.
14. Download mail.  This time, however, no crash, and spam is again identified,
just as if I had run the nightly.

Hm.  Puzzling.  Perhaps it's picking up a component from the nightly?

15. Shut TB down.
16. Delete nightly install.
17. Restore TB profile
18. Restore mail.
19. Start up the buggy version again.
20. Download mail.  Again, no crash, spam identified properly.


I'm... stumped.  Is there anything stochastic about the spam classifier (e.g.,
using a random variable seeded by the timer)?  If I rerun the buggy version
multiple times (from the restored mail+profile), could I encounter the crash?

Does the spam classifier store any state in the registry (in addition to
training.dat)?

Also, the nightly reports its version as 0.7.0.  I'm assuming this is because
the number simply hasn't been updated... if I grabbed a bum build, let me know.
I think we fixed this. Optimistically marking this fixed, as it's now on the
trunk and the branch.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Attachment #156822 - Flags: review?(tenthumbs)
Crash Signature: [@ nsTransform2D::SetToIdentity ]