Crash in mail.dll when checking mail - TB073 [@ nsTransform2D::SetToIdentity ]

RESOLVED FIXED in Thunderbird0.8

Status

Product: Thunderbird
Component: Mail Window Front End
Priority: --
Severity: critical
Status: RESOLVED FIXED
Reported: 14 years ago
Modified: 4 years ago

People

(Reporter: jay garcia, Assigned: Scott MacGregor)

Tracking

({crash, topcrash})

Version: unspecified
Target Milestone: Thunderbird0.8
Platform: x86, Windows 98
Keywords: crash, topcrash

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-aviary1.0, crash signature)

Attachments

(1 attachment)

(Reporter)

Description

14 years ago
Mozilla Thunderbird 0.6+ (20040417) Win98SE

Crash in MAIL.DLL when checking mail, manual or auto-check. No error message,
just the crash.

Not 100% reproducible, 75% would be a more accurate figure.

Updated

14 years ago
Severity: normal → critical
Keywords: crash

Comment 1

14 years ago
(In reply to comment #0)
> Mozilla Thunderbird 0.6+ (20040417) Win98SE
> 
> Crash in MAIL.DLL when checking mail, manual or auto-check. No error message,
> just the crash.
> 
> Not 100% reproducible, 75% would be a more accurate figure.

I have the same issue.  It appears to be caused by the adaptive Junk Mail 
Controls.

Comment 2

14 years ago
jay garcia: Could you reproduce with Thunderbird 0.7.x? Could you provide
TalkBack id in such case?

Comment 3

14 years ago
I've now disabled junk mail detection for a week or so; in that time, I have not
had a single crash.  Certainly seems related...

Comment 4

14 years ago
I forgot to mention... I'm running 0.7.2 on Win2k.

Comment 5

14 years ago
can someone please turn on talkback, turn on junk mail filtering, let it crash
and send the talkback and post the talkback number here pretty please? ;)

Comment 6

14 years ago
Of course, now I can't seem to get it to crash.

However, here are some talkback IDs, in reverse chronological order:
TB451242H  (30 July)
TB450929Q  (30 July)
TB436895X  (29 July)
TB436804Z  (29 July)
TB435573G  (28 July)
TB435141X  (28 July)
TB433148M  (28 July)
TB420253X  (26 July)
TB368206Z  (19 July)
TB360832E  (18 July)
TB354886M  (18 July)
TB351673Z  (17 July)

Comment 8

14 years ago
(In reply to comment #7)
Well, the only crash I've seen has been caused by downloading mail.  However, I
did have some follow-on crashes related to Talkback (where Thunderbird would
crash, Talkback would hang, lots of task zaniness ensues requiring Task Manager
to kill things off).

Besides, I would be surprised if things like nsTransform2D::SetToIdentity,
nsViewManager::DispatchEvent, or TimerThread::UpdateFilter had segv-like bugs in
their implementation -- most of TB wouldn't work.  This makes me suspect the
Bayesian filter is causing memory corruption (I'd need to run a purified version
to test this hypothesis).

Comment 9

14 years ago
Wait a minute.  These are all "invalid operation" exceptions, and they're all
occurring in methods which use floating-point operations.  (The line numbers in
the stack traces are off, not sure why...)

I'm running on an Athlon XP machine, nothing terribly out of the ordinary
(except that it's old).  Not overclocking, overtweaking, overanything.  I've
seen similar errors (floating point exceptions) in Acrobat Reader.

Hmm.

Comment 10

14 years ago
Ok, just got it again.  Here's the talkback:
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=548292

Again, this shows the error happening in nsTransform2D::SetToIdentity().

Very weird.

Win2k (5.00.2195), sp 4, Athlon XP 1800+, 1GB of RAM.

Comment 11

14 years ago
Hmm.  I see 308 talkbacks with the SetToIdentity() trace, and a lot of them
mention "downloading mail."

http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=1&searchby=stacksig&match=begins&searchfor=nstransform2d%3A%3ASetToIdentity

However, I believe that downloading mail isn't the crux of the problem; it's
something in the way we're doing floating-point.  To dig any deeper, I'm going
to need to explore this in a debugger (which means updating my version of MSVC
from the ancient 5.0...).

Comment 12

14 years ago
Okay, this bug is about a crash in nsTransform2D::SetToIdentity.
Chris, do you have any idea what situation would crash nsTransform2D?

TB548292:
nsTransform2D::SetToIdentity  [../../../dist/include/gfx/nsTransform2D.h, line 89]
nsRenderingContextWinConstructor  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/gfx/src/windows/nsGfxFactoryWin.cpp, line 63]
nsComponentManager::CreateInstance  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/xpcom/components/nsComponentManagerObsolete.cpp, line 103]
nsWindow::OnPaint  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 5039]
nsWindow::ProcessMessage  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 3825]
nsWindow::WindowProc  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/widget/src/windows/nsWindow.cpp, line 1349]
USER32.DLL + 0x1ef0 (0x77e11ef0)
USER32.DLL + 0x3869 (0x77e13869)
USER32.DLL + 0x38ab (0x77e138ab)
ntdll.dll + 0x1ff57 (0x77f9ff57)
USER32.DLL + 0x21af (0x77e121af)
nsAppShellService::Run  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 495]
main  [e:/builds/tbird-0.7.2/WINNT_5.0_Clobber/mozilla/mail/app/nsMailApp.cpp, line 58]
KERNEL32.DLL + 0x11af6 (0x7c581af6)
Summary: Crash in mail.dll when checking mail → Crash in mail.dll when checking mail [@ nsTransform2D::SetToIdentity ]

Comment 13

14 years ago
(In reply to comment #12)
Argh, no.  This bug is about invalid floating point state that is being
triggered by the Bayesian filter, not about one specific stack trace.  It just
so happens that nsTransform2D::SetToIdentity is one of the more common floating
point functions being called after we leave the Bayesian filter.  Tagging this
as a nsTransform2D::SetToIdentity bug is a red herring.

I have a hunch this is related to a floating-point optimization in the build
flags that is failing on, say, AMD vs. Intel chips.  Note that the failure is
NOT a GPF but an Invalid Operation and *always* on FP code.  This is key.

Comment 14

14 years ago
Adding TB073 and topcrash keyword.  This is a topcrasher for Thunderbird 0.7.3
(currently #7):
http://talkback-public.mozilla.org/reports/thunderbird/TB073/TB073-topcrashers.html
Keywords: topcrash
Summary: Crash in mail.dll when checking mail [@ nsTransform2D::SetToIdentity ] → Crash in mail.dll when checking mail - TB073 [@ nsTransform2D::SetToIdentity ]
(Assignee)

Comment 15

14 years ago
Hopefully we can get some traction on this for 0.8.
Status: NEW → ASSIGNED
Target Milestone: --- → Thunderbird0.8

Comment 16

14 years ago
Here's the best breakdown I could come up with for
"nsTransform2D::SetToIdentity" crashes...by processor type and brand/vendor:

Processor / vendor          Incidents
80x86 (1 subgroup)                124
  GenuineIntel                    124

Pentium (2 subgroups)             230
  AuthenticAMD                    149
  GenuineIntel                     81

Pentium II (2 subgroups)            7
  AuthenticAMD                      5
  GenuineIntel                      2

Doesn't look like Talkback collects any more details about the type of processor.
(Assignee)

Comment 17

14 years ago
There's a pretty good chance the patch in Bug #244357 will fix this crash but I
haven't had time to regression test it on the junk scores it generates.

Comment 18

14 years ago
I'm not sure bug 244357 will fix this problem. For the incident involving 
nsBayesianFilter.cpp I would look at this code.

    /* this part is similar to the Graham algorithm with some adjustments. */
    PRUint32 i, goodclues=0, count = tokenizer.countTokens();
--> double ngood = mGoodCount, nbad = mBadCount, prob;

    for (i = 0; i < count; ++i)
    {
        Token& token = tokens[i];
        const char* word = token.mWord;
        Token* t = mGoodTokens.get(word);
        double hamcount = ((t != NULL) ? t->mCount : 0);
        t = mBadTokens.get(word);
        double spamcount = ((t != NULL) ? t->mCount : 0);
-->     prob = (spamcount / nbad) / ( hamcount / ngood + spamcount / nbad);
        double n = hamcount + spamcount;
        prob =  (0.225 + n * prob) / (.45 + n);
        ...

How do you know ngood and nbad are non-zero?

Also, the second marked line should probably be written to eliminate some of
the divisions

  prob = (spamcount * ngood)/(hamcount *nbad + spamcount * ngood)

I think.
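
To check the algebra behind that rewrite, here is a minimal C++ sketch comparing the two forms. The function names are hypothetical and not taken from nsBayesianFilter.cpp; only the two formulas come from the comment above. The rewritten form is the original with numerator and denominator both multiplied by ngood * nbad, so the two should agree whenever ngood and nbad are non-zero.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper names; only the two formulas come from the thread.

// Form used in the quoted excerpt: three divisions.
double prob_original(double hamcount, double spamcount,
                     double ngood, double nbad) {
    return (spamcount / nbad) / (hamcount / ngood + spamcount / nbad);
}

// Proposed form: numerator and denominator multiplied by ngood * nbad,
// leaving a single division.
double prob_rewritten(double hamcount, double spamcount,
                      double ngood, double nbad) {
    return (spamcount * ngood) / (hamcount * nbad + spamcount * ngood);
}

// True when both forms agree to within a small relative tolerance.
// Only meaningful when ngood and nbad are non-zero.
bool forms_agree(double hamcount, double spamcount,
                 double ngood, double nbad) {
    assert(ngood > 0.0 && nbad > 0.0);
    const double a = prob_original(hamcount, spamcount, ngood, nbad);
    const double b = prob_rewritten(hamcount, spamcount, ngood, nbad);
    return std::fabs(a - b) <= 1e-12 * std::fabs(a);
}
```

For example, with hamcount = 3, spamcount = 5, ngood = 100 and nbad = 200, both forms reduce to 500/1100.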

Comment 19

14 years ago
I missed the more obvious issue. If t is null then both hamcount and spamcount 
are zero and you have a problem.
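
As a small illustration of why that case matters, the sketch below (standard C++11 `<cfenv>`, hypothetical function names) checks which IEEE 754 exception flag a division raises. It assumes an IEEE 754 platform where the floating-point status flags are observable; on such a platform 0.0 / 0.0 raises "invalid operation", while a non-zero value divided by 0.0 raises "divide by zero" instead. Whether a raised flag turns into a crash depends on whether the exception is unmasked in the FPU control word, which this sketch does not touch.

```cpp
#include <cfenv>

// Hypothetical helpers. The volatile qualifiers keep the compiler from
// folding the divisions at compile time, so the flags are set at run time.

// True if 0.0 / 0.0 raises the IEEE 754 "invalid operation" flag.
bool zero_over_zero_raises_invalid() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double zero = 0.0;
    volatile double result = zero / zero;  // NaN on IEEE 754 hardware
    (void)result;
    return std::fetestexcept(FE_INVALID) != 0;
}

// True if 1.0 / 0.0 raises "divide by zero" and not "invalid operation".
bool one_over_zero_raises_divbyzero_only() {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double zero = 0.0;
    volatile double one = 1.0;
    volatile double result = one / zero;   // +infinity on IEEE 754 hardware
    (void)result;
    return std::fetestexcept(FE_DIVBYZERO) != 0 &&
           std::fetestexcept(FE_INVALID) == 0;
}
```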

Comment 20

14 years ago
Maybe something like

  double denom = hamcount * nbad + spamcount * ngood;
  if (denom == 0.0)
  {
      // do something useful, but I don't know what
      continue;
  }
  else
      prob = (spamcount * ngood) / denom;
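
The guard above can be sketched as a standalone function. This is only an illustration: `token_probability` is a hypothetical name, it takes denom to be just the denominator hamcount * nbad + spamcount * ngood, and skipping the token is an assumed stand-in for "do something useful", not necessarily what the checked-in patch does.

```cpp
#include <cassert>

// Hypothetical helper. Writes the raw token probability to *prob and
// returns true, or returns false when the denominator is zero, in which
// case the caller is expected to skip the token.
bool token_probability(double hamcount, double spamcount,
                       double ngood, double nbad, double* prob) {
    assert(prob != nullptr);
    const double denom = hamcount * nbad + spamcount * ngood;
    if (denom == 0.0) {
        return false;  // would otherwise divide by zero; skip this token
    }
    *prob = (spamcount * ngood) / denom;
    return true;
}
```

In the loop from the quoted excerpt, a false return would correspond to the `continue` branch.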
(Assignee)

Comment 21

14 years ago
David Cuthbert, how easy is it for you to run into this crash? If we checked in
some potential fixes can you use the build and say within a day or two that the
crash is gone? Or is it not that frequent?

Comment 22

14 years ago
(In reply to comment #21)
Oh, easily.  I get enough spam that repeatedly testing this is trivial.

Hm, a good test might be to back up the profile and download the same set of
mail between the two versions (the idea being that the old one crashes, new one
doesn't).
(Assignee)

Comment 23

14 years ago
Created attachment 156822
possible fix to protect against a division by zero

Here's a possible patch based on some comments by tenthumbs to avoid a possible
division by zero situation.
(Assignee)

Comment 24

14 years ago
Comment on attachment 156822
possible fix to protect against a division by zero

tenthumbs, what do you think of this?
Attachment #156822 - Flags: review?(tenthumbs)
(Assignee)

Comment 25

14 years ago
David C, I just checked in this potential fix into the 0.8 branch in the hopes
that you can grab a build with the fix and see if it does indeed address the
problem.

Can you please look for a 0.8 test build here:

http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.8/

You'll need to wait until builds for August 24th come out. Thanks!
Whiteboard: fixed-aviary1.0

Comment 26

14 years ago
Yep; I'll check it when it comes out.

Comment 27

14 years ago
Are you absolutely positive that ngood and nbad can never simultaneously be
zero? I can't really see it from the code.

The original Graham algorithm actually does this.

  n1 = min(1, spamcount / nbad);
  d1 = min(1, hamcount / ngood);
  d2 = min(1, spamcount / nbad);
  prob = n1 / (d1 + d2);

which would catch ngood or nbad being zero. That's inefficient and could throw 
exceptions but maybe it's useful. I'm not sure, though.
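
A minimal C++ sketch of the clamped form as quoted above, assuming IEEE 754 double arithmetic with floating-point exceptions masked, so that dividing by zero yields infinity or NaN rather than trapping. `graham_prob` is a hypothetical name. With ngood equal to zero and a non-zero hamcount, the ratio is +infinity and the clamp brings it back to 1. The clamp does not help when a token has both counts at zero: the result is still 0.0 / 0.0.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper implementing the four lines quoted in the comment.
// Assumes IEEE 754 semantics with exceptions masked.
double graham_prob(double hamcount, double spamcount,
                   double ngood, double nbad) {
    const double n1 = std::min(1.0, spamcount / nbad);
    const double d1 = std::min(1.0, hamcount / ngood);
    const double d2 = std::min(1.0, spamcount / nbad);
    return n1 / (d1 + d2);
}

// True when a token unseen in both corpora still produces NaN,
// because n1, d1 and d2 are all zero.
bool unseen_token_gives_nan() {
    return std::isnan(graham_prob(0.0, 0.0, 100.0, 200.0));
}
```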
(Assignee)

Comment 28

14 years ago
FYI David, the builds are now out:
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-0.8/

Comment 29

14 years ago
Ok... no good idea whether the fix helped or not yet.  Here's what I did:

--- Verify that we have a proper testcase ---
1. Downloaded my mail (93 messages, ~500k, mostly spam) to a Linux host.  Copied
the mail so we can fool the POP3 server into hosting it multiple times.
2. In the buggy TB, I enabled junk mail filtering, and then closed the application.
3. Copied my Thunderbird profile to a backup so we could restore the state
(C:\Documents and Settings\dacut\Application Data\Thunderbird -> Thunderbird_orig)
4. Start up the buggy TB.  Grabbed mail from the POP3 server.  It crashed with
the same ol' talkback trace (TB645214 if you're curious).

Ok, at this point we know that we have a testcase which causes the bug.

--- Verify the fix ---
5. Installed the nightly into a different directory (C:\Program Files\thundertest).
6. Delete my Thunderbird profile, restore from Thunderbird_orig.
7. Restore my mail on the POP3 server.
8. Start up the nightly (from the command line to ensure I'm not starting the
buggy version).
9. Download POP3 mail.  No crash, lots of spam identified and properly filtered.

Ooh, ok, this looks promising!

10. Shut down the nightly.

--- Sanity check: make sure buggy version still fails ---
11. Restore TB profile.
12. Restore mail.
13. Start up the buggy version.
14. Download mail.  This time, however, no crash, and spam is again identified,
just as if I had run the nightly.

Hm.  Puzzling.  Perhaps it's picking up a component from the nightly?

15. Shut TB down.
16. Delete nightly install.
17. Restore TB profile
18. Restore mail.
19. Start up the buggy version again.
20. Download mail.  Again, no crash, spam identified properly.


I'm... stumped.  Is there anything stochastic about the spam classifier (e.g.,
using a random variable seeded by the timer)?  If I rerun the buggy version
multiple times (from the restored mail+profile), could I encounter the crash?

Does the spam classifier store any state in the registry (in addition to
training.dat)?

Also, the nightly reports its version as 0.7.0.  I'm assuming this is because
the number simply hasn't been updated... if I grabbed a bum build, let me know.
(Assignee)

Comment 30

14 years ago
I think we fixed this. Optimistically marking this fixed as it's now on the
trunk and the branch.
Status: ASSIGNED → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED

Updated

13 years ago
Attachment #156822 - Flags: review?(tenthumbs)
Crash Signature: [@ nsTransform2D::SetToIdentity ]