Closed Bug 308277 Opened 19 years ago Closed 18 years ago

Norman Virus Control crashes thunderbird [@ Niphk.dll + 0x2bb3]

Categories

(NSPR :: NSPR, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bugzilla, Assigned: wtc)

References

(Depends on 1 open bug, )

Details

(Keywords: crash, relnote)

Crash Data

Attachments

(2 files, 2 obsolete files)

thunderbird in the background suddenly crashed

TB9297895W
http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=TB9297895W

Thunderbird version 1.6a1 (20050912) nightly build
http://forum.norman.com/viewtopic.php?
p=1211&sid=9f76723bd1c492d1ec7b162c981ec1d8

Incident ID: 9297895 
Stack Signature Niphk.dll + 0x2bb3 (0x678a2bb3) 0f68639a 
Product ID ThunderbirdTrunk 
Build ID 2005091206 
Trigger Time 2005-09-12 22:34:02.0 
Platform Win32 
Operating System Windows NT 5.1 build 2600 
Module Niphk.dll + (00002bb3) 
URL visited  
User Comments  
Since Last Crash 2561 sec 
Total Uptime 2561 sec 
Trigger Reason Access violation 
Source File, Line No. N/A 
Stack Trace  

Niphk.dll + 0x2bb3 (0x678a2bb3)
_PR_MD_RECV  [e:/builds/tinderbox/thunderbird-
trunk/WINNT_5.0_Depend/mozilla/nsprpub/pr/src/md/windows/w95sock.c, line 214]
Summary: thunderbird in the background died → Norman Virus Control crashes thunderbird [@ Niphk.dll + 0x2bb3]
Assignee: mscott → nobody
Severity: normal → critical
Component: Mail Window Front End → MailNews: Networking
Keywords: relnote
Product: Thunderbird → Core
QA Contact: grylchan
NipHK.dll is not part of thunderbird - we can't really control crashes in dlls
that aren't part of our product.
invalid?
that's not true.

at this point, i think nspr probably needs to add SEH around all calls it makes to this series of win32api's.

we used to do that for plugins (until someone improved the compiler or something).
Assignee: nobody → wtchang
Component: MailNews: Networking → NSPR
Product: Core → NSPR
QA Contact: grylchan → wtchang
Version: Trunk → 4.6
The three Talkback URLs in this bug report all say:

  Incident Not Found.

  DB error looking for Incident ID: 9297895

The forum.norman.com link points to a page that doesn't
mention "Mozilla", "Firefox", or "Thunderbird".

The only info useful to me is the NSPR source file and
line number at the end of comment 1.  It tells me NSPR
is calling the Winsock "recv" function.

I don't know what else I can do to fix this.
see bug 348170 and friends.

basically for windows you need a flavor of try// around the call to wsock recv :)
Attached patch Catch the error using SEH (obsolete) — Splinter Review
This patch include NSPR Guard macros, and uses them to catch the error by means of SEH.  This patch is not tested.
Timeless in comment #6
> see bug 348170 and friends.

by friends you mean bug 156493 and bug ?____? 
nope, various bugs about SEH, not threading or ipc.
The previous patch (attachment #235086 [details] [diff] [review]) contained a bug which caused the linker to fail because of a non-existing reference to MessageBoxA (in user32.lib).  The new patch includes code to automatically link in user32.lib for all source files which #include "prguard.h".
Attachment #235086 - Attachment is obsolete: true
The BEGIN_GUARD and END_GUARD macros should be defined
in the NSPR-internal header file mozilla/nsprpub/pr/include/md/_win95.h.
When the macros are defined there, we only need to handle
the two possible C compilers: MSVC and GCC (MinGW).  We
don't need to handle __cplusplus or non-Windows compilers.

NSPR may be used by command-line applications, so it should
not call MessageBox to report an error.  In this case you
just need to call WSASetLastError.

Isn't this really a bug in Norman Virus Control?  Its Niphk.dll
should not throw an exception.  Has this crash been reported
to Norman Virus Control?  I'm not convinced that NSPR should
catch exceptions.
wtc: i've seen probably half a dozen *different* apps cause crashes like this by messing up winsock.

while it's their fault, our users shouldn't suffer.
Depends on: 350087
(In reply to comment #11)
> The BEGIN_GUARD and END_GUARD macros should be defined
> in the NSPR-internal header file mozilla/nsprpub/pr/include/md/_win95.h.

Well, I chose the public include directory, because these macros will actually be used in many other places besides NSPR internal source files, see bug #329299, bug #325756, bug #317420, bug #314723, etc.  In fact, it may be used in any source file which is calling a Win32 API function (or calls into other APIs, such as plugins).

> When the macros are defined there, we only need to handle
> the two possible C compilers: MSVC and GCC (MinGW).  We
> don't need to handle __cplusplus or non-Windows compilers.

For now, I'm not sure how to do SEH in MinGW, so the current code only handles MSVC (any gcc gurus out there who can help?).  The __cplusplus handling is there because SEH in C and C++ are different.  See my comments in bug #348170.  For C++, the best option is enabling the -EHac- compiler option (see attachment #233455 [details] [diff] [review]) and then using try/catch(...) blocks, while in C, those are not available, and __try/__except must be used.

> NSPR may be used by command-line applications, so it should
> not call MessageBox to report an error.  In this case you
> just need to call WSASetLastError.

Thanks for mentioning it.  I'm preparing a new patch which addresses this issue.  I'll send it shortly.

> Isn't this really a bug in Norman Virus Control?  Its Niphk.dll
> should not throw an exception.  Has this crash been reported
> to Norman Virus Control?  I'm not convinced that NSPR should
> catch exceptions.

I agree with timeless in comment #12.
This patch adds support for console applications to address the issue raised in comment #11.
Attachment #235220 - Attachment is obsolete: true
Thank you very much for your patch.  But I don't want to
catch exceptions thrown by code that should not throw
exceptions.

In the Microsoft documentation of the recv() function
(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/recv_2.asp),
search for "except", "try", and "catch".  You will find
none of these.  So the code in question (a Winsock service
provider?) failed to conform to the specification.  It is
Norman Virus Control's responsibility to fix the code that
throws exception.

I also did a simple experiment:

#include <stdio.h>
#include <windows.h>

int main()
{
    int *p = 0;

    __try {
        *p = 0;
    } __except(EXCEPTION_EXECUTE_HANDLER) {
        printf("exception caught\n");
    }
    return 0;
}

This experiment shows that the proposed patch will even mask
null pointer dereferencing, a serious programming error.  I'm
sorry that I can't accept this patch.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WONTFIX
Target Milestone: --- → 4.7
wtc: that's precisely the point.

i'm an end user. i so don't care that some lame module in my system is crashing. i want my web browser to work.
(In reply to comment #15)
> Thank you very much for your patch.  But I don't want to
> catch exceptions thrown by code that should not throw
> exceptions.
> 
> In the Microsoft documentation of the recv() function
> (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winsock/winsock/recv_2.asp),
> search for "except", "try", and "catch".  You will find
> none of these.  So the code in question (a Winsock service
> provider?) failed to conform to the specification.  It is
> Norman Virus Control's responsibility to fix the code that
> throws exception.

Quite true.  But the point is, what from Normal Virus Control's developers' point of view is an error that should be fixed, from our point of view is an error preventing a network communication.  The current behavior of Mozilla (including Thunderbird, Firefox, etc.) in response to this error is terminating the whole application, while I think that the expected behavior should be a message to the user that there's an unknown error in network communications, or something like that.  In other words, the end users should not suffer from the technicalities (because it's an exception raised in Normal Virus Control's code, we won't do anything about it), and instead, get the behavior they usually expect (getting an error message which gives them some idea on what went wrong, and gives them the opportunity to terminate the application *if they want to*.)

> I also did a simple experiment:
> 
> #include <stdio.h>
> #include <windows.h>
> 
> int main()
> {
>     int *p = 0;
> 
>     __try {
>         *p = 0;
>     } __except(EXCEPTION_EXECUTE_HANDLER) {
>         printf("exception caught\n");
>     }
>     return 0;
> }
> 
> This experiment shows that the proposed patch will even mask
> null pointer dereferencing, a serious programming error.  I'm
> sorry that I can't accept this patch.

Well, the kind of exception handling in my patch was pretty simple.  Exception handling can be turned off in debug (and alpha/beta) builds (where the focus is on catching errors.  Moreover, there are even mode advanced solutions which can detect which module the exception occurs in, and act accordingly, log the error, send a Talkback report, etc.

I don't think refusing to answer this issue is the right choice.
If you catch and ignore an exception caused by a programming
error in a lower layer, the lower layer may be left in an
inconsistent state, and it may be dangerous to call into the
lower layer again.  I put together a simple test program that
illustrates this point.  When the lower layer dereferences a null
pointer, it doesn't get to release an internal semaphore.  So
when we catch and ignore that access violation and call the
lower again, the lower layer deadlocks trying to wait for the
semaphore.  If I spend more time, I may be able to create a
test program in which the inconsistent state causes a worse
problem than a deadlock.  But I hope you get the idea.

Again, I thank you for your interest in and patches for this
bug.  I know you want to help Mozilla users, and as a programmer
I am fascinated by the technical side of this bug.  But it is
a bad idea to keep going after a programming error may
have left the program in an inconsistent state.

If we really want to help our users, we can report this crash
in Niphk.dll to the vendor of Norman Virus Control and help
them fix this crash.
(In reply to comment #18)
> If you catch and ignore an exception caused by a programming
> error in a lower layer, the lower layer may be left in an
> inconsistent state, and it may be dangerous to call into the
> lower layer again.  I put together a simple test program that
> illustrates this point.  When the lower layer dereferences a null
> pointer, it doesn't get to release an internal semaphore.  So
> when we catch and ignore that access violation and call the
> lower again, the lower layer deadlocks trying to wait for the
> semaphore.  If I spend more time, I may be able to create a
> test program in which the inconsistent state causes a worse
> problem than a deadlock.  But I hope you get the idea.

Thanks for the sample program.  I stand corrected. I think this sample proves that my patch would not *solve* the issue.  And it could create more problems in the future, which would be more obscure to investigation.

So I guess I should go on and close bug #350087 (as WONTFIX) on the same ground as well?
wtc: here's the thing.

you're arguing that sometimes a crashed program will deadlock.

I'm asserting that an aborted program will *always* die.

I don't care that /sometimes/ a caught exception will result in an unhappy program. and i certainly don't care that when we catch the exception we explain to the user why the program is unhappy. telling the user that they have an infected wsa chain would be a *good* thing. and if we happened to get enough users complaining to their lame vendors, maybe enough vendors would fix the problem that it would go away.
Crash Signature: [@ Niphk.dll + 0x2bb3]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: