Closed Bug 368325 Opened 17 years ago Closed 17 years ago

password prompt for encrypted mail crashes ThunderBird & SeaMonkey

Categories

(MailNews Core :: Security: S/MIME, defect, P1)

x86
All

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ChefChaudart, Assigned: KaiE)

References

(Blocks 1 open bug)

Details

(Keywords: dogfood, regression)

Attachments

(2 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a2pre) Gecko/20070125 Minefield/3.0a2pre
Build Identifier: version 3 alpha 1 (20070126)

Since one week,  I cannot open anymore my encrypted (and signed) mails.

When I click on the mail, it ask me the code, after ok, TB crashes ; see TB28730338H

Reproducible: Always

Steps to Reproduce:
1. receive an encrypted mail
2. try to open it
3. TB ass your code,  then crash
Actual Results:  
crashes
NSS_CMSRecipientInfo_UnwrapBulkKey  [mozilla\security\nss\lib\smime\cmsrecinfo.c, line 565]
0x0365f9f8

Cc'ing Nelson. It sounds like something is corrupt, but we shouldn't crash.
Status: UNCONFIRMED → NEW
Ever confirmed: true
> TB ass your code,  then crash

What does that mean?  

I send and receive lots of encrypted emails, and have never seen the problem
reported here, AFAIK.  That's not to say that I don't believe this is a real
problem, but only that it seems rare.  Unless I can reproduce it, I doubt
we'll be able to solve this.  More on that below.

I'd guess that there is something quite unusual about the email Chef received.
One possibility is that the email was encrypted using the cert(s) of one or 
more intended email recipients, and Chef does not have any of those certs
(that is, the certs that Chef has, for which he has a private key, are not 
among the certs of the intended recipients).  

I agree that, whatever it is, we shouldn't crash when we encounter it.
It also seems likely to me that the cause of the crash is in NSS, so 
I am making this into an NSS bug.  

Although the TB stack "trace" was empty, the crashing function is only 
called from one place, so it is possible to re-create some of the call stack
(but not the call arguments).  It looks like this:

  NSS_CMSRecipientInfo_UnwrapBulkKey  
  NSS_CMSEnvelopedData_Decode_BeforeData
  nss_cms_before_data
  nss_cms_decoder_notify
  sec_asn1d_notify_before  or sec_asn1d_notify_after
  ...

There are several possibilities about reproducing this:
a) maybe I could reproduce it with a copy of only the email message itself,
or 
b) maybe I will also need a copy of Chef's cert8.db file, 
or 
c) maybe it will not be possible to reproduce without a copy of Chef's 
private key DB, key3.db, which I would not ask for, and would not want 
Chef to send me.  (Please don't send me your key3.db file, Chef).

SO, Chef, please save the entire encrypted message to a file.  Then create 
a zip file containing the message and your cert8.db file (but not your 
key3.db file) and attach that zip file to this bug, or email it to me.  
I won't be able to read your encrypted email without your private key, 
but hope to be able to reproduce the problem without that.
Assignee: mscott → nobody
Component: Mail Window Front End → Libraries
Product: Thunderbird → NSS
QA Contact: front-end → libraries
Version: unspecified → 3.11
But first time I get the mail, I was able to read it.  It's after an update of last week that suddenly I get the crash.

see http://chefchaudart.trollprod.org/TBimapTK.png

If I enter the code, or click Cancel, I get the crash
Chef sent me his cert DB and the message.  
I tested them with NSS's CMS decoder test tool, cmsutil, and it worked 
flawlessly.  In short, I cannot reproduce the crash that Chef saw.

I am no longer convinced that this is purely (or primarily) an NSS bug.
In order to have failed where the TalkBack says it did, the code in 
NSS_CMSEnvelopedData_Decode_BeforeData would have to have gotten into 
an "impossible" state.  It would have been necessary for that function 
to have been called with a list of non-NULL recipientInfo pointers, and 
then during the execution of that function, one of those pointers would 
have to change to become NULL or invalid.  The function itself does not 
have that effect on the array of pointers, so some other code outside of 
the executing thread must have done that.

The screen shot cited in comment 3 also shows something VERY strange.
It shows an error message that is the result of having called 
NSS_CMSEnvelopedData_Decode_BeforeData and it returned error SEC_ERROR_NOT_A_RECIPIENT.  But at the same time, it is showing a master
password dialog.  It appears to me that the SMIME code got to a place
where it needed the master password, and called the password callback
function, which put up tht dialog.  

But then apparently the password callback function returned before the 
password had been obtained.  While the user was still seeing the password
dialog, apparently the functino returned and NSS went ahead and tried to
do the job with the missing/wrong master password, resulting in the 
SEC_ERROR_NOT_A_RECIPIENT failure.  This would have destroyed various
contexts, etc.

Then, according to Chef, when he finally dismissed the password dialog,
either by cancelling or by entering the password, apparently the code
somehow tried to re-enter the NSS CMS functions, trying (again) to 
process the parsed (and now destroyed?) message structures.  I think
that's how/why this code crashed.  I don't know enough about the thunderbird SMIME code to know how this could have occurred.  

I seem to recall that there was a change that makes password dialogs 
occur on a different thread than previously.  I wonder if that caused this.

At this point, I think the best I can offer is to add a bunch more of 
"sanity checks" inside of the SMIME code, checking that pointers that 
were previously seen to be non-NULL are still non-NULL, and not using
them if so.  That will avoid the crash, but it won't make Thunderbird
work right.  

I think some mail guru needs to try this.  Looks like all they need to 
do is to read an encrypted email message when they have not yet entered
the master password.  
This patch will prevent the crash, but won't make Thundebird work right.
Assignee: nobody → nelson
Status: NEW → ASSIGNED
Attachment #253071 - Attachment description: patch - detech "impossible" invalid pointers → patch - detect "impossible" invalid pointers
Chef, I have a question for you, and a request.

Q: What version of TBird were you using before the "update of last week" ?
   Please answer that in this bug.

In an effort to further determine if the problem is NSS or outside of NSS,
I'd like you to download and try a pure-NSS program to read your email.
If you can read your email with that program, and not with TBird, then 
I think this bug needs to changed back into a TBird bug.

You can download the NSS and NSPR libraries from these URLs
ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_11_4_RTM/msvc6.0/WIN954.0_OPT.OBJ/nss-3.11.4.zip
ftp://ftp.mozilla.org/pub/mozilla.org/nspr/releases/v4.6.4/msvc6.0/WIN954.0_OPT.OBJ/nspr-4.6.4.zip
Total download size is about 3.5 MB.  

Please email me directly for directions if you're willing to do this test.  
Thanks.
Summary: Encrypted mails are crashed TB → Encrypted mail crashes ThunderBird
I'm not sure, ..  around 22 or 23 of January.
I asked another encrypted mail from the same user.

I got it, I can open it,  and I can open the old mail without any crash.
Resolving WORKS FOR ME, since Chef reports it works for him.
If this problem reoccurs, please reopen this bug.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
(In reply to comment #9)
> Resolving WORKS FOR ME, since Chef reports it works for him.
> If this problem reoccurs, please reopen this bug.
> 

I get another signed/encrypted message from someone else,  I could read it the first time, but now, I cannot read anymore any message of that kind.  I got the crash.
On Chefchaurd request, I have reproduced the problem with version 3 alpha 1 (20070203) see TB28988028Y.

Steps to reproduce:

1. create a new user (to have a clean environment)
2. configure a new account (imap)
3. import certs
4. read an encrypted email (the mail is correctly decrypted)
5. close TB
6. start TB
7. open the same email 
8. the password dialog window pops up 
9. TB crashes if I cancel the dialog box or enter the right or wrong password






The problem does not occurred if I provide the "master password for Software Security Device" either by sending an encrypted mail or exporting the certificate before I read an encrypted mail.


Olivier, Thanks for that additional information.  That sounds very familiar.
SeaMonkey trunk had a similar problem for a while, but it got fixed.  
If you "logged in" to the "software security device" with the "master password"
BEFORE doing something in mail/news that required the master password, then
it was OK.  But if you tried to read an encrypted email or send an encrypted
email and didn't login to the "software security device" before going into 
mail/news, it would crash.  

As I recall, the problem had something to do with the handling of the cleanup
of the password dialog window itself.  All I really know for sure is that the
problem came AND WENT in Seamonkey trunk without any NSS fixes being made.  
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Looking at Olivier's talkback, I see that (a) it's Linux, (b) it's a full 
stack trace, and (c) it died in a different place than Chef's.  

The stack trace has line number information for the parts of TBird other
than NSS, but not for NSS.  This makes me suspsect that TBird is not using
the NSS shared libs that come with TBird, but perhaps is using some other
NSS shared libs on the system, perhaps from another product?  Olivier, 
please check and see if you have NSS shared libraries in some system 
directory, such as /usr/lib.

Olivier, Chef, do either of you build your own Thunderbird from sources? 
If so are you willing to build and test the patch attached to this bug?
OS: Windows XP → All
I have replaced nss/nspr libs with a debug version from ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_11_4_RTM 
(see comment 6)

I don't know if it's useful but it produces a different trace TB28991605Y 

NSS shared libraries come from thunderbird local directory (lsof)

thunderbi 10787       test  mem       REG               8,19  1070487  1855097 /usr/local/trunk/thunderbird/libsoftokn3.so
thunderbi 10787       test  mem       REG               8,19  2029965  1855160 /usr/local/trunk/thunderbird/libnss3.so
thunderbi 10787       test  mem       REG               8,19   800855  1855163 /usr/local/trunk/thunderbird/libssl3.so
thunderbi 10787       test  mem       REG               8,19   999658  1855162 /usr/local/trunk/thunderbird/libsmime3.so
Bug 353558 is another report of a crash with a very similar (not identical)
call stack to Chef's call stack.  It's another "impossible" invalid pointer
crash, where a pointer is valid when the function starts, and invalid later. 
One of these bugs should probably be marked as a duplicate of the other.

Olivier, a question about your master password.  Is it empty?  
When it asks you for your master password, do you just click the OK button?
or do you have to type in a non-empty password first?
The problem seen in Olivier's new talkback TB28991605Y seems to me to be the
same UI problem I described above in comment 13.

Bug 353558 comment 3 says that that crash can no longer be reproduced with 
the fix for bug 353597 present.  That fix was committed on the trunk in 
September though.  :-/
My master password is not empty (11 characters)
Olivier, it seems that talkback is utterly confused by the presence of 
shared libraries other than the ones with which you TBird was built.
The symbol names for the NSS functions given in your talkback TB28991605Y 
are all incorrect.  

My idea of substituting debug libraries for the ones shipped with TBird 
on Windows is not a good idea on Linux.  So let me ask you to revert to a 
clean trunk build of TB on Linux and try again.  Maybe we'll get a better 
stack trace after that.
I replaced NSS libs with Debug version for Linux (i386). (ftp://ftp.mozilla.org/pub/mozilla.org/security/nss/releases/NSS_3_11_4_RTM/Linux2.6_x86_glibc_PTH_DBG.OBJ/nss-3.11.4.tar.gz)

I'm now back to a clean version 3 alpha 1 (20070203) TB28992825G
I have build a debug version on a x86_64 system.
My gdb knowledge is limited to type "bt" when the app crashes.



version 3 alpha 1 (20070204) 


Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 46949675506528 (LWP 4898)]
0x00002aaab386503d in NSS_CMSEnvelopedData_Decode_BeforeData (envd=0x15f3fb8) at cmsenvdata.c:362
362         ri = envd->recipientInfos[recipient->riIndex];
Current language:  auto; currently c
(gdb) bt
#0  0x00002aaab386503d in NSS_CMSEnvelopedData_Decode_BeforeData (envd=0x15f3fb8) at cmsenvdata.c:362
#1  0x00002aaab3862475 in nss_cms_before_data (p7dcx=0xb66600) at cmsdecode.c:264
#2  0x00002aaab3862389 in nss_cms_decoder_notify (arg=0xb66600, before=1, dest=0x15f4038, depth=4) at cmsdecode.c:210
#3  0x00002aaab3d47c95 in sec_asn1d_notify_before (cx=0x15f74b0, dest=0x15f4038, depth=4) at secasn1d.c:451
#4  0x00002aaab3d4a665 in sec_asn1d_next_in_sequence (state=0x15f7820) at secasn1d.c:2016
#5  0x00002aaab3d4b6d2 in SEC_ASN1DecoderUpdate (cx=0x15f74b0, buf=0x15f4830 "<", len=22) at secasn1d.c:2672
#6  0x00002aaab3862d45 in NSS_CMSDecoder_Update (p7dcx=0xb66600, buf=0x15f4810 "X�_\001", len=54) at cmsdecode.c:670
#7  0x00002aaab3607485 in nsCMSDecoder::Update (this=0xb65f20, buf=0x15f4810 "X�_\001", len=7274598)
    at /old/tmp/mozilla/security/manager/ssl/src/nsCMS.cpp:813
#8  0x00002ab353b5b7d4 in MimeCMS_write (buf=0x15f4810 "X�_\001", buf_size=54, closure=<value optimized out>)
    at /old/tmp/mozilla/mailnews/mime/src/mimecms.cpp:510
#9  0x00002ab353b5acf5 in MimeEncrypted_parse_decoded_buffer (buffer=0x13ac9f8 "\004", size=1, obj=<value optimized out>)
    at /old/tmp/mozilla/mailnews/mime/src/mimecryp.cpp:193
#10 0x00002ab353b36054 in mime_decode_base64_buffer (data=0xb66a40, buffer=0x15f4810 "X�_\001", length=0)
    at /old/tmp/mozilla/mailnews/mime/src/mimeenc.cpp:325
#11 0x00002ab353b36d49 in MimeDecoderWrite (data=0x13ac9f8, buffer=0x1 <Address 0x1 out of bounds>, size=7274598)
    at /old/tmp/mozilla/mailnews/mime/src/mimeenc.cpp:837
#12 0x00002ab353b5b51f in MimeEncrypted_parse_buffer (buffer=0x0, size=7274598, obj=<value optimized out>)
    at /old/tmp/mozilla/mailnews/mime/src/mimecryp.cpp:171
#13 0x00002ab353b3faf2 in MimeMessage_parse_line (aLine=0x15f4810 "X�_\001", aLength=73, obj=0x13e0bb0)
    at /old/tmp/mozilla/mailnews/mime/src/mimemsg.cpp:240
#14 0x00002ab353b49523 in convert_and_send_buffer (buf=0x15f4810 "X�_\001", length=73, convert_newlines_p=1, 
    per_line_fn=0x2ab353b3f8c6 <MimeMessage_parse_line>, closure=0x13e0bb0) at /old/tmp/mozilla/mailnews/mime/src/mimebuf.cpp:185
#15 0x00002ab353b497af in mime_LineBuffer (
    net_buffer=0x15fd8dd "CSqGSIb3DQEHATAUBggqhkiG9w0DBwQI2/vE7XKgitOggASCFIj4rdOQbCzA0QaV/eiGGszY\n0kROnVZA+LvW89uihYGXG4Y9VPdD46XZtaQbjYiAtULG467n2iJaUanON7zk7ENHD98zUApd\nkjlocP7TXgp+j8sglGKJRZkmmX1HFQzURa6Ix8GBll4zISEXtBDots"..., 
    net_buffer_size=<value optimized out>, bufferP=0x13e0bf0, buffer_sizeP=0x13e0c00, buffer_fpP=0x13e0c08, 
    convert_newlines_p=1, per_line_fn=0x2ab353b3f8c6 <MimeMessage_parse_line>, closure=0x13e0bb0)
    at /old/tmp/mozilla/mailnews/mime/src/mimebuf.cpp:273
#16 0x00002ab353b427db in MimeObject_parse_buffer (
    buffer=0x15fcc90 "Received: from mailsrv.trollprod.net ([192.168.0.5])\n        by webmail.trollprod.com (Postfix) with ESMTP id 1WH22349\n        for <olivn@trollprod.org>; Sat, 03 Feb 2007 17:23:12 +0100\nReceived: from"..., size=10336, obj=0x13e0bb0)
    at /old/tmp/mozilla/mailnews/mime/src/mimeobj.cpp:284
#17 0x00002ab353b4a03a in mime_display_stream_write (stream=<value optimized out>, 
    buf=0x15fcc90 "Received: from mailsrv.trollprod.net ([192.168.0.5])\n        by webmail.trollprod.com (Postfix) with ESMTP id 1WH22349\n        for <olivn@trollprod.org>; Sat, 03 Feb 2007 17:23:12 +0100\nReceived: from"..., size=10336)
    at /old/tmp/mozilla/mailnews/mime/src/mimemoz2.cpp:953
#18 0x00002ab353b53533 in nsStreamConverter::OnDataAvailable (this=0x15e6d10, request=<value optimized out>, 
    ctxt=<value optimized out>, aIStream=0x15f96a0, sourceOffset=<value optimized out>, aLength=10336)
    at /old/tmp/mozilla/mailnews/mime/src/nsStreamConverter.cpp:911
#19 0x00002aaaab71ab37 in nsDocumentOpenInfo::OnDataAvailable (this=<value optimized out>, request=0x15b47e0, aCtxt=0x0, 
    inStr=0x15f96a0, sourceOffset=0, count=10336) at /old/tmp/mozilla/uriloader/base/nsURILoader.cpp:360
#20 0x00002ab354a045da in nsStreamListenerTee::OnDataAvailable (this=<value optimized out>, request=0x15b47e0, context=0x0, 
    input=0x15e2378, offset=0, count=10336) at /old/tmp/mozilla/netwerk/base/src/nsStreamListenerTee.cpp:97
#21 0x00002ab34ca4b138 in NS_InvokeByIndex (that=0x2aaab4119aa0, methodIndex=5, paramCount=5, params=0x2aaab4145a20)
    at /old/tmp/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_x86_64_linux.cpp:208
#22 0x00002ab34ca3e839 in nsProxyObjectCallInfo::Run (this=0x2aaab41451e0)
    at /old/tmp/mozilla/xpcom/proxy/src/nsProxyEvent.cpp:181
#23 0x00002ab34ca37b21 in nsThread::ProcessNextEvent (this=0x640a00, mayWait=1, result=0x7fff5ec51ecc)
    at /old/tmp/mozilla/xpcom/threads/nsThread.cpp:482
#24 0x00002ab34c9e4fd7 in NS_ProcessNextEvent_P (thread=0x13ac9f8, mayWait=1) at nsThreadUtils.cpp:225
#25 0x00002aaaae815000 in nsBaseAppShell::Run (this=0x669e30) at /old/tmp/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:153
#26 0x00002aaaaf768633 in nsAppStartup::Run (this=0x7977a0)
    at /old/tmp/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:171
#27 0x00002ab34c575a13 in XRE_main (argc=<value optimized out>, argv=<value optimized out>, aAppData=<value optimized out>)
    at /old/tmp/mozilla/toolkit/xre/nsAppRunner.cpp:2749
#28 0x00000000004007e8 in main (argc=20630008, argv=0x1) at /old/tmp/mozilla/mail/app/nsMailApp.cpp:62
http://www.mozilla.org/unix/debugging-faq.html is probably worth reading (google: unix debugging faq)

p envd
p envd->recipientInfos
p recipient
p recipient->riIndex
(gdb) p envd
$2 = (NSSCMSEnvelopedData *) 0x14b1898
(gdb) p envd->recipientInfos
$3 = (NSSCMSRecipientInfo **) 0x650072006f0066
(gdb) p recipient
$4 = (NSSCMSRecipient *) 0xafbcf0
(gdb) p recipient->riIndex
$5 = 0

http://olivn.trollprod.org/TB/cmsenvdata_ddd.png
Perhaps bug 353558 is a duplicate of this one, or vice versa?

What these two bugs seem to have in commmon is that in both cases, 
the decoder calls NSS_CMSEnvelopedData_Decode_BeforeData which finds
that the recipientInfos pointers are valid, then it calls the actual
decryption function, which prompts for the password, and when that 
function returns, the relevant receipientInfos pointer is corrupted. 

Question: is this problem only seen on 64-bit builds?  
Has this been seen on any 32-bit builds?  
I have produced a 64bit build because it's easier for me but I initially reproduce the problme with a 32bit build and it was initialy reported on win32.
Comment on attachment 253071 [details] [diff] [review]
patch - detect "impossible" invalid pointers

This patch is intended to detect corrupted pointers
and avoid crasahing.
It also cleans up the function code a little.
Attachment #253071 - Flags: superreview?(wtchang)
Attachment #253071 - Flags: review?(alexei.volkov.bugs)
I am now experiencing this problem 100% reproducibly with SeaMonkey 1.5a
(trunk builds).  When we return from the call to get the password, 
the heap is corrupted.  This happens whether the password dialog 
had a password entered, or the user pressed cancel.  This result shows
that the problem in not in the decryption code, because the decryption
code doesn't get called when the user cancels the password request.  

The challenge now is to find the code that corrupts the heap when the 
password dialog is presented to the user.  

We don't see this in tests of the NSS CMS (a.k.a. S/MIME) code that don't
use PSM.  We see this only with PSM.  So I am rather stronly inclined to
make this into a PSM bug, as soon as the above patch is reviewed.
Priority: -- → P2
Target Milestone: --- → 3.11.7
Comment on attachment 253071 [details] [diff] [review]
patch - detect "impossible" invalid pointers

Asking Kaie to review, in place of Wan-Teh.
Attachment #253071 - Flags: superreview?(wtchang) → superreview?(kengert)
I started to look at this bug.
I am able to reproduce the crash on Linux trunk.

The patch does not prevent the crash.

I believe this bug might be caused by a new "order of events" when the application is decoding the message. (I'm making a very general statement here, without being able to provide details yet).

I compared the behaviour of MOZILLA_1_8_BRANCH (Seamonkey 1.1.x) with the latest trunk code (Seamonkey 1.5a).

I used these steps:
- start seamonkey
- login to mail account
- (not yet entered master pw)
- click encrypted message
- prompt for master pw comes up
- move the prompt window around, so that you can look at the mail message display area, in particular the header area

At this point I inspected the state of the application and found an interesting difference.

With the trunk, I get the "broken encryption" icon, and I believe this points us into the right direction.

When you do the above with the stable branch, there is no such icon yet displayed.

When a message is rendered for display, it gets passed into different decoders. One of them is the decryption decoder. Simply speaking, the decoder is supposed to block any other events until it is done. Only after we are done with the decoding, a decision will be made about the display and the crypto state icons.


In other words:

The code that tries to make the decision about the crypto state gets executed before we are done with decoding. This is wrong. We must find out why.

The crypto decoder is given "UI context" pointers. Because the UI update has already been executed, it probably carries a dangling pointer.


I agree this is NOT a NSS bug.
Nelson, if you still believe we should drive that NSS patch in, let me know.
Assignee: nelson → kengert
Status: REOPENED → NEW
Component: Libraries → Security: S/MIME
Product: NSS → Core
QA Contact: libraries
Target Milestone: 3.11.7 → ---
Version: 3.11 → Trunk
(In reply to comment #30)
> I started to look at this bug.
> I am able to reproduce the crash on Linux trunk.
> 
> The patch does not prevent the crash.

Please tell me how/why it crashed.  On what line did it crash?
What variable contained an invalid value?
Was that value NULL, or some other non-NULL but invalid value?

With the patch applied, was the crash actually an assertion failure?
My patch adds assertions (which will definitely crash in debug builds)
and code to return without crashing in non-DEBUG builds.

> Nelson, if you still believe we should drive that NSS patch in, let me know.

In the stacks and core files I've previously seen, this crash was due to 
an invalid (NULL) value being passed to NSS_CMSRecipientInfo_UnwrapBulkKey
or by an invalid array index being used to index the envd->recipientInfos 
or recipient_list arrays.  My patch detects all those errors, and (in 
the non-debug builds) avoids crashing because of them.  

So, if the patch doesn't eliminate the crash in non-debug builds, then 
there must be yet-another cause of a crash that I have not yet seen.  
I'd appreciate any info on that new cause that you can supply.
QA Contact: s.mime
Kai, Using the MSVC6 Debug build you made, I was able to debug this problem
with MSVC8.  I found that the ASN.1 decoder context was freed while the 
ASN.1 decoder was waiting for the master password to be entered.  I knew there 
was only one function that destroys that context, so i set a breakpoint in it,
and found out how it happens. The attached stack trace tells the tale.  

While waiting for something to happen with the modal dialog,  on that very 
same stack, the code processes the next event, which apparently is one 
signalling the end of the stream for the MIME message being parsed.  
So, we terminate and tear it all down, while this very thread is still in 
the middle of processing it.  

Looks like a consequence of bug 326273.
See bug 338549 comment 1 and following comments for more info.  

I'm thinking maybe we want to add some code to detect and avoid 
recursive calls to the ASN.1 and CMS decoder functions.
Attachment #253071 - Attachment is obsolete: true
Attachment #253071 - Flags: superreview?(kengert)
Attachment #253071 - Flags: review?(alexei.volkov.bugs)
Yeah, this bug and bug 338549 have more or less the same cause, and the problem is bug 326273.
Flags: blocking1.9?
Summary: Encrypted mail crashes ThunderBird → password prompt for encrypted mail crashes ThunderBird & SeaMonkey
Flags: blocking1.9? → blocking1.9+
This bug is blocking 1.9.  What is the plan to get it fixed for 1.9?

The only workaround is to configure master passwords to be asked only once
per process lifetime, and never time out, then do something to force a 
master password before reading the first encrypted email.  Because of that,
I have been more-or-less forced to set my password timeout to infinite,
which I regard as very insecure.  I would surely hope we won't ship FF or 
TB with this unfixed.
Nelson, thanks a lot for the stack, it's very helpful.
I'm now trying to find a solution for this bug.

As we can see in the stack, we arrive in
  nsStreamListenerTee::OnDataAvailable

We go down the chain of data processing, and discover that we need to unlock the private key. We bring up a prompt.

Now that network / data streams may proceed while blocking UI is shown, more data is being sent to us. So we arrive in
  nsStreamListenerTee::OnStopRequest
(which tries to delete an object we are still using).

Actually, it's worse!
During the call to
  nsStreamListenerTee::OnDataAvailable
that resulted in us showing the prompt,
more data was fed to us using the same function call!
It just happens we only crash once we arrive in OnStopRequest.


Why is all this data being sent to us?
Because the actual streams are being processed on a different thread (probably the socket transport thread).
The requests to process the data get "proxied" to our UI thread.

These requests are sent as events of the global event queue,
so we can't filter them out on the receiving side (the thread showing the prompt).


I see two different approaches to solve this bug.

(1)
Find a way to tell the caller: pause until we are done!

I think the data is currently being set to us asynchronously.
Maybe we can find the processing code and switch to a synchronous data delivery?


(2)
Avoid bringing up the prompt in the middle of stream processing.

When a stream arrives and we notice that we must log in:
- abort the stream
- display the message as "can't decrypt" (temporarily)
- post some UI event to ourself that triggers the login prompt
  (now decoupled from any streams, as if the user had clicked login)
- after the login is done, we discover that our login came from
  that special event we posted, so we'll trigger a re-display
  of the current message
- now we're logged in and can process without prompting


Solution 2 means, we can no longer make use of the global login setting "ask every time". (Obviously, this would trigger a login-request each time we try to process the stream, and would never be able to)
Chef, Nelson, are you both using IMAP ?
Attached patch Patch v1 (obsolete) — Splinter Review
This patch fixes it for me.

With this change, the mail data is sent synchronously into the decoder.
This way the data flow is blocked as long as our password prompt is shown.

David, do you think this is ok to do?
Attachment #283655 - Flags: review?
Attachment #283655 - Flags: review? → review?(bienvenu)
same patch as before, but cvs diff -u10 -p
Attachment #283655 - Attachment is obsolete: true
Attachment #283657 - Flags: review?
Attachment #283655 - Flags: review?(bienvenu)
Attachment #283657 - Flags: review? → review?(bienvenu)
In answer to comment 36, I use both IMAP and POP3 all the time, in the 
same SM profile.  Do you want/need me to determine if the problem happens
only with one or only with the other?
I can reproduce this just by reading an old encrypted email in a (local)
POP3 email folder.  No need to use IMAP to reproduce it.

I confirm an observation Kai made.  When you try to read an encrypted email
and you get the master password prompt, the encryption icon in the message 
header pane may already be showing the broken key icon, indicating 
unsuccessful decryption.  If you see that icon when you get the master 
password prompt, you're doomed.  The damage is already done.  Whether you
enter a password or just click cancel, it will crash.  Knowing this, you may
be able to close some windows to minimize the dataloss on crash.
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

at best, I think this could slow us down a bit; at worst, it could introduce some deadlocks. Unlikely, though.

Is there a way to signal that you can't accept any more data? Maybe bisi has some thoughts...
(In reply to comment #41)
> (From update of attachment 283657 [details] [diff] [review])
> at best, I think this could slow us down a bit; at worst, it could introduce
> some deadlocks. Unlikely, though.

If we consider this patch and have the risk of deadlocks, I will start to use ThunderBird trunk with this patch as my primary email client (currently using TB2), so we get more testing.


> Is there a way to signal that you can't accept any more data? Maybe bisi has
> some thoughts...

You mean, switch the stream into a paused state, so the socket/stream tranport (or whoever is ansychronously pumping the data over) will pause until the prompt is ready?

I had initially considered this idea, but I don't know how to do it.
It would require to:
- use a context object that is carried all the way down to the password-prompt-callback, so we have a pointer to the stream object that we switch to "paused"
- decide which object will implement the paused state. Is there a feature available in the stream objects? (I couldn't fine one). If we implement it, should it be done in the stream implementation? Or should we switch the IMAP object to paused (which is producing the additional data)?
(In reply to comment #40)
> I can reproduce this just by reading an old encrypted email in a (local)
> POP3 email folder.  No need to use IMAP to reproduce it.

I removed the patch, then:
- I tried "encrypted mail in POP3 folder".
- I tried "encrypted mail in Local Folders"

No crash.

I crash when using IMAP only.
I have a "search folder" that shows me all the encrypted emails sitting in
my POP3 mail folders.  Clicking on any email in that search folder causes 
a crash with high probability.  The filter rule for this folder is simply:

Content-Description      contains     crypted

Until this bug became common, I formerly had my master password timeout 
configured to ask me for my master password if 5 minutes had elapsed since
the last time I had used it.  But when this bug arose, I found that reading
encrypted email caused many crashes per day.  So I was forced by necessity 
to switch to having no timeout on my master password.  Then I developed the 
habit of doing something that asks me for mey master password as soon as I
start the program, so that I won't be asked for it again when I read 
encrypted email.
I had a IM conversation with Nelson.
He pointed me to the "search folder" feature, which I had not seen before.

Nelson claims the bug is not limited to IAMP.
He said, with a 90% probability this happens with POP3 or "search folders", too.
He said, his search folder are based on POP3 accounts, only.

I tested, but I only crash with IMAP, never with POP3, never with search folders based on POP3.

Within the mozilla/mail and mozilla/mailnews source directories, only the IMAP code uses the PROXY_ASYNC flag.

(well, ldap and address book, too, but they don't cause us to display am email message)


Could we land my patch into trunk and see if the crash goes away?
Kai, I agree that if you have  fix for IMAP, we should try to land it.
A fix for IMAP only is better than no fix at all. 
Using the latest trunk build, I see the problem on IMAP every time I 
try it (try to read an encrypted email without logging into token first).
Am having difficulties reproducing with POP3 on this box. 
But I used to have this problem a LOT on my other box, so I want to try
it there before I completely agree this is just IMAP.
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

Who else could review this patch?
Attachment #283657 - Flags: superreview?(mscott)
Attachment #283657 - Flags: review?(neil)
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

I really don't care for this, for the reasons I outlined earlier, but let's find out if it fixes the crash.
Attachment #283657 - Flags: review?(bienvenu) → review+
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

While I don't have any encrypted mail I can see how this would help so I would be prepared to sr this if mscott can't.
Attachment #283657 - Flags: review?(neil)
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

I'll defer to David on this.
Attachment #283657 - Flags: superreview?(mscott) → superreview+
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

Requesting approval for 1.9.

(As this is mail-only, I'm not requesting firefox m9 endgame approval)
Attachment #283657 - Flags: approval1.9?
Comment on attachment 283657 [details] [diff] [review]
Patch v1, more context

a=drivers for after the M9 freeze
Attachment #283657 - Flags: approval1.9? → approval1.9+
P1, because fix is ready and already queued for check in.
Priority: P2 → P1
checked in
Status: NEW → RESOLVED
Closed: 17 years ago17 years ago
Resolution: --- → FIXED
Should this patch be backed out, now that bug 389931 (which fixed bug 326273 regression) is fixed ?
Serge, is there something about this patch that is extremely undesirable?
What is the incentive to back this out??
This was a NASTY bug for almost a year before that fix was made.
It was most welcome relief.  Unless there's some REALLY GOOD reason to 
tempt the fates, I'd suggest we leave well enough alone.
(No extreme hurry.)
Maybe I had simply misread your comment 32:
I though this bug was caused by a regression which bug 389931 would fix;
but it seems that the cause was an improvement from bug 326273...

Is there any plan to go back to an async behavior ?
Is the dependence on bug 265780 still needed ?
Does this bug somehow depend on bug 338549 ?
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.