Last Comment Bug 337903 - Computer freeze when viewing an email with Content-Transfer-Encoding: base64
: Computer freeze when viewing an email with Content-Transfer-Encoding: base64
Status: RESOLVED DUPLICATE of bug 411988
: hang
Product: SeaMonkey
Classification: Client Software
Component: General (show other bugs)
: Trunk
: x86 Linux
: -- critical (vote)
: ---
Assigned To: general
:
:
Mentors:
http://marc.theaimsgroup.com/?l=enlig...
Depends on: 411988
Blocks:
  Show dependency treegraph
 
Reported: 2006-05-14 02:30 PDT by Morten Nilsen
Modified: 2009-12-31 12:14 PST (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
email in question, copied from Maildir on server (6.53 KB, text/plain)
2006-05-14 02:32 PDT, Morten Nilsen
no flags Details
TEXT showing NSPR_LOG_FILE at point of freeze, plus copy of message (17.25 KB, text/plain)
2007-12-28 10:06 PST, David A. Cobb
no flags Details
Tail -n 2000 from NSPR_LOG_FILE when frozen, with more MODULES (120.77 KB, text/plain)
2008-01-09 11:09 PST, David A. Cobb
no flags Details
Excerpt from NSPR_LOG_FILE showing successful receipt of a BASE64 encoded mime-part (12.83 KB, application/octet-stream)
2008-01-09 11:11 PST, David A. Cobb
no flags Details
Another NSPR-LOG excerpt. Two received successfully and the end-of-session (112.09 KB, text/plain)
2008-02-04 11:11 PST, David A. Cobb
no flags Details

Description Morten Nilsen 2006-05-14 02:30:40 PDT
I am not certain why it crashes, because my computer (display) freezes up.

I have not attempted to contact the computer over the network after it froze, but it didn't show any response to keyboard commands (ctrl-alt-backspace, followed by ctrl-alt-delete)

I am running Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060409 SeaMonkey/1.0.1 Mnenhy/0.7.3.0 my distro is Trustix, and the email that freezes my computer will be attached shortly

I am not certain the Content-Transfer-Encoding is related, but that is the one thing about the email that leaped at me.
Comment 1 Morten Nilsen 2006-05-14 02:32:19 PDT
Created attachment 221948 [details]
email in question, copied from Maildir on server

Original filename: 1147572040.28436_2.ryo-ohki.4th-age.com:2,S
Comment 2 Morten Nilsen 2006-05-14 04:40:04 PDT
my computer freezes in the exact same manner when I try loading the url now attached
Comment 3 Morten Nilsen 2006-05-14 04:42:19 PDT
the same does not affect firefox (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060206 Firefox/1.5.0.1)
Comment 4 Morten Nilsen 2006-05-14 04:44:15 PDT
clearly not mailnews-specific..
Comment 5 David A. Cobb 2007-12-22 13:38:56 PST
[2007-12-05-03-trunk on Ubuntu 'Gutsy']

Yeah, I'm seeing a lot of this.  However, not every base64 message freezes the computer.  What I am seeing is that every message from a certain set of <From> addresses will freeze up immediately upon receipt.  My very first filter should have deleted at least one of the bunch, but it doesn't fire.  On the other hand, when I get control again and look in my inbox the message appears to be there -- I'm not eager to try opening it.  And, if things were working correctly, the message would have been filed by another filter and not appear in my Inbox.

One of the senders could be malicious, but I've opened her messages and they appear innocuous and on-topic.  I do note that they are CHAR-SET=KOI8-R, and I'm not sure I could display that. Anyway, the program isn't trying to render the messages at this point.

A second sender, however, is <zdi-disclosures@3mcom.com> via bugtraq.  That doesn't seem a likely origin of malicious mail.  Perhaps someone else reading this also subscribes to bugtraq and can look at those.
Comment 6 David A. Cobb 2007-12-28 10:06:42 PST
Created attachment 294753 [details]
TEXT showing NSPR_LOG_FILE at point of freeze, plus copy of message

This is the end of the logfile at the point of failure, plus the message as stored.  NOTE that the msg was stored in my INBOX; it should have been filtered.  This leads me to suspect the failure occurred during filtering.

I have no shortage of other examples, if useful.
Comment 7 David A. Cobb 2007-12-28 10:10:49 PST
(In reply to comment #5)
> A second sender, however, is <zdi-disclosures@3mcom.com> via bugtraq.  That
> doesn't seem a likely origin of malicious mail.

Like the attachment#294753 [details], the message wound up in my INBOX.  It should have been filtered. 

As it wasn't clear in my comment#6, there is no difficulty rendering the messages once I've recovered from the freeze.
Comment 8 David A. Cobb 2007-12-28 10:35:39 PST
And, to add to my perplexity, I can manually run the filters which should have caught both message groups with no problem.
Comment 9 WADA 2007-12-28 20:17:17 PST
(In reply to comment #6)
> -1224669504[8d9b548]: RECV: .
> ==============================================================================
> At the above point, the program froze, using all CPU.

Because POP3 case, after end of data indicator of following ".";
> 0[3b43f0]: RECV: .
next flow normally occurs.
> 0[3b43f0]: RECV: (null)
> 0[3b43f0]: POP3: Entering state: 15
> 0[3b43f0]: POP3: Entering state: 22
And RETR for more messages follows, and DLET for downloaded mail follows when no "Leave messages on Server", and logoff is executed by QUIT.
> 0[3b43f0]: SEND: QUIT

No "(null)" after "." in NSPR log is probably one of next.
(A) "RECV: (null)" and logs here after were held in log buffer, and Tb went in           
    loop, and buffered logs were not written to log file due to kill of Tb.
(B) After receive of null, Tb went in loop before log-write of "RECV: (null)",
    and logs here after did not appear.
(C) "(null)" was not passed to Tb yet, as log says.

To David A. Cobb:

(C) can be caused by mail scan feature of anti-virus software.
What anti-virus software do you use? Do you enable mail scan feature?

Network related NSPR log will help to see which case, (A), (B), or (C).
Please use next if you can do further test.
  SET NSPR_LOG_MODULES=pop3:5,nsSocketTransport:5,nsHostResolver:5
See http://www.mozilla.org/projects/netlib/http/http-debugging.html
Comment 10 David A. Cobb 2008-01-09 11:01:37 PST
(In reply to comment #9)
> (In reply to comment #6)
> > -1224669504[8d9b548]: RECV: .
> > ==============================================================================
> > At the above point, the program froze, using all CPU.
> 
> Because POP3 case, after end of data indicator of following ".";
> > 0[3b43f0]: RECV: .
> next flow normally occurs.
> > 0[3b43f0]: RECV: (null)
> > 0[3b43f0]: POP3: Entering state: 15
> > 0[3b43f0]: POP3: Entering state: 22
> And RETR for more messages follows, and DLET for downloaded mail follows when no "Leave messages on Server", and logoff is executed by QUIT.
> > 0[3b43f0]: SEND: QUIT

Not to be contentious, but that is simply not the case.  I have loads and loads of logs showing --

-1223883072[8d9b548]: RECV: View this message in context: http://www.nabble.com/netbeans-6---web-application-jdk-1.4-tp14282494p14284347.html
-1223883072[8d9b548]: RECV: Sent from the Netbeans - Users mailing list archive at Nabble.com.
-1223883072[8d9b548]: RECV: .
-1223883072[8d9b548]: RECV: (null)
-1223883072[8d9b548]: POP3: Entering state: 20
-1223883072[8d9b548]: SEND: DELE 3
               <===========
-1223883072[8d9b548]: Entering NET_ProcessPop3 5
-1223883072[8d9b548]: POP3: Entering state: 3
-1223883072[8d9b548]: RECV: +OK
-1223883072[8d9b548]: POP3: Entering state: 21
-1223883072[8d9b548]: POP3: Entering state: 15
-1223883072[8d9b548]: POP3: Entering state: 18
-1223883072[8d9b548]: SEND: RETR 4

-1223883072[8d9b548]: Entering NET_ProcessPop3 1448
-1223883072[8d9b548]: POP3: Entering state: 3
-1223883072[8d9b548]: RECV: +OK 5173 octets

> No "(null)" after "." in NSPR log is probably one of next.
> (A) "RECV: (null)" and logs here after were held in log buffer, and Tb went in  
>     loop, and buffered logs were not written to log file due to kill of Tb.
> (B) After receive of null, Tb went in loop before log-write of "RECV: (null)",
>     and logs here after did not appear.
> (C) "(null)" was not passed to Tb yet, as log says.
> 

I'm not sure what you're saying here. I see many, many instances of (null) -- some in the middle of a message, sometimes after the end-of-message.   

I don't understand why the program should even be reading the channel after the "." -- I see nothing in the RFC that would suggest this is not the final transmission of a transaction.

If, on the other hand, the (null) is being read AFTER completely processing the message, we may have something here.  That is the point that is never reached.

Here is one in the middle of a message:
-1223883072[8d9b548]: RECV: jobsearch
-1223883072[8d9b548]: RECV: 
-1223883072[8d9b548]: RECV: 25 new jobs matching programmer in Providence, RI were added to
-1223883072[8d9b548]: RECV: (null)
-1223883072[8d9b548]: Entering NET_ProcessPop3 4096
-1223883072[8d9b548]: POP3: Entering state: 19
-1223883072[8d9b548]: RECV: Jobster in the last 24 hours. Here are the newest jobs that you
-1223883072[8d9b548]: RECV: haven't seen yet.


> To David A. Cobb:
> 
> (C) can be caused by mail scan feature of anti-virus software.
> What anti-virus software do you use? Do you enable mail scan feature?

My virus scan is ClamAV, invoked from p3scan.  P3scan intercepts the channel -- thunderbird never even sees it until after the scan is complete.

In any case, when the machine locks up all the useful CPU is being taken by Thunderbird, not by P3scan or ClamAV. 

> 
> Network related NSPR log will help to see which case, (A), (B), or (C).
> Please use next if you can do further test.
>   SET NSPR_LOG_MODULES=pop3:5,nsSocketTransport:5,nsHostResolver:5
> See http://www.mozilla.org/projects/netlib/http/http-debugging.html
> 

Well, that made my logs even bigger and messier.  I'm not sure it shows anything useful.

See Attachment. This is only the TAIL of a humongous log.

A second attachment shows the log of a BASE64 message that came through without any problem.  To summarize what it looks like to me: all BASE64 from certain originators cause problems.  Other originators, not.

Let me highlight another symptom I think we're ignoring.  When I regain control of the computer, I find the "pathological" message sitting in my Inbox, and I can render it without problems.  But it shouldn't be in the Inbox, it should have been filtered.  And THAT is where I believe we are hanging up.

Is there a NSPR_LOG_MODULES value that would track a message through the rest of the processing??
Comment 11 David A. Cobb 2008-01-09 11:09:30 PST
Created attachment 296176 [details]
Tail -n 2000 from NSPR_LOG_FILE when frozen, with more MODULES

This log was captured hours after the "bad" message was received.  I note that the thread that handles the network connection appears to be active -- although the state looks weird! (Why "nsHostResolver::ThreadFunc exiting" immediately following the "RECV: ." ??
Comment 12 David A. Cobb 2008-01-09 11:11:31 PST
Created attachment 296179 [details]
Excerpt from NSPR_LOG_FILE showing successful receipt of a BASE64 encoded mime-part

But I note that this is a mime-part, whereas the "killer" messages are all BASE64 for the entire body.  FWIW.
Comment 13 WADA 2008-01-09 21:30:09 PST
(In reply to comment #10)
> I'm not sure what you're saying here. I see many, many instances of (null) --
> some in the middle of a message, sometimes after the end-of-message.   

You dis say following in attached log & report in your Comment #6, didn't you?
> -1224669504[8d9b548]: RECV: .
> ==============================================================================
> At the above point, the program froze, using all CPU.
It is real log when freeze, isn't it?

> -1223883072[8d9b548]: RECV: .
> -1223883072[8d9b548]: RECV: (null)
> -1223883072[8d9b548]: POP3: Entering state: 20
> -1223883072[8d9b548]: SEND: DELE 3
>(snip)
> -1223883072[8d9b548]: RECV: +OK
>(snip)
> -1223883072[8d9b548]: SEND: RETR 4

You are right. Sorry for my incorrect comment.
Comment 14 WADA 2008-01-09 22:05:41 PST
(In reply to comment #11)
> Created an attachment (id=296176) [details]
> Tail -n 2000 from NSPR_LOG_FILE when frozen, with more MODULES

Following is last part of log.
> -1223891264[8da5548]: RECV: L2RvY3MueWFob28uY29tL2luZm8vdGVybXMvCiAK
> -1223891264[8da5548]: RECV: .
> -1302733936[b0c00708]: nsHostResolver::ThreadFunc exiting
> -1233343600[8e3a020]:     ...returned after 239955 milliseconds
> -1233343600[8e3a020]: nsSocketTransport::OnSocketReady [this=161b9710 outFlags=1]
> -1233343600[8e3a020]: nsSocketInputStream::OnSocketReady [this=161b97ec cond=0]
> -1233343600[8e3a020]: STS dispatch [2657aa88]
> -1233343600[8e3a020]: nsSocketInputStream::Read [this=161b97ec count=4096]
> -1233343600[8e3a020]:   calling PR_Read [count=4096]
> -1233343600[8e3a020]:   PR_Read returned [n=-1]
> -1233343600[8e3a020]: nsSocketInputStream::AsyncWait [this=161b97ec]
>(snip of many polling log)
> -1233343600[8e3a020]: nsSocketInputStream::CloseWithStatus [this=161b97ec reason=0]
> -1233343600[8e3a020]: nsSocketInputStream::CloseWithStatus [this=161b97ec reason=80470002]
> -1233343600[8e3a020]: nsSocketTransport::OnMsgInputClosed [this=161b9710 reason=80470002]
> -1233343600[8e3a020]: nsSocketInputStream::OnSocketReady [this=161b97ec cond=80470002]
>(snip of polling logs)

As you said in report of Comment #6, no "null" from server after "RECV: .".
What can client do, when required "null" won't come from server?
Wait for data and detect timeout in case of "never come", is usual action, I think.

While waiting for "null", nsSocketInputStream::CloseWithStatus is seen in your log. It may be cause of freeze of Tb.

> (Why "nsHostResolver::ThreadFunc exiting" immediately following the "RECV: ." ??

It's log of "nsHostResolver". It's other task's(DNS related task) log and I think it has no relation to this bug's freeze problem, although possibility that it the cause of Tb's freeze is not ZERO.
(e.g. PC's IP address was changed by expiration of DHCP, then Tb froze)
Comment 15 WADA 2008-01-10 04:24:55 PST
Was error message(such as exception) issued to Error Console when receive of "killer" messages and freeze of Tb?
Comment 16 David A. Cobb 2008-02-04 09:27:42 PST
(In reply to comment #14)
> What can client do, when required "null" won't come from server?
> Wait for data and detect timeout in case of "never come", is usual action, I
> think.

WHAT "required null"??  I see nothing in the RFC that suggests any null line be sent by the server. 
Comment 17 David A. Cobb 2008-02-04 11:11:40 PST
Created attachment 301321 [details]
Another NSPR-LOG excerpt.  Two received successfully and the end-of-session

During the session when I recorded this, at least a dozen BASE64 messages were received and successfully processed.
The thing I want to call your attention to is the very last message in the log. It is a simple ASCII message, it ends with "RECV: .", and -- because it is the last message -- the program immediately closes the connection.  There is no (null) either from the server or from Netlib.

I do notice that the CPU usage always seems to spike wwhen a Base64 message is being received, but only in a few cases is it unable to recover.

The most consistent factor, which y'all seem to ignore, is that EVERY message from senders X, Y, and Z freeze me up, while other Base64 messages do not.  This seems to implicate the CONTENT (as encoded) of the message. 

Groping through the code, I note that in decoding Base64 there is a comment that the output is into the same buffer as the input, but "this is OK because the output is always smaller."  How about inputs that are in some way pathological?  Could you put an ASSERT (int) output-ptr < (int) input pointer in there?  If the output pointer ever did catch up, we would see exactly what we are seeing here as the program kept trying to decode its own output.

Beyond that, I am very very cramped for disk space so I cannot compile the system myself.  If some kind soul could make a binary with symbols left in, and put it somewhere where I can download it, I can try attaching DDD/GDB and seeing just where the program is wasting its time.
Comment 18 WADA 2008-02-04 17:27:47 PST
(In reply to comment #16)
> WHAT "required null"??

I wanted to say ;
 Required log of "RECV: (null)" after log of "RECV: ."(indicator of end of mail)

> I see nothing in the RFC that suggests any null line be sent by the server. 

Sorry but I don't know RFC level description well. But, Tb apparently goes ahead(goes to next DELE step) only when consecutive log of "RECV: ." & "RECV: (null)".
 (a. RECV: (null) : End of block of data )
 (   RECV: .      : End of mail data     )
 (   When end of mail data transfer, '.' & (null) seems to be required.)
 (b. "POP3: Entering state: 20" means "20 SEND_DELE". See next.)
 (   http://kb.mozillazine.org/Session_logging_for_mail/news   )
Log of "RECV: (null)" possibly doens't mean "server sent (null)".
It can be indicator of network communication level "End of buffer".

(Off-Topic, possibly different problem/bug)
About "CPU usage spikes" when large base64 encoded data.

Do you use filter rule of of "body contains"?
If yes, other bug can cause such phenomenon, because, AFAIR, there is at least problem of "search body searches base64 encoded attacment including binary attachments". And, AFAIR, binary attachement case such as jpeg is already resolved. But I don't know whether fix for the bug is already applied on branch or not. Anyway, search bugzilla.mozilla.org well, please. 

Comment 19 WADA 2008-02-05 23:55:45 PST
FYI.
See Bug 37031 for problem in "body search". It was still open. 
> Bug 37031 searching message body yields false positives because base64 encoded binary attachments are treated as plaintext
Already resolved problem was Bug 132340.
> Bug 132340 Local body search does not work if the body is encoded as Base64
Comment 20 David A. Cobb 2008-02-07 05:49:59 PST
(In reply to comment #18)
But, Tb apparently goes
> ahead(goes to next DELE step) only when consecutive log of "RECV: ." &
> "RECV: null)".
>  (a. RECV: (null) : End of block of data )
>  (   RECV: .      : End of mail data     )
>  (   When end of mail data transfer, '.' & (null) seems to be required.)

Well, I sure hope the program isn't waiting for something that the server isn't obliged to send!

My problem with that whole line of investigation is that only certain senders' mail triggers this bug.  This would assume that my ISP is sabotaging one small group of senders.  Seems unlikely.

I've been prowling around in the source "nsmimeenc.cpp" and elsewhere.
Comments such as the following are pretty scary when there's no ASSERT to check your assumptions.
<BLOCKQUOTE>
 mime_decode_base64_buffer (MimeDecoderData *data,
248                const char *buffer, PRInt32 length)
249 {
250   /* Warning, we are overwriting the buffer which was passed in.
251    This is ok, because decoding these formats will never result
252    in larger data than the input, only smaller. */
253   const char *in  = buffer;
254   char *out = (char *) buffer;
255   char token [4];
<SNIP />
295       /* If there are characters left over from the last time around,
296        we might not have space in the buffer to do our dirty work
297        (if there were 2 or 3 left over, then there is only room for
298        1 or 2 in the buffer right now, and we need 3.)  This is only
299        a problem for the first chunk in each buffer, so in that
300        case, just write prematurely. */

Somewhere in the subsequent code, it would be comforting to see an ASSERT to verify (ulong) out <= (ulong) in, and I don't know what to make of the second comment,
 

> (Off-Topic, possibly different problem/bug)
> About "CPU usage spikes" when large base64 encoded data.
> 
> Do you use filter rule of of "body contains"?

NO filters relating to a message body.  I process far too much mail to dare that sort of expensive test. But I do think you're on the right track there, because it APPEARS to be the case that what is failing is that the filtering process never returns.  Is it possible that somewhere in the filtering, a mime part is decoded even when there is no test for the body content??  There is plenty of room for an alligator or three in that code.

I sure wish there were a logging statement or two to indicate when a message is passed to filtering and when it's completed the tests.

And, on that topic, where can I find a COMPLETE list of the terms that work for NSPR_LOG_MODULES??
Comment 21 WADA 2008-06-17 16:29:43 PDT
(In reply to comment #20)
> mime_decode_base64_buffer (MimeDecoderData *data, (snip)

David A. Cobb, you were right.
Further analysis has been done by Bug 434091, and culprit is found to be Bug 411988.

Comment 22 Wayne Mery (:wsmwk, NI for questions) 2009-04-01 04:52:51 PDT
David A.
Do you see this problem when using http://www.seamonkey-project.org/releases/2.0a3 now that bug 411988 is fixed?
Comment 23 Wayne Mery (:wsmwk, NI for questions) 2009-12-31 12:14:14 PST

*** This bug has been marked as a duplicate of bug 411988 ***

Note You need to log in before you can comment on or make changes to this bug.