Closed Bug 337903 Opened 18 years ago Closed 15 years ago

Computer freeze when viewing an email with Content-Transfer-Encoding: base64

Categories

(SeaMonkey :: General, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 411988

People

(Reporter: Morten, Unassigned)

References

()

Details

(Keywords: hang)

Attachments

(5 files)

I am not certain why it crashes, because my computer (display) freezes up.

I have not attempted to contact the computer over the network after it froze, but it didn't show any response to keyboard commands (ctrl-alt-backspace, followed by ctrl-alt-delete)

I am running Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060409 SeaMonkey/1.0.1 Mnenhy/0.7.3.0 my distro is Trustix, and the email that freezes my computer will be attached shortly

I am not certain the Content-Transfer-Encoding is related, but that is the one thing about the email that leaped at me.
Original filename: 1147572040.28436_2.ryo-ohki.4th-age.com:2,S
my computer freezes in the exact same manner when I try loading the url now attached
the same does not affect firefox (Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.1) Gecko/20060206 Firefox/1.5.0.1)
clearly not mailnews-specific..
Component: MailNews: Main Mail Window → General
Assignee: mail → general
QA Contact: general
[2007-12-05-03-trunk on Ubuntu 'Gutsy']

Yeah, I'm seeing a lot of this.  However, not every base64 message freezes the computer.  What I am seeing is that every message from a certain set of <From> addresses will freeze up immediately upon receipt.  My very first filter should have deleted at least one of the bunch, but it doesn't fire.  On the other hand, when I get control again and look in my inbox the message appears to be there -- I'm not eager to try opening it.  And, if things were working correctly, the message would have been filed by another filter and not appear in my Inbox.

One of the senders could be malicious, but I've opened her messages and they appear innocuous and on-topic.  I do note that they are CHAR-SET=KOI8-R, and I'm not sure I could display that. Anyway, the program isn't trying to render the messages at this point.

A second sender, however, is <zdi-disclosures@3mcom.com> via bugtraq.  That doesn't seem a likely origin of malicious mail.  Perhaps someone else reading this also subscribes to bugtraq and can look at those.
This is the end of the logfile at the point of failure, plus the message as stored.  NOTE that the msg was stored in my INBOX; it should have been filtered.  This leads me to suspect the failure occurred during filtering.

I have no shortage of other examples, if useful.
(In reply to comment #5)
> A second sender, however, is <zdi-disclosures@3mcom.com> via bugtraq.  That
> doesn't seem a likely origin of malicious mail.

Like the attachment#294753 [details], the message wound up in my INBOX.  It should have been filtered. 

As it wasn't clear in my comment#6, there is no difficulty rendering the messages once I've recovered from the freeze.
And, to add to my perplexity, I can manually run the filters which should have caught both message groups with no problem.
(In reply to comment #6)
> -1224669504[8d9b548]: RECV: .
> ==============================================================================
> At the above point, the program froze, using all CPU.

Because POP3 case, after end of data indicator of following ".";
> 0[3b43f0]: RECV: .
next flow normally occurs.
> 0[3b43f0]: RECV: (null)
> 0[3b43f0]: POP3: Entering state: 15
> 0[3b43f0]: POP3: Entering state: 22
And RETR for more messages follows, and DLET for downloaded mail follows when no "Leave messages on Server", and logoff is executed by QUIT.
> 0[3b43f0]: SEND: QUIT

No "(null)" after "." in NSPR log is probably one of next.
(A) "RECV: (null)" and logs here after were held in log buffer, and Tb went in           
    loop, and buffered logs were not written to log file due to kill of Tb.
(B) After receive of null, Tb went in loop before log-write of "RECV: (null)",
    and logs here after did not appear.
(C) "(null)" was not passed to Tb yet, as log says.

To David A. Cobb:

(C) can be caused by mail scan feature of anti-virus software.
What anti-virus software do you use? Do you enable mail scan feature?

Network related NSPR log will help to see which case, (A), (B), or (C).
Please use next if you can do further test.
  SET NSPR_LOG_MODULES=pop3:5,nsSocketTransport:5,nsHostResolver:5
See http://www.mozilla.org/projects/netlib/http/http-debugging.html
(In reply to comment #9)
> (In reply to comment #6)
> > -1224669504[8d9b548]: RECV: .
> > ==============================================================================
> > At the above point, the program froze, using all CPU.
> 
> Because POP3 case, after end of data indicator of following ".";
> > 0[3b43f0]: RECV: .
> next flow normally occurs.
> > 0[3b43f0]: RECV: (null)
> > 0[3b43f0]: POP3: Entering state: 15
> > 0[3b43f0]: POP3: Entering state: 22
> And RETR for more messages follows, and DLET for downloaded mail follows when no "Leave messages on Server", and logoff is executed by QUIT.
> > 0[3b43f0]: SEND: QUIT

Not to be contentious, but that is simply not the case.  I have loads and loads of logs showing --

-1223883072[8d9b548]: RECV: View this message in context: http://www.nabble.com/netbeans-6---web-application-jdk-1.4-tp14282494p14284347.html
-1223883072[8d9b548]: RECV: Sent from the Netbeans - Users mailing list archive at Nabble.com.
-1223883072[8d9b548]: RECV: .
-1223883072[8d9b548]: RECV: (null)
-1223883072[8d9b548]: POP3: Entering state: 20
-1223883072[8d9b548]: SEND: DELE 3
               <===========
-1223883072[8d9b548]: Entering NET_ProcessPop3 5
-1223883072[8d9b548]: POP3: Entering state: 3
-1223883072[8d9b548]: RECV: +OK
-1223883072[8d9b548]: POP3: Entering state: 21
-1223883072[8d9b548]: POP3: Entering state: 15
-1223883072[8d9b548]: POP3: Entering state: 18
-1223883072[8d9b548]: SEND: RETR 4

-1223883072[8d9b548]: Entering NET_ProcessPop3 1448
-1223883072[8d9b548]: POP3: Entering state: 3
-1223883072[8d9b548]: RECV: +OK 5173 octets

> No "(null)" after "." in NSPR log is probably one of next.
> (A) "RECV: (null)" and logs here after were held in log buffer, and Tb went in  
>     loop, and buffered logs were not written to log file due to kill of Tb.
> (B) After receive of null, Tb went in loop before log-write of "RECV: (null)",
>     and logs here after did not appear.
> (C) "(null)" was not passed to Tb yet, as log says.
> 

I'm not sure what you're saying here. I see many, many instances of (null) -- some in the middle of a message, sometimes after the end-of-message.   

I don't understand why the program should even be reading the channel after the "." -- I see nothing in the RFC that would suggest this is not the final transmission of a transaction.

If, on the other hand, the (null) is being read AFTER completely processing the message, we may have something here.  That is the point that is never reached.

Here is one in the middle of a message:
-1223883072[8d9b548]: RECV: jobsearch
-1223883072[8d9b548]: RECV: 
-1223883072[8d9b548]: RECV: 25 new jobs matching programmer in Providence, RI were added to
-1223883072[8d9b548]: RECV: (null)
-1223883072[8d9b548]: Entering NET_ProcessPop3 4096
-1223883072[8d9b548]: POP3: Entering state: 19
-1223883072[8d9b548]: RECV: Jobster in the last 24 hours. Here are the newest jobs that you
-1223883072[8d9b548]: RECV: haven't seen yet.


> To David A. Cobb:
> 
> (C) can be caused by mail scan feature of anti-virus software.
> What anti-virus software do you use? Do you enable mail scan feature?

My virus scan is ClamAV, invoked from p3scan.  P3scan intercepts the channel -- thunderbird never even sees it until after the scan is complete.

In any case, when the machine locks up all the useful CPU is being taken by Thunderbird, not by P3scan or ClamAV. 

> 
> Network related NSPR log will help to see which case, (A), (B), or (C).
> Please use next if you can do further test.
>   SET NSPR_LOG_MODULES=pop3:5,nsSocketTransport:5,nsHostResolver:5
> See http://www.mozilla.org/projects/netlib/http/http-debugging.html
> 

Well, that made my logs even bigger and messier.  I'm not sure it shows anything useful.

See Attachment. This is only the TAIL of a humongous log.

A second attachment shows the log of a BASE64 message that came through without any problem.  To summarize what it looks like to me: all BASE64 from certain originators cause problems.  Other originators, not.

Let me highlight another symptom I think we're ignoring.  When I regain control of the computer, I find the "pathological" message sitting in my Inbox, and I can render it without problems.  But it shouldn't be in the Inbox, it should have been filtered.  And THAT is where I believe we are hanging up.

Is there a NSPR_LOG_MODULES value that would track a message through the rest of the processing??
This log was captured hours after the "bad" message was received.  I note that the thread that handles the network connection appears to be active -- although the state looks weird! (Why "nsHostResolver::ThreadFunc exiting" immediately following the "RECV: ." ??
But I note that this is a mime-part, whereas the "killer" messages are all BASE64 for the entire body.  FWIW.
(In reply to comment #10)
> I'm not sure what you're saying here. I see many, many instances of (null) --
> some in the middle of a message, sometimes after the end-of-message.   

You dis say following in attached log & report in your Comment #6, didn't you?
> -1224669504[8d9b548]: RECV: .
> ==============================================================================
> At the above point, the program froze, using all CPU.
It is real log when freeze, isn't it?

> -1223883072[8d9b548]: RECV: .
> -1223883072[8d9b548]: RECV: (null)
> -1223883072[8d9b548]: POP3: Entering state: 20
> -1223883072[8d9b548]: SEND: DELE 3
>(snip)
> -1223883072[8d9b548]: RECV: +OK
>(snip)
> -1223883072[8d9b548]: SEND: RETR 4

You are right. Sorry for my incorrect comment.
(In reply to comment #11)
> Created an attachment (id=296176) [details]
> Tail -n 2000 from NSPR_LOG_FILE when frozen, with more MODULES

Following is last part of log.
> -1223891264[8da5548]: RECV: L2RvY3MueWFob28uY29tL2luZm8vdGVybXMvCiAK
> -1223891264[8da5548]: RECV: .
> -1302733936[b0c00708]: nsHostResolver::ThreadFunc exiting
> -1233343600[8e3a020]:     ...returned after 239955 milliseconds
> -1233343600[8e3a020]: nsSocketTransport::OnSocketReady [this=161b9710 outFlags=1]
> -1233343600[8e3a020]: nsSocketInputStream::OnSocketReady [this=161b97ec cond=0]
> -1233343600[8e3a020]: STS dispatch [2657aa88]
> -1233343600[8e3a020]: nsSocketInputStream::Read [this=161b97ec count=4096]
> -1233343600[8e3a020]:   calling PR_Read [count=4096]
> -1233343600[8e3a020]:   PR_Read returned [n=-1]
> -1233343600[8e3a020]: nsSocketInputStream::AsyncWait [this=161b97ec]
>(snip of many polling log)
> -1233343600[8e3a020]: nsSocketInputStream::CloseWithStatus [this=161b97ec reason=0]
> -1233343600[8e3a020]: nsSocketInputStream::CloseWithStatus [this=161b97ec reason=80470002]
> -1233343600[8e3a020]: nsSocketTransport::OnMsgInputClosed [this=161b9710 reason=80470002]
> -1233343600[8e3a020]: nsSocketInputStream::OnSocketReady [this=161b97ec cond=80470002]
>(snip of polling logs)

As you said in report of Comment #6, no "null" from server after "RECV: .".
What can client do, when required "null" won't come from server?
Wait for data and detect timeout in case of "never come", is usual action, I think.

While waiting for "null", nsSocketInputStream::CloseWithStatus is seen in your log. It may be cause of freeze of Tb.

> (Why "nsHostResolver::ThreadFunc exiting" immediately following the "RECV: ." ??

It's log of "nsHostResolver". It's other task's(DNS related task) log and I think it has no relation to this bug's freeze problem, although possibility that it the cause of Tb's freeze is not ZERO.
(e.g. PC's IP address was changed by expiration of DHCP, then Tb froze)
Was error message(such as exception) issued to Error Console when receive of "killer" messages and freeze of Tb?
(In reply to comment #14)
> What can client do, when required "null" won't come from server?
> Wait for data and detect timeout in case of "never come", is usual action, I
> think.

WHAT "required null"??  I see nothing in the RFC that suggests any null line be sent by the server. 
During the session when I recorded this, at least a dozen BASE64 messages were received and successfully processed.
The thing I want to call your attention to is the very last message in the log. It is a simple ASCII message, it ends with "RECV: .", and -- because it is the last message -- the program immediately closes the connection.  There is no (null) either from the server or from Netlib.

I do notice that the CPU usage always seems to spike wwhen a Base64 message is being received, but only in a few cases is it unable to recover.

The most consistent factor, which y'all seem to ignore, is that EVERY message from senders X, Y, and Z freeze me up, while other Base64 messages do not.  This seems to implicate the CONTENT (as encoded) of the message. 

Groping through the code, I note that in decoding Base64 there is a comment that the output is into the same buffer as the input, but "this is OK because the output is always smaller."  How about inputs that are in some way pathological?  Could you put an ASSERT (int) output-ptr < (int) input pointer in there?  If the output pointer ever did catch up, we would see exactly what we are seeing here as the program kept trying to decode its own output.

Beyond that, I am very very cramped for disk space so I cannot compile the system myself.  If some kind soul could make a binary with symbols left in, and put it somewhere where I can download it, I can try attaching DDD/GDB and seeing just where the program is wasting its time.
(In reply to comment #16)
> WHAT "required null"??

I wanted to say ;
 Required log of "RECV: (null)" after log of "RECV: ."(indicator of end of mail)

> I see nothing in the RFC that suggests any null line be sent by the server. 

Sorry but I don't know RFC level description well. But, Tb apparently goes ahead(goes to next DELE step) only when consecutive log of "RECV: ." & "RECV: (null)".
 (a. RECV: (null) : End of block of data )
 (   RECV: .      : End of mail data     )
 (   When end of mail data transfer, '.' & (null) seems to be required.)
 (b. "POP3: Entering state: 20" means "20 SEND_DELE". See next.)
 (   http://kb.mozillazine.org/Session_logging_for_mail/news   )
Log of "RECV: (null)" possibly doens't mean "server sent (null)".
It can be indicator of network communication level "End of buffer".

(Off-Topic, possibly different problem/bug)
About "CPU usage spikes" when large base64 encoded data.

Do you use filter rule of of "body contains"?
If yes, other bug can cause such phenomenon, because, AFAIR, there is at least problem of "search body searches base64 encoded attacment including binary attachments". And, AFAIR, binary attachement case such as jpeg is already resolved. But I don't know whether fix for the bug is already applied on branch or not. Anyway, search bugzilla.mozilla.org well, please. 

FYI.
See Bug 37031 for problem in "body search". It was still open. 
> Bug 37031 searching message body yields false positives because base64 encoded binary attachments are treated as plaintext
Already resolved problem was Bug 132340.
> Bug 132340 Local body search does not work if the body is encoded as Base64
(In reply to comment #18)
But, Tb apparently goes
> ahead(goes to next DELE step) only when consecutive log of "RECV: ." &
> "RECV: null)".
>  (a. RECV: (null) : End of block of data )
>  (   RECV: .      : End of mail data     )
>  (   When end of mail data transfer, '.' & (null) seems to be required.)

Well, I sure hope the program isn't waiting for something that the server isn't obliged to send!

My problem with that whole line of investigation is that only certain senders' mail triggers this bug.  This would assume that my ISP is sabotaging one small group of senders.  Seems unlikely.

I've been prowling around in the source "nsmimeenc.cpp" and elsewhere.
Comments such as the following are pretty scary when there's no ASSERT to check your assumptions.
<BLOCKQUOTE>
 mime_decode_base64_buffer (MimeDecoderData *data,
248                const char *buffer, PRInt32 length)
249 {
250   /* Warning, we are overwriting the buffer which was passed in.
251    This is ok, because decoding these formats will never result
252    in larger data than the input, only smaller. */
253   const char *in  = buffer;
254   char *out = (char *) buffer;
255   char token [4];
<SNIP />
295       /* If there are characters left over from the last time around,
296        we might not have space in the buffer to do our dirty work
297        (if there were 2 or 3 left over, then there is only room for
298        1 or 2 in the buffer right now, and we need 3.)  This is only
299        a problem for the first chunk in each buffer, so in that
300        case, just write prematurely. */

Somewhere in the subsequent code, it would be comforting to see an ASSERT to verify (ulong) out <= (ulong) in, and I don't know what to make of the second comment,
 

> (Off-Topic, possibly different problem/bug)
> About "CPU usage spikes" when large base64 encoded data.
> 
> Do you use filter rule of of "body contains"?

NO filters relating to a message body.  I process far too much mail to dare that sort of expensive test. But I do think you're on the right track there, because it APPEARS to be the case that what is failing is that the filtering process never returns.  Is it possible that somewhere in the filtering, a mime part is decoded even when there is no test for the body content??  There is plenty of room for an alligator or three in that code.

I sure wish there were a logging statement or two to indicate when a message is passed to filtering and when it's completed the tests.

And, on that topic, where can I find a COMPLETE list of the terms that work for NSPR_LOG_MODULES??
(In reply to comment #20)
> mime_decode_base64_buffer (MimeDecoderData *data, (snip)

David A. Cobb, you were right.
Further analysis has been done by Bug 434091, and culprit is found to be Bug 411988.

Depends on: 411988
Version: Other Branch → Trunk
Severity: normal → critical
David A.
Do you see this problem when using http://www.seamonkey-project.org/releases/2.0a3 now that bug 411988 is fixed?
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: