Closed Bug 302314 Opened 19 years ago Closed 17 years ago

quoted-printable text attachments get extra lines inserted if there's a CR+LF in the original text at 0x2000 offset or multiple thereof

Categories

(MailNews Core :: MIME, defect)

x86
Windows 2000
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bugzilla, Assigned: mscott)

Details

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412
Build Identifier: Mozilla Thunderbird 1.0.2 (Windows/20050317)

The summary says it quite exactly already.

Here's the symptom:

When sending mail with an attachment whose name ends in ".trn", containing text with CR+LF as line 
delimiters, the recipient (no matter which client; I looked at the original mail on the mail server) 
sometimes gets this attachment with extra lines inserted.

I found that this happens when the original attachment had a CR at offset 0x1FFF and a LF at offset 
0x2000.

The resulting mail turns this into the sequence =,CR,CR (the attachment was encoded as 
"quoted-printable").

I suspect that the code that encodes attachments uses an 8192-byte buffer and looks for occurrences of 
CR+LF, without taking into account this special boundary case.
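
To illustrate the kind of boundary case suspected here, the following is a minimal sketch, not Thunderbird's actual code. It assumes a hypothetical encoder step (`normalize_chunk`) that rewrites line endings one buffer at a time and treats a lone CR or a lone LF as a line break. The names and the per-chunk design are assumptions made only for this illustration.

```python
def normalize_chunk(chunk: bytes) -> bytes:
    """Rewrite CRLF, lone CR, or lone LF as CRLF, looking only at this chunk."""
    out = bytearray()
    i = 0
    while i < len(chunk):
        b = chunk[i]
        if b == 0x0D:  # CR
            out += b"\r\n"
            # Swallow the LF of a CR+LF pair, but only if it is in this chunk.
            if i + 1 < len(chunk) and chunk[i + 1] == 0x0A:
                i += 1
        elif b == 0x0A:  # LF with no CR before it in this chunk
            out += b"\r\n"
        else:
            out.append(b)
        i += 1
    return bytes(out)


def naive_encode(data: bytes, bufsize: int = 0x2000) -> bytes:
    """Process the input in fixed-size buffers with no state carried over."""
    return b"".join(
        normalize_chunk(data[i:i + bufsize]) for i in range(0, len(data), bufsize)
    )


# CR lands at offset 0x1FFF and LF at offset 0x2000.
data = b"a" * 0x1FFF + b"\r\n" + b"next line\r\n"
encoded = naive_encode(data)
```

In this sketch the CR at the end of the first buffer and the LF at the start of the second are each turned into a full line break, so the output contains one more line break than the input. Processing the same data as a single chunk leaves it unchanged.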

BTW, here's a "lazy" solution to this problem that might affect many other sequence searches as well: 
Use a buffer that's a little bigger in size (e.g. 8K+10 bytes). When starting, load the first text fragment 
of 8K to offset 10 in this buffer, leaving the first 10 bytes unused. Then do your search in the buffer, 
starting at offset 10. Then, when done with these 8K, _copy_ the last 10 bytes of the buffer to the start 
of the buffer, then load a new fragment again to offset 10. Now you end up with having the boundary 
completely in memory, so you search not from offset 10 of the buffer any more, but from a few bytes 
backwards (i.e. offset 9 if you're searching for a string of 2 bytes in length). This way, you won't have to 
special case split fragments in your search any more.
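
The overlap idea above can be sketched as follows. This is only an illustration under my own assumptions: the function name `find_all` is made up, and instead of a fixed 10-byte reserve it carries over exactly len(pattern) - 1 bytes, which is the minimum needed so that a match split across two reads is seen whole.

```python
import io
from typing import BinaryIO, Iterator


def find_all(stream: BinaryIO, pattern: bytes, chunk_size: int = 0x2000) -> Iterator[int]:
    """Yield the absolute offset of every occurrence of pattern in stream."""
    overlap = len(pattern) - 1
    carry = b""   # tail of the previous window, kept for boundary matches
    base = 0      # absolute offset of the first byte of the current window
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        window = carry + chunk
        pos = window.find(pattern)
        while pos != -1:
            yield base + pos
            pos = window.find(pattern, pos + 1)
        # Keep the last few bytes so a pattern straddling the boundary
        # is complete in the next window. The carry is shorter than the
        # pattern, so no match is ever reported twice.
        keep = min(overlap, len(window))
        carry = window[len(window) - keep:]
        base += len(window) - keep


sample = b"a" * 0x1FFF + b"\r\n" + b"b" * 10 + b"\r\n"
offsets = list(find_all(io.BytesIO(sample), b"\r\n"))
```

With the sample above, the first CR+LF straddles the 0x2000 boundary and is still found at offset 0x1FFF, without any special-case code for split fragments.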


Reproducible: Always

Steps to Reproduce:
1. Create a text with CR+LF at the 0x2000 boundary
2. Send this file as a mail attachment
3. Check the mail at the receiving side, note the extra line inserted.
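
For step 1, a test file can be generated rather than edited by hand. The following is a small sketch; the function name `make_test_data`, the line width, and the output filename are arbitrary choices for illustration. It builds CR+LF-terminated text so that the CR falls at offset 0x1FFF and the LF at offset 0x2000.

```python
def make_test_data(boundary: int = 0x2000, width: int = 32) -> bytes:
    """Return CRLF-delimited text with a CR at boundary-1 and an LF at boundary."""
    body = b"x" * (width - 2) + b"\r\n"        # one full line, width bytes long
    full, rest = divmod(boundary - 1, width)   # bytes available before the CR
    data = body * full + b"y" * rest + b"\r\n"
    return data + b"line after the boundary\r\n"


if __name__ == "__main__":
    # Binary mode so the CR+LF bytes are written exactly as built.
    with open("boundary-test.trn", "wb") as f:
        f.write(make_test_data())
```

The file must be written in binary mode; a text-mode write could translate the line endings and move the CR+LF off the boundary.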
I just realized I'm using an older version of Thunderbird.
So I checked again with v1.0.5 - same problem there.

Thomas
(In reply to comment #1)
> I just realized I'm using an older version of Thunderbird.
> So I checked again with v1.0.5 - same problem there.

For the purposes of testing, that doesn't really count as a "new" build.

Could you provide a sample .TXT file that exhibits the problem?  (Attach it to 
this bug, using the Create New Attachment link above.)
If the file is sent as a ZIP file, it arrives in good shape, even if the ZIP file
exceeds 10KB in size.
Attachment #191065 - Attachment mime type: text/plain → application/zip
That attachment does not apply to this bug -- it has no CRLF in the plain-text 
data.  It's actually for bug 302760.
I'm currently on the road and it'll take a few days before I can do this.
But is it so hard to create such a file yourselves?
I have uploaded a zip file containing a demonstration file exposing the bug.
Just send this file as an attachment. Message headers and main text content do not matter. Once sent, 
even the message in the "Sent" folder exposes the bug already (the attachment should appear inlined as 
text).

To see the bug, search for the line containing "29, 35, 2, 28, 36". Right after that, an empty line is inserted 
where the original did not have one.
First: the supplied attachment is not a zip file, it's the raw (Windows/DOS 
format) text.  Second, it appears to have *already* had the superfluous CRLF 
added to it -- at the line indicated, there is a blank line following.

Finally: after editing that file to restore that point to a single CR/LF (which 
is indeed positioned at 0x1FFF) I cannot reproduce the problem as stated, with 
either TB 1.0.6 or TB 1.0+0725.  The attachment is sent and received and saved 
to disk with exactly the same bytes as the original file.
Attached file fixed test file
Yes, I made two mistakes:
1. I did create a zip file from the original but apparently accidentally 
selected the original file for upload anyway.
2. The original file already had the modification in it. Another accident: I 
used the received file instead of the one I originally sent.
Sorry for that.

But after I "corrected" the file by
(a) removing the extra line delimiter and
(b) making sure it was saved with CR+LF line endings, I could reproduce the 
described problem again with Thunderbird 1.0.6.

Here's how I made sure you should be able to reproduce it:
1. Updated to 1.0.6 on Win2000
2. downloaded that "fixed test file" attachment from this bugzilla page to 
the desktop.
3. created a new mail in Thunderbird, entering an e-mail addr, "test" as 
subject and then dragged the file from the desktop into the address field 
area so that it became a new attachment.
Sent this mail and voila - the extra line is in there again.
Attachment #191210 - Attachment is obsolete: true
There is no difference between the file you just supplied and the one I used for 
testing.  The essential point of comment 8 is: this bug is not as reproducible 
as you think.

Note that I have no extensions installed; if you do, you should be testing in 
Safe Mode.

Are you seeing the extra CRLF in the copy of the message in your Sent folder, or 
only at the receiving side?  Is the receiver a Mozilla client or something else 
(like Outlook)?

I note that the character set for the file is "IBM-850" (I figured this out by 
trial-and-error).  Are you also using that character set for message 
composition?

Is the message window you open plain-text or HTML composition?  If HTML, is the 
message also *sent* as HTML?  (I don't think this is actually an issue, but at 
this point, it's worth knowing.)
> Note that I have no extensions installed; if you do, you should be testing
> in Safe Mode.

What kind of extensions? For Thunderbird? I've tested this on two very 
different machines, and one of them got Thunderbird freshly installed just 
for this test, so I believe that there are no such things installed.
But what would an extension have to do with the encoding of a file into the 
mail? Have you had a look at all at the code that encodes attachments? 
Have you read what I suspect to be the error? The code uses a buffer of 
0x2000 bytes, so the CR ends up in the last byte of one buffer and the LF at 
the beginning of the newly filled buffer, and if the code looks for CR+LF 
within a buffer, it will not find the pair.

> I note that the character set for the file is "IBM-850"

It's some encoding good old DOS uses. Only this file is encoded with it, 
while Thunderbird surely does not use it.

> Are you seeing the extra CRLF in the copy of the message in your Sent
> folder, or only at the receiving side?  Is the receiver a Mozilla client or
> something else (like Outlook)?

As I wrote first, I can see this in the raw mail (not delivered to a client yet), 
and I can also see it in the Sent folder of Thunderbird, so this is really an 
encoding problem on the sending side, no doubt.

> Is the message window you open plain-text or HTML composition? 

Plain.

I could pack the files that are responsible for the Thunderbird prefs into an 
archive and upload them here so you have (hopefully) the same settings I 
use. If so, tell me which folder or files I should archive.
OK - I think I found what you're missing:

I've made a new installation of the Mac OS X version of Thunderbird 1.0.6 
and could first not reproduce it there - until I changed the preference 
Composition -> use 'quoted printable' to "yes". With that, I could cause the 
error even on the Mac version.

I hope this helps.
Hi,
what's happening now? Why is it still not confirmed? Is it still not reproducible?
I have reproduced the bug with TB 1.6a1-0830, Win2K; also with 
Seamonkey 1.0a-0806.  The attachment gets the following headers:
-----
Content-Type: text/plain;
 name="302314.trn"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
 filename="302314.trn"
-----
and the extra CRLF is part of the q-p attachment source -- so the problem 
appears to be generated during q-p *encoding*, rather than decoding.

I also discovered a workaround: instead of using ISO-8859-1, set the message's 
encoding to UTF-8.  (In the compose window,   Options | Character Encoding  )
For some reason, this forces the attachment to be encoded as Base-64, rather 
than q-p, and this encoding/decoding appears to work correctly.  However, this 
symptom is considered a bug -- bug 248639.
Status: UNCONFIRMED → NEW
Component: Message Compose Window → MailNews: MIME
Ever confirmed: true
Product: Thunderbird → Core
Summary: text attachments get extra lines inserted if there's a CR+LF in the original text at 0x2000 offset or multiple thereof → quoted-printable text attachments get extra lines inserted if there's a CR+LF in the original text at 0x2000 offset or multiple thereof
Version: unspecified → Trunk
This bug appears to have been fixed on the trunk; I believe it's from the fix 
at bug 269390.  Christian, please correct me if you think that's wrong.

That patch is (I think) only on the trunk right now but should be moved over 
to the 1.8.1 branch (for TB 2.0) shortly.
Marking fixed, per that patch.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Product: Core → MailNews Core