Closed Bug 90161 Opened 23 years ago Closed 20 years ago

URL recognition at the end of line in QP (quoted-printable) misses last character

Categories

(MailNews Core :: MIME, defect)

Platform: All, Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: hcaley, Assigned: anlan)

References

Details

(Whiteboard: See dup bug 242695 for good descr)

Attachments

(5 files, 1 obsolete file)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:0.9.2+) Gecko/20010709
BuildID:    2001070922

URLs included in mail messages are sometimes not completely converted to links.
The following URL is taken from the "View Source" output of one of the messages in
question.  It ends in two digits; when viewed in Mozilla, the entire URL is
part of a link except for the last digit:

http://lana.neomorphic.com/cgi-bin/trouble.pl?type=3Dsearchdetail&format=
=3Duser&serial=3D24



Reproducible: Always
Steps to Reproduce:
1. See example above

Actual Results:  Last character is left out of the URL link

Expected Results:  All characters should have been part of the URL

The program that generates these messages is something I wrote in Perl; I don't
know why the CGI module is inserting the "3D"s in the URLs.  That's the only
thing that looks weird about the URL.
Is the message in format=flowed?  The =3Ds should only be inserted if it's
format=flowed, I believe...
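For reference: in quoted-printable encoding (RFC 2045), "=" followed by two hex digits encodes one byte, so "=3D" is simply an encoded "=", and a lone "=" at the end of a line is a soft line break that the decoder removes. The minimal C sketch below (the name qp_decode is hypothetical, not libmime's function, and it ignores RFC 2045's trailing-whitespace rules) decodes the two-line source quoted in the report back into a single URL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode a complete quoted-printable string `in` into `out`.
 * `out` needs room for strlen(in) + 1 bytes, since decoding never
 * lengthens the text. "=XX" becomes one byte, and "=" at the end of a
 * line (before LF or CRLF) is a soft line break that is removed.
 * Malformed escapes are copied through unchanged. Returns the length
 * of the decoded string. */
static size_t qp_decode(const char *in, char *out)
{
    assert(in != NULL && out != NULL);
    size_t n = 0;
    while (*in != '\0') {
        if (in[0] == '=') {
            if (in[1] == '\n') { in += 2; continue; }                  /* soft break, LF */
            if (in[1] == '\r' && in[2] == '\n') { in += 3; continue; } /* soft break, CRLF */
            int hi = hex_value((unsigned char)in[1]);
            int lo = hi < 0 ? -1 : hex_value((unsigned char)in[2]);
            if (hi >= 0 && lo >= 0) {
                out[n++] = (char)(hi * 16 + lo);
                in += 3;
                continue;
            }
        }
        out[n++] = *in++;
    }
    out[n] = '\0';
    return n;
}
```

Fed the reporter's source text, this yields "...type=searchdetail&format=user&serial=24" on one line, so the "=3D"s are ordinary quoted-printable encoding of "=".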
I can confirm that the same behaviour is seen on Mac OS X, build 2001070515, and
Linux x86, build 200107118, and in Netscape 6.1 PR1.
Reporter, can you provide the entire source, including headers, of the message
you used to reproduce this?
Received: from uxpx01.affymetrix.com ([10.10.5.130]) by ntex01.Affymetrix.com
with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
	id RPPDXS6Z; Thu, 23 Aug 2001 17:44:12 -0700
Received: from iserver.affymetrix.com (iserver.affymetrix.com [204.162.24.3])
	by uxpx01.affymetrix.com (Pro-8.9.3/Pro-8.9.3/CL-INT-20010517-01) with ESMTP id
RAA07421
	for <Hugh_Caley@affymetrix.com>; Thu, 23 Aug 2001 17:46:20 -0700 (PDT)
Received: from roma.neomorphic.com (hidden-user@firewall.neomorphic.com
[205.217.46.68])
	by iserver.affymetrix.com (Pro-8.9.3/Pro-8.9.3/CL-EXT-20010517-01) with ESMTP id
RAA14052
	for <Hugh_Caley@affymetrix.com>; Thu, 23 Aug 2001 17:43:58 -0700 (PDT)
Received: from localhost.localdomain (lana.neomorphic.com [10.60.100.148])
	by roma.neomorphic.com (8.9.0/8.9.0) with SMTP id RAA14966
	for <Hugh_Caley@affymetrix.com>; Thu, 23 Aug 2001 17:46:18 -0700 (PDT)
Message-Id: <200108240046.RAA14966@roma.neomorphic.com>
Mime-version: 1.0
Content-type: text/plain; charset="iso-8859-1"
Date: Thu, 23 Aug 2001 17:46 -0700
Subject: Trouble - Config novaroma for raid, sendmail, ssh, bastille<?>, RH 7.1
To: Hugh_Caley@affymetrix.com, @affymetrix.com
From: Trouble <trouble@affymetrix.com>
Content-transfer-encoding: quoted-printable

The following trouble ticket has passed it's target date;=20
please update it or close it and inform the owner as to the status:

Owner is project,
Primary Assigned Support is Hugh_Caley,

Secondary Support is ,
The issue is "Config novaroma for raid, sendmail, ssh, bastille<?>, RH =
7.1"
Target date for completion is "2001-08-13"

http://lana.neomorphic.com/cgi-bin/trouble.pl?type=3Dsearchdetail&form=3D=
edit&serial=3D21
BTW, the previous comment contains the complete headers and text of a message that
does not display properly when received in Mozilla or Netscape 6.  The final
URL in the message is highlighted in blue, except for the last digit.
hcaley, you mean that the whole URL is linked *except* for the final "1" character?

That's surprising; it appears there's a carriage return just before the "edit"
portion. The mail sender would have inserted it, and it would produce the
behavior seen here in Bugzilla, where only the first line is linked.
That being said, perhaps it's worth considering whether an additional criterion
can be added to the "link" parser for text/plain e-mails: if http:// is
encountered with a blank line above, search down through the text for a blank line
below and construct an anchor (link) for that text, after first removing all carriage
returns and line breaks.
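The heuristic proposed in this comment (which is not what was eventually checked in; the fix later in this bug changes the QP decoder shutdown instead) could be sketched as follows. The function join_url_lines and its interface are hypothetical, and "blank" is assumed to mean an empty line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if line `start` begins with "http://" and the
 * line before it is blank (or it is the first line), concatenate it
 * with the following lines up to, but not including, the next blank
 * line, dropping the line breaks. The joined text goes to `out`
 * (capacity `cap`). Returns the number of lines consumed, or 0 if the
 * heuristic does not apply or the result would not fit. */
static size_t join_url_lines(const char *const *lines, size_t count,
                             size_t start, char *out, size_t cap)
{
    assert(lines != NULL && out != NULL && cap > 0);
    if (start >= count || strncmp(lines[start], "http://", 7) != 0)
        return 0;
    if (start > 0 && lines[start - 1][0] != '\0')
        return 0;                       /* no blank line above */

    size_t used = 0, i = start;
    for (; i < count && lines[i][0] != '\0'; i++) {
        size_t len = strlen(lines[i]);
        if (used + len + 1 > cap)
            return 0;                   /* would overflow `out` */
        memcpy(out + used, lines[i], len);
        used += len;
    }
    out[used] = '\0';
    return i - start;
}
```

One design consequence worth noting: any prose that follows a URL without an intervening blank line would be glued onto the link, so this heuristic trades one kind of mislinking for another.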
Correct, everything is "linked" except for the last character.  The line break 
seems to be irrelevant.
Reporter, send me a copy of the e-mail so I can confirm.
I too am seeing this problem. It appears to happen when the URL is the last item in
the message. For example, if the last line of the message was
http://www.dtu.ox.ac.uk, then the k would not display as part of the link. I'll
make this message an example also:

http://www.dtu.ox.ac.uk
Well, before someone else says it, I will: that worked fine. It must be something
different about the way the message is built. The one I'm seeing it on is from
http://www.bananaloto.co.uk. If you enter the draw it sends out a message the
following day telling you how you did. This is the one that goes wrong. If I
forward the problem message the problem goes away. Maybe it's missing the
trailing CR/LF or some such.
Confirming as Hugh forwarded me a copy of his e-mail message. In it, the URL,

http://lana.neomorphic.com/cgi-bin/trouble.pl?type=searchdetail&form=edit&serial=48

is displayed as a linked anchor with the exception of the final "8" character, which is not.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I forgot to mention that the bug is confirmed under both Mac/2001083008 and Mac/
2001080214 (0.9.3).
*** Bug 110434 has been marked as a duplicate of this bug. ***
I am not sure bug 110434 is a dupe. In that case the URL was simple, but at the
very end of a message. In this bug, the problem seems to be a more
complicated URL structure (possibly made worse by broken
quoted-printable encoding).

pi

Here's one:

news://news.mozilla.org:119/3CA615A6.C063DB82@webaccess.net

the last letter in the signature is not linkified
*** Bug 127840 has been marked as a duplicate of this bug. ***
Bug 127840 contains a hexdump of such an email; the cause seems to be that there is
only one CR/LF at the end of the mail instead of two.
Is 133016 a dup?
*** Bug 140970 has been marked as a duplicate of this bug. ***
problem exists in freebsd build 2002090806

echoing comment #11 - this seems to be something to do with how
the message is built: in particular the headers. i sent a version
of an example email back to myself and the link showed up fine.
the only differences between the original email and the one i
resent to myself were in the headers.

stripping out the smtp server timestamps, here are the differing
headers:

one that has the problem:

< X-MimeOLE: Produced By Microsoft Exchange V6.0.5762.3
< content-class: urn:content-classes:message
< MIME-Version: 1.0
< Content-Type: text/plain;
<       charset="iso-8859-1"
< Content-Transfer-Encoding: quoted-printable
< Subject:
< Date: Wed, 11 Sep 2002 12:29:42 -0400
< Message-ID: <89D097CC1D003643BB9F13714BA9723B027816C8@OCCLUST01EVS1.snip>
< X-MS-Has-Attach:
< X-MS-TNEF-Correlator:
< Thread-Index: AcJZsFeTR0sb4cD/EdaiqQDQtwj42Q==
< From: <snipped>
< To: <snipped>

the one that was fine:

> Date: Wed, 11 Sep 2002 12:35:58 -0400 (EDT)
> From: <snipped>
> Message-Id: <200209111635.g8BGZwC66237@snip>
> To: <snipped>
>

note that the one that worked also has an additional newline after
the end of headers.
folks, isn't this an important enough problem to act on? it was
created 7/10 and is still NEW. outlook (which unfortunately is
used by lots of people) has a "send web link" option that sends
the URL of the web page in the message and no text after.
every time i receive such a message i am unable to click on the
link to view the page. i can live with it, but i would guess
this would be a major annoyance for end-users?
Guys, it's been a long time on this one.  Any progress at all?  I'm using build
200392250 on MacOSX and it STILL has this problem.
*** Bug 185377 has been marked as a duplicate of this bug. ***
Narrowing summary, assuming this was really a dup.
Summary: Some URL's in mail messages are not completely converted to link → URL recognition at the end of line in QP (quoted-printable) misses last character
Hi 
I have the same issue

Here is another test case:

Source of message:
From - Thu Apr 03 09:42:52 2003
X-UIDL: $PV"!f3l"!TnP!!p77!!
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Envelope-to: olivier.vit@duke-interactive.com
Received: from [10.42.0.3] (helo=assurancetourix)
	by mail.duke-interactive.com with esmtp (Exim 3.35 #1 (Debian))
	id 190t6X-0001TU-00
	for <olivier.vit@duke-interactive.com>; Thu, 03 Apr 2003 03:01:21 +0200
Received: from assurancetourix ([127.0.0.1])
	by assurancetourix with esmtp (Exim 3.35 #1 (Debian))
	id 190t5Y-0006JL-00
	for <olivier.vit@duke-interactive.com>; Thu, 03 Apr 2003 03:00:20 +0200
Message-ID: <7832149.1049331620422.JavaMail.root@assurancetourix>
From: intranet@duke-interactive.com
To: olivier.vit@duke-interactive.com
Subject: feuille de temps
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="----=_Part_31_162178.1049331620403"
Date: Thu, 03 Apr 2003 03:00:20 +0200
X-MailScanner: Found to be clean, Found to be clean
X-UIDL: $PV"!f3l"!TnP!!p77!!
Status: U

------=_Part_31_162178.1049331620403
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Bonjour,=20

=09=09Tes feuilles de temps n'ont pas =E9t=E9 remplies.
=09=09Merci de les compl=E9ter.

=09=09Pour cela, il te suffit de cliquer sur le lien :=20

=09=09 http://intranet.duke
------=_Part_31_162178.1049331620403--

using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3) Gecko/20030312

I'm adding a screenshot
reproduced in 1.4b windows 2003050714
Can't someone take a crack at this, please?  It's been years!  Still showing up
in 20030826 nightly build for MacOSX, and I'm sure in others.
This bug also affects Firebird for Linux, at least the trunk builds.
I'm sorry, I meant Thunderbird.
Coming up on 3 years for this problem ...
still an issue in 1.7b 2004042409 (post rc1)
Is anyone from Mozilla still associated with this bug?  I note that the QA Contact seems to be an
invalid address.
*** Bug 242695 has been marked as a duplicate of this bug. ***
Dup bug 242695 contains good descr.
Whiteboard: See dup bug 242695 for good descr
could those of you who are cc'ed on this bug please vote for it, if
you think it's important? the really annoying part of the bug for me
is that the behaviour chops the "l" off of ".html", making the link
look like "file.htm". the latter is the 3-letter suffix format used
by microsoft!!! seems like we are [inadvertently] promoting microsoft
conventions.
> could those of you who are cc'ed on this bug please vote for it

Asking others to vote will only make me treat the votes as faked (they are out
of balance in comparison to other bugs). I'm not dumb, I can see the cc list.

FYI, I consider this to be a bug and I want to have it fixed (I don't like bugs
in my code), but I haven't had the motivation to fix any bugs in this code
(libmime and mozTXT*) at all lately. Sorry.
Assignee: sspitzer → ben.bucksch
Attachment #147760 - Attachment description: sample message where the url is badly truncated at the end of the message → sample message where the url is badly truncated at the end of the message correponds to the screenshot also attached to this bug report
ben,

not calling you dumb. i was not aware that canvassing for votes
is disallowed (i have seen others do it, hence my attempt). my
apologies if that is the case. i do not know exactly how the
mozilla organization (for lack of a better word) treats votes.
i am assuming that the votes against a bug will be given some
consideration w.r.t bug priority. many of the users on the Cc:
list may not know about voting, and i was hoping to point them
to it.

i am looking at the source too, to see if i can perhaps find the
problem and fix it.
I think bug #242817 ("Last character of QP message is displayed in a new line")
has something to do with this bug.  Maybe the same base problem in the code
results in both bugs.  Bug #242817 should be checked after fixing this one.
Bug 242817 (or anything like it) could very well be the cause of this bug, in
which case this bug is a dup of that one.
*** Bug 242933 has been marked as a duplicate of this bug. ***
Ok, this is as small as it gets. It is a (very cropped) message from Pine.

I poked around a bit in the code, and confirmed that the txt2html stuff was ok.
It is simply the QP decoder that splits the last line in two parts instead of
one. The url finder is then fed the two parts after each other, and of course
only the url in the first part gets proper mark up. There are even a couple of
comments in the code that each line must be passed on as one whole chunk to
avoid stuff like this. :-) 
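Why splitting the line breaks linkification can be shown with a toy recognizer. This is a hypothetical, heavily simplified stand-in (find_url and longest_url_in_chunks are not the mozTXT* functions): it scans each chunk independently, so a URL that straddles a chunk boundary is cut there, exactly like the "serial=48" example confirmed earlier in this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a plain-text URL recognizer: finds the
 * first "http://" in `chunk` and returns the length of the URL, which
 * runs until whitespace or the end of the chunk. Stores the URL's
 * offset in *start; returns 0 if the chunk contains no URL. */
static size_t find_url(const char *chunk, size_t *start)
{
    assert(chunk != NULL && start != NULL);
    const char *p = strstr(chunk, "http://");
    if (p == NULL)
        return 0;
    *start = (size_t)(p - chunk);
    size_t len = 0;
    while (p[len] != '\0' && p[len] != ' ' && p[len] != '\t' &&
           p[len] != '\r' && p[len] != '\n')
        len++;
    return len;
}

/* Feed one logical line to the recognizer as `nchunks` separate
 * pieces and return the longest URL found within any single piece.
 * A URL that straddles two pieces is cut at the boundary. */
static size_t longest_url_in_chunks(const char *const *chunks, size_t nchunks)
{
    size_t best = 0;
    for (size_t i = 0; i < nchunks; i++) {
        size_t start;
        size_t len = find_url(chunks[i], &start);
        if (len > best)
            best = len;
    }
    return best;
}
```

With the whole line as one chunk the URL is found in full; fed as "...serial=4" followed by "8", the recognized URL is one character short and the "8" is left as plain text.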

The last line splitting probably happens always, it is just that one doesn't
notice it unless it is an url. It is likely that bug 242817 is caused by this
somehow, but that symptom is very different in that it gets an additional
linebreak inserted. This one (with the minimal testcase) should be easier to
debug - hopefully it resolves both.
If one cares, the testcase can get one line smaller by removing the char above
the url (just used it to provoke QP in the first place).
:-)
After getting over the not-so-low threshold of understanding the MIME code, I think
I get why this happens.

The QP decoder parses what it can, and then leaves the rest (one or two chars)
in a buffer waiting for more data. When it is destroyed, this buffer is appended
to what was finished before. The problem here is that the URL recognition is
done before the decoder destruction.
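The "held-back tail" described above can be modeled with a small streaming decoder. This is a hypothetical sketch, not libmime's decoder: it holds back only bytes that might begin an escape ("=", "=X") or a soft line break ("=\r"), which is one reason a QP decoder cannot always emit the last byte or two of its input immediately. The important contract is that qp_stream_flush (a hypothetical name) must run before the decoded text is treated as complete.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hexadecimal digit, or -1. */
static int qp_hex(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical streaming quoted-printable decoder state. Bytes that
 * might start an escape or a soft line break wait in `pending`
 * across calls until the next byte arrives (or until flush). */
typedef struct {
    char pending[2];
    size_t npending;
} qp_stream;

/* Copy the held-back bytes to `out` verbatim and clear them. */
static size_t qp_release(qp_stream *s, char *out)
{
    size_t n = s->npending;
    memcpy(out, s->pending, n);
    s->npending = 0;
    return n;
}

/* Decode `len` bytes of `in` into `out`, which needs room for
 * len + 2 bytes. Returns the number of bytes written; up to two
 * trailing bytes may remain in `s->pending`. */
static size_t qp_stream_write(qp_stream *s, const char *in, size_t len, char *out)
{
    assert(s != NULL && out != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (s->npending == 1) {                       /* holding "=" */
            if (c == '\n') { s->npending = 0; continue; }  /* "=\n" soft break */
            if (c == '\r' || qp_hex((unsigned char)c) >= 0) {
                s->pending[s->npending++] = c;
                continue;
            }
            n += qp_release(s, out + n);              /* lone "=": pass through */
        } else if (s->npending == 2) {                /* holding "=X" or "=\r" */
            if (s->pending[1] == '\r' && c == '\n') { s->npending = 0; continue; }
            int hi = qp_hex((unsigned char)s->pending[1]);
            int lo = qp_hex((unsigned char)c);
            if (hi >= 0 && lo >= 0) {
                out[n++] = (char)(hi * 16 + lo);
                s->npending = 0;
                continue;
            }
            n += qp_release(s, out + n);              /* malformed: pass through */
        }
        if (c == '=')
            s->pending[s->npending++] = c;
        else
            out[n++] = c;
    }
    return n;
}

/* Emit whatever is still held back. Must run before the decoded text
 * is treated as complete (e.g. before URL recognition on the last line). */
static size_t qp_stream_flush(qp_stream *s, char *out)
{
    return qp_release(s, out);
}
```

If the consumer processes the last line before calling the flush (as in the shutdown sequence described next), the released bytes arrive too late and are handled as a separate piece.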

Decoder shutdown sequence currently:
1. MimeInlineTextPlain_parse_eof
2. MimeInlineText_parse_eof
3. MimeInlineText_rotate_convert_and_parse_line
4. MimeInlineTextPlain_parse_line
5. MimeLeaf_parse_eof
6. MimeDecoderDestroy

In (1), we have an explicit comment saying we need to go up to make sure we have
emptied all buffers. In (2), we have an explicit comment saying that we avoid
just that
(http://lxr.mozilla.org/mozilla/source/mailnews/mime/src/mimetext.cpp#234). Thus
we end up in (4) where we find the faulty URL. Then (5) triggers (6), where the
last chars are added to the buffer...

If we change (2) to also go up, we get this instead:
a. MimeInlineTextPlain_parse_eof
b. MimeInlineText_parse_eof
c. MimeLeaf_parse_eof
d. MimeDecoderDestroy
e. mime_LineBuffer
f. MimeInlineText_rotate_convert_and_parse_line
g. MimeInlineTextPlain_parse_line

In (g) we handle the whole line at once!
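The effect of the reordering can be illustrated with a toy line buffer. This is a hypothetical model, not the libmime call chain: finish_buggy mirrors the original sequence, where the unterminated last line reaches the line parser (step 4) before the decoder's held-back bytes are released (step 6), while finish_fixed mirrors the proposed order, where the decoder is flushed first and the parser sees the whole line at once (step g).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LINES 8
#define LINE_CAP 128

/* Hypothetical line buffer: accumulates decoded bytes and hands each
 * complete line (or, at EOF, the unterminated remainder) to the line
 * parser, modeled here as appending to `lines`. */
typedef struct {
    char partial[LINE_CAP];
    size_t plen;
    char lines[MAX_LINES][LINE_CAP];
    size_t nlines;
} line_buffer;

static void emit_line(line_buffer *lb)
{
    assert(lb->nlines < MAX_LINES);
    memcpy(lb->lines[lb->nlines], lb->partial, lb->plen);
    lb->lines[lb->nlines][lb->plen] = '\0';
    lb->nlines++;
    lb->plen = 0;
}

static void lb_write(line_buffer *lb, const char *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (data[i] == '\n') { emit_line(lb); continue; }
        assert(lb->plen + 1 < LINE_CAP);
        lb->partial[lb->plen++] = data[i];
    }
}

/* End of input: hand over any unterminated last line. */
static void lb_eof(line_buffer *lb)
{
    if (lb->plen > 0)
        emit_line(lb);
}

/* Original order: the parser gets the last line first, and the
 * decoder's held-back `tail` only arrives when it is destroyed,
 * so it is parsed as a separate line. */
static void finish_buggy(line_buffer *lb, const char *tail)
{
    lb_eof(lb);
    lb_write(lb, tail, strlen(tail));
    lb_eof(lb);
}

/* Proposed order: flush the decoder's tail into the buffer first,
 * then hand the complete last line to the parser. */
static void finish_fixed(line_buffer *lb, const char *tail)
{
    lb_write(lb, tail, strlen(tail));
    lb_eof(lb);
}
```

With the buggy order the final "8" of the URL becomes its own line; with the fixed order the parser receives "...serial=48" in one piece.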

If it weren't for the comment mentioned above, I would be satisfied. One problem
is that the bug # cited either is wrong or is in Netscape's bugtool. It would be
nice to have more testcases to see if anything (and what) breaks...
Ok, here is a simple patch for testing - call the parent before continuing.

This needs to be tested with more examples, primarily rot13 messages. Either
this patch is the wrong approach, or there should be a new patch that also
updates the comment in the code...

Comments from those who know the code?
Thanks for the investigation (and patch)! Great work :)

I don't think there's *any* active Mozilla contributor who really understands
libmime in these depths. This is *very* old code originally from jwz. The bug
reference in the comment probably still refers to the old Netscape 4.x bug
database. I don't understand the possible side effects your patch could have.
ducarroz is the owner (I think), so could you, J-F, review? Seth superreview?

anlan, did you test the View|Body|Simple HTML and As Plaintext modes? They
inherit indirectly from mimetext, and I had to fiddle with parse_line and _eof,
so this is a possible area of regression.
Assignee: ben.bucksch → anlan
Component: Mail Window Front End → MIME
Attachment #150361 - Flags: superreview?(sspitzer)
Attachment #150361 - Flags: review?(ducarroz)
This is a simple but scary patch, as it's hard to figure out the potential side
effects. Have you tested it against bug 124941 to make sure it does not regress
it? Also, does this patch fix bug 242817?
Yes, this fixes bug 242817 as well. The common thing is that messages are
QP-encoded. I suspect that the difference that makes messages in that bug
receive an extra linebreak is that they are "format=flowed".

Simple HTML / Plain text seems fine so far. I'll investigate bug 124941 and a
few other examples tomorrow.
Status: NEW → ASSIGNED
*** Bug 242817 has been marked as a duplicate of this bug. ***
Attachment #147760 - Attachment filename: testcaseQT.txt → testcaseQT.eml
Attachment #147760 - Attachment mime type: text/plain → message/rfc822
Thanks for pointing me to bug 124941. The patch does not quite regress it, but I
think something happens... Scary patch with subtle changes, wasn't it? :-) Oh
well, I'll dig deeper. Any other testcases while I'm at it?
Attachment #150361 - Attachment is obsolete: true
Attachment #150361 - Flags: superreview?(sspitzer)
Attachment #150361 - Flags: review?(ducarroz)
Attached patch: Better patch
Ok, here is a second try. This patch looks a bit larger, but that is because it
touches three files, refactors some code and updates the relevant comments.
There is really only one new line of code (in MimeText_parse_eof()).

The problem is that we need both to close down the QP decoder (as in the first
patch) _and_ to do charset detection/conversion (which might fail with the first
patch).

This is solved by refactoring and exposing the decoder destruction in MimeLeaf
so we can access it from MimeText without changing the codepath from bug
124941. Is this an acceptable modification?
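The refactoring hinges on making decoder shutdown safe to call more than once, which is the pattern in the snippet quoted in review below: destroy the decoder if present, then clear the pointer. A minimal sketch of that pattern follows; the types leaf and decoder and the functions decoder_destroy and leaf_close_decoder are hypothetical stand-ins, not the real MimeLeaf structures or MimeDecoderDestroy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real libmime structs. */
typedef struct {
    size_t npending;          /* bytes the decoder is still holding back */
} decoder;

typedef struct {
    decoder *decoder_data;    /* NULL once the decoder has been closed */
} leaf;

static int destroy_count = 0; /* lets a test detect double destruction */

/* Stand-in for destroying a decoder. A real decoder would emit its
 * held-back bytes before being freed. Returns 0 on success. */
static int decoder_destroy(decoder *d)
{
    destroy_count++;
    free(d);
    return 0;
}

/* Close the leaf's decoder if it is still open. Because the pointer
 * is cleared, a second call is a harmless no-op: the text layer can
 * close the decoder early (flushing held-back bytes before its own
 * end-of-file processing) and the leaf's later cleanup still works. */
static int leaf_close_decoder(leaf *l)
{
    assert(l != NULL);
    if (l->decoder_data != NULL) {
        int status = decoder_destroy(l->decoder_data);
        l->decoder_data = NULL;
        return status;
    }
    return 0;
}
```

Calling leaf_close_decoder once from the text layer and again from the leaf's end-of-file handling destroys the decoder exactly once.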
Attachment #150522 - Flags: review?(ducarroz)
Comment on attachment 150522
Better patch

Looks good. R=ducarroz
Attachment #150522 - Flags: superreview?(bienvenu)
Attachment #150522 - Flags: review?(ducarroz)
Attachment #150522 - Flags: review+
Comment on attachment 150522
Better patch

there's an indentation problem here that should be fixed before checkin

+  if (leaf->decoder_data)
+    {
+      int status = MimeDecoderDestroy(leaf->decoder_data, PR_FALSE);
+      leaf->decoder_data = 0;
+      return status;
+    }
Attachment #150522 - Flags: superreview?(bienvenu) → superreview+
The indentation is a bit odd (and inconsistent) in those files, but I hope this
is an improvement.

Could someone with CVS access take care of getting this in?
I haven't tested the patch - I just want to add an observation about the
original problem:

I tried hand-sending (using telnet to port 25) a mail containing only an email
address. The problem described in this bug occurred when I used
Content-Type: text/plain; charset="ISO-8859-1" and Content-Transfer-Encoding:
quoted-printable. However, it did not occur when either of these headers was
left out.
It is a confirmed QP problem for messages ending without a newline. For
text/plain, the only problem is that URL recognition in the last line will
break. For format=flowed, the problem is worse: one or more characters from the
last line end up after an extra linebreak, which is not pretty at all.

I have used the patch for the last couple of weeks without any noticeable
regressions. The only thing it needs is someone to get it into CVS for further
testing. It would be nice to get it into Thunderbird 1.0 as well...
Checked in on trunk by timeless.
Fix also on aviary branch as of today. Marking as fixed.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
*** Bug 256967 has been marked as a duplicate of this bug. ***
*** Bug 252292 has been marked as a duplicate of this bug. ***
*** Bug 225552 has been marked as a duplicate of this bug. ***
Product: MailNews → Core
*** Bug 241811 has been marked as a duplicate of this bug. ***
*** Bug 140831 has been marked as a duplicate of this bug. ***
Product: Core → MailNews Core