Open Bug 344288 Opened 18 years ago Updated 2 years ago

Long URLs sent from mail.app are not linkified properly

Categories

(MailNews Core :: MIME, defect, P3)

Tracking

(Not tracked)

People

(Reporter: mscott, Unassigned)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [patchlove][has draft patch][gs])

Attachments

(2 obsolete files)

I'm sure this is a dupe but I can't find it. Asa pointed this out to me a while ago.

If someone sends you an e-mail from mail.app using format=flowed, with a long url, mail.app splits the url with a line break.

we properly fix the line break so the URL isn't broken when you read it, but we don't linkify the entire URL, just the part that was on the first line.

This really should be fixed by mail.app, they shouldn't be breaking urls. But we can try to make our plain text format flowed parser a little smarter.
Attached patch work in progress (obsolete) — — Splinter Review
Blake helped me figure out what was going on here. When we encounter a line break while processing inside a format=flowed line, we need to buffer it up until we finish processing the entire line. Then call ScanTxt on the entire flowed line.

This patch fixes my test cases, but I need to study the code more before I'm comfortable with it. I can only take libmime in small doses.
James, the format flowed guru, can you help us out here?

One interesting thing that jumped out at Blake and I in the patch I added to this bug:

I added back the code to null out the CR and LFs when delsp=true that you originally added and then took back out:

https://bugzilla.mozilla.org/show_bug.cgi?id=322089#c6

We aren't sure about the ramifications of leaving it. 

You might also have a better way for us to fix this. 
Re: comment 2, the buffering approach you describe sounds about right; I don't have any better suggestions. I'm sure if the code were rewritten from scratch there would be more efficient ways to do some things, but a rewrite would take a lot more work, of course.

As for putting the \r and \n back on the end of the line, as in the patch for 322089, I don't know what the effect is, if any. I just assumed it was the right thing to do because (unless I misread the code) a flowed line with delsp=no gets sent along to the next step with its newline intact, so a flowed line with delsp=yes should also be sent along with a newline--the only difference between a flowed line with delsp=no and a flowed line with delsp=yes should be the logical deletion of the space character at the end of the line.
Thanks for jumping in James. So it looks like we started stripping out the CRLFs  for delsp=yes, in July of 2005. The code to leave it alone just went into the trunk a few months ago, the branch still removes the CRLFs. 

So this patch keeps that inconsistency between delsp=yes and delsp=no that we've been shipping with since 2005. That doesn't make it right though :)

I'm not sure how to figure out what the right answer is here. 

James, can you review the rest of the patch and see if there are any other changes in it that look worrisome? 

Comment on attachment 228849 [details] [diff] [review]
work in progress

Blake said he'd help review this too.
Attachment #228849 - Flags: review?(mrbkap)
At line 349, you took out the code to decrement the length ("length--;"). I think that is needed, because you look at the length again later (the length-- decrement code was there in v.1.53 of July 2005).

A couple lines down from there, I think the code that appends the line to the buffer should appear outside the conditional block, i.e. it should look like :
       line[index] = '\0';
     }
     // flowed text should be buffered . . .
so that the buffering still works if format=flowed and delsp=no. We also need to check for the sigSeparator boolean again before doing the buffering, so that we do not treat a Usenet signature separator as a flowed line. (bug 322089 and RFC 3676)

Does this properly handle malformed messages, e.g. a message consisiting of only flowed lines with no fixed line at the end?

Re: to chop or not to chop the \r\n, my instinct would be that the right thing to do would be to chop it off in both cases, delsp=yes and delsp=no, but I don't know the text-to-HTML conversion code well enough to say whether that's right. When I wrote the patch I was just trying to take care of delsp and not tamper too much with the surrounding code.

(In reply to comment #4)
> Thanks for jumping in James. So it looks like we started stripping out the
> CRLFs  for delsp=yes, in July of 2005. The code to leave it alone just went
> into the trunk a few months ago, the branch still removes the CRLFs. 
> 
> So this patch keeps that inconsistency between delsp=yes and delsp=no that
> we've been shipping with since 2005. That doesn't make it right though :)
> 
> I'm not sure how to figure out what the right answer is here. 
> 
> James, can you review the rest of the patch and see if there are any other
> changes in it that look worrisome? 
> 
Attached patch updated fix based on feedback from James (obsolete) — — Splinter Review
I think I addressed your comments James, what do you think of this patch?
Attachment #228849 - Attachment is obsolete: true
Attachment #228849 - Flags: review?(mrbkap)
Attachment #229997 - Flags: review?(mozilla.org)
I think the patch might break the handling of the quote-depth-wins rule described in RFC 3676 section 4.5--a line that should be treated as fixed under quote-depth-wins would be appended to exdata->flowedline even though it should not be. If so, it might be possible to remedy this by adding an additional test for linequotelevel == exdata->quotelevel in the line that checks whether the content is flowed (line 339 in the current code in lxr). I'll look into this further (and look over the whole patch again) when I have more time, if someone else can't get to it sooner.

(In reply to comment #7)
> Created an attachment (id=229997) [edit]
> updated fix based on feedback from James
> 
> I think I addressed your comments James, what do you think of this patch?
> 
I looked at the patch again--I don't think it handles quote-depth-wins correctly, and adding a test as described in comment 8 won't do it. If I am reading the RFC correctly, when you run into a line that requires application of the quote-depth-wins rule, you need to treat the *previous* line having ended a paragraph. Not sure what is the best way to do this. 

Also, I couldn't tell from reading the code how it would handle a malformed message that did not have a fixed line at the end of the file. Assuming the code handles that case properly, the rest of the patch looks good to me.

This bug basically bug 5351, the format=flowed part of it.

Great direction of patch. This is basically the idea of bug 5351 comment 17.

I haven't reviewed carefully, but a few comments:

That
 if((quoteleveldiff != 0) && flowed && exdata->inflow) {}
line is interesting. It seems to be doing absolutely nothing, just wasting CPU cycles. It's there since revision 1.1, though. I wonder, why no review (including me) catched that or what I'm missing.
The comment is important and should stay, though, unless the logic changes with this patch.

> Also, I couldn't tell from reading the code how it would handle a malformed
> message that did not have a fixed line at the end of the file. Assuming the
> code handles that case properly, the rest of the patch looks good to me.

Indeed. Basically, you need to be sure to always write out your buffer.
Possibly in ..._parse_eof(). This may mean that you need the "output (and convert) buffer" code in 2 places, i.e. need to be moved to a new function.
Blocks: 5351
thanks for the feedback Ben. If you have some time and the test cases, feel free to jump in and  finish this patch off. I could use the help.
(In reply to comment #4)
> Thanks for jumping in James. So it looks like we started stripping out the
> CRLFs  for delsp=yes, in July of 2005. The code to leave it alone just went
> into the trunk a few months ago, the branch still removes the CRLFs. 
> 
> So this patch keeps that inconsistency between delsp=yes and delsp=no that
> we've been shipping with since 2005. That doesn't make it right though :)
> 
> I'm not sure how to figure out what the right answer is here. 
> 

I just looked back at the RFC, <http://www.rfc-editor.org/rfc/rfc3676.txt>. It says:

   If the line is flowed and DelSp is "yes", the trailing space
   immediately prior to the line's CRLF is logically deleted.  If the
   DelSp parameter is "no" (or not specified, or set to an unrecognized
   value), the trailing space is not deleted.

   Any remaining trailing spaces are part of the line's content, but the
   CRLF of a soft line break is not.

That second paragraph seems to suggest that the \r \n should be stripped when format=flowed and delsp=yes.
The part I'm not certain about is whether the text-to-HTML converter is expecting that line break to be there--I think not, but I didn't go too deep into the code.
Comment on attachment 229997 [details] [diff] [review]
updated fix based on feedback from James

Making Ben the requestee--I'm happy to provide comments but I'm not really qualified to do a review
Attachment #229997 - Flags: review?(mozilla.org) → review?(ben.bucksch)
Well, see comment 11 and 12. There are serious issues with this patch, so it would have to be r-. Only thing I could do would look carefully for more issues and/or create my own patch based on it.
Ben, if you are interested in running with this, by all means go for it. It's a very annoying problem.
Yeah, it's annoying. Although I primarily blame Apple Mail.

I'd like to take it over, but don't have much time at the moment.
OS: Windows XP → All
Hardware: PC → All
I'd love to see this fixed (or Apple implement an option to turned off format=flowed, but I know which is going to happen first ;-)

Having read the above comments I can appreciate how complex this is; is there anything I can do to help encourage this along?
Anyone filed a bug about this for apple mail? (Do they have a bugtracker or something similar for it?)
Yes, they have one (https://bugreport.apple.com/ see http://www.symphonious.net/2005/07/06/how-to-report-bugs-to-apple/ for more info about how it works), but:
i) It doesn't let you search for existing bugs
ii) This isn't a bug, it's a feature request. Apple is correctly following the RFC, Thunderbird is failing to do so. Sadly, Apple is ruthless in following the RFC; there is no pref to turn it off, the only way to avoid it is to fail plain text (ie. send HTML mail and they don't get broken). Which frankly, is not a good workaround IMHO. The pain of this is that experienced users work it out just fine, but novice users just click blindly and think the URL is broken, so this is a bug (TBird) that bites novice users worst.
RFC 3676 says (on Generating Format=Flowed):
   A generating agent SHOULD:

   o  Ensure all lines (fixed and flowed) are 78 characters or fewer in
      length, counting any trailing space as well as a space added as
      stuffing, but not counting the CRLF, *unless a word by itself
      exceeds 78 characters*.

... so Apple should not wrap long URLs. 
I think the previous paragraph is even more incriminating:

   ... a generating agent SHOULD NOT insert a space in an unnatural
   location, such as into a word (a sequence of printable characters,
   not containing spaces, in a language/coded character set in which
   spaces are common).

So the definition of 'word' extends beyond pure alpha and incorportes
any printing character (ie. Apple couldn't argue that a slash was
a word separator).

Given the above information about how Apple handles bugs, I encourage
as many people who are affected by this to report it!
The links I see from my mail.app-using correspondent don't display in TB with a space at the break, it's just that the linking stops.  What mail.app is doing, it's doing correctly, it's just more bleeding edge than other clients.
(Not unlike the issue where TB's following a newer RFC to name attachments that doesn't work with Outlook Express 6 and whatever the corresponding Outlook Office version is.)

Note that TB will correctly linkify URLs spread over multiple source lines in a quoted-printable message body, but I assume the code joining *those* lines together takes place in a different location than the f=f reflow operation.  (Insert obligatory moan about the dire state of Mozilla's MIME implementation.)
apple's mail.app is not the only app that does send mails with this. i've seen at least one other app that does this.
Mike: one of the links in comment 24 says delsp was added between RFC2646 and RFC3676. Since RFC3676 says it shouldn't break long words, shouldn't apple follow the newer RFC3676 spec?

Not that we shouldn't be better at dealing with the current behaviour...
(In reply to comment #25)
> What mail.app is doing, it's doing correctly, it's just more bleeding edge than other clients.

Nope, it's violating the RFC (see #c22 and #c23). Apple have responded to my bug report with 'After further investigation it has been determined that this is a known issue, which is currently being investigated by engineering.  This issue has been filed in our bug database under the original Bug ID# 4530785'.

Note that this (TBird) bug will still need to be fixed regardless (because at present, it is correctly(?) removing the space, but not correctly linkifying the whole URL, ie. the behaviour is inconsistent at best, broken at worst)
(In reply to comment #0)
> I'm sure this is a dupe but I can't find it.

FWIW, this would be a duplicate of Bug 299975 (which was marked as a duplicate of Bug 5351).
This is a serious usability annoyance; marking as blocking thunderbird 3.
Assignee: mscott → nobody
Flags: blocking-thunderbird3+
Comment on attachment 229997 [details] [diff] [review]
updated fix based on feedback from James

Marking r- as per Ben's comment 15.
Attachment #229997 - Flags: review?(ben.bucksch) → review-
Retriaging according to new policy for flags.  

https://wiki.mozilla.org/Thunderbird:Release_Driving

Ben, would you be interested in taking this bug?

Flags: wanted-thunderbird3+
Flags: blocking-thunderbird3-
Flags: blocking-thunderbird3+
Priority: -- → P3
Target Milestone: Thunderbird2.0 → Thunderbird 3.0b1
Whiteboard: [needs new patch benb?]
Target Milestone: Thunderbird 3.0a3 → Thunderbird 3.0b1
Ben, is this something you'd be willing to take a run at?
Target Milestone: Thunderbird 3.0b1 → Thunderbird 3.0b2
Version: 2.0 → Trunk
Not anytime soon. My highest priority currently is autoconfig.
(Sorry for having missed the previous comments.)
FWIW, it's not trivial, with real risk of things going wrong including eating parts of lines etc., and it's in frequently used code paths for mail reading, so I'd be reluctant to push it in shortly before a release anyways.
OK, thanks.  Setting wanted-.
Flags: wanted-thunderbird3+ → wanted-thunderbird3-
(In reply to comment #36)
> FWIW, it's not trivial, with real risk of things going wrong including eating
> parts of lines etc., and it's in frequently used code paths for mail reading,
> so I'd be reluctant to push it in shortly before a release anyways.

benb, is autoconfig sufficiently finished that you'd be willing to have a go at this?
Component: General → MIME
Product: Thunderbird → MailNews Core
QA Contact: general → mime
Whiteboard: [needs new patch benb?] → [patchlove][has draft patch][gs]
Target Milestone: Thunderbird 3.0b2 → ---
Comment on attachment 229997 [details] [diff] [review]
updated fix based on feedback from James

Setting as obsolete, due to r-..
Attachment #229997 - Attachment is obsolete: true
another example url which is not parsed correctly:


http://www.eex.com/en/Market%20Data/Trading%20Data/Power/Phelix%20Futures=
%20|%20Derivatives#y

(c/p from msg source)

version: shredder 3.1.9pre
Mozilla/5.0 (X11; U; Linux x86_64; de-DE; rv:1.9.2.15pre) Gecko/20110207 Shredder/3.1.9pre

cheers,
raoul
mhm - if i correctly understand rfc2396, "|" should be properly encoded.

so it's actually a firefox issue, if a c/p of an url does not encode "|" into "%7C", right?
No, it's a website issue. And both have nothing to do with this bug here.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: