Closed Bug 448198 Opened 11 years ago Closed 3 years ago

format=flowed: Reply to non-f=f message containing certain characters breaks quoting

Categories

(Core :: Serializers, defect, major)

defect
Not set
major

Tracking

()

VERIFIED WORKSFORME

People

(Reporter: mcow, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: dataloss, regression, Whiteboard: [no l10n impact][needs updated patch with unit tests])

Attachments

(1 file, 3 obsolete files)

A plain-text but not format=flowed message which contains one of the characters
  &  <  >
(call these "entity" characters) directly adjacent to a space (before or after) will be misquoted on reply, if you are set up to send f=f mail.  A space character before the entity will be deleted; a space after the entity will be doubled.

Steps to reproduce:
1) Turn format=flowed off (set mailnews.send_plaintext_flowed to False)
2) Compose a plain-text message and send it to yourself.  Put this text in the message:
============
Ben & Jerry
Ben < Jerry
Ben > Jerry
Ben ; Jerry
Ben ' Jerry

Ben &Jerry
Ben <Jerry
Ben >Jerry
Ben 'Jerry
Ben ;Jerry

Ben& Jerry
Ben< Jerry
Ben> Jerry
Ben' Jerry
Ben; Jerry
============

2) Turn format=flowed back on (set the pref back to True).
3) When the message arrives, reply to it (in plaintext).  Examine the quoted text.

Actual results:
3) Spaces next to [&<>] chars are mangled:
============
> Ben&  Jerry
> Ben<  Jerry
> Ben>  Jerry
> Ben ' Jerry
> Ben ; Jerry
> 
> Ben&Jerry
> Ben<Jerry
> Ben>Jerry
> Ben 'Jerry
> Ben ;Jerry
> 
> Ben&  Jerry
> Ben<  Jerry
> Ben>  Jerry
> Ben' Jerry
> Ben; Jerry
============

Expected results:
3) Spaces are preserved:
============
> Ben & Jerry
> Ben < Jerry
> Ben > Jerry
> Ben ' Jerry
> Ben ; Jerry
> 
> Ben &Jerry
> Ben <Jerry
> Ben >Jerry
> Ben 'Jerry
> Ben ;Jerry
> 
> Ben& Jerry
> Ben< Jerry
> Ben> Jerry
> Ben' Jerry
> Ben; Jerry
============


This is trunk-only, not seen on 2.0.  I'm not sure, but I'm guessing this is fallout from one of Andriy's patches, probably the one at bug 155622.
Thank you Mike. I will check whether this is the regression caused by my patches.
Mike, you are right - this is the regression caused by my patch to bug 261467. Though, i can't say now which of the changes there caused this regression and why. TBC...
(In reply to comment #2)
> the regression caused by my patch to bug 261467.
sorry - not bug 261467, but bug 215068.
Blocks: 215068
Flags: blocking-thunderbird3+
Keywords: regression
Product: Core → MailNews Core
Attached patch patch1.0 (obsolete) — Splinter Review
It seems that this patch fixes the bug.
Attachment #332360 - Flags: review?(mscott)
Assignee: nobody → andrit
Whiteboard: [has patch]
Attachment #332360 - Flags: review?(mscott) → review?(bienvenu)
Target Milestone: --- → Thunderbird 3.0b2
Whiteboard: [has patch] → [has patch] [patch needs testing]
duh, can I even review this, since it's part of content? Bz, can you review this, or recommend someone who can?
Bz, see previous comment, thx!
I can probably do that, given a sufficiently clear explanation of what's going on (e.g. why just those characters are affected).
It's a long story, but i'll try to remember... :)

Here spaces are deleted by stringpart.Trim(" ", PR_FALSE, PR_TRUE, PR_TRUE); and appended by mCurrentLine.Append(PRUnichar(' ')); (see patch). It happens only on quoted text (i.e. preformatted) which is handled by first “major codepath” in nsPlainTextSerializer::Write() (see comment there).

While debugging this I noticed, that for some reason, when handling the text Write() receives not the full string line (for example, like “Ben & Jerry”) but it is called thee times with following arguments: “Ben ” – first, “&” symbol second and “ Jerry” and the end. That’s why space after Ben is trimmed and one space before Jerry is stuffed (code recognize it as start of new line). And this happens only with &, > and < symbols, in HTML-editing mode and with quoted text.

So to protect from such situations I added additional checks: to not trim if it is preformatted text in HTML-mode (IsInPre()) and to stuff space if it is not quotation (mCiteQuoteLevel == 0).

Now while explaining I recognized the flow of this patch. It won’t trim EOL spaces of Preformatted HTML text and thus lines will be jointed in format flowed capable viewers. What if to add the same check for quotation for trimming? I’m not sure we have to trim EOL spaces in quoted text.

Thanks for forcing me to explain things :)
Attached patch patch2.0 (obsolete) — Splinter Review
Here is new patch.
Attachment #332360 - Attachment is obsolete: true
Attachment #332360 - Flags: review?(bienvenu)
Comment on attachment 336995 [details] [diff] [review]
patch2.0

thx for the new patch. requesting review from bz - if that line is well over 80 chars, it might want to be wrapped, and we might prefer !mCiteQuoteLevel.

But I have a question - does your patch still work if you reply to the first reply in your test case, where the mCiteQuoteLevel will be 1?
Attachment #336995 - Flags: review?(bzbarsky)
(In reply to comment #10)
> But I have a question - does your patch still work if you reply to the first
> reply in your test case, where the mCiteQuoteLevel will be 1?

David, what do you mean?
I'll be honest; I really don't understand all the bits here well enough to review this...  I don't even have a good feel for what this code is trying to do, much less what the edge cases are.
I tried the patch, but unfortunately it doesn't seem to work. 

The simple way to see the bug is to hit reply - plain text composition - on a bug mail, then notice the space wrongness around the < and >. (On both sides.)

E.g. for the previous comment. 

original:      (todo: 200+ items)<bzbarsky@mit.edu>  2008-09-26
reply:         (todo: 200+ items)<bzbarsky@mit.edu>   2008-09-26
patched:       (todo: 200+ items)<bzbarsky@mit.edu>   2008-09-26
(In reply to comment #12)
> I'll be honest; I really don't understand all the bits here well enough to
> review this...  I don't even have a good feel for what this code is trying to
> do, much less what the edge cases are.

Boris, thank for for honesty :) Did you have a chance to read my comment #8 where I tried to explain the bits?
Yes, and then you posted a new patch different from the previous one...

Also, what about comment 13?
Attached image patch2.0 result screenshot (obsolete) —
Boris, in comment #8 i tried to explain the problem with patch1.0, that's why patch2.0 appeared. About comment 13 - no idea. The patch2.0 works for me, see screenshot attached. Magnus, could you provide more details? Thank you.
The details are pretty simple: apply the patch, hit reply on any bug mail and look at the --- Comment #16 from line.
Magnus, at what to look?
In the quote the line looks like 
> --- Comment #18 from Andriy Tkachuk<andrit@ukr.net>   2008-09-29

Although it should be

> --- Comment #18 from Andriy Tkachuk <andrit@ukr.net> 2008-09-29
Ah, right. I see now. Thanks.

Well, it should be fixed probably from another side. First, it should be investigated, why it works in following way:

(In reply to comment #8)
> While debugging this I noticed, that for some reason, when handling the text
> Write() receives not the full string line (for example, like “Ben & Jerry”) but
> it is called thee times with following arguments: “Ben ” – first, “&” symbol
> second and “ Jerry” and the end. That’s why space after Ben is trimmed and one
> space before Jerry is stuffed (code recognize it as start of new line). And
> this happens only with &, > and < symbols, in HTML-editing mode and with quoted
> text.
Attachment #336995 - Attachment is obsolete: true
Attachment #336995 - Flags: review?(bzbarsky)
Attachment #340916 - Attachment is obsolete: true
Presumably because the HTML escaping needed means breaking up the string and feeding it in a bit at a time.
(In reply to comment #20)
> > And this happens only with &, > and < symbols, in HTML-editing mode and with
> > quoted text.

As appeared, not only. Replying to notification mail from Bugzilla in plain-text mode also affected.
Whiteboard: [has patch] [patch needs testing] → [needs new/working patch]
moving to b2
Target Milestone: Thunderbird 3.0b1 → Thunderbird 3.0b2
As reported in bug 456053, reply to f=f plain-text messages unflows the quote, which appears to be related to the problem here. Andriy has posted a patch in that other bug, which includes a patch (update?) for this bug as well, but that was 2008-09-22, thus may be obsoleted by now with the discussion here.
Andriy: any update on this? Would be good to get this moving asap, especially as it affects core code.
Sorry guys - pretty tough schedule now...
Attached patch patch3.0Splinter Review
Magnus, please try this patch. Pay attention also that following bugs are still fixed with this patch: bug 125928, bug 215068, bug 261467. Thank you!
Comment on attachment 353384 [details] [diff] [review]
patch3.0

I'll run with this patch, but we need to get a module owner to review it...
Tried the patch, the previous problem is gone, however

I sent myself
<><><><>
< > < > < >
><><><
> < > < > < 

Replying I get
> <><><><>
> < > < > < >
>  ><><><
>  > < > < > <

... so I think there's still some minor problem there. Thanks for working on this!
Andriy, any idea about the problem Magnus saw? In particular, I guess there's an extra space after the '>' on the last two lines.
moving to b3, though we'd certainly take a patch for b2
Target Milestone: Thunderbird 3.0b2 → Thunderbird 3.0b3
Since this is a pure Gecko bug, in addition to fixing previously mentioned bug, it will also be necessary to write unit tests.  I can imagine two possible ways for this get approved for the appropriate Gecko branch:

1) A new patch with unit tests gets posted, reviewed, and landed on the trunk quickly.  We might then be able to convince the 1.9.1 drivers to allow this fix as a ridealong if there's another release candidate for Firefox 3.5 after RC1.

2) This happens more slowly, and we land it after Firefox 3.5 ships (for 1.9.1.1).  There's some risk that Thunderbird could ship before 1.9.1.1 does, however.

Andriy, what sort of time are you likely to be able to devote to this soonish?
Component: Composition → Serializers
Flags: blocking-thunderbird3+
Product: MailNews Core → Core
QA Contact: composition → dom-to-text
Whiteboard: [needs new/working patch] → [needs updated patch with unit tests] [tb3needs]
Target Milestone: Thunderbird 3.0b3 → ---
Flags: in-testsuite?
Status: NEW → ASSIGNED
(In reply to comment #32)
> 2) This happens more slowly, and we land it after Firefox 3.5 ships (for
> 1.9.1.1).  There's some risk that Thunderbird could ship before 1.9.1.1 does,
> however.

Well, this obviously missed that milestone and won't make 1.9.1.2 either.
So, what's the fallback if a fix doesn't come in? Backing out the patch for
bug 215068 again or living with its issues? Sounds like picking the lesser
of two evil... :-\
I'd be somewhat surprised if the Gecko folks allowed us to back out something that old without tests as well, so we may just need to push forward and get this fixed somehow.  

Andriy, would you be able to make some time to address the problem in comment 29 and put together some unit tests in the near future?
Keywords: dataloss
Whiteboard: [needs updated patch with unit tests] [tb3needs] → [needs updated patch with unit tests] [tb3needs][no l10n impact]
Whiteboard: [needs updated patch with unit tests] [tb3needs][no l10n impact] → [no l10n impact][needs updated patch with unit tests] [tb3needs]
Andriy and I had a short email exchange, and he said that the code is tricky enough that he has given up on trying to unregress this.  He recommends backing out bug 215068.

I notice that philor only has one blocker, so I'm hoping he'll be interested in picking this up, figuring out whether a backout is actually practical at this point, and either driving that or proposing another strategy...
Assignee: andrit → philringnalda
Please don't back out.  This bug is annoying, but not as annoying as the bug that the patch fixed.
I see two possible strategies:

One: I drop everything else, patches and reviews both, and spend all my time from now to shipping on first learning about f=f, and then tracking down every fix we've made to this code, so I know what output we want from what input so I can stick it into tests, then fill out the rest of the tests for things we got right in the first place and haven't yet broken, then learn the code well enough to patch it, then find someone willing to review it even though in the past pretty much nobody has been willing to do so, and then in the end find myself unwilling to lie to the 1.9.1 drivers, so that I have to tell them that this is in fact an API that already shipped, but because I like us better than I like a hypothetical extension which is using that API and already working around the broken bits, I want to change that shipped API on a stable branch and break that hypothetical extension, at which point they (better) say no.

Two: we drop the tb3needs, accept that we already slipped this bug late last spring when 1.9.1 first froze, and start out right now to persuade the people who already know about f=f and care about this bug to work up the testcases (without needing to write the automated tests, just needing to create a single comprehensive list saying that "- --" in format X quoted into format Y must be output as "(whatever it should be)" and in format Y quoted into format Y must be whatever else, for all the tens or hundreds of cases), so that we can actually fix it for whatever version of Tb we next ship off a Gecko that isn't yet frozen.

(Yeah, there's a third strategy, where we either fork the serializer or do our own post-processing workarounds, but I'm not persuaded that this bug, which has garnered zero duplicates over the course of almost two years, actually calls for that sort of really terrible idea.)
Assignee: philringnalda → nobody
Status: ASSIGNED → NEW
Phil's reasoning seems sound.  This code feels fragile to touch at this late date anyway, so removing tb3needs.
Whiteboard: [no l10n impact][needs updated patch with unit tests] [tb3needs] → [no l10n impact][needs updated patch with unit tests]
Could there be any relationship with <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=519165"> bug 519165 ?
Blocks: 519165
Blocks: 541978
Note that not only replies are affected, bug 541978 states a reproducible problem related to the issue here when composing a message from scratch and
a downconversion from HTML to plain text occurs upon sending...
Duplicate of this bug: 519165
Duplicate of this bug: 549959
(In reply to comment #0)
> Actual results:
> 3) Spaces next to [&<>] chars are mangled:
> ============
> > Ben&  Jerry
> > Ben<  Jerry
> > Ben>  Jerry
> > Ben ' Jerry
> > Ben ; Jerry

Plain Text mode composition utilizes <pre> style internally.  
As bug 541978 and bug 597181 is HTML/<pre> case with format=flowed, phenomenon becomes obvious. (Note: Options/Format/Plain and Rich(HTML) Text is required to see text/html part due to Bug 414299.)

Check result with Tb 3.1.7 on Win-XP.
> Content-Type: multipart/alternative; boundary="..."
>
> --...
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> P<    Q&&...&&    X>    Y
>
> --...
> Content-Type: text/html; charset=UTF-8
>
>(snip, three spaces between words)
> <pre>P   &lt;   Q   &amp;&amp;...&amp;&amp;   X   &gt;   Y
> </pre>

Phenomenon looks:
 - spaces before character entity is removed.
 - a space is added after word which starts with the character entity.
Blocks: 597181
Why the heck is this bug unfixed after 2.5 years?

Mangling messages is not an option.  I'm raising the severity of this.  If someone wants to drop it back down, they can add a comment explaining why.
Severity: normal → major
> Why the heck is this bug unfixed after 2.5 years?

Because there's no one around who understands the code involved, and no one has stepped up to learn it and fix it, other than Andriy.  And so far he hasn't written a correct patch yet.

If you're willing to work on this, that's great: you're in as good a position to do it as anyone else is.
Duplicate of this bug: 647897
No longer blocks: 536670
Duplicate of this bug: 536670
Blocks: 625961
Duplicate of this bug: 571502
Here is a thunderbird extension from Paolo Bonzini that you can install into "~/.thunderbird/<profile name>/extensions" while waiting for this bug to ever get fixed:

http://people.redhat.com/pbonzini/format-flawed.tar.gz
"It just disables format=flowed when replying to non-format=flowed messages.  It's ugly as hell, but it does the job."
A partial workaround is to select the original text, copy it to the clipboard, click reply, remove the quoted text and then insert as citation (Ctrl-Shift-O). Empty lines after lines already quoted in the original text seem to be dropped, but spaces around &, < and > are preserved.
Re comment #50:  Is this what you really want end-users to do?
(In reply to David E. Ross from comment #51)
> Re comment #50:  Is this what you really want end-users to do?

I'm just an end-user, and I won't jump through these hoops every time I reply to an email. I just used it as a last resort when commenting on a diff.

It's interesting that Thunderbird apparently has the ability to do the right thing built in, though. Perhaps this code path can be used to implement a solution or at least a band-aid fix?
Summary: f=f: Reply to non-f=f message containing certain characters breaks quoting → format=flowed: Reply to non-f=f message containing certain characters breaks quoting
Seems to work in Thunderbird 24.
Not for me. When I do selective quoting (and only then!), this still happens, e.g. with URLs. _Very_ annoying. (BTW, the _very_ would have been affected as well from this bug if this was a mail :-)
Indeed, while the original testcase works now for me, I can still reproduce this with a different testcase (and independent of selective vs. full quoting).  For example:

==================
      foo
-     bar
+     baz
      pig
==================

is all misaligned after replying.  My extension needs an update for TB24, too (gPrefBranch => Services.prefs).
I produced myself message with *no* f=f with the content described on comment #0 and comment #55.

I did full and partial replies with f=f and can't see a problem. That's using today's Daily.

Should this still be an issue, please open a new bug with a reproducible example.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME
Verified WFM.  Thanks, Jorg.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.