Closed Bug 772796 Opened 12 years ago Closed 9 years ago

Joining <div> with <pre> obliterates preformatting (Part 1: Fix MoveBlock ())

Categories

(Core :: DOM: Editor, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox43 --- fixed
firefox44 --- fixed

People

(Reporter: jcranmer, Assigned: jorgk-bmo)

References

Details

(Whiteboard: [tb-papercut])

Attachments

(5 files, 32 obsolete files)

923 bytes, text/html
Details
635 bytes, text/html
Details
3.63 KB, image/png
Details
4.04 KB, image/png
Details
21.38 KB, patch
roc
: review+
Details | Diff | Splinter Review
The best way to reproduce this is via Thunderbird's composer.

1. Start with a plaintext message that has a bottom-posted reply to it. mozilla.dev.platform is a good source of these messages.
2. Reply to this message. Verify that you have the HTML composer and that the style of the quoted comment is Preformatted.

The message should look like this:
On dada date, X said:
| On blah date Y said:
| | BLAH BLAH I AM LOUD
| |
| | OH I AM SO LOUD
|
| Oh no you didn't
|
| You have no idea what you are doing

3. Delete the entire inner quoted section, starting from the reply attribution to the end of that quote back (lines 2-5 in the above example).

4. Notice that the rest of the reply loses its preformatting, so all the other paragraphs bunch up into a single paragraph

Fixing this problem saves me a lot of pain from having to try to figure out how to trim quotes without turning the entire quote unreadable.
Hmm, I have no idea how the preformatting works there.  David, do you?
(In reply to Ehsan Akhgari [:ehsan] from comment #1)
> Hmm, I have no idea how the preformatting works there.  David, do you?

I don't know, but I don't think anything on the Thunderbird side has changed, and I've seen this bug as well. I don't think this changed when we did the file link stuff, but cc'ing mconley to be sure.
Hm, I could be wrong, but I don't believe our Filelink stuff could have caused this.
So somebody should bisect this.
one place to check might be a build w/o the fix for bug 590640
There's a related, long-standing bug that deleting the start of a quote as part of a range of text will delete the blockqoute, leaving the remaining text as unquoted. This happens a lot when trimming quotes and is highly annoying.
Whiteboard: [tb-papercut]
Can you two guys tell me whether this is the same issue, please:

Answer a BMO e-mail. You get in the first line (in Body Text):
  On 22/07/2015 17:33, Bugzilla@Mozilla wrote:
followed by (in Preformat):
  |
  | https://bugzilla.mozilla.org/show_bug.cgi?id=345852
  | (more stuff)

Position the cursor after the colon (:) on the first line and press (forward) <delete>.

Effect: The Preformat loses it's formatting and gets meshed into the Body Text.

What happens is not entirely surprising. When you join two "blocks", in the case of the BMO mail, a <div> with a <pre>, the second one loses its properties.

What is the desired behaviour?
Flags: needinfo?(ben.bucksch)
Flags: needinfo?(Pidgeot18)
Attached file Test message (obsolete) —
I tried replying to the enclosed message to check Joshua's example. I actually *can* delete the inner quote as long as I stay away from the Body Text.
Attachment #8637445 - Attachment mime type: message/rfc822 → text/text
Attachment #8637445 - Attachment mime type: text/text → text/plain
(In reply to Jorg K from comment #7)
> Can you two guys tell me whether this is the same issue, please:

I don't know for certain; you'd have to look at the HTML to see if the condition is the same.

> Effect: The Preformat loses it's formatting and gets meshed into the Body
> Text.
> 
> What happens is not entirely surprising. When you join two "blocks", in the
> case of the BMO mail, a <div> with a <pre>, the second one loses its
> properties.

Yes, but you're also gutting the contents of the first div completely. It's not clear from user-feedback level that you're merging a <div> with a <pre>, and since the first div's content is completely obliterated, it's not necessarily a wise idea to move the second block into that content.

> What is the desired behaviour?

The desired behavior, ultimately, is that editing an email should never put you in a position where attempting to trim quoted text would suddenly destroy formatting of the message.
Flags: needinfo?(Pidgeot18)
Maybe we could implement something in the M-C editor, so deleting a boundary between two "blocks" won't have such detrimental effects, if the second block is a <blockquote> or <pre>. There is already the eEditorMailMask flag, that could be used detect "special" situations, for example here:
http://mxr.mozilla.org/mozilla-central/source/extensions/spellcheck/src/mozInlineSpellChecker.cpp#1237
This should read:
There is already the eEditorMailMask flag, that can be used to detect "mail editor" situations. It is already used, for example here: ...
I went back to TB 3.1.20 as Ehsan had suggested bisecting in comment #4.

The behaviour of TB 3.1 is much better:
When deleting at the boundary between <div> and <blockquote><pre> the first line of the <blockquote><pre> gets joined into the <div>, but the rest is left intact.

I'd like to define this as the desired behaviour. I'll look into it further.

P.S.: TB 10 already shows the current undesired behaviour.
Assignee: nobody → mozilla
Flags: needinfo?(ben.bucksch)
More precisely: TB 10.0.4 ESR from 22nd Apr 2012 shows the problem. So it has nothing to do with bug 590640 (as was suggested in comment #5) since that landed in May 2012.
Good version: TB 3.2 of  1st Jan 2010 built on Gecko 1.9.3.
Bad  version: TB 3.3 of 31st Dec 2010 built on Gecko 2.
So this broke somewhere between Gecko 1.9.3 and Gecko 2 in 2010.
I will narrow it down to the day.
It's time to take Thunderbird out of the picture and chase the bug in Firefox. The attached simple page shows it very nicely.
Last good:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/05/2010-05-22-04-mozilla-central/
Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64; en-US; rv:1.9.3a5pre) Gecko/20100522 BTRS35926 Minefield/3.7a5pre

First bad:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/05/2010-05-23-04-mozilla-central/
Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64; en-US; rv:1.9.3a5pre) Gecko/20100523 BTRS35926 Minefield/3.7a5pre
Who can see someone here?
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2010-05-21&enddate=2010-05-24
Quite a bit of activity on those days by Ehsan and Robert.

Maybe this one:
http://hg.mozilla.org/mozilla-central/rev/37b7ca43fafe

Any hints?
Flags: needinfo?(ehsan)
Summary: Deleting quotes causes preformats to be deleted → Joining <div> with <pre> obliterates preformatting
Attachment #8637445 - Attachment is obsolete: true
Aryeh, welcome back! Perhaps you feel like lending a hand here in editor territory ;-)
Test case in comment #15. Desired result: Join two lines, and leave the rest of the quote/pre alone. Don't lose the preformatting of the entire quote/pre.

Most likely changeset 37b7ca43fafe (see comment #17) is *not* the culprit. I put this code
  nsIFrame *frame = content->GetPrimaryFrame();
  if (frame) {
    const nsStyleText* styleText = frame->StyleText();
    *aResult = styleText->WhiteSpaceIsSignificant();
    return NS_OK;
  }
back into nsEditor::IsPreformatted() and it didn't make any difference.
Flags: needinfo?(ayg)
Just for the record:
Chrome behaves like the good pre-May 2010 version of Firefox, IE loses the <pre> like FF does now.
Jorg, can you use hg bisect to find the exact revision that caused the problem?
Flags: needinfo?(ayg)
Hmm, never used it. That means that I have to compile all those versions locally?
[https://selenic.com/hg/help/bisect]
I was hoping someone can spot the problem from the two dates.
No, you only have to compile a few of the versions, which bisect will provide for you.  It's unlikely anyone will be able to just spot the problem.  Bisect is the tool to use here.
Bisecting stuff from 2010 is really not such a good idea. I started, just to show my good will. First I had to remember how the build system worked back in those days (make -f client.mk). Then of course, I didn't get very far, since I have Visual C++ 2013. Required for the compilation of 2010 source is Visucal C++ 2005 or 2008 (see: https://developer.mozilla.org/en-US/docs/Simple_Firefox_build$revision/18035). I only have my travel laptop with me, which I don't want to shred right now with deinstallations, etc.
(In reply to Jorg K from comment #17)
> Who can see someone here?
> http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2010-05-
> 21&enddate=2010-05-24
> Quite a bit of activity on those days by Ehsan and Robert.
> 
> Maybe this one:
> http://hg.mozilla.org/mozilla-central/rev/37b7ca43fafe
> 
> Any hints?

This is five years ago, sorry I don't remember anything concretely about any of this stuff.  Please bisect as Aryeh suggested.

Also note that the preformatted whitespace code is absolutely terrible and almost has no tests. The last time that I changed something in it, it has caused so many regressions that I haven't even had time to look into all of them yet, and most of the regressions affect Thunderbird.  So unless the bisection uncovers a very small bug with an obvious fix, I don't think we should accept more changes to this code before the previous regressions have been fully fixed.
Flags: needinfo?(ehsan)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #24)
> Please bisect as Aryeh suggested.

This will have to wait a few weeks until I have a machine to do it on.
Guys, I appreciate your faith in bisection, but we also need to apply some logic here before wasting more computing resources.

Nightlies are compiled at night, somewhere in the US, the two nightlies in question have a timestamp of 4:46. The zip format stores local time, not UTC. Both zip files landed on the FTP server at 11:34 UTC (see links above in comment #16).

The pushlog is based on UTC. So it is fair to say that we only need to look between
22 May 2010, somewhere in the UTC morning, say 8:00 UTC to be safe, to
23 May 2013, somewhere in the UTC morning, say 11:00 UTC.
Looking at http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2010-05-21&enddate=2010-05-24
there is exactly one changeset that touches the editor: 37b7ca43fafe.
And bingo, that changeset does a change in the area of styles and preformatting.

I am reasonably sure, that this changeset introduced the problem back in 2010. However, undoing this changeset on the 2015 code base doesn't fix the problem. Not a surprise.

So I come to the conclusion, that bisecting is not going to yield any more results. As Ehsan said on comment #24, the <pre> stuff has its problems, one only needs to look at bug 116083, bug 1151873, bug 1174452 (and its like duplicate bug 1162963).

So, unless I have a good idea later, I will suspend work on this bug until Ehsan addresses the other <pre> issues, and perhaps this bug will automatically get fixed ;-)
I don't know that Ehsan refers to with
> The last time that I changed something in it, it has
> caused so many regressions that I haven't even had time to look into all of
> them yet, and most of the regressions affect Thunderbird.  So unless the
> bisection uncovers a very small bug with an obvious fix, I don't think we
> should accept more changes to this code before the previous regressions have
> been fully fixed.

The bugs I mentioned in comment #26 are all in the area of copying <pre> stuff. This bug here is different: It's about editing and joining a <div> and a <pre>. In the example it's <blockquote><pre>, but it equally fails without the <blockquote>.

As I wrote in comment #26, I don't believe that bisecting will get us a quick fix here.

So if you can spare a minute, can you point me to the code in the editor were deletions and joining of two "blocks" are handled. (BTW, is there a good term to describe <div> and <pre>?)
Flags: needinfo?(ehsan)
(In reply to Jorg K from comment #27)
> So if you can spare a minute, can you point me to the code in the editor
> were deletions and joining of two "blocks" are handled. (BTW, is there a
> good term to describe <div> and <pre>?)

I think it's nsEditor::JoinNodes.  The interesting bit of that is in nsEditor::JoinNodesImpl.
Indeed ;-)

xul.dll!nsEditor::JoinNodesImpl(nsINode * aNodeToKeep, nsINode * aNodeToJoin, nsINode * aParent) Line 2828	C++
xul.dll!mozilla::dom::JoinNodeTxn::DoTransaction() Line 67	C++
xul.dll!nsTransactionItem::DoTransaction() Line 173	C++
xul.dll!nsTransactionManager::BeginTransaction(nsITransaction * aTransaction, nsISupports * aData) Line 784	C++
xul.dll!nsTransactionManager::DoTransaction(nsITransaction * aTransaction) Line 79	C++
xul.dll!nsEditor::DoTransaction(nsITransaction * aTxn) Line 732	C++
xul.dll!nsEditor::JoinNodes(nsINode & aLeftNode, nsINode & aRightNode) Line 1519	C++
xul.dll!nsEditor::JoinNodes(nsIDOMNode * aLeftNode, nsIDOMNode * aRightNode, nsIDOMNode * __formal) Line 1493	C++
xul.dll!nsHTMLEditor::CollapseAdjacentTextNodes(nsRange * aInRange) Line 3740	C++
xul.dll!nsHTMLEditRules::AfterEditInner(EditAction action, short aDirection) Line 471	C++
xul.dll!nsHTMLEditRules::AfterEdit(EditAction action, short aDirection) Line 392	C++
xul.dll!nsHTMLEditor::EndOperation() Line 3430	C++
xul.dll!nsAutoRules::~nsAutoRules() Line 100	C++
xul.dll!nsPlaintextEditor::DeleteSelection(short aAction, short aStripWrappers) Line 702	C++

Most likely the problem lies higher in the call stack.

In the pre-May 2010 version there must have been some special casing somewhere to treat <pre> a little differently. When joining a <div> with a <pre> the first line of the <pre> was moved into the <div> maintaining the rest of the <pre>.
Flags: needinfo?(ehsan)
Here comes some analysis:

First, let me repeat the HTML I'm working on:
  <div>This text is in a simple div:</div>
  <pre>line 1
  line 2
  line 3
  </pre>

When deleting forward after the colon, the *one single* (!!) text node of the <pre> gets moved into the <div> and the <pre> is deleted:
xul.dll!nsHTMLEditRules::MoveContents() Line 2953	C++
xul.dll!nsHTMLEditRules::MoveBlock() Line 2880	C++
xul.dll!nsHTMLEditRules::JoinBlocks() Line 2846	C++
xul.dll!nsHTMLEditRules::WillDeleteSelection() Line 2472	C++
xul.dll!nsHTMLEditRules::WillDoAction() Line 627	C++
xul.dll!nsPlaintextEditor::DeleteSelection() Line 690	C++
xul.dll!nsEditor::HandleKeyPressEvent() Line 4699	C++

Take a look here:
https://dxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditRules.cpp#2879

There is no special processing for <pre> to split it at the newlines (yes, the text is "line 1\nline 2\nline 3\n") and insert <br> to be able to only move the first line into the <div>.

So I asked myself, why does it work in the pre-May 2010 version. I tried it on the example and saved the result after the deletion:
 <div>This text is in a simple div:line 1<br>
 </div><pre>line 2<br>line 3<br></pre>

Lo and behold, the <pre> gets split into multiple text and <br> nodes after the edit. I checked before the edit, and yes, the <pre> got represented as a set of text and <br> nodes even before the edit.

I went back to the current Firefox and tried this example:
 <div>This text is in a simple div:</div>
 <pre>line 1<br>line 2<br>line 3<br></pre>

Now, if you delete after the colon, it all works: "line 1" is moved into the <div> and the rest stays.

Conclusion:
As far as I can see, nothing has changed in the way the join of the <div> and <pre> is done.
What has changed from earlier versions of Firefox is the DOM for the <pre>. At least in my case it is represented as a single text node with newlines, whereas before it was represented as a set of text and <br> nodes.

So, where do we go from here? Can we reinstate the previous model? Or can we split the <pre> before the join operation?
Flags: needinfo?(ehsan)
Flags: needinfo?(ayg)
Just confirming the above:

Last good:
Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64; en-US; rv:1.9.3a5pre) Gecko/20100522 BTRS35926 Minefield/3.7a5pre
Represents a multi-line <pre> as multiple text and <br> nodes.

First bad:
Mozilla/5.0 (Windows; U; Windows NT 6.1; WOW64; en-US; rv:1.9.3a5pre) Gecko/20100523 BTRS35926 Minefield/3.7a5pre
Represents a multi-line <pre> as a single text node with newlines.
I checked Chrome and IE. Both have one text inside the <pre>. So changing the DOM back to what it was pre-May 2010 is not an option. Also, most likely the change in May 2010 had its good reasons. Surely a lot of test would also fail, if we changed the DOM of the <pre>.

So if we want better join behaviour, we'd have to split off the first line from the <pre> and just move it into the preceding <div>.

That's a local change in nsHTMLEditRules.cpp. Would you like me to give that a go? Or would you like to do it?

It would be something along the lines of:

if source side of the join is a <pre>
  look in first text node of the pre for a newline.
  if found, split the text node into text before the newline, <br> and text after the newline (if any).
  re-attempt the join with the modified DOM.

There are more cases to consider: The text can sit inside some other tag(s) (bold, colour, italics, etc.) and the line break happens inside the other tag (see example below). Then the split would be more complex, since those properties would have to be repeated.

But perhaps there is a split function in the the system that does exactly what we need here.

Tips? Hints? Suggestions?

BTW, Chrome handles:
  <div>This text is in a simple div:</div>
  <pre><b>line 1
  line 2</b>
  line 3
  </pre>
When you delete after the colon, the "line 1" (in bold) gets moved into the <div>, "line 2" remains bold in the <pre>.
(In reply to Jorg K from comment #33)
> It would be something along the lines of:
> 
> if source side of the join is a <pre>
>   look in first text node of the pre for a newline.
>   if found, split the text node into text before the newline, <br> and text
> after the newline (if any).
>   re-attempt the join with the modified DOM.

Something like this sounds right, yes.
Flags: needinfo?(ayg)
Wait.  Before we change anything in the editor code, I would like to know what exactly changed the behavior here originally.  That means bisecting (not guessing!) to see what change in the range in comment 17 has changed the behavior, especially since I'm 99% sure we made no changes to how the descendants of <pre> are represented in 2010 (br nodes vs \n) so there is probably something wrong in your analysis.
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #35)
> Wait.  Before we change anything in the editor code, I would like to know
> what exactly changed the behavior here originally.  That means bisecting
> (not guessing!) to see what change in the range in comment 17 has changed
> the behavior, especially since I'm 99% sure we made no changes to how the
> descendants of <pre> are represented in 2010 (br nodes vs \n) so there is
> probably something wrong in your analysis.

As I said, I won't ruin my travel machine with uninstalling the 2013 compiler and install the 2005, so I can't bisect right now.

As for the analysis: The DOM for the <pre> changed between 22nd and 23rd of May 2010. See for yourself. I'm using the example from attachment 8639854 [details].

In the following comment I'll post the same inspection, on the 23rd of May 2010.

So let me know what's wrong in my analysis ;-)
The <BR>s are gone, the \n (invisible) have come in.
Please let me know how you want to proceed. I can do the bisecting after mid-August when I'm back at home and have more machines at my disposition.

BTW, I wasn't guessing: Please look at comment #26. I gave the reasons why I'm 99% sure that there is only one changeset in the area of preformatting. The other changes are: Marco: XPCOM, Timothy: Removed assertions, added some stuff inside #ifdef DEBUG for frames (layout/base), Chris: IPC, Robert: Toolkit. 

Most likely this day the transition from <br> nodes to "\n" (silently) started.

But hey, people who are 99% sure can be wrong ;-)
Flags: needinfo?(ehsan)
The only suspect revision in that range involving the editor is bug 336104 but I don't understand how it would change the way that <pre> elements are represented.  As for the other changes in the range, I don't really have time to study them one by one to see what they could have possibly changed.

I'm afraid that until you or someone else bisects and then delves down into the changeset that changed the behavior to understand what exactly changed, this bug is not actionable.
Flags: needinfo?(ehsan)
FYI:
If you switch MXR to Firefox 3 and search for "IsPreformatted" you get here:
http://mxr.mozilla.org/firefox/source/editor/libeditor/text/nsTextEditRules.cpp#1203

ReplaceNewlines !!
http://mxr.mozilla.org/firefox/source/editor/libeditor/text/nsTextEditRules.cpp#1175

An in there it calls the function "IsPreformatted" you changed on 22nd of May 2010 in
http://hg.mozilla.org/mozilla-central/rev/37b7ca43fafe

"IsPreformatted" as of 23rd May 201 returned the wrong result, no replacement of newlines would take place. No one noticed, since it only happened in editable situations.

Later on, "ReplaceNewlines" got removed from the code base, bug 551704, part 3.

Is this enough investigation or do you still insist on the bisection? In fact, no bisection is necessary, we already have the changeset that caused the problem and we understand the mechanism. IHMO there is no need to compile these old versions to be able to step trough with the debugger. Do you agree?

I think we should focus on the fix: Do what I described in comment #33 or something else if you have a better suggestion.

My analysis was 100% correct and this bug is actionable now without any further ado.
Flags: needinfo?(ehsan)
Obviously the removal of "ReplaceNewlines" is the reason why undoing changeset 37b7ca43fafe, doesn't restore the previous behaviour. (Which is what I said in comment #26: Undoing this changeset on the 2015 code base doesn't fix the problem. Not a surprise.)
Status: NEW → ASSIGNED
Attached patch WIP 3: compiles and works (obsolete) — Splinter Review
This works in simple cases but needs more work:

Works fine here:
<div>This text is in a simple div:</div>
<pre>line 1
line 2
line 3
</pre>

In the following case, the whole <b>...</b> section is moved to the <div>
<div>This text is in a simple div:</div>
<pre><b>line 1
line 2</b>
line 3
</pre>

Comments are welcome!
Attachment #8641871 - Attachment is obsolete: true
Attachment #8642024 - Attachment is obsolete: true
Flags: needinfo?(ayg)
Attachment #8642078 - Flags: feedback?(ayg)
Attachment #8642078 - Flags: feedback?(ehsan)
This works a little better, the second case now also works, since I drill down to find the first text node.

<div>This text is in a simple div:</div>
<pre><b>line 1
line 2</b>
line 3
</pre>

I think I'll give this a very limited try run to see what breaks:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=47040e44dc6a
Attachment #8642078 - Attachment is obsolete: true
Attachment #8642078 - Flags: feedback?(ehsan)
Attachment #8642078 - Flags: feedback?(ayg)
Attachment #8642084 - Flags: feedback?(ehsan)
Attachment #8642084 - Flags: feedback?(ayg)
Cover the case where we backspace from the <pre> into the <div>.

Also make it compile for Linux ;-)

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e778be16784c
Attachment #8642084 - Attachment is obsolete: true
Attachment #8642084 - Flags: feedback?(ehsan)
Attachment #8642084 - Flags: feedback?(ayg)
Attachment #8642094 - Flags: feedback?(ehsan)
Attachment #8642094 - Flags: feedback?(ayg)
One word about the backspace. If you place the cursor before "line 1" in

<div>This text is in a simple div:</div>
<pre>line 1
line 2
line 3
</pre>

and backspace, the result with the patch is:

<div>This text is in a simple div:</div>line 1<br><pre>line 2
line 3
</pre>

That's better than before, when the <pre> was removed altogether. The current version of FF gives:

<div>This text is in a simple div:</div>line 1
line 2
line 3

Chrome does this:

<div>This text is in a simple div:line 1</div><pre>line 2
line 3
</pre>

This is what I would expect since this is also the result of the forward delete.

*So backspace could do with some improvement.* As I said, it wasn't right, even without my patch. So this could be another bug ;-(

The test results are (surprisingly) all green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e778be16784c
which only confirms that tests are lacking.

I'd be happy to land like this, of course with a test added and after review.

Further considerations about backspace (off topic):

I've just checked a simple "two <div>" case and delete as well as backspace give terrible results:

<div>This text is in a simple div:</div>
<div style="color:#f00">and a red div</div>

If you forward delete, you get all red. If you backspace, you lose the red, but the lines don't join.

Forward delete:
<div style="color:#f00">This text is in a simple div:and a red div</div>
Understandable.

Backspace:
<div>This text is in a simple div:</div>and a red div
I really don't understand why backspace is so different and IMHO quite wrong.

Chrome handles both cases as the user would expect:
<div>This text is in a simple div:<span style="color: rgb(255, 0, 0);">and a red div</span></div>
Same as the last WIP, but with the test moved into the helper function to avoid code duplication. I will add a test next.
Attachment #8642094 - Attachment is obsolete: true
Attachment #8642094 - Flags: feedback?(ehsan)
Attachment #8642094 - Flags: feedback?(ayg)
Attachment #8642131 - Flags: feedback?(ehsan)
Attachment #8642131 - Flags: feedback?(ayg)
Damn, I've just spotted the trailing spaces in the code I had copied from the now disappeared ReplaceNewlines:
http://mxr.mozilla.org/firefox/source/editor/libeditor/text/nsTextEditRules.cpp#1175 (Firefox 3!)
I reported the backspace problem in bug 1190161. I suggest to land the preformatting fix here first and then look at the other bug.
See Also: → 1190161
I noticed that on backspace the previous solution delivers this (coment #49):

<div>This text is in a simple div:</div>line 1<br><pre>line 2
line 3
</pre>

Note the <br> after "line 1". With this additional clean-up we get this:

<div>This text is in a simple div:</div>line 1<pre>line 2
line 3
</pre>

This may be preferred. Visually, there is no difference.
Comment on attachment 8642131 [details] [diff] [review]
Proposed solution, test still missing

Review of attachment 8642131 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know whether this is a good overall approach, Ehsan will have to say.  I commented on mostly stylistic issues, and one substantive issue.

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2568,5 @@
> + * In a preformat we need to change the first newline for a break so only the
> + * first line gets moved. If we don't do that, than entire preformatting may get
> + * obliterated.
> + */
> +nsIDOMNode *FindFirstTextNode (nsIDOMNode *aNode)

Space after the *, not before.  Also, why not make the return type nsIDOMText*?  static_cast<nsIDOMText*> on the return value should be safe, I think.

@@ +2574,5 @@
> +  nsCOMPtr<nsIDOMNode> child;
> +  aNode->GetFirstChild(getter_AddRefs(child));
> +  while (child) {
> +    uint16_t nodeType;
> +    nsCOMPtr<nsIDOMNode> next;

Define this before the first use, not at the start of the block.

@@ +2578,5 @@
> +    nsCOMPtr<nsIDOMNode> next;
> +    child->GetNodeType(&nodeType);
> +    if (nodeType == nsIDOMNode::TEXT_NODE) {
> +      return child;
> +    } else {

Omit "else" after "return".

@@ +2594,5 @@
> +  return nullptr;
> +}
> +
> +nsresult
> +nsHTMLEditRules::PreprocessPreformat(nsIDOMNode *aNode)

Space after the *, not before.

@@ +2605,5 @@
> +    return NS_OK;
> +  }
> +
> +  nsCOMPtr<nsIDOMNode> child;
> +  child = FindFirstTextNode(aNode);

Put this on one line: nsCOMPtr<nsIDOMNode> child = FindFirstTextNode(aNode);

@@ +2610,5 @@
> +  if (!child) {
> +    return NS_OK;
> +  }
> +
> +  nsCOMPtr<nsIDOMCharacterData> textNode = do_QueryInterface(child);

Not needed if you change FindFirstTextNode to return nsIDOMText*.

@@ +2612,5 @@
> +  }
> +
> +  nsCOMPtr<nsIDOMCharacterData> textNode = do_QueryInterface(child);
> +
> +  // Replace newline with break:

What happens if the newline is in the second text node, e.g., <pre><b>foo</b>bar\nbaz</pre>?  For this to be correct you need to iterate through all the text nodes on the first line.  (I.e., don't go over a <br> or block element.)

There might be functions somewhere that already do this.

@@ +2615,5 @@
> +
> +  // Replace newline with break:
> +
> +  // 1. Find the newline.
> +  PRInt32 offset;

Use int32_t, not PRInt32.

@@ +2623,5 @@
> +  if (offset >= 0) {
> +    nsCOMPtr<nsIDOMNode> brNode;
> +
> +    // 2. Delete the newline.
> +    nsCOMPtr<nsINode> content = do_QueryInterface(child);

Either name this "node", or make it nsIContent rather than nsINode.

@@ +2626,5 @@
> +    // 2. Delete the newline.
> +    nsCOMPtr<nsINode> content = do_QueryInterface(child);
> +    nsRefPtr<DeleteTextTxn> txn =
> +      mEditor->CreateTxnForDeleteText(*static_cast<nsGenericDOMDataNode*>(content.get()),
> +                                      offset, 1);

Use mEditor->DeleteNode().

@@ +2881,5 @@
>                                        &previousContentOffset);
>          }
>        }
>  
> +      res = PreprocessPreformat (aRightBlock);

No space before the parenthesis.

@@ +2920,5 @@
>      }
>      else
>      {
>        // nodes are disimilar types.
> +      res = PreprocessPreformat (aRightBlock);

Ditto.

::: editor/libeditor/nsHTMLEditRules.h
@@ +149,5 @@
>                                nsresult aResult);
>    nsresult InsertBRIfNeeded(mozilla::dom::Selection* aSelection);
>    ::DOMPoint GetGoodSelPointForNode(nsINode& aNode,
>                                      nsIEditor::EDirection aAction);
> +  nsresult PreprocessPreformat(nsIDOMNode *aNode);

Space after the *, not before.
Attachment #8642131 - Flags: feedback?(ayg)
Aryeh: Thanks a lot for your feedback!
I addressed most of the issues:
+ In this source file, you can find both "int* i" and "int *i".
  So I changed my code to match the former.
+ The find function now returns a nsIDOMText*
+ Most importantly, I fixed the logic error to look for the first text node.
  Now it looks for the first newline, but stops if a <br> is found.
  Please let me know whether the test for <br> is correct.
  Which block elements would also stop the search? 

I'm still unsure about the use of nsIDOMNode and its subclasses and nsINode and its subclassed. Some functions act on the former, some on the latter. Somehow you can convert one to the other: 
  nsCOMPtr<nsINode> node = do_QueryInterface(domNode), where domNode is an nsIDOMNode.
In the reverse direction there is "AsDOMNode".
Is there some documentation that explains the concept?

Lastly, I didn't understand your comment "Use mEditor->DeleteNode().". Surely I want to use "mEditor->CreateTxnForDeleteText()", do I not? This function splits the text in two, so a break can be inserted.

Would you mind looking at it again?

Then I should stop working on it until Ehsan approves the overall approach, which is, as I said, inspired by
http://mxr.mozilla.org/firefox/source/editor/libeditor/text/nsTextEditRules.cpp#1175
Attachment #8642131 - Attachment is obsolete: true
Attachment #8642155 - Attachment is obsolete: true
Attachment #8642131 - Flags: feedback?(ehsan)
Attachment #8642190 - Flags: feedback?(ehsan)
Attachment #8642190 - Flags: feedback?(ayg)
Previous patch, now with tests.

There are two tests: test_bug772796.html and test_bug772796A.html.

The second one is subtly different in that the <pre> starts on a new line.
The result is that forward delete and backspace give different results,
that's another *bug*!
In test test_bug772796.html forward delete and backspace give the same result.
Attachment #8642190 - Attachment is obsolete: true
Attachment #8642190 - Flags: feedback?(ehsan)
Attachment #8642190 - Flags: feedback?(ayg)
Attachment #8642207 - Flags: feedback?(ehsan)
Attachment #8642207 - Flags: feedback?(ayg)
Coming to think about it, the tests I added look an awful like the "conformance tests":
Have a snippet of HTML, do an action on a defined selection, compare the resulting innerHTML with what is expected.

My tests are also missing the case were the selection is not collapsed, so delete on:
  <div>te[st</div><pre>foo]bar
  baz</pre>
were part of the "test" and part of the "foobar" is selected ([] denotes the selection). I tried by hand, this works with my patch.

I'm happy to hear how to make the tests better. We will need tests anyway, regardless of what Ehsan thinks about the overall approach.
Comment on attachment 8642207 [details] [diff] [review]
Proposed solution after Aryeh's feedback, test included

Review of attachment 8642207 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jorg K from comment #55)
> + In this source file, you can find both "int* i" and "int *i".
>   So I changed my code to match the former.

The former is correct, see: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style

> I'm still unsure about the use of nsIDOMNode and its subclasses and nsINode
> and its subclassed. Some functions act on the former, some on the latter.
> Somehow you can convert one to the other: 
>   nsCOMPtr<nsINode> node = do_QueryInterface(domNode), where domNode is an
> nsIDOMNode.
> In the reverse direction there is "AsDOMNode".
> Is there some documentation that explains the concept?

Basically you have two separate class hierarchies for DOM nodes.  The old ugly one we don't like starts with nsIDOMNode, and has subclasses like nsIDOMElement, nsIDOMCharacterData, etc. -- all starting with "nsIDOM".  It's based on the DOM spec.  The methods on these classes are a big pain to use, because they follow old XPCOM conventions: everything must return nsresult even if it always succeeds, therefore return values use out parameters, and so on.  You want to avoid this as much as possible.

The new one we do like starts with nsINode.  Under it you have nsIContent (non-document nodes), nsIDocument, nsGenericDOMDataNode, mozilla::dom::Element, mozilla::dom::Text, and so on.  This has much nicer methods and you should always prefer to use it when possible.  I didn't ask you to change it for these patches because the existing code there seems to already use nsIDOMNode and will have to be cleaned up anyway.

To convert from an nsINode to nsIDOMNode, use the GetAsDOMNode() function.  (You can also use the ->AsDOMNode() method, if you're sure it's not null.)  This will always succeed.  To convert from an nsIDOMNode to an nsINode, you have to have an nsCOMPtr and use do_QueryInterface.  This might fail and return null in weird cases, so make sure you don't crash in that case (you can do anything else you like).  If you're converting between more specialized types, you may have to do some static_casting also, like to convert Text* to nsIDOMText*: static_cast<nsIDOMText*>(GetAsDOMNode(textNode)).

I'm in the process of steadily converting as much of editor as possible to use the nsINode family of classes instead of nsIDOMNode.

> Lastly, I didn't understand your comment "Use mEditor->DeleteNode().".
> Surely I want to use "mEditor->CreateTxnForDeleteText()", do I not? This
> function splits the text in two, so a break can be inserted.

Whoops, I meant mEditor->DeleteText().  The transaction is an implementation detail, so you want to call the wrapper function, unless I'm missing something.

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2568,5 @@
> + * In a preformat we need to change the first newline for a break so only the
> + * first line gets moved. If we don't do that, than entire preformatting may get
> + * obliterated.
> + */
> +nsIDOMText* FindTextNodeWithNewline (nsIDOMNode* aNode, int32_t* aOffset, bool* aBreakFound)

Remove the space before the (.  (I missed this the first time, sorry.)

@@ +2579,5 @@
> +  while (child) {
> +    uint16_t nodeType;
> +    child->GetNodeType(&nodeType);
> +    if (nodeType == nsIDOMNode::ELEMENT_NODE) {
> +        nsCOMPtr<nsINode> node = do_QueryInterface(child);

This line is indented too much.

@@ +2632,5 @@
> +    return NS_OK;
> +  }
> +
> +  // Delete the newline.
> +  nsCOMPtr<nsINode> node = do_QueryInterface(textNode);

This is probably okay, but NS_ENSURE_STATE(node); might be a good idea here.

@@ +2637,5 @@
> +  nsRefPtr<DeleteTextTxn> txn =
> +    mEditor->CreateTxnForDeleteText(*static_cast<nsGenericDOMDataNode*>(node.get()),
> +                                    offset, 1);
> +  NS_ENSURE_TRUE(txn, NS_ERROR_OUT_OF_MEMORY);
> +  res = mEditor->DoTransaction(txn);

Use DeleteText() instead.  (By the way, CreateTxnForDeleteText can never fail due to OOM, only if you're trying to delete in an uneditable node or similar.)

@@ +2898,5 @@
>                        previousContentOffset, rightOffset);
> +      NS_ENSURE_SUCCESS(res, res);
> +
> +      NS_ENSURE_STATE(mHTMLEditor);
> +      if (brInserted) mHTMLEditor->DeleteNode(brInserted);

Inserting a <br> to trick MoveBlock into behaving how you want doesn't look like the right way to do things.  Changing MoveBlock seems like it would make more sense.

Also -- if blocks always have curly braces, and the contents are always on a new line, like so:

    if (brInserted) {
      mHTMLEditor->DeleteNode(brInserted);
    }

@@ +2941,5 @@
>        res = MoveBlock(aLeftBlock, aRightBlock, leftOffset, rightOffset);
> +      NS_ENSURE_SUCCESS(res, res);
> +
> +      NS_ENSURE_STATE(mHTMLEditor);
> +      if (brInserted) mHTMLEditor->DeleteNode(brInserted);

Ditto.

::: layout/generic/test/mochitest.ini
@@ +87,5 @@
>  skip-if = (buildapp == 'b2g' && toolkit != 'gonk') #Bug 931116, b2g desktop specific, initial triage
>  [test_bug748961.html]
>  [test_bug756984.html]
> +[test_bug772796.html]
> +[test_bug772796A.html]

Why not put these in the same file?

::: layout/generic/test/test_bug772796.html
@@ +24,5 @@
> +baz</b></pre></div>
> +<div id="test3"><div>test</div><pre><b>foo</b>bar
> +baz</pre></div>
> +<div id="test4"><div>test</div><pre>foobar<br>baz</pre></div>
> +<div id="test5"><div>test</div><pre><b>foobar<br>baz</b></pre></div>

Could you also add some tests with block elements nested inside the <pre>?  Like:

    <div>test</div><pre><div>foobar</div>baz</pre>
    <div>test</div><pre>foo<div>barbaz</div></pre>

@@ +55,5 @@
> +
> +    SimpleTest.waitForFocus(function() {
> +
> +    var sel = window.getSelection();
> +    

You have trailing whitespace in the test files.

::: layout/generic/test/test_bug772796A.html
@@ +90,5 @@
> +        synthesizeMouse(theDiv, 100, 2, {});
> +        synthesizeKey("VK_RIGHT", {});
> +        synthesizeKey("VK_BACK_SPACE", {});
> +        
> +        is(theDiv.innerHTML, resultsBackspace[i-1], "backspace: inner HTML for " + testName);

If current behavior is known to be wrong, you should the correct expected results in the test, and use todo_is.  This way if it gets fixed, it doesn't look like a test failure, it shows up as "UNEXPECTED-PASS" and the patch writer knows to update the test instead of fixing a bug in his code.
Attachment #8642207 - Flags: feedback?(ayg)
Aryeh, thanks a lot for the feedback. I think I won't address any further issues until I hear from Ehsan. He needs to catch up with all that happened since last Thursday ;-)

His last comment #39 was that the bug was not actionable. I hope my analysis in comment #40 has convinced him of the opposite.

As for you comments:

(In reply to :Aryeh Gregor (working until August 13) from comment #58)
> Inserting a <br> to trick MoveBlock into behaving how you want doesn't look
> like the right way to do things.  Changing MoveBlock seems like it would
> make more sense.

We'll, I'm giving it the input it received before May 2010. I wouldn't call that "tricking it".
We'll hear from Ehsan about this.

> > +[test_bug772796.html]
> > +[test_bug772796A.html]
> Why not put these in the same file?

I tried that and it didn't work. It seems that the page gets too long and then simulating the click doesn't work. How do you place the selection without clicking? I'd be very happy to hear how to fix the tests. Given my comment #57 we might take another approach and make it more like the conformance tests.

> Could you also add some tests with block elements nested inside the <pre>? 
> Like:
>     <div>test</div><pre><div>foobar</div>baz</pre>
>     <div>test</div><pre>foo<div>barbaz</div></pre>

I will.

> If current behavior is known to be wrong, you should the correct expected
> results in the test, and use todo_is.  This way if it gets fixed, it doesn't
> look like a test failure, it shows up as "UNEXPECTED-PASS" and the patch
> writer knows to update the test instead of fixing a bug in his code.

Thanks will do.

Perhaps it would be quicker to fix bug 1190161 first. Can you take a look at it, please.
(In reply to Jorg K from comment #59)
> I tried that and it didn't work. It seems that the page gets too long and
> then simulating the click doesn't work. How do you place the selection
> without clicking?

getSelection().collapse(node, offset);

> Perhaps it would be quicker to fix bug 1190161 first. Can you take a look at
> it, please.

You'll have to look through the code to identify the issue.  If you're just making sure the behavior is the same as delete, hopefully regression potential will be low.
(In reply to :Aryeh Gregor (working until August 13) from comment #60)
> > How do you place the selection without clicking?
> getSelection().collapse(node, offset);

Thanks. I just saw that in other tests I wrote, I used synthesizeMouseAtCenter.
I'll fix the tests. Also, they went into the wrong spot: not layout/generic/test but editor/libeditor/test.

> > Perhaps it would be quicker to fix bug 1190161 first. Can you take a look at
> > it, please.
> You'll have to look through the code to identify the issue.  If you're just
> making sure the behavior is the same as delete, hopefully regression
> potential will be low.

I was hoping you could take a look at it and comment over in the other bug. May I ask what your roll/involvement is? You've cleaned up huge parts of the editor code, "blame" shows you a lot. But how about fixing this small issue? Or perhaps you're busy with other projects for the next 10 days. I'd just like to understand, no offense/critique intended. I'm actually very grateful for all your support.
Another word in defense of my "tricky" approach to this.

Instead of drilling open MoveBlock (which, by its name, moves blocks) and making it into a "Swiss army knife" of not only moving blocks but also chopping them up and deleting text out of them (the \n), it's simpler to use the existing function and give it the right input (and do a little clean-up later).

Maybe a matter of taste, I don't know. They say: "One man's trash is another man's treasure", so here it's: "One man's kludge is another man's «modular approach»" ;-)
This patch addresses Aryeh's second lot of feedback issues.

The test was rewritten and now works from an array. Tests with blocks, as suggested by Aryeh, are now included.

The question remains:
Is the approach of pre-processing the data before handing it to MoveBlock() acceptable (see comment #62).
Attachment #8642207 - Attachment is obsolete: true
Attachment #8642207 - Flags: feedback?(ehsan)
Attachment #8642639 - Flags: feedback?(ehsan)
Attachment #8642639 - Flags: feedback?(ayg)
Attached patch Proposed solution (v6) (obsolete) — Splinter Review
Changes to previous version:
- Now testing for block nodes (thanks Aryeh).
- Tests included for deleting non-collapsed selection.

Try run:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=787378d23628
Attachment #8642639 - Attachment is obsolete: true
Attachment #8642639 - Flags: feedback?(ehsan)
Attachment #8642639 - Flags: feedback?(ayg)
Attachment #8642872 - Flags: review?(ehsan)
Attachment #8642872 - Flags: feedback?(ayg)
All green.
OS: Windows 7 → All
Hardware: x86_64 → All
The patch needs to be rebased on bug 1156062, unless this patch lands first, in which case Aryeh could clean it up over in the other bug.
Actually, one last question:
nsIDOMText* FindTextNodeWithNewline(nsIDOMNode* aNode, int32_t* aOffset, bool* aBreakFound)
is not declared as a class method in the include file, but
nsresult nsHTMLEditRules::PreprocessPreformat(nsIDOMNode* aNode, nsCOMPtr<nsIDOMNode>* aBr)
is. Is that OK, or should both be handled the same way? If yes, how?
Comment on attachment 8642872 [details] [diff] [review]
Proposed solution (v6)

Review of attachment 8642872 [details] [diff] [review]:
-----------------------------------------------------------------

feedback+ with comments below accounted before, subject to the following caveats:

1) I don't know if Ehsan wants this patch at all, as previously discussed.

2) I don't think we want to insert a <br> that we're later going to remove just to fool MoveBlock.  Rather, MoveBlock itself should be changed.

3) This probably should have been written using nsINode* to start with, but it doesn't really matter, it's a drop in the bucket.

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2600,5 @@
> +    }
> +
> +    // Drill down recursively.
> +    nsCOMPtr<nsIDOMText> grandChild;
> +    grandChild = FindTextNodeWithNewline(child, aOffset, aBreakFound);

Initialize this on the same line you declare it:

  nsCOMPtr<nsIDOMText> grandChild = FindTextNodeWithNewline(child, aOffset, aBreakFound);

@@ +2887,5 @@
>                                        &previousContentOffset);
>          }
>        }
>  
> +      nsCOMPtr<nsIDOMNode> brInserted = nullptr;

Remove the " = nullptr" -- nsCOMPtr automatically initializes itself to null if you don't initialize it.

::: editor/libeditor/tests/test_bug772796.html
@@ +56,5 @@
> +      var theEdit = document.getElementById("editable");
> +
> +      for (i = 0; i < tests.length; i++) {
> +        var testName = "test" + i.toString();
> +        theEdit.innerHTML = "<div id=\"" + testName + "\">" + tests [i][0] + "</div>";

No space between "tests" and "[i][0]", please.
Attachment #8642872 - Flags: feedback?(ayg) → feedback+
Thanks for the feedback. I'll fix the nits after Ehsan reviewed it together with issues he might find.

(In reply to :Aryeh Gregor (working until August 13) from comment #68)
> 1) I don't know if Ehsan wants this patch at all, as previously discussed.
Surely we want a patch to fix the regression introduced in May 2010.

> 2) I don't think we want to insert a <br> that we're later going to remove
>    just to fool MoveBlock.  Rather, MoveBlock itself should be changed.
I won't repeat my answer to this. It can be read in comment #62 ;-)
I'd like to avoid overburdening MoveBlock with more logic.
Comment on attachment 8642872 [details] [diff] [review]
Proposed solution (v6)

Review of attachment 8642872 [details] [diff] [review]:
-----------------------------------------------------------------

Firstly thanks for the analysis in comment 40 and 41, things make a lot more sense now.  :-)

I have several comments here but the most important part that I would like to understand before reviewing more closely is the behavior of the backspace case after this patch, and the other things that I have suggested in the test.  While we're waiting for this, I also have a few high level issues that I would like to see addressed in the final version of the patch to be reviewed.

I have also described a better implementation strategy in the comments.

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2568,5 @@
> + * In a preformat we need to change the first newline for a break so only the
> + * first line gets moved. If we don't do that, than entire preformatting may get
> + * obliterated.
> + */
> +nsIDOMText* FindTextNodeWithNewline(nsIDOMNode* aNode, int32_t* aOffset, bool* aBreakFound)

It seems like you are trying to find the first line break and figure out what caused it to happen (whether it was caused by a significant newline, or something else such as a BR or block element.)

You should use a nsBlockInFlowLineIterator for that purpose.  That helper class will let you iterate over the line breaks in the block ancestor and see what has caused the next one.  You can see how this class is used in IsInvisibleBreak() in nsDocumentEncoder.cpp as an example.

Here are the necessary steps:

1. Get the block ancestor for the current node by obtaining its frame and looking up the parent frames to find the first nsBlockFrame (similar to IsInvisibleBreak)
2. Create an nsBlockInFlowLineIterator.  Make sure you have a valid one.
3. Save your current line (iter.GetLine()).  Call that prevLine.
4. Call .Next() on it in a loop to get the first non-empty line following the node (as can be seen in the while loop in IsInvisibleBreak for example).  This will move the iterator to the next line.  Call that newLine.
5. If newLine->IsBlock(), the line break was caused by a block element.  Otherwise, get prevLine->mFirstChild and iterate over to the last element on the line (similar to LineHasNonEmptyContent).  If lastFrame->GetType() == nsGkAtoms::brFrame, then the line break was caused by a br node.  Otherwise do_QueryFrame() the frame to an nsTextFrame and make sure that returns non-null, and call that textFrame.  textFrame->GetContentEnd() should be the offset of the \n causing the line break.  textFrame->GetContent() should return the nsIContent* corresponding to the text node.

@@ +2895,2 @@
>        res = MoveBlock(previousContentParent, aRightBlock,
>                        previousContentOffset, rightOffset);

I think the trick of inserting a br node and removing it is fine, however, this should not happen in the callers of MoveBlock.  This patch shows why: you have forgotten to update one of the MoveBlock call sites here.

So, please remove PreprecessPreformat, move that code into the beginning of MoveBlock, and then at the end call DeleteNode() on it like below.

::: editor/libeditor/tests/test_bug772796.html
@@ +23,5 @@
> +  <script type="application/javascript">
> +    var tests = [
> +      [ "<div>test</div><pre>foobar\nbaz</pre>", "<div>testfoobar</div><pre>baz</pre>", 1],
> +      [ "<div>test</div><pre><b>foobar\nbaz</b></pre>", "<div>testfoobar</div><pre><b>baz</b></pre>", 1],
> +      [ "<div>test</div><pre><b>foo</b>bar\nbaz</pre>", "<div>test<b>foo</b>bar</div><pre>baz</pre>", 1],

It would be also interesting to test "<div>test</div><pre><b>foo</b>\nbar</pre>" and "<div>test</div><pre><b>foo\n</b>bar\nbaz</pre>".

@@ +24,5 @@
> +    var tests = [
> +      [ "<div>test</div><pre>foobar\nbaz</pre>", "<div>testfoobar</div><pre>baz</pre>", 1],
> +      [ "<div>test</div><pre><b>foobar\nbaz</b></pre>", "<div>testfoobar</div><pre><b>baz</b></pre>", 1],
> +      [ "<div>test</div><pre><b>foo</b>bar\nbaz</pre>", "<div>test<b>foo</b>bar</div><pre>baz</pre>", 1],
> +      /* the <br> after the foobar is unfortunate but is beahviour that hasn't changed */

Nit: "hasn't changed in bug 772796."

@@ +36,5 @@
> +      [ "<div>test</div>\n<pre>foobar\nbaz</pre>", "<div>testfoobar</div><pre>baz</pre>", 0],
> +      [ "<div>test</div>\n<pre><b>foobar\nbaz</b></pre>", "<div>testfoobar</div><pre><b>baz</b></pre>", 0],
> +      [ "<div>test</div>\n<pre><b>foo</b>bar\nbaz</pre>", "<div>test<b>foo</b>bar</div><pre>baz</pre>", 0],
> +      [ "<div>test</div>\n<pre>foobar<br>baz</pre>", "<div>testfoobar<br></div><pre>baz</pre>", 0],
> +      [ "<div>test</div>\n<pre><b>foobar<br>baz</b></pre>", "<div>testfoobar<br></div><pre><b>baz</b></pre>", 0],

Please also extend these tests with cases similar to the above tests I suggested.

@@ +40,5 @@
> +      [ "<div>test</div>\n<pre><b>foobar<br>baz</b></pre>", "<div>testfoobar<br></div><pre><b>baz</b></pre>", 0],
> +
> +      /* Some tests with block elements. */
> +      [ "<div>test</div><pre><div>foobar</div>baz</pre>", "<div>testfoobar</div><pre>baz</pre>", 1],
> +      [ "<div>test</div><pre>foobar<div>baz</div></pre>", "<div>testfoobar</div><pre><div>baz</div></pre>", 1],

Please also add a test that checks the case where the block element is surrounded by text both before and after it.

@@ +42,5 @@
> +      /* Some tests with block elements. */
> +      [ "<div>test</div><pre><div>foobar</div>baz</pre>", "<div>testfoobar</div><pre>baz</pre>", 1],
> +      [ "<div>test</div><pre>foobar<div>baz</div></pre>", "<div>testfoobar</div><pre><div>baz</div></pre>", 1],
> +      [ "<div>test</div><pre><div>foobar</div>baz\nfred</pre>", "<div>testfoobar</div><pre>baz\nfred</pre>", 1],
> +      [ "<div>test</div><pre>foobar<div>baz</div>\nfred</pre>", "<div>testfoobar</div><pre><div>baz</div>\nfred</pre>", 1],

Please also add tests with newlines in the block element.

Also, I think we need tests here with the pre element appearing after a newline, similar to the "Repeating the first give tests" block above.

@@ +43,5 @@
> +      [ "<div>test</div><pre><div>foobar</div>baz</pre>", "<div>testfoobar</div><pre>baz</pre>", 1],
> +      [ "<div>test</div><pre>foobar<div>baz</div></pre>", "<div>testfoobar</div><pre><div>baz</div></pre>", 1],
> +      [ "<div>test</div><pre><div>foobar</div>baz\nfred</pre>", "<div>testfoobar</div><pre>baz\nfred</pre>", 1],
> +      [ "<div>test</div><pre>foobar<div>baz</div>\nfred</pre>", "<div>testfoobar</div><pre><div>baz</div>\nfred</pre>", 1],
> +    ];

Since MoveBlock and friends treat different tags differently, I think after you have extended the above tests with everything I suggested, you also need to add similar tests where:

a) uses of <pre> are replaced with <div class="pre">
b) uses of <pre> are replaced with <span class="pre">

(You should also add the following style element: <style> .pre { white-space: pre } </style>)

@@ +80,5 @@
> +          / * Test must work. */
> +          is(theDiv.innerHTML, tests[i][1], "backspace: inner HTML for " + testName);
> +        } else {
> +          /* Test is allowed to fail for now. */
> +          todo_is(theDiv.innerHTML, tests[i][1], "backspace: inner HTML for " + testName);

I think in addition to todo_is(), we also want to test the actual behavior as well.  Instead of having a 3rd item in each array, you can form the array like this:

    [ "<div>test</div>\n<pre>foobar\nbaz</pre>", "<div>testfoobar</div><pre>baz</pre>",
                                                 "<div>markup after backspace..."],

And then change the condition above to:

  if (tests[i].length == 2) {
    is(...);
  } else {
    todo_is(theDiv.innerHTML, tests[i][1], ...);
    is(theDiv.innerHTML, tests[i][2], ...);
  }

Lining up the expected and current markups in the test array vertically would make it very simple to look at the test and immediately understand what is broken in the backspace scenario.
Attachment #8642872 - Flags: review?(ehsan)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #70)
> you have forgotten to update one of the MoveBlock call sites here.

Actually, I haven't. My editor has a "search" function. ;-)

But in my testing this was never executed, and why would I change something that is not executed and I can't test? Looks like it has to do with list items being joined. I'd have to work out how to get to this third MoveBlock. Or do you know?

I'll get to the other issues, but first I'll post a modified test so you can see what happens in the backspace case when the <pre> is on a newline. It looks a lot like bug 1190161.
I'll add all the test you mentioned next. This is just to show the backspace behaviour.
(In reply to Jorg K from comment #71)
> (In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail,
> needinfo? me!) from comment #70)
> > you have forgotten to update one of the MoveBlock call sites here.
> 
> Actually, I haven't. My editor has a "search" function. ;-)
> 
> But in my testing this was never executed, and why would I change something
> that is not executed and I can't test? Looks like it has to do with list
> items being joined. I'd have to work out how to get to this third MoveBlock.
> Or do you know?

Looking at the code, it seems like that case happens in JoinBlocks when the right block is a descendant of the left block, and the left and right blocks are list items (<li> elements) that belong to different lists (<ol> or <ul> elements).  Actually, you should probably extend your tests with some cases that test the behavior for lists such as the above as well.
Attachment #8643220 - Attachment is obsolete: true
I tried to follow your instructions. Amazing how you have all these concepts in your head.

Let's consider the first basic test case:
<div>test</div><pre>foobar\nbaz</pre>

Unless I made a mistake, there is a logic error, since all the lines I test return IsBlock() true:
Debug:
  starting line is block 1
  next line is block 1
  no next line found
In fact, no next line is found.

Remember that the <pre> element is passed to the function, which is in itself a block element. In the model "IsInvisibleBreak" you pass in a break and take it from there.

What am I doing wrong? This is difficult for me to debug since I'm just copying code I don't fully understand. Also, I'm not familiar with the classes involved (ns...Frame), although I understand that these frames have something to do with the layout (having worked on bug 756984). So I don't know how to print/inspect them to see in what's going on.

I might try to catch you on IRC to advance this a bit faster. Or perhaps, if it's something obvious, tell me here.

Also, is "iterate over to the last element on the line" done like this?
+  nsIFrame* firstFrame = prevLine->mFirstChild;
+  nsIFrame* lastFrame = firstFrame;
+  for (frame = firstFrame; frame; frame = frame->GetNextSibling()) {
+    lastFrame = frame;
+  }
Hmm, I've just looked at LineHasNonEmptyContent again. Most likely I need to drill down further, since the frames can be nested.

You can also take a look at the tests. I've added the ones you wanted and repeated everything with the <pre> starting on a new line to show the consequences of bug 1190161.

Tests for CSS-driven preformatting and lists will be added once the existing tests work again. BTW, I don't intend to repeat them with the preformatted block starting on the a new line, or do you insist?
Attachment #8642872 - Attachment is obsolete: true
Attachment #8643318 - Attachment is obsolete: true
Flags: needinfo?(ehsan)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #73)
> Looking at the code, it seems like that case happens in JoinBlocks when the
> right block is a descendant of the left block (*), and the left and right blocks
> are list items (<li> elements) that belong to different lists (<ol> or <ul>
> elements).  Actually, you should probably extend your tests with some cases
> that test the behavior for lists such as the above as well.
(*) Other way around?

Hmm, if you suspect which case would trigger that, please write it here. I've tried

<ul><ul><li>test</li></ul><li><pre>foobar
baz</pre></li></ul>

<ul><li>test</li><ul><li><pre>foobar
baz</pre></li></ul></ul>

<ul><li>test</li><li><ul><li><pre>foobar
baz</pre></li></ul></li></ul>

<ul><li>test</li><li><pre>foobar
baz</pre></li></ul>

and they all trigger the "normal" case.

In the code there are:
1) left block is inside right block.
2) right block is inside left block.
3) normal case.

Mostly I see 3).

2) is triggered with backspace, if the <pre> starts on a new line. Even case 2) is not so easy to understand. This might as well be an error due to bug 1190161, I don't know.

I've never seen case 1).

Anyway, let's get the code working again (comment #75).
Yippee! I found a test case where the second MoveBlock is executed even on forward delete:

test<pre>foobar
baz</pre>

(I didn't find this by accident, I just extended the newline in the backspace tests we discussed above to a longer text node).

Never mind that in this simple case undo doesn't work (*another bug*), so I have to set up the HTML before each test instead of simulating <control>Z.
Refreshed with more tests. No code change from preceding patch.

Questions remain:
- How to make it work (comment #75)
- How to trigger the first MoveBlock() call (left block is inside right block, comment #76).
Attachment #8643600 - Attachment is obsolete: true
Hmm, thinking out loud:

Case 2) right block is inside left block.
That's what we just saw. Right block, the stuff from the <pre>, gets moved to the right of the text node, but it gets moved into the left block, the surrounding <div>, which already contains the text and the <pre>.

Now case 1) left block is inside right block:
My head is spinning, but here it is:
<ul><li>list</li>text</ul>
The left block, the list item is inside the right block, the <ul>, which also contains the text node being moved.

But how to we merge a <pre> into this. Well, debugging (and logic) shows that the following example is also case 1) of MoveBlock():

<ul><pre><li>text</li>foo
bar</pre></ul>

This case works already today! But hold on, let's make it nasty:

<ul><pre><li style="white-space:normal">text</li>foo
bar</pre></ul>

This is the case what will profit from the \n to <br> replacement. Without it, the whole "foo\bar" is moved to the <li> and that's not what we want. Yippee!!

Now that we understand which tests are needed, please help me to get the code going, see comment #75.
Adding more tests as per comment #79.

Let me know if you want more. I still need to remove undo and set up the innerHTML again, since undo is known to not work in some cases (see comment #77).

I'll stop now and wait for more input from Ehsan.
Attachment #8643764 - Attachment is obsolete: true
Sorry I have a hard time following your train of thoughts, but it looks like this is where you're stuck still?  If not, please let me know where else.  :-)

(In reply to Jorg K from comment #75)
> Let's consider the first basic test case:
> <div>test</div><pre>foobar\nbaz</pre>
> 
> Unless I made a mistake, there is a logic error, since all the lines I test
> return IsBlock() true:
> Debug:
>   starting line is block 1
>   next line is block 1
>   no next line found
> In fact, no next line is found.

What does your code look like?

> Remember that the <pre> element is passed to the function, which is in
> itself a block element. In the model "IsInvisibleBreak" you pass in a break
> and take it from there.

Yes.  Since you have the pre element initially, you can get its first child frame.  Note that you cannot rely on this being a pre element though, for example if you have a <span style="white-space: pre"> instead of a <pre> that would be an inline element and not a block element, so you would need to adjust the code so that it works with both starting with a block element and starting with an inline element.

> What am I doing wrong? This is difficult for me to debug since I'm just
> copying code I don't fully understand. Also, I'm not familiar with the
> classes involved (ns...Frame), although I understand that these frames have
> something to do with the layout (having worked on bug 756984). So I don't
> know how to print/inspect them to see in what's going on.

I'm not sure what the question is yet since I don't know what specific code you are referring to.  Note that I am not very familiar with the nsBlockInFlowLineIterator myself, if your question is specific to that please ping roc.  He recently taught me how to use it, but my understanding may not be complete yet.

> I might try to catch you on IRC to advance this a bit faster. Or perhaps, if
> it's something obvious, tell me here.
> 
> Also, is "iterate over to the last element on the line" done like this?
> +  nsIFrame* firstFrame = prevLine->mFirstChild;
> +  nsIFrame* lastFrame = firstFrame;
> +  for (frame = firstFrame; frame; frame = frame->GetNextSibling()) {
> +    lastFrame = frame;
> +  }

No, since if you keep calling GetNextSibling you may eventually get to the next line.  You want to pay attention to how many children the line has by using GetChildCount().  See <https://dxr.allizom.org/mozilla-central/source/dom/base/nsDocumentEncoder.cpp#354>.

> You can also take a look at the tests. I've added the ones you wanted and
> repeated everything with the <pre> starting on a new line to show the
> consequences of bug 1190161.

Please flag me for review when you have a complete patch, or ask questions in the comments.  I typically don't look at random WIP patches since deciphering where a question relates to from many WIP patches can take hours of time, and doing incremental reviews is extremely time consuming, and time is what I don't have much of these days.  :(  (You may have noticed my "not reviewing patches" in my Bugzilla nick!)

> Tests for CSS-driven preformatting and lists will be added once the existing
> tests work again. BTW, I don't intend to repeat them with the preformatted
> block starting on the a new line, or do you insist?

Please test all combinations.  Testing more never hurts.
Flags: needinfo?(ehsan)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #81)

> What does your code look like?
It is/was attached, see attachments.

> > What am I doing wrong?
> I'm not sure what the question is yet since I don't know what specific code
> you are referring to. 
It is/was attached, see attachments.

> (You may have noticed my "not reviewing patches" in my Bugzilla nick!)
Frankly, that's not realistic. If you don't review, who will? Here we're are looking at a 5-year-old regression in Mozilla's flagship product Firefox, which, well, you introduced. I've spent countless hours on analysing it, I came up with a (sadly bad) solution, and I've created a heap of good test cases after painfully analysing that all code path are covered.

So please look at the attachment and tell we where I didn't implement your suggestion from comment #70 well enough. We don't need a full blown review, just pinpoint the problem.
Flags: needinfo?(ehsan)
Forget the query for now:

get child node (node) and then 
nsIFrame* frame = node-> /* AsElement()-> */ GetPrimaryFrame();
gets me further. I seem to be finding two lines in the line iterator which don't claim to be blocks, so this looks promising.
Flags: needinfo?(ehsan)
Got it going for the simple cases where the first node of the <pre> is a text node text.
So we're on the home stretch ;-).
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #81)

> Yes.  Since you have the pre element initially, you can get its first child
> frame.  Note that you cannot rely on this being a pre element though, for
> example if you have a <span style="white-space: pre"> instead of a <pre>
> that would be an inline element and not a block element, so you would need
> to adjust the code so that it works with both starting with a block element
> and starting with an inline element.

Can you spell this out a little clearer?
What do you mean by "get its first child frame".

I've tried to get the first child, and then its frame.
Works fine if the first child is a text (see comment #84 and comment #85).
Doesn't work if the first child of the <pre> is for example a <b> element.

Seems foolish to me to dig around in the DOM to find the first text node (grand)child and then start from it.

What is the best way to get to the "ancestor" on which we want to call
nsBlockInFlowLineIterator iter(blockAncestor, frame, &valid);
to start the line iteration?
Flags: needinfo?(ehsan)
Aryeh, if you have time and feel inclined, you could give me some feedback on the code. The code has three "TODO"s, so perhaps you can give me some tips.

Ehsan, the first "TODO" relates to finding the starting frame. Since a <pre> is passed in, I get the first text node child drilling into it and use its frame. Perhaps there is a better way. I already asked the same question in more detail in comment #86.

The test cases have opened a hornets' nest. There are not only three calls to MoveBlock (which work well if we swap a relevant \n for a <br>).

In some cases JoinNodesSmart() is used, and therefore we need to replace \n there, too. Case 11 shows this:
<div>test</div><pre><div>foo\nbar</div>baz\nfred</pre>
Result:
<div>testfoo<br>bar</div><pre>baz\nfred</pre>
As we see, we can't remove the <br> later on.

However, making the test:
"<div>test</div><pre><div>foo\nbar\nbar</div>baz\nfred</pre>"
currently results in:
"<div>testfoo<br>bar\nbar</div><pre>baz\nfred</pre>"
since only the first \n was replaced.

I need to carefully look at which cases use JoinNodesSmart().
Attachment #8643798 - Attachment is obsolete: true
Flags: needinfo?(ayg)
I don't know anything about frames/layout stuff, sorry.  But if JoinNodesSmart is a problem too, it doesn't have to block fixing MoveBlock -- you could fix MoveBlock first and JoinNodesSmart later if it's easier for you.
Flags: needinfo?(ayg)
(In reply to :Aryeh Gregor (working until August 13) from comment #88)
> But if
> JoinNodesSmart is a problem too, it doesn't have to block fixing MoveBlock
> -- you could fix MoveBlock first and JoinNodesSmart later if it's easier for
> you.

I thought of that. I need to analyse which tests trigger which behaviour, so we know where the problem is located, if the test returns a funny result.
Summary: Joining <div> with <pre> obliterates preformatting → Joining <div> with <pre> obliterates preformatting (Part 1: Fix MoveBlock ())
I have limited the bug to fixing MoveBlock(). There is more work in JoinNodesSmart which joins two <div> elements. Since there are various problems with joining <div> elements, this will be subject to another bug.

I know that you hate to look at work in progress, but I would like you to look at three issues marked "TODO" in the code (there are also two "TODO" in the tests, which you can ignore).

The code issues tagged with "TODO" are:
1) What is the best way to find the frame to start with (see also comments #86).
2) Please look at the implementation of "iterate over to the last element on
   the line". The last frame on the line can be an inline or block frame,
   in which case I need to look through it. Is what I've done correct?
   The tests all work, but there are no complicated nested testcases. Perhaps
   recursion is missing.
3) In that same code I'm looking for nsTextFrame. Is there a way to tell other
   than casting/querying to that type?

I won't copy the code in here. Please use a review window. Thanking you in advance. Please keep in mind that Europe is six hours ahead of Toronto, so an early answer would be appreciated.
Attachment #8644316 - Attachment is obsolete: true
Attachment #8644399 - Flags: feedback?(ehsan)
The test cases along the lines of

<style> .pre { white-space: pre } </style>
<div>test</div><div class="pre">foobar
baz</div>

are pretty bad due to bug 1191875 which I've just raised. So far this bug has spun off two more ;-(
(bug 1190161 and bug 1191875)
Refreshing patch. I finally looked at all my tests.

The crazy list tests

<ul><pre><li>test</li>foobar\nbaz</pre></ul>

do not work at all.

The reason is that the iter.GetLine(); does not find a next line.
Debug "no next line found".

Passed in to the preprocessing is the <pre>, so that's good.

So question: Do I need to refer this to Robert?
Attachment #8644399 - Attachment is obsolete: true
Attachment #8644399 - Flags: feedback?(ehsan)
Attachment #8644484 - Flags: feedback?(ehsan)
Comment on attachment 8644399 [details] [diff] [review]
WIP: Publishing current state for feedback

Review of attachment 8644399 [details] [diff] [review]:
-----------------------------------------------------------------

Two of your questions below are not easy to answer without debugging.  Thinking about this more, I don't really think this is a good bug for someone who is not deeply familiar with our DOM/layout internals to work on...  Would you be OK for me to take it over in a few weeks?

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2910,5 @@
> +
> +  nsCOMPtr<nsIContent> node = do_QueryInterface(textNode);
> +
> +  nsIFrame* frame = node->GetPrimaryFrame();
> +  if (!frame) {

I don't think you want to do FindFirstTextNode() at all, I think you should be able to start with frame here, but it's hard to say without debugging, which I don't have time to do right now.

@@ +2965,5 @@
> +  nsIFrame* lastFrame;
> +  for (frame = prevLine->mFirstChild; count > 0;
> +       --count, frame = frame->GetNextSibling()) {
> +    lastFrame = frame;
> +  }

I thought that lastFrame after this loop would be a BRFrame if the line ends in a br, so in theory you should be able to remove the whole next block.  I don't know why it isn't the case, and I can't guess without debugging.

@@ +2978,5 @@
> +      if (frame->GetType() == nsGkAtoms::brFrame) {
> +        // The line end happened due to a break, so we are done.
> +        return nullptr;
> +      }
> +      nsTextFrame* tf = do_QueryFrame(frame); // TODO: Better way to check for text frame?

You could check for that like this:

  frame->GetType() == nsGkAtoms::textFrame.
Attachment #8644399 - Attachment is obsolete: false
Flags: needinfo?(ehsan)
Attachment #8644484 - Flags: feedback?(ehsan)
Attachment #8644399 - Attachment is obsolete: true
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #93)
> Thinking about this more, I don't really think this is a good bug for
> someone who is not deeply familiar with our DOM/layout internals to work
> on...  Would you be OK for me to take it over in a few weeks?

No problem. The WIP is attached, all the problems are clearly flagged.
The tests all pass, the ones that don't are clearly marked and are not run.

I'm looking forward to seeing your solution.

The Thunderbird crowd who experience this frequently would like to have it for mozilla45 or earlier.

I will do some work on the spin-off, bug 1190161, the funny backspace behaviour. I that got fixed first, you could scrap all the "todo" tests under "Repeating all tests above with the <pre> on a new line."
Assignee: mozilla → ehsan
Thanks for all of your help on this so far!  Needinfoing myself so that I don't forget.
Flags: needinfo?(Pidgeot18)
Ehsan: I assume you really meant to NI yourself and not jcranmer
Flags: needinfo?(Pidgeot18) → needinfo?(ehsan)
Yes, sorry!
Flags: needinfo?(ehsan)
Flags: needinfo?(ehsan)
Comment on attachment 8644399 [details] [diff] [review]
WIP: Publishing current state for feedback

Review of attachment 8644399 [details] [diff] [review]:
-----------------------------------------------------------------

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +2869,5 @@
> + * obliterated.
> + */
> +static
> +nsIDOMNode*
> +FindFirstTextNode (nsIDOMNode* aNode)

Call this "FindFirstTextNodeDescendant"

@@ +2881,5 @@
> +    if (nodeType == nsIDOMNode::TEXT_NODE) {
> +      return child;
> +    } else {
> +      // Drill down recursively.
> +      nsCOMPtr<nsIDOMNode> grandChild = FindFirstTextNode(child);

Call this 'descendant' not 'grandChild'

@@ +3007,5 @@
> +  return textFrame->GetContent();
> +}
> +
> +nsresult
> +nsHTMLEditRules::PreprocessPreformat(nsIDOMNode* aNode, nsCOMPtr<nsIDOMNode>* aBr)

Can you document what this actually does?
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #98)
> Can you document what this actually does?

Let me give you the background on the bug. The problem is that when you join a <div> with a <pre> by either forward delete from the <div> or backspace from the <pre>, the content of the <pre> is moved into the <div> and loses its preformatting. This is a regression from May 2010 when Ehsan changed(broke) "IsPreformatted()". In May 2010 this regression wasn't noted. In November 2010 the function "ReplaceNewlines()" got removed, so newlines in editable <pre> elements are no longer replaced with <br>. So regardless of whether the error introduced in the detection of <pre> elements in "IsPreformatted()" got fixed or not, due to the November 2010 change, the <pre> elements still get obliterated when joined to the preceding <div>.

The join happens in "MoveBlock()", so the idea was to preprocess the <pre> by replacing the fist visible line break caused by a newline with a <br> and then let "MoveBlock()" take its course. So this is what "PreprocessPreformat()" actually does. The comment preceding the function says:

* In a preformat we need to change the first newline for a break so only the
* first line gets moved. If we don't do that, than entire preformatting may get
* obliterated.

So the preprocessing does the following:
Helper function FindTextNodeWithNewline:
Starting at the frame of first text node in the <pre> (found with "FindFirstTextNode()") it uses a nsBlockInFlowLineIterator to move from the starting line (prevline) onto the next line (nextline). If a next line is found, it searches for the last frame in the starting line. If this is already a <br>, we don't need to replace anything. If this is a textframe, the function returns the offset of the newline.
Once the helper function has returned the location of the newline, "PreprocessPreformat()" then deletes this and inserts a <br>.

I hope this answers your question.

This basically works as almost all the test pass (some with somewhat suboptimal results due to other editor problems). Three problems remain:
1) Do I really need to dig out the first text node in the <pre> to get the starting frame or is there a better way?
2) What is the best way to locate the last frame in the starting line?
3) The tests that look like <ul><pre><li>test</li>foobar\nbaz</pre></ul> fail, since the nsBlockInFlowLineIterator doesn't seem to find a next line. That's unexpected.
Sorry for the delayed reply.

One point: it seems like here we should be looking at computed style for 'white-space' rather than looking for a <pre> element directly. It's possible that elements other than <pre> have white-space:pre-line or white-space:pre, and descendants of the <pre> element may have white-space:normal. But it might be OK to ignore that here.

I wonder if it would make more sense to make nsHTMLEditRules::GetNodesFromPoint aware of preformatted newlines. Temporarily changing a newline to a <br> and then back again seems a bit hacky. I think it shouldn't be too hard to modify nsHTMLEditRules::GetPromotedPoint so that it stops after a text node which is white-space:pre/pre-line and ends with a newline. What do you think?
Flags: needinfo?(ehsan) → needinfo?(mozilla)
Robert, thanks for the analysis. I will look into it.

Yes, the proposed "hacky" solution of changing a newline to a <br> has already been flagged as sub-optimal by Aryeh. Sadly, I'm a newcomer here (as Ehsan said: not "someone who is not deeply familiar with our DOM/layout internals"), so I was desperately looking for guidance. So thanks again.
Flags: needinfo?(mozilla)
Attachment #8644484 - Attachment is obsolete: true
Robert, I made a start to investigate your suggestion. Sadly I got dragged off to other problems (dictionaries!). To benefit from the time difference Berlin-Auckland, I will ask my question now. If you don't answer, I will investigate further tomorrow.

In the attached patch I found the place to hook in. Just above my new code we have 
if (mHTMLEditor->IsVisBreak(nextNode->AsDOMNode())) { break ; }
which is the code that makes the <div>xxx</div><pre>foo<br>bar<br>baz</pre> case work.

Sadly what you said
> I think it shouldn't be too hard to modify nsHTMLEditRules::GetPromotedPoint so 
> that it stops after a text node which is white-space:pre/pre-line and *ends with a newline*.
is not accurate. The text node does not "end" with a new line. Reminder: We're looking at
<div>xxx</div><pre>foo\nbar\nbaz</pre>. The text node in fact spans three lines.

My question is, and maybe it would need debugging to answer it (so don't!):
Do you think I can just return from the function with the right return values (offset and node) or do you think I need to split the text node at the first newline and remove the newline?
I have the feeling that the latter will be necessary. The next question is whether this is the right place to do this edit.
Flags: needinfo?(roc)
Oops yeah, I got confused between nodes and frames.

For sanity we should minimize the number of places that modify the DOM. Thus I think PromoteRange and its callees should not modify the DOM. GetNodesForOperation already modifies the DOM (under BustUpInlinesAtBRs at least), so I suggest we make it responsible for splitting the text node when necessary.
Flags: needinfo?(roc)
I hope you are not bored by this yet. It is really hard to find a good spot to put the knife to it.

Just briefly on the history of this:
Before you (rightly) started questioning the "hacky" pre-processing approach and made other suggestions, there were *two* pre-processing attempts, the first, attachment 8642872 [details] [diff] [review], even good feedback+ from Aryeh (comment #68). Then Ehsan came along (comment #70) and suggested to find the relevant newline based on frames using line iterators, see attachment 8644484 [details] [diff] [review], which you already briefly reviewed in comment #98.

Next you suggested to hook into nsHTMLEditRules::GetPromotedPoint(). With the WIP patch (attachment 8656063 [details] [diff] [review]), I could see the text node go past, ready to stab it! I didn't capture the debug output, but the text (including the newlines) was printed nicely and the code found the first newline and printed its offset.

Now, for take two, I hooked into nsHTMLEditRules::GetNodesForOperation(). This is attractive, since it already does some slicing just above my new test code:
  // Put these nodes in aOutArrayOfNodes, replacing the current node
  aOutArrayOfNodes.RemoveElementAt(i);
  aOutArrayOfNodes.InsertElementsAt(i, arrayOfInlines);

However, let's look at what comes past in my debug:
In the case that already works, the <div>xxx</div><pre>foo<br>bar<br>baz</pre> I get this (I deleted some stuff that's not relevant):
***** isPRE 1 on ...
br@0A4ED920 state=[40000020000] flags=[00100200] primaryframe=0DD529F8
***** isPRE 1 on ...
Text@0A4ED880 flags=[02000200] primaryframe=0DD52968 refcount=9<foo>
***** preformatted text foo
So in this example, the inner nodes of the pre present themselves, ready to attack.

However, the other example that I aim to fix, the <div>xxx</div><pre>foo\nbar\nbaz</pre> gives this debug:
***** isPRE 1 on ...
pre@12D47880 state=[40000020000] flags=[00101200] primaryframe=0DD52560 
  Text@0A419AB0 flags=[02000208] ranges:1 <foo\u000abar\u000abaz\u000a>

In other words, I only see the pre and I'd still have to dig into it, like I did in the two pre-processing approaches. I need to find the first text node that has a newline (skipping over font/bold/etc.) and then split that node at the newline. This was already done in the first pre-processing approach, there I had a FindTextNodeWithNewline doing about that, so I could revive that (see attachment 8642872 [details] [diff] [review]).

What do you think?

P.S.: I haven't forgotten that you said not to focus too much on the pre, but rather in generic "white-space:pre". However, the pre is my first test case.
Flags: needinfo?(roc)
I do not understand why you're getting different results for those two cases. What is in aArrayOfRanges in each case?
Flags: needinfo?(roc)
Examples:
<body>
<div style="height:400px;background-color:#eee;border:1px solid #000;padding:10px" contenteditable="">
<div>test</div><pre>foo
bar
baz</pre>
<div>test</div><pre>foo<br>bar<br>baz</pre>
</div>
</body></html>

Here is more debug, a little ugly.

You can see that in both cases, the startparent is the surrounding <div contenteditable>.
The endparent is different. In one case also that <div contenteditable>, in the the other case the <pre>.

Case <div>test</div><pre>foo<br>bar<br>baz</pre>

***** nsHTMLEditRules::GetNodesForOperation - array of ranges
***** >> Startoffset 5
***** >> Endoffset 2
***** >> Startparent
div@13404380 contenteditable="" style="height:400px;background-color:#eee;border
:1px solid #000;padding:10px" state=[40000020006] flags=[00104208] ranges:1 prim
aryframe=14E3A010 refcount=66<
  Text@13404420 flags=[07800200] primaryframe=00000000 refcount=2<\u000a\u000a>
  div@13404470 state=[40000020000] flags=[00100200] primaryframe=14E3A1D8 refcou
nt=16<
    Text@134044C0 flags=[03000208] primaryframe=14E3A278 refcount=2<test>
  >
  pre@134060A0 state=[40000020000] flags=[00101200] primaryframe=14E3A520 refcou
nt=12<
    Text@13404560 flags=[02000208] ranges:1 primaryframe=14E3A660 refcount=6<foo
\u000abar\u000abaz>
  >
  Text@134045B0 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a>
  div@13404600 state=[40000020004] flags=[00100200] primaryframe=14E3A758 refcou
nt=60<
    Text@13404650 flags=[03000208] primaryframe=15028810 refcount=15<test>
  >
  pre@13406160 state=[40000020000] flags=[00101200] primaryframe=150288A0 refcou
nt=13<
    Text@134046A0 flags=[02000200] primaryframe=15028940 refcount=6<foo>
    br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0 refco
unt=2<>
    Text@13404790 flags=[02000200] primaryframe=15028A10 refcount=2<bar>
    br@134047E0 state=[40000020000] flags=[00100200] primaryframe=15028AA0 refco
unt=2<>
    Text@13404830 flags=[02000208] ranges:1 primaryframe=15028AE0 refcount=4<baz
>
  >
  Text@13404880 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a>
>
***** >> Endparent
pre@13406160 state=[40000020000] flags=[00101200] primaryframe=150288A0 refcount
=14<
  Text@134046A0 flags=[02000200] primaryframe=15028940 refcount=6<foo>
  br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0 refcoun
t=2<>
  Text@13404790 flags=[02000200] primaryframe=15028A10 refcount=2<bar>
  br@134047E0 state=[40000020000] flags=[00100200] primaryframe=15028AA0 refcoun
t=2<>
  Text@13404830 flags=[02000208] ranges:1 primaryframe=15028AE0 refcount=4<baz>
>
***** >> Node
br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0 refcount=
5<>
***** isPRE 1
***** >> Node
Text@134046A0 flags=[02000200] primaryframe=15028940 refcount=9<foo>
***** isPRE 1
***** preformatted text foo

Case <div>test</div><pre>foo\nbar\nbaz</pre>

***** nsHTMLEditRules::GetNodesForOperation - array of ranges
***** >> Startoffset 2
***** >> Endoffset 3
***** >> Startparent
div@13404380 contenteditable="" style="height:400px;background-color:#eee;border
:1px solid #000;padding:10px" state=[40000020006] flags=[00104208] ranges:1 prim
aryframe=14E3A010 refcount=215<
  Text@13404420 flags=[07800200] primaryframe=00000000 refcount=2<\u000a\u000a>
  div@13404470 state=[40000020004] flags=[00100200] primaryframe=14E3A1D8 refcou
nt=40<
    Text@134044C0 flags=[03000208] primaryframe=14E3A278 refcount=15<test>
  >
  pre@134060A0 state=[40000020000] flags=[00101200] primaryframe=14E3A520 refcou
nt=61<
    Text@13404560 flags=[02000208] ranges:1 primaryframe=14E3A660 refcount=10<fo
o\u000abar\u000abaz>
  >
  Text@134045B0 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a>
  div@13404600 state=[40000020000] flags=[00100208] ranges:1 primaryframe=14E3A7
58 refcount=31<
    Text@13404650 flags=[03000208] primaryframe=15028810 refcount=5<test>
    Text@134046A0 flags=[03000208] primaryframe=15028940 refcount=8<foo>
    br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0 refco
unt=5<>
  >
  pre@13406160 state=[40000020000] flags=[00101200] primaryframe=150288A0 refcou
nt=16<
    Text@13404790 flags=[02000200] primaryframe=15028A10 refcount=3<bar>
    br@134047E0 state=[40000020000] flags=[00100200] primaryframe=15028AA0 refco
unt=2<>
    Text@13404830 flags=[02000208] ranges:1 primaryframe=15028AE0 refcount=4<baz
>
  >
  Text@13404880 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a>
>
***** >> Endparent
div@13404380 contenteditable="" style="height:400px;background-color:#eee;border
:1px solid #000;padding:10px" state=[40000020006] flags=[00104208] ranges:1 prim
aryframe=14E3A010 refcount=216<
  Text@13404420 flags=[07800200] primaryframe=00000000 refcount=2<\u000a\u000a>
  div@13404470 state=[40000020004] flags=[00100200] primaryframe=14E3A1D8 refcou
nt=40<
    Text@134044C0 flags=[03000208] primaryframe=14E3A278 refcount=15<test>
  >
  pre@134060A0 state=[40000020000] flags=[00101200] primaryframe=14E3A520 refcou
nt=61<
    Text@13404560 flags=[02000208] ranges:1 primaryframe=14E3A660 refcount=10<fo
o\u000abar\u000abaz>
  >
  Text@134045B0 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a>
  div@13404600 state=[40000020000] flags=[00100208] ranges:1 primaryframe=14E3A7
58 refcount=31<
    Text@13404650 flags=[03000208] primaryframe=15028810 refcount=5<test>
    Text@134046A0 flags=[03000208] primaryframe=15028940 refcount=8<foo>
    br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0 refco
unt=5<>
  >
  pre@13406160 state=[40000020000] flags=[00101200] primaryframe=150288A0 refcou
nt=16<
    Text@13404790 flags=[02000200] primaryframe=15028A10 refcount=3<bar>
    br@134047E0 state=[40000020000] flags=[00100200] primaryframe=15028AA0 refco
unt=2<>
    Text@13404830 flags=[02000208] ranges:1 primaryframe=15028AE0 refcount=4<baz
>
  >
  Text@13404880 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a>
>
***** >> Node
pre@134060A0 state=[40000020000] flags=[00101200] primaryframe=14E3A520 refcount
=64<
  Text@13404560 flags=[02000208] ranges:1 primaryframe=14E3A660 refcount=10<foo\
u000abar\u000abaz>
>
***** isPRE 1
Attachment #8656727 - Attachment is obsolete: true
Flags: needinfo?(roc)
Why does your testcase have
<div>test</div><pre>foo
bar
baz</pre>
in it, at the start? Having two <pre>s in each testcase seems irrelevant and confusing.

The second dump is very confusing. It looks like the startOffset is pointing to that first, irrelevant <pre>. Maybe fixing the testcase will make things clearer.
Flags: needinfo?(roc)
There are two test cases.

<body>
<div style="height:400px;background-color:#eee;border:1px solid #000;padding:10px" contenteditable="">
<div>test</div><pre>foo   <===== first test case
bar
baz</pre>
<div>test</div><pre>foo<br>bar<br>baz</pre>  <===== second test case
</div>
</body></html>

You had asked: "What is in aArrayOfRanges in each case?".

For the first test case, also referred to as <div>test</div><pre>foo\nbar\nbaz</pre>,
start and end node are the surrounding editable <div>:

***** nsHTMLEditRules::GetNodesForOperation - array of ranges
***** >> Startoffset 2
***** >> Endoffset 3
***** >> Startparent
div@13404380 contenteditable=""
0  Text@13404420 flags=[07800200] primaryframe=00000000 refcount=2<\u000a\u000a> <== newline before 1st case
1  div@13404470 state=[40000020000] flags=[00100200] <== div belonging to first test
2  pre@134060A0 state=[40000020000] flags=[00101200] <== pre belonging to first test
3  Text@134045B0 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a> <== newline separating tests
***** >> Endparent
div@13404380 contenteditable="" (content same as above)

Startoffset 2 is right before the <pre> and endoffset 3 is therefore right behind.
In other words, the range contains the <pre>, so the entire <pre> is moved/joined to the <div>.

In the second test case, also referred to as <div>test</div><pre>foo<br>bar<br>baz</pre>
the start node is the surrounding editable <div>, and the end node is the <pre>:

***** nsHTMLEditRules::GetNodesForOperation - array of ranges
***** >> Startoffset 5
***** >> Endoffset 2
***** >> Startparent
div@13404380 contenteditable=""
0  Text@13404420 flags=[07800200] primaryframe=00000000 refcount=2<\u000a\u000a> <== newline before 1st case
1  div@13404470 state=[40000020000] flags=[00100200] <== div belonging to first test
2  pre@134060A0 state=[40000020000] flags=[00101200] <== pre belonging to first test
3  Text@134045B0 flags=[07800200] primaryframe=00000000 refcount=1<\u000a\u000a> <== newline separating tests
4  div@13404600 state=[40000020004] flags=[00100200] primaryframe=14E3A758 <== div belonging to second test
5  pre@13406160 state=[40000020000] flags=[00101200] primaryframe=150288A0 <== pre belonging to second test
    Text@134046A0 flags=[02000200] primaryframe=15028940 refcount=6<foo>
    br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0 
    Text@13404790 flags=[02000200] primaryframe=15028A10 refcount=2<bar>
    br@134047E0 state=[40000020000] flags=[00100200] primaryframe=15028AA0
    Text@13404830 flags=[02000208] ranges:1 primaryframe=15028AE0 refcount=4<baz>

***** >> Endparent
pre@13406160 state=[40000020000] 
0  Text@134046A0 flags=[02000200] primaryframe=15028940 refcount=6<foo>
1  br@13404740 state=[40000020000] flags=[00100200] primaryframe=150289D0
2  Text@13404790 flags=[02000200] primaryframe=15028A10 refcount=2<bar>  <=== Endoffset 2
   br@134047E0 state=[40000020000] flags=[00100200] primaryframe=15028AA0 
   Text@13404830 flags=[02000208] ranges:1 primaryframe=15028AE0 refcount=4<baz>

In this case, the start is right before the relevant (second) <pre>. The end of the range has been moved into the <pre>, to after the <br>, since the offset is 2. This was already done by PromoteRange.

Summary of the observations:
In the <br> case, the range has already been set to include parts of the <pre> being joined, right upto after the first <br> in that <pre>. In the newline case, the entire <pre> was marked for moving, not cutting into it.

Conclusion: 
Looking at this, GetNodesForOperation doesn't appear to be the right point of attack. I think the function sets the end of the range to cut into the <pre> (PromoteRange) should also be the function to slice up the text at the newlines.

As per comment #103, GetPromotedPoint (called inside PromoteRange) is a handy spot to do the slicing, since there, it already treats the <br> case:
https://dxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditRules.cpp#5614

If we don't do it there, then the "hacky" preprocessing is still an option. When we arrive in GetNodesForOperation, we missed the boat ;-)

So options:
1) "Hacky" preprocessing before PromoteRange is called.
2) Slice up the text in GetPromotedPoint (in PromoteRange) right after the <br> processing (see line above).
   See attachment 8656063 [details] [diff] [review] for where the code would go. The "preformatted" text *is* already visited there,
   the debug code you can see in the patch did dump out the text that needed the slicing and the
   offset to the newline.

The hacky preprocessing is a whole lot of code to locate the right spot, doing it inside GetPromotedPoint is very simple (as already outlined by attachment 8656063 [details] [diff] [review]: It finds the text in need of slicing and the offset. Only missing is to do the DOM modifications: Remove the newline and split the node).

What do you think?
Flags: needinfo?(roc)
(In reply to Jorg K (GMT+2) from comment #109)
> <body>
> <div style="height:400px;background-color:#eee;border:1px solid
> #000;padding:10px" contenteditable="">
> <div>test</div><pre>foo   <===== first test case
> bar
> baz</pre>
> <div>test</div><pre>foo<br>bar<br>baz</pre>  <===== second test case
> </div>
> </body></html>

Don't put two testcases in the same file. That makes everything more complicated to understand.
Flags: needinfo?(roc)
(In reply to Jorg K (GMT+2) from comment #109)
> <body>
> <div style="height:400px;background-color:#eee;border:1px solid
> #000;padding:10px" contenteditable="">
> <div>test</div><pre>foo   <===== first test case
> bar
> baz</pre>
> <div>test</div><pre>foo<br>bar<br>baz</pre>  <===== second test case
> </div>
> </body></html>
> 
> You had asked: "What is in aArrayOfRanges in each case?".
> 
> For the first test case, also referred to as
> <div>test</div><pre>foo\nbar\nbaz</pre>,
> start and end node are the surrounding editable <div>:
> 
> ***** nsHTMLEditRules::GetNodesForOperation - array of ranges
> ***** >> Startoffset 2
> ***** >> Endoffset 3
> ***** >> Startparent
> div@13404380 contenteditable=""
> 0  Text@13404420 flags=[07800200] primaryframe=00000000
> refcount=2<\u000a\u000a> <== newline before 1st case
> 1  div@13404470 state=[40000020000] flags=[00100200] <== div belonging to
> first test
> 2  pre@134060A0 state=[40000020000] flags=[00101200] <== pre belonging to
> first test
> 3  Text@134045B0 flags=[07800200] primaryframe=00000000
> refcount=1<\u000a\u000a> <== newline separating tests
> ***** >> Endparent
> div@13404380 contenteditable="" (content same as above)
> 
> Startoffset 2 is right before the <pre> and endoffset 3 is therefore right
> behind.
> In other words, the range contains the <pre>, so the entire <pre> is
> moved/joined to the <div>.

This is a bug. The end node should be the <pre>'s text node child, and the end offset should be 4 (after the first \n). So we'll need to fix that in nsHTMLEditRules::GetPromotedPoint or thereabouts. But we should *not* do actual DOM changes there.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #111)
> This is a bug. The end node should be the <pre>'s text node child, and the
> end offset should be 4 (after the first \n). So we'll need to fix that in
> nsHTMLEditRules::GetPromotedPoint or thereabouts. But we should *not* do
> actual DOM changes there.

Thanks for the instructions. I'll look into it further.
Test case:
<body>
<div style="height:400px;background-color:#eee;border:1px solid #000;padding:10px" contenteditable="">
<div>test</div><pre>foo
bar
baz</pre>
</body></html>

The attached patch gives the following debug (with stuff deleted):

***** nsHTMLEditRules::GetNodesForOperation - array of ranges
***** >> Startoffset 2
***** >> Endoffset 4
***** >> Startparent
div@0C301D30 contenteditable="" style="height:400px;
0 Text@0C301E20 flags=[07800200] primaryframe=00000000 refcount=2<\u000a\u000a>
1 div@0C301E70 state=[40000020000] flags=[00100200] 
    Text@0C301F10 flags=[03000208] primaryframe=0FC9A2B8 refcount=15<test>
2 pre@14037040 state=[40000020000] flags=[00101200] 
    Text@0C302330 ranges:1 primaryframe=0FC9A6A0 refcount=12<foo\u000abar\u000abaz>

***** >> Endparent
Text@0C302330 refcount=13<foo\u000abar\u000abaz>

So the start is right before the <pre>, the end is in the <pre>'s text, after the newline. Just as you said in comment #111.

Next step it to slice up the text in nsHTMLEditRules::GetNodesForOperation().
Attachment #8656063 - Attachment is obsolete: true
Attachment #8656972 - Attachment is obsolete: true
Test case, | marks the cursor position from where we (forward) delete:
  <div>test</div>|<pre>foo
  bar
  baz</pre>

With the attached patch we get this result:
  <div>testfoo
  </div><pre>bar
  baz</pre>

As per comment #111, the offset into the <pre>'s text note is 4, so we cut after the first newline (which is therefore moved into the <div>).

Can you please give me some feedback on the patch.
The try server is currently closed, so I can't see whether it does any damage.

Which result would you prefer:

Option 1: Offset 4, moving the newline with the "foo" (as above)
  <div>testfoo
  </div><pre>bar
  baz</pre>

Option 2: Offset 3, not moving the newline with the "foo", but leaving it at the front of "bar"
  <div>testfoo</div><pre>
  bar
  baz</pre>

Option 3: Offset 3 but also removing the newline (some more processing required).
  <div>testfoo</div><pre>bar
  baz</pre>
Note that Chrome produces the third option.

In comparison, if we start with
  <div>test</div>|<pre>foo<br>bar
  baz</pre>
we get:
  <div>testfoo<br></div><pre>bar
  baz</pre>
so this is equivalent to option 1) (the break being moved into the <div>).

Once I have your answer, I'll adapt the 60 test cases I have already prepared (attachment 8644399 [details] [diff] [review]) to match the selected behaviour/option. (At the moment the existing test cases expect the third option, since they were written for the "hacky" solution, assuming the newline would be replaced by a <br> which was subsequently removed.)
Attachment #8658392 - Attachment is obsolete: true
Flags: needinfo?(roc)
Attachment #8658857 - Flags: feedback?(roc)
I actually prefer option 1, because it seems the simplest.
Flags: needinfo?(roc)
Comment on attachment 8658857 [details] [diff] [review]
WIP: Solution tested on simple case, publishing for feedback.

Review of attachment 8658857 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good!

::: editor/libeditor/nsHTMLEditRules.cpp
@@ +5630,5 @@
> +        if (newlinePos >= 0) {
> +          offset = newlinePos + 1;
> +          // printf ("***** preformatted text, offset=%d\n", offset);
> +          *outNode = nextNode->AsDOMNode();
> +          *outOffset = offset;

Set outOffset directly without updating 'offset', because 'offset' means something else (an offset within nextNode's parent node, not an offset within nextNode itself).

@@ +5631,5 @@
> +          offset = newlinePos + 1;
> +          // printf ("***** preformatted text, offset=%d\n", offset);
> +          *outNode = nextNode->AsDOMNode();
> +          *outOffset = offset;
> +          return;

Seems to me that if the \n is the last character of the text node, we should do a break; here so that the "look up the tree" behaviour executes.
Attachment #8658857 - Flags: feedback?(roc) → feedback+
Finally the proposed solution. We had "hacky" DOM-based (Mark 1) and frame-based (Mark 2) pre-processing, so here now the "real" fix. The advantage of this (simple) fix is that more test cases give a decent result than ever before.

There are 71 test cases, each one is run three times: forward delete, backspace and delete with a non-collapsed range. So there are 213 tests in total.

Let's see what this breaks ;-)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c50c6103c6d4
Attachment #8658857 - Attachment is obsolete: true
Damn, that didn't compile on Linux, here we go again:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5306c35760f
I will submit the changed patch later, I only added a cast.
Assignee: ehsan → mozilla
Fixed the compile error and added a comment.

Robert, sorry about those 71 tests. Aryeh wanted some and Ehsan wanted some more, so that's why there are so many. They are a pain to look through, but even a bigger pain to write ;-)
Attachment #8659194 - Attachment is obsolete: true
Attachment #8659289 - Flags: review?(roc)
Comment on attachment 8659289 [details] [diff] [review]
Proposed solution (Mark 3, v1a) with 71 tests

Review of attachment 8659289 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent!!!
Attachment #8659289 - Flags: review?(roc) → review+
Keywords: checkin-needed
Blocks: 1203847
https://hg.mozilla.org/mozilla-central/rev/af909b481b95
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Depends on: 1206483
Backed out of mozilla-central in https://hg.mozilla.org/mozilla-central/rev/d49db60f3894
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla43 → ---
This is like the previous patch, but with the splitting of the text node removed from GetNodesForOperation and moved into the caller.

I repeat the comment from bug 1206483 comment #18:

To fix bug 772796 you suggested to change nsHTMLEditRules::GetPromotedPoint so it returned an offset into the text (bug 772796 comment #111) without changing the DOM.

I then put the slicing up, that is the DOM change, into nsHTMLEditRules::GetNodesForOperation().

Let's recall what GetNodesFromPoint, the work-horse of MoveBlock does:
  PromoteRange() (which calls GetPromotedPoint) followed by
  GetNodesForOperation (where we put the slicing).

The "offending slicing code" was in GetNodesForOperation (as you had suggested in bug 772796 comment #104). This function has other callers and under certain circumstances text slicing is undesired.

So all I did is the obvious: Moved the offending code into the caller. GetNodesFromPoint now does:
- PromoteRange() (which calls GetPromotedPoint)
- "text slicing"
- GetNodesForOperation (which no longer does the slicing).

That's the right fix, isn't it? Can you please approve.
Attachment #8659289 - Attachment is obsolete: true
Flags: needinfo?(roc)
Oops, another int/uint problem that Windows didn't pick up:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4545a180602
Attachment #8663620 - Attachment is obsolete: true
Flags: needinfo?(roc)
Attachment #8663666 - Flags: review?(roc)
This is (almost) all green now.
Robert, if you do an interdiff of version "Mark 3, v2b" (new proposal) with "Mark 3, v1a" (patch that was backed out), you can see that I just shifted the code to another function:
https://bugzilla.mozilla.org/attachment.cgi?oldid=8659289&action=interdiff&newid=8663666&headers=1

Copying the try run link again:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4545a180602
(In reply to Jorg K (GMT+2) from comment #124)
> The "offending slicing code" was in GetNodesForOperation (as you had
> suggested in bug 772796 comment #104). This function has other callers and
> under certain circumstances text slicing is undesired.

What are those other callers expecting? Why don't they want text slicing? What results are they expecting instead of text slicing when an offset inside a text nodes is passed, and do those results make sense or do the callers themselves have nonsensical expectations?
All good questions, thank you for your thoroughness. As I said in bug 1206483 comment #17 (quote):

====
I ran the attached example (attachment 8663398 [details]) in both non-e10s and e10s mode.
In non-e10s mode, nsHTMLEditRules::GetNodesForOperation() does not get called
(when double-clicking).
In e10s mode, it gets called all the time, the text at the end range is:
|Double-click Here| (length 17), the offset in the range is 17.
I don't understand why this code is called in e10s mode but not in non-e10s mode.
====

I suppose that the caller expects that nothing happens. Note that I had a patch (which I didn't submit), where I did no slicing when text.Length() == offset, but I thought it's safer to move the slicing elsewhere. 

I really have no idea why this is called in e10s mode. There is a stack dump in bug 1206483 comment #13, so perhaps you can make sense of this. Do you want me to investigate further?
Flags: needinfo?(roc)
(In reply to Jorg K (GMT+2) from comment #129)
> I suppose that the caller expects that nothing happens. Note that I had a
> patch (which I didn't submit), where I did no slicing when text.Length() ==
> offset, but I thought it's safer to move the slicing elsewhere.

No, I think that's the right idea. If the offset is text.Length() or 0, we don't have to split nodes and we should avoid doing so.

Also, I just realized we should avoid modifying the content in any way if aTouchContent == TouchContent::no, just like the rest of GetNodesForOperation does.
Flags: needinfo?(roc)
> No, I think that's the right idea. If the offset is text.Length() or 0,
> we don't have to split nodes and we should avoid doing so.
Done. I've added two test cases 71 and 72 where splitting takes place at the front, at offset 1.

> Also, I just realized we should avoid modifying the content in any way
> if aTouchContent == TouchContent::no, just like the rest of 
> GetNodesForOperation does.
Nice idea, but doesn't fly, since "no" is passed in when we do want to split.

Overall, I think this approach is more risky than just shifting the code to a safe place ;-)
Attachment #8664041 - Flags: review?(roc)
Handy interdiff between current proposal and landed/backed out patch:
https://bugzilla.mozilla.org/attachment.cgi?oldid=8659289&action=interdiff&newid=8664041&headers=1
Comment on attachment 8664041 [details] [diff] [review]
Proposed solution (Mark 3, v3) with 73 tests

Review of attachment 8664041 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Jorg K (GMT+2) from comment #131)
> > Also, I just realized we should avoid modifying the content in any way
> > if aTouchContent == TouchContent::no, just like the rest of 
> > GetNodesForOperation does.
> Nice idea, but doesn't fly, since "no" is passed in when we do want to split.
> 
> Overall, I think this approach is more risky than just shifting the code to
> a safe place ;-)

The aTouchContent parameter to nsHTMLEditRules::GetNodesFromPoint and nsHTMLEditRules::GetNodesForOperation is pretty clearly supposed to mean that when "no" is passed, we should not modify the content. (This should be clearly stated in comments near those functions, but it isn't. We should fix that.) Therefore, we should definitely not modify the content in any way when "no" is passed.

Therefore, when MoveBlock calls GetNodesFromPoint, if we want it to split text nodes, we need to pass "yes" from MoveBlock. Is that clear?

Writing good code requires that functions have clearly defined contracts, that functions implement those contracts correctly, and that callers respect those contracts. That is at least as important as whether the code passes existing tests, or even whether it passes a lot of new tests. I understand that the existing code in these files is not very good, but we must make it better, not worse.
Attachment #8664041 - Flags: review?(roc) → review-
Now paying attention to 'aTouchContent'.

I bunch of tests have changed their outcome for the better, the bold property is now shifted across. All the tests on <span class="pre"> changed (unless covered by JoinNodesSmart, 07, 09, 11). Tests 44 and 45 are particularly bad, since they now create an empty span.
Cases 44 and 45 aren't even a case where a pre-formatted element needs slicing. It's a case that uses <br> which now got worse.

I modified the tests to reflect the improved/worse results.

This is most likely due to the code following:
// Bust up any inlines that cross our range endpoints, but only if we are
// allowed to touch content.
which is also controlled by 'aTouchContent'.

I'd have to run a try run to see what that affects. I predict quite some failures, since, as we can see in tests 44 and 45, behaviour for things we don't even want to touch also changes.

Try here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=db42005835f6

I appreciate what you said about clarity of code and contracts. On the practical side, as we've seen, having the code in GetNodesForOperation() which is run from other places, creates quite a challenge, and so far it looks like that this cannot be controlled by 'aTouchContent'. Well see what the try runs says.

Please let me know your opinion.
Attachment #8664041 - Attachment description: Proposed solution (Mark 3, v3) with 71 tests → Proposed solution (Mark 3, v3) with 73 tests
That didn't compile (int/uint issue on Linux), here's a new one:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d1bc224b09d
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2d1bc224b09d is all green.
So none of the Mochitests failed. I was wrong, when I predicted test failures.

So let's run all tests:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b823418309db
(note: submitted patch doesn't show since it's still on the server)
Attachment #8664041 - Attachment is obsolete: true
Attachment #8663666 - Attachment is obsolete: true
Uploaded again with compile error fixed.
Try completely (!) green:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b823418309db
(note: submitted patch doesn't show since it's still on the server)

So I'll dare ask for review again.

Note the result of tests 44 and 45 produce empty spans, not brilliant, but IMHO acceptable.
Attachment #8664076 - Attachment is obsolete: true
Attachment #8664172 - Flags: review?(roc)
Comment on attachment 8664172 [details] [diff] [review]
Proposed solution (Mark 3, v4b) with 73 tests

Review of attachment 8664172 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is fine. The empty inlines in tests 44 and 45 are just an existing issue that could be triggered in other ways.

If you feel up to it, I'd like to see a separate bug to fix those empty inlines. It should be a relatively simple fix to BustUpInlinesAtRangeEndpoints and its callees.
Attachment #8664172 - Flags: review?(roc) → review+
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (offline Oct 1-4) from comment #138)
> I think this is fine. The empty inlines in tests 44 and 45 are just an
> existing issue that could be triggered in other ways.
> If you feel up to it, I'd like to see a separate bug to fix those empty
> inlines. It should be a relatively simple fix to
> BustUpInlinesAtRangeEndpoints and its callees.

OK.
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/0ff44857e33f
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) (offline Oct 1-4) from comment #138)
> If you feel up to it, I'd like to see a separate bug to fix those empty
> inlines. It should be a relatively simple fix to
> BustUpInlinesAtRangeEndpoints and its callees.
Raised bug 1209409 for the empty spans that are created.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: