Closed
Bug 1012530
Opened 11 years ago
Closed 11 years ago
Reorder child nodes when swapping document state
Categories
(Firefox :: Translations, defect)
Firefox
Translations
Tracking
()
VERIFIED
FIXED
Firefox 32
People
(Reporter: Felipe, Assigned: Felipe)
References
Details
(Whiteboard: p=8 s=33.1 [qa!])
Attachments
(2 files, 2 obsolete files)
|
5.91 KB,
patch
|
florian
:
review+
|
Details | Diff | Splinter Review |
|
12.76 KB,
patch
|
florian
:
review+
|
Details | Diff | Splinter Review |
The translation procedure is designed to be able to reorder child nodes when swapping the document from Original -> Translation or vice-versa, but that's not fully implemented yet. (swapping of text nodes is already present but not of child elements)
An example, ignoring whitespace:
English: <div id="n1"><em id="n2">I</em> miss <strong id="n3">you</strong>
French: <div id="n1"><strong id="n3">Tu</strong> <em id="n2">me</em> manques</div>
TranslationItem n1 = {
original: [item n2, "miss", item n3]
translation: [item n3, item n2, "manques"]
}
We can see that the <strong> and <em> elements are ordered differently in the original and translated versions.
Flags: firefox-backlog+
Updated•11 years ago
|
Whiteboard: p=0 [qa?]
Updated•11 years ago
|
Assignee: nobody → felipc
Status: NEW → ASSIGNED
Whiteboard: p=0 [qa?] → p=8 s=it-32c-31a-30b.3 [qa+]
| Assignee | ||
Comment 1•11 years ago
|
||
So this is working the way I wanted, but still needs a lot of clean-up and comments. Various tricky edge-cases covered. I'll add detailed info directly as comments in the code while I clean it up. Just wanted to post it now for a checkpoint with a patch which is working well.
| Assignee | ||
Comment 2•11 years ago
|
||
Previously we were only sending the <br> nodes in order to keep text node splitting correct in a translation request. For example:
---> Hello <hr/><img src="aaa.jpg"/> world
Which was converted to:
---> Hello <br><br> world
But we were not storing this in the .original array. However, for correct node reordering, we must do so.
Consider the following two different webpages (these are not one source and one translation.. these, are two difference source pages):
---> AAAA <hr/><h1>Hello world</h1> BBB
---> AAAA <h1>Hello world</h1><hr/> BBB
If we only stored the following info in the .original array:
translationItem.original = ["AAAA", ptr to h1, "BBB"]
There is no way that we can know what is the correct node ordering when transforming them.
The patch is simple. It made .generateTextForItem() a bit more complicated, but in exchange .regenerateTextFromOriginalHelper() gets simpler. I also sneaked one optimization in that we don't need to send consecutive placeholder markers, one is enough to represent "a block of non-useful nodes".
Attachment #8433854 -
Attachment is obsolete: true
Attachment #8434676 -
Flags: review?(florian)
| Assignee | ||
Updated•11 years ago
|
Attachment #8434676 -
Attachment description: Keep information about placeholder nodes → Part 1 - Keep information about placeholder nodes
| Assignee | ||
Comment 3•11 years ago
|
||
So the swapTextForItem function continues with the same basic structure and idea, but it now supports the reordering of the nodes.
It works by walking a pointer in the DOM childNodes list together while walking through the item.translation array (or original). If it any moment the expected node is not under the pointer position, it is moved to it.
I don't know if the long explanation added in the beginning of the function helps or makes it more confusing :) Maybe you'll find it easier to understand going directly to the code. In any case, to make it easier to understand, I suggest:
- do not look at the interdiff. Look at the new function as a whole (here's a pastebin with it: https://pastebin.mozilla.org/5348992)
- on first pass, totally skip the comment + while-loop that begins with "... actually". That is an optimization pass that doesn't interfere with the way the algorithm works
Attachment #8434683 -
Flags: review?(florian)
| Assignee | ||
Comment 4•11 years ago
|
||
Some of the properties of this algorithm that I wanted to document:
- it tries hard to avoid having to perform node reordering when it's not necessary, so hopefully the work performed by it shouldn't be much larger in the common case
- now it properly handles adjacent text nodes as we discussed before
- with the way it finds the textnodes now, we can expand the concept of simpleroots to more nodes: it can now include items with placeholder children, while before it only allowed items that had a direct, single textNode as firstChild. This will bring some more performance wins, but I haven't included it here to not mix things up and make them more complicated.. it should be part of another bug. I'll only work on that after the blocker bugs for the trial are finished.
Comment 5•11 years ago
|
||
Comment on attachment 8434683 [details] [diff] [review]
Part 2 - Reorder nodes when swapping content
Review of attachment 8434683 [details] [diff] [review]:
-----------------------------------------------------------------
This looks pretty good, the r- is mostly because I would like to have another look after some of the changes that I hope will improve readability, to see if I can detect issues that I wasn't able to spot in the current code.
::: browser/components/translation/TranslationDocument.jsm
@@ +434,5 @@
> + * The function iterates through the target array (either the `original` or
> + * `translation` array from the TranslationItem), while also keeping a pointer
> + * to a current position in the child nodes from the actual DOM node that we
> + * are modifying. This pointer is moved forward after each item of the array
> + * is translated. If, it any given time, the pointer doesn't match the expected
at any
@@ +453,5 @@
> + * pointer does not match the expected element, <b>. So let's move <b> to the
> + * pointer position.
> + *
> + * Current state of the DOM:
> + * <div><b>You</b>I <em>miss</em></div>
Isn't there a text node with one space between </em> and </div>?
@@ +516,5 @@
> + // the array and be pointing to the correct node that needs to modified.
> + // If it's not pointing to it, that means some sort of node reordering
> + // will be necessary to produce the correct translation.
> + // Note that text nodes don't need to be reordered, as we can just replace
> + // the content of one text node with another.
I wonder if not reordering the text nodes would make difficult a feature showing the original text when hovering the translation. I guess that can be figured out later if/when we decided to do such a thing :-).
@@ +519,5 @@
> + // Note that text nodes don't need to be reordered, as we can just replace
> + // the content of one text node with another.
> + //
> + // curPos starts in the firstChild...
> + let curPos = domNode.firstChild;
What about naming this curNode rather than curPos? When I read curPos I expect a numeric index.
@@ +562,4 @@
> // Now let's walk through all items in the `target` array of the
> // TranslationItem. This means either the TranslationItem.original or
> // TranslationItem.translation array.
> for (let child of curItem[target]) {
child is an ambiguous name here. A name containing the word "target" may clarify a bit.
@@ +571,5 @@
> // Adding this child to the stack.
> visitStack.push(child);
> +
> + // If the node has not gone away,
> + if (child.nodeRef &&
I don't see how a node going away would cause .nodeRef to become null. Did you mean to null check child.nodeRef.parentNode?
@@ +579,5 @@
> + // ..unless the page has reparented this node under a totally
> + // different node. In this case, all bets are off on being able
> + // to do anything correctly, so it's better not to bring back
> + // the node to this parent.
> + child.nodeRef.parentNode == domNode) {
Is == enough, or do we need === here?
@@ +585,5 @@
> + // If curPos is null, it means we need to reorder this node to the
> + // end of the child nodes. There's no need to do any special handling
> + // here, because insertBefore(.., null) already does the right thing
> + // that we are expecting (i.e. moving it to the end like appendChild).
> + child.nodeRef.parentNode.insertBefore(child.nodeRef, curPos);
child.nodeRef.parentNode can be shortened to domNode
I'm also wondering if using a temporary variable for child.nodeRef would help readability (it's used 5 times).
@@ +597,5 @@
> +
> + } else if (child instanceof TranslationItem_NodePlaceholder) {
> + // If the current item is a placeholder node, we need to move
> + // our pointer "past" it, until we find a suitable position
> + // for the next item that we will translate.
I'm not sure I understand this part. Shouldn't we move forward until we find something that would have been transformed to a placeholder... and remove all the text nodes we encounter while going forward?
@@ +602,5 @@
> +
> + while (curPos &&
> + (curPos.nodeType != TEXT_NODE ||
> + curPos.nodeValue.trim() == "")
> + ) {
nit: ) { on the previous line
@@ +619,5 @@
> + // let's create a new one.
> + if (!curPos) {
> + // We don't know if the original content had a space or not,
> + // so the best bet is to create the text node with " " which
> + // will added a space at the beginning and at the end of the
I can't parse "which will added a space".
@@ +621,5 @@
> + // We don't know if the original content had a space or not,
> + // so the best bet is to create the text node with " " which
> + // will added a space at the beginning and at the end of the
> + // content.
> + domNode.appendChild(domNode.ownerDocument.createTextNode(" "));
appendChild returns the added node, so you can do curPost = domNode.appendChild(...)
@@ +647,5 @@
> }
> }
>
> +function clearRemainingNonEmptyTextNodesFromElement(startSibling) {
> +let item = startSibling;
fix the indent.
Attachment #8434683 -
Flags: review?(florian)
Attachment #8434683 -
Flags: review-
Attachment #8434683 -
Flags: feedback+
Comment 6•11 years ago
|
||
Comment on attachment 8434676 [details] [diff] [review]
Part 1 - Keep information about placeholder nodes
Review of attachment 8434676 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/components/translation/TranslationDocument.jsm
@@ +348,5 @@
> +TranslationItem_NodePlaceholder.prototype = {
> + toString: function() {
> + return "[object TranslationItem_NodePlaceholder]";
> + }
> +}
Do we really need to create a new instances of this constant object each time we insert a placeholder?
The idea I have is:
const TranslationItem_NodePlaceholder = {
toString: function() "[object TranslationItem_NodePlaceholder]"
};
and then use === instead of instanceof.
Not sure this is really an improvement, but just thought I would mention it.
Attachment #8434676 -
Flags: review?(florian) → review+
| Assignee | ||
Comment 7•11 years ago
|
||
(In reply to Florian Quèze [:florian] [:flo] from comment #6)
> The idea I have is:
> const TranslationItem_NodePlaceholder = {
> toString: function() "[object TranslationItem_NodePlaceholder]"
> };
>
> and then use === instead of instanceof.
>
> Not sure this is really an improvement, but just thought I would mention it.
Definitely a good suggestion, I've made this change.
| Assignee | ||
Comment 8•11 years ago
|
||
(In reply to Florian Quèze [:florian] [:flo] from comment #5)
> Comment on attachment 8434683 [details] [diff] [review]
> Part 2 - Reorder nodes when swapping content
> > + * is translated. If, it any given time, the pointer doesn't match the expected
>
> at any
ok
>
> @@ +453,5 @@
> > + * pointer does not match the expected element, <b>. So let's move <b> to the
> > + * pointer position.
> > + *
> > + * Current state of the DOM:
> > + * <div><b>You</b>I <em>miss</em></div>
>
> Isn't there a text node with one space between </em> and </div>?
There is! Thanks. I fixed the example and added Step 4 explaining what happens to that space.
> @@ +516,5 @@
> > + // the array and be pointing to the correct node that needs to modified.
> > + // If it's not pointing to it, that means some sort of node reordering
> > + // will be necessary to produce the correct translation.
> > + // Note that text nodes don't need to be reordered, as we can just replace
> > + // the content of one text node with another.
>
> I wonder if not reordering the text nodes would make difficult a feature
> showing the original text when hovering the translation. I guess that can be
> figured out later if/when we decided to do such a thing :-).
For this case of non-simpleRoots (that requires reordering), displaying the content of a single text node would be strange anyway, because they are part of a larger phrase.. So for such feature we would need to display/highlight the entire root at once.
> @@ +562,4 @@
> > // Now let's walk through all items in the `target` array of the
> > // TranslationItem. This means either the TranslationItem.original or
> > // TranslationItem.translation array.
> > for (let child of curItem[target]) {
>
> child is an ambiguous name here. A name containing the word "target" may
> clarify a bit.
Yeah totally, I changed it to targetItem. And also aliased targetItem.nodeRef to targetNode, making it much simpler to read this block.
>
> @@ +519,5 @@
> > + // Note that text nodes don't need to be reordered, as we can just replace
> > + // the content of one text node with another.
> > + //
> > + // curPos starts in the firstChild...
> > + let curPos = domNode.firstChild;
>
> What about naming this curNode rather than curPos? When I read curPos I
> expect a numeric index.
I can use either.. I tend to think of this variable as a sort of numeric index, rather than just a reference to a node, because its purpose is not to reference to a node but to point to a position in the childNodes list. I thought that curPos conveys that concept better than curNode.
But I changed it to curNode in this version of the patch so you can look at it and see what looks best. I can keep it or revert when landing.
>
> @@ +571,5 @@
> > // Adding this child to the stack.
> > visitStack.push(child);
> > +
> > + // If the node has not gone away,
> > + if (child.nodeRef &&
>
> I don't see how a node going away would cause .nodeRef to become null. Did
> you mean to null check child.nodeRef.parentNode?
Yeah I did, I wanted to check if the node has been removed from the document. targetNode.parentNode is the right check.
Now, do you think it's necessary to check targetNode for null too? I don't see a way for it to become null, but maybe the peace of mind is worth the check..? I don't want the loop to break because of it
>
> @@ +579,5 @@
> > + // ..unless the page has reparented this node under a totally
> > + // different node. In this case, all bets are off on being able
> > + // to do anything correctly, so it's better not to bring back
> > + // the node to this parent.
> > + child.nodeRef.parentNode == domNode) {
>
> Is == enough, or do we need === here?
== is enough, there's no chance of type coercion getting involved here.
Now, for the TranslationItem_NodePlaceholder comparison, === is necessary: because of the toString function of the object, == would match a string targetItem that is "[object TranslationItem_NodePlaceholder]".
>
> @@ +597,5 @@
> > +
> > + } else if (child instanceof TranslationItem_NodePlaceholder) {
> > + // If the current item is a placeholder node, we need to move
> > + // our pointer "past" it, until we find a suitable position
> > + // for the next item that we will translate.
> > +
> > + while (curPos &&
> > + (curPos.nodeType != TEXT_NODE ||
> > + curPos.nodeValue.trim() == "")
> > + ) {
>
> I'm not sure I understand this part. Shouldn't we move forward until we find
> something that would have been transformed to a placeholder... and remove
> all the text nodes we encounter while going forward?
Yeah, this is a bit complicated. I've been trying to wrap my head around it as well.. There's some simplification going on here: I had more checks inside it but I couldn't find a situation where these ones are not enough.
The trick is that during the middle of the process, the element nodes do not need to be in their correct position (except the ones already put in position).. Even if we skip an element that is not a placeholder in this loop, it will be pulled to its correct position later when we get to its targetItem later in the process. So the only thing we need to do here is jump from one side of a block of "element nodes + empty text nodes" to the other.
I expanded the comment in the patch. If we find testcases where this turn out to be wrong, we can rethink how this loop needs to be implemented to take that new testcase into account.
> @@ +619,5 @@
> > + // let's create a new one.
> > + if (!curPos) {
> > + // We don't know if the original content had a space or not,
> > + // so the best bet is to create the text node with " " which
> > + // will added a space at the beginning and at the end of the
>
> I can't parse "which will added a space".
fixed: which will add a space
Another change that I added here is that I removed the quotes around id names from the generateHTMLForTranslation function. I verified that Bing's service work correctly without them, and each " would get escaped to ", so this saves some significant amount of data sent.
Attachment #8434683 -
Attachment is obsolete: true
Attachment #8435527 -
Flags: review?(florian)
Comment 9•11 years ago
|
||
Comment on attachment 8435527 [details] [diff] [review]
Part 2 - Reorder nodes when swapping content
Review of attachment 8435527 [details] [diff] [review]:
-----------------------------------------------------------------
I can't promise I haven't missed any edge case while reviewing... but this seems like it will be a huge improvement compared to the current behavior! :-)
::: browser/components/translation/TranslationDocument.jsm
@@ +575,5 @@
> +
> + let targetNode = targetItem.nodeRef;
> +
> + // If the node has not been removed from the document,
> + if (targetNode.parentNode &&
Do we actually need this test? (It's already covered by |targetNode.parentNode == domNode|, right?)
@@ +586,5 @@
> + // the node to this parent.
> + targetNode.parentNode == domNode) {
> +
> + // If curNode is null, it means we need to reorder this node to the
> + // end of the child nodes. There's no need to do any special handling
The wording of this comment is slightly confusing (after reading the beginning of this comment I expect the code after the comment to start with if (!curNode))
What about phrasing it like:
We don't need to null-check curNode because insertBefore(..., null) does what we need: reorder this node to the end of child nodes.
@@ +595,5 @@
> + }
> +
> + // Move pointer forward
> + if (curNode) {
> + curNode = getNextSiblingSkippingEmptyTextNodes(curNode);
We discussed it less than an hour ago, but I'm already not really sure any more of why we are skipping empty text nodes here, so I think this really deserves a comment.
@@ +603,5 @@
> + // If the current item is a placeholder node, we need to move
> + // our pointer "past" it, jumping from one side of a block of
> + // elements + empty text nodes to the other side. Even if
> + // non-placeholder elements exists inside the jumped block,
> + // they will be pulled correct later in the process when the
correctly
@@ +605,5 @@
> + // elements + empty text nodes to the other side. Even if
> + // non-placeholder elements exists inside the jumped block,
> + // they will be pulled correct later in the process when the
> + // targetItem for those nodes are handled.
> +
Shouldn't we clear all text nodes we encounter here until we find a placeholder?
Testcase:
original: foo<img>
translation: <img>bar
I think with the current patch we are showing: bar<img>
Attachment #8435527 -
Flags: review?(florian) → review+
| Assignee | ||
Comment 10•11 years ago
|
||
Comment on attachment 8435527 [details] [diff] [review]
Part 2 - Reorder nodes when swapping content
Review of attachment 8435527 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/components/translation/TranslationDocument.jsm
@@ +575,5 @@
> +
> + let targetNode = targetItem.nodeRef;
> +
> + // If the node has not been removed from the document,
> + if (targetNode.parentNode &&
right
@@ +586,5 @@
> + // the node to this parent.
> + targetNode.parentNode == domNode) {
> +
> + // If curNode is null, it means we need to reorder this node to the
> + // end of the child nodes. There's no need to do any special handling
done
@@ +595,5 @@
> + }
> +
> + // Move pointer forward
> + if (curNode) {
> + curNode = getNextSiblingSkippingEmptyTextNodes(curNode);
done
@@ +603,5 @@
> + // If the current item is a placeholder node, we need to move
> + // our pointer "past" it, jumping from one side of a block of
> + // elements + empty text nodes to the other side. Even if
> + // non-placeholder elements exists inside the jumped block,
> + // they will be pulled correct later in the process when the
done
@@ +605,5 @@
> + // elements + empty text nodes to the other side. Even if
> + // non-placeholder elements exists inside the jumped block,
> + // they will be pulled correct later in the process when the
> + // targetItem for those nodes are handled.
> +
Turns out this will have some further complications so I'll split it off to a different bug. I haven't seen examples of this causing problems in the wild (because it's rare for the placeholders to be significantly reordered during translation), only in artificial testcases.
| Assignee | ||
Comment 11•11 years ago
|
||
Comment 12•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f0f129affe2f
https://hg.mozilla.org/mozilla-central/rev/09f8e14a2349
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 32
Comment 13•11 years ago
|
||
Hi Florin, can a QA Contact be assigned for verification of this bug.
Flags: needinfo?(florin.mezei)
| Assignee | ||
Comment 14•11 years ago
|
||
Tests split off to bug 1022729
Updated•11 years ago
|
Whiteboard: p=8 s=it-32c-31a-30b.3 [qa+] → p=8 s=33.1 [qa+]
| Assignee | ||
Comment 15•11 years ago
|
||
This build can be used to verify this bug: https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/felipc@gmail.com-e078627d18c6
Two websites that can be used to verify it:
- http://www.pseudotecnico.org/blog/2014/05/10/il-fotografo-della-domenica-130/
Go to the bottom of the page, and pay attention to the footer text. Make sure that the link "licenza"/"license" stays inside the parentheses
- http://www.mobile.de
Go to near the bottom of the page, where there are two lines that says
"Das kann jeder erkennen
> Jetzt lesen"
Translate page, switch between Original/Translation a few times, and make sure that the layout of those lines were preserved (without this patch the two lines would be merged together, with the patch the <br> between them are kept correctly)
Updated•11 years ago
|
Flags: needinfo?(florin.mezei)
QA Contact: bogdan.maris
(In reply to :Felipe Gomes from comment #16)
> Maybe Bogdan could verify it?
Sure thing.
(In reply to :Felipe Gomes from comment #15)
> This build can be used to verify this bug:
> https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/felipc@gmail.
> com-e078627d18c6
>
> Two websites that can be used to verify it:
> -
> http://www.pseudotecnico.org/blog/2014/05/10/il-fotografo-della-domenica-130/
> Go to the bottom of the page, and pay attention to the footer text. Make
> sure that the link "licenza"/"license" stays inside the parentheses
>
> - http://www.mobile.de
> Go to near the bottom of the page, where there are two lines that says
> "Das kann jeder erkennen
> > Jetzt lesen"
> Translate page, switch between Original/Translation a few times, and make
> sure that the layout of those lines were preserved (without this patch the
> two lines would be merged together, with the patch the <br> between them are
> kept correctly)
Verified on both examples that the issue is fixed using the trybuild on Windows 8.1 64bit, Windows 7 64bit, Mac OS X 10.9.2, Ubuntu 14.04 32bit and Ubuntu 13.04 64bit.
Status: RESOLVED → VERIFIED
Flags: needinfo?(bogdan.maris)
Whiteboard: p=8 s=33.1 [qa+] → p=8 s=33.1 [qa!]
You need to log in
before you can comment on or make changes to this bug.
Description
•