Closed Bug 976556 Opened 6 years ago Closed 6 years ago

Parse translation results back into our data structure

Categories

(Firefox :: Translation, defect)

defect
Not set

Tracking

()

VERIFIED FIXED
Firefox 32

People

(Reporter: Felipe, Assigned: Felipe)

References

Details

(Whiteboard: [translation] p=5 s=it-32c-31a-30b.2 [qa-])

Attachments

(1 file, 1 obsolete file)

When a translation provider finishes a translation, the result received is a serialized HTML string with the translated content. We need to parse that string back into our TranslationDocument structure so that we have the data where we need to perform bug 976554.

This is being done in bug 971043 but it's good to have a separate bug for it.
Whiteboard: [translation] p=0
Whiteboard: [translation] p=0 → [translation] p=0 [diamond]
Felipe: This bug stayed unassigned after triage. Would you be willing to mentor a contributor through it?
Flags: needinfo?(felipc)
Not a good candidate for mentoring as I'm about to start working on this bug very soon, as a sequence from bug 971054
Flags: needinfo?(felipc)
Whiteboard: [translation] p=0 [diamond] → [translation] p=0
Whiteboard: [translation] p=0 → [translation] p=5
Assignee: nobody → felipc
Status: NEW → ASSIGNED
Whiteboard: [translation] p=5 → [translation] p=5 s=it-32c-31a-30b.1 [qa?]
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Whiteboard: [translation] p=5 s=it-32c-31a-30b.1 [qa?] → [translation] p=5 s=it-32c-31a-30b.1 [qa-]
Mass move of translation bugs to the new Translation component.
Component: General → Translation
Version: Trunk → unspecified
Attached patch Parse results (obsolete) — Splinter Review
This is an example response that we need to parse:
https://pastebin.mozilla.org/5136617

Note that there are two document parsing steps: we must parse the XML structure, and also the HTML which is text data on the TranslatedText XML nodes (for non-simple roots)

Hopefully the comments on the patch help explain the process. But it's simple. The more complicated part is the resulting structure in TranslationItem, so I expanded the comment a lot there explaining how it looks like at the end of the process.
Attachment #8420807 - Flags: review?(florian)
Whiteboard: [translation] p=5 s=it-32c-31a-30b.1 [qa-] → [translation] p=5 s=it-32c-31a-30b.2 [qa-]
Comment on attachment 8420807 [details] [diff] [review]
Parse results

Review of attachment 8420807 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/translation/BingTranslator.jsm
@@ +123,5 @@
>        maybeFinishedTranslation();
>        return;
>      }
>  
> +    if (this._parseChunkResult(request)) {

This change is already in one of the newest patches in bug 971054.

@@ +142,5 @@
> +   * @returns boolean      True if parsing of this chunk was successful.
> +   */
> +  _parseChunkResult: function(request) {
> +    let domParser = Cc["@mozilla.org/xmlextras/domparser;1"]
> +                      .createInstance(Ci.nsIDOMParser);

Do we know if creating a DOMParser instance is expensive? Each _parseChunkResult and each parseResult call creates one.

@@ +154,5 @@
> +    }
> +
> +    let len = results.length;
> +    if (len == 0 ||
> +        len != request.translationData.length) {

If we are sure we never send empty requests, the |len == 0| check can be skipped.

::: browser/components/translation/TranslationDocument.jsm
@@ +189,5 @@
> + *
> + *  English: <div id="n1">Welcome to <b id="n2">Mozilla's</b> website</div>
> + *  Portuguese: <div id="n1">Bem vindo a pagina <b id="n2">da Mozilla</b></div>
> + *
> + *  TranslationIem n1 = {

TranslationI_t_em

@@ +192,5 @@
> + *
> + *  TranslationIem n1 = {
> + *    id: 1,
> + *    original: ["Welcome to ", ptr to n2, " website"]
> + *    translation: [Bem vindo a pagina ", ptr to n2]

Missing " before 'Bem'.

@@ +269,5 @@
> + * walk down it repeating the process.
> + *
> + * For text nodes we simply add it as a string.
> + */
> +function parseResultNode: function(item, node) {

drop ": function"

Why isn't this a method of the TranslationItem object?

@@ +275,5 @@
> +  for (let child of node.childNodes) {
> +    if (child.nodeType == TEXT_NODE) {
> +      item.translation.push(child.nodeValue);
> +    } else {
> +      let foundChild = null;

Would it make this code more readable to add a getChildById method on TranslationItem objects?

@@ +283,5 @@
> +          break;
> +        }
> +      }
> +
> +      if (foundChild) {

Do we expect to sometimes not find the child? Should we make noise if it happens?
Attachment #8420807 - Flags: review?(florian)
Attachment #8420807 - Flags: review-
Attachment #8420807 - Flags: feedback+
Comment on attachment 8420807 [details] [diff] [review]
Parse results

Review of attachment 8420807 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/translation/BingTranslator.jsm
@@ +142,5 @@
> +   * @returns boolean      True if parsing of this chunk was successful.
> +   */
> +  _parseChunkResult: function(request) {
> +    let domParser = Cc["@mozilla.org/xmlextras/domparser;1"]
> +                      .createInstance(Ci.nsIDOMParser);

On IRC we determined it's likely not expensive, so I'd prefer to not leak the impl detail in the API by having to pass around a DOMParser as argument, or keeping one in the object and having to take care of its lifetime.

@@ +154,5 @@
> +    }
> +
> +    let len = results.length;
> +    if (len == 0 ||
> +        len != request.translationData.length) {

removed

::: browser/components/translation/TranslationDocument.jsm
@@ +189,5 @@
> + *
> + *  English: <div id="n1">Welcome to <b id="n2">Mozilla's</b> website</div>
> + *  Portuguese: <div id="n1">Bem vindo a pagina <b id="n2">da Mozilla</b></div>
> + *
> + *  TranslationIem n1 = {

thx

@@ +192,5 @@
> + *
> + *  TranslationIem n1 = {
> + *    id: 1,
> + *    original: ["Welcome to ", ptr to n2, " website"]
> + *    translation: [Bem vindo a pagina ", ptr to n2]

ok

@@ +269,5 @@
> + * walk down it repeating the process.
> + *
> + * For text nodes we simply add it as a string.
> + */
> +function parseResultNode: function(item, node) {

since it's a recursive function I usually tend to keep it as a "helper" style.

@@ +275,5 @@
> +  for (let child of node.childNodes) {
> +    if (child.nodeType == TEXT_NODE) {
> +      item.translation.push(child.nodeValue);
> +    } else {
> +      let foundChild = null;

yes it would...

@@ +283,5 @@
> +          break;
> +        }
> +      }
> +
> +      if (foundChild) {

I have not seen this happen, so this is more to protect against the server hypothetically sending bad data. I don't think any noise needs to be made as there's nothing useful we can do in this situation if it happens.
(I did the replies on Splinter so the comment above doesn't properly quote you in the show_bug.cgi page)
Attachment #8420807 - Attachment is obsolete: true
Attachment #8423570 - Flags: review?(florian)
Comment on attachment 8420807 [details] [diff] [review]
Parse results

Review of attachment 8420807 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/translation/TranslationDocument.jsm
@@ +269,5 @@
> + * walk down it repeating the process.
> + *
> + * For text nodes we simply add it as a string.
> + */
> +function parseResultNode: function(item, node) {

I don't see how the function being recursive makes it belong less to the object, but if it's an intentional decision, ok...
Comment on attachment 8423570 [details] [diff] [review]
trParseResultsWorking

Review of attachment 8423570 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/translation/TranslationDocument.jsm
@@ +197,5 @@
> + *  TranslationItem n2 = {
> + *    id: 2,
> + *    original: ["Mozilla's"],
> + *    translation: ["da Mozilla"]
> + *  } 

Trailing whitespace.

@@ +259,5 @@
> +  getChildById: function(id) {
> +    let foundChild = null;
> +    for (let child of item.children) {
> +      if (("n" + child.id) == id) {
> +        foundChild = child;

return child; (and then you don't need the foundChild variable anymore).
Attachment #8423570 - Flags: review?(florian) → review+
https://hg.mozilla.org/mozilla-central/rev/e10896f3a7af
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 32
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.