Bug 1998089 Comment 6 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

## Analysis

### Full Diff

I'm looking at the diff of my commits here:

```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18
```

```
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_auto_translate.js            | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_basics.js                    | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_open_panel.js                | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer.js          | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer_settings.js | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_retranslate.js               | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_switch_languages.js          | 36 ++++++++++++++++++++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_failure.js       | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_request.js       | 36 ++++++++++++++++++++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_unsupported_lang.js          |  7 +++++++
 toolkit/components/translations/TranslationsTelemetry.sys.mjs                                                       | 43 +++++++++++++++++++++++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsChild.sys.mjs                                                    | 23 +++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsParent.sys.mjs                                                   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
 toolkit/components/translations/metrics.yaml                                                                        | 43 +++++++++++++++++++++++++++++++++++++++++++
 toolkit/components/translations/tests/browser/shared-head.js                                                        |  4 ++--
 15 files changed, 437 insertions(+), 32 deletions(-)
```

---

### Ignore Tests

Let's ignore anything in the `tests` directory, because that certainly isn't it. 

```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- ':(exclude)*tests*'
```
```
 toolkit/components/translations/TranslationsTelemetry.sys.mjs     | 43 +++++++++++++++++++++++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsChild.sys.mjs  | 23 +++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
 toolkit/components/translations/metrics.yaml                      | 43 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 176 insertions(+), 30 deletions(-)
```

### Ignore Telemetry

I've added plenty of telemetry metrics without any negative effect, so I'm assuming that isn't it, but I'll send a quick NI to :chutten just to confirm that this likely isn't it.

So that leaves just the `TranslationsParent` and `TranslationsChild` changes.

```
git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- \
  ':(exclude)*tests*' \
  ':(exclude)*Telemetry*' \
  ':(exclude)*metrics.yaml'
```

```
 toolkit/components/translations/actors/TranslationsChild.sys.mjs  | 23 +++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
 2 files changed, 90 insertions(+), 30 deletions(-)
```

---

### Remaining Diff

Here is the full diff of only those changes:

```diff
diff --git a/toolkit/components/translations/actors/TranslationsChild.sys.mjs b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
index 7e3cc0e80dd0..0f1daa5e09c6 100644
--- a/toolkit/components/translations/actors/TranslationsChild.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
@@ -78,6 +78,29 @@ export class TranslationsChild extends JSWindowActorChild {
         this.#translatedDoc?.enterLazyTranslationsMode();
         return undefined;
       }
+      case "Translations:ExtractPageText": {
+        const { document } = this;
+        if (!document) {
+          return "";
+        }
+
+        const { sufficientLength } = data;
+
+        const encoder = Cu.createDocumentEncoder("text/plain");
+        encoder.init(
+          document,
+          "text/plain",
+          Ci.nsIDocumentEncoder.OutputBodyOnly |
+            Ci.nsIDocumentEncoder.SkipInvisibleContent |
+            Ci.nsIDocumentEncoder.AllowCrossShadowBoundary |
+            Ci.nsIDocumentEncoder.OutputForPlainTextClipboardCopy |
+            Ci.nsIDocumentEncoder.OutputDisallowLineBreaking |
+            Ci.nsIDocumentEncoder.OutputDropInvisibleBreak |
+            Ci.nsIDocumentEncoder.OutputLFLineBreak
+        );
+
+        return encoder.encodeToStringWithMaxLength(sufficientLength);
+      }
       case "Translations:TranslatePage": {
         if (this.#translatedDoc?.engineStatus === "error") {
           this.#translatedDoc.destroy();
diff --git a/toolkit/components/translations/actors/TranslationsParent.sys.mjs b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
index a7e1ff26110a..9cfbc9b5dcf3 100644
--- a/toolkit/components/translations/actors/TranslationsParent.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
@@ -3554,7 +3554,8 @@ export class TranslationsParent extends JSWindowActorParent {
   }

   /**
-   * Uses the page extractor to identify the current page's language.
+   * Extracts a substring of visible text from the content document and
+   * runs it through the language detector to determine the page's language.
    *
    * @returns {Promise<DetectionResult>}
    */
@@ -3563,53 +3564,89 @@ export class TranslationsParent extends JSWindowActorParent {
       return this.languageState.detectedLanguages.identified;
     }

-    const actor =
-      this.browsingContext?.currentWindowGlobal?.getActor("PageExtractor");
-
-    if (!actor) {
-      throw new Error("Unable to get the PageExtractor actor.");
-    }
+    lazy.console.log(
+      "Beginning text extraction:",
+      this.browsingContext?.currentURI?.spec
+    );

-    const startTime = ChromeUtils.now();
+    const extractionStartTime = ChromeUtils.now();
+    const pageText = await this.sendQuery("Translations:ExtractPageText", {
+      sufficientLength: 4096,
+    });

-    // Manual profiling on 10 page loads of https://es.wikipedia.org/wiki/Felis_catus:
-    // -------------------------------------------------------------------------------
-    //
-    //   No limit: 2064 samples, 224/237/294 [min/med/max]ms (~85k code units)
-    // 8192 limit:  681 samples,  75/ 87/128 [min/med/max]ms
-    // 4096 limit:  457 samples,  51/ 55/ 97 [min/med/max]ms
-    // 2048 limit:  240 samples,  29/ 39/ 64 [min/med/max]ms
-    // 1024 limit:  142 samples,  19/ 28/ 58 [min/med/max]ms
-    //
-    // 2048 Code units feels like a decent length for performance and sample size.
-    const pageText = await actor.getText({ sufficientLength: 2048 });
     if (this.#isDestroyed) {
-      return { language: "", confident: false, languages: [] };
+      return { language: "en", confident: false, languages: [] };
     }

+    const extractionTime = ChromeUtils.now() - extractionStartTime;
+
+    lazy.console.debug(
+      `Extracted Page Text (${pageText.length} code units):\n\n`,
+      pageText
+    );
+
+    const extractionLog =
+      `Extracted ${pageText.length} code units of text in ` +
+      `${extractionTime.toFixed(3)} ms.`;
+
+    lazy.console.log(extractionLog);
+    ChromeUtils.addProfilerMarker(
+      "TranslationsParent",
+      { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+      extractionLog
+    );
+
+    const identificationStartTime = ChromeUtils.now();
     const result = await lazy.LanguageDetector.detectLanguage(pageText);
+
     if (this.#isDestroyed) {
-      return { language: "", confident: false, languages: [] };
+      return { language: "en", confident: false, languages: [] };
     }

-    const message =
-      `Identified page language as "${result.language}" ` +
-      `in ${((ChromeUtils.now() - startTime) / 1000).toFixed(3)} seconds: ` +
-      this.browsingContext?.currentURI?.spec;
+    const identificationTime = ChromeUtils.now() - identificationStartTime;
+    const identificationLog =
+      `Identified ${pageText.length} code units of text as "${result.language}" ` +
+      `in ${identificationTime.toFixed(3)} ms.`;

+    lazy.console.log(identificationLog);
     ChromeUtils.addProfilerMarker(
       "TranslationsParent",
-      { startTime, innerWindowId: this.innerWindowId },
-      message
+      { startTime: identificationStartTime, innerWindowId: this.innerWindowId },
+      identificationLog
+    );
+    ChromeUtils.addProfilerMarker(
+      "TranslationsParent",
+      { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+      "Total time to identify page language."
     );
-
-    lazy.console.debug("\nExtracted Page Text:\n\n", pageText);
-    lazy.console.log(message);

     if (pageText.length < TranslationsParent.#DOC_CONFIDENCE_THRESHOLD) {
       result.confident = false;
     }

+    const htmlLangAttribute =
+      this.languageState?.detectedLanguages?.htmlLangAttribute ?? null;
+    const identifiedLanguage = result.language;
+
+    TranslationsParent.telemetry().onIdentifyPageLanguage({
+      htmlLangAttribute,
+      identifiedLanguage,
+      langTagsMatch: htmlLangAttribute
+        ? lazy.TranslationsUtils.langTagsMatch(
+            htmlLangAttribute,
+            identifiedLanguage
+          )
+        : null,
+      isLangAttributeValid: htmlLangAttribute
+        ? lazy.TranslationsUtils.isLangTagValid(htmlLangAttribute)
+        : null,
+      extractedCodeUnits: pageText.length,
+      extractionTime,
+      identificationTime,
+      totalTime: extractionTime + identificationTime,
+      confident: result.confident,
+    });
+
     return result;
   }
```

Now, I know that adding lines to `TranslationsChild` can be a bit sensitive to performance, since the child actor is loaded in every content tab. But I wouldn't expect it to affect installer size.

I should also note that the intent of this patch is to move the page-extraction algorithm _back_ to using `nsIDocumentEncoder`, which it has been using for years without issue. 

In Bug 1967758 I moved away from `nsIDocumentEncoder`, but I did more performance testing, and realized that `nsIDcoumentEncoder` was still faster, so this patch moves it back.

(This all occurred within the same Nightly cycle.)

Bob, I'm sorry for the cold-call NI, but I know you are familiar with Windows. Feel free to redirect if needed. 

I genuinely am not sure what in this diff would affect Windows installer size by 481.3 KB. Do you see anything?
## Analysis

### Full Diff

I'm looking at the diff of my commits here:

```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18
```

```
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_auto_translate.js            | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_basics.js                    | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_open_panel.js                | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer.js          | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer_settings.js | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_retranslate.js               | 18 ++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_switch_languages.js          | 36 ++++++++++++++++++++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_failure.js       | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_request.js       | 36 ++++++++++++++++++++++++++++++++++++
 browser/components/translations/tests/browser/browser_translations_full_page_telemetry_unsupported_lang.js          |  7 +++++++
 toolkit/components/translations/TranslationsTelemetry.sys.mjs                                                       | 43 +++++++++++++++++++++++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsChild.sys.mjs                                                    | 23 +++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsParent.sys.mjs                                                   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
 toolkit/components/translations/metrics.yaml                                                                        | 43 +++++++++++++++++++++++++++++++++++++++++++
 toolkit/components/translations/tests/browser/shared-head.js                                                        |  4 ++--
 15 files changed, 437 insertions(+), 32 deletions(-)
```

---

### Ignore Tests

Let's ignore anything in the `tests` directory, because that certainly isn't it. 

```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- ':(exclude)*tests*'
```
```
 toolkit/components/translations/TranslationsTelemetry.sys.mjs     | 43 +++++++++++++++++++++++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsChild.sys.mjs  | 23 +++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
 toolkit/components/translations/metrics.yaml                      | 43 +++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 176 insertions(+), 30 deletions(-)
```

---

### Ignore Telemetry

I've added plenty of telemetry metrics without any negative effect, so I'm assuming that isn't it, but I'll send a quick NI to :chutten just to confirm that this likely isn't it.

So that leaves just the `TranslationsParent` and `TranslationsChild` changes.

```
git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- \
  ':(exclude)*tests*' \
  ':(exclude)*Telemetry*' \
  ':(exclude)*metrics.yaml'
```

```
 toolkit/components/translations/actors/TranslationsChild.sys.mjs  | 23 +++++++++++++++++++++++
 toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
 2 files changed, 90 insertions(+), 30 deletions(-)
```

---

### Remaining Diff

Here is the full diff of only those changes:

```diff
diff --git a/toolkit/components/translations/actors/TranslationsChild.sys.mjs b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
index 7e3cc0e80dd0..0f1daa5e09c6 100644
--- a/toolkit/components/translations/actors/TranslationsChild.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
@@ -78,6 +78,29 @@ export class TranslationsChild extends JSWindowActorChild {
         this.#translatedDoc?.enterLazyTranslationsMode();
         return undefined;
       }
+      case "Translations:ExtractPageText": {
+        const { document } = this;
+        if (!document) {
+          return "";
+        }
+
+        const { sufficientLength } = data;
+
+        const encoder = Cu.createDocumentEncoder("text/plain");
+        encoder.init(
+          document,
+          "text/plain",
+          Ci.nsIDocumentEncoder.OutputBodyOnly |
+            Ci.nsIDocumentEncoder.SkipInvisibleContent |
+            Ci.nsIDocumentEncoder.AllowCrossShadowBoundary |
+            Ci.nsIDocumentEncoder.OutputForPlainTextClipboardCopy |
+            Ci.nsIDocumentEncoder.OutputDisallowLineBreaking |
+            Ci.nsIDocumentEncoder.OutputDropInvisibleBreak |
+            Ci.nsIDocumentEncoder.OutputLFLineBreak
+        );
+
+        return encoder.encodeToStringWithMaxLength(sufficientLength);
+      }
       case "Translations:TranslatePage": {
         if (this.#translatedDoc?.engineStatus === "error") {
           this.#translatedDoc.destroy();
diff --git a/toolkit/components/translations/actors/TranslationsParent.sys.mjs b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
index a7e1ff26110a..9cfbc9b5dcf3 100644
--- a/toolkit/components/translations/actors/TranslationsParent.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
@@ -3554,7 +3554,8 @@ export class TranslationsParent extends JSWindowActorParent {
   }

   /**
-   * Uses the page extractor to identify the current page's language.
+   * Extracts a substring of visible text from the content document and
+   * runs it through the language detector to determine the page's language.
    *
    * @returns {Promise<DetectionResult>}
    */
@@ -3563,53 +3564,89 @@ export class TranslationsParent extends JSWindowActorParent {
       return this.languageState.detectedLanguages.identified;
     }

-    const actor =
-      this.browsingContext?.currentWindowGlobal?.getActor("PageExtractor");
-
-    if (!actor) {
-      throw new Error("Unable to get the PageExtractor actor.");
-    }
+    lazy.console.log(
+      "Beginning text extraction:",
+      this.browsingContext?.currentURI?.spec
+    );

-    const startTime = ChromeUtils.now();
+    const extractionStartTime = ChromeUtils.now();
+    const pageText = await this.sendQuery("Translations:ExtractPageText", {
+      sufficientLength: 4096,
+    });

-    // Manual profiling on 10 page loads of https://es.wikipedia.org/wiki/Felis_catus:
-    // -------------------------------------------------------------------------------
-    //
-    //   No limit: 2064 samples, 224/237/294 [min/med/max]ms (~85k code units)
-    // 8192 limit:  681 samples,  75/ 87/128 [min/med/max]ms
-    // 4096 limit:  457 samples,  51/ 55/ 97 [min/med/max]ms
-    // 2048 limit:  240 samples,  29/ 39/ 64 [min/med/max]ms
-    // 1024 limit:  142 samples,  19/ 28/ 58 [min/med/max]ms
-    //
-    // 2048 Code units feels like a decent length for performance and sample size.
-    const pageText = await actor.getText({ sufficientLength: 2048 });
     if (this.#isDestroyed) {
-      return { language: "", confident: false, languages: [] };
+      return { language: "en", confident: false, languages: [] };
     }

+    const extractionTime = ChromeUtils.now() - extractionStartTime;
+
+    lazy.console.debug(
+      `Extracted Page Text (${pageText.length} code units):\n\n`,
+      pageText
+    );
+
+    const extractionLog =
+      `Extracted ${pageText.length} code units of text in ` +
+      `${extractionTime.toFixed(3)} ms.`;
+
+    lazy.console.log(extractionLog);
+    ChromeUtils.addProfilerMarker(
+      "TranslationsParent",
+      { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+      extractionLog
+    );
+
+    const identificationStartTime = ChromeUtils.now();
     const result = await lazy.LanguageDetector.detectLanguage(pageText);
+
     if (this.#isDestroyed) {
-      return { language: "", confident: false, languages: [] };
+      return { language: "en", confident: false, languages: [] };
     }

-    const message =
-      `Identified page language as "${result.language}" ` +
-      `in ${((ChromeUtils.now() - startTime) / 1000).toFixed(3)} seconds: ` +
-      this.browsingContext?.currentURI?.spec;
+    const identificationTime = ChromeUtils.now() - identificationStartTime;
+    const identificationLog =
+      `Identified ${pageText.length} code units of text as "${result.language}" ` +
+      `in ${identificationTime.toFixed(3)} ms.`;

+    lazy.console.log(identificationLog);
     ChromeUtils.addProfilerMarker(
       "TranslationsParent",
-      { startTime, innerWindowId: this.innerWindowId },
-      message
+      { startTime: identificationStartTime, innerWindowId: this.innerWindowId },
+      identificationLog
+    );
+    ChromeUtils.addProfilerMarker(
+      "TranslationsParent",
+      { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+      "Total time to identify page language."
     );
-
-    lazy.console.debug("\nExtracted Page Text:\n\n", pageText);
-    lazy.console.log(message);

     if (pageText.length < TranslationsParent.#DOC_CONFIDENCE_THRESHOLD) {
       result.confident = false;
     }

+    const htmlLangAttribute =
+      this.languageState?.detectedLanguages?.htmlLangAttribute ?? null;
+    const identifiedLanguage = result.language;
+
+    TranslationsParent.telemetry().onIdentifyPageLanguage({
+      htmlLangAttribute,
+      identifiedLanguage,
+      langTagsMatch: htmlLangAttribute
+        ? lazy.TranslationsUtils.langTagsMatch(
+            htmlLangAttribute,
+            identifiedLanguage
+          )
+        : null,
+      isLangAttributeValid: htmlLangAttribute
+        ? lazy.TranslationsUtils.isLangTagValid(htmlLangAttribute)
+        : null,
+      extractedCodeUnits: pageText.length,
+      extractionTime,
+      identificationTime,
+      totalTime: extractionTime + identificationTime,
+      confident: result.confident,
+    });
+
     return result;
   }
```

Now, I know that adding lines to `TranslationsChild` can be a bit sensitive to performance, since the child actor is loaded in every content tab. But I wouldn't expect it to affect installer size.

I should also note that the intent of this patch is to move the page-extraction algorithm _back_ to using `nsIDocumentEncoder`, which it has been using for years without issue. 

In Bug 1967758 I moved away from `nsIDocumentEncoder`, but I did more performance testing, and realized that `nsIDcoumentEncoder` was still faster, so this patch moves it back.

(This all occurred within the same Nightly cycle.)

Bob, I'm sorry for the cold-call NI, but I know you are familiar with Windows. Feel free to redirect if needed. 

I genuinely am not sure what in this diff would affect Windows installer size by 481.3 KB. Do you see anything?

Back to Bug 1998089 Comment 6