## Analysis
### Full Diff
I'm looking at the diff of my commits here:
```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18
```
```
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_auto_translate.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_basics.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_open_panel.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer_settings.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_retranslate.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_switch_languages.js | 36 ++++++++++++++++++++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_failure.js | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_request.js | 36 ++++++++++++++++++++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_unsupported_lang.js | 7 +++++++
toolkit/components/translations/TranslationsTelemetry.sys.mjs | 43 +++++++++++++++++++++++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsChild.sys.mjs | 23 +++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
toolkit/components/translations/metrics.yaml | 43 +++++++++++++++++++++++++++++++++++++++++++
toolkit/components/translations/tests/browser/shared-head.js | 4 ++--
15 files changed, 437 insertions(+), 32 deletions(-)
```
---
### Ignore Tests
Let's ignore anything in the `tests` directory, because that certainly isn't it.
```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- ':(exclude)*tests*'
```
```
toolkit/components/translations/TranslationsTelemetry.sys.mjs | 43 +++++++++++++++++++++++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsChild.sys.mjs | 23 +++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
toolkit/components/translations/metrics.yaml | 43 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 176 insertions(+), 30 deletions(-)
```
### Ignore Telemetry
I've added plenty of telemetry metrics without any negative effect, so I'm assuming that isn't it, but I'll send a quick NI to :chutten just to confirm that this likely isn't it.
So that leaves just the `TranslationsParent` and `TranslationsChild` changes.
```
git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- \
':(exclude)*tests*' \
':(exclude)*Telemetry*' \
':(exclude)*metrics.yaml'
```
```
toolkit/components/translations/actors/TranslationsChild.sys.mjs | 23 +++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
2 files changed, 90 insertions(+), 30 deletions(-)
```
---
### Remaining Diff
Here is the full diff of only those changes:
```diff
diff --git a/toolkit/components/translations/actors/TranslationsChild.sys.mjs b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
index 7e3cc0e80dd0..0f1daa5e09c6 100644
--- a/toolkit/components/translations/actors/TranslationsChild.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
@@ -78,6 +78,29 @@ export class TranslationsChild extends JSWindowActorChild {
this.#translatedDoc?.enterLazyTranslationsMode();
return undefined;
}
+ case "Translations:ExtractPageText": {
+ const { document } = this;
+ if (!document) {
+ return "";
+ }
+
+ const { sufficientLength } = data;
+
+ const encoder = Cu.createDocumentEncoder("text/plain");
+ encoder.init(
+ document,
+ "text/plain",
+ Ci.nsIDocumentEncoder.OutputBodyOnly |
+ Ci.nsIDocumentEncoder.SkipInvisibleContent |
+ Ci.nsIDocumentEncoder.AllowCrossShadowBoundary |
+ Ci.nsIDocumentEncoder.OutputForPlainTextClipboardCopy |
+ Ci.nsIDocumentEncoder.OutputDisallowLineBreaking |
+ Ci.nsIDocumentEncoder.OutputDropInvisibleBreak |
+ Ci.nsIDocumentEncoder.OutputLFLineBreak
+ );
+
+ return encoder.encodeToStringWithMaxLength(sufficientLength);
+ }
case "Translations:TranslatePage": {
if (this.#translatedDoc?.engineStatus === "error") {
this.#translatedDoc.destroy();
diff --git a/toolkit/components/translations/actors/TranslationsParent.sys.mjs b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
index a7e1ff26110a..9cfbc9b5dcf3 100644
--- a/toolkit/components/translations/actors/TranslationsParent.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
@@ -3554,7 +3554,8 @@ export class TranslationsParent extends JSWindowActorParent {
}
/**
- * Uses the page extractor to identify the current page's language.
+ * Extracts a substring of visible text from the content document and
+ * runs it through the language detector to determine the page's language.
*
* @returns {Promise<DetectionResult>}
*/
@@ -3563,53 +3564,89 @@ export class TranslationsParent extends JSWindowActorParent {
return this.languageState.detectedLanguages.identified;
}
- const actor =
- this.browsingContext?.currentWindowGlobal?.getActor("PageExtractor");
-
- if (!actor) {
- throw new Error("Unable to get the PageExtractor actor.");
- }
+ lazy.console.log(
+ "Beginning text extraction:",
+ this.browsingContext?.currentURI?.spec
+ );
- const startTime = ChromeUtils.now();
+ const extractionStartTime = ChromeUtils.now();
+ const pageText = await this.sendQuery("Translations:ExtractPageText", {
+ sufficientLength: 4096,
+ });
- // Manual profiling on 10 page loads of https://es.wikipedia.org/wiki/Felis_catus:
- // -------------------------------------------------------------------------------
- //
- // No limit: 2064 samples, 224/237/294 [min/med/max]ms (~85k code units)
- // 8192 limit: 681 samples, 75/ 87/128 [min/med/max]ms
- // 4096 limit: 457 samples, 51/ 55/ 97 [min/med/max]ms
- // 2048 limit: 240 samples, 29/ 39/ 64 [min/med/max]ms
- // 1024 limit: 142 samples, 19/ 28/ 58 [min/med/max]ms
- //
- // 2048 Code units feels like a decent length for performance and sample size.
- const pageText = await actor.getText({ sufficientLength: 2048 });
if (this.#isDestroyed) {
- return { language: "", confident: false, languages: [] };
+ return { language: "en", confident: false, languages: [] };
}
+ const extractionTime = ChromeUtils.now() - extractionStartTime;
+
+ lazy.console.debug(
+ `Extracted Page Text (${pageText.length} code units):\n\n`,
+ pageText
+ );
+
+ const extractionLog =
+ `Extracted ${pageText.length} code units of text in ` +
+ `${extractionTime.toFixed(3)} ms.`;
+
+ lazy.console.log(extractionLog);
+ ChromeUtils.addProfilerMarker(
+ "TranslationsParent",
+ { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+ extractionLog
+ );
+
+ const identificationStartTime = ChromeUtils.now();
const result = await lazy.LanguageDetector.detectLanguage(pageText);
+
if (this.#isDestroyed) {
- return { language: "", confident: false, languages: [] };
+ return { language: "en", confident: false, languages: [] };
}
- const message =
- `Identified page language as "${result.language}" ` +
- `in ${((ChromeUtils.now() - startTime) / 1000).toFixed(3)} seconds: ` +
- this.browsingContext?.currentURI?.spec;
+ const identificationTime = ChromeUtils.now() - identificationStartTime;
+ const identificationLog =
+ `Identified ${pageText.length} code units of text as "${result.language}" ` +
+ `in ${identificationTime.toFixed(3)} ms.`;
+ lazy.console.log(identificationLog);
ChromeUtils.addProfilerMarker(
"TranslationsParent",
- { startTime, innerWindowId: this.innerWindowId },
- message
+ { startTime: identificationStartTime, innerWindowId: this.innerWindowId },
+ identificationLog
+ );
+ ChromeUtils.addProfilerMarker(
+ "TranslationsParent",
+ { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+ "Total time to identify page language."
);
-
- lazy.console.debug("\nExtracted Page Text:\n\n", pageText);
- lazy.console.log(message);
if (pageText.length < TranslationsParent.#DOC_CONFIDENCE_THRESHOLD) {
result.confident = false;
}
+ const htmlLangAttribute =
+ this.languageState?.detectedLanguages?.htmlLangAttribute ?? null;
+ const identifiedLanguage = result.language;
+
+ TranslationsParent.telemetry().onIdentifyPageLanguage({
+ htmlLangAttribute,
+ identifiedLanguage,
+ langTagsMatch: htmlLangAttribute
+ ? lazy.TranslationsUtils.langTagsMatch(
+ htmlLangAttribute,
+ identifiedLanguage
+ )
+ : null,
+ isLangAttributeValid: htmlLangAttribute
+ ? lazy.TranslationsUtils.isLangTagValid(htmlLangAttribute)
+ : null,
+ extractedCodeUnits: pageText.length,
+ extractionTime,
+ identificationTime,
+ totalTime: extractionTime + identificationTime,
+ confident: result.confident,
+ });
+
return result;
}
```
Now, I know that adding lines to `TranslationsChild` can be a bit sensitive to performance, since the child actor is loaded in every content tab. But I wouldn't expect it to affect installer size.
I should also note that the intent of this patch is to move the page-extraction algorithm _back_ to using `nsIDocumentEncoder`, which it has been using for years without issue.
In Bug 1967758 I moved away from `nsIDocumentEncoder`, but I did more performance testing, and realized that `nsIDcoumentEncoder` was still faster, so this patch moves it back.
(This all occurred within the same Nightly cycle.)
Bob, I'm sorry for the cold-call NI, but I know you are familiar with Windows. Feel free to redirect if needed.
I genuinely am not sure what in this diff would affect Windows installer size by 481.3 KB. Do you see anything?
Bug 1998089 Comment 6 Edit History
Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.
## Analysis
### Full Diff
I'm looking at the diff of my commits here:
```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18
```
```
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_auto_translate.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_basics.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_open_panel.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_panel_auto_offer_settings.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_retranslate.js | 18 ++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_switch_languages.js | 36 ++++++++++++++++++++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_failure.js | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_translation_request.js | 36 ++++++++++++++++++++++++++++++++++++
browser/components/translations/tests/browser/browser_translations_full_page_telemetry_unsupported_lang.js | 7 +++++++
toolkit/components/translations/TranslationsTelemetry.sys.mjs | 43 +++++++++++++++++++++++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsChild.sys.mjs | 23 +++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
toolkit/components/translations/metrics.yaml | 43 +++++++++++++++++++++++++++++++++++++++++++
toolkit/components/translations/tests/browser/shared-head.js | 4 ++--
15 files changed, 437 insertions(+), 32 deletions(-)
```
---
### Ignore Tests
Let's ignore anything in the `tests` directory, because that certainly isn't it.
```
❯ git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- ':(exclude)*tests*'
```
```
toolkit/components/translations/TranslationsTelemetry.sys.mjs | 43 +++++++++++++++++++++++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsChild.sys.mjs | 23 +++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
toolkit/components/translations/metrics.yaml | 43 +++++++++++++++++++++++++++++++++++++++++++
4 files changed, 176 insertions(+), 30 deletions(-)
```
---
### Ignore Telemetry
I've added plenty of telemetry metrics without any negative effect, so I'm assuming that isn't it, but I'll send a quick NI to :chutten just to confirm that this likely isn't it.
So that leaves just the `TranslationsParent` and `TranslationsChild` changes.
```
git diff --stat 5ef169b4891fc4c7947f0064f63678ec205b19c7^..8cd5d52a46f898a44589239a49ec90aec22a6e18 -- \
':(exclude)*tests*' \
':(exclude)*Telemetry*' \
':(exclude)*metrics.yaml'
```
```
toolkit/components/translations/actors/TranslationsChild.sys.mjs | 23 +++++++++++++++++++++++
toolkit/components/translations/actors/TranslationsParent.sys.mjs | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
2 files changed, 90 insertions(+), 30 deletions(-)
```
---
### Remaining Diff
Here is the full diff of only those changes:
```diff
diff --git a/toolkit/components/translations/actors/TranslationsChild.sys.mjs b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
index 7e3cc0e80dd0..0f1daa5e09c6 100644
--- a/toolkit/components/translations/actors/TranslationsChild.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsChild.sys.mjs
@@ -78,6 +78,29 @@ export class TranslationsChild extends JSWindowActorChild {
this.#translatedDoc?.enterLazyTranslationsMode();
return undefined;
}
+ case "Translations:ExtractPageText": {
+ const { document } = this;
+ if (!document) {
+ return "";
+ }
+
+ const { sufficientLength } = data;
+
+ const encoder = Cu.createDocumentEncoder("text/plain");
+ encoder.init(
+ document,
+ "text/plain",
+ Ci.nsIDocumentEncoder.OutputBodyOnly |
+ Ci.nsIDocumentEncoder.SkipInvisibleContent |
+ Ci.nsIDocumentEncoder.AllowCrossShadowBoundary |
+ Ci.nsIDocumentEncoder.OutputForPlainTextClipboardCopy |
+ Ci.nsIDocumentEncoder.OutputDisallowLineBreaking |
+ Ci.nsIDocumentEncoder.OutputDropInvisibleBreak |
+ Ci.nsIDocumentEncoder.OutputLFLineBreak
+ );
+
+ return encoder.encodeToStringWithMaxLength(sufficientLength);
+ }
case "Translations:TranslatePage": {
if (this.#translatedDoc?.engineStatus === "error") {
this.#translatedDoc.destroy();
diff --git a/toolkit/components/translations/actors/TranslationsParent.sys.mjs b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
index a7e1ff26110a..9cfbc9b5dcf3 100644
--- a/toolkit/components/translations/actors/TranslationsParent.sys.mjs
+++ b/toolkit/components/translations/actors/TranslationsParent.sys.mjs
@@ -3554,7 +3554,8 @@ export class TranslationsParent extends JSWindowActorParent {
}
/**
- * Uses the page extractor to identify the current page's language.
+ * Extracts a substring of visible text from the content document and
+ * runs it through the language detector to determine the page's language.
*
* @returns {Promise<DetectionResult>}
*/
@@ -3563,53 +3564,89 @@ export class TranslationsParent extends JSWindowActorParent {
return this.languageState.detectedLanguages.identified;
}
- const actor =
- this.browsingContext?.currentWindowGlobal?.getActor("PageExtractor");
-
- if (!actor) {
- throw new Error("Unable to get the PageExtractor actor.");
- }
+ lazy.console.log(
+ "Beginning text extraction:",
+ this.browsingContext?.currentURI?.spec
+ );
- const startTime = ChromeUtils.now();
+ const extractionStartTime = ChromeUtils.now();
+ const pageText = await this.sendQuery("Translations:ExtractPageText", {
+ sufficientLength: 4096,
+ });
- // Manual profiling on 10 page loads of https://es.wikipedia.org/wiki/Felis_catus:
- // -------------------------------------------------------------------------------
- //
- // No limit: 2064 samples, 224/237/294 [min/med/max]ms (~85k code units)
- // 8192 limit: 681 samples, 75/ 87/128 [min/med/max]ms
- // 4096 limit: 457 samples, 51/ 55/ 97 [min/med/max]ms
- // 2048 limit: 240 samples, 29/ 39/ 64 [min/med/max]ms
- // 1024 limit: 142 samples, 19/ 28/ 58 [min/med/max]ms
- //
- // 2048 Code units feels like a decent length for performance and sample size.
- const pageText = await actor.getText({ sufficientLength: 2048 });
if (this.#isDestroyed) {
- return { language: "", confident: false, languages: [] };
+ return { language: "en", confident: false, languages: [] };
}
+ const extractionTime = ChromeUtils.now() - extractionStartTime;
+
+ lazy.console.debug(
+ `Extracted Page Text (${pageText.length} code units):\n\n`,
+ pageText
+ );
+
+ const extractionLog =
+ `Extracted ${pageText.length} code units of text in ` +
+ `${extractionTime.toFixed(3)} ms.`;
+
+ lazy.console.log(extractionLog);
+ ChromeUtils.addProfilerMarker(
+ "TranslationsParent",
+ { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+ extractionLog
+ );
+
+ const identificationStartTime = ChromeUtils.now();
const result = await lazy.LanguageDetector.detectLanguage(pageText);
+
if (this.#isDestroyed) {
- return { language: "", confident: false, languages: [] };
+ return { language: "en", confident: false, languages: [] };
}
- const message =
- `Identified page language as "${result.language}" ` +
- `in ${((ChromeUtils.now() - startTime) / 1000).toFixed(3)} seconds: ` +
- this.browsingContext?.currentURI?.spec;
+ const identificationTime = ChromeUtils.now() - identificationStartTime;
+ const identificationLog =
+ `Identified ${pageText.length} code units of text as "${result.language}" ` +
+ `in ${identificationTime.toFixed(3)} ms.`;
+ lazy.console.log(identificationLog);
ChromeUtils.addProfilerMarker(
"TranslationsParent",
- { startTime, innerWindowId: this.innerWindowId },
- message
+ { startTime: identificationStartTime, innerWindowId: this.innerWindowId },
+ identificationLog
+ );
+ ChromeUtils.addProfilerMarker(
+ "TranslationsParent",
+ { startTime: extractionStartTime, innerWindowId: this.innerWindowId },
+ "Total time to identify page language."
);
-
- lazy.console.debug("\nExtracted Page Text:\n\n", pageText);
- lazy.console.log(message);
if (pageText.length < TranslationsParent.#DOC_CONFIDENCE_THRESHOLD) {
result.confident = false;
}
+ const htmlLangAttribute =
+ this.languageState?.detectedLanguages?.htmlLangAttribute ?? null;
+ const identifiedLanguage = result.language;
+
+ TranslationsParent.telemetry().onIdentifyPageLanguage({
+ htmlLangAttribute,
+ identifiedLanguage,
+ langTagsMatch: htmlLangAttribute
+ ? lazy.TranslationsUtils.langTagsMatch(
+ htmlLangAttribute,
+ identifiedLanguage
+ )
+ : null,
+ isLangAttributeValid: htmlLangAttribute
+ ? lazy.TranslationsUtils.isLangTagValid(htmlLangAttribute)
+ : null,
+ extractedCodeUnits: pageText.length,
+ extractionTime,
+ identificationTime,
+ totalTime: extractionTime + identificationTime,
+ confident: result.confident,
+ });
+
return result;
}
```
Now, I know that adding lines to `TranslationsChild` can be a bit sensitive to performance, since the child actor is loaded in every content tab. But I wouldn't expect it to affect installer size.
I should also note that the intent of this patch is to move the page-extraction algorithm _back_ to using `nsIDocumentEncoder`, which it has been using for years without issue.
In Bug 1967758 I moved away from `nsIDocumentEncoder`, but I did more performance testing, and realized that `nsIDcoumentEncoder` was still faster, so this patch moves it back.
(This all occurred within the same Nightly cycle.)
Bob, I'm sorry for the cold-call NI, but I know you are familiar with Windows. Feel free to redirect if needed.
I genuinely am not sure what in this diff would affect Windows installer size by 481.3 KB. Do you see anything?