Closed Bug 1518252 Opened 5 years ago Closed 5 years ago

about:support screen corruption

Categories

(Core :: Internationalization, defect, P2)

66 Branch
x86_64
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: streetwolf52, Assigned: zbraniecki)

References

Details

Attachments

(3 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

Run about:support the first time you're in Fx66.

Actual results:

Screen has some momentary screen corruption before the screen looks normal

Expected results:

No screen corruption.

Component: Untriaged → Localization
OS: Unspecified → Windows 10
Product: Firefox → Core
Hardware: Unspecified → x86_64

I suppose this is an expected consequence of not having Fluent on the start-up path yet?

I think this is a regular flash-of-untranslated-content, aka, FOUC.

This might be specific to xhtml, which has xml parser and might do slightly different timing than xul or html. Zibi?

We could make this less apparent by giving the table rows an initial min-height, or just th.column.

Component: Localization → Internationalization
Status: UNCONFIRMED → NEW
Ever confirmed: true

I'll investigate today

Flags: needinfo?(gandalf)

Taking

Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Flags: needinfo?(gandalf)
Priority: -- → P2

I'll need Olli's help with this one.

So, what happens here is not very different from bug 1455649 comment 185, but at least my naive attempt to reuse the solution didn't work here.

The FOUC is pretty visible and is caused by the fact that the Document::TriggerInitialDocumentTranslation is kicked off from SetReadyStateInternal for ready state interactive [0].

I never liked this choice, but now it's painfully visible that this event happens after layout has been kicked off, so in result we end up with localization happening after initial layout and flush/paint happens before the l10n is done.

I tried to find a better location to trigger the method and located nsXMLContentSink::HandleEndElement which looks promising since it has a branch that gets kicked off when the root element is closing - which is exactly when we want to trigger localization of the parsed DOM.

Triggering it here does work in a sense that now initial document translation is kicked off before layout begins, but it is still performed after layout happens :(
Sprinkling some printfs/console.logs gave me:

nsContentSink::StartLayout
nsXMLContentSink::HandleEndElem
Document::TriggerInitialDocumentTranslation
DOMLocalization::TranslateRoots
DOMLocalization::TranslateFragment
nsNode::Localize
nsXMLContentSink::MaybeStartLayout
DOMLocalization::TranslateFragment (nsNode::Localize::then part)

The interesting part is that now the nsNode::Localize pauses to let nsXMLContentSink::MaybeStartLayout happen in the XHTML scenario. I suspect that that happens because the callback to nsNode::Localize is asynchronous so that allows the MaybeStartLayout to happen.

I tried to use the same trick to set the event listener and fire an event from inside of Document::TriggerInitialDocumentTranslation instead of manually triggering DocumentL10n::TriggerInitialDocumentTranslation hoping it'll put me in the right microtask, but it didn't.

Olli, can you help me figure out how can we fire the translation in the XHTML scenario? Should we follow bz's recommendation and just add mLocalizationPending along the lines with stylesheets and block layout until l10n is done?

[0] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/dom/base/Document.cpp#8146
[1] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/dom/xml/nsXMLContentSink.cpp#1077

Flags: needinfo?(bugs)

I wrote a POC of a patch based on bz's suggestion that makes us treat localization in line with stylesheets allowing us to block layout on Fluent.

It's very rough, and I didn't measure perf impact, but it does make us not FOUC on about:support.

Olli - do you think that we could go this route? If so, I'd align XUL/XHTML/HTML parsers/content_sinks to just block layout instead of trying to win the race and if the performance will stay the same, we would at least remove the FOUC risk, while we work on other perf improvements like caching and migration away from JS.

There isn't really other option than to block layout. (X)HTML pages are loaded incrementally so the first
layout can, by default, happen at any time.

In the patch WaitForPendingSheets should be renamed to something which describes the new setup better.

Flags: needinfo?(bugs)

This patch is smaller than I expected!

I originally didn't touch the microtask in the XUL scenario, but that introduced a regression: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6925d934c853&newProject=try&newRevision=3bbaf9a55839&framework=1

So I removed the microtask alignment trick and just called TriggerInitialDocumentTranslation in XUL just like I do in XHTML/HTML, and that gaves us no perf hit and even a small perf win on preferences: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6925d934c853&newProject=try&newRevision=cc5aa805257c&framework=1

What's more important, we now are FOUC proof even if we have to load many fallbacks, we just display the content slower in the error scenario, which seems like a great tradeoff.

Attachment #9035214 - Attachment is obsolete: true

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #10)

This patch is smaller than I expected!

I originally didn't touch the microtask in the XUL scenario, but that introduced a regression: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6925d934c853&newProject=try&newRevision=3bbaf9a55839&framework=1

So I removed the microtask alignment trick and just called TriggerInitialDocumentTranslation in XUL just like I do in XHTML/HTML, and that gaves us no perf hit and even a small perf win on preferences: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6925d934c853&newProject=try&newRevision=cc5aa805257c&framework=1

What's more important, we now are FOUC proof even if we have to load many fallbacks, we just display the content slower in the error scenario, which seems like a great tradeoff.

Can we also doublecheck what happens perf-wise in the case with the mini-cache and the startup path? I'm a bit nervous about delaying layout here.

I'd also suspect we might run into issues where JS that didn't use to cause layout flushes now do because they happen before the 'natural' layout that is delayed by the fluent stuff, which effectively becomes racy now (e.g. things that are in an executeSoon() after DCL or load events), as well as (potentially) binding/CE initialization. Probably worth doing a trypush with tests if you haven't already, also with the patchset to convert some of browser.xul... just to see...

I'm a bit nervous about delaying layout here.

My understanding is that there are two scenarios:

  1. L10n is done before layout
  2. L10n is not done before layout

If (1) happens then this patch doesn't change anything. If (2) happens that this patch actually prevents us from:

a) layout untranslated DOM
a.1) potentially paint FOUC
b) apply translation
c) reflow DOM with text

and replaces it with:

a) apply translation
b) layout translated DOM
c) paint

which is a better UX.

(2) can happen in various cases, including when l10n has to load fallbacks (error scenario). If we assume that a paint with translation is the first meaningful paint, then a FOUC paint is just a waste of layout and bad UX, so I assume that delaying layout/paint here is always a win.
Now, once we establish the no-FOUC guarantee, we can continue improving the performance of l10n to reduce the delay caused by waiting for l10n, which is the plan. But if I'm not mistaken, we do no have a scenario where we prefer to layout/paint before l10n is done.

Probably worth doing a trypush with tests if you haven't already, also with the patchset to convert some of browser.xul... just to see...

Yeah, I'll definitely test more including browser.xhtml approach impact.

Attachment #9035705 - Attachment is obsolete: true

I think the patch is complete now! Sorry it took so long, but I had been fighting with a talos regression that I couldn't easily reason about.

Now, the current patchest is made of two patches.

The first one just fixes the ordering issue of the docl10n tests, which in very rare cases could have (and did) result in a race condition.

The second one is the full patch which now is smaller and hopefully easier to reason about.
The performance seems really good: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6e6cd40f05a49ca0eaf2016f5466fbc07fcdc501&newProject=try&newRevision=afcc8a2df7be&framework=1

All of the about_preferences_* tests seem to improve slightly but not significantly yet. This combined with no regressions and the fact that we're getting no FOUCs makes me want to push for this patch to land in 66.

Olli - let me know what you think. If you prefer to delay it till 67, I'll understand :)

Doesn't look super scary, so I don't mind fixing this in 66.

What was causing the regression?

Attached patch regression.diffSplinter Review

This is a diff between the branch that causes regression and the final patch. I don't see anything here that could cause the regression.

I can try to bisect and overlay the new patch onto the old one until I get the change that causes it, but it'll be quite some manual work since I need to send each bisection to talos. Do you want me to do that?

Backed out 2 changesets (Bug 1518252) for frequent clipboard failures on browser_jsterm_middle_click_paste.

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=linux%2Cx64%2Cdebug%2Cmochitests%2Cwith%2Ce10s%2Ctest-linux64%2Fdebug-mochitest-clipboard-e10s%2Cm-e10s%28cl%29&group_state=expanded&fromchange=371bf42a4b472344a50f4e8adb3e72811f31151e&tochange=9aae2b5ef219907ff9647488ce19ccb755f96908&selectedJob=223602621

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=223602621&repo=autoland&lineNumber=14578

Backout link: https://hg.mozilla.org/integration/autoland/rev/9aae2b5ef219907ff9647488ce19ccb755f96908

[task 2019-01-23T23:15:09.264Z] 23:15:09 INFO - TEST-UNEXPECTED-FAIL | devtools/client/webconsole/test/mochitest/browser_jsterm_middle_click_paste.js | Uncaught exception - [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDOMWindowUtils.sendMouseEvent]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://mochikit/content/tests/SimpleTest/EventUtils.js :: synthesizeMouseAtPoint :: line 468" data: no]
[task 2019-01-23T23:15:09.264Z] 23:15:09 INFO - Stack trace:
[task 2019-01-23T23:15:09.264Z] 23:15:09 INFO - synthesizeMouseAtPoint@chrome://mochikit/content/tests/SimpleTest/EventUtils.js:468:7
[task 2019-01-23T23:15:09.265Z] 23:15:09 INFO - synthesizeMouse@chrome://mochikit/content/tests/SimpleTest/EventUtils.js:405:10
[task 2019-01-23T23:15:09.265Z] 23:15:09 INFO - performTests@chrome://mochitests/content/browser/devtools/client/webconsole/test/mochitest/browser_jsterm_middle_click_paste.js:35:3
[task 2019-01-23T23:15:09.266Z] 23:15:09 INFO - async*@chrome://mochitests/content/browser/devtools/client/webconsole/test/mochitest/browser_jsterm_middle_click_paste.js:18:9
[task 2019-01-23T23:15:09.266Z] 23:15:09 INFO - AsyncTester_execTest/<@chrome://mochikit/content/browser-test.js:1108:34
[task 2019-01-23T23:15:09.266Z] 23:15:09 INFO - async
Tester_execTest@chrome://mochikit/content/browser-test.js:1099:16
[task 2019-01-23T23:15:09.267Z] 23:15:09 INFO - nextTest/<@chrome://mochikit/content/browser-test.js:997:9
[task 2019-01-23T23:15:09.267Z] 23:15:09 INFO - SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:803:59
[task 2019-01-23T23:15:09.267Z] 23:15:09 INFO - Leaving test bound
[task 2019-01-23T23:15:09.268Z] 23:15:09 INFO - GECKO(8039) | [Parent 8039, DOM Worker] WARNING: '!holder->HoldWorker(aWorkerPrivate, Canceling)', file /builds/worker/workspace/build/src/dom/workers/WorkerRef.cpp, line 156
[task 2019-01-23T23:15:09.268Z] 23:15:09 INFO - GECKO(8039) | ++DOMWINDOW == 18 (0x7fa7da945000) [pid = 8039] [serial = 18] [outer = 0x7fa7e0842800]
[task 2019-01-23T23:15:10.261Z] 23:15:10 INFO - Removing tab.
[task 2019-01-23T23:15:10.261Z] 23:15:10 INFO - Waiting for event: 'TabClose' on [object XULElement].
[task 2019-01-23T23:15:10.286Z] 23:15:10 INFO - Got event: 'TabClose' on [object XULElement].
[task 2019-01-23T23:15:10.306Z] 23:15:10 INFO - Tab removed and finished closing
[task 2019-01-23T23:15:10.375Z] 23:15:10 INFO - GECKO(8039) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2019-01-23T23:15:10.376Z] 23:15:10 INFO - GECKO(8039) | MEMORY STAT | vsize 1823MB | residentFast 391MB | heapAllocated 138MB
[task 2019-01-23T23:15:10.376Z] 23:15:10 INFO - TEST-OK | devtools/client/webconsole/test/mochitest/browser_jsterm_middle_click_paste.js | took 6325ms

Flags: needinfo?(gandalf)

Nicolas, you landed this test quite recently (bug 1482875) - do you have any gut feeling on why would it start failing after my patch?

The patch in this bug is making us block layout on localization (async), but it seems like your test is the only thing that starts intermittently failing (I investigated the TV bug separately) and I'm not sure how to investigate it. Does it mean that with my patch the devtools don't show up? Don't layout? The error message seems unrelated and very vague :(

Flags: needinfo?(nchevobbe)

For the test_docl10n.html/xhtml failure, I was able to reproduce with ./mach test intl/l10n/test/dom/test_docl10n.html --verify in debug build.

It seems to me that in some cases we cannot retrieve the sink from the document in Document::InitialDocumentTranslationCompleted. Here's a log from a healthy run:

 1:07.76 TEST_START: intl/l10n/test/dom/test_docl10n.html
 1:07.90 GECKO(28704) ++DOMWINDOW == 31 (0x7fd001c5bc00) [pid = 28704] [serial = 31] [outer = 0x7fd00a31dc00]
 1:07.96 GECKO(28704) nsContentSink::StartLayout.
 1:07.96 GECKO(28704) nsContentSink::StartLayout starting.
 1:08.85 GECKO(28704) ++DOMWINDOW == 32 (0x7fcfff350800) [pid = 28704] [serial = 32] [outer = 0x7fd00a31dc00]
 1:09.00 GECKO(28704) Document::LocalizationLinkAdded.
 1:09.00 GECKO(28704) Document::LocalizationLinkAdded parsing.
 1:09.00 GECKO(28704) Document::OnL10nResourceContainerParsed.
 1:09.00 GECKO(28704) Document::InitializeLocalization.
 1:09.04 GECKO(28704) nsContentSink::StartLayout.
 1:09.04 GECKO(28704) nsContentSink::StartLayout bailing.
 1:09.04 GECKO(28704) Document::TriggerInitialDocumentTranslation.
 1:09.04 GECKO(28704) Document::OnL10nResourceContainerParsed.
 1:09.04 GECKO(28704) DocumentL10n::TriggerInitialDocumentTranslation.
 1:09.04 GECKO(28704) nsContentSink::StartLayout.
 1:09.04 GECKO(28704) nsContentSink::StartLayout bailing.
 1:09.10 GECKO(28704) console.log: "DOMLocalization::translateMutations"
 1:09.10 GECKO(28704) DocumentL10n::ResolvedCallback.
 1:09.10 GECKO(28704) Document::InitialDocumentTranslationCompleted.
 1:09.10 GECKO(28704) nsContentSink::StartLayout.
 1:09.10 GECKO(28704) nsContentSink::StartLayout starting.
 1:09.11 PASS undefined assertion name
 1:09.12 PASS undefined assertion name
 1:09.12 GECKO(28704) DocumentL10n::SetAttributes.
 1:09.12 PASS undefined assertion name
 1:09.12 PASS undefined assertion name
 1:09.13 GECKO(28704) console.log: "DOMLocalization::translateMutations"
 1:09.17 GECKO(28704) console.log: "DOMLocalization::translateMutations"

and here from failure:

 2:42.59 TEST_START: intl/l10n/test/dom/test_docl10n.html
 2:42.70 GECKO(29110) ++DOMWINDOW == 31 (0x7f545d333000) [pid = 29110] [serial = 31] [outer = 0x7f545ca51800]
 2:42.73 GECKO(29110) nsContentSink::StartLayout.
 2:42.73 GECKO(29110) nsContentSink::StartLayout starting.
 2:43.66 GECKO(29110) ++DOMWINDOW == 32 (0x7f545d12a400) [pid = 29110] [serial = 32] [outer = 0x7f545ca51800]
 2:43.79 GECKO(29110) Document::LocalizationLinkAdded.
 2:43.79 GECKO(29110) Document::LocalizationLinkAdded parsing.
 2:43.79 GECKO(29110) Document::OnL10nResourceContainerParsed.
 2:43.79 GECKO(29110) Document::InitializeLocalization.
 2:43.82 GECKO(29110) nsContentSink::StartLayout.
 2:43.82 GECKO(29110) nsContentSink::StartLayout bailing.
 2:43.82 GECKO(29110) Document::TriggerInitialDocumentTranslation.
 2:43.82 GECKO(29110) Document::OnL10nResourceContainerParsed.
 2:43.82 GECKO(29110) DocumentL10n::TriggerInitialDocumentTranslation.
 2:43.82 GECKO(29110) nsContentSink::StartLayout.
 2:43.83 GECKO(29110) nsContentSink::StartLayout bailing.
 2:43.89 GECKO(29110) console.log: "DOMLocalization::translateMutations"
 2:43.89 GECKO(29110) DocumentL10n::ResolvedCallback.
 2:43.89 GECKO(29110) Document::InitialDocumentTranslationCompleted.
 2:43.90 PASS undefined assertion name
 2:43.90 PASS undefined assertion name
 2:43.91 GECKO(29110) DocumentL10n::SetAttributes.
 2:43.90 PASS undefined assertion name
 2:43.91 PASS undefined assertion name
 2:43.92 GECKO(29110) console.log: "DOMLocalization::translateMutations"
 2:43.92 GECKO(29110) console.log: "DOMLocalization::translateMutations"

Olli - any idea why would we not have access to the sink despite bailing in the previous nsContentSink::StartLayout?

Flags: needinfo?(bugs)

Yup, instrumented it further with:

void Document::InitialDocumentTranslationCompleted() {
  printf("Document::InitialDocumentTranslationCompleted.\n");
  mPendingInitialTranslation = false;

  nsCOMPtr<nsIContentSink> sink;
  if (mParser) {
    sink = mParser->GetContentSink();
  } else {
    sink = do_QueryReferent(mWeakSink);
  }
  if (sink) {
    printf("Document::InitialDocumentTranslationCompleted got sink.\n");
    sink->InitialDocumentTranslationCompleted();
  } else {
    printf("Document::InitialDocumentTranslationCompleted no sink.\n");
  }
}

and got:

 1:35.88 TEST_START: intl/l10n/test/dom/test_docl10n.html
 1:35.98 GECKO(2223) ++DOMWINDOW == 31 (0x7f7059031000) [pid = 2223] [serial = 31] [outer = 0x7f7055d9f000]
 1:36.03 GECKO(2223) nsContentSink::StartLayout.
 1:36.03 GECKO(2223) nsContentSink::StartLayout starting.
 1:36.95 GECKO(2223) ++DOMWINDOW == 32 (0x7f7055d95c00) [pid = 2223] [serial = 32] [outer = 0x7f7055d9f000]
 1:37.10 GECKO(2223) Document::LocalizationLinkAdded.
 1:37.10 GECKO(2223) Document::LocalizationLinkAdded parsing.
 1:37.12 GECKO(2223) Document::OnL10nResourceContainerParsed.
 1:37.12 GECKO(2223) Document::InitializeLocalization.
 1:37.14 GECKO(2223) nsContentSink::StartLayout.
 1:37.14 GECKO(2223) nsContentSink::StartLayout bailing.
 1:37.14 GECKO(2223) Document::TriggerInitialDocumentTranslation.
 1:37.14 GECKO(2223) Document::OnL10nResourceContainerParsed.
 1:37.14 GECKO(2223) DocumentL10n::TriggerInitialDocumentTranslation.
 1:37.15 GECKO(2223) nsContentSink::StartLayout.
 1:37.15 GECKO(2223) nsContentSink::StartLayout bailing.
 1:37.21 GECKO(2223) console.log: "DOMLocalization::translateMutations"
 1:37.21 GECKO(2223) DocumentL10n::ResolvedCallback.
 1:37.21 GECKO(2223) Document::InitialDocumentTranslationCompleted.
 1:37.21 GECKO(2223) Document::InitialDocumentTranslationCompleted no sink.
 1:37.21 PASS undefined assertion name
 1:37.22 PASS undefined assertion name
 1:37.22 GECKO(2223) DocumentL10n::SetAttributes.
 1:37.22 PASS undefined assertion name
 1:37.22 PASS undefined assertion name
 1:37.23 GECKO(2223) console.log: "DOMLocalization::translateMutations"
 1:37.24 GECKO(2223) console.log: "DOMLocalization::translateMutations"

its racy, so most of the time the sink is there, but seems like we need to do some extra work to make sure it doesn't go away until mPendingInitialTranslation is false?

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #21)

Nicolas, you landed this test quite recently (bug 1482875) - do you have any gut feeling on why would it start failing after my patch?

The patch in this bug is making us block layout on localization (async), but it seems like your test is the only thing that starts intermittently failing (I investigated the TV bug separately) and I'm not sure how to investigate it. Does it mean that with my patch the devtools don't show up? Don't layout? The error message seems unrelated and very vague :(

Hello Zibi. The test I added ensures that doing a middle-click on the console input paste the clipboard content into it. The error looks weird to me. Basically, it crashes when triggering the middle-click on a textarea element. Not sure if this is devtools only, or if it highlights a broader issue.

Flags: needinfo?(nchevobbe)

I guess document has been loaded and Contentsink deleted. Perhaps l10n needs to explicitly keep contentsink alive?

Flags: needinfo?(bugs)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Depends on: 1523074
Depends on: 1524995
Depends on: 1525099
No longer depends on: 1524995
No longer depends on: 1525099
Depends on: 1525234
Depends on: 1524995
Depends on: 1525099
Depends on: 1526793
Depends on: 1532071
No longer depends on: 1532071
Regressions: 1556769
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: