Closed Bug 1836753 Opened 2 years ago Closed 2 years ago

Startup Crash in [@ nsXULPrototypeDocument::RebuildL10nPrototype] called from DOMLocalization::ApplyTranslations

Categories

(Core :: Internationalization, defect)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
116 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox114 --- unaffected
firefox115 + fixed
firefox116 + fixed

People

(Reporter: mccr8, Assigned: smaug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/f4a17ee9-6d95-48d2-a984-f706a0230602

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(proto)

Top 10 frames of crashing thread:

0  xul.dll  nsXULPrototypeDocument::RebuildL10nPrototype  dom/xul/nsXULPrototypeDocument.cpp:514
1  xul.dll  mozilla::dom::DOMLocalization::ApplyTranslations  dom/l10n/DOMLocalization.cpp:546
2  xul.dll  ElementTranslationHandler::ResolvedCallback  dom/l10n/DOMLocalization.cpp:272
3  xul.dll  mozilla::dom::  dom/promise/Promise.cpp:469
4  xul.dll  mozilla::dom::NativeHandlerCallback  dom/promise/Promise.cpp:401
5  xul.dll  CallJSNative  js/src/vm/Interpreter.cpp:486
5  xul.dll  js::InternalCallOrConstruct  js/src/vm/Interpreter.cpp:580
5  xul.dll  InternalCall  js/src/vm/Interpreter.cpp:647
5  xul.dll  js::Call  js/src/vm/Interpreter.cpp:679
6  xul.dll  js::Call  js/src/vm/Interpreter.h:116

New signature, starting in the 20230601093525 build.

Here is the change log for the 20230601093525 build.

Greg, could bug 1835163 be related? That's first in the build where the crash first showed up, and at least seems in the general ballpark of this l10n code. Thanks.

Flags: needinfo?(gtatum)
Summary: Crash in [@ nsXULPrototypeDocument::RebuildL10nPrototype] called from DOMLocalization::ApplyTranslations → Startup Crash in [@ nsXULPrototypeDocument::RebuildL10nPrototype] called from DOMLocalization::ApplyTranslations

I don't think the crash is related to that bug. The C++ stack has "translations" in it, but the crash is in the localization code for the browser UI, not in the new translations feature. So I think the only similarity is in the names.

Flags: needinfo?(gtatum)

The bug is marked as tracked for firefox115 (beta) and tracked for firefox116 (nightly). However, the bug still isn't assigned.

:Amir, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit BugBot documentation.

Flags: needinfo?(ahabibi)

Looking at the bugs in the regression range with r=fluent-reviewers, I see bug 1836005, bug 1833438, bug 1835155 and bug 1835559. Maybe one of those could be the issue?

:robwu could Bug 1835559 be the cause?

Flags: needinfo?(rob)

Eemeli, could you investigate?

Flags: needinfo?(rob) → needinfo?(earo)

I feel out of my depth here. Smaug, would you be able to look at this? The crashes are coming from code originally added in bug 1517880, which you reviewed.

Flags: needinfo?(earo) → needinfo?(smaug)

I think this may happen if an element which had data-l10n-id or data-l10n-args attribute during initial load is either removed[1] and added back to DOM or it has one of those attributes changed[2] before translation.
Do you think you could try to write a testcase for this?

I think we should just change the assertion to a null check. About to upload a patch.

[1] https://searchfox.org/mozilla-central/rev/f76d80e486028313488d05f4e7fe2509cec11777/dom/base/Element.cpp#2042,2044
[2] https://searchfox.org/mozilla-central/rev/f76d80e486028313488d05f4e7fe2509cec11777/dom/base/Element.cpp#2616

Flags: needinfo?(smaug) → needinfo?(earo)
Assignee: nobody → smaug
Status: NEW → ASSIGNED
Pushed by opettay@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4458e5842f0a make RebuildL10nPrototype a bit more resilient to unusual document states, r=emilio
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #8)

I think this may happen if an element which had data-l10n-id or data-l10n-args attribute during initial load is either removed[1] and added back to DOM or it has one of those attributes changed[2] before translation.
Do you think you could try to write a testcase for this?

I'm not sure. First of all, there are concepts here that are new to me, like "prototype document" and "prototype element". Is there documentation somewhere about what these are, and what's happening with them?

Second, it seems a bit tricky to get to a state where this condition is triggered. RebuildL10nPrototype is only called from ApplyTranslations and the ElementTranslationHandler that it may use, and ApplyTranslations is only called with a non-null prototype by TranslateDocument (via TranslateElements), which checks each element's ElementCreatedFromPrototypeAndHasUnmodifiedL10n flag before including it. So I think what needs to happen to trigger this is that the element gets removed from mL10nProtoElements immediately after an async call of TranslateDocument, before the callback handler executes. The tricky bit here is how to time things to happen right after a TranslateDocument call, but adding a <link rel="localization"> might do it.

Does that sound about right, or is there perhaps more going on here that I'm missing?

Flags: needinfo?(earo) → needinfo?(smaug)

:smaug please submit a beta uplift request when ready

Prototype documents were initially created for XUL documents. Basically after the initial startup we create a binary representation of a Document used in the UI, that can be then stored on disc and loaded fast. When loading, we create a prototype, and keep that in memory, so when a new window is created, one doesn't need to do any loading but we can just create the real document from the prototype - this greatly help with performance.
But this all does change the loading process from what it is for normal web pages.

I think the issue might happen also if there are l10n elements in dom while the document is created and then some script, which is executed during the page load modifies them.

Flags: needinfo?(smaug)
Flags: needinfo?(ahabibi)

The patch landed in nightly and beta is affected.
:smaug, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox115 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(smaug)

Looks like Nightly hasn't got these crashes for few days and so far no complaints, so time to ask approval for beta.
I am a bit worried that something goes in some l10n stuff, but better to not crash, since this way we can at least hopefully catch the possible issues if something isn't localized.

Flags: needinfo?(smaug)

Comment on attachment 9338970 [details]
Bug 1836753, make RebuildL10nPrototype a bit more resilient to unusual document states, r=emilio

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is just adding a null check. The risk is that by not crashing we reveal some other issue. But better to not crash and know about that possible other issue.
  • String changes made/needed: NA
  • Is Android affected?: No
Attachment #9338970 - Flags: approval-mozilla-beta?

Comment on attachment 9338970 [details]
Bug 1836753, make RebuildL10nPrototype a bit more resilient to unusual document states, r=emilio

Approved for 115.0b6.

Attachment #9338970 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: