Closed Bug 1555662 Opened 5 years ago Closed 5 years ago

Intermittent toolkit/components/extensions/test/xpcshell/test_ext_redirects.js | application crashed [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)]

Categories

(Core :: Internationalization, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox-esr60 --- unaffected
firefox67 --- unaffected
firefox68 --- unaffected
firefox69 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: zbraniecki)

References

Details

(Keywords: crash, intermittent-failure, regression, Whiteboard: [retriggered][stockwell disable-recommended])

Crash Data

Attachments

(3 files)

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=249157524&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/NbYoebmLRl-oaA8F3t_R5A/runs/0/artifacts/public/logs/live_backing.log


11:24:48 INFO - TEST-START | toolkit/components/places/tests/bookmarks/test_405938_restore_queries.js
11:24:48 INFO - mozcrash Saved minidump as /Users/cltbld/tasks/task_1559215027/build/blobber_upload_dir/78F052B2-D2A1-4046-B1F2-4E0DB10A5D81.dmp
11:24:48 INFO - mozcrash Saved app info as /Users/cltbld/tasks/task_1559215027/build/blobber_upload_dir/78F052B2-D2A1-4046-B1F2-4E0DB10A5D81.extra
11:24:48 WARNING - PROCESS-CRASH | xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_redirects.js | application crashed [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)]
11:24:48 INFO - Crash dump filename: /var/folders/kn/qgv3cn8n0ns842l71y62c6th00000x/T/xpc-other-bDk9X5/78F052B2-D2A1-4046-B1F2-4E0DB10A5D81.dmp
11:24:48 INFO - Operating system: Mac OS X
11:24:48 INFO - 10.10.5 14F27
11:24:48 INFO - CPU: amd64
11:24:48 INFO - family 6 model 69 stepping 1
11:24:48 INFO - 4 CPUs
11:24:48 INFO - GPU: UNKNOWN
11:24:48 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
11:24:48 INFO - Crash address: 0x0
11:24:48 INFO - Process uptime: 17 seconds
11:24:48 INFO - Thread 0 (crashed)
11:24:48 INFO - 0 XUL!mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&) [DocumentL10n.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 115 + 0x11]
11:24:48 INFO - rax = 0x0000000114d4fb6f rdx = 0x000000011fe52000
11:24:48 INFO - rcx = 0x00000001197f7788 rbx = 0x000000012b0df3a0
11:24:48 INFO - rsi = 0x0000000000000000 rdi = 0x000000011fd3c680
11:24:48 INFO - rbp = 0x00007fff5155d8b0 rsp = 0x00007fff5155d860
11:24:48 INFO - r8 = 0x0000000119900d70 r9 = 0x0000000000000037
11:24:48 INFO - r10 = 0x0000000000000000 r11 = 0x00007ffe26466ccd
11:24:48 INFO - r12 = 0x000000012b0df3a0 r13 = 0x0000000000000001
11:24:48 INFO - r14 = 0x000000012b0c4118 r15 = 0x000000012b0c4118
11:24:48 INFO - rip = 0x000000010f4cb545
11:24:48 INFO - Found by: given as instruction pointer in context
11:24:48 INFO - 1 XUL!mozilla::dom::Document::OnL10nResourceContainerParsed() [Document.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 3516 + 0x23]
11:24:48 INFO - rbp = 0x00007fff5155d8e0 rsp = 0x00007fff5155d8c0
11:24:48 INFO - rip = 0x0000000110973478
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 2 XUL!mozilla::dom::HTMLSharedElement::DoneAddingChildren(bool) [HTMLSharedElement.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 60 + 0x8]
11:24:48 INFO - rbp = 0x00007fff5155d920 rsp = 0x00007fff5155d8f0
11:24:48 INFO - rip = 0x0000000111c750f3
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 3 XUL!nsXMLContentSink::CloseElement(nsIContent*) [nsXMLContentSink.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 534 + 0xf]
11:24:48 INFO - rbp = 0x00007fff5155d960 rsp = 0x00007fff5155d930
11:24:48 INFO - rip = 0x00000001124b4f12
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 4 XUL!nsXMLContentSink::HandleEndElement(char16_t const*, bool) [nsXMLContentSink.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 1059 + 0xf]
11:24:48 INFO - rbp = 0x00007fff5155d9b0 rsp = 0x00007fff5155d970
11:24:48 INFO - rip = 0x00000001124b6e62
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 5 XUL!<name omitted> [nsExpatDriver.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 325 + 0x6]
11:24:48 INFO - rbp = 0x00007fff5155d9d0 rsp = 0x00007fff5155d9c0
11:24:48 INFO - rip = 0x000000011030bb77
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 6 XUL!doContent [xmlparse.c:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 2978 + 0xe]
11:24:48 INFO - rbp = 0x00007fff5155daf0 rsp = 0x00007fff5155d9e0
11:24:48 INFO - rip = 0x00000001132794f6
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 7 XUL!doProlog [xmlparse.c:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 4585 + 0x31]
11:24:48 INFO - rbp = 0x00007fff5155dbf0 rsp = 0x00007fff5155db00
11:24:48 INFO - rip = 0x0000000113275850
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 8 XUL!prologInitProcessor [xmlparse.c:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 4120 + 0x45]
11:24:48 INFO - rbp = 0x00007fff5155e0f0 rsp = 0x00007fff5155dc00
11:24:48 INFO - rip = 0x000000011327249a
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 9 XUL!MOZ_XML_Parse [xmlparse.c:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 1894 + 0x1d]
11:24:48 INFO - rbp = 0x00007fff5155e140 rsp = 0x00007fff5155e100
11:24:48 INFO - rip = 0x0000000113271f1e
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 10 XUL!nsExpatDriver::ConsumeToken(nsScanner&, bool&) [nsExpatDriver.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 924 + 0xdb]
11:24:48 INFO - rbp = 0x00007fff5155e4f0 rsp = 0x00007fff5155e150
11:24:48 INFO - rip = 0x000000011030c599
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 11 XUL!nsParser::ResumeParse(bool, bool, bool) [nsParser.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 1414 + 0xd]
11:24:48 INFO - rbp = 0x00007fff5155e5c0 rsp = 0x00007fff5155e500
11:24:48 INFO - rip = 0x0000000110313402
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 12 XUL!nsParser::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long long, unsigned int) [nsParser.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 1321 + 0x11]
11:24:48 INFO - rbp = 0x00007fff5155e630 rsp = 0x00007fff5155e5d0
11:24:48 INFO - rip = 0x0000000110315231
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 13 XUL!nsJARChannel::OnDataAvailable(nsIRequest*, nsIInputStream*, unsigned long long, unsigned int) [nsJARChannel.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 1055 + 0x19]
11:24:48 INFO - rbp = 0x00007fff5155e660 rsp = 0x00007fff5155e640
11:24:48 INFO - rip = 0x000000011011ac61
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 14 XUL!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) [nsInputStreamPump.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 399 + 0x151]
11:24:48 INFO - rbp = 0x00007fff5155e6e0 rsp = 0x00007fff5155e670
11:24:48 INFO - rip = 0x000000010f53ddfc
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 15 XUL!nsInputStreamReadyEvent::Run() [nsStreamUtils.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 91 + 0x6]
11:24:48 INFO - rbp = 0x00007fff5155e700 rsp = 0x00007fff5155e6f0
11:24:48 INFO - rip = 0x000000010f411451
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 16 XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 1176 + 0x6]
11:24:48 INFO - rbp = 0x00007fff5155ebf0 rsp = 0x00007fff5155e710
11:24:48 INFO - rip = 0x000000010f4568ab
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 17 XUL!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 486 + 0xd]
11:24:48 INFO - rbp = 0x00007fff5155ec20 rsp = 0x00007fff5155ec00
11:24:48 INFO - rip = 0x000000010f459269
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 18 XUL!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [MessagePump.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 110 + 0xa]
11:24:48 INFO - rbp = 0x00007fff5155ec70 rsp = 0x00007fff5155ec30
11:24:48 INFO - rip = 0x000000010fb58efd
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 19 XUL!nsBaseAppShell::Run() [nsBaseAppShell.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 137 + 0x45]
11:24:48 INFO - rbp = 0x00007fff5155ecb0 rsp = 0x00007fff5155ec80
11:24:48 INFO - rip = 0x000000011271941c
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 20 XUL!nsAppShell::Run() [nsAppShell.mm:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 705 + 0x8]
11:24:48 INFO - rbp = 0x00007fff5155ecf0 rsp = 0x00007fff5155ecc0
11:24:48 INFO - rip = 0x000000011279d8d7
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 21 XUL!XRE_RunAppShell() [nsEmbedFunctions.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 911 + 0x9]
11:24:48 INFO - rbp = 0x00007fff5155ed50 rsp = 0x00007fff5155ed00
11:24:48 INFO - rip = 0x0000000113dda5c2
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 22 XUL!XRE_InitChildProcess(int, char**, XREChildData const*) [message_loop.cc:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 315 + 0xc]
11:24:48 INFO - rbp = 0x00007fff5155f040 rsp = 0x00007fff5155ed60
11:24:48 INFO - rip = 0x0000000113dda1ae
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 23 plugin-container!main [MozillaRuntimeMain.cpp:c4f64fa957d9c379cf4c4939b9c252423841b4e9 : 23 + 0x2c]
11:24:48 INFO - rbp = 0x00007fff5155f080 rsp = 0x00007fff5155f050
11:24:48 INFO - rip = 0x000000010e6a0f07
11:24:48 INFO - Found by: previous frame's frame pointer
11:24:48 INFO - 24 libdyld.dylib + 0x35c9
11:24:48 INFO - rbp = 0x00007fff5155f090 rsp = 0x00007fff5155f090
11:24:48 INFO - rip = 0x00007fff8d0465c9
11:24:48 INFO - Found by: previous frame's frame pointer

Component: General → Localization
Product: WebExtensions → Core
Flags: needinfo?(prathikshaprasadsuman)
Whiteboard: [retriggered]

I have no idea what's happening here and I can't tell how this is related to Bug 1549561. @gandalf, can you please take a look at this? :)

Flags: needinfo?(prathikshaprasadsuman) → needinfo?(gandalf)

I don't, and I cannot reproduce it locally. Tried with:

./mach test toolkit/components/extensions/test/xpcshell/test_ext_redirects.js --verify

on linux with debug m-c from today.

No patches landed around this area in a week, so I'm not sure why we'd see new crashes only starting two days ago. Maybe something about loading FTL files in Localization.jsm changed due to the bug 1549561?

Verifying the bisect would help.

Flags: needinfo?(gandalf)
Component: Localization → Internationalization
Flags: needinfo?(m_kato)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

The duped bug has:

01:55:53 INFO - PID 5564 | ipt error: resource://gre/modules/L10nRegistry.jsm, line 245: TypeError: sources is null
01:55:53 INFO - PID 5564 | Assertion failure: jsm, at z:/build/build/src/intl/l10n/DocumentL10n.cpp:115

This leads me to think that maybe this fails - https://searchfox.org/mozilla-central/source/intl/l10n/L10nRegistry.jsm#244

Is that possible?

Flags: needinfo?(kmaglione+bmo)

https://searchfox.org/mozilla-central/rev/f8b11433159cbc9cc80500b3e579d767473fa539/dom/ipc/SharedMap.cpp#62 says that returning null is the behavior on unfound data.

Given this is intermittent, this could be a race condition between the test and the first call to registerSource ?

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #9)

The duped bug has:

01:55:53 INFO - PID 5564 | ipt error: resource://gre/modules/L10nRegistry.jsm, line 245: TypeError: sources is null
01:55:53 INFO - PID 5564 | Assertion failure: jsm, at z:/build/build/src/intl/l10n/DocumentL10n.cpp:115

This leads me to think that maybe this fails - https://searchfox.org/mozilla-central/source/intl/l10n/L10nRegistry.jsm#244

Is that possible?

Well, .get() returns null if the entry doesn't exist, so yes.

Flags: needinfo?(kmaglione+bmo)

Great. So the fix is easy, but I'm wondering what changed that led to it. In particular, I'm wondering if the change that happen means that we initialize the content process earlier (before parent process injected things into shared data), or also using the content process.

If the latter, fixing this bug naively may lead to some other intermittents down the road as some content process tries to retrieve strings before parent process got the sources set.

But not sure if there's a way to test that, since we struggle to reproduce it. I'll likely just fix it naively.

Flags: needinfo?(m_kato) → needinfo?(gandalf)

Hm. So, these failures happen in xpcshell tests, which don't load BrowserGlue, and therefore don't ever call L10nRegistry.registerSource unless a langpack is loaded for some reason. Which means the parent process will generally never set the shared data property.

Which is generally not a problem, since we generally don't need these things in xpcshell tests.

In this case, though, I guess we're for some reason loading a neterror or certerror page in the child process sometimes, which is triggering document localization, which is failing because we don't have any L10n resources registered.

:gandalf , should we go ahead and disable the test until you`ll fix the issue ?

No need. I'll provide the patch today.

In this case, though, I guess we're for some reason loading a neterror or certerror page in the child process sometimes, which is triggering document localization, which is failing because we don't have any L10n resources registered.

Kris, I assume that we're ok in such scenario to have an unlocalized neterror/certerror page? (I guess that's why bug 1549561 regressed this)

If so, that's great. If not, then I'm not sure how to workaround this issue. I'm also slightly concerned about having tests that silently fail just because we allow for fluent to fail to localize a document.
In the future, I'd like to consider tightening test coverage for when the document cannot be localized and I expect we'll have to rework this test then? :(
Also, why is it intermittent and not perm?

Flags: needinfo?(gandalf) → needinfo?(kmaglione+bmo)

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #19)

In this case, though, I guess we're for some reason loading a neterror or certerror page in the child process sometimes, which is triggering document localization, which is failing because we don't have any L10n resources registered.

Kris, I assume that we're ok in such scenario to have an unlocalized neterror/certerror page? (I guess that's why bug 1549561 regressed this)

I mean, not really... I'd kind of expect those locale resources to be part of toolkit and to be available to any toolkit app.

But, to the extent that this is really only a thing that can happen in tests, I suppose I probably don't care enough to complain.

Flags: needinfo?(kmaglione+bmo)
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered]

Ugh, okay. Seems like we should find a way to inject the toolkit source in toolkit, not in browser. Not sure if there's an equivalent of browserglue for that.

Assignee: nobody → gandalf
Status: NEW → ASSIGNED
Crash Signature: [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)] → [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)] [@ mozilla::dom::DOMLocalization::Init(class nsTArray<nsTString<char16_t> > & const, class mozilla::ErrorResult & const)]
Crash Signature: [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)] [@ mozilla::dom::DOMLocalization::Init(class nsTArray<nsTString<char16_t> > & const, class mozilla::ErrorResult & const)] → [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)] [@ mozilla::dom::DOMLocalization::Init(class nsTArray<nsTString<char16_t> > & const, class mozilla::ErrorResult & const)]
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ede8b931ee0d
Register L10nRegistry sources using categories. r=kmag
Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/84c57f9dda5b
follow-up linting fix on a CLOSED TREE

the issue was that the categories are ordered alphabetically so toolkit landed after browser, rather than before.

We switched it to 0-toolkit and 5-browser and pushed to try to ensure that it doesn't fail - https://treeherder.mozilla.org/#/jobs?repo=try&revision=b6a3a53ac877dce7538cda01332b6afd0dddce65

Flags: needinfo?(gandalf)
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/89603d14c1de
Register L10nRegistry sources using categories. r=kmag

Looks like TB needs to port this, or not? We're using:

calendar/base/src/calStartupService.js
6 var { FileSource, L10nRegistry } = ChromeUtils.import("resource://gre/modules/L10nRegistry.jsm");
mail/components/mailGlue.js
15 var {L10nRegistry, FileSource} = ChromeUtils.import("resource://gre/modules/L10nRegistry.jsm");
mail/components/preferences/advanced.js
17 var {L10nRegistry} = ChromeUtils.import("resource://gre/modules/L10nRegistry.jsm");

We also need to change our package-manifest.in, yes?

Flags: needinfo?(gandalf)

Looks like TB needs to port this, or not?

You don't. It just simplifies things for us, because it makes L10nRegistry work in environment when no app is launched (prior to the change, toolkit strings would be registered only when Firefox BrowserGlue.jsm was launched, so not for xpcshell)

Flags: needinfo?(gandalf)
Crash Signature: [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)] [@ mozilla::dom::DOMLocalization::Init(class nsTArray<nsTString<char16_t> > & const, class mozilla::ErrorResult & const)] → [@ mozilla::dom::DocumentL10n::Init(nsTArray<nsTString<char16_t> >&)] [@ mozilla::dom::DOMLocalization::Init(class nsTArray<nsTString<char16_t> > & const, class mozilla::ErrorResult & const)] [@ mozilla::dom::DOMLocalization::Init(nsTArray<nsTString<char…

Where is l10n-registry.js - https://hg.mozilla.org/integration/autoland/rev/89603d14c1de#l9.12 ? It's not in your patch and doesn't seem to exist on the system do far unless I'm missing something.

Crash Signature: , class mozilla::ErrorResult & const)] [@ mozilla::dom::DOMLocalization::Init(nsTArray<nsTString<char16_t> >&, mozilla::ErrorResult&)] → , class mozilla::ErrorResult & const)] [@ mozilla::dom::DOMLocalization::Init(nsTArray<nsTString<char16_t> >&, mozilla::ErrorResult&)]

I don't know what is the "l10!-registry J's" you're asking about. I added "l10n-registry.manifest" files to toolkit and browser directories

Oh damn, now I see it. It's a mistake. Can someone land a follow up toswitch it to manifest? I'm not by my laptop :( also, why it didn't fail Android?

Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f630fc1bf336
Follow-up: change 'l10n-registry.js' to 'l10n-registry.manifest'. r=John-Galt,gandalf
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69

Could you please explain the rationale for this patch? From reading the comments in the bug, it seems to be a patch intending to fix something for the xpcshell case, but when looking at the patch itself, it looks like a startup regression, which was caught by the browser_startup.js test, where you had to whitelist stuff during very early startup.

Flags: needinfo?(gandalf)
Comment on attachment 9070744 [details] [diff] [review]
followup.diff

>diff --git a/browser/base/content/test/performance/browser_startup.js b/browser/base/content/test/performance/browser_startup.js
>--- a/browser/base/content/test/performance/browser_startup.js
>+++ b/browser/base/content/test/performance/browser_startup.js
>@@ -36,6 +36,7 @@ const startupPhases = {
>       "resource://gre/modules/XPCOMUtils.jsm",
>       "resource://gre/modules/Services.jsm",
>       "resource://gre/modules/L10nRegistry.jsm",
>+      "resource://gre/modules/Fluent.jsm",

Unless I'm missing something, this change (punching a hole in the startup performance test) landed without review, with a commit message saying "follow-up linting fix". This feels wrong.

(In reply to Florian Quèze [:florian] from comment #45)

Comment on attachment 9070744 [details] [diff] [review]
followup.diff

diff --git a/browser/base/content/test/performance/browser_startup.js b/browser/base/content/test/performance/browser_startup.js
--- a/browser/base/content/test/performance/browser_startup.js
+++ b/browser/base/content/test/performance/browser_startup.js
@@ -36,6 +36,7 @@ const startupPhases = {
"resource://gre/modules/XPCOMUtils.jsm",
"resource://gre/modules/Services.jsm",
"resource://gre/modules/L10nRegistry.jsm",

  •  "resource://gre/modules/Fluent.jsm",
    

Unless I'm missing something, this change (punching a hole in the startup
performance test) landed without review, with a commit message saying
"follow-up linting fix". This feels wrong.

You can find the conversation on #sheriffs, Saturday, June 8th, 2019, starting with 2:36AM.

Hi gandalf,

I've got concerns with this patch. We're fixing an intermittent by causing a new JSM to be loaded during the start-up window - very early during the startup window (during profile selection). My team has been working during the past few months to clear the start-up path of things that aren't absolutely necessary before first paint.

Are L10nRegistry.jsm and Fluent.jsm absolutely and without question needed during the profile selection phase? Is there no way to make them any lazier? Can you talk a bit more about that here?

Unless I'm missing something, this change (punching a hole in the startup performance test) landed without review, with a commit message saying "follow-up linting fix". This feels wrong.

This change landed with r=kmag. Sorry for lack of updates in the bug.

We're fixing an intermittent by causing a new JSM to be loaded during the start-up window - very early during the startup window (during profile selection).

This is a coincidence that it lands to fix this bug. The bigger driver for this patch are bug 1501881, bug 1441035 and in particular bug 1501886. We're invested in moving away from DTD to Fluent on the startup path.

There are multiple major moving pieces around that code, and unfortunately only I am assigned to work on that. I am working on bug 1517880 which will make us load Fluent.jsm only conditionally that early (in cold load). I'm also working toward migrating Fluent.jsm to fluent-rs, which will remove those files.

The current plan is:

  1. Attempt to switch from using Localization to use LocalizationSync (sync I/O) for browser.xhtml and see if that removed talos penalty.
  2. If that fails, use fluent-minicache to remove penalty kick
  3. Migrate all DTD's on startup to use Fluent
  4. While getting fluent-rs and l10nregistry-rs integrated into Gecko to remove Fluent.jsm, Localization.jsm and L10nRegistry.jsm

1/2 will happen this week. My hope is to start with (3) by Whistler.

==

I also wanted to point out that all of Fluent related work, in particular L10nRegistry.jsm, Fluent.jsm and Localization.jsm are loaded lazily. Nothing starts eagerly. If your code loads Firefox main process without any Fluent strings, and then content process without Fluent strings then all we do is registered a couple categories. Only when some code asks for L10nRegistry do we add FileSources and load Fluent.jsm.
Only when some document uses Fluent, do we load Localization.jsm and via it Fluent.jsm and L10nRegistry.jsm.

It means that those files can be loaded early, but only if needed. In particular, I hope that fluent-minicache will make us load them early only in the no-cache scenario and then skip them until much later in the startup process.

Flags: needinfo?(gandalf)

It's not a startup regression in any meaningful sense. These modules have loaded before first window paint for a long time. Loading them before profile selection doesn't meaningfully change that. And given that the L10n registry is as core a part of the browser as the chrome registry at this point, it seems reasonable.

I don't love that Fluent.jsm is loaded that early, but as it's going to be needed on the startup path before too long anyway, lazily loading it would just add a bit of extra overhead without any real gain.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #48)

  1. Attempt to switch from using Localization to use LocalizationSync (sync I/O) for browser.xhtml and see if that removed talos penalty.

If you do this, please use the URLPreloader, which should at least try to make sure the file contents are read on a background thread and available by the time they're needed. We already do this for string bundles (though not for DTDs, but we don't load DTDs at startup thanks to the XUL prototype cache). It probably won't make a difference to talos, since it has warm disk caches, but it should in real world usage.

If you do this, please use the URLPreloader, which should at least try to make sure the file contents are read on a background thread and available by the time they're needed.

Confirmed with Kris that that's what we're using in sync scenario now.

(In reply to Kris Maglione [:kmag] from comment #49)

It's not a startup regression in any meaningful sense. These modules have loaded before first window paint for a long time. Loading them before profile selection doesn't meaningfully change that.

They were loaded after the early blank paint triggered by BrowserGlue.jsm.

Blocks: 1520446
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: