Closed Bug 1642415 Opened 4 years ago Closed 3 years ago

Memory leak and high CPU when opening prefs/options/settings or on startup, due to language packs matching build locales with older translations that lack some strings causing excessive locale loading

Categories

(Core :: Internationalization, defect, P2)

78 Branch
defect

Tracking

()

RESOLVED FIXED
94 Branch
Performance Impact high
Tracking Status
firefox-esr91 - wontfix
firefox78 --- wontfix
firefox94 --- fixed

People

(Reporter: fayolle-florent, Assigned: dminor)

References

(Blocks 1 open bug)

Details

(Keywords: nightly-community, perf:resource-use, Whiteboard: [MemShrink])

Attachments

(16 files, 2 obsolete files)

246.74 KB, image/png
Details
30.92 KB, application/json
Details
14.18 KB, image/png
Details
21.35 KB, image/png
Details
25.27 KB, image/png
Details
548.04 KB, application/x-xpinstall
Details
437.54 KB, application/x-xpinstall
Details
2.39 KB, application/x-javascript
Details
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review
48 bytes, text/x-phabricator-request
Details | Review

With my own profile, after start Firefox nightly, it starts hanging about 10s after. I have ~10 pinned tabs and a tab showing about:home.
Interestingly, it seems that the issue occurs only with French builds, not English ones.

I bissected manually and discovered the regression appeared between 2020-05-29-21-46-31 and and 2020-05-30-09-46-43.

I'd be glad to help diagnosing the issue, but I am not sure where to start:

  • what in my profile cause the issue?
  • why not with Engligh builds?

At your disposal for more information.

Florent

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: Untriaged → General
Product: Firefox → Firefox Build System
Component: General → Untriaged
Product: Firefox Build System → Firefox

Could that be related to bug 1636067 ?

Flags: needinfo?(gijskruitbosch+bugs)
Flags: needinfo?(gandalf)

It'd take more to investigate.

On the surface it doesn't seem to, because the report is about about:home, and contains a regression window much newer than bug 1636067.
On the other hand, since we don't really understand the nature of bug 1636067 it could be that it is triggered by some cofactors between changes to DOM, cache, extensions, and localization, and the regression window reported is a red herring.

or maybe it is a different bug. I think it's too early to tell.

Flags: needinfo?(gandalf)

Can you record a performance profile? I can't really tell from comment #0 if the browser starts out usable and then hangs, or if it hangs on startup. The instructions in https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Performance_Problem should work if it's usable initially and then hangs for a bit; otherwise,t he steps at https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Firefox_Startup would help to profile startup itself (for "the add-on", read "the builtin profiler", see instructions for enabling that on the first page)

The pushlog for the window described in comment #0 is https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a58cc68b0c51645d34cbc7f8f8a30edae77ab396&tochange=8aaca63ec5c6dd73365ba31d1972771cb847d5bc . I don't see anything super obvious there, but equally we don't really know what area the problem is in right now...

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(fayolle-florent)

Can you record a performance profile?

Yes! Hopefully I could!
Here it is: https://share.firefox.dev/2U3FXDU

Florent

Flags: needinfo?(fayolle-florent)

Great! Yes, the profile looks related to fluent like the other issues...

Are there any errors in the browser console (not the regular devtools, the browser one, see the specific entry in the web developer menu) about missing strings or anything else mentioning "fluent"? You may need to toggle the errors/warnings/logs buttons at the top of the browser console to see them.

Flags: needinfo?(fayolle-florent)
See Also: → 1636067

Oh, and one more question - in about:config, if you filter for disable_xul, does anything show up?

Are there any errors in the browser console (not the regular devtools, the browser one, see the specific entry in the web developer menu) about missing strings or anything else mentioning "fluent"? You may need to toggle the errors/warnings/logs buttons at the top of the browser console to see them.

Yes, there are! See the attached screenshot.

Oh, and one more question - in about:config, if you filter for disable_xul, does anything show up?

Nope :)

Cheers!

Flags: needinfo?(fayolle-florent)
See Also: → 1642780
Flags: needinfo?(fayolle-florent)

Could you please post your about:support text?

See Also: 1642780

Am I reading the screenshot correctly that there are 999 (likely more than 1k?) warnings for missing translations?

That doesn't make a lot of sense, given that French is missing maybe 20 strings in that timeframe.

Is the problem still there with the latest Nightly?

The error message seems to indicate that the missing string is for tab-context-undo-close-tabs (the 999 ocurrences) but it also says lower in the log that it is also missing in en-US.

It is very difficult to have Firefox functionning. I'll do my best though.

Also worth mentioning (I discovered that lately, sorry): I don't have this slowness when running Firefox in safe-mode. And I wonder if that extension couldn't be the cause of my issues: https://github.com/eoger/tabcenter-redux/

I'll take you inform and try posting about:support. May it be OK if I collect the info of this page in safe mode?

Cheers!

(In reply to Florent Fayolle from comment #14)

It is very difficult to have Firefox functionning. I'll do my best though.

Also worth mentioning (I discovered that lately, sorry): I don't have this slowness when running Firefox in safe-mode. And I wonder if that extension couldn't be the cause of my issues: https://github.com/eoger/tabcenter-redux/

It's possible but unlikely; safe mode has other consequences that seem to influence whether this issue occurs.

I'll take you inform and try posting about:support. May it be OK if I collect the info of this page in safe mode?

Sure.

(In reply to Francesco Lodolo [:flod] from comment #12)

Am I reading the screenshot correctly that there are 999 (likely more than 1k?) warnings for missing translations?

That doesn't make a lot of sense, given that French is missing maybe 20 strings in that timeframe.

Is the problem still there with the latest Nightly?

The issue seems to be that the code complaining about missing strings just runs repeatedly. That's also why things are slow. The problem is that we don't really understand why this code is running repeatedly. See the fx-desktop-dev matrix discussion between Markus and Zibi over the last 12 hours or so.

Attached file about:support.json

The about:support as requested.

It's possible but unlikely; safe mode has other consequences that seem to influence whether this issue occurs.

You are right. I uninstalled my addons (in my copied profile) and nothing changed.

Flags: needinfo?(fayolle-florent)
See Also: → 1401649

Good news, it seems that I don't encounter the issue anymore!

You may resolve as FIXED if you consider you don't need more information about what happened.

(In reply to Florent Fayolle from comment #17)

Good news, it seems that I don't encounter the issue anymore!

I'm happy for you, but this issue has come and then gone for other people who have encountered it, too... (cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1636067#c45 )

So unclear if this is really fixed. Is it possible for you to try if you can still reproduce with a slightly older version of nightly, and if so, perhaps use mozregression --find-fix (pointing it to your profile) to figure out when this got fixed?

Flags: needinfo?(fayolle-florent)

(In reply to :Gijs (he/him) from comment #18)

(In reply to Florent Fayolle from comment #17)

Good news, it seems that I don't encounter the issue anymore!

I'm happy for you, but this issue has come and then gone for other people who have encountered it, too... (cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1636067#c45 )

So unclear if this is really fixed. Is it possible for you to try if you can still reproduce with a slightly older version of nightly, and if so, perhaps use mozregression --find-fix (pointing it to your profile) to figure out when this got fixed?

Unfortunately, we miss locales with mozregression (bug1401649). Are you sure it could be a good tool to investigate with?

However, if you look at the browser console, you see lots of warning related to the tab-context-undo-close-tabs l10n key. That key has been introduced here: https://hg.mozilla.org/mozilla-central/rev/3d607fcf4deef85cd48f6cd7d0507b62244a0e5c

So I guess the bug is solved only in appearance, because the French translation for this key has been probably integrated (still this is only an hypothesis).

The "good" news is that I still have the issue when I go to about:preferences for the same reason (see the attached screenshot). So there is a room for more investigations :).

Flags: needinfo?(fayolle-florent)

(In reply to Florent Fayolle from comment #19)

(In reply to :Gijs (he/him) from comment #18)

(In reply to Florent Fayolle from comment #17)

Good news, it seems that I don't encounter the issue anymore!

I'm happy for you, but this issue has come and then gone for other people who have encountered it, too... (cf. https://bugzilla.mozilla.org/show_bug.cgi?id=1636067#c45 )

So unclear if this is really fixed. Is it possible for you to try if you can still reproduce with a slightly older version of nightly, and if so, perhaps use mozregression --find-fix (pointing it to your profile) to figure out when this got fixed?

Unfortunately, we miss locales with mozregression (bug1401649). Are you sure it could be a good tool to investigate with?

I assumed you had the locale installed in the profile as a langpack, but I guess that was a mistake...

But yes, it would appear that providing the missing entity fixed things... that is interesting. The warnings are sort of strange in that the expectation is that there is 1 warning, not 1000 - the large number of extra ones indicate something went wrong, but it is not clear what. And in a clean profile, AIUI from the other bugs, the problem does not reproduce (for reasons we don't understand).

I don't suppose you are familiar with the devtools / debugger? The broken code is JS, so for the about:preferences case, it's possible it can be debugged with the browser toolbox, to find out what is going wrong...

Flags: needinfo?(fayolle-florent)
See Also: → 1643844

According to comment 16, he's on Nightly, so no reliable language pack support there.

I tried installing the same French build on macOS as comment 16
http://archive.mozilla.org/pub/firefox/nightly/2020/06/2020-06-03-09-29-05-mozilla-central-l10n/

As I expected, I'm only getting one warning for tab-context-undo-close-tabs, and one for sitedata-cookies-exceptions when I open Preferences.

Same situation on Ubuntu with that build.

@Florent
Which version/distribution of Linux are you using?

@Florent
Which version/distribution of Linux are you using?

I am using Debian Sid. I attempt to do a dist-upgrade if, by chance, an update changes the issue.

I don't suppose you are familiar with the devtools / debugger? The broken code is JS, so for the about:preferences case, it's possible it can be debugged with the browser toolbox, to find out what is going wrong...

I am quite familiar with the devtools as I use them nearly every day. I also used to contribute to the Devtools and Firebug (though it was years ago), so I am also a bit familiar with Browser Debugger.

But I don't understand why I cannot make the breakpoints halt on the functions the profiler highlights (especially generateBundles and generateResourceSetsForLocale). There might be a bad recursion in generateResourceSetsForLocale, though I am far from being confident about this hypothesis.

For what it's worth, another performance profile (the preference tab is open at 6 seconds) : https://share.firefox.dev/377OuL8

I will try the dist-upgrade and if it doesn't work anyway, I may ask for some help debugging. I am on Riot in the nightly channel, if I may ping you for that.

Flags: needinfo?(fayolle-florent)

(In reply to Florent Fayolle from comment #23)

I am quite familiar with the devtools as I use them nearly every day. I also used to contribute to the Devtools and Firebug (though it was years ago), so I am also a bit familiar with Browser Debugger.

But I don't understand why I cannot make the breakpoints halt on the functions the profiler highlights (especially generateBundles and generateResourceSetsForLocale). There might be a bad recursion in generateResourceSetsForLocale, though I am far from being confident about this hypothesis.

Hm, odd, setting breakpoints on lines inside generateBundles or generateResourceSetsForLocale works for me, if I do it prior to opening about:preferences . It's possible something's broken and it may work if you quit the debugger, then restart the browser, and then open the debugger and set the breakpoints, then open about:preferences ? Also make sure you get the "right" generateBundles - you want the L10nRegistry.jsm one.

Some recursion is expected, though the code isn't the easiest to follow...

Flags: needinfo?(fayolle-florent)

I'll make a screencast of what I do with the Browser Debugger if that may help. I may miss something important…

Also worth to mention that the breakpoint is hit when I open a new tab.

One more question, since I realize about:support doesn't list language packs.

If you open about:addons, do you have a Languages pane on the left? Are there any languages installed?

If you open about:addons, do you have a Languages pane on the left? Are there any languages installed?

Yes. French and American ones.

Flags: needinfo?(fayolle-florent)
Attached image language packs.png
Attached image dictionnaries.png

(In reply to Florent Fayolle from comment #27)

If you open about:addons, do you have a Languages pane on the left? Are there any languages installed?

Yes. French and American ones.

OK, that shouldn't happen.

  1. Do you remember installing them?
  2. How did you install Nightly on your system?

We had a bug recently that caused 77.* language packs to be compatible with Nightly, but it's fixed now, so those should be marked as incompatible.

We also updated about:support to include information about language packs, could you update your Nightly, then copy and paste the raw data from about:support to this bug?

OK, that shouldn't happen.

  1. Do you remember installing them?
  2. How did you install Nightly on your system?

I reset my firefox profile around April, so I guess I should have installed the language pack during that time.

I probably have installed it through this page : https://addons.mozilla.org/fr/firefox/language-tools/

Through the contextual menu ("Languages ⇒ Add dictionnary").

 We also updated about:support to include information about language packs, could you update your Nightly, then copy and paste the raw data from about:support to this bug?

Ooops, I should have read your request before. I may be able to restore my Firefox profile with the language pack, hang on!

Florent

I think I might have found a way to replicate this, it would be great if someone else can try.

I'm using a recent Italian Nightly build, but I would expect this to work on an English build as well (I'm on 20200629154604, 80.0a1), given the language pack should override it.

  1. Enable unsigned language pack by setting extensions.langpacks.signatures.required to false, and enable UI language switcher setting to intl.multilingual.enabled true.

  2. Install the two language packs that I'll attach to the bug (en-US and fr). They're old language packs (e.g. for English) that I've unpacked, removed signatures, set compatibility to 80.a1, and added an empty featureGates file to avoid localization falling back for lack of a FTL file in the context.

  3. Set French as first requested locale, and English as second locale. Restart the browser. In my case I had to force quit the browser.

NI to see if you can reproduce. In case, I guess I can also share the whole profile, since it doesn't contain any personal info.

Flags: needinfo?(gijskruitbosch+bugs)
See Also: → 1636196

I can reproduce. Investigating.

OK, my first confusion is that I'm seeing multiple calls to generateBundles from the window being opened. I added logging in https://searchfox.org/mozilla-central/rev/31d8600b73dc85b4cdbabf45ac3f1a9c11700d8e/intl/l10n/Localization.jsm#462 and it shows:

Creating cached bundles instance for branding/brand.ftl, browser/branding/sync-brand.ftl, browser/branding/brandings.ftl, toolkit/global/textActions.ftl, browser/browser.ftl, browser/browserContext.ftl, browser/browserSets.ftl, browser/menubar.ftl, browser/protectionsPanel.ftl, browser/appmenu.ftl, preview/interventions.ftl, browser/sidebarMenu.ftl, browser/allTabsMenu.ftl, browser/places.ftl, sync, not-eager

4 times on startup (there's only 1 browser window).

All of them are directly from C++, with stacks:

>	xul.dll!mozilla::intl::Localization::Activate(const bool aEager) Line 99	C++
 	xul.dll!mozilla::dom::HTMLSharedElement::DoneAddingChildren(bool aHaveNotified) Line 64	C++
 	[Inline Frame] xul.dll!mozilla::dom::PrototypeDocumentContentSink::CloseElement(mozilla::dom::Element * aElement) Line 400	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::ResumeWalkInternal() Line 450	C++
 	[Inline Frame] xul.dll!mozilla::dom::PrototypeDocumentContentSink::ResumeWalk() Line 405	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::OnPrototypeLoadDone(nsXULPrototypeDocument * aPrototype) Line 258	C++
 	[Inline Frame] xul.dll!std::_Func_class<void>::_Empty() Line 1283	C++
 	[Inline Frame] xul.dll!std::_Func_class<void>::operator()() Line 421	C++
 	xul.dll!nsXULPrototypeDocument::NotifyLoadDone() Line 419	C++
>	xul.dll!mozilla::intl::Localization::OnChange() Line 189	C++
 	xul.dll!mozilla::intl::Localization::AddResourceId(const nsTSubstring<char16_t> & aResourceId) Line 201	C++
 	xul.dll!mozilla::dom::Document::LocalizationLinkAdded(mozilla::dom::Element * aLinkElement) Line 0	C++
 	xul.dll!mozilla::dom::HTMLLinkElement::BindToTree(mozilla::dom::BindContext & aContext, nsINode & aParent) Line 134	C++
 	xul.dll!nsINode::InsertChildBefore(nsIContent * aKid, nsIContent * aChildToInsertBefore, bool aNotify) Line 1519	C++
 	[Inline Frame] xul.dll!nsINode::AppendChildTo(nsIContent * aKid, bool aNotify) Line 855	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::ResumeWalkInternal() Line 493	C++
 	[Inline Frame] xul.dll!mozilla::dom::PrototypeDocumentContentSink::ResumeWalk() Line 405	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::OnPrototypeLoadDone(nsXULPrototypeDocument * aPrototype) Line 258	C++
 	[Inline Frame] xul.dll!std::_Func_class<void>::_Empty() Line 1283	C++
 	[Inline Frame] xul.dll!std::_Func_class<void>::operator()() Line 421	C++
 	xul.dll!nsXULPrototypeDocument::NotifyLoadDone() Line 419	C++
>	xul.dll!mozilla::intl::Localization::Activate(const bool aEager) Line 99	C++
 	[Inline Frame] xul.dll!mozilla::dom::PrototypeDocumentContentSink::CloseElement(mozilla::dom::Element * aElement) Line 400	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::ResumeWalkInternal() Line 450	C++
 	[Inline Frame] xul.dll!mozilla::dom::PrototypeDocumentContentSink::ResumeWalk() Line 405	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::OnPrototypeLoadDone(nsXULPrototypeDocument * aPrototype) Line 258	C++
 	[Inline Frame] xul.dll!std::_Func_class<void>::_Empty() Line 1283	C++
 	[Inline Frame] xul.dll!std::_Func_class<void>::operator()() Line 421	C++
 	xul.dll!nsXULPrototypeDocument::NotifyLoadDone() Line 419	C++
>	xul.dll!mozilla::intl::Localization::Activate(const bool aEager) Line 102	C++
 	[Inline Frame] xul.dll!mozilla::dom::Document::OnL10nResourceContainerParsed() Line 3970	C++
 	xul.dll!mozilla::dom::Document::OnParsingCompleted() Line 3980	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::MaybeDoneWalking() Line 593	C++
 	xul.dll!mozilla::dom::PrototypeDocumentContentSink::StyleSheetLoaded(mozilla::StyleSheet * aSheet, bool aWasDeferred, nsresult aStatus) Line 674	C++
 	xul.dll!mozilla::css::Loader::NotifyObservers(mozilla::css::SheetLoadData & aData, nsresult aStatus) Line 1546	C++
 	xul.dll!mozilla::SharedStyleSheetCache::LoadCompleted(mozilla::SharedStyleSheetCache * aCache, mozilla::css::SheetLoadData & aData, nsresult aStatus) Line 280	C++

Zibi, why is that? It seems... bad.

(Going to keep looking, but the steps from flod's comment mean I cannot use the JS debugger because the problem happens too early in startup, which is making this a bit tricky.)

Flags: needinfo?(gandalf)

Set French as first requested locale, and English as second locale. Restart the browser. In my case I had to force quit the browser.

In case it matters, I can reproduce without even doing this; after installing the locales, I could no longer open the prefs. So it's just trying to localize into English in my debugging.

The other (viz comment 38) issue here is, I think, to do with the combination of the approach we use to generate bundles, and the fact that the langpacks always come first (in terms of what file sources we use to look up ftl files and thus messages) but in this case are older ("more incomplete") than the app pack.

When we first enter formatWithFallbackSync for the initial translation of browser.xhtml, we want some 400-odd strings, from 15 files. We go through 1025 (!) "bundles" until we find a combination that provides all the necessary strings. That (and creating/finding all those bundles) is what's slow. On a profile without the langpack, we manage in our first "bundle".

This is on top of the fact that both langpacks and the app resources get represented as "file sources" twice - once for browser/ and once for toolkit/, and we apparently cannot pre-determine where to look for a given file, so "even" in the "good" case, we have some false starts finding this bundle in the generateResourceSetsForLocale method: we first try to find everything in browser, but toolkit/global/textActions.ftl lives in the toolkit package, which means we then have to "backtrack" and try again after looking for that file in the toolkit package. It's not clear to me why we cannot associate files ("resource ids") to "types" of file sources. This would speed the process up by at least half, as far as I can tell.

Still, the fundamental issue seems to be about preferring langpacks and what happens if strings do not exist in those langpacks (and then we end up falling back to the app resource if available in the relevant locale). I don't really know how to fix this - although we know which messages are missing, we also don't know what file they're "supposed" to come from, so we cannot target the search so that we only try fallback bundles for the "relevant" messages.

We can fix the particular occurrence I'm seeing by down-prioritizing the langpacks if they are for the app locale and predate the app (based on dates / build ids). I don't know if that fixes the case from comment #32 where the preferred locale does not match the build locale.

Zibi, any ideas?

Flags: needinfo?(gijskruitbosch+bugs)

and we apparently cannot pre-determine where to look for a given file

We can. When working on L10nRegistry I was particularly concerned about the cost of using I/O to "try" for files all around, while we know whether the file is there or not at build time (of langpack or omni.ja).

In order to have this option available I introduced FileSource and IndexedFileSource. We don't use IndexedFileSource but we could switch to it and provide a list of available file per source to speed up lookups.
That avenue was originally especially promising for "smaller" sources which override just several files and would have them whitelisted and allow for fast-bail for all others.

We can fix the particular occurrence I'm seeing by down-prioritizing the langpacks if they are for the app locale and predate the app (based on dates / build ids).

That seems to be bug 1420173. At the time when I was working on it, this has been deprioritized and I was asked to move to work on other things.

I don't know if that fixes the case from comment #32 where the preferred locale does not match the build locale.

I'm concerned that there's some remaining underlying bug. Even in the worst case scenario, as far as I understand, we should still not blow up the search to that magnitude. If your langpack has missing files, L10nRegistry should just add another fallback, not 1000 of them.

Zibi, why is that? It seems... bad.

It does, but it's not the same document. We load more windows during startup, so if you want to verify that, you'd need to dump document uri's. If all four were from the same document (browser.xhtml) then I'd be worried.

Flags: needinfo?(gandalf)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #40)

I don't know if that fixes the case from comment #32 where the preferred locale does not match the build locale.

I'm concerned that there's some remaining underlying bug. Even in the worst case scenario, as far as I understand, we should still not blow up the search to that magnitude. If your langpack has missing files, L10nRegistry should just add another fallback, not 1000 of them.

There aren't missing files, just missing strings, so all the bundles are valid, but formatWithFallbackSync continues going through the bundle iterator (until it finds one with 0 missing translations). All the files exist in at least 2 filesources - the langpack and the app. browser.xhtml asks for 15 ftl files. So for each of those files, there are 2 options. So the total number of possible bundles is on the order of 2 to the power of 15. The fact that we find one that works after 2 to the power of 10 "already" presumably has to do with whether the messages missing from the langpack are in the first or the 15th file in the list. But I don't see any reason way in which we could "shortcut" this generation process, because the registry is oblivious to the messages required from each resource and thus can't guide its search for a "winning" bundle combination. Am I missing something?

Flags: needinfo?(gandalf)

That makes sense, sadly.

Here's an idea, and I might be able to hack something up.

It's perfectly fine to put all resources for one language into one bundle. It might actually be good. It used to be frowned upon, and we have comments everywhere that says we should error, but also, we don't have to.

What this does is it allows us to generate one bundle per language, and then do fallback IO on missing strings only. We re-yield the same bundle instance in that case.

That way, we can reduce the generator to only go through each source once, instead of counting to 100 by counting 1 a hundred times.

In this case, the loop would drop to 30 yields of one bundle.

That would also eliminate the memory problem, as we reiterate over the same bundle, and don't cache 2**15 different ones (which would explain the memory problem part)

It's OK to put all resources for one language into a single bundle.
If we fall back for missing strings, load all resources from the
next source, as we don't know which resource we need to fall back
for anyway.

Gonna punt this to Axel whose patch seems to fix the issue, and who's opened a discourse thread on the exact way forward here.

Assignee: nobody → l10n
Status: NEW → ASSIGNED
OS: Linux → All
Hardware: x86_64 → All
See Also: 1636067
See Also: 1636196
Summary: Memory leak and high CPU on startup, leading Firefox french builds to be unusable → Memory leak and high CPU when opening prefs/options/settings or on startup, due to language packs matching build locales with older translations that lack some strings causing excessive locale loading

It's perfectly fine to put all resources for one language into one bundle.

I agree with Axel that putting all locales from one language into one bundle would be an option, but I also agree that it would be a quite significant change in the nature of bundles and may have consequences beyond of scope of this bug.

One alternative proposal that may be less invasive would be to hard separate packaged and non-packaged locales.

Basically, the issue is that we have a packaged locale with two FileSource's - toolkit and browser - one per omni.ja for a locale X.
Then, we have a langpack with the same locale (third source).

If the langpack misses any string, L10nRegistry tries to construct the right fallback chain combining all available permutations of files between those three sources.
While toolkit and browser mostly don't overlap, langpack and packaged overlap a lot. In most cases they overlap 100% in terms of files.

If we were to augment L10nRegistry to not mingle between langpack of locale X and packaged source of locale X, we'd be reducing this to basically two variants:

  1. FileBundle with 15 files from langpack-x
  2. FileBundle with 15 files from toolkit omni.ja and browser omni.ja

This would address the memory and performance issues and wouldn't require us to redefine what a FileBundle is meant to be and wouldn't limit our fallbacking since I don't think we really want to fallback between langpack and packaged of the same locale.

The simplest way I imagine to achieve such model would be to add a new field to the FileSource which would indicate whether the source is packaged or langpack and then not mix between them.

Flags: needinfo?(gandalf)

I agree that now is not a good time to drive change through the bundle implementations.

Given my patch depends on that, I'll abandon that and unassign myself.

If we revisit the idea of adding Remote Settings, we'll need to clearly understand what that's expected to be in the light of our findings here.

Assignee: l10n → nobody
Status: ASSIGNED → NEW
Attachment #9161219 - Attachment is obsolete: true

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #48)

It's perfectly fine to put all resources for one language into one bundle.

I agree with Axel that putting all locales from one language into one bundle would be an option, but I also agree that it would be a quite significant change in the nature of bundles and may have consequences beyond of scope of this bug.

One alternative proposal that may be less invasive would be to hard separate packaged and non-packaged locales.

Basically, the issue is that we have a packaged locale with two FileSource's - toolkit and browser - one per omni.ja for a locale X.
Then, we have a langpack with the same locale (third source).

If the langpack misses any string, L10nRegistry tries to construct the right fallback chain combining all available permutations of files between those three sources.

Four sources - the langpack also gets split up into toolkit and browser bits.

While toolkit and browser mostly don't overlap, langpack and packaged overlap a lot. In most cases they overlap 100% in terms of files.

If we were to augment L10nRegistry to not mingle between langpack of locale X and packaged source of locale X, we'd be reducing this to basically two variants:

  1. FileBundle with 15 files from langpack-x

What is a "filebundle" ?

  1. FileBundle with 15 files from toolkit omni.ja and browser omni.ja

This would address the memory and performance issues and wouldn't require us to redefine what a FileBundle is meant to be and wouldn't limit our fallbacking since I don't think we really want to fallback between langpack and packaged of the same locale.

The simplest way I imagine to achieve such model would be to add a new field to the FileSource which would indicate whether the source is packaged or langpack and then not mix between them.

Not mix where?

Given Axel has unassigned, are you able to drive this?

Flags: needinfo?(gandalf)

Four sources - the langpack also gets split up into toolkit and browser bits.

Correct, I was wrong in my description.

So, we'd end up with:

  1. toolkit-langpack-x-browser + browser-langpack-x-browser
  2. 0-toolkit + 5-browser

the only permutations would happen for files that overlap between toolkit and browser within the same source (langpack vs packaged)

What is a "filebundle" ?

I meant FluentBundle.

Not mix where?

Not mix when producing iterations in the L10nRegistry generator.

Given Axel has unassigned, are you able to drive this?

I am not available at the moment.

Flags: needinfo?(gandalf)

Given that we now know the problem, I am flabbergasted that we have neither moved this bug to a component, nor prioritized it in any way.

I have no idea if I am moving it to the right one, but Firefox/Untriaged is definitely the wrong one.

Component: Untriaged → Internationalization
Product: Firefox → Core
Whiteboard: [qf][MemShrink]

This could belong to various different qf:* classifications, picking up resource for now. Sounds like p1 for the cases when this happens.
Zibi, do you think you could find someone to take a look at this?

Flags: needinfo?(gandalf)
Whiteboard: [qf][MemShrink] → [qf:p1:resource][MemShrink]

I've been working on an unrelated project for Shirley, but my team is planning to pick up L10nRegistry conversion to Rust, so we'll look into this. I can't promise any ETA or even when we'll start as of yet.

Flags: needinfo?(gandalf)

The severity field is not set for this bug.
:m_kato, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(m_kato)
Severity: -- → S3
Flags: needinfo?(m_kato)
Priority: -- → P3
See Also: → 1642243
Severity: S3 → S2
Priority: P3 → P2
Attached file catastrophic-l10nreg-test.js (obsolete) —

Here's a minimized test case that reproduces the hang in JS scenario. (I'll update it to also support Rust variant once I get the PRs in bug 1660392 to support registerSources/removeSources).

A proposal on how to solve it in l10nregistry-rs - https://github.com/zbraniecki/l10nregistry-rs/issues/12

Here's an updated test that tests against the Rust L10nRegistry. It uses the list of resIds from Preferences which is the pathological case (21 resIds). browser.xhtml is smaller (16 resIds)

The uncommented resIds is the largest number before JS OOMs, and the performance is roughly 2x better with Rust than JS, but that's definitely not enough :)

The current algorithm, for all 21 resIds, will compute 2097152 (two million) permutations.
The algorithm suggested in https://github.com/zbraniecki/l10nregistry-rs/issues/12 will shrink that to 2 (two).

Attachment #9200444 - Attachment is obsolete: true

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #61)

The current algorithm, for all 21 resIds, will compute 2097152 (two million) permutations.
The algorithm suggested in https://github.com/zbraniecki/l10nregistry-rs/issues/12 will shrink that to 2 (two).

I assume there are some trade-offs to this approach. Could you help me understand which combinations we're giving up? Mostly trying to figure out what's the possible impact for users.

There are, and i had it listed in my original comment and then lost that part while editing. Thank you for calling it out!

The cost is that we lose a fallback scenario where we construct a localization bundle out of resources mixed between packaged and language pack.

Imagine a scenario where 20 resources are in langpack, and one is missing.
Currently one of the fallbacks would be those 20 from langpack plus the remaining one from packaged.

In the new model, we will load all 21 from packaged since we could not find all 21 in langpack.

(With a caveat that we also will gain flexibility in identifying how many missing resources invalidate a bundle, so maybe one missing file will not and then we will create a bundle with 20 locales from langpack, localize what we can, and then create a bundle with 21 from packaged and localize the rest).
The bottom line is that we'll lose a "mixed" bundle with some resources from langpack and some from packaged. But we will not lose ability to end up with translations on a screen where what was available came from langpack, and the rest came from packaged, so i think the impact is acceptable.

Lmk if you'd like me to document the trade-off better - it's a bit tricky to describe but i can try to draw some diagrams :)

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #63)

There are, and i had it listed in my original comment and then lost that part while editing. Thank you for calling it out!

The cost is that we lose a fallback scenario where we construct a localization bundle out of resources mixed between packaged and language pack.

Imagine a scenario where 20 resources are in langpack, and one is missing.

Uh, so just to be clear, ISTR the failures we encountered before were not because an entire file was missing, but because a single message could not be translated with the given combination of resources (because some of them were old), see comment 41. This also meant we kept re-parsing all of these resources (which is what caused the memory leak).

Is this addressed by your new strategy? Because your comment talks about entire files missing.

And, orthogonal question: what's the timeline for a fix landing here?

Flags: needinfo?(zbraniecki)

Is this addressed by your new strategy?

Yes, that's precisely what the algorithm proposal solves.

Because your comment talks about entire files missing.

Flod asked what's the trade-off. I described what fallback scenarios we will not use when the new model is deployed.

In other words, the tradeoff is that on the pros side we will not be constructing millions of Bundles, on the cons, we will construct fewer permutations of fallbacks.
I tried to explain that while we will miss some potential fallback scenarios, they are quirky (combined packages+langpack).

what's the timeline for a fix landing here?

The only remaining blocker, bug 1660392 is in the review, and my hope is that we'll work on the fix for this bug in February.

Flags: needinfo?(zbraniecki)

Hi, I found this bug after a "quick search" and it seems like I might have this exact bug. I posted a screen recording on Reddit with some information. I'm using Canadian English dictionary + language pack and the French dictionary + language pack on Firefox 85 & Firefox Developer Edition 86.

I narrowed it down to this specific language pack : https://addons.mozilla.org/en-CA/firefox/addon/english-ca-language-pack/

(In reply to Nato Boram from comment #67)

I narrowed it down to this specific language pack : https://addons.mozilla.org/en-CA/firefox/addon/english-ca-language-pack/

@Nato

I'm sorry that you're having this issue, but the root of the problem is known (it's not a specific language pack, it's a set of conditions), and just needs resources to be fixed. Happy to follow-up on the Reddit thread.

This is now unblocked and :dminor has a patch that should address teh issue in https://github.com/mozilla/l10nregistry-rs/pull/14 .

He's on PTO until the end of next week, but after that I'll work with him on finding a way to validate it and land into Gecko.

NI on Dan to remind him to take a look at it when he's back :)

Flags: needinfo?(dminor)
Assignee: nobody → dminor
Flags: needinfo?(dminor)
See Also: → 1725955
Status: NEW → ASSIGNED
Attachment #9240599 - Attachment description: WIP: Bug 1642415 - Update to l10nregistry to 0.3.0 → Bug 1642415 - Update to l10nregistry to 0.3.0; r=#platform-i18n-reviewers!
Attachment #9240600 - Attachment description: WIP: Bug 1642415 - Rerun mach vendor rust → Bug 1642415 - Rerun mach vendor rust; r=#platform-i18n-reviewers!
Pushed by dminor@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e46c1762af52
Update to l10nregistry to 0.3.0; r=platform-i18n-reviewers,gregtatum
https://hg.mozilla.org/integration/autoland/rev/624153727c52
Rerun mach vendor rust; r=platform-i18n-reviewers,gregtatum
https://hg.mozilla.org/integration/autoland/rev/55b7540af357
Use metasource in l10nregistry-ffi; r=platform-i18n-reviewers,gregtatum
https://hg.mozilla.org/integration/autoland/rev/3b37fd3ee78e
Remove unused L10nFileSource::Create method; r=platform-i18n-reviewers,gregtatum
https://hg.mozilla.org/integration/autoland/rev/c973e0fbc5ee
Use metasource in FileSource; r=platform-i18n-reviewers,extension-reviewers,gregtatum,zombie
https://hg.mozilla.org/integration/autoland/rev/d750cef3fb6a
Update existing tests to pass metasource into createMock; r=platform-i18n-reviewers,gregtatum
https://hg.mozilla.org/integration/autoland/rev/b1c1070d30b5
Add unit tests for metasources; r=platform-i18n-reviewers,gregtatum
https://hg.mozilla.org/integration/autoland/rev/e7f3e1f5b625
Add metasource to L10nFileSourceDescriptor; r=platform-i18n-reviewers,gregtatum
See Also: → 1733839
Flags: qe-verify+
See Also: 1725955
Blocks: 1674132
Blocks: 1738460
No longer blocks: 1738460
Blocks: 1738460
Blocks: 1737922

Can we uplift this to esr91? We have a bunch of reports of this causing problems in Thunderbird. (Yet unclear if it's the only issue around lang-packs.)

Flags: needinfo?(dminor)

(In reply to Magnus Melin [:mkmelin] from comment #87)

Can we uplift this to esr91? We have a bunch of reports of this causing problems in Thunderbird. (Yet unclear if it's the only issue around lang-packs.)

The issues are very severe indeed, ranging from crashes (OOM | small crashes in version 91 are up 50%, and 30% have lang language packs installed), to users unable to access add-on manager in order to remediate by disabling language packs (unless they have started in safe mode).

(In reply to Magnus Melin [:mkmelin] from comment #87)

Can we uplift this to esr91? We have a bunch of reports of this causing problems in Thunderbird. (Yet unclear if it's the only issue around lang-packs.)

bug 1660392 is in the "depends on" list, and AIUI is required for this fix, and didn't make ESR. Uplifting that and all its dependencies, which ends up meaning a bunch of FTL feature work that landed in 92, feels like it's a really sizable change to be making on ESR... Zibi, how feasible would this be?

It's probably worth having a separate bug tracking the TB breakage and trying to analyze what is causing it (e.g. are there particular locales where this is predominantly happening, and/or are there somehow more TB users who have both a langpack and a built-in package of localization for a given locale, and/or is TB code simply asking for strings that do not exist, which could AIUI trigger similar catastrophic fallback behaviour with the previous fluent strategy for finding suitable language information).

Flags: needinfo?(zbraniecki)
Flags: needinfo?(mkmelin+mozilla)

Zibi, how feasible would this be?

Safe, but very involved. It's not just bug 1660392 but also bug 1672317. The total volume of change is a result of over year of work.

It's probably worth having a separate bug tracking the TB breakage and trying to analyze what is causing it

Agree. I'm wondering why we see it in Tb 91 and we haven't before. Is there a small-scale fix that would save us (like, hardcoding an ftl resource that is missing, or fine-tuning the old L10nRegistry.js to not fallback on a particular missing file?).

Flags: needinfo?(zbraniecki)

I think this would be too large a change to uplift to ESR, given the dependencies.

Flags: needinfo?(dminor)

Thunderbird can track this in bug 1725955, bug 1674132, and bug 1737922.

(In reply to :Gijs (he/him) from comment #89)

It's probably worth having a separate bug tracking the TB breakage

Found a way to reproduce and at least work around one case in bug 1728744.

Flags: needinfo?(mkmelin+mozilla)
Blocks: 1728744
Performance Impact: --- → P1
Whiteboard: [qf:p1:resource][MemShrink] → [MemShrink]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: