Closed Bug 1384236 Opened 7 years ago Closed 6 years ago

Consider caching l10n resources differently in L10nRegistry

Categories

(Core :: Internationalization, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

As per bug 1333980 comment 29:

Initially, L10nRegistry caches whole messagecontext based on the combination of sources and resourceIDs.

That means that opening two windows with browser.xul localized using Fluent will result in the same MessageContext being shared between them.

It may make sense to consider caching the parsed AST of the FTL file instead.
That would mean that if any other Localization context wants to use the same file (say "brand.ftl") they wouldn't need to parse it.

The cost is that we'd be creating a new MessageContext for each new browser.xul, but we wouldn't be parsing FTL for it, just reuse the cache.

It's something I'd like to decide on before we start using Fluent in Firefox but not block the landing of L10nRegistry, hence this bug.
Blocks: 1365426
Depends on: 1333980
Priority: -- → P3
While working on L10nRegistry, :mossop and I discussed different levels of caching in L10nRegistry.

At the moment we have two:
1) We cache files as they're loaded.

So, if a file "brand.ftl" is loaded from the same source for the same locale multiple times, starting from the second attempt it'll just return a cached resolved promise.

2) We cache resolved contexts.

This, in result means that, for example, if we open a new window, the browser.xul gets a cached context rather than a new one (since the first browser.xul generated the context already).


This dual-caching is a bit unfortunate because it means that:
a) we cache the same bits twice - once in a raw string (1), and then parsed in context (2).
b) we return the same context between different documents. I'm not sure if it bears any security consequences, but I believe it to be pretty unexpected since contexts are stateful.

From the perspective of the flow, it seems that a more "natural" fit would be to cache parsed FTL files and construct a new context for each new call.

This would mean that we cache things once, we avoid re-parsing (fix (a)), and we create new stateful context each time (fix (b)).

As far as I understand, we originally decided to try other approaches to keep the MessageContext API clean and avoid having to standardize AST of the parsed FTL as part of the API.
I now believe that it would be valuable to expose two additional methods (maybe for Gecko+L10nRegistry use only?):

1) A method to parse FTL file
2) A sister method to `MessageContext.addMessages` that accepts the parsed data structure rather than a string

:mossop, :stas, :pike - Thoughts?
Flags: needinfo?(stas)
Flags: needinfo?(l10n)
Flags: needinfo?(dtownsend)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)

> 1) We cache files as they're loaded.
> 2) We cache resolved contexts.

I don't think this is necessarily a bad design. It separates concerns: #1 saves the IO time and #2 saves the CPU/parsing time. These two caching mechanisms serve different purposes and as such they don't struck me as redundant.
 
> b) we return the same context between different documents. I'm not sure if
> it bears any security consequences, but I believe it to be pretty unexpected
> since contexts are stateful.

If we're worried about MessageContext's mutability, we could expose a freeze() or a seal() method which forbids any more calls to addMessages(). Being able to re-use the same MessageContext instance is a feature of its design. If there's a problem with that design, let's fix it.

> As far as I understand, we originally decided to try other approaches to
> keep the MessageContext API clean and avoid having to standardize AST of the
> parsed FTL as part of the API.
> I now believe that it would be valuable to expose two additional methods
> (maybe for Gecko+L10nRegistry use only?):
> 
> 1) A method to parse FTL file
> 2) A sister method to `MessageContext.addMessages` that accepts the parsed
> data structure rather than a string

I'm opposed to this idea. I don't want the API to expose the internal representation of parsed messages to the outside world. That already happens with MessageContext.getMessage and I'm not happy with it.

If we still do something like this, I would ask that we explicitly not standardize the internal implementation and make it implementation dependent.
Flags: needinfo?(stas)
Nothing to add to stas' comment.
Flags: needinfo?(l10n)
(In reply to Staś Małolepszy :stas from comment #2)
> (In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)
> 
> > 1) We cache files as they're loaded.
> > 2) We cache resolved contexts.
> 
> I don't think this is necessarily a bad design. It separates concerns: #1
> saves the IO time and #2 saves the CPU/parsing time. These two caching
> mechanisms serve different purposes and as such they don't struck me as
> redundant.

It doubles the memory used by locale data though doesn't it?

> > b) we return the same context between different documents. I'm not sure if
> > it bears any security consequences, but I believe it to be pretty unexpected
> > since contexts are stateful.
> 
> If we're worried about MessageContext's mutability, we could expose a
> freeze() or a seal() method which forbids any more calls to addMessages().
> Being able to re-use the same MessageContext instance is a feature of its
> design. If there's a problem with that design, let's fix it.
> 
> > As far as I understand, we originally decided to try other approaches to
> > keep the MessageContext API clean and avoid having to standardize AST of the
> > parsed FTL as part of the API.
> > I now believe that it would be valuable to expose two additional methods
> > (maybe for Gecko+L10nRegistry use only?):
> > 
> > 1) A method to parse FTL file
> > 2) A sister method to `MessageContext.addMessages` that accepts the parsed
> > data structure rather than a string
> 
> I'm opposed to this idea. I don't want the API to expose the internal
> representation of parsed messages to the outside world. That already happens
> with MessageContext.getMessage and I'm not happy with it.
> 
> If we still do something like this, I would ask that we explicitly not
> standardize the internal implementation and make it implementation dependent.

I would prefer this too.
Flags: needinfo?(dtownsend)
Would you accept that as an internal performance optimization for Gecko?
Flags: needinfo?(stas)
(In reply to Dave Townsend [:mossop] from comment #4)

> It doubles the memory used by locale data though doesn't it?

It does. But a solution to this may be for instance to drop the first type of caching :)

MessageContext already is a thin-wrapper around its parsed messages. The difference between it and what this bug is proposing is that MessageContext is a wrapper around multiple FTL files at once.

A quick grep through mozilla-central suggests that it's not very common to have more than one DTD file in a XUL document. It's mostly brand.dtd (for obvious reasons) and global.dtd (I guess to get the direction of the current locale) that get included the most.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #5)
> Would you accept that as an internal performance optimization for Gecko?

I think it would be helpful to know which problem we're trying to solve before we commit to optimizing performance. My preferred route would be to not change anything for now and measure performance without any caching, with #1 enabled, with #2 enabled, and with both enabled.  It may turn out that the simplest optimization available to us is to only IO-cache brand.dtd and go with type-2 caching for everything else.

If the numbers show that we can save milliseconds by doing what this bug proposes, here's another idea.  We could add a static MessageContext.from(...contexts) method which takes a variable number of other MessageContexts and merges them into a new one. It throws if the source contexts are not in the same locale. We could then cache MessageContexts per localization file and dynamically create MessageContexts for XUL/HTML files from them and avoid re-parsing. This would make use re-create Intl objects for every MessageContext though.
Flags: needinfo?(stas)
> My preferred route would be to not change anything for now and measure performance without any caching, with #1 enabled, with #2 enabled, and with both enabled. 

Sounds good. I don't expect to get to perf measuring before Fx57 branches off, but we then have over a month to iron this out.

> If the numbers show that we can save milliseconds by doing what this bug proposes, here's another idea.  We could add a static MessageContext.from(...contexts) method which takes a variable number of other MessageContexts and merges them into a new one. It throws if the source contexts are not in the same locale. We could then cache MessageContexts per localization file and dynamically create MessageContexts for XUL/HTML files from them and avoid re-parsing. This would make use re-create Intl objects for every MessageContext though.

I like it. My gut feeling is that we should even be able to make the Intl objects a non-issue if we don't instantiate them prematurely.

Then L10nRegistry would create a MessageContext per FTL file and cache those, and build document's MessageContexts out of those. As long as the memory overhead is close to zero, I think it may be a decent solution.
No longer blocks: 1365426
Assignee: nobody → gandalf
Blocks: 1455649
Status: NEW → ASSIGNED
Priority: P3 → P2
Based on All Hands discussion with the stakeholders, we decided to introduce `FluentResource` and `FluentResource.fromString`. This patch makes use of that to limit parsing of resources and cache each resource separately.
Depends on: 1474786
Comment on attachment 8991203 [details]
Bug 1384236 - Cache l10n resources differently in L10nRegistry.

https://reviewboard.mozilla.org/r/256172/#review263126

LGTM. Is `intl/l10n/fluent.js.patch` still maintained? does that change as part of this patch?
Attachment #8991203 - Flags: review?(l10n) → review+
Thank you for the review Axel!

Unfortunately, there's a perf regression with this patch: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2b16262842dc&newProject=try&newRevision=8a55e7949f65&framework=1

Will work on it now.
Ok, as expected, the regression is caused by lack of shared intl memoizer. Stas - are you ok with adding an optional shared memoizer to MessageContext constructor?
Flags: needinfo?(stas)
And there was a bug in our caching logic in FileSource. I'm going to separate out the three projects:
 - this bug will be only about caching using FluentResource
 - a separate bug for using shared intl memoizer
 - bug 1464156 about changing the behavior for partial contexts

This will make it easier to track regressions both in perf and memory.
Flags: needinfo?(stas)
Got a green try and talos: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2b16262842dc&newProject=try&newRevision=ebd39258e147&framework=1

Pike, setting ni? to give you a chance to take another look since I removed some changes from this patch.

I'll post separate patches in the respective bugs.
Flags: needinfo?(l10n)
LGTM, I didn't find the magic cookie, though.
Flags: needinfo?(l10n)
We used to have:
```
if (this.cache[fullPath] === false) {
  return false;
}
```

which meant that we returned cache *only* if it was a currently-resolving promise.

I now replaced it with:

```
if (this.cache[fullPath] !== true) {
  return this.cache[fullPath];
}
```

we now return cache for all values of cache except of `true` (because `true` means that IndexedFileSource listed it, but we don't have a value for it yet) - so for two cases a) cache is a Promise currently being resolved, and b) cache is an already resolved value.

Thank you!
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/24a8cf2131d1
Cache l10n resources differently in L10nRegistry. r=Pike
Backed out changeset 24a8cf2131d1 (bug 1384236) for failing at extensions/test/xpcshell/test_webextension_langpack.js on a CLOSED TREE

Backout link:  https://hg.mozilla.org/integration/autoland/rev/d40742f21b92f26d50e9e2c8ee213d886fe81cb7

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=24a8cf2131d1964a65b96c770728f03467818dce

Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=188074176&repo=autoland&lineNumber=1876

Log snippet: 

[task 2018-07-13T16:53:32.580Z] 16:53:32     INFO -  TEST-START | xpcshell.ini:toolkit/mozapps/extensions/test/xpcshell/test_webextension_langpack.js
[task 2018-07-13T16:53:35.655Z] 16:53:35  WARNING -  TEST-UNEXPECTED-FAIL | xpcshell.ini:toolkit/mozapps/extensions/test/xpcshell/test_webextension_langpack.js | xpcshell return code: 0
Flags: needinfo?(gandalf)
The reason langpack test failed is because it tested for skipping an incomplete context. In such case the current code returns `null`. L10nRegistry tests tested for the same scenario but didn't verify that a MessageContext was returned.

I fixed the code and added more specificity to the test.
Flags: needinfo?(gandalf)
Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cf60869ce62a
Cache l10n resources differently in L10nRegistry. r=Pike
https://hg.mozilla.org/mozilla-central/rev/cf60869ce62a
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: