1384236 - Consider caching l10n resources differently in L10nRegistry

Assignee

Description

•

7 years ago

As per bug 1333980 comment 29:

Initially, L10nRegistry caches whole messagecontext based on the combination of sources and resourceIDs.

That means that opening two windows with browser.xul localized using Fluent will result in the same MessageContext being shared between them.

It may make sense to consider caching the parsed AST of the FTL file instead.
That would mean that if any other Localization context wants to use the same file (say "brand.ftl") they wouldn't need to parse it.

The cost is that we'd be creating a new MessageContext for each new browser.xul, but we wouldn't be parsing FTL for it, just reuse the cache.

It's something I'd like to decide on before we start using Fluent in Firefox but not block the landing of L10nRegistry, hence this bug.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

7 years ago

Blocks: 1365426

Depends on: 1333980

Makoto Kato [:m_kato]

Updated

•

7 years ago

Priority: -- → P3

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 1

•

7 years ago

While working on L10nRegistry, :mossop and I discussed different levels of caching in L10nRegistry.

At the moment we have two:
1) We cache files as they're loaded.

So, if a file "brand.ftl" is loaded from the same source for the same locale multiple times, starting from the second attempt it'll just return a cached resolved promise.

2) We cache resolved contexts.

This, in result means that, for example, if we open a new window, the browser.xul gets a cached context rather than a new one (since the first browser.xul generated the context already).


This dual-caching is a bit unfortunate because it means that:
a) we cache the same bits twice - once in a raw string (1), and then parsed in context (2).
b) we return the same context between different documents. I'm not sure if it bears any security consequences, but I believe it to be pretty unexpected since contexts are stateful.

From the perspective of the flow, it seems that a more "natural" fit would be to cache parsed FTL files and construct a new context for each new call.

This would mean that we cache things once, we avoid re-parsing (fix (a)), and we create new stateful context each time (fix (b)).

As far as I understand, we originally decided to try other approaches to keep the MessageContext API clean and avoid having to standardize AST of the parsed FTL as part of the API.
I now believe that it would be valuable to expose two additional methods (maybe for Gecko+L10nRegistry use only?):

1) A method to parse FTL file
2) A sister method to `MessageContext.addMessages` that accepts the parsed data structure rather than a string

:mossop, :stas, :pike - Thoughts?

Flags: needinfo?(stas)

Flags: needinfo?(l10n)

Flags: needinfo?(dtownsend)

Staś Małolepszy :stas

Comment 2

•

7 years ago

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)

> 1) We cache files as they're loaded.
> 2) We cache resolved contexts.

I don't think this is necessarily a bad design. It separates concerns: #1 saves the IO time and #2 saves the CPU/parsing time. These two caching mechanisms serve different purposes and as such they don't struck me as redundant.
 
> b) we return the same context between different documents. I'm not sure if
> it bears any security consequences, but I believe it to be pretty unexpected
> since contexts are stateful.

If we're worried about MessageContext's mutability, we could expose a freeze() or a seal() method which forbids any more calls to addMessages(). Being able to re-use the same MessageContext instance is a feature of its design. If there's a problem with that design, let's fix it.

> As far as I understand, we originally decided to try other approaches to
> keep the MessageContext API clean and avoid having to standardize AST of the
> parsed FTL as part of the API.
> I now believe that it would be valuable to expose two additional methods
> (maybe for Gecko+L10nRegistry use only?):
> 
> 1) A method to parse FTL file
> 2) A sister method to `MessageContext.addMessages` that accepts the parsed
> data structure rather than a string

I'm opposed to this idea. I don't want the API to expose the internal representation of parsed messages to the outside world. That already happens with MessageContext.getMessage and I'm not happy with it.

If we still do something like this, I would ask that we explicitly not standardize the internal implementation and make it implementation dependent.

Flags: needinfo?(stas)

Axel Hecht [:Pike]

Comment 3

•

7 years ago

Nothing to add to stas' comment.

Flags: needinfo?(l10n)

Dave Townsend [:mossop]

Comment 4

•

7 years ago

(In reply to Staś Małolepszy :stas from comment #2)
> (In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)
> 
> > 1) We cache files as they're loaded.
> > 2) We cache resolved contexts.
> 
> I don't think this is necessarily a bad design. It separates concerns: #1
> saves the IO time and #2 saves the CPU/parsing time. These two caching
> mechanisms serve different purposes and as such they don't struck me as
> redundant.

It doubles the memory used by locale data though doesn't it?

> > b) we return the same context between different documents. I'm not sure if
> > it bears any security consequences, but I believe it to be pretty unexpected
> > since contexts are stateful.
> 
> If we're worried about MessageContext's mutability, we could expose a
> freeze() or a seal() method which forbids any more calls to addMessages().
> Being able to re-use the same MessageContext instance is a feature of its
> design. If there's a problem with that design, let's fix it.
> 
> > As far as I understand, we originally decided to try other approaches to
> > keep the MessageContext API clean and avoid having to standardize AST of the
> > parsed FTL as part of the API.
> > I now believe that it would be valuable to expose two additional methods
> > (maybe for Gecko+L10nRegistry use only?):
> > 
> > 1) A method to parse FTL file
> > 2) A sister method to `MessageContext.addMessages` that accepts the parsed
> > data structure rather than a string
> 
> I'm opposed to this idea. I don't want the API to expose the internal
> representation of parsed messages to the outside world. That already happens
> with MessageContext.getMessage and I'm not happy with it.
> 
> If we still do something like this, I would ask that we explicitly not
> standardize the internal implementation and make it implementation dependent.

I would prefer this too.

Flags: needinfo?(dtownsend)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 5

•

7 years ago

Would you accept that as an internal performance optimization for Gecko?

Flags: needinfo?(stas)

Staś Małolepszy :stas

Comment 6

•

7 years ago

(In reply to Dave Townsend [:mossop] from comment #4)

> It doubles the memory used by locale data though doesn't it?

It does. But a solution to this may be for instance to drop the first type of caching :)

MessageContext already is a thin-wrapper around its parsed messages. The difference between it and what this bug is proposing is that MessageContext is a wrapper around multiple FTL files at once.

A quick grep through mozilla-central suggests that it's not very common to have more than one DTD file in a XUL document. It's mostly brand.dtd (for obvious reasons) and global.dtd (I guess to get the direction of the current locale) that get included the most.

(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #5)
> Would you accept that as an internal performance optimization for Gecko?

I think it would be helpful to know which problem we're trying to solve before we commit to optimizing performance. My preferred route would be to not change anything for now and measure performance without any caching, with #1 enabled, with #2 enabled, and with both enabled.  It may turn out that the simplest optimization available to us is to only IO-cache brand.dtd and go with type-2 caching for everything else.

If the numbers show that we can save milliseconds by doing what this bug proposes, here's another idea.  We could add a static MessageContext.from(...contexts) method which takes a variable number of other MessageContexts and merges them into a new one. It throws if the source contexts are not in the same locale. We could then cache MessageContexts per localization file and dynamically create MessageContexts for XUL/HTML files from them and avoid re-parsing. This would make use re-create Intl objects for every MessageContext though.

Flags: needinfo?(stas)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 7

•

7 years ago

> My preferred route would be to not change anything for now and measure performance without any caching, with #1 enabled, with #2 enabled, and with both enabled. 

Sounds good. I don't expect to get to perf measuring before Fx57 branches off, but we then have over a month to iron this out.

> If the numbers show that we can save milliseconds by doing what this bug proposes, here's another idea.  We could add a static MessageContext.from(...contexts) method which takes a variable number of other MessageContexts and merges them into a new one. It throws if the source contexts are not in the same locale. We could then cache MessageContexts per localization file and dynamically create MessageContexts for XUL/HTML files from them and avoid re-parsing. This would make use re-create Intl objects for every MessageContext though.

I like it. My gut feeling is that we should even be able to make the Intl objects a non-issue if we don't instantiate them prematurely.

Then L10nRegistry would create a MessageContext per FTL file and cache those, and build document's MessageContexts out of those. As long as the memory overhead is close to zero, I think it may be a decent solution.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

6 years ago

Blocks: 1441035

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

6 years ago

No longer blocks: 1365426

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

6 years ago

Blocks: 1462839

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

•

6 years ago

Assignee: nobody → gandalf

Blocks: 1455649

Status: NEW → ASSIGNED

Priority: P3 → P2

Comment hidden (mozreview-request)

Switch to cache FTLResources per FileSource, and change the logic of
context creation to return a context if at least one source for the
permutation works.

Review commit: https://reviewboard.mozilla.org/r/256172/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/256172/

Comment hidden (mozreview-request)

Comment on attachment 8991203 [details]
Bug 1384236 - Cache l10n resources differently in L10nRegistry.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/256172/diff/1-2/

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 10

•

6 years ago

Based on All Hands discussion with the stakeholders, we decided to introduce `FluentResource` and `FluentResource.fromString`. This patch makes use of that to limit parsing of resources and cache each resource separately.

Axel Hecht [:Pike]

Updated

•

6 years ago

Depends on: 1474786

Axel Hecht [:Pike]

Comment 11

•

6 years ago

mozreview-review

Comment on attachment 8991203 [details]
Bug 1384236 - Cache l10n resources differently in L10nRegistry.

https://reviewboard.mozilla.org/r/256172/#review263126

LGTM. Is `intl/l10n/fluent.js.patch` still maintained? does that change as part of this patch?

Attachment #8991203 - Flags: review?(l10n) → review+

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 12

•

6 years ago

Thank you for the review Axel!

Unfortunately, there's a perf regression with this patch: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2b16262842dc&newProject=try&newRevision=8a55e7949f65&framework=1

Will work on it now.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 13

•

6 years ago

Ok, as expected, the regression is caused by lack of shared intl memoizer. Stas - are you ok with adding an optional shared memoizer to MessageContext constructor?

Flags: needinfo?(stas)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 14

•

6 years ago

And there was a bug in our caching logic in FileSource. I'm going to separate out the three projects:
 - this bug will be only about caching using FluentResource
 - a separate bug for using shared intl memoizer
 - bug 1464156 about changing the behavior for partial contexts

This will make it easier to track regressions both in perf and memory.

Flags: needinfo?(stas)

Comment hidden (mozreview-request)

Comment on attachment 8991203 [details]
Bug 1384236 - Cache l10n resources differently in L10nRegistry.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/256172/diff/2-3/

Comment hidden (mozreview-request)

Comment on attachment 8991203 [details]
Bug 1384236 - Cache l10n resources differently in L10nRegistry.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/256172/diff/3-4/

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 17

•

6 years ago

Got a green try and talos: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=2b16262842dc&newProject=try&newRevision=ebd39258e147&framework=1

Pike, setting ni? to give you a chance to take another look since I removed some changes from this patch.

I'll post separate patches in the respective bugs.

Flags: needinfo?(l10n)

Axel Hecht [:Pike]

Comment 18

•

6 years ago

LGTM, I didn't find the magic cookie, though.

Flags: needinfo?(l10n)

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 19

•

6 years ago

We used to have:
```
if (this.cache[fullPath] === false) {
  return false;
}
```

which meant that we returned cache *only* if it was a currently-resolving promise.

I now replaced it with:

```
if (this.cache[fullPath] !== true) {
  return this.cache[fullPath];
}
```

we now return cache for all values of cache except of `true` (because `true` means that IndexedFileSource listed it, but we don't have a value for it yet) - so for two cases a) cache is a Promise currently being resolved, and b) cache is an already resolved value.

Thank you!

Pulsebot

Comment 20

•

6 years ago

Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/24a8cf2131d1
Cache l10n resources differently in L10nRegistry. r=Pike

Raul Gurzau (:RaulG)

Comment 21

•

6 years ago

Backed out changeset 24a8cf2131d1 (bug 1384236) for failing at extensions/test/xpcshell/test_webextension_langpack.js on a CLOSED TREE

Backout link:  https://hg.mozilla.org/integration/autoland/rev/d40742f21b92f26d50e9e2c8ee213d886fe81cb7

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=24a8cf2131d1964a65b96c770728f03467818dce

Log link: https://treeherder.mozilla.org/logviewer.html#?job_id=188074176&repo=autoland&lineNumber=1876

Log snippet: 

[task 2018-07-13T16:53:32.580Z] 16:53:32     INFO -  TEST-START | xpcshell.ini:toolkit/mozapps/extensions/test/xpcshell/test_webextension_langpack.js
[task 2018-07-13T16:53:35.655Z] 16:53:35  WARNING -  TEST-UNEXPECTED-FAIL | xpcshell.ini:toolkit/mozapps/extensions/test/xpcshell/test_webextension_langpack.js | xpcshell return code: 0

Flags: needinfo?(gandalf)

Comment hidden (mozreview-request)

Comment on attachment 8991203 [details]
Bug 1384236 - Cache l10n resources differently in L10nRegistry.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/256172/diff/4-5/

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 23

•

6 years ago

The reason langpack test failed is because it tested for skipping an incomplete context. In such case the current code returns `null`. L10nRegistry tests tested for the same scenario but didn't verify that a MessageContext was returned.

I fixed the code and added more specificity to the test.

Flags: needinfo?(gandalf)

Pulsebot

Comment 24

•

6 years ago

Pushed by zbraniecki@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cf60869ce62a
Cache l10n resources differently in L10nRegistry. r=Pike

Arthur Iakab [arthur_iakab]

Comment 25

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/cf60869ce62a

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox63: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla63