Closed Bug 595819 Opened 12 years ago Closed 5 years ago

L20n's performance impact

Categories

(L20n :: General, defect)


Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

Details

(Keywords: meta, perf)

Attachments

(9 files)

This bug tracks the performance impact of the effort to bring L20n into Gecko and migrate away from .properties/.dtd files.
Assignee: nobody → gandalf
Blocks: 595812
Status: NEW → ASSIGNED
Depends on: 595825
I made an extensive effort to get meaningful numbers out of try-talos builds.

In particular, here are the revisions I used:

 - reference beta4 build, rev 7c19fb706708
 - l20n beta4 build, rev cb5fb82c2b6a
 - another l20n beta4 build, rev 914373c38c0b
 - inline beta4 build, rev bf8a943ea176
 - empty (no changes) beta4 build, rev de98422ab500
 - another empty beta4 build, rev 01d25a0b3894

* inline means a build that does not use any i18n infrastructure. All entities are inlined into the XUL code. This is possible thanks to a silme-apps script, and the goal here was to measure the performance impact compared to having no i18n infrastructure at all.
* empty runs were used to measure the variation of results compared to the beta4 reference build. I wanted to verify whether the 7c19fb706708 run had any outliers compared to other beta4 talos runs.
Unfortunately, not many tests strictly measure the impact of the l20n changes on Firefox performance. I was looking for tests that would measure the impact in four major areas:
 - browser start
 - browser shutdown
 - new window opening
 - memory consumption

The basic theory was that we're ditching all the code required to load several dozen .dtd files, parse them, and store them in memory, replacing it with one context per .xul file and some JS code compiled into memory.

There were several assumptions:
 - we may be faster because the DTD code has never been investigated from a performance standpoint
 - we may be faster because we effectively replace the DTD parser with a JS parser, which may be better optimized despite JS being more complex
 - we may be slower because of the many context switches between the C++ content-sink processing code and the JS context on each l10n_id call
 - we may conserve memory since we load just one context per XUL file
 - we can get faster by using compiled JS files in the future
 - we can get faster by optimizing our l20n->j20n code to expand local variable references

Unfortunately, a lot of talos tests were of minor value to us:
 - tp4 could be valuable - it should measure memory consumption. Unfortunately, it does so by loading 100 web pages, which may dilute the impact of the l20n switch below the noise level and pollutes the result with everything related to page loading. It would be nice to have a test that measures only the browser, not the engine.
 - ts is the most meaningful to us. It measures the opening of the browser.
 - tdhtml, tgfx and tsvg all measure the engine loading times.
 - twinopen could be useful since it measures the opening of a window, but it does so with custom XUL code, so I'm not sure whether it measures the engine or the browser itself. I suspect it measures the engine.
 - sunspider and dromaeo tests are meaningless to us
 - cold_* tests are supposedly unimportant, according to taras, since they mostly measure the influence of caching (or the lack of it).

As a result, one should look mostly at the ts and tp4 tests when analyzing the results.
Attached file try-talos, ref vs l20n
First, I tested try-talos results between the beta4 and l20n builds.
Then I tested beta4 vs. an inline build.
Because the inline numbers were confusing (can we really get slower from there?), I decided to give an empty build a shot. That means I tested revision 7c19fb706708 against itself (with a trivial change to the brand.dtd entity name so I could recognize the build when running it).

The assumption was that the empty build should give us exactly the same averages and similar variation minus some noise.
I decided to give it another try to see how much noise there can be. Once again, beta4 vs. itself.
This is the last try-talos run. I tested empty1 vs. empty2 to measure the spread.

Note that each build got at least two talos runs.
Next was empty1+2 vs. the inline build, ignoring the reference build's talos results, since I suspect something changed between that build and my builds on the try server that made the ref-empty1 and ref-empty2 comparisons show a greater spread than empty1-empty2.
empty1+2 vs l20n
After this, I decided not to spend time measuring the significance level of the results, since it seems clear that the spread between the reference build, empty1 and empty2 is, in all tests, way too great to give a definitive answer.

The inline comparison suggests that the i18n infrastructure overall does not have a high impact on the talos tests, since we don't see a clear win when we remove it.
The last two comparisons may suggest that we're not slower with l20n (ts, twinopen, ts_shutdown), but we may use more memory (tp4_*) although I'd take it with a grain of salt since the variation is really high there.
The boxplot shows the sample minimum, median, maximum, and lower/upper quartiles.
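For reference, the five values each boxplot is drawn from can be computed with a short Python sketch (the sample data below is made up for illustration; the real talos timings are in the attachments):

```python
import statistics

def five_number_summary(samples):
    """Sample minimum, lower quartile, median, upper quartile, maximum --
    the five values a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return min(samples), q1, median, q3, max(samples)

# Hypothetical talos-style timings (ms), not the real data:
print(five_number_summary([1063, 1071, 1076, 1082, 1099, 1104, 1120]))
```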

My reading is that we do not seem to be slower than the reference beta4, yet we do have a higher dispersion.
I created a custom performance test that loads a very long XUL document 10 times.
Three cases are:
1) The file uses DTD for l10n
2) The file uses l20n for l10n
3) The file has all the entities inlined.

The attachment is the boxplot generated from the results of 5 runs of each test on 64-bit Ubuntu Linux, on the latest mozilla-central code with the XUL patch from bug 566906.
I restarted the browser before each run, because it seems that in the L20n case there is some caching going on (the first run gave me an average of 1200ms, while the following ones were around 1100ms).

I'll do more analysis but the basics are:

==== DTD ====
mean: 1089.96
std. error: 7.181
median: 1076
variance: 2576.325
std. deviation: 50.777

==== L20n ====
mean: 1127.14
std. error: 5.123
median: 1114.50
variance: 1312.327
std. deviation: 36.226

==== Inline ====
mean: 1066.24
std. error: 2.057
median: 1063.50
variance: 211.429
std. deviation: 14.543
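As a sanity check: in all three blocks above, the reported std. error is very close to std. deviation / sqrt(50), which is consistent with n = 50 samples per case (5 runs x 10 loads). A small Python sketch of the same statistics (the example data is made up, not the real timings):

```python
import math
import statistics

def summarize(samples):
    """Mean, std. error, median, variance, std. deviation --
    the fields reported above (sample variance, n-1 denominator)."""
    variance = statistics.variance(samples)       # n-1 denominator
    std_dev = math.sqrt(variance)
    return {
        "mean": statistics.mean(samples),
        "std. error": std_dev / math.sqrt(len(samples)),  # SE = s / sqrt(n)
        "median": statistics.median(samples),
        "variance": variance,
        "std. deviation": std_dev,
    }

# Tiny illustrative sample, not real measurements:
print(summarize([1076, 1081, 1090, 1102, 1110]))
```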

It seems that at least in this test, l20n is the slowest solution for now, but we have room for improvement.

For example, there is the noticeable first-run penalty, which I tend to attribute to initial compilation. If I remove those first runs from the l20n results, the variance drops to 611 (from 1312!), the std. deviation to 24 (from 36), and the std. error to 3.6 (from 5.1).
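To illustrate that effect (with made-up numbers; the real ones are in the attachments): dropping each session's first measurement removes the compilation-penalty outliers and shrinks the variance considerably.

```python
import statistics

# Hypothetical per-session L20n timings (ms); the first measurement of
# each session carries a one-time compilation penalty.
sessions = [
    [1205, 1110, 1102, 1098, 1115],
    [1198, 1095, 1108, 1101, 1090],
]

all_runs = [t for s in sessions for t in s]
warm_runs = [t for s in sessions for t in s[1:]]  # drop first run per session

print(statistics.variance(all_runs) > statistics.variance(warm_runs))  # True
```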

Other findings:
 - I'm surprised by the spread of DTD results.
 - It seems that at least in this scenario l10n tech. contributes little to overall time cost of the window loading: 2.22% for DTD, 4.86% for L20n.
 - the current code makes the L20n solution freeze the UI for the duration of compilation. DTD freezes the UI as well, but for much less time.
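One plausible way to arrive at such percentages (my assumption, not necessarily the exact formula used) is to treat the inline build as the zero-i18n baseline; with the means reported above, this reproduces the 2.22% DTD figure:

```python
def l10n_overhead_pct(tech_mean_ms, inline_mean_ms):
    """Share of window-load time attributable to the l10n technology,
    using the inline (no-i18n) build as the baseline."""
    return 100.0 * (tech_mean_ms - inline_mean_ms) / inline_mean_ms

print(round(l10n_overhead_pct(1089.96, 1066.24), 2))  # DTD -> 2.22
```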

I'll rerun the test once I have the l20n patch updated to latest feedback and compare it with what I have now.
It would be awesome if someone could take a peek at the test and review its value (taras? :)).
Keywords: meta, perf
Component: Tracking → Localization
Component: Localization → General
Product: Core → L20n
Seven years later, we're making another attempt to refactor our l10n layer.

The new tracking bug is bug 1365426 and I'll mark the previous effort as "INCOMPLETE".
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → INCOMPLETE