Closed Bug 1527879 Opened 7 months ago Closed 6 months ago

localeCompare is very slow due to a lot of main thread I/O

Categories

(Core :: JavaScript: Internationalization API, defect, P2)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- wontfix
firefox66 --- wontfix
firefox67 --- wontfix
firefox68 --- fixed

People

(Reporter: florian, Assigned: anba)

Details

(Whiteboard: [fxperf:p2][qf:p2:responsiveness])

Attachments

(1 file, 1 obsolete file)

This subset of a startup profile https://perfht.ml/2DRNa0O shows that we spend almost 2s on the main thread at the end of the asynchronous initialization of the search service to sort the list of available engines using localeCompare.

In this profile, the mainthread I/O markers show that we do 100+ attempts to access files from the mainthread.

Stacks look like:

compareEngines [jar:file:///C:/Program%20Files/Firefox%20Nightly/omni.ja!/components/nsSearchService.js:3659:0]
localeCompare [self-hosted:5980:0]
js::RunScript(JSContext *,js::RunState &) [xul.dll]
static bool Interpret(struct JSContext *, class js::RunState & const) [xul.dll]
static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const) [xul.dll]
js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [xul.dll]
js::intl_CompareStrings(JSContext *,unsigned int,JS::Value *) [xul.dll]
static struct UCollator * NewUCollator(struct JSContext *, class JS::Handle<js::CollatorObject *>) [xul.dll]
js::intl::GetInternalsObject(JSContext *,JS::Handle<JSObject *>) [xul.dll]
js::CallSelfHostedFunction(JSContext *,JS::Handle<js::PropertyName *>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [xul.dll]
js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [xul.dll]
static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const) [xul.dll]
js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [xul.dll]
js::RunScript []
getInternals [self-hosted:2094:0]
resolveCollatorInternals [self-hosted:1261:0]
ResolveLocale [self-hosted:1917:0]
collatorSortCaseFirstDefault [self-hosted:1366:0]
js::RunScript(JSContext *,js::RunState &) [xul.dll]
static bool Interpret(struct JSContext *, class js::RunState & const) [xul.dll]
static bool InternalCall(struct JSContext *, const class js::AnyInvokeArgs & const) [xul.dll]
js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [xul.dll]
js::intl_isUpperCaseFirst(JSContext *,unsigned int,JS::Value *) [xul.dll]
js::intl::SharedIntlData::isUpperCaseFirst(JSContext *,JS::Handle<JSString *>,bool *) [xul.dll]
js::intl::SharedIntlData::ensureUpperCaseFirstLocales(JSContext *) [xul.dll]
ucol_open_63 [xul.dll]
icu_63::Collator::createInstance(icu_63::Locale const &,UErrorCode &) [xul.dll]
icu_63::CollationLoader::loadTailoring(icu_63::Locale const &,UErrorCode &) [xul.dll]
icu_63::UnifiedCache::get<icu_63::CollationCacheEntry>(icu_63::CacheKey<icu_63::CollationCacheEntry> const &,void const *,icu_63::CollationCacheEntry const * &,UErrorCode &) [xul.dll]
icu_63::UnifiedCache::_get(icu_63::CacheKeyBase const &,icu_63::SharedObject const * &,void const *,UErrorCode &) [xul.dll]
const struct icu_63::CollationCacheEntry * icu_63::LocaleCacheKey<icu_63::CollationCacheEntry>::createObject(const void *, UErrorCode & const) [xul.dll]
ures_openWithType(UResourceBundle *,char const *,char const *,UResOpenType,UErrorCode *) [xul.dll]
static struct UResourceDataEntry * findFirstExisting(const char *, char *, char *, char *, char *, UErrorCode *) [xul.dll]
static struct UResourceDataEntry * init_entry(const char *, const char *, UErrorCode *) [xul.dll]
doOpenChoice(char const *,char const *,char const *,signed char (*)(void *,char const *,char const *,UDataInfo const *),void *,UErrorCode *) [xul.dll]
static struct UDataMemory * doLoadFromIndividualFiles(const char *, const char *, const char *, const char *, const char *, const char *,  *, void *, UErrorCode *, UErrorCode *) [xul.dll]
uprv_mapFile_63 [xul.dll]
CreateFileA [KERNELBASE.dll]

I suspect this behavior of being Windows-only (at least I couldn't reproduce on Mac).

Here is the output I get when enabling the printf at: https://searchfox.org/mozilla-central/rev/01b4b3830ea3cae2e9e431019afa6391b471c6da/intl/icu/source/common/udata.cpp#1003

UDATA: trying individual file .\icudt63l\cnvalias.icu
UDATA: trying individual file .\icudt63l\uts46.nrm
UDATA: trying individual file .\icudt63l\likelySubtags.res
UDATA: trying individual file .\icudt63l\windowsZones.res
UDATA: trying individual file .\icudt63l\zoneinfo64.res
UDATA: trying individual file .\icudt63l\keyTypeData.res
UDATA: trying individual file .\icudt63l\timezoneTypes.res
UDATA: trying individual file .\icudt63l\coll\ucadata.icu
UDATA: trying individual file .\icudt63l\coll\en_US.res
UDATA: trying individual file .\icudt63l\coll\en.res
UDATA: trying individual file .\icudt63l\coll\root.res
UDATA: trying individual file .\icudt63l\coll\res_index.res
UDATA: trying individual file .\icudt63l\res_index.res
UDATA: trying individual file .\icudt63l\coll\af.res
UDATA: trying individual file .\icudt63l\coll\am.res
UDATA: trying individual file .\icudt63l\coll\ar.res
UDATA: trying individual file .\icudt63l\coll\as.res
UDATA: trying individual file .\icudt63l\coll\az.res
UDATA: trying individual file .\icudt63l\coll\be.res
UDATA: trying individual file .\icudt63l\coll\bg.res
UDATA: trying individual file .\icudt63l\coll\bn.res
UDATA: trying individual file .\icudt63l\coll\bo.res
UDATA: trying individual file .\icudt63l\coll\bs.res
UDATA: trying individual file .\icudt63l\coll\bs_Cyrl.res
UDATA: trying individual file .\icudt63l\coll\ca.res
UDATA: trying individual file .\icudt63l\coll\chr.res
UDATA: trying individual file .\icudt63l\coll\cs.res
UDATA: trying individual file .\icudt63l\coll\cy.res
UDATA: trying individual file .\icudt63l\coll\da.res
UDATA: trying individual file .\icudt63l\coll\de.res
UDATA: trying individual file .\icudt63l\coll\de_AT.res
UDATA: trying individual file .\icudt63l\coll\dsb.res
UDATA: trying individual file .\icudt63l\coll\dz.res
UDATA: trying individual file .\icudt63l\coll\ee.res
UDATA: trying individual file .\icudt63l\coll\el.res
UDATA: trying individual file .\icudt63l\coll\en_US_POSIX.res
UDATA: trying individual file .\icudt63l\coll\eo.res
UDATA: trying individual file .\icudt63l\coll\es.res
UDATA: trying individual file .\icudt63l\coll\et.res
UDATA: trying individual file .\icudt63l\coll\fa.res
UDATA: trying individual file .\icudt63l\coll\fa_AF.res
UDATA: trying individual file .\icudt63l\coll\fi.res
UDATA: trying individual file .\icudt63l\coll\fil.res
UDATA: trying individual file .\icudt63l\coll\fo.res
UDATA: trying individual file .\icudt63l\coll\fr.res
UDATA: trying individual file .\icudt63l\coll\fr_CA.res
UDATA: trying individual file .\icudt63l\coll\ga.res
UDATA: trying individual file .\icudt63l\coll\gl.res
UDATA: trying individual file .\icudt63l\coll\gu.res
UDATA: trying individual file .\icudt63l\coll\ha.res
UDATA: trying individual file .\icudt63l\coll\haw.res
UDATA: trying individual file .\icudt63l\coll\he.res
UDATA: trying individual file .\icudt63l\coll\hi.res
UDATA: trying individual file .\icudt63l\coll\hr.res
UDATA: trying individual file .\icudt63l\coll\hsb.res
UDATA: trying individual file .\icudt63l\coll\hu.res
UDATA: trying individual file .\icudt63l\coll\hy.res
UDATA: trying individual file .\icudt63l\coll\id.res
UDATA: trying individual file .\icudt63l\coll\ig.res
UDATA: trying individual file .\icudt63l\coll\is.res
UDATA: trying individual file .\icudt63l\coll\it.res
UDATA: trying individual file .\icudt63l\coll\ja.res
UDATA: trying individual file .\icudt63l\coll\ka.res
UDATA: trying individual file .\icudt63l\coll\kk.res
UDATA: trying individual file .\icudt63l\coll\kl.res
UDATA: trying individual file .\icudt63l\coll\km.res
UDATA: trying individual file .\icudt63l\coll\kn.res
UDATA: trying individual file .\icudt63l\coll\ko.res
UDATA: trying individual file .\icudt63l\coll\kok.res
UDATA: trying individual file .\icudt63l\coll\ku.res
UDATA: trying individual file .\icudt63l\coll\ky.res
UDATA: trying individual file .\icudt63l\coll\lb.res
UDATA: trying individual file .\icudt63l\coll\lkt.res
UDATA: trying individual file .\icudt63l\coll\ln.res
UDATA: trying individual file .\icudt63l\coll\lo.res
UDATA: trying individual file .\icudt63l\coll\lt.res
UDATA: trying individual file .\icudt63l\coll\lv.res
UDATA: trying individual file .\icudt63l\coll\mk.res
UDATA: trying individual file .\icudt63l\coll\ml.res
UDATA: trying individual file .\icudt63l\coll\mn.res
UDATA: trying individual file .\icudt63l\coll\mr.res
UDATA: trying individual file .\icudt63l\coll\ms.res
UDATA: trying individual file .\icudt63l\coll\mt.res
UDATA: trying individual file .\icudt63l\coll\my.res
UDATA: trying individual file .\icudt63l\coll\nb.res
UDATA: trying individual file .\icudt63l\coll\ne.res
UDATA: trying individual file .\icudt63l\coll\nl.res
UDATA: trying individual file .\icudt63l\coll\nn.res
UDATA: trying individual file .\icudt63l\coll\om.res
UDATA: trying individual file .\icudt63l\coll\or.res
UDATA: trying individual file .\icudt63l\coll\pa.res
UDATA: trying individual file .\icudt63l\coll\pl.res
UDATA: trying individual file .\icudt63l\coll\ps.res
UDATA: trying individual file .\icudt63l\coll\pt.res
UDATA: trying individual file .\icudt63l\coll\ro.res
UDATA: trying individual file .\icudt63l\coll\ru.res
UDATA: trying individual file .\icudt63l\coll\se.res
UDATA: trying individual file .\icudt63l\coll\si.res
UDATA: trying individual file .\icudt63l\coll\sk.res
UDATA: trying individual file .\icudt63l\coll\sl.res
UDATA: trying individual file .\icudt63l\coll\smn.res
UDATA: trying individual file .\icudt63l\coll\sq.res
UDATA: trying individual file .\icudt63l\coll\sr.res
UDATA: trying individual file .\icudt63l\coll\sr_Latn.res
UDATA: trying individual file .\icudt63l\coll\sv.res
UDATA: trying individual file .\icudt63l\coll\sw.res
UDATA: trying individual file .\icudt63l\coll\ta.res
UDATA: trying individual file .\icudt63l\coll\te.res
UDATA: trying individual file .\icudt63l\coll\th.res
UDATA: trying individual file .\icudt63l\coll\tk.res
UDATA: trying individual file .\icudt63l\coll\to.res
UDATA: trying individual file .\icudt63l\coll\tr.res
UDATA: trying individual file .\icudt63l\coll\ug.res
UDATA: trying individual file .\icudt63l\coll\uk.res
UDATA: trying individual file .\icudt63l\coll\ur.res
UDATA: trying individual file .\icudt63l\coll\uz.res
UDATA: trying individual file .\icudt63l\coll\vi.res
UDATA: trying individual file .\icudt63l\coll\wae.res
UDATA: trying individual file .\icudt63l\coll\wo.res
UDATA: trying individual file .\icudt63l\coll\xh.res
UDATA: trying individual file .\icudt63l\coll\yi.res
UDATA: trying individual file .\icudt63l\coll\yo.res
UDATA: trying individual file .\icudt63l\coll\zh.res
UDATA: trying individual file .\icudt63l\coll\zh_Hant.res
UDATA: trying individual file .\icudt63l\coll\zu.res

None of these files seem to exist on the disk, so I suspect this is a bit of leftover code that was half disabled (by not shipping the related resource files).

This is also a visible hang on BHR data.

We could just hard-code the two locales (Danish and Maltese) with upper-case first collation and only use the "intl_isUpperCaseFirst" function to assert our hard-coded assumption still holds.

var intl_Collator_availableLocales = getSelfHostedValue("intl_Collator_availableLocales");
var intl_isUpperCaseFirst = getSelfHostedValue("intl_isUpperCaseFirst");
print(Object.keys(intl_Collator_availableLocales()).filter(loc => intl_isUpperCaseFirst(loc))); // Prints "da,mt"

(In reply to Florian Quèze [:florian] from comment #1)

None of these files seem to exist on the disk, so I suspect this is a bit of leftover code that was half disabled (by not shipping the related resource files).

The files should be compiled into "config/external/icu/data/icudt63l.dat".

Component: JavaScript Engine → JavaScript: Internationalization API

Waldo would be the right person handle this issue. It definitely sounds that we could cache the fact that these files are missing instead of querying it all the time.

Flags: needinfo?(jwalden)
Priority: -- → P2
Whiteboard: [fxperf][qf] → [fxperf][qf:p2:responsiveness]

This seems like it'd help with start-up performance for a good chunk of our users - especially the ones with spinning disks.

Hey Waldo, would you have any time to look at this?

Whiteboard: [fxperf][qf:p2:responsiveness] → [fxperf:p2][qf:p2:responsiveness]

Possibly. I actually started looking at this a week or two ago, then Phabricator happened and I had to upend my life for a bit, plus deal with an unrelated landing that went awry. I may be able to look at this this week or so.

What locales do I have to enable to be able to reproduce this? I guess the better, more general, question is, "What do I have to do to reproduce this on a local build?"

Thanks!

(In reply to Will Hawkins from comment #7)

What locales do I have to enable to be able to reproduce this? I guess the better, more general, question is, "What do I have to do to reproduce this on a local build?"

Nothing special, a normal en-US nightly build on Windows reproduces. This is a startup profile with mainthreadio markers captured on today's nightly: https://perfht.ml/2CdhZx2

To profile startup with mainthreadio markers, you need to start the build with these environment variables:
MOZ_PROFILER_STARTUP=1
MOZ_PROFILER_STARTUP_ENTRIES=10000000
MOZ_PROFILER_STARTUP_FEATURES=js,stackwalk,responsiveness,mainthreadio

I recommend using a recent build including the fix from bug 1533775 to see the name of the files in the profile.

(In reply to Florian Quèze [:florian] from comment #8)

(In reply to Will Hawkins from comment #7)

What locales do I have to enable to be able to reproduce this? I guess the better, more general, question is, "What do I have to do to reproduce this on a local build?"

Nothing special, a normal en-US nightly build on Windows reproduces. This is a startup profile with mainthreadio markers captured on today's nightly: https://perfht.ml/2CdhZx2

To profile startup with mainthreadio markers, you need to start the build with these environment variables:
MOZ_PROFILER_STARTUP=1
MOZ_PROFILER_STARTUP_ENTRIES=10000000
MOZ_PROFILER_STARTUP_FEATURES=js,stackwalk,responsiveness,mainthreadio

I recommend using a recent build including the fix from bug 1533775 to see the name of the files in the profile.

Thanks! I am rebuilding now. I have a patch that I think implements the cache and may solve this problem. I am going to test as soon as the build completes.

Attached patch 1527879.patch (obsolete) — Splinter Review

This is a very basic (untested) patch that builds a cache of files that have been found to not exist when we try to access them. It includes a pre-emptive strike where the directory itself is found not to exist. I have not tested this patch -- I am waiting for try access to build it on Windows. If anyone wants to apply and test, please feel free to do so! I'd love to hear the results. Also, feedback welcome. I am especially interested in whether we are keen on the idea of using posix functions in this code.

Thanks!
Will

Is this possibly an ICU bug? We never want to load any ICU data from "individual files" (doLoadFromIndividualFiles function), but instead all data should be loaded from the "common data file" (doLoadFromCommonData function).

Loading individual files requires that a data path is set. And in a Linux build (shell and browser), the dataPath variable is set to "", which means no individual files will be loaded. The data path is initially set to "" in dataDirectoryInitFn. But look what's there: On Windows, dataPath is set to ".\\", which is not the empty string and therefore the load individual files condition is true.

Can someone try to call u_setDataDirectory("") before u_init() in JS::detail::InitWithFailureDiagnostic?

(In reply to André Bargull [:anba] from comment #11)

Loading individual files requires that a data path is set. And in a Linux build (shell and browser), the dataPath variable is set to "", which means no individual files will be loaded. The data path is initially set to "" in dataDirectoryInitFn. But look what's there: On Windows, dataPath is set to ".\\", which is not the empty string and therefore the load individual files condition is true.

Interesting, that looks like a regression introduced in bug 1353650 when ICU was updated.

Build with change mentioned in comment #11 doesn't show any load attempts for collation data: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fc5a7e0b6408a8e675573eddfed837b763ade2c

I'm going to file an ICU bug report and later post a patch here.

(In reply to André Bargull [:anba] from comment #13)

Build with change mentioned in comment #11 doesn't show any load attempts for collation data: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0fc5a7e0b6408a8e675573eddfed837b763ade2c

I also tested locally with the added u_setDataDirectory("") call, and I no longer see any icu main thread I/O in profiles :-).

I'm going to file an ICU bug report and later post a patch here.

Thanks!

(In reply to Florian Quèze [:florian] from comment #14)

(In reply to André Bargull [:anba] from comment #13)
I also tested locally with the added u_setDataDirectory("") call, and I no longer see any icu main thread I/O in profiles :-).

Thanks for confirming!

I'm going to file an ICU bug report and later post a patch here.

Thanks!

And filed https://unicode-org.atlassian.net/browse/ICU-20491.

And change the shell to use JS_InitWithFailureDiagnostic instead of plain
JS_Init, so any ICU data loading errors will be printed to stderr instead of
simply exiting the application.

What's the status here? Is it blocked on the review, or do you want to take the upstream patch (https://github.com/unicode-org/icu/pull/535) instead?

Flags: needinfo?(andrebargull)

Redirecting the NI to Waldo.

The third option is to wait for the ICU update (bug 1533481), which will also include the upstream patch.

Flags: needinfo?(andrebargull)
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Attachment #9050177 - Attachment is obsolete: true

Pushed by ccoroiu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6bfe0c0b681f
Explicitly set ICU's data directory to stop ICU from trying to load individual files on Windows. r=jwalden

Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Do we want to uplift this fix to 67 Beta?

Since this bug is priority P2, I assume we won't uplift it. OTOH, we still have six weeks of test time remaining in the 67 Beta cycle.

OS: Unspecified → Windows

I still see the first localeCompare call beinig very slow on Windows startup: https://perfht.ml/2WFKefH (several seconds in this profile). I'm not sure if there's any main thread I/O left of if it's just some CPU intensive that's blocking. André, could you please have a look at this profile to see if something isn't behaving as expected?

Flags: needinfo?(andrebargull)

I got an even slower profile today; I wonder if this is a recent regression: https://perfht.ml/2WLJRAy

Filed bug 1564093 to address the main-thread I/O issue.

Flags: needinfo?(andrebargull)
You need to log in before you can comment on or make changes to this bug.