Closed
Bug 926980
Opened 11 years ago
Closed 9 years ago
Load mac ICU data from a separate file so that we don't have two copies of it in universal builds
Categories
(Firefox Build System :: General, defect, P2)
Tracking
(firefox27+ wontfix)
RESOLVED
FIXED
People
(Reporter: benjamin, Assigned: ted)
References
Details
(Whiteboard: p=8)
Attachments
(1 obsolete file)
The ICU data is currently being compiled into the mozjs binary. This is probably fine on windows/linux, but on mac it means we're keeping two copies of the data. We should instead build the data as a separate .dat file which is loaded dynamically (ICU already has support for this, so this is primarily a plumbing question).
Updated•11 years ago
|
OS: Linux → Mac OS X
Hardware: x86_64 → x86
Comment 1•11 years ago
|
||
This will negatively impact the FF27 download size, if left unresolved.
tracking-firefox27:
--- → ?
Updated•11 years ago
|
status-firefox27:
--- → affected
Comment 2•11 years ago
|
||
Ugh. Apparently I've been glancing over the notification emails about this bug because the subject line doesn't seem relevant.
Anyway, I don't think this should be tracking 27 because a beta uplift for the eventual fix is IMO too risky. I will also challenge the importance of fixing this. While having the ICU data packaged twice is certainly suboptimal, I suspect compression in the DMG will minimize the impact on download size. However, I haven't verified this.
Can we bump the tracking down to a newer version? I can try to obtain size data tomorrow (it's almost midnight now) if you need it to make a decision.
Comment 3•11 years ago
|
||
(In reply to Gregory Szorc [:gps] from comment #2)
> Ugh. Apparently I've been glancing over the notification emails about this
> bug because the subject line doesn't seem relevant.
>
> Anyway, I don't think this should be tracking 27 because a beta uplift for
> the eventual fix is IMO too risky. I will also challenge the importance of
> fixing this. While having the ICU data packaged twice is certainly
> suboptimal, I suspect compression in the DMG will minimize the impact on
> download size. However, I haven't verified this.
>
> Can we bump the tracking down to a newer version? I can try to obtain size
> data tomorrow (it's almost midnight now) if you need it to make a decision.
Hi :gps,
Have you had a chance to check on the compare the sizes yet ? Would our couple of FF27 Beta's that have been spun give you any helpful data to help here ?
Flags: needinfo?(gps)
Comment 4•11 years ago
|
||
Beta builds don't include Intl or ICU, because of bug 924839 and attachment 8345073 [details] [diff] [review] (which disables Intl in beta/release builds), so beta is unaffected by any duplication in play here.
Comment 5•11 years ago
|
||
Double-checked this -- a build from ftp://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/27.0b4-candidates/build1/linux-x86_64/en-US/ gives "ReferenceError: Intl is not defined" in the console, so no effect at all from this on beta.
Flags: needinfo?(gps)
Reporter | ||
Updated•11 years ago
|
Priority: -- → P2
Comment 6•11 years ago
|
||
wontfixing this given the investigation in comment #4 and comment #5. Please re-open if needed.
Comment 7•11 years ago
|
||
Has there been any decision on whether this is desirable? After ICU landed, we can't link XUL on OS X/ppc because the .dat file bulges things by larger than the maximum displacement of a branch instruction (16MB), so we have to build --without-intl-api for now. Using an external .dat file would probably solve the problem.
If there is little interest, we can implement this ourselves; I just want to know where the best place to hack it into the build system might be.
Comment 8•11 years ago
|
||
You can probably build with MOZ_SHARED_ICU=1.
Comment 9•11 years ago
|
||
I didn't know that was still possible. Thanks!
Updated•11 years ago
|
Blocks: fxdesktopbacklog
Whiteboard: p=0
Updated•11 years ago
|
Status: NEW → ASSIGNED
Whiteboard: p=0 → p=13 s=it-31c-30a-29b.2
Updated•11 years ago
|
Whiteboard: p=13 s=it-31c-30a-29b.2 → p=13
Reporter | ||
Updated•11 years ago
|
Assignee: gps → nobody
Reporter | ||
Updated•11 years ago
|
Whiteboard: p=13 → p=8
Updated•11 years ago
|
Status: ASSIGNED → NEW
Updated•11 years ago
|
No longer blocks: fxdesktopbacklog
Flags: firefox-backlog+
Comment 10•10 years ago
|
||
Still there are some regressions in cpp unit tests though, with separated ICU data file, I confirmed filesize decrease both in dmg (2.5MB) and extracted total files (9.8MB):
file | original | patched | increase/decrease
-------------+-------------+-------------+---------------------------
dmg | 98,300,400 | 95,656,651 | - 2,643,749 (2.5MB, 2.7%)
| | |
XUL | 260,032,832 | 239,332,864 | - 20,699,968 (19.7MB, 8%)
icudt52l.dat | - | 10,386,304 | + 10,386,304 (9.9MB)
patched
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a21bdc998376
original
https://treeherder.mozilla.org/ui/#/jobs?repo=mozilla-central&revision=0500f1032ea1
Also, confirmed that icudt52l.dat works well both with i386 and x86_64, not sure ppc's case ("l" in icudt52l means "little endian"), but it's no more supported and doesn't matter, right?
Comment 11•10 years ago
|
||
We may not be a tier-1 port anymore, but we're definitely still out there making PowerPC versions for OS X (TenFourFox and Tenfourbird). If there's an endian problem we'll work around it; just don't break us unnecessarily!
Comment 12•10 years ago
|
||
(In reply to Cameron Kaiser [:spectre] from comment #11)
> We may not be a tier-1 port anymore, but we're definitely still out there
> making PowerPC versions for OS X (TenFourFox and Tenfourbird). If there's an
> endian problem we'll work around it; just don't break us unnecessarily!
Oops, sorry for my ignorance, and thank you for letting me know that.
Currently the patch has configure option to disable separated ICU data file (--with-icu-data-archive=no).
Also, it could be possible to store two or more ICU data files with some minor tweak for build scripts,
since ICU API accepts a directory path for the ICU data file, instead of a individual file path.
Assignee | ||
Comment 13•9 years ago
|
||
I looked into this in the context of bug 1239083 and ICU is very flexible here:
http://userguide.icu-project.org/icudata
We could ship the .dat file as a file and point ICU at the directory containing it, or we could bundle it in omni.ja and call udata_setCommonData to pass a pointer to the data blob in memory.
Comment 14•9 years ago
|
||
should we apply same change to other OS/arch at once in this bug?
or apply to OSX first and do others in separated bug?
Flags: needinfo?(ted)
Assignee | ||
Comment 15•9 years ago
|
||
I'm going to look at making this change for all platforms.
Assignee: nobody → ted
Flags: needinfo?(ted)
Assignee | ||
Comment 16•9 years ago
|
||
Assignee | ||
Comment 17•9 years ago
|
||
Okay, I've got a WIP patch that works on Linux, that Try push is to see how it fares on other platforms. This patch configures ICU --with-data-packaging=archive, which builds an "icudt52.dat" file containing the data instead of linking it into a shared library. This works, but I don't know if we'd rather ship it in omni.ja or not.
https://hg.mozilla.org/try/rev/7754651a2957
Assignee | ||
Comment 18•9 years ago
|
||
Assignee | ||
Comment 19•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/33241/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/33241/
Attachment #8714787 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 20•9 years ago
|
||
Assignee | ||
Comment 21•9 years ago
|
||
Comment on attachment 8714787 [details]
MozReview Request: bug 926980 - load ICU data from an archive file. r?glandium,waldo
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/33241/diff/1-2/
Assignee | ||
Comment 22•9 years ago
|
||
I had mucked up the package-manifest in the first Try run, the second one looks slightly better, but it's still broken packaging on Mac, probably because Mac needs special handling for app bundles.
Comment 23•9 years ago
|
||
Comment on attachment 8714787 [details]
MozReview Request: bug 926980 - load ICU data from an archive file. r?glandium,waldo
https://reviewboard.mozilla.org/r/33241/#review30075
You'll also want some other reviewers for the C++ code parts once you're all done with the FIXMEs and TODOs.
::: build/autoconf/icu.m4:82
(Diff revision 2)
> + ICU_DATA_FILE="icudt${version}l.dat"
This should only be set when not building against system ICU, I guess...
BTW, tell me if I got this right: we still end up with a icudt/icudata dll, which itself loads the .dat file?
::: config/external/icu/Makefile.in:22
(Diff revision 2)
> + #FIXME
#FIXME :)
::: js/src/shell/js.cpp:6981
(Diff revision 2)
> +#if EXPOSE_INTL_API
> + std::string binpath = sArgv[0];
> +#if defined(XP_WIN)
> + char* bindir = &binpath[0];
> + PathRemoveFileSpecA(bindir);
> +#else
> + char* bindir = dirname(&binpath[0]);
> +#endif
> + u_setDataDirectory(bindir);
> +#endif
This should presumably be conditional on not building against system ICU.
::: js/src/tests/lib/jittests.py:587
(Diff revision 2)
> - 'libplc4.so', 'libplds4.so']
> + 'libplc4.so', 'libplds4.so', 'icudt56l.dat']
This will require a manual change for every icu version bump :(
Attachment #8714787 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 24•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review30075
> This should only be set when not building against system ICU, I guess...
>
> BTW, tell me if I got this right: we still end up with a icudt/icudata dll, which itself loads the .dat file?
ICU has a "stubdata" library that it builds when you build --with-data-packaging=archive:
https://dxr.mozilla.org/mozilla-central/source/intl/icu/source/stubdata/stubdata.c
The other libraries want to link against that `icudt56` symbol, so it provides it, but then they fall back to loading from the data file. Kind of janky. Chromium has a little patch to link stubdata.c into the icu common library:
https://chromium.googlesource.com/chromium/deps/icu/+/42c58d4e49f2250039f0e98d43e0b76e8f5ca024/icu.gyp#344
...although I'm not really sure what configuration they're shipping.
> #FIXME :)
AFAIK we don't build this codepath right now (non-shared-ICU on Windows), and the patch in bug 1239083 removes this entire Makefile anyway. Do you mind if I just leave this broken?
> This will require a manual change for every icu version bump :(
Yeah, this sucks. We could do some hackery to embed the data into the js shell binary, that might make life simpler there.
Assignee | ||
Comment 25•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review30075
> AFAIK we don't build this codepath right now (non-shared-ICU on Windows), and the patch in bug 1239083 removes this entire Makefile anyway. Do you mind if I just leave this broken?
I fixed this by turning this into an $(error). I plan to try to remove the shared-ICU configuration entirely, which will get rid of the bits here we're not using and simplify this.
> Yeah, this sucks. We could do some hackery to embed the data into the js shell binary, that might make life simpler there.
I looked into this, and there's no way to override this filename without patching ICU, it's hardcoded in a header:
https://dxr.mozilla.org/mozilla-central/rev/584870f1cbc5d060a57e147ce249f736956e2b62/intl/icu/source/common/unicode/utypes.h#137
I think we can do something smarter here like linking the ICU data into the shell binary, but I'd rather try to fix that as part of bug 1239083 where I don't have to work around ICU's build system.
Assignee | ||
Comment 26•9 years ago
|
||
Assignee | ||
Comment 27•9 years ago
|
||
New patch incoming shortly, I fixed the OS X universal build and made some changes to better support standalone JS. Notably, if we're building JS standalone, I made it so we continue building ICU as we currently are, so that the ICU data gets linked against the shell etc, so that people building just the shell or embedders don't have to worry about calling u_setDataDirectory to make things work. There's a variable MOZ_ICU_DATA_ARCHIVE that indicates whether we're using a separate ICU data archive file or not.
I also gave in and added the u_setDataDirectory to jsapi-tests' main() to make that work for now. I would like to make things less bad, but like I said in my previous comment I would rather tackle that as part of the moz.build work because it'll be easier.
Comment 28•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review30501
::: js/src/shell/js.cpp:6989
(Diff revision 2)
> + u_setDataDirectory(bindir);
I'm not happy with forcing embedders to call u_setDataDirectory themselves. We've never required embedders to call any ICU functions themselves. Instead, we provide such operations through our own JSAPI functions that call them, e.g. JS_SetICUMemoryFunctions. This also lets us document where/when these functions should be called, e.g. JS_SetICUMemoryFunctions can *only* be called before JS_Init.
Here, it looks like u_setDataDirectory should be performed by JS_Init: a called-once function that every embedder must call to use the JS engine. But to be honest, I don't really understand why this sort of thing has to be done by the embedder at all. ICU is part of SpiderMonkey and is managed/maintained by it. Why isn't that also true of the data file? It seems like JS_Init should remain argument-less, and inside it should be calling u_setDataDirectory and passing the location of the SpiderMonkey library as an argument to it, then finding the sibling data file in the directory, more or less. I'd really prefer we didn't foist data-finding responsibilities on every embedder, if at all possible.
Also regarding adding this to JS_Init: by not adding it there, you're missing all the other JS_Init users. In particular, it seems very likely that by not changing how jsapi-tests are set up, you've broken js/src/jsapi-tests/testIntlAvailableLocales.cpp (run as $objdir/js/src/jsapi-tests/jsapi-tests testIntlAvailableLocales, if memory serves). Make sure to give that a shot locally before trying to run with this.
::: js/src/tests/lib/jittests.py:587
(Diff revision 2)
> - 'libplc4.so', 'libplds4.so']
> + 'libplc4.so', 'libplds4.so', 'icudt56l.dat']
Add this version-bump change requirement to the NOTE at the end of intl/update-icu.sh, so that people have a fighting chance of running across it during an upgrade.
Assignee | ||
Comment 29•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review30501
> I'm not happy with forcing embedders to call u_setDataDirectory themselves. We've never required embedders to call any ICU functions themselves. Instead, we provide such operations through our own JSAPI functions that call them, e.g. JS_SetICUMemoryFunctions. This also lets us document where/when these functions should be called, e.g. JS_SetICUMemoryFunctions can *only* be called before JS_Init.
>
> Here, it looks like u_setDataDirectory should be performed by JS_Init: a called-once function that every embedder must call to use the JS engine. But to be honest, I don't really understand why this sort of thing has to be done by the embedder at all. ICU is part of SpiderMonkey and is managed/maintained by it. Why isn't that also true of the data file? It seems like JS_Init should remain argument-less, and inside it should be calling u_setDataDirectory and passing the location of the SpiderMonkey library as an argument to it, then finding the sibling data file in the directory, more or less. I'd really prefer we didn't foist data-finding responsibilities on every embedder, if at all possible.
>
> Also regarding adding this to JS_Init: by not adding it there, you're missing all the other JS_Init users. In particular, it seems very likely that by not changing how jsapi-tests are set up, you've broken js/src/jsapi-tests/testIntlAvailableLocales.cpp (run as $objdir/js/src/jsapi-tests/jsapi-tests testIntlAvailableLocales, if memory serves). Make sure to give that a shot locally before trying to run with this.
I fully agree! As written, this patch doesn't require embedders to do anything extra. This `u_setDataDirectory` call is only required when building JS as part of Gecko. For standalone JS builds things will continue to work as they do now.
Making JS_Init do the right thing here is hard, mostly because there just isn't an easy way to get the directory in question, which is why I resorted to doing this somewhere with access to `argv[0]`. If you've got a suggestion that you think will work reliably I'm happy to implement that instead.
All that being said, I think I can get rid of these changes as part of bug 1239083--ICU is actually pretty flexible about its data storage, it's just hard to do in practice here when I also have to wrangle with the terrible ICU build system. Consider parts of this patch a half-measure until I finish the job there.
I will check that jsapi-tests test, but note that I did make the same change to jsapi-tests' `main`.
> Add this version-bump change requirement to the NOTE at the end of intl/update-icu.sh, so that people have a fighting chance of running across it during an upgrade.
Thanks, will do.
Assignee | ||
Updated•9 years ago
|
Attachment #8714787 -
Attachment description: MozReview Request: bug 926980 - WIP: load ICU data from an archive file. r?glandium → MozReview Request: bug 926980 - load ICU data from an archive file. r?glandium,waldo
Attachment #8714787 -
Flags: review?(mh+mozilla)
Attachment #8714787 -
Flags: review?(jwalden+bmo)
Assignee | ||
Comment 30•9 years ago
|
||
Comment on attachment 8714787 [details]
MozReview Request: bug 926980 - load ICU data from an archive file. r?glandium,waldo
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/33241/diff/2-3/
Assignee | ||
Comment 31•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review30501
> I fully agree! As written, this patch doesn't require embedders to do anything extra. This `u_setDataDirectory` call is only required when building JS as part of Gecko. For standalone JS builds things will continue to work as they do now.
>
> Making JS_Init do the right thing here is hard, mostly because there just isn't an easy way to get the directory in question, which is why I resorted to doing this somewhere with access to `argv[0]`. If you've got a suggestion that you think will work reliably I'm happy to implement that instead.
>
> All that being said, I think I can get rid of these changes as part of bug 1239083--ICU is actually pretty flexible about its data storage, it's just hard to do in practice here when I also have to wrangle with the terrible ICU build system. Consider parts of this patch a half-measure until I finish the job there.
>
> I will check that jsapi-tests test, but note that I did make the same change to jsapi-tests' `main`.
Oh, apparently I made the change to jsapi-tests but didn't update the review with that version of the patch.
Assignee | ||
Comment 32•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review30075
> I fixed this by turning this into an $(error). I plan to try to remove the shared-ICU configuration entirely, which will get rid of the bits here we're not using and simplify this.
Patch for removing MOZ_SHARED_ICU is in bug 1247396.
Comment 33•9 years ago
|
||
Comment on attachment 8714787 [details]
MozReview Request: bug 926980 - load ICU data from an archive file. r?glandium,waldo
https://reviewboard.mozilla.org/r/33241/#review31361
I think you should rebase on top of the MOZ_SHARED_ICU removal, instead of the other way around.
::: build/autoconf/icu.m4:112
(Diff revision 3)
> + dnl JS standalone so that embeddors don't have to deal with it.
embedders?
::: build/autoconf/icu.m4:115
(Diff revision 3)
> + #TODO: the l is actually endian-dependent
I bet gaston will come to fix this when it breaks his openbsd sparc builds.
::: build/autoconf/icu.m4:118
(Diff revision 3)
> + else
> + # In this case we won't link to the stubdata library.
> + ICU_LIB_NAMES="$ICU_LIB_NAMES $ICU_DATA_LIB"
You should test -n $JS_STANDALONE, so that this branch ends up close to the comment on lines 111-112.
::: build/autoconf/icu.m4:189
(Diff revision 3)
> + HOST_ICU_BUILD_OPTS="$HOST_ICU_BUILD_OPTS --with-data-packaging=archive"
All the host ICU part should go away.
Attachment #8714787 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 34•9 years ago
|
||
https://reviewboard.mozilla.org/r/33241/#review31361
I kinda didn't want to deal with that headache since I already had this patch mostly working. Is there value there aside from the patches winding up slightly simpler?
> I bet gaston will come to fix this when it breaks his openbsd sparc builds.
He's gonna be even more broken by bug 1239083, since we're only going to have a little-endian file in-tree. I guess we can just error in configure and require that big-endian builds use --with-system-icu or --disable-intl-api.
Comment 35•9 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #34)
> > I bet gaston will come to fix this when it breaks his openbsd sparc builds.
>
> He's gonna be even more broken by bug 1239083, since we're only going to
> have a little-endian file in-tree. I guess we can just error in configure
> and require that big-endian builds use --with-system-icu or
> --disable-intl-api.
Why can't there be a big-endian file in the tree as well? We have other big-endian NPOTB stuff.
Assignee | ||
Comment 36•9 years ago
|
||
No real reason other than it's an extra 10MB file and I don't know how to generate one without building on a big-endian machine currently.
Comment 37•9 years ago
|
||
I can probably do that in a separate bug if :gaston doesn't get to it first, if you're willing to accept it.
Assignee | ||
Comment 38•9 years ago
|
||
I think we could take it. That'd be part of bug 1239083. This patch probably just needs a tiny check to use 'b' or 'l' depending on target endianness. I punted on that because we didn't have an existing endianness check in configure anywhere.
Updated•9 years ago
|
Attachment #8714787 -
Flags: review?(jwalden+bmo)
Comment 39•9 years ago
|
||
Comment on attachment 8714787 [details]
MozReview Request: bug 926980 - load ICU data from an archive file. r?glandium,waldo
https://reviewboard.mozilla.org/r/33241/#review31849
::: build/autoconf/icu.m4:112
(Diff revision 3)
> + dnl JS standalone so that embeddors don't have to deal with it.
*grmbls and cantankers about the goofy "embeddor" spelling versus embedder*
::: js/src/jsapi-tests/tests.cpp:124
(Diff revision 3)
> +#endif
Okay. I guess we do have to foist work that's something like this onto every caller.
But, I still want the u_setDataDirectory call pushed into JS_Init. JSAPI's ability to provide a useful Intl API is implicated here. The JS engine provides Intl functionality via ICU. The JS engine initializes ICU. It's responsible for setting ICU's memory functions, for memory-reporting purposes. It should be responsible for making that call.
JS_Init should take a const char\* in all build configurations. Then, #ifdef MOZ_ICU_DATA_ARCHIVE, assert the pointer is non-null with an assertion-reason-string indicating the user didn't pass a data location and make the u_setDataDirectory call. Otherwise, or if !EXPOSE_INTL_API, assert the pointer is null. The u_setDataDirectory bits should go within the #if EXPOSE_INTL_API in JS_Init, directly before the u_init call.
Assignee | ||
Updated•9 years ago
|
Attachment #8714787 -
Attachment is obsolete: true
Assignee | ||
Comment 40•9 years ago
|
||
This patch got folded into the patch in bug 1239083.
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•