Closed Bug 1303091 Opened 4 years ago Closed 3 years ago

Share static Intl data across compartments

Categories

(Core :: JavaScript: Internationalization API, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: anba, Assigned: anba)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(2 files, 2 obsolete files)

bug 837961, comment #20:
> 30K is maybe borderline for memory consumption for this, on a per-global
> basis.  (At least if we don't know of an insane framework footgunning itself
> because of this, as always.)  Combine it with redoing a bunch of work every
> time a global does date/time formatting, and it seems like that probably
> pushes it past the point we should permit for too long.

> File a followup to move IntlData.js and IntlTzData.js into a single store in
> the runtime, please.  Should be fairly easy to fix, but let's get this
> featurework landed to not have reviewing bog down.
Assignee: nobody → andrebargull
Status: NEW → ASSIGNED
Attached patch bug1303091.patch (obsolete) — Splinter Review
Does this patch look at least somewhat correct? 


It gives us a 2x improvement for a simple newGlobal() µ-benchmark.

Without bug 837961:  2300-2400 ms
With bug 837961:  4900-5200 ms
With bug 837961 and patch:  2300-2500 ms

```
var t = Date.now();
for (var i = 0; i < 1000; ++i) {
  newGlobal().eval("Intl.DateTimeFormat");
}
print(Date.now() - t);
```
Attachment #8806749 - Flags: feedback?(jwalden+bmo)
Blocks: 1314354
Summary: Share static Intl data across runtimes → Share static Intl data across compartments
Comment on attachment 8806749 [details] [diff] [review]
bug1303091.patch

Review of attachment 8806749 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +1994,4 @@
>  {
> +    if ('a' <= c && c <= 'z')
> +        c &= ~0x20;
> +    return c;

Do it this way:

    return ('a' <= c && c <= 'z')
           ? (c & ~0x20)
           : c;

and make the function constexpr, then you can add

static_assert(ToUpperASCII('a') == 'A', "verifying 'a' uppercases correctly");
static_assert(ToUpperASCII('m') == 'M', "verifying 'm' uppercases correctly");
static_assert(ToUpperASCII('z') == 'Z', "verifying 'z' uppercases correctly");
static_assert(ToUpperASCII(u'a') == u'A', "verifying u'a' uppercases correctly");
static_assert(ToUpperASCII(u'k') == u'K', "verifying u'k' uppercases correctly");
static_assert(ToUpperASCII(u'z') == u'Z', "verifying u'z' uppercases correctly");

and have some compile-time checking of the algorithm.

@@ +2014,5 @@
> +{
> +    uint32_t hash = 0;
> +    for (size_t i = 0; i < length; i++) {
> +        hash = mozilla::AddToHash(hash, ToUpperASCII(s[i]));
> +    }

No bracing on single-line loop body.

@@ +2050,5 @@
> +        return EqualCharsIgnoreCaseASCII(lookup.latin1Chars, keyChars, lookup.length);
> +    return EqualCharsIgnoreCaseASCII(keyChars, lookup.twoByteChars, lookup.length);
> +}
> +
> +namespace TimeZones {

I think we tend to use lowercasing, so just timezones.

@@ +2079,2 @@
>      while (true) {
> +    next:

Labels are generally indented two spaces from this column, I think.

@@ +2089,5 @@
>              break;
>  
> +        // TODO: Do we save time when we first store the legacy names in a Hashset?
> +        for (size_t i = 0; i < mozilla::ArrayLength(TimeZones::legacyICUTimeZones); i++) {
> +            if (equal(rawTimeZone, TimeZones::legacyICUTimeZones[i]))

for (const auto& legacyTimeZone : TimeZones::legacyICUTimeZones) {
    if (equal(rawTimeZone, legacyTimeZone))
        goto next;
}

if a hashset isn't used.  Too bad we can't throw gperf at this.

Alternatively (and this is likely even more followupish), generalizing jsautokw.py some to work for this might not be unreasonable, either.

Moving this code into a separate function would be nice, to make the predicate question clearer in the algorithm, and to continue rather than goto.

@@ +2114,5 @@
> +        return false;
> +    }
> +
> +    for (size_t i = 0; i < mozilla::ArrayLength(TimeZones::timeZoneNamesNonICU); i++) {
> +        const char* rawTimeZone = TimeZones::timeZoneNamesNonICU[i];

for (const char* rawTimeZone : TimeZones::timeZoneNamesNonICU) {

@@ +2142,5 @@
> +    static_assert((mozilla::ArrayLength(TimeZones::timeZoneLinkNamesNonICU) & 1) == 0,
> +                  "Missing value in TimeZones::timeZoneLinkNamesNonICU array");
> +    for (size_t i = 0; i < mozilla::ArrayLength(TimeZones::timeZoneLinkNamesNonICU); i += 2) {
> +        const char* rawLinkName = TimeZones::timeZoneLinkNamesNonICU[i];
> +        const char* rawTarget = TimeZones::timeZoneLinkNamesNonICU[i + 1];

for (const LinkAndTarget& linkAndTarget : TimeZones::timeZoneLinkNamesNonICU) {
    const char* rawLinkName = linkAndTarget.link;
    const char* rawTarget = linkAndTarget.target;

@@ +2150,5 @@
> +        if (!linkName)
> +            return false;
> +
> +        MOZ_ASSERT(rawTarget != nullptr);
> +        TimeZoneName target = Atomize(cx, rawTarget, strlen(rawTarget));

I think multiple Atomize calls can GC, I think, so this is probably a GC hazard as written and needs the first of these to be a Rooted.  (And I'd kind of prefer if we just used Rooted for all of these, to avoid being too cutesy.)

@@ +2192,5 @@
> +bool
> +js::SharedIntlData::tryCanonicalizeNonICUTimeZoneName(JSContext* cx, HandleString timeZone,
> +                                                      MutableHandleString result)
> +{
> +    if (!initTimeZones(cx))

"init" should probably be "ensure" or somesuch.

@@ +2203,5 @@
> +    TimeZoneHasher::Lookup lookup(timeZoneFlat);
> +    MOZ_ASSERT(availableTimeZones.has(lookup), "Invalid time zone");
> +
> +    // ICU/CLDR doesn't map all names to their new IANA time zone names.
> +    // http://bugs.icu-project.org/trac/ticket/12044

SSLized would be nice.

@@ +2208,5 @@
> +    if (TimeZoneMap::Ptr p = timeZoneLinkNamesNonICU.lookup(lookup)) {
> +        // Case 1: ICU/CLDR maps the time zone to another time zone, e.g.
> +        // America/Virgin is mapped to America/St_Thomas, whereas IANA maps
> +        // America/Virgin to America/Port_of_Spain.
> +        // Only perform the update when ICU supports the new time zone.

This most assuredly should be a followup-land thing, as this is getting too messy on a deadline.

But, if ICU doesn't support the IANA target, should we be supporting the IANA name that links to it?  We'll return whatever ICU maps the name to (or just the name), and that's definitely not IANA behavior, and it's questionable whether it's good to knowingly deviate from what IANA says or not.

@@ +2225,5 @@
> +
> +void
> +js::SharedIntlData::mark(JSTracer* trc)
> +{
> +    // TODO: Does this need to be called and if yes, when?

I...am not sure off the top of my head where this needs to be called from.  :-\

JSCompartment::clearTables should probably have an assertion that the shared Intl data is all empty.

@@ +2268,5 @@
>  
> +    SharedIntlData& sharedIntlData = cx->sharedIntlData;
> +
> +    // Some time zone names are incorrectly canonicalized by ICU, handle those
> +    // first.

Run-on sentence: switch to a colon, maybe?

::: js/src/builtin/Intl.h
@@ +83,5 @@
> +     */
> +    TimeZoneSet availableTimeZones;
> +
> +    TimeZoneSet timeZoneNamesNonICU;
> +    TimeZoneMap timeZoneLinkNamesNonICU;

I puzzled this all out when you first wrote it, and why it was all needed/necessary/etc. and how it worked.  Then I apparently failed to comment on the whys of it, and to ask for explanations to be placed in comments.  :-(  So now I'm spending an unholy amount of time relearning how it all works and fits together.

So, this all desperately needs better comments.

First, let's make the existing comment by |availableTimeZones| into a general-information comment.  It's no longer really relevant to that set, precisely.  It should also discuss why we have all these sets and maps floating around, when naively one might expect just consulting ICU would be good enough.  How about:

"""
Information tracking the set of the supported time zone names, derived from the IANA time zone database <https://www.iana.org/time-zones>.

There are two kinds of IANA time zone names: Zone and Link (denoted as such in database source files). Zone names are the canonical, preferred name for a time zone, e.g. Asia/Kolkata. Link names simply refer to target Zone names for their meaning, e.g. Asia/Calcutta targets Asia/Kolkata. That a name is a Link doesn't *necessarily* reflect a sense of deprecation: some Link names also exist partly for convenience, e.g. UTC and GMT as Link names targeting the Zone name Etc/UTC.

Two data sources determine the time zone names we support: those ICU supports (described in intl/icu/source/data/misc), and IANA's zone information (described in intl/tzdata/source).  (We choose to include incomplete historical information for pre-1970 time zones, as found in IANA's backzone file.)

Unfortunately the names ICU and IANA support, and their Link relationships from name to target, aren't identical, so we can't simply implicitly trust ICU's name handling.  We must perform various preprocessing of user-provided zone names and post-processing of ICU-provided zone names to implement ECMA-402's IANA-consistent behavior.
"""

Now, adjacent to |availableTimeZones|:

"""
As a threshold matter, available time zones are those time zones ICU supports, via ucal_openTimeZones.  But ICU supports additional non-IANA time zones described in intl/icu/source/tools/tzcode/icuzones (listed in IntlTzData.cpp's |legacyICUTimeZones|) for its own backwards compatibility purposes.  This set consists of ICU's supported time zones, minus all backwards-compatibility time zones.
"""

Onward to |timeZoneNamesNonICU|.  This set consists of

  (IANA Zone time zones not present in ICU) +
  (zones that IANA considers as zones, but ICU considers as links to other zones)

On fresh read I don't think we need to track this first subset.  The context where we're using the overall set is in CanonicalizeTimeZoneName.  By construction, the argument we receive IsValidTimeZoneName.  If we look at how IsValidTimeZoneName is implemented now, it consults intl_availableTimeZones(), which is only names ICU reports via ucal_openTimeZones.  If we look at how it's implemented after this patch, it's consulting all time zones reported by ucal_openTimeZones minus ICU's legacy zones.  But we hae

  assert(typeof timeZone === "string", "CanonicalizeTimeZoneName");
  assert(timeZone === IsValidTimeZoneName(timeZone), "Time zone name not normalized");

as assertions present in CanonicalizeTimeZoneName.  So it must be the case that |timeZone| is something ICU recognizes, or else |IVTZN(timeZone) === null| which would fail the assert.  So why do we need to include this not-present subset in the overall set?  Seems like we should either mention the not-present zone names in a comment (so readers know which IANA time zones ICU doesn't support), or (preferably) add a DEBUG-only list and verify in DEBUG builds that CanonicalizeTimeZoneName never observes one of them.

Now, considering the other subset, which now makes a whole lot more sense of the "Case 2" comment in the code that was using this set.  I think this comment is adequate to document it:

"""
IANA treats some time zone names as Zones, that ICU instead treats as Links.  For example, IANA considers "America/Indiana/Indianapolis" to be a Zone and "America/Fort_Wayne" a Link that targets it, but ICU considers the former a Link that targets "America/Indianapolis" (which IANA treats as a Link).

ECMA-402 requires that we respect IANA data, so if we're asked to canonicalize a time zone name in this set, we must *not* return ICU's canonicalization.
"""

As I look at this comment, |timeZoneNamesNonICU| now seems pretty vague/unhelpful as a name.  How about |ianaZonesThatICUThinksAreLinks|?  Maybe?  Probably no way to avoid a word salad here, but better that than a pretty-generic name.

And for the final map, |timeZoneLinkNamesNonICU|.  This map consists of

  ((zone: target) for all IANA Link zones not present in ICU) +
  ((zone: target) for all IANA Link zones where IANA and ICU disagree as to target) +
  ((zone: target) for all IANA Link zones where ICU instead thinks the name is a Zone)

For the same reasons as for the previous set, I don't think we have to include the not-present subset here, and we should relegate it to a comment, or to DEBUG-only code to double-check impossibility.

Now, as for a comment documenting the map's meaning, with respect to its submaps:

"""
IANA treats some time zone names as Links to one target, that ICU instead treats as either Zones, or Links to different targets.  An example of the former is "Asia/Calcutta, which IANA assigns the target "Asia/Kolkata" but ICU considers its own Zone.  An example of the latter is "America/Virgin", which IANA assigns the target "America/Port_of_Spain" but ICU assigns the target "America/St_Thomas".

ECMA-402 requires that we respect IANA data, so if we're asked to canonicalize a time zone name that's a key in this map, we *must* return the corresponding value and *must not* return ICU's canonicalization.
"""

And, again, the map name is not super-helpful.  |ianaLinksThatDifferInICU| maybe?

::: js/src/builtin/IntlTimeZoneData.cpp
@@ +49,5 @@
>  };
>  
>  // Format:
> +// "LinkName", "Target" // ICU-Target [time zone file]
> +const char* const timeZoneLinkNamesNonICU[] = {

struct LinkAndTarget
{
    const char* link;
    const char* target;
};

const LinkAndTarget[] = {
  { "Africa/Asmera", "Africa/Asmara" },
  { "America/Buenos_Aires", "America/Argentina/Buenos_Aires" },
  ...
};

seems better than doing the indexing-by-two manually, and giving stuff names.

::: js/src/builtin/make_intl_data.py
@@ +615,2 @@
>      parser_tz.add_argument("--tz", help="Local tzdata directory or file, if omitted uses <URL>")
> +    parser_tz.add_argument("--url", metavar="URL", help="Download url for tzdata (default: https://www.iana.org/time-zones/repository/releases/tzdata<VERSION>.tar.gz")

Missing a closing parenthesis.  :-)

::: js/src/moz.build
@@ -757,5 @@
>      'builtin/Function.js',
>      'builtin/Generator.js',
>      'builtin/Intl.js',
>      'builtin/IntlData.js',
> -    'builtin/IntlTzData.js',

You'll need to remove this file (unless Splinter doesn't show removed files, but I thought it did).
Attachment #8806749 - Flags: feedback?(jwalden+bmo) → feedback+
Attached patch bug1303091.patch (obsolete) — Splinter Review
@jonco:
This patch adds a new member (js::SharedIntlData sharedIntlData) to JSContext. The SharedIntlData class itself contains GCHashSet and GCHashMap instances with JSAtom keys/values. And I'm not quite sure if I've done the GC bits correctly:

I've changed JSContext::mark() to call the GCHash{Map,Set}::trace() methods (implemented in js::SharedIntlData::mark()). And JSContext::~JSContext() was changed to call GCHash{Map,Set}::finish() (implemented in js::SharedIntlData::destroyInstance()) to avoid having reachable roots when destroyRuntime() is called. Is this all I need to do to mark sure the sets/maps are correctly handled by the GC, or did I miss something?


@Waldo:
More time zone fun! :-)
I always assumed ICU supports backzone time zones based on what it returns from ucal_getCanonicalTimeZoneID() and the contents of intl/tzdata/source/timezoneTypes.txt. But actually it does _not_ support the time zones listed in backzone! So, we've got two choices here: Canonicalize a time zone name to the zone whose rules are effectively used. Or keep using the backzone names without actually supporting backzone. I prefer the second choice for the reasons given by Stephen Colebourne at <https://bugs.ecmascript.org/show_bug.cgi?id=1892#c1>. (The third choice is to generate the ICU time zone files, including backzone, ourselves. But that should happen in a follow-up bug, if at all.)

I've updated make_intl_data.py to retrieve the correct set of ICU supported time zones (intl/tzdata/source/zoneinfo64.txt seems to contain the required information). And I've also removed a few options from make_intl_data.py, so it's less likely I shoot myself in the foot when updating tzdata. Plus the time zone related test files are now automatically generated by make_intl_data.py, too.
Attachment #8806749 - Attachment is obsolete: true
Attachment #8808241 - Flags: review?(jwalden+bmo)
Attachment #8808241 - Flags: review?(jcoppeard)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #2)
> This most assuredly should be a followup-land thing, as this is getting too
> messy on a deadline.
> 
> But, if ICU doesn't support the IANA target, should we be supporting the
> IANA name that links to it?  We'll return whatever ICU maps the name to (or
> just the name), and that's definitely not IANA behavior, and it's
> questionable whether it's good to knowingly deviate from what IANA says or
> not.
> 

Hmm, good question. I need to think about this one.


> On fresh read I don't think we need to track this first subset. [...]
> 
> ...
> 
> For the same reasons as for the previous set, I don't think we have to
> include the not-present subset here, and we should relegate it to a comment,
> or to DEBUG-only code to double-check impossibility.

make_intl_data.py now contains checks (the missing/additional time zone tests in findIncorrectICUZones() and findIncorrectICULinks()) to make sure this case can't happen any more.
Whiteboard: [MemShrink]
Comment on attachment 8808241 [details] [diff] [review]
bug1303091.patch

Review of attachment 8808241 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this.  

Although JSRuntime and JSContext are the same thing now, it *probably* makes more sense to store this data as part of JSRuntime just because that's where all the other shared data is stored at the moment.

I'd prefer it if you called the mark() method trace() since that's what we're moving towards, but it's no big deal.

Otherwise, the GC parts look good.
Attachment #8808241 - Flags: review?(jcoppeard) → review+
Note to self: Request uplift to Aurora per bug 1314354, comment 3.
Comment on attachment 8808241 [details] [diff] [review]
bug1303091.patch

Review of attachment 8808241 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/Intl.cpp
@@ +2094,2 @@
>          MOZ_ASSERT(size >= 0);
> +        RootedAtom timeZone(cx, Atomize(cx, rawTimeZone, size_t(size)));

Put

  RootedAtom timeZone(cx);

outside the loop, and make this an assignment.

@@ +2115,5 @@
> +    }
> +
> +    for (const char* rawTimeZone : timezone::ianaZonesTreatedAsLinksByICU) {
> +        MOZ_ASSERT(rawTimeZone != nullptr);
> +        RootedAtom timeZone(cx, Atomize(cx, rawTimeZone, strlen(rawTimeZone)));

You can reuse the same outer |RootedAtom timeZone| that was used in the previous loop here.

@@ +2141,5 @@
> +        const char* rawLinkName = linkAndTarget.link;
> +        const char* rawTarget = linkAndTarget.target;
> +
> +        MOZ_ASSERT(rawLinkName != nullptr);
> +        RootedAtom linkName(cx, Atomize(cx, rawLinkName, strlen(rawLinkName)));

Maybe have outside the loop

  RootedAtom linkName(cx);
  RootedAtom& target = timeZone;

and then just assign to each variable inside this loop.

@@ +2276,5 @@
>  
> +    SharedIntlData& sharedIntlData = cx->sharedIntlData;
> +
> +    // Some time zone names are canonicalized differently by ICU, handle those
> +    // first:

s/,/ --/ so as to not have a run-on sentence.

@@ +2278,5 @@
> +
> +    // Some time zone names are canonicalized differently by ICU, handle those
> +    // first:
> +    RootedString timeZone(cx, args[0].toString());
> +    RootedString ianaTimeZone(cx);

This naming is so much better.

::: js/src/builtin/make_intl_data.py
@@ +579,3 @@
>      return version
>  
> +def findIncorrectICUZones(iana_zones, iana_links, icu_zones, icu_links, ignoreBackzone):

Good argument names (although, wouldn't camelCaps be more typical style?  also for the other function, too).  I should have suggested this.

@@ +588,5 @@
> +    missingTimeZones = [zone for zone in iana_zones if not isICUTimeZone(zone)]
> +    # Normally zones in backzone are also present as links in one of the other
> +    # time zone files. Except Asia/Hanoi, this zone is only present in backzone.
> +    expectedMissing = [] if ignoreBackzone else [Zone("Asia/Hanoi")]
> +    if missingTimeZones != expectedMissing:

Huh, Python arrays compare deeply, not referentially.  Learn something every day.  Also, fix the run-on in the "Except" sentence somehow.  And maybe make it a parenthetical.

(Okay, maybe not quite deeply, but some bizarre hybrid:

[jwalden@find-waldo-now tmp]$ python
Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
[GCC 6.2.1 20160916 (Red Hat 6.2.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = [1, 2]
>>> x[1] = x
>>> x == x
True
>>> y = [1, 2]
>>> y[1] = y
>>> x == y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: maximum recursion depth exceeded in cmp

Python is weird.)

@@ +590,5 @@
> +    # time zone files. Except Asia/Hanoi, this zone is only present in backzone.
> +    expectedMissing = [] if ignoreBackzone else [Zone("Asia/Hanoi")]
> +    if missingTimeZones != expectedMissing:
> +        raise RuntimeError(("Not all zones are present in ICU, did you forgot "
> +                            "to run intl/update-tzdata.sh? %s") % missingTimeZones)

Not sure you need the inner parentheses there.  Also, "forget".

@@ +596,5 @@
> +    # Zones which are only present in ICU?
> +    additionalTimeZones = [zone for zone in icu_zones if not isIANATimeZone(zone)]
> +    if additionalTimeZones:
> +        raise RuntimeError(("Additional zones present in ICU, did you forgot "
> +                            "to run intl/update-tzdata.sh? %s") % additionalTimeZones)

parentheses/forget

@@ +624,5 @@
>      # Links which are only present in ICU?
> +    additionalTimeZones = [zone for zone in icu_links.iterkeys() if not isIANATimeZone(zone)]
> +    if additionalTimeZones:
> +        raise RuntimeError(("Additional links present in ICU, did you forgot "
> +                            "to run intl/update-tzdata.sh? %s") % additionalTimeZones)

parentheses/forget

@@ +689,5 @@
> +        println(u'// "LinkName", "Target" // ICU-Target [time zone file]')
> +        println(u"struct LinkAndTarget");
> +        println(u"{");
> +        println(u"    const char* link;");
> +        println(u"    const char* target;");

Oops, make these

  const char* const link;
  const char* const target;

just for extra surety.  |ianaLinksCanonicalizedDifferentlyByICU| being const is basically the same, I guess, but more consting seems better than being slightly cute.

@@ +690,5 @@
> +        println(u"struct LinkAndTarget");
> +        println(u"{");
> +        println(u"    const char* link;");
> +        println(u"    const char* target;");
> +        println(u"};");

Print a blank line after this.

@@ +804,5 @@
> +
> +    generateTzDataLinkTestContent(
> +        testDir, version,
> +        "timeZone_backzone.js",
> +        u"// Backzone zones derived from IANA Time Zone Database.",

Please add into this comment (maybe have a second comment?) explaining why all of the entries in the generated test are identical, if |not ignoreBackzone|.  Something like

"""
This file was generated with historical, pre-1970 backzone information respected.  Therefore, every zone key listed below is its own Zone, not a Link to a modern-day target as IANA ignoring backzones would say.
"""

and

"""
This file was generated while ignoring historical, pre-1970 backzone information.  Therefore, every zone key listed below is part of a Link whose target is the corresponding value.
"""

or somesuch.

@@ +819,5 @@
> +
> +    generateTzDataLinkTestContent(
> +        testDir, version,
> +        "timeZone_backzone_links.js",
> +        u"// Backzone links derived from IANA Time Zone Database.",

Add similar comments to above.

@@ +837,5 @@
> +    (this_dir, this_file) = os.path.split(os.path.abspath(sys.argv[0]))
> +    this_dir = os.path.normpath(this_dir)
> +    if "/".join(this_dir.split(os.sep)[-3:]) != "js/src/builtin":
> +        raise RuntimeError("%s must reside in js/src/builtin" % sys.argv[0])
> +    topdir = "/".join(this_dir.split(os.sep)[:-3])

topsrcdir is the idiomatic name for this.

I am definitely a fan of orienting everything to topsrcdir, rather than having to grunge it out of the tree in random spots.  Good change.

@@ +905,5 @@
> +    parser_tags = subparsers.add_parser("langtags",
> +                                        help="Update language-subtag-registry")
> +    parser_tags.add_argument("--url",
> +                             metavar="URL",
> +                             default="http://www.iana.org/assignments/language-subtag-registry",

https

I'd be somewhat inclined to mandate that whatever's provided here match /^https:/, but unless there's a trivial optional argument to add_argument to let you do that, don't worry about it.

@@ +927,5 @@
> +    parser_tz.add_argument("--ignore-backzone",
> +                           action="store_true",
> +                           help="Ignore tzdata's 'backzone' file. Can be enabled to generate more\
> +                                 accurate time zone canonicalization reflecting the actual time\
> +                                 zones as used by ICU.")

Can't you do

  help="... "
       "... "
       "... "

and avoid the unsightly backslashes?  Same above.
Attachment #8808241 - Flags: review?(jwalden+bmo) → review+
(In reply to André Bargull from comment #3)
> I always assumed ICU supports backzone time zones based on what it returns
> from ucal_getCanonicalTimeZoneID() and the contents of
> intl/tzdata/source/timezoneTypes.txt. But actually it does _not_ support the
> time zones listed in backzone! So, we've got two choices here: Canonicalize
> a time zone name to the zone whose rules are effectively used. Or keep using
> the backzone names without actually supporting backzone. I prefer the second
> choice for the reasons given by Stephen Colebourne at
> <https://bugs.ecmascript.org/show_bug.cgi?id=1892#c1>.

The politically-contentiony bits of that comment are all sorts of lol.  :-\

So if I understand correctly, the second choice basically is to act as if Europe/Zagreb is a real time zone whose information merely by happenstance is identical to Europe/Belgrade, in the post-1970 era.  Even though the preferred time zone nowadays would be Europe/Belgrade.  Right?  And the first choice would be to make Europe/Zagreb resolve into Europe/Belgrade, which might make people fussy.  I agree the second choice of just leaving alone seems more or less sensible.

> (The third choice is
> to generate the ICU time zone files, including backzone, ourselves. But that
> should happen in a follow-up bug, if at all.)

So, with this change we would start knowing stuff about pre-1970 time zones, and Europe/Zagreb would be identical to Europe/Belgrade post-1970, but for pre-1970 times it would diverge?  I guess the question would be how much space this costs.  Ignoring pre-1970 times is probably not unreasonable (Sunbird did so at one time, according to dmose), but for little enough additional download I'd be inclined to support it.  Just guesstimating (for a future bug if at all), would we be talking a few K, a few tens of K, a few hundreds of K, or more?
Correction to that:

"""
...the second choice basically is to act as if Europe/Zagreb is a real time zone whose information merely by happenstance is identical to Europe/Belgrade at all times.
...
"""
Whiteboard: [MemShrink] → [MemShrink:P2]
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #9)
> So if I understand correctly, the second choice basically is to act as if
> Europe/Zagreb is a real time zone whose information merely by happenstance
> is identical to Europe/Belgrade, in the post-1970 era.  Even though the
> preferred time zone nowadays would be Europe/Belgrade.  Right?  And the
> first choice would be to make Europe/Zagreb resolve into Europe/Belgrade,
> which might make people fussy.  I agree the second choice of just leaving
> alone seems more or less sensible.

Yes, that's it. "preferred time zone" is debatable, AFAIU the tzdata maintainers aim to avoid duplicate time zone information if the time zones match post-1970 [1]. But for Croatian users the preferred time zone is probably Europe/Zagreb instead of Europe/Belgrade. 

[1] There are various threads about this topic on the tzdata mailing list, for example <http://mm.icann.org/pipermail/tz/2014-July/021170.html>.

> 
> > (The third choice is
> > to generate the ICU time zone files, including backzone, ourselves. But that
> > should happen in a follow-up bug, if at all.)
> 
> So, with this change we would start knowing stuff about pre-1970 time zones,
> and Europe/Zagreb would be identical to Europe/Belgrade post-1970, but for
> pre-1970 times it would diverge?

Correct.

> I guess the question would be how much
> space this costs.  Ignoring pre-1970 times is probably not unreasonable
> (Sunbird did so at one time, according to dmose), but for little enough
> additional download I'd be inclined to support it.  Just guesstimating (for
> a future bug if at all), would we be talking a few K, a few tens of K, a few
> hundreds of K, or more?

I've got no idea. I'd assume it's only a few K, but it depends on how the data is internally represented in ICU.
Attached patch bug1303091.patchSplinter Review
Updated patch per review comments, carrying r+.


Important updates from Jonco's review:
- Moved the SharedIntlData instance from JSContext to JSRuntime
- JSRuntime doesn't have a mark/trace method entry point, therefore added traceSharedIntlData which is now called from js::gc::GCRuntime::traceRuntimeCommon
- JSRuntime also doesn't have a general memory metrics field (like RuntimeSizes::contexts for JSContext), so I simply added a new a entry to RuntimeSizes (RuntimeSizes::sharedIntlData) which is now reported separately in ReportJSRuntimeExplicitTreeStats
Attachment #8808241 - Attachment is obsolete: true
Attachment #8811958 - Flags: review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/db60e23a4c5b
Share static Intl data across compartments. r=Waldo, r=jonco
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/db60e23a4c5b
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8811958 [details] [diff] [review]
bug1303091.patch

Approval Request Comment
[Feature/regressing bug #]: bug 837961 increased the time needed for initializing the Intl.DateTimeFormat prototype object. This prototype object is initialized as soon as the first Intl constructor function is directly or indirectly called from JavaScript. Based on local printf debugging, this already happens when a new browser tab is opened, which means the overall time needed to create a new tab has increased. This is especially noticeable in benchmarks which open and close many new tabs, like for example the tabpaint and tart benchmarks from Talos (bug 1314354).
[User impact if declined]: Increased time (*) needed to create a new browser tab. And also higher memory usage for each tab (resp. compartment), because without this patch each compartment holds its own copy for additional time zone data. (*) By a few milliseconds, but I can't provide exact numbers.
[Describe test coverage new/current, TreeHerder]: Try runs with the standard test suites (comment #13) and Talos runs (comment #14) to validate the changes improve the performance.
[Risks and why]: Medium risk. Code was moved from JavaScript to C++, so the usual risks for C++ apply here.
[String/UUID change made/needed]: None
Attachment #8811958 - Flags: approval-mozilla-aurora?
(In reply to André Bargull from comment #17)
> Comment on attachment 8811958 [details] [diff] [review]
> bug1303091.patch
> 
> Approval Request Comment
> [Feature/regressing bug #]: bug 837961 increased the time needed for
> initializing the Intl.DateTimeFormat prototype object. This prototype object
> is initialized as soon as the first Intl constructor function is directly or
> indirectly called from JavaScript. Based on local printf debugging, this
> already happens when a new browser tab is opened, which means the overall
> time needed to create a new tab has increased. This is especially noticeable
> in benchmarks which open and close many new tabs, like for example the
> tabpaint and tart benchmarks from Talos (bug 1314354).

The flags on this bug list 51 as affected, but bug 938961 seems to have landed in 52, could you clarify (or fix the status flags)?  Thanks.
Flags: needinfo?(andrebargull)
(In reply to Julien Cristau [:jcristau] from comment #18)
> The flags on this bug list 51 as affected, but bug 938961 seems to have
> landed in 52, could you clarify (or fix the status flags)?  Thanks.

Only 52 is affected, updated the status flags accordingly.
Flags: needinfo?(andrebargull)
Comment on attachment 8811958 [details] [diff] [review]
bug1303091.patch

let's take this fix for tab opening perf regression in aurora52, but keep an eye out for regressions
Attachment #8811958 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
need rebasing for aurora

grafting 376270:db60e23a4c5b "Bug 1303091 - Share static Intl data across compartments. r=Waldo, r=jonco"
merging js/src/builtin/Intl.cpp
merging js/src/builtin/IntlTzData.js and js/src/builtin/IntlTimeZoneData.h to js/src/builtin/IntlTimeZoneData.h
merging js/src/moz.build
merging js/src/vm/Runtime.cpp
warning: conflicts while merging js/src/builtin/IntlTimeZoneData.h! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(andrebargull)
Rebased patch for Aurora.

The original patch didn't apply on Aurora because it's missing the changes from bug 1314920. To resolve the merge conflict I simply reran the js/src/builtin/make_intl_data.py script to regenerate js/src/builtin/IntlTimeZoneData.h and the time zone test files in js/src/tests/Intl/DateTimeFormat. IntlTimeZoneData.h and the test files are now based on tzdata2016h instead of tzdata2016i. 

Do I also need to set the "checkin needed" flag for Aurora patches, or are those patches handled differently?
Flags: needinfo?(andrebargull)
Depends on: 1321777
You need to log in before you can comment on or make changes to this bug.