Closed Bug 1320121 Opened 3 years ago Closed 3 years ago

Generate irregexp character tables with make_unicode.py

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox53 --- fixed

People

(Reporter: anba, Assigned: anba)

Details

Attachments

(1 file, 2 obsolete files)

Attached patch bug1320121.patch (obsolete) — Splinter Review
I've renamed "kIgnoreCaseWordCount" to "kIgnoreCaseWordRangeCount" so it matches the other generated names. And I've also renamed the "FOR_EACH_NON_ASCII_TO_ASCII_FOLDING" macro to "FOR_EACH_NON_LATIN1_TO_LATIN1_FOLDING" because it handles Latin1 and not ASCII characters.
Attachment #8814226 - Flags: review?(arai.unmht)
Comment on attachment 8814226 [details] [diff] [review]
bug1320121.patch

Review of attachment 8814226 [details] [diff] [review]:
-----------------------------------------------------------------

Thank you!

::: js/src/irregexp/RegExpCharacterRange.h
@@ +37,5 @@
> +
> +// -------------------------------------------------------------------
> +// CharacterRange
> +
> +// The '2' variant has inclusive from and exclusive to.

what does "'2' variant" mean here?

::: js/src/vm/make_unicode.py
@@ +141,5 @@
> +    start = next(b)
> +    for (curr, succ) in izip_longest(a, b):
> +        if curr + 1 != succ:
> +            yield (start, curr)
> +            start = succ

wow, it looks like a magic :D

@@ +889,5 @@
> +    def is_space(code):
> +        (_, _, flags) = case_info(code)
> +        return bool(flags & FLAG_SPACE)
> +
> +    def casefolds_to(code):

it would be nice to name this in a different way than casefolds_to_ascii that returns boolean.

how about "casefold" or "foldcase" ?

@@ +960,5 @@
> +    # Surrogate characters.
> +    surrogate_chars = range(LEAD_SURROGATE_MIN, TRAIL_SURROGATE_MAX + 1)
> +
> +    # Verify constants in irregexp/RegExpEngine.cpp are still valid.
> +    # FOR_EACH_NON_LATIN1_TO_LATIN1_FOLDING:

would be nice if we can generate those macro too.
maybe in another bug.
Attachment #8814226 - Flags: review?(arai.unmht) → review+
(In reply to Tooru Fujisawa [:arai] from comment #2)
> ::: js/src/irregexp/RegExpCharacterRange.h
> @@ +37,5 @@
> > +
> > +// -------------------------------------------------------------------
> > +// CharacterRange
> > +
> > +// The '2' variant has inclusive from and exclusive to.
> 
> what does "'2' variant" mean here?

I have absolutely no idea! :-)

It's from the original v8 irregexp code:
https://github.com/v8/v8/blob/061c2ab23a1d4cd192b935e7912e7dfb1fed845d/src/regexp/jsregexp.cc#L3610
https://github.com/v8/v8/commit/6a1b70ccbc2bbb0c425423c58d7a86e20c5b3aab#diff-26b227ded90c1683cf74d35da34d9379R3277

> @@ +960,5 @@
> > +    # Surrogate characters.
> > +    surrogate_chars = range(LEAD_SURROGATE_MIN, TRAIL_SURROGATE_MAX + 1)
> > +
> > +    # Verify constants in irregexp/RegExpEngine.cpp are still valid.
> > +    # FOR_EACH_NON_LATIN1_TO_LATIN1_FOLDING:
> 
> would be nice if we can generate those macro too.
> maybe in another bug.

I can update this patch to include the macro, if you don't mind.
(In reply to André Bargull from comment #3)
> (In reply to Tooru Fujisawa [:arai] from comment #2)
> > ::: js/src/irregexp/RegExpCharacterRange.h
> > @@ +37,5 @@
> > > +
> > > +// -------------------------------------------------------------------
> > > +// CharacterRange
> > > +
> > > +// The '2' variant has inclusive from and exclusive to.
> > 
> > what does "'2' variant" mean here?
> 
> I have absolutely no idea! :-)
> 
> It's from the original v8 irregexp code:
> https://github.com/v8/v8/blob/061c2ab23a1d4cd192b935e7912e7dfb1fed845d/src/
> regexp/jsregexp.cc#L3610
> https://github.com/v8/v8/commit/
> 6a1b70ccbc2bbb0c425423c58d7a86e20c5b3aab#diff-
> 26b227ded90c1683cf74d35da34d9379R3277

wow... okay, then it's fine to leave it there :)
(we could just remove it tho...

> > @@ +960,5 @@
> > > +    # Surrogate characters.
> > > +    surrogate_chars = range(LEAD_SURROGATE_MIN, TRAIL_SURROGATE_MAX + 1)
> > > +
> > > +    # Verify constants in irregexp/RegExpEngine.cpp are still valid.
> > > +    # FOR_EACH_NON_LATIN1_TO_LATIN1_FOLDING:
> > 
> > would be nice if we can generate those macro too.
> > maybe in another bug.
> 
> I can update this patch to include the macro, if you don't mind.

that's great!
Attached patch bug1320121.patch (obsolete) — Splinter Review
I've updated the patch to also generate RangeContainsLatin1Equivalents and ConvertNonLatin1ToLatin1 with make_unicode.py. 

To keep RangeContainsLatin1Equivalents as an inline function, we're now generating two new files: RegExpCharacters-inl.h and RegExpCharacters.cpp. I hope that's the correct approach for this case.
Attachment #8814226 - Attachment is obsolete: true
Attachment #8815491 - Flags: review?(arai.unmht)
Comment on attachment 8815491 [details] [diff] [review]
bug1320121.patch

Review of attachment 8815491 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, thanks you for update :D
Attachment #8815491 - Flags: review?(arai.unmht) → review+
Attached patch bug1320121.patchSplinter Review
Updated patch to include "mozilla/Assertions.h" in js/src/irregexp/RegExpCharacters.cpp to avoid breaking non-unified builds. Carrying r+ from arai.


https://treeherder.mozilla.org/#/jobs?repo=try&revision=a832af6210220f922a487e566b4ef4bc50bca6ff
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a0952aeedcfad5fccaacdd7749f8d9b2c0bc265
Attachment #8815491 - Attachment is obsolete: true
Attachment #8816209 - Flags: review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b38638df04d
Generate irregexp character tables with make_unicode.py. r=arai
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2b38638df04d
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
You need to log in before you can comment on or make changes to this bug.