Closed Bug 1592302 Opened 5 years ago Closed 4 years ago

Import V8 regexp code into SpiderMonkey

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla73

Tracking Flags:

Tracking

Status

firefox73

---

fixed

People

(Reporter: iain, Assigned: iain)

References

Details

Attachments

(3 files)

Bug 1592302: Part 1: Import regexp directory from v8 r=mgaudet 5 years ago Iain Ireland [:iain] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1592302: Part 2: Add new regexp files to build behind config option r=mgaudet 5 years ago Iain Ireland [:iain] 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1592302: Part 3: Update #includes in new regexp files r=mgaudet 5 years ago Iain Ireland [:iain] 47 bytes, text/x-phabricator-request		Details \| Review

Iain Ireland [:iain]

Assignee

Description

•

5 years ago

The first step of refreshing our regexp code is to pull the current V8 code into our tree.

Iain Ireland [:iain]

Assignee

Comment 1

•

5 years ago

Attached file Bug 1592302: Part 1: Import regexp directory from v8 r=mgaudet — Details

This is an unmodified copy of https://github.com/v8/v8/tree/master/src/regexp, with the following files removed:

regexp/OWNERS
regexp/regexp.cc
regexp/regexp-utils.cc
regexp/regexp-utils.h
regexp/s390/*
regexp/ppc/*

Iain Ireland [:iain]

Assignee

Comment 2

•

5 years ago

Attached file Bug 1592302: Part 2: Add new regexp files to build behind config option r=mgaudet — Details

Depends on D51928

Iain Ireland [:iain]

Assignee

Comment 3

•

5 years ago

Attached file Bug 1592302: Part 3: Update #includes in new regexp files r=mgaudet — Details

This patch adds a python script to update #includes from V8 to SM, and runs the script over each of the V8 source files. This automates 90%+ of the changes that are necessary to prepare V8 source for inclusion in SM. The remaining changes will come in a subsequent patch. Almost all of them simply involve adding a shim #include in the right place.

Depends on D51929

Pulsebot

Comment 4

•

4 years ago

Pushed by iireland@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d5b26400bc5d
Part 1: Import regexp directory from v8 r=mgaudet,mhoye
https://hg.mozilla.org/integration/autoland/rev/9197fffa4083
Part 2: Add new regexp files to build behind config option r=mgaudet
https://hg.mozilla.org/integration/autoland/rev/f2a72cda0843
Part 3: Update #includes in new regexp files r=mgaudet

Natalia Csoregi [:nataliaCs]

Comment 5

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d5b26400bc5d
https://hg.mozilla.org/mozilla-central/rev/9197fffa4083
https://hg.mozilla.org/mozilla-central/rev/f2a72cda0843

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox73: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla73

André Bargull [:anba]

Comment 6

•

4 years ago

V8 is calling ICU's C++ API, whereas we're using only the C API, because we allow embedders to bring their own ICU copy (see MOZ_SYSTEM_ICU and --with-system-icu build flag). (C++ doesn't have a stable ABI, so using C is required for this use case.)

Do we already have a plan how to handle this issue?

Iain Ireland [:iain]

Assignee

Comment 7

•

4 years ago

We do not have a plan. I haven't looked at the ICU code very closely yet; I was building with --disable-intl-api until last week.

In the short term, we might have to turn off ICU support for regexps when built with MOZ_SYSTEM_ICU. It looks like it is mostly (only?) used for case-insensitive comparisons, which we currently don't support at all, so embedders shouldn't see a regression vs the status quo.

André Bargull [:anba]

Comment 8

•

4 years ago

(In reply to Iain Ireland [:iain] from comment #7)

It looks like it is mostly (only?) used for case-insensitive comparisons, which we currently don't support at all, so embedders shouldn't see a regression vs the status quo.

status quo w.r.t. the new imported irregexp code? Because the old irregexp code supports case-insensitive comparisons (with and without the Unicode flag).

IIRC in addition to Unicode RegExps with the case-insensitive flag, V8 also uses ICU for the Unicode properties character class escapes (\p and \P). But we don't yet support those.

Iain Ireland [:iain]

Assignee

Comment 9

•

4 years ago

Sorry, I confused myself about which unicode-related feature we don't already support. Where I said "case-insensitive unicode comparisons" above, please read "unicode character class escapes". Let me try to lay out my current understanding, and you can explain where I am still wrong.

SpiderMonkey currently does not use ICU for anything regexp-related.

V8's implementation of irregexp does use ICU, for at least two features: case-insensitive unicode comparisons, and character class escapes. Case-insensitive unicode comparisons have a non-ICU backup implementation. (See, for example, the V8 commit that implemented them in the first place. It looks like there is no equivalent backup implementation for character classes.

If we turn off V8's ICU support with MOZ_SYSTEM_ICU, my understanding is that case-insensitive unicode comparisons will use the non-ICU fallback path, and character class escapes will not be supported. This is, I think, equivalent to the SM status quo.

Does this make sense? There is a very good chance I am missing something.

André Bargull [:anba]

Comment 10

•

4 years ago

(In reply to Iain Ireland [:iain] from comment #9)

Sorry, I confused myself about which unicode-related feature we don't already support. Where I said "case-insensitive unicode comparisons" above, please read "unicode character class escapes". Let me try to lay out my current understanding, and you can explain where I am still wrong.

Ah, I see.

SpiderMonkey currently does not use ICU for anything regexp-related.

That's correct.

V8's implementation of irregexp does use ICU, for at least two features: case-insensitive unicode comparisons, and character class escapes.

Yep.

Case-insensitive unicode comparisons have a non-ICU backup implementation. (See, for example, the V8 commit that implemented them in the first place. It looks like there is no equivalent backup implementation for character classes.

If we turn off V8's ICU support with MOZ_SYSTEM_ICU, my understanding is that case-insensitive unicode comparisons will use the non-ICU fallback path, and character class escapes will not be supported. This is, I think, equivalent to the SM status quo.

The backup implementation isn't spec-compliant, because it simply ignores the cases where Unicode compliant case-comparisons are needed. (See "test/mjsunit/es6/unicode-regexp-ignore-case.js" and "test/mjsunit/es6/unicode-regexp-ignore-case-noi18n.js" in the V8 sources.) So we'll have a regression when either --without-intl-api or --with-system-icu is used, because then case-comparisons will be wrong for certain strings.
I can't tell if it's worth spending time to ensure this won't regress, because I don't know how many embedders use either --without-intl-api or --with-system-icu. And there's already functionality which isn't spec-compliant when either --without-intl-api (cf. final Sigma handling for String.prototype.toLowerCase) or --with-system-icu is used (time zone handling in vm/DateTime), so this isn't a totally new situation for us. If you ask me, I'd say it's okay if certain things don't work correctly when the in-tree ICU version isn't used, as long as we are aware of that situation.

And just like said, not supporting Unicode property class character escapes when MOZ_SYSTEM_ICU should be acceptable, because we currently don't support it at all (bug 1361876).

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Import V8 regexp code into SpiderMonkey

Categories

(Core :: JavaScript Engine, task, P1)

Tracking

()

People

(Reporter: iain, Assigned: iain)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(3 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Attachment

General

Description

File Name

Content Type