Closed
Bug 1373195
Opened 4 years ago
Closed 4 years ago
CharacterRange::AddCaseEquivalents is horribly inefficient
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla56
| Tracking | Status | |
|---|---|---|
| firefox56 | --- | fixed |
People
(Reporter: jandem, Assigned: jandem)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 1 obsolete file)
|
8.27 KB,
patch
|
arai
:
review+
|
Details | Diff | Splinter Review |
It (1) checks the whole range for case equivalent characters and (2) adds a singleton range for each of these. We often get here with a range containing > 50000 chars. We should skip ranges of "uninteresting" chars and we should try to add a single range instead of many singleton ones. The V8 code this was forked from is a lot smarter. Maybe we can import that.
| Assignee | ||
Comment 1•4 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #0) > (2) adds a singleton range for each of these. Actually it does try to combine with existing ranges, so this part seems okay.
| Assignee | ||
Comment 2•4 years ago
|
||
I added some logging and it seems there are a few large ranges that have no case equivalent chars: [11566, 42559] (30994 chars) [43968, 65312] (21345 chars) It might make sense to skip these.
| Assignee | ||
Comment 4•4 years ago
|
||
Btw I think we're also calling this multiple times for the same ranges so we may be doing something stupid higher up the stack. I'll investigate more this week.
Flags: needinfo?(jdemooij)
Comment 5•4 years ago
|
||
OK so for whatever reason previously in bug 1374014 I wasn't seeing a backtrace of where these are coming from! And now I do: setTimeout_timer _matchStringToFieldName getInfo getFormInfo collectFormFields identifyAutofillFields doIdentifyAutofillFields/< setTimeout_timer Here are the regexes in question: https://searchfox.org/mozilla-central/source/browser/extensions/formautofill/content/heuristicsRegexp.js
Comment 6•4 years ago
|
||
(Also it seems like we run this code for each field name we find, which looks to be the name of each form control we find on the page: https://searchfox.org/mozilla-central/rev/7cc377ce3f0a569c5c9b362d589705cd4fecc0ac/browser/extensions/formautofill/FormAutofillHeuristics.jsm#101. This may be extremely expensive... If we can't do something about this in SpiderMonkey we should probably file a front-end bug about it.)
| Assignee | ||
Comment 7•4 years ago
|
||
OK so the problem here is that some of these chrome regexps use the "u" and "i" flags, and the ranges we add for '.' is where we spend a lot of (unnecessary) time.
The patch optimizes this as follows:
(1) In MakeCaseIndependent we can skip character classes that contain everything except whitespace and surrogates, as AddCaseEquivalents is a no-op in this case. This is similar to the cc->is_standard check there.
(2) In AddCaseEquivalents we can skip surrogate character ranges. The V8 code upstream has a similar check.
This improves the micro-benchmark below from >= 10 ms to 0 ms.
var re = /f...................................../ui;
var t = new Date;
print("foo".match(re));
print(new Date - t);Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8881328 -
Flags: review?(arai.unmht)
Comment 8•4 years ago
|
||
Comment on attachment 8881328 [details] [diff] [review] Patch Review of attachment 8881328 [details] [diff] [review]: ----------------------------------------------------------------- Thanks, the change looks good, except that js/src/vm/make_unicode.py needs corresponding modification. ::: js/src/irregexp/RegExpCharacters.cpp @@ +140,5 @@ > + 0x2028, 0x2029 + 1, // LINE SEPARATOR..PARAGRAPH SEPARATOR > + 0xD800, 0xDFFF + 1, // <Lead Surrogate Min>..<Trail Surrogate Max> > + 0xFFFF + 1 > +}; > +const int js::irregexp::kLineTerminatorAndSurrogateRangeCount = 9; this file is automatically generated by make_unicode.py, and this needs to be handled there, next to the following line: https://dxr.mozilla.org/mozilla-central/rev/53477d584130945864c4491632f88da437353356/js/src/vm/make_unicode.py#1427 > character_range('LineTerminator', line_terminator)
Attachment #8881328 -
Flags: review?(arai.unmht) → feedback+
| Assignee | ||
Comment 9•4 years ago
|
||
Good catch, arai. I undid my RegExpCharacters.cpp changes and ran make_unicode.py (which generated exactly the same file) :)
Attachment #8881328 -
Attachment is obsolete: true
Attachment #8881464 -
Flags: review?(arai.unmht)
Comment 10•4 years ago
|
||
Comment on attachment 8881464 [details] [diff] [review] Patch v2 Review of attachment 8881464 [details] [diff] [review]: ----------------------------------------------------------------- Great!
Attachment #8881464 -
Flags: review?(arai.unmht) → review+
Comment 11•4 years ago
|
||
Pushed by jandemooij@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/d084672cc9fc Speed up TextNode::MakeCaseIndependent for the unicode '.' character class ranges. r=arai
Comment 12•4 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/d084672cc9fc
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
You need to log in
before you can comment on or make changes to this bug.
Description
•