Closed
Bug 573504
Opened 14 years ago
Closed 14 years ago
Regular expression hoisting or caching is desirable
Categories
(Tamarin Graveyard :: Library, defect, P3)
Tamarin Graveyard
Library
Tracking
(Not tracked)
RESOLVED
FIXED
Q3 11 - Serrano
People
(Reporter: lhansen, Assigned: lhansen)
Details
(Whiteboard: PACMAN, has-patch)
Attachments
(6 files, 7 obsolete files)

- 18.15 KB, patch (stejohns: review+)
- 2.26 KB, text/plain
- 2.26 KB, text/plain
- 2.81 KB, text/plain
- 2.71 KB, text/plain
- 20.70 KB, patch (stejohns: review+, edwsmith: superreview+)
Two simple benchmarks that perform a loop of the form

for ( ... ) re.exec(s)

where re is either a literal regular expression or a variable holding a regular expression constructed outside the loop, show that the latter runs 2.9 times faster than the former (score 110 vs score 38). It is nearly trivial to cache the internal compiled form of the regular expression in the AvmCore, so we should try that first (typically even a single-element cache will do a lot). The alternative is to sniff the regex construction and hoist it into the method prologue. However, caching can work well across invocations, and hoisting will not.
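The single-element cache idea can be sketched as follows. All names here are invented for illustration, and the shared_ptr ownership merely stands in for the real AvmCore/pcre plumbing:

```cpp
#include <string>
#include <memory>

// Hypothetical stand-in for the pcre-compiled form of a regex.
struct CompiledRegex { std::string pattern; std::string flags; };

// Minimal sketch of a single-element cache keyed on (pattern, flags).
// A hit bypasses recompilation; any other regex evicts the entry.
class RegexCache1 {
    std::shared_ptr<CompiledRegex> m_entry;
public:
    std::shared_ptr<CompiledRegex> get(const std::string& pattern,
                                       const std::string& flags) {
        if (m_entry && m_entry->pattern == pattern && m_entry->flags == flags)
            return m_entry;                          // cache hit
        m_entry = std::make_shared<CompiledRegex>(   // miss: "compile" and cache
            CompiledRegex{pattern, flags});
        return m_entry;
    }
    void flush() { m_entry.reset(); }  // e.g. from GC, to avoid retaining storage
};
```

A hot loop compiling the same literal repeatedly hits the cache on every iteration after the first; a second distinct regex evicts the first, which is exactly the brittleness discussed below.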
Assignee
Comment 1•14 years ago

Assignee
Comment 2•14 years ago

Assignee
Comment 3•14 years ago
We have a single-element cache in AvmCore; it is flushed during GC to avoid holding onto long patterns or large compiled regexes. The cache is used to bypass both regex compilation and redundant re-scanning of the regex string in pre-compilation (to build up the regex flags). The latter bit avoids further allocation, as the zero-terminated strings allocate temporary storage.

For the benchmark in this test the cache raises the score from 38 to 60 (Mac Pro), which is not bad (58% speedup), but that's still only half the speed of manually hoisting the regular expression out of the loop. The remaining time must be spent in allocating and constructing the RegExp object itself; I'm a little puzzled at the cost and will need to investigate further.

Not asking for a review yet, as there are some rough spots here and even a FIXME, but comments are welcome all the same.
Assignee
Comment 4•14 years ago
Attachment #452751 - Attachment is obsolete: true
Assignee
Comment 5•14 years ago
There was a bug in the old benchmark: the combination of /g and RegExp.prototype.exec is not good if the regular expression is hoisted, since exec both reads and writes the regex's lastIndex field. In the hoisted benchmark, matching would start at the end of the string as of the second iteration and always fail immediately, which gave an unfair advantage to the hoisted version. I should have seen that sooner - it's why the benchmark prints out the last result - but did not.

With this new program, we have new scores:

hoisted 79
cached 62

That is, hoisted/cached = 1.27. That is probably defensible, since the 'cached' version performs more work per iteration:

- look up RegExp
- call the RegExp constructor
- allocate and initialize a RegExp object
- destruct the RegExp object

(The new benchmarks run for 2 seconds but still report iterations/sec.)
Attachment #452752 - Attachment is obsolete: true
Assignee
Comment 6•14 years ago
Cleaned-up code. Here we have a single-element global (AvmCore-relative) cache only, which solves the hot-regex-in-a-loop use case (probably somewhat important) but will work less well on more complex problems. Alternative structures include:

- multi-element global caches
- local caches, eg, relative to a MethodInfo
- JIT optimization, eg, precompiling regex strings

Precompilation is actually a little dicey because compilation can cause errors which must not be visible unless control flow reaches the point where the regex would have been compiled. Local caches are also a little tricky because we want to be sure not to retain storage indefinitely; the global cache is flushed on every GC to avoid this. In contrast, a multi-element global cache would be relatively straightforward.
Attachment #453050 - Flags: feedback?(stejohns)
Assignee
Updated•14 years ago
Attachment #452820 - Attachment is obsolete: true
Comment 7•14 years ago
Comment on attachment 453050 [details] [diff] [review]
Patch

-- Nit: RegexCache should restructure its members to avoid compiler-inserted padding (bool-int-bool-ptr should be ptr-int-bool-bool). Ditto RegExpObject (bool-int-int-bool-ptr -> ptr-int-int-bool-bool).
-- Uber-nit: "an trailing flags string" -> "a trailing flags string"
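For the record, the padding nit can be demonstrated in isolation. These struct names are hypothetical; the member shapes merely mirror the orderings mentioned above:

```cpp
#include <cstddef>

// On a typical 64-bit target, interleaving small and pointer-sized members
// forces the compiler to insert alignment padding, while ordering members
// from largest to smallest lets them pack tightly.
struct BoolIntBoolPtr { bool a; int b; bool c; void* d; };  // padding after a and c
struct PtrIntBoolBool { void* d; int b; bool a; bool c; };  // packs tightly

static_assert(sizeof(PtrIntBoolBool) <= sizeof(BoolIntBoolPtr),
              "largest-to-smallest member order never increases struct size");
```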
Attachment #453050 - Flags: feedback?(stejohns) → feedback+
Assignee
Comment 8•14 years ago
Suggested reorg of existing class (only).
Attachment #453355 - Flags: review?(stejohns)
Assignee
Comment 9•14 years ago
Comment on attachment 453355 [details] [diff] [review]
Reorganize the fields of RegExpObject

That's a silly patch.
Attachment #453355 - Attachment is obsolete: true
Attachment #453355 - Flags: review?(stejohns)
Assignee
Comment 10•14 years ago
I've addressed the feedback. In addition, the regex cache has been split out of AvmCore as a separate data structure and now encapsulates a multi-entry cache. This seems pretty clean.

The cache size is set to 4, on the principle that 1 is too brittle in the face of reasonable programs, 4 is likely better, and anything larger would seem unreasonable for this approach. Cache replacement is LRU using a simple time-stamping mechanism.

I also corrected a bug in the caching code: in the case where the AS3 code is "new RegExp(re)" where re is a regular expression, we should just copy that regular expression. (There was an obscure correctness problem also, but anyway that's gone.)

Asking for a review now, though I still need to verify that benchmarks with up to four hot regular expressions see a speedup as expected, and that once we get five hot expressions we're back to the old behavior, more or less. (A significant slowdown would be bad even if the case is atypical.)
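The LRU-by-timestamp scheme can be sketched roughly as follows. Names and the string-only key are simplifications; the real cache keys on pattern plus flags and stores the compiled pcre structure:

```cpp
#include <cstdint>
#include <string>

// Sketch of a 4-entry cache with LRU replacement via a monotonically
// increasing timestamp: on a miss, the entry with the smallest stamp
// (the least recently used) is overwritten.
class RegexCache {
    static const int kSize = 4;  // named constant, per the review feedback
    struct Entry {
        std::string pattern;     // stands in for pattern + flags + compiled form
        uint64_t stamp = 0;      // 0 means "slot unused"
    };
    Entry m_entries[kSize];
    uint64_t m_clock = 0;
public:
    // Returns true on a hit; on a miss, installs the pattern over the LRU slot.
    bool lookup(const std::string& pattern) {
        int lru = 0;
        for (int i = 0; i < kSize; i++) {
            if (m_entries[i].stamp != 0 && m_entries[i].pattern == pattern) {
                m_entries[i].stamp = ++m_clock;  // refresh recency on a hit
                return true;
            }
            if (m_entries[i].stamp < m_entries[lru].stamp)
                lru = i;
        }
        m_entries[lru] = Entry{pattern, ++m_clock};
        return false;
    }
};
```

With four hot patterns everything stays resident; a fifth pattern evicts the least recently used entry on every miss, which is the thrashing scenario measured below.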
Attachment #453050 - Attachment is obsolete: true
Attachment #453377 - Flags: review?(stejohns)
Assignee
Updated•14 years ago
Whiteboard: PACMAN → PACMAN, has-patch
Comment 11•14 years ago
(In reply to comment #9)
> (From update of attachment 453355 [details] [diff] [review])
> That's a silly patch.

From the Ministry of Silly Patches?
Comment 12•14 years ago
Comment on attachment 453377 [details] [diff] [review]
Patch

Agreed that we should benchmark before landing.

-- nit: RegexCache and RegexCacheEntry should probably have comments pointing out that they expect to live in a GCRoot, due to DRC (vs DRCWB) usage.
-- nit: named constant for RegexCache size, rather than "4"....
-- style nit: most existing code in tamarin tends to do for (init; test; incr) rather than for ( init; test; incr ) -- your call, though.
Attachment #453377 - Flags: review?(stejohns) → review+
Assignee
Comment 13•14 years ago
(In reply to comment #12)
> -- nit: named constant for RegexCache size, rather than "4"....

I can fix that, but I hope you observed that that's the only occurrence of that '4'. Everywhere else uses ARRAY_SIZE(m_entries). Agreed on the other issues.
Assignee
Comment 14•14 years ago
I've honed the code some more to disable the cache if it's thrashing; new patch will follow, as will benchmark code. Below are results for four benchmarks:

RE1 - one regex literal inside the loop
RE2 - one regex literal outside the loop
RE3 - five regex literals inside the loop
RE4 - four regex literals inside the loop

The cache size is four, so RE3 should thrash and RE4 should work very well indeed.

NOTE: benchmarks compiled without -AS3 for now, so calls to exec() are prototype calls, which will tend to hide effects of the regex cache.

NOTE: scores for RE3 and RE4 use a different scale than those for RE1 and RE2, in order to get precision.

Runs are release-debugger -Dnodebugger, Mac Pro.

Old code:
RE1 40, 40
RE2 69, 69
RE3 128, 127, 128
RE4 156, 156, 156

New code:
RE1 58, 59, 59
RE2 72, 72, 71
RE3 122, 122, 121
RE4 265, 266, 265

The numbers show:
RE1 48% speedup - as expected, the regex always hits the cache
RE2 4% speedup - strange, we should not really see any speedup here, as regex construction is not hot at all
RE3 5% slowdown - as expected, no benefit from the cache, and some overhead to discover that
RE4 70% speedup - all regexes fit in the cache, excellent utilization

The main issue is the 5% slowdown when the cache is disabled. The code to track utilization and disable the cache if it's thrashing came about as a result of observing a 9% slowdown due to the thrashing (117, 118, 117 on RE3), so 5% is an improvement. It seems that disabling the cache more aggressively improves RE3, but it's unclear why doing so would be the right thing - real programs probably benefit from the cache being on, and suffer little from it being on uselessly.
Assignee
Comment 15•14 years ago
Again release-debugger -Dnodebugger, but test programs compiled with -AS3.

Old code:
RE1 43 43 44
RE2 75 75 75
RE3 145 145 145
RE4 174 175 174

New code:
RE1 65 66 65
RE2 76 76 76
RE3 135 136 136
RE4 316 315 315

The numbers show:
RE1 53% speedup
RE2 1% speedup
RE3 6% slowdown
RE4 81% speedup

The trends and figures are pretty much the same.

The main issue for the patch is whether we believe there's temporal locality in how regular expressions are used (really, in how they are compiled from literals), or whether programs that use regular expressions use a large number of them and won't benefit from a small cache. The latter class of program can only be helped by a much larger - perhaps perfect - "cache", or indeed by caching locally in each method, which would amount to the same thing: remembering all compiled expressions, if only for a while. A method-local cache is probably straightforward if the RegExp constructor is early bound by the JIT, but otherwise ... maybe the RegExp constructor can sniff its call stack, find its caller, and attach information to it.
Assignee
Comment 16•14 years ago
Attachment #453040 - Attachment is obsolete: true
Assignee
Comment 17•14 years ago
Attachment #453041 - Attachment is obsolete: true
Assignee
Comment 18•14 years ago
... for cache size 'n', ie, this will cause thrashing in the cache.
Assignee
Comment 19•14 years ago
... also for cache size 'n', ie this has 100% hit rate and 100% utilization and represents a best case.
Assignee
Comment 20•14 years ago
This is like the first patch, with the nits tidied up, and with a cache-disabling mechanism: if utilization falls below some percentage (here set to 10%) after a while (here set to 1000 compilations), then the cache is disabled. It is re-enabled by the garbage collector when the GC clears the regex cache in presweep.

I'd like opinions on (a) whether the cache-disabling complexity, which is pretty localized, pays for itself, and (b) whether we should land this or the other patch, given that we're seeing a small slowdown when the cache is thrashing.
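A minimal sketch of the disable/re-enable policy. The thresholds mirror the 10% / 1000 numbers above, but the class and member names are invented and the real patch differs in detail:

```cpp
// Sketch: after kWindow lookups, disable the cache if the hit rate
// ("utilization") fell below kMinHitPercent. The GC re-enables the cache
// when it flushes it in presweep, giving it a fresh chance.
class CachePolicy {
    static const unsigned kWindow = 1000;      // compilations before we judge
    static const unsigned kMinHitPercent = 10; // minimum acceptable hit rate
    unsigned m_lookups = 0, m_hits = 0;
    bool m_enabled = true;
public:
    bool enabled() const { return m_enabled; }
    void recordLookup(bool hit) {
        if (!m_enabled) return;
        m_lookups++;
        if (hit) m_hits++;
        if (m_lookups >= kWindow && m_hits * 100 < m_lookups * kMinHitPercent)
            m_enabled = false;  // thrashing: stop paying the cache overhead
    }
    void onPresweepFlush() {    // GC clears the cache and re-arms the policy
        m_lookups = m_hits = 0;
        m_enabled = true;
    }
};
```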
Attachment #453727 - Flags: review?(stejohns)
Comment 21•14 years ago
(In reply to comment #20)
> I'd like opinions on (a) whether the cache-disabling complexity, which is
> pretty localized, pays for itself and (b) whether we should land this or the
> other patch, given that we're seeing a small slowdown when the cache is
> thrashing.

Perhaps re-running the benches (RE3 in particular) would provide useful data...?
Comment 22•14 years ago
Comment on attachment 453727 [details] [diff] [review]
Alternate patch

Looks fine -- I'd be curious to see how the numbers compare from re-running the benchmarks (especially RE3). I'm also a little puzzled as to why the slowdown in the thrash case should be that large; it doesn't appear that the cache lookup overhead should be very high, relative to the cost of compiling the regexp in the first place. Is it possible the slowdown is not in the cache operation per se but in some of the rewritten code in RegExpObject?

On that note, a minor optimization there: instead of

c == '(' && pos+3 < length && pattern->charAt(pos+1) == '?' &&
pattern->charAt(pos+2) == 'P' && pattern->charAt(pos+3) == '<'

you could just do

pattern->indexOfLatin1("(?P<", 4, pos) == pos
Attachment #453727 - Flags: review?(stejohns) → review+
Assignee
Comment 23•14 years ago
The numbers in comment #15 are for the code posted in comment #20 ("Alternate patch").
Assignee
Comment 24•14 years ago
(In reply to comment #22)
> I'm also a little puzzled as to why the slowdown
> in the thrash case should be that large; it doesn't appear that the cache
> lookup overhead should be very high, relative to the cost of compiling the
> regexp in the first place. Is it possible the slowdown is not in the cache
> operation per se but in some of the rewritten code in RegExpObject?

I'm also puzzled. I'd be surprised if the precompilation is the culprit, but there could be many contributory factors: there's an RCObject that is allocated to serve as a proxy for the compiled regex and is pure overhead if the cache thrashes; there are write barriers on the cache elements, for three RC objects; there's the cache search; and of course the somewhat slower precompilation.
Assignee
Updated•14 years ago
Attachment #453727 - Flags: superreview?(edwsmith)
Assignee
Comment 25•14 years ago
Comment on attachment 453727 [details] [diff] [review]
Alternate patch

Ed, maybe you can weigh in on whether we should land one of these (and if so, which one), or if we should hold off until we're not regressing in the cache-thrashing case.
Assignee
Comment 26•14 years ago
(In reply to comment #24)
> (In reply to comment #22)
> > ...there could be many contributory factors.
> There's an RCObject that's allocated to serve as a proxy for the
> compiled regex, it is pure overhead if the cache thrashes

Could possibly be fixed, at some increase in complexity, by using custom reference counting on the underlying pcre object.

> there are write barriers on the cache elements, for three RC objects;

Faster write barriers will be here someday, but for now we'd remove the barrier for the new RC object by using custom reference counting. It's possible using WB() would be a hair faster.

> there's the cache search;

We could abandon the O(n) search and instead use a larger direct-mapped table indexed by string / options value; the problem there is that we can get a different kind of thrashing when two inputs map to the same slot.

> and of course the somewhat slower precompilation.

That can be optimized by providing more access to the string representation.

Even then it is likely that the cache-thrashing case will incur measurable overheads.
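The direct-mapped alternative mentioned here would look roughly like this. The hash function, slot count, and names are all invented for illustration:

```cpp
#include <string>
#include <functional>

// Sketch of a direct-mapped cache: each (pattern, flags) key hashes to
// exactly one slot, so lookup is O(1) rather than an O(n) scan, but two
// hot regexes that collide on a slot evict each other on every lookup --
// the "different kind of thrashing" noted above.
class DirectMappedRegexCache {
    static const size_t kSlots = 64;
    struct Slot { bool used = false; std::string key; };
    Slot m_slots[kSlots];

    static std::string makeKey(const std::string& pattern, const std::string& flags) {
        return pattern + "/" + flags;  // '/' cannot appear unescaped in a literal
    }
public:
    // Returns true on a hit; on a miss, overwrites whatever occupied the slot.
    bool lookup(const std::string& pattern, const std::string& flags) {
        std::string key = makeKey(pattern, flags);
        Slot& s = m_slots[std::hash<std::string>{}(key) % kSlots];
        if (s.used && s.key == key)
            return true;
        s.used = true;
        s.key = key;
        return false;
    }
};
```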
Comment 27•14 years ago
(In reply to comment #15)
> but otherwise ... maybe the RegExp
> constructor can sniff its call stack and find its caller, and attach
> information to it.

This strikes me as crazy-talk in the short term, and in the medium/long term I'm hopeful that inlining might expose enough context to specialize the RE and enable inline caching to help.
Comment 28•14 years ago
Can user code extend RegExp and populate the cache with instance objects from their code origin (pool)? If so, then the cache could prevent unloading by creating a reference chain from avmcore, to the cache, to the user PoolObject. Clearing the cache on gc sweep should be good enough, right? Could it cause the user PoolObject to live one gc cycle longer than expected?

I find myself wondering if the regexp cache code really belongs in RegExp[Class|Object].h instead, but it's a minor point. I also wondered about whether a 1-entry inline cache would be more effective. One would certainly think so for /re-literals/, right? You talked about a per-method cache but didn't mention inline caching, so I'm not sure if you equate the two. It's worth noting that with inline caching, individual instances using an inline cache could enable/disable themselves independently, and it would not be hard to switch between the monomorphic/polymorphic/megamorphic cases (the last case would just disable the cache).

To answer the question, I think the disabling code does pay for itself, and we should go with that version of the patch, but man, I wish we had better data to go on.
Assignee
Comment 29•14 years ago
(In reply to comment #28)
> Can user code extend RegExp and populate the cache with instance objects from
> their code origin (pool)?

Arguably, though I'm not sure how the pool figures into it. Are you thinking that the RegExpSubtype instance references the class/traits for RegExpSubtype, which in turn references the pool?

> if so then the cache could prevent unloading by
> creating a reference chain from avmcore, to the cache, to the user
> PoolObject. Clearing the cache on gc sweep should be good enough, right?
> could it cause the user PoolObject to live one gc cycle longer than
> expected?

It doesn't prevent unloading, but it may delay it. If we have unloading hooks (callbacks called when an unloading event occurs), then perhaps they should flush the regex caches, in order to prevent the delay.

> I find myself wondering if the regexp cache code really belongs in
> RegExp[Class|Object].h instead, but its a minor point. I also wondered about
> whether a 1-entry inline cache would be more effective. One would certianly
> think so for /re-literals/, right? you talked about a per-method cache, but
> didn't mention inline caching so i'm not sure if you equate the two.

You can only have an inline cache if you can early-bind to the RegExp constructor. Arguably this is a bug (actually several bugs) in Tamarin and/or ASC, but a literal regex /abc/g is compiled roughly as

findpropstrict RegExp
pushstring abc
pushstring g
constructprop RegExp, 2

and the jit does not seem to early-bind to RegExp. This is a bug not because we can't early-bind but because it means RegExp can be spoofed, which is wrong for literals - they should be inviolable. (The other bug is that compiling regexes this way is against ES3, which demands that the literal is compiled once - although it's been argued that that's a bug in ES3.)

> to answer the question, i think the disabling code does pay for itself,
> and we should go with that version of the patch, but man, i wish we had
> better data to go on.
Well, yes. The reason I brought up the stack sniffing earlier is because it's one way of dealing with late binding. There are others, possibly - we'd have to have an inline cache that handles a late-bound RegExp constructor and when that hits, handles a previously compiled regex.
Comment 30•14 years ago
There are two aspects to early binding constructors: one is statically deriving the types (knowing the class is T$ and thus knowing the result is T), and the other is early binding the allocation and constructor invocation.

The first aspect is disabled for builtin classes with constructors that don't follow the standard "newInstance(); invoke constructor" pattern (traits->hasCustomConstruct == true). This in turn disables the specialized allocation and constructor invocation. There's no good reason we couldn't specialize that path for a few important classes, like RegExp (or Array or anything else for that matter). We end up early binding to the result type (public::RegExp) even though we don't generate an early bound constructor call:

2:findpropstrict RegExp [Object~[A]] {} (global~[O] global~[O])
4:getproperty RegExp [Object~[A]] {} (global~[O] RegExp$[O])
6:pushstring "abc" [Object~[A]] {} (global~[O] RegExp$[O] String~[S])
8:pushstring "g" [Object~[A]] {} (global~[O] RegExp$[O] String~[S] String~[S])
10:construct 2 [Object~[A]] {} (global~[O] RegExp~[O])

Seems like possible low-hanging fruit; the builtin classes are a known set (Array, RegExp, Date, etc). Out of scope for this bug, but probably something good for PACMAN.
Assignee
Comment 31•14 years ago
After talking to Ed, we've decided as follows:

- I'll land the patch with the fallback, though we agree it pushes up against the limit of allowable complexity.
- I'll annotate the patch with more prose re: the ad-hocness of the cache tuning parameters and their possible bogosity.
- Ed will work on some early binding to the RegExp constructor (as part of a broader agenda of binding to important built-in classes, like Array); once that lands we can reconsider whether we might not want to use an in-line cache where it's possible to do so.
Updated•14 years ago
Attachment #453727 - Flags: superreview?(edwsmith) → superreview+
Assignee
Comment 32•14 years ago
(In reply to comment #31)
> - Ed will work on some early binding to the RegExp constructor (as part of a
> broader agenda of binding to important built-in classes, like Array)

Bug #575848.
Assignee
Comment 33•14 years ago
tamarin-redux changeset: 4893:8993c851682d
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Comment 34•14 years ago
(In reply to comment #28)
> Can user code extend RegExp

Although RegExp is not marked final, in practice subclasses of RegExp have never worked properly, so this is a nonissue. (In a version-enabled future we should probably explicitly make RegExp final.)
Updated•13 years ago
Flags: flashplayer-bug+