Closed Bug 464866 Opened 16 years ago Closed 16 years ago

TM: Use regexp string as key for the regexp fragment cache

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: dmandelin)

References

Details

(Keywords: fixed1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

      No description provided.
Attached patch use regexp source+flags as key (obsolete) — Splinter Review
Assignee: general → gal
Attachment #348154 - Flags: review?(danderson)
This is a whopping 50ms perf win on sunspider.
Comment on attachment 348154 [details] [diff] [review]
use regexp source+flags as key

>+static Fragment* 

Angels! -- (C) Brendan
Attachment #348154 - Flags: review?(danderson) → review+
Not a blocker, hold off on this one according to sayrer.

But we might want to redo our benchmark numbers for the paper with it since it affects a whole bunch of benchmarks. dmandelin?
Attachment #348154 - Flags: review?(danderson)
Attachment #348154 - Flags: review?(dmandelin)
Attachment #348154 - Flags: review?(danderson)
Attachment #348154 - Flags: review?
Attachment #348154 - Flags: review?
(In reply to comment #2)
> This is a whopping 50ms perf win on sunspider.

Which tests does it change? I'm expecting tagcloud, for one.
Comment on attachment 348154 [details] [diff] [review]
use regexp source+flags as key

Drive-by nit-picking.

>+static void*
>+hashRegExp(uint16 flags, jschar* s, size_t n)

Nit: static functions are capitalized.

>+struct RESideExit : public SideExit
>+{
>+    uint16 flags;
>+    size_t re_length;
>+    jschar re_chars[1];
>+};

Flip flags and re_length for better packing. Might use re_flags for similar naming (flags is vague -- are these side-exit or regexp flags? if obviously the latter, same goes for chars) and aesthetics.

Opening brace of struct on same line, generally. jstracer.h is not consistent. If this makes it hard to see the superclass(es) we should go the other way, but at least be consistent.

>+static Fragment* 
>+lookupNativeRegExp(JSContext* cx, uint16 flags, jschar* re_chars, size_t re_length)

Capitalization nit.

>+{
>+    Fragmento* fragmento = JS_TRACE_MONITOR(cx).reFragmento;
>+    Fragment* fragment = fragmento->getLoop(hashRegExp(flags, re_chars, re_length));

Why not make the caller hash to avoid rehashing later in compile when calling getAnchor? If one of two callers needs the string chars and length, you could even re-parameterize lookupNativeRegExp to take a JSString* and use js_HashString instead of duplicating its guts.

r=me on a followup for tracemonkey landing.

/be
http://hg.mozilla.org/tracemonkey/rev/2601301b793d
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Backed out. Sets the tinderboxes ablaze.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Fixing this is proving more difficult than I had hoped. I think the main problem is that the generated native code refers to the RECharSet* stored in the JSRegExp, so we can get:

  re = js_NewRegExp(/[abc]/, ...)   /* creates new RECharSet for [abc] */
  MatchRegExp(re);
  js_DestroyRegExp(re)              /* frees RECharSet created above */
  re2 = js_NewRegExp(/[abc]/, ...)  /* native compiler finds previous version
                                       which refers to freed RECharSet */

The obvious solution is to store a copy of the RECharSet (or really, just its bit map) with the generated code (say, in a skip buffer). But that doesn't work by itself because the bitmap is not actually created until the first time the regexp is used by matching. And the code that creates the bitmap takes an REGlobalData and an REMatchState as args, so it doesn't like to be called outside the matcher.

One possibility is to do the compilation lazily inside MatchRegExp like we had originally planned. That way, ProcessClass could be used as is and the only thing to make sure of is the memory usage.

Another is to refactor ProcessClass so it can be called from elsewhere. This solution seems cleaner but I have to make sure all the deps on REGlobalData and REMatchState can be boiled down to a few things available during compilation.
Attached patch Fixed patchSplinter Review
I took 348154 and fixed the bugs as explained in my previous comment. Then I  reorganized the initialization of character classes a bit to eliminate the ambiguity about when the RECharSet is initialized. 

I also figured out and fixed the perf regression that occurred before when we moved regexp compilation after bytecode compilation. The bytecode compilation process combines sequential flat characters into flat strings. The native compiler didn't handle these before, because it didn't have to when run before bytecode compilation. I added support to the native compiler and now it works.
Assignee: gal → dmandelin
Attachment #348154 - Attachment is obsolete: true
Attachment #348867 - Flags: review?(gal)
Attachment #348154 - Flags: review?(dmandelin)
Small update mostly for bizarreness value. The current patch won't compile all the regexps that appear in crypto-aes because it handles only character classes whose bitmap representations fit in 1024 bytes (in turn because LirBuffers don't hold things larger than pages). But it turns out that none of those regexps ever run anyway--they're inside dead functions.
Attachment #348867 - Flags: review?(gal) → review+
Pushed to TM as 21808:9f5e7a04b4bc.
Whiteboard: [fixed-tracemonkey]
Whiteboard: [fixed-tracemonkey] → fixed-in-tracemonkey
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: