Closed Bug 640072 Opened 13 years ago Closed 13 years ago

Represent /a/.{lastIndex,global,source,multiline,sticky,ignoreCase} with plain old data properties

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: Waldo, Assigned: Waldo)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 2 obsolete files)

Storing those property values non-specially means we don't need property ops to represent them, and they'd be just like any other data property.  That makes it easier for ES5 meta-object stuff to work with them, too.  The technique may also be useful for other classes of objects, potentially (plausibly slow arrays, say).

The idea is basically to reserve slots for those properties, then use a pre-computed Shape* to encode them, and splat that into such objects immediately after they're created.  As long as all the properties are non-configurable (a hard requirement for this scheme), the locations for the properties won't change underneath you (in theory, and in practice from my testing, but of course a patch would include tests checking this).

This also happens to make a few nice changes possible: regular expressions no longer need a resolve hook, a trace hook, or an enumerate hook, because those special properties Just Exist.

Anyway, I'm posting what I have so far now for backup purposes.  It's totally not ready to be reviewed yet, just getting it up so I can't lose it somehow.
Attached patch Patch with test (obsolete) — Splinter Review
Okay, I think this is mostly good to go.
Attachment #517957 - Attachment is obsolete: true
Attachment #518167 - Flags: review?(jorendorff)
Blocks: 640503
"oops"; discovered when I tried to use the method in bug 640503.
Attachment #518167 - Attachment is obsolete: true
Attachment #518167 - Flags: review?(jorendorff)
Attachment #518316 - Flags: review?(jorendorff)
I'll have a review for this Monday.
Comment on attachment 518316 [details] [diff] [review]
Don't hardcode JSProto_RegExp into a method that should use a provided JSProtoKey

How much bigger does this make RegExp objects? Six values, so +48 bytes each? That's kind of rough. I guess I can live with it in exchange for ditching the resolve hook. Can you make this affect speed (positively or negatively) in a stupid microbenchmark?

It's surprising there aren't any assertions anywhere that the slots assigned to Shapes are after the last reserved slot. Surprising but lucky.

In jscompartment.h, comment on JSCompartment::initialRegExpShape:
>+     * Initial shape given to RegExp objects, encoding the initial set of
>+     * built-in instance properties and, critically, the slots where they're
>+     * stored. [...]

Micro-nit: I think I'd prefer "and the fixed slots where they must be stored
(see JSSLOT_REGEXP_*)" here. That "critically" calls attention to the slots,
but without really enlightening anything, and it breaks up the sentence. Just a
suggestion.

In jsregexp.cpp:
> JSObject *
>-js_InitRegExpClass(JSContext *cx, JSObject *obj)
>+js_InitRegExpClass(JSContext *cx, JSObject *global)
> {
>-    JSObject *proto = js_InitClass(cx, obj, NULL, &js_RegExpClass, regexp_construct, 2,
>-                                   NULL, regexp_methods, regexp_static_props, NULL);
>[...]

The code following this duplicates a lot of code from js_InitClass.
DefineConstructorAndPrototype too. Why exactly is this necessary? Is there
something wrong with the object js_InitClass would create (as opposed to what
the new code here creates)?

In js/src/tests/ecma_5/RegExp/instance-property-storage-introspection.js:
>+// Check a bunch of "empty" regular expressions first.
>+
>+var choices = [{ msg: "RegExp.prototype",
>+                 get: function() { return RegExp.prototype; } },
>+               { msg: "new RegExp()",
>+                 get: function() { return new RegExp(); } },
>+               { msg: "/(?:)/",
>+                 get: Function("return /(?:)/;") }];
>+
>+for (var i = 0; i < choices.length; i++)
>+{
>+  var choice = choices[i];
>+  var msg = choice.msg;
>+  var r = choice.get();
>+
>+  checkRegExp(r, msg, 0, false, false, false);
>+}

How about just:

  checkRegExp(RegExp.prototype, "RegExp.Prototype", 0, false, false, false);
  checkRegExp(new RegExp(), "new RegExp()", 0, false, false, false);
  checkRegExp(/(?:)/, "/(?:)/", 0, false, false, false);

(You could use an eval() on the third line, to reflect your Function() call,
but I couldn't tell what that was doing there, so I dropped it.)

>+  // Tricky bits of testing: make sure that redefining lastIndex doesn't change
>+  // the slot where the lastIndex property is initially stored, even if
>+  // the redefinition also changes writability.

Feeling slightly queasy but very glad you wrote a test for this. :)

>+  r = /c/g;
>+  r.lastIndex = 2;
>+  checkRegExp(r, "/c/g initially", 2, true, false, false);
>+  Object.defineProperty(r, "lastIndex", { writable: false });
>+  assertEq(Object.getOwnPropertyDescriptor(r, "lastIndex").writable, false);
>+  try { r.exec("aabbbba"); } catch (e) { /* swallow error if thrown */ }
>+  assertEq(Object.getOwnPropertyDescriptor(r, "lastIndex").writable, false);

This doesn't check that the value of r.lastIndex is unchanged by exec. Is that
because we would flunk that test?

>+  // Last things last: make all regular expressions either unreachable or move
>+  // them away from the cached initial regular expression shape, GC, and check
>+  // that nothing crashes and burns on recreation.

Isn't that futile? It seems misleading to have this test, given that we know it
doesn't actually do what the comment says it's trying to do.

You could do it as a jsapi-test pretty easily, though. Each test creates a
compartment and global. So make RegExps, make the global unreachable,
JS_EndRequest, and GC.

r=me with the question addressed. If you do write a new test, ping me on IRC. I'd like to take a quick look before you push.
Attachment #518316 - Flags: review?(jorendorff) → review+
(In reply to comment #8)
> How much bigger does this make RegExp objects? Six values, so +48 bytes each?
> That's kind of rough. I guess I can live with it in exchange for ditching the
> resolve hook. Can you make this affect speed (positively or negatively) in a
> stupid microbenchmark?

I just realized that this could get rid of the resolve hook without incurring memory overhead. Give the properties JSPropertyOp getters. They wouldn't be plain old data properties, of course--which is sort of the point.

I think I prefer that approach. Saving 48 bytes per RegExp object rather than optimizing access to these rarely-used properties seems like the correct choice. .lastIndex can still be a plain old data property.
Much of the point of this change was to move *away* from having property-op pairs for these properties, rather to make them entirely normal properties (although ones stored in reserved slots).  The memory increase seems negligible since regular expression objects aren't going to be used in huge quantities, nor are they likely to be used for particularly long times.

If you're concerned about the memory cost, there's really no reason why we need to have everything in a regular expression kept in its RegExp*.  It looks like we could keep RegExp::compiled in the private slot, and the rest of the data could easily be kept in reserved slots directly in the object itself.  That would eliminate the current duplication and extra memory cost.  What do you think of doing that as a followup, immediately if necessary?
(In reply to comment #10)
> Much of the point of this change was to move *away* from having property-op
> pairs for these properties, rather to make them entirely normal properties
> (although ones stored in reserved slots).

I think the most compelling reasons for doing this patch are (a) to get rid of a resolve hook, which is a real win in terms of correctness and sometimes speed; (b) to prototype a technique we can use for Functions and other classes, killing more resolve hooks and ultimately killing the shared permanent hack dead dead dead. Both of those are pretty nice goals.

Why is moving away from having property-op pairs desirable per se?

> The memory increase seems negligible
> since regular expression objects aren't going to be used in huge quantities,
> nor are they likely to be used for particularly long times.

It's certainly possible for regexes to be created in loops in real code:

    for (...) {
        var re = /blah/;
        ...
    }

> If you're concerned about the memory cost, there's really no reason why we need
> to have everything in a regular expression kept in its RegExp*.  It looks like
> we could keep RegExp::compiled in the private slot, and the rest of the data
> could easily be kept in reserved slots directly in the object itself.  That
> would eliminate the current duplication and extra memory cost.

No, because (referring to the code above) currently a single RegExp struct is created at compile time and shared across all instances, right?
Comment on attachment 518316 [details] [diff] [review]
Don't hardcode JSProto_RegExp into a method that should use a provided JSProtoKey

(In reply to comment #8)
> Can you make this affect speed (positively or negatively) in a stupid
> microbenchmark?

I'll test, but size classes likely eat up the difference.  And given regular expressions aren't created in vast quantities generally, the property simplification seemed to overcome any plausible disadvantages.  Getting rid of the RegExp class would avoid these issues.

(Oh, the other reason I was doing this, beyond getting rid of property-op properties, even more important than that?  Getting rid of a shared-permanent property in the way of removing the shared-permanent hack.)

> The code following this duplicates a lot of code from js_InitClass.
> DefineConstructorAndPrototype too. Why exactly is this necessary? Is there
> something wrong with the object js_InitClass would create (as opposed to what
> the new code here creates)?

This is necessary to make sure these data properties are defined on the prototype object *before* RegExp.prototype.exec and so on.  If these properties are defined after that, you can't use reserved slots to store these properties.  And while you could add reserved slots for test/exec/toString/compile/whatever, that cost would be borne by every regular expression, not just RegExp.prototype.  The prototype object is just special -- no way around it.  I think we're going to have to start moving our bootstrapping code in that direction in general anyway.

> >+  // Last things last: make all regular expressions either unreachable or move
> >+  // them away from the cached initial regular expression shape, GC, and check
> >+  // that nothing crashes and burns on recreation.
> 
> Isn't that futile? It seems misleading to have this test, given that we know it
> doesn't actually do what the comment says it's trying to do.

How do we *know* this?  I'm pretty sure the comment is correct.  It is indeed fragile, to be sure.

> You could do it as a jsapi-test pretty easily, though. Each test creates a
> compartment and global. So make RegExps, make the global unreachable,
> JS_EndRequest, and GC.

This is a good idea -- I'll do this.
(In reply to comment #11)
> Why is moving away from having property-op pairs desirable per se?

They're neither fish nor fowl as far as property descriptors and the like are concerned.  Note that this patch fixes redefinition of these properties on regular expressions:

   Object.defineProperty(/a/g, "global", {value: false}); // shouldn't throw
   Object.defineProperty(/a/g, "lastIndex",{value:5}); // shouldn't throw
   // and so on

The problem here is directly attributable to the property definition code being unable to see through property ops that store their data in some unknown location, and which might or might not have side effects when read or written.

> It's certainly possible for regexes to be created in loops in real code:
> 
>     for (...) {
>         var re = /blah/;
>         ...
>     }

Yes -- but the core of the regular expression, the compiled code, can still be shared pretty well.  I would guess people hoist regular expression literals as often as not, too.  If you hoist there's only one copy to handle.

> No, because (referring to the code above) currently a single RegExp struct is
> created at compile time and shared across all instances, right?

Ah, right.  But there'd still be sharing with the move, just somewhat less of it, depending how you did it.
(In reply to comment #12)
> (Oh, the other reason I was doing this, beyond getting rid of property-op
> properties, even more important than that?  Getting rid of a shared-permanent
> property in the way of removing the shared-permanent hack.)

I know, but that could be done without making these properties slotful.

> > The code following this duplicates a lot of code from js_InitClass.
> > DefineConstructorAndPrototype too. Why exactly is this necessary? Is there
> > something wrong with the object js_InitClass would create (as opposed to what
> > the new code here creates)?
> 
> This is necessary to make sure these data properties are defined on the
> prototype object *before* RegExp.prototype.exec and so on. [...]
> The prototype object is just special --
> no way around it.  I think we're going to have to start moving our
> bootstrapping code in that direction in general anyway.

Oh, I see. Yes, that is probably true. The copy and paste is still really gross though. Bug 624968 was caused by some classes (in jsexn.cpp) that didn't go through js_InitClass and therefore missed some little-understood emptyShape-related initialization step; I really hate to see that duplicated. Can we give js_InitClass a callback instead--factor out common code instead of copying it? Or simply relax whatever assertions prevent you from defining RegExp.prototype.lastIndex and friends afterwards?

> > >+  // Last things last: make all regular expressions either unreachable or move
> > >+  // them away from the cached initial regular expression shape, GC, and check
> > >+  // that nothing crashes and burns on recreation.
> > 
> > Isn't that futile? It seems misleading to have this test, given that we know it
> > doesn't actually do what the comment says it's trying to do.
> 
> How do we *know* this?  I'm pretty sure the comment is correct.  It is indeed
> fragile, to be sure.

Oh! I misinterpreted the code.

Does the internal RegExp object in the script of choices[2].get have the initial shape? ...Anyway I think a jsapi-test is a lot better, so please do that.

More to come; thanks for the thorough response.
(In reply to comment #13)
> (In reply to comment #11)
> > Why is moving away from having property-op pairs desirable per se?
> 
> They're neither fish nor fowl as far as property descriptors and the like are
> concerned.  Note that this patch fixes redefinition of these properties on
> regular expressions:
> 
>    Object.defineProperty(/a/g, "global", {value: false}); // shouldn't throw
>    Object.defineProperty(/a/g, "lastIndex",{value:5}); // shouldn't throw
>    // and so on
> 
> The problem here is directly attributable to the property definition code being
> unable to see through property ops that store their data in some unknown
> location, and which might or might not have side effects when read or written.

Very good point. I find this convincing.

I think your first example here is slightly wrong:

    Object.defineProperty(/a/g, "global", {value: false});  // should throw!
    Object.defineProperty(/a/g, "global", {value: true});  // shouldn't throw

I don't see tests for this, though. Please add them, or better yet, submit them to the ECMA test suite.

As for the future: We should make all properties definitively either fish or fowl. The mental overhead of keeping up with all the weird cases is just too much. Time to think about simplifying. This is for a future bug, but please take a look at https://wiki.mozilla.org/User:Jorend/Properties and I'll get a bug on file for that.
http://hg.mozilla.org/tracemonkey/rev/c1f5c784a38b
Whiteboard: fixed-in-tracemonkey
Target Milestone: --- → mozilla2.2
We have a regression in Fennec (bug 646444) which I think is created by this patch (there was a tracemonkey merge yesterday)
As a workaround I have to convert regexp written like /\bsearch\b/i(aLink.rel) to /\bsearch\b/i.test(aLink.rel).
Was it expected?
(In reply to comment #18)
> We have a regression in Fennec (bug 646444) which I think is created by this
> patch (there was a tracemonkey merge yesterday)
> As a workaround I have to convert regexp written like /\bsearch\b/i(aLink.rel)
> to /\bsearch\b/i.test(aLink.rel).
> Was it expected?

Ok, someone told this is because of bug 582717. Sorry for the spam here
http://hg.mozilla.org/mozilla-central/rev/c1f5c784a38b
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: