Closed Bug 802157 Opened 12 years ago Closed 11 years ago

make dataset (HTML5 data-*) faster

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: fb+mozdev, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/17.0 Firefox/17.0
Build ID: 20121010150351

Steps to reproduce:

Visit http://jsperf.com/data-dataset


Actual results:

get/setAttribute is faster than dataset


Expected results:

dataset should be equally fast or (much) faster than get/setAttribute
Blocks: html
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: [parity-webkit]
So div.dataset = {yo: 'yo', ma: 'ma', la: 'la'}; isn't doing anything.
dataset is
readonly attribute DOMStringMap dataset;
I fixed the testcase, however this does not really change the result (it's even worse): http://jsperf.com/data-dataset/2
> dataset should be equally fast or (much) faster than get/setAttribute

No, getAttribute is expected to be much faster than dataset, because dataset can't be optimized by a JIT nearly as well.
And the "2" testcase has a bunch of .dataset gets as part of its timing code, of course, which adds a huge amount of overhead.
Now all that said, I'm pretty sure we can in fact make DOMStringMap faster.  But note that under the hood it has to end up calling setAttribute/getAttribute anyway, so can't possibly end up faster than those.
Version: 17 Branch → Trunk
So I set up an apples-to-apples comparison at http://jsperf.com/data-dataset/3

This shows us about 1.5x faster than Chrome on the dataset bit, as far as I can tell.  In a current nightly I get "459,691" for dataset, while Chrome gets 294,682.  In both, setAttribute is at about "1,000,000", as expected.
OK, on http://jsperf.com/data-dataset/2 I see "212,353" for us on the dataset part and "260,409" for Chrome.

On http://jsperf.com/data-dataset I see "689,389" for us and "1,889,080" for Chrome.

Which suggests that Chrome is a lot faster than we are at getting nonexistent things out of a dataset, and that's about it.  Is that what the "parity-webkit" bit is about?
And in particular, it _is_ possible to teach a JIT to cache the fact that a property is missing, so getting a missing property lots of times in a loop is likely to look like a lot of no-ops in a JIT that does that.  Unclear how useful that is in practice, though.
Oh, and what comment 8 paragraph 1 mostly shows is that the .dataset getter is not all that fast.  We may be able to speed that up by converting it to WebIDL.  That might well speed up the no-op gets too, though it might hurt the performance of gets that return something...
Well, I should not post bugs when I'm in a hurry. Sorry. 

Is there anything else we can gain except Bug 802243? E. g. "failing" earlier if we encounter an unmatched entity (-> no-op)? Also, Bug 802243 does not convert it to WebIDL AFAICS (can we try this and see if this hurts performance for queries with a good answer?). 

Why is rev3 so much faster than rev2 in Fx? Is it simply the "caching" of the dataset attribute? Chrome does not have this significant speed increase. 
Also, Fx 19 is slower in rev3 than Fx 17. Is this correlated to a code change (-> regression)?


Why is get/setAttribute so much faster? It's calling get/setAttribute internally. Can't we make it to get to that point faster? Can we make it more "JIT-friendly"?
Whiteboard: [parity-webkit]
> Is there anything else we can gain except Bug 802243?

Unclear.  The right solution is to convert this to WebIDL, then optimize the result, in general.

> E. g. "failing" earlier if we encounter an unmatched entity (-> no-op)?

Most of the cost is paid before we get to that point, unfortunately...

> can we try this and see if this hurts performance for queries with a good answer?

Yes, we should definitely try this.  Ms2ger was considering taking a stab at it.  We need to add deleter support for WebIDL first.

> Is it simply the "caching" of the dataset attribute?

Yes.  Without bug 802243 the .dataset get is pretty expensive.

> Chrome does not have this significant speed increase. 

It has a 10+% speed increase, afaict.  But yes, the cheaper your getter the less of a win from avoiding it.  ;)

> Also, Fx 19 is slower in rev3 than Fx 17.

Is that also the case if you disable ion?

> Why is get/setAttribute so much faster?

Because it's a lot easier for the JIT to optimize with hidden types and the like so it doesn't have to do a full property lookup.

> Can we make it more "JIT-friendly"?

It's very hard to make "look up arbitrary property name that might appear or disappear due to action at a distance" (which is what dataset.foo is) as fast as "look up known at compile time property name that lives on a given prototype"...
> The right solution is to convert this to WebIDL, then optimize the result, in general.

To be clear, there are two things to be converted:

1)  HTMLElement.  This will give us code similar to the second patch I just posted in bug 802243, conceptually.

2)  DOMStringMap.  This will be needed to make full use of the updated HTMLElement code.  The hard part here is possible regressions on the actual access to the dataset, even while the .dataset getter gets very fast.
Depends on: 803129
(In reply to Boris Zbarsky (:bz) from comment #12)
> Most of the cost is paid before we get to that point, unfortunately...
What is happening "before" get/setAttribute that takes "so much" time? Is it simply converting the name of the property (camelcase -> hyphenated)? Can we somehow shring this process? 
Is there a way to have a fast-track, like pre-emptively notifying something (the cache?) that there are datalist attributes and handle them differently (i. e. not using get/setAttribute at all)? Is this sane at all, or just too much work?

> Yes.  Without bug 802243 the .dataset get is pretty expensive.
Yay, first win! :)

> > Also, Fx 19 is slower in rev3 than Fx 17.
> 
> Is that also the case if you disable ion?
I was only comparing the charts, and as I understand Fx 18 is the first version to have IM support. Sorry, I don't have the time to try this out. 

> > Why is get/setAttribute so much faster?
> 
> Because it's a lot easier for the JIT to optimize with hidden types and the
> like so it doesn't have to do a full property lookup.
> 
> > Can we make it more "JIT-friendly"?
> 
> It's very hard to make "look up arbitrary property name that might appear or
> disappear due to action at a distance" (which is what dataset.foo is) as
> fast as "look up known at compile time property name that lives on a given
> prototype"...
Interpreting this, looking up arbitrary (non-standard) attributes should take about as long as accessing a datalist attribute?

(In reply to Boris Zbarsky (:bz) from comment #13)
> > The right solution is to convert this to WebIDL, then optimize the result, in general.
> 
> To be clear, there are two things to be converted:
> 
> 1)  HTMLElement.  This will give us code similar to the second patch I just
> posted in bug 802243, conceptually.
> 
> 2)  DOMStringMap.  This will be needed to make full use of the updated
> HTMLElement code.  The hard part here is possible regressions on the actual
> access to the dataset, even while the .dataset getter gets very fast.
I just saw you are creating the bugs, great!
However, I could not find one for HTMLElement. This is also "Paris bindings", isn't it? Is this somehow included in Node or does this need a separate bug?
> What is happening "before" get/setAttribute that takes "so much" time?

1)  Getting the "datalist" object off the element (in the testcases that test that).
2)  Looking up the right setter to call for the given property.
3)  Calling the datalist getter.
4)  String operations to produce the attribute name from the given property name (involves
    a string copy out of the JS string, which is not needed if getAttribute is called
    directly).

Probably a few other things.  Assuming you mean what happens in the datalist case, of course.

> Can we somehow shring this process? 

It's all code.  The question is how much it matters.  For example, should all string-handling code in Gecko and Spidermonkey be rewritten to make this a bit faster?  I doubt it.

> (i. e. not using get/setAttribute at all)?

How do you envision this working, exactly?   You _have_ to end up setting the attribute on set, per spec.  The get has to do the same exact work as getAttribute, so they share the same underlying code.  Again, the "extra" work obviously all happens before that part is called.

> I was only comparing the charts

Uh... you're not running the test yourself?  The data from the charts is completely useless, because it doesn't account for hardware differences in any way.  So unless you're measuring yourself, on identical hardware, the numbers mean nothing.

> Interpreting this, looking up arbitrary (non-standard) attributes should take about as
> long as accessing a datalist attribute?

It really depends on how datalist gets implemented.  Generally speaking, JITs cache information about what properties an object has.  See http://en.wikipedia.org/wiki/Inline_caching for some info on that.  But that only works if the JIT has a way to know when props are added/removed.  For normal JS objects it does, since it fully controls them.  But for datalist it does not, since there's this action at a distance where setAttribute magically makes properties appear on datalist.

> However, I could not find one for HTMLElement.

There probably isn't one yet.  There's not much point filing one until Node and Element are done.
Thank you very much for explaining all of this! So it's mostly that the spec is kinda blocking making datalist faster or equally fast than get/setAttribute. Nothing to do here. 

As for the tests, I'm sorry I haven't mentioned this earlier. As the results are not far from each other, I think this is in the margin of error. So it does not look like a regression. However, someone with Fx 17 and Nightly might want to check this on one machine.
So I don't have to deal with jsperf.
Depends on: 803278
Summary: make datalist (HTML5 data-*) faster → make dataset (HTML5 data-*) faster
Now that Bug 821606 is fixed, what bugs need to be filed? "Convert DOMStringMap to WebIDL"? Anything else?

(In reply to Boris Zbarsky (:bz) from comment #13)
> > The right solution is to convert this to WebIDL, then optimize the result, in general.
> 
> To be clear, there are two things to be converted:
> 
> 1)  HTMLElement.  This will give us code similar to the second patch I just
> posted in bug 802243, conceptually.
> 
> 2)  DOMStringMap.  This will be needed to make full use of the updated
> HTMLElement code.  The hard part here is possible regressions on the actual
> access to the dataset, even while the .dataset getter gets very fast.
DOMStringMap is already WebIDL.  You can see that by looking at the "Depends on" field of this bug.  And please don't overquote; it makes the bug totally unreadable in very short order....

In any case, the original testcase mostly shows the fact that setting a readonly property (which should be a no-op) is slow.  That's bug 803278.

The updated testcase at http://jsperf.com/data-dataset/2 shows these numbers for me:

                         setAttribute       Data       dataset

fx17                       1,111,086     8,319,882     272,714 

inbound                    1,080,779     8,031,168     378,654 

The obvious things blocking further performance improvements are the fact that sets on the dataset alias the "div" gets so we can't CSE the three separate .dataset instances during setting or loop-hoist the one .dataset before the gets and the fact that proxy performance is not that great in general.  

Proxy performance is bug 649887 and bug 824393.

There's no good way to really CSE the .dataset across the sets, since setting .dataset.foo can in fact change the value of "div"!  But bug 818912 might help a bit with those gets.
Depends on: 649887, 824393, 818912
Oh, and at http://jsperf.com/data-dataset/2 where I manually combined the .dataset gets, the numbers are like so:

                         setAttribute       Data       dataset

fx17                       1,094,170     8,013,543     573,241

inbound                    1,158,240     8,296,954     460,108 

(here the slowdown over fx17 is due to the conversion to a proxy).
The link in comment 20 should have been http://jsperf.com/data-dataset/3

That said, the attached testcase shows numbers quite different from jsperf for some reason.  In my local testing the jsperf setAttribute number is approximately right but the "dataset" number is just off by a factor of 2....
Sorry for overquoting, would have edited my comment the second I submitted it ;). And sorry if I ask some things over and over again because I only understand half of what is written here …

(In reply to Boris Zbarsky (:bz) from comment #13)
> 1)  HTMLElement.  This will give us code similar to the second patch I just
> posted in bug 802243, conceptually.
Is this last sentenced fixed with Bug 821606? Does the DOMStringMap WebIDL code need to be updated (as per comment #13, No. 2)?


(In reply to Boris Zbarsky (:bz) from comment #21)
> That said, the attached testcase shows numbers quite different from jsperf
> for some reason. 
The jsPerf test includes a teardown which resets data-* (via setAttribute) to null / removing the attributes. This teardown is called after every(?) test loop but is not counted. You probably hit some sort of "optimization" (e. g. if(new!==old) set();) if you re-set a dataset to the current state (but setAttribute does not?). But that's just a guess. (BTW, you forgot to close the PRE. ;) )
> Is this last sentenced fixed with Bug 821606? 

It was fixed in bug 802243, but it continues to be fixed with bug 821606 fixed and the custom quickstub removed, yes.

> Does the DOMStringMap WebIDL code need to be updated 

That was done in bug 803129.  Seriously, do please look at the "depends on" field... ;)

> The jsPerf test includes a teardown which resets data-* (via setAttribute) to null /
> removing the attributes.

Removing the teardown code, as in <http://jsperf.com/data-dataset/5>, changes absolutely nothing, not surprisingly (see below).

> You probably hit some sort of "optimization" (e. g. if(new!==old) set();) if you re-set
> a dataset to the current state (but setAttribute does not?)

Sets on a dataset call setAttribute internally.  setAttribute always compares new to old and returns immediately if equal.  So all the set bits in the testcase are just measuring the performance of this no-op operation anyway (because the teardown only happened between timed loops, so in each timed loop only the first set operation actually does anything and the rest are all no-ops).

I welcome any other hypotheses you may have about why jsperf shows totally different numbers from running the same code not in their harness.  ;)

> BTW, you forgot to close the PRE

Rather, I left it unclosed on purpose, because it doesn't matter and it wasn't worth the extra typing.  ;)
Depends on: 870514
Depends on: 870508
Depends on: 870566
Depends on: 874758
I'm calling this fixed, given the numbers today.
Status: UNCONFIRMED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Thanks a lot! Can you please post your numbers here?
For which testcase?
I was referring to your comment: 

(In reply to Boris Zbarsky (:bz) from comment #24)
> I'm calling this fixed, given the numbers today.

So I thought you reran the tests and compared them to the older ones …
Sure, but which exact testcases are you interested in?  There are several here, of varying levels of bogosity.
Note that it's obviously still slower than set/getAttribute, since it has to do strictly more work.
Yes, I'm aware of that. 

Any numbers will make me happy. But if I need to be specific, I'm very much interested in the ops/sec for dataset now (compared to the old numbers, e. g. from comment 20), and maybe also current data for get/setAttribute and a plain (non-DOM) data store (as tested in jsperf, or your standalone testcase), just to visualize overall performance improvements that do not touch dataset directly.
The equivalent of comment 20 in an opt build from today (and this should be in tomorrow's nightly if you want to try it yourself) is:

                         setAttribute       Data       dataset

                           1,305,011     9,066,636     703,758
Thanks a lot, those are quite better numbers!
No longer depends on: 824393
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.