All users were logged out of Bugzilla on October 13th, 2018

Much slower than WebKit on Dromaeo DOM getAttribute()




6 years ago
6 years ago


(Reporter: jrmuizel, Unassigned)


Firefox Tracking Flags

(Not tracked)



(3 attachments)



6 years ago,174430,174431

                Moz                     WebKit
getAttribute	73.03runs/s ±27.06%	224.39runs/s ±9.30%

I will try to get a profile for this.

Comment 1

6 years ago
It seems like this is not just on mobile.
With I get:
509 in Chrome and 1416 in FF
Summary: Much slower than WebKit on some of the DOM attribute tests on mobile → Much slower than WebKit on some of the DOM getAttribute()


6 years ago
Summary: Much slower than WebKit on some of the DOM getAttribute() → Much slower than WebKit on Dromaeo DOM getAttribute()

Comment 2

6 years ago
I think this is a dup, and IIRC sicking was looking at this some time ago.

Comment 3

6 years ago
A perf profile would be still great.
Jeff did a profile, and it looks like some DOM binding and string stuff, the ascii-lowercasing, and the general attribute name manipulation junk.

Bug 558516 covers a lot of the latter.

As far as DOM bindings go, on the testcase from comment 1 we make 102004000 calls to getAttribute in 15524ms on my machine, so about 152ns per call.  Chrome taks 5423ms, so 53ns per call.

I tried timing a method that just takes a (constant) string arg and returns an nsString (so makes use of the same refcounting-and-sharing setup that getAttribute would), to get an idea of binding performance.  Such a method with the quickstub bindings (which is what getAttribute is right now) seems to take about 90ns and with the webidl bindings takes about 72ns.  It'll get a little faster still when efaust does his specialized-native stuff for methods, but that'll save maybe another 10ns.

So we need to both reduce the 62ns of actual attr-value-getting gunk we have (starting with bug 558516) and do something to speed up the actual binding code here, since 62 > 53.  ;)

A profile of just the binding testcase says something like this:

  17% is self time in the generated binding code; efaust's stuff might help here.
  15% of the time is kernel time in vm_fault
  11% is under JS_NewInternalString
  10% is the overhead of the two(!) nested function calls it takes to get to there from
      the binding code (xpc::NonVoidStringToJsval and XPCStringConver::ReadableToJSVal)
  10% is the string assignment in the underlying C++ (the mov after the "lock add" is
      particularly prominent)
   9% are the destructors for the in-param dependent string and the return value
   2% is JS_GetStringCharsZAndLength
  17% is actual jitcode.

So yeah.

Jeff mentioned that WebKit has some sort of hashtable mapping string addresses to JS objects.  We could try doing something like that and seeing how much it helps; we'd gain an atomic release in the profile if we did that, I bet, since right now we steal the nsString's ref for the JSExternalString.
Depends on: 558516
A bit more data.  I did some testing with a new-binding method taking a string and returning nothing, a new-binding method taking nothing and returning a string, and new-binding method that takes nothing and returns nothing.  Based on that, the times needed for the three things involved are approximately:

  Method itself: 20ns
  String argument: 15ns
  String retval: 35s

which actually adds up to the 70ns I was seeing for the combination of the three above, yay.

Again, I think we can get the method itself down closer to 10-12ns.  We should probably get separate bugs filed for string arguments and string return values....

For string arguments, I wonder whether we can use a "fake" nsDependentString with an inline constructor and destructor.  Basically just have a class with the same member layout and a no-op destructor, which is enough for nsDependentString, and reinterpret_cast it when making the call.  This relies on there being no virtual stuff on strings, of course.  The current setup involves two function calls just to get to code that decides to do nothing for our dependent string.  :(  Peter, thoughts?

For string return values we should try this hashtable thing.  35ns is a pretty long time compared to even a hashtable lookup.
Filed bug 773519 on string arguments and bug 773520 on string return values.
Depends on: 773520, 773519
Oddly enough, an experiment to use non-atomic refcounting on stringbuffers doesn't seem to change much in the timing...
IIRC Justin saw the same thing last time he looked at that.
(In reply to Kyle Huey [:khuey] ( from comment #8)
> IIRC Justin saw the same thing last time he looked at that.

I did, and it's one of those experimental results I had difficulty believing, but never re-ran.  Maybe there's some other atomic op nearby which forces us to flush the string buffer refcount cache line even if we don't use atomic ops on the refcount.  I dunno.
We should make methods non-virtual too. InternalGetExistingAttrNameFromQName and GetAttr
(And we should have a method like InternalGetExistingAttrNameFromQName which returns the value)
So with the patches in the three bugs this depends on, we're down to about 90ns (from 145).

Switching to new bindings will buy us at least another 20, though we may be able to get some of that now if we switch to calling a non-virtual method from the custom quickstub.  If efaust's method patches go well, it might buy us 30.

Olli's removal of the XUL prototype stuff will let us remove a bit more code.

At that point we'll at least be in the same ballpark and it'll be worth reprofiling and filing off more rough edges (e.g. making the string-return fast path even faster by inlining the simple case, maybe).
Just as an update, compared to comment 5 I see these numbers with the patches this bug depends on:

  Method itself: 16ns
  String argument: 5ns
  String retval: 26s

getAttribute is down to about 90ns.  Of that time, about 35%, so about 31ns, is in the GetAttribute code itself.  That includes the string assignment inside nsAttrValue::ToString, etc.

efaust's patches should get the "Method itself" number down to 9s or so.  So then we'll be looking at a total of 71ns.

Doing better will involve getting rid of the two remaining layers of function calls for NonVoidStringToJsval and maybe speeding up the attr value get somehow...
Oh, and I tried hacking the stringbuffer refcounting to be non-atomic.... it doesn't seem to change things much(!).  I still get a lot of time on the memory traffic after the increment or decrement.

Another thought on what we could do better here: an inline-ish version of JS_GetStringCharsZAndLength.
Though the "String retval" time above includes some time that's currently spent under getAttribute.  So we might do better than 71, but I doubt we'd get below 60.
Created attachment 643736 [details] [diff] [review]
Hackypatch I've been using to measure performance
Created attachment 643737 [details]
HTML to exercise that stuff
I don't think we end up using stringbuffer's refcounting. Seems like we just copy.
In this particular test we handle atom attr values, so we should just share that atom to
JS. Would need to create a new kind of JSExternalString.
Atom addref/release should be cheap.
I'm wrong here. We do end up addref stringbuffer.
We should remeasure here now that the WebIDL bindings have landed.
Created attachment 699198 [details]

Modified the previous test.
Nightly 105-110, Chrome 90-95.
Bug 834877 will fix the rest of this.
Depends on: 834877
This is totally fixed on Mac.

Should still measure on ARM, I guess...
Last Resolved: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.