Bug 453738 - (trace-into-dom) Make the JS engine tracer emit code to call directly into the DOM's C++ functions.
Status: RESOLVED WONTFIX
Product: Core
Classification: Components
Component: JavaScript Engine
Version: Trunk
Platform: All All
Importance: -- normal with 4 votes
Target Milestone: ---
Assigned To: Jason Orendorff [:jorendorff]
Duplicates: 480183
Depends on: 457897 458735 458807 463153 480185 trace-quickstubs 480192
Blocks: 439371
Reported: 2008-09-04 17:19 PDT by Johnny Stenback (:jst, jst@mozilla.com)
Modified: 2011-11-22 18:05 PST (History)
pavlov: wanted‑fennec1.0+

Attachments
Simple getAttribute() testcase. (211 bytes, text/html)
2008-09-04 17:19 PDT, Johnny Stenback (:jst, jst@mozilla.com)
getElementsByTagName with JS Cache (115.05 KB, text/html)
2008-09-09 12:35 PDT, John Resig
getElementsByTagName with JS Cache (5000 loop) (115.71 KB, text/html)
2008-09-09 12:46 PDT, John Resig
getElementsByTagName with JS Cache (114.95 KB, text/html)
2008-09-09 13:13 PDT, John Resig
getElementsByTagName with JS Cache (114.88 KB, text/html)
2008-09-09 13:19 PDT, John Resig
getElementsByTagName with JS Cache (80,000 loop) (114.85 KB, text/html)
2008-09-09 13:52 PDT, John Resig
Tracing a DOM method call - proof of concept 1 (16.76 KB, patch)
2008-09-17 16:17 PDT, Jason Orendorff [:jorendorff]
Avoiding QI - proof of concept 1 (12.98 KB, patch)
2008-09-19 17:18 PDT, Johnny Stenback (:jst, jst@mozilla.com)
Tracing a DOM method call - proof of concept 2 (22.15 KB, patch)
2008-10-16 15:33 PDT, Jason Orendorff [:jorendorff]

Description Johnny Stenback (:jst, jst@mozilla.com) 2008-09-04 17:19:58 PDT
Created attachment 336943
Simple getAttribute() testcase.

This came out of a discussion with gal, shaver, mrbkap, jonas, and myself (and has been discussed earlier to some extent as well).

It ought to give us a nice performance boost when dealing with the DOM from JS if we could make the tracer able to emit code to call directly into the DOM's C++ functions, and thus bypass all of XPConnect (in some cases at least). To do this we'd need to have the logic in the tracing code to find our way from the DOM JSObject through its private data, which is an XPCWrappedNative, which has the identity pointer on which the vtables etc. can be found.

So to do this we need the JS engine to somehow know what interface the method to be called really lives in. XPConnect does this through its internal interface sets, ultimately using xptinfo to find what interface contains the property/method being called, and then does the QI to get to the right vtable. XPConnect internally only does a QI to the interface in question once and stores the result of that QI in its tearoffs, and doing that is somewhat important, especially when dealing with XPCOM tearoffs (and in general too, since QI is somewhat expensive nowadays due to the cycle collector's purple marking).

The tracing code will need to be able to deal with the cases where we're calling from JS -> C++ -> JS, i.e. JS is making a DOM call that ends up back in JS through a mutation handler or whatever. Those cases do exist, but they're not the common cases. We'll also need to deal with the places in the code that do things like looking directly at cx->fp etc. Caps does that a lot, but we won't hit it in these cases since we'll be bypassing XPConnect, which is what calls into caps in the common case. We'll need to look at other pieces of code that abuse similar things in the JS engine as well. gal says it's OK for code to ask for information like that, but it needs to be done through an API rather than by directly accessing cx->fp or whatnot; that way the JS engine can compute the requested info if it's not directly available in the current state.

We'll also need to eventually deal with cases where we're calling into DOM functions that return objects, which now end up being wrapped through XPConnect. The idea of lazily doing the wrapping was suggested, i.e. to only do the wrapping if the JS code ends up storing a reference to it, or adds properties to the object, or reaches for its prototype or what not, at which point we would probably exit the trace, wrap the object, and continue on, maybe being able to do further tracing even in that case.

The initial plan is to take a simple case like getAttribute(), for example var foo = document.body.getAttribute("class") on a simple HTML testcase (which I'll attach).

That's what I could remember of the discussion, others please fill in any missing pieces :)
Comment 1 Jonas Sicking (:sicking) No longer reading bugmail consistently 2008-09-04 20:53:00 PDT
To get the web (as opposed to chrome) fast it is only really important to do this in the cases when we have classinfo available. And only in the cases when no-one has done an explicit QI to any non-classinfo interface. I.e. it's really only important when the wrapper has a non-mutated set (I think that's the correct terminology).

Further, it's most likely OK to only do this when there are no expandos set, though ideally we wouldn't completely bail when that happens.

One thing to watch out for though is that someone could have overridden for example .getAttribute by modifying the prototype.
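
A minimal sketch of the guard this implies, using a plain JS class as a stand-in for a DOM element. FakeElement, nativeGetAttribute, and canUseFastPath are hypothetical names for illustration, not engine or DOM APIs. The idea is that a direct call is only valid while a normal lookup would still find the original method:

```javascript
// Stand-in for a DOM element class; not a real DOM API.
class FakeElement {
  constructor(attrs) { this.attrs = attrs; }
  getAttribute(name) {
    return Object.prototype.hasOwnProperty.call(this.attrs, name)
      ? this.attrs[name]
      : null;
  }
}

// Remember the original method once, before any page script runs.
const nativeGetAttribute = FakeElement.prototype.getAttribute;

// Guard: a direct call is safe only if a normal property lookup on the
// object would still reach the original method (no override on the
// prototype, no shadowing property on the object itself).
function canUseFastPath(el) {
  return el.getAttribute === nativeGetAttribute;
}

const el = new FakeElement({ class: "header" });
const before = canUseFastPath(el);   // original method still in place

// Page script overrides the method on the prototype.
FakeElement.prototype.getAttribute = function (name) {
  return "overridden:" + name;
};
const after = canUseFastPath(el);    // the guard must now fail
```

A real tracer would presumably express this check as a shape guard rather than a function comparison; the comparison is used here only because it is easy to run.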
Comment 2 Jason Orendorff [:jorendorff] 2008-09-09 11:30:34 PDT
I was investigating ways to trace across DOM methods.  The idea was a lot dumber than what you're describing here.  DOM methods would basically look just like any other function that we currently trace across (Math.floor, for example).  The JIT wouldn't have to do anything special--except support having more than a handful of those functions.

The win would be limited to handpicked methods that don't reenter JS.

Do we want that at all?  Separate bug?
Comment 3 Brendan Eich [:brendan] 2008-09-09 11:42:30 PDT
Separate bug, and an even quicker hack: make document.getElementById, e.g., be a JS function that uses a JS object to map from id to element, filling in the "miss" case using the real native (or its native "filler" if you want to spend time exposing that and avoiding the double-caching). Benchmark against our fully native document.getElementById. Repeat for childNodes, etc.

/be
Comment 4 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-09 11:48:40 PDT
I don't believe that's a quick hack. Populating and maintaining that JS object in the face of DOM mutations is not trivial, or free, and there's memory overhead if you're duplicating the C++ element map. (And if you're not, it's even more complicated.)
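
The coherence problem can be seen in a toy model. This is only a sketch: nativeTable, nativeGetElementById, and cachedGetElementById are hypothetical stand-ins, with a Map playing the role of the C++ element map.

```javascript
// Toy "document": a Map stands in for the C++ id-to-element table.
const nativeTable = new Map([["nav", { id: "nav" }]]);
let nativeCalls = 0;

// Stand-in for the real native lookup.
function nativeGetElementById(id) {
  nativeCalls++;
  return nativeTable.get(id) || null;
}

// Self-hosted lookup: consult a JS-side cache, fill it on a miss.
const idCache = new Map();
function cachedGetElementById(id) {
  if (idCache.has(id)) return idCache.get(id);
  const el = nativeGetElementById(id);
  if (el !== null) idCache.set(id, el);
  return el;
}

const first = cachedGetElementById("nav");   // miss: goes to the native
const second = cachedGetElementById("nav");  // hit: native not called again

// A DOM mutation removes the element, but nothing tells the cache.
nativeTable.delete("nav");
const stale = cachedGetElementById("nav");   // still returns the removed element
const fresh = nativeGetElementById("nav");   // the native correctly reports null
```

The cache also holds a second reference to every element it has seen, which is the duplicated-memory cost mentioned above.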
Comment 5 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-09 12:21:38 PDT
If you can trace up to the getElementById C++ entry point, and carry on tracing
after it, then the win from inlining the fast path boils down to avoiding
spills, does it not?

Maybe you can do a simpler experiment to estimate what the cost of those spills
would be. I'm having a hard time believing it would be significant for us
anytime soon.
Comment 6 John Resig 2008-09-09 12:35:39 PDT
Created attachment 337714
getElementsByTagName with JS Cache

I ran some tests, querying against a large document. I ran the tests once normally and again after having run this code:

  var g = document.getElementsByTagName, cache = {};
  document.getElementsByTagName = function(name){
    return cache[name] || (cache[name] = g.call(this, name));
  };

When I ran the tests 5000 times I got these results (lower is better):
          | Normal | w/ Cache |
Non-Trace |    129 |       97 |
Traced    |    139 |      108 |

When I ran the tests non-stop for a second I got these results (higher is better):
          | Normal | w/ Cache |
Non-Trace |  76586 |   132556 |
Traced    |  70384 |   124963 |

Let me know if any tweaks need to be made to the test.
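
For readers comparing the two tables, here is a rough sketch of the two measurement modes. The function names and the placeholder workload are assumptions for illustration; the attached tests call document.getElementsByTagName on a large document instead.

```javascript
// Mode 1: fixed number of iterations; report elapsed milliseconds
// (lower is better).
function timeFixedIterations(fn, iterations) {
  const start = Date.now();
  for (let i = 0; i < iterations; i++) fn();
  return Date.now() - start;
}

// Mode 2: fixed time budget; report completed iterations
// (higher is better).
function countInTimeBudget(fn, budgetMs) {
  const deadline = Date.now() + budgetMs;
  let count = 0;
  while (Date.now() < deadline) {
    fn();
    count++;
  }
  return count;
}

// Placeholder workload so the sketch runs outside a browser.
const workload = () => Math.sqrt(2);

const elapsedMs = timeFixedIterations(workload, 5000);
const iterations = countInTimeBudget(workload, 50);
```

Note that the second mode reads the clock on every iteration, so part of what it measures is the timer call itself.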
Comment 7 Brendan Eich [:brendan] 2008-09-09 12:44:33 PDT
roc: we covered this on IRC, but I was proposing a test to see what tracing can win, ignoring coherence and correctness and that boring stuff. Buzzkill!

Looks like we fall off trace. bz is on the case -- great. We need more hacking, less yacking :-P.

/be
Comment 8 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-09-09 12:45:38 PDT
We could perhaps expose a "shape" for the DOM to invalidate the cache, even just a simple generation counter that's updated wherever we check the "mutation listeners registered?" flag, but the memory overhead does indeed remain without a shared data structure between C++ and JS.  Maybe using the JS object from C++ would be straightforward, but we're certainly adding to the complexity.

I think comment 2 in this bug is something we need for scaling beyond our hand-coded wrapping: a way for embedders to say "when tracing this function, let me know" such that they can emit LIR -- possibly specialized offset-loads as for .firstChild, or maybe just a LIR_call to a fastcall helper (the case we currently do).  There's room in the native-side union of JSFunction for a bit to indicate that clasp is really traceHelper; I can elaborate in a dependent bug later today, if my trajectory of improvement holds.

(And I think we still need to think a bit more about how to sink wrapper creation to side exits and heap stores, since the xpconnect wrapping machinery is still in play even post DOM-stubs.  That cursed monitor!)
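
A minimal sketch of the generation-counter idea, assuming a toy DOM where every mutation bumps a counter. The dom object, appendElement, and cachedGetElementsByTagName are hypothetical; the real getElementsByTagName returns a live list, which this snapshot-based model ignores.

```javascript
// Toy DOM: the array stands in for the C++ side; `generation` is the
// hypothetical counter bumped on every mutation.
const dom = { generation: 0, elements: [{ tag: "div" }, { tag: "p" }] };

function nativeGetElementsByTagName(tag) {
  return dom.elements.filter((e) => e.tag === tag);
}

function appendElement(tag) {
  dom.elements.push({ tag });
  dom.generation++;            // every mutation invalidates cached results
}

// Cache entries remember the generation they were computed in.
const tagCache = new Map();
function cachedGetElementsByTagName(tag) {
  const entry = tagCache.get(tag);
  if (entry && entry.generation === dom.generation) return entry.result;
  const result = nativeGetElementsByTagName(tag);
  tagCache.set(tag, { generation: dom.generation, result });
  return result;
}

const a = cachedGetElementsByTagName("div");  // miss, computed at generation 0
const b = cachedGetElementsByTagName("div");  // hit, same array object
appendElement("div");                         // bumps the generation
const c = cachedGetElementsByTagName("div");  // stale entry rejected, recomputed
```

A single global counter is coarse: any mutation anywhere discards every cached result. It is, however, cheap to check, which is the trade-off being weighed here.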
Comment 9 John Resig 2008-09-09 12:46:48 PDT
Created attachment 337716
getElementsByTagName with JS Cache (5000 loop)

I just added the other version of the test that I used (that does 5000 loops, as opposed to running for a second).
Comment 10 Brendan Eich [:brendan] 2008-09-09 12:52:14 PDT
roc: the other point is that spills hurt. We spill everything, including all the SSE registers.

(In reply to comment #8)
> We could perhaps expose a "shape" for the DOM to invalidate the cache, even
> just a simple generation counter that's updated wherever we check the "mutation
> listeners registered?" flag,

We have shapes, tracing uses them (with the patch I'm trying to get landed to BRANCH_EXIT) to build trace-trees to cover all the common shapes for a given decision point (pc in the interpreter). Any system that traces will need some guard like this, it's not the issue.

> but the memory overhead does indeed remain without
> a shared data structure between C++ and JS.  Maybe using the JS object from C++
> would be straightforward, but we're certainly adding to the complexity.

Any plan to self-host would do want to remote the double caching, one way or another.

> I think comment 2 in this bug is something we need for scaling beyond our
> hand-coded wrapping: a way for embedders to say "when tracing this function,
> let me know" such that they can emit LIR -- possibly specialized offset-loads
> as for .firstChild, or maybe just a LIR_call to a fastcall helper (the case we
> currently do).  There's room in the native-side union of JSFunction for a bit
> to indicate that clasp is really traceHelper; I can elaborate in a dependent
> bug later today, if my trajectory of improvement holds.

Self-hosting beats LIR generation if it can win on perf and be correct and all that boring stuff :-).

The alternative to using a JS object is asm-for-JS: unmanaged JS windows into C++, JS peeking and poking and so on with a C++ hashtable. But the quick hack (on trace, so we can judge it fairly) first, because making a coherent cache out of a JS object is easier than making asm windows or generating LIR.

> (And I think we still need to think a bit more about how to sink wrapper
> creation to side exits and heap stores, since the xpconnect wrapping machinery
> is still in play even post DOM-stubs.  That cursed monitor!)

Where's the bug on XPConnect's monitor? Feng Qian was going to fix it two years ago. We really should not let it stay in the tree another day, as a main-thread XPConnect cost.

/be
Comment 11 Brendan Eich [:brendan] 2008-09-09 12:54:20 PDT
> Any plan to self-host would do want to remote the double caching, one way or

Er, s/do want to remote/want to remove/ of course.

/be
Comment 12 John Resig 2008-09-09 13:13:12 PDT
Created attachment 337720
getElementsByTagName with JS Cache

Tweaked the first patch to not be in an event handler. New results:

          | Normal | w/ Cache |
Non-Trace |  75644 |    83625 |
Traced    |  65479 |    82504 |
Comment 13 John Resig 2008-09-09 13:19:45 PDT
Created attachment 337722
getElementsByTagName with JS Cache

Per bz's request, no longer add the function back as the old property. New numbers - starting to see a tracing win:

          | Normal | w/ Cache |
Non-Trace |  74865 |   392155 |
Traced    |  72882 |   420756 |
Comment 14 John Resig 2008-09-09 13:52:24 PDT
Created attachment 337731
getElementsByTagName with JS Cache (80,000 loop)

Another version of the test - even simpler. Just a basic loop, run 80,000 times, no timer in the loop.

          | Normal | w/ Cache |
Non-Trace |    907 |      127 |
Traced    |    983 |       45 |
Comment 15 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-09 14:49:39 PDT
(In reply to comment #10)
> roc: the other point is that spills hurt. We spill everything, including all
> the SSE registers.

Ow! Is there a deep reason that's needed?
Comment 16 Brendan Eich [:brendan] 2008-09-09 15:14:18 PDT
(In reply to comment #15)
> (In reply to comment #10)
> > roc: the other point is that spills hurt. We spill everything, including all
> > the SSE registers.
> 
> Ow! Is there a deep reason that's needed?

We don't have better knowledge of the built-in's register usage, yet. UCI had some disassembly code to analyze builtins and improve the HotPath VM's spills based on exact reg usage, Graydon was looking at applying it or something like it to TM. Cc'ing him (I forget the bug #).

/be
Comment 17 Graydon Hoare :graydon 2008-09-09 15:24:05 PDT
You want bug 440601, but I have left it in a state of "probably can't win" due to the generally-high register use on the functions I can statically analyze, and the generally-pervasive nature of jumps through PLT, which defeats the static analysis.

A runtime disassembler might work, but it's a whole other kettle of fish. If you'd like to prod me into integrating one (there was one I was looking at earlier, udis86), make note of it in that bug; at the time it sounded like hand-coding LIR for the small builtin functions in question would be a better use of time, but in the case of DOM methods, we'd obviously need a runtime disassembler.
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-09 16:47:23 PDT
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #10)
> > > roc: the other point is that spills hurt. We spill everything, including all
> > > the SSE registers.
> > 
> > Ow! Is there a deep reason that's needed?
> 
> We don't have better knowledge of the built-in's register usage, yet.

I was actually thinking more about spilling only what's live at that point in the trace.
Comment 19 Andreas Gal :gal 2008-09-16 07:59:11 PDT
Johnny and I talked yesterday about how to reduce the wrapping cost for DOM node objects and make them callable from trace code. We want to try to have DOM nodes implement the JS object interface similar to crowder's jit object (455065). As long as the DOM node is unmodified (from the JS side) it can be accessed directly without wrapping through that interface. If JS code modifies it, adds properties or whatnot, we wrap on the fly and mutate the object to be native. This should give a nice speed boost overall, tracing or not.

The second step is to enable JIT code to call methods on these objects. For that we will add flat extern "C" functions that take the object as first argument. Arguments will be type specialized by hand (i.e. String in, Object out), and later maybe automatically based on profile data (what DOM methods get called with what arguments).
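
A sketch of the wrap-on-modification policy only, not of the proposed object layout. It uses a JS Proxy purely as a modeling device; makeNodeHandle and wrappersCreated are hypothetical names, and the plain object passed in stands in for the C++ DOM node.

```javascript
let wrappersCreated = 0;

// `nativeNode` stands in for the C++ DOM node.
function makeNodeHandle(nativeNode) {
  let expandos = null;  // the "wrapper" state, allocated only when needed

  return new Proxy(nativeNode, {
    get(target, prop) {
      // Script-added properties win once the node has been wrapped.
      if (expandos !== null && prop in expandos) return expandos[prop];
      return target[prop];        // unmodified node: read straight through
    },
    set(target, prop, value) {
      if (expandos === null) {    // first modification from script
        expandos = Object.create(null);
        wrappersCreated++;
      }
      expandos[prop] = value;
      return true;
    },
  });
}

const node = makeNodeHandle({ tagName: "DIV" });
const tag = node.tagName;             // direct read, no wrapper created
const wrappedBefore = wrappersCreated;
node.myData = 42;                     // an expando forces the wrap
const wrappedAfter = wrappersCreated;
```

Reads on an untouched node never allocate anything; the cost of wrapping is paid once, at the first write.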
Comment 20 Boris Zbarsky [:bz] 2008-09-16 08:12:05 PDT
In terms of web compat, wouldn't any site that uses one of the JS toolkits pretty much automatically force node wrapping?  Or do the toolkits add to the prototype, not to the node?  John, do you happen to know offhand?
Comment 21 Mike Shaver (:shaver -- probably not reading bugmail closely) 2008-09-16 08:31:11 PDT
A lot of toolkits use expandos on a good proportion of nodes to help work around IE garbage collection problems (storing IDs in expandos rather than references to the objects), as I understand it.  We should measure that on some sites before we decide to spend too much time optimizing for the non-expando case, perhaps.

(We need logic for wrapping the naked DOM-object pointers in side exits and heap stores or wherever we sink the wrapping to, as well.)
Comment 22 Andreas Gal :gal 2008-09-16 08:38:26 PDT
We could add limited support for added properties to the special DOM object interface. If you add fewer than N=4 properties of a certain commonly seen kind then we keep you unwrapped. Benchmarks would be great; we definitely must have them before we can nail down the details.
Comment 23 Andreas Gal :gal 2008-09-16 08:44:06 PDT
The cost for making all DOM nodes look like JSObjects is 2 words per DOM node. Maybe we should just add 2 dummy words and measure the overhead? That seems easy enough to do on the quick.
Comment 24 John Resig 2008-09-16 08:56:51 PDT
(In reply to comment #20)
> In terms of web compat, wouldn't any site that uses one of the JS toolkits
> pretty much automatically force node wrapping?  Or do the toolkits add to the
> prototype, not to the node?  John, do you happen to know offhand?

A lot of libraries add expandos (as Shaver said) for a variety of reasons (event storage, garbage collection, etc. - jQuery does this, for example). Only a couple libraries add to the element prototype (like Prototype and Mootools). All three of these libraries are tested in Dromaeo.
Comment 25 Boris Zbarsky [:bz] 2008-09-16 09:27:04 PDT
I just started a tryserver build with two PRUint32s tossed onto the end of nsINode.  Let's see what tp_RSS looks like with that.
Comment 26 Johnny Stenback (:jst, jst@mozilla.com) 2008-09-16 12:40:25 PDT
John, any guess at how many of these libraries override DOM methods on DOM object prototypes?
Comment 27 Boris Zbarsky [:bz] 2008-09-16 13:38:15 PDT
tp_RSS on Linux is unchanged with those extra two words (within the noise; certainly less than 1% change).  Mac seemingly decided to not build the patch, and the Windows tryserver is perma-orange.  :(
Comment 28 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-16 15:28:50 PDT
(In reply to comment #19)
> The second step is to enable JIT code to call methods on these objects. For
> that we will add flat extern "C" functions that take the object as first
> argument. Arguments will be type specialized by hand (i.e. String in, Object
> out), and later maybe automatically based on profile data (what DOM methods get
> called with what arguments).

Can someone explain why this is needed? The trace would be calling into particular C++ functions whose C++ and ABI signatures are known. Why can't we just generate trace code to do what xptcall and XPConnect already do?
Comment 29 Andreas Gal :gal 2008-09-16 15:35:36 PDT
#28: Calling a C++ method directly via the virtual method table is tricky and platform dependent. Calling into an extern "C" function that forwards the call seems cheap enough and it eliminates the vt headache.
Comment 30 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-16 15:40:32 PDT
We already depend on the vtable layout with xptcall. Adding a bunch of either manually or automatically generated stub functions sounds just as much of a headache to me, and with extra overhead.
Comment 31 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-16 15:45:13 PDT
Note that if the tracer does know about the vtable, then we can guard on the function pointer being correct for the object and jump directly to the function, effectively getting PIC for DOM methods, which would be rather nice.
Comment 32 Andreas Gal :gal 2008-09-16 15:50:03 PDT
We would get that both ways. Each extern "C" function is for a specific node type and a specific method in it, so you can cast the pointer to the concrete type and no vt dispatch will occur. The trace does know the precise target address of the function it's calling, and guards on the shape of the DOM node. I am not terribly opposed to the vt magic, but I predict horrible things to happen if GCC mucks with it, which they sometimes do.
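
The guard-then-direct-call pattern can be modeled in plain JS. This is a sketch under stated assumptions: DivNode, TextNode, and makeCallSite are hypothetical, and an object's prototype stands in for its shape.

```javascript
// Two stand-in node types sharing a method name.
class DivNode { hasChildNodes() { return true; } }
class TextNode { hasChildNodes() { return false; } }

// A monomorphic call site: remembers one "shape" (here, the prototype)
// and the method it resolved to for that shape.
function makeCallSite(methodName) {
  const site = { shape: null, target: null, hits: 0, misses: 0 };
  site.call = function (obj) {
    const shape = Object.getPrototypeOf(obj);
    if (shape === site.shape) {
      site.hits++;               // guard passed: call the known target
    } else {
      site.misses++;             // guard failed: look up again, re-specialize
      site.shape = shape;
      site.target = shape[methodName];
    }
    return site.target.call(obj);
  };
  return site;
}

const sameType = makeCallSite("hasChildNodes");
for (let i = 0; i < 4; i++) sameType.call(new DivNode());

const mixedTypes = makeCallSite("hasChildNodes");
for (const n of [new DivNode(), new TextNode(), new DivNode(), new TextNode()]) {
  mixedTypes.call(n);
}
```

With a single node type the guard fails once and then always passes. With alternating node types it fails on every call, so a site specialized this way only pays off when the receiver type is stable.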
Comment 33 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2008-09-16 16:40:48 PDT
They're not supposed to, the ABI is supposed to be frozen. All the gcc-ABI-breakage that I'm aware of in modern times is around symbol names, which don't bother us.

Indeed, the Win32 x86 xptcinvoke hasn't changed for ABI reasons since it was first checked in in 1999. AFAICT the last such change for xptcinvoke on Linux/x86 was in 2001, adding support for gcc 2.7.x (!). So I wouldn't worry about churn there.
Comment 34 Andreas Gal :gal 2008-09-16 16:55:22 PDT
As Johnny just pointed out, it might actually be a bad idea to specialize/PIC for DOM node types since script might iterate over nodes with varying types. If we don't want to inline and actually perform a vt dispatch, that would be a pretty strong argument in favor of extern "C" wrappers again. I just looked at the XPConnect vt dispatch code and it looks ... scary. Performance-wise I don't think the additional hop into a C function matters. We will need an intermediate hop anyway since we can only call FASTCALL functions (first 2 arguments passed in registers).
Comment 35 Andreas Gal :gal 2008-09-16 16:59:42 PDT
David remarked that Adobe has a patch to add additional calling conventions to nanojit, which might make it possible to call methods directly without a FASTCALL wrapper.
Comment 36 Johnny Stenback (:jst, jst@mozilla.com) 2008-09-16 17:04:48 PDT
But whether we can make the call directly or not, some code somewhere still needs to do a QI to the right interface on the object in question depending on which interface the method being called comes from, as the this pointer might need to be adjusted etc. And doing that in the JS engine doesn't seem right, plus we must optimize away as many QIs as we possibly can, as they're generally speaking pretty slow now, especially on objects that participate in cycle collection.
Comment 37 Peter Van der Beken [:peterv] 2008-09-17 02:53:34 PDT
(In reply to comment #19)
> We want to try to have DOM
> nodes implement the JS object interface similar to crowder's jit object
> (455065).

> This should give a nice speed boost overall, tracing or not.

We would use that JSObject even when not tracing? Also, how do we get to 2 words per DOM object?
Comment 38 Jason Orendorff [:jorendorff] 2008-09-17 16:17:06 PDT
Created attachment 339150
Tracing a DOM method call - proof of concept 1

The patch includes a file at the root, test.html, that contains a tight loop for testing.

Caveats:

- It's a total hack.  Correctness was not a goal.  It only traces one method, nsIDOMElement.getAttribute(string).

- The performance gains are modest here.  We're staying on trace, but the win is only 20% on my machine.  As jst points out there's a lot of plumbing that JITting doesn't address.

- That particular method returns a string if the attribute is found and null if it's not.  In the present horrible hack, this null-but-success path causes us to bail off trace and (I think) call the method again.  Which is terrible if the loop is looking for an attribute that *isn't* there; I reap a nice 500% slowdown or something in that case.
Comment 39 Jason Orendorff [:jorendorff] 2008-09-17 16:19:25 PDT
P.S.

- That patch does not emit a thiscall.  The JITted code calls a special FASTCALL stub function written in C++; the stub QIs and calls the DOM method.
Comment 40 Andreas Gal :gal 2008-09-17 17:15:55 PDT
Jason, any idea where we bleed performance in that test? The loop should be pretty fast, so we must be spending a ton of time in the DOM call itself.
Comment 41 Andreas Gal :gal 2008-09-17 17:17:19 PDT
#37: yes, that object replaces the current wrappers. 2 words is JSObject minus the fslots and dslots fields. That seems to be the minimum subset of JSObject that we need to be structurally compatible.
Comment 42 Peter Van der Beken [:peterv] 2008-09-18 05:18:15 PDT
How much of the time is spent in xpc_qsUnwrapThis and how much of that in the QI? Hopefully getting the nsISupports pointer should be fairly fast. If we can ensure we only get into that function for elements we should be able to cast from nsISupports to nsGenericElement (which all our elements derive from), that would avoid the QI if it's the main cost. For other methods we should be able to cast to nsINode/nsIContent/nsIDocument as needed too.
Comment 43 Jason Orendorff [:jorendorff] 2008-09-19 09:08:29 PDT
100%  JITted code
    1.6%  self
    1.5%  js_EqualStrings
   96.9%  rffn_dom_magic_GetAttribute
        3.5%  self
       57.1%  GetAttribute method
       16.7%  pack result (convert string c++ -> js)
       12.0%  unpack "this" (including QI)
            2.5%  self
            9.5%  QI
        5.0%  string memory management
        4.0%  AddRef/Release/cycle collector
        1.9%  unpack argument (convert string js -> c++)
Comment 44 Andreas Gal :gal 2008-09-19 09:23:32 PDT
57% for the GetAttribute method still sounds like a lot. What happens in there? Isn't that just a data structure lookup? Even if we call GetAttribute directly (which we probably can with some effort), that would "only" give us a 2x speedup if GetAttribute is so heavyweight.
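
The bound can be checked with a little arithmetic. This sketch assumes the indented lines in the comment 43 profile are shares of the stub's time (they sum to roughly 100%); the variable names are illustrative.

```javascript
// Figures from the profile in comment 43.
const stubShareOfTotal = 96.9 / 100;         // rffn_dom_magic_GetAttribute
const getAttributeShareOfStub = 57.1 / 100;  // "GetAttribute method" line

// Assumption: nested percentages are relative to the stub, so the
// method's share of total time is the product of the two.
const getAttributeShareOfTotal = stubShareOfTotal * getAttributeShareOfStub;

// Amdahl-style bound: if all other work vanished, only the time spent
// in GetAttribute itself would remain.
const bestCaseSpeedup = 1 / getAttributeShareOfTotal;
```

Under that reading the method accounts for about 55% of total time, so removing every other cost caps the speedup at roughly 1.8x, in line with the "only 2x" estimate.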
Comment 45 Jason Orendorff [:jorendorff] 2008-09-19 12:17:14 PDT
More than three quarters of the time in GetAttribute is in here:

http://hg.mozilla.org/mozilla-central/file/92a8ef85fbc2/content/html/content/src/nsGenericHTMLElement.cpp#l3461

and the other quarter in nsGenericElement::GetAttr(int, nsIAtom*, nsAString_internal&).

Call it a quarter in AppendUTF16toUTF8, a quarter in ToLowerCase, a quarter in nsGenericElement::GetAttr, an eighth in nsAttrAndChildArray::GetExistingAttrNameFromQName, and the rest in string overhead.  Separate bug, if anyone cares.  I'll trace a different method.
Comment 46 Jason Orendorff [:jorendorff] 2008-09-19 13:28:13 PDT
OK, tracing hasChildNodes instead of getAttribute, I get:

  without JIT: 247msec    <--  2.04x faster
     with JIT: 121msec
Comment 47 Jason Orendorff [:jorendorff] 2008-09-19 14:22:17 PDT
Er, "with JIT" is the faster one, of course.

With the JIT enabled here's the somewhat disappointing profile.

100%  JITted code
   11.7%  self
   88.3%  rffn_dom_magic_HasChildNodes
        5.7%  self
       56.4%  xpc_qsUnwrapThisImpl
           10.4%  self
           46.1%  QueryInterface
       18.0%  Release (just over half in cycle collector)
        8.1%  HasChildNodes method
Comment 48 Boris Zbarsky [:bz] 2008-09-19 15:44:06 PDT
I filed bug 456123 on GetAttribute performance.
Comment 49 Johnny Stenback (:jst, jst@mozilla.com) 2008-09-19 17:18:45 PDT
Created attachment 339539
Avoiding QI - proof of concept 1

This is one way we could eliminate the need for doing a QI for every call. This uses the compiler to calculate 'this' pointer offsets at compile time; at call time we use the offset to adjust the identity pointer, and after that we simply make the call. This hooks into QI to get to the compile-time computed 'this' pointer offsets (and thus violates COM's QI rules, but that's almost common by now, so whatever). This combined with Jason's tracing proof of concept should make things noticeably faster. Conceptually speaking, at least :)
Comment 50 Johnny Stenback (:jst, jst@mozilla.com) 2008-09-19 17:20:09 PDT
Oh, and the attached patch only makes getAttribute() and hasChildNodes() faster on HTMLAnchorElement, HTMLBodyElement, and HTMLDivElement.
Comment 51 Jason Orendorff [:jorendorff] 2008-09-30 13:25:40 PDT
Filed bug 457897 -  Remove QI on 'this' object when calling from JS to C++.
Comment 52 Jason Orendorff [:jorendorff] 2008-10-06 08:40:11 PDT
Filed bug 458735 - Improve internal API for traceable natives.
Comment 53 Jason Orendorff [:jorendorff] 2008-10-06 11:47:07 PDT
Status:

I'm getting close to a minimal patch that will allow TraceMonkey to trace across a very limited subset of DOM methods, for real.  None of the limitations is insurmountable, and I expect we will address them all in follow-up bugs.  But for now, they are:

* Methods only.  No getters or setters are traceable yet.

* Effectful methods that could allocate memory can't be traced.  This includes effectful methods that (a) return a string or object or (b) can fail with an exception.

* Methods that could re-enter the JSAPI can't be traced.  This includes all DOM mutating methods, since they can fire events to JS listeners.

* "void" return type isn't supported yet.

* Only methods on DOM objects that have peterv's ThisPtrOffsets (see bug 457897) are traceable.  (If ThisPtrOffsets aren't present at run time, we just fall off trace.)

* Only methods that already have quick stubs are even considered, as I'm generating the traceable natives in qsgen.py, the same script that generates quick stubs.

* If an argument is not the exact type the method expects, we can't convert the argument on-trace, so we fall off trace.

* Similarly, if null is passed to a DOMString argument, we will fall off trace.  (The same would go for null DOMString return values, but we aren't tracing such methods yet anyway.)

* If 'this' is not a DOM object but an XPConnect wrapper or a plain Object that inherits from a DOM object, we fall off trace.
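
The static part of this list can be read as a checklist. The sketch below is only an illustration: whyNotTraceable and the descriptor fields are hypothetical names, the two example descriptors are guesses about how such methods would be classified, and the run-time conditions (argument types, null strings, the kind of 'this') are not modeled.

```javascript
// Returns the reasons a method would not be traced; an empty list means
// it passes the static checks. Field names are hypothetical.
function whyNotTraceable(m) {
  const reasons = [];
  if (m.kind !== "method") reasons.push("getters and setters are not traceable yet");
  if (m.effectful && m.mayAllocate) reasons.push("effectful method that could allocate memory");
  if (m.mayReenterJSAPI) reasons.push("could re-enter the JSAPI");
  if (m.returnType === "void") reasons.push("void return type is not supported yet");
  if (!m.hasThisPtrOffsets) reasons.push("no ThisPtrOffsets on the DOM object");
  if (!m.hasQuickStub) reasons.push("no quick stub");
  return reasons;
}

// Illustrative descriptors, not taken from qsgen.py.
const readOnlyQuery = {
  kind: "method", effectful: false, mayAllocate: false,
  mayReenterJSAPI: false, returnType: "boolean",
  hasThisPtrOffsets: true, hasQuickStub: true,
};
const mutatingCall = {
  kind: "method", effectful: true, mayAllocate: true,
  mayReenterJSAPI: true, returnType: "object",
  hasThisPtrOffsets: true, hasQuickStub: true,
};

const queryReasons = whyNotTraceable(readOnlyQuery);
const mutationReasons = whyNotTraceable(mutatingCall);
```

Returning the reasons rather than a boolean makes it easy to see which limitation a follow-up bug would need to lift for a given method.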
Comment 54 Jason Orendorff [:jorendorff] 2008-10-16 15:33:01 PDT
Created attachment 343467
Tracing a DOM method call - proof of concept 2

Just a snapshot.  This adds a lot of stuff in qsgen.py that isn't tested yet.
Comment 55 Jason Orendorff [:jorendorff] 2008-11-05 08:55:10 PST
Update:  experience with the static analysis, plus looking at the whitelisted traceable natives in js/src, has convinced me that the way to go is to make all JSFastNatives traceable.  This approach requires fixing bug 462027, which blocks other important bugs too.  So that's what I'm working on.
Comment 56 Ben Turner (not reading bugmail, use the needinfo flag!) 2009-02-25 14:01:54 PST
*** Bug 480183 has been marked as a duplicate of this bug. ***
Comment 57 Ryan VanderMeulen [:RyanVM] 2011-11-22 18:05:41 PST
Obsolete with the removal of tracejit.
