Closed Bug 582766 Opened 11 years ago Closed 11 years ago

JIT vastly slows down Data Manager (let in .nextSibling loop makes TM loop way too often)

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta3+

People

(Reporter: kairo, Assigned: n.nethercote)

References

Details

(Keywords: regression, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files)

I found that selecting domains in Data Manager is really slow with JIT enabled (I estimate 3.4 seconds) when it's almost instantaneous with JIT disabled.

I assume that loops like this one are where the slowdown happens: http://hg.mozilla.org/users/kairo_kairo.at/dataman/file/815d5e4cf563/content/dataman.js#l613 - the hostMatchesSelected function called in that loop calls http://hg.mozilla.org/users/kairo_kairo.at/dataman/file/815d5e4cf563/content/dataman.js#l211 and the URL service call (which jumps into C++ code) is probably throwing us off the trace, which could be the cause of that massive slowdown.

I'll augment the code by doing a cache of host -> domain resolving on the JS side, but I still think it's a bug when the tracing code makes this so much slower compared to the interpreter.
This is current mozilla-central trunk JS code, so TM. If JM helps with this, I'll cheer a lot for sure!
Still, it's a tight loop, so TM should try to jump in in theory, but it happens to be thrown off a lot.
Actually, what I get with TMFLAGS=minimal,abort is this:

Recording starting from chrome://dataman/content/dataman.js:649@242 (FragID=000000)
trace stopped: 3512: non-stub getter
trace stopped: 12975: name() not accessing a valid slot
Abort recording of tree chrome://dataman/content/dataman.js:649@242 at chrome://global/content/bindings/tree.xml:0@8: name.

(this is with code I already modified somewhat, so what that output say is line 649 is what's 613 in comment #0)

What I wonder about is how it comes to calling into tree.xml there...
I optimized that loop so that it doesn't fall off trace in http://hg.mozilla.org/users/kairo_kairo.at/dataman/rev/f42548e3f20b but I still get the same effect, i.e. selecting a domain in Data Manager being quite slow with JIT (about 3 seconds for most domains in my debug build, I saw one pretty random case that only has one cookie that took 180 ms, all others I tried were 1.5 to 3 seconds) and very fast without JIT (4 to 8 milliseconds!) - I don't see any trace aborts in anything that would loop significantly now (with TMFLAGS=minimal,abort) so I really wonder what's up there.

If you want to try experimenting with this, putting the dataman repository into extensions/dataman and building with --enable-extensions=default,dataman gives it to you as pre-installed add-on, and the version the repo was at in comment #0 is version 0.9 on https://addons.mozilla.org/addon/162068
Robert, where are you placing your probes to measure the times you mention in comment 5?  Assume I'm at data manager rev f42548e3f20b.
(In reply to comment #6)
> Robert, where are you placing your probes to measure the times you mention in
> comment 5?  Assume I'm at data manager rev f42548e3f20b.

That rev includes the very basic way I took for measuring, just turn the gDatamanDebug var to true at the very beginning of dataman.js and look into error console when selecting different domains. I just watched the differences between "Domain selected" and "Domain select finished", so that's the start of end of domain_select - by very simple |Date.now()/1000| printouts (sorry, I don't know more involved actual profiling tools too well).
OK, I just tried doing that, and I can reproduce the bug.  Profiles show us spending a lot more time in DOM manipulation, and in fact this loop is responsible (in sort_cookies):

    for (let node = column.parentNode.firstChild; node; node = node.nextSibling) {
      node.removeAttribute("sortActive");
      node.removeAttribute("sortDirection");
    }

Note the let.  If let is replaced with var the bug goes away.

Simple logging indicates that if the loop is run without jit it runs 5 iterations, as expected.  With jit it runs ... a variable number of times much bigger than 5.  My last two tests were 7593 iterations and 28730 iterations.

Nominating for blocking, since we do use let in our chrome and all....
Severity: normal → major
blocking2.0: --- → ?
This could be related to the tbird chrome-jit problem, too.
If I log the actual value of node as I go (or rather push into an array and then join the array on '\n' at the end, so as to stay on trace) and pipe the result through uniq -c I get things like:

    1 [object XULElement @ 0x2067b2d0 (native @ 0x2064a040)]
    1 [object XULElement @ 0x206ca3a0 (native @ 0x2064a0a0)]
    1 [object XULElement @ 0x2067b620 (native @ 0x2064a0e0)]
52458 [object XULElement @ 0x206ca430 (native @ 0x2064a150)]
    1 [object XULElement @ 0x2067b9f0 (native @ 0x2064a190)]

So for some reason we spin for a while there on the fourth loop iteration...

Robert is going to see if he can create a smaller testcase here.
Attached file Simplified test case
Here's a quite simple test case. Open it, click the button and have fun!
BTW, with JIT turned on, I get different counts every time, but things like "Your actual count is 25870809." - with JIT off, it's always 7 as it should be.
I've run the test case with some older builds, Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:2.0b2pre) Gecko/20100714 Minefield/4.0b2pre (rev 5355f245b916) works, Mozilla/5.0 (X11; Linux i686 (x86_64); en-US; rv:2.0b2pre) Gecko/20100715 Minefield/4.0b2pre (rev 5fda39cd70) shows the problem.

Probably was introduced with the merge of tracemonkey into mozilla-central: http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=c26c255bade9
Keywords: regression
Summary: JIT vastly slows down Data Manager → JIT vastly slows down Data Manager (let in .nextSibling loop makes TM loop way too often)
Bruno, are you willing to try some tracemonkey nightlies to pin this down to a smaller range?
TraceMonkey:

July 11: |468e317da3d8| WFM
July 12: |cf557e3fc53d| Fail

So range is: http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=468e317da3d8&tochange=cf557e3fc53d

(which also has a m-c merge in it)
As m-c only regressed later, it can't really be that merge, so http://hg.mozilla.org/tracemonkey/rev/cf557e3fc53d from bug 577764 looks very suspect. njn did that patch, dvander reviewed, I guess they should know most about what's going on with this regression.
hg bisect says:

The first bad revision is:
changeset:   47491:cf557e3fc53d
user:        Nicholas Nethercote <nnethercote@mozilla.com>
date:        Mon Jul 12 11:01:58 2010 +1000
summary:     Bug 577764 - TM: avoid a dead guard at the end of loop fragments.  r=dvander.
Blocks: 577764
That's really weird that bug 577764 would cause this, it was a very simple patch.  I'll take a look on Monday.
Yes, the patch for bug 577764 did cause this.  Here's an even simpler test case that can be run from the JS shell:

    var fourth = { nextSibling: null };
    var third  = { nextSibling: fourth };
    var second = { nextSibling: third };
    var first  = { nextSibling: second };

    function runLoop() {
      let loopcount = 0;
      for (let node = first; node; node = node.nextSibling) {
        loopcount++;
      }
      return loopcount;
    }

    print(runLoop());

Here's the LIR for it without the bogus patch from bug 577764:

      ebx = parami 0 ebx
      esi = parami 1 esi
      edi = parami 2 edi
      state = parami 0 ecx
      label1:
      sp = ldi.o state[8]
      (void *) &JS_THREAD_DATA(cx)->operationCallbackFlag = immi ......
      ldi1 = ldi.o/v (void *) &JS_THREAD_DATA(cx)->operationCallbackFlag/*......
*/[0]
      immi1 = immi 0
      eqi1 = eqi ldi1, immi1/*0*/
      xf1: xf eqi1 -> pc=...... imacpc=(nil) sp+8 rp+0 (GuardID=001)
      ldi2 = ldi.s sp[-8]
      immi2 = immi 1
      addxovi1 = addxovi ldi2, immi2/*1*/ -> pc=...... imacpc=(nil) sp+8 rp+0 (G
uardID=002)
      sti.s sp[-8] = addxovi1
      $stack5 = ldi.s sp[0]     # !!!
      map = ldi.o $stack5[0]
      shape = ldi.o map[0]
      immi3 = immi 192
      guard_kshape = eqi shape, immi3/*192*/
      xf2: xf guard_kshape -> pc=...... imacpc=(nil) sp+8 rp+0 (GuardID=003)
      ldi3 = ldi.o $stack5[36]
      JSVAL_TAG_OBJECT = immi ......
      eqi2 = eqi ldi3, JSVAL_TAG_OBJECT/*......*/
      xf3: xf eqi2 -> pc=...... imacpc=(nil) sp+8 rp+0 (GuardID=004)
      ldi4 = ldi.o $stack5[32]
      ldi5 = ldi.o ldi4[4]
      clasp = immi ......
      guard(class is Function) = eqi ldi5, clasp/*......*/
      xt1: xt guard(class is Function) -> pc=...... imacpc=(nil) sp+8 rp+0 (Guar
dID=005)
      sti.s sp[0] = ldi4        # !!!
      j -> label1
      livei state
      x1: x  -> pc=...... imacpc=(nil) sp+8 rp+0 (GuardID=006)

When the patch is applied, the final 'x1' guard is removed.  As a result,
the StackFilter also removes the final store to sp[0], which I've marked
with "!!!".  So when we loop and load from sp[0] the next time (also marked
with "!!!") the value hasn't been set appropriately.

So that final guard, even though it's dead, is serving a purpose... it's
indicating which stack values are live at the end of the loop.  Wow, nothing like a "trivial" patch that goes subtly wrong to sap one's confidence!
Attached patch patchSplinter Review
This patch reverses the patch from bug 577764 and also adds a comment explaining why the dead guard must be present.
Assignee: general → nnethercote
Status: NEW → ASSIGNED
Attachment #461966 - Flags: review?(dvander)
Comment on attachment 461966 [details] [diff] [review]
patch

Oh, how gross. Sorry for not catching this.
Attachment #461966 - Flags: review?(dvander) → review+
Sorry I wrote the patch in the first place :(

Pushed with the test in comment 19 added to trace-tests (which dvander r+'d on IRC):

http://hg.mozilla.org/tracemonkey/rev/06d0143b4f7e
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/06d0143b4f7e
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
blocking2.0: ? → beta3+
You need to log in before you can comment on or make changes to this bug.