Closed Bug 517909 (Opened 15 years ago, Closed 13 years ago)
Perf regression August 25th
Categories: Core :: JavaScript Engine, defect, P2
Status: RESOLVED WONTFIX
People: Reporter: sayrer, Assigned: gal
Attachments (9 files, 1 obsolete file)
  137.23 KB, image/png
  153.87 KB, image/png
  141.16 KB, image/png
  2.31 KB, patch
  3.00 KB, patch
  1.18 MB, application/x-bzip2
  6.52 KB, patch (graydon: review+)
  6.52 KB, patch
  4.57 KB, patch
I think this is probably tracing getters. Details coming.
Comment 1 • Reporter • 15 years ago
Comment 2 • Reporter • 15 years ago
Comment 3 • Reporter • 15 years ago
Comment 4 • Reporter • 15 years ago
Assigning to jorendorff on the assumption that it was tracing getters that caused this.
Assignee: general → jorendorff
Updated • Reporter • 15 years ago
Flags: blocking1.9.2+
Priority: -- → P1
Comment 5 • 15 years ago
Any progress here?
Comment 6 • 15 years ago
Here's the patch to txul I've been using to get profiles.
The profiles have not been helpful to me, but if anyone else wants to take a crack, please do.
Comment 7 • 15 years ago
Update?
Comment 8 • 15 years ago
OK, here's the state of things as I see it.
I can maybe reproduce a 2% Txul regression on Mac. It's hard to tell; the numbers are noisy. I looked at the trace logs, and it seemed like most of the newly-acquired branch exits were in the "currentset" setter in toolkit/content/widgets/toolbar.xml.
Measuring just this function locally using the patch below, it seems to have gotten about 1ms slower with the getter landing (usually going from 5ms to 6ms on 1/3 of its invocations, and from 0-1ms to 0-2ms on the other 2/3). My baseline is about 110ms, so this accounts for at least half the regression I observe.
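For context, a minimal sketch of this kind of Date-based instrumentation (a hypothetical wrapper, not the patch attached to this bug; the helper name and use of dump() for chrome logging are assumptions):

  // Hypothetical sketch: wrap a scripted setter and log how long each call
  // takes. Assumes propName is an accessor property defined directly on obj.
  // Date.now() only has millisecond resolution, which is why the numbers
  // above are so coarse.
  function instrumentSetter(obj, propName) {
    var desc = Object.getOwnPropertyDescriptor(obj, propName);
    var originalSet = desc.set;
    Object.defineProperty(obj, propName, {
      configurable: true,
      get: desc.get,
      set: function (value) {
        var start = Date.now();
        originalSet.call(this, value);
        dump(propName + " setter took " + (Date.now() - start) + " ms\n");
      }
    });
  }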
Comment 9 • 15 years ago
OK, so I created some time profiles and malloc traces of that setter in Shark, before and after the getter patch. The time profiles do in fact show us taking more time in that setter (by about 1ms or so on my machine); the difference is entirely accounted for by time spent under TraceRecorder::monitorRecording. About half of this is under closeLoop; the rest is scattered over various TraceRecorder stuff.
I sent Andreas a copy of the TMFLAGS=full,abort,stats log for Txul after the getter change. I'm hoping he'll see something sane. There are about 100 more branch exits than there used to be before the getter patch, with fewer aborts and less blacklisting. So it looks like the issue is that we try to trace more code (in particular this setter), and it just doesn't trace very well. For one thing, it looks reasonably branchy inside the loops.
The malloc log shows about 80KB more malloc allocations coming from nanojit::Allocator::allocChunk; about a third of that is coming from nanojit::LirBufWriter::insSkip, another 25% from nanojit::LInsHashSet::grow, then insStorei, insLoad, ins2, trackCfgMerges, record_JSOP_CALLPROP, monitorRecording, TraceRecorder::guard, getAnchor.
Time spent under vm_fault went from 0.7% of time in this setter to 5%; that's about 1/8-1/4 of the regression I'm seeing right there....
Comment 10 • 15 years ago
Comment 11 • Assignee • 15 years ago
Comment 12 • Assignee • 15 years ago
Attachment #405586 - Attachment is obsolete: true
Updated • Assignee • 15 years ago
Attachment #405587 - Flags: review?(graydon)
Updated • 15 years ago
Attachment #405587 - Flags: review?(graydon) → review+
Comment 13 • Assignee • 15 years ago
The attached patch should make the tempAlloc go to the malloc heap less often. That might help with vm_faults. Need confirmation from bz to proceed.
Comment 14 • Assignee • 15 years ago
And this time without overflowing the allocated blob size.
Comment 15 • Assignee • 15 years ago
Assignee: jorendorff → gal
Comment 16 • 15 years ago
Wow, nice effort bz+sayrer+gal+dvander!
Comment 17 • 15 years ago
Arena pools give back memory promptly to the malloc heap -- that's not evil. Your point must be less moralistic: stack or other "inline" allocation (jsvector can do this) beats heap (whether chunked by arena code or not). That's true!
/be
Comment 19 • 15 years ago
[10:04am] bz: here are the things we do know
[10:04am] bz: 1) We checked in a patch
[10:04am] bz: 2) That patch causes a small but measurable (1-3%, depending on who's measuring) regression in Txul
[10:04am] bz: 3) That patch causes a small but measurable win (10% or so) in microbenchmarks that specifically exercise that code.
[10:05am] bz: 4) That patch has the potential to lead to significant wins in bigger programs that have a lot of JS computation and some DOM access in an inner loop
[10:05am] bz: 5) I can't give you an example of such a program off the top of my head, though I could certainly write one
[10:05am] damons: 6) Something else regressed Tdhtml on Oct 4.
[10:05am] bz: yes
[10:06am] bz: 7) At least half the measured Txul regression can be attributed to a regression in the runtime of a single function
[10:07am] bz: I say "at least" because it's hard to tell for sure given the low resolution of date objects in JS
[10:08am] bz: 8) We have a full TM log of a Txul run (attached in the bug)
[10:08am] bz: 9) Someone who understands such logs might want to examine the parts of it dealing with the above function.
[10:08am] damons: So, to boil this down a bit, Are we willing to take the 1-3% txul regression for a 10% microbenchmark gain with potential for big payoff--in the context of delaying 3.6 beta to look for something we may never find?
[10:08am] bz: #9 is my opinion, not fact.
[10:09am] bz: I think the right course of action is probably to ship the beta
[10:09am] damons: with the regression?
[10:09am] bz: but continue looking into the txul hit here
[10:09am] bz: yes
[10:09am] damons: sayrer: ?
[10:09am] sayrer: yeah, I think so too
[10:10am] bz: The other obvious options are to not ship tracing getters at all in 1.9.2
[10:10am] damons: alright.. I'll unblock on this.
[10:10am] bz: or to hold the beta on the off chance that we need major surgery to fix the txul hit
[10:10am] bz: and that we'll block final on the txul hit
[10:10am] jorendorff: Are we going to block final on the txul hit?
[10:10am] bz: that is another excellent question
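As a rough illustration of bz's points 4 and 5 above, here is a contrived, hypothetical sketch of the kind of program he describes (heavy JS computation plus some DOM access in an inner loop); none of this code is from the bug, and the function name is made up:

  // Hypothetical example only: mostly plain JS arithmetic, with an occasional
  // DOM getter access inside the hot inner loop. If that access can be traced,
  // the whole loop can stay compiled instead of leaving trace on every
  // iteration.
  function sumWeightedLengths(elements) {
    var total = 0;
    for (var i = 0; i < elements.length; i++) {
      var el = elements[i];
      for (var j = 1; j <= 1000; j++) {
        // DOM getter mixed into otherwise pure JS computation.
        total += el.childNodes.length * Math.sin(j) / j;
      }
    }
    return total;
  }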
Comment 20 • Assignee • 15 years ago
So it looks like we made up 1% of the 3%. The patch is on m-c.
I did look through the TM log bz attached. There is nothing obvious that goes wrong. We compile a bunch of stuff and run it. Most loops don't run for long (low iteration count), so we end up not winning back the time we spend compiling.
The patch mentioned above makes compilation faster in general. That helped us make up some time. We will have to land more patches like that to make up the regression, but this isn't really fixing anything; we are just making something else faster to hide the regression again.
One of the problems seems to be that since this is code that runs when opening a new window, its global is different every time (the new window). So we re-record at every window open. That's a bit unfortunate. I will discuss it with Brendan, but we did make a deliberate choice to specialize on global objects, so I am not sure we want to change that.
Comment 21 • Reporter • 15 years ago
(In reply to comment #20)
>
> The patch mentioned above makes compilation faster in general. That helped us
> make up some time. We will have to land more patches like that to make up the
> regression, but this isn't really fixing anything; we are just making
> something else faster to hide the regression again.
This tradeoff seems inherent in the approach we're taking, and in compilation in general when it comes to web content (it's easier to amortize up-front compilation costs for repeatedly executed chrome). I don't think it's hiding anything to make compilation or recording cheaper, though we might make bigger strides on these benchmarks with your suggested investigations.
Comment 22 • 15 years ago
Do we need to hold the 3.6 release for this? My sense, given 3.6's performance relative to 3.5 and the scale of this regression, is that we do not.
Who dares oppose?!?!
I mean: comments?
Comment 23 • Assignee • 15 years ago
We did win back about 30-50% of this, so we are not looking at the full hit any more.
Comment 24 • 15 years ago
(In reply to comment #22)
> Do we need to hold the 3.6 release for this? My sense, given 3.6's performance
> relative to 3.5 and the scale of this regression, is that we do not.
Well, in the past we've said no performance regressions unless we have a really, really good reason (like security). We've also said that we don't take a regression just because it gets us a win in one or more other places, again unless we have a good reason. I've had to back out many sqlite upgrades because of the second reason. Perhaps the justification has been mentioned in the various comments in this bug, but a summary of why we should take the regression should be included. Also, I don't think this bug is the right place for this conversation. dev.tree-management and/or dev.platform would get a wider audience (unless this is strictly a driver decision).
Comment 25 • 15 years ago
Blocking is a driver decision, yes. If you're arguing that we should back it out, I think the time for that has passed, but that would be a thread for d.t-m.
The second paragraph of comment 9 describes the situation, though: as we trace more code, we sometimes cause existing slow-compilation cases to be attempted unprofitably. I don't think this is avoidable unless we believe that we can go from the current state to "trace *everything* profitably" in one step; I very much think that's unrealistic.
The performance discontinuities we see here (and elsewhere, when a single odd construct disrupts the performance of the rest of a loop that might otherwise trace well) seem to be inherent to tracing, so we're going to have to push through them as we complete our coverage.
Comment 26 • Assignee • 15 years ago
The root cause seems to be that we specialize per global object in order to shortcut security checks. This code runs every time a new window opens, so we specialize and recompile anew for that new global object (the window). Undoing that basic decision would make a whole bunch of other cases slower. I agree with a very strict "only better" policy, but in this particular case that would mean never compiling code that looks like the code in the window-opening path, which is not realistic. Also, please note that the JS chrome code in question is supremely unpretty and probably ready for a rewrite, which might help make this faster, too.
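As a rough conceptual model of the per-global specialization described here (this is not TraceMonkey's actual data structure, just a hypothetical sketch of why a fresh global defeats the compiled-trace cache):

  // Hypothetical model only: compiled traces are looked up by both the loop's
  // location and the global object they were recorded against. A newly opened
  // window has a brand-new global, so the lookup misses and the same chrome
  // code gets re-recorded and re-compiled for every window.
  var traceCache = new Map();   // loopId -> (global -> compiled trace)

  function lookupTrace(loopId, globalObj) {
    var perGlobal = traceCache.get(loopId);
    return perGlobal ? perGlobal.get(globalObj) : undefined;
  }

  function storeTrace(loopId, globalObj, compiledTrace) {
    var perGlobal = traceCache.get(loopId);
    if (!perGlobal) {
      perGlobal = new WeakMap();  // one entry per global the trace specializes on
      traceCache.set(loopId, perGlobal);
    }
    perGlobal.set(globalObj, compiledTrace);
  }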
Comment 28 • 14 years ago
These bugs are all part of a search I made for js bugs that are getting lost in transit:
http://tinyurl.com/jsDeadEndBugs
They all have a review+'ed, non-obsoleted patch and are not marked fixed-in-tracemonkey or checkin-needed but have not seen any activity in 300 days. Some of these got lost simply because the assignee/patch provider never requested a checkin, or just because they were forgotten about.
Comment 29 • 13 years ago
Obsolete with the removal of tracejit.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX