Closed Bug 517909 Opened 15 years ago Closed 13 years ago

Perf regression August 25th

Categories

(Core :: JavaScript Engine, defect, P2)

x86
macOS
defect

Tracking

RESOLVED WONTFIX

People

(Reporter: sayrer, Assigned: gal)

References

Details

Attachments

(9 files, 1 obsolete file)

I think this is probably tracing getters. Details coming.
Attached image tsspider windows xp
Attached image txul windows xp
Attached image txul ubuntu 7.10
Assigning to jorendorff on the assumption that it was tracing getters that caused this.
Assignee: general → jorendorff
Flags: blocking1.9.2+
Priority: -- → P1
Any progress here?
Attached patch txul += shark
Here's the patch to txul I've been using to get profiles.

The profiles have not been helpful to me, but if anyone else wants to take a crack, please do.
Update?
OK, here's the state of things as I see it.

I can maybe reproduce a 2% Txul regression on Mac. Hard to tell; the numbers are noisy.  I looked at the trace logs, and it seemed like most of the newly-acquired branch exits were in the "currentset" setter in toolkit/content/widgets/toolbar.xml

Measuring just this function locally using the patch below, it seems to have gotten about 1ms slower with the getter landing (going from 5ms to 6ms usually on 1/3 of its invocations; going from 0-1ms to 0-2ms on the other 2/3).  My baseline is about 110ms, so this accounts for at least half the regression I observe.
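The local measurement patch bz used isn't quoted inline here; a minimal sketch of that kind of instrumentation (hypothetical helper name, assuming the XBL-generated setter can simply be wrapped and timed with Date, whose millisecond resolution is why the per-call numbers above are so coarse) could look like this:

// Hypothetical sketch, not the actual measurement patch: wrap a property
// setter and report how long each invocation takes.
function timeSetter(obj, prop) {
  var desc = Object.getOwnPropertyDescriptor(obj, prop);
  var originalSet = desc.set;
  desc.set = function (value) {
    var start = Date.now();
    try {
      originalSet.call(this, value);
    } finally {
      dump(prop + " setter took " + (Date.now() - start) + "ms\n");
    }
  };
  Object.defineProperty(obj, prop, desc);
}
// e.g. timeSetter(toolbarProto, "currentset"), where toolbarProto is whatever
// object actually carries the XBL-generated "currentset" setter.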
OK, so I created some time profiles and malloc traces of that setter in shark, before and after the getter patch.  The time profiles do in fact show us taking more time in that setter (by about 1ms or so on my machine); the difference is entirely accounted for by time spent under TraceRecorder::monitorRecording.  About half of this is under closeLoop; the rest is scattered over various TraceRecorder stuff.

I sent Andreas a copy of the TMFLAGS=full,abort,stats log for Txul after the getter change.  I'm hoping he'll see something sane.  There are about 100 more branch exits than there used to be before the getter patch; fewer aborts and less blacklisting.  So it looks like the issue is that we try to trace more code (in particular this setter), and it just doesn't want to trace very well.  For one thing, it's reasonably branchy in the loops, looks like.

The malloc log shows about 80KB more malloc allocations coming from nanojit::Allocator::allocChunk; about a third of that is coming from nanojit::LirBufWriter::insSkip, another 25% from nanojit::LInsHashSet::grow, then insStorei, insLoad, ins2, trackCfgMerges, record_JSOP_CALLPROP, monitorRecording, TraceRecorder::guard, getAnchor.

Time spent under vm_fault went from 0.7% of time in this setter to 5%; that's about 1/8-1/4 of the regression I'm seeing right there....
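For a sense of what "reasonably branchy in the loops" means for the tracer: each differently taken branch inside a hot loop body becomes a side exit from the recorded trace, and every newly seen path has to be recorded and compiled separately before it can run natively. A purely hypothetical illustration of that shape (not the actual toolbar.xml code):

// Hypothetical illustration, not the real toolbar code: a loop whose body
// branches differently depending on the data produces many branch exits, and
// each newly observed path through the body costs another round of recording
// and compilation.
function rebuildToolbar(ids, palette) {   // palette: a Set of known item ids
  var placed = [];
  for (var i = 0; i < ids.length; i++) {
    var id = ids[i];
    if (id == "separator") {
      placed.push({ type: "separator" });
    } else if (id == "spring" || id == "spacer") {
      placed.push({ type: "space", flexible: id == "spring" });
    } else if (palette.has(id)) {
      placed.push({ type: "item", id: id });
    }
    // unknown ids are skipped -- yet another path through the loop body
  }
  return placed;
}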
Attached patch patch (obsolete)
Attached patch patch
Attachment #405586 - Attachment is obsolete: true
Attachment #405587 - Flags: review?(graydon)
Attachment #405587 - Flags: review?(graydon) → review+
The attached patch should make the tempAlloc go to the malloc heap less often. That might help with vm_faults. Need confirmation from bz to proceed.
Attached patch patch
And this time without overflowing the allocated blob size.
Assignee: jorendorff → gal
Wow, nice effort bz+sayrer+gal+dvander!
Depends on: 521880
Depends on: 521881
Arena pools give back memory promptly to the malloc heap -- that's not evil. Your point must be less moralistic: stack or other "inline" allocation (jsvector can do this) beats heap (whether chunked by arena code or not). That's true!

/be
Unblocking the beta on this.  P2.
Priority: P1 → P2
[10:04am] bz: here are the things we do know
[10:04am] bz: 1)  We checked in a patch
[10:04am] bz: 2)  That patch causes a small but measurable (1-3%, depending on who's measuring) regression in Txul
[10:04am] bz: 3)  That patch causes a small but measurable win (10% or so) in microbenchmarks that specifically exercise that code.
[10:05am] bz: 4)  That patch has the potential to lead to significant wins in bigger programs that have a lot of JS computation and some DOM access in an inner loop
[10:05am] bz: 5)  I can't give you an example of such a program off the top of my head, though I could certainly write one
[10:05am] damons: 6)  Something else regressed Tdhtml on Oct 4.
[10:05am] bz: yes
[10:06am] bz: 7) At least half the measured Txul regression can be attributed to a regression in the runtime of a single function
[10:07am] bz: I say "at least" because it's hard to tell for sure given the low resolution of date objects in JS
[10:08am] bz: 8)  We have a full TM log of a Txul run (attached in the bug)
[10:08am] bz: 9)  Someone who understands such logs might want to examine the parts of it dealing with the above function.
[10:08am] damons: So, to boil this down a bit, are we willing to take the 1-3% txul regression for a 10% microbenchmark gain with potential for big payoff--in the context of delaying 3.6 beta to look for something we may never find?
[10:08am] bz: #9 is my opinion, not fact.
[10:09am] bz: I think the right course of action is probably to ship the beta
[10:09am] damons: with the regression?
[10:09am] bz: but continue looking into the txul hit here
[10:09am] bz: yes
[10:09am] damons: sayrer: ?
[10:09am] sayrer: yeah, I think so too
[10:10am] bz: The other obvious options are to not ship tracing getters at all in 1.9.2
[10:10am] damons: alright.. I'll unblock on this.
[10:10am] bz: or to hold the beta on the off chance that we need major surgery to fix the txul hit
[10:10am] bz: and that we'll block final on the txul hit
[10:10am] jorendorff: Are we going to block final on the txul hit?
[10:10am] bz: that is another excellent question
So looks like we made up 1% of the 3%. The patch is on m-c.

I did look through the TM log bz attached. There is nothing obvious that goes wrong. We compile a bunch of stuff, and run it. Most loops don't run for a long time (low iteration count), so we end up not winning any time on compiling.

The patch mentioned above makes compilation faster in general. That helped us make up some time. We will have to land more patches like that to make up the regression, but this isn't really fixing anything; we are just making something else better to hide it again.

One of the problems seems to be that since this is code opening a new window, its global is different every time (the window). So we re-record at every window open. That's a bit unfortunate. I will discuss that with Brendan, but we did make a choice to specialize for global objects, so I am not sure we want to change that.
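To put the "low iteration count" point in concrete (made-up) numbers: tracing only pays off once the per-iteration savings have covered the one-time recording and compilation cost, and the loops on this path don't run nearly long enough for that. A rough sketch with hypothetical figures:

// Back-of-the-envelope break-even calculation with made-up numbers; the real
// costs depend on the loop, but the shape of the tradeoff is the same.
var compileCostUs       = 500;   // hypothetical one-time record + compile cost
var interpCostPerIterUs = 2;     // hypothetical interpreter cost per iteration
var tracedCostPerIterUs = 0.4;   // hypothetical native-trace cost per iteration

var breakEvenIters = compileCostUs / (interpCostPerIterUs - tracedCostPerIterUs);
print("tracing pays off after ~" + Math.ceil(breakEvenIters) + " iterations"); // ~313 here

// A toolbar currentset has maybe a dozen entries, so a loop over it never
// comes close -- and with per-global specialization the work is redone for
// every newly opened window on top of that.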
(In reply to comment #20)
> 
> The patch mentioned above makes compilation faster in general. That helped
> us make up some time. We will have to land more patches like that to make up
> the regression, but this isn't really fixing anything; we are just making
> something else better to hide it again.

This tradeoff seems inherent in the approach we're taking, and for compilation in general when it comes to web content (it's easier to amortize up front compilation costs for repeatedly executed chrome). I don't think it's hiding anything to make compilation or recording cheaper, though we might make bigger strides on these benchmarks with your suggested investigations.
Do we need to hold the 3.6 release for this?  My sense, given 3.6's performance relative to 3.5 and the scale of this regression, is that we do not.

Who dares oppose?!?!

I mean: comments?
We did win back about 30-50% of this, so we are not looking at the full hit any more.
(In reply to comment #22)
> Do we need to hold the 3.6 release for this?  My sense, given 3.6's performance
> relative to 3.5 and the scale of this regression, is that we do not.
Well, in the past we've said no performance regressions unless we have a really really really good reason (like security).  We've also said that we don't take a regression because it gets us a win in one or more other places unless we have a good reason again.  I've had to back out many a sqlite upgrade because of the second reason.  Perhaps the justification has been mentioned in the various comments in this bug, but a summary of why we should take the regression should be included here.  Also, I don't think this bug is the right place for this conversation.  dev.tree-management and/or dev.platform would get a wider audience (unless this is strictly a driver decision).
Blocking is a driver decision, yes.  If you're arguing that we should back it out, I think that the time for that has passed, but it would be a thread for d.t-m.

The second paragraph of comment 9 describes the situation, though: as we trace more code, we sometimes cause existing slow-compilation cases to be attempted unprofitably.  I don't think this is avoidable unless we believe that we can go from the current state to "trace *everything* profitably" in one step; I very much think that's unrealistic.

The performance discontinuities we see here (and elsewhere when a single odd part can disrupt the performance of the rest of the loop which might otherwise be well-traced) seem to be inherent to tracing, so we're going to have to just push through them as we complete our coverage.
The root cause seems to be that we specialize per global object to shortcut security checks. This code runs on every new window open, so we specialize and recompile anew for that new global object (window). Undoing that basic decision would make a whole bunch of other cases slower. I agree with a very strict "only better" policy, but in this particular case that would mean never compiling code that looks like the code in the window-opening path, which is not realistic. Also, please note that the JS chrome code in question is supremely unpretty and probably ready for a rewrite, which might help make this faster, too.
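As a conceptual model only (this is not the engine's actual data structure), the specialization described above behaves roughly like a trace cache keyed on both the loop site and the global object, so chrome code that always runs against a brand-new global is a guaranteed cache miss:

// Conceptual model only, not SpiderMonkey code: traces are effectively keyed
// by (loop site, global object), so the same code run against a freshly
// created global -- a newly opened window -- never finds an existing trace
// and has to be recorded and compiled all over again.
var traceCache = new Map();   // loop site -> WeakMap(global -> compiled trace)

function recordAndCompile(loopSite, globalObj) {
  // stand-in for the expensive record + compile step
  return { site: loopSite, forGlobal: globalObj };
}

function lookupTrace(loopSite, globalObj) {
  var perGlobal = traceCache.get(loopSite);
  if (!perGlobal) {
    perGlobal = new WeakMap();
    traceCache.set(loopSite, perGlobal);
  }
  var trace = perGlobal.get(globalObj);
  if (!trace) {
    trace = recordAndCompile(loopSite, globalObj);  // every new window pays this
    perGlobal.set(globalObj, trace);
  }
  return trace;
}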
Unblocking, per previous comments.
Flags: blocking1.9.2+ → blocking1.9.2-
These bugs are all part of a search I made for js bugs that are getting lost in transit:

http://tinyurl.com/jsDeadEndBugs

They all have a review+'ed, non-obsoleted patch and are not marked fixed-in-tracemonkey or checkin-needed but have not seen any activity in 300 days. Some of these got lost simply because the assignee/patch provider never requested a checkin, or just because they were forgotten about.
Obsolete with the removal of tracejit.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX