Severe performance degradation 11.1 => 11.3

RESOLVED FIXED

Status

defect
--
blocker
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: tharwood, Assigned: mcgachey)

Tracking

Details

Attachments

(4 attachments)

Attached ABC run time went from 0.6s using 11.1 to 44s using 11.3.
Attachment #625667 - Attachment mime type: text/plain → application/binary
Note: You'll need to compile this with Falcon; I don't think ASC can successfully compile it, and it certainly won't constant-fold the same way Falcon does.
Attachment #625668 - Attachment mime type: application/applefile → text/plain
I just built the Cyril avmshell locally (@HEAD) on windows and can't reproduce the problem.

$ TIMEFORMAT=%R
$ time avmshell ConstantFoldTest.abc
7657 out of 7657 passed
0.449

Where did the avmshell that was checked into Falcon come from?

Actually, it looks the the version supposing to be the release one is actually a debug-debugger, which would account for the slowdown.


trbaker@trbaker-w510 
$ pwd
tools/avm/11.3/windows
$ ./avmshell
You must provide input files, -repl, or -Dselftest, or the executable must be a projector file.
avmplus shell 2.1 sec debug-debugger build 8412:14d50c6a4a64
Well, there may still something here.  It seems the repro steps should be to use the debug debugger build.  The release builds appear comparable.
Debug Debugger built from p4 /Milestones/Cyril
$ TIMEFORMAT=%R;./avmshell |grep "avmplus shell";time ./avmshell ConstantFoldTest.abc
avmplus shell 2.1 sec debug-debugger build cyclone
7657 out of 7657 passed
19.150

Debug Debugger build from Anza
$ TIMEFORMAT=%R;./avmshell |grep "avmplus shell";time ./avmshell_sd ConstantFoldTest.abc
avmplus shell 2.0 sec release build Serrano_Anza CL #956398
7657 out of 7657 passed
6.206

Cyril is ~3x slower than Anza with this testcase
Sorry, my grep on the shell for the version string was off on the Anza command above.  the results are valid though.  Here is the correct call for grep with roughly the same result:

$ TIMEFORMAT=%R;./avmshell_sd |grep "avmplus shell";time ./avmshell_sd ConstantFoldTest.abc 
avmplus shell 2.0 sec debug-debugger build Serrano_Anza CL #956398
7657 out of 7657 passed
6.457
I built the Serrano_Anza shell locally and reproduced the 6s time.
The Milestones/Brannan shell takes 18s, so something between Anza and Brannan blew this out.  

I tried bisecting in hg but it always takes >16s there for some reason.
How do Release-Debugger builds compare?

I agree we should look into this, but performance issues on Debug builds should not be a high priority.
(In reply to Felix S Klock II from comment #7)
> How do Release-Debugger builds compare?

(I asked this question under the assumption that when you said "release builds" in comment 3, you were referring to "release non-debugger builds", and thus the switch to Debug-Debugger builds is changing two different parameters, and so it behooves us to isolate whether Debug-alone regresses or Debugger-alone regresses or if both knobs need to get turned to replicate the regression)
release-debugger anza vs brannan are approx equivalent.  The large diff is with debug.

This matters somewhat to Falcon as they use debug builds to test with verbose exceptions, and a 600% slowdown adds up.
comment #9 yikes, bad math there.  200% increase in exec time, at least in this one testcase.
The debug builds include an assertion in the NanoJIT memory management code that scans the pages in a memory chunk to verify permissions. This only happens on debug builds, and removing this code brings the benchmark time from ~20s to ~5s on my machine.

The relevant code was added to fix Bug 602264, and is located in nanojit/CodeAlloc.cpp: method CodeAlloc::checkChunkMark, called by CodeAlloc::sanity_check.

Given that these checks were added to solve a specific issue, and since the slowdown occurs only in debug mode, the correctness benefit may be worth the performance impact. Are other benchmarks showing similar slowdowns, or is there something particular to the constant folding test that's triggering this issue more?
For whatever it's worth, I don't think that assertion code really carries it's weight.  The original bug was simply that someone chose the wrong flags for the mapping and assertion checking can't really detect that.  I'd say disable it completely or run it in a more limited way, like checking the mapping for the CodeList at the end of a compile.
We have a precedent for disabling an expensive debug-only check with a command-line option, see -Dnofixedcheck.  In this case, we might want to invert the sense as an opt-in (like -Dselftest).
Looking more at the assertion code, it's not actually checking each page in the chunk; it iterates over each page, but doesn't use the loop invariant when calculating the address for the check. As a result, the code as written is just redundant.

There are three approaches that seem to make sense:
  1: Delete the check entirely;
  2: Reduce the scope of the assertion to make only single check (which is what the code does now);
  3: Fix the code to check each page as intended, and guard it with an opt-in -Dcheckjitpageflags option.
From my (very rough) performance tests, options 1 and 3 should bring the test run time down from 30s to 6.5s, while option 2 will bring it to 8.5s. 

In the absence of any objections, I'm going to go with option 3.
Assignee: nobody → mcgachey
Status: NEW → ASSIGNED
Comment on attachment 645021 [details] [diff] [review]
Patch to make code allocation page permissions testing optional

Review of attachment 645021 [details] [diff] [review]:
-----------------------------------------------------------------

R+
Looks good, but indentation is messed up in a few places.  We do not use tabs in avmplus (spaces only), which helps to avoid such situations.  Please fix before submitting, but looks good for substantive issues.  I'm asking Brent to take a look at this, however, as I'm not sure of the implications of making the check opt-in.  We need to make sure that the opt-in is done where needed to get the test coverage expected.

::: core/LirHelper.cpp
@@ +536,4 @@
>              NanoAssert(b);
>  #elif defined(__MACH30__) && !defined(VMCFG_SHARK)
>              /* mach / osx */
> +            vm_address_t vmaddr = (vm_address_t)n;

Good catch!
Attachment #645021 - Flags: review?(brbaker)
Fixed in CL 1091218
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Adding config parameters to a call to the CodeAlloc constructor.
Halfmoon build fixed in CL 1092100.
changeset: 7503:41819f6c28f3
user:      Philip McGachey <mcgachey@adobe.com>
summary:   Bug 757076: Make page protection check optional (r=wmaddox+)

http://hg.mozilla.org/tamarin-redux/rev/41819f6c28f3
changeset: 7505:745724d20464
user:      Philip McGachey <mcgachey@adobe.com>
summary:   Bug 757076: Update CodeAlloc constructor call in Halfmoon code. (r=mcgachey)

http://hg.mozilla.org/tamarin-redux/rev/745724d20464
Attachment #645021 - Flags: review?(brbaker)
You need to log in before you can comment on or make changes to this bug.