Closed Bug 1321146 Opened 8 years ago Closed 8 years ago

Baldr: PageProtectingVector still adds a bunch of overhead for wasm, especially the baseline

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

Tracking

RESOLVED FIXED
Tracking Status
firefox53 --- affected

People

(Reporter: luke, Unassigned)

References

Details

Even after bug 1317033, testing on a recent Unity build (http://files.unity3d.com/jonas/benchmarks/5.5-wasm-release/Release/5.5-wasm-rel.wasmgz) (timing `new WebAssembly.Module(code)`), with --wasm-always-baseline, I see a 14-50% slowdown (14% for --no-threads, gradually increasing up to 50% for --thread-count=4) from turning on the page protection in PageProtectingVector. I also see a 0-10% slowdown (ranging from 0 to 10 threads) for wasm-Ion compilation.

One major source, for example, is MacroAssemblerX64::finish, which does a bunch of masm.linkJump()s in a loop, but there seems to be a lot more. I know there is some work in progress to improve this, so I'm interested to see how that helps. But if we're not able to get the overhead down to a negligible level, I think we should just turn it off for wasm.
Yeah, in another bug I mentioned we should probably enable this on Nightly + Aurora only for now.
Flags: needinfo?(emanuel.hoogeveen)
(In reply to Jan de Mooij [:jandem] from comment #1)
> Yeah, in another bug I mentioned we should probably enable this on Nightly +
> Aurora only for now.

Sorry about the delay - I was very busy until today. I'll file a bug about this.

> One major source, for example, is MacroAssemblerX64::finish, which does a
> bunch of masm.linkJump()s in a loop, but there seems to be a lot more.

That sounds like it might benefit from bug 1306972, which I need to rebase. That will also involve adding back a way to disable the protection completely, which we could then use if needed.

Out of curiosity, how many calls to unprotectRegion/reprotectRegion are you seeing in this benchmark? Just to get an idea of how much individual calls slow it down.
Flags: needinfo?(emanuel.hoogeveen)
Oh, and I'm open to disabling this across the board for now. It still hasn't caught any crashes [1] even though I'm pretty sure the annotations work now. I have some ideas on why that might be, and we might need the protection again later to follow up on that (and it would be nice if it was as fast as possible), but right now it doesn't seem to be doing any good :(

[1] https://crash-stats.mozilla.com/search/?moz_crash_reason=~Tried%20to%20access%20a%20protected%20region%21&build_id=%3E%3D20161110030211&product=Firefox&version=52.0a1&version=53.0a1&version=52.0a2&date=%3E%3D2016-11-10T00%3A00%3A00.000Z&date=%3C2017-11-09T00%3A00%3A00.000Z
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)

Thanks for all the help!

> Out of curiosity, how many calls to unprotectRegion/reprotectRegion are you
> seeing in this benchmark? Just to get an idea of how much individual calls
> slow it down.

With --no-threads, I see 35,402 individual protection calls for the Ion backend and 125,272 calls for the baseline.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #3)
> Oh, and I'm open to disabling this across the board for now.

This may be a good idea if there is more widespread slowdown: partners, media, and devs often look to Nightly/DevEd to get the most cutting-edge perf numbers, so this has bitten us before (IIRC, with some GC bug-catching measures) when those channels perform worse than release.
Priority: -- → P1
With bug 1322445 fixed, I now only see ~9k mprotect syscalls in baseline (down from 125k) and no significant perf difference from commenting out the gc:: calls in PageProtectingVector.h. Combined with making this all nightly/deved-only in bug 1321286, seems like this problem is fully addressed. Thanks!
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Oops, I spoke a little too soon: with the patch in bug 1320374, which improves baseline significantly, I can see an ~9% slowdown (from 340ms to 370ms) from all the mprotects. This is probably fine since it's nightly/deved-only. But once we find the mystery bug, it would be kind of nice to eliminate this on nightly/deved too, since this is often where people measure.
Okay, that makes sense. Yeah, we can switch back to regular mozilla::Vector once we track down the cause - there may have been some value in protecting used pages, but protecting *unused* pages is pretty pointless except for tracking down a specific problem like this.
Another observation made by dmajor: even with the gc:: calls commented out, there is a non-trivial slowdown that decreases with --thread-count (17%, 15%, 8%, ...) just from using PageProtectingVector instead of Vector in MacroAssembler. Measuring both together (now on Windows; comment 7 was Linux), we see a 22% slowdown for 1 and 2 threads (diminishing with more threads) on Nightly/Aurora compared to Beta/Release, which is pretty significant.
That's pretty big. PageProtectingVector does have some administrative overhead; unfortunately there's no real way to eliminate that entirely (and it probably got a bit worse with the patch in bug 1320374). I wonder if it's the extra branches or if things just aren't getting inlined anymore. That could maybe be fixed by moving some of the logic out-of-line and making other things MOZ_ALWAYS_INLINE, but there's only so much that can be done.
By the way, how hard would it be for me to repeat these tests locally? There are some things I'd like to check. Testing on Windows is not a problem for me. The way we actually disable PageProtectingVector on Aurora is a bit different from just disabling the gc:: calls, so that's one thing I'd like to check the performance of.
We often fetch a big module (like http://webassembly.org/demo/AngryBots/AngryBots13.wasm) and just test how long it takes to do `new WebAssembly.Module(file)` (with the file read using `os.file.readFile(x, 'binary');`). If we can just get our experimental data (one way or another) and then disable this on all release channels, then it might not be worth spending too much time optimizing, though, unless there is low-hanging fruit.
Great, that wasn't hard to set up. Just to confirm, I see a slowdown of ~26.5% (303ms -> 383ms) on that file using m-c with the patch from bug 1320374, comparing PageProtectingVector to plain mozilla::Vector. I ran `JSGC_DISABLE_POISONING=1 js --thread-count=1 --wasm-always-baseline wasmtiming.js`, where wasmtiming.js is a little file that calls `new WebAssembly.Module(file)` and prints the timing.
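In case it helps anyone else reproduce this, here's a minimal sketch of what wasmtiming.js amounts to (the filename below is just wherever the module was saved locally, and timing with Date.now() is an assumption on my part):

    // Read the raw module bytes (SpiderMonkey shell API).
    var bytes = os.file.readFile('AngryBots13.wasm', 'binary');

    // Time the synchronous compile; this is the number being compared above.
    var start = Date.now();
    var module = new WebAssembly.Module(bytes);
    print('new WebAssembly.Module: ' + (Date.now() - start) + ' ms');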
And the overhead after commenting out the system calls is indeed ~21.5%. That's definitely something I want to improve! Even if this use is temporary, it would be good if people could use PageProtectingVector without worrying too much about performance in the future.
Alright, I have a patch locally that reduces the administrative overhead to ~5% with the gc:: calls commented out. Notably the overhead *with* the system calls is only 6.5% (where it was 26.5% before), so the actual self time of the ~7000 system calls is 1.5% or less!
Looks like bug 1325702 changed things up a bit on m-c for --thread-count=1! Overall performance is worse by about 50%, but it also makes the overhead of PageProtectingVector stand out even more (this is with the latest patch from bug 1320374 as well). The following times are in milliseconds, with measurements taken across 200 runs in each case:

                               mean     diff     min      diff
    Plain mozilla::Vector      447.4    0.00%    444.3    0.00%
    Patched w/o system calls   482.1    7.74%    478.7    7.73%
    Patched with system calls  501.5   12.09%    498.3   12.15%
    Current w/o system calls   571.3   27.67%    567.4   27.71%
    Current with system calls  630.8   40.97%    626.7   41.04%

Obviously this is the worst case scenario, but it also shows a very nice improvement from the redesign. I'll post that patch tomorrow.
Thanks, that's a lot better! Regarding bug 1325702: I was also thinking perhaps we should have cpuCount=1 trigger the sequential path instead of creating 1 helper thread that fights with the validating thread.