Closed Bug 1276278 Opened 8 years ago Closed 6 years ago

Interpreter.cpp takes 165+s to compile on Windows

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: gps, Unassigned)

References

()

Details

Attachments

(2 files)

Attached image cl-profiling.png
Unified_cpp_js_src31.cpp takes over 3 minutes to compile on Windows with VS2015u2 (although possibly not limited to that VS version). This cl.exe process lingers for 2+ minutes after all other Unified_cpp_js_src*.cpp cl.exe processes have finished. In other words, this is slowing down the build by 2+ minutes.

I /think/ this is a recent regression (last week or two).

On my machine, the unified source file is including:

* vm/GeneratorObject.cpp
* vm/GlobalObject.cpp
* vm/HelperThreads.cpp
* vm/Id.cpp
* vm/Interpreter.cpp
* vm/JSONParser.cpp

I sampled cl.exe at 1000Hz and the attached screenshot from Windows Performance Analyzer shows where the time is spent in cl.exe. Not sure if that's helpful.
The problem is Interpreter.cpp: it takes ~165s to compile on its own. This is on an i7-6700K, which has one of the fastest CPU cores you can buy at the moment.
(In reply to Gregory Szorc [:gps] from comment #1)
> The problem is Interpreter.cpp: it takes ~165s to compile on its own. This
> is on an i7-6700K, which has one of the fastest CPU cores you can buy at the
> moment.

This outcome is basically expected; Interpreter.cpp has one large function (one of the biggest, if not the biggest, in Gecko) that has very peculiar control flow, which probably drives up the time spent in the optimizer.
VS 2015 update 3 will have a new optimizer, it will be interesting to see if that improves performance. The profile shows most time is spent there.

Maybe we can inline less aggressively in Interpret.

A regression window would be very useful...
It still takes ~165s to compile on a48fafcdd36f, which is from ~1 month ago. So I was wrong about this being a recent regression. I guess I just started noticing.
Perhaps we could submit this source file to Microsoft as a "stress test" for their compiler in hopes they can optimize it in future toolchain versions? My sampling trace clearly indicates some hot spots. Although I'm not sure if there's anything they can do given the nature of the problem (I'm no compiler guru).
Summary: Unified_cpp_js_src31.cpp takes over 3 minutes to compile on Windows → Interpreter.cpp takes 165+s to compile on Windows
(In reply to Gregory Szorc [:gps] from comment #5)
> Perhaps we could submit this source file to Microsoft as a "stress test" for
> their compiler in hopes they can optimize it in future toolchain versions?
> My sampling trace clearly indicates some hot spots. Although I'm not sure if
> there's anything they can do given the nature of the problem (I'm no
> compiler guru).

They might be interested, yes.  It's possible one of their optimization passes isn't doing the right thing.
FWIW, gcc 5.2 builds Unified_cpp_js_src31.o in ~17s on my machine, in par with other unified sources in the same directory, so this is really limited to MSVC.
Also FWIW, on an EC2 instance, building Unified_cpp_js_src31.obj with 2015 update 2 takes ~122s and building it with 14.0.24104-pre (a version that contains the new optimizer) takes ~121s (if I didn't mess up my attempt at using it), so about the same time.
(In reply to Mike Hommey [:glandium] from comment #8)
> Also FWIW, on an EC2 instance, building Unified_cpp_js_src31.obj with 2015
> update 2 takes ~122s

<sidetrack> Which build configuration is that? EC2 instances are running Intel Xeons from a few generations ago and don't have 4.0 GHz clock speeds: I'm surprised you report it taking shorter than my i7-6700K (Skylake at 4.0GHz). I'm building without a mozconfig and Unified_cpp_js_src31.obj takes ~153s (best of 3 runs, all of which were very consistent).

Here's my cl.exe invocation as reported by Process Explorer:

cl.EXE -FoUnified_cpp_js_src31.obj -c -DNDEBUG=1 -DTRIMMED=1 -D_CRT_RAND_S -DENABLE_BINARYDATA -DENABLE_SIMD -DENABLE_SHARED_ARRAY_BUFFER -DEXPORT_JS_API -DJS_HAS_CTYPES -DDLL_PREFIX=\"\" -DDLL_SUFFIX=\".dll\" -DFFI_BUILDING -Ic:/dev/src/firefox-unified/js/src -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src/ctypes/libffi/include -Ic:/dev/src/firefox-unified/intl/icu/source/common -Ic:/dev/src/firefox-unified/intl/icu/source/i18n -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/dist/include -Ic:/dev/src/firefox-unified/obj-i686-pc-mingw32/dist/include/nspr -MD -FI c:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src/js-confdefs.h -DMOZILLA_CLIENT -TP -nologo -wd4345 -wd4351 -wd4800 -wd4819 -wd4595 -D_CRT_SECURE_NO_WARNINGS -wd5026 -wd5027 -Zc:sizedDealloc- -Zc:threadSafeInit- -D_HAS_EXCEPTIONS=0 -W3 -Gy -arch:SSE2 -FS -wd4244 -wd4267 -wd4251 -we4553 -GR- -Zi -O2 -Oy -wd4805 -wd4661 -we4067 -we4258 -we4275 -wd4146 -wd4577 -wd4312 -Fdgenerated.pdb c:/dev/src/firefox-unified/obj-i686-pc-mingw32/js/src/Unified_cpp_js_src31.cpp -showIncludes
cl.exe -FoUnified_cpp_js_src31.obj -c   -DNDEBUG=1 -DTRIMMED=1 -D_CRT_RAND_S -DENABLE_BINARYDATA -DENABLE_SIMD -DENABLE_SHARED_ARRAY_BUFFER -DEXPORT_JS_API -Ic:/Users/Administrator/mozilla-central/js/src -Ic:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/js/src -Ic:/Users/Administrator/mozilla-central/intl/icu/source/common -Ic:/Users/Administrator/mozilla-central/intl/icu/source/i18n -Ic:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/dist/include  -Ic:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/dist/include/nspr         -MD -FI c:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/js/src/js-confdefs.h -DMOZILLA_CLIENT   -TP -nologo -wd4345 -wd4351 -wd4800 -wd4819 -wd4595 -D_CRT_SECURE_NO_WARNINGS -wd5026 -wd5027 -Zc:sizedDealloc- -Zc:threadSafeInit- -D_HAS_EXCEPTIONS=0 -W3 -Gy -arch:SSE2 -FS -wd4244 -wd4267 -wd4251 -we4553 -GR-  -Zi -O2 -Oy  -wd4805 -wd4661 -we4067 -we4258 -we4275 -wd4146 -wd4577 -wd4312 -Fdgenerated.pdb  c:/Users/Administrator/mozilla-central/obj-i686-pc-mingw32/js/src/Unified_cpp_js_src31.cpp

Note that was a --enable-project=js --disable-shared-js build, but that shouldn't matter.

Windows reports this as CPU: Xeon E5-2666 v3 @ 2.90GHz.
(same timing in a Firefox build, unsurprisingly)
I built with your exact arguments modulo the paths and it didn't appear to change timings.

I tried setting the processor affinity for the process to a single core, but that didn't improve things.

I changed the processor priority to its highest setting and that didn't move the needle.

Task Manager is saying my CPU is clocked at 4.0+GHz for the duration of the process lifetime (then it backs off when idle).

I'm, uh, slightly disturbed by this disparity.

One thing I can think of is CPU cache: the Xeons should have more of it. If we're in some tight-ish loops, extra CPU cache could matter. FWIW, the E5-2666 is a custom SKU made for Amazon. IIRC it has better base and burst CPU frequency than retail Xeons. But that's still a Haswell versus a Skylake. I want to believe the cycle-for-cycle efficiency of the Skylake is greater.
One workaround might be to change the build system to build SOURCES first, before UNIFIED_SOURCES, and then putting Interpreter.cpp in SOURCES.  That way a core will end up building Interpreter.cpp while everything else is going on, hopefully having everything finish at roughly the same time.  (The Interpreter.cpp build takes *5 minutes* on my Dell XPS 15 laptop; 2.75Ghz Core i7, VS 2015u2)
(In reply to Gregory Szorc [:gps] from comment #13)
> Filed
> https://connect.microsoft.com/VisualStudio/feedback/details/2757523/
> compiling-firefox-spidermonkey-javascript-engines-interpreter-cpp-takes-2-
> minutes-10x-slower-than-gcc-clang upstream.

They need more information from us. Did you provide it somehow or do we still have to do that?
Flags: needinfo?(gps)
Attached file Interpreter.i
Preprocessed output for Interpreter.cpp.
OK, hopefully they have enough details now. Not sure why my uploads never went through :(
Flags: needinfo?(gps)
See Also: → 1325686
See Also: → 1262241
should this be resolved FIXED and/or bug 1325686 be backed out now that clang is used on windows?
Flags: needinfo?(gps)
We still support building on MSVC. But Clang is the default and MSVC optimization is now a lower priority. So if the owners of this code want to resolve this bug as WONTFIX or WORKSFORME or whatnot, you will have no objections from me :)
Flags: needinfo?(gps)
Yep. MSVC is on its way out, so we don't care about its compile times. As for backing out bug 1325686 -- I don't know how long clang takes to compile this file, but just based on the idea that having all compilation units take roughly the same amount of time is good, and that compilation time is correlated with line count, I think I'd rather keep it separate.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: