Closed Bug 1494976 Opened 6 years ago Closed 6 years ago

0.8% installer size (osx-cross) regression on push 2aec596947892a0c1aa78bba5e3a1d861140fde6 (Fri Sep 28 2018)

Categories

(Core :: JavaScript: WebAssembly, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: igoldan, Unassigned)

References

Details

(Keywords: regression)

We have detected a build metrics regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=2aec596947892a0c1aa78bba5e3a1d861140fde6

You are listed as the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

  >550KBytes  installer size osx-cross opt      72,844,288.00 -> 73,427,547.83
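The ">550KBytes" figure checks out against the raw numbers above; a quick sanity check of the arithmetic (values taken directly from the alert):

```python
# Installer size before and after the push, in bytes (from the alert above).
before = 72_844_288.00
after = 73_427_547.83

delta = after - before
print(f"{delta:,.2f} bytes (~{delta / 1024:.0f} KiB)")  # ~570 KiB
```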


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=16230

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Automated_Performance_Testing_and_Sheriffing/Build_Metrics
Component: General → JavaScript: WebAssembly
Product: Testing → Core
Flags: needinfo?(bbouvier)
Note: this happens on all other platforms, including Android.
Thanks for the heads up!

- This is expected: we're including much more code in SpiderMonkey when building Cranelift (a new code generator written in Rust).
- Note that Cranelift is built only on Nightly at the moment, so this regression will *not* propagate to release until we enable Cranelift by default.
- In the long term, we'll probably remove some other code in SpiderMonkey and replace it with Cranelift code, and there's a chance we'll want to accept the code size regression at that time.

So I think we can close this as WONTFIX for now. That being said, I'd like to make sure there are no Rust lib features we could disable to lower the installer size.
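On trimming Rust features: Cargo dependencies often gate optional functionality behind feature flags, and disabling default features can shrink the compiled artifact. A sketch of what that could look like in a Cargo.toml dependency table (the version and the feature name below are illustrative assumptions, not the crate's actual knobs):

```toml
# Sketch only: restrict Cranelift to the one target backend we ship.
# "0.x" and the "x86" feature name are placeholders for illustration.
[dependencies.cranelift-codegen]
version = "0.x"
default-features = false   # drop optional backends pulled in by default
features = ["x86"]         # keep only the ISA we actually emit code for
```

Whether this helps depends on what the crate's default feature set actually pulls in, which would need to be checked against its manifest.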
+1
The build times on Linux have also increased. This is something we need to look into more closely.

== Change summary for alert #16230 (as of Fri, 28 Sep 2018 04:14:31 GMT) ==

Regressions:

 51%  sccache requests_not_cacheable windows-mingw32 all 64 clang opt taskcluster-m5.4xlarge       37.00 -> 56.00
 51%  sccache requests_not_cacheable windows-mingw32 all 64 clang debug taskcluster-m4.4xlarge     37.00 -> 56.00
  8%  build times windows2012-64 opt plain taskcluster-c4.4xlarge                                  1,732.08 -> 1,873.39
  6%  build times linux64 opt taskcluster-m5.4xlarge tup                                           960.94 -> 1,014.97
  4%  build times windows2012-64 debug plain taskcluster-c4.4xlarge                                1,627.61 -> 1,694.20
  4%  build times linux64 pgo taskcluster-c4.4xlarge                                               5,141.14 -> 5,331.48

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=16230
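For reference, the percentages in the summary above are plain relative deltas over the previous value; a quick check of the first and third rows:

```python
def pct_change(before: float, after: float) -> float:
    """Relative change in percent, as reported in the alert summary."""
    return (after - before) / before * 100

# Values from the change summary above.
print(round(pct_change(37.00, 56.00)))      # sccache requests_not_cacheable -> 51
print(round(pct_change(1732.08, 1873.39)))  # windows2012-64 opt plain build time -> 8
```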
We've investigated the code size issue, and there's going to be a fix in the next version of Cranelift (https://github.com/CraneStation/cranelift/pull/531).

Regarding build time, I've found that force-cargo-library-build takes some time even after a fresh build, which could explain the large sccache regression. Nathan, is that expected?

For the other platforms, some build-time cost is unavoidable because we're compiling new code. In particular, I expect this could be especially slow because Rust code is slower to compile than C++, we're compiling/linking too much code because of the above GitHub issue, linking/LTO takes more time, etc.
Flags: needinfo?(bbouvier) → needinfo?(nfroyd)
(In reply to Benjamin Bouvier [:bbouvier] from comment #5)
> We've investigated the code size issue and there's going to be a fix in the
> next version of Cranelift (
> https://github.com/CraneStation/cranelift/pull/531 ).

Well, a fix for one of the issues, at least. ;)

I see bits and pieces of non-x86 things in my Nightly Firefox, which suggests that we're also building things we don't need in cranelift?

> Regarding build time, I've found that force-cargo-library-build takes some
> time even after a fresh build, which could explain the large sccache
> regression. Nathan, is that expected?

I expect the initial sccache regression to be from generating and caching new objects; I would not expect force-cargo-library-build after a fresh build to take more time in a cranelift-enabled vs. cranelift-disabled scenario.  Perhaps cranelift is not being careful about mtimes of any files it generates, and thus we're inadvertently forcing recompiles of things that don't need to be compiled?  I'm not completely sure of the interaction with sccache here, forwarding ni? to Ted to see if he has ideas.
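The mtime hypothesis above can be illustrated with a small sketch: an mtime-based up-to-date check (as make or tup performs it) is defeated whenever a build step rewrites a generated source on every run, forcing a recompile even though nothing changed. File names and timestamps here are made up for illustration.

```python
import os
import tempfile

def needs_rebuild(src: str, obj: str) -> bool:
    """Rebuild if the source is newer than the object file."""
    return os.path.getmtime(src) > os.path.getmtime(obj)

d = tempfile.mkdtemp()
src = os.path.join(d, "gen.rs")  # a generated source (hypothetical)
obj = os.path.join(d, "gen.o")   # its compiled object (hypothetical)
open(src, "w").close()
open(obj, "w").close()

# Normal case: object compiled after the source, nothing to do.
os.utime(src, (1000, 1000))
os.utime(obj, (2000, 2000))
print(needs_rebuild(src, obj))   # False

# If a build step regenerates the source on every invocation, the mtime
# comparison always triggers a recompile of an unchanged file:
os.utime(src, (3000, 3000))
print(needs_rebuild(src, obj))   # True
```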
Flags: needinfo?(nfroyd) → needinfo?(ted)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #4)
>  51%  sccache requests_not_cacheable windows-mingw32 all 64 clang opt
> taskcluster-m5.4xlarge       37.00 -> 56.00
>  51%  sccache requests_not_cacheable windows-mingw32 all 64 clang debug
> taskcluster-m4.4xlarge     37.00 -> 56.00

The `requests_not_cacheable` regression is almost certainly just due to the fact that we're building more Rust crates, and some of those are build scripts or proc macros which we can't cache. I don't think this is a big issue. Maybe we should change this metric to be more like "percentage of requests that were not cacheable"? The thing we're mostly worried about catching is "did we stop being able to cache most of our compiles?"
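A sketch of the proposed replacement metric: report the share of compile requests sccache could not cache rather than the raw count, so that adding many cacheable crates doesn't inflate the alert. The totals below are made up for illustration; only the 37 → 56 counts come from the alert.

```python
def pct_not_cacheable(total_requests: int, not_cacheable: int) -> float:
    """Proposed metric: share of requests sccache could not cache, in percent."""
    return not_cacheable / total_requests * 100

# The raw count jumping 37 -> 56 reads as a 51% regression, but if the
# build also gained many cacheable Rust compiles, the share barely moves
# (totals are hypothetical):
print(round(pct_not_cacheable(1000, 37), 1))  # 3.7
print(round(pct_not_cacheable(1400, 56), 1))  # 4.0
```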

>   8%  build times windows2012-64 opt plain taskcluster-c4.4xlarge           
> 1,732.08 -> 1,873.39
>   6%  build times linux64 opt taskcluster-m5.4xlarge tup                    
> 960.94 -> 1,014.97
>   4%  build times windows2012-64 debug plain taskcluster-c4.4xlarge         
> 1,627.61 -> 1,694.20
>   4%  build times linux64 pgo taskcluster-c4.4xlarge                        
> 5,141.14 -> 5,331.48

None of these builds use sccache: plain builds by design (they exist to give us a more accurate accounting of build times without caching), tup builds because sccache doesn't currently work with tup, and PGO builds because most of the work they do is not cacheable.

I don't think there's anything actionable here: we added a bunch more code, and that takes more time to build.
Flags: needinfo?(ted)
QA Contact: ajones
With the JS and Build teams accepting this, let's mark this as WONTFIX.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX