Build webrender_api with opt-level=3

enhancement NEW
Unassigned

Status

()

P3
normal
a year ago
4 months ago

People

(Reporter: jrmuizel, Unassigned)

Tracking

(Blocks 2 bugs)

unspecified
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 unaffected, firefox58 unaffected)

Details

(Whiteboard: [wr-reserve] [gfx-noted])

(Reporter)

Description

a year ago
Using opt-level=3 makes the serialization code go from:

        push    rbp
        mov     rbp, rsp
        mov     rax, qword ptr [rdi]
        mov     ecx, dword ptr [rsi]
        mov     dword ptr [rax], ecx
        mov     ecx, dword ptr [rsi + 4]
        mov     dword ptr [rax + 4], ecx
        mov     ecx, dword ptr [rsi + 8]
        mov     dword ptr [rax + 8], ecx
        mov     ecx, dword ptr [rsi + 12]
        mov     dword ptr [rax + 12], ecx
        mov     ecx, dword ptr [rsi + 16]
        mov     dword ptr [rax + 16], ecx
        mov     ecx, dword ptr [rsi + 20]
        mov     dword ptr [rax + 20], ecx
        mov     ecx, dword ptr [rsi + 24]
        mov     dword ptr [rax + 24], ecx
        mov     ecx, dword ptr [rsi + 28]
        mov     dword ptr [rax + 28], ecx
        mov     ecx, dword ptr [rsi + 32]
        mov     dword ptr [rax + 32], ecx
        mov     ecx, dword ptr [rsi + 36]
        mov     dword ptr [rax + 36], ecx
        mov     ecx, dword ptr [rsi + 40]
        mov     dword ptr [rax + 40], ecx
        mov     ecx, dword ptr [rsi + 44]
        mov     dword ptr [rax + 44], ecx
        mov     ecx, dword ptr [rsi + 48]
        mov     dword ptr [rax + 48], ecx
        mov     ecx, dword ptr [rsi + 52]
        mov     dword ptr [rax + 52], ecx
        mov     ecx, dword ptr [rsi + 56]
        mov     dword ptr [rax + 56], ecx
        mov     ecx, dword ptr [rsi + 60]
        mov     dword ptr [rax + 60], ecx
        mov     rcx, qword ptr [rdi + 16]
        lea     rax, [rcx + rax + 64]
        sub     rax, qword ptr [rdi]
        mov     qword ptr [rdi + 16], rax
        pop     rbp
        ret

to:

        push    rbp
        mov     rbp, rsp
        mov     rax, qword ptr [rdi]
        movups  xmm0, xmmword ptr [rsi]
        movups  xmmword ptr [rax], xmm0
        movups  xmm0, xmmword ptr [rsi + 16]
        movups  xmmword ptr [rax + 16], xmm0
        movups  xmm0, xmmword ptr [rsi + 32]
        movups  xmmword ptr [rax + 32], xmm0
        movups  xmm0, xmmword ptr [rsi + 48]
        movups  xmmword ptr [rax + 48], xmm0
        mov     rcx, qword ptr [rdi + 16]
        lea     rax, [rcx + rax + 64]
        sub     rax, qword ptr [rdi]
        mov     qword ptr [rdi + 16], rax
        pop     rbp
        ret
Whiteboard: [wr-mvp] [triage]
Whiteboard: [wr-mvp] [triage] → [wr-mvp] [triage][gfx-noted]
Blocks: 1386665
status-firefox57: --- → unaffected
status-firefox58: --- → unaffected
Priority: -- → P2
Whiteboard: [wr-mvp] [triage][gfx-noted] → [wr-mvp] [gfx-noted]
(Reporter)

Comment 1

a year ago
Ted, do you know how to accomplish this?
Flags: needinfo?(ted)
I do not! The cargo docs explicitly say that only the profile from the top-level Cargo.toml is used:
http://doc.crates.io/manifest.html#the-profile-sections

Short of getting to a state where we're having cargo output a build plan and then executing it ourselves I don't think there's a supported way to do this.

I think I did once complain about how sccache debug builds were slow because the sha1 crate was slow when built debug, and Manish suggested that I could do some hacky thing like make a wrapper script for rustc that munged the rustc compiler options. We could do something like that as a band-aid if this is a big enough perf issue.

If there's not already an issue filed on cargo about this it'd be good to get one on file as well. I don't know if this has come up with many Rust projects, but we certainly have this issue with C++ code in Gecko, so I imagine as Rust projects get larger they'll start wanting to have this sort of flexibility as well.

Similarly, rillian wanted to have `force-overflow-checks=on` for mp4parse at one point:
https://hg.mozilla.org/mozilla-central/file/4b71272c92cf/media/libstagefright/binding/mp4parse/Cargo.toml#l20
Flags: needinfo?(ted)
Priority: P2 → P3
Whiteboard: [wr-mvp] [gfx-noted] → [wr-reserve] [gfx-noted]
(Reporter)

Comment 5

8 months ago
It seems like this should be available now that https://github.com/rust-lang/cargo/issues/5298 is fixed?
No longer blocks: 1386665
Flags: needinfo?(ted)
It looks like the profile overrides feature is only on the rust nightly channel.
That does seem to be the case. That feature is documented here:
https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#profile-overrides

It sounds like if we're using nightly cargo we'd just need to add this to Cargo.toml:
```
[profile.release.overrides.webrender]
opt-level = 3
```

I'm not sure if that needs to go in toolkit/library/rust/Cargo.toml, or if toolkit/library/rust/shared/Cargo.toml would work.

We have support for packaging a different version of cargo in our Rust toolchain tasks, you just pass `--cargo-channel` to repack_rust.py. If you wanted to try this out in CI you'd just have to add something like `'--cargo-channel', 'nightly-2018-MM-DD'` to the existing arguments of the various rust toolchain tasks:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/toolchain/linux.yml
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/toolchain/windows.yml
Flags: needinfo?(ted)
Currently the configuration needs to go in the workspace root, but I believe it can also go in `.cargo/config` to tweak the settings at build time as well
Oh! That's fine, that just means it'd go here right next to our existing `profile.dev` / `profile.release` sections:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/Cargo.toml#54
Once our toolchain allows fixing this bug, let's fix bug 1498985, too.
See Also: → bug 1498985
Just for kicks, I did a windows-qr talos push [1] with all the rust code compiled at opt-level=3. It's still going but so far there doesn't appear to be much of a difference in results compared to the average results on m-c over the preceding two days [2].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&selectedJob=213408734&revision=66cc0d8e89bb0af6b701c92fdfdc48fe6d29964e
[2] https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=66cc0d8e89bb0af6b701c92fdfdc48fe6d29964e&selectedTimeRange=172800
(Reporter)

Comment 13

4 months ago
I suspect https://github.com/rust-lang/rust/pull/54639 may have hurt our ability to get better code with opt-level=3
You need to log in before you can comment on or make changes to this bug.