Closed
Bug 1471497
Opened 6 years ago
Closed 6 years ago
Versions of rust > 1.24 break win64 quantumrender reftests
Categories
(Core :: Graphics: WebRender, defect, P1)
Core
Graphics: WebRender
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: glandium, Unassigned)
References
Details
(Whiteboard: [gfx-noted])
The landing of bug 1447116 caused win64 permanent quantumrender reftest failures like:
https://treeherder.mozilla.org/logviewer.html#?job_id=185055229&repo=autoland&lineNumber=1794 (note that all reftest chunks have failed if you look on subsequent pushes, not only R4, they just didn't all on all pushes for some reason)
Further attempts on try revealed that all rustc versions > 1.24 have the same problem. Since 1.25 is the first version with LLVM 6, I'm suspecting this could be related, but it's weird that this doesn't affect other platforms (mac or linux), and only affects debug builds, not opt.
I'm waiting for try results with the rustc nightlies around the LLVM 6 landing.
Reporter | ||
Comment 1•6 years ago
|
||
Expectedly(?), try would seem to confirm this is related to LLVM 6:
Using nightly-2018-01-10 (last nightly using LLVM 4):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=35055d27a7e351920fe7402bee5a6c69e36d2074
Using nightly-2018-01-11 (first nightly using LLVM 6):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4109f34249191705261c60958f20bd4a8537ee93
I guess this is all too old to have per-commit builds of rust?
Now the question is whether it's miscompilation or something weird in webrender code. That is only happens on one kind of build bugs me.
Reporter | ||
Comment 2•6 years ago
|
||
The rust version upgrade also apparently triggered intermittent win64 debug crashtest failures which were filed as bug 1471443 and disappeared on backout.
Reporter | ||
Comment 3•6 years ago
|
||
Another data point: when reftests were enabled on quantumrender builds in bug 1344350, we were building with rust 1.25, and they weren't busted.
Reporter | ||
Comment 4•6 years ago
|
||
The last build before we reverted to 1.24 was https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=13863e08584ae95ac1262b517bf8ef06dd29dbdf and it wasn't busted either.
Reporter | ||
Comment 5•6 years ago
|
||
Bisection on try narrowed it down to somewhere between 5866d6685849311f057e7e229b9ace63a2641c29 and caa6f15a7c44361d0cee655971334ed2b5cffcce. Still bisecting...
Reporter | ||
Comment 6•6 years ago
|
||
Bisection narrowed it down to bug 1452845, specifically, on "Turn on async scene building by default".
Taking the tree from the landing of bug 1447116 and reverting that part of bug 1452845 does yield green reftests.
Blocks: 1452845
Flags: needinfo?(bugmail)
Comment 7•6 years ago
|
||
Oh, interesting! I remember we had an issue with miscompilation in the function at [1] before, maybe that's back with the new LLVM? See https://bugzilla.mozilla.org/show_bug.cgi?id=1449982#c22 where I describe the two versions which should be equivalent but were behaving differently. The version I ended up landing at the time to work around the miscompilation is at [2]. Later I made more changes to the function in [3] to fix another bug, and the updated version didn't have the miscompilation problem so I was able to remove the workaround.
So I guess I would look at that function and ensure that wr_transaction_new is getting the right bool value on the rust side when compiled with the new LLVM.
[1] https://searchfox.org/mozilla-central/rev/d2966246905102b36ef5221b0e3cbccf7ea15a86/gfx/webrender_bindings/WebRenderAPI.cpp#137
[2] https://searchfox.org/mozilla-central/diff/4a07281ed3741eb467eeb076e3cd80460c8fcf7e/gfx/webrender_bindings/WebRenderAPI.cpp#142
[3] https://searchfox.org/mozilla-central/diff/27baa6b4f26544bd10b48d9c9864de5032c10483/gfx/webrender_bindings/WebRenderAPI.cpp#140
Flags: needinfo?(bugmail)
I wonder if this is another instance of https://bugs.llvm.org/show_bug.cgi?id=36886.
(In reply to David Major [:dmajor] from comment #8)
> I wonder if this is another instance of
> https://bugs.llvm.org/show_bug.cgi?id=36886.
In fact I'm pretty sure this is it. (Although I found the bug in clang-cl, the issue was in core LLVM code and would apply to Rust too.)
The caller only populates the low byte of the bool param, and the rest of the register still contains the upper bytes of TransactionBuilder's `this` pointer:
xul!mozilla::wr::TransactionBuilder::TransactionBuilder+0x40
140 00000001`81c1decc 8acb mov cl,bl
...
140 00000001`81c1ded0 e8bbfe4ffe call xul!webrender_bindings::bindings::wr_transaction_new
The callee tests the full 32-bit value:
xul!webrender_bindings::bindings::wr_transaction_new
1007 00000001`8011dda1 89ce mov esi,ecx
...
1008 00000001`8011ddb0 85f6 test esi,esi
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Should we just change this code to pass an int32_t instead as a workaround?
Reporter | ||
Comment 11•6 years ago
|
||
(In reply to Kartikaya Gupta (away until Jul03) (email:kats@mozilla.com) from comment #7)
> (snip)
Aha! Guess what version of rust the tree was built with on automation when you landed the following changes:
> [2] https://searchfox.org/mozilla-central/diff/4a07281ed3741eb467eeb076e3cd80460c8fcf7e/gfx/webrender_bindings/WebRenderAPI.cpp#142
> [3] https://searchfox.org/mozilla-central/diff/27baa6b4f26544bd10b48d9c9864de5032c10483/gfx/webrender_bindings/WebRenderAPI.cpp#140
Respectively, 1.25 and 1.24.
(In reply to Alexis Beingessner [:Gankro] from comment #10)
> Should we just change this code to pass an int32_t instead as a workaround?
I'm testing this on try.
Reporter | ||
Comment 12•6 years ago
|
||
Now, while this is fine for this one particular problem, it seems like a rather dangerous bug in the rust compiler. That is, any C++ to rust ffi calls that use bool could have the same problem, and that might just be happening in less obvious ways than those reftest failures.
Comment 13•6 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #12)
> Now, while this is fine for this one particular problem, it seems like a
> rather dangerous bug in the rust compiler. That is, any C++ to rust ffi
> calls that use bool could have the same problem, and that might just be
> happening in less obvious ways than those reftest failures.
In theory it could happen with rust->rust calls too. If it's any comfort, when C++->C++ calls were affected, the bug only hit us in one place in our entire test suite.
Reporter | ||
Comment 14•6 years ago
|
||
Let's see how quickly we can get this fixed in rustc.
https://github.com/rust-lang/llvm/pull/119
Reporter | ||
Comment 15•6 years ago
|
||
For the record, this worked: https://hg.mozilla.org/try/rev/865ba89e28c79a8b6c7a7b2e8eabeb38d8055998
Reporter | ||
Comment 16•6 years ago
|
||
Let's call this WFM, as the underlying issue will be fixed in next rustc nightly and next rustc beta.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Comment 17•6 years ago
|
||
So just to confirm, we're still using 1.24, correct? Should https://wiki.mozilla.org/Rust_Update_Policy_for_Firefox be updated to reflect this? We use that page to determine which version of rust WebRender needs to be compatible with and if it's out-of-date that's less than ideal.
Reporter | ||
Comment 18•6 years ago
|
||
Not anymore. As of bug 1447116, we require 1.26, and will bump to 1.27 in a few.
Comment 19•6 years ago
|
||
(In reply to Kartikaya Gupta (away until Jul03) (email:kats@mozilla.com) from comment #17)
> So just to confirm, we're still using 1.24, correct? Should
> https://wiki.mozilla.org/Rust_Update_Policy_for_Firefox be updated to
> reflect this? We use that page to determine which version of rust WebRender
> needs to be compatible with and if it's out-of-date that's less than ideal.
I'll update the wiki shortly and differentiate between projected dates and actual dates so we don't fall out of sync like this again.
You need to log in
before you can comment on or make changes to this bug.
Description
•