Closed Bug 1365915 Opened 7 years ago Closed 6 years ago

stylo: valgrind error when running with stylo

Categories

(Core :: CSS Parsing and Computation, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: froydnj, Assigned: jseward)

References

Details

Attachments

(2 files)

Building with Stylo (and apparently accidentally enabled), our Valgrind tests fail:

 TEST-UNEXPECTED-FAIL | valgrind-test | Conditional jump or move depends on uninitialised value(s) at selectors::parser::parse_qualified_name::he896e0b4bdce0335 / parse_type_selector / selectors::parser::parse_compound_selector::h33c1357312950419 / selectors::parser::parse_complex_selector_and_pseudo_element::h75f7e473575429c4
 ==38792== Conditional jump or move depends on uninitialised value(s)
 ==38792==    at 0x1031D119: selectors::parser::parse_qualified_name::he896e0b4bdce0335 (parser.rs:1112)
 ==38792==    by 0x1031986C: parse_type_selector<style::selector_parser::SelectorParser,style::gecko::selector_parser::SelectorImpl> (parser.rs:1025)
 ==38792==    by 0x1031986C: selectors::parser::parse_compound_selector::h33c1357312950419 (parser.rs:1316)
 ==38792==    by 0x10319290: selectors::parser::parse_complex_selector_and_pseudo_element::h75f7e473575429c4 (parser.rs:960)
 ==38792==    by 0x1040ECAC: parse_selector<style::selector_parser::SelectorParser,style::gecko::selector_parser::SelectorImpl> (parser.rs:932)
 ==38792==    by 0x1040ECAC: {{closure}}<style::gecko::selector_parser::SelectorImpl,style::selector_parser::SelectorParser> (parser.rs:125)
 ==38792==    by 0x1040ECAC: {{closure}}<closure,selectors::parser::Selector<style::gecko::selector_parser::SelectorImpl>> (parser.rs:367)
 ==38792==    by 0x1040ECAC: parse_entirely<closure,selectors::parser::Selector<style::gecko::selector_parser::SelectorImpl>> (parser.rs:347)
 ==38792==    by 0x1040ECAC: parse_until_before<closure,selectors::parser::Selector<style::gecko::selector_parser::SelectorImpl>> (parser.rs:437)
 ==38792==    by 0x1040ECAC: parse_comma_separated<closure,selectors::parser::Selector<style::gecko::selector_parser::SelectorImpl>> (parser.rs:367)
 ==38792==    by 0x1040ECAC: _$LT$selectors..parser..SelectorList$LT$Impl$GT$$GT$::parse::h267c42493683ed79 (parser.rs:125)
 ==38792==    by 0x10410093: parse_prelude (stylesheets.rs:1354)
 ==38792==    by 0x10410093: parse_prelude (stylesheets.rs:1150)
 ==38792==    by 0x10410093: {{closure}}<style::stylesheets::TopLevelRuleParser> (rules_and_declarations.rs:429)
 ==38792==    by 0x10410093: parse_entirely<closure,selectors::parser::SelectorList<style::gecko::selector_parser::SelectorImpl>> (parser.rs:347)
 ==38792==    by 0x10410093: parse_until_before<closure,selectors::parser::SelectorList<style::gecko::selector_parser::SelectorImpl>> (parser.rs:437)
 ==38792==    by 0x10410093: cssparser::rules_and_declarations::parse_qualified_rule::h8127c554b75315a0 (rules_and_declarations.rs:428)
 ==38792==    by 0x1041FC56: next<style::stylesheets::CssRule,style::stylesheets::TopLevelRuleParser> (rules_and_declarations.rs:332)
 ==38792==    by 0x1041FC56: style::stylesheets::Stylesheet::parse_rules::h5eced3ff0890ce11 (stylesheets.rs:782)
 ==38792==    by 0x1042000E: style::stylesheets::Stylesheet::from_str::h4a7447755da1e35f (stylesheets.rs:813)
 ==38792==    by 0x1014AF35: Servo_StyleSheet_FromUTF8Bytes (glue.rs:602)
 ==38792==    by 0xED24111: mozilla::ServoStyleSheet::ParseSheet(mozilla::css::Loader*, nsAString const&, nsIURI*, nsIURI*, nsIPrincipal*, unsigned int) (ServoStyleSheet.cpp:111)
 ==38792==    by 0xED1CA00: mozilla::css::Loader::ParseSheet(nsAString const&, mozilla::css::SheetLoadData*, bool&) (Loader.cpp:1783)
 ==38792==    by 0xED1D683: mozilla::css::SheetLoadData::OnStreamComplete(nsIUnicharStreamLoader*, nsISupports*, nsresult, nsAString const&) [clone .part.483] (Loader.cpp:998)
 ==38792==    by 0xD903834: nsUnicharStreamLoader::OnStopRequest(nsIRequest*, nsISupports*, nsresult) (nsUnicharStreamLoader.cpp:97)
 ==38792==    by 0xE214342: nsSyncLoadService::PushSyncStreamToListener(nsIInputStream*, nsIStreamListener*, nsIChannel*) (nsSyncLoadService.cpp:393)
 ==38792==    by 0xED1B08D: mozilla::css::Loader::LoadSheet(mozilla::css::SheetLoadData*, mozilla::css::StyleSheetState, bool) (Loader.cpp:1550)
 ==38792==    by 0xED1BFAB: mozilla::css::Loader::InternalLoadNonDocumentSheet(nsIURI*, bool, mozilla::css::SheetParsingMode, bool, nsIPrincipal*, nsCString const&, RefPtr<mozilla::StyleSheet>*, nsICSSLoaderObserver*, mozilla::CORSMode, mozilla::net::ReferrerPolicy, nsAString const&) (Loader.cpp:2446)
 ==38792==    by 0xED1C0A4: mozilla::css::Loader::LoadSheetSync(nsIURI*, mozilla::css::SheetParsingMode, bool, RefPtr<mozilla::StyleSheet>*) (Loader.cpp:2333)
 ==38792==    by 0xEE120FA: LoadSheet(nsIURI*, mozilla::css::SheetParsingMode, mozilla::StyleBackendType, RefPtr<mozilla::StyleSheet>*) (nsStyleSheetService.cpp:223)
 ==38792==    by 0xEE17106: nsStyleSheetService::LoadAndRegisterSheetInternal(nsIURI*, unsigned int) (nsStyleSheetService.cpp:261)
 ==38792==    by 0xEE17309: nsStyleSheetService::RegisterFromEnumerator(nsICategoryManager*, char const*, nsISimpleEnumerator*, unsigned int) (nsStyleSheetService.cpp:82)
 ==38792==    by 0xEE173FC: nsStyleSheetService::Init() (nsStyleSheetService.cpp:139)
 ==38792==    by 0xEFBC2B5: nsStyleSheetServiceConstructor(nsISupports*, nsID const&, void**) (nsLayoutModule.cpp:487)
 ==38792==    by 0xD866338: nsComponentManagerImpl::CreateInstance(nsID const&, nsISupports*, nsID const&, void**) (nsComponentManager.cpp:1021)
 ==38792==    by 0xD867CB3: nsComponentManagerImpl::GetService(nsID const&, nsID const&, void**) (nsComponentManager.cpp:1264)
 ==38792==    by 0xDE09549: nsJSCID::GetService(JS::Handle<JS::Value>, JSContext*, unsigned char, JS::MutableHandle<JS::Value>) (XPCJSID.cpp:695)
 ==38792==    by 0xD87D6BD: ??? (xptcinvoke_asm_x86_64_unix.S:106)
 ==38792==    by 0xDE294C0: Invoke (XPCWrappedNative.cpp:1996)
 ==38792==    by 0xDE294C0: Call (XPCWrappedNative.cpp:1315)
 ==38792==    by 0xDE294C0: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (XPCWrappedNative.cpp:1282)
 ==38792==    by 0xDE30331: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) (XPCWrappedNativeJSOps.cpp:982)
 ==38792==    by 0xF94B07D: CallJSNative (jscntxtinlines.h:293)
 ==38792==    by 0xF94B07D: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) (Interpreter.cpp:470)
 ==38792==    by 0xF93DF0F: CallFromStack (Interpreter.cpp:521)
 ==38792==    by 0xF93DF0F: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:3028)
 ==38792==    by 0xF94AACF: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:410)
 ==38792==    by 0xF94B186: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) (Interpreter.cpp:488)
 ==38792==    by 0xF93DF0F: CallFromStack (Interpreter.cpp:521)
 ==38792==    by 0xF93DF0F: Interpret(JSContext*, js::RunState&) (Interpreter.cpp:3028)
 ==38792==    by 0xF94AACF: js::RunScript(JSContext*, js::RunState&) (Interpreter.cpp:410)
 ==38792==    by 0xF94B186: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) (Interpreter.cpp:488)
 ==38792==    by 0xF94B6A8: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>) (Interpreter.cpp:534)
 ==38792==    by 0xFCC6C3E: js::Wrapper::call(JSContext*, JS::Handle<JSObject*>, JS::CallArgs const&) const (Wrapper.cpp:166)
 ==38792==  Uninitialised value was created by a stack allocation
 ==38792==    at 0x104B3676: cssparser::tokenizer::next_token::ha62f57e0e08b9ce0 (tokenizer.rs:176)
I don't really understand how this is happening.  I've checked the packaged tarball and can confirm that layout.css.servo.enabled is false in the preferences file.  But we're obviously executing Stylo code...

valgrind is complaining about this line:

https://hg.mozilla.org/try/file/501d42aa00cdbc7a320c74885f81bb155cec3ab4/servo/components/selectors/parser.rs#l1112

specifically the Ok(Token::Delim('|')):

            match input.next_including_whitespace() {
                Ok(Token::Delim('|')) => {

where Token::Delim is defined here:

https://hg.mozilla.org/try/file/501d42aa00cdbc7a320c74885f81bb155cec3ab4/third_party/rust/cssparser/src/tokenizer.rs#l54

Judging from the source code and the assembly, I think valgrind is complaining about one of the two comparisons below:

 3516458:	48 8b 9d b0 fe ff ff 	mov    -0x150(%rbp),%rbx
 351645f:	48 85 db             	test   %rbx,%rbx           <== XXX
 3516462:	75 45                	jne    35164a9 <selectors::parser::parse_qualified_name::he896e0b4bdce0335+0x529>
 3516464:	48 b8 06 00 00 00 7c 	movabs $0x7c00000006,%rax
 351646b:	00 00 00 
 351646e:	48 39 85 b8 fe ff ff 	cmp    %rax,-0x148(%rbp)   <== XXX
 3516475:	75 32                	jne    35164a9 <selectors::parser::parse_qualified_name::he896e0b4bdce0335+0x529>

I believe the `test` is matching on Ok(...), and the `cmp` is matching Token::Delim.  Delim is the 7th variant (so discriminant 0x6) and '|' is 0x7c in ASCII, so the above looks plausible for doing some sort of matching there.

But the only place where we construct Delim('|'), AFAICT, is:

https://hg.mozilla.org/try/file/501d42aa00cdbc7a320c74885f81bb155cec3ab4/third_party/rust/cssparser/src/tokenizer.rs#l532

and looking at the assembly for that location, we initialize the object by storing 0x7c00000006 directly into it, so that can't be the uninitialized memory...

Presumably, then, valgrind is complaining about the first comparison, which is performing matching on the Result<Token, ()>?  I don't understand why rustc is doing a fullword test here, because some of those bytes must overlap with the data of Token.  ISTR that rustc has some fancy optimizations for Result<> of various flavors, so maybe those are kicking in here and rustc knows the uninitialized use here is OK?  Or Valgrind is turning in a false positive here?

Julian, does a false positive look plausible?
Alex, do you know offhand what rustc is doing here?
Flags: needinfo?(jseward)
Flags: needinfo?(acrichton)
AFAIK we don't try to do something like this at least in terms of what we tell LLVM to do (LLVM is entirely responsible for code generation). Historically though (https://github.com/rust-lang/rust/issues/11710) we've seen false positives and the supposed underlying LLVM issue (https://bugs.llvm.org//show_bug.cgi?id=12319) is still open. 

Unfortunately thought I don't know a whole lot more other than that. I've tended to just not trust these warnings from valgrind on Rust code as every time I look into them it turns out to be a false positive. If we can narrow down though what's going on here (e.g. like disabling an LLVM pass) we'd love to accommodate valgrind by default!
Flags: needinfo?(acrichton)
(In reply to Alex Crichton [:acrichto] from comment #2)
> AFAIK we don't try to do something like this at least in terms of what we
> tell LLVM to do (LLVM is entirely responsible for code generation).
> Historically though (https://github.com/rust-lang/rust/issues/11710) we've
> seen false positives and the supposed underlying LLVM issue
> (https://bugs.llvm.org//show_bug.cgi?id=12319) is still open. 

Ah, that's useful to know.

> Unfortunately thought I don't know a whole lot more other than that. I've
> tended to just not trust these warnings from valgrind on Rust code as every
> time I look into them it turns out to be a false positive. If we can narrow
> down though what's going on here (e.g. like disabling an LLVM pass) we'd
> love to accommodate valgrind by default!

Yeah, likewise!  It'd be unfortunate if we had to disable valgrind because of Rust code.  We could get by with disabling Stylo on our valgrind builds for now, but eventually we'd want to turn it back on.  I guess we could try suppressing errors; I don't know how easy that is with rustc sticking its compilation hash into the symbol name, though...

How does one turn off individual LLVM passes?

Julian, can we use suppressions if we're only matching on portions of the symbol name in the stack?  Or some of blanket "if this function appears in the stack, don't bother with it"?
Ah I don't think you can turn off individual LLVM passes through the command line. I just mean moreso in the compiler implementation itself we could turn off passes if we determine them to be "bad". (although I'm not even sure if this is the case, just a complete shot in the dark)
Various misc points:

LLVM optimised code is a more of problem for Valgrind/Memcheck than
gcc generated code is.

Newer Valgrinds are better with LLVM generated code, but still not
perfect.  The upcoming 3.13 has some new fixes for it.

You can select extra-precise and extra-expensive definedness 
tracking using --expensive-definedness-checks=yes.  That might help,
although is not easily done in automation.
Flags: needinfo?(jseward)
(In reply to Nathan Froyd [:froydnj] from comment #1)

> Judging from the source code and the assembly, I think valgrind is
> complaining about one of the two comparisons below:
> 
>  3516458:	48 8b 9d b0 fe ff ff 	mov    -0x150(%rbp),%rbx
>  351645f:	48 85 db             	test   %rbx,%rbx           <== XXX
>  3516462:	75 45                	jne    35164a9
> <selectors::parser::parse_qualified_name::he896e0b4bdce0335+0x529>
>  3516464:	48 b8 06 00 00 00 7c 	movabs $0x7c00000006,%rax
>  351646b:	00 00 00 
>  351646e:	48 39 85 b8 fe ff ff 	cmp    %rax,-0x148(%rbp)   <== XXX
>  3516475:	75 32                	jne    35164a9
> <selectors::parser::parse_qualified_name::he896e0b4bdce0335+0x529>

The instruction it is complaining about has a page offset of 0x119:

==38792== Conditional jump or move depends on uninitialised value(s)
==38792==    at 0x1031D119: selectors::parser::parse_qualified_name::he896e0b4bdce0335 (parser.rs:1112)

and that doesn't match the two you pointed at, with page offsets 45f
and 46e.  So I think those are not the ones in question.

Is it possible you can find the instruction that got mapped to 0x1031D119 ?
There's a good chance that 0x1031D119 is in the first page of 
selectors::parser::parse_qualified_name::he896e0b4bdce0335's code.
Also, you're looking for conditional branch insns, so if you find
something else at xxxx119 then you're looking in the wrong place.

Or just get me the libxul.so somehow and I'll find it.
(In reply to Julian Seward [:jseward] from comment #6)
> The instruction it is complaining about has a page offset of 0x119:
> 
> ==38792== Conditional jump or move depends on uninitialised value(s)
> ==38792==    at 0x1031D119:
> selectors::parser::parse_qualified_name::he896e0b4bdce0335 (parser.rs:1112)
> 
> and that doesn't match the two you pointed at, with page offsets 45f
> and 46e.  So I think those are not the ones in question.

Hm, good point.

> Is it possible you can find the instruction that got mapped to 0x1031D119 ?
> There's a good chance that 0x1031D119 is in the first page of 
> selectors::parser::parse_qualified_name::he896e0b4bdce0335's code.
> Also, you're looking for conditional branch insns, so if you find
> something else at xxxx119 then you're looking in the wrong place.

There's nothing in parse_qualified_name that qualifies.  The disassembly there at a page offset of 0x119 is:

 3516105:	4c 8d a5 70 ff ff ff 	lea    -0x90(%rbp),%r12
 351610c:	48 8d 9d 68 ff ff ff 	lea    -0x98(%rbp),%rbx
 3516113:	66 66 66 66 2e 0f 1f 	data16 data16 data16 nopw %cs:0x0(%rax,%rax,1)
 351611a:	84 00 00 00 00 00 
 3516120:	4c 89 e7             	mov    %r12,%rdi
 3516123:	e8 98 62 19 00       	callq  36ac3c0 <drop::ha1e02b3084fd1afd>

I *guess* we could have jumped into the 13-byte NOP, but that seems unlikely.

> Or just get me the libxul.so somehow and I'll find it.

You can download the Firefox package from:

https://queue.taskcluster.net/v1/task/GZzqrHsGSu2_fqYxTbDKzA/runs/0/artifacts/public/build/target.tar.bz2

and find the libxul.so you need in that.
Flags: needinfo?(jseward)
(In reply to Nathan Froyd [:froydnj] from comment #7)

General confusion.  I pulled the libxul.so from

> https://queue.taskcluster.net/v1/task/GZzqrHsGSu2_fqYxTbDKzA/runs/0/
> artifacts/public/build/target.tar.bz2

as you suggested, but there's nothing at xxx119 that corresponds:

 3516104:       74 da                   je     35160e0 <_ZN9selectors6parser20parse_qualified_name17he896e0b4bdce0335E+0x1a0>
 3516106:       0f 10 85 68 ff ff ff    movups -0x98(%rbp),%xmm0
 351610d:       0f 10 8d 78 ff ff ff    movups -0x88(%rbp),%xmm1
 3516114:       0f 10 55 88             movups -0x78(%rbp),%xmm2
 3516118:       0f 10 5d 98             movups -0x68(%rbp),%xmm3
 351611c:       0f 29 9d e0 fe ff ff    movaps %xmm3,-0x120(%rbp)
 3516123:       0f 29 95 d0 fe ff ff    movaps %xmm2,-0x130(%rbp)
 351612a:       0f 29 8d c0 fe ff ff    movaps %xmm1,-0x140(%rbp)
 3516131:       0f 29 85 b0 fe ff ff    movaps %xmm0,-0x150(%rbp)
 3516138:       4c 8b a5 b0 fe ff ff    mov    -0x150(%rbp),%r12
 351613f:       4d 85 e4                test   %r12,%r12
 3516142:       0f 85 21 01 00 00       jne    3516269 <_ZN9selectors6parser20parse_qualified_name17he896e0b4bdce0335E+0x329>

However, this is also different from the area-around-119 disassembly that
you show in comment 7.  So it seems as if there's potentially three 
different variants of libxul.so in play here -- the one generating the
original xxx119 complaint, the one you disassembled, and the one from
the link in comment 7.

Can you re-check?  It would be good to at least get a diagnosis on 
what's going on here, so as to classify this as FP-not-fixed,
FP-fixed-in-newer-V, or a bug in Stylo or Rustc.

One thing that makes symbol-hunting much easier is to rerun V with
--sym-offsets=yes, so it prints all names in the form "sym+offset".
You can do that easily by editing build/valgrind/mach_commands.py,
around line 105 is a definition for |valgrind_args|.  Add
--sym-offsets=yes to that.
Flags: needinfo?(jseward) → needinfo?(nfroyd)
Priority: -- → P3
(In reply to Julian Seward [:jseward] from comment #8)
> (In reply to Nathan Froyd [:froydnj] from comment #7)
> 
> General confusion.  I pulled the libxul.so from
> 
> > https://queue.taskcluster.net/v1/task/GZzqrHsGSu2_fqYxTbDKzA/runs/0/
> > artifacts/public/build/target.tar.bz2
> 
> as you suggested, but there's nothing at xxx119 that corresponds:
> 
>  3516104:       74 da                   je     35160e0
> <_ZN9selectors6parser20parse_qualified_name17he896e0b4bdce0335E+0x1a0>
>  3516106:       0f 10 85 68 ff ff ff    movups -0x98(%rbp),%xmm0
>  351610d:       0f 10 8d 78 ff ff ff    movups -0x88(%rbp),%xmm1
>  3516114:       0f 10 55 88             movups -0x78(%rbp),%xmm2
>  3516118:       0f 10 5d 98             movups -0x68(%rbp),%xmm3
>  351611c:       0f 29 9d e0 fe ff ff    movaps %xmm3,-0x120(%rbp)
>  3516123:       0f 29 95 d0 fe ff ff    movaps %xmm2,-0x130(%rbp)
>  351612a:       0f 29 8d c0 fe ff ff    movaps %xmm1,-0x140(%rbp)
>  3516131:       0f 29 85 b0 fe ff ff    movaps %xmm0,-0x150(%rbp)
>  3516138:       4c 8b a5 b0 fe ff ff    mov    -0x150(%rbp),%r12
>  351613f:       4d 85 e4                test   %r12,%r12
>  3516142:       0f 85 21 01 00 00       jne    3516269
> <_ZN9selectors6parser20parse_qualified_name17he896e0b4bdce0335E+0x329>
> 
> However, this is also different from the area-around-119 disassembly that
> you show in comment 7.  So it seems as if there's potentially three 
> different variants of libxul.so in play here -- the one generating the
> original xxx119 complaint, the one you disassembled, and the one from
> the link in comment 7.
> 
> Can you re-check?  It would be good to at least get a diagnosis on 
> what's going on here, so as to classify this as FP-not-fixed,
> FP-fixed-in-newer-V, or a bug in Stylo or Rustc.

Sorry, I pulled disassembly and error messages from an earlier version, but pointed you at a later version.  I suppose our builds are not quite deterministic. :)

The error in the version I pointed you at comes from:

at 0x1031D159: selectors::parser::parse_qualified_name::he896e0b4bdce0335 (parser.rs:1112)

and *that* address does match up with the analysis I pointed at earlier, modulo the exact addresses, of course.
Flags: needinfo?(nfroyd)
I can reproduce this locally.

It disappears when run with --expensive-definedness-checks=yes.  This
gives a more accurate analysis of 64 bit EQ/NE, which is based on the
insight that

   the result is defined if two
   corresponding bits can be found, one from each argument, so that
   both bits are defined but are different -- that makes EQ say "No"
   and NE say "Yes".

So it might be that rustc and/or clang has produced a comparison 
which relies on that fact.
Summary: valgrind error when running with stylo → stylo: valgrind error when running with stylo
Depends on: 1382280
On further testing, it appears that --expensive-definedness-checks=yes does
significantly reduce the false positive rate, but it does not drive it down
to zero.  I suspect the problem shown at [1] and [2] (which are the same
thing) play a role.  Unfortunately, the feasibility of fixing them in Valgrind
is currently unknown to me.

[1] https://bugs.llvm.org//show_bug.cgi?id=12319
[2] https://github.com/rust-lang/rust/issues/11710
Julian is working on some patches (to Valgrind itself) which reduce the noise level on LLVM compiled code. wcosta will add those patches to the Valgrind used on automation.  We should be able to suppress that the remaining errors from Stylo code.
Priority: P3 → --
Assignee: nobody → jseward
Priority: -- → P2
The blocking bug 1382280 has landed now, so we should re-evaluate the false
positive situation here now.
(In reply to Julian Seward [:jseward] from comment #13)
> The blocking bug 1382280 has landed now, so we should re-evaluate the false
> positive situation here now.

Sounds like an updated try run of Valgrind with Stylo enabled at build time (but still disabled at run time) is what's needed.

I've started one now:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=383eeccb2cacd3902fa545c8f1d4aa380e834302
At first I thought from commit history that we only needed to build Stylo, but maybe we need it also enabled at runtime since Valgrind is a dynamic analysis tool.  So, I'll add another run with Stylo enabled at runtime as well.

Stylo build on, runtime off: https://treeherder.mozilla.org/#/jobs?repo=try&revision=383eeccb2cacd3902fa545c8f1d4aa380e834302
Stylo build on, runtime on:  https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba4489c9103d38a375425f111f6d7ad91e73f976
Okay, seems like we do indeed need Stylo enabled at runtime.  With that, we get a few errors similar to comment 0:

https://treeherder.mozilla.org/logviewer.html#?job_id=122632022&repo=try&lineNumber=34886

TEST-UNEXPECTED-FAIL | valgrind-test | Conditional jump or move depends on uninitialised value(s) at map_or / selectors::matching::matches_complex_selector_internal / selectors::matching::matches_complex_selector / matches_selector
==43715== Conditional jump or move depends on uninitialised value(s)
==43715==    at 0x115296FE: map_or<selectors::parser::Combinator,bool,closure> (option.rs:421)
==43715==    by 0x115296FE: selectors::matching::matches_complex_selector_internal (matching.rs:530)
==43715==    by 0x115295F9: selectors::matching::matches_complex_selector (matching.rs:501)
==43715==    by 0x115283C0: matches_selector<style::gecko::wrapper::GeckoElement,closure> (matching.rs:397)
==43715==    by 0x115283C0: <style::selector_map::SelectorMap<style::stylist::Rule>>::get_matching_rules (selector_map.rs:216)
==43715==    by 0x115279C1: <style::selector_map::SelectorMap<style::stylist::Rule>>::get_all_matching_rules (selector_map.rs:184)
==43715==    by 0x11526A63: push_applicable_declarations<style::gecko::wrapper::GeckoElement,smallvec::SmallVec<[style::applicable_declarations::ApplicableDeclarationBlock; 16]>,closure> (stylist.rs:1197)
==43715==    by 0x11526A63: <style::style_resolver::StyleResolverForElement<'a, 'ctx, 'le, E>>::match_primary (style_resolver.rs:336)
==43715==    by 0x1151D8F7: <style::style_resolver::StyleResolverForElement<'a, 'ctx, 'le, E>>::resolve_primary_style (style_resolver.rs:96)
==43715==    by 0x11519E57: <style::style_resolver::StyleResolverForElement<'a, 'ctx, 'le, E>>::resolve_style (style_resolver.rs:144)
==43715==    by 0x11650679: resolve_style<style::gecko::wrapper::GeckoElement> (traversal.rs:464)
==43715==    by 0x11650679: Servo_ResolveStyleLazily (glue.rs:2926)
==43715==    by 0xF9DEEB8: mozilla::ServoStyleSet::ResolveStyleLazilyInternal(mozilla::dom::Element*, mozilla::CSSPseudoElementType, nsIAtom*, mozilla::ServoStyleContext const*, mozilla::StyleRuleInclusion, bool) (ServoStyleSet.cpp:1228)
==43715==    by 0xF9DF0B1: mozilla::ServoStyleSet::ResolveStyleFor(mozilla::dom::Element*, mozilla::ServoStyleContext*, mozilla::LazyComputeBehavior) (ServoStyleSet.cpp:197)
==43715==    by 0xFB45526: ResolveStyleFor (StyleSetHandleInlines.h:95)
==43715==    by 0xFB45526: GetPropagatedScrollbarStylesForViewport(nsPresContext*, mozilla::ScrollbarStyles*) (nsPresContext.cpp:1492)
==43715==    by 0xFB45B2A: nsPresContext::UpdateViewportScrollbarStylesOverride() (nsPresContext.cpp:1541)
==43715==    by 0xFB32489: nsCSSFrameConstructor::ConstructDocElementFrame(mozilla::dom::Element*, nsILayoutHistoryState*) (nsCSSFrameConstructor.cpp:2519)
==43715==    by 0xFB344F3: nsCSSFrameConstructor::ContentRangeInserted(nsIContent*, nsIContent*, nsIContent*, nsILayoutHistoryState*, bool, bool, TreeMatchContext*) (nsCSSFrameConstructor.cpp:8110)
==43715==    by 0xFB351C7: ContentRangeInserted (nsCSSFrameConstructor.h:277)
==43715==    by 0xFB351C7: nsCSSFrameConstructor::ContentInserted(nsIContent*, nsIContent*, nsILayoutHistoryState*, bool) (nsCSSFrameConstructor.cpp:8001)
==43715==    by 0xFAD86C7: mozilla::PresShell::Initialize(int, int) (PresShell.cpp:1814)
==43715==    by 0xE77D80C: nsContentSink::StartLayout(bool) (nsContentSink.cpp:1262)
==43715==    by 0xF72EAF5: nsXMLContentSink::HandleStartElement(char16_t const*, char16_t const**, unsigned int, unsigned int, bool) (nsXMLContentSink.cpp:1006)
==43715==    by 0xE296AD1: nsExpatDriver::HandleStartElement(char16_t const*, char16_t const**) (nsExpatDriver.cpp:380)
==43715==    by 0x100FA5C7: doContent (xmlparse.c:2468)
==43715==    by 0x100FB69F: contentProcessor (xmlparse.c:2098)
==43715==    by 0x100F98D2: doProlog (xmlparse.c:4078)
==43715==    by 0x100FBA29: prologProcessor (xmlparse.c:3812)
==43715==    by 0x100FD82A: MOZ_XML_Parse (xmlparse.c:1530)
==43715==    by 0xE296989: nsExpatDriver::ParseBuffer(char16_t const*, unsigned int, bool, unsigned int*) (nsExpatDriver.cpp:1001)
==43715==    by 0xE29C502: nsExpatDriver::ConsumeToken(nsScanner&, bool&) (nsExpatDriver.cpp:1097)
==43715==    by 0xE29A941: nsParser::Tokenize(bool) (nsParser.cpp:1540)
==43715==    by 0xE29B12B: nsParser::ResumeParse(bool, bool, bool) (nsParser.cpp:1057)
==43715==    by 0xE29BBEA: nsParser::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long, unsigned int) (nsParser.cpp:1437)
==43715==    by 0xE61178A: imgRequest::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long, unsigned int) (imgRequest.cpp:1173)
==43715==    by 0xD94E4FD: nsBaseChannel::OnDataAvailable(nsIRequest*, nsISupports*, nsIInputStream*, unsigned long, unsigned int) (nsBaseChannel.cpp:903)
==43715==    by 0xD96C8D4: nsInputStreamPump::OnStateTransfer() (nsInputStreamPump.cpp:617)
==43715==    by 0xD96CD37: nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream*) (nsInputStreamPump.cpp:443)
==43715==    by 0xD8B9673: nsInputStreamReadyEvent::Run() (nsStreamUtils.cpp:97)
==43715==    by 0xD8ED947: nsThread::ProcessNextEvent(bool, bool*) [clone .part.142] (nsThread.cpp:1446)
==43715==    by 0xD8EB51C: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:480)
==43715==  Uninitialised value was created by a stack allocation
==43715==    at 0x11528250: <style::selector_map::SelectorMap<style::stylist::Rule>>::get_matching_rules (selector_map.rs:205)

`--expensive-definedness-checks=yes` is already set, since :jseward added it in bug 1382280.

Should we add suppressions for these cases?
Flags: needinfo?(jseward)
(In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #16)

Thanks for the test runs.

> Should we add suppressions for these cases?

Yes.  Could you pls try with the following, added to the end of 
build/valgrind/x86_64-redhat-linux-gnu.sup ?

{
   map_or<selectors::parser::Combinator,bool,closure> #1 (see bug 1365915)
   Memcheck:Cond
   fun:map_or<selectors::parser::Combinator,bool,closure>
   fun:_ZN9selectors8matching33matches_complex_selector_internal*
   fun:_ZN9selectors8matching24matches_complex_selector*
   fun:matches_selector<style::gecko::wrapper::GeckoElement,closure>
}

{
   map_or<selectors::parser::Combinator,bool,closure> #2 (see bug 1365915)
   Memcheck:Cond
   fun:map_or<selectors::parser::Combinator,bool,closure>
   fun:_ZN9selectors8matching33matches_complex_selector_internal*
   fun:_ZN9selectors8matching24matches_complex_selector*
   fun:{{closure}}<closure>
}
Flags: needinfo?(jseward)
Flags: needinfo?(jryans)
(In reply to Julian Seward [:jseward] from comment #17)
> (In reply to J. Ryan Stinnett [:jryans] (use ni?) from comment #16)
> 
> Thanks for the test runs.
> 
> > Should we add suppressions for these cases?
> 
> Yes.  Could you pls try with the following, added to the end of 
> build/valgrind/x86_64-redhat-linux-gnu.sup ?
> 
> {
>    map_or<selectors::parser::Combinator,bool,closure> #1 (see bug 1365915)
>    Memcheck:Cond
>    fun:map_or<selectors::parser::Combinator,bool,closure>
>    fun:_ZN9selectors8matching33matches_complex_selector_internal*
>    fun:_ZN9selectors8matching24matches_complex_selector*
>    fun:matches_selector<style::gecko::wrapper::GeckoElement,closure>
> }
> 
> {
>    map_or<selectors::parser::Combinator,bool,closure> #2 (see bug 1365915)
>    Memcheck:Cond
>    fun:map_or<selectors::parser::Combinator,bool,closure>
>    fun:_ZN9selectors8matching33matches_complex_selector_internal*
>    fun:_ZN9selectors8matching24matches_complex_selector*
>    fun:{{closure}}<closure>
> }

Here's a new run with the suppressions:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d621f1b495db64be57775b7709bc69229b92ba4
Flags: needinfo?(jryans)
Looks like this is sufficient for things to pass!

I'll attach my patches for this.  I filed bug 1390185 to add a separate Valgrind run with Stylo fully enabled at runtime.
Comment on attachment 8897034 [details]
Bug 1365915 - Enable Stylo for Valgrind runs.

https://reviewboard.mozilla.org/r/168310/#review173536

How confident are we that the same sort of errors aren't going to pop up in other parts of the Stylo code?
Attachment #8897034 - Flags: review?(nfroyd) → review+
Attachment #8897035 - Flags: review?(jseward) → review+
(In reply to Nathan Froyd [:froydnj] from comment #22)
> How confident are we that the same sort of errors aren't going to pop up in
> other parts of the Stylo code?

I'm unclear what you mean here.  Do you mean "are similar false positives likely
elsewhere", or "is it possible that these suppressions could hide real errors"?

For the first, there could be other false positives, but I think they'll
be rare.  I did a test run (manual) today with V, these suppressions, and
Stylo, and it ran clean.

For the second, the suppressions are pretty specific and I don't think
we're likely to miss any real errors.
Pushed by jryans@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/b8f416eba875
Enable Stylo for Valgrind runs. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/f7a81e5a7de9
Add Valgrind suppressions for Stylo. r=jseward
https://hg.mozilla.org/mozilla-central/rev/b8f416eba875
https://hg.mozilla.org/mozilla-central/rev/f7a81e5a7de9
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.