Closed Bug 1439440 Opened 7 years ago Closed 3 years ago

Crash in style::rule_tree::RuleTree::compute_rule_node

Categories

(Core :: CSS Parsing and Computation, defect, P2)

57 Branch
defect

Tracking


RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- fix-optional

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression, Whiteboard: qa-not-actionable)

Crash Data

This bug was filed from the Socorro interface and is report bp-d0f82fbb-84d6-40c0-84de-2bb780180124.

=============================================================
Top 10 frames of crashing thread:

0 xul.dll style::rule_tree::RuleTree::compute_rule_node servo/components/style/rule_tree/mod.rs:283
1 xul.dll style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::match_primary<style::gecko::wrapper::GeckoElement> servo/components/style/style_resolver.rs:446
2 xul.dll style::style_resolver::StyleResolverForElement<style::gecko::wrapper::GeckoElement>::resolve_style<style::gecko::wrapper::GeckoElement> servo/components/style/style_resolver.rs:224
3 xul.dll style::traversal::compute_style<style::gecko::wrapper::GeckoElement> servo/components/style/traversal.rs:607
4 xul.dll rayon_core::job::{{impl}}::execute<closure> third_party/rust/rayon-core/src/job.rs:139
5 xul.dll rayon_core::registry::WorkerThread::wait_until<rayon_core::latch::CountLatch> third_party/rust/rayon-core/src/registry.rs:433
6 xul.dll std::sys_common::backtrace::__rust_begin_short_backtrace<closure, > src/libstd/sys_common/backtrace.rs:131
7 xul.dll alloc::boxed::{{impl}}::call_box<, closure> src/liballoc/boxed.rs:725
8 xul.dll std::sys::imp::thread::{{impl}}::new::thread_start src/libstd/sys/windows/thread.rs:54
9 kernel32.dll BaseThreadInitThunk
=============================================================

Content crashes with this signature started showing up in the 57 release and continue through recent nightly builds. They come from Windows builds, over 90% of them from 32-bit Firefox. A smaller subset of reports is annotated with MOZ_CRASH Reason 'Locked::read_with called with a guard from an unrelated SharedRwLock'.
That must be some sort of memory corruption, because we only use a single SharedRwLock.
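For context, the MOZ_CRASH above fires when a guard is handed to data that belongs to a different lock. A minimal sketch of that invariant, not Servo's actual implementation (all names and fields here are illustrative):

```rust
// Sketch: each read guard records the identity of the lock it came from,
// and every read asserts that the guard and the protected data agree.
struct SharedRwLock {
    id: usize, // stands in for the lock's identity
}

struct ReadGuard {
    lock_id: usize,
}

impl SharedRwLock {
    fn read(&self) -> ReadGuard {
        ReadGuard { lock_id: self.id }
    }
}

struct Locked<T> {
    owner_id: usize,
    value: T,
}

impl<T> Locked<T> {
    fn read_with<'a>(&'a self, guard: &'a ReadGuard) -> &'a T {
        // With only one SharedRwLock in the whole process, this assertion
        // can only fire if the memory holding either id was corrupted.
        assert_eq!(
            self.owner_id, guard.lock_id,
            "Locked::read_with called with a guard from an unrelated SharedRwLock"
        );
        &self.value
    }
}
```

Under this model, a single lock means the ids always match, which is why the crash points at corruption rather than a logic error.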
[ Triage 2017/02/20: P2 ] P2 bugs may become P1's after further analysis. Please prioritize diagnosis and repair.
Priority: -- → P2
Pretty low-frequency, but still happening on 60.
I looked at the stats again, and for the null derefs the number of AMD CPUs seems surprisingly high... Maybe a CPU bug related to memory synchronization? This code runs in parallel many times, and it's well audited, so crashing with a null deref should be impossible.
Crash Signature: [@ style::rule_tree::RuleTree::compute_rule_node] → [@ style::rule_tree::RuleTree::compute_rule_node] [@ static struct style::rule_tree::StrongRuleNode style::rule_tree::RuleTree::compute_rule_node]
Hmmm. The other possibility is that, because the cache implementations differ between AMD and Intel, if a memory barrier or lock is missing somewhere, multithreaded code may (be more likely to) end up picking up a wrong/stale value on one thread. I realize this is well-audited; how certain are we that this data is either only accessed on one thread, or, if multiple threads access it, that the memory barriers or locks are guaranteed to be correct? Are there any assertions of this we could use in debug builds (or perhaps in nightly, at a small-enough runtime cost)? I suspect we don't want the overhead of assertions here in release or even beta code, and possibly not in nightly either. Emilio: what do you think?
Flags: needinfo?(emilio)
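To make the "assertions in debug builds" idea concrete: one cheap check of the kind asked about above is to record the first thread that touches a structure and assert that later accesses come from the same thread. A hypothetical sketch (not Servo code; the token scheme is illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Record the first accessing thread's token; later accesses must match.
struct SingleThreadChecker {
    owner: AtomicU64,
}

fn current_thread_token() -> u64 {
    // Illustrative: use the address of a thread-local as a per-thread token;
    // distinct live threads get distinct thread-local addresses.
    thread_local!(static MARKER: u8 = 0);
    MARKER.with(|m| m as *const u8 as u64)
}

impl SingleThreadChecker {
    fn new() -> Self {
        SingleThreadChecker { owner: AtomicU64::new(0) }
    }

    /// Returns true as long as only one thread has ever called this.
    fn check(&self) -> bool {
        let me = current_thread_token();
        match self.owner.compare_exchange(0, me, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => true,           // first access claims ownership
            Err(prev) => prev == me, // later accesses must be the same thread
        }
    }
}
```

A `debug_assert!(checker.check())` built on something like this costs one atomic op per access, which is why it would plausibly be acceptable in debug or nightly but not in release, as the comment above suggests.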
(In reply to Maire Reavy [:mreavy] Plz needinfo? from comment #6)
> I realize this is well-audited; how certain are we
> that this data is either only accessed on one thread, or if multiple threads
> access it that memory barriers or locks are guaranteed to be correct?

I'm pretty sure the memory ordering bits here are correct; otherwise this crash would be all over the place in Fennec (since ARM, unlike x86, does not have acquire-release semantics by default), which is not the case.

> Are there any assertions of this we could use in debug (or perhaps in
> nightly at a small-enough runtime cost)? I suspect we don't want the
> overhead of assertions here in release or even beta code, and possibly not
> in nightly either.

I can't think of anything that would be straightforward and would still keep testing the same thing.
Flags: needinfo?(emilio)
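The ARM-vs-x86 argument above rests on the standard publication pattern: on x86's strong memory model a missing release barrier often goes unnoticed, while ARM's weaker model would surface it quickly. A generic sketch of that pattern (not the rule-tree code itself):

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

// Release/Acquire publication: the Release store on `ready` guarantees the
// earlier payload write is visible to any thread that Acquire-loads `true`.
// On ARM, downgrading these to Relaxed can let the reader see stale data;
// on x86 it usually still "works", hiding the bug.
fn publish_and_read() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (data.clone(), ready.clone());
    let writer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // write the payload
        r.store(true, Ordering::Release); // publish: prior writes made visible
    });

    while !ready.load(Ordering::Acquire) {} // wait for publication
    let v = data.load(Ordering::Relaxed);   // happens-after the Release store
    writer.join().unwrap();
    v
}
```

This is why a missing-barrier bug in the style system would be expected to show up on Fennec (ARM) far more than it actually does.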
(Also, given how hot this code is, I'd think this crash wouldn't be so low-volume.)
Whiteboard: qa-not-actionable

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME