Closed Bug 1404791 Opened 7 years ago Closed 3 years ago

stylo: Crash in geckoservo::glue::Servo_ComputedValues_EqualCustomProperties

Categories

(Core :: CSS Parsing and Computation, defect, P3)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix

People

(Reporter: cpeterson, Unassigned)

References

(Depends on 1 open bug)

Details

(5 keywords, Whiteboard: [sec-triage-backlog])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-9c36234d-3cbb-46d9-a932-316c90171002.
=============================================================

135 crash reports from 62 installations. Among Stylo-specific crash signatures, this is Stylo top crash #4.
Summary: stylo: Crash in memcmp | geckoservo::glue::Servo_ComputedValues_EqualCustomProperties → stylo: Crash in geckoservo::glue::Servo_ComputedValues_EqualCustomProperties
All URLs from crash reports are YouTube videos.
So, we're crashing when performing deep-equality on the CustomPropertiesMap hashtables.

dmajor and I dug into some of the minidumps. We're crashing when trying to extract nsIAtom::mHash from garbage memory (0xe5d5..., 0x0, or similar). This happens in HashMap::eq(), which looks like this:

self.iter().all(|(key, value)| other.get(key).map_or(false, |v| *value == *v))

So we iterate the first map, which gives us a bunch of key/value pairs, where the nsIAtom* is the key. And at least one of those pointers is garbage. This strongly suggests that the storage of at least one (k,v) pair returned from iterating the first map is uninitialized memory. This makes sense, since upon table allocation, only the hash storage is zeroed, and the (k,v) storage is left uninitialized as an optimization.

This also suggests that the crashes in bug 1385925 are indeed due to an std::HashMap bug, rather than external corruption. This also explains why bug 1403397 hasn't found any strong evidence of external memory corruption.

I could further confirm this theory by poisoning the rest of the table buffer with a unique value. It's a bit tricky that these particular crashes only happen on 57 beta (where we don't want to run diagnostic code if we can avoid it), but we may be able to see it on the other HashMap crashes we still have on Nightly.

In the mean time, Gankro is doing some HashMap auditing/fuzzing, looking for ways this might happen. Marking this as s-s, since a memory hazard in std::HashMap would be a big deal.
Group: core-security
See Also: → 1385925
Assignee: nobody → bobbyholley
So I captured a 1.5G log of all the custom property insertions (and associated precomputed hashes) on youtube: https://www.dropbox.com/s/15wbkopvj3893tw/youtube%20custom%20property%20log.bz2?dl=0

The plan was for Gankro to feed that into a fuzzer. But I just realized that, up until 4 days ago [1], the custom properties map used the default hasher, with randomized state. So we won't be able to reliably reproduce the hashes. That said, we can at least try with the default hasher on those atoms, by just feeding the precomputed hash back into a RandomState hasher [2]. The crashes are consistent enough across installations that they can't be entirely dependent on a single random seed.

Interestingly, these crashes are _almost_ exclusively on beta (with one exception on 58a and one exception on 57a) over the past 14 days. On the other hand, the switch to PrecomputedHash on nightly only happened 4 days ago.

Still, I think it's worth getting [1] uplifted (which we were going to do anyway in bug 1385971) to see if that has any impact on the crashes.

NI Manish to request uplift on [1] (and the rest of bug bug 1385971), and NI Gankro to try fuzzing with DefaultHasher re-hashing the hashvalues in the log.

[1] https://github.com/servo/servo/pull/18679
[2] http://searchfox.org/mozilla-central/rev/e62604a97286f49993eb29c493672c17c7398da9/servo/components/style/gecko_string_cache/mod.rs#283
Flags: needinfo?(manishearth)
Flags: needinfo?(a.beingessner)
Oh! I didn't realize these were going through the random hasher. Will change.

I'm new to reading aggregate crash logs, do we have any sense on if this is in any way platform-specific? I'm wondering if this might be hashmap getting miscompiled. Also: y'all were seeing this long before you forked HashMap, right?
Flags: needinfo?(a.beingessner)
Attached file insertions.txt
This is the result of feeding the log through `awk '!a[$0]++'`, which the internet tells me deduplicates lines.
(In reply to Alexis Beingessner [:Gankro] from comment #4)
> Oh! I didn't realize these were going through the random hasher. Will change.
> 
> I'm new to reading aggregate crash logs, do we have any sense on if this is
> in any way platform-specific?

They're almost all on windows (there is exactly one crash on mac). But that's not at all unusual, given how skewed our population is.

> I'm wondering if this might be hashmap getting miscompiled.

It's possible, but IMO unlikely. If there are any theories about specific ways it might be getting miscompiled it's straightforward to verify them.

> Also: y'all were seeing this long before you forked HashMap,
> right?

Correct.
If we stop seeing this crash after the elimination of RandomState I wonder if it's because SipHash13 has some bad initialization values (that get hit when the random state starts at a particular value) or something. Though that shouldn't cause unsafety since Hash and Eq are safe APIs (and thus should be totally safe to mis-write) and collisions should not cause anything other than correctness problems. (Though that's worth checking, do we ever assume that hashes do not collide?)
Flags: needinfo?(manishearth)
HashMap should be perfectly capable of handling hash collisions -- although this is probably an incredibly untested path considering it's ~1/2^63 for the builtin hasher.
Group: core-security → layout-core-security
Depends on: 1405879
Tracking this investigation in bug 1406996.
Priority: P2 → P3
Keywords: regression
status-firefox57=wontfix unless someone thinks this bug should block 57
(In reply to Alexis Beingessner [:Gankro] from comment #8)
> HashMap should be perfectly capable of handling hash collisions -- although
> this is probably an incredibly untested path considering it's ~1/2^63 for
> the builtin hasher.

What is the the hash function used in HashMap? 
Assuming it's not collision resistant (which functions for hash storage usually aren't), an attacker can blow this low probability out of proportion.
Does that change anything?
How can that be even possible?

Gecko_CalcStyleDifference always passes aIgnoreVariables = true:

  https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/layout/style/ServoBindings.cpp#378

Which means that unless it got miscompiled or what not, that function should never be called:

  https://searchfox.org/mozilla-central/rev/14d933246211b02f5be21d2e730a57cf087c6606/layout/style/nsStyleContext.cpp#165
Whiteboard: [sec-triage-backlog]
As part of layout security bug scrub, marking bugs being tracked in the Rust HashMap investigation (bug 1406996) as stalled for now.
Keywords: stalled
Assignee: bobbyholley → nobody

Looks like the crash volume has been at zero here for a while. Calling this WORKSFORME.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: layout-core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: