Closed
Bug 1405879
Opened 7 years ago
Closed 7 years ago
stylo: poison the hashtable buffers
Categories
(Core :: CSS Parsing and Computation, defect, P1)
Core
CSS Parsing and Computation
Tracking
()
RESOLVED
FIXED
mozilla58
People
(Reporter: bholley, Assigned: bholley)
References
(Blocks 1 open bug)
Details
(Keywords: sec-audit, sec-want, Whiteboard: [keep hidden while bug 1406996 is open][adv-main58-][post-critsmash-triage])
Attachments
(3 files)
1.29 KB,
patch
|
manishearth
:
review+
|
Details | Diff | Splinter Review |
2.33 KB,
patch
|
Details | Diff | Splinter Review | |
3.51 KB,
text/plain
|
Details |
Per investigation in bug 1385925, bug 1404791, and bug 1403845, I have a reasonably strong suspicion that there's an uninitialized memory read in rust's std::HashMap. HashMap stores the table as follows: [h,h,h...][(k,v),(k,v),(k,v)...] If a given hash is nonzero, it's assumed that the (k,v) at the same index is valid. As such, only the first portion of the table is zeroed on construction, and the rest is left uninitialized. We're seeing a lot of crashes trying to access keys/values at addresses near 0x0 and 0xe5e5, which suggests that this invariant isn't holding. As both a diagnostic tool and a safety improvement, I want to poison the entire buffer, so that any uninitialized memory has a safe (and recognizable) pattern. This should hurt hashtable construction performance a little but, but given that we already poison memory on free, we're basically just making these allocations as expensive as the corresponding deallocations, which is probably acceptable.
Assignee | ||
Comment 1•7 years ago
|
||
MozReview-Commit-ID: 8uLGtFv6X4P
Attachment #8915371 -
Flags: review?(manishearth)
Updated•7 years ago
|
Attachment #8915371 -
Flags: review?(manishearth) → review+
Assignee | ||
Comment 2•7 years ago
|
||
https://github.com/servo/servo/pull/18751
Comment 3•7 years ago
|
||
Just a note, we use 0xe4 for junk fill in mozjemalloc. It might make sense to align with that.
Assignee | ||
Comment 4•7 years ago
|
||
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #3) > Just a note, we use 0xe4 for junk fill in mozjemalloc. It might make sense > to align with that. I specifically wanted to use something else that wouldn't come from elsewhere, which is why I went with 0xe7 (which, according to 1044077#c4, should have the same properties).
Comment 5•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0) > I have a reasonably strong suspicion that there's an uninitialized memory > read in rust's std::HashMap. I am (somewhat) surprised to hear that, given that I've been running Stylo-ised Fx builds on Valgrind on a fairly regular basis these past few months, and have never seen any such complaints. Maybe it happens on a very obscure path? Or is there some other allocator in between hash table creation and the underlying malloc, that recycles memory in such a way that it does not appear to be uninitialised? If we had a way, in your patch, to call MOZ_MAKE_MEM_{UNDEFINED,NOACCESS} (in mfbt/MemoryChecking.h), or the direct Valgrind equivalents thereof, then we could verify this a bit more directly, at least for local testing. In your investigations, have you seen anything that suggests these crashes could be caused by a data race?
Updated•7 years ago
|
Assignee | ||
Comment 6•7 years ago
|
||
https://hg.mozilla.org/integration/autoland/rev/2cd9b59aef83 (In reply to Julian Seward [:jseward] from comment #5) > (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0) > > I have a reasonably strong suspicion that there's an uninitialized memory > > read in rust's std::HashMap. > > I am (somewhat) surprised to hear that, given that I've been running > Stylo-ised Fx builds on Valgrind on a fairly regular basis these past few > months, and have never seen any such complaints. > > Maybe it happens on a very obscure path? Well, hashtables are probabilistic data structures. My hypothesis is that we're hitting an edge cases in the robin-hood algorithm, which would only arise in a very particular initial state of occupied/unoccupied buckets when the problematic insertion occurs. With that, it seems plausible that the uninitialized reads could be vanishingly rare. See bug 1404791 comment 2 for an analysis of some of the effects we're seeing. > Or is there some other allocator > in between hash table creation and the underlying malloc, that recycles > memory in such a way that it does not appear to be uninitialised? > > If we had a way, in your patch, to call MOZ_MAKE_MEM_{UNDEFINED,NOACCESS} > (in mfbt/MemoryChecking.h), or the direct Valgrind equivalents thereof, then > we could verify this a bit more directly, at least for local testing. Yeah. It's possible that the problem actually occurs more frequently, but that most of the time the contents of the associated (k,v) memory are simply leftovers from a previous (k,v), which would basically work fine for lookups (we'd compare the key, it wouldn't match, and we'd move on). Given that we're using Atoms as keys though, this would result in later (harder-to-detect) refcounting errors and prematures frees when the hashtable was dropped (since we'd free the same atom twice). Anyway, instrumenting hashglobe in local builds could certainly be worthwhile. > In your investigations, have you seen anything that suggests these crashes > could be caused by a data race? I think that is very unlikely. We have several hashmaps being used in entirely different contexts. A data race in either would be pretty unlikely, and in both even less so.
Updated•7 years ago
|
Assignee | ||
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 7•7 years ago
|
||
Need to wait a day or two to verify on crash stats that the poisoning works.
Comment 8•7 years ago
|
||
did this land? Perhaps you forgot to mark it here; automatic marking doesn't work on sec bugs.
Flags: needinfo?(bobbyholley)
Assignee | ||
Comment 9•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #8) > did this land? Perhaps you forgot to mark it here; automatic marking > doesn't work on sec bugs. It landed in https://hg.mozilla.org/integration/autoland/rev/2cd9b59aef83 per comment 6. For stuff that gets synced over by the VCS sync service, we tend to resolve fixed when it hits autoland, since we don't get automatic marking. I will be watching the crash reports to see if this does the right thing, and flag for uplift if so.
Flags: needinfo?(bobbyholley)
Comment 10•7 years ago
|
||
> It landed in https://hg.mozilla.org/integration/autoland/rev/2cd9b59aef83
> per comment 6.
Aha... I missed that since it was just sitting at the head of large comment
Comment 11•7 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #6) > Anyway, instrumenting hashglobe in local builds could certainly be worthwhile. I instrumented components/hashglobe/src/table.rs using the attached patch, which is derived from yours. Note, the use of zutil.c is entirely arbitrary -- I just needed a C source file in which to park the helper function. Running startup, + Obama, + techcrunch.com, + the W3 single-page spec, I found no evidence of any unitialised value uses arising from that annotation. The fprintf verified that it was actually getting used. So I have no evidence in support of your hypothesis, alas.
Comment 12•7 years ago
|
||
Comment 13•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #11) > I have no evidence in support of your hypothesis I did however end up looking at the attached Valgrind report, which is unrelated to your hypothesis. It's something I've seen several times before, but discounted as a false positive. However, on looking at it again, and with some help from Michael Woerister, I think my initial judgement was wrong. It appears to pertain to RawVec deallocation. Do you think it could be relevant?
Updated•7 years ago
|
Flags: needinfo?(bobbyholley)
Updated•7 years ago
|
Target Milestone: --- → mozilla58
Updated•7 years ago
|
Group: core-security → core-security-release
Assignee | ||
Comment 14•7 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #13) > Created attachment 8915978 [details] > A Valgrind complaint relating to RawVec deallocation > > (In reply to Julian Seward [:jseward] from comment #11) > > I have no evidence in support of your hypothesis > > I did however end up looking at the attached Valgrind report, which is > unrelated to your hypothesis. It's something I've seen several times > before, but discounted as a false positive. However, on looking at it > again, and with some help from Michael Woerister, I think my initial > judgement was wrong. > > It appears to pertain to RawVec deallocation. Do you think it could be > relevant? At first glance, this appears to be related to the OS string we create in the one of the env::var calls in global_style_data.rs. It doesn't really seem related to hashmap, aside from the fact that they go through the same allocator. If you have reason to believe there's something fishy there it may be worth investigating, but I don't see an on-face reason why it would be related to the hashmap corruption.
Flags: needinfo?(bobbyholley)
Comment 15•7 years ago
|
||
I am seeing 2 crashes/1 install in [@ hashglobe::diagnostic::DiagnosticHashMap<T>::report_corruption<T>] which seem to point back to this landing.
Comment 16•7 years ago
|
||
(In reply to Marcia Knous [:marcia - use ni] from comment #15) > I am seeing 2 crashes/1 install in [@ > hashglobe::diagnostic::DiagnosticHashMap<T>::report_corruption<T>] which > seem to point back to this landing. Sorry, I neglected to include the link to the crash report: https://crash-stats.mozilla.com/report/index/8d69088a-a876-4b40-beb4-979f40171009
Assignee | ||
Updated•7 years ago
|
Blocks: stylo-hashmap-crashes
Comment 18•7 years ago
|
||
Yes; this is pure diagnostic code that should not make it into a release.
Flags: needinfo?(manishearth)
Updated•7 years ago
|
Comment 19•7 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #18) > Yes; this is pure diagnostic code that should not make it into a release. Ok, so how are we making sure it doesn't get to release? It's not inside whatever-Rust-thing is needed to be nightly-only or nightly-and-early-beta/etc....
Flags: needinfo?(manishearth)
Comment 20•7 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #19) > (In reply to Manish Goregaokar [:manishearth] from comment #18) > > Yes; this is pure diagnostic code that should not make it into a release. > > Ok, so how are we making sure it doesn't get to release? It's not inside > whatever-Rust-thing is needed to be nightly-only or > nightly-and-early-beta/etc.... What we've done before is just backing out the diagnostics once we got enough info.
Comment 21•7 years ago
|
||
> Ok, so how are we making sure it doesn't get to release?
What emilio said, basically. We'll hopefully be backing these out next week.
Flags: needinfo?(manishearth)
Updated•7 years ago
|
Whiteboard: [adv-main57-]
Updated•7 years ago
|
Whiteboard: [adv-main57-] → [adv-main58-]
Updated•7 years ago
|
Flags: qe-verify-
Whiteboard: [adv-main58-] → [adv-main58-][post-critsmash-triage]
Updated•6 years ago
|
Group: core-security-release
Updated•6 years ago
|
Group: core-security-release
Whiteboard: [adv-main58-][post-critsmash-triage] → [keep hidden while bug 1406996 is open][adv-main58-][post-critsmash-triage]
Updated•4 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•