1389009 - stylo: Add fallible append APIs for Vec and SmallVec

It appears the answer is "no": there are no large infallible allocations from Stylo. Here are details of my methodology. I mention this because I'm sure this is easy to get wrong. Feel free to shoot holes in it. My mozconfig and measurement patch are attached. The patch prints all allocations via moz_x* over a certain threshold. I tried with thresholds 1024k(bytes), 128k and 16k. I loaded both the Wikipedia Obama page and the HTML5 spec-as-a-single page, waited for both to load, and scrolled down and back up. I ran with STYLO_THREADS=4 DUMP_STYLE_STATISTICS=2, so as to verify that Stylo actually ran, and in parallel. In all 3 log files there are a bunch of "[PERF]" prefixed lines which confirm that Stylo was used. In the 1024k and 128k threshold cases I see no Rust source lines in any stack, so Stylo cannot be doing infallible allocations over 128k. For the 16k cases, I only see Rust code in stacks in which Rayon registers its worker threads with the Gecko profiler (it looks like) and that allocates ThreadInfo objects of size 32808. So no Stylo allocation there, either.

Bobby Holley (:bholley)

Comment 7

•

8 years ago

Great news! Can you try with the 100x myspace testcase? Download https://www.dropbox.com/s/h51fspacftf1lcq/myspace.tar.bz2?dl=0, and use the "myspace" directory (not myspace-orig), which has the stylesheet from myspace-orig duplicated 100x. That will give us some sense of whether increasing the size of the stylesheet causes any difference. NI bz for his thoughts on the methodology and for suggestions of other pessimal testcases.

Flags: needinfo?(jseward)

Boris Zbarsky [:bzbarsky]

Comment 8

•

8 years ago

> NI bz for his thoughts on the methodology My main question is what allocation functions rust code actually ends up calling. Does it call moz_xmalloc, or does it call malloc and then panic on failure? Nathan, do you know offhand? > suggestions of other pessimal testcases Make sure you have Adblock Plus installed when testing? What sites were tested?

Flags: needinfo?(nfroyd)

Boris Zbarsky [:bzbarsky]

Comment 9

•

8 years ago

> Does it call moz_xmalloc If I understood Nathan correctly on IRC, rust doesn't call moz_xmalloc. It calls malloc (or whatever other system allocator functions it calls), then panics on failure.

Flags: needinfo?(nfroyd)

Julian Seward [:jseward]

Assignee

Comment 10

•

8 years ago

(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #9) > If I understood Nathan correctly on IRC, rust doesn't call moz_xmalloc. It > calls malloc (or whatever other system allocator functions it calls), then > panics on failure. Mrrmh, ok. That invalidates my results, then. I'll try again in the morning. Thanks for the sanity check.

Flags: needinfo?(jseward)

Nicholas Nethercote [inactive]

Comment 11

•

8 years ago

> If I understood Nathan correctly on IRC, rust doesn't call moz_xmalloc. It > calls malloc (or whatever other system allocator functions it calls), then > panics on failure. Makes sense... you'd expect some Stylo allocations in the 16 KiB results. Apologies for suggesting a dud methodology, Julian, and thanks bz/froydnj for identifying this.

Julian Seward [:jseward]

Assignee

Comment 12

•

8 years ago

No problem. This is what I envisaged when I said above "I'm sure this is easy to get wrong." Let me try to identify the calling path for vanilla Rust allocations in Stylo to see if there is some feasible place we can intercept calls to malloc. My fear is that either malloc is called from a (Rust) system library, or the function(s) that call(s) malloc is inlined or is otherwise dispersed, so that there is no single place we can observe the size of requests passed to malloc. Either of those would make it difficult to find out what we want to know.

Bobby Holley (:bholley)

Comment 13

•

8 years ago

I don't understand. Why can't we just intercept calls to malloc like we did for xmalloc, and then filter by stacks emanating from rust code?

Julian Seward [:jseward]

Assignee

Comment 14

•

8 years ago

In principle that's a good idea. In practice I found it difficult to identify the source code that actually generates |malloc| in |firefox-bin|, due to the complexity of the intercepting and routing machinery in memory/build/ and memory/mozalloc/. After some digging I believe that it is created by an instantation of the macro GENERIC_MALLOC_DECL_HELPER at memory/build/replace_malloc.c:159. AFAICT there are no non-macro-generated bits of source code that constitute |malloc|, |calloc|, |realloc|, etc. I see from profiling that |malloc| always just calls on directly to |je_malloc|, so I decided to put intercepts in that group of functions instead. Even that's confusing; what's called |malloc_impl| at mozjemalloc.cpp:4506 seems to turn up as |je_malloc| in the object code. I tested this with a 1MB threshold, loading: - the Wikipedia Obama page - the HTML5 spec-as-a-single-page page - Bobby's 100x myspace testcase per comment 7 and I had Adblock Plus installed. The results are attached. Note, these are simply, all allocation requests of 1MB and over, with no obvious way to distinguish fallible from infallible requests. There are quite a lot of stacks with involving .rs source files.

Julian Seward [:jseward]

Assignee

Comment 15

•

8 years ago

Attached file infallible allocs of 1024KB and over, take 2 — Details

Attachment #8895779 - Attachment is obsolete: true

Attachment #8895780 - Attachment is obsolete: true

Attachment #8895781 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 16

•

8 years ago

Attached patch patch used for measurement, take 2 — Details — Splinter Review

Attachment #8895778 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 17

•

8 years ago

Attached file infallible allocs of 1024KB and over, take 2, with C++-only stacks removed — Details

The same log file as for comment 15, but restricted to stacks involving Rust sources, for easier analysis. It looks like there might be some areas of concern.

Julian Seward [:jseward]

Assignee

Comment 18

•

8 years ago

Fishing around in the logfile from comment 17, looking only at stacks involving path component /servo/, I see (at least) the calls below. So it looks like there is some work to do, assuming these assume infallibility. 1048576 bytes selectors::parser::parse_one_simple_selector -> into<&str,style::gecko_string_cache::Atom> -> from -> Gecko_Atomize (C++) -> (more C++ frames) -> je_calloc 1966080 bytes entry<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::invalidation::element::invalidation_map::Dependency>,core::hash::BuildHasherDefault<style::selector_map::PrecomputedHasher>> -> <std::collections::hash::map::HashMap<K, V, S>>::resize -> new<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::invalidation::element::invalidation_map::Dependency>> -> new_uninitialized<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::invalidation::element::invalidation_map::Dependency>> -> allocate -> UnknownInlinedFun -> UnknownInlinedFun -> je_malloc 1048576 bytes insert<style::stylist::Rule> -> entry<smallvec::SmallVec<[style::stylist::Rule; 1]>> -> <std::collections::hash::map::HashMap<K, V, S>>::entry -> <std::collections::hash::map::HashMap<K, V, S>>::resize -> new<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::stylist::Rule; 1]>> -> new_uninitialized<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::stylist::Rule; 1]>> -> allocate -> UnknownInlinedFun -> UnknownInlinedFun -> je_malloc 1048576 bytes parse_rules -> push<style::stylesheets::CssRule> -> <alloc::raw_vec::RawVec<T>>::double -> reallocate -> UnknownInlinedFun -> UnknownInlinedFun -> je_realloc

Emilio Cobos Álvarez [:emilio]

Comment 19

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #18) > Fishing around in the logfile from comment 17, looking only at stacks > involving path component /servo/, I see (at least) the calls below. So it > looks like there is some work to do, assuming these assume infallibility. > > 1048576 bytes > selectors::parser::parse_one_simple_selector > -> into<&str,style::gecko_string_cache::Atom> > -> from > -> Gecko_Atomize (C++) > -> (more C++ frames) > -> je_calloc So, there's not too much we could do here, I think, this is part of the gecko atom machinery. > 1966080 bytes > entry<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style: > :invalidation::element::invalidation_map::Dependency>,core::hash:: > BuildHasherDefault<style::selector_map::PrecomputedHasher>> > -> <std::collections::hash::map::HashMap<K, V, S>>::resize > -> > new<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style:: > invalidation::element::invalidation_map::Dependency>> > -> > new_uninitialized<style::gecko_string_cache::Atom,style::selector_map:: > SelectorMap<style::invalidation::element::invalidation_map::Dependency>> > -> allocate > -> UnknownInlinedFun > -> UnknownInlinedFun > -> je_malloc > > 1048576 bytes > insert<style::stylist::Rule> > -> entry<smallvec::SmallVec<[style::stylist::Rule; 1]>> > -> <std::collections::hash::map::HashMap<K, V, S>>::entry > -> <std::collections::hash::map::HashMap<K, V, S>>::resize > -> > new<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::stylist::Rule; > 1]>> > -> > new_uninitialized<style::gecko_string_cache::Atom,smallvec::SmallVec<[style:: > stylist::Rule; 1]>> > -> allocate > -> UnknownInlinedFun > -> UnknownInlinedFun > -> je_malloc These two are known-ish, and we could make them fallible (Gecko does that, at the expense of incorrect styling, but that seems acceptable). There is no API to push to a raw vector fallibly right now though (nor insert in a HashMap, fwiw), so it can be hard... > 1048576 bytes > parse_rules > -> push<style::stylesheets::CssRule> > -> <alloc::raw_vec::RawVec<T>>::double > -> reallocate > -> UnknownInlinedFun > -> UnknownInlinedFun > -> je_realloc Not sure we can do too much about this either, this one is parsing stylesheets, which we assume infallible. We could make them fallible I guess, and just lose track of the stylesheets, but I'm not sure it's worth it? (Also, not sure whether Gecko does it).

Emilio Cobos Álvarez [:emilio]

Comment 20

•

8 years ago

So, I did a few analysis of the oom crashes in bug 1388301 comment 5. Note that all of them are win32, and that those are _all_ crashes reported ever for stylo (there's only a single one stylo-related on beta). Seems like those allocations aren't a huge problem, but we should probably consider making the rule cascades not OOM (that's a few of the crashes, though some of the signatures are from bug 1380488, where the culprit ended up not being that). We could also look into what could we do with stylesheet parsing, we could probably improve memory usage there quite a bit, and there are two ooms reported (again, most of them not recently at all, so could be for other culprits).

Nicholas Nethercote [inactive]

Comment 21

•

8 years ago

Note that Nightly users tend to have higher-end machines with more RAM, and so the OOM rate is higher on release. So it's easy to underestimate the magnitude of the OOM situation when you're just looking at Nightly. We regularly convert large infallible allocations to fallible in Firefox based on crash report data.

Emilio Cobos Álvarez [:emilio]

Comment 22

•

8 years ago

(In reply to Nicholas Nethercote [:njn] from comment #21) > Note that Nightly users tend to have higher-end machines with more RAM, and > so the OOM rate is higher on release. That's a fair point. I don't think we can easily make fallible allocations right now on stable Rust without forking Vec and HashMap. Note that it takes even a bit more work than that because we use SmallVec, so we'd need to fork that too (and maybe others? I'm not sure). In any case, depending on how much we care about it, there's active discussion on fallible allocations in Rust[1] (the WR team has been pushing for this). Forking half of libcollections isn't trivial, so I'd prefer to wait until we have proper fallible allocations in Rust's standard library. [1]: https://internals.rust-lang.org/t/notes-interviews-on-fallible-collection-allocations/5619

Simon Sapin (:SimonSapin)

Comment 23

•

8 years ago

(SmallVec is not in the standard library, we maintain it. But it is based on Vec to allocate on stable Rust with the standard library’s allocator.)

Nicholas Nethercote [inactive]

Comment 24

•

8 years ago

> Forking half of libcollections isn't trivial, so I'd prefer to wait until we > have proper fallible allocations in Rust's standard library. I'm sure everyone would prefer that. But that won't happen until after Stylo is released, and in the meantime there's a chance that we'll have high numbers of avoidable OOM crashes and waiting a couple of months to fix it won't be acceptable :(

Bobby Holley (:bholley)

Comment 25

•

8 years ago

So. I think we need to give ourselves the capability to support fallible allocations. We don't necessarily need to go crazy converting all the callers to fallible APIs, but we need to be in a position to respond to OOM crashes that appear on beta when we hit a more diverse population of hardware. We can uplift targeted fixes to make calls fallible if need be, but we don't want to be uplifting core data structure replacements. So we need to start preparing now. Here's my proposal to making this as painless as possible: First, we add a crate to crates.io called fallible_vec, or somesuch. This crate is very small, providing basically just a few simple free functions: fallible_append, fallible_insert, etc. It would look like this: https://play.rust-lang.org/?gist=d11096a7a52ac712d38dbb1d411a9a46&version=stable The embedding would have to provide an implementation of __rust_fallible_realloc in the final linked binary. The advantage of this approach is that it lets us patch fallibility into the existing Vec type, which avoids forking lots of code. SmallVec can just have an optional "fallible" feature, which would cause it to pull in fallible_vec, and use it to expose analogous fallible insertion APIs onto SmallVec. The HashMaps will be more of a pain. We will either need to fork the std hashmap and base it on vec, or find a suitable vec-based replacement on crates.io. But once we have it, we can use fallible_vec to add the fallible APIs. What does everybody think here? Curious for Nathan's opinion (since he did the allocators), and also curious as to whether Matt has cycles to make the crate.

Flags: needinfo?(nfroyd)

Flags: needinfo?(mbrubeck)

Summary: stylo: check whether stylo performs large infallible allocations → stylo: Add fallibility support for large allocations

Jack Moffitt [:jack]

Comment 26

•

8 years ago

I am much happier with this approach that doesn't fork the standard Vec along with targeted application. However, this doesn't seem to help us for HashMap. Changing the implementation of HashMap may also lead to chasing performance issues, not to mention the possibility of introducing memory safety issues. We've had multiple years in which to debug and tune Rust's HashMaps, and that is with a bigger number of eyes on the code and more developers using it. There were only a few Stylo crashes in the nightly population, and a theory that there may be more in bigger populations. How many are solved by just the Vec portion of this vs. Vec + HashMap? How many could Emilio's mem optimization solve? Would one or both of those solutions (ie, not touching HashMap) be enough to address the issue here?

Bobby Holley (:bholley)

Comment 27

•

8 years ago

(As a small additionaly note to comment 25, we'd want to mark the double method as #[inline(never)] to keep the hot path fast). (In reply to Jack Moffitt [:jack] from comment #26) > I am much happier with this approach that doesn't fork the standard Vec > along with targeted application. However, this doesn't seem to help us for > HashMap. Changing the implementation of HashMap may also lead to chasing > performance issues, not to mention the possibility of introducing memory > safety issues. We've had multiple years in which to debug and tune Rust's > HashMaps, and that is with a bigger number of eyes on the code and more > developers using it. > > There were only a few Stylo crashes in the nightly population, and a theory > that there may be more in bigger populations. How many are solved by just > the Vec portion of this vs. Vec + HashMap? How many could Emilio's mem > optimization solve? Would one or both of those solutions (ie, not touching > HashMap) be enough to address the issue here? The impression I get from njn (who's been running Firefox stability stuff for a while) is that large allocation OOMs consistently show up when code moves to a wider channel, and that they need to be addressed when that happens. We might get lucky and have it turn out to be a non-issue, but it seems pretty darn risky to hang the entire stylo-in-57 story on that. That said, I would also love to avoid forking HashMap. Here are some ideas: (1) AFAICT, HashMap insertion does reserve(1), which basically checks capacity, and if capacity isn't there, allocates an entirely separate RawTable and re-inserts everything. The key point here is that there's no realloc, so we can maintain the same characteristics as std::HashMap by simply allocating an additional hashmap (fallibly, if we can figure out how to do that), and draining/re-inserting. It might be more palatable to depend on the struct layout of HashMap and monkeypatch in our initial buffer than to fork it. (2) I wonder if we could add FFI APIs to mozjemalloc to help us out here. Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" This potentially slows down allocations, because it might need to duplicate work. But it might be able to say "you're fine" most of the time, and only have to do that extra work to answer the question precisely in the cases where we're under severe memory pressure. I also don't know how much of the work of malloc is finding the chunk vs other stuff, so I don't have a good sense of how much overhead this would add. glandium, can you comment?

Flags: needinfo?(mh+mozilla)

Manish Goregaokar [:manishearth]

Comment 28

•

8 years ago

(In reply to Jack Moffitt [:jack] from comment #26) > We've had multiple years in which to debug and tune Rust's HashMaps, and that is with a bigger number of eyes on the code and more developers using it. The thing with hashmap performance is that it's highly dependent on workload. Rust's hashmaps are robin-hood and generally perform well for most workloads, but they may not always be the best. So it's not as clear cut; switching hashmap impls may not change perf at all, or even make things faster. I would measure first. The main problem is that std's hashmaps are full of unsafe code. You don't need unsafe to make good hashmaps in Rust, but you do need unsafe to build the particular hashmap design the stdlib has (in particular because it has to deal with complex invariants for vacant/full slots IIRC). There are a bigger number of eyes on std's code, which is why I trust it. I do not feel comfortable with us maintaining a fork with essentially zero eyes. For example, https://crates.io/crates/ordermap is a pure safe crate which we probably can add APIs to (or fork, it's not unsafe, and easier to maintain!). It is generally pretty good, and better over HashMap in some cases (of course, also worse in others). Worth measuring before we use it but I suspect it won't be an issue. There are other hashmap impls out there, though I'm not really aware of which ones are recommended. ------- (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #27) > AFAICT, HashMap insertion does reserve(1), which basically checks capacity, and if capacity isn't there, allocates an entirely separate RawTable and re-inserts everything. The key point here is that there's no realloc, so we can maintain the same characteristics as std::HashMap by simply allocating A caveat is that there's a bunch of quadratic behavior that may happen here if not done right. I think this was an issue in the stdlib at some point and the fix was to reseed the hasher on creating new maps so that iteration order of the old hashtable is effectively random from the perspective of the new hash table. I think within the resize method it does something different to avoid this. If we're doing this, we should be careful. I don't think it's an insurmountable problem, but it may be a problem to be wary of. > Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" I like this idea.

Julian Seward [:jseward]

Assignee

Comment 29

•

8 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #28) > > Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" > I like this idea. It's intrinsically racey, though, and I'm not sure how we could de-race it without taking a big performance hit. Is it OK if it occasionally says "yes (it will succeed)" and later have the allocation fail?

Mike Hommey [:glandium]

Comment 30

•

8 years ago

> (2) I wonder if we could add FFI APIs to mozjemalloc to help us out here. Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" We're only talking about very large allocations here, right? (>= 1MB). Then the only way for the allocator to know is to actually do it, which means such an API would essentially be: bool can_alloc(size_t size) { void *ptr = malloc(size); if (ptr) { free(ptr); } return !!ptr; } which you might as well do yourself. Note that doesn't prevent some other thread to race with you, get another chunk of memory, preventing your own malloc after the check to fail anyway. Really, what you need is an Hashmap that can deal gracefully with a failure to allocate its underlying buffer.

Mike Hommey [:glandium]

Updated

•

8 years ago

Flags: needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Comment 31

•

8 years ago

Note, the API /could/ take some shorcuts that free(malloc()) wouldn't, but: - It would mean keeping that API functional over changes to the allocator (and those aren't hypothetical changes, I'm planning to make changes to those areas), which means maintenability burden on the allocator end. - That wouldn't change anything about the raceability.

Julian Seward [:jseward]

Assignee

Comment 32

•

8 years ago

(In reply to Mike Hommey [:glandium] from comment #30) > Note that doesn't prevent some other thread to race with you, [..] It's not just other threads in the same process that we're racing against. We're also potentially racing against other processes in the system since processes globally compete for memory resources (ram, swap).

Mike Hommey [:glandium]

Comment 33

•

8 years ago

BTW, that's already the second API proposed at the allocator level, with dependencies on the allocator guts, with the maintenability burden that has, to avoid maintaining a custom Hashmap implementation in rust, that deals with fallible allocation and can tell how much memory it allocated. At this point, I'm willing to say the latter would be less a burden than the former. And I'm not saying this because I'd be the maintainer in the former case and not in the latter case: I'd rather maintain such a hashmap implementation if necessary than those 2 extra allocator APIs (meaning, I'm volunteering if necessary).

Bobby Holley (:bholley)

Updated

•

8 years ago

Priority: -- → P1

Nathan Froyd [:froydnj]

Comment 34

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #25) > The embedding would have to provide an implementation of > __rust_fallible_realloc in the final linked binary. > > What does everybody think here? Curious for Nathan's opinion (since he did > the allocators), and also curious as to whether Matt has cycles to make the > crate. I think you wildly overestimate how much I did with the allocator stuff, on the Rust and/or the Gecko side. :) I think the approach sounds reasonable. I have a small preference for not prefixing the embedding-required function with __rust, to avoid name collisions with any future Rust runtime changes, but perhaps the function's interface can be designed to be palatable if Rust ever wanted to have fallible allocation functions in their runtime?

Flags: needinfo?(nfroyd)

Chris Peterson [:cpeterson]

Updated

•

8 years ago

Blocks: 1388301

Aria Beingessner [:Gankra]

Comment 35

•

8 years ago

This won't allow you to avoid forking the stdlib collections, but here's my proposals for the API y'all should mimic: RFC: https://github.com/rust-lang/rfcs/pull/2116 impl: https://github.com/rust-lang/rust/pull/43890 The precise error type is likely to change a bit, but I don't expect you'll ever inspect it anyway.

Julian Seward [:jseward]

Assignee

Comment 36

•

8 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #28) > switching hashmap impls may not change perf at all, or even make things > faster. I would measure first. I pulled OrderMap (https://crates.io/crates/ordermap/0.2.10) and had a look at it. It looks small and self contained and my impression is that it wouldn't be hard to fork and hack on, if we wanted to do that. Helpfully it comes with a set of benchmark tests, so I ran them ('cargo bench'). I didn't look at the tests at all, but from their names, typically xx_hashmap_yy and xx_ordermap_yy, I'd guess that many of them are pairs that compare HashMap with OrderMap for the same workload. In hindsight this sounds like an obvious thing that the OrderMap author(s) would want to do, so I'm not surprised to see it. The results are interesting. OrderMap is often faster than HashMap, and usually at least competitive. Details in attachment below. Given my total non-investigation of what the tests actually do, I'd take this cautiously. But the initial impression is that OrderMap might be worth investigating as a stopgap until such time as HashMap gets the fallible support we need. A couple of concerns: * This only concerns speed. OrderMap would need to be competitive from a space perspective too. * From an initial scan of the sources, OrderMap contains some internal trickery in which it is possibly slightly differently on 32- vs 64-bit targets. So we'd need to check both.

Julian Seward [:jseward]

Assignee

Comment 37

•

8 years ago

Attached file OrderMap vs HashMap perf comparisons, as referred to in comment 36. — Details

Julian Seward [:jseward]

Assignee

Comment 38

•

8 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #28) > > Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" > I like this idea. Considering that there is no feasible way to de-race this [AFAICS] and that it would add further to our mozjemalloc maintenance burden, I don't think this is a plausible approach. At least as it stands. Plus .. how would we implement it in the --disable-jemalloc case?

Manish Goregaokar [:manishearth]

Comment 39

•

8 years ago

Yeah, ordermap uses a scheme similar to Python's dict, and is what Rust uses on the benchmarks game. IIRC it isn't a clear win (highly depends on the kind of load) and there are reasons for keeping the stdlib hashmap as is instead of switching to this.

Emilio Cobos Álvarez [:emilio]

Comment 40

•

8 years ago

Attached file FallibleVec::push. — Details

I talked with Julian today, and probably hacking on ordermap + smallvec using a simple FallibleVec instead of Vec for the storage may not be _that_ hard. here's a prototype implementation for FallibleVec::push, pretty much based on the snippet Bobby posted before. Bobby, does this seem reasonable?

Attachment #8898800 - Flags: feedback?(bobbyholley)

Julian Seward [:jseward]

Assignee

Comment 41

•

8 years ago

So, what I propose to do is make a patch that uses OrderMap instead of HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap (since Emilio tells me these are the main hot ones). And then see what the perf effects are. Then, if they are acceptable, we can see about modifying OrderMap to be infallible. Bobby (+ all!), does that also sound reasonable?

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Comment 42

•

8 years ago

Comment on attachment 8898800 [details] FallibleVec::push. This looks good, though I think we ought to have the core functionality be free functions, separate from the newtype (like I did in the snippet) so that stuff like SmallVec can optionally expose fallible methods without altering the inner types.

Flags: needinfo?(bobbyholley)

Attachment #8898800 - Flags: feedback?(bobbyholley) → feedback+

Bobby Holley (:bholley)

Comment 43

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #41) > So, what I propose to do is make a patch that uses OrderMap instead of > HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap > (since Emilio tells me these are the main hot ones). And then see what > the perf effects are. Then, if they are acceptable, we can see about > modifying OrderMap to be infallible. > > Bobby (+ all!), does that also sound reasonable? Yes, this sounds like a great plan. Please work with emilio to develop the appropriate microbenchmarks to measure the cases we care about. Thanks!

Flags: needinfo?(mbrubeck)

Simon Sapin (:SimonSapin)

Comment 44

•

8 years ago

We’re now talking about a full set of alloc/realloc/dealloc functions, fully separate from the standard library’s allocator, right? With new collection types built on top? That sounds fine. Comment 25 proposed adding __rust_fallible_realloc that we would use with Vec::from_raw_parts. I think that could cause UB in case __rust_fallible_realloc and Vec::drop end up using different allocators. Making sure they don’t is not trivial: in current Rust versions the default allocator used (jemalloc or system’s malloc/HeapAlloc) by the standard library (including Vec::drop) depends on the platform and compilation mode (executable vs dynlib or staticlib). There’s a mechanism to pick one explicitly, but it’s not stable yet: https://github.com/rust-lang/rust/issues/27389

Bobby Holley (:bholley)

Comment 45

•

8 years ago

(In reply to Simon Sapin (:SimonSapin) from comment #44) > We’re now talking about a full set of alloc/realloc/dealloc functions, fully > separate from the standard library’s allocator, right? With new collection > types built on top? That sounds fine. That's not my understanding. My proposal is designed to patch in the realloc bit so that all of the code and data types can otherwise stay the same, to minimize the type divergence and special handling for callers that want fallible append. The proposal for comment 25 does this, for Vec, and we could then add a feature to SmallVec to support fallible append using the same mechanism (using the rest of the code and types unchanged). The only question is what to do for hashmaps. If we succeed in finding a suitable third-party hashmap alternative, we could presumably apply the same trick. > > Comment 25 proposed adding __rust_fallible_realloc that we would use with > Vec::from_raw_parts. I think that could cause UB in case > __rust_fallible_realloc and Vec::drop end up using different allocators. > Making sure they don’t is not trivial: in current Rust versions the default > allocator used (jemalloc or system’s malloc/HeapAlloc) by the standard > library (including Vec::drop) depends on the platform and compilation mode > (executable vs dynlib or staticlib). Sure, consumers would have to be in a situation to ensure that, which Firefox is (all Rust allocations are routed through the same mozjemalloc allocator). If we can keep the complexity and maintenance cost our solution simpler at the expense of not supporting some non-Firefox configurations, that's totally worth it. The only reason we're doing this extra stuff here is to support Firefox's OOM handling requirements, and the standard library will eventually grow a real replacement. > There’s a mechanism to pick one > explicitly, but it’s not stable yet: > https://github.com/rust-lang/rust/issues/27389

Julian Seward [:jseward]

Assignee

Comment 46

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #41) > So, what I propose to do is make a patch that uses OrderMap instead of > HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap I have this working now, with a lot of help from Emilio. I did some initial perf runs on a pretty quiet machine, 10 iterations of the Obama Wikipedia page, and collecting the "traversal_time_ms" values. This is just so as to get some feel for the OrderMap perf effects. It's slightly slower, although the measurements are as usual swamped with noise: 6 thr HM-best 15.0039 OM-best 15.0047 (+ 0.00%) 2 thr HM-best 27.6556 OM-best 28.1321 (+ 1.72%) 1 thr HM-best 47.1426 OM-best 47.8027 (+ 1.40%) I'll see if there's any possibility of hacking OrderMap to be faster. I talked with its author on irc just now and got some useful tips.

Bobby Holley (:bholley)

Comment 47

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #46) > (In reply to Julian Seward [:jseward] from comment #41) > > So, what I propose to do is make a patch that uses OrderMap instead of > > HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap > > I have this working now, with a lot of help from Emilio. > > I did some initial perf runs on a pretty quiet machine, 10 iterations of the > Obama Wikipedia page, and collecting the "traversal_time_ms" values. This > is just so as to get some feel for the OrderMap perf effects. It's slightly > slower, although the measurements are as usual swamped with noise: Is this with manual measurements? It might be worthwhile to try out my iterations patch a la bug 1365692 comment 20. A 1-2% regression in overall styling performance as a result of switching out the hashmaps is a lot, given that styling does a lot of things, most of which are not related to hashmap lookups. If the regression is real, I would expect us to be able to construct microbenchmarks that demonstrate the problem more clearly. Basically, it seems like we may want to construct a gtest benchmark, similar to Servo_StyleSheet_FromUTF8Bytes_Bench. We'd parse a stylesheet and use it to construct a stylist. We would both measure the time spent constructing the stylist (which would measure insertion performance), and then do a bunch of lookups on the various hashtables and measure that (to give us an idea of lookup performance).

Bobby Holley (:bholley)

Comment 48

•

8 years ago

Joe asked me to put together a document outlining what we're doing here and why. Here it is, for reference: https://docs.google.com/document/d/1ERMv87Fir5krSRKOklg-dLdn10c5NtKMdk8UB49AWBM/edit?usp=sharing I just had a conversation with Jack about it. He's much more in favor of patching in fallibility to std::HashMap than switching to OrderMap, since it keeps the data structures more standard and can be removed when the stdlib exposes the appropriate APIs. I'm also happy with that solution, because it removes the agony around performance comparisons with different HashMap implementations. So to recap, here is the plan of record: First, we implement fallibility for Vec and SmallVec per comment 5. Emilio's patch somewhat does this, but per my review comments I think we shouldn't introduce a separate type. We'd also want to use Gankro's API convention from comment 35. Then, we add a fallible insertion free function for HashMap. The wrapper would be inline and forward to insert() if capacity() != len(). If |capacity() == len() && (cfg!(debug_assertions) || capacity() > OOMLargeThreshold/2)), we invoke the cold out-of-line fallible resize routine. This routine does the following: (1) Fallibly allocates a vec with capacity*2. Returns failure if the allocation fails. (2) Creates a new empty HashMap, with a different seed to address comment 28. (3) With unsafe code and static assertions, patches the buffer and capacity into the RawTable inside the new HashMap. (4) Drains the old hashmap and inserts into the new one. Julian, does this all make sense and seem like something you can take care of? Sorry about wasting your time with the OrderMap stuff.

Flags: needinfo?(jseward)

Nicholas Nethercote [inactive]

Comment 49

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #48) > Joe asked me to put together a document outlining what we're doing here and > why. Here it is, for reference: > > https://docs.google.com/document/d/1ERMv87Fir5krSRKOklg- > dLdn10c5NtKMdk8UB49AWBM/edit?usp=sharing Nice summary! > I just had a conversation with Jack about it. He's much more in favor of > patching in fallibility to std::HashMap than switching to OrderMap Relatedly, I have code for bug 1389305 working -- it will add a new function to jemalloc that lets us measure heap allocations even when we only have an interior pointer. This means we will be able to measure HashMap accurately, i.e. sticking with HashMap won't cause problems for memory reporting. > (1) Fallibly allocates a vec with capacity*2. Returns failure if the > allocation fails. > (2) Creates a new empty HashMap, with a different seed to address comment 28. > (3) With unsafe code and static assertions, patches the buffer and capacity > into the RawTable inside the new HashMap. Presumably this requires knowing the internal layout of HashMap? Presumably the static assertions will aim to detect if that layout changes? I'm struggling to imagine what those static assertions would look like.

Bobby Holley (:bholley)

Comment 50

•

8 years ago

(In reply to Nicholas Nethercote [:njn] from comment #49) > Presumably this requires knowing the internal layout of HashMap? Presumably > the static assertions will aim to detect if that layout changes? I'm > struggling to imagine what those static assertions would look like. They could detect size changes (so added/removed members), which is at least something. But we can also just tell the libs team to warn us. I think the chance that it changes before Rust adds the right APIs is small, and the chance that nobody would warn us is smaller still, and the chance that it wouldn't cause obvious crashes in Firefox automation is even smaller. So I think we can live with it.

Julian Seward [:jseward]

Assignee

Comment 51

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #48) > Julian, does this all make sense and seem like something you can take care of? Yes, I'll do that. It seems like a good plan. FWIW, I spent much of yesterday hacking around with OrderMap. It has some extra complexity to do with ensuring that more than 2^32 elements can be inserted, if there's enough memory. This gives some overhead even in the case that there are fewer than 2^32 elements in the map. It might be possible to remove the more-than-2^32-elements support and associated overhead, but the performance effects are uncertain and it would be some work. So sticking with HashMap sounds better.

Flags: needinfo?(jseward)

Julian Seward [:jseward]

Assignee

Comment 52

•

8 years ago

Attached patch bug1389009-Vec-fallible_push-1.diff (obsolete) — Details — Splinter Review

Let's start with Vec::fallible_push, since it is lower hanging fruit -- it's simple to do, and from the stacks in comment 17, it looks like there are some quite large vectors created by Stylo. Here's a first attempt. It is mostly in the style of Bobby's patch in comment 25 but with a lot of help from Emilio. To test it, I used it in the |rules.push| call in |impl Stylesheet, fn parse_rules| (components/style/stylesheets/stylesheet.rs) since that's one place that the profiling in comment 17 pointed at. Loading techcrunch.com, the vector in question is sometimes extended out to 4096 elements. I caused an artificial failure at the 1024 point. This causes techcrunch.com to look (extremely) screwy, but the system holds together -- no exceptions, segfaults or any such. Regarding the patch, there are a number of things I am unsure about and would appreciate clarification: (1) I simply used the system malloc and realloc: +extern "C" { + fn realloc(ptr: *mut u8, bytes: usize) -> *mut u8; + fn malloc(bytes: usize) -> *mut u8; +} and it seems to work OK, but Simon's comment 44 has me worried. Is using the system allocators OK, or is there more trickery that needs to happen here? Also, Bobby's comment 25 proposal seemed to involve a |__rust_fallible_realloc|, which I don't understand. Is it related to Simon's comment? (2) In the patch, function |fallible_double|, I am unsure whether it is correct even if realloc reallocates in-place. I just copied the magic incantation |mem::forget(mem::replace(vec, new_vec));| from Bobby's patch without understanding what it does. (3) Do we want |fallible_push| only to return a bool? Or should it return the to-be-pushed item on failure, so that the caller doesn't simply "lose" it in that case? (I saw something about this in the preceding discussions to do with fallible HashMaps.) (4) Is there a way we can annotate the |if vec.capacity() == vec.len()| condition with a branch-direction hint (here, "unlikely"), like we can in C++-world?

Julian Seward [:jseward]

Assignee

Updated

•

8 years ago

Flags: needinfo?(simon.sapin)

Flags: needinfo?(bobbyholley)

Emilio Cobos Álvarez [:emilio]

Comment 53

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #52) > Created attachment 8900734 [details] [diff] [review] > bug1389009-Vec-fallible_push-1.diff > > Let's start with Vec::fallible_push, since it is lower hanging fruit -- it's > simple to do, and from the stacks in comment 17, it looks like there are > some quite large vectors created by Stylo. > > Here's a first attempt. It is mostly in the style of Bobby's patch in > comment 25 but with a lot of help from Emilio. To test it, I used it in the > |rules.push| call in |impl Stylesheet, fn parse_rules| > (components/style/stylesheets/stylesheet.rs) since that's one place that the > profiling in comment 17 pointed at. > > Loading techcrunch.com, the vector in question is sometimes extended out to > 4096 elements. I caused an artificial failure at the 1024 point. This > causes techcrunch.com to look (extremely) screwy, but the system holds > together -- no exceptions, segfaults or any such. That's great to hear :) > Regarding the patch, there are a number of things I am unsure about and > would appreciate clarification: > > (1) I simply used the system malloc and realloc: > > +extern "C" { > + fn realloc(ptr: *mut u8, bytes: usize) -> *mut u8; > + fn malloc(bytes: usize) -> *mut u8; > +} > > and it seems to work OK, but Simon's comment 44 has me worried. > Is using the system allocators OK, or is there more trickery that > needs to happen here? Yeah, this works fine for us, but not for random rust consumers. > Also, Bobby's comment 25 proposal seemed to involve a > |__rust_fallible_realloc|, which I don't understand. Is it related > to Simon's comment? The spirit was to provide that symbol in central, but I think just using realloc is fine. > (2) In the patch, function |fallible_double|, I am unsure whether it is > correct even if realloc reallocates in-place. I just copied the magic > incantation |mem::forget(mem::replace(vec, new_vec));| from Bobby's > patch without understanding what it does. It's pretty much equivalent to "avoid destroying the old vector, since now it contains an invalid pointer we'll double-free". > (3) Do we want |fallible_push| only to return a bool? Or should it > return the to-be-pushed item on failure, so that the caller doesn't > simply "lose" it in that case? (I saw something about this in the > preceding discussions to do with fallible HashMaps.) It should probably return at least a `Result<(), ()>`, which is equivalent to a bool, but is `#[must_use]`. I don't have a strong preference in `Result<(), ()>`, vs. `Result<(), T>`. I don't think we're going to need the T any time soon, so I think the first is easier. > (4) Is there a way we can annotate the |if vec.capacity() == vec.len()| > condition with a branch-direction hint (here, "unlikely"), like we > can in C++-world? I know there are likely / unlikely intrinsics, but those are unstable: https://doc.rust-lang.org/std/intrinsics/fn.likely.html I _think_ you may get the same effect with a standalone function and #[cold], but I'm not an expert on that.

Simon Sapin (:SimonSapin)

Comment 54

•

8 years ago

I’m extremely worried about any plan that involves making assumptions about private implementation details of standard library types, such as the memory layout or which allocator is used. Yes, static assertions can help turn some cases of UB into compilation errors. Yes, we can ask the Rust team to tell us when these details change. But things would still break when they do change. With such a plan, in addition to a minimum Rust version requirement, any given Firefox version would effectively also have a maximum Rust version. Do we really want to say “only Rust 1.20.x is supported for compiling Firefox 57”? That would defeat the point of using the stable channel. I strongly feel that this is a deal-breaker, and that the only viable way to get fallible allocations before they’re supported in the standard library is to build separate collections types on top of libc::malloc/free (or jemalloc equivalents), fully independent from the standard library collections or allocators. (They can be a copy/fork of the same code, as long as we control what exact version of it is used.)

Flags: needinfo?(simon.sapin)

Manish Goregaokar [:manishearth]

Comment 55

•

8 years ago

Agreed. I consider "guessing" HashMap layout to be the worst of the options here. Especially since we don't pin rust versions at all. I really think the ordermap solution is the way to go; fallible APIs can be easily added to OrderMap, even without a fork (but if we fork it it's easy to maintain since it's 100% safe)

Simon Sapin (:SimonSapin)

Comment 56

•

8 years ago

Specifically, I’m proposing: * Copying into a new crate the entire implementation of std::vec::Vec and std::collections::HashMap, including anything else they need that is private or unstable. (This includes the std::heap::Alloc trait, except that our equivalent of its std::heap::Heap impl could use libc or jemalloc directly and skip std’s mechanism for changing the default allocator.) * Add fallible methods to that copy. * Change SmallVec to be based on this instead of std::vec::Vec Forking such a large amount of code is unfortunate, but I believe this is the “least bad” option we have left. As more relevant APIs (like the Alloc trait) are stabilized in future Rust versions, we can incrementally replace pieces of this crate with imports from std. When std supports fallible operations and there’s nothing left, we can switch everything back to using std::vec::Vec and std::collections::HashMap.

Bobby Holley (:bholley)

Updated

•

8 years ago

Blocks: 1393656

Bobby Holley (:bholley)

Comment 57

•

8 years ago

Let's move the HashMap discussion over to bug 1393656 and focus on Vec and SmallVec in this bug.

Summary: stylo: Add fallibility support for large allocations → stylo: Add fallible append APIs for Vec and SmallVec

Bobby Holley (:bholley)

Comment 58

•

8 years ago

Looks like emilio mostly answered the questions. Let me know if you have others!

Flags: needinfo?(bobbyholley)

Julian Seward [:jseward]

Assignee

Comment 59

•

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #58) > Looks like emilio mostly answered the questions. Let me know if you have > others! I don't feel that we have a clear story on whether it's OK or not to use system malloc and realloc, and that seems to me to be a gating condition for this approach. So: * is it OK to use them? * if not, is there some other way to do the allocation, that would be acceptable?

Flags: needinfo?(simon.sapin)

Flags: needinfo?(bobbyholley)

Simon Sapin (:SimonSapin)

Comment 60

•

8 years ago

It is important that, when deallocating or reallocating, the same allocator is used that previously allocated that particular chunk of memory. The allocator used by Vec can be controlled with #[global_allocator], but that language feature is unstable: https://github.com/rust-lang/rust/issues/27389. The default is unspecified. In my opinion, this means it is not OK to use Vec::from_raw_parts with a pointer that was not allocated by Vec itself (for example with Vec::with_capacity). Unfortunately, Vec does not provide a fallible allocation API in current Rust versions. In my opinion, this means we need to fork Vec in order to control which allocator is used in drop().

Flags: needinfo?(simon.sapin)

Emilio Cobos Álvarez [:emilio]

Comment 61

•

8 years ago

(In reply to Simon Sapin (:SimonSapin) from comment #60) > It is important that, when deallocating or reallocating, the same allocator > is used that previously allocated that particular chunk of memory. > > The allocator used by Vec can be controlled with #[global_allocator], but > that language feature is unstable: > https://github.com/rust-lang/rust/issues/27389. The default is unspecified. > In my opinion, this means it is not OK to use Vec::from_raw_parts with a > pointer that was not allocated by Vec itself (for example with > Vec::with_capacity). Unfortunately, Vec does not provide a fallible > allocation API in current Rust versions. In my opinion, this means we need > to fork Vec in order to control which allocator is used in drop(). Note that we rely on already on the fact that rust static libraries call into the system allocator. So as long as we don't introduce these hacks into crates that are used outside of stylo, I personally think it's fine.

Emilio Cobos Álvarez [:emilio]

Comment 62

•

8 years ago

The default allocator is specified for static and dynamic libraries (which stylo is one): https://doc.rust-lang.org/1.9.0/book/custom-allocators.html#default-allocator: > Dynamic and static libraries, however, will use alloc_system by default. > Here Rust is typically a 'guest' in another application or another world > where it cannot authoritatively decide what allocator is in use. > > As a result it resorts back to the standard APIs (e.g. malloc and free) for > acquiring and releasing memory. Based on ^, I think it's ok to use as long as we don't end up using it on libraries on the general rust ecosystem (that is, as long as we keep it on the subset of rust code that gets into firefox, but not on other crates published on crates.io).

Emilio Cobos Álvarez [:emilio]

Comment 63

•

8 years ago

For reference, this is the code where we rely on it already: https://github.com/servo/servo/blob/5a57a3ef4b80d277059f1dd68fd6396621691707/components/style/properties/gecko.mako.rs#L5481-L5488

Bobby Holley (:bholley)

Comment 64

•

8 years ago

Looks like emilio answered the question here.

Flags: needinfo?(bobbyholley)

Julian Seward [:jseward]

Assignee

Comment 65

•

8 years ago

Attached patch bug1389009-Vec-fallible_push-3.diff (obsolete) — Details — Splinter Review

Implements {Vec,SmallVec}::try_push. Has been tested a bit but the test code is not in the patch.

Julian Seward [:jseward]

Assignee

Updated

•

8 years ago

Attachment #8902183 - Flags: feedback?(emilio)

Emilio Cobos Álvarez [:emilio]

Comment 66

•

8 years ago

Comment on attachment 8902183 [details] [diff] [review] bug1389009-Vec-fallible_push-3.diff Review of attachment 8902183 [details] [diff] [review]: ----------------------------------------------------------------- Looks pretty reasonable to me! ::: servo/components/fallible/lib.rs @@ +43,5 @@ > + > +// Double the capacity of |vec|, or fail to do so due to lack of memory. > +// Returns true on success, false on failure. > +#[inline(never)] > +fn try_double_vec<T>(vec: &mut Vec<T>) -> bool I'd probably make these functions return a result too. @@ +77,5 @@ > + > +impl<T: Array> FallibleVec<T::Item> for SmallVec<T> { > + #[inline] > + fn try_push(&mut self, val: T::Item) -> Result<(),()> > + { nit: this formatting is somewhat odd, feel free to install `rustfmt` using: cargo install rustfmt-nightly And run it through this crate. @@ +79,5 @@ > + #[inline] > + fn try_push(&mut self, val: T::Item) -> Result<(),()> > + { > + if self.capacity() == self.len() { > + if !try_double_small_vec(self) { That way you can just do: try_double_small_vec(self)?; debug_assert!(..); @@ +120,5 @@ > + memcpy(new_ptr as *mut u8, old_ptr as *const u8, > + old_size_bytes); > + } > + } > + } nit: Is there something that guarantees the alignment of the memory is suitable? I think we can rely on this, but perhaps we should assert we don't use more than word-aligned types? I'd check that with Bobby or Manish, that implemented the Arc hackery to be able to store stuff inline in the header and so on.

Attachment #8902183 - Flags: feedback?(emilio) → feedback+

Julian Seward [:jseward]

Assignee

Comment 67

•

8 years ago

(In reply to Emilio Cobos Álvarez [:emilio] from comment #66) > Review of attachment 8902183 [details] [diff] [review]: Thanks for the feedback. The alignment thing is a good point. I can add this (per your irc hinting): // Assert that the alignment required by T doesn't exceed the minimum // alignment (word alignment) we can reasonably assume that // malloc/realloc give us. assert!(mem::align_of::<T>() <= mem::align_of::<*const ()>());

Julian Seward [:jseward]

Assignee

Comment 68

•

8 years ago

Attached patch bug1389009-Vec-fallible_push-4.diff (obsolete) — Details — Splinter Review

Incorporating Emilio's feedback.

Attachment #8900734 - Attachment is obsolete: true

Attachment #8902183 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 69

•

8 years ago

Attached patch bug1389009-Vec-fallible_push-5.diff (obsolete) — Details — Splinter Review

Fix comment to match source code (return types).

Attachment #8902232 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Updated

•

8 years ago

Attachment #8902237 - Flags: feedback?(bobbyholley)

Julian Seward [:jseward]

Assignee

Comment 70

•

8 years ago

(In reply to Julian Seward [:jseward] from comment #67) > (In reply to Emilio Cobos Álvarez [:emilio] from comment #66) > > Review of attachment 8902183 [details] [diff] [review]: > // Assert that the alignment required by T doesn't exceed the minimum > // alignment (word alignment) we can reasonably assume that > // malloc/realloc give us. > assert!(mem::align_of::<T>() <= mem::align_of::<*const ()>()); On contemplation I have doubts about this. On 32 bit ARM, the word size is 4, but minimum alignment is 8, since the ARM doubleword load/store instructions for one require that.

Bobby Holley (:bholley)

Comment 71

•

8 years ago

Comment on attachment 8902237 [details] [diff] [review] bug1389009-Vec-fallible_push-5.diff Review of attachment 8902237 [details] [diff] [review]: ----------------------------------------------------------------- Generally seems good to me. Please get review from Manish on the final patch, and coordinate with him to ensure everything works smoothly between this stuff and the HashMap stuff. For the allocation alignment stuff, it might be worth coordinating with Manish and breaking https://github.com/Manishearth/hashglobe/blob/master/src/alloc.rs out into a separate crate that could be shared by hashglobe and fallible. Also see https://github.com/servo/servo/issues/17320 for an ARM crash that we had with our hand-rolled Arc, and the code we used to fix it. ::: servo/components/fallible/lib.rs @@ +39,5 @@ > +} > + > +// Double the capacity of |vec|, or fail to do so due to lack of memory. > +// Returns Ok(()) on success, Err(()) on failure. > +#[inline(never)] Mark this as #[cold], same for the other double impl. @@ +119,5 @@ > + unsafe { > + new_ptr = malloc(new_size_bytes); > + if !new_ptr.is_null() && old_size_bytes > 0 { > + memcpy(new_ptr as *mut u8, > + old_ptr as *const u8, old_size_bytes); You can use std::ptr::copy_nonoverlapping instead of memcpy, which reduces the C functions you need to link against. ::: servo/components/style/lib.rs @@ +46,5 @@ > #[allow(unused_extern_crates)] extern crate byteorder; > #[cfg(feature = "gecko")] #[macro_use] #[no_link] extern crate cfg_if; > #[macro_use] extern crate cssparser; > extern crate euclid; > +//extern crate fallible; Why is this commented out?

Attachment #8902237 - Flags: feedback?(bobbyholley) → feedback+

Julian Seward [:jseward]

Assignee

Updated

•

8 years ago

Blocks: 1395064

Julian Seward [:jseward]

Assignee

Comment 72

•

8 years ago

Attached patch bug1389009-Vec-fallible_push-6.diff (obsolete) — Details — Splinter Review

Incorporates feedback from comment 71. > ::: servo/components/style/lib.rs > @@ +46,5 @@ > > #[allow(unused_extern_crates)] extern crate byteorder; > > #[cfg(feature = "gecko")] #[macro_use] #[no_link] extern crate cfg_if; > > #[macro_use] extern crate cssparser; > > extern crate euclid; > > +//extern crate fallible; > > Why is this commented out? I should have simply removed that (and have now). As it stands this patch only adds {Vec,SmallVec}::try_push but doesn't add any uses. So rustc warns that the crate is unused and the build fails. > For the allocation alignment stuff [..] I removed the assertion because I believe it is incorrect. Also, AFAICS malloc/realloc will return memory aligned suitable for any basic type, so we're OK in the short term, providing we're not using SIMD types or anything similarly exotic. From talking with MikeH, I agree that the proper fix is to route it through Manish's copy of src/alloc.rs.

Attachment #8902237 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Updated

•

8 years ago

Attachment #8902554 - Flags: review?(manishearth)

Manish Goregaokar [:manishearth]

Comment 73

•

8 years ago

Comment on attachment 8902554 [details] [diff] [review] bug1389009-Vec-fallible_push-6.diff Review of attachment 8902554 [details] [diff] [review]: ----------------------------------------------------------------- ::: servo/components/fallible/lib.rs @@ +30,5 @@ > + #[inline] > + fn try_push(&mut self, val: T) -> Result<(), ()> { > + if self.capacity() == self.len() { > + try_double_vec(self)?; > + debug_assert!(self.capacity() > self.len()); Gankro suggested making the overflow case return Err(()) for hashglobe and I presume that applies here too. @@ +48,5 @@ > + let old_cap = vec.capacity(); > + > + let new_cap = if old_cap == 0 { 4 } else { old_cap * 2 }; > + > + let new_size_bytes = new_cap * mem::size_of::<T>(); the stdlib vec impl (src/liballoc/raw_vec.rs) seems to actually care about alignment (even during deallocation?!). Given that the buffer we create here will be used there too, must we care about it too? @@ +61,5 @@ > + if new_ptr.is_null() { > + return Err(()); > + } > + > + let new_vec; let new_vec = unsafe { ... }; @@ +126,5 @@ > + return Err(()); > + } > + > + let new_vec; > + unsafe { nit: no need for deferred init as above

Attachment #8902554 - Flags: review?(manishearth) → review+

Julian Seward [:jseward]

Assignee

Comment 74

•

8 years ago

(In reply to Manish Goregaokar [:manishearth] from comment #73) Thanks for the review. > ::: servo/components/fallible/lib.rs > @@ +30,5 @@ > > + #[inline] > > + fn try_push(&mut self, val: T) -> Result<(), ()> { > > + if self.capacity() == self.len() { > > + try_double_vec(self)?; > > + debug_assert!(self.capacity() > self.len()); > > Gankro suggested making the overflow case return Err(()) for hashglobe and I > presume that applies here too. Yes. If try_double_vec(self) fails then the ?; causes try_push to also return with failure. Is that what you meant? > the stdlib vec impl (src/liballoc/raw_vec.rs) seems to actually care about > alignment (even during deallocation?!). Where should I look in src/liballoc/raw_vec.rs to see concerns there about deallocation? I looked at all places mentioning "free" and "drop" but didn't see anything obvious.

Flags: needinfo?(manishearth)

Simon Sapin (:SimonSapin)

Comment 75

•

8 years ago

Comment on attachment 8902554 [details] [diff] [review] bug1389009-Vec-fallible_push-6.diff Review of attachment 8902554 [details] [diff] [review]: ----------------------------------------------------------------- ::: servo/components/fallible/lib.rs @@ +46,5 @@ > + let old_ptr = vec.as_mut_ptr(); > + let old_len = vec.len(); > + let old_cap = vec.capacity(); > + > + let new_cap = if old_cap == 0 { 4 } else { old_cap * 2 }; This and four other multiplications should use checked_mul and return an error on integer overflow.

Simon Sapin (:SimonSapin)

Comment 76

•

8 years ago

Comment on attachment 8902554 [details] [diff] [review] bug1389009-Vec-fallible_push-6.diff Review of attachment 8902554 [details] [diff] [review]: ----------------------------------------------------------------- ::: servo/components/fallible/lib.rs @@ +12,5 @@ > +use std::vec::Vec; > + > +extern "C" { > + fn realloc(ptr: *mut u8, bytes: usize) -> *mut u8; > + fn malloc(bytes: usize) -> *mut u8; Should these be HeapAlloc on Windows like in https://github.com/servo/servo/pull/18334 ?

Aria Beingessner [:Gankra]

Comment 77

•

8 years ago

(In reply to Simon Sapin (:SimonSapin) from comment #75) > Comment on attachment 8902554 [details] [diff] [review] > bug1389009-Vec-fallible_push-6.diff > > Review of attachment 8902554 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: servo/components/fallible/lib.rs > @@ +46,5 @@ > > + let old_ptr = vec.as_mut_ptr(); > > + let old_len = vec.len(); > > + let old_cap = vec.capacity(); > > + > > + let new_cap = if old_cap == 0 { 4 } else { old_cap * 2 }; > > This and four other multiplications should use checked_mul and return an > error on integer overflow. We don't need to worry about overflow here due to preconditions. However you need to assert those as postconditions, like RawVec does. I suggest reading through RawVec to understand the invariants at play here: https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#48-52 (notably: capacaity can't exceed isize::MAX bytes or else ptr::offset is UB)

Aria Beingessner [:Gankra]

Comment 78

•

8 years ago

Also https://github.com/rust-lang/rust/pull/43890 shows how to impl this allocation code fallibly.

Manish Goregaokar [:manishearth]

Comment 79

•

8 years ago

> also return with failure. Is that what you meant? No, I was referring to the debug_assert for capacity overflow > Should these be HeapAlloc on Windows like in https://github.com/servo/servo/pull/18334 ? Windows still has malloc IIRC? Just not the posix_memalign variants.

Flags: needinfo?(manishearth)

Julian Seward [:jseward]

Assignee

Comment 80

•

8 years ago

Attached patch bug1389009-Vec-fallible_push-7.diff — Details — Splinter Review

Adds checks for integer overflow, and removes some deferred inits.

Attachment #8902554 - Attachment is obsolete: true

Attachment #8904638 - Flags: review?(manishearth)

Manish Goregaokar [:manishearth]

Updated

•

8 years ago

Attachment #8904638 - Flags: review?(manishearth) → review+

Bobby Holley (:bholley)

Comment 81

•

8 years ago

https://hg.mozilla.org/integration/autoland/rev/b9b58cc99f2f

Status: NEW → RESOLVED

Closed: 8 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

status-firefox57: --- → fixed

Target Milestone: --- → mozilla57

Liz Henry (:lizzard)

Comment 82

•

8 years ago

Can we (do we need to) uplift this to 56 on m-r, to help with the fix for bug 1389470?

status-firefox56: --- → ?

Flags: needinfo?(jseward)

Flags: needinfo?(ayang)

Julian Seward [:jseward]

Assignee

Comment 83

•

8 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #82) Looking at what was pushed in bug 1389470 comment 50, I think the relevant bits from this patch have been copied into m-c, which (presumably) fixes bug 1389470 on m-c. But those fixes will have to get uplifted to m-r if you want bug 1389470 to be fixed on m-r. To clarify: the patches in this bug only support fallible allocation for Stylo. They can't be used as-is for bug 1389470. So uplifting them to 56 won't help bug 1389470. This also means that Ryan's assertion in bug 1389470 comment 53 doesn't hold. Sorry this is confusing. I *think* I have this right, but a second opinion would be good. ayang?

Flags: needinfo?(jseward) → needinfo?(lhenry)

Simon Sapin (:SimonSapin)

Comment 84

•

8 years ago

Stylo is not enabled in 56 release (and possibly not even built?), so I think this bug is not relevant. (It is enabled for some percentage of users of 56 beta.)

Bobby Holley (:bholley)

Comment 85

•

8 years ago

Yeah, the issue is that the folks in bug 1389009 reused the fallible vec stuff to fix an OOM in the mp4 parser, which is shipping in release. But AFAICT they just copy-pasted it, so there should be no reason to uplift this.

Liz Henry (:lizzard)

Comment 86

•

8 years ago

Thanks for the clarification!

Flags: needinfo?(lhenry)

Alfredo Yang (:alfredo)

Updated

•

8 years ago

Flags: needinfo?(ayang)

Liz Henry (:lizzard)

Updated

•

8 years ago

status-firefox56: ? → unaffected

mozconfig used for measurement 8 years ago Julian Seward [:jseward] 466 bytes, text/plain		Details
patch used for measurement 8 years ago Julian Seward [:jseward] 4.12 KB, patch		Details \| Diff \| Splinter Review
infallible allocs of 1024KB and over 8 years ago Julian Seward [:jseward] 38.17 KB, text/plain		Details
infallible allocs of 128KB and over 8 years ago Julian Seward [:jseward] 435.53 KB, text/plain		Details
infallible allocs of 16KB and over (.txt.bz2) 8 years ago Julian Seward [:jseward] 125.21 KB, application/x-bzip		Details
infallible allocs of 1024KB and over, take 2 8 years ago Julian Seward [:jseward] 599.90 KB, text/plain		Details
patch used for measurement, take 2 8 years ago Julian Seward [:jseward] 2.58 KB, patch		Details \| Diff \| Splinter Review
infallible allocs of 1024KB and over, take 2, with C++-only stacks removed 8 years ago Julian Seward [:jseward] 34.44 KB, text/plain		Details
OrderMap vs HashMap perf comparisons, as referred to in comment 36. 8 years ago Julian Seward [:jseward] 3.99 KB, text/plain		Details
FallibleVec::push. 8 years ago Emilio Cobos Álvarez [:emilio] 858 bytes, text/plain	bholley : feedback+	Details
bug1389009-Vec-fallible_push-1.diff 8 years ago Julian Seward [:jseward] 11.38 KB, patch		Details \| Diff \| Splinter Review
bug1389009-Vec-fallible_push-3.diff 8 years ago Julian Seward [:jseward] 6.17 KB, patch	emilio : feedback+	Details \| Diff \| Splinter Review
bug1389009-Vec-fallible_push-4.diff 8 years ago Julian Seward [:jseward] 6.32 KB, patch		Details \| Diff \| Splinter Review
bug1389009-Vec-fallible_push-5.diff 8 years ago Julian Seward [:jseward] 6.32 KB, patch	bholley : feedback+	Details \| Diff \| Splinter Review
bug1389009-Vec-fallible_push-6.diff 8 years ago Julian Seward [:jseward] 5.29 KB, patch	manishearth : review+	Details \| Diff \| Splinter Review
bug1389009-Vec-fallible_push-7.diff 8 years ago Julian Seward [:jseward] 5.55 KB, patch	manishearth : review+	Details \| Diff \| Splinter Review