Closed Bug 1389009 Opened 3 years ago Closed 2 years ago

stylo: Add fallible append APIs for Vec and SmallVec

Categories

(Core :: CSS Parsing and Computation, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox56 --- unaffected
firefox57 --- fixed

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(7 files, 9 obsolete files)

466 bytes, text/plain
Details
599.90 KB, text/plain
Details
2.58 KB, patch
Details | Diff | Splinter Review
34.44 KB, text/plain
Details
3.99 KB, text/plain
Details
858 bytes, text/plain
bholley
: feedback+
Details
5.55 KB, patch
manishearth
: review+
Details | Diff | Splinter Review
We need to do this in order to assess the need for using a fallible HashMap
variant (and perhaps fallible versions of other types) in Stylo.
Attached patch patch used for measurement (obsolete) — Splinter Review
Attached file infallible allocs of 1024KB and over (obsolete) —
Attached file infallible allocs of 128KB and over (obsolete) —
It appears the answer is "no": there are no large infallible allocations
from Stylo.

Here are details of my methodology.  I mention this because I'm sure this is
easy to get wrong.  Feel free to shoot holes in it.

My mozconfig and measurement patch are attached.  The patch prints all
allocations via moz_x* over a certain threshold.  I tried with thresholds
1024k(bytes), 128k and 16k.  I loaded both the Wikipedia Obama page and the
HTML5 spec-as-a-single page, waited for both to load, and scrolled down and
back up.

I ran with STYLO_THREADS=4 DUMP_STYLE_STATISTICS=2, so as to verify that
Stylo actually ran, and in parallel.  In all 3 log files there are a bunch
of "[PERF]" prefixed lines which confirm that Stylo was used.

In the 1024k and 128k threshold cases I see no Rust source lines in any
stack, so Stylo cannot be doing infallible allocations over 128k.  For the
16k cases, I only see Rust code in stacks in which Rayon registers its
worker threads with the Gecko profiler (it looks like) and that allocates
ThreadInfo objects of size 32808.  So no Stylo allocation there, either.
Great news!

Can you try with the 100x myspace testcase? Download https://www.dropbox.com/s/h51fspacftf1lcq/myspace.tar.bz2?dl=0, and use the "myspace" directory (not myspace-orig), which has the stylesheet from myspace-orig duplicated 100x. That will give us some sense of whether increasing the size of the stylesheet causes any difference.

NI bz for his thoughts on the methodology and for suggestions of other pessimal testcases.
Flags: needinfo?(jseward)
> NI bz for his thoughts on the methodology

My main question is what allocation functions rust code actually ends up calling.  Does it call moz_xmalloc, or does it call malloc and then panic on failure?  Nathan, do you know offhand?

> suggestions of other pessimal testcases

Make sure you have Adblock Plus installed when testing?

What sites were tested?
Flags: needinfo?(nfroyd)
> Does it call moz_xmalloc

If I understood Nathan correctly on IRC, rust doesn't call moz_xmalloc.  It calls malloc (or whatever other system allocator functions it calls), then panics on failure.
Flags: needinfo?(nfroyd)
(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #9) 
> If I understood Nathan correctly on IRC, rust doesn't call moz_xmalloc.  It
> calls malloc (or whatever other system allocator functions it calls), then
> panics on failure.

Mrrmh, ok.  That invalidates my results, then.  I'll try again in the
morning.  Thanks for the sanity check.
Flags: needinfo?(jseward)
> If I understood Nathan correctly on IRC, rust doesn't call moz_xmalloc.  It
> calls malloc (or whatever other system allocator functions it calls), then
> panics on failure.

Makes sense... you'd expect some Stylo allocations in the 16 KiB results.

Apologies for suggesting a dud methodology, Julian, and thanks bz/froydnj for identifying this.
No problem.  This is what I envisaged when I said above "I'm sure this is
easy to get wrong."

Let me try to identify the calling path for vanilla Rust allocations in
Stylo to see if there is some feasible place we can intercept calls to
malloc.  My fear is that either malloc is called from a (Rust) system
library, or the function(s) that call(s) malloc is inlined or is otherwise
dispersed, so that there is no single place we can observe the size of
requests passed to malloc.  Either of those would make it difficult to find
out what we want to know.
I don't understand. Why can't we just intercept calls to malloc like we did for xmalloc, and then filter by stacks emanating from rust code?
In principle that's a good idea.  In practice I found it difficult to
identify the source code that actually generates |malloc| in |firefox-bin|,
due to the complexity of the intercepting and routing machinery in
memory/build/ and memory/mozalloc/.  After some digging I believe that it is
created by an instantation of the macro GENERIC_MALLOC_DECL_HELPER at
memory/build/replace_malloc.c:159.  AFAICT there are no non-macro-generated
bits of source code that constitute |malloc|, |calloc|, |realloc|, etc.

I see from profiling that |malloc| always just calls on directly to
|je_malloc|, so I decided to put intercepts in that group of functions
instead.  Even that's confusing; what's called |malloc_impl| at
mozjemalloc.cpp:4506 seems to turn up as |je_malloc| in the object code.

I tested this with a 1MB threshold, loading:

- the Wikipedia Obama page
- the HTML5 spec-as-a-single-page page
- Bobby's 100x myspace testcase per comment 7

and I had Adblock Plus installed.

The results are attached.  Note, these are simply, all allocation requests
of 1MB and over, with no obvious way to distinguish fallible from infallible
requests.  There are quite a lot of stacks with involving .rs source files.
Attachment #8895779 - Attachment is obsolete: true
Attachment #8895780 - Attachment is obsolete: true
Attachment #8895781 - Attachment is obsolete: true
Attachment #8895778 - Attachment is obsolete: true
The same log file as for comment 15, but restricted to stacks involving Rust
sources, for easier analysis.  It looks like there might be some areas of
concern.
Fishing around in the logfile from comment 17, looking only at stacks
involving path component /servo/, I see (at least) the calls below.  So it
looks like there is some work to do, assuming these assume infallibility.

1048576 bytes
selectors::parser::parse_one_simple_selector
-> into<&str,style::gecko_string_cache::Atom>
-> from
-> Gecko_Atomize (C++)
-> (more C++ frames)
-> je_calloc

1966080 bytes
entry<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::invalidation::element::invalidation_map::Dependency>,core::hash::BuildHasherDefault<style::selector_map::PrecomputedHasher>>
-> <std::collections::hash::map::HashMap<K, V, S>>::resize
-> new<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::invalidation::element::invalidation_map::Dependency>>
-> new_uninitialized<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::invalidation::element::invalidation_map::Dependency>>
-> allocate
-> UnknownInlinedFun
-> UnknownInlinedFun
-> je_malloc

1048576 bytes
insert<style::stylist::Rule>
-> entry<smallvec::SmallVec<[style::stylist::Rule; 1]>>
-> <std::collections::hash::map::HashMap<K, V, S>>::entry
-> <std::collections::hash::map::HashMap<K, V, S>>::resize
-> new<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::stylist::Rule; 1]>>
-> new_uninitialized<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::stylist::Rule; 1]>>
-> allocate
-> UnknownInlinedFun
-> UnknownInlinedFun
-> je_malloc

1048576 bytes
parse_rules
-> push<style::stylesheets::CssRule>
-> <alloc::raw_vec::RawVec<T>>::double
-> reallocate
-> UnknownInlinedFun
-> UnknownInlinedFun
-> je_realloc
(In reply to Julian Seward [:jseward] from comment #18)
> Fishing around in the logfile from comment 17, looking only at stacks
> involving path component /servo/, I see (at least) the calls below.  So it
> looks like there is some work to do, assuming these assume infallibility.
> 
> 1048576 bytes
> selectors::parser::parse_one_simple_selector
> -> into<&str,style::gecko_string_cache::Atom>
> -> from
> -> Gecko_Atomize (C++)
> -> (more C++ frames)
> -> je_calloc

So, there's not too much we could do here, I think, this is part of the gecko atom machinery.

> 1966080 bytes
> entry<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style:
> :invalidation::element::invalidation_map::Dependency>,core::hash::
> BuildHasherDefault<style::selector_map::PrecomputedHasher>>
> -> <std::collections::hash::map::HashMap<K, V, S>>::resize
> ->
> new<style::gecko_string_cache::Atom,style::selector_map::SelectorMap<style::
> invalidation::element::invalidation_map::Dependency>>
> ->
> new_uninitialized<style::gecko_string_cache::Atom,style::selector_map::
> SelectorMap<style::invalidation::element::invalidation_map::Dependency>>
> -> allocate
> -> UnknownInlinedFun
> -> UnknownInlinedFun
> -> je_malloc
> 
> 1048576 bytes
> insert<style::stylist::Rule>
> -> entry<smallvec::SmallVec<[style::stylist::Rule; 1]>>
> -> <std::collections::hash::map::HashMap<K, V, S>>::entry
> -> <std::collections::hash::map::HashMap<K, V, S>>::resize
> ->
> new<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::stylist::Rule;
> 1]>>
> ->
> new_uninitialized<style::gecko_string_cache::Atom,smallvec::SmallVec<[style::
> stylist::Rule; 1]>>
> -> allocate
> -> UnknownInlinedFun
> -> UnknownInlinedFun
> -> je_malloc

These two are known-ish, and we could make them fallible (Gecko does that, at the expense of incorrect styling, but that seems acceptable). There is no API to push to a raw vector fallibly right now though (nor insert in a HashMap, fwiw), so it can be hard...

> 1048576 bytes
> parse_rules
> -> push<style::stylesheets::CssRule>
> -> <alloc::raw_vec::RawVec<T>>::double
> -> reallocate
> -> UnknownInlinedFun
> -> UnknownInlinedFun
> -> je_realloc

Not sure we can do too much about this either, this one is parsing stylesheets, which we assume infallible. We could make them fallible I guess, and just lose track of the stylesheets, but I'm not sure it's worth it? (Also, not sure whether Gecko does it).
So, I did a few analysis of the oom crashes in bug 1388301 comment 5.

Note that all of them are win32, and that those are _all_ crashes reported ever for stylo (there's only a single one stylo-related on beta).

Seems like those allocations aren't a huge problem, but we should probably consider making the rule cascades not OOM (that's a few of the crashes, though some of the signatures are from bug 1380488, where the culprit ended up not being that).

We could also look into what could we do with stylesheet parsing, we could probably improve memory usage there quite a bit, and there are two ooms reported (again, most of them not recently at all, so could be for other culprits).
Note that Nightly users tend to have higher-end machines with more RAM, and so the OOM rate is higher on release. So it's easy to underestimate the magnitude of the OOM situation when you're just looking at Nightly. We regularly convert large infallible allocations to fallible in Firefox based on crash report data.
(In reply to Nicholas Nethercote [:njn] from comment #21)
> Note that Nightly users tend to have higher-end machines with more RAM, and
> so the OOM rate is higher on release.

That's a fair point.

I don't think we can easily make fallible allocations right now on stable Rust without forking Vec and HashMap. Note that it takes even a bit more work than that because we use SmallVec, so we'd need to fork that too (and maybe others? I'm not sure).

In any case, depending on how much we care about it, there's active discussion on fallible allocations in Rust[1] (the WR team has been pushing for this).

Forking half of libcollections isn't trivial, so I'd prefer to wait until we have proper fallible allocations in Rust's standard library.

[1]: https://internals.rust-lang.org/t/notes-interviews-on-fallible-collection-allocations/5619
(SmallVec is not in the standard library, we maintain it. But it is based on Vec to allocate on stable Rust with the standard library’s allocator.)
> Forking half of libcollections isn't trivial, so I'd prefer to wait until we
> have proper fallible allocations in Rust's standard library.

I'm sure everyone would prefer that. But that won't happen until after Stylo is released, and in the meantime there's a chance that we'll have high numbers of avoidable OOM crashes and waiting a couple of months to fix it won't be acceptable :(
So. I think we need to give ourselves the capability to support fallible allocations. We don't necessarily need to go crazy converting all the callers to fallible APIs, but we need to be in a position to respond to OOM crashes that appear on beta when we hit a more diverse population of hardware. We can uplift targeted fixes to make calls fallible if need be, but we don't want to be uplifting core data structure replacements. So we need to start preparing now.

Here's my proposal to making this as painless as possible:

First, we add a crate to crates.io called fallible_vec, or somesuch. This crate is very small, providing basically just a few simple free functions: fallible_append, fallible_insert, etc. It would look like this: https://play.rust-lang.org/?gist=d11096a7a52ac712d38dbb1d411a9a46&version=stable

The embedding would have to provide an implementation of __rust_fallible_realloc in the final linked binary.

The advantage of this approach is that it lets us patch fallibility into the existing Vec type, which avoids forking lots of code. SmallVec can just have an optional "fallible" feature, which would cause it to pull in fallible_vec, and use it to expose analogous fallible insertion APIs onto SmallVec.

The HashMaps will be more of a pain. We will either need to fork the std hashmap and base it on vec, or find a suitable vec-based replacement on crates.io. But once we have it, we can use fallible_vec to add the fallible APIs.

What does everybody think here? Curious for Nathan's opinion (since he did the allocators), and also curious as to whether Matt has cycles to make the crate.
Flags: needinfo?(nfroyd)
Flags: needinfo?(mbrubeck)
Summary: stylo: check whether stylo performs large infallible allocations → stylo: Add fallibility support for large allocations
I am much happier with this approach that doesn't fork the standard Vec along with targeted application. However, this doesn't seem to help us for HashMap. Changing the implementation of HashMap may also lead to chasing performance issues, not to mention the possibility of introducing memory safety issues. We've had multiple years in which to debug and tune Rust's HashMaps, and that is with a bigger number of eyes on the code and more developers using it.

There were only a few Stylo crashes in the nightly population, and a theory that there may be more in bigger populations. How many are solved by just the Vec portion of this vs. Vec + HashMap? How many could Emilio's mem optimization solve? Would one or both of those solutions (ie, not touching HashMap) be enough to address the issue here?
(As a small additionaly note to comment 25, we'd want to mark the double method as #[inline(never)] to keep the hot path fast).

(In reply to Jack Moffitt [:jack] from comment #26)
> I am much happier with this approach that doesn't fork the standard Vec
> along with targeted application. However, this doesn't seem to help us for
> HashMap. Changing the implementation of HashMap may also lead to chasing
> performance issues, not to mention the possibility of introducing memory
> safety issues. We've had multiple years in which to debug and tune Rust's
> HashMaps, and that is with a bigger number of eyes on the code and more
> developers using it.
> 
> There were only a few Stylo crashes in the nightly population, and a theory
> that there may be more in bigger populations. How many are solved by just
> the Vec portion of this vs. Vec + HashMap? How many could Emilio's mem
> optimization solve? Would one or both of those solutions (ie, not touching
> HashMap) be enough to address the issue here?

The impression I get from njn (who's been running Firefox stability stuff for a while) is that large allocation OOMs consistently show up when code moves to a wider channel, and that they need to be addressed when that happens. We might get lucky and have it turn out to be a non-issue, but it seems pretty darn risky to hang the entire stylo-in-57 story on that.

That said, I would also love to avoid forking HashMap. Here are some ideas:

(1) AFAICT, HashMap insertion does reserve(1), which basically checks capacity, and if capacity isn't there, allocates an entirely separate RawTable and re-inserts everything. The key point here is that there's no realloc, so we can maintain the same characteristics as std::HashMap by simply allocating an additional hashmap (fallibly, if we can figure out how to do that), and draining/re-inserting. It might be more palatable to depend on the struct layout of HashMap and monkeypatch in our initial buffer than to fork it.

(2) I wonder if we could add FFI APIs to mozjemalloc to help us out here. Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" This potentially slows down allocations, because it might need to duplicate work. But it might be able to say "you're fine" most of the time, and only have to do that extra work to answer the question precisely in the cases where we're under severe memory pressure. I also don't know how much of the work of malloc is finding the chunk vs other stuff, so I don't have a good sense of how much overhead this would add. glandium, can you comment?
Flags: needinfo?(mh+mozilla)
(In reply to Jack Moffitt [:jack] from comment #26)
> We've had multiple years in which to debug and tune Rust's HashMaps, and that is with a bigger number of eyes on the code and more developers using it.

The thing with hashmap performance is that it's highly dependent on workload. Rust's hashmaps are robin-hood and generally perform well for most workloads, but they may not always be the best. So it's not as clear cut; switching hashmap impls may not change perf at all, or even make things faster. I would measure first.

The main problem is that std's hashmaps are full of unsafe code. You don't need unsafe to make good hashmaps in Rust, but you do need unsafe to build the particular hashmap design the stdlib has (in particular because it has to deal with complex invariants for vacant/full slots IIRC).

There are a bigger number of eyes on std's code, which is why I trust it. I do not feel comfortable with us maintaining a fork with essentially zero eyes.

For example, https://crates.io/crates/ordermap is a pure safe crate which we probably can add APIs to (or fork, it's not unsafe, and easier to maintain!). It is generally pretty good, and better over HashMap in some cases (of course, also worse in others). Worth measuring before we use it but I suspect it won't be an issue.

There are other hashmap impls out there, though I'm not really aware of which ones are recommended.


-------

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #27)
> AFAICT, HashMap insertion does reserve(1), which basically checks capacity, and if capacity isn't there, allocates an entirely separate RawTable and re-inserts everything. The key point here is that there's no realloc, so we can maintain the same characteristics as std::HashMap by simply allocating 

A caveat is that there's a bunch of quadratic behavior that may happen here if not done right. I think this was an issue in the stdlib at some point and the fix was to reseed the hasher on creating new maps so that iteration order of the old hashtable is effectively random from the perspective of the new hash table.

I think within the resize method it does something different to avoid this.

If we're doing this, we should be careful. I don't think it's an insurmountable problem, but it may be a problem to be wary of.

> Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" 

I like this idea.
(In reply to Manish Goregaokar [:manishearth] from comment #28)
> > Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" 
> I like this idea.

It's intrinsically racey, though, and I'm not sure how we could de-race
it without taking a big performance hit.  Is it OK if it occasionally
says "yes (it will succeed)" and later have the allocation fail?
> (2) I wonder if we could add FFI APIs to mozjemalloc to help us out here. Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?"

We're only talking about very large allocations here, right? (>= 1MB). Then the only way for the allocator to know is to actually do it, which means such an API would essentially be:

bool can_alloc(size_t size) {
  void *ptr = malloc(size);
  if (ptr) {
    free(ptr);
  }
  return !!ptr;
}

which you might as well do yourself. Note that doesn't prevent some other thread to race with you, get another chunk of memory, preventing your own malloc after the check to fail anyway.

Really, what you need is an Hashmap that can deal gracefully with a failure to allocate its underlying buffer.
Flags: needinfo?(mh+mozilla)
Note, the API /could/ take some shorcuts that free(malloc()) wouldn't, but:
- It would mean keeping that API functional over changes to the allocator (and those aren't hypothetical changes, I'm planning to make changes to those areas), which means maintenability burden on the allocator end.
- That wouldn't change anything about the raceability.
(In reply to Mike Hommey [:glandium] from comment #30)
> Note that doesn't prevent some other thread to race with you, [..]

It's not just other threads in the same process that we're racing against.
We're also potentially racing against other processes in the system since
processes globally compete for memory resources (ram, swap).
BTW, that's already the second API proposed at the allocator level, with dependencies on the allocator guts, with the maintenability burden that has, to avoid maintaining a custom Hashmap implementation in rust, that deals with fallible allocation and can tell how much memory it allocated. At this point, I'm willing to say the latter would be less a burden than the former. And I'm not saying this because I'd be the maintainer in the former case and not in the latter case: I'd rather maintain such a hashmap implementation if necessary than those 2 extra allocator APIs (meaning, I'm volunteering if necessary).
Priority: -- → P1
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #25)
> The embedding would have to provide an implementation of
> __rust_fallible_realloc in the final linked binary.
>
> What does everybody think here? Curious for Nathan's opinion (since he did
> the allocators), and also curious as to whether Matt has cycles to make the
> crate.

I think you wildly overestimate how much I did with the allocator stuff, on the Rust and/or the Gecko side. :)

I think the approach sounds reasonable.  I have a small preference for not prefixing the embedding-required function with __rust, to avoid name collisions with any future Rust runtime changes, but perhaps the function's interface can be designed to be palatable if Rust ever wanted to have fallible allocation functions in their runtime?
Flags: needinfo?(nfroyd)
This won't allow you to avoid forking the stdlib collections, but here's my proposals for the API y'all should mimic:

RFC: https://github.com/rust-lang/rfcs/pull/2116
impl: https://github.com/rust-lang/rust/pull/43890

The precise error type is likely to change a bit, but I don't expect you'll ever inspect it anyway.
(In reply to Manish Goregaokar [:manishearth] from comment #28)
> switching hashmap impls may not change perf at all, or even make things
> faster. I would measure first.

I pulled OrderMap (https://crates.io/crates/ordermap/0.2.10) and had a look
at it.  It looks small and self contained and my impression is that it
wouldn't be hard to fork and hack on, if we wanted to do that.

Helpfully it comes with a set of benchmark tests, so I ran them ('cargo
bench').  I didn't look at the tests at all, but from their names, typically
xx_hashmap_yy and xx_ordermap_yy, I'd guess that many of them are pairs that
compare HashMap with OrderMap for the same workload.  In hindsight this
sounds like an obvious thing that the OrderMap author(s) would want to do,
so I'm not surprised to see it.

The results are interesting.  OrderMap is often faster than HashMap, and
usually at least competitive.  Details in attachment below.  Given my total
non-investigation of what the tests actually do, I'd take this cautiously.
But the initial impression is that OrderMap might be worth investigating as
a stopgap until such time as HashMap gets the fallible support we need.

A couple of concerns:

* This only concerns speed.  OrderMap would need to be competitive from a
  space perspective too.

* From an initial scan of the sources, OrderMap contains some internal
  trickery in which it is possibly slightly differently on 32- vs 64-bit
  targets.  So we'd need to check both.
(In reply to Manish Goregaokar [:manishearth] from comment #28)
> > Could we have a function that just says: "if we try to allocate a chunk of size X next, will it succeed?" 
> I like this idea.

Considering that there is no feasible way to de-race this [AFAICS] and
that it would add further to our mozjemalloc maintenance burden, I don't
think this is a plausible approach.  At least as it stands.

Plus .. how would we implement it in the --disable-jemalloc case?
Yeah, ordermap uses a scheme similar to Python's dict, and is what Rust uses on the benchmarks game. IIRC it isn't a clear win (highly depends on the kind of load) and there are reasons for keeping the stdlib hashmap as is instead of switching to this.
Attached file FallibleVec::push.
I talked with Julian today, and probably hacking on ordermap + smallvec using a simple FallibleVec instead of Vec for the storage may not be _that_ hard. here's a prototype implementation for FallibleVec::push, pretty much based on the snippet Bobby posted before.

Bobby, does this seem reasonable?
Attachment #8898800 - Flags: feedback?(bobbyholley)
So, what I propose to do is make a patch that uses OrderMap instead of
HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap
(since Emilio tells me these are the main hot ones).  And then see what
the perf effects are.  Then, if they are acceptable, we can see about
modifying OrderMap to be infallible.

Bobby (+ all!), does that also sound reasonable?
Flags: needinfo?(bobbyholley)
Comment on attachment 8898800 [details]
FallibleVec::push.

This looks good, though I think we ought to have the core functionality be free functions, separate from the newtype (like I did in the snippet) so that stuff like SmallVec can optionally expose fallible methods without altering the inner types.
Flags: needinfo?(bobbyholley)
Attachment #8898800 - Flags: feedback?(bobbyholley) → feedback+
(In reply to Julian Seward [:jseward] from comment #41)
> So, what I propose to do is make a patch that uses OrderMap instead of
> HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap
> (since Emilio tells me these are the main hot ones).  And then see what
> the perf effects are.  Then, if they are acceptable, we can see about
> modifying OrderMap to be infallible.
> 
> Bobby (+ all!), does that also sound reasonable?

Yes, this sounds like a great plan. Please work with emilio to develop the appropriate microbenchmarks to measure the cases we care about. Thanks!
Flags: needinfo?(mbrubeck)
We’re now talking about a full set of alloc/realloc/dealloc functions, fully separate from the standard library’s allocator, right? With new collection types built on top? That sounds fine.

Comment 25 proposed adding __rust_fallible_realloc that we would use with Vec::from_raw_parts. I think that could cause UB in case __rust_fallible_realloc and Vec::drop end up using different allocators. Making sure they don’t is not trivial: in current Rust versions the default allocator used (jemalloc or system’s malloc/HeapAlloc) by the standard library (including Vec::drop) depends on the platform and compilation mode (executable vs dynlib or staticlib). There’s a mechanism to pick one explicitly, but it’s not stable yet: https://github.com/rust-lang/rust/issues/27389
(In reply to Simon Sapin (:SimonSapin) from comment #44)
> We’re now talking about a full set of alloc/realloc/dealloc functions, fully
> separate from the standard library’s allocator, right? With new collection
> types built on top? That sounds fine.

That's not my understanding. My proposal is designed to patch in the realloc bit so that all of the code and data types can otherwise stay the same, to minimize the type divergence and special handling for callers that want fallible append.

The proposal for comment 25 does this, for Vec, and we could then add a feature to SmallVec to support fallible append using the same mechanism (using the rest of the code and types unchanged). The only question is what to do for hashmaps. If we succeed in finding a suitable third-party hashmap alternative, we could presumably apply the same trick.

> 
> Comment 25 proposed adding __rust_fallible_realloc that we would use with
> Vec::from_raw_parts. I think that could cause UB in case
> __rust_fallible_realloc and Vec::drop end up using different allocators.
> Making sure they don’t is not trivial: in current Rust versions the default
> allocator used (jemalloc or system’s malloc/HeapAlloc) by the standard
> library (including Vec::drop) depends on the platform and compilation mode
> (executable vs dynlib or staticlib).

Sure, consumers would have to be in a situation to ensure that, which Firefox is (all Rust allocations are routed through the same mozjemalloc allocator). If we can keep the complexity and maintenance cost our solution simpler at the expense of not supporting some non-Firefox configurations, that's totally worth it. The only reason we're doing this extra stuff here is to support Firefox's OOM handling requirements, and the standard library will eventually grow a real replacement.

> There’s a mechanism to pick one
> explicitly, but it’s not stable yet:
> https://github.com/rust-lang/rust/issues/27389
(In reply to Julian Seward [:jseward] from comment #41)
> So, what I propose to do is make a patch that uses OrderMap instead of
> HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap

I have this working now, with a lot of help from Emilio.

I did some initial perf runs on a pretty quiet machine, 10 iterations of the
Obama Wikipedia page, and collecting the "traversal_time_ms" values.  This
is just so as to get some feel for the OrderMap perf effects.  It's slightly
slower, although the measurements are as usual swamped with noise:

  6 thr    HM-best 15.0039
           OM-best 15.0047 (+ 0.00%)

  2 thr    HM-best 27.6556
           OM-best 28.1321 (+ 1.72%)

  1 thr    HM-best 47.1426
           OM-best 47.8027 (+ 1.40%)

I'll see if there's any possibility of hacking OrderMap to be faster.
I talked with its author on irc just now and got some useful tips.
(In reply to Julian Seward [:jseward] from comment #46)
> (In reply to Julian Seward [:jseward] from comment #41)
> > So, what I propose to do is make a patch that uses OrderMap instead of
> > HashMap for Stylo's PreComputedHashMap and MaybeCaseInsensitiveHashMap
> 
> I have this working now, with a lot of help from Emilio.
> 
> I did some initial perf runs on a pretty quiet machine, 10 iterations of the
> Obama Wikipedia page, and collecting the "traversal_time_ms" values.  This
> is just so as to get some feel for the OrderMap perf effects.  It's slightly
> slower, although the measurements are as usual swamped with noise:

Is this with manual measurements? It might be worthwhile to try out my iterations patch a la bug 1365692 comment 20.

A 1-2% regression in overall styling performance as a result of switching out the hashmaps is a lot, given that styling does a lot of things, most of which are not related to hashmap lookups. If the regression is real, I would expect us to be able to construct microbenchmarks that demonstrate the problem more clearly.

Basically, it seems like we may want to construct a gtest benchmark, similar to Servo_StyleSheet_FromUTF8Bytes_Bench. We'd parse a stylesheet and use it to construct a stylist. We would both measure the time spent constructing the stylist (which would measure insertion performance), and then do a bunch of lookups on the various hashtables and measure that (to give us an idea of lookup performance).
Joe asked me to put together a document outlining what we're doing here and why. Here it is, for reference:

https://docs.google.com/document/d/1ERMv87Fir5krSRKOklg-dLdn10c5NtKMdk8UB49AWBM/edit?usp=sharing

I just had a conversation with Jack about it. He's much more in favor of patching in fallibility to std::HashMap than switching to OrderMap, since it keeps the data structures more standard and can be removed when the stdlib exposes the appropriate APIs. I'm also happy with that solution, because it removes the agony around performance comparisons with different HashMap implementations.

So to recap, here is the plan of record:

First, we implement fallibility for Vec and SmallVec per comment 5. Emilio's patch somewhat does this, but per my review comments I think we shouldn't introduce a separate type. We'd also want to use Gankro's API convention from comment 35.

Then, we add a fallible insertion free function for HashMap. The wrapper would be inline and forward to insert() if capacity() != len(). If |capacity() == len() && (cfg!(debug_assertions) || capacity() > OOMLargeThreshold/2)), we invoke the cold out-of-line fallible resize routine. This routine does the following:
(1) Fallibly allocates a vec with capacity*2. Returns failure if the allocation fails.
(2) Creates a new empty HashMap, with a different seed to address comment 28.
(3) With unsafe code and static assertions, patches the buffer and capacity into the RawTable inside the new HashMap.
(4) Drains the old hashmap and inserts into the new one.

Julian, does this all make sense and seem like something you can take care of? Sorry about wasting your time with the OrderMap stuff.
Flags: needinfo?(jseward)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #48)
> Joe asked me to put together a document outlining what we're doing here and
> why. Here it is, for reference:
> 
> https://docs.google.com/document/d/1ERMv87Fir5krSRKOklg-
> dLdn10c5NtKMdk8UB49AWBM/edit?usp=sharing

Nice summary!

> I just had a conversation with Jack about it. He's much more in favor of
> patching in fallibility to std::HashMap than switching to OrderMap

Relatedly, I have code for bug 1389305 working -- it will add a new function to jemalloc that lets us measure heap allocations even when we only have an interior pointer. This means we will be able to measure HashMap accurately, i.e. sticking with HashMap won't cause problems for memory reporting.

> (1) Fallibly allocates a vec with capacity*2. Returns failure if the
> allocation fails.
> (2) Creates a new empty HashMap, with a different seed to address comment 28.
> (3) With unsafe code and static assertions, patches the buffer and capacity
> into the RawTable inside the new HashMap.

Presumably this requires knowing the internal layout of HashMap? Presumably the static assertions will aim to detect if that layout changes? I'm struggling to imagine what those static assertions would look like.
(In reply to Nicholas Nethercote [:njn] from comment #49)
> Presumably this requires knowing the internal layout of HashMap? Presumably
> the static assertions will aim to detect if that layout changes? I'm
> struggling to imagine what those static assertions would look like.

They could detect size changes (so added/removed members), which is at least something. But we can also just tell the libs team to warn us. I think the chance that it changes before Rust adds the right APIs is small, and the chance that nobody would warn us is smaller still, and the chance that it wouldn't cause obvious crashes in Firefox automation is even smaller. So I think we can live with it.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #48)
> Julian, does this all make sense and seem like something you can take care of?

Yes, I'll do that.  It seems like a good plan.

FWIW, I spent much of yesterday hacking around with OrderMap.  It has some
extra complexity to do with ensuring that more than 2^32 elements can be
inserted, if there's enough memory.  This gives some overhead even in the
case that there are fewer than 2^32 elements in the map.  It might be
possible to remove the more-than-2^32-elements support and associated
overhead, but the performance effects are uncertain and it would be some
work.  So sticking with HashMap sounds better.
Flags: needinfo?(jseward)
Let's start with Vec::fallible_push, since it is lower hanging fruit -- it's
simple to do, and from the stacks in comment 17, it looks like there are
some quite large vectors created by Stylo.

Here's a first attempt.  It is mostly in the style of Bobby's patch in
comment 25 but with a lot of help from Emilio.  To test it, I used it in the
|rules.push| call in |impl Stylesheet, fn parse_rules|
(components/style/stylesheets/stylesheet.rs) since that's one place that the
profiling in comment 17 pointed at.

Loading techcrunch.com, the vector in question is sometimes extended out to
4096 elements.  I caused an artificial failure at the 1024 point.  This
causes techcrunch.com to look (extremely) screwy, but the system holds
together -- no exceptions, segfaults or any such.

Regarding the patch, there are a number of things I am unsure about and
would appreciate clarification:

(1) I simply used the system malloc and realloc:

    +extern "C" {
    +    fn realloc(ptr: *mut u8, bytes: usize) -> *mut u8;
    +    fn malloc(bytes: usize) -> *mut u8;
    +}

    and it seems to work OK, but Simon's comment 44 has me worried.
    Is using the system allocators OK, or is there more trickery that
    needs to happen here?

    Also, Bobby's comment 25 proposal seemed to involve a
    |__rust_fallible_realloc|, which I don't understand.  Is it related
    to Simon's comment?

(2) In the patch, function |fallible_double|, I am unsure whether it is
    correct even if realloc reallocates in-place.  I just copied the magic
    incantation |mem::forget(mem::replace(vec, new_vec));| from Bobby's
    patch without understanding what it does.

(3) Do we want |fallible_push| only to return a bool?  Or should it
    return the to-be-pushed item on failure, so that the caller doesn't
    simply "lose" it in that case?  (I saw something about this in the
    preceding discussions to do with fallible HashMaps.)

(4) Is there a way we can annotate the |if vec.capacity() == vec.len()|
    condition with a branch-direction hint (here, "unlikely"), like we
    can in C++-world?
Flags: needinfo?(simon.sapin)
Flags: needinfo?(bobbyholley)
(In reply to Julian Seward [:jseward] from comment #52)
> Created attachment 8900734 [details] [diff] [review]
> bug1389009-Vec-fallible_push-1.diff
> 
> Let's start with Vec::fallible_push, since it is lower hanging fruit -- it's
> simple to do, and from the stacks in comment 17, it looks like there are
> some quite large vectors created by Stylo.
> 
> Here's a first attempt.  It is mostly in the style of Bobby's patch in
> comment 25 but with a lot of help from Emilio.  To test it, I used it in the
> |rules.push| call in |impl Stylesheet, fn parse_rules|
> (components/style/stylesheets/stylesheet.rs) since that's one place that the
> profiling in comment 17 pointed at.
> 
> Loading techcrunch.com, the vector in question is sometimes extended out to
> 4096 elements.  I caused an artificial failure at the 1024 point.  This
> causes techcrunch.com to look (extremely) screwy, but the system holds
> together -- no exceptions, segfaults or any such.

That's great to hear :)

> Regarding the patch, there are a number of things I am unsure about and
> would appreciate clarification:
> 
> (1) I simply used the system malloc and realloc:
> 
>     +extern "C" {
>     +    fn realloc(ptr: *mut u8, bytes: usize) -> *mut u8;
>     +    fn malloc(bytes: usize) -> *mut u8;
>     +}
> 
>     and it seems to work OK, but Simon's comment 44 has me worried.
>     Is using the system allocators OK, or is there more trickery that
>     needs to happen here?

Yeah, this works fine for us, but not for random rust consumers.

>     Also, Bobby's comment 25 proposal seemed to involve a
>     |__rust_fallible_realloc|, which I don't understand.  Is it related
>     to Simon's comment?

The spirit was to provide that symbol in central, but I think just using realloc is fine.

> (2) In the patch, function |fallible_double|, I am unsure whether it is
>     correct even if realloc reallocates in-place.  I just copied the magic
>     incantation |mem::forget(mem::replace(vec, new_vec));| from Bobby's
>     patch without understanding what it does.

It's pretty much equivalent to "avoid destroying the old vector, since now it contains an invalid pointer we'll double-free".

> (3) Do we want |fallible_push| only to return a bool?  Or should it
>     return the to-be-pushed item on failure, so that the caller doesn't
>     simply "lose" it in that case?  (I saw something about this in the
>     preceding discussions to do with fallible HashMaps.)

It should probably return at least a `Result<(), ()>`, which is equivalent to a bool, but is `#[must_use]`. I don't have a strong preference in `Result<(), ()>`, vs. `Result<(), T>`. I don't think we're going to need the T any time soon, so I think the first is easier.

> (4) Is there a way we can annotate the |if vec.capacity() == vec.len()|
>     condition with a branch-direction hint (here, "unlikely"), like we
>     can in C++-world?

I know there are likely / unlikely intrinsics, but those are unstable: https://doc.rust-lang.org/std/intrinsics/fn.likely.html

I _think_ you may get the same effect with a standalone function and #[cold], but I'm not an expert on that.
I’m extremely worried about any plan that involves making assumptions about private implementation details of standard library types, such as the memory layout or which allocator is used. Yes, static assertions can help turn some cases of UB into compilation errors. Yes, we can ask the Rust team to tell us when these details change. But things would still break when they do change. With such a plan, in addition to a minimum Rust version requirement, any given Firefox version would effectively also have a maximum Rust version.

Do we really want to say “only Rust 1.20.x is supported for compiling Firefox 57”? That would defeat the point of using the stable channel.

I strongly feel that this is a deal-breaker, and that the only viable way to get fallible allocations before they’re supported in the standard library is to build separate collections types on top of libc::malloc/free (or jemalloc equivalents), fully independent from the standard library collections or allocators. (They can be a copy/fork of the same code, as long as we control what exact version of it is used.)
Flags: needinfo?(simon.sapin)
Agreed.

I consider "guessing" HashMap layout to be the worst of the options here. Especially since we don't pin rust versions at all.

I really think the ordermap solution is the way to go; fallible APIs can be easily added to OrderMap, even without a fork (but if we fork it it's easy to maintain since it's 100% safe)
Specifically, I’m proposing:

* Copying into a new crate the entire implementation of std::vec::Vec and std::collections::HashMap, including anything else they need that is private or unstable. (This includes the std::heap::Alloc trait, except that our equivalent of its std::heap::Heap impl could use libc or jemalloc directly and skip std’s mechanism for changing the default allocator.)
* Add fallible methods to that copy.
* Change SmallVec to be based on this instead of std::vec::Vec

Forking such a large amount of code is unfortunate, but I believe this is the “least bad” option we have left.

As more relevant APIs (like the Alloc trait) are stabilized in future Rust versions, we can incrementally replace pieces of this crate with imports from std. When std supports fallible operations and there’s nothing left, we can switch everything back to using std::vec::Vec and std::collections::HashMap.
Blocks: 1393656
Let's move the HashMap discussion over to bug 1393656 and focus on Vec and SmallVec in this bug.
Summary: stylo: Add fallibility support for large allocations → stylo: Add fallible append APIs for Vec and SmallVec
Looks like emilio mostly answered the questions. Let me know if you have others!
Flags: needinfo?(bobbyholley)
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #58)
> Looks like emilio mostly answered the questions. Let me know if you have
> others!

I don't feel that we have a clear story on whether it's OK or not to use
system malloc and realloc, and that seems to me to be a gating condition
for this approach.  So:

* is it OK to use them?

* if not, is there some other way to do the allocation, that would be
  acceptable?
Flags: needinfo?(simon.sapin)
Flags: needinfo?(bobbyholley)
It is important that, when deallocating or reallocating, the same allocator is used that previously allocated that particular chunk of memory.

The allocator used by Vec can be controlled with #[global_allocator], but that language feature is unstable: https://github.com/rust-lang/rust/issues/27389. The default is unspecified. In my opinion, this means it is not OK to use Vec::from_raw_parts with a pointer that was not allocated by Vec itself (for example with Vec::with_capacity). Unfortunately, Vec does not provide a fallible allocation API in current Rust versions. In my opinion, this means we need to fork Vec in order to control which allocator is used in drop().
Flags: needinfo?(simon.sapin)
(In reply to Simon Sapin (:SimonSapin) from comment #60)
> It is important that, when deallocating or reallocating, the same allocator
> is used that previously allocated that particular chunk of memory.
> 
> The allocator used by Vec can be controlled with #[global_allocator], but
> that language feature is unstable:
> https://github.com/rust-lang/rust/issues/27389. The default is unspecified.
> In my opinion, this means it is not OK to use Vec::from_raw_parts with a
> pointer that was not allocated by Vec itself (for example with
> Vec::with_capacity). Unfortunately, Vec does not provide a fallible
> allocation API in current Rust versions. In my opinion, this means we need
> to fork Vec in order to control which allocator is used in drop().

Note that we rely on already on the fact that rust static libraries call into the system allocator.

So as long as we don't introduce these hacks into crates that are used outside of stylo, I personally think it's fine.
The default allocator is specified for static and dynamic libraries (which stylo is one):

https://doc.rust-lang.org/1.9.0/book/custom-allocators.html#default-allocator:

> Dynamic and static libraries, however, will use alloc_system by default.
> Here Rust is typically a 'guest' in another application or another world
> where it cannot authoritatively decide what allocator is in use.
>
> As a result it resorts back to the standard APIs (e.g. malloc and free) for
> acquiring and releasing memory.

Based on ^, I think it's ok to use as long as we don't end up using it on
libraries on the general rust ecosystem (that is, as long as we keep it on the 
subset of rust code that gets into firefox, but not on other crates published
on crates.io).
Looks like emilio answered the question here.
Flags: needinfo?(bobbyholley)
Implements {Vec,SmallVec}::try_push.  Has been tested a bit but 
the test code is not in the patch.
Attachment #8902183 - Flags: feedback?(emilio)
Comment on attachment 8902183 [details] [diff] [review]
bug1389009-Vec-fallible_push-3.diff

Review of attachment 8902183 [details] [diff] [review]:
-----------------------------------------------------------------

Looks pretty reasonable to me!

::: servo/components/fallible/lib.rs
@@ +43,5 @@
> +
> +// Double the capacity of |vec|, or fail to do so due to lack of memory.
> +// Returns true on success, false on failure.
> +#[inline(never)]
> +fn try_double_vec<T>(vec: &mut Vec<T>) -> bool

I'd probably make these functions return a result too.

@@ +77,5 @@
> +
> +impl<T: Array> FallibleVec<T::Item> for SmallVec<T> {
> +    #[inline]
> +    fn try_push(&mut self, val: T::Item) -> Result<(),()>
> +    {

nit: this formatting is somewhat odd, feel free to install `rustfmt` using:

    cargo install rustfmt-nightly

And run it through this crate.

@@ +79,5 @@
> +    #[inline]
> +    fn try_push(&mut self, val: T::Item) -> Result<(),()>
> +    {
> +        if self.capacity() == self.len() {
> +            if !try_double_small_vec(self) {

That way you can just do:

    try_double_small_vec(self)?;
    debug_assert!(..);

@@ +120,5 @@
> +                memcpy(new_ptr as *mut u8, old_ptr as *const u8,
> +                       old_size_bytes);
> +            }
> +        }
> +    }

nit: Is there something that guarantees the alignment of the memory is suitable? I think we can rely on this, but perhaps we should assert we don't use more than word-aligned types? I'd check that with Bobby or Manish, that implemented the Arc hackery to be able to store stuff inline in the header and so on.
Attachment #8902183 - Flags: feedback?(emilio) → feedback+
(In reply to Emilio Cobos Álvarez [:emilio] from comment #66)
> Review of attachment 8902183 [details] [diff] [review]:

Thanks for the feedback.  The alignment thing is a good point.
I can add this (per your irc hinting):

    // Assert that the alignment required by T doesn't exceed the minimum
    // alignment (word alignment) we can reasonably assume that
    // malloc/realloc give us.
    assert!(mem::align_of::<T>() <= mem::align_of::<*const ()>());
Incorporating Emilio's feedback.
Attachment #8900734 - Attachment is obsolete: true
Attachment #8902183 - Attachment is obsolete: true
Fix comment to match source code (return types).
Attachment #8902232 - Attachment is obsolete: true
Attachment #8902237 - Flags: feedback?(bobbyholley)
(In reply to Julian Seward [:jseward] from comment #67)
> (In reply to Emilio Cobos Álvarez [:emilio] from comment #66)
> > Review of attachment 8902183 [details] [diff] [review]:

>     // Assert that the alignment required by T doesn't exceed the minimum
>     // alignment (word alignment) we can reasonably assume that
>     // malloc/realloc give us.
>     assert!(mem::align_of::<T>() <= mem::align_of::<*const ()>());

On contemplation I have doubts about this.  On 32 bit ARM, the word size
is 4, but minimum alignment is 8, since the ARM doubleword load/store
instructions for one require that.
Comment on attachment 8902237 [details] [diff] [review]
bug1389009-Vec-fallible_push-5.diff

Review of attachment 8902237 [details] [diff] [review]:
-----------------------------------------------------------------

Generally seems good to me. Please get review from Manish on the final patch, and coordinate with him to ensure everything works smoothly between this stuff and the HashMap stuff.

For the allocation alignment stuff, it might be worth coordinating with Manish and breaking https://github.com/Manishearth/hashglobe/blob/master/src/alloc.rs out into a separate crate that could be shared by hashglobe and fallible.

Also see https://github.com/servo/servo/issues/17320 for an ARM crash that we had with our hand-rolled Arc, and the code we used to fix it.

::: servo/components/fallible/lib.rs
@@ +39,5 @@
> +}
> +
> +// Double the capacity of |vec|, or fail to do so due to lack of memory.
> +// Returns Ok(()) on success, Err(()) on failure.
> +#[inline(never)]

Mark this as #[cold], same for the other double impl.

@@ +119,5 @@
> +        unsafe {
> +            new_ptr = malloc(new_size_bytes);
> +            if !new_ptr.is_null() && old_size_bytes > 0 {
> +                memcpy(new_ptr as *mut u8,
> +                       old_ptr as *const u8, old_size_bytes);

You can use std::ptr::copy_nonoverlapping instead of memcpy, which reduces the C functions you need to link against.

::: servo/components/style/lib.rs
@@ +46,5 @@
>  #[allow(unused_extern_crates)] extern crate byteorder;
>  #[cfg(feature = "gecko")] #[macro_use] #[no_link] extern crate cfg_if;
>  #[macro_use] extern crate cssparser;
>  extern crate euclid;
> +//extern crate fallible;

Why is this commented out?
Attachment #8902237 - Flags: feedback?(bobbyholley) → feedback+
Blocks: 1395064
Incorporates feedback from comment 71.

> ::: servo/components/style/lib.rs
> @@ +46,5 @@
> >  #[allow(unused_extern_crates)] extern crate byteorder;
> >  #[cfg(feature = "gecko")] #[macro_use] #[no_link] extern crate cfg_if;
> >  #[macro_use] extern crate cssparser;
> >  extern crate euclid;
> > +//extern crate fallible;
>
> Why is this commented out?

I should have simply removed that (and have now).  As it stands this patch
only adds {Vec,SmallVec}::try_push but doesn't add any uses.  So rustc warns
that the crate is unused and the build fails.

> For the allocation alignment stuff [..]

I removed the assertion because I believe it is incorrect.  Also, AFAICS
malloc/realloc will return memory aligned suitable for any basic type, so
we're OK in the short term, providing we're not using SIMD types or anything
similarly exotic.  From talking with MikeH, I agree that the proper fix is
to route it through Manish's copy of src/alloc.rs.
Attachment #8902237 - Attachment is obsolete: true
Attachment #8902554 - Flags: review?(manishearth)
Comment on attachment 8902554 [details] [diff] [review]
bug1389009-Vec-fallible_push-6.diff

Review of attachment 8902554 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/fallible/lib.rs
@@ +30,5 @@
> +    #[inline]
> +    fn try_push(&mut self, val: T) -> Result<(), ()> {
> +        if self.capacity() == self.len() {
> +            try_double_vec(self)?;
> +            debug_assert!(self.capacity() > self.len());

Gankro suggested making the overflow case return Err(()) for hashglobe and I presume that applies here too.

@@ +48,5 @@
> +    let old_cap = vec.capacity();
> +
> +    let new_cap = if old_cap == 0 { 4 } else { old_cap * 2 };
> +
> +    let new_size_bytes = new_cap * mem::size_of::<T>();

the stdlib vec impl (src/liballoc/raw_vec.rs) seems to actually care about alignment (even during deallocation?!). Given that the buffer we create here will be used there too, must we care about it too?

@@ +61,5 @@
> +    if new_ptr.is_null() {
> +        return Err(());
> +    }
> +
> +    let new_vec;

let new_vec = unsafe {
    ...
};

@@ +126,5 @@
> +        return Err(());
> +    }
> +
> +    let new_vec;
> +    unsafe {

nit: no need for deferred init as above
Attachment #8902554 - Flags: review?(manishearth) → review+
(In reply to Manish Goregaokar [:manishearth] from comment #73)

Thanks for the review.

> ::: servo/components/fallible/lib.rs
> @@ +30,5 @@
> > +    #[inline]
> > +    fn try_push(&mut self, val: T) -> Result<(), ()> {
> > +        if self.capacity() == self.len() {
> > +            try_double_vec(self)?;
> > +            debug_assert!(self.capacity() > self.len());
> 
> Gankro suggested making the overflow case return Err(()) for hashglobe and I
> presume that applies here too.

Yes.  If try_double_vec(self) fails then the ?; causes try_push to
also return with failure.  Is that what you meant?

> the stdlib vec impl (src/liballoc/raw_vec.rs) seems to actually care about
> alignment (even during deallocation?!).

Where should I look in src/liballoc/raw_vec.rs to see concerns there about
deallocation?  I looked at all places mentioning "free" and "drop" but didn't
see anything obvious.
Flags: needinfo?(manishearth)
Comment on attachment 8902554 [details] [diff] [review]
bug1389009-Vec-fallible_push-6.diff

Review of attachment 8902554 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/fallible/lib.rs
@@ +46,5 @@
> +    let old_ptr = vec.as_mut_ptr();
> +    let old_len = vec.len();
> +    let old_cap = vec.capacity();
> +
> +    let new_cap = if old_cap == 0 { 4 } else { old_cap * 2 };

This and four other multiplications should use checked_mul and return an error on integer overflow.
Comment on attachment 8902554 [details] [diff] [review]
bug1389009-Vec-fallible_push-6.diff

Review of attachment 8902554 [details] [diff] [review]:
-----------------------------------------------------------------

::: servo/components/fallible/lib.rs
@@ +12,5 @@
> +use std::vec::Vec;
> +
> +extern "C" {
> +    fn realloc(ptr: *mut u8, bytes: usize) -> *mut u8;
> +    fn malloc(bytes: usize) -> *mut u8;

Should these be HeapAlloc on Windows like in https://github.com/servo/servo/pull/18334 ?
(In reply to Simon Sapin (:SimonSapin) from comment #75)
> Comment on attachment 8902554 [details] [diff] [review]
> bug1389009-Vec-fallible_push-6.diff
> 
> Review of attachment 8902554 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: servo/components/fallible/lib.rs
> @@ +46,5 @@
> > +    let old_ptr = vec.as_mut_ptr();
> > +    let old_len = vec.len();
> > +    let old_cap = vec.capacity();
> > +
> > +    let new_cap = if old_cap == 0 { 4 } else { old_cap * 2 };
> 
> This and four other multiplications should use checked_mul and return an
> error on integer overflow.

We don't need to worry about overflow here due to preconditions. However you need to assert those as postconditions, like RawVec does. I suggest reading through RawVec to understand the invariants at play here: https://doc.rust-lang.org/src/alloc/raw_vec.rs.html#48-52

(notably: capacaity can't exceed isize::MAX bytes or else ptr::offset is UB)
Also https://github.com/rust-lang/rust/pull/43890 shows how to impl this allocation code fallibly.
> also return with failure.  Is that what you meant?

No, I was referring to the debug_assert for capacity overflow

> Should these be HeapAlloc on Windows like in https://github.com/servo/servo/pull/18334 ?

Windows still has malloc IIRC? Just not the posix_memalign variants.
Flags: needinfo?(manishearth)
Adds checks for integer overflow, and removes some deferred inits.
Attachment #8902554 - Attachment is obsolete: true
Attachment #8904638 - Flags: review?(manishearth)
Attachment #8904638 - Flags: review?(manishearth) → review+
https://hg.mozilla.org/integration/autoland/rev/b9b58cc99f2f
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Can we (do we need to) uplift this to 56 on m-r, to help with the fix for bug 1389470?
Flags: needinfo?(jseward)
Flags: needinfo?(ayang)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #82)
Looking at what was pushed in bug 1389470 comment 50, I think the
relevant bits from this patch have been copied into m-c, which
(presumably) fixes bug 1389470 on m-c.  But those fixes will have
to get uplifted to m-r if you want bug 1389470 to be fixed on m-r.

To clarify: the patches in this bug only support fallible allocation
for Stylo.  They can't be used as-is for bug 1389470.  So uplifting
them to 56 won't help bug 1389470.

This also means that Ryan's assertion in bug 1389470 comment 53
doesn't hold.

Sorry this is confusing.  I *think* I have this right, but a second
opinion would be good.  ayang?
Flags: needinfo?(jseward) → needinfo?(lhenry)
Stylo is not enabled in 56 release (and possibly not even built?), so I think this bug is not relevant. (It is enabled for some percentage of users of 56 beta.)
Yeah, the issue is that the folks in bug 1389009 reused the fallible vec stuff to fix an OOM in the mp4 parser, which is shipping in release. But AFAICT they just copy-pasted it, so there should be no reason to uplift this.
Thanks for the clarification!
Flags: needinfo?(lhenry)
Flags: needinfo?(ayang)
You need to log in before you can comment on or make changes to this bug.