Closed Bug 903519 Opened 6 years ago Closed 2 years ago

GC: allocate JSStrings and string data in the Nursery

Categories

(Core :: JavaScript: GC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: terrence, Assigned: sfink)

References

(Blocks 8 open bugs)

Details

(Keywords: perf, Whiteboard: [qf:f61][qf:p1])

Attachments

(34 files, 27 obsolete files)

59.74 KB, patch
Details | Diff | Splinter Review
6.98 KB, patch
jonco
: review+
Details | Diff | Splinter Review
8.63 KB, patch
jonco
: review+
Details | Diff | Splinter Review
6.63 KB, patch
jonco
: review+
Details | Diff | Splinter Review
33.36 KB, patch
jandem
: review+
Details | Diff | Splinter Review
11.13 KB, patch
jimb
: review+
Details | Diff | Splinter Review
32.13 KB, patch
jonco
: review+
Details | Diff | Splinter Review
20.50 KB, patch
jonco
: review+
Details | Diff | Splinter Review
32.72 KB, patch
jonco
: review+
Details | Diff | Splinter Review
1.60 KB, patch
jonco
: review+
Details | Diff | Splinter Review
1.30 KB, patch
jonco
: review+
Details | Diff | Splinter Review
5.32 KB, patch
jonco
: review+
Details | Diff | Splinter Review
7.24 KB, patch
jonco
: review+
Details | Diff | Splinter Review
2.99 KB, patch
jonco
: review+
Details | Diff | Splinter Review
6.67 KB, patch
jonco
: review+
Details | Diff | Splinter Review
11.40 KB, patch
sfink
: review+
Details | Diff | Splinter Review
10.67 KB, patch
jonco
: review+
Details | Diff | Splinter Review
9.57 KB, patch
jandem
: review+
Details | Diff | Splinter Review
105.77 KB, patch
jandem
: review+
Details | Diff | Splinter Review
7.48 KB, patch
sfink
: feedback+
Details | Diff | Splinter Review
8.31 KB, patch
Details | Diff | Splinter Review
41.03 KB, application/octet-stream
Details
1.43 KB, patch
njn
: review+
Details | Diff | Splinter Review
2.98 KB, patch
jandem
: review+
Details | Diff | Splinter Review
13.70 KB, patch
jonco
: feedback+
Details | Diff | Splinter Review
8.92 KB, patch
jonco
: review+
Details | Diff | Splinter Review
1.89 KB, patch
jonco
: review+
Details | Diff | Splinter Review
3.08 KB, patch
jonco
: review+
Details | Diff | Splinter Review
26.50 KB, patch
jandem
: review+
Details | Diff | Splinter Review
9.17 KB, patch
jonco
: review+
Details | Diff | Splinter Review
297.80 KB, patch
decoder
: feedback+
Details | Diff | Splinter Review
8.90 KB, patch
jandem
: review+
Details | Diff | Splinter Review
311.25 KB, patch
decoder
: feedback+
Details | Diff | Splinter Review
3.43 KB, patch
jonco
: review+
Details | Diff | Splinter Review
Many JSStrings are short lived and SpiderMonkey would benefit from being able to allocate these in the nursery. Our initial nursery does not allocate strings (1) for simplicity, (2) to lessen the set of barriers that are required initially, and (3) because some code paths extract inline characters from a JSString. We will need to address all of these before we can store strings in the nursery.
Blocks: GC.performance
No longer blocks: 885550
Depends on: 1008590
Depends on: 1009466
Depends on: 1032726
Depends on: 1034191
Depends on: 1034627
Depends on: 1034689
Depends on: 1038099
Assignee: general → nobody
Attached patch Prototype v0Splinter Review
Here's a very hackish patch (applies to inbound rev 934d518c24e8) I wrote last Friday to prototype this. It's x86 only and there's a lot of unnecessary code duplication, hacks and unimplemented parts but it works just enough to do some measurements.

On a best-case rope allocation benchmark I see improvements from 186 to 45 ms.

Kraken doesn't seem affected. SS is about 4% slower, I haven't investigated it yet; could be post barrier overhead or some extra minor GCs. Probably fixable with some work.

On Octane, Deltablue becomes about 3-4% faster but there's not much of a difference except for Splay:

Before:

Splay: 16437
SplayLatency: 10530

After:

Splay: 5167
SplayLatency: 7699

Profile shows a ton of time in store buffer code. I think the problem is that we pre-tenure the tree objects but allocate the strings in the nursery, so there will be a post barrier when we create a leaf node :(

Not sure what to do about this. Ion could do some analysis and pre-tenure the string as well in this case...

This patch also doesn't allocate string chars in the nursery, but a fat inline string can store up to 23 Latin1 chars inline, that's quite a lot and I don't think string mallocs are a problem for any of our main benchmarks. We should measure this of course.
I've got some passable thoughts. I guess the core problem is that we don't have a TypeObjectAddendum for JSString that can tell the jit to allocate them tenured if they survive collections, as we do for JSObject. However, there are some other heuristics we could use here, specifically (1) strings declared as constants in the source, (2) strings arriving from StringObject, (3) short/inline vs malloced chars [i.e. length] and (4) a content based hash of the string that we populate on gc and use to check for pre-tenuring when compiling.

The long-lived string in splay appears to be:
      |'String for key ' + tag + ' in leaf node'|

Where tag comes from String(Math.random()), but gets re-used for entire subtrees. So I guess 2 ropes and 2 inline/short strings. It seems like a content based hash would not be so effective, but I really don't know how many tags there are. Also we don't have a way to invalidate scripts once we populate it, so may not work too well anyway.
(In reply to Terrence Cole [:terrence] from comment #2)
> The long-lived string in splay appears to be:
>       |'String for key ' + tag + ' in leaf node'|
>
> Where tag comes from String(Math.random()), but gets re-used for entire
> subtrees. So I guess 2 ropes and 2 inline/short strings.

We have to allocate the following strings:

tag = String(Math.random());
rope1 = 'String for key ' + tag
rope2 = rope1 + ' in leaf node'

The string literals are atoms (always tenured) and are used directly.

> It seems like a
> content based hash would not be so effective, but I really don't know how
> many tags there are.

Unfortunately every tag is unique (InsertNewNode keeps generating new keys/tags until it has one that's not in the tree).

I think you're right and we'll need some kind of heuristic. In this case, the string is stored in a pre-tenured object, so Ion could propagate that info to the 2 rope allocations. That will still leave the tag string in the nursery though (the ropes are never flattened as far as I can see). V8 seems to have a way to pre-tenure strings and objects, based on the allocation site.

I'll think about it a bit more, but without (significant) wins on the real benchmarks it's harder to justify working on this.
Yeah, I agree. Did you get any counts for how many mallocs chars-in-nursery would remove? I think that's the only likely win here that's still unknown.
Blocks: emberperf
Websites do a lot of string manipulation, often with short-lived strings. It's very easy for us right now to end up with major GCs caused by allocation triggers. Allocating strings in the nursery would eliminate this.

I prototyped this 3 years ago (see the patch) and it didn't seem too difficult. There's the Octane-splay issue, we should check if that's still an issue. There have been a lot of improvements to our barriers etc since then.
Blocks: 1351769
Whiteboard: [qf]
Kannan, what do you think we should do about this bug? Should we [qf:p1] it?
Flags: needinfo?(kvijayan)
Yes.  Short string allocation is endemic in JS.  Doing this in the nursery should speed up allocation (and reduce GC pressure) by a significant amount.
Flags: needinfo?(kvijayan)
Whiteboard: [qf] → [qf:p1]
Component: JavaScript Engine → JavaScript: GC
Whiteboard: [qf:p1] → [qf:p2]
This likely impacts Speedometer so qf:p1
Whiteboard: [qf:p2] → [qf:p1]
(This bug came up in QF triage - Jan, would you be up for taking this? Seems like there's a lot to do here -- so if we're gonna get this & any fallout addressed for 57, then whoever takes it will probably need to start soon.)
Flags: needinfo?(jdemooij)
(In reply to Daniel Holbert [:dholbert] (reduced availability - travel & post-PTO backlog) from comment #9)
> (This bug came up in QF triage - Jan, would you be up for taking this? Seems
> like there's a lot to do here -- so if we're gonna get this & any fallout
> addressed for 57, then whoever takes it will probably need to start soon.)

I'm currently filing/fixing tons of perf bugs in all parts of the engine, and I think it's important for me to stay on top of that for at least a few more weeks/months. Maybe one of the QF people (Sean, Steve, Kannan) can take this one?
Flags: needinfo?(jdemooij)
jandem said on IRC (edited):

One of the problems will be the JITs using the generic tracing functions for GC pointers. We probably need to track these types separately in shapshots. When tracing values stored in Ion frames, we have a Cell* and if it's in the nursery we assume it's an object and trace it as one.
I've started working on this, though it's unclear how much work will be required on tracking and heuristics before it's landable, once the basic framework is done.
Assignee: nobody → sphink
Oops. *This* is the bug I am going to start landing incremental patches for. It doesn't work at all yet -- or rather, it works fine, but it doesn't allocate anything in the nursery yet. But I'm trying to split it up into small changes so I don't get lost.
Keywords: leave-open
Summary: [meta] GC: allocate JSStrings and string data in the Nursery → GC: allocate JSStrings and string data in the Nursery
These patches are obviously not complete by themselves, but they won't break anything, and the full patch was getting really hard to follow. Hopefully it'll be easier to review this way.
Attachment #8877373 - Flags: review?(jcoppeard)
This is pretty annoying for eg JSAtom, because it *is* still always tenured, but the class hierarchy no longer shows it.
Attachment #8877374 - Flags: review?(jcoppeard)
A mostly-pointless refactoring at this point, since it just shuffles a static_cast around, but it makes the later version a little more natural.

Note that this patch passes in a dummy gc::InitialHeap parameter that is required to be TenuredHeap, since strings cannot yet be nursery-allocated. This restriction will be lifted in a later patch.
Attachment #8877375 - Flags: review?(jcoppeard)
Attachment #8877373 - Flags: review?(jcoppeard) → review+
Comment on attachment 8877375 [details] [diff] [review]
Make js::Allocate cast strings to requested type

Review of attachment 8877375 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine, but we should maybe consider renaming these to AllocateString and AllocateObject.  Also, it would be great if we could restrict the string one to only allow casting to string types.  I'm not sure how easy that is.
Attachment #8877375 - Flags: review?(jcoppeard) → review+
Comment on attachment 8877374 [details] [diff] [review]
Reparent JSString from TenuredCell to Cell

Review of attachment 8877374 [details] [diff] [review]:
-----------------------------------------------------------------

This seems fine but it's still a bit annoying.

I know you said you investigated multiple inheritance, but I'm wondering if we can extract the methods in TenuredCell into a mixin and use that to add them to JSAtom?  That wouldn't make it a TenuredCell though so that may not fix the problem.

::: js/src/vm/String.h
@@ +526,5 @@
>      static const JS::TraceKind TraceKind = JS::TraceKind::String;
>  
> +    JS::Zone* zone() const {
> +            return asTenured().zone();
> +    }

What happens if we're not a tenured string?  Can we even work out the zone in that case?

I guess this gets handled in a later patch.
Attachment #8877374 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #17)
> Looks fine, but we should maybe consider renaming these to AllocateString
> and AllocateObject.

I've gone back and forth on that in the full patch. It's kind of weird because you wouldn't use AllocateString for all string subtypes. Which also means that AllocateObject sticks out a bit. I ended up using Allocate for everything so I wouldn't need to justify the splits.

> Also, it would be great if we could restrict the string
> one to only allow casting to string types.  I'm not sure how easy that is.

The header can't, because it doesn't have the full type definitions. (And it cannot, because of some hairy include ordering that pretty much needs to be the way it is.)

But I have a static_assert in the .cpp file. And if you did js::Allocate<Shape>(cx, DefaultHeap), you'd get an error on the static_cast from JSString* -> Shape*. So I think we're covered.

(In reply to Jon Coppeard (:jonco) from comment #18)
> This seems fine but it's still a bit annoying.
> 
> I know you said you investigated multiple inheritance, but I'm wondering if
> we can extract the methods in TenuredCell into a mixin and use that to add
> them to JSAtom?  That wouldn't make it a TenuredCell though so that may not
> fix the problem.

Hm... I'll have to think about that.

> ::: js/src/vm/String.h
> @@ +526,5 @@
> >      static const JS::TraceKind TraceKind = JS::TraceKind::String;
> >  
> > +    JS::Zone* zone() const {
> > +            return asTenured().zone();
> > +    }
> 
> What happens if we're not a tenured string?  Can we even work out the zone
> in that case?

Notice the weird indentation? :-) This is extracted from a later patch that says:

    JS::Zone* zone() const {
        if (isTenured())
            return asTenured().zone();
        return js::Nursery::getStringZone(this);
    }

When nursery-allocating, it injects the Zone pointer just before the JSString body and subtracts from the cell pointer to find it. I'm still working out how to split that patch up, given that it has to remain dead code until nursery tracing and jit handling all land, and I'll probably hold off until I make sure I'm not going to have to go back and completely re-work things. (I'm doing things a little differently from the way Jan did originally.)
Atoms will not be nursery-allocated, so if a cell is in the nursery and its low bit is set, we can be sure it's a string.

jandem -- any chance you could look at this patch and tell me (1) whether you think it's ok to steal a bit like this, and (2) tell me what I'm doing wrong? All of jit-tests passes except asm.js/check-crypto-md5.js, and I'm pretty sure I must have messed up MacroAssembler::loadStringChar so that str.charCodeAt is giving incorrect results. The following simple test case also fails (as in, sum and sum2 give different results.)

var plainText = "abcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyzabcdefghijklmnopqrstuvwyz";

for (var i = 0; i <4; i++) {
  plainText += plainText;
}

var sum = 0;
var sum2 = 0;
for (i = 0; i < plainText.length; i++)
  sum += plainText.charCodeAt(i);
for (i = 0; i < plainText.length; i++)
  sum2 += plainText.charCodeAt(i);

print(sum);
print(sum2);
Attachment #8878620 - Flags: feedback?(jdemooij)
Note that the above test case passes with --no-ion.
Comment on attachment 8878620 [details] [diff] [review]
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery

Review of attachment 8878620 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/Lowering.cpp
@@ +1981,5 @@
>      MOZ_ASSERT(idx->type() == MIRType::Int32);
>  
> +    LCharCodeAt* lir = new(alloc()) LCharCodeAt(useRegister(str),
> +                                                useRegister(idx),
> +                                                tempFixed(CallTempReg0));

I think you need to use |temp()| here since LCharCodeAt doesn't extend LCallInstructionHelper
Comment on attachment 8878620 [details] [diff] [review]
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery

jandem gave me the answer over IRC: I was using the original string instead of the left side of a rope. Which is pretty stupid. I went to some effort to avoid adding a temporary, and was fixated on the number of live registers at each point, so I didn't notice when the *value* was different. :(
Attachment #8878620 - Flags: feedback?(jdemooij)
That seems to have fixed it. You still have to tell me whether this is even a good idea. :-)
Attachment #8878665 - Flags: review?(jdemooij)
Attachment #8878620 - Attachment is obsolete: true
Comment on attachment 8878665 [details] [diff] [review]
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery

Come to think of it, I'm not sure if this even makes sense. If we atomize a string in the nursery, we'll be forced to evict the nursery. Hm... maybe I'll try to get more of the main patch working first. Cancelling review.
Attachment #8878665 - Flags: review?(jdemooij)
(In reply to Steve Fink [:sfink] [:s:] from comment #25)
> Come to think of it, I'm not sure if this even makes sense. If we atomize a
> string in the nursery, we'll be forced to evict the nursery.

It makes perfect sense. Atomizing a string will allocate a new one (or return an existing atom of course). It has to because all atoms are in the atoms Zone.
(In reply to Jan de Mooij [:jandem] from comment #26)
> It makes perfect sense. Atomizing a string will allocate a new one (or
> return an existing atom of course). It has to because all atoms are in the
> atoms Zone.

Doh! You're right, of course. I guess I was staring at too much bit-twiddling JSRope code and got it into my head that the flags would be updated in-place. Thanks!

I'm still working through various missing barriers and tracing in non-JIT code. I probably ought to start generating perf numbers.
Jim - Strings appearing as literals in the source text are atomized, so this test wasn't testing what it was supposed to be testing. I ran into this because I'm going to be nursery-allocating strings, and nothing was getting nursery-allocated because it was atomized already. I made heavy use of dumpStringRepresentation to verify what everything actually is.

I switched things over to check sizes based on what the underlying string object is expected to be, since it made it *much* easier to keep things straight that way. There's also some restructuring that is really because I was hacking on a newer version of this test that uses nursery strings, and I split just this test out and undid all of the nursery-related stuff.
Attachment #8880049 - Flags: review?(jimb)
Seems to be working ok, and I didn't run into any major blockers while working on the full jit support for barriering string writes.
Attachment #8881598 - Flags: review?(jdemooij)
Attachment #8878665 - Attachment is obsolete: true
Strings appearing as literals in the source text are atomized, so this test wasn't testing quite what it thought it was testing.

I had to modify this test because nursery strings are larger, and it was easier if I added a layer of indirection (test case --> specific string subtype --> exact sizes).
Attachment #8881599 - Flags: review?(jimb)
Attachment #8880049 - Attachment is obsolete: true
Attachment #8880049 - Flags: review?(jimb)
Implement the capability to allocate strings from the nursery. (Nothing will actually allocate from there yet.)
Attachment #8881600 - Flags: review?(jcoppeard)
Blocks: 1376646
I think I should have most of the non-JIT tracing and tenuring consolidated into this patch. Note that nothing is being allocated in the nursery yet, so very little of this code gets run.
Attachment #8881626 - Flags: review?(jcoppeard)
Attachment #8881630 - Flags: review?(jcoppeard)
Comment on attachment 8881600 [details] [diff] [review]
Strings in the nursery: allocation

Review of attachment 8881600 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good, although I'd like the code to be exercised by something before checkin.

For the test code changes, did you run this with nursery string allocation enabled or something?

::: js/src/gc/Allocator.h
@@ +46,4 @@
>  }
>  
>  // Specialization for JSFatInlineString that must use a different allocation
>  // type. Note that we have to explicitly specialize for both values of AllowGC

Why do we have to use a different allocation type for fat inline strings - are they a different size?

::: js/src/gc/Nursery.cpp
@@ +303,5 @@
> +    /* Ensure there's enough space to replace the contents with a RelocationOverlay. */
> +    MOZ_ASSERT(size >= sizeof(RelocationOverlay));
> +
> +    size_t allocSize = JS_ROUNDUP(sizeof(StringLayout) - 1 + size, CellAlignBytes);
> +    auto spot = static_cast<StringLayout*>(allocate(allocSize));

nit: 'spot' is an unusual name for this :)

::: js/src/gc/Nursery.h
@@ +185,5 @@
>       */
>      JSObject* allocateObject(JSContext* cx, size_t size, size_t numDynamic, const js::Class* clasp);
>  
> +    /*
> +     * Allocate and return a pointer to a new GC thing. Returns nullptr if the

Comment should say string instead of GC thing.

@@ +199,5 @@
> +        auto cell = reinterpret_cast<const js::gc::Cell*>(str); // JSString type is incomplete here
> +        MOZ_ASSERT(js::gc::IsInsideNursery(cell), "getStringZone must be passed a nursery string");
> +#endif
> +
> +        const char* layout = reinterpret_cast<const char*>(str) - offsetof(StringLayout, cell);

Should probably use uint8_t here rather than char.
Attachment #8881600 - Flags: review?(jcoppeard) → review+
Comment on attachment 8881626 [details] [diff] [review]
Strings in the nursery: tracing and tenuring

Review of attachment 8881626 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Marking.cpp
@@ +2808,5 @@
> +    MOZ_ASSERT(traceKind == JS::TraceKind::Object || traceKind == JS::TraceKind::String);
> +
> +    // Bug 1376646: Make separate store buffers for strings and objects, and
> +    // only check IsInsideNursery once.
> +    

nit: trailing whitespace

::: js/src/gc/Nursery-inl.h
@@ +17,5 @@
>  #include "js/TracingAPI.h"
>  #include "vm/Runtime.h"
>  
>  MOZ_ALWAYS_INLINE /* static */ bool
> +js::Nursery::getForwardedPointer(js::gc::Cell** ref)

Can we template this method on the GC thing type?  I think this would require less casting elsewhere.

::: js/src/vm/String.cpp
@@ +513,5 @@
>          return nullptr;
>      }
>  
> +    if (!isTenured() && maybecx) {
> +        JSRuntime* rt = maybecx->runtime();

What happens if we don't have a context here but the string is in the nursery?  Is that possible?
Attachment #8881626 - Flags: review?(jcoppeard) → review+
Attached patch Strings in the nursery: JIT (obsolete) — Splinter Review
Ok, this patch doesn't even come close to compiling. It's my "please teach me how to JIT" patch, resulting from scanning through (mostly) CodeGenerator.cpp and looking for all of the stores. I annotated the source with a number of questions, and I'd like to go through it with you sometime so you can explain all of my misconceptions to me before I continue hacking. Thanks!
Attachment #8881835 - Flags: feedback?(jdemooij)
Comment on attachment 8881598 [details] [diff] [review]
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery

Review of attachment 8881598 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. Lots of bit twiddling :)
Attachment #8881598 - Flags: review?(jdemooij) → review+
(In reply to Jon Coppeard (:jonco) from comment #34)
> Comment on attachment 8881600 [details] [diff] [review]
> Strings in the nursery: allocation
> 
> Review of attachment 8881600 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This looks good, although I'd like the code to be exercised by something
> before checkin.

Yes, this patch is completely out of order. For one, I'm still working through the JIT stuff. For another, there are no barriers, no tenuring, etc. I should really reorder it to after those, but since eventually this patch has to come after all of the JIT fixes too, I didn't bother reordering yet.

> For the test code changes, did you run this with nursery string allocation
> enabled or something?

Yes, I ran the shell tests with the interpreter only and they all pass. (Later patches get them passing in baseline as well, but they come later in my current patch stack.) I have not tried running the browser yet. But note that they require later patches in this stack as well. (Sorry, I should have noticed the stupid ordering before requesting review.)

> 
> ::: js/src/gc/Allocator.h
> @@ +46,4 @@
> >  }
> >  
> >  // Specialization for JSFatInlineString that must use a different allocation
> >  // type. Note that we have to explicitly specialize for both values of AllowGC
> 
> Why do we have to use a different allocation type for fat inline strings -
> are they a different size?

Hence "fat", yes. ;-)

On 64-bit, sizeof(JSString)==24. sizeof(JSFatInlineString)==32. (There's a JSThinInlineString that is 24 bits.)

> 
> ::: js/src/gc/Nursery.cpp
> @@ +303,5 @@
> > +    /* Ensure there's enough space to replace the contents with a RelocationOverlay. */
> > +    MOZ_ASSERT(size >= sizeof(RelocationOverlay));
> > +
> > +    size_t allocSize = JS_ROUNDUP(sizeof(StringLayout) - 1 + size, CellAlignBytes);
> > +    auto spot = static_cast<StringLayout*>(allocate(allocSize));
> 
> nit: 'spot' is an unusual name for this :)

Uh... yes it is. I'm not sure what I was thinking. Switched to 'header'.

Remaining comments applied.
(In reply to Steve Fink [:sfink] [:s:] from comment #38)
> (In reply to Jon Coppeard (:jonco) from comment #34)
> > Comment on attachment 8881600 [details] [diff] [review]
> > Strings in the nursery: allocation
> > 
> > Review of attachment 8881600 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > This looks good, although I'd like the code to be exercised by something
> > before checkin.
> 
> Yes, this patch is completely out of order. For one, I'm still working
> through the JIT stuff. For another, there are no barriers, no tenuring, etc.
> I should really reorder it to after those, but since eventually this patch
> has to come after all of the JIT fixes too, I didn't bother reordering yet.
> 
> > For the test code changes, did you run this with nursery string allocation
> > enabled or something?
> 
> Yes, I ran the shell tests with the interpreter only and they all pass.
> (Later patches get them passing in baseline as well, but they come later in
> my current patch stack.) I have not tried running the browser yet. But note
> that they require later patches in this stack as well. (Sorry, I should have
> noticed the stupid ordering before requesting review.)

Er, never mind. This isn't the patch I was thinking of. You're right, nothing exercises this, it just makes it possible. The other prerequisite patches come next, *then* the one that finally turns on nursery allocation (*that's* the patch that is out of order, but only with respect to the JIT support.)
Comment on attachment 8881630 [details] [diff] [review]
Strings in the nursery: barriers

Review of attachment 8881630 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Barrier.h
@@ +288,5 @@
>          MOZ_ASSERT(vp);
>  
>          // If the target needs an entry, add it.
>          js::gc::StoreBuffer* sb;
> +        if ((next.isObject() || next.isString()) && (sb = next.toGCThing()->storeBuffer())) {

This gets rid of a cast, nice.

::: js/src/vm/String.h
@@ +549,2 @@
>              return asTenured().zone();
> +        return js::Nursery::getStringZone(this);

We should assert that CurrentThreadCanAccessZone() if we don't end up calling TenuredCell::zone().
Attachment #8881630 - Flags: review?(jcoppeard) → review+
Comment on attachment 8881835 [details] [diff] [review]
Strings in the nursery: JIT

Review of attachment 8881835 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense, some comments after skimming the patch.

::: js/src/jit/CacheIRCompiler.cpp
@@ +2213,5 @@
>          masm.bind(&done);
>          break;
>        }
>  
> +      // FIXME: Understand why the above objects don't require post barriers.

The callers post barrier, we should add a comment here mentioning that...

::: js/src/jit/CodeGenerator.cpp
@@ +1363,5 @@
>      masm.guardedCallPreBarrier(matchesInputAddress, MIRType::String);
>      masm.guardedCallPreBarrier(lazySourceAddress, MIRType::String);
>  
> +    // Question: when do I need to saveVolatile()?
> +    masm.movePtr(pendingInputAddress, temp3);

masm.loadPtr to load the previous value into temp3

@@ +2718,5 @@
>      masm.store32(Imm32(u.word), Address(output, JSFunction::offsetOfNargs()));
>      masm.storePtr(ImmGCPtr(info.scriptOrLazyScript),
>                    Address(output, JSFunction::offsetOfNativeOrScript()));
>      masm.storePtr(envChain, Address(output, JSFunction::offsetOfEnvironment()));
> +    // Question: why no post barrier needed for envChain?

We call masm.allocateObject with DefaultHeap kind, I think this means we end up in MacroAssembler::nurseryAllocate where we fail if we can't nursery allocate. So I think the function is guaranteed to be in the nursery. It's subtle, we should emit a check for that in DEBUG builds (see masm.assumeUnreachable).

@@ +3176,5 @@
>  
>      if (valueType == MIRType::ObjectOrNull) {
> +        // QUESTION: Why does this not require a post barrier? When this is
> +        // used, does it have a separate post barrier LIR node or something
> +        // funky like that?

IonBuilder has to add MPostWriteBarrier

@@ +6529,5 @@
>      masm.assumeUnreachable("Result in ArgumentObject shouldn't be JSVAL_TYPE_MAGIC.");
>      masm.bind(&success);
>  #endif
>      masm.storeValue(value, argAddr);
> +    // Question: why no post barrier needed?

IonBuilder adds the barrier here: http://searchfox.org/mozilla-central/rev/fdb811340ac4e6b93703d72ee99217f3b1d250ac/js/src/jit/IonBuilder.cpp#12122-12123

@@ +7886,5 @@
>      masm.newGCString(output, temp3, &failure);
>  
> +    // This stub does not barrier tenured -> nursery pointers. Fail now if the
> +    // preceding allocation was in tenured space and either of the input
> +    // strings is in the nursery.

Maybe newGCString should only do nursery allocation and fail if we don't succeed... Then we know we don't need any barriers here and in similar code.

::: js/src/jit/x64/MacroAssembler-x64.cpp
@@ +668,5 @@
> +
> +    Label done, checkAddress;
> +    bool testNursery = (cond == Assembler::Equal);
> +    branchTestObject(Assembler::Equal, value, testNursery ? &checkAddress : &done);
> +    branchTestString(Assembler::NotEqual, value, testNursery ? &done : label);

We can optimize this a bit to avoid splitting the tag twice on 64-bit:

  Register tag = masm.splitTagForTest(value);

And then testing |tag|.

@@ +671,5 @@
> +    branchTestObject(Assembler::Equal, value, testNursery ? &checkAddress : &done);
> +    branchTestString(Assembler::NotEqual, value, testNursery ? &done : label);
> +    bind(&checkAddress);
> +
> +    extractObject(value, temp);

On 32-bit platforms this returns value.payloadReg(), so you could do:

  Register obj = extractObject(value, temp);
  ..use obj..

Or do unboxNonDouble(value, temp), but that's an extra move on 32-bit.
Attachment #8881835 - Flags: feedback?(jdemooij) → feedback+
Attachment #8881599 - Flags: review?(jimb) → review+
Attached patch Strings in the nursery: MIR node (obsolete) — Splinter Review
Add a post-write barrier to MStoreUnboxedString.
Attached patch Strings in the nursery: JIT (obsolete) — Splinter Review
Updated version. Has a little bit of debugging goop mixed in (rrbreakpoint) that requires a patch I haven't uploaded. Contains all JIT support except it doesn't have a MIR node for postbarriering.
Attachment #8881835 - Attachment is obsolete: true
This is mostly green on try, though I have several browser failures (and an arm64 failure, but that should be easy). So it's time to start posting these patches for review.

I have a bunch of spot fixes to various things, and for the most part I've been rolling them into the other patches, but I'm going to split out a few that feel separate for one reason or another.
Attachment #8891429 - Flags: review?(jcoppeard)
Note that this patch would need to land with the JIT changes.
Attachment #8891441 - Flags: review?(jcoppeard)
Attached patch Strings in the nursery: MIR node (obsolete) — Splinter Review
Sorry, I left a couple of FIXME comments in the code. If you check them out and things look ok, I'll just delete them.
Attachment #8891442 - Flags: review?(jdemooij)
Attachment #8888584 - Attachment is obsolete: true
Attached patch Strings in the nursery: JIT (obsolete) — Splinter Review
This is seeming close enough to post for review.

Latest full try push is

https://treeherder.mozilla.org/#/jobs?repo=try&revision=37220b1b9fef24abefb44ee3f7fb2e9dae8931d1&filter-tier=1&filter-tier=2&filter-tier=3
Attachment #8891470 - Flags: review?(jdemooij)
Attachment #8888642 - Attachment is obsolete: true
I don't know how important this cache is. Perhaps I should rekey nursery entries instead?

Please pardon the refactoring mixed in. The only change to behavior should be that the eval cache discards all of its entries that are keyed on nursery-allocated strings. And I'm somewhat abusing the GCHashSet sweep() mechanism for this. The entire cache gets thrown out at the beginning of every major GC, so it doesn't need sweep() for that purpose.
Attachment #8891587 - Flags: review?(jcoppeard)
Just curious, do you have benchmark score numbers before and after to share from these patches?  Thanks!
Ugh, that was painful. I tracked down a thread access assertion to a case where BindingUtils.h in a worker runtime is requesting the zone of a main runtime string (in order to decide whether to wrap it.) The string is cross-runtime because it turns out to be a permanent atom from the main runtime.

Previously, this worked because JS::GetStringZone went through js::gc::detail::GetGCThingZone(uintptr_t(string)), which only does bit masking to read the zone out of the arena, bypassing all assertions. I had switched this to call JSString::zone(), which checks for nursery allocation, but also verifies that the current thread is allowed to access the zone. That latter check was failing, which also made me realize I had un-inlined this zone retrieval for external callers, largely defeating the purpose of the checks. (BindingUtils does this to avoid making a function call to wrap the string if it doesn't need it; here, I was doing a function call just to check whether we need to make a function call!)
Attachment #8891821 - Flags: review?(jcoppeard)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #50)
> Just curious, do you have benchmark score numbers before and after to share
> from these patches?  Thanks!

I'm just now starting to look at timings. Here are some basic numbers:

short nursery-only string allocation (very short-lived, small strings): 94 ns -> 64 ns
short nursery -> tenured allocation: 880 ns -> 870 ns

That last one doesn't really make sense. It's allocating strings into the nursery, but storing them on an object so they'll always end up getting tenured. It should be slower, since we're doing extra work -- we first allocate in the nursery, then copy into the tenured area.

Looking at longer strings that require malloc allocation:

long string nursery-only allocation: 80 ns -> 100 ns
long string nursery -> tenured: 820 ns -> 800 ns

Hm... I'm creating the strings by grabbing an existing (probably tenured) string and appending a character to it. That's going to create a rope. I'm not sure what the simplest way is to create a longer non-rope string (I can create the rope and then flatten it.) I think anything that shows up in the source is going to get atomized.

But still, those results make no sense. The nursery strings are getting slower, the tenured ones are getting faster, which is exactly the opposite of what should happen. :-( So I will need to look more closely at what's going on.
(In reply to Steve Fink [:sfink] [:s:] from comment #49)
> I don't know how important this cache is.

It's not important. Btw I added a FunctionToStringCache last week, we'll have to purge that one too on minor GC.
Comment on attachment 8891429 [details] [diff] [review]
External strings can morph into plain ones while still in EXTERNAL_STRING arena

Review of attachment 8891429 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/String.h
@@ +573,5 @@
> +#if DEBUG
> +        if (isTenured()) {
> +            // Normally, the kinds should match, but an EXTERNAL_STRING arena
> +            // may contain strings that have been flattened, and thus
> +            // isExternal() will be false.

Since this is surprising behaviour, can you be a little more explicit and say something like "previously external strings that have been flattened" and provide a reference to where we do this?
Attachment #8891429 - Flags: review?(jcoppeard) → review+
Attachment #8891430 - Flags: review?(jcoppeard) → review+
Attachment #8891441 - Flags: review?(jcoppeard) → review+
Attachment #8891587 - Flags: review?(jcoppeard) → review+
Attachment #8891821 - Flags: review?(jcoppeard) → review+
Comment on attachment 8891442 [details] [diff] [review]
Strings in the nursery: MIR node

Review of attachment 8891442 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense. A suggestion to simplify the type policy below; should be good then.

::: js/src/jit/IonBuilder.cpp
@@ +13538,5 @@
> +        // Defer to the type policy because I have minimal understanding of
> +        // this code that I am mindlessly changing, so will stick as close as
> +        // possible to postbarriered objects. For all I know, it is guaranteed
> +        // that we will require a postbarrier here, and could just do it
> +        // directly. (FIXME)

It's probably best to refer to the previous comment and say the same applies to ToString inserted by the StoreUnboxedString type policy, or something.

::: js/src/jit/TypePolicy.cpp
@@ +1117,5 @@
> +    MStoreUnboxedString* store = ins->toStoreUnboxedString();
> +
> +    MOZ_ASSERT(store->typedObj()->type() == MIRType::Object);
> +
> +    MDefinition* value = store->value();

I think the ConvertToStringPolicy<2>::staticAdjustInputs call above should have ensured value->type() == MIRType::String. Considering that I think we can simplify the code below to just:

  MOZ_ASSERT(value->type() == MIRType::String);
  MInstruction* barrier = MPostWriteBarrier::New(alloc, store->typedObj(), value);
  store->block()->insertBefore(store, barrier);
  return true;
Attachment #8891442 - Flags: review?(jdemooij)
Comment on attachment 8891470 [details] [diff] [review]
Strings in the nursery: JIT

Review of attachment 8891470 [details] [diff] [review]:
-----------------------------------------------------------------

This is excellent. Most of my comments are just nits but I'd like to take a second look in case I missed something the first time.

::: js/src/jit/CodeGenerator.cpp
@@ +1206,5 @@
> +    Label exit;
> +    Label checkRemove, putCell;
> +
> +    // if (next && (buffer = next->storeBuffer()))
> +    masm.branchPtr(Assembler::Equal, next, ImmWord(0), &checkRemove);

This is okay but I think all callers know the pointer is non-null?

@@ +1211,5 @@
> +    Register storebuffer = next;
> +    masm.loadStoreBuffer(next, storebuffer);
> +    masm.branchPtr(Assembler::Equal, storebuffer, ImmWord(0), &checkRemove);
> +
> +    // if (prev && prev->storeBuffer())

What do you think about using the whole-cell buffer here (jit::PostWriteBarrier)? We don't match the C++ barrier code elsewhere and I'm not sure we need to do that here... We emit quite a lot of JIT code here, not a problem for the current callers because they are trampolines that are shared, but it could become a problem if we use this for random LIR instructions.

@@ +1501,2 @@
>      masm.storePtr(temp3, lazySourceAddress);
> +    EmitPostWriteBarrierS(masm, temp1, RegExpStatics::offsetOfLazySource(),

I don't think this needs a post barrier because we're storing an atom, RegExpShared::source?

@@ +3920,5 @@
> +            MOZ_ASSERT(lir->mir()->value()->type() == MIRType::Object);
> +    } else {
> +        MOZ_ASSERT(nurseryType == MIRType::String);
> +        MOZ_ASSERT(lir->mir()->value()->type() == MIRType::String);
> +        masm.branchPtrInNurseryChunk(Assembler::Equal, value, temp, ool->entry());

Can remove this line (it's the same as the code after the if-else).

@@ +3946,5 @@
>  
>      maybeEmitGlobalBarrierCheck(lir->object(), ool);
>  
>      ValueOperand value = ToValue(lir, LPostBarrierType::Input);
> +    masm.branchValueIsNurseryCell(Assembler::Equal, value, temp, ool->entry());

Please file a follow-up bug to optimize this. Often we know the value might be an object but it's definitely not a string, or similarly it might be a string but it's definitely not an object. (I'd be happy to take that bug.)

::: js/src/jit/MacroAssembler.cpp
@@ +959,5 @@
> +    int thingSize = int(gc::Arena::thingSize(allocKind));
> +    int totalSize = js::Nursery::stringHeaderSize() + thingSize;
> +    MOZ_ASSERT(totalSize % gc::CellAlignBytes == 0);
> +
> +    // The nursery position (allocation pointer) and the nursery end are stored

Do you think it makes sense to merge this with MacroAssembler::nurseryAllocate (your version has some clever optimizations)? If so please file a follow-up bug. Also, we should rename nurseryAllocate to nurseryAllocateObject.

@@ +987,5 @@
> +        branchPtr(Assembler::Below, Address(temp, nurseryEndAddr - nurseryPosAddr), result, fail);
> +        storePtr(result, Address(temp, 0));
> +        movePtr(ImmPtr(zone), temp);
> +        subPtr(Imm32(thingSize), result);
> +        storePtr(temp, Address(result, -js::Nursery::stringHeaderSize()));

we can use storePtr(ImmPtr(zone), ...) and remove the movePtr

::: js/src/jit/arm/MacroAssembler-arm.cpp
@@ +5385,5 @@
>  
>  void
> +MacroAssembler::loadStoreBuffer(Register ptr, Register buffer)
> +{
> +    ma_lsr(Imm32(gc::ChunkShift), ptr, buffer);

Do you know why we use a different implementation of this on ARM32? If it was not for that, we could move this to MacroAssembler.cpp... Maybe worth another follow-up.

::: js/src/jit/arm64/MacroAssembler-arm64.cpp
@@ +875,5 @@
> +    branchTestObject(Assembler::Equal, value, &checkAddress);
> +    branchTestString(Assembler::NotEqual, value, testNursery ? &done : label);
> +    bind(&checkAddress);
> +
> +    extractCell(value, temp);

Nit: the x64 code uses unboxNonDouble here IIRC. Please compare the platform-specific files and get rid of the differences where possible.
Attachment #8891470 - Flags: review?(jdemooij) → feedback+
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #50)
> Just curious, do you have benchmark score numbers before and after to share
> from these patches?  Thanks!

I don't know how stable these numbers on, they are from a pair of runs on my laptop of speedometer v2 from sm.duct.io with and without nursery strings. The arithmetic mean goes from 430.8 -> 420.4.

Iteration  1:  154.6 ms ->  157.4 ms =   +2.8 ms ( +1.8%)
Iteration  2:  188.9 ms ->  189.6 ms =   +0.7 ms ( +0.4%)
Iteration  3:  193.6 ms ->  189.2 ms =   -4.4 ms ( -2.3%)
Iteration  4:  508.2 ms ->  490.8 ms =  -17.4 ms ( -3.4%)
Iteration  5:  882.6 ms ->  759.7 ms = -122.9 ms (-13.9%)
Iteration  6: 1142.0 ms -> 1117.0 ms =  -25.0 ms ( -2.2%)
Iteration  7:  256.5 ms ->  259.8 ms =   +3.3 ms ( +1.3%)
Iteration  8:  547.6 ms ->  553.5 ms =   +5.9 ms ( +1.1%)
Iteration  9:  185.4 ms ->  186.3 ms =   +0.9 ms ( +0.5%)
Iteration 10:  124.7 ms ->  124.5 ms =   -0.2 ms ( -0.2%)
Iteration 11:  795.7 ms ->  771.0 ms =  -24.7 ms ( -3.1%)
Iteration 12:   75.9 ms ->   75.9 ms =   -0.0 ms ( -0.0%)
Iteration 13:  493.6 ms ->  501.3 ms =   +7.7 ms ( +1.6%)
Iteration 14:  638.8 ms ->  645.6 ms =   +6.8 ms ( +1.1%)
Iteration 15:  273.1 ms ->  284.3 ms =  +11.2 ms ( +4.1%)
Blocks: 1386094
(In reply to Jan de Mooij [:jandem] from comment #56)
> Comment on attachment 8891470 [details] [diff] [review]
> Strings in the nursery: JIT
> 
> Review of attachment 8891470 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is excellent. Most of my comments are just nits but I'd like to take a
> second look in case I missed something the first time.
> 
> ::: js/src/jit/CodeGenerator.cpp
> @@ +1206,5 @@
> > +    Label exit;
> > +    Label checkRemove, putCell;
> > +
> > +    // if (next && (buffer = next->storeBuffer()))
> > +    masm.branchPtr(Assembler::Equal, next, ImmWord(0), &checkRemove);
> 
> This is okay but I think all callers know the pointer is non-null?

Oh yeah, you're right. Definitely better to skip it, thanks.

> @@ +1211,5 @@
> > +    Register storebuffer = next;
> > +    masm.loadStoreBuffer(next, storebuffer);
> > +    masm.branchPtr(Assembler::Equal, storebuffer, ImmWord(0), &checkRemove);
> > +
> > +    // if (prev && prev->storeBuffer())
> 
> What do you think about using the whole-cell buffer here
> (jit::PostWriteBarrier)? We don't match the C++ barrier code elsewhere and
> I'm not sure we need to do that here... We emit quite a lot of JIT code
> here, not a problem for the current callers because they are trampolines
> that are shared, but it could become a problem if we use this for random LIR
> instructions.

Huh. I didn't even consider that; I was just translating the C++ code.

I should time all the options -- current code, unconditional whole-cell, whole-cell conditional on the next value, whole-cell conditional on both next and prev values. I don't know how expensive these calls are, so I'm not sure how much code I should spend on avoiding them. Or maybe none of this matters.

> @@ +1501,2 @@
> >      masm.storePtr(temp3, lazySourceAddress);
> > +    EmitPostWriteBarrierS(masm, temp1, RegExpStatics::offsetOfLazySource(),
> 
> I don't think this needs a post barrier because we're storing an atom,
> RegExpShared::source?

Ah! Thanks, I missed that.

> 
> @@ +3920,5 @@
> > +            MOZ_ASSERT(lir->mir()->value()->type() == MIRType::Object);
> > +    } else {
> > +        MOZ_ASSERT(nurseryType == MIRType::String);
> > +        MOZ_ASSERT(lir->mir()->value()->type() == MIRType::String);
> > +        masm.branchPtrInNurseryChunk(Assembler::Equal, value, temp, ool->entry());
> 
> Can remove this line (it's the same as the code after the if-else).

Oops, you're right. I had these as separate functions until very recently, when I combined them. And in retrospect, I'm not sure why I did the funky template parameter thing -- seems like passing in a regular parameter would be fine, it's not like this affects the generated code. Do you have a preference?

> @@ +3946,5 @@
> >  
> >      maybeEmitGlobalBarrierCheck(lir->object(), ool);
> >  
> >      ValueOperand value = ToValue(lir, LPostBarrierType::Input);
> > +    masm.branchValueIsNurseryCell(Assembler::Equal, value, temp, ool->entry());
> 
> Please file a follow-up bug to optimize this. Often we know the value might
> be an object but it's definitely not a string, or similarly it might be a
> string but it's definitely not an object. (I'd be happy to take that bug.)

bug 1386094, added a comment

> ::: js/src/jit/MacroAssembler.cpp
> @@ +959,5 @@
> > +    int thingSize = int(gc::Arena::thingSize(allocKind));
> > +    int totalSize = js::Nursery::stringHeaderSize() + thingSize;
> > +    MOZ_ASSERT(totalSize % gc::CellAlignBytes == 0);
> > +
> > +    // The nursery position (allocation pointer) and the nursery end are stored
> 
> Do you think it makes sense to merge this with
> MacroAssembler::nurseryAllocate (your version has some clever
> optimizations)? If so please file a follow-up bug. Also, we should rename
> nurseryAllocate to nurseryAllocateObject.

I wasn't sure if those optimizations were worth keeping, honestly -- it made sense when I was just blindly assuming that it would always fit within a 32-bit offset, but one of my last test failures was due to the allocations being way too far apart. I thought I probably ought to rip it out, but I guess I thought maybe it might be possible to ensure that the structures would be close, and then the only code would be the small-offset one. Since neither path seemed a clear winner in the end, I kept both. :( The "optimization" seems like a pretty minor win to be worth the complexity, but I don't know how to judge. (It mostly just bothered me to see all the huge 64-bit pointer loads in my initial version, which had three of them.)

Other than that, there's not much to be merged. Objects have to deal with the slots pointer and failing if there are too many slots. Strings have to store the zone pointer. The size computation is different.

Anyway, filed bug 1386101.

> @@ +987,5 @@
> > +        branchPtr(Assembler::Below, Address(temp, nurseryEndAddr - nurseryPosAddr), result, fail);
> > +        storePtr(result, Address(temp, 0));
> > +        movePtr(ImmPtr(zone), temp);
> > +        subPtr(Imm32(thingSize), result);
> > +        storePtr(temp, Address(result, -js::Nursery::stringHeaderSize()));
> 
> we can use storePtr(ImmPtr(zone), ...) and remove the movePtr

Oh! I can never quite get the hang of what's allowed.

Hm. And in looking at this code again, I'm not sure why I don't store the Zone* before backing up result by thingSize. I wouldn't need stringHeaderSize() in that case. But I guess it probably makes zero difference in practice.

> 
> ::: js/src/jit/arm/MacroAssembler-arm.cpp
> @@ +5385,5 @@
> >  
> >  void
> > +MacroAssembler::loadStoreBuffer(Register ptr, Register buffer)
> > +{
> > +    ma_lsr(Imm32(gc::ChunkShift), ptr, buffer);
> 
> Do you know why we use a different implementation of this on ARM32? If it
> was not for that, we could move this to MacroAssembler.cpp... Maybe worth
> another follow-up.

I don't know for sure, but kind of remember Marty telling me once how ARM is really good at rotating and shifting registers. It seems a little crazy that a pair of shifts would be faster than a mask... oh, unless the problem is that the mask would be an immediate, and immediates are expensive and annoying on ARM? Oh yeah, I think that's it. And I just looked it up on stack overflow too; https://stackoverflow.com/questions/347889/how-to-mask-bytes-in-arm-assembly touches on some of the craziness.

ARM seems to love bit shifting; looks like your immediates can be shifted. But I guess that doesn't help much with a pattern like 0x000fffff. You could do 0x001000000 - 1, but that comes to 3 instructions.

Should I file a bug on adding MaskLowBits or something like it to MacroAssembler.cpp?

> ::: js/src/jit/arm64/MacroAssembler-arm64.cpp
> @@ +875,5 @@
> > +    branchTestObject(Assembler::Equal, value, &checkAddress);
> > +    branchTestString(Assembler::NotEqual, value, testNursery ? &done : label);
> > +    bind(&checkAddress);
> > +
> > +    extractCell(value, temp);
> 
> Nit: the x64 code uses unboxNonDouble here IIRC. Please compare the
> platform-specific files and get rid of the differences where possible.

Ok. I think I was trying to match the extractObject in the branchValueIsNurseryObjectImpl. But that's silly; I had to add a bunch of random *Cell functions just because of this. I'm not sure why unboxObject is special.
Attached patch Strings in the nursery: MIR node (obsolete) — Splinter Review
That's simpler, thanks.
Attachment #8892246 - Flags: review?(jdemooij)
Attachment #8891442 - Attachment is obsolete: true
Attached patch Strings in the nursery: JIT (obsolete) — Splinter Review
I think I got through all of the review comments. Please let me know anything else you spot.
Attachment #8892247 - Flags: review?(jdemooij)
Attachment #8891470 - Attachment is obsolete: true
smaug tried a PGO build on speedometer, and didn't see any difference. I tried speedometer with the benj.me site, and unless I'm getting numbers backwards, it seemed to get worse with this patch. :(

I've started looking into perf. I have some microbenchmarks giving puzzling numbers (see comment 52). I'm looking at Octane now, which shows some pretty major shifts. I checked my base revision against just the "safe" patches that shouldn't be slowing anything down much at all (eg nothing actually gets nursery allocated), and I'm seeing substantial regressions in Splay and Typescript (10% and 7%). So it looks like I have some things to look into.
FWIW, Speedometer seems to be rather minorGC heavy, triggering it very often. If nursery strings make use use minorGC even more, I wouldn't be surprised of some regressions there.
(but I got pretty much exactly the same score with or without the patch on QF hardware.)
Oh, but the basic current summary for Octane is: major hit on Splay and SplayLatency, moderate hit on PdfJS, large win on RegExp. I have some work to do.
(In reply to Steve Fink [:sfink] [:s:] from comment #63)
> Oh, but the basic current summary for Octane is: major hit on Splay and
> SplayLatency, moderate hit on PdfJS, large win on RegExp. I have some work
> to do.

How does it affect the total score? How large is the RegExp win?

The Splay issue is probably what I ran into in comment 1. At least the regression seems to be a bit smaller this time...
Arbitrary example runs, the base inbound revision -> nursery strings:

          Box2D:  53950 ->  54816 =   +866 ( +1.6%)
       CodeLoad:  22597 ->  22877 =   +280 ( +1.2%)
         Crypto:  33237 ->  32914 =   -323 ( -1.0%)
      DeltaBlue:  58395 ->  56280 =  -2115 ( -3.6%)
    EarleyBoyer:  28466 ->  27366 =  -1100 ( -3.9%)
        Gameboy:  60222 ->  64243 =  +4021 ( +6.7%)
       Mandreel:  30574 ->  30313 =   -261 ( -0.9%)
MandreelLatency:  37767 ->  39565 =  +1798 ( +4.8%)
   NavierStokes:  33142 ->  32319 =   -823 ( -2.5%)
          PdfJS:  19660 ->  16663 =  -2997 (-15.2%)
       RayTrace: 113736 -> 113070 =   -666 ( -0.6%)
         RegExp:   4814 ->   6351 =  +1537 (+31.9%)
       Richards:  32030 ->  31447 =   -583 ( -1.8%)
          Splay:  22434 ->  11600 = -10834 (-48.3%)
   SplayLatency:  21923 ->   7144 = -14779 (-67.4%)
     Typescript:  33750 ->  30948 =  -2802 ( -8.3%)
           zlib:  85755 ->  84992 =   -763 ( -0.9%)

          Score:  33301 ->  30010 =  -3291 ( -9.9%)

Depending on the runs I look at, the overall score drops 8-10%. Splay drops 30-50%, SplayLatency drops about 65% (the variance of its regression is much lower than Splay, oddly enough.)

Hm, that's odd. For your numbers in comment 1:

       Splay:  16437 ->   5167 = -11270 (-68.6%)
SplayLatency:  10530 ->   7699 =  -2831 (-26.9%)

So you saw a larger drop on Splay, a smaller drop on SplayLatency.

Looking closer at GC times, my SplayLatency loss makes sense. With the base revision, all of the major GCs take roughly the same time, in the range 30-50 ms. With nursery strings, they vary wildly, from 42-108 ms. (The times are higher because with nursery strings, I get fewer but slower major GCs.) Hm... I could experiment with a smaller nursery, just to see.
Dropping nursery size from 16MB -> 1MB boosts the SplayLatency score somewhat, unsurprisingly, but it blows the number of minor GCs way up. The time shifts from major GCs to minor GCs, with the total getting slightly worse.

The variance in the SplayLatency major GCs is a result of allocating lots of GC bytes, which triggers a nonincremental GC. And that ends up waiting for a long time on background sweeping.

More specifically, as splay runs it fills up the whole cell store buffer, triggering a minor GC. But the way that splay works, 73% of the nursery gets tenured. (It is known that splay doesn't at all follow the generational hypothesis.) That triggers a huge wave of tenured allocations, and when that exceeds the major GC trigger threshold, it easily overwhelms the nonincremental threshold as well. And in a nonincremental GC, we set up "background" sweeping (in this case, really just parallel sweeping) and then wait for it to complete, producing the large pause.

If we triggered an incremental collection, we would at least have some mutator time to overlap with the sweeping. Though given the benchmark that it is, that mutator execution will involve allocating many *more* nursery strings, so I'm not sure if we can keep ahead of it. I tried changing the thresholds so that an incremental GC would be triggered at 20% of the max size, and it didn't seem to do anything. (But I don't think it mattered; because this benchmark runs to completion, that threshold is only checked when the zone has already been triggered by something else.) I should also note that in the incremental GCs, I see a long gap before the final slice -- we get a slice every 12ms or so, but then get a few hundred ms gap before the final slice. That may mean that it's polling the background tasks and finding that they're not done yet, so skipping the slice? Hm... no, if I put a printout whenever it polls the bg sweep, it only fires 4 times. 

These slices appear to be scheduled off of GC pressure (ALLOC_TRIGGER or filling up the whole cell buffer). So this is a little weird, in that an interactive web page would have more GC opportunities. But it still seems like there may be a legitimate problem here.
In case it's useful, I learned a bit about what v8 does:  They started by adding pretenuring to avoid the cost of copying from the nursery into the tenured generation, mostly motivated by splay.  However, when measuring the overall effect of pretenuring on speedometer-type real-world workloads, they found that pretenuring ended up being too aggressive, i.e., pretenuring a lot of sites early on that later produced young garbage (unnecessarily causing many full gcs).  So they switched (or are in the process of switching) to a more dynamic scheme wherein, when the nursery is mostly full of live objects, it is handed over to be part of the old generation (avoiding the copy) and a new nursery is used.
(In reply to Luke Wagner [:luke] from comment #67)
> In case it's useful, I learned a bit about what v8 does:  They started by

Whoa, thanks! I was (mostly idly) speculating about doing exactly that. It's pretty messy, though; our nursery layout is not the same as our tenured layout. (Things aren't split up by zone or AllocKind, and strings have extra data prefixed to them in the nursery.) But if we wanted to specifically fix splay, we could consider having a string-only nursery that was easier to graft onto the tenured heap.

My windows PGO build failed for some strange infrastructure-looking reason, but the Talos results for Linux PGO are actually looking pretty good to me:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&newProject=try&newRevision=dadd4462dc1aeb7f599a89503d123390b4a3bb00&framework=1&showOnlyImportant=0&selectedTimeRange=172800
Belay that. I'm guessing that is comparing a PGO build to a non-PGO build. :-( I was using the old instructions for generating a pgo build, and I suspect they confused the talos comparison thing?
In Ion we could try to pretenure strings that are written to objects that are allocated in the tenured heap; that might be good enough for the Splay case. I can try something if you have a combined patch.
Here's a rollup patch. It has various other minor things wrapped up in it, but nothing that should have much of an effect.
While I was attempting to make a version that dynamically disables nursery strings, in hopes that I could land them switched off for now, I encountered a problem where it was mistakenly tenuring a forwarded object as if it were a string, because the Relocated pattern 0xbad0bad1 has the low bit set, which when you're in the nursery is used to indicate that the cell is a string rather than an object.

So I added an explicit IsForwarded check, which required reshuffling a bunch of stuff around to have it available and inlined where I needed it. It turns out that my try pushes with nursery strings on didn't break, because tracing a nursery string to tenure it does *almost* the same thing as tenuring an object.

But what's really weird is that I did this in the context of trying to track down why the disabled nursery strings were spending quite a bit more time in minor GCs (all the time is when tracing the whole cell store buffers, and they trace about the same number of cells.) I expected this added check to slow it down further, but instead it completely eliminated the difference. (There's still a difference once I add in the JIT support, even disabled, which I am still trying to figure out.) I can see why it wouldn't change perf by much, since the IsForwarded check is dereferencing the same memory address that is going to be needed to distinguish strings vs objects anyway. Though it then does an additional memory load if it does turn out to be forwarded. I don't have a good explanation for why it speeds things (back) up.

Hm... perhaps it's a branch prediction thing? With the buggy 0xbad0bad1 check, sometimes you end up running the string tenuring code, sometimes the object tenuring code. With it fixed, you should always do the object tenuring (since there are no nursery strings to tenure.)
Attachment #8894118 - Flags: review?(jcoppeard)
Here's the patch that adds the option to enable or disable nursery strings.
Attachment #8894119 - Flags: review?(jcoppeard)
Comment on attachment 8894118 [details] [diff] [review]
Handle forwarded cells in Cell::getTraceKind

Review of attachment 8894118 [details] [diff] [review]:
-----------------------------------------------------------------

> I encountered a problem where it was mistakenly tenuring a forwarded object as if it were a string, because the Relocated pattern 0xbad0bad1 has the low bit set, which when you're in the nursery is used to indicate that the cell is a string rather than an object.

It seems bad that we allow this possibility.  Is it possible to change the relocation overlay format so that it's still possible to distinguish nursery strings/objects after they have been forwarded?

One possibility might be to write the magic value at cell + 4 and leave the first 32 bits alone.  We don't support systems which allocate memory with the high bits because of the Value format so writing this over the top half of a 64 bit ObjectGroup* will still result in a bad pointer.  We should make sure it represents and invalid string length too and assert this somewhere.
Attachment #8894118 - Flags: review?(jcoppeard)
Comment on attachment 8894119 [details] [diff] [review]
Add ability to disable nursery strings

Review of attachment 8894119 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsapi.cpp
@@ +7698,5 @@
> +JS_PUBLIC_API(void)
> +JS::EnableNurseryStrings(JSContext* cx)
> +{
> +    ReleaseAllJITCode(cx->runtime()->defaultFreeOp());
> +    cx->runtime()->gc.nursery().enableStrings();

enableStrings() asserts the nursery is empty, so do we need an AutoEmptyNursery here too?
Attachment #8894119 - Flags: review?(jcoppeard) → review+
Unfortunately, the results are not looking good. After two months of working on this, I'm coming to pretty much the same conclusion that Jan did after a day's worth of implementation.

Speedometer seems unaffected by this. If some of the slowdowns were fixed, eg by figuring out a way to pretenure strings, then perhaps this could have a positive effect. I can't figure out how to get useful results for try pushes on arewefastyet, and my reference hardware needs to be reimaged, but smaug gave me results for an earlier version of this patch stack.

Talos regresses: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=479b80d39f537dd93927a6150a39cc4907c23f22&framework=1&selectedTimeRange=172800

Octane takes a big hit, mostly in Splay and SplayLatency.

So with the current implementation, this is not worth landing.

To possibly make it landable, I experimented with dynamically disabling it to see if I could remove the regression when it was compiled in but disabled. So far, I have only really looked at Splay/SplayLatency on Octane, since I was getting a clear slowdown there that was easy to reproduce and bisect. After some fixes, Octane is now unaffected by having nursery strings compiled in but disabled. Unfortunately, Talos is still slightly negative: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=38bfe1e2638dbc7026f937fabb6e4bacc5843f70&framework=1&selectedTimeRange=172800 (it's not surprising that it's somewhat negative, given that it is doing strictly more work even though there are no strings in the nursery). So I don't think I can land it disabled either. Even #ifdef'ed out, I think we would need some evidence for a clear win before accepting the complexity.

I have run out of time to spend on this without a clearer indication of benefit, because other projects need attention to avoid blocking other people. I will suggest avenues for further exploration:

For removing the overhead when this is compiled in but disabled:

It should be possible to undo all the JIT code generation changes when nursery strings are disabled. It means a runtime check everywhere different code is compiled, but if we cache the result on the masm or somewhere (to avoid a TLS lookup for the JitContext), I think this should be essentially free. There is other non-JIT overhead that would be compiled in, hopefully the branch predictor can take care of most of it. (I didn't need to do this while focusing on Octane, since I was able regain full speed by disabling only a small portion of the JIT changes.)

One known issue is that I believe there is a bug in pretenuring -- I think the pretenuring decision is now based on the promotion rate of both strings and objects, even though only objects can be pretenured. This should be fixed to pretenure objects based on the object-only promotion rate. (The specific ObjectGroups to pretenure shouldn't be affected, since they are based on a hardcoded count of 3000; this is about whether to even check for pretenuring when collecting a full nursery.)

For making nursery strings enabled be faster:

- pretenuring for strings. This should be much easier than I first thought, with the insight that we only need to track allocation site info for nursery strings, not tenured strings, and we can prepend an additional pointer or id (along with the Zone*, already present) to nursery strings. We would need to:

   - determine a static ID of the allocation site to compile into the JIT (and probably during VM allocations as well)
   - count the tenurings for each allocation ID
   - record a set of allocation IDs to pretenure
   - have the JIT check against those, and either
     - add additional barriers for pretenurable nursery allocations (some of these would be avoidable, if we can propagate pretenuring to the stored string allocation)
     - or bail out to a slow path when barriers would be required
   - (optional but probably needed for Splay) have a trigger mechanism to invalidate JIT code when a string becomes pretenured (rather than waiting for the next code-discarding GC)

Unless I am missing something, it seems like a feasible project, but I can't predict whether it would make the difference. It *should* address most of the Splay and SplayLatency regressions, for slightly different reasons. I'm overdue on some other things, so I can't even try this out right now. Somebody like jandem could probably implement this far faster than I am capable of.

- switch strings in the JIT to the whole cell storebuffer instead of using the more precise barriers. Insertions should be faster, code would be smaller. On the other hand, some of the slowdown I have seen has been from tracing the whole cell buffer, so I cannot predict the final outcome.

- Tracing the whole cell buffer is slower when both string and object properties must be considered, since there's a dynamic switch between them that probably is poorly predicted. So there's a 3rd option, which is to have separate whole cell buffers for objects and strings, and if a container object is in one and not the other, call a specialized trace that only looks at the appropriate properties. This is a micro-optimization, though.

I don't have the newest patches posted, but https://hg.mozilla.org/try/rev/0bcf65f050796d2fc0bb4e67a604a072c85f8400 has all of them.
Steve, what if we add a per-zone pre-tenuring mechanism? So when a Zone tenures a lot of strings in a single nursery GC, we turn off nursery allocation of strings for that particular Zone. I implemented this on top of your patches and it fixes Octane-splay. This does not hit on Speedometer or when browsing normal websites. I know it's very crude but since you're so close it would be really unfortunate to let these patches bitrot.

I've also been profiling this a bit and there are some perf issues (non-inlined string post barriers in JSRope::flattenInternal etc) that are easy to fix. IMO it's still worth getting this landed; I'm happy to help with that.
Attached file Steve's patches (obsolete) —
Steve's patches from the Try link in comment 76, in case the repo is reset again.
(In reply to Jan de Mooij [:jandem] from comment #77)
> Steve, what if we add a per-zone pre-tenuring mechanism? So when a Zone
> tenures a lot of strings in a single nursery GC, we turn off nursery
> allocation of strings for that particular Zone. I implemented this on top of
> your patches and it fixes Octane-splay. This does not hit on Speedometer or
> when browsing normal websites. I know it's very crude but since you're so
> close it would be really unfortunate to let these patches bitrot.

That seems like a great idea, especially since a Zone granularity is relatively easy and makes sense to do.

I forgot to mention one other idea for improving this - there should probably be separate nurseries for strings and objects. That would make it trivial to fix the object pretenuring bug (there's no straightforward way to know what percentage of just objects were tenured right now, because we don't have counts for anything; we estimate based on the pre and post nursery size).   It might also be the right way to fix the root cause of the SplayLatency regression -- pretenuring can fix SplayLatency itself, but it seems very plausible that the same underlying behavior would show up in a lot of other places.

When you nursery-allocate a bunch of strings that end up getting tenured, you end up allocating a huge number of GCBytes all at once -- plenty enough to overrun our incremental GC trigger threshold and instead trigger a nonincremental GC. And just preventing tenuring from triggering nonincremental GCs seems problematic; if most of your tenured allocations are coming via the nursery, then that's exactly the mechanism needed to respond to high allocation pressure.

The straightforward way to fix it would be to shrink the nursery, so that not as many strings can pile up. I tested Octane with a 1MB nursery instead of the default 16MB, and it indeed fixes SplayLatency, at the cost of slowing other stuff down. The problem is that you really want a larger nursery for objects. And since we use ChunkTrailers to identify nursery cells, there's no need to keep the nurseries in the same memory region or anything. I think it would be a pretty easy change to separate them out.

I also have a handwavy argument that whether or not a string is going to be tenured is generally tied to whether the object it gets stored into is tenured; there shouldn't be very many long-lived strings that float around without getting stored in an object somewhere. So a small nursery for strings + a larger nursery for objects seems promising. If strings are getting allocated at a high rate, this runs the risk of excessive object tenuring, but your Zone pretenuring should fix that.

> I've also been profiling this a bit and there are some perf issues
> (non-inlined string post barriers in JSRope::flattenInternal etc) that are
> easy to fix. IMO it's still worth getting this landed; I'm happy to help
> with that.

The latest round of measurements has made things look a bit more favorable (especially with the strings disabled). If your pretenuring plus a few more fixes were enough to get this in with a net positive effect, it would be awesome. Can you do (or tell me how to do) an awfy try push? Most of mine seem to fail for reasons I don't understand. I've gotten a speedometer v1 run out of it, but no v2.
(In reply to Jon Coppeard (:jonco) from comment #74)
> Comment on attachment 8894118 [details] [diff] [review]
> Handle forwarded cells in Cell::getTraceKind
> 
> Review of attachment 8894118 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> > I encountered a problem where it was mistakenly tenuring a forwarded object as if it were a string, because the Relocated pattern 0xbad0bad1 has the low bit set, which when you're in the nursery is used to indicate that the cell is a string rather than an object.
> 
> It seems bad that we allow this possibility.  Is it possible to change the
> relocation overlay format so that it's still possible to distinguish nursery
> strings/objects after they have been forwarded?
> 
> One possibility might be to write the magic value at cell + 4 and leave the
> first 32 bits alone.  We don't support systems which allocate memory with
> the high bits because of the Value format so writing this over the top half
> of a 64 bit ObjectGroup* will still result in a bad pointer.  We should make
> sure it represents and invalid string length too and assert this somewhere.

(After Jon re-explained to me) that's a great idea. It will expand the size of RelocationOverlay on 32-bit, but I *think* that's ok (it's from 3 -> 4 32-bit words, which I think is still small enough for everything we're nursery allocating.)
(In reply to Steve Fink [:sfink] [:s:] from comment #80)
> Can you do (or tell me how to do) an awfy try push? Most of mine
> seem to fail for reasons I don't understand. I've gotten a speedometer v1
> run out of it, but no v2.

Was that with speedometer-misc or speedometer? I just triggered a speedometer-misc Win64 run for your Try push in comment 76 (with my rebased patches I get jit-test assertion failures locally).

Do you still want me to review these jit patches or is there a newer version?
Flags: needinfo?(sphink)
You're right, this is a much better approach. And it seems to work.
Attachment #8897287 - Flags: review?(jcoppeard)
Attachment #8894118 - Attachment is obsolete: true
Sorry jandem, somehow I didn't see your last comment (and needinfo). I think this patch is probably the same as what is already posted, but yes, I think it can be reviewed now.
Attachment #8897288 - Flags: review?(jdemooij)
Attachment #8892246 - Attachment is obsolete: true
Attachment #8892246 - Flags: review?(jdemooij)
This is pretty much the original patch, though I threaded through a "can strings be in the nursery" flag through a bunch of stuff because without it I was failing CurrentThreadCanAccessZone checks. There's probably a better way to do this?

Sorry, I was trying to split it out into a separate patch, but ended up with a mess. If I had another half hour right now, I could do it properly.
Attachment #8897291 - Flags: review?(jdemooij)
Comment on attachment 8897287 [details] [diff] [review]
Change Relocated marker to not confuse string vs object bit

Review of attachment 8897287 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.h
@@ +1093,5 @@
>  
>      /* Set to Relocated when moved. */
> +    uint32_t magic_;
> +#else
> +# error RelocationOverlay unimplemented on big-endian.

Does this not work on big-endian?  I think it just means that we will overwrite the low word of an object's group pointer... which should be fine.

::: js/src/jsgcinlines.h
@@ +469,5 @@
>  {
>      MOZ_ASSERT(!isForwarded());
>      // The location of magic_ is important because it must never be valid to see
>      // the value Relocated there in a GC thing that has not been moved.
> +    static_assert(offsetof(RelocationOverlay, magic_) == offsetof(JSObject, group_) + 4 &&

These should use sizeof(uint32_t).
Attachment #8897287 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #86)
> Comment on attachment 8897287 [details] [diff] [review]
> Change Relocated marker to not confuse string vs object bit
> 
> Review of attachment 8897287 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: js/src/jsgc.h
> @@ +1093,5 @@
> >  
> >      /* Set to Relocated when moved. */
> > +    uint32_t magic_;
> > +#else
> > +# error RelocationOverlay unimplemented on big-endian.
> 
> Does this not work on big-endian?  I think it just means that we will
> overwrite the low word of an object's group pointer... which should be fine.

Oh, you're right. I was thinking I'd have to adjust a bunch of static asserts, and decided to just punt. But it looks like they're all using length/flags, preserved/magic, or the beginning of where the group is stored -- so it's all the same.

> ::: js/src/jsgcinlines.h
> @@ +469,5 @@
> >  {
> >      MOZ_ASSERT(!isForwarded());
> >      // The location of magic_ is important because it must never be valid to see
> >      // the value Relocated there in a GC thing that has not been moved.
> > +    static_assert(offsetof(RelocationOverlay, magic_) == offsetof(JSObject, group_) + 4 &&
> 
> These should use sizeof(uint32_t).

ok
Flags: needinfo?(sphink)
Attached file nursery-strings.bundle (obsolete) —
Fresh Talos push at https://treeherder.mozilla.org/#/jobs?repo=try&revision=72101760754de6ceff0e3356ee22c70a9faec438

I'm attempting to make a user repo to store these, but for now, I'll attach a bundle (hg pull nursery-strings.bundle).
Comment on attachment 8892247 [details] [diff] [review]
Strings in the nursery: JIT

Review of attachment 8892247 [details] [diff] [review]:
-----------------------------------------------------------------

Canceling r? as there's a new version.
Attachment #8892247 - Flags: review?(jdemooij)
Comment on attachment 8897288 [details] [diff] [review]
Strings in the nursery: MIR node

Review of attachment 8897288 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, sorry for the delay.
Attachment #8897288 - Flags: review?(jdemooij) → review+
Comment on attachment 8897291 [details] [diff] [review]
Strings in the nursery: JIT

Review of attachment 8897291 [details] [diff] [review]:
-----------------------------------------------------------------

Great!

::: js/src/jit/CodeGenerator.cpp
@@ +10192,5 @@
>          const Value* vp = graph.constantPool();
>          ionScript->copyConstants(vp);
>          for (size_t i = 0; i < graph.numConstants(); i++) {
>              const Value& v = vp[i];
> +            if ((v.isObject() || v.isString()) && IsInsideNursery(v.toGCThing())) {

nit: Maybe just v.isGCThing() so we don't have to change this if we ever nursery-allocate more things.

::: js/src/jit/JitCompartment.h
@@ +624,5 @@
>      }
>  
>      size_t sizeOfIncludingThis(mozilla::MallocSizeOf mallocSizeOf) const;
> +
> +    bool stringsCanBeInNursery;

Nit: add a short comment here, explaining why this is needed.

::: js/src/tests/lib/tests.py
@@ +41,5 @@
>           '--ion-check-range-analysis', '--ion-extra-checks', '--no-sse3', '--no-threads'],
>          ['--no-asmjs', '--no-wasm', '--no-baseline', '--no-ion'],
>      ],
> +    'baseline': [
> +        ['--no-ion'],

Adding this does not affect what we run in automation right?
Attachment #8897291 - Flags: review?(jdemooij) → review+
There is alot of work on this bug along with dependencies that is not going to make it for 57. I am moving this bug to P2 for post 57.
Whiteboard: [qf:p1] → [qf:p2]
I spent some time looking at this patch set investigating why the octane benchmark is slower with it.  There are a couple of improvements we can make.

Firstly, disabling nursery strings per-zone doesn't unset the JitCompartment's stringsCanBeInNursery flags so we still get JIT allocation of nursery strings.  Also does this does unnecessary work if allocation is already disabled.  This helped with PdfJS (and splay I think?) by making sure we disabled nursery string allocation.

Secondly the growth and pretenuring heuristics are no longer working for us now we can put more stuff in the nursery.  I reduced these somewhat.  These changes didn't make any discernable difference to a branch without nursery string allocation.  This helped deltablue and splay.

There was a null pointer deref when doing asm.js compilation that I fixed too.  I also removed some asserts that were firing when we JIT compile off thread to make that work.

With this patch I get comparable octane results overall, although I'm seeing even more 
variability in things like splay-latency.  We could probably improve our heuristics further.

                        Original:                With nursery string allocation:
         Richards:       30794.3 (10  1.6%)       30905.1 (10  0.2%)   0.4%
        DeltaBlue:       67542.8 (10  0.9%)       67520.5 (10  6.3%)  -0.0%
           Crypto:       30211.3 (10  0.9%)       30202.3 (10  1.0%)  -0.0%
         RayTrace:      134071.2 (10  0.6%)      134937.0 (10  1.2%)   0.6%
      EarleyBoyer:       35471.2 (10  3.1%)       35528.7 (10  1.7%)   0.2%
           RegExp:        5065.3 (10  3.2%)        6642.6 (10  0.6%)  31.1%
            Splay:       15803.2 (10  4.3%)       15811.8 (10  4.2%)   0.1%
     SplayLatency:       21284.5 (10  2.8%)       20453.8 (10  6.8%)  -3.9%
     NavierStokes:       43542.5 (10  0.4%)       43613.2 (10  0.3%)   0.2%
            PdfJS:       17796.5 (10  6.3%)       17429.0 (10  3.6%)  -2.1%
         Mandreel:       29890.4 (10  3.4%)       29786.9 (10  3.3%)  -0.3%
  MandreelLatency:       36208.0 (10  3.1%)       36410.9 (10  6.1%)   0.6%
          Gameboy:       57623.7 (10  3.5%)       59896.9 (10  5.6%)   3.9%
         CodeLoad:       20875.0 (10  1.5%)       21013.3 (10  2.0%)   0.7%
            Box2D:       52836.2 (10  4.2%)       51790.7 (10  3.7%)  -2.0%
             zlib:       83391.4 (10  1.9%)       85212.2 (10  1.2%)   2.2%
       Typescript:       28537.6 (10  2.9%)       28501.9 (10  5.1%)  -0.1%
Score (version 9):       32967.0 (10  1.0%)       33490.3 (10  1.3%)   1.6%
Attachment #8912301 - Flags: feedback?(sphink)
Keywords: perf
Attachment #8895337 - Attachment is obsolete: true
Attachment #8893883 - Attachment is obsolete: true
Attachment #8892247 - Attachment is obsolete: true
The ReportOutOfMemory; recoverFromOutOfMemory one is a little odd. If allowGC is false, it will still call GCRuntime::onOutOfMallocMemory, which will do various things. And I think some callbacks will be invoked. I'm not entirely sure what the desired behavior is here, but this patch preserves the existing setup, I think?
Attachment #8920274 - Flags: review?(jcoppeard)
Comment on attachment 8920274 [details] [diff] [review]
Postpone malloc accounting of nursery strings until they are tenured

Review of attachment 8920274 [details] [diff] [review]:
-----------------------------------------------------------------

As discussed on IRC, I don't think this necessary.  We can add this later if we see lots of strings with malloc allocations in the nursery.
Attachment #8920274 - Flags: review?(jcoppeard)
Blocks: 1410238
Priority: -- → P2
Whiteboard: [qf:p2] → [perf:p1]
Whiteboard: [perf:p1] → [qf:p1]
Whiteboard: [qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:i60][qf:p1]
Whiteboard: [qf:i60][qf:p1] → [qf:f60][qf:p1]
Comment on attachment 8912301 [details] [diff] [review]
nursery-string-tweaks

Review of attachment 8912301 [details] [diff] [review]:
-----------------------------------------------------------------

I've incorporated these changes into my patches, and indeed it helps significantly with speed.

Current status is that the perf issues are fixed (though could still be improved), but I'm getting crashes that appear to be due to a problem with unboxed objects. I fixed 2 problems with unboxed objects already, but it looks like there's at least one more.
Attachment #8912301 - Flags: feedback?(sphink) → feedback+
Attachment #8897621 - Attachment is obsolete: true
Attached patch Another rollup, for fuzzing (obsolete) — Splinter Review
I'm having a lot of trouble reproducing the test failures locally (even though they are very reliable on try), and I was hoping I might be able to get a better test case from fuzzing.

I will also upload a bundle, which might be easier to use?
Attachment #8938659 - Flags: feedback?(gary)
Attachment #8938659 - Flags: feedback?(choller)
Attached file nursery-strings.bundle
Bundle of all patches. Save to a file, then |hg pull -u <filename>| to get it into a local repository. Assumes you have 45fe585acac1 which is the current head of mozilla-inbound.
Comment on attachment 8895335 [details] [diff] [review]
Disable per Zone

Review of attachment 8895335 [details] [diff] [review]:
-----------------------------------------------------------------

May as well mark this r=me; I've read the patch and have been running with it for quite some time now (for testing).
Attachment #8895335 - Flags: review+
test_memoryReporters.xul looks for specific strings in about:memory, but there is no way to iterate over the nursery, so we won't find those strings.
Attachment #8938682 - Flags: review?(n.nethercote)
Attachment #8938682 - Flags: review?(n.nethercote) → review+
The following crash was found while fuzzing the attached patch and I couldn't reproduce it on any of the non-patched builds. I'm not sure if this is the bug you are looking for since I see no GC involved, but the fact that this only pops up with the patch seems to make it a good candidate for fixing anyways:


Summary: Crash [@ ??]
Build version: mozilla-central revision fe1794e607cc+
Build flags: --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --enable-optimize
Runtime options: --fuzzing-safe

Testcase:

format.formatToParts(date), [
  { type: 'month', value: '12' },
  { type: 'literal', value: '/' },
  { type: 'day', value: '17' },
  { type: 'second', value: '42' },
  { type: 'literal', value: ' '},
  { type: 'dayPeriod', value: 'AM'}
]
format.formatToParts(date), [{ type: 'month', value: 'D'}]
format.formatToParts(date), [
  { type: 'year', value: '2012'},
  { type: 'literal', value: ' '},
  { type: 'era', value: 'AD' }
]
format.formatToParts(date), [
  { type: 'weekday', value: 'Monday' },
  { type: 'month', value: '12' },
  { type: 'literal', value: '/' },
  { type: 'day', value: '17' },
  { type: 'literal', value: '/' },
  { type: 'year', value: '2012' }, 
  { type: 'literal', value: ',' },
  { type: 'hour', value: '3' },
  { type: 'literal', value: ':' },
  { type: 'minute', value: '00' },
  { type: 'literal', value: ':' },
  { type: 'second', value: 3},
]

Backtrace:

received signal SIGSEGV, Segmentation fault.
0x00002f222ab1c153 in ?? ()
#0  0x00002f222ab1c153 in ?? ()
#1  0x0000000000000000 in ?? ()
rax	0x7ffff448f140	140737291809088
rbx	0x7fffffffb790	140737488336784
rcx	0x3	3
rdx	0x0	0
rsi	0x3	3
rdi	0x7ffff41fe500	140737289118976
rbp	0x7fffffffb7d0	140737488336848
rsp	0x7fffffffb760	140737488336736
r8	0x400	1024
r9	0x9adb960a	2598082058
r10	0x7ffff5f3e000	140737319788544
r11	0xfffff	1048575
r12	0x2f222ab1c010	51823791685648
r13	0x7ffff4051f70	140737287364464
r14	0x7ffff5f16000	140737319624704
r15	0x7fffffffb930	140737488337200
rip	0x2f222ab1c153	51823791685971
=> 0x2f222ab1c153:	cmpl   $0x1,-0x17(%r11)
   0x2f222ab1c158:	je     0x2f222ab1c163
Comment on attachment 8938660 [details]
nursery-strings.bundle

Didn't really find anything with this patch so far.
Attachment #8938660 - Flags: feedback+
Attachment #8938659 - Flags: feedback?(gary) → feedback+
Comment on attachment 8938659 [details] [diff] [review]
Another rollup, for fuzzing

sfink reported to me that the crash I commented in comment 101 is exactly the bug he was looking for. No other issues found, feedback+
Attachment #8938659 - Flags: feedback?(choller) → feedback+
Blocks: 1330475
Patch set needed to be updated to now handle unboxed objects. Patch is very small yet took quite a while for me to get right. :(
Attachment #8940017 - Flags: review?(jdemooij)
Comment on attachment 8940017 [details] [diff] [review]
Handle unboxed objects

Review of attachment 8940017 [details] [diff] [review]:
-----------------------------------------------------------------

Ugh, great find.

::: js/src/vm/UnboxedObject.cpp
@@ +136,5 @@
>      // buffer entry for the new object.
>      Label postBarrier;
>      for (size_t i = 0; i < layout.properties().length(); i++) {
>          const UnboxedLayout::Property& property = layout.properties()[i];
> +        if (property.type != JSVAL_TYPE_OBJECT && property.type != JSVAL_TYPE_STRING)

Nit: can we change this to |if (!UnboxedTypeNeedsPostBarrier(property.type))| ?

That way when someone adds a new type to UnboxedTypeNeedsPostBarrier, we will get an assertion failure below.
Attachment #8940017 - Flags: review?(jdemooij) → review+
I was looking at the stack of a crash, probably due to my latest rebasing. It looked something like

  JS::GCThingTraceKind(void*)
  JS::Value::toGCCellPtr
  js::GCMarker::drainMarkStack

which is not happy-making: I would have hoped that drainMarkStack would have basic operations like that fully inlined.

The problem is that getting the trace kind requires two things that are currently unavailable to js/public: checking whether a nursery cell is a string vs object, and looking up the map of AllocKind -> TraceKind.

The first is mostly artificial -- we have the necessary shadow bits in jsfriendapi.h, but it seems pretty funky to include jsfriendapi.h from js/public/HeapAPI.h. This patch resolves that problem, and as a result nursery objects and strings can now stay on the inlinable path.

The second is more annoying. What we really want is for engine-internal users (eg drainMarkStack!) to use an inlined version, and let JSAPI users go through a function call. But it doesn't seem straightforward to set that up. I think we could do it with #ifdef EXPORT_JS_API, but I'm unsure what the best approach would be.

The easy path would be to say #ifdef EXPORT_JS_API then #include gc/Cell.h from js/HeapAPI.h, but that's pretty gross.

Just making multiple versions of GCThingTraceKind() would be straightforward enough for an individual .cpp file, but here we care about js/public/Value.h. Its toGCCellPtr() could have a different implementation #ifdef EXPORT_JS_API, but that would again require including gc/Cell.h, now from Value.h, and that's still gross.

The best I can come up with is to make a custom ValueToGCCellPtr() within gc/Marking.cpp, which is (1) gross because now Marking.cpp is twiddling with internal details of Values, and (2) it wouldn't fix all other users. The latter is perhaps not that big of a problem, given that the only other user is in XPCOM and so is going to pay the cost regardless.

Or we could expose MapAllocToTraceKind publicly, but gc/AllocKind.h is kind of explicitly for internal stuff.

Any opinions?

I was hoping there was some clever way with 'inline' and multiple definitions, but I'm not seeing it.
Attachment #8940380 - Flags: feedback?(jcoppeard)
Comment on attachment 8940380 [details] [diff] [review]
Allow inlining of Value.toGCCellPtr for nursery cells

Review of attachment 8940380 [details] [diff] [review]:
-----------------------------------------------------------------

> it seems pretty funky to include jsfriendapi.h from js/public/HeapAPI.h

Yes, moving the GC-relevant parts into HeapAPI.h like this is much better.

(While looking at this I noticed we have GCCellPtr::GCCellPtr(const Value&) and Value::toGCCellPtr(), with different implementations.)

> The second is more annoying
> The best I can come up with is to make a custom ValueToGCCellPtr()

That's what I would do.  It wouldn't have to be within gc/Marking.cpp but could be in some other internal but more general header.

::: js/src/jsapi.cpp
@@ +7725,4 @@
>  {
>      MOZ_ASSERT(thing);
> +    MOZ_ASSERT(!IsInsideNursery(static_cast<js::gc::Cell*>(thing)));
> +    return static_cast<js::gc::Cell*>(thing)->asTenured().getTraceKind();

Cell::asTenured() will assert that it's not in the nursery so you don't need the assertion as well.
Attachment #8940380 - Flags: feedback?(jcoppeard) → feedback+
Heh, this was fun. I was getting a talos-only crash. I hacked talos to run with --enable-debug, which found a bunch of pre-existing problems, but also gave me an assert for this bug: somehow we were finding a relocated Scope object. I was able to reproduce locally under rr, and the problem turned out to be because Scope starts with a 1-byte field, then an 8-byte pointer. The new RelocationOverlay::magic_ field (moved for these string patches) now falls into the padding between those two Scope fields. Which means that if you empty an arena during compaction and reuse it for Scopes, they will leave that memory uninitialized and produce a false positive assertion when the heap is traced for checking.

I have not yet scanned through all other GC subtypes to see if there are others.

If this produces a speed hit, it could be made DEBUG-only. But it would be confusing to be debugging a crash dump and find a live Scope object with a forwarded magic_ field, so I'm hoping that the clearing will get more or less coalesced with the initialization. (I'm hopeful that it'll just be a matter of extending the 8-bit value to a 64-bit value on 64-bit; I'm a little surprised it doesn't already do that, since doing an 8-byte write makes it dependent on the previous contents of that memory location. But maybe it's against the spec to touch padding memory?)
Attachment #8940912 - Flags: review?(jcoppeard)
Oh. My logic is incorrect. This was failing on an opt build. So it may be more than an assertion failure; something might be checking cells for forwarding at other times, in which case this can't be DEBUG-only. (But that's fine.)
Comment on attachment 8940912 [details] [diff] [review]
Prevent a stale RelocationOverlay::magic_ from sneaking into a newly created Scope

Review of attachment 8940912 [details] [diff] [review]:
-----------------------------------------------------------------

Oh wow, nice find!

I guess release only since we poison on debug?  That's nasty.

::: js/src/vm/Scope.h
@@ +242,3 @@
>          data_(0)
> +    {
> +        js::gc::RelocationOverlay::clearMagic(this);

I would prefer that we force initialisation of the entire first word without reference to relocation magic here because that's what should happen anyway, and also assert that newly allocated GC things are not forwarded in the allocator (not that that would have caught this).

Maybe | union { ScopeKind kind_, uintptr_t paddedKind_ } |?  Or just |uintptr_t kind_|?
Attachment #8940912 - Flags: review?(jcoppeard) → review+
(In reply to Jon Coppeard (:jonco) from comment #110)
> Comment on attachment 8940912 [details] [diff] [review]
> Prevent a stale RelocationOverlay::magic_ from sneaking into a newly created
> Scope
> 
> Review of attachment 8940912 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Oh wow, nice find!
> 
> I guess release only since we poison on debug?  That's nasty

Oh! Thanks for pointing that out. I idly wondered why poisoning didn't conceal this, but didn't really think it through.

That explains why this was talos-only, and still happened when I enabled DEBUG for talos: talos explicitly disables poisoning. So I could probably have gotten this bug to show up in other places by doing a push that disabled poisoning.

(So with talos, it is not release-only. But for our whole test suite, it is.)

> ::: js/src/vm/Scope.h
> @@ +242,3 @@
> >          data_(0)
> > +    {
> > +        js::gc::RelocationOverlay::clearMagic(this);
> 
> I would prefer that we force initialisation of the entire first word without
> reference to relocation magic here because that's what should happen anyway,
> and also assert that newly allocated GC things are not forwarded in the
> allocator (not that that would have caught this).

I don't follow that last. We expect reused arenas to have random forwarded crap, don't we? And we don't initialize anything during allocation, only during construction. So the assert would need to be a post-construction assert, for which there is no choke point.

> Maybe | union { ScopeKind kind_, uintptr_t paddedKind_ } |?  Or just
> |uintptr_t kind_|?

My first thought was the union, but it seemed kind of gross to modify all readers for a construction-specific concern.

Oh, but I'm wrong and you're right. The field definitions are describing the in-memory layout; they're private, so thankfully everything is hidden behind the class API. Still, the union has endian assumptions built into it (if I assign uintptr_t(kind) to paddedKind_, do kind_ get set to kind or zero?) On the other hand, expanding the width of the field means that reads have to read the full 64 bits and manipulate it as such, which means longer instructions on x86_64.

Ok, I think I'll go with the union and do paddedKind_ = 0; kind_ = kind; and see what you think. Hopefully the optimizer can make that be a single store, and read sizes can be properly controlled by the compiler to be whatever is best.

The correctness problems appear to be all fixed on my latest talos push. Unfortunately, I have a new 2.5x slowdown on damp to figure out now. It never ends... :(
// jsfunfuzz-generated
for (var i = 0; i < 99; i++) {
    try {
        // Adapted from randomly chosen test: js/src/tests/test262/built-ins/Object/defineProperties/15.2.3.7-6-a-140.js
        load("z.js");
    } catch (e) {}
}
// jsfunfuzz-generated
eval(`
    try {
        j,
        {   
            f: {
                value: 2
            }
        }
    } catch (e) {}
`);


z.js contains:

s({ h: { value: "" } } );


Stack:

#0  0x00001ac855cad6e0 in ?? ()
#1  0x0020002000200020 in ?? ()
#2  0x0000000000a8cc5f in js::UnboxedPlainObject::createWithProperties (cx=cx@entry=0x7ffff5f15000, group=..., group@entry=..., 
    newKind=newKind@entry=js::TenuredObject, properties=properties@entry=0x7ffff5f180e0) at /home/ubuntu/trees/mozilla-central/js/src/vm/UnboxedObject.cpp:709
#3  0x00000000009d5f7c in js::ObjectGroup::newPlainObject (cx=cx@entry=0x7ffff5f15000, properties=<optimized out>, nproperties=1, 
    newKind=newKind@entry=js::TenuredObject) at /home/ubuntu/trees/mozilla-central/js/src/vm/ObjectGroup.cpp:1255
#4  0x00000000009d6c5b in js::ObjectGroup::newPlainObject (cx=cx@entry=0x7ffff5f15000, properties=<optimized out>, nproperties=<optimized out>, 
    newKind=newKind@entry=js::TenuredObject) at /home/ubuntu/trees/mozilla-central/js/src/vm/ObjectGroup.cpp:1304
#5  0x000000000091fafc in js::frontend::ParseNode::getConstantValue (this=<optimized out>, cx=cx@entry=0x7ffff5f15000, 
    allowObjects=allowObjects@entry=js::frontend::ParseNode::AllowObjects, vp=vp@entry=..., compare=compare@entry=0x0, ncompare=ncompare@entry=0, 
    newKind=js::TenuredObject) at /home/ubuntu/trees/mozilla-central/js/src/frontend/BytecodeEmitter.cpp:6526
#6  0x000000000091f92b in js::frontend::ParseNode::getConstantValue (this=this@entry=0x7ffff5cd00e0, cx=0x7ffff5f15000, 
    allowObjects=allowObjects@entry=js::frontend::ParseNode::AllowObjects, vp=..., vp@entry=..., compare=compare@entry=0x0, ncompare=ncompare@entry=0, 
    newKind=js::SingletonObject) at /home/ubuntu/trees/mozilla-central/js/src/frontend/BytecodeEmitter.cpp:6500
/snip

jsfunfuzz did find something after all. Tested with a 64-bit debug non-deterministic build, on Linux.
Flags: needinfo?(sphink)
This crash might be similar to the one caused by comment 101 (I seem to get similar crash stacks). Reporting anyway just in case they are different.
(In reply to Steve Fink [:sfink] [:s:] from comment #111)
> I don't follow that last. We expect reused arenas to have random forwarded
> crap, don't we? And we don't initialize anything during allocation, only
> during construction. So the assert would need to be a post-construction
> assert, for which there is no choke point.

You're right, I forgot about that.  There's no easy way to add that assert.

> Still, the union has endian assumptions built into it (if I
> assign uintptr_t(kind) to paddedKind_, do kind_ get set to kind or zero?) On
> the other hand, expanding the width of the field means that reads have to
> read the full 64 bits and manipulate it as such, which means longer
> instructions on x86_64.

That's a good point.  Maybe we could use a bit field instead... but then we would have to calculate the number of padding bits.

> The correctness problems appear to be all fixed on my latest talos push.
> Unfortunately, I have a new 2.5x slowdown on damp to figure out now. It
> never ends... :(

Gah :(
Here's one way to do it. After discovering how many places in the code I'd need to change, I used some ugly operator overloading to eliminate them all. I'm not sure how you feel about it.
Attachment #8941552 - Flags: review?(jcoppeard)
Attachment #8940912 - Attachment is obsolete: true
Comment on attachment 8941552 [details] [diff] [review]
Prevent a stale RelocationOverlay::magic_ from sneaking into a newly created Scope

Review of attachment 8941552 [details] [diff] [review]:
-----------------------------------------------------------------

Could you use an anonymous union to get around having to change all the uses of kind_?

::: js/src/vm/Scope.h
@@ +231,5 @@
> +    union {
> +        ScopeKind kind;
> +        uintptr_t paddedKind;
> +        operator ScopeKind () const { return kind; }
> +        bool operator==(ScopeKind other) const { return kind == other; }

I only learnt that you could put methods in unions last week.  It's the second C++ feature I've found out about this year.
I never requested review for this patch before because it was something of an expirement, and I wasn't sure whether I was going to split out a separate nursery for strings or something. But I believe this is now part of how the nursery disabling stuff works, so I guess I'd better get it reviewed.

Until there's a separate nursery, this change isn't really needed, I don't think. But I'd rather not undo it at the moment, especially since I'll probably end up putting it back when I split out the string nursery completely.
Attachment #8941565 - Flags: review?(jcoppeard)
(In reply to Jon Coppeard (:jonco) from comment #117)
> Comment on attachment 8941552 [details] [diff] [review]
> Prevent a stale RelocationOverlay::magic_ from sneaking into a newly created
> Scope
> 
> Review of attachment 8941552 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Could you use an anonymous union to get around having to change all the uses
> of kind_?

I don't know how anonymous unions work, I'll give it a try. Yes! That's so much better. I guess I should have known that; I think we use them in strings or something.

Oh, and I forgot to mention -- I wrote a tool to process all type information gathered by the rooting analysis, and verified that Scope (and its subclasses) are the only Cell subclasses with padding anywhere near here. We do have several other cases of nontrivial amounts of padding, but nothing that intersects this region, and only one really seemed to bloat something up by a meaningful percentage.
Thanks, this is obviously far better. That'll be my C++ learning for the week.
Attachment #8941570 - Flags: review?(jcoppeard)
Attachment #8941552 - Attachment is obsolete: true
Attachment #8941552 - Flags: review?(jcoppeard)
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #114)
> This crash might be similar to the one caused by comment 101 (I seem to get
> similar crash stacks). Reporting anyway just in case they are different.

Thanks! Yes, that's the same one, now fixed.
Flags: needinfo?(sphink)
Attachment #8941565 - Flags: review?(jcoppeard) → review+
Comment on attachment 8941570 [details] [diff] [review]
Prevent a stale RelocationOverlay::magic_ from sneaking into a newly created Scope

Review of attachment 8941570 [details] [diff] [review]:
-----------------------------------------------------------------

Great, that's much nicer.
Attachment #8941570 - Flags: review?(jcoppeard) → review+
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9316d8f7b92a
Separate out AllocKinds for nursery-allocatable strings, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/1fc5ee0d0116
Reparent JSString from TenuredCell to Cell, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/80b9d97bf1fe
Make js::Allocate cast strings to requested type, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/457008b194a8
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d7d15b25489
Avoid atomization for string size tests, r=jimb
https://hg.mozilla.org/integration/mozilla-inbound/rev/fdb6431ea4ff
Strings in the nursery: allocation, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/2bc9d427f427
External strings can morph into plain ones while still in EXTERNAL_STRING arena, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/11b3f0fda4ad
Remove verifier assumption that only objects are in the nursery, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c96258a6459
Strings in the nursery: tracing and tenuring, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d56db668369
Strings in the nursery: barriers, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/f5f72c93adf9
Discard nursery keys from EvalCache, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/7854bfe5d683
Strings in the nursery: allow any thread to access zone of permanent atoms, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/c62e5867d763
Strings in the nursery: MIR node, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/6f3666e9540e
Change Relocated marker to not confuse string vs object bit, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/71831e232df2
Default nursery strings to off, add ability to enable, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/38f4e0426bdd
Tenure strings that are checked for in about:memory test, r=njn
https://hg.mozilla.org/integration/mozilla-inbound/rev/ec5b307a28aa
Strings in the nursery: JIT, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/cc56f32ddae8
Split out string nursery pointers from object nursery, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb2cc298a155
Disable nursery strings in a Zone if too many get tenured, r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f72f8747e29
Allocate strings in the nursery, r=flagflip
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2dab1647f933
Register more nursery string buffers to be freed, r=leak
Backout by nerli@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/65e92478e09d
Backed out 20 changesets for detected memory leaks on a CLOSED TREE
https://hg.mozilla.org/integration/mozilla-inbound/rev/b580ce0f1d06
Backed out changeset 2dab1647f933 for mass failures on a CLOSED TREE
This also regressed Octane by 6% or so (mostly pdfjs and splay). Also some improvements everywhere, also on six-speed, so that's great :) Maybe we should land disabled by default? Merge is in 10 days or so.
Comment on attachment 8938659 [details] [diff] [review]
Another rollup, for fuzzing

Actually I found another subsequent bug:

Assertion failure: cx->isNurseryAllocAllowed(), at /home/ubuntu/trees/mozilla-central/js/src/gc/Allocator.cpp:138

which I've sent to :sfink. Until this is fixed and we have a new patch stack to test, clearing feedback flag for now.
Attachment #8938659 - Flags: feedback+
Oops, missed this comment.

(In reply to Jan de Mooij [:jandem] from comment #127)
> This also regressed Octane by 6% or so (mostly pdfjs and splay). Also some

Weird. I don't see a splay regression locally anymore (my last run showed a small improvement, but well within the noise threshold.) And overall it improved octane.

> improvements everywhere, also on six-speed, so that's great :) Maybe we
> should land disabled by default? Merge is in 10 days or so.

I attempted to land just before the soft freeze and got backed out for memory management problems, almost certainly in the handoff of malloc buffers when tenuring strings. At that point, it felt like it was too late to land, even disabled, and I figured I'd wait out the soft freeze.

Now I need to fix gkw's latest bug too.
Flags: needinfo?(sphink)
Recording current state.

I'm seeing hangs and crashes when pretenuring, specifically when we decide to disable nursery strings for a zone and do discardJitCode. Problematic path:

Path from #160235 to #17471:
  #160235 = void JS::Zone::discardJitCode(js::FreeOp*, uint8)
  #79841 = js::gc::ZoneCellIter<T>::operator GCType*() const [with GCType = JSScript]
  #80729 = GCType* js::gc::ZoneCellIter<T>::get() const [with GCType = JSScript]
  #80956 = T* js::gc::ZoneCellIter<js::gc::TenuredCell>::get() const [with T = JSScript]
  #81092 = JSScript* js::gc::ArenaCellIterImpl::get() const [with T = JSScript]
  #19277 = js::gc::TenuredCell* js::gc::ArenaCellIterImpl::getCell() const
  #9829 = GCAPI.h:void js::gc::ExposeGCThingToActiveJS(JS::GCCellPtr)
  #9832 = uint8 JS::UnmarkGrayGCThingRecursively(JS::GCCellPtr)
  #11058 = uint8 js::UnmarkGrayCellRecursively(js::gc::Cell*, int32)
  #307695 = uint8 (UnmarkGrayCellRecursivelyFunctor, void*, int32), (Forward<Args>)(JS::DispatchTraceKindTyped::args)...)) JS::DispatchTraceKindTyped(F, void*, JS::TraceKind, Args&& ...) [with F = UnmarkGrayCellRecursivelyFunctor; Args = {}; decltype (f(static_cast<JSObject*>(nullptr), (Forward<Args>)(JS::DispatchTraceKindTyped::args)...)) = bool]
  #309074 = uint8 UnmarkGrayCellRecursivelyFunctor::operator(js::BaseShape*)(T*) [with T = js::BaseShape]
  #309992 = Marking.cpp:uint8 TypedUnmarkGrayCellRecursively(js::BaseShape*) [with T = js::BaseShape]
  #17472 = void js::gcstats::AutoPhase::AutoPhase(js::gcstats::Statistics*, uint8)
  #17470 = void js::gcstats::AutoPhase::AutoPhase(js::gcstats::Statistics*, uint8)
  #17471 = void js::gcstats::Statistics::beginPhase(uint8)

This particular path will assert in my case because it's under a minorGC that pushed onto the phase stack. My try push shows an assert, several hangs, and one gray assertion failure (not sure if that's the same general problem, though.)

I don't think this should be doing the unmark gray read barrier. But I'm not sure if that's true for all callers of discardJitCode? It probably is.

When looking into this, I stumbled upon related bug 1433001 when traversing the callgraph. I may just be confused about that one.
Just re-adding some API points that I need. I wanted to check to make sure we don't have a better way of doing these things yet.
Attachment #8946970 - Flags: review?(jdemooij)
Comment on attachment 8946970 [details] [diff] [review]
Reimplement extractString(Address, Register)

Review of attachment 8946970 [details] [diff] [review]:
-----------------------------------------------------------------

Is it possible to rewrite your code to use unboxString instead? extractString(Address, Register) is basically the same as unboxString(Address, Register), except the former will return the output register. I guess you have a templated call somewhere but unboxString might work there too. I don't really like the extract* methods and having multiple ways of doing things :)
Attachment #8946970 - Flags: review?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #132)
> Comment on attachment 8946970 [details] [diff] [review]
> Reimplement extractString(Address, Register)
> 
> Review of attachment 8946970 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Is it possible to rewrite your code to use unboxString instead?
> extractString(Address, Register) is basically the same as
> unboxString(Address, Register), except the former will return the output
> register. I guess you have a templated call somewhere but unboxString might
> work there too. I don't really like the extract* methods and having multiple
> ways of doing things :)

Yes, fair. The extract* stuff bothered me too, but it was pre-existing. That was the main reason I wanted your review, to see if it really needed to be this way for some reason I didn't bother to figure out myself. I'm happy to rework this.
Oh. Rereading your message, I see you said unboxString not unboxNonDouble.

And this patch oversteps what it really needs to do. I removed extractString entirely since it clutters up a lot of MacroAssembler* files and imho obscures the using code a little. But I'm not confident of whether I did it right, nor whether I should really be doing it in this bug.

So I marked feedback? to see whether you're ok with this landing as-is, you'd rather see it with unboxString instead of unboxNonDouble, and whether you'd prefer I didn't mess around with other extractString sites in the same bug.

(Why "unboxNonDouble", btw? Seems like it ought to just be "unbox", and preferably used for doubles too, to make it independent of the boxing scheme. Although... how does that work? If you have a DoubleValue in an integer register, then presumably you need to get the double into an fp reg? Never mind.)
Attachment #8947176 - Flags: feedback?(jdemooij)
Attachment #8946970 - Attachment is obsolete: true
Whoops. Yep, totally messed something up. Try tests are failing.
Sorry, that was just a dumb mistake. I was overwriting input in one case. Still testing this locally, but it fixes the try failures I was looking at.
Attachment #8947184 - Flags: feedback?(jdemooij)
Attachment #8947176 - Attachment is obsolete: true
Attachment #8947176 - Flags: feedback?(jdemooij)
Comment on attachment 8947184 [details] [diff] [review]
Reimplement extractString(Address, Register)

...and never mind. It turned out to be more involved than I was thinking. And after separating out the minimal necessary patch, it ended up being too trivial to review. It's a rebase fixup that I would've done in the first place had I understood what extractString and unboxString were doing.

I'll file a separate bug for the other stuff.
Attachment #8947184 - Attachment is obsolete: true
Attachment #8947184 - Flags: feedback?(jdemooij)
The first problem was that ensureLinear() takes a maybecx, and when maybecx was nullptr we didn't do the nursery registration stuff at all.

The second problem is that ensureLinear() can move ownership of a char buffer from a rope's leftmost child to the root node, and in that case the registration can end up mismatching the nursery status of the two strings. (So if you move the buffer between two tenured strings or two nursery strings, all is ok, but it's a problem if they are in different heaps.)

Doesn't crash locally, but I'm not running with asan either. Haven't pushed to try yet.
Attachment #8948052 - Flags: review?(jcoppeard)
Comment on attachment 8948052 [details] [diff] [review]
Fix ensureLinear() string leak and double-free

Review of attachment 8948052 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM.
Attachment #8948052 - Flags: review?(jcoppeard) → review+
I'm pretty unclear on how all this stuff is put together, but I figure that you can probably tell me pretty easily.

The problem is that while creating the global for an offthread parse task, we could allocate a string (eg an error message). But we're in AutoAssertNoNurseryAlloc, which currently is ok because we don't expect to allocate any objects. We could do some extra check -- eg, prevent nursery allocation within AutoSuppressGC, or within AutoAssertNoNurseryAlloc, but both of those are kind of weird. We could make a new AutoSuppressNurseryAlloc. But that would need to do a bunch of things like flushing all JIT code.

So I'm wondering if this patch is enough -- just create the global before locking down GC stuff.

On the other hand, I don't understand why the current code works: if we cannot run within an incremental GC, what prevents that from happening already? From the comment, I was expecting this to be called just after finishing a GC or something, but it seems to be invoked directly from eg JS shell functions. So clearly I'm missing something.
Attachment #8948587 - Flags: feedback?(jcoppeard)
This collided pretty heavily with the LINEAR_BIT stuff, so it probably makes sense to request a re-review.
Attachment #8948589 - Flags: review?(jdemooij)
Could I get fuzzing on this and the next rollup patch? This patch has all of the nursery strings code in it, but with nursery string allocation disabled. I'd like to land it that way first and then wait a bit.
Attachment #8948591 - Flags: feedback?(nth10sd)
Attachment #8948591 - Flags: feedback?(choller)
This is not an incremental rollup, it's from current inbound like the other one. This has everything enabled, and is by far the more likely to cause bugs.
Attachment #8948592 - Flags: feedback?(nth10sd)
Attachment #8948592 - Flags: feedback?(choller)
Comment on attachment 8948587 [details] [diff] [review]
Prevent allocating into the nursery while preparing to do offthread parse

Review of attachment 8948587 [details] [diff] [review]:
-----------------------------------------------------------------

I think this works at the moment because we populate most of the gobal lazily, so very little actually gets created here.  Just the compartment, group and global JSObject (a singleton so not nursery allocated).

Does the AutoAssertNoNurseryAlloc trigger without this change?  If so maybe we should add a flag to the context for whether nursery GC is allowed.  Currently we use helperThread() for this.

::: js/src/vm/HelperThreads.cpp
@@ +798,5 @@
> +
> +    // It is unlikely that a GC is running now, but in case it is, finish it
> +    // up. Then suppress GC so that calls below do not trigger a new
> +    // incremental GC which could require barriers on the atoms compartment.
> +    gc::FinishGC(cx);

I don't think we want to do this.  This would cause GCs to be finished non-incrementally if we try to parse something off thread during an incremental GC, which probably happens quite often.
Attachment #8948587 - Flags: feedback?(jcoppeard)
Comment on attachment 8948589 [details] [diff] [review]
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery,

Review of attachment 8948589 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the merge conflicts. Looks good, just some nits.

::: js/src/jit/Lowering.cpp
@@ +1999,5 @@
>      MOZ_ASSERT(idx->type() == MIRType::Int32);
>  
> +    LCharCodeAt* lir = new(alloc()) LCharCodeAt(useRegister(str),
> +                                                useRegister(idx),
> +                                                temp(LDefinition::INT32));

Nit: we can revert this change now?

::: js/src/jit/MacroAssembler.h
@@ +2127,5 @@
>                               TypedArrayObject* templateObj, TypedArrayLength lengthKind);
>  
>      void initUnboxedObjectContents(Register object, UnboxedPlainObject* templateObject);
>  
> +    void newGCString(Register result, Register temp, Label* fail, bool attemptNursery);

The patch changes some MacroAssembler.h methods without updating MacroAssembler.cpp

::: js/src/jsfriendapi.h
@@ +611,5 @@
>  };
>  
>  struct String
>  {
> +    static const uint32_t NON_ATOM_BIT     = JS_BIT(0);

We should add a NON_ATOM_BIT static_assert here:

https://searchfox.org/mozilla-central/rev/f80722d4f3bfb722c5ec53880c4a7efb71285676/js/src/vm/String.h#319-320

::: js/src/vm/String.h
@@ +269,2 @@
>  
> +    static const uint32_t ROPE_FLAGS             = NON_ATOM_BIT;

Nit: I think we should remove this and use INIT_ROPE_FLAGS defined below - because we have the linear bit now, we only need this value when we are *initializing* a rope.

@@ +269,4 @@
>  
> +    static const uint32_t ROPE_FLAGS             = NON_ATOM_BIT;
> +    static const uint32_t DEPENDENT_FLAGS        = NON_ATOM_BIT | LINEAR_BIT | HAS_BASE_BIT;
> +    static const uint32_t FLAT_FLAGS             = NON_ATOM_BIT | LINEAR_BIT;

Nit: similarly, we could call this INIT_FLAT_FLAGS and move it next to the other INIT_* constants below.
Attachment #8948589 - Flags: review?(jdemooij) → review+
Comment on attachment 8948591 [details] [diff] [review]
rollup patch with nursery strings still disabled

I gave :sfink the link for two crashes caused by this bug, including one testcase. Waiting for a new rollup now because other crashes we find might be related.
Attachment #8948591 - Flags: feedback?(choller) → feedback-
Comment on attachment 8948592 [details] [diff] [review]
rollup patch with nursery strings enabled

Canceling feedback request based on previous comment.
Attachment #8948592 - Flags: feedback?(choller)
Comment on attachment 8948591 [details] [diff] [review]
rollup patch with nursery strings still disabled

Cancelling feedback? pending :decoder's testcase fixes.
Attachment #8948591 - Flags: feedback?(nth10sd)
Comment on attachment 8948592 [details] [diff] [review]
rollup patch with nursery strings enabled

Cancelling feedback? pending :decoder's testcase fixes.
Attachment #8948592 - Flags: feedback?(nth10sd)
I still have somewhat mixed feelings about this patch. But it does what it says, and should prevent the theoretical assertion failure that I don't want to add a fast-path check for.

Note that the fuzz bugs I was struggling with were due to the alternative approach of creating the global object before suppressing GC. That won't work without somehow setting the createForHelperThread flag earlier.

The crashes I saw were from creating the global's zone and allocating a couple of things in that zone, then doing a gczeal-triggered GC. That marked the zone as needing prebarrier verification. We then suppress GC and the verifier is disabled, but that doesn't matter. At some point, we do another GC, which stops the prebarrier verifier. When it's done, it doesn't restart it because rt->hasHelperThreadZones().

So we end up in a state where we have a zone that thinks it wants prebarrier verification, but there's no verifier active. During the merge process, we will notice this and abort.

Creating the global with GC suppressed will prevent this mismatch. I originally thought that a new zone should inherit the current verifier state, but there is explicit code that needs to check whether we've created any new compartments and verify things differently if so.

Another potential fix would be to figure out a way to turn off our zone->needsIncrementalBarrier_ at some point. It could remove this footgun. I believe the specific problematic case is creating a zone for a helper thread with the verifier enabled. A simple fix might be to assert when setting a ZoneGroup to createdForHelperThread if any zones in that group have the prebarrier verifier active? Let me know what you think.
Attachment #8949247 - Flags: review?(jcoppeard)
Oops, I think I just uploaded the version of the patch that still crashes. I had switched back to it to make sure I had the sequence of events right.
Attachment #8949248 - Flags: review?(jcoppeard)
Attachment #8949247 - Attachment is obsolete: true
Attachment #8949247 - Flags: review?(jcoppeard)
Attached patch rollup v2, disabled (obsolete) — Splinter Review
Attachment #8948591 - Attachment is obsolete: true
Attachment #8949250 - Flags: feedback?(nth10sd)
Attachment #8949250 - Flags: feedback?(choller)
Attached patch rollup v2, enabled (obsolete) — Splinter Review
Attachment #8938659 - Attachment is obsolete: true
Attachment #8941346 - Attachment is obsolete: true
Attachment #8948592 - Attachment is obsolete: true
Attachment #8949252 - Flags: feedback?(nth10sd)
Attachment #8949252 - Flags: feedback?(choller)
(In reply to Steve Fink [:sfink] [:s:] from comment #152)
> Created attachment 8949250 [details] [diff] [review]
> rollup v2, disabled

This doesn't apply to m-c tip. Can you please let me know which m-c rev this can be applied to?
Flags: needinfo?(sphink)
Comment on attachment 8949248 [details] [diff] [review]
Implement AutoSuppressNurseryCellAlloc to avoid nursery allocation just before offthread parse startup

Review of attachment 8949248 [details] [diff] [review]:
-----------------------------------------------------------------

The buffer alloc is a bit of a footgun.  I wonder how often this is called and whether we could assert that nursery alloc is allowed with the TLS context.

::: js/src/gc/Allocator.cpp
@@ +170,5 @@
>      size_t size = sizeof(StringAllocT);
>      MOZ_ASSERT(size == Arena::thingSize(kind));
>      MOZ_ASSERT(size == sizeof(JSString) || size == sizeof(JSFatInlineString));
>  
>      // Off-thread alloc cannot trigger GC or make runtime assertions.

You probably should update the comment here.

::: js/src/vm/String.cpp
@@ +652,5 @@
>      bool isLatin1 = left->hasLatin1Chars() && right->hasLatin1Chars();
>      bool canUseInline = isLatin1
>                          ? JSInlineString::lengthFits<Latin1Char>(wholeLength)
>                          : JSInlineString::lengthFits<char16_t>(wholeLength);
> +    if (canUseInline && !cx->isNurseryAllocSuppressed()) {

What is this check for?  Nothing inside this block looks like it is specifically nursery-only.  Maybe this should still check helperThread()?
Attachment #8949248 - Flags: review?(jcoppeard) → review+
This should apply on top of b68e9ed93b9f or I did it wrong. (This is a concatenation of all the patches and applies for me with patch -p1. The first rollup I did was a single diff. Let me know what works best for you.)
Attachment #8949250 - Attachment is obsolete: true
Attachment #8949250 - Flags: feedback?(nth10sd)
Attachment #8949250 - Flags: feedback?(choller)
Flags: needinfo?(sphink)
Attachment #8949414 - Flags: feedback?(nth10sd)
Attachment #8949414 - Flags: feedback?(choller)
Whiteboard: [qf:f60][qf:p1] → [qf:f61][qf:p1]
jandem - I don't know if this is necessary or not. This is doing an lea of string+JSDependentString::offsetOfBase into a register, then passing that register to a function call (that won't actually deref it, it looks like.) But I either need to do this or access JSDependentString::offsetOfBase somehow.
Attachment #8949901 - Flags: review?(jdemooij)
decoder's langfuzz has only found a single crash so far with the disabled patch, and it happens without the patch as well, so it seems like it's a good time to start fuzzing the enabled patch.
Attachment #8949252 - Attachment is obsolete: true
Attachment #8949252 - Flags: feedback?(nth10sd)
Attachment #8949252 - Flags: feedback?(choller)
Attachment #8950040 - Flags: feedback?(nth10sd)
Attachment #8950040 - Flags: feedback?(choller)
Comment on attachment 8949901 [details] [diff] [review]
Spectre mitigation for nursery strings in the JIT

Review of attachment 8949901 [details] [diff] [review]:
-----------------------------------------------------------------

More Spectre bustage, sorry.

::: js/src/jit/MacroAssembler.cpp
@@ +1585,5 @@
> +
> +    if (JitOptions.spectreStringMitigations) {
> +        // If the string does not have a base-string, zero the |str| register.
> +        // The code below loads &str->base so this should block speculative
> +        // execution.

I think it's okay to call this leaNewDependentStringBase (note the "New"), or something, and then do the computeEffectiveAddress + add a comment saying it's Spectre-safe because we just allocated the dependent string (so it can't be another string type and we know |base| must be a JSString*).

Btw, I thought we had the invariant that all strings allocated inline would be in the nursery, so we don't have to post barrier their fields. Is that not true?
Attachment #8949901 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #159)
> Comment on attachment 8949901 [details] [diff] [review]
> Spectre mitigation for nursery strings in the JIT
> 
> Review of attachment 8949901 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> More Spectre bustage, sorry.
> 
> ::: js/src/jit/MacroAssembler.cpp
> @@ +1585,5 @@
> > +
> > +    if (JitOptions.spectreStringMitigations) {
> > +        // If the string does not have a base-string, zero the |str| register.
> > +        // The code below loads &str->base so this should block speculative
> > +        // execution.
> 
> I think it's okay to call this leaNewDependentStringBase (note the "New"),
> or something, and then do the computeEffectiveAddress + add a comment saying
> it's Spectre-safe because we just allocated the dependent string (so it
> can't be another string type and we know |base| must be a JSString*).

Hm, ok. That works. I don't think this would be a Spectre hazard even if it were an existing dependent string, because nothing dereferences the pointer generated. You would just have the address of some random memory cell in a register. 
But of course, that wouldn't be known locally, so your approach seems safer.

I don't know if those regexp cases should be handled, but I'm certainly not doing that in this bug!

> Btw, I thought we had the invariant that all strings allocated inline would
> be in the nursery, so we don't have to post barrier their fields. Is that
> not true?

Normally yes. But CreateDependentString in particular handles free list allocation failure internally, with:

        masm.newGCString(string, temp2, &fallbacks_[FallbackKind::NotInlineString], stringsCanBeInNursery);

I didn't totally understand what was going on when I was first working on this, but I believe I now do. Unfortunately, I lumped together nursery allocation failure with free list allocation failure, so if nursery allocation fails it will fall back to doing a tenured allocation and then continue. And in that case, you'll end up with a freshly created tenured string.

I could remove this barrier if I split up the failure modes into nursery failure vs tenured failure. If I jump to the 'failure' parameter label, would that retry the whole operation? It looks like in generateRegExpMatcherStub, that would go to an OOL path for the whole regexp execution, if I'm reading this correctly?
Attachment #8948587 - Attachment is obsolete: true
(In reply to Steve Fink [:sfink] [:s:] from comment #160)
> If I jump to the 'failure' parameter label,
> would that retry the whole operation? It looks like in
> generateRegExpMatcherStub, that would go to an OOL path for the whole regexp
> execution, if I'm reading this correctly?

I think that's correct. Keeping the barrier is probably fine though...
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5acea87ec5dd
Separate out AllocKinds for nursery-allocatable strings, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/a59293001f00
Reparent JSString from TenuredCell to Cell, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/84c7cf744dec
Make js::Allocate cast strings to requested type, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c9e104fbc92
Force non-atom strings to have their low flag bit set in order to distinguish them from JSObjects in the nursery, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/bcbc5c16f436
Avoid atomization for string size tests, r=jimb
https://hg.mozilla.org/integration/mozilla-inbound/rev/d33c2ce0eed1
Strings in the nursery: allocation, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/e0fad5abd109
External strings can morph into plain ones while still in EXTERNAL_STRING arena, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/8b11814dcd95
Remove verifier assumption that only objects are in the nursery, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/af8459e4365b
Strings in the nursery: tracing and tenuring, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb864cb39b98
Strings in the nursery: barriers, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/1de96c3211a5
Discard nursery keys from EvalCache, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/0a3caa89b298
Strings in the nursery: allow any thread to access zone of permanent atoms, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/9529b3330677
Strings in the nursery: MIR node, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/4f68a4776fb7
Change Relocated marker to not confuse string vs object bit, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8e71c7c7da0
Default nursery strings to off, add ability to enable, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/59ec7474c6a1
Tenure strings that are checked for in about:memory test, r=njn
https://hg.mozilla.org/integration/mozilla-inbound/rev/203171ae0828
Strings in the nursery: JIT, r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/39a1573c32ba
Split out string nursery pointers from object nursery, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/ed48a62b2cb4
Register nursery string buffers to be freed, r=jonco
https://hg.mozilla.org/integration/mozilla-inbound/rev/2f5f91da6438
Implement AutoSuppressNurseryCellAlloc to avoid nursery allocation just before offthread parse startup, r=jonco
Comment on attachment 8949414 [details] [diff] [review]
rollup v3, disabled, applies to b68e9ed93b9f

No further issues found on debug64 after another day of fuzzing.
Attachment #8949414 - Flags: feedback?(choller) → feedback+
Comment on attachment 8950040 [details] [diff] [review]
rollup v4 with nursery strings enabled, apply to 2ef6e47e1087

No further issues found on debug64 after several days of fuzzing.
Attachment #8950040 - Flags: feedback?(choller) → feedback+
Comment on attachment 8949414 [details] [diff] [review]
rollup v3, disabled, applies to b68e9ed93b9f

Due to PTO, I wasn't able to get to this in time. Clearing feedback for now, will fuzz more when it hits mozilla-central.
Attachment #8949414 - Flags: feedback?(nth10sd)
Comment on attachment 8950040 [details] [diff] [review]
rollup v4 with nursery strings enabled, apply to 2ef6e47e1087

Due to PTO, I wasn't able to get to this in time. Clearing feedback for now, will fuzz more when it hits mozilla-central.
Attachment #8950040 - Flags: feedback?(nth10sd)
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/20d1242828ef
Allocate strings in the nursery if enabled, r=jonco
gkw: the above commit adds --nursery-strings to the JS shell. But note that once I turn it on for real, the flag's name will change to --no-nursery-strings.
needinfo? to request fuzzing.
Flags: needinfo?(nth10sd)
Depends on: 1440562
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2d3891634ce0
Disable nursery strings in a Zone if too many get tenured, r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/c9f0ec7b3f11
Allocate strings in the nursery, r=flagflip
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/741225b427a8
speed up test that is dramatically slow with zeal + nursery strings, r=me
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f4234c21322
Disable nursery strings in a Zone if too many get tenured, r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8978f3fcfbe
Allocate strings in the nursery, r=flagflip
Flags: needinfo?(sphink)
Keywords: leave-open
I'm very happy this landed, congrats! However it did regress Octane-splay by 25% on AWFY, is it possible the pre-tenuring heuristic isn't working as expected?
Flags: needinfo?(sphink)
Backout by csabou@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/6d72eade26af
Backed out 3 changesets for frequent assertion failures on js/src/gc/Cell.h:182 a=backout
This also caused the frequent hazard bug 1440917: subprocess.CalledProcessError: Command '['/builds/worker/workspace/obj-haz-shell/dist/bin/js', '/builds/worker/checkouts/gecko/js/src/devtools/rootAnalysis/computeGCFunctions.js', 'callgraph.txt', 'gcFunctions.tmp', 'gcFunctions_list.tmp', 'gcEdges.tmp', 'suppressedFunctions_list.tmp']' returned non-zero exit status -11
Depends on: 1440914, 1440917
// jsfunfuzz-generated
gc();
function f(){};
f();
// Adapted from randomly chosen test: js/src/jit-test/tests/gc/bug-1208994.js
oomTest(() => getBacktrace(oomTest[load+1]));

crashes 64-bit js debug deterministic shell on m-c rev 7208b6a7b11c with --fuzzing-safe --no-threads --baseline-eager --no-ion having the following stack:

#0  js::ReportOutOfMemory (cx=cx@entry=0x0) at /home/ubuntu/trees/mozilla-central/js/src/vm/JSContext.cpp:366
#1  0x0000000000c2b51e in JSRope::flattenInternal<(JSRope::UsingBarrier)1, unsigned char> (this=this@entry=0x7ffff5e000f8, maybecx=maybecx@entry=0x0)
    at /home/ubuntu/trees/mozilla-central/js/src/vm/String.cpp:528
#2  0x0000000000c11038 in JSRope::flattenInternal<(JSRope::UsingBarrier)1> (maybecx=<optimized out>, this=0x7ffff5e000f8)
    at /home/ubuntu/trees/mozilla-central/js/src/vm/String.cpp:600
#3  JSRope::flatten (this=this@entry=0x7ffff5e000f8, maybecx=maybecx@entry=0x0) at /home/ubuntu/trees/mozilla-central/js/src/vm/String.cpp:612
#4  0x00000000008e8c82 in JSString::ensureLinear (cx=0x0, this=0x7ffff5e000f8) at /home/ubuntu/trees/mozilla-central/js/src/vm/String.h:1540
#5  js::jit::EqualStringsHelper (str1=0x7ffff5dae900, str2=0x7ffff5e000f8) at /home/ubuntu/trees/mozilla-central/js/src/jit/VMFunctions.cpp:1585
#6  0x0000298ad8848025 in ?? ()
#7  0xfffb7ffff5e000f8 in ?? ()
#8  0x00007fffffffbc40 in ?? ()


Now it no longer crashes m-c tip as the regressing changeset has been backed out.

autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/e8978f3fcfbe
user:        Steve Fink
date:        Wed Feb 21 11:36:52 2018 -0800
summary:     Bug 903519 - Allocate strings in the nursery, r=flagflip

autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/6d72eade26af
user:        Cosmin Sabou
date:        Mon Feb 26 19:16:41 2018 +0200
summary:     Backed out 3 changesets (bug 903519) for frequent assertion failures on js/src/gc/Cell.h:182 a=backout
Flags: needinfo?(nth10sd)
(In reply to Steve Fink [:sfink] [:s:] from comment #169)
> gkw: the above commit adds --nursery-strings to the JS shell. But note that
> once I turn it on for real, the flag's name will change to
> --no-nursery-strings.

Steve, what's the current state then? Should --nursery-strings or --no-nursery-strings be tested, or should things be on hold for the moment? Or should we test a patch stack?
When this landed (comment 174), these regressions were noticed:

== Change summary for alert #11785 (as of Tue, 27 Feb 2018 11:34:14 GMT) ==

Regressions:

  3%  dromaeo_dom linux64 opt e10s stylo     3,574.98 -> 3,456.81
  3%  dromaeo_dom linux64 pgo e10s stylo     3,751.88 -> 3,646.07

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=11785
The backout from comment 177 brought the baseline to previous levels:

== Change summary for alert #11758 (as of Mon, 26 Feb 2018 15:19:23 GMT) ==

Improvements:

  5%  dromaeo_dom linux64 pgo e10s stylo     3,658.89 -> 3,852.82
  5%  dromaeo_dom linux64 opt e10s stylo     3,455.66 -> 3,638.53

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=11758
gkw's fuzz bug is a horribly simple one, easily fixed. I don't think it explains any of the other crashes that showed up.

At the moment, with it backed out, --nursery-strings should be tested. If I manage to re-land this, it'll switch back to --no-nursery-strings as the thing to test -- but really, that testing probably isn't too important at this point, as that will be very similar to the state that was landed happily for about 3 days.

(In reply to Jan de Mooij [:jandem] from comment #176)
> I'm very happy this landed, congrats! However it did regress Octane-splay by
> 25% on AWFY, is it possible the pre-tenuring heuristic isn't working as
> expected?

Yes, you are correct. I keep forgetting I incorrectly moved the important line to a different patch that I can't land yet. (Most of my testing was with that patch, sadly.) The landed version never pretenured. I will fix that for the next landing.

I'm still working on the other crashes, though. I may split this bug so it can ride the trains disabled, and have another bug if I land in a later version.
Flags: needinfo?(sphink)
Depends on: 1441002
I manually symbolicated a reftest crash (not sure why symbolication failed in automation?) and it turned out to be when doing JSRope::flattenInternal, the left string was pre-write-barriered. That traversed up through base links, and seems to have encountered a nursery string.

Marking a string assumes that if a string is tenured, then its base will also be tenured. Which... seems kind of reasonable. A nursery string is always newer than a tenured string. It's not clear to me how a newer string could be the base of an older string.

Oh. Oh! Yes it is, in fact it should be common. Which begs the question of why tests didn't fail all over the place.

If you have a tenured extensible string and concatenate it to something, producing a nursery rope, then flatten the rope, you would expect this to happen. The rope would steal the extensible string's buffer and transmogrify itself into an extensible string (but still in the nursery); the tenured string would transmogrify into a dependent string with its base pointing to the nursery string that stole its heart away.

Oh, hm. Except to hit this assertion, you need to

 (1) make this happen in the middle of an incremental GC
 (2) have a rope depth of > 1 (as in, you need a rope with a rope as its left child with an extensible string as its left child)
 (3) have the leftmost rope be tenured
 (4) have all of this happen off the main thread (so probably in an offthread parse) because the assert is from AssertSafeToSkipBarrier

...and *that* is why the assert is rare. And this code path is benign without the assert; regular marking checks for being in the nursery. I've found one other assert-only place that assumes that tenured strings' bases are always tenured. I haven't been able to find anywhere it would actually cause problems, though -- if you're tracing for tenuring, it'll do the right thing. Only marking tracers will stop traversing base links when they hit a nursery string.

Darn. I was really hoping this would explain some of the intermittent failures.
(Fixing the assert problem in the above rambling description should fix bug 1440914.)
Sadly, I don't see a smooth path forward, so as decoder pointed out I should really make this less painful for fuzzing.

With this patch, you can use --nursery-strings=on/off. The default will depend on whether or not I've been backed out most recently.
Attachment #8955208 - Flags: review?(jcoppeard)
Attachment #8955208 - Flags: review?(jcoppeard) → review+
Pushed by sfink@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/11a6d361ee90
Disable nursery strings in a Zone if too many get tenured, r=sfink
https://hg.mozilla.org/integration/mozilla-inbound/rev/321c29f48508
Switch to --nursery-strings=on/off for less fuzzing churn, r=jonco
https://hg.mozilla.org/mozilla-central/rev/11a6d361ee90
https://hg.mozilla.org/mozilla-central/rev/321c29f48508
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
This was closed even though it is not enabled. I think that's fine, since I'm not going to be able to land this before the freeze. I am going to create a new bug for enabling, and restructure the bug dependency graph to be more accurate.
Blocks: 1442481
No longer depends on: 1440914, 1440917, 1440981, 1441002, 1441143
Depends on: 1443607
Depends on: 1444335
Depends on: 1470921
You need to log in before you can comment on or make changes to this bug.