Function relazification is complicated. The patch in bug 1005306 should fix some known issues and simplify it, but next month it will be a year later and that patch still hasn't landed yet... I think a big part of the complexity comes from relazifying during (incremental) GC, see the big comment in JSFunction::relazify for an example of this. Incremental GC also makes these issues very hard to reproduce. So far, all relazification bugs I've debugged in the browser have been really hard to reproduce. I think it'd be nice to either: (1) During marking, collect a list of relazifiable functions and then relazify them all at once at the end of the GC (or maybe stop when we run out of time). Here we'd have to ensure the function is still relazifiable because code can run between incremental slices but that's fine. (2) Only relazify during non-incremental GCs. Are shrinking/compacting GCs non-incremental? It might make sense to only relazify there. (s-s for now to avoid drawing too much attention to this..)
Another thing: relazification bugs are really hard to find for the fuzzers because we don't relazify in active compartments, so the fuzzers need to: (1) Create a new global/compartment and run some code in it. (2) Leave this compartment and trigger a GC. (3) Run some more code in the second compartment. The relazifyFunctions() shell function I added in bug 1116760 lifts the inactive-compartments-only restriction to help fuzzing, but because that's a function it doesn't catch GCs triggered in other places. See bug 1150513 for a relazification problem bz spent a lot of time on this week. Unfortunately I'm afraid we can't allow relazification during all GCs, because it will likely trigger a lot of false positives...
Does comment 0 make sense?
So what are the actual goals of relazification? Why the current choice of the "relazify during gc but only if we're not in the function's compartment", exactly? Is the "not in the compartment" thing a safety thing or a performance heuristic? Put another way, how much would break if we noted things that might be relazifiable during GC (assuming that's the right heuristic to start with) and then relazified async off an event from the event loop? We'd need an event loop, of course.
(In reply to Not doing reviews right now from comment #3) > So what are the actual goals of relazification? Relazification replaces a function's JSScript with a LazyScript, allowing the GC to potentially collect the JSScript if nothing else is keeping it alive. This reduces memory usage because JSScripts are much bigger than LazyScripts (they store bytecode etc). > Why the current choice of > the "relazify during gc but only if we're not in the function's > compartment", exactly? It's done to avoid relazification hazards, like the one you fixed today. Basically we're working with a function and then we call something that can relazify and as a result our function suddenly has a lazy script and has to be aware of that. I think the idea is that in *most* of these cases we entered the function's compartment, so we completely avoid the problem in those cases. > Is the "not in the compartment" thing a safety thing > or a performance heuristic? Safety, though I think it's also nice for performance to not relazify compartments that are running benchmarks/games etc. > Put another way, how much would break if we noted things that might be > relazifiable during GC (assuming that's the right heuristic to start with) > and then relazified async off an event from the event loop? We'd need an > event loop, of course. "async" as in off-thread or just on the main thread at a later point? The latter should be doable and is a bit like my first suggestion (comment 0), the former is more scary.
"async" as in main thread at a later point.
The needinfo from terrence got (I assume accidentally) cleared.
(In reply to Jan de Mooij [:jandem] from comment #0) > Function relazification is complicated. The patch in bug 1005306 should fix > some known issues and simplify it, but next month it will be a year later > and that patch still hasn't landed yet... > > I think a big part of the complexity comes from relazifying during > (incremental) GC, see the big comment in JSFunction::relazify for an example > of this. Incremental GC also makes these issues very hard to reproduce. So > far, all relazification bugs I've debugged in the browser have been really > hard to reproduce. > > I think it'd be nice to either: > > (1) During marking, collect a list of relazifiable functions and then > relazify them all at once at the end of the GC (or maybe stop when we run > out of time). Here we'd have to ensure the function is still relazifiable > because code can run between incremental slices but that's fine. I agree, that would help tremendously. We could probably even do this concurrently with our other table sweeping and maybe even get a big perf boost here. > (2) Only relazify during non-incremental GCs. Are shrinking/compacting GCs > non-incremental? It might make sense to only relazify there. Compacting and shrinking can be either incremental or non-incremental. They are non-incremental when done in response to memory pressure events and incremental when run during idle times.
Jan and I talked on IRC about this and came up with the idea of running an incremental relazification step at the very start of the GC (i.e. before root marking). This would simply cut the link between functions and scripts where appropriate, and the scripts would then be collected by the rest of the GC if nothing else was keeping them alive. This would not then have to deal with the complications of doing this during incremental marking. Marking a function would then always mark its script if it had one, as happened originally. This step could be incremental but would benefit from a way of easily iterating through all functions in a compartment. Since functions are already special objects whose size doesn't easily map to one of object alloc kinds, the suggestion is to make them a separate new GC type. The downside of this is extra arena overhead, but since we can compact these it would not be two large. (In fact we need to two alloc kinds, for functions and extended functions). Terrence and Steve, what do you think? This is a reasonably big change so asking for feedback here. Also, please correct me if I've misrepresented relazification as I'm not too familiar with the details.
FWIW, fixing this bug (and relazification issues like bug 1005306) is one of my q2 goals, so I can spend some time on this...
My first thought was: sounds great, go for it. On further reflection, it occurred to me that this is going to be /extremely/ painful, probably for even more reasons than I'm thinking of now, which are already numerous. On even more reflection, it might be worth doing anyway if we can use the opportunity to make things better in general. Some stream-of-consciousness thoughts: * Currently JSFunction is our only JSObject with overlaid C++ properties. If we ever want to do Oilpan, this is going to become a much larger category of things. * However, if the purpose is for faster iteration via separate arenas, that doesn't really jibe with creating a new generic mechanism for C++/JSObject chimeras. * Could we make JSFunction a type of UnboxedObject, with associated generic marking code and drop all of our JSFunction specializations, independent of the split arenas? I'm thinking we could box all of the non-JS pointers into PrivateValues or something, if needed. * Is our current split of JSGCTraceKind (the layout type) and the AllocKind (the layout AND size type) really the right abstraction? It's always seemed weird to me that these were not more independent. * On the other hand, maybe this split makes one implementation relatively trivial: alloc with AllocKind::OBJECT_FUNCTION instead of AllocKind::OBJECT2_BACKGROUND and put the new object "size" before OBJECT_LAST so that the arenas get treated as objects? I need to give this more thought.
Heck if I know. I fixed one relazification bug, but I had no idea that the whole thing was such a tangled web. The one problem I knew of was when you have a function or script or lazyscript in hand, then you do something that can GC, then you expect all of those things to still be what they were before, or perhaps you just expect them to still be as lazy or unlazy as they were before the GC (eg by checking if something is lazy, GC'ing, then grabbing the script off of it based on stale laziness info.) Usually you would be inside the compartment, so relazification would be blocked, but occasionally you'll switch compartments before GC'ing. But that appears to be just the tip of the iceberg, judging from that comment in JSFunction::relazify. Heck, I could fix the above by making the Rooted<> specializations of the relevant types block relazification of themselves and their canonical JSFunction (or something; I still don't understand the data model.) But it looks like the the execution stack isn't causing too much of the trouble here, it's state across iGC slices or something. Actually, is there a way to present a concise description of the current data model? Something like: A JSFunction can be canonical or non-canonical. A JSFunction can be lazy or nonlazy. A lazy JSFunction has a strong ref to a LazyScript. A nonlazy JSFunction has a ref to a JSScript. Whether that ref is strong or weak is complicated and buggy. A non-canonical JSFunction has a ref to a JSScript that points back to that very same non-canonical JSFunction, in complete disregard for everything I thought I understood (because I thought JSScripts would only point to canonicals.) A JSScript represents compiled source code, definitely including bytecode and possibly including JIT code. A JSScript has a strong ref to a canonical JSFunction or to nullptr. A JSScript has a strong ref to a LazyScript or to nullptr. A LazyScript represents the information needed to reconstitute a JSScript. A LazyScript may be active (a corresponding JSScript exists) or inactive. An active LazyScript has a strong ref to the JSScript. A LazyScript is bitter because its name is incorrect; it is not a script (at least, not a JSScript), merely the seed that can be grown into one. Is it possible to create a description that is something like that, only correct? And what does "canonical" mean, anyway? Is a JSFunction canonical only from the perspective of a particular JSScript? What the heck does that even mean? Anyway, I'm clearly not enough in tune with the existing setup to venture an opinion here. If I were to make a wild stab, I'd say there are two categories of issues here: (1) multi-cell rooting for things on the execution stack, and (2) multi-cell reference graph issues over the course of an incremental GC. Moving all relazification to the beginning or end of the iGC would resolve #2. #1 seems to currently be manually worked around with things like AutoDelazify and hold bits, and might be doable in a more principled way via Rooted specializations or perhaps just slightly more automated hold bits. But I'm too fuzzy on the data model to know what solution smells best. As a final comment, I'll note that the end of js::CloneFunctionAndScript reads: RootedScript cloneScript(cx, clone->nonLazyScript()); return clone; } which is either too subtle for me to understand or a useless RootedScript.
(In reply to Steve Fink [:sfink, :s:] from comment #11) > The one problem I knew of was when you have a function or script or > lazyscript in hand, then you do something that can GC, then you expect all > of those things to still be what they were before Ah, I didn't realize that, and that does complicate things, although as you say we can make rooters that preserve the laziness of the thing they root. It's also worth pointing out that adding a new thing kind for functions is not required for doing relazification at the start of GC, just desirable. If we only relazify on shrinking GCs as was mentioned then it might not be too bad just to iterate all objects of the correct kind looking for functions.
(In reply to Steve Fink [:sfink, :s:] from comment #11) > The one problem I knew of was when you have a function or script or > lazyscript in hand, then you do something that can GC, then you expect all > of those things to still be what they were before, or perhaps you just > expect them to still be as lazy or unlazy as they were before the GC Right, in practice this is not a huge problem because, as you said, we usually aren't dealing with function pointers outside the current compartment. But to solve this, we could do something like what bz suggested: only relazify if nothing else is running (Gecko's event loop could probably pass us a flag or call some JSAPI function). > But it looks like the the execution stack isn't > causing too much of the trouble here, it's state across iGC slices or > something. Yes, for now I'm mostly interested in disentangling JSFunction::trace and JSFunction::relazify. > And what does "canonical" mean, anyway? Is a JSFunction canonical > only from the perspective of a particular JSScript? What the heck does that > even mean? I think so; a canonical function is the thing the parser creates, then when we're executing code we can make clones of that canonical function. Canonical functions aren't exposed to JS code, unless it's a run-once/singleton function.
After some more IRC discussion with terrence, the plan is to: (1) Give JSFunction its own alloc kind (and similar for extended functions). (2) Add a non-incremental relazification GC phase that runs before marking. We want to do (2) non-incrementally first so we can measure and see how fast it's in practice. If it's slow, we can either (a) make it incremental or (b) only relazify if we're doing a shrinking GC. For relazification to work incrementally, we need to remember which arenas we still have to relazify. This seems solvable but it'd be nice if we could avoid it. I can hopefully start on it this week; will do the actual work in new/public bugs.
I started on this and noticed another issue, posting it here so I don't forget about it: CreateLazyScriptsForCompartment wants to iterate over all functions (so a new AllocKind will make that faster as wel), but it doesn't check extended functions.
Jandem fixed this in another bug.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.