Closed Bug 1414385 Opened 7 years ago Closed 6 years ago

addMessage reducer causes frequent OUT_OF_NURSERY

Categories

(DevTools :: Console, defect)

57 Branch
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Harald, Unassigned)

References

(Blocks 1 open bug)

Details

https://perfht.ml/2lNV3i4

Shows a massive overhead of nursery GC after every call of addMessage. Either this can be optimized in JS or we determine that this is a common Redux/React pattern and the JIT should this better.
ni? Steve for Nursery input.
Flags: needinfo?(sphink)
What's happening here is that we are getting a long series of FULL_SLOT_BUFFER minor GCs, all with a promotion rate of 51.5-51.6% (it's a very tight range, leading me to believe it's doing a long series of the same thing.) These minor GCs take 5-7ms (well, one in the middle is 16ms). When it started doing this stuff, the nursery size quickly climbed from 1MB -> 16MB. So we appear to be tenuring 8MB of GC cells every minor GC, and that are 35 of them or so.

I believe the main line of code relevant is here is the state.withMutations call at http://searchfox.org/mozilla-central/source/devtools/client/webconsole/new-console-output/reducers/messages.js#97

withMutations is from immutable.js, so we're constructing new immutable data structures that are partly modified, partly pointers to older versions.

I don't know if this problem is fundamental to using immutable things, or whether this is seeing too much activity. (Or maybe holding onto the older data somehow?) One line that jumps out as suspicious is http://searchfox.org/mozilla-central/source/devtools/client/webconsole/new-console-output/reducers/messages.js#97

    record.set("visibleMessages", [...visibleMessages, newMessage.id]);

What type is visibleMessages? Couldn't this be visibleMessages.unshift(newMessage.id)?
Flags: needinfo?(sphink)
I guess that last is a question for bgrins.
Flags: needinfo?(bgrinstead)
(In reply to Steve Fink [:sfink] [:s:] from comment #2)
> I believe the main line of code relevant is here is the state.withMutations
> call at
> http://searchfox.org/mozilla-central/source/devtools/client/webconsole/new-
> console-output/reducers/messages.js#97
> 
> withMutations is from immutable.js, so we're constructing new immutable data
> structures that are partly modified, partly pointers to older versions.
> 
> I don't know if this problem is fundamental to using immutable things, or
> whether this is seeing too much activity. (Or maybe holding onto the older
> data somehow?) One line that jumps out as suspicious is
> http://searchfox.org/mozilla-central/source/devtools/client/webconsole/new-
> console-output/reducers/messages.js#97
> 
>     record.set("visibleMessages", [...visibleMessages, newMessage.id]);
> 
> What type is visibleMessages? Couldn't this be
> visibleMessages.unshift(newMessage.id)?

Thanks for looking into this. That line was moved around in Bug 1371721 but I believe it was originally added in Bug 1363678. visibleMessages is an Array but I believe we don't want to use unshift because we don't want to modify the state passed into the function. We have some similar grossness for example at https://dxr.mozilla.org/mozilla-central/source/devtools/client/webconsole/new-console-output/reducers/messages.js#226. Would it be faster to make a copy of visibleMessages via slice or something right at the beginning of the function?
Flags: needinfo?(bgrinstead)
Oh. I assumed visibleMessages would be a Immutable.List or something. (But then, I've never used immutable.js before.)

I don't know for sure that visibleMessages is even relevant, I just saw the flatten and it seemed like it might be doing a lot of copying.

It might be that we could fix this on the GC side by making this use a whole cell store buffer entry instead of using a SlotsEdge. (These minor GCs are happening because we're recording tenured -> nursery cross-generational pointers, one per property, and in this case it looks like it might be better to just record the owning object and scan all of its properties at GC time. Because these minor GCs are being triggered by running out of space for recording slots.)
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> Oh. I assumed visibleMessages would be a Immutable.List or something. (But
> then, I've never used immutable.js before.)

It would be interesting to see if the behavior changes if it _is_ an Immutable.List...  It feels like the current code uses Immutable half way, since it's used for the record holding this list, but not the list itself.

Just a thought though, not sure if it would make a difference here.
How can I reproduce this test case?
STR:
- Open Console im about:blank
- Execute ‘for (var i = 0; i < 1000; console.log(i));’
Flags: needinfo?(hkirschner)
Any insights from the STR?
Flags: needinfo?(sphink)
(In reply to :Harald Kirschner :digitarald from comment #9)
> Any insights from the STR?

No, that seems to be something else entirely. If I use the Web to evaluate that line -- or even only counting up to 10 -- it displays '0' on the console (not 0, 1, 2, ...) and then shows a counter that I assume is the number of duplicates of that message? And it just keeps counting forever, burning 200% CPU. (I let it go to 1.4 million once.) If I manage to capture a profile of this happening (difficult, since the main thread is pretty overwhelmed), then it does indeed show many many minor GCs, but they have reason=FULL_CELL_PTR_BUFFER. And given that it seems to be in an infinite loop, it is reasonable to overflow something or other.

I'll restart in safe mode to make sure I can reproduce and then file a bug.

If I use the Browser Console instead, I get similar behavior, but it gives me a slow script dialog from which I can cancel it, and then it spits out 2 JS stacks (presumably because I canceled?)
Flags: needinfo?(sphink)
My bad, I typed the line above on my phone without testing and missed an important piece. No magic here, just printing 1000 numbers:

for (var i = 0; i < 1000; console.log(i++));
Oh. Uh. Right. I should read the code I'm pasting.

    for (var i = 0; i < 1000; console.log(i));

*is* an infinite loop. It never increments i. Sorry, I'm blind. Let me try this again...
Yes, this gives me something to work with. Most of the minorGCs are OUT_OF_NURSERY, but there are quite a few FULL_SLOT_BUFFER mixed in.

Actual histogram, ignoring reasons that show up 5 or fewer times:

     6 TOO_MUCH_MALLOC
    64 FULL_SLOT_BUFFER
    82 FULL_WHOLE_CELL_BUFFER
   756 OUT_OF_NURSERY

The range of times for OUT_OF_NURSERY is wide, but skimming through it says that the median is <1ms. The median for FULL_SLOT_BUFFER looks like it's around 5ms. (All of the FULL_WHOLE_CELL_BUFFER minor GCs are very fast; the median looks to be about 0.2ms and the max is 346µs.)
Note that the histogram for the profile in comment 0 is a little different:

     6 FULL_CELL_PTR_BUFFER
     6 TOO_MUCH_MALLOC
     8 ALLOC_TRIGGER
    57 FULL_SLOT_BUFFER
  1926 OUT_OF_NURSERY

The medians are about the same, though.
I've been looking at what is getting tenured. When logging 0..249, I tenured

2414 x { element, parentID, isMounted } -- some react thing?
 - in all GCs in this log, this ObjectGroup tenured 2414 + 1238 + 1862
1961 x { onChange, messagesRepeat, open, data-url, title, messageBody, visibleMessages } -- visibleMessages is familiar from the code
 - in all GCs in this log, this ObjectGroup tenured 261 + 1961 + 784 + 1690

These are just a sampling of the properties on those objects (the first has 7ish properties, the second has 82).

There were a few other frequent ObjectGroups. Some stats object (range, bucket_count, histogram_type, values, sum) tenured 923. There were 3000 or so unknown OGs. (Megamorphic? I don't know what that means.)

Counting 0..999 had the same ones plus some more. Notable:

  10010 x [Function * 0x7fffd138aa00] : <Function 0x7fffd1396e80> unknown {}
Details on the different minor GCs:

                reason: count    sum median  mean mean_promo%
         ALLOC_TRIGGER:     8  21022   4245  2627        67.5
  FULL_CELL_PTR_BUFFER:     3  39373  14255 13124        37.5
      FULL_SLOT_BUFFER:    57 353190   5955  6196        50.5
FULL_WHOLE_CELL_BUFFER:     1   1523   1523  1523        14.5
        OUT_OF_NURSERY:  1923 358185    151   186         0.7
       TOO_MUCH_MALLOC:     6    891    151   148         0.6

So the relevant types are OUT_OF_NURSERY and FULL_SLOT_BUFFER. There are 40x as many OUT_OF_NURSERY triggers, but the total time is about the same for the two types, because the FULL_SLOT_BUFFER collections are much, much slower.

To summarize, it seems from the GC side that there are a couple of things that don't look good. The FULL_SLOT_BUFFER GCs are very slow, probably because they are tenuring half of the nursery every time. That suggests that either we should pretenure the things being placed into those slots, increase the size of the buffer, or use a different store buffer type (eg the whole cell buffer) to prevent this from triggering GCs.

Then there are all of the OUT_OF_NURSERY GCs. We don't want to pretenure to avoid those, because the promotion rate is under 1%. This points to the code being problematic; why is it generating so much short-lived garbage?

Where is this observed as a problem in practice? console.log()ging 1000 messages probably isn't critical to optimize for. But is this coming up because we *do* log lots and that being slow is an issue? Or is console.log just a convenient way to trigger the same symptoms as something more relevant?

How does console.log work, anyway? Is this using postMessage() to send all these log messages between processes? That would create a fair amount of temporary garbage. (The tenured stuff appears to be from the code using React and immutable.js; I don't know what the temporary things are.)
Priority: -- → P3
Whiteboard: [newconsole-mvp]
Priority: P3 → P2
Since the last activity on this bug, Immutable.js has been removed in the console.
I'm not sure the original issue still reproduces.

Here is the recording: https://perfht.ml/2F5J4kC
of: for (var i = 0; i < 1000; console.log(i++));
Blocks: console-perf
No more OUT_OF_NURSERY pattern, looks like this is fixed. Since Steve filed bug 1432565 as follow up for the Maps constructor, this can be closed.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Priority: P2 → --
Whiteboard: [newconsole-mvp]
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.