bugzilla.mozilla.org has resumed normal operation. Attachments prior to 2014 will be unavailable for a few days. This is tracked in Bug 1475801.
Please report any other irregularities here.

Loading travis logs hangs the browser due to O(N^2) nsGenConList::DestroyNodesFor (long lists of quotes or counters)

RESOLVED FIXED in Firefox 51

Status

()

Core
Layout
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: evilpie, Assigned: xidorn)

Tracking

({perf})

unspecified
mozilla51
Points:
---

Firefox Tracking Flags

(e10s-, firefox51 fixed)

Details

(URL)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(4 attachments, 1 obsolete attachment)

(Reporter)

Description

2 years ago
This was reported on HN. Basically loading this travis log hangs the browser. I was using e10s and actually had to kill the content process to get it under control.

Chrome is definitely not fast, but it does finish loading the whole log.
We seem to spend lost of time under PresShell::ContentRemoved, especially in 
nsGenConList::DestroyNodesFor.
Hmm, isn't nsGenConList::DestroyNodesFor effectively O(n^2).
...when removing a list of nodes
Yes.

I suppose we should just give each nsGenConList an out-of-band hashtable to find the nodes for a given frame.

Alternatively, we could store the list as a tree instead of a linked list, and use the sorting code to do removals.  (Though that has the risk that bugs that make the sorting code nondeterministic would lead to dangling pointers, which sounds a little risky.)
Keywords: perf
Summary: Loading travis logs hangs the browser → Loading travis logs hangs the browser due to O(N^2) nsGenConList::DestroyNodesFor (long lists of quotes or counters)
_Loading_ the log hangs, or _unloading_?  Unloading is bug 1230007, right?
Flags: needinfo?(evilpies)
(Reporter)

Comment 6

2 years ago
Loading. I don't even get to the unloading part.
Flags: needinfo?(evilpies)
I just profiled loading and I guess it's also hitting removeChild codepaths; presumably the page is mutating its DOM or something.  I'm seeing nothing remotely resembling a permahang, just 4-5 hangs of 2-3 seconds each, but it might depend on network parameters or something...
And hmm, I guess this is in fact DestroyNodesFor, not renumbering of the generated content, so not quite the same as bug 1230007.

Updated

2 years ago
tracking-e10s: --- → ?

Updated

2 years ago
tracking-e10s: ? → -
(Assignee)

Comment 9

2 years ago
(In reply to Boris Zbarsky [:bz] from comment #8)
> And hmm, I guess this is in fact DestroyNodesFor, not renumbering of the
> generated content, so not quite the same as bug 1230007.

When I was profiling bug 1230007, I didn't see much time spent in nsCounterManager::RecalcAll. I actually see the same thing here that DestroyNodesFor takes tones of time, so I guess we can dupe it to bug 1230007.
(Assignee)

Updated

2 years ago
Assignee: nobody → xidorn+moz
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 13

2 years ago
Test looks fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b032371ed780

The last part is actually for bug 1230007, but I don't think it makes much sense to delay doing that (other than increasing number of bugs I fix :)

Comparing to dbaron's proposal in comment 4, my solution in part 2 has a potential risk that if Recalc of a list is called at any point after the invalidation in nsCSSFrameConstructor::NotifyDestroyingFrame(), but before weak frames are cleared, we may end up having a inconsistent list. I guess that would unlikely happen, though.

Adding a hash table to each nsGenConList might be a good idea as it may have lower overhead on performance (because no string hash) and less risky for the case described above, but I'm a bit concerned that it would at least double the additional memory consumption comparing to this approach.
Is it guaranteed that common use of nsWeakFrames doesn't end up causing similar performance issues.
PresShell::ClearFrameRefs is after all O(n).

(Initially nsWeakFrame was designed to be stack-only class, but since it proved to be useful, rare/shortliving use of it was accepted also on heap)
(Assignee)

Comment 15

2 years ago
Wait... wat? nsWeakFrame is a singly linked list so... removing one in the middle without knowing its siblings is basically another O(n)?! Ahhhh...

Hopefully the linked list is short enough :/ Otherwise we may want to make it a doubly linked list or something else...
(Assignee)

Comment 16

2 years ago
OK, I guess I should take dbaron's approach which adds a hashtable to nsGenConList instead...
(Assignee)

Comment 17

2 years ago
(In reply to Olli Pettay [:smaug] from comment #14)
> Is it guaranteed that common use of nsWeakFrames doesn't end up causing
> similar performance issues.
> PresShell::ClearFrameRefs is after all O(n).

PresShell::ClearFrameRefs being O(n) doesn't really matter. What matters is that nsIPresShell::RemoveWeakFrame is O(n) as well, which means having lots of weak frame reference living shorter than the frame could cause performance issue.

I don't think that would be a serious issue, though, as the n there is the number of references to a single frame, which is (currently) unrelated to number of frames or DOM nodes, so it can be considered as some kind of constant factor.
(Assignee)

Comment 18

2 years ago
It looks like our hash table impl is even more expensive than I thought (in memory consumption). So let's try my current approach first.
(Assignee)

Comment 19

2 years ago
Comment on attachment 8779217 [details]
Bug 1291707 part 2 - Change the way nsGenConNodes are destroyed.

Oh, wait... nsWeakFrame is one linked list for all frames in the pres shell? Hmmm... okay, then it is definitly a performance issue...
Attachment #8779217 - Flags: review?(bzbarsky)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Updated

2 years ago
Attachment #8779217 - Attachment is obsolete: true
(Assignee)

Comment 24

2 years ago
try looks fine... but there is currently some issue with tc or treeherder so no debug build test runs... https://treeherder.mozilla.org/#/jobs?repo=try&revision=512e92970afd

Comment 25

2 years ago
mozreview-review
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68020

The idea is not bad, but I think the code needs some refinement.

::: xpcom/glue/nsBaseHashtable.h:154
(Diff revision 1)
> +   * @return the found value, or nullptr if no entry was found with
> +   *   the given key.

How is this reasonable for non-pointer value types, e.g. nsDataHashtable?

::: xpcom/glue/nsBaseHashtable.h:160
(Diff revision 1)
> +      UserDataType data = mozilla::Move(ent->mData);
> +      this->RemoveEntry(ent);
> +      return data;

How does this even compile for something like nsRefPtrHashtable, where it's quite unsafe?  More generally, how does this work correctly for cases where DataType is not equal to UserDataType?
Attachment #8779639 - Flags: review?(nfroyd)
(Assignee)

Comment 26

2 years ago
I believe all issues you raised here applies to the existing nsBaseHashtable::Get as well, no?
(Assignee)

Comment 27

2 years ago
mozreview-review-reply
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68020

> How is this reasonable for non-pointer value types, e.g. nsDataHashtable?

Actually I copied the comment from ::Get method above, so I just supposed that this kind of description works.

> How does this even compile for something like nsRefPtrHashtable, where it's quite unsafe?  More generally, how does this work correctly for cases where DataType is not equal to UserDataType?

It isn't necessary to compile everywhere thanks to SFINAE. I think it should compile whenever Get(KeyType) compiles.

Comment 28

2 years ago
mozreview-review-reply
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68020

> Actually I copied the comment from ::Get method above, so I just supposed that this kind of description works.

Ugh, that's terrible.  OK, so this part is at least no worse than what's already there.

> It isn't necessary to compile everywhere thanks to SFINAE. I think it should compile whenever Get(KeyType) compiles.

How is this SFINAE?  I know I've seen cases where instantiating a template with a type that doesn't fit correctly in some of the methods declared inside the class (e.g. the type doesn't have a copy constructor or whatnot) causes compiler errors...what is different about this case?
(Assignee)

Comment 29

2 years ago
mozreview-review-reply
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68020

> How is this SFINAE?  I know I've seen cases where instantiating a template with a type that doesn't fit correctly in some of the methods declared inside the class (e.g. the type doesn't have a copy constructor or whatnot) causes compiler errors...what is different about this case?

Oh, hmm, I somehow mixed up this patch with some other patch I wrote today... So no, no SFINAE here.

But how could it be worse than ::Get again? ::Get(KeyType) uses copy ctor, right? And Move() would failback to copy ctor if move ctor does not exist. So this method should compile whenever ::Get(KeyType) compiles.

Comment 30

2 years ago
mozreview-review-reply
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68020

> Oh, hmm, I somehow mixed up this patch with some other patch I wrote today... So no, no SFINAE here.
> 
> But how could it be worse than ::Get again? ::Get(KeyType) uses copy ctor, right? And Move() would failback to copy ctor if move ctor does not exist. So this method should compile whenever ::Get(KeyType) compiles.

OK, so ::Get copy-ctor's from DataType to UserDataType if those types are the same, but it implicitly converts from DataType to UserDataType if those two are different.  Very different operations in those cases, and both are OK.

::GetAndRemove does something very different in the second case: it does an implicit conversion from DataType to UserDataType, and then the underlying data actually goes away.  So in the case of nsRefPtrHashtable, where DataType == RefPtr<T> and UserDataType == T*, we're going to get the refcounted pointer, Release() the pointer, and then return the pointer, which can be dangerous: we may have Release()'d the only remaining reference, and we'd be returning a dangling pointer.  (I'm not entirely sure how that compiles, given that we prohibited implicit conversion to pointers from temporary RefPtr values...surely the compiler doesn't decide to use the copy constructor here when there's a move constructor available?)  The same argument applies for nsClassHashtable, where DataType == nsAutoPtr<T> and UserDataType == T*.

I think I would be OK with putting this as a method in nsDataHashtable; I'm skeptical it can be made to work in nsBaseHashtable.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment on attachment 8779216 [details]
Bug 1291707 part 1 - Change nsGenConList::Remove to Destroy and make it private.

https://reviewboard.mozilla.org/r/70238/#review68214

r=me
Attachment #8779216 - Flags: review?(bzbarsky) → review+

Comment 35

2 years ago
mozreview-review
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68300

Bleh, I don't know that's necessarily much better; it's hard to detect the failure case for "key does not exist" for non-trivial data types.  (It happens to do the right thing for pointers...)  WDYT about:

```
mozilla::Maybe<DataType> GetAndRemove(KeyType aKey)
{
  mozilla::Maybe<DataType> value;
  if (EntryType* ent = this->GetEntry(aKey)) {
    DataType data = mozilla::Move(ent->mData);
    this->RemoveEntry(ent);
    value.emplace(mozilla::Move(data));
  }
  return value;
}
```

Not quite as ergonomic on the caller side as what you have, but more general, I think.  That would actually be suitable for inclusion in `nsBaseHashtable`, if you wanted to go that route.  r=me on that version, and an updated documentation comment.
Attachment #8779639 - Flags: review?(nfroyd) → review+
Comment on attachment 8779640 [details]
Bug 1291707 part 3 - Use hashtable to track nsGenConNodes related to a frame.

https://reviewboard.mozilla.org/r/70612/#review68462

::: layout/base/nsGenConList.cpp:36
(Diff revision 1)
>  
>  bool
>  nsGenConList::DestroyNodesFor(nsIFrame* aFrame)
>  {
> -  if (!mFirstNode)
> -    return false; // list empty
> +  if (nsGenConNode* node = mNodes.GetAndRemove(aFrame)) {
> +    // Clear following nodes first as they must not be mFirstNode.

Please assert that node->mPseudoFrame == aFrame.

::: layout/base/nsGenConList.cpp:39
(Diff revision 1)
>  {
> -  if (!mFirstNode)
> -    return false; // list empty
> -  nsGenConNode* node;
> -  bool destroyed = false;
> -  while (mFirstNode->mPseudoFrame == aFrame) {
> +  if (nsGenConNode* node = mNodes.GetAndRemove(aFrame)) {
> +    // Clear following nodes first as they must not be mFirstNode.
> +    // If mFirstNode refers to a node of this frame, it should be the
> +    // one retrieved from mNodes.
> +    nsGenConNode* curNode = Next(node);

curNode is unused outside the while loop, right?

Perhaps it should be a for loop, with curNode scoped to it:

    for (nsGenConNode* curNode = Next(node);
         curNode->mPseudoFrame == aFrame && curNode != node;
         /* curNode updated in loop body */;

or something.

::: layout/base/nsGenConList.cpp:40
(Diff revision 1)
> -  if (!mFirstNode)
> -    return false; // list empty
> -  nsGenConNode* node;
> -  bool destroyed = false;
> -  while (mFirstNode->mPseudoFrame == aFrame) {
> -    destroyed = true;
> +  if (nsGenConNode* node = mNodes.GetAndRemove(aFrame)) {
> +    // Clear following nodes first as they must not be mFirstNode.
> +    // If mFirstNode refers to a node of this frame, it should be the
> +    // one retrieved from mNodes.
> +    nsGenConNode* curNode = Next(node);
> +    while (curNode->mPseudoFrame == aFrame && curNode != node) {

This assumes that all the nodes with the given mPseudoFrame are contiguous in the list.  What guarantees that?  It's entirely non-obvious to me that this guarantee holds.

I guess if we never have bugs in our compare-tree-position code (which I have _very_ low confidence in, by the way) then the fact that distinct frames map to distinct (nsIContent, pseudoType) pairs is probably enough to guarantee this invariant?  If that's the case, please document this clearly.

The failure mode here if this contiguous thing does not hold is dangling pointers, so this part really scares me.

::: layout/base/nsGenConList.cpp:53
(Diff revision 1)
> -      node = nextNode;
> +    return true;
> -    } else {
> -      node = Next(node);
> -    }
>    }
> -  return destroyed;
> +  return false;

It might make sense to write this function as:

    nsGenConNode* node = mNodes.GetAndRemove(aFrame)
    if (!node) {
      return false;
    }

and then outdent the rest of it.  Either way, though.

::: layout/base/nsGenConList.cpp:167
(Diff revision 1)
>    }
>    ++mSize;
>  
> +  // Set the mapping only if it doesn't exist before, or the new one
> +  // was inserted before all existing nodes of that frame.
> +  nsGenConNode* firstNodeOfFrame = mNodes.Get(aNode->mPseudoFrame);

I believe you could avoid this hashtable lookup if you made the condition around the `Put` call be:

    if (aNode == mFirstNode ||
        Prev(aNode)->mPseudoFrame != aNode->mPseudoFrame) {

Either way, it feels like it's worth asserting the equivalence of these two conditions...

r=me as long as the contiguous thing actually holds and we document why it holds.
Comment on attachment 8779218 [details]
Bug 1291707 part 4 - Not recalc quotes and counters in EndUpdate.

https://reviewboard.mozilla.org/r/70242/#review68476

::: layout/base/nsCSSFrameConstructor.cpp:8615
(Diff revision 2)
> -    // mUpdateCount, recalc quotes and counters as needed.
> -
> -    RecalcQuotesAndCounters();
> -    NS_ASSERTION(mUpdateCount == 1, "Odd update count");
> -  }
>    NS_ASSERTION(mUpdateCount, "Negative mUpdateCount!");

After this change, mUpdateCount is only read by assertions, so it can probably become a debug-only member.  We could also consider nixing it and EndUpdate entirely; BeginUpdate would still be needed to manage the generation counters, I guess...

Anyway, I _think_ this is safe.  I really wish I had more confidence in that thought.  :(

r=me but watch out for weird crash issues or whatnot.  :(
Attachment #8779218 - Flags: review?(bzbarsky) → review+
Comment on attachment 8779640 [details]
Bug 1291707 part 3 - Use hashtable to track nsGenConNodes related to a frame.

https://reviewboard.mozilla.org/r/70612/#review68484
Attachment #8779640 - Flags: review?(bzbarsky) → review+
(Assignee)

Comment 39

2 years ago
mozreview-review-reply
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

https://reviewboard.mozilla.org/r/70610/#review68300

That still wouldn't be suitable for nsBaseHashtable, because nsBaseHashtable has DataType and UserDataType, while in nsDataHashtable, they are the same.

The question is whether we want to expose a method in nsBaseHashtable which returns DataType rather than UserDataType.
(Assignee)

Comment 40

2 years ago
mozreview-review-reply
Comment on attachment 8779640 [details]
Bug 1291707 part 3 - Use hashtable to track nsGenConNodes related to a frame.

https://reviewboard.mozilla.org/r/70612/#review68462

> This assumes that all the nodes with the given mPseudoFrame are contiguous in the list.  What guarantees that?  It's entirely non-obvious to me that this guarantee holds.
> 
> I guess if we never have bugs in our compare-tree-position code (which I have _very_ low confidence in, by the way) then the fact that distinct frames map to distinct (nsIContent, pseudoType) pairs is probably enough to guarantee this invariant?  If that's the case, please document this clearly.
> 
> The failure mode here if this contiguous thing does not hold is dangling pointers, so this part really scares me.

If you are not confident with that... I guess we can do something different. Probably maintain another linked list for nodes of a frame?
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 48

2 years ago
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

Addressed the compile error on non-MSVC compilers.
Attachment #8779639 - Flags: review+ → review?(nfroyd)
(Assignee)

Comment 49

2 years ago
Comment on attachment 8779640 [details]
Bug 1291707 part 3 - Use hashtable to track nsGenConNodes related to a frame.

Added some assertions for ensuring the invariant.

ni? myself for asking :bz to review the new comment and assertions when he backs from vacation.
Flags: needinfo?(xidorn+moz)
Attachment #8779640 - Flags: review+
Comment on attachment 8779639 [details]
Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.

Apparently MozReview doesn't like you asking for review on something that's already been reviewed. :(
Attachment #8779639 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 51

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #50)
> Comment on attachment 8779639 [details]
> Bug 1291707 part 2 - Add GetAndRemove to nsDataHashtable.
> 
> Apparently MozReview doesn't like you asking for review on something that's
> already been reviewed. :(

No, it doesn't, sadly. That's bug 1195661.
(Assignee)

Comment 52

2 years ago
And the compilers really don't like me either :(
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment on attachment 8779640 [details]
Bug 1291707 part 3 - Use hashtable to track nsGenConNodes related to a frame.

https://reviewboard.mozilla.org/r/70612/#review71204

r=me.  Thank you for adding those assertions and the comment.
Attachment #8779640 - Flags: review+
(Assignee)

Comment 59

2 years ago
Thanks for reviewing.
Flags: needinfo?(xidorn+moz)

Comment 60

2 years ago
Pushed by xquan@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6412d9fddb46
part 1 - Change nsGenConList::Remove to Destroy and make it private. r=bz
https://hg.mozilla.org/integration/mozilla-inbound/rev/2fff37cb7ecd
part 2 - Add GetAndRemove to nsDataHashtable. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/790a62a0a9da
part 3 - Use hashtable to track nsGenConNodes related to a frame. r=bz
https://hg.mozilla.org/integration/mozilla-inbound/rev/e74ffefc0d27
part 4 - Not recalc quotes and counters in EndUpdate. r=bz

Comment 61

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/6412d9fddb46
https://hg.mozilla.org/mozilla-central/rev/2fff37cb7ecd
https://hg.mozilla.org/mozilla-central/rev/790a62a0a9da
https://hg.mozilla.org/mozilla-central/rev/e74ffefc0d27
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox51: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.