Closed Bug 1016545 Opened 6 years ago Closed 5 years ago

memory leak that kills Firefox in few seconds

Categories

(Core :: Disability Access APIs, defect)

29 Branch
x86_64
Windows 8.1
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla34
Tracking Status
firefox-esr31 --- wontfix

People

(Reporter: southpolenator, Assigned: tbsaunde)

References

Details

(Keywords: csectype-oom, sec-low, Whiteboard: [reporter-external])

Attachments

(2 files, 2 obsolete files)

Attached file TopCoder.zip
User Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0 (Beta/Release)
Build ID: 20140506152807

Steps to reproduce:

Opened page specific to my account on www.topcoder.com site.


Actual results:

Firefox stopped responding. I opened task manager and firefox was using 2 cores of my cpu and memory was growing. When it hit 4GB, it crashed.


Expected results:

Attached page opens normally.
can you go into "about:crashes" in your awesomebar and link to the crash please? This may just been a normal OOM and not a security issue.
Flags: needinfo?(southpolenator)
Can you reproduce this using the saved testcase? I can't seem to reproduce anything interesting loading this locally.

In general we don't treat "safe" OOM as an important bug, since there are so many ways to DOS a browser.
(In reply to Curtis Koenig [:curtisk] from comment #1)
> can you go into "about:crashes" in your awesomebar and link to the crash
> please? This may just been a normal OOM and not a security issue.

https://crash-stats.mozilla.com/report/index/bp-d44df05c-d3dd-4fcd-af1c-6bf2e2140527
Flags: needinfo?(southpolenator)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> Can you reproduce this using the saved testcase? I can't seem to reproduce
> anything interesting loading this locally.
> 
> In general we don't treat "safe" OOM as an important bug, since there are so
> many ways to DOS a browser.

9/10 times I open htm file from zip archive Firefox crashes.
> https://crash-stats.mozilla.com/report/index/bp-d44df05c-d3dd-4fcd-af1c-6bf2e2140527

That's an OOM crash alright, under accessiblity code.  Also, it's a 4096 byte allocation, and we have a fair amount of virtual/physical RAM free (133758976 and 1414770688 respectively), so not being able to allocate a single page is pretty odd..
Component: General → Disability Access APIs
Whiteboard: [reporter-external]
133MB available VM isn't that much, especially if there's fragmentation. dmajor can you check the VM stats and see if/how many 1MB blocks are available?

Safe-OOM, opening.
Group: core-security
Flags: needinfo?(dmajor)
Free=128M Tiny=48M Misaligned=7M Usable=42M Other=29M
(Categories explained in bug 1001760 comment 4)

So, we could have done a little bit better, but at 128MB I would absolutely expect a failure, even with the upcoming allocator improvements. Due to Breakpad reservation, that was actually only 99MB free at the time of crash, and in the worst case the allocator might have needed a new 1MB contiguous chunk even if the request was 4096 bytes.
Flags: needinfo?(dmajor)
(In reply to Vuk Jovanovic from comment #4)
> 9/10 times I open htm file from zip archive Firefox crashes.

I don't see any abnormal memory usage with this file. Maybe there is an extension involved. Does it crash if you start Firefox in Safe Mode?

Also, can you save a report from about:memory slightly before you reach 4GB?
The stacks indicate that he has accessibility code enabled (which isn't just used by full-on screenreaders but also various utility tools).
(In reply to David Major [:dmajor] (UTC+12) from comment #8)
> (In reply to Vuk Jovanovic from comment #4)
> > 9/10 times I open htm file from zip archive Firefox crashes.
> 
> I don't see any abnormal memory usage with this file. Maybe there is an
> extension involved. Does it crash if you start Firefox in Safe Mode?
> 
> Also, can you save a report from about:memory slightly before you reach 4GB?

I've started Firefox in Safe mode.
First opening of attached page succeeds in few seconds during which Firefox is frozen.
Second opening of attached page causes OOM and crash. During OOM Firefox is frozen and I'm not able to save report from about:memory. I could take process memory dump, but it would be huge...

As for accessibility features, I've not enabled anything on Windows (default installation), but I have touch screen, so Windows might have set few features on by default.
> As for accessibility features, I've not enabled anything on Windows (default
> installation), but I have touch screen, so Windows might have set few
> features on by default.

That was the key, thanks! I was able to reproduce the issue using attachment 8429489 [details] on a borrowed Microsoft Surface with the latest Nightly. 

In the stack at bp-d44df05c-d3dd-4fcd-af1c-6bf2e2140527, we never go up past frame 19, PL_DHashTableEnumerate. There are thousands of entries in that table, and each entry causes a deep NextChildInternal walk where FragmentOrElement::GetChildren allocates memory in several ways. This pegs the CPU until the allocator crashes.

Alexander, can your team take a look?
Flags: needinfo?(surkov.alexander)
Cc'ing Trev for ideas
Flags: needinfo?(surkov.alexander)
I'm curious if it could be worse form of bug 759236
btw, it waits for a while for me but finally loads the test
(In reply to alexander :surkov from comment #14)
> btw, it waits for a while for me but finally loads the test

page reload takes much longer, memory goes up to 10gb but after minutes it unfreezes and gets into much more robust state - 2,7GB. I tried to profile it but having no luck, it seems not showing anything useful
https://people.mozilla.org/~bgirard/cleopatra/#report=3da1746188a4a734c1df9adfd2e9fe3442ab1abc

I guess because of heavy mutations, we create a lot of event objects and place them into single array but it's just hypothesis.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Trevor has been debugging this and will try a few things before giving up.
Assignee: nobody → trev.saunders
What seems to be happening is that we create a ton of content lists and non of them go away because even though there refcount goes to 0 we keep them alive so snow white can delete them.
Attachment #8457135 - Flags: review?(bzbarsky)
Duplicate of this bug: 1039677
Attachment #8457135 - Attachment is obsolete: true
Attachment #8457135 - Flags: review?(bzbarsky)
Attachment #8458361 - Flags: review?(bzbarsky)
I'll probably not get to this before Monday.
Comment on attachment 8458361 [details] [diff] [review]
Provide an iterator that iterates over all children of an element

I'm sorry for the lag here.  :(  Next round will be much faster.

>+++ b/accessible/base/TreeWalker.cpp

So in the old setup, we took a snapshot of the child list on the first call to NextChildInternal for a particular parent, but now we'll walk the live list.  Are we OK with that?

>+++ b/content/base/src/ChildIterator.cpp
>+FlattenedChildIterator::FlattenedChildIterator(nsIContent* aParent,
>+                                               bool aIgnoreXBL)

The whole point of FlattenedChildIterator is to not ignore XBL.  You're supposed to use ExplicitChildIterator if you want to ignore it.

Can we not just use different classes in the a11y code for these two cases (e.g. with Maybe<>s for the two iterators and a pointer that points to whichever one we actually constructoed)?

If we do keep this setup, please document that this argument is only needed because we have a class which sometimes wants to consider XBL and sometimes not.  :(

No matter what, the new iterator class needs documentation that explains what it does.  And its members need documentation, as do the values of IteratorPhase.

>+AllChildrenIterator::GetNextChild()
>+    if (mAnonKids.Length()) {

!mAnonKids.IsEmpty()

We need very clear documentation about how doing anything that might change the frame tree while iterating via this iterator is totally not OK.  And maybe add asserts if we can.  :(  Is there a reason we're not just holding strong refs to the anon content we return?  If we did, we could even use nsBaseContentList and just advance our index into it..

Why is the anon kid phase called eNeedAnonKid instead of eNeedAnonKids, similar to eNeedExplicitKids?

You may want to talk to Mats Palmgren about how this stuff is supposed to interact with display-box:contents (see bug 907396).  Followup for sorting that out is probably OK.

>+++ b/content/base/src/ChildIterator.h
>+  nsIFrame* mFrame;

This is unused, right?

>+++ b/layout/forms/nsTextControlFrame.cpp
>+  nsIContent* placeHolder = txtCtrl->GetPlaceholderNode();

"placeholder", with lowercase 'h'.
Attachment #8458361 - Flags: review?(bzbarsky) → review-
> >+++ b/accessible/base/TreeWalker.cpp
> 
> So in the old setup, we took a snapshot of the child list on the first call
> to NextChildInternal for a particular parent, but now we'll walk the live
> list.  Are we OK with that?

Since mutations are totally not allowed those are the same thing right?

> >+++ b/content/base/src/ChildIterator.cpp
> >+FlattenedChildIterator::FlattenedChildIterator(nsIContent* aParent,
> >+                                               bool aIgnoreXBL)
> 
> The whole point of FlattenedChildIterator is to not ignore XBL.  You're
> supposed to use ExplicitChildIterator if you want to ignore it.

yeah, that bit is certainly hacky

> Can we not just use different classes in the a11y code for these two cases
> (e.g. with Maybe<>s for the two iterators and a pointer that points to
> whichever one we actually constructoed)?

I want to allocate the iterator within the context object to save allocations.  It probably wouldn't hurt too much to have the context have space for two iterators, but that would mean either we add two new iterator classes that have no easy way to share there implementation, or we do something specific to the a11y use of GetChildren() which may or may not be a good idea.

> If we do keep this setup, please document that this argument is only needed
> because we have a class which sometimes wants to consider XBL and sometimes
> not.  :(

yeah, I guess we could use a protected ctor instead of the default arg to help make that clear.

> No matter what, the new iterator class needs documentation that explains
> what it does.  And its members need documentation, as do the values of
> IteratorPhase.

yeah, fair

> >+AllChildrenIterator::GetNextChild()
> >+    if (mAnonKids.Length()) {
> 
> !mAnonKids.IsEmpty()
> 
> We need very clear documentation about how doing anything that might change
> the frame tree while iterating via this iterator is totally not OK.  And
> maybe add asserts if we can.  :(  Is there a reason we're not just holding
> strong refs to the anon content we return?  If we did, we could even use
> nsBaseContentList and just advance our index into it..

the a11y TreeWalker can be way too slow in some cases for example the test case for this bug.  I'm pretty sure we really need to make algorithmic changes to get somewhat reasonable performance here, but for now I'm looking for anything I can do to make it faster without redesigning accessible tree update.

> Why is the anon kid phase called eNeedAnonKid instead of eNeedAnonKids,
> similar to eNeedExplicitKids?
I think at one point I thought there was only ever one anon kid, so there's no good reason and we should probably change it.

> You may want to talk to Mats Palmgren about how this stuff is supposed to
> interact with display-box:contents (see bug 907396).  Followup for sorting
> that out is probably OK.

ok, I'd need to spend some time figuring out what that's about before saying anything useful.

> >+++ b/content/base/src/ChildIterator.h
> >+  nsIFrame* mFrame;
> 
> This is unused, right?

yeah, I think I was thinking about caching the frame so the eNeedAfterKids phase wouldn't need to call GetPrimaryFrame, but I guess I didn't do that and I'm not sure how much it helps.
> Since mutations are totally not allowed those are the same thing right?

Yes, agreed.

> I want to allocate the iterator within the context object to save allocations. 

Sure.  Hence my suggestion for Maybe, so it doesn't need a separate heap allocation.

I guess the issue is you want both of your class to have the custom GetNextChild, ok.

> yeah, I guess we could use a protected ctor instead of the default arg to help make that
> clear.

If that can be done easily, that would be nice, yes.  We'd need to move the xlb-init bits out of the ctor and into an init method called from different ctors, because we can't delegate to a different ctor yet.

> the a11y TreeWalker can be way too slow in some cases

OK.  And we're very super sure the frame tree never changes during this iteration?

> I'd need to spend some time figuring out what that's about before saying anything useful

As I said, followup for this part is fine.

GetPrimaryFrame() is a single inlined property read, so there's no particular need to cache it.
> > I want to allocate the iterator within the context object to save allocations. 
> 
> Sure.  Hence my suggestion for Maybe, so it doesn't need a separate heap
> allocation.
> 
> I guess the issue is you want both of your class to have the custom
> GetNextChild, ok.

yeah, that's the issue, I guess they could share a static function or something but passing the state around sounds sucky.

> > yeah, I guess we could use a protected ctor instead of the default arg to help make that
> > clear.
> 
> If that can be done easily, that would be nice, yes.  We'd need to move the
> xlb-init bits out of the ctor and into an init method called from different
> ctors, because we can't delegate to a different ctor yet.

yeah

> > the a11y TreeWalker can be way too slow in some cases
> 
> OK.  And we're very super sure the frame tree never changes during this
> iteration?

yes, though of course being able to assert this would be nice.

> > I'd need to spend some time figuring out what that's about before saying anything useful
> 
> As I said, followup for this part is fine.
> 
> GetPrimaryFrame() is a single inlined property read, so there's no
> particular need to cache it.

ok
Attachment #8458361 - Attachment is obsolete: true
Attachment #8464046 - Flags: review?(bzbarsky)
Comment on attachment 8464046 [details] [diff] [review]
Provide an iterator that iterates over all children of an element

>+    if (mAnonKids.Length()) {

Still need !mAnonKids.IsEmpty() here.

r=me with that.
Attachment #8464046 - Flags: review?(bzbarsky) → review+
https://hg.mozilla.org/mozilla-central/rev/2522daeec7f7
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.