Closed Bug 193011 Opened 22 years ago Closed 20 years ago

Some tabs unresponsive/won't close when many tabs are open: followup from bug 156405

Categories

(Core :: Layout: Images, Video, and HTML Frames, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking


Status: RESOLVED FIXED

People

(Reporter: benjamin, Assigned: jesup)


Details

(Keywords: regression, testcase)

Attachments

(2 files, 3 obsolete files)

This is a followup bug from bug 156405 (Tabbed browsing frequently crashes
Mozilla - Trunk M130A [@ nsXULWindow::ContentShellAdded]). I fixed the crash in
that bug (see bug 156405 comment 54), but fixing the underlying cause is more
challenging.

Testcase from previous bug:
a) load http://komodo.mozilla.org/buster/random/random.html
b) load the above URL again in a new tab
c) (continue doing b) until you reach the maximum number of tabs; then, if you
haven't crashed, try switching to the last tab or second-to-last tab; the last
part may help trigger the crash)

Quoting myself from the previous bug:

I think that this bug was probably introduced by the fix for bug 98158. The code at:

http://lxr.mozilla.org/seamonkey/source/content/base/src/nsFrameLoader.cpp#342
parentAsItem->GetSameTypeRootTreeItem(getter_AddRefs(root));

may be returning a chrome docshell. This causes the MAX_NUMBER_DOCSHELLS check
to count all of the docshells in chrome, which is not what was intended.
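For illustration, here is a minimal sketch of the kind of subtree count involved
(a hypothetical helper, not the actual bug 98158 patch; it assumes the era's
nsIDocShellTreeNode/nsIDocShellTreeItem API). If GetSameTypeRootTreeItem hands
back the chrome root, a walk like this visits the docshells of every tab in the
window, not just the tab doing the load:

#include "nsCOMPtr.h"
#include "nsIDocShellTreeItem.h"
#include "nsIDocShellTreeNode.h"

// Hypothetical sketch: count every docshell in the subtree below aNode.
// Rooted at a chrome docshell, the total spans all tabs in the window,
// which is why a MAX_NUMBER_DOCSHELLS check trips once many tabs are open.
static PRInt32 CountDescendantDocShells(nsIDocShellTreeNode* aNode)
{
  PRInt32 count = 1;  // count aNode itself
  PRInt32 childCount = 0;
  aNode->GetChildCount(&childCount);
  for (PRInt32 i = 0; i < childCount; ++i) {
    nsCOMPtr<nsIDocShellTreeItem> child;
    aNode->GetChildAt(i, getter_AddRefs(child));
    nsCOMPtr<nsIDocShellTreeNode> childNode(do_QueryInterface(child));
    if (childNode)
      count += CountDescendantDocShells(childNode);
  }
  return count;
}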
No longer depends on: 156405
Status: NEW → ASSIGNED
Priority: -- → P2
Target Milestone: --- → mozilla1.4alpha
Do you mean bug 186099?
Blocks: 186099
Flags: blocking1.4a?
Flags: blocking1.4a? → blocking1.4a-
I, too, am experiencing this problem.  Here's some more information:

The problem occurred after I opened my second or third tab, and instead of
bringing up my home page, it came up blank.  I typed in the home page address
manually, and it came up, but when I navigated away from that page, its name
remained in the tab title.  The tab cannot be closed in any fashion besides,
presumably, closing the browser window entirely.  Also, while browsing around my
home site, the "back" and "forward" navigation buttons were inactive, and I
couldn't navigate with hotkeys, either.
*** Bug 211427 has been marked as a duplicate of this bug. ***
*** Bug 213532 has been marked as a duplicate of this bug. ***
I'm busy with my extension manager proposal, and fixing this isn't as easy for
me as I first thought: I don't know the code here very well. Someone else is
welcome to take this.
Keywords: helpwanted
Priority: P2 → --
Target Milestone: mozilla1.4alpha → ---
Yes, this is caused by the patch for bug 98158.

The problem is that it's counting all docshells under the root shell, not all
docshells within the tab (document).  So we're limiting docshells across all the
tabs, instead of the docshells in a single document.  Obviously, with lots of
tabs it's a lot easier to hit the problem without having a recursive frameset
(which is what bug 98158 was attempting to solve in an indirect way).

Either we need to count docshells in the tab, or count the current docshell
depth in the tab, or find a more direct way to measure frameset depth/recursion.
Adding people involved with bug 98158, which is the root of this problem.
Adding a URL that shows the problem quickly.
Flags: blocking1.5b?
Can we just revert the patch for bug 98158 and instead fix the old code some
other way?  (Maybe it should ignore the ref (after the "#") part of the URL?)
Hmmpf.  The old code, I think, would have avoided the problem (of the frame
loading itself recursively).  Jack's patch removed the old recursion check for
the same URL.  Later, mkaply and jkeiser in bug 175220 removed the 8-frame-deep
limit added in bug 8065.  Either would have stopped the recursion before we ran
out of docshells.

The docshell limit might be OK on its own IF it counted only the docshells in a
tab.  Instead, it counts all the docshells in all the tabs.

Solutions:
1) Revert to including both old recursion checks.
2) Fix this to only count docshells in a single tab.
3) Find some other way to deal with the recursion issue.
4) Add the same-URL (and depth?) patches back in to wallpaper this particular
case - but that doesn't solve the "I have too many tabs open" problem.
We (IBM) have a requirement to remove this limit as well.

Here is some background:

The current limit is set to 100 DocShells, which does not correspond to 100
frames because of some overhead.  This is far too low for the frame-caching
technique which we use in Webadmin 7.

To give you an idea how extensively we use frames: in the graphic below you can
see that we currently use 9 levels of framesets to display the content for the
files panel.  The need for so many frames has two reasons:
1) Frame caching does not discard the content; it keeps it, so that when the
user comes back to a page he visited before, the content is not reloaded from
the server - the frame which contains the page is simply displayed again.
2) Dynamically adding frames to a frameset will reload the already existing
frames in the frameset, and DHTML content will disappear.  Because of this
problem, which will not be solved in the near future (see the mail from Boris),
I use a technique which gives me the same behaviour as dynamically adding
frames, but with some additional frames and also some wasting of frames.

Because of these conditions, I reach the current limit very easily.  If this
limit is not raised to a much higher number (or, better, removed entirely), I
will have to write additional code so that Mozilla does not use frame caching
for most areas, and the Mozilla user will have a much slower experience with
the application than an IE user, because IE does not have this limit.
I'm going to take this one
Assignee: bsmedberg → rjesup
Status: ASSIGNED → NEW
please cc me when you take bugs, thanks :)
This bug is about too many problems now.

comment 0: "Too many tabs open" -- The best solution, as always, is for
Mozilla to simply not crash when it runs out of memory.  It should just stop
loading pages.  All other solutions are hacks around that complex problem. 
Perhaps it is easy to fix; perhaps not.

comment 2: "My homepage doesn't show up" -- we seem to be saying that this is
the exact opposite problem from comment 0: it says that we count docshells from
the root rather than per-tab (when we *explicitly* wrote the patch to count them
per-tab).  Which is happening?

If we are really counting docshells per-root, it just needs to be fixed per-tab.

comment 11: limit is too small.  Once the tab limit goes away, this could still
be a problem.  But I'm not clear on what you mean, mkaply, when you say
docshells and frames don't quite correspond: are you just talking about the
docshells for the browser and other chrome XUL, or are you saying there are
other docshells inside the browser?  Perhaps a way to solve this is to use a
heuristic and set the number of frames based on available memory.  Or we could
just increase the number of frames.

Possible things you can do to keep recursion crashes from happening:
(1) don't crash when you run out of memory--just stop loading frames.  Best
solution.
(2) limit same-URLs, preferably by searching the entire ancestor list for the
same URL.
(3) limit total depth of tree
(4) limit total number of frames in tree

IMO, (3) is not a good idea.  It does not accomplish the aim we set out for,
which is specifically to keep memory down by having as few docshells around as
possible; and a limit that mattered (like 8) could well restrict apps like
mkaply's.  Assuming (1) is judged to be Too Hard Right Now (I'm not sure it is,
but I haven't investigated either), (2) alone is probably a good solution and
(2)+(4) is a little more air-tight.
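As a concrete illustration of (2), an ancestor walk might look roughly like
this (hypothetical code, not the eventual patch; it assumes nsIWebNavigation's
GetCurrentURI and the same-type parent chain, and uses the limit of 3 that the
patch later adopts as MAX_SAME_URL_CONTENT_FRAMES):

#include "nsCOMPtr.h"
#include "nsIDocShellTreeItem.h"
#include "nsIWebNavigation.h"
#include "nsIURI.h"

#define MAX_SAME_URL_CONTENT_FRAMES 3

// Sketch: walk up the same-type (content) parent chain and count how many
// ancestors already show aURI; refuse the load once the limit is reached.
// A real check must also ignore "#ref" differences, which nsIURI::Equals
// does not do.
static PRBool ReachedSameURLLimit(nsIDocShellTreeItem* aParent, nsIURI* aURI)
{
  PRInt32 matches = 0;
  nsCOMPtr<nsIDocShellTreeItem> item = aParent;
  while (item) {
    nsCOMPtr<nsIWebNavigation> webNav(do_QueryInterface(item));
    nsCOMPtr<nsIURI> ancestorURI;
    if (webNav)
      webNav->GetCurrentURI(getter_AddRefs(ancestorURI));
    PRBool equal = PR_FALSE;
    if (ancestorURI &&
        NS_SUCCEEDED(aURI->Equals(ancestorURI, &equal)) && equal) {
      if (++matches >= MAX_SAME_URL_CONTENT_FRAMES)
        return PR_TRUE;
    }
    nsCOMPtr<nsIDocShellTreeItem> next;
    item->GetSameTypeParent(getter_AddRefs(next));
    item = next;
  }
  return PR_FALSE;
}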
Our old solution was jkeiser's (2) + (3) (limit same-URLs and limit depth of
framesets).

(2) + (4) would mean limiting same-URLs plus limiting docshells-per-tab.  Note
that the current code does not limit docshells-per-tab, but
docshells-per-window.  I'm not sure why that is - feel free to look at the bug
98158 patch and suggest a correction.

(2) + (3) does allow us to run out of memory, if a page uses a hugely wide but
not deep set of frames.  However, there are probably any number of other ways
to run us out of memory if you really want to.  (2) + (3) catches pretty much
all the accidental cases and rarely will limit anything done intentionally.

(2) + (3) + (4) would be a possibility as well.

(1) (Not crashing when we run out of memory) requires limiting memory use to
something below the real limit (which is generally variable), plus very careful
coding to keep things alive in that case, perhaps involving "ripcord" memory or
allowing the allocator to go over the artificial limit for certain (classes of)
allocations.  We did this on the Amiga, and making a system solid in the face
of out-of-memory requires FAR more attention to error-handling detail (and
design) than we (Mozilla) have, let alone the OSes we run under.  (It does no
good for you to not crash if the OS or something else important that's running
crashes.)  C++ etc. just makes it worse, I'm afraid (allocations are assumed to
succeed, implicit allocations, etc., especially without lots of exception code).
any chance that we can get this sorted out for 1.5?

thinking that minimo might want some of these limits too...
I'm actively working on a patch for (2) + (3) (because it's easy).  I can add in
(4) or do it instead of (3) if I can figure out why the bug 98158 patch counted
all tabs...
Your argument for (2)+(3) makes sense.  I guess we're not really so much trying
to catch the malicious people as the dumb ones.  Say we give it a slightly
deeper limit than before, like 12?  That means that accidental recursion of 2
would give 2^12, or 4096, frames.
Right, pretty much what dbaron and I had thought.  I have a patch (mostly from
backing out the other patches) that I now need to debug (it breaks all loading,
probably something stupid).  Hopefully I'll have it done and up today.
Status: NEW → ASSIGNED
Oh, and I'm trying to solve the problem that caused jack.jia to want to change
how the recursion handling was done as well (<frame src="#blah">).
Ok, I have a working patch, and it works for the testcase that jack.jia created
for bug 98158 (http://bugzilla.mozilla.org/attachment.cgi?id=90267&action=view)
(warning, following that link can be painful - I put it on a local server).

With my (2) + (3) patch (plus new fixes for bug 98158) it creates around 580
webshells before stopping (with a total Mozilla VM size of ~55MB (non-opt Linux
build)).  This is with 3 identical URLs in a branch to cause a halt, and a max
depth of 10.  Note that the testcase did not hit the depth limit, nor does the
URL for this bug - both are kicked out by the identical-URL code (which is now
ignoring #foo).

Should I consider URLs with ? parameters (or ';' parameters?) to always be
different, and only stop those on depth?  Certainly a CGI (even with identical
? parameters) _could_ return different content.

I'll post the first-cut of the patch soon.
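The depth half of the check can be sketched in the same style (hypothetical
helper; MAX_DEPTH_CONTENT_FRAMES is 10 in the patch quoted later in this bug):

#include "nsCOMPtr.h"
#include "nsIDocShellTreeItem.h"

#define MAX_DEPTH_CONTENT_FRAMES 10

// Sketch: measure how deeply nested the prospective frame would be by
// walking same-type (content) parents; chrome docshells don't count.
static PRBool FrameTooDeep(nsIDocShellTreeItem* aParent)
{
  PRInt32 depth = 0;
  nsCOMPtr<nsIDocShellTreeItem> item = aParent;
  while (item) {
    if (++depth >= MAX_DEPTH_CONTENT_FRAMES)
      return PR_TRUE;
    nsCOMPtr<nsIDocShellTreeItem> next;
    item->GetSameTypeParent(getter_AddRefs(next));
    item = next;
  }
  return PR_FALSE;
}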
Keywords: helpwanted
BTW, I can leave in the "count docshells" code if people prefer (i.e. (2) + (3)
+ (4)), but with a much higher number, say 4000 or 10000.

I doubt that would affect real uses like IBM's (mkaply?) (even if a site in
one tab accidentally recurses and uses up 500 or 1000 docshells), but it would
limit intentionally evil uses of frames.  I'd prefer not to leave the docshell
limit in without a fix for the tab issue though, even with a higher limit, and
as I stated before this is only one of many ways you can purposely run Mozilla
out of memory.  Most of them would probably do it faster than this would, so
I'm not sure we gain a lot by leaving the docshell limit in.

Flags: blocking1.5b? → blocking1.5b+
Attached patch Patch for (2) + (3) (obsolete) — Splinter Review
note: does not include code to special-case URLs with ? parameters.  Other
than that it's complete and solves both this case and the bug 98158 testcase.
Now considers all URLs with query portions to be different even if other
entries in the tree have the same URL and query.  Ready for review.
Attachment #129297 - Attachment is obsolete: true
Comment on attachment 129300 [details] [diff] [review]
Patch for review; includes handling ?

Review requests in.  If jst is still around, he should look at this too.
Attachment #129300 - Flags: superreview?(dbaron)
Attachment #129300 - Flags: review?(john)
Now considers identical ?foo=bar queries to be the same.  jkeiser tells me
that the spec states that a site _should_ return identical data for the same
query.  Still strips off #foo anchors.  Cleaned up the final test as well.
Attachment #129300 - Attachment is obsolete: true
Attachment #129304 - Flags: superreview?(dbaron)
Attachment #129304 - Flags: review?(john)
Attachment #129300 - Flags: superreview?(dbaron)
Attachment #129300 - Flags: review?(john)
hmm... what if the prepath of the URL changes, but the path does not?  are you
checking "scheme://username:password@host:port/" elsewhere?
I'm checking both the prepath and the filepath (and for queries, the query
portion).  I only consider something identical if both (or all 3 for queries)
match.  So yes, the prepath is being checked.
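A rough sketch of that equality test (a hypothetical helper, not attachment
129304 itself; it assumes nsIURI::GetPrePath and nsIURL::GetFilePath/GetQuery,
none of which include the "#ref" portion):

#include "nsCOMPtr.h"
#include "nsIURI.h"
#include "nsIURL.h"
#include "nsString.h"

// Sketch: two locations count as identical for the recursion check when
// prepath (scheme://user:pass@host:port) and file path match and, if both
// are nsIURLs, the query portion matches too.  Non-nsIURL URIs (handled
// via GetPath in the patch) are elided here.
static PRBool SameLocationIgnoringRef(nsIURI* aURI, nsIURI* bURI)
{
  nsCAutoString aPre, bPre;
  if (NS_FAILED(aURI->GetPrePath(aPre)) || NS_FAILED(bURI->GetPrePath(bPre)) ||
      !aPre.Equals(bPre))
    return PR_FALSE;

  nsCOMPtr<nsIURL> aURL(do_QueryInterface(aURI));
  nsCOMPtr<nsIURL> bURL(do_QueryInterface(bURI));
  if (!aURL || !bURL)
    return PR_FALSE;

  nsCAutoString aPath, bPath, aQuery, bQuery;
  if (NS_FAILED(aURL->GetFilePath(aPath)) || NS_FAILED(bURL->GetFilePath(bPath)) ||
      NS_FAILED(aURL->GetQuery(aQuery)) || NS_FAILED(bURL->GetQuery(bQuery)))
    return PR_FALSE;
  return aPath.Equals(bPath) && aQuery.Equals(bQuery);
}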
Component: Tabbed Browser → Layout: HTML Frames
Comment on attachment 129304 [details] [diff] [review]
Patch to consider identical ?'s the same

Generally, instead of "if (NS_FAILED(rv)) return rv;" Mozilla uses
NS_ENSURE_SUCCESS(rv, rv);

If you did that on purpose to avoid an extra warning, go ahead and leave it
that way; I like doing it that way everywhere, but not everyone does.
Attachment #129304 - Flags: review?(john) → review+
Why does MAX_SAME_URL_CONTENT_FRAMES need to be > 1?
Comment on attachment 129304 [details] [diff] [review]
Patch to consider identical ?'s the same

>+  nsCOMPtr<nsIURL> aURL(do_QueryInterface(uri, &rv)); // QI can fail
>+  if (NS_SUCCEEDED(rv)) {
>+    rv = aURL->GetFilePath(filepath);
>+    if (NS_FAILED(rv)) return rv;
>+    rv = aURL->GetQuery(query);
>+    if (NS_FAILED(rv)) return rv;

What about nsIURL::GetParam?  Or do we just assume nobody uses that?

>+  }

Should we have something to handle the case of not-an-nsIURL?  Perhaps use
uri->GetPath(filepath)?  (Or might that contain a ref?)

(Any changes here would require parallel changes below, and perhaps changes to
the comparison...)
nsIURI::GetPath: that can/will include refs, which is why I'm not using it by
default.  For not-a-URL (what causes those?  Chrome?), we could just use
GetPath().

nsIURL::GetParam: you're correct, I could check that as well for being
identical.  Can't hurt (even though params are rarely used); I'll update and
upload.  (I'll also switch to NS_ENSURE_SUCCESS().)

MAX_SAME_URL_CONTENT_FRAMES - I didn't select the value of 3; that was the value
we used before we removed the check.  I basically undid bug 98158 and the later
removals of the other two checks, and then modified the code to handle the case
that bug 98158 was trying to fix (which is the "ignore refs when checking
equality" stuff).  
Updated patch.  Looks at (entire) paths for non-URLs, uses
NS_ENSURE_SUCCESS(), and checks URL 'params' (";foo") for being identical.
Attachment #129304 - Attachment is obsolete: true
Attachment #129377 - Flags: superreview?(dbaron)
Attachment #129377 - Flags: review?(john)
Attachment #129304 - Flags: superreview?(dbaron)
Comment on attachment 129377 [details] [diff] [review]
Updated patch for review

sr=dbaron, although you may want to run the revisions by darin.
Attachment #129377 - Flags: superreview+
Comment on attachment 129377 [details] [diff] [review]
Updated patch for review

>Index: content/base/src/nsFrameLoader.cpp

> #include "nsIURI.h"
>+#include "nsIURL.h"

nit: nsIURL.h includes nsIURI.h so you don't need to include it here.


>+// Bug 136580: Limit to the number of nested content frames that can have the
>+//             same URL. This is to stop content that is recursively loading
>+//             itself.  Note that "#foo" on the end of URL doesn't affect
>+//             whether it's considered identical, but "?foo" is assumed to be
>+//             different even if the "foo" is the same.
>+#define MAX_SAME_URL_CONTENT_FRAMES 3

nit: would be clearer to call out specifically what constitutes
equivalent URLs here.


>+// Bug 8065: Limit content frame depth to some reasonable level. This
>+// does not count chrome frames when determining depth, nor does it
...
>+// we'd need to re-institute a fixed version of bug 98158.
>+#define MAX_DEPTH_CONTENT_FRAMES 10

minor style nit: previous comment block had text indented.  this
one doesn't.  might be nice to make them the same :-)


>+  } else {
>+    // Not a URL, so punt and just take the whole path.  Note that if you
>+    // have a self-referential-via-refs non-URL (how?) frameset it will
>+    // recurse down to the depth limit before stopping, but it will stop.

this self-referential-via-refs non-URL can't happen with nsSimpleURI,
which is how most non-URLs are implemented.  there may be exceptions
however, especially since anyone can implement a protocol handler.


>+        rv = parentURI->GetPrePath(parentPrePath);
>+        NS_ENSURE_SUCCESS(rv,rv);
>+        nsCOMPtr<nsIURL> parentURL(do_QueryInterface(parentURI, &rv)); // QI can fail
>+        if (NS_SUCCEEDED(rv)) {
>+          rv = parentURL->GetFilePath(parentFilePath);
>+          NS_ENSURE_SUCCESS(rv,rv);
>+          rv = parentURL->GetQuery(parentQuery);
>+          NS_ENSURE_SUCCESS(rv,rv);
>+          rv = parentURL->GetParam(parentParam);
>+          NS_ENSURE_SUCCESS(rv,rv);
>+        } else {
>+          rv = parentURI->GetPath(parentFilePath);
>+          NS_ENSURE_SUCCESS(rv,rv);
>+        }

ok, this code block is begging to be turned into a static
helper function.

r=darin
Attachment #129377 - Flags: review?(john) → review+
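For illustration, the static helper darin is asking for might look roughly
like this (hypothetical name and shape; the committed code may factor it
differently):

#include "nsCOMPtr.h"
#include "nsIURI.h"
#include "nsIURL.h"
#include "nsString.h"

// Hypothetical helper: split a URI into the pieces the recursion check
// compares.  For URIs that are not nsIURLs, fall back to the whole path
// (which may still include a #ref).
static nsresult GetURIPartsForComparison(nsIURI* aURI,
                                         nsACString& aPrePath,
                                         nsACString& aFilePath,
                                         nsACString& aQuery,
                                         nsACString& aParam)
{
  nsresult rv = aURI->GetPrePath(aPrePath);
  NS_ENSURE_SUCCESS(rv, rv);

  nsCOMPtr<nsIURL> url(do_QueryInterface(aURI, &rv)); // QI can fail
  if (NS_SUCCEEDED(rv)) {
    rv = url->GetFilePath(aFilePath);
    NS_ENSURE_SUCCESS(rv, rv);
    rv = url->GetQuery(aQuery);
    NS_ENSURE_SUCCESS(rv, rv);
    rv = url->GetParam(aParam);
    NS_ENSURE_SUCCESS(rv, rv);
  } else {
    rv = aURI->GetPath(aFilePath);
    NS_ENSURE_SUCCESS(rv, rv);
  }
  return NS_OK;
}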
> > #include "nsIURI.h"
> >+#include "nsIURL.h"
> 
> nit: nsIURL.h includes nsIURI.h so you don't need to include it here.

The patch uses both types, though -- better to include headers for explicit type
dependencies.

/be
Comment on attachment 129377 [details] [diff] [review]
Updated patch for review

Looking for approval for 1.5b.  I'll make a version for 1.4.x as well.
Attachment #129377 - Flags: approval1.5b?
Comment on attachment 129377 [details] [diff] [review]
Updated patch for review

a=asa (on behalf of drivers) for checkin to 1.5beta
Attachment #129377 - Flags: approval1.5b? → approval1.5b+
Fix checked in; I'll leave the bug open and develop a fix for 1.4.x (which
should be trivial to do).
Flags: blocking1.4.x?
Removing blocking1.5b since the fix is checked in for that release.
Flags: blocking1.5b+
*** Bug 219543 has been marked as a duplicate of this bug. ***
Flags: blocking1.4.1?
*** Bug 186099 has been marked as a duplicate of this bug. ***
1.4.2 is gone... are we gonna have a 1.4.3, or can we close this bug permanently?
We wouldn't leave this open for a 1.4.x fix anyway.  IF someone wants to do
it, they can create the fix and mark it for approval.

For now, I don't see a reason to keep this open.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
Component: Layout: HTML Frames → Layout: Images
Product: Core Graveyard → Core
