Closed Bug 228829 Opened 21 years ago Closed 19 years ago

[FIX]Crash on page with recursive frames

Categories

(Core :: Layout: Images, Video, and HTML Frames, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.8beta2

People

(Reporter: waazup, Assigned: bzbarsky)

References

()

Details

(Keywords: crash)

Attachments

(1 file, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031208
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031208

The browser crashes upon visiting:
http://www.urbnet.com/anonymous/main.html



Reproducible: Always

Steps to Reproduce:
1. Visit http://www.urbnet.com/anonymous/main.html in Mozilla 1.6b
The HTML frames are recursive...
Whiteboard: DUPEME
Changing summary from "browser crashes upon visiting the above url" to "Crash on
page with recursive frames".
Keywords: crash
Summary: browser crashes upon visiting the above url → Crash on page with recursive frames
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031217

Related bugs might be: Bug #8065, Bug #98158, Bug #136580, but they are all fixed.
But the url http://www.urbnet.com/anonymous/main.html still makes my Mozilla
hang and eat resources or whatever, so I have to kill it. Marking as confirmed.
Assignee: general → frame
Status: UNCONFIRMED → NEW
Component: Browser-General → Layout: HTML Frames
Ever confirmed: true
Hmm.. The page has a branching factor of 6.  We allow a depth of 3 nested frames
of same URL, and a depth of 10 in general.

The problem is that 6^10 is a rather large number (and that's essentially what
you have here; the modifications introduced by the same-URL check are pretty minor).

I suggest reinstating an absolute cap on total number of docshells in addition
to the checks we have.  We can put that absolute cap rather high, but at the
moment we're attempting to render on the order of 40 million docshells....
that's not going to fly.  I would say 10,000 is a perfectly reasonable number to
absolute limit to (though that may not fix the crash either, depending on system
memory).

mkaply?  Thoughts?  I recall you guys had apps that needed thousands of
docshells at once....
Whiteboard: DUPEME
Flags: blocking1.7a?
sounds like we could take a patch for this in 1.7a if one appears soon, or
figure this out in 1.7b...  renominate if a patch appears and there is time..

dbaron says we have tried to fix this at least a few times with out much success
so far..
Flags: blocking1.7a? → blocking1.7a-
The only lack of success I recall was that IBM demanded we back out the absolute
cap we had since they had an internal site that used more frames than that cap...
Though to be fair that cap was also low enough that people with lots of tabs in
lots of windows would hit it too.... Hence my suggestion for a
rather-high-but-still-not-crashing cap.
*** Bug 232302 has been marked as a duplicate of this bug. ***
If someone has the time for the proposal from comment 4 and could do it for
1.7b, that would be great...

mkaply, some sort of response on what you guys actually use in your app would be
much appreciated; we really would rather not break it.
Flags: blocking1.7b?
I don't think we can find a solution that makes everyone happy.

The WebSphere implentation was a ton of various frames that hit the docshell
limit (not necesssarily recursive) and we also had a customer that specifically
had some deeply nested frames.

See: http://bugzilla.mozilla.org/show_bug.cgi?id=175220

No idea why that one wasn't marked fixed.
> I don't think we can find a solution that makes everyone happy.

We should make sure that no matter what our behavior is sane, then.  If that
makes some people who want insane behavior unhappy, so be it. The only question
is what constitutes sane behavior.  Clearly, wanting 40 million subframes is not
it.  Apparently wanting hundreds or even thousands is. Fine.  ;)

> See: http://bugzilla.mozilla.org/show_bug.cgi?id=175220

Actually, I'm talking about bug 193011, not bug 175220.  In bug 193011 comment
11 you say:

  "The current limit is set to 100 DocShells ... This is far too low for the
   frame caching technique which we use in Webadmin 7."

All I am asking for is a _quantitative_ estimate of what sort of limit would not
hurt your application.  Clearly 100 is too low.  "a ton" is not very
quantitative. So what's an order of magnitude estimate of the number of frames
you _do_ use?

If we can do that limit per-tab (see discussion in bug 193011), great.  If not,
we should do it per-window and set the limit high enough that people will not
hit it (as they did in bug 193011); frankly, anything that makes Webadmin work
would probably be fine there.
I'm waiting to here from the WebSphere developers to find out what their limit
was. Based on a cursory glance of the case, I think it is probably very large.

I think the main thing we need to do is simply avoid crashes. Anyone want to put
together a patch that does that?

incidentally, do we just have a faulty architecture here? How come IE can do
this OK?
In my case, as far as I can tell, the browser crashed due to the usual
"overcommit and then kill the app" thing linux does with memory allocation....

As for IE's behavior, what does it actually end up showing?  Where is it cutting
off the recursion?  If you tell me that, I may have a better idea of what it
actually does...
Response from Websphere team

Calculating the total amount of frames needed for our Domino Application
Webadmin 7 is a bit complicated. 

The limiting factors for dynamically creating frames in Mozilla are:
- a frameset cannot have more than 25 frames, any frame which is additionally
appended does not get rendered.
- adding an additional frame to a frameset does reload any already existing
frames within the frameset.

These are the facts which will not be changed by the Mozilla team as I remember
from our previous conversation.

Additionaly, the number of nested frames is currently 10 for Mozilla 1.4,1.5,1.6.

Because of all these limitations I developed a system which deals with these
limitations but gives me the flexibility to 'pseudo' dynamically add frames to a
frameset, however this system is wasting frames. This system creates a frameset
with a constant number of frames, with the frames n-1 containing content and the
n'th frame contains a frameset with n frames, with n <= 25. The limits for this
system is currently 25 (frames) * 10 (nested levels) = 250.

                 <frameset>
                     |
   +--------+--------+--------+---------+
   |        |        |        |         |
<frame>  <frame>  <frame>  <frame>   <frame>
   |        |        |        |         |
content  content  content  content  <frameset>
                                        |
                                        +--------+--------+--------+---------+
                                        |        |        |        |         |
                                     <frame>  <frame>  <frame>  <frame>  <frame>
                                        |        |        |        |        |
                                     content  content  content  content  <frameset>



When there are new considerations about limiting the frames in Mozilla, I would
suggest that you remove the limit for nested frames (or set it to 100 instead of
10) and set a limit for the total number of frames within a tab in Mozilla. A
good number for total number of frames could be 10'000. 1'000 would be too low
because with the above described system we are always wasting frames.

For IE I havn't yet found any frame limits during the development of webadmin 7.
> - a frameset cannot have more than 25 frames

Hmm...  I don't recall such a restriction in our code... This seems sorta
arbitrary...

Thank you for the information, Michael.

So what we should do test a 10,000 frame cap and see whether it makes this bug
happier.  Then we'll know where we stand.
try for final
Flags: blocking1.7b?
Flags: blocking1.7b-
Flags: blocking1.7?
*** Bug 237760 has been marked as a duplicate of this bug. ***
mkaply, if this is breaking your apps and so needs to be fixed, can you help
move it forward?
Flags: blocking1.7? → blocking1.7-
What's blocking mkaply is if we were to fix this bug in the obvious way, as far
as I know.  The current state of things is not blocking anyone except users who
get their browsers crashed.
I guess I was misinformed. I thought crash bugs would be squashed for 1.7
peter: how did you come to that conclusion?
at no time has someone from mozilla.org marked blocking+, and at present (and
for the past time) fixing this bug requires coming up w/ a solution which
doesn't break an ibm web app.
I came up with that conclusion because I read on mozillazine.org that Asa was
trying to remove all crashers for 1.7

What's more important the IBM web app or the browser not crashing on the
millions of end users who surf with it. Can't IBM just modify the source to set
the fixed limit higher and recompile for it's own purposes. I understand IBM
donates a lot of money to Mozilla, but are we actually going to ship a lower
quality product so we don't upset them. If you haven't noticed they don't seem
to be moving this bug forward. Why don't we fix it, and if IBMs web app breaks
they can code a patch.

The IBM web app will break if you open too many frames. Set a limit that doesn't
cause the browser to crash. Simiple idea, may be a difficult to implement but
lets get to it.
peter,

I'm not sure if your analysis is on track.  We do want to fix all critical and
visible crashes, but we also want to maintain high levels of compatility with
web content and web applications...

The trade off here is not IBM v. the world.  It's about fix to protect against a
crash on a "poorly coded page/non-useful page with recursive frames" by setting
some abritrary limit on the number of frames, and then set up a side effect that
inhibits some useful web application from continuing to work in Mozilla.

Check out the test case provided in IE.
http://www.urbnet.com/anonymous/main.html
The link provides no useful functionallity.

If we set up the limit to be compatible with one of the IBM web apps created
with Websphere, what's to say that limit won't be a frame or two low to inhibit
 some other useful web app from IBM, Oracle, or any other web content developer
from working.

A possible strategy here is to figure out what IE is doing and match its cut off
limit.   Finding a reasonable high cut off limit that preserves Mozilla's web
compatibility with the gratest amount of useful content is the most important
thing to get right here.   Fixing a crash for pages with recursive frames that
are unlikely to be present on the web, or be visited by the general web surfing
population is really a secondary issue.

Let's try and get some data here.  We can use some help.

Can someone try figure out how IE handles
http://www.urbnet.com/anonymous/main.html ?  Is it acutally cutting off the
recursion at some limit, or is mkapaly correct that that IE has no limits?

Can someone construct a test case and do a binary search on the a pages with a
high number of frames and figure out how/at what point IE exits out of the
recursion?

jay,  can you generate a stack trace by producing the crash from
http://www.urbnet.com/anonymous/main.html and match up the stack signature if
users might be hit similar crashes with like stack traces; then see if there is
any evidence of this stack signature showing up general web content.  Post the
other urls out of the talkback data.  I'm guessing we might not find any but we
need to check.

others,  if anyone can find other test sites with recursive frames, or extreme
high numbers of frames to add as test cases lets get them posted to the bug.

chris h.
I could re-enable the per-window docshell limit as mentioned here with a much
higher limit, probably without a lot of trouble.

The testcase has no functionality in IE.  However there may be accidental
recursions (of lower fanout levels) that could in theory be affected.

Sounds like IBM is suggesting 10,000 and that 1,000 is too low.  Obviously
10,000 is a lot less than 40,000,000... :-)

Note for IBM: if 1000 is too low, do I need to worry about them having 10 of
your pages open in tabs and hitting the 10,000 limit?

Another useful thing would be to figure out roughly the per-docshell memory
usage (say 1-ish, 1000, 10,000, and maybe 50,000).  that would also help select
a value.
I'm not sure I can give real numbers anymore on this. All we can do is "what's best"
taking bug
Assignee: core.layout.html-frames → rjesup
If you want to see really bad behaviour...

Save this code as 'something.html':

<frameset cols="176,*" frameborder="1" border="0" marginheight="0"
marginwidth="0" noresize scrolling="no"> 
  <frame name="menu" scrolling="no" marginheight="0" marginwidth="0"
src="menu.html">
  <frameset rows="96,*" frameborder="1" border="0" marginheight="0"
marginwidth="0" noresize scrolling="no"> 
  	<frame name="top" scrolling="no" marginheight="0" marginwidth="0"
src="top.html" >
  	<frame name="main" marginheight="0" marginwidth="0" src="main.html">
  </frameset>

Copy it to menu.html, main.html and top.html, then attempt to view
something.html. The browser will keep using CPU and memory, which is quite
unfortunate. IE just refuses to bother to try.

I note that if you name the html file "menu.html" and replace all the frame
references to also be menu.html, it doesn't crash.

Maybe it only checks that it is not loading itself?

My suggestion would be to keep some sort of structured list in memory, such that
when a child frame tries to reopen a parent frame, it will fail.
James, I suggest reading the code that implements the recursion check and its
cvs history...  In your case, we would catch the recursion at a depth of 10 or
so, but the branching factor is 3, and 3^10 is a big number.

> Maybe it only checks that it is not loading itself?

See the code at
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsFrameLoader.cpp#164
-- it's a lot more complicated than that.

> My suggestion would be to keep some sort of structured list in memory, such
> that when a child frame tries to reopen a parent frame, it will fail.

That's not what IE does, and there are popular sites that break if you do that
(they actually load the same URI in the subframe and use some sort of other
state server-side to send different content).

If you _do_ figure out exactly what IE's check is, please let us know so we can
duplicate it.

Also, note that the 10000-frame limit we were discussing earlier in this bug is
not so much less than the 3^10 = 59000 frames that James was trying to load....
 So apparently the right answer is something smarter than just a cap.
*** Bug 284368 has been marked as a duplicate of this bug. ***
So this is interesting.  I finally got my hands on IE to test (thanks to
timeless and remote desktop), and it looks like IE (at least 5.5 on Windows)
cuts of recursion any time it hits the "same URI" as it already saw before
(where sameness is tested for without including the anchor, if any).  I checked
both static testcases and CGI scripts which prohibit caching and return
different content every time (via rand()).  See

http://landfill.mozilla.org/ryl/testFrame1.cgi
http://landfill.mozilla.org/ryl/testFrame2.cgi
http://landfill.mozilla.org/ryl/testFrame3.cgi
http://landfill.mozilla.org/ryl/testFrame4.cgi
http://landfill.mozilla.org/ryl/testFrame5.cgi
http://landfill.mozilla.org/ryl/testFrame6.cgi
http://landfill.mozilla.org/ryl/testFrame7.cgi
http://landfill.mozilla.org/ryl/testFrame9.cgi

(tests 8 and 10 are part of the 2-frame recursion test).

Based on these results, I will probably post a patch to cut off recursion as
soon as we detect it sometime soonish.
More tests (for onload and timeout issues):

http://landfill.mozilla.org/ryl/testFrame11.cgi
http://landfill.mozilla.org/ryl/testFrame12.cgi

All of these support my initial finding -- IE 5.5 on Windows does not allow
frame recursion of any kind as long as uri-without-ref tests equal.
Great! Sounds like a plan :)

Thanks a lot dude...
Attached patch Patch (obsolete) — Splinter Review
Ok, with this patch I don't crash on that page (as expected; the number of
frames we now have to load is 6 + 6*5 + 6*5*4 + 6*5*4*3 + 6! + 6! = 1956. 
Still freezes up my browser for 10-15 seconds, but it's a good bit better than
trying to load 40 million of the suckers.

The build with this patch has identical behavior to IE 5.5/Win on the testcases
I posted in comment 30 and comment 31.

The part that fixes the bug is just the #define change; the rest is cleanup to
use nsIURI::Equals so as to not reinvent the wheel and to fix this little
problem we had of incorrectly comparing hostnames (doing a case-sensitive
compare).
Assignee: rjesup → bzbarsky
Status: NEW → ASSIGNED
Attachment #176831 - Flags: superreview?(jst)
Attachment #176831 - Flags: review?(mkaply)
I'd sorta like to get this into beta2 asap so we can get people testing. If we
find examples where IE _does_ behave differently, I would like to know so I can
try to reverse-engineer it.

On another note, I found the source of the 25-frame limit mkaply mentiones in
comment 14 (bug 285394 filed), and have a general idea for maybe removing the
10-deep limit (bug 285395 filed).
OS: Windows XP → All
Priority: -- → P1
Hardware: PC → All
Target Milestone: --- → mozilla1.8beta2
Summary: Crash on page with recursive frames → [FIX]Crash on page with recursive frames
Comment on attachment 176831 [details] [diff] [review]
Patch

r=mkaply

It's always good to say "we work like IE" :)
Attachment #176831 - Flags: review?(mkaply) → review+
Comment on attachment 176831 [details] [diff] [review]
Patch

+  // Bug 98158/193011: We need to ignore data after the #
+  nsCOMPtr<nsIURL> cloneURL(do_QueryInterface(cloneURI, &rv)); // QI can fail
   if (NS_SUCCEEDED(rv)) {

Since you don't use rv here I'd not pass it to do_QI and just check for a
non-null cloneURL here.

sr=jst
Attachment #176831 - Flags: superreview?(jst) → superreview+
Attachment #176831 - Attachment is obsolete: true
Fixed.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
*** Bug 285951 has been marked as a duplicate of this bug. ***
People are still seeing action on Bug 284368. While it deosn't "crash" per se,
it's still in an unacceptable state. Any ideas?
I think Grey mentioned the wrong bug there, he is referring to bug 285951. I
reproduced using nightly builds from trunk and aviary for March 23rd
(tonight/yesterday).  It does indeed still persist. CCing myself, cause this
looks interesting.
*** Bug 301952 has been marked as a duplicate of this bug. ***
Blocks: 228594
Product: Core → Core Graveyard
Component: Layout: HTML Frames → Layout: Images
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: