Closed Bug 477564 Opened 15 years ago Closed 15 years ago

Session restore hangs/not responding with high CPU on large form with many checkboxes

Categories

(Firefox :: Session Restore, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Firefox 3.6a1
Tracking Status
status1.9.2 --- beta1-fixed
blocking1.9.1 --- -
status1.9.1 --- .3-fixed

People

(Reporter: ginnchen+exoracle, Assigned: zeniko)

References

(Depends on 1 open bug, )

Details

(4 keywords, Whiteboard: workaround: comment 0)

Attachments

(7 files)

With a web page with a large form, say 1000 checkboxes, it takes more than 10 seconds to load the page.
Since the default sessionstore.interval is 10 seconds, it is really hard to operate with the page. (i.e. every click will take another more than 10 seconds hang.)

This behavior is caused by bug 346337.
Workaround is to set browser.sessionstore.privacy_level to 2.

This bug is similar to other sessionstore performance bugs, but it is focus on form data and can be reproduced with a single tab.

Testcase attached.
BTW: Firefox 3.0 or 2.0 is fine, since they don't have this feature.
The perf issue is in our XPath generating code. Giving those 1000 checkboxes IDs prevents the hanging.
Flags: blocking-firefox3.1?
Keywords: regression
Attached patch WIPSplinter Review
For those forms, our XPath generator is currently O(n^2). Using caching, we could make it O(n) which is significantly faster, even though it still slightly lags. I'm not sure this is the fastest way possible, though, and this patch could break for dynamically changing forms.
Keywords: helpwanted
sicking might have some thoughts on how to optimize the Xpath.
If you have an XPath expression that needs to be optimized I could certainly help with that. However I'd be surprised if you really have an XPath expression that takes any significant time to execute?
Actually, it's our XPath _generator_ that needs optimizing. We need an XPath expression for every form element, starting at the first parent element with an ID (or going all the way to the root element). What is particularly expensive is counting through all previous siblings of a node which have the same type (and the same name).
What i'd recommend is using XPath to do all tree walking. So first get all forms with an id by evaluating "//form[@id]". Then get all inputs inside such a form using ".//input". You can even use expressions like ".//input[@type='checkbox' and @name='foobar']" to find all inputs with a given name/type under a specific form.

Though really, we have backend code for saving and restoring page state. Can you not hook into that?
(In reply to comment #9)
> What i'd recommend is using XPath to do all tree walking.

We already do that. We currently produce XPath queries to each node in a form in order to account for forms which dynamically change (i.e. instead of "the 23rd form element" we store "//form[@id='whatever']/table/tr[20]/td[2]/input[@name='one_of_many_checkboxes_of_the_same_name']"). Maybe we should just stop doing that for pages with more than say 30 form elements...

> Though really, we have backend code for saving and restoring page state. Can
> you not hook into that?

I don't know. Can we get/set that state from a JS component in serializable form?
(In reply to comment #10)
> > Though really, we have backend code for saving and restoring page state. Can
> > you not hook into that?
> 
> I don't know. Can we get/set that state from a JS component in serializable
> form?

If you file a bug on it I'm sure we could expose such an API.
(In reply to comment #11)
> If you file a bug on it I'm sure we could expose such an API.

What component would that bug belong into - and what are the chances of this being fixed for Gecko 1.9.1?
(In reply to comment #12)
> (In reply to comment #11)
> > If you file a bug on it I'm sure we could expose such an API.
> 
> What component would that bug belong into - and what are the chances of this
> being fixed for Gecko 1.9.1?

Probably in the "Core/Content" component. cc me and bzbarsky.

Dunno what the chances are of getting it done for 1.9.1 is. Depends on how risky the patch is i'd say. And how badly we need the performance problems described in this bug fixed for 1.9.1
Depends on: 478107
Thankfully, pages with 1000 checkboxes are few and far between. Not a blocker, good perf improvement candidate though. Seems like we need the dependent Core:Content issue to be fixed, first, though.
Flags: blocking-firefox3.1? → blocking-firefox3.1-
The attached HTML from NCIX's promo sales page shows how bad this is with only about 500 checkboxes.
(In reply to comment #14)
> Thankfully, pages with 1000 checkboxes are few and far between.

Pages with hundreds of checkboxes are not incredibly rare -- looking at 130K pages downloaded from dmoz.org a few months ago, the greatest number of checkboxes I see are:

    230 on http://www.sciencedirect.com/science/journals
    221 on http://www.realestateboston.com
    220 on http://autos.aol.com/used-list/make1-Volvo
    182 on http://www.webworkshop.net/pagerank_calculator.php3
    121 on http://www.realhome4u.ca/

(Clicking the 'All' link on the sciencedirect.com page gives a page with 2289 checkboxes.)

Don't know if hundreds is enough to be a performance concern, so maybe this comment isn't useful.
(In reply to comment #17)
> Don't know if hundreds is enough to be a performance concern, so maybe this
> comment isn't useful.

Hundreds is enough depending on the web page.

The original test cases (1000 checkboxes), attached to this bug, are not good examples of how bad this bug really is.

The following are my timings (P3 933 MHz):

1000 checkboxes with IDs (per attachment):
------------------------------------------
1 sec (load) + 5 sec (idle) + 1 sec 24-27% CPU spike (IMPACT NEGLIGIBLE)

500 checkboxes at NCIX website (per attachment):
------------------------------------------------
10 sec (load) + 5 sec (idle) + 24 sec 99 % CPU spike (HUGE PERFORMANCE IMPACT)

3142 x 1.5 (4713 est.) checkboxes at sciencedirect.com:
-------------------------------------------------------
21 sec (load) + 1 sec (idle) + 4 sec 99 % CPU spike (IMPACT NEGLIGIBLE)

6958 x 1.3 (9045 est.) checkboxes at sciencedirect.com:
-------------------------------------------------------
43 sec (load) + 1 sec (idle) + 230 sec 99 % CPU spike (HUGE PERFORMANCE IMPACT)


As you can see, number of checkboxes alone is a deceptive measure; thus, I believe this is a significant performance regression.
Just attaching this for completeness, since I mentioned it in comment 18.
Just curious, shouldn't setting browser.sessionstore.enabled to false prevent the CPU consumption?
ok, apparently that pref doesn't disable the session restore part used for the undo close tab/window features, which should answer my previous question.
Keywords: common-issue?
Looking at the number of dups, it looks like we should try to fix this for next release. As described in comment 9, if someone files a bug on what backend API we need here I'm sure we can expose one.
Flags: blocking-firefox3.6?
The bug was already filed as Bug 478107.
The problem occurs on IMDb MyMovies pages with more than 500 titles, all with checkboxes. See Mozilla Reporter: RMO12472475245711, RMO12462161785463, RMO12464359507899. With ~1600 titles the not responding (no window refresh, no event handling) behavior takes more than 50 secs on an X2 4800. I think it is a critical bug, for me at least. :) The other big problem is that it is not listed in known bugs.
Set browser.sessionstore.privacy_level to 2 solved the problem for me, but the others will think what I thought first: Firefox 3.5 was launched too early, it is still in beta. Now I understand that there is a workaround, but most of the users will not register in bugzilla and report the problem. They will just uninstall it, and try another browser, hopefully Firefox 3.0
This work-around should be safe for 1.9.1.* and hardly affect any forms: in the worst case, we just don't completely restore a large form (authors still wanting full crash recovery for large forms should make sure to assign an ID to all form elements).
Attachment #388044 - Flags: review?(dietrich)
Flags: blocking1.9.1.1?
I can confirm the fix of using the workaround, set browser.sessionstore.privacy_level to 2 works and this wasn't a problem in version 2.x or 3.x or other brousers, but did crop up in FF 3.5.  Here is an example form, which has more than 1000 choices and now works after changing the setting from 1 to 2.  
http://www.nws.noaa.gov/tdl/synop/products/bullform.all.php
Note, that some duplicates have nothing to do with session restore.

The noaa.gov site, also crashes without any session restore. Well, at least on my computer, XP and 3.5.
In that case it's not a duplicate.  Please reopen and/or cc me on the relevant bug.
It is not only that last duplicate, ALL duplicates.
(In reply to comment #46)
> It is not only that last duplicate, ALL duplicates.

This is not an issue with the restoring a session. This is an issue with saving session data.
Assignee: nobody → zeniko
Attached file 4000 checkboxes
Attached is the test I had made for creating a new ticket, until I stumbled upon this one at the last minute.  I think a lot of people are making duplicate tickets because this one is not easily searchable, nor is the word "checkboxes" anywhere in the title.

Some observations from my test...Loading this test completely kills the browser for me.  It takes over 5+ minutes to recover from a "Not Responding" state.  Once it does, it only takes about 10-20 checks of various checkboxes for the browser to go back into "Not Responding" state for another 5+ minutes.  Also, I've noticed that the memory for the process goes up by 30MB and then back down by 30MB every second.  This up and down every second is consistent over the entire 5+ minutes.

This one is very frustrating for me personally, as I run a site where the admin page has over 4000 checkboxes, and I can't use my own site.  I have to (gasp) use another browser, unfortunately.
Flags: wanted1.9.1.x?
Flags: blocking1.9.1.1?
Flags: blocking1.9.1.1-
they all work perfectly fine with me (win7 rc x64, 2x2GHz, 2GB ram), but "real-life" apps make firefox hang for 10-15 seconds.
example: http://www.wunschliste.de/index.pl?vote&r=03
(it's in german, but that doesn't matter)
I think the same as comment #49: a lot of people are making duplicate tickets because this one is very hard to be found! I recommend a modified title, which contains "checkboxes", "not responding", "hang", "large form". E.g: The browser is not responding (hangs) on a large form with thousands of checkboxes because of session restore 
IMHO the real problem is not the CPU nor the memory usage, but the "not responding" behavior in all Firefox windows and tabs.
Is the workaround published somewhere else than bugzilla? E.g.: Known issues section in http://www.mozilla-europe.org/en/firefox/3.5/releasenotes/ ?
Just to let you know: this is a very serious issue for many biologists that have search results presented as a list of checkboxes (for example, BLAST tool). For them pages with >1000 checkboxes are quite normal, and force them to kill firefox.

Also, the workaround is not easy to find.
Severity: normal → major
Summary: Session restore takes too much CPU with large form → Session restore hangs/not responding with high CPU on large form with many checkboxes
Whiteboard: workaround: comment 0
blocking1.9.1: --- → ?
Comment on attachment 388044 [details] [diff] [review]
don't (completely) restore certain large forms

why 100? did you use your testcase to determine this is a reasonable number?

r=me though on the code change. however, this must be a short-term fix. maybe even branch-only.
Attachment #388044 - Flags: review?(dietrich) → review+
Can someone look if bug 504784 is related? Someone is putting a lot of text in a textarea and has similar hang, but the work around in this bug, doesn't work.
This bug has now 30 dupes and 25 votes, but it still only a nomination for 3.5.2. I think the seriousness of this bug is not clear yet. 

FF completely hangs (at least on my computer) on some of the referenced pages. Those pages are exceptional, but not violating any standard. This not when doing a session restore, but due to session restore (title is still misleading).

So, a major feature of FF is broken.
I'm experiencing something similar on Mac OS X (FF3.5.1), with significantly less than "100's of checkboxes".  I'm more in the region of 100.  Not sure if it's related to session restore.  I get intermittent hangs during page load, and sometimes after page load.

Just FYI.
Does that patch apply to mozilla-1.9.1? If so, could we get an approval request, please? Should we relnote that we won't restore tabs that have forms with more than 100 fields in them? Have we checked to make sure Bugzilla doesn't fall into that category ;)
blocking1.9.1: ? → -
The bug actually affects most scientific databases. For example pages like

http://adsabs.harvard.edu/cgi-bin/nph-ref_query?bibcode=2009AnP....18..231S&refs=AR&db_key=PHY

are completely unusable due to that bug.

It is normal that scientific databases give huge lists of articles and these lists usually come with checkboxes. The boxes enable the reader to collect those articles that he will read in a kind of personalized  library.

It is pure luck, that google has no checkbox for every search result.

Therefore, this bug should be fixed asap.
Just a question. Is there any change that this problem get worse over tabs? I mean, given the nature of the bug, should I count the number of form items of one tab or all tabs?

There are some bug reports about FF slowing down in case of many tabs. I have some lead in reproduction and thought that it might related to total number of form items on all tabs.
I've found another very annoying bug in FF 3.5.1 so I uninstalled it, and installed 3.0.12. Much more stable, and runs faster especially on facebook. The newly found bug related to facebook connections. Many of my connections are missing. E.g. the title says: "10 Connections", but only 4 or even 0 can be seen (AJAX problem?). With 3.0.12 everything is perfect. :( Sorry about that, but I think 3.5.1 is still a beta release.
Blocks: 506093
You are right,Firefox 3.0.12 very very good...it will be a disaster to quit working on it on 2010 ..
(In reply to comment #75)
> You are right,Firefox 3.0.12 very very good...it will be a disaster to quit
> working on it on 2010 ..

Don't worry, this bug will be fixed as soon as possible, well before then.

/be
Keywords: checkin-needed
Flags: wanted1.9.1.x?
I agree to comment 67, this bug is now listed as #2 Hot Bug here, with 33 dupes and major severity. But still, nothing is happening as far as one can see... The workaround is not satisfactory, and a real fix is not in sight.
We should get this fixed ASAP, best would be in 3.5.2!
There is definitely progress: there's a reviewed patch attached to this bug.  It would be great if you could make a build with it (see https://developer.mozilla.org/en/Build_Documentation ) and make sure that it resolves your problem.

Dietrich: is the reviewed patch suitable for the branch?  Seems like a useful mitigation that we should get checked in ASAP so we can bake and move.

It seems like we can probably make that form management more efficient by doing a single walk down the tree and building the XPath expressions piecemeal rather than doing them from scratch for each element we hit.  I guess that's sort of what Simon is doing in comment 5 -- what's the latest on that?
http://hg.mozilla.org/mozilla-central/rev/3621a1d223bd
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Attachment #388044 - Flags: approval1.9.1.2?
Comment on attachment 388044 [details] [diff] [review]
don't (completely) restore certain large forms

This needs to bake before we'll consider it for 1.9.1.x and given our 1.9.1.2 schedule, I don't think this will make it before 1.9.1.3.

I think beltzner asked, but maybe not in the bug. What's the rationale for 100 as opposed to 200 or 300 or even 50?
Attachment #388044 - Flags: approval1.9.1.2? → approval1.9.1.3?
Should this really be "RESOLVED FIXED?"

Will there be a follow-on bug to workout a non-hacked fix?
Sounds good to me, go ahead and file one.
I still get the impression that the problem is not fully understood.

From comment #79 it looks like if it is only a performance issue. And that the performance degrades when the number of input fields increases.

This is however not what I see. 

If I take attachment 388399 [details] with 4000 checkboxes, then I have no problem at all. However, if I go to the website:

http://www.nws.noaa.gov/tdl/synop/products/bullform.all.php

Then I a full hang, while the number of checkboxes is less then in 388399.

So, I don't have a continuous degradation. It performance or hangs. Nothing between.

Is this behavior explained??? Also, during the hang, the memory usage of FF goes wildly up and down. from 60 to 180Mb and back in just one or two seconds.

It is still related to the session restore functionality, because I can not reproduce anything if the privavy_level is on 2.

Lucas
> Is this behavior explained???

Yes.  Session restore doesn't save form state on https:// URIs (like the attachment in this bug) by default.  So you're simply not seeing this bug at all on that attachment.  You can set the privacy_level pref to 0 to have it save https data as well, if you really want to use the attachment for testing.

To actually test whether the degradation is continuous, I recommend a controlled experiment: several pages that are identical except for number of checkboxes, all located at the same scheme+host+port (or better yet same url).  That way you wouldn't run into issues like the above.

> Also, during the hang, the memory usage of FF goes wildly up and down.

Yes, it's generating lots of JS garbage as it generates all those XPaths.

Trust me, the problem is fully understood.  ;)
(In reply to comment #84)
> Sounds good to me, go ahead and file one.

Hey Boris: Would you (or someone else) like to take this on? I'd do it but I won't be able to describe the problem as well as you--my understanding being lacking.

Thanks
Hello there,
in changeset - 30663,
is that only limit Maxinum of XPath expressions (in different node?) numbers to 100,
instead using cache mechanism to reduce complexity from O(n^2) to linear time O(n)?
(In reply to comment #59)
> why 100? did you use your testcase to determine this is a reasonable number?

This number isn't empirically backed. It just seemed large enough to cover all small to medium sized forms while being significantly smaller than all of the test cases we've seen so far.

> however, this must be a short-term fix. maybe even branch-only.

For Trunk, we need bug 478107 first.

(In reply to comment #69)
> Should we relnote that we won't restore tabs that have forms
> with more than 100 fields in them? Have we checked to make sure Bugzilla
> doesn't fall into that category ;)

Note that this only affects forms with more than 100 fields _missing an ID_. BugZilla fields all have an ID and are thus unaffected by the issue (if they were saved by default, which they aren't for privacy reasons).

(In reply to comment #79)
> It seems like we can probably make that form management more efficient by doing
> a single walk down the tree and building the XPath expressions piecemeal

The proper fix would go along bug 478107's route. If that doesn't work out, something along the WIP patch's line should also fix this issue. It doesn't yet take into account dynamically modified forms, though, and I haven't got the resources for working on it short-term.
Target Milestone: --- → Firefox 3.6a1
Another day, another dupe of this bug. Is this solved for 3.5.3?
Just took the leap of actually trying this in other browsers. Most of those "HUGE HIT" links and examples are composited instantly, without lag, in those other programs. And was the goal to isolate just the tables and checkboxes? Attaching a plain HTML leaves me wondering whether to run it in offline mode and strip out the image loading hit.

I'm not sure if you can satisfactorily resolve this one. It may be tied too deeply to your other sessionrestore speed hits.
Two more in the same day? Perhaps it would be better to set an upper limit on unlabeled input. Above a certain threshold, input would not be cached. It's no worse than before Firefox was fixed into a corner, and sanely built forms with only ten-to-fifty unlabeled fields could still benefit.

Or would it be too expensive to add such a test, or is there not enough granularity to selectively cache only some forms and not others?
Comment on attachment 388044 [details] [diff] [review]
don't (completely) restore certain large forms

Approved for 1.9.1.3. a=ss
Attachment #388044 - Flags: approval1.9.1.3? → approval1.9.1.3+
Just one comment - though I assume the path based on XPath's accounts for this - this hit occurs with any form element that I can tell, I had to put ID's on input type=text's and input type=hdden for 3.0-style performance.

Mac OS X 10.5.8 / Firefox 3.5.2 / PPC
Verified fixed for 1.9.1.3 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3pre) Gecko/20090817 Shiretoko/3.5.3pre (.NET CLR 3.5.30729).
Keywords: verified1.9.1
Just a note: the test case with 4000 checkboxes passes with flying colors on FF 352, UNLESS you use brackets in the checkbox name: 
<input type="checkbox" name="category[]" value="1" />Item 1<br/> etc...
Then it stalls completely.
No reports on support in the last week.
(In reply to comment #110)
> Just a note: the test case with 4000 checkboxes passes with flying colors on FF
> 352, UNLESS you use brackets in the checkbox name: 
> <input type="checkbox" name="category[]" value="1" />Item 1<br/> etc...
> Then it stalls completely.

filed followup bug 513977.
Using the FireFox 3.5.2,
a created form with about 250 items still freezing for a cupple of time.
Doesn´t matter if the checkbox got onchange events or so, and it´s already not important if you got the <form> element or not. Whenever using many form items without any <form> element or so, the FireFox still dont response for a few settings, depending on how much items you created and how big the html size is.
Flags: blocking-firefox3.5-
Sorry guys didn´t read the "Session Restore" this bug i reported also happens if i am using no session restore. Create a new Bugreport as well.
This bug is/will be fixed in Firefox 3.5.3, which will be available shortly. You can try a release candidate here:
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/3.5.3-candidates/

Or try nightly builds, which are in development for 3.5.4 here:
ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-1.9.1/
Flags: blocking-firefox3.6? → blocking-firefox3.6+
Steffen - I've tried the 3.5.3 RC and this bug is still there for the pages that are affecting me the most.
Patch 3.5.3-candidates works fine for me (I created the bug 516764)
Firefox 3.5.3 shipped 5 days ago, please update if you haven't already.
Blocks: 510652
This bug appears to be present in 3.6.6 release (testing with: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6).

I see hangs on large tables when display properties are changed on many rows. 1000+ rows causes an unrecoverable hang.
(In reply to comment #124)
> This bug appears to be present in 3.6.6 release (testing with: Mozilla/5.0
> (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6) Gecko/20100625
> Firefox/3.6.6).
> 
> I see hangs on large tables when display properties are changed on many rows.
> 1000+ rows causes an unrecoverable hang.

Please file a new bug. This one was patched and resolved. It's always better to file anew, the problem could be related, or completely different. Profiling is needed.

/be
Funny; a regression should nearly always invoke the original bug. Why clutter the system with another duplicate filing? Of much greater interest is whether this has crept into the 4.0 test builds. May not be worth fixing for 3.6.x if 4.0 will deal with it again.
> Funny; a regression should nearly always invoke the original bug.

You'd mention the old bug in the new one, sure.

> Why clutter the system with another duplicate filing?

Because bugs have status that needs separate tracking for separate issues (e.g. this bug is fixed, while the new bug you would file clearly is not).

I should note that the issue description in comment 124 does NOT sound like this bug (which is the other reason to file separate issues as separate bugs, of course; even if you think it might be a similar issue or the same one, it's usually hard to tell as a bug filer) .  I'd really appreciate a bug filed on comment 124, with a testcase.
Thanks for the feedback, I'm sorry to clutter up this ticket. I intend on filing a new bug as soon as I have a minimal test case working. It is starting to look like I am not encountering this bug but something more subtle.
David, you don't need a minimal testcase for a performance bug.  If you have a testcase at all, please file!
... Why is this marked "fixed" when it was only kludged?
(In reply to Samuel Bronson from comment #130)
> ... Why is this marked "fixed" when it was only kludged?

Well, we believe it to be fixed.
You need to log in before you can comment on or make changes to this bug.