Closed Bug 635008 Opened 13 years ago Closed 13 years ago

browser hangs every time on the "categorize your question" page on yahoo answers and needs to be killed by task manager

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla2.0
Tracking Status
blocking2.0 --- final+

People

(Reporter: contactme, Assigned: mounir)

References

(Depends on 1 open bug)

Details

(Keywords: hang, regression, Whiteboard: [hardblocker])

Attachments

(5 files, 11 obsolete files)

17.03 KB, patch
Details | Diff | Splinter Review
18.54 KB, patch
Details | Diff | Splinter Review
3.66 KB, patch
Details | Diff | Splinter Review
13.68 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
3.13 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
User-Agent:       Mozilla/5.0 (Windows NT 6.1; rv:2.0b11) Gecko/20100101 Firefox/4.0b11
Build Identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0b11) Gecko/20100101 Firefox/4.0b11

Every time while asking questions on yahoo answers, the browser hangs permanently when the "categorize your question" page loads. This happens EVERY TIME in the same fashion and the browser has to be killed by the task manager manually.

This doesn't happen with Google Chrome.

Reproducible: Always

Steps to Reproduce:
1.Ask a question on yahoo answers by filling in the question and the detail in the first step.
2.Proceed to the next step where you are required to "categorize your question".

3.Bam! You want to kill yourself because it has hanged, yet again.
Actual Results:  
The Browser hangs and stays that way no matter how much you wait. In other words, needs to be killed manually by task manager.

Expected Results:  
It shouldn't have hanged.

about:buildconfig
Source

Built from http://hg.mozilla.org/mozilla-central/rev/f9d66f4d17bf
Build platform
target
i686-pc-mingw32
Build tools
Compiler 	Version 	Compiler flags
d;D:\mozilla-build\msys\mozilla-build\python25\python2.5.exe -O e;D:\mozilla-build\msys\builds\moz2_slave\rel-cen-w32-bld\build\build\cl.py cl 	14.00.50727.762 	-TC -nologo -W3 -Gy -Fdgenerated.pdb -DNDEBUG -DTRIMMED -Zi -Zi -UDEBUG -DNDEBUG -GL -wd4624 -wd4952 -O1
d;D:\mozilla-build\msys\mozilla-build\python25\python2.5.exe -O e;D:\mozilla-build\msys\builds\moz2_slave\rel-cen-w32-bld\build\build\cl.py cl 	14.00.50727.762 	-GR- -TP -nologo -Zc:wchar_t- -W3 -Gy -Fdgenerated.pdb -wd4800 -DNDEBUG -DTRIMMED -Zi -Zi -UDEBUG -DNDEBUG -GL -wd4624 -wd4952 -O1
Configure arguments

--enable-application=browser --enable-update-channel=beta --enable-update-packaging --enable-jemalloc --enable-tests --enable-official-branding
Could you see if the issue occurs with a new, empty profile:
http://support.mozilla.com/en-US/kb/Basic%20Troubleshooting#w_make-a-new-profile
Version: unspecified → Trunk
I just did that. I created a new profile and without making any additional changes to it, straightaway went to Yahoo! answers. It hanged exactly the way it hangs in the default profile. I hope that answers your question.

Thank you for your response.
To further eliminate the causes, could you try Safe Mode please:
http://support.mozilla.com/en-US/kb/Safe+Mode

If that still hangs, please can you try disabling all plugins shown on about:plugins, restarting Firefox and trying again.

Thanks!
Again, the same results. I started new test profile I have created for to last step, in safe mode and when it asked me if I wanted to disable add-ons and other such options, I chose all of them and restarted Firefox in safe mode. I typed about:plugins and it showed that no plug-ins are enabled. Now, I opened Yahoo answers and the problem persists. It hanged in the exact same manner.

Thank you for your interest!
*I started the new test profile I have created for the last step

Sorry for the typo, using speech recognition ;)
Thank you for your responses.

I've just created a yahoo account for myself to try and test it out. The page does hang but for only 10 seconds or so.

If you leave it for a bit, does it recover?
What is your CPU speed/type?

Also, since when has this started happening?
Keywords: hang
My computer's specifications :

Intel Pentium dual CPU E2140 @ 1.60 GHz
Ram: 3 GB
OS : Windows 7 Ultimate x86

I remember leaving it for about 20 seconds approximately. It didn't recover. I can try to leave it for more if you want me to.

Before using Firefox 4.0 b11, I was using Google Chrome. Prior to that I have used Firefox while it was in between the versions 3 to 3.5 as the primary browser. I have been using Yahoo! answers from long time and I don't remember anything like that happening. In fact, I'm pretty sure it didn't happen in the 3.x versions.

Thank you.
It seems that the browser and forever. I waited for it to recover well over 5 min. but it didn't. In the task manager I noticed that the CPU usage by Firefox was in between 45 and 50 consistently while there was no spike in RAM usage.

Again, I had to kill Firefox.
*It seems that the browser hanged forever.
Just checked with Chrome and 3.6.13 and does not occur with them, so regression.

STR:
1) Go to: http://answers.yahoo.com/question/ask
2) Type at least 20 characters into each box
3) Click continue
4) Now try to open a new tab/switch tab/use Firefox UI straight away

Expected:
- Rest of Firefox UI responsive, new tabs can be created/tabs can be switched to straight away.

Actual:
- Firefox UI unresponsive for ~10-15 seconds (using Core2Duo 2.4; 2GB RAM; Win7 x64), clicking impatiently makes the window go white.

Last good nightly: 2010-12-17 
First bad nightly: 2010-12-18

Pushlog: 
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d1da1005b6d6&tochange=4f03895d544b

No idea what may be the cause in that range, so leaving in Core-General for now.
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Ever confirmed: true
Keywords: hangregression
Product: Firefox → Core
QA Contact: general → general
Navneet, given that for me the hang only lasts 10-15 seconds, could you confirm that the regression range for your problem (hang with no recovery) is the same as mine please.

ie: download/extract these to your desktop, run from there and see if they hang or not. (Won't interfere with your current Firefox install).

Should work fine:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/12/2010-12-17-03-mozilla-central/firefox-4.0b9pre.en-US.win32.zip

Should hang:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2010/12/2010-12-18-03-mozilla-central/firefox-4.0b9pre.en-US.win32.zip

Thanks again!
Hello again! I did what you asked me to do.

As you mentioned, the browser did not hang when I opened Yahoo answers using the build you provided in the first link. Not even for a second.

While in the second build, it hanged like in the previous cases.

You mentioned that the browser for you hangs only for a maximum of 15 seconds but for me it seemingly, hangs indefinitely.

Thank you and it's my pleasure!
Thank you Navneet, that confirms that even though the page only hangs for 10-15s for me, the regression range is still the same, so should be the same underlying cause.

Just needs someone who can build locally or has hourly builds to help narrow down the range a bit more. In case whomever ends up doing that doesn't have a yahoo account, you are welcome to use the test one I just created, as it will be binned once this is resolved anyway.
u: testaccount777888999
p: <this bug's ID repeated twice>
Keywords: hang
Ed, what are the changeset IDs of those builds?
It's the same revs as the range in comment 10.
Oh, I see.

I also just reproduced a beachball for about 20s following the steps in comment 10 (though only the second time I tried it).  Profiler says all the time is under removeChild call made from JS and in particular under WillRemoveFromRadioGroup calling nsHTMLInputElement::UpdateValueMissingValidityStateForRadio.  This seems to be triggering ContentStatesChanged on all the radios in the group (is that right?) which triggers HasStateDependentStyle on all the radios, and we spend a bunch of time in HasStateDependentStyle itself and in selector matching.  It looks like we make O(N^2) such calls, in particular.

So this is almost certainly a regression from bug 610687.

Note that this page has about 1600 radio buttons in it when it's first loaded; then it seems to remove most of them from the DOM....

This shouldn't be a flat-out hang, but depending on hardware it might take more or less time to finish, of course.
Unfortunately i didn't understand much from your last comment Boris. I am Computer Science 2nd semester undergrad so it's going to be some time before i can understand stuff written above :p. But can you summarize in a bit simple terms as to what you meant? 

By the way i deduce from your name that you are Russian. Am i right? Sorry if you are not but i have read War and Peace by Leo Tolstoy and i remember a character, or perhaps, more than one named Boris ;)
> But can you summarize in a bit simple terms as to what you meant? 

I meant that we now know exactly where the performance problem is (most of that first big paragraph), as well as what caused it (the "a regression from") part.  Please feel free to send me mail at my bugzilla address if you want more details?

I'm from what used to be the Soviet Union, yes.
I'm taking this bug. The current algorithm is suboptimal because making it optimal requires a change in an interface and I wrote the patch after the interface freeze. I'm going to write the patch changing the interface and we will see if it can go in Gecko 2.0.
Assignee: nobody → mounir.lamouri
Component: General → DOM: Core & HTML
OS: Windows 7 → All
QA Contact: general → general
Hardware: x86 → All
Attached patch WIP Patch (obsolete) — Splinter Review
This WIP patch probably doesn't compile. Last time it was compiling, the hang was fixed but the feature causing the hang was broken. I'm attaching the patch here to work on it from another computer.
Sounds like a blocker based on earlier comments, the fact that it's a pretty popular site, and a regression. Marking as betaN+ for now, but happy if someone tells me that it can wait until final+.
blocking2.0: ? → betaN+
Whiteboard: [hardb
Whiteboard: [hardb → [hardblocker]
I will try to attach a patch for review tomorrow.
Mounir - any opinion on Mike's (implied) question of whether this needs to block beta, or just final?
Whiteboard: [hardblocker] → [hardblocker][might be final+]
(In reply to comment #23)
> Mounir - any opinion on Mike's (implied) question of whether this needs to
> block beta, or just final?

In one hand, the patch isn't trivial so it might have some unexpected consequences. In another hand, the changed code might be very related to radio button code so it shouldn't break everywhere if it has to break.
So, I think a beta coverage would be better if we take a patch in the same spirit as the WIP one. However, having it pushed in the first RC should be doable too IMO.
I think we should be fine taking this in an rc, personally; we'll just need to be careful with the patch (which is the plan anyway).  That's easier to do if we're aiming to have the patch be done in a few days instead of yesterday, imo....
Marking final+ for bookkeeping, but we should obviously aim to get it in sooner rather than later if people feel like working over the long weekend!
blocking2.0: betaN+ → final+
Whiteboard: [hardblocker][might be final+] → [hardblocker]
Attachment #513737 - Attachment is obsolete: true
Attachment #514003 - Flags: review?(bzbarsky)
Attached patch Part 2 - Remove dead code. (obsolete) — Splinter Review
Comment on attachment 514004 [details] [diff] [review]
Part 2 - Remove dead code.

This is removing dead code but also changing nsIRadioVisitor. But I honestly doubt anyone is using this useless thing. Though, this could easily swipe in Gecko next.
Attachment #514004 - Attachment description: Remove → Part 2 - Remove dead code.
Attachment #514004 - Attachment is patch: true
Attachment #514004 - Attachment mime type: application/octet-stream → text/plain
Attachment #514004 - Flags: review?(bzbarsky)
Attachment #514005 - Attachment description: Part 3 - Keep track of the vali → Part 3 - Keep track of the validity state of groups.
Attachment #514005 - Attachment is patch: true
Attachment #514005 - Attachment mime type: application/octet-stream → text/plain
Attachment #514005 - Flags: review?(bzbarsky)
This is crashing for unknown reason. Asking for feedback in case of Boris can find out an obvious mistake.
Attachment #514006 - Flags: feedback?(bzbarsky)
(In reply to comment #31)
> Created attachment 514006 [details] [diff] [review]
> Part 4 - Use only one hash table in nsHTMLFormElement for groups instead of 3
> 
> This is crashing for unknown reason. Asking for feedback in case of Boris can
> find out an obvious mistake.

A stack would probably be useful to figure that out ...
Attached patch Alternative patch (obsolete) — Splinter Review
This patch is an alternative to the four other parts: it reduces the hang to only a few seconds on my system (and debug build) but I'm worried the hang might still be visible on other systems. It's mostly here for information and in case of the other patches are considered too much risky.
Attached patch Alternative patch (obsolete) — Splinter Review
Better with the proper patch...
Attachment #514012 - Attachment is obsolete: true
I did some benchmarks with a release build:
- with no patch: ~15 secs freeze ;
- with 4 patches: no freeze;
- with alternative patch: nearly imperceptible freeze (<1 sec).
With the crash fixed.
Attachment #514006 - Attachment is obsolete: true
Attachment #514037 - Flags: review?(bzbarsky)
Attachment #514006 - Flags: feedback?(bzbarsky)
Attachment #514037 - Attachment is obsolete: true
Attachment #514038 - Flags: review?(bzbarsky)
Attachment #514037 - Flags: review?(bzbarsky)
Comment on attachment 514039 [details] [diff] [review]
Part 5 - Don't return an already_addrefed object for GetSelectedRadioButton

This could be moved in another bug too.
Attachment #514039 - Attachment description: Part 5 - Don't return an already_addrefe → Part 5 - Don't return an already_addrefed object for GetSelectedRadioButton
Attachment #514039 - Attachment is patch: true
Attachment #514039 - Attachment mime type: application/octet-stream → text/plain
Attachment #514039 - Flags: review?(bzbarsky)
I've sent everything to try, if people with slow computers can test the improvements with both patches, that would be nice. Builds will be available here:
Real patch (5 parts): ftp://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mlamouri@mozilla.com-1ba8bc5bbcfe
Alternative patch: ftp://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/mlamouri@mozilla.com-decc2d9aef7f
Status: NEW → ASSIGNED
Whiteboard: [hardblocker] → [hardblocker][has patch][needs review bz]
The try server caught a few failures. This patch should pass our test suite.
Attachment #514014 - Attachment is obsolete: true
Updating the patch for same reasons as above.
Attachment #514005 - Attachment is obsolete: true
Attachment #514226 - Flags: review?(bzbarsky)
Attachment #514005 - Flags: review?(bzbarsky)
Comment on attachment 514039 [details] [diff] [review]
Part 5 - Don't return an already_addrefed object for GetSelectedRadioButton

This patch is actually wrong. Thanks Ms2ger for noticing.
Attachment #514039 - Attachment is obsolete: true
Attachment #514039 - Flags: review?(bzbarsky)
Whiteboard: [hardblocker][has patch][needs review bz] → [hardblocker][has patch][needs review bz][passed try]
Comment on attachment 514003 [details] [diff] [review]
Part 1 - Keep track of required state of radio groups.

Should GetRequiredRadioCount be const?

>+    : mRadioRequired(0)

mRequiredRadioCount, perhaps?

>mRequiredRadioButtons

mRequiredRadioButtonCounts, perhaps?

That last hunk looks complicated, but ok for now.  Minimal changes and all.

r=me
Attachment #514003 - Flags: review?(bzbarsky) → review+
Comment on attachment 514004 [details] [diff] [review]
Part 2 - Remove dead code.

r=me, if you don't change the IID.
Attachment #514004 - Flags: review?(bzbarsky) → review+
Comment on attachment 514038 [details] [diff] [review]
Part 4 - Use only one hash table in nsHTMLFormElement for groups instead of 3

I can review this as needed, but do we need this cleanup for 2.0?  Or would it be better in a separate bug?
Comment on attachment 514226 [details] [diff] [review]
Part 3 - Keep track of the validity state of groups.

>+  return radioGroup ? radioGroup->mGroupSuffersFromValueMissing : false;

  return radioGroup && radioGroup->mGroupSuffersFromValueMissing;

Should GetValueMissingState be a const method?

>+  // Notify the radio button it's been added to a group

I assume there's an ordering dependency here that caused you to move this block of code?  Please document that ordering dependency.

I don't understand why the boolean for a single input element should affect the whole group for value-missing.... if it's false for one element, why is it false for the group?

Why is the AfterSetAttr change needed?  What was handling that before this patch?
Attachment #514226 - Flags: review?(bzbarsky) → review-
r=bz

I made GetRequiredRadioCount const but I had to create bug 636123 because GetRadioGroup can't be const for the moment.
Attachment #514003 - Attachment is obsolete: true
r=bz
(In reply to comment #47)
> Comment on attachment 514226 [details] [diff] [review]
> Part 3 - Keep track of the validity state of groups.
> 
> >+  return radioGroup ? radioGroup->mGroupSuffersFromValueMissing : false;
> 
>   return radioGroup && radioGroup->mGroupSuffersFromValueMissing;
> 
> Should GetValueMissingState be a const method?

Ok.

> >+  // Notify the radio button it's been added to a group
> 
> I assume there's an ordering dependency here that caused you to move this block
> of code?  Please document that ordering dependency.

There is a comment line just after:
"// This has to be done _after_ UpdateValidity() call.". Is that enough or you want more details? Actually, AddedToRadioGroup calls SetValidityState() which also call UpdateValidity() so if you call AddedToRadioGroup before UpdateValidity in ::AddElement, you will end with an element counted twice.

> I don't understand why the boolean for a single input element should affect the
> whole group for value-missing.... if it's false for one element, why is it
> false for the group?

The idea is that UpdateValueMissingState() is called just after so we don't really care if the new element should change the group validity (it will happen in UpdateValueMissingState()). The real situation this is fixing is when the new element has a different validity than the group because it will not be updated by UpdateValueMissingState().
But I presume this is pointing the case of AddedToRadioGroup() called from AfterSetAttr() where a call to UpdateValueMissingState() is missing.

> Why is the AfterSetAttr change needed?  What was handling that before this
> patch?

It isn't handled. I saw this bug while trying to check some stuff (I will write a few tests later).
Attachment #514226 - Attachment is obsolete: true
Attachment #514460 - Flags: review?(bzbarsky)
Attachment #514004 - Attachment is obsolete: true
(In reply to comment #46)
> Comment on attachment 514038 [details] [diff] [review]
> Part 4 - Use only one hash table in nsHTMLFormElement for groups instead of 3
> 
> I can review this as needed, but do we need this cleanup for 2.0?  Or would it
> be better in a separate bug?

I'm fine with that.
Comment on attachment 514038 [details] [diff] [review]
Part 4 - Use only one hash table in nsHTMLFormElement for groups instead of 3

Part 4 moved to bug 636132.
Attachment #514038 - Attachment is obsolete: true
Attachment #514038 - Flags: review?(bzbarsky)
Hello I'm using Firefox 4.0b11 for a few days.
I notice this behaviour too. Tried with the Italian version of Yahoo Answers, it hangs for ~10 seconds on Categorize question page.
25% CPU on Task Manager on a Core i5-750. After the lock up the page works fine.
(In reply to comment #54)
> Hello I'm using Firefox 4.0b11 for a few days.
> I notice this behaviour too. Tried with the Italian version of Yahoo Answers,
> it hangs for ~10 seconds on Categorize question page.
> 25% CPU on Task Manager on a Core i5-750. After the lock up the page works
> fine.

Thank you for your report.  The current status is that we understand the issue and do not need any more reports of the issue.  This is currently marked as a blocker for the release, which means that we intend to fix this issue in the release version of Firefox 4.
Some of these new tests do not currently pass. Some of them are just here to make sure we don't regress.
Attachment #514668 - Flags: review?(bzbarsky)
> There is a comment line just after:

Indeed.  Good.
> you will end with an element counted twice.

We need to simplify this stuff post-2.0.  You and I should sit down and talk it over....

> It isn't handled.

Add a test to keep it from regressing, please?

> The idea is that UpdateValueMissingState() is called just after

Please document that.  This code is incredibly confusing at this point; we really need to simplify it.  :(

That said, the code that I don't understand here is _inside_ UpdateValueMissingState, so I'm not sure I follow your answer.  Or did you mean that NS_SetRadioValueMissingState will update the radio group itself or something?  Because I don't see where that's happening.

> But I presume this is pointing the case of AddedToRadioGroup() called from
> AfterSetAttr() where a call to UpdateValueMissingState() is missing.

I'm not sure what you mean.
Comment on attachment 514668 [details] [diff] [review]
New tests for required radio groups

r=me
Attachment #514668 - Flags: review?(bzbarsky) → review+
(In reply to comment #57)
> > There is a comment line just after:
> 
> Indeed.  Good.

The new patch actually makes the comment a little bit more verbose than it was.

(In reply to comment #58)
> > you will end with an element counted twice.
> 
> We need to simplify this stuff post-2.0.  You and I should sit down and talk it
> over....

Catch me on IRC :)

> > It isn't handled.
> 
> Add a test to keep it from regressing, please?

The new tests you reviewed handle that. That's why some do not currently pass (but this patch will make them pass).

> > The idea is that UpdateValueMissingState() is called just after
> 
> Please document that.  This code is incredibly confusing at this point; we
> really need to simplify it.  :(

Yes. I'm afraid some parts of nsHTMLInputElement code looks like spaghetti code.

> That said, the code that I don't understand here is _inside_
> UpdateValueMissingState, so I'm not sure I follow your answer.  Or did you mean
> that NS_SetRadioValueMissingState will update the radio group itself or
> something?  Because I don't see where that's happening.

I probably didn't understand what was the issue then. Could you quote the code?

> > But I presume this is pointing the case of AddedToRadioGroup() called from
> > AfterSetAttr() where a call to UpdateValueMissingState() is missing.
> 
> I'm not sure what you mean.

Your comment make me realize I forgot something (calling UpdateValueMissingState after AddedToRadioGroup in AfterSetAttr).
> I probably didn't understand what was the issue then. Could you quote the code?

Yes.  This part:

>+    if (container->GetValueMissingState(name) != valueMissing) {
>+      container->SetValueMissingState(name, valueMissing);

Say valueMissing is false, while container->GetValueMissingState(name) is true.  We'd enter this block, and set set the value on the container to false.  Why is that correct?
(In reply to comment #61)
> >+    if (container->GetValueMissingState(name) != valueMissing) {
> >+      container->SetValueMissingState(name, valueMissing);
> 
> Say valueMissing is false, while container->GetValueMissingState(name) is true.
>  We'd enter this block, and set set the value on the container to false.  Why
> is that correct?

Because all radio elements of the same group have the same value missing state. IOW, if you have data:text/html,<input type='radio' name='a'><input type='radio' name='a' required>, at page load, both radio element will be invalid and when selecting any of the two radios, both will become valid.

Currently, when updating the value missing state of a radio element, we check if a radio is selected in the group, then we check if there is at least one required radio. The resulting state is common for the entire group that's why we can set the new value to the container. BTW, we are only calling this method when the state could have changed (when a radio is added/removed from a group or when @required or checkedness is changed).
Ah, I see.  The point is that |required| can be true even if our current radio is not required.  This is just hard to see because of the over-complicated code for setting |required|....  Alright.
Comment on attachment 514460 [details] [diff] [review]
Part 3 - Keep track of the validity state of groups.

r=me, but again we really need to simplify this.  Please file a followup bug?
Attachment #514460 - Flags: review?(bzbarsky) → review+
(In reply to comment #63)
> Ah, I see.  The point is that |required| can be true even if our current radio
> is not required.  This is just hard to see because of the over-complicated code
> for setting |required|....  Alright.

Really sorry about that :(
Depends on: 636774
Is this ready to go?
(In reply to comment #66)
> Is this ready to go?

I just pushed that:
http://hg.mozilla.org/mozilla-central/rev/10fe279c3613
http://hg.mozilla.org/mozilla-central/rev/f644c17d2120
http://hg.mozilla.org/mozilla-central/rev/7702940cf12c
http://hg.mozilla.org/mozilla-central/rev/5c15691031e5
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [hardblocker][has patch][needs review bz][passed try] → [hardblocker]
Target Milestone: --- → mozilla2.0
Depends on: 639175
No longer depends on: 639175
Depends on: 750370
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: