Closed
Bug 309424
Opened 18 years ago
Closed 18 years ago
Firefox 1.5beta1 freezes & goes into 95+% cpu usage browsing blackisha.com
Categories
(Core :: XML, defect)
Core
XML
Tracking
()
VERIFIED
FIXED
mozilla1.8beta5
People
(Reporter: mozilla, Assigned: darin.moz)
References
()
Details
(Keywords: fixed1.8, hang, regression)
Attachments
(3 files, 1 obsolete file)
8.77 KB,
text/plain
|
Details | |
10.85 KB,
text/plain
|
Details | |
9.09 KB,
patch
|
Biesinger
:
review+
bzbarsky
:
superreview+
asa
:
approval1.8rc1+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4 Goto blackisha.com & hover over the second comments link. it will come up with a list of comment titles, hover over a title to get the msg body. then move the mouse over to somewhere else in the page. Mozilla freezes & starts consuming cpu. Reproducible: Always Steps to Reproduce: 1. blackisha.com 2. hover over the second comments link (or a link with comments), a menu will popup 3.hover over a comment title, so that another div pops ip. 4. move the mouse to somewhere else of the page, it crashes at this point for me. Actual Results: crash & 95+% cpu usage. Expected Results: continue normally & the divs (ur <ul>'s, cant quite remember) should dissapear I will test in safe mode, but cant test yet as it will crash browser.
Reporter | ||
Comment 1•18 years ago
|
||
OK, I tried it in safe mode. same problem.
Summary: mozilla 1.5beta1 freezes & goes into 95+% cpu usage → mozilla 1.5beta1 freezes & goes into 95+% cpu usage browsing blackisha.com
Version: unspecified → 1.5 Branch
Comment 2•18 years ago
|
||
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20050920 Firefox/1.6a1 ID:2005092021 Sadly I can't get it to crash, but the site becomes unresponsive, the toolbars disappear and I have to force a shutdown of the browser.
Comment 3•18 years ago
|
||
Reproduced hang with 2005-09-18 branch build on OS X.
Comment 4•18 years ago
|
||
Debugging on a several days old home-made trunk build, the hang seems to be in nsXMLHttpRequest::Send(), in the following loop, which appears to run forever: // If we're synchronous, spin an event loop here and wait if (!(mState & XML_HTTP_REQUEST_ASYNC)) { while (mState & XML_HTTP_REQUEST_SYNCLOOPING) { modalEventQueue->ProcessPendingEvents(); } Confirming.
Assignee: nobody → xml
Status: UNCONFIRMED → NEW
Component: General → XML
Ever confirmed: true
Product: Firefox → Core
QA Contact: general → ashshbhatt
Version: 1.5 Branch → Trunk
Reporter | ||
Comment 6•18 years ago
|
||
SOrry, I should have said hang & cpu usage, not crash.
Comment 7•18 years ago
|
||
Regression window is between 2005-03-28-09 and 2005-03-29-08. Checkins: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-03-28+08%3A00&maxdate=2005-03-29+09%3A00&cvsroot=%2Fcvsroot
Comment 8•18 years ago
|
||
Since this is a regression and a hang, requesting blocking1.8b5. I have no idea how to make a testcase for this, and I'm also not sure what the correct component is.
Flags: blocking1.8b5?
Comment 9•18 years ago
|
||
(In reply to comment #4) > Debugging on a several days old home-made trunk build, the hang seems to be in > nsXMLHttpRequest::Send(), in the following loop, which appears to run forever: > > // If we're synchronous, spin an event loop here and wait > if (!(mState & XML_HTTP_REQUEST_ASYNC)) { > while (mState & XML_HTTP_REQUEST_SYNCLOOPING) { > modalEventQueue->ProcessPendingEvents(); > } > > Confirming. The site should switch to an async xmlhttprequest to stop the UI thread from not refreshing.
Assignee | ||
Comment 10•18 years ago
|
||
I can reproduce this with a 2005-09-22 Windows 1.8-branch build.
Assignee | ||
Comment 11•18 years ago
|
||
I noticed that this page appears to be sync loading via XMLHttpRequest from a mousemove handler. The sync load causes XMLHttpRequest to push a child event queue, which suppresses events from existing requests from being processed. I noticed in my log file that the same URL was being loaded twice when the hang happens. The second request for the URL is stuck waiting to access the cache while the first one is responsible for populating the cache. The first never has any of its events processed, however, which prevents it from ever writing to the cache, and thus we're in a dead locked situation where nothing can proceed. I'm not sure what is causing the event queue to spin so madly though.
Assignee | ||
Comment 12•18 years ago
|
||
Here's a HTTP log that shows two different XMLHttpRequest objects each attempting to sync load the same resource. Trouble occurs because of the modal event queues.
Assignee | ||
Comment 13•18 years ago
|
||
This stack trace shows recursive calls to XMLHttpRequest::Send. Matching those up to the HTTP log, we see that they are both for the same URL. The second never finishes because it is waiting for the first to complete, but the first never completes because its events are never processed. Yay! :-(
Assignee | ||
Comment 14•18 years ago
|
||
OK, the problem is that PL_ProcessPendingEvents does not actually process its events if called recursively. See: http://lxr.mozilla.org/mozilla/source/xpcom/threads/plevent.c#602 NOTE: It turns out that the statement I made above is not true: "The sync load causes XMLHttpRequest to push a child event queue, which suppresses events from existing requests from being processed." The elder event queue is processed, but because of the code pointed to above, it turns out that the events simply aren't processed. This explains why we loop madly through ProcessPendingEvents.
Assignee | ||
Comment 15•18 years ago
|
||
As a test, I commented out this code at the top of PL_ProcessPendingEvents: /* if (self->processingEvents) { _pl_AcknowledgeNativeNotify(self); self->notified = PR_FALSE; PR_ExitMonitor(self->monitor); return; } */ With that code removed, I can no longer reproduce this bug. I'm not sure that it is the correct change. Unfortunately, there is near zero documentation in plevent.c about the purpose of the processingEvents field and the cvs logs for this chunk of code are barren. It seems to me that our nested event queue implementation will only suppress elder event queues when the elder event queue's PL_ProcessPendingEvents call is on the stack. But, then one has to wonder why a nsEventQueue bothers to call ProcessPendingEvents on its mElderQueue at all... /Sigh/
Comment 16•18 years ago
|
||
i think this block of code was added so that we don't overflow the Windows native event queue with a bunch of notification MSG's. Yes, our stacked event queue impl. is not consistent, poorly documented, and unowned. Simon has recently has experience much pain because of this.
Comment 17•18 years ago
|
||
I just had a very similar problem in Camino with a PLEvent used by an nsProxyObject; PLEvent processing was already on the stack, which meant that the PLEvent for the proxied XPCOM call (it was an observer notification from nsKeygenThread) was never processed (bug 308700). I fixed this by pushing another event queue, but I'd rather this was fixed at a more general level.
Assignee | ||
Comment 18•18 years ago
|
||
Unfortunately, pushing another event queue won't help with this bug. We are already pushing an event queue per XMLHttpRequest::Send invocation.
![]() |
||
Comment 19•18 years ago
|
||
Doesn't the new nsIEventQueue have a different PLEventQueue, though? So I'd expect that we would process events on the child queue ok. Also, is the regression range in comment 7 correct? I don't see any event queue or XMLHttpRequest checkins in there...
Comment 20•18 years ago
|
||
(In reply to comment #19) > Also, is the regression range in comment 7 correct? I don't see any event queue > or XMLHttpRequest checkins in there... I double-checked with Firefox builds (on a different machine) and I still see the regression between those two builds (except that the 2005-03-28 build is 2005-03-28-08, and not 2005-03-28-09 as I wrote). I also checked with Mozilla (suite) builds: Build ID 2005032810 works, Build ID 2005032909 hangs. Of course, it would be nice if someone else could confirm this as well.
Assignee | ||
Comment 21•18 years ago
|
||
(In reply to comment #19) > Doesn't the new nsIEventQueue have a different PLEventQueue, though? So I'd > expect that we would process events on the child queue ok. Yes, and we do. But, in this situation the second request (run by the child queue) will not finish until the first request (run by the elder queue) finishes. There's no way for the first request to finish if its events are never processed.
![]() |
||
Comment 22•18 years ago
|
||
Hmm... Why can't the second request finish until the first does? Maybe I'm confused about the call order here... I thought we were calling send() again while waiting for a response to the first send(), which caused us to push a second nested event queue for the new send(). Is that not what's going on?
Assignee | ||
Comment 23•18 years ago
|
||
bz: See comment #11. The problem is that the second request is stuck waiting to access the cache entry that is being written to by the first request.
![]() |
||
Comment 24•18 years ago
|
||
Ah, I see... So what we somehow need is to prevent all interaction with the page while that first sync XMLHttpRequest is in flight?
Assignee | ||
Comment 25•18 years ago
|
||
> So what we somehow need is to prevent all interaction with the page while that
> first sync XMLHttpRequest is in flight?
Right. The problem is the second mousemove event. I suspect that something
changed between 1.0 and 1.5 that causes that second mousemove event to fire.
Comment 26•18 years ago
|
||
> I suspect that something > changed between 1.0 and 1.5 that causes that second mousemove event to fire. Bug 284664 has to do with mouse events, and was fixed in the regression window of this bug.
![]() |
||
Comment 28•18 years ago
|
||
The problem is that the new behavior is basically correct. While a "sync" XMLHttpRequest is in flight, we are delivering events to the page, and that would include mouse move events. The bug roc fixed just fixed cases where we didn't delive mouse move events at all (independent of whether there was XMLHttpRequest involved). So the real issue is that our impl of "sync" XMLHttpRequest is really not very sync. This has caused problems before; this is just another example.
![]() |
||
Comment 29•18 years ago
|
||
Darin, could we possibly switch XMLHttpRequest to just using open() for sync sends? Then we could perhaps deal on the HTTP end by throwing from an open() call if the cache entry is already in use....
Assignee | ||
Comment 30•18 years ago
|
||
(In reply to comment #29) > Darin, could we possibly switch XMLHttpRequest to just using open() for sync > sends? Then we could perhaps deal on the HTTP end by throwing from an open() > call if the cache entry is already in use.... That's an interesting idea. We could also make the second HTTP request simply bypass the cache if it cannot get exclusive access. Or, would it be better to throw?
![]() |
||
Comment 31•18 years ago
|
||
I'd prefer bypassing the cache to throwing, yeah. Do we need to switch to open() to do this, or could we do it in today's asyncOpen() world? I assume the latter would affect far too many other things, since we would have no idea that the caller is evil?
Assignee | ||
Comment 32•18 years ago
|
||
(In reply to comment #31) > I'd prefer bypassing the cache to throwing, yeah. > > Do we need to switch to open() to do this, or could we do it in today's > asyncOpen() world? I assume the latter would affect far too many other things, > since we would have no idea that the caller is evil? Well, we could implement it as a load flag. DONT_BLOCK_ON_CACHE or something like that. I can't think of a good name for it right now.
![]() |
||
Comment 33•18 years ago
|
||
That might be easiest and safest for 1.8, with possibly a followup bug to consider using open() for trunk?
Assignee | ||
Comment 34•18 years ago
|
||
Yeah, that's probably a reasonable solution for the impending release.
![]() |
||
Comment 35•18 years ago
|
||
Darin, could you put that together? Let's call the flag BYPASS_CACHE_IF_LOCKED perhaps?
Assignee | ||
Comment 36•18 years ago
|
||
Yeah, I could do that. But, I'd like to think about this a bit more. I don't really like exposing details of the cache implementation like this if it can be helped. Hmm... -> me
Assignee: roc → darin
Target Milestone: --- → mozilla1.8beta5
Assignee | ||
Comment 37•18 years ago
|
||
Perhaps a better option would be to treat the current Window as modal when we are sync loading. Then it would not respond to events, and we would avoid this nasty re-entrancy problem. I think I've suggested this before, but I don't know what it would take exactly to simulate a modal dialog. Perhaps there is code in WindowWatcher::OpenWindowJS that we could rip.
Updated•18 years ago
|
Flags: blocking1.8b5? → blocking1.8b5+
Comment 38•18 years ago
|
||
Darin, how common / prevalent do you think this issue will be out in the wild?
Assignee | ||
Comment 39•18 years ago
|
||
> Darin, how common / prevalent do you think this issue will be out in the wild?
I don't know. You'd hope that most people would know to use the asynchronous
mode instead. However, I wouldn't be surprised if we hear about this causing
compatibility problems with some legacy web apps developed against IE, which has
a much better synchronous XMLHTTP implementation.
Updated•18 years ago
|
Flags: blocking1.8b5+ → blocking1.8b5-
Assignee | ||
Updated•18 years ago
|
Flags: blocking1.8rc1?
Comment 40•18 years ago
|
||
our inclination is to minus this bug for 1.5, we don't think this is a critical stop ship bug and we decided it wasn't important enough for beta 2. Care to build a case to the contrary Darin? Leaving the nomination alone for now.
Assignee | ||
Comment 41•18 years ago
|
||
This is a regression from Firefox 1.0; however, people can workaround the problem. I don't know how much trouble this will cause in the wild.
Updated•18 years ago
|
Flags: blocking1.8rc1? → blocking1.8rc1-
Assignee | ||
Comment 42•18 years ago
|
||
Here's a workaround that does the trick. I think this is a good fix for 1.5rc1.
Attachment #199374 -
Flags: superreview?(jst)
Attachment #199374 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 43•18 years ago
|
||
This patch also implies that we can simply eliminate the pushing and popping of event queues in XMLHttpRequest, but I was trying to minimize the changes.
Assignee | ||
Comment 44•18 years ago
|
||
(In reply to comment #43) > This patch also implies that we can simply eliminate the pushing and popping of > event queues in XMLHttpRequest, but I was trying to minimize the changes. To add to that comments, even if we did away with the PushThreadEventQueue call in XMLHttpRequest, we'd still need to walk the event queue chain in order to deal with the fact that someone else might have pushed a modal event queue.
![]() |
||
Comment 45•18 years ago
|
||
Hmm... So any events that got posted before the send() call will also get processed, basically?
Assignee | ||
Comment 46•18 years ago
|
||
> Hmm... So any events that got posted before the send() call will also get
> processed, basically?
Yes.
![]() |
||
Comment 47•18 years ago
|
||
Does that change our behavior with regard to interaction with timeouts and intervals? Or are we firing those into the XMLHttpRequest queue anyway?
Assignee | ||
Comment 48•18 years ago
|
||
(In reply to comment #47) > Does that change our behavior with regard to interaction with timeouts and > intervals? Or are we firing those into the XMLHttpRequest queue anyway? Right, we fire those into the current event queue... same with input events. Given that the current event loop may end up executing arbitrary events, I see no reason not to process all event queues from this code.
![]() |
||
Comment 49•18 years ago
|
||
Comment on attachment 199374 [details] [diff] [review] v1 patch Yeah, ok. Seems reasonable. I guess as long as we can't end up in an XMLHttpRequest that started _after_ a modal dialog that we really want to block necko, eg, came up, we're ok. That assumption isn't true for Linux Firefox, but I don't see what we can do about it here.
Attachment #199374 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 50•18 years ago
|
||
> Yeah, ok. Seems reasonable. I guess as long as we can't end up in an
> XMLHttpRequest that started _after_ a modal dialog that we really want to block
> necko, eg, came up, we're ok. That assumption isn't true for Linux Firefox,
> but I don't see what we can do about it here.
Well... with this change, an async XMLHttpRequest might fire onload while a
script is inside a call to alert(). Similarly, an async XMLHttpRequest might
fire onload while a script is performing a sync XMLHttpRequest send.
![]() |
||
Comment 51•18 years ago
|
||
I'm more worried about necko events being processed while something like the HTTP password prompt is up. I guess that can happen if the send() is done first, then in a different window the user tries to load something that uses basic auth, right? If it does happen, will HTTP deal?
Assignee | ||
Comment 52•18 years ago
|
||
> I'm more worried about necko events being processed while something like the
> HTTP password prompt is up. I guess that can happen if the send() is done
> first, then in a different window the user tries to load something that uses
> basic auth, right? If it does happen, will HTTP deal?
HTTP should be fine in that case because the same nsHttpChannel instance
shouldn't be re-entered while it is making that call to PromptUsernameAndPassword.
![]() |
||
Comment 53•18 years ago
|
||
So we have something preventing that other than the fact that the dialog is modal?
Assignee | ||
Comment 54•18 years ago
|
||
> So we have something preventing that other than the fact that the dialog is modal?
No, not really.... I suppose it may be possible for OnDataAvailable and
OnStopRequest events to be processed while we are inside the prompt call if the
event queue is processed, and that might not be so good. The problem, I think,
is that we do not suspend those events while making the prompt call. So,
yeah... this patch might be trouble :-(
Assignee | ||
Comment 55•18 years ago
|
||
Comment on attachment 199374 [details] [diff] [review] v1 patch OK, this patch is no good. A cheap solution to this bug is to simply bypass the cache when making a sync load. I tried to avoid that solution because it introduces bad performance on the site in question, but maybe that's just going to have to be the solution for Gecko 1.8.
Attachment #199374 -
Attachment is obsolete: true
Assignee | ||
Comment 56•18 years ago
|
||
This patch implements what we came up with earlier: a load flag for causing the channel to bypass the local cache instead of waiting for a cache entry.
Attachment #199664 -
Flags: superreview?(bzbarsky)
Attachment #199664 -
Flags: review?(cbiesinger)
Assignee | ||
Updated•18 years ago
|
Attachment #199374 -
Flags: superreview?(jst)
![]() |
||
Updated•18 years ago
|
Attachment #199664 -
Flags: superreview?(bzbarsky) → superreview+
Comment 57•18 years ago
|
||
Comment on attachment 199664 [details] [diff] [review] v2 patch + * request. Unlike LOAD_BYPASS_CACHE, it does not force an end-to-end load + * (i.e., it does not affect proxy caches), and it does not prevent the + * channel from writing to the cache once it receives a response from the + * server. hmm... this comment sounds to me like LOAD_BYPASS_CACHE prevents writing to the cache, which isn't the case... r=biesi
Attachment #199664 -
Flags: review?(cbiesinger) → review+
Assignee | ||
Updated•18 years ago
|
Attachment #199664 -
Flags: approval1.8rc1?
Assignee | ||
Comment 58•18 years ago
|
||
> hmm... this comment sounds to me like LOAD_BYPASS_CACHE prevents writing to the
> cache, which isn't the case...
Good point. I'll clean that up before check-in.
Comment 59•18 years ago
|
||
Darin, can you tell us what kind of risk is involved in taking this and how many sites are going to be broken or developers prevented from doing good and obvious coding by thid problem? Also, this would need to land and be verified to work and not break anything on the trunk before we'd consider for the branch.
Assignee | ||
Comment 60•18 years ago
|
||
> Darin, can you tell us what kind of risk is involved in taking this and how
> many sites are going to be broken or developers prevented from doing good and
> obvious coding by thid problem?
Asa: This bug is a regression. While websites can workaround this problem, I
suspect that many will be shocked to find that they have to. The fix is very
low-risk IMO, and I think we should definitely take it for this release.
I'm landing this on the trunk today.
Comment 61•18 years ago
|
||
we'll evaluate again after this has been landed and verified on the trunk.
Assignee | ||
Comment 62•18 years ago
|
||
fixed-on-trunk
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment 63•18 years ago
|
||
I still see issues using today's Windows trunk build Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20051021 Firefox/1.6a1. I load the page, and when I hover over the links in the comments section, I have difficult escaping out of that mode. At one point the browser page just became blank, and then eventually the browser comes back to life. I will have Tracy try this out too to see what he sees.
Comment 64•18 years ago
|
||
seemed to work ok for me in a cvs build and my nightly trunk build on winxp updated to today. I couldn't reproduce the issues Marcia is seeing.
Assignee | ||
Comment 65•18 years ago
|
||
Marcia, Yeah that is working as expected. The good part is that your browser did eventually come back to life. Without the patch in this bug, it would never come back to life. Fixing the other part of the bug is well beyond the scope of FF 1.5.
Comment 66•18 years ago
|
||
Verifying based on Darin's comment using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20051021 Firefox/1.6a1. I checked CPU usage while the page was loading and there are no issues there, as well as no crash. Marking verified.
Status: RESOLVED → VERIFIED
Comment 67•18 years ago
|
||
Also looks good on Linux trunk build from 1021
Updated•18 years ago
|
Attachment #199664 -
Flags: approval1.8rc1? → approval1.8rc1+
Summary: mozilla 1.5beta1 freezes & goes into 95+% cpu usage browsing blackisha.com → Firefox 1.5beta1 freezes & goes into 95+% cpu usage browsing blackisha.com
![]() |
||
Comment 69•18 years ago
|
||
*** Bug 300631 has been marked as a duplicate of this bug. ***
Just a note that I can still reproduce this with 1.5 (Gecko/20051111 Firefox/1.5) on OSX -- talkback http://talkback-public.mozilla.org/talkback/fastfind.jsp?search=2&type=iid&id=12086427 , I had to SEGV the process manually to get tb to kick in. I only left it running for about 10-15 minutes, but in that time it never came back. I can also trivially reproduce this using an AJAX "skin" for the web interface for my SlimServer (music server thingy, http://www.slimdevices.com/su_downloads.html), which seems to use sync xmlhttprequest calls throughout =/
Assignee | ||
Comment 71•18 years ago
|
||
Vlad: I think it is a different problem on OSX. My FF gets hung as well sometimes -- not always -- and when it does get hung, it is not consuming the CPU. So, I think you found a different bug.
Assignee | ||
Comment 72•18 years ago
|
||
Vlad: In my testing, I found that if I waited long enough, my browser would always become unfrozen. With some of the very large comment blocks on blackisha.com, it would sometimes take several minutes to fetch the data. Also, if you happened to trigger multiple mouseover events on that site, then you would trigger multiple synchronous fetchs of the data, which would surely increase the time it takes to unblock the browser (note: the disk cache doesn't help). I filed bug 317596 for this issue.
You need to log in
before you can comment on or make changes to this bug.
Description
•