Created attachment 754551 [details] Instruments trace of infinite loop I just had a hang in the current (2013-05-27) Nightly. An Instruments run indicates an iloop within mozilla::image::RasterImage::DrawFrameTo. When this happened, I had just clicked on a tags folder in Google Reader, the previously active folder's content was already gone, but the newly active one's not yet shown. Instruments trace is attached.
I don't see how we could be ilooping _within_ DrawFrameTo; the only loops seem to be quite bounded. My suspicion is that we just end up calling DrawFrameTo over and over instead.
Quite possibly, yes. I should've attached a debugger to the app while it was still running ...
So, this happened again, and this time I did attach a debugger to it. The iloop indeed occurs in RequestRefresh, not DrawFrameTo. I'd bet good money this is a regression from bug 867758.
Very likely, yes, or one of its dependent bugs. I thought I had fixed this in bug 875173, though. Till, do you recall (were you able to tell?) precisely what condition caused this? Or do you have any possible steps to reproduce this?
(In reply to Joe Drew (:JOEDREW! \o/) from comment #4) > Very likely, yes, or one of its dependent bugs. I thought I had fixed this > in bug 875173, though. > > Till, do you recall (were you able to tell?) precisely what condition caused > this? Or do you have any possible steps to reproduce this? Sorry, no on both counts. I ran into this in an opt build and didn't have symbols, so couldn't tell which condition caused it. The only, pretty unreliable, STR I can give are in comment 0: switch between folders in Google Reader ... If that helps at all, I also had a hang that went away after a few seconds while switching in gReader. Of course, that might've been unrelated.
So, I can reproduce this issue annoyingly easy, now. :( At least, that gave me plenty of opportunities to investigate it some more: turns out the hang isn't really permanent, it can just take a long time for the loop to end. Sometimes it ends after a second or so, sometimes it takes maybe a minute. After fetching the symbols, I can confirm the somewhat obvious: it's the loop at http://mxr.mozilla.org/mozilla-central/source/image/src/RasterImage.cpp#657 that takes this long to complete. This being an opt build, however, I can't tell which condition is causing this, exactly: all locals are optimized away, and my asm foo isn't nearly good enough to figure out in which registers the relevant values are stored, sorry. @joe, can I do anything else to help you debug this? Ping me on IRC if an interactive debug session would help you.
I'd be very interested if you still have problems with the fix to bug 876332.
Ok, will retest once that has landed.
So, the fix to bug 876332 made this much harder to reproduce (as in, it doesn't happen every second time I navigate within gReader), but, unfortunately, it didn't fully fix it. Here's a Gecko Profiler profile of the hang - not that it gives a lot of new insights or anything: http://people.mozilla.com/~bgirard/cleopatra/#report=0221df66d0a16e82f70d670ef06002d9ef11903e
Unfortunately bug 876332 seems to have fixed the other bug that seemed to have the same cause as this (bug 877253), so it's harder to reproduce. Till, if you're up to building and debugging Firefox, I can walk you through finding out what the state is when we run into this case.
(In reply to Joe Drew (:JOEDREW! \o/) from comment #10) > Till, if you're up to building and debugging Firefox, I can walk you through > finding out what the state is when we run into this case. Yes yes, I'm certainly up for that. I just hoped that giving you the information I had would let me get out of doing propert debugging. ;) I have a debug build and can reproduce the issue with that. I guess we should go through that together the next time we're both online, but here's what information I get out of it, for now: In RasterImage::RequestRefresh, the loop `while (currentFrameEndTime <= aTime)` doesn't really iloop, it just sometimes takes a very long time to finish. One case I had in the debugger looked like this: - aTime: 211042319161672 - all frame end times are identical: 210825837815525 - the animation has 18 frames - loop count is -1 - the image in question is "chrome://firstname.lastname@example.org" - AdvanceFrame() returns `true` It took tens of thousands of iterations for this loop to finish. I don't pretend to understand the code well enough to know whether this is intended or a problem, and the general slowness of the debug browser makes it kinda hard to know if this was an actual instance of the bug, but it might have been. If this information helps you in finding the bug: great. If not, ping me on IRC (#developers would work) when you have time, then we can go through it together.
Till's been having a lot of trouble reproducing this in a debugger again. Something I want to know is what GetCurrentFrameEndTime() is returning, and how it's calculating that result (i.e., what the individual frame timeouts are), both within RequestRefresh() and AdvanceFrame().
Would it be a good idea to back out bug 876332, temporarily, to help debug this issue?
No, because that bug itself caused long hangs, and significantly more frequently than this bug, so one would be overwhelmed with the other.
When I leave my browser idle on the following page, eventually one core get saturated. I'm not sure how long you'd have to leave the browser on the page, but it's happened consistently on both my home and work PCs since yesterday when I first visited the page: http://feryjunaedi.wordpress.com/2008/05/29/redistribution-between-eigrp-in-two-different-autonomous-systems/
Does your browser stop responding?
(In reply to Joe Drew (:JOEDREW! \o/) from comment #17) > Does your browser stop responding? Mostly unresponsive. Any interaction is excruciatingly slow.
OK, and you've profiled and it's all under RequestRefresh?
That I didn't do. Will the Gecko Profiler extension provide sufficiently useful information for you?
Yep! That's exactly what I'd like!
I'm pretty sure what's happening here is that we're drawing an image a long time after its last refresh, for whatever reason, and looping, blending frames, until we reach the right time. Of course, mostly gifs loop, so we could do a *single* iteration of the loop instead of tens of thousands.
That squares with the conditions under which I tend to experience this, when I've scrolled down the endless tumblr dashboard for a long way, and then something causes me to scroll back to the top where an animated gif is looping.
Glad you're figuring it out 'cause I was having trouble getting the profile to upload and was going to try again later today. That said, I wasn't sure I was observing the right thing, because it was fingering a whole bunch of timer-related functions.
Created attachment 759411 [details] [diff] [review] For looping images, advance by the loop time when we're being asked to When there is a long delay between calls to RequestRefresh(), for example because an animated image has been scrolled off the screen, the current animation frame time can be significantly behind the current time, requiring a huge number of composites to catch up. This patch makes us skip those intermediate composites, jumping to the closest multiple of the image's loop time. I'm not sure if putting this in AdvanceFrame or RequestRefresh makes more sense, but since AdvanceFrame explicitly sets the currentAnimationFrameTime, I chose this location. It's not perfect, though, because it'll be checked every time.
That does not fix my issue of comment 16, so I might be experiencing a different issue.
In that case please try getting your profile to upload! :D
The patch fixes the bug for me.
Comment on attachment 759411 [details] [diff] [review] For looping images, advance by the loop time when we're being asked to Review of attachment 759411 [details] [diff] [review]: ----------------------------------------------------------------- Looks good.