Closed Bug 835658 Opened 12 years ago Closed 12 years ago

Intermittent test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call (#2831|#1919|#1890) to onStopFrame

Categories

(Core :: SVG, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: philor, Assigned: seth)

References

Details

(Keywords: intermittent-failure)

https://tbpl.mozilla.org/php/getParsedLog.php?id=19220261&tree=Mozilla-Inbound Rev3 WINNT 6.1 mozilla-inbound pgo test mochitest-other on 2013-01-28 19:03:54 PST for push 4e0b83a23621 slave: talos-r3-w7-020 ... ... ... 14129 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | myOnStopFrame called 14130 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call #2831 to onStopFrame 14131 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | myOnStopFrame called ... ... ... Noisy bugger, innit?
This got a lot more frequent on Linux32 in the last day. Can somebody please take a look at this?
So, the spammy (multiple-failures-per-day) issues on m-i seem to have started with comment 7, for revision 9f044f9e981a. Just to be on the safe side, let's assume that one *might've* been random, and the spamminess could have started with the next inbound cset (comment 9, cset ea8b11062c07) So, here's a pushlog for a day leading up to that cset: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b093ba2c7ff9&tochange=ea8b11062c07 The most relevant changes there seem to be Seth's patches for bug 704059 (changing how SVG images behave).
I am taking a look at this now.
Argh. I am having a tough time reproducing this so far. Maybe I'm just being quite unlucky, or maybe my logging code is slowly things down enough that things always work. The try job where I'm doing this is here: https://tbpl.mozilla.org/?tree=Try&rev=95e4c4eed660 There are lots of failures due to the logging code I added (timeouts/"too much output", with one crash in test_733533 due to a bug in the logging code that isn't worth a rebuild to fix), but 'oth' on Fedora debug remains solid green. In case the issue is that timing has changing due to logging, I am preparing a second try job with much more minimal logging that will hopefully still at least give me a clue.
So I've pushed a new try job here to look a bit more into this problem: https://tbpl.mozilla.org/?tree=Try&rev=788e3c28084a I've reduced the logging to some simple sanity checking for now to make sure we're not in a totally weird and unexpected state. The other addition is that we now report a snapshot of the failure, which will hopefully help clarify what's happening here.
Drat, missing return statement burnt that try job. New one: https://tbpl.mozilla.org/?tree=Try&rev=8ee1e8d9b3df
Summary: Intermittent test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call #2831 to onStopFrame → Intermittent test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call (#2831|#1919) to onStopFrame
@Ed: Actually I've pushed several more testcases to try, and I haven't posted all of them here. My apologies. The newest one is here: https://tbpl.mozilla.org/?tree=Try&rev=ff6257a70c18 I think this is the first one that actually _does_ tell us something useful. In reading the following, the important thing to pay attention to is how currentTime (the internal SVG document's perception of where we are in the animation) is changing. Here's an (edited) portion of the log for the run of the test that succeeds (Fedora debug): > 11459 INFO TEST-START | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html > 11463 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | myOnStopFrame called > *** VectorImage::Draw for image 0x98841d0 (currentTime? 0.005000) > ( ... more myOnStopFrame calls, but we never see any currentTime values other than 0.005 and 0.24 ... ) > 11468 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | myOnStopFrame called > *** VectorImage::Draw for image 0x98841d0 (currentTime? 0.240000) > 11469 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | Animated image looks correct, at call #7 to onStopFrame > 11470 INFO TEST-END | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | finished in 595ms Contrast this with the run that fails (Fedora opt): > 11449 INFO TEST-START | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html > 11453 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | myOnStopFrame called > *** VectorImage::Draw for image 0xa3ddd160 (currentTime? 0.001000) > ( ... WAY more myOnStopFrame calls, with many teeny tiny increments to currentTime .. ) > 13286 INFO TEST-PASS | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | myOnStopFrame called > *** VectorImage::Draw for image 0xa3ddd160 (currentTime? 0.043000) > 13287 ERROR TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call #1835 to onStopFrame > 13482 INFO TEST-END | chrome://mochitests/content/chrome/image/test/mochitest/test_animSVGImage.html | finished in 143845ms Notice that despite running for _way_ longer and passing through many more values of currentTime, the failing run has barely progressed at all through the animation. After 144 seconds it's at 0.043, while in the passing run we're at 0.24 after only 0.6 seconds. The difference in the speed at which we are advancing through the animation is 3 orders of magnitude. I think this is the source of the problem, and if we fix this, this intermittent orange will go away. As an aside, in a separate bug it'd be good to understand why we redraw so many times when nothing has changed (same currentTime, etc). This isn't exposed in the edited logs above, but if you look at the full logs, we redraw a _lot_ and I don't understand why.
(In reply to Seth Fowler [:seth] from comment #146) > Notice that despite running for _way_ longer and passing through many more > values of currentTime, the failing run has barely progressed at all through > the animation. Very interesting. So, that time is supposed to be updated in nsSMILTimeContainer::UpdateCurrentTime(), which is called every sample (driven by the refresh driver). Probably worthwhile to add some logging there to print out the time we set there, and in nsSMILAnimationController::WillRefresh to print out the time that we're passed by the refresh driver. > As an aside, in a separate bug it'd be good to understand why we redraw so > many times when nothing has changed (same currentTime, etc). That implies we're getting lots of calls to SVGRootRenderingObserver::DoUpdate, in VectorImage.cpp (That's how the internal SVG document tells us something has changed, invalidates the image-clients and gets itself repainted.) It's normal for that to be called on each animation sample, if something has changed that requires a re-render. But I'm not sure why it'd be getting called in this case.
(In reply to Daniel Holbert [:dholbert] from comment #152) > Probably worthwhile to add some logging there to print out the time we set > there, and in nsSMILAnimationController::WillRefresh to print out the time > that we're passed by the refresh driver. On it. > It's normal for that to be called on each animation sample, if something has > changed that requires a re-render. But I'm not sure why it'd be getting > called in this case. Yeah, I'm investigating this as well, because I'm a bit concerned about it. In my logging patch I've also added some code to differentiate the different callers of DoUpdate, so we can try to figure out what's going on.
The extra calls to Draw are traced to their root cause and addressed in bug 846028.
A new try job with additional refresh-related logging is up here: https://tbpl.mozilla.org/?tree=Try&rev=140014a8c764
An edited IRC transcript about the latest try run's results: > seth: basically the problem is that the first time nsSMILAnimationController::WillRefresh gets called, elapsedTime=1, and so that's what mAvgTimeBetweenSamples gets initialized to > seth: after that initial call, elapsedTime is consistently much worse > seth: the result is that every time, we trigger the "Detected really long delay between samples" code > seth: mAvgTimeBetweenSamples stays at 1, since elapsedTime never gets <= 200 again, and so every time we advance by the smallest possible tick > seth: i think though that the code that handles the "really long delay" situations is wrong > seth: it'd probably be appropriate to still update the moving average in that situation. maybe with some damping factor, but it should get updated > seth: that would prevent time essentially freezing for the SVG image in a situation like this Based on this, I've pushed an experimental patch to try that _may_ fix this orange: https://tbpl.mozilla.org/?tree=Try&rev=462ee25eb0db
Preliminary results are good! We got green! It's possible that we just got lucky, though, so I queued up a bunch more runs. This version of the patch is sitting on top of a big logging patch, so right now I'm rebasing against m-c and preparing to create a new bug about the fix.
Depends on: 846861
We'll take care of this orange in bug 846861.
Assigning to Seth, to take this off of the list of high-frequency-oranges-that-are-currently unowned. As he noted in comment 280, he's got a fix for this over in bug 846861 that should hopefully take care of this for good.
Assignee: nobody → seth
Summary: Intermittent test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call (#2831|#1919) to onStopFrame → Intermittent test_animSVGImage.html | timing out after 120000ms. Animated image still doesn't look correct, after call (#2831|#1919|#1890) to onStopFrame
Whiteboard: [Fix in bug 846861]
Hasn't occurred since bug 846861 landed; marking fixed - we can always reopen if it fails again on a tree that has that cset.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Whiteboard: [Fix in bug 846861]
\o/
You need to log in before you can comment on or make changes to this bug.