Intermittent image/test/reftest/bmp/bmpsuite/q/wrapper.html?pal8os2sp.bmp == about:blank | image comparison, max difference: 245, number of differing pixels: 8128
Categories
(Core :: Graphics: ImageLib, defect, P3)
Tracking
()
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(2 files)
3.77 KB,
patch
|
Details | Diff | Splinter Review | |
47 bytes,
text/x-phabricator-request
|
Details | Review |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 35•6 years ago
|
||
Failure rate has increased here since the 18th of March.
Emilio, can it be from your push, bug 1395964?
Comment hidden (Intermittent Failures Robot) |
Comment 37•6 years ago
|
||
It seems plausible that may patch may have made this more intermittent, yeah, though it'd be just exposing a pre-existing problem.
Does this happen on qr-only? There may be a problem with the sync-decode path on webrender.
Comment 38•6 years ago
|
||
Seems like it's split maybe 60/40 webrender/non-webrender.
As a start, I pushed to try to see which of the 3 patches landed are responsible (or clear them if they happen to be innocent).
Comment 39•6 years ago
|
||
Seems to be caused by
https://hg.mozilla.org/mozilla-central/rev/3f1e9dae2467
Bug 1395964 - Make image loading changes not reframe.
Comment 40•6 years ago
|
||
So I think that makes some sense, because we're invalidating less stuff. See bug 1226748 comment 2, whatever race Benoit was talking about now probably happens almost always. Let me try catching it under rr sometime this week...
Comment 41•6 years ago
|
||
So, I haven't been able to repro yet with various combinations of RR chaos mode and MOZ_CHAOSMODE, but, looking at the reftest screenshots in here, in order to paint the blue rectangle that the reftest analyzer shows, the following has to be true:
- We're loading (otherwise we'd paint the broken image icon), but not marked as broken yet (since otherwise we'd be an empty inline given the alt=" ").
- We've the image size available (otherwise we'd be 0x0).
I suspect what's happening is something like:
- We stop the image load on onload, since the test waits for that (no frame change, just reflow).
- We get the SIZE_AVAILABLE notification, update our intrinsic size, request a reflow.
- We reflow, but we keep the intrinsic size we had around. EnsureIntrinsicSizeAndRatio does nothing because we do have a valid intrinsic size, even though imagelib may already know that the image is bad. This is the key difference compared to before my patch. Before my patch, the frame is new, has no intrinsic size, and pokes at imagelib directly asking for the intrinsic size at this point rather than using the size at the time we were notified of it being available.
- We paint, sync-decode, image is bad, we're done. But we paint the blue background already since we do have an intrinsic size (that will get invalidated at some point later when the image content gets notified and turns into the broken state).
Timothy, does this sound like a reasonable hypothesis?
How does this work in ImageLib? When you have a size but the image is bad, how do you notify clients that the size has "changed" (since a broken image has no intrinsic size I assume)?
Also, this could be a nice thing to debug with Pernosco if roc could make that happen? :)
Comment hidden (Intermittent Failures Robot) |
Comment 43•6 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #41)
I suspect what's happening is something like:
- We stop the image load on onload, since the test waits for that (no frame
change, just reflow).- We get the SIZE_AVAILABLE notification, update our intrinsic size,
request a reflow.- We reflow, but we keep the intrinsic size we had around.
EnsureIntrinsicSizeAndRatio does nothing because we do have a valid
intrinsic size, even though imagelib may already know that the image is bad.
This is the key difference compared to before my patch. Before my patch, the
frame is new, has no intrinsic size, and pokes at imagelib directly asking
for the intrinsic size at this point rather than using the size at the time
we were notified of it being available.- We paint, sync-decode, image is bad, we're done. But we paint the blue
background already since we do have an intrinsic size (that will get
invalidated at some point later when the image content gets notified and
turns into the broken state).Timothy, does this sound like a reasonable hypothesis?
Yeah, something like that. I think the problem must be that we can do a metadata decode succesfully (to get the size, and send the load event for the image), but then when we try to decode the image data we get an unrecoverable error. Because if the error comes during the metadata decode nsImageFrame::OnLoadComplete would clear the intrinsic sizes and ratio. So if we get into this state I think it might be possible that the imageframe never gets notified and keeps the non-zero intrinsic size/ratio.
How does this work in ImageLib? When you have a size but the image is bad,
how do you notify clients that the size has "changed" (since a broken image
has no intrinsic size I assume)?
There isn't really anything that notifies about size changes. If my hypothesis in the previous paragraph is correct then this error case falls into a class that we haven't really considered much. The existing imagelib consumers kind of assume that if they get a load/size available without an error on the image will be good. So we could either teach all the consumers how to deal with this situation, or add an ON_ERROR notification type which would seem to make it easier to deal with this. I'll attach a patch I wrote that does the first for nsImageFrame so we can confirm that it would fix the problem.
Comment 44•6 years ago
|
||
Comment 45•6 years ago
|
||
Sheriffs feel free to disable this test for now. The frequency seems quite high.
Comment 46•6 years ago
|
||
Hmm, that patch didn't work, my next guess is along the lines of comment 40. We've loaded the image, no errors, decoding hasn't started, reftest does sync decode paint, we decode, encounter error, but it's too late for this paint, we can't change the size of the frame during a paint, we paint as if we haven't encountered the error yet. After the reftest capturing paint the image frame adjusts (hopefully) and renders correctly.
Updated•6 years ago
|
Updated•6 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 48•6 years ago
|
||
Timothy is disabling the test still needed here?
Joel, can you help out with and example on how to disable this?
Comment 49•6 years ago
|
||
:apavel, this is failing a lot, odd that it seems to be failing across platforms with the same failure at a pretty high rate.
Here is the line to edit:
https://searchfox.org/mozilla-central/source/image/test/reftest/bmp/bmpsuite/q/reftest.list#85
== wrapper.html?pal8os2sp.bmp about:blank
it should be:
fuzzy(0-245,0-8128) == wrapper.html?pal8os2sp.bmp about:blank
Comment 50•6 years ago
|
||
Comment 51•6 years ago
|
||
Updated•6 years ago
|
Comment 52•6 years ago
|
||
Pushed about a million times to try server yesterday, went from "how is this failing?" to "how does this ever work?" and back a few times. Continuing to investigate this.
Comment hidden (Intermittent Failures Robot) |
Comment 54•6 years ago
|
||
bugherder |
Comment 55•6 years ago
|
||
I think I understand the failure now, it's pretty simple actually. The complicated part is that the failure mode is completely possible before Emilio's patch (as shown by the long history of this bug). The
- metadata decode (ie size available) completes
- if a (full) decode is triggered between step 1 and 3 (by say nsImageFrame::MaybeDecodeForPredictedSize or by normal painting to the screen) then when that decode encounters an error the error can't be reported to the main thread until after step 3.
- reftest does drawwindow with syncdecode. the image is an error mode now, but it's too late, we'd have to reflow/frame construct to fix it.
If the decode error gets reported to the main thread then we mark the image frame as needed a frame reconstruct and reftest drawwindow call flushes so it is guaranteed to happen.
This sequence doesn't seem particular unlikely, actually seems like it would be pretty common actually. So why don't we hit it much more often before Emilio's patch? Based on observing the most likely explanation is that we get the load event for the image right after the size available notification (size available blocks load for the image, and it's a small local file so we likely have all the data right away), this would trigger a frame reconstruct before Emilio's patch. Since the test uses reftest-wait the reftest-wait polling would see this as an pending paint and do another poll. This would usually be enough time for the error to reach the main thread and give us the correct rendering. If it wasn't enough time then we would get the existing lower volume intermittent.
Comment hidden (Intermittent Failures Robot) |
Comment 57•6 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #55)
- if a (full) decode is triggered between step 1 and 3 (by say
nsImageFrame::MaybeDecodeForPredictedSize or by normal painting to the
screen) then when that decode encounters an error the error can't be
reported to the main thread until after step 3.
This might be a little confusing, so I'll clarify. The error can be reported to the main thread at any point. However, for the intermittent failure to happen it must be the case that the error doesn't get reported to the main thread until after step 3 happens.
Updated•6 years ago
|
Updated•2 years ago
|
Comment 58•5 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Updated•4 months ago
|
Description
•