Closed Bug 1174923 Opened 4 years ago Closed 4 years ago

The document load event should not be delayed for decoding images

Categories

(Core :: ImageLib, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: seth, Assigned: seth, NeedInfo)

References

Details

Attachments

(1 file, 2 obsolete files)

There's no particularly good reason to delay the document load event until we've finished decoding the first frame of images. Content won't be able to tell as long as we delay the load event until we've discovered the image's size.

The main argument in favor of delaying the document load event until we've decoded the first frame of images (as discussed in bug 512435) is that not doing so might lead to "flicker", which in this context seems to mean painting an image before it's fully decoded and then having to repaint the remaining portion of the image later. However, the protection this offers is quite limited, since we can *already* get that kind of flicker in many situations:

(1) Images which aren't in the document or aren't visible are already not guaranteed to be decoded when the load event fires. If script assumes that they are and moves them onscreen, or if a CSS animation or transition brings them into view, we may still get flicker.

(2) When the user scrolls new content into view, we may get this same kind of flicker, because we don't automatically decode every image on the page - we only decode those images which are close to the viewport.

(3) If we've loaded the image before (either because it was present on another image and is still in the image cache, or because the user soft-refreshed the current page) then we *already* don't block the document load event until the image is redecoded, and we may get flicker.

There are so many ways in which the "don't fire document load until images' first frames are available" heuristic can go wrong that I honestly do not see much value in continuing to do it. We *do* want to try to decode images shortly before they become visible whenever possible, and we do this now and are going to continue to get better at it. I just don't think that delaying the document load event usefully contributes to that strategy.

What I think we should do instead is only delay the document load event until we have loaded the images from the network and have decoded them enough to determine their size. This strategy has a significant advantage: it lets whatever JavaScript the site may want to run when the document load event fires *start running much earlier*, particularly on slower machines or machines with fewer cores. That JavaScript often triggers new work, such as style or layout flushes or network IO, that we can overlap with image decoding, which happens off-main-thread. Overlapping that work means that *pages load faster*, which is something I really think we should prioritize, and the benefit is bigger on older machines that really need a boost.

TLDR: Changing this policy means better page load performance, and the policy didn't really solve the problem it was intended to solve anyway.
CC'ing some folks who may have opinions on this issue. If anyone has any feedback or objections, I'd love to hear it.
If we can get away with this I'm all for it.
Here's the patch to actually make this change.

I didn't actually remove the onload blocking infrastructure from ImageLib here,
even though this patch leaves it unused, as we will end up needing it in bug 1163856.
(Though perhaps only temporarily, until a bit more refactoring can be done.)
Attachment #8622761 - Flags: review?(tnikkel)
Blocks: 1163856
It still seems to me that if we _started_ the decode before onload fired, we should probably hold onload until the decode completes.  Or something.  Web authors think it's really weird if onload fires and then a while later the images actually show up...
(In reply to Not doing reviews right now from comment #5)
> It still seems to me that if we _started_ the decode before onload fired, we
> should probably hold onload until the decode completes.  Or something.  Web
> authors think it's really weird if onload fires and then a while later the
> images actually show up...

This is the status quo (the directive you gave me back in 2009, which I implemented). Seth is proposing changing that, on the argument that the weirdness isn't really unique to this case, and that the potential benefit is high.
The weirdness is not really unique to this case, but the other cases where it comes up are considered bugs by web developers.... :(
roc, dbaron: what do you guys think?
Flags: needinfo?(roc)
Flags: needinfo?(dbaron)
I do understand that web page authors prefer that all images are decoded and painted before we deliver the document load event, since it gives them an easier mental model to work with, but I think this policy change is much better for *users*, since pages load faster, particularly on low-end hardware. And this change does not really introduce any new issues for authors that didn't exist before, though it may make the existing ones more visible.

(In reply to Not doing reviews right now from comment #7)
> The weirdness is not really unique to this case, but the other cases where
> it comes up are considered bugs by web developers.... :(

I think we'd be *much* better off coming up with targeted solutions for the actual issues. The policy we're discussing here is just a broad heuristic; it doesn't really do anything to guarantee "flicker-free" behavior. "Flicker-free" is probably not a guarantee we can realistically make, but I think more targeted heuristics aimed at specific issues can actually do better than what we have now.
I agree with Seth, but not strongly. I don't have a good feel for the ways in which Web devs rely on image load events. It seems to me that the only way to avoid flicker in general is to do more sync decoding, independent of this change.
Flags: needinfo?(roc)
Yep. My understanding is that Blink avoids these kinds of issues essentially by sync decoding all the time. I don't think that we want to move to that model, but perhaps we could benefit from *some* targeted sync decoding, especially once we ship e10s and apz everywhere and there's less pressure on the main thread for UI responsiveness.

Right now the most important thing (which I'm working on right now) is improving the scheduling of async image decodes and becoming smarter about predicting which images will be needed before they're used. Once we eliminate the low hanging fruit in those areas, we can reevaluate and think about whether a limited amount of sync decoding might be useful in some situations.
So there were a lot of oranges in the last try push. They were mostly related to
VectorImage; unfortunately, investigating further reveals that the _current_
behavior of VectorImage is not even correct with respect to the image load
event. I've reverted any changes to VectorImage for now, and filed bug 1175371
about fixing the issue there.

There was also an issue related to |document.readyState|, resulting in failures
in |dom/html/test/test_bug347174_write.html|. This turned out to be related to
some flat-out wrong code in nsImageLoadingContent which delayed firing the image
|load| and |error| events until the image finished decoding, but only if the
image had already started decoding by the time ImageLib sent LOAD_COMPLETE. We
really do not want to decide whether to fire the |load| or |error| event
differently depending on whether we've happened to start decoding the image yet.
(I suppose this code is so old that it hails from a time when we started
decoding all images in the document as soon as they were loaded.) On top of
that, this is a spec violation, because the |error| event is only supposed to
fired when an image is so bad that we can't even get a size for it.  I've just
ripped out that code totally, which resolved the oranges in
|test_bug347174_write.html| locally.
Attachment #8623428 - Flags: review?(tnikkel)
Attachment #8622761 - Attachment is obsolete: true
Attachment #8622761 - Flags: review?(tnikkel)
Depends on: 1175371
OK, this version of the patch resolves one of the two remaining issues. To wit:
it updates |browser_bug666317.js| to correctly spin the event loop when waiting
for async notifications that result from image decoding. This was a pre-existing
issue that seems to have been masked by the code that was removed in
nsImageLoadingContent.

The other remaining issue is that VectorImage does not currently wait to send
LOAD_COMPLETE until its size is available, which is a spec violation. That also
happened to usually get masked by the code that was removed in
nsImageLoadingContent, resulting in some test failures. That's getting fixed in
bug 1175371, which now blocks this bug.

Locally, all remaining failures are resolved. Unfortunately try is down right
now, so I can't verify on try yet.
Attachment #8624013 - Flags: review?(tnikkel)
Attachment #8623428 - Flags: review?(tnikkel)
Attachment #8623428 - Attachment is obsolete: true
Comment on attachment 8624013 [details] [diff] [review]
Stop delaying the document load event until images are decoded

I'm on the fence about this, but you say that this blocks work on bug 1163856 so we can give this a go and see how it effects things.
Attachment #8624013 - Flags: review?(tnikkel) → review+
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/966e5c2db9f6 - I can't tell you every platform where it will or won't fail since we barely run any tests anymore, but at least for OS X 10.10 debug (try: -b d -p macosx64 -u reftest[10.10] -t none or try: -b d -p all -u reftest[-xylophone] -t none if you need every platform including 10.10 at once), somewhere between 50% and 90% of runs hit https://treeherder.mozilla.org/logviewer.html#?job_id=10954289&repo=mozilla-inbound
I should explain that that failure is an absolutely tiny difference that could well be related to timing differences in calling imgFrame::Optimize(), like so many other bugs. (See bug 1128229.) I really need to get around to fixing that issue. I went ahead and added fuzz to the test. I couldn't help but notice that between when comment 19 got posted and when comment 20 got posted, someone else had already added fuzz to the test on Android, so it's not even clear that the issue is related to this patch. (Though this patch might well make it more obvious.)
https://hg.mozilla.org/mozilla-central/rev/a79cb8e688cd
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
See Also: → 1128229
Blocks: 1177604
Depends on: 1178353
Depends on: 1192159
Depends on: 1194837
No longer depends on: 1192159
You need to log in before you can comment on or make changes to this bug.