Closed Bug 421602 Opened 17 years ago Closed 17 years ago

[FIX]Image onload fails to fire intermittently

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: alex, Assigned: bzbarsky)

Details

(Keywords: regression)

Attachments

(5 files)

test case 17 years ago Alex Sirota (iosart) 4.44 KB, text/html		Details
Modified testcase, keep Javascript variables alive 17 years ago Olli Pettay [:smaug][bugs@pettay.fi] 5.66 KB, text/html		Details
Possible approach 17 years ago Boris Zbarsky [:bzbarsky] 8.58 KB, patch	pavlov : review+ sicking : superreview+	Details \| Diff \| Splinter Review
valgrind log 17 years ago Andrew Schultz 2.35 KB, text/plain		Details
Fix that 17 years ago Boris Zbarsky [:bzbarsky] 1.27 KB, patch	pavlov : review+	Details \| Diff \| Splinter Review

Alex Sirota (iosart)

Reporter

Description

•

17 years ago

While loading several images by creating an Image object and assigning the src property to it, along with an onload event handler, the onload event handler sometimes doesn't fire (in Firefox 3 Beta 3). var preloaderImg = new Image(); preloaderImg.onload = function() { ... } preloaderImg.src = '...'; To see the bug in action, clear the browser cache and open the attached html file with Firefox 3 Beta.

Alex Sirota (iosart)

Reporter

Comment 1

•

17 years ago

Attached file test case — Details

Jaime Mitchell (use bugmail@jaimem.org.uk for email)

Comment 2

•

17 years ago

Alex is this a regression from Firefox 2.0.12?

Alex Sirota (iosart)

Reporter

Comment 3

•

17 years ago

(In reply to comment #2) > Alex is this a regression from Firefox 2.0.12? > Yep, I believe it is. The attached test case passes on Firefox 2.x, but fails on Firefox 3.x (btw - thanks to Dan Malca from FoxyTunes for troubleshooting this and creating the test case)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 4

•

17 years ago

Attached file Modified testcase, keep Javascript variables alive — Details

Jaime Mitchell (use bugmail@jaimem.org.uk for email)

Comment 5

•

17 years ago

Requesting blocking as this is a regression from Firefox 2.0.x

Flags: blocking1.9?

Keywords: regression

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 6

•

17 years ago

Nothing in the original testcase keeps the Image objects alive while waiting data to be loaded from server. Modified testcase works fine.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 7

•

17 years ago

Opera behaves similarly as FF3 (well, Opera actually starts to load 39 images here, of which 37 seven loads succeeds). The modified testcase works fine.

Alex Sirota (iosart)

Reporter

Comment 8

•

17 years ago

Ok. I actually had a similar test case, but in mine the preloaderImgArray was local to the main() function, so it probably was garbage collected as well, so the test failed. making it global makes sure it stays alive along with all the contained images. Does this make sense? FF3 seems more aggressive than FF2 with garbage collection (is this GC or some other mechanism taking place?). While strictly speaking this is 100% fine, I wonder how many (slightly sloppy) sites or libraries this could potentially break.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 9

•

17 years ago

You could try to find the regression range and see which bug caused this. That might give some hint why ff3 works the way it does.

Ria Klaassen (not reading all bugmail)

Comment 10

•

17 years ago

Regression window is http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1182974220&maxdate=1182981179

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 11

•

17 years ago

(In reply to comment #10) > Regression window is > http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&date=explicit&mindate=1182974220&maxdate=1182981179 Only change in that range that looks relevant is the cycle collector checkin. If this really is a case of us breaking cycles where we otherwise would have leaked, I'm not sure there's much to be done here...

Alex Sirota (iosart)

Reporter

Comment 12

•

17 years ago

(In reply to comment #11) > Only change in that range that looks relevant is the cycle collector checkin. > If this really is a case of us breaking cycles where we otherwise would have > leaked, I'm not sure there's much to be done here... Yep, I think this is it - looks like GC aging was changed to 0 in bug 381199, which causes the garbage collector to be very aggressive. I think the solution might be to increase the GC intervals, which would give legacy code such as image loading above a chance to run without being GCed (as in FF2) - this shouldn't result in any leaks, just less frequent GCs. As I mentioned above, I'd imagine that many JS libraries (probably wrongfully) assume that their vars won't go away so quickly, and this can potentially break quite a few of them.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 13

•

17 years ago

(In reply to comment #12) > Yep, I think this is it - looks like GC aging was changed to 0 in bug 381199, > which causes the garbage collector to be very aggressive. No more aggressive than the JS GC was before the cycle collector landed. That said, I don't see why the cycle collector would collect loading images; I'd think it ought to consider them reachable, since I'd think something else would have a reference to them. That said, this was an issue in the nsIDOMGCParticipant world (pre-cycle-collector), and I fixed it in bug 321054.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 14

•

17 years ago

(In reply to comment #12) > I think the solution might be to increase the GC intervals, which would give > legacy code such as image loading above a chance to run without being GCed (as > in FF2) - this shouldn't result in any leaks, just less frequent GCs. Let me be very clear: the solution here is definitely not to change the GC aging constant. That might somehow fix one case of this bug, but it would definitely not fix others. The GC aging constant affects only what objects are examined during a cycle collection. However, any JavaScript object and anything reachable from a JavaScript object is always considered in cycle collection. If we have any GC hazards, we need to fix those, not twiddle the aging constant so that we use more memory.

Alex Sirota (iosart)

Reporter

Comment 15

•

17 years ago

(In reply to comment #14) > Let me be very clear: the solution here is definitely not to change the GC > aging constant. That might somehow fix one case of this bug, but it would > definitely not fix others. David - sure, I'm not very familiar with Mozilla GC mechanisms, so whatever works best... :)

Johnny Stenback (:jst)

Comment 16

•

17 years ago

From Jonas: The cycle collector is most likely the cause here. It used to be that setting .onload on an image would stick it in the 'wrapper has expando' has table on the document which would keep it alive for the lifetime of the document. So GC wouldn't be able to collect it. However the cycle collector traverses these references in such a way that an orphaned node and the wrapper would appear as a cycle and would get collected. I would have expected the network or imglib code to keep strong references here while the image is loading, but maybe it uses weak references somehow. What we could do is to explicitly keep alive all currently loading images until after their onload or onerror has fired. Ben, you want something other than leak bugs to work on? :) Here's one for you.

Assignee: nobody → bent.mozilla

Flags: blocking1.9? → blocking1.9+

Priority: -- → P2

Boris Zbarsky [:bzbarsky]

Assignee

Comment 17

•

17 years ago

This is great fun! Network holds strong refs to its listeners. But imagelib holds _weak_ refs to its listeners, because those listeners always hold a strong ref to the imgIRequest (it's the only way they can get the image data). So we used to leak until the imagelib referece was made weak. I've been thinking about making it strong again now that we have cycle collection. That would definitely address this bug. The one issue it would cause is that an image whose network connection never closes (e.g. jpeg push) would effectively leak for the lifetime of that network connection, even if removed from the DOM and forgotten about. It would be collected when the page was left, of course (because the necko load would be terminated at that point). Right now this works, because when the last imagelib listener goes away the underlying necko channel is stopped, iirc. I suppose one option would be to change the jpeg push stuff somehow so that necko no longer had a strong ref to the imagelib listener after the first OnStopRequest (using nsWeakPtr somehow). But even if we don't do that, the jpeg push behavior we'd get with cycle collection seems to be a better tradeoff than what we have no without.

Johnny Stenback (:jst)

Comment 18

•

17 years ago

That certainly seems to be a good way to fix this, but I really don't think we want that for 1.9. We've been talking here about fixing this for 1.9 by simply making the image element hold on to itself while it's loading the image. Much lower impact even if not necessarily an elegant fix, IMO. I believe Ben has at least started hacking that up.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

17 years ago

For what it's worth, "while it's loading the image" turned out to be harder than I thought when I did bug 321054, but there's probably a good bit of code from that (that was backed out when cycle collector landed -- that's probably the version you'd want to look at) that you could use.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 20

•

17 years ago

OK, so holding a strong ref to the image decoder observer and cycle-collecting all this stuff is not good enough, because while necko has a strong ref to the imgRequest, the imgRequest holds weak refs to the imgRequestProxies. So if we did what I suggest in comment 17, we'd just end up collecting the proxy/observer cycle, and it wouldn't do us much good.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 21

•

17 years ago

Attached patch Possible approach — Details — Splinter Review

What this does is hold a strong ref from the imgRequestProxy to the observer until OnStopRequest fires. Then it goes back to holding a weak ref, the way the API promises. For a multipart channel, that means after the first part is done it'll hold a weak ref. If we really want to change only the content side, I guess we could. As dbaron said, it's a lot more complicated, but it could be done. Either way, the mochitest catches this bug and should go in.

Assignee: bent.mozilla → bzbarsky

Status: NEW → ASSIGNED

Attachment #309583 - Flags: superreview?(jonas)

Attachment #309583 - Flags: review?(pavlov)

Boris Zbarsky [:bzbarsky]

Assignee

Updated

•

17 years ago

Summary: Image onload fails to fire intermittently → [FIX]Image onload fails to fire intermittently

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 22

•

17 years ago

Do we really want to hold strong references to all image-load observers? Things like background image loads etc doesn't IMHO be kept alive if we no longer care about the image. <img> is special because people use that to pre-populate caches, and you can have onload-handlers that need to fire even if there are no references to the <img>, but that doesn't seem to be true for other types of loads.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 23

•

17 years ago

I think this is just the simplest and safest way to fix this bug. We could try to only do images, per comment 19. Like dbaron said, that approach, while doable, would require a good bit more care. If someone wants to do that, I'm not going to stop them, of course!

Stuart Parmenter

Updated

•

17 years ago

Attachment #309583 - Flags: review?(pavlov) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 24

•

17 years ago

Comment on attachment 309583 [details] [diff] [review] Possible approach >+void imgRequestProxy::NullOutListener() >+{ >+ if (mListenerIsStrongRef) { >+ // Releasing could do weird reentery stuff, so just play it super-safe >+ imgIDecoderObserver* obs = mListener; >+ mListener = nsnull; >+ mListenerIsStrongRef = PR_FALSE; >+ NS_IF_RELEASE(obs); Or you could simply do: nsCOMPtr<imgIDecoderObserver> obs; obs.swap(mListener); sr=me with that fixed.

Attachment #309583 - Flags: superreview?(jonas) → superreview+

Boris Zbarsky [:bzbarsky]

Assignee

Comment 25

•

17 years ago

Checked in, with sicking's nice simplification.

Status: ASSIGNED → RESOLVED

Closed: 17 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Andrew Schultz

Comment 26

•

17 years ago

Attached file valgrind log — Details

After this landed, nye's mochitest cycle been crashing on test_compute_data_with_start_struct.html. It seems jemalloc debug builds are perhaps the only ones that will crash since jemalloc is filling the deleted object (imgRequestProxy) memory with 0x5a5a5a5a. With jemalloc disabled, no crash but valgrind still sees the problem.

Boris Zbarsky [:bzbarsky]

Assignee

Comment 27

•

17 years ago

Attached patch Fix that — Details — Splinter Review

The RemoveFromLoadGroup kills us, then we go on to touch members. Bad! This patch just keeps us alive till we get out of this method.

Attachment #310939 - Flags: review?(pavlov)

Boris Zbarsky [:bzbarsky]

Assignee

Comment 28

•

17 years ago

I checked that patch in for now to fix the orange. Will make changes as needed based on review comments.

Stuart Parmenter

Updated

•

17 years ago

Attachment #310939 - Flags: review?(pavlov) → review+

Justin Dolske [:Dolske]

Comment 29

•

17 years ago

Is it possible this is causing bug 425987? (see comment 6)

Carsten Book [:Tomcat]

Comment 30

•

17 years ago

verified fixed using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9pre) Gecko/2008050614 Minefield/3.0pre --> Verified fixed

Status: RESOLVED → VERIFIED

Nobody; OK to take it and work on it

Updated

•

6 years ago

Component: Event Handling → User events and focus handling

You need to log in before you can comment on or make changes to this bug.