Closed Bug 466586 Opened 16 years ago Closed 15 years ago

eBay preview image occasionally disappears after briefly appearing

Categories

(Core :: Graphics: ImageLib, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.1b4

People

(Reporter: stephend, Assigned: joe)

References

()

Details

(Keywords: regression, verified1.9.1)

Attachments

(6 files, 5 obsolete files)

129.07 KB, text/html
Details
129.20 KB, text/html
Details
39.84 KB, text/html
Details
8.07 KB, patch
Details | Diff | Splinter Review
31.50 KB, patch
vlad
: review+
Details | Diff | Splinter Review
1.43 KB, patch
Details | Diff | Splinter Review
Attached file testcase (saved HTML)
I can reproduce this at the URL within a minute or two of reloading, using
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1b2pre) Gecko/20081124 Minefield/3.1b2pre

Here's the saved HTML (which still uses a bunch of remote resources) -- I can reproduce it using this file, too.

FWIW, I tried using Firefox 2.0.0.17 and couldn't reproduce this bug within a few minutes.  Regression?
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows Vista → All
Hardware: PC → All
Version: unspecified → Trunk
Yeah, this bug doesn't reproduce with 3.0.4, either, so it's just a Gecko 1.9.1 regression.
Flags: blocking1.9.1?
I tried making a minimal testcase here, but the more I remove from the page, the harder it gets to reproduce. It seems to be either a race issue, or perhaps related to the cache filling up?

The relevant javascript is:
function standardizeImage(image,dimension){
var customImage=new Image();customImage.src=image.src;var imw;var imh;
imw=customImage.width;imh=customImage.height;
var rw=imw/dimension,rh=imh/dimension,ratio=(rw>rh)?rw:rh;
if(ratio>=1){image.width=imw/ratio;image.height=imh/ratio;}
else{image.width=imw;image.height=imh;}}

That's called from the image: onload="standardizeImage(this,200)"
and when the image disappears, imw and imh are 0. So, somehow, sometimes right after setting customImage.src=image.src, that customImage doesn't have a width and height yet (but most of the time it does). 
(If you set the width and height of the image element to non-zero values with the dom inspector, the image shows up again.)

I suspect Safari would run into the same problem, as there's an entire branch in that function for Safari, which works on the original image rather than creating a new temporary image.
if(is.safari&&is.mac){
 var origImg=getOriginalImg();
 if(typeof(origImg)!=u) {
  if(typeof(origImg.width)!=u){imw=origImg.width;}
  if(typeof(origImg.height)!=u){imh=origImg.height;}
 }
}
(u is "undefined" there.)
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Could somebody find a regression range for this?  (In other words, the last nightly build before this started happening and the first one in which it started.)
This is the same as the previously attached testcase, but one that gives an alert when the bug occurs. And it does automatic reloading. It allowed me to find the regression range easier.

I get a regression range between 2008-09-04 and 2008-09-05 (I let the 2008-09-04 build run for 10 minutes or so before I got tired of it):
http://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2008-09-04+04%3A00%3A00&enddate=2008-09-05+04%3A00%3A00
So this seems to me a regression from bug 430061 somehow.
I've been able to make the image intermittently not show as far back as 2007091216 (I've been testing by loading the attached testcase and the original eBay url in 30+ tabs, and giving a reload all once or twice, after which there always was at least one tab that had the image not show up).
No longer blocks: 430061
Given comment #6, I suspect that this lies entirely in Content-land. (It's the only thing that makes sense to me, anyways.)

Content folk: feel free to punt back to me if you've got a reason to suspect it's not you.
Component: Layout → Content
QA Contact: layout → content
So on Linux, x86_64, using attachment 352016 [details], I'm also seeing the regression as between 2008-09-04-05-mozilla-central and 2008-09-05-05-mozilla-central.
This is a bit reduced, but indeed, it seems increasingly difficult to reduce it, the smaller the file gets. I can also reproduce this in offline mode, btw.
After further discussion with Sander and Martijn, this is probably Imagelib after all.
Assignee: nobody → joe
Component: Content → ImageLib
QA Contact: content → imagelib
Hrm; I think bug 468160 might have fixed this.  Martijn/Sander: do you guys still see it?
(In reply to comment #11)
> Hrm; I think bug 468160 might have fixed this.  Martijn/Sander: do you guys
> still see it?

nvm me; still happens, just really sporadically, as Martijn mentions in comment 9.
eBay's code is sort of reasonable, but I believe it's wrong.

What happens is that, with the volume of images on that page, once they're all loaded, the cache has been more than filled. The thumbnail that gets resized is one of the images that gets thrown out of the cache, so when they do customimage.src = image.src, it needs to reload from the network (or, at least, the disk cache). This means that the image isn't available until its onload handler gets called, leading to the image disappearing.

I'm inclined to resolve this invalid, but I'm willing to hear alternate points of view on it.
That sounds reasonable, although a little odd given that the image is actually available to be displayed.  A more complex solution would be to recognize that we have the uncompressed bits sitting around already... though I guess that won't work, the image that the server serves up could be completely different for the same URL.
We have a situation here where Firefox is acting differently after once every
xxx reloads.
Firefox should always act one way or another, not do something different once
in while. 
Or am I wrong?
Firefox should do whatever the script/html tells it to do -- in this case, the script is doing something erroneous.  It worked before due to an implementation quirk, but there's nothing that says that the code as written should replace the image seamlessly.  One element is removed, another one is put in its place before it's loaded...
The reason it happens occasionally rather than all the time is because of the way image caching works. We don't evict elements from the cache until another image is loaded. Most of the images from this page are removed from the cache when this happens, so it's entirely a factor of timing from the network and whatever other images are loading at the time. This is why having multiple copies of the testcase open in multiple tabs: we get different images loading at different times, so it's more likely that the image in question gets evicted before the js gets run.
It would work if we refused to evict any image from the cache that's currently in use, right?
Yes, but I suspect that's not a good idea: think of having a pile of flickr images open in tabs, for example. Your cache would be all full of stuff that you're probably not going to re-request (content still has a ref to it), and you wouldn't be able to add any new things to the cache.

I'm going to do an experiment by adding a second hash table of weak refs to "all cache entries currently alive." If we don't have an element in the cache, but it still exists in memory, maybe we can just throw it back into the cache.
This should have been a lot easier to fix, but some stupid thinkos made it a several-day affair.

While I still maintain that this bug is not necessarily a bug, it'd be nice to be able to grab images that are still in use and use them as-is, rather than going to the network or disk. So this patch does that. I ran the ebay test with a build with this patch, and never once got an alert.

Still undetermined is whether i should aim this for 1.9.1. I'm sort of inclined not to, because it seems sort of risky in a refcount sort of way, but I'm willing to listen to arguments the other way.
Attachment #356847 - Flags: superreview?(vladimir)
Attachment #356847 - Flags: review?(roc)
I'm not the right reviewer here.
Attachment #356847 - Flags: review?(roc) → review?(bzbarsky)
So...  It sounds like this site is using the standard image preloading setup where you do new Image() and then use that src on the page, right?  Page authors expect such images to not hit the network, and that's what we used to do as I recall.  We need to keep that behavior; breaking it is not acceptable.  So this bug is most definitely a bug.  I thought I'd said all this in an imagelib bug recently, but I can't find it.

Perhaps we shouldn't pin images in the cache from all content nodes, but only from ones that are not in a document (or even only ones created via new Image() and not in a document)?
Er, reading comment 3 more carefully it seems that in fact they're assigning from an image in the document to a new Image().  So while the new Image() thing is still broken, and still needs to be fixed, that won't fix this bug.

That said, I'm not sure I'm following comment 19.  I thought the image cache basically cached imgRequests, right?  Would it make sense to not count imgRequests with outstanding imgRequestProxies against the cache size (so that we're only really "caching" the things that no one is actively using; we're just using the cache as a lookup table to get at the rest)?
Boris' idea was pretty brilliant, and I couldn't help but think it's a better long-term solution for images. So I implemented it.

We now have two sets of cache entries: those that are tracked for time-based expiry and counted towards the cache limit, and the rest, which will be kept in memory anyways because they have observers holding strong references to them. 

This way, we can always serve requests for images that are currently loaded, regardless of size, and the cache becomes a cache of only currently-unused images.
Attachment #356847 - Attachment is obsolete: true
Attachment #357239 - Flags: superreview?(vladimir)
Attachment #357239 - Flags: review?(bzbarsky)
Attachment #356847 - Flags: superreview?(vladimir)
Attachment #356847 - Flags: review?(bzbarsky)
Attachment #357239 - Flags: superreview?(vladimir) → superreview+
Comment on attachment 357239 [details] [diff] [review]
Only track image cache entries whose requests have no observers

Patch looks good to me overall, just not super happy with the 'Tracked' term -- discussed some ideas on irc.
For joe:

14:58 < vlad> the patch looks good, but only thing is that the 'Tracked' 
              terminology was confusing to me
14:58 < joe> vlad: hm, yes
14:58 < vlad> I got to the comment which says that things that are tracked are 
              tracked for eviction
14:58 < joe> got a better name?
14:58 < vlad> so maybe call it CanBeEvicted() or IsEvictable() or something?
14:58 < vlad> IsBadTenant()? :)
14:58 < joe> ah yes, and then your brain leaked out your ears
14:59 < joe> well any entry can be removed from cache/evicted
14:59 < joe> if it's cancelled or whatever
14:59 < vlad> so if something is not tracked
14:59 < vlad> it can still be evicted?
14:59 < joe> but only by explicit request
14:59 < joe> it won't be evicted by the cache
14:59 < vlad> ok, so it won't be evicted by the normal cache eviction
14:59 < joe> for example if time passes or by space
14:59 < joe> right
15:00 < vlad> so maybe CanBeAutoEvicted or something?
15:00 < joe> hm
15:00 < joe> not superhappy with that name, but i'll mediate on it
15:00 < vlad> or even make it explicit what it is
15:00 < joe> meditate
15:00 < vlad> like HasNoObservers()
15:00 < joe> hmm
15:00 < joe> that's interesting
15:01 < vlad> because what you really want to be able to say is "Only things 
              that have no observers will be automatically evicted by the 
              cache; other entries can only be explicitly evicted due to 
              request cancellation or other events"
15:01 < vlad> or something like that
15:02 < vlad> then AddTrackedItem() becomes SetHasNoObservers() etc.
15:08 < joe> can you put all this onto the bug?
15:10 < vlad> I referenced the irc conversation, want me to copy & paste?
15:12 < joe> sure
Comment on attachment 357239 [details] [diff] [review]
Only track image cache entries whose requests have no observers

>The image cache is now comprised of two things: a heap of tracked items that

s/comprised/composed/.  Or "now comprises two things".

>+++ b/modules/libpr0n/src/imgLoader.cpp
>+PRBool imgLoader::AddTrackedItem(nsIURI *key, imgCacheEntry *entry)
...
>+  entry->SetTracked(true);

PR_TRUE.

>+PRBool imgLoader::RemoveTrackedItem(nsIURI *key)
...
>+    entry->SetTracked(false);

PR_FALSE.

>@@ -1033,24 +1116,33 @@ NS_IMETHODIMP imgLoader::LoadImage(nsIUR
>+        // If this entry is being tracked, it's an entry for a request that
>+        // has no proxies currently, and the request has no reference to the entry.
>+        if (entry->Tracked()) {

...

>@@ -1168,43 +1260,54 @@ NS_IMETHODIMP imgLoader::LoadImageWithCh
>         request = getter_AddRefs(entry->GetRequest());
>+
>       } else {

Nix the random whitespace change.

>+        // If this entry isn't being tracked, it's an entry for a request that
>+        // has no proxies currently, and the request has no reference to the entry.
>+        if (entry->Tracked()) {

Why the comment difference from LoadImage()?  I'd think the comments should say the same things...

In terms of what to call this, I like HasNoObservers() or maybe IsInUse() (with the opposite meaning, of course).
Attached patch Address review comments (obsolete) — Splinter Review
This code is going to need significant baking on the trunk, I think; it's currently a bit fragile, in that it's quite easy to add and remove crasher bugs. Most of those problems have lain in imgCacheValidator, so please focus on that in review.
Attachment #357239 - Attachment is obsolete: true
Attachment #357879 - Flags: superreview?(vladimir)
Attachment #357879 - Flags: review?(bzbarsky)
Attachment #357239 - Flags: review?(bzbarsky)
Attachment #357879 - Flags: superreview?(vladimir) → superreview+
Comment on attachment 357879 [details] [diff] [review]
Address review comments

Nothing really bad jumps out at me, but I don't have my head fully wrapped around the interaction between the different objects in imglib.
Ok, here's a mochitest for this bug. I load a big (3000x3000) PNG file, then load another image, forcing the image cache to try to evict any elements it can. Then we do the same trick that the eBay page does - img = new Image(), then img.src = big.src. If immediately after that we don't have a width, the image wasn't able to be loaded from cache. (Note: I know I claimed that I was OK with breaking this, and it might seem a little hypocritical to create a test that will ensure we don't break it in future given that stance. I only said that because I thought it would be hard to fix. :) )

Anyways, the reason I flag you for review on this, Boris, is because I tried to roll another test into this test - namely, one that would test imgCacheValidator. Unfortunately, as I discovered, if you're reloading the same image from the same document (the aCX parameter passed to LoadImage()), the image cache just assumes all's well and doesn't validate. It takes an explicit reload to get imgCacheValidator involved. Is this correct? If so, should I just do a document.reload() in a separate test?
Attachment #358292 - Flags: review?(bzbarsky)
Forgot to mention: current trunk fails on this mochitest, and trunk + attachment 357879 [details] [diff] [review] passes.
Keywords: testcase-wanted
I have no clue where to start in on the validator stuff.  I don't even know the answer to the question in comment 30.

Is there any chance of getting Stuart to review this?  Other than maybe darin he's the only one who has a shot of already knowing how the validator thing works.  I'd have to sit down and sort through it.

If it makes him feel better, I'd rather do interruptible reflow stuff than sort though imgCacheValidator, but this bug is a blocker.... ;)
Comment on attachment 357879 [details] [diff] [review]
Address review comments

this looks OK.
Attachment #357879 - Flags: review?(bzbarsky) → review+
Attached patch Fix leaks in previous patch (obsolete) — Splinter Review
As is usually the case with my patches :), I found a leak or two in the previous attempt. Namely, removing an entry that didn't make it into the cache (because it failed to initialize) didn't work, even though it was in the tracker and queue. I fixed this by changing RemoveEntry(imgCacheEntry *) to directly remove entries rather than relying on removing by URI.

Also, I forgot to change EvictEntries() to account for the fact that the cache now has a superset of things in the queue, so on XPCOM shutdown not everything got freed.

I'm carrying over Stuart's r+, but I'd like Vlad to look this over.
Attachment #357879 - Attachment is obsolete: true
Attachment #358498 - Flags: superreview?(vladimir)
Attachment #358498 - Flags: review+
Attachment #358498 - Flags: superreview?(vladimir) → superreview+
Comment on attachment 358498 [details] [diff] [review]
Fix leaks in previous patch

Looked at diff; looks fine to me.
Attachment #358292 - Flags: review?(bzbarsky)
Pushed to m-c: http://hg.mozilla.org/mozilla-central/rev/72fda0d63f66 and http://hg.mozilla.org/mozilla-central/rev/18988bc1c727
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Backed out because of leaks :(
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
In certain obscure cases, such as are triggered in the mochitest for bug 368972, it was possible for an imgRequest to lose its proxies after it was cancelled. (I'm kind of surprised that no other tests triggered that.) Since it makes no sense for the cache to have a queue entry for elements that aren't in the cache's hash table, we can just skip out if the entry has been evicted.
Attachment #358498 - Attachment is obsolete: true
Attachment #359850 - Flags: superreview?(vladimir)
Attachment #359850 - Flags: review+
Attachment #359850 - Flags: superreview?(vladimir) → superreview+
http://hg.mozilla.org/mozilla-central/rev/ff62653d0cd8 and http://hg.mozilla.org/mozilla-central/rev/380f8cf4bec2 . Looks to be sticking this time.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Verified FIXED on trunk using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090129 Minefield/3.2a1pre; thanks Joe!
Status: RESOLVED → VERIFIED
Verified FIXED on Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.1b3pre) Gecko/20090201 Shiretoko/3.1b3pre; replacing fixed1.9.1 keyword with verified1.9.1
Depends on: 476457
filed bug 476457 for the crashes.
Depends on: 476349
Reopened due to 476457. :(
Status: VERIFIED → REOPENED
Keywords: verified1.9.1
Resolution: FIXED → ---
I can't guarantee that this fixes all crashes on shutdown, but it definitely fixes a crash I was seeing on shutdown when creating a new profile, and it's a smart thing to do. Basically, sometimes it was possible for nsExpirationTracker::AddObject to fail, but we didn't check for that possibility. Now we handle it gracefully.
Attachment #359850 - Attachment is obsolete: true
Attachment #361697 - Flags: review?(vladimir)
I just threw this patch over to the try server -- see if the people who were seeing the crashes frequently can reproduce with it?
Here is a link to Vlad's try server builds. Please test with these and let us know whether you see this crash.

https://build.mozilla.org/tryserver-builds/2009-02-11_01:34-vladimir@mozilla.com-bug466586/
After careful consideration, I'm going to say that we can't release B3 without this. The patch is scary enough that it really needs a large audience of testers - i.e., a beta vehicle. I'm comfortable pushing this to mozilla-central and letting it bake for a few days right ATM; we can reevaluate this later if it's necessary.
Priority: P2 → P1
http://hg.mozilla.org/mozilla-central/rev/52da9e8d713a and http://hg.mozilla.org/mozilla-central/rev/af9270b650e6

Going to let this bake in m-c for several days to ensure that all the nits have been shaken out.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Backed out again because of a strange crash: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1234564770.1234566952.27920.gz
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
http://hg.mozilla.org/mozilla-central/rev/a6b38a1f6624 and http://hg.mozilla.org/mozilla-central/rev/c48c9cc92320. Looks like there's no obvious failures this time, but I have got to believe that there's a bug lying dormant because of the crash mentioned in comment 52. I hope not to back this out; please file follow-up bugs for crashes/assertions you find.
Status: REOPENED → RESOLVED
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
Verified FIXED using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090219 Minefield/3.2a1pre; I had a harder time reproducing the failure this time around than when I filed the bug, but did with a two-day old build, using Martijn's testcase:

https://bugzilla.mozilla.org/attachment.cgi?id=352339

That now works with the above build, in which I left it running for a long time.
Status: RESOLVED → VERIFIED
Depends on: 479328
I was trying to type into a textarea at the time, although I get the same stack on similar-looking crashes closing windows.
Depends on: 480352
Depends on: 479502
Priority: P2 → P1
Target Milestone: mozilla1.9.1b3 → mozilla1.9.1b4
Verified FIXED in 1.9.1 using:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.1b4pre) Gecko/20090324 Shiretoko/3.5b4pre

Replacing fixed1.9.1 keyword with verified1.9.1
(In reply to comment #30)
> Created an attachment (id=358292) [details]
> Test to ensure we don't need to reload an image in a document

This seems to be failing intermittently:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240597768.1240608481.27987.gz&fulltext=1#err7

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240581329.1240591791.5433.gz&fulltext=1#err7

> (Note: I know I claimed that I was
> OK with breaking this, and it might seem a little hypocritical to create a test
> that will ensure we don't break it in future given that stance. I only said
> that because I thought it would be hard to fix. :) )

Given that it seems tricky to test this reliably, what are your current feelings?
The fact that this fails intermittently means there's something wrong with the code, not with the test. :(
I've filed bug 490384 on the intermittent failures.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: