Last Comment Bug 763593 - Firefox 13 incorrectly displays photos on Skydrive
: Firefox 13 incorrectly displays photos on Skydrive
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: mozilla16
Assigned To: Justin Lebar (not reading bugmail)
:
Mentors:
Depends on:
Blocks: 731419
  Show dependency treegraph
 
Reported: 2012-06-11 11:36 PDT by Bob Henson
Modified: 2013-12-27 14:24 PST (History)
17 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
disabled
+
fixed
+
fixed


Attachments
Patch, v1 (9.17 KB, patch)
2012-06-18 23:21 PDT, Justin Lebar (not reading bugmail)
bzbarsky: review+
joe: review-
Details | Diff | Splinter Review
Patch for branches, v1 (1015 bytes, patch)
2012-06-18 23:23 PDT, Justin Lebar (not reading bugmail)
bzbarsky: review+
lukasblakk+bugs: approval‑mozilla‑beta+
Details | Diff | Splinter Review
Patch v2 (1.74 KB, patch)
2012-06-20 08:51 PDT, Justin Lebar (not reading bugmail)
joe: review+
akeybl: approval‑mozilla‑aurora+
lukasblakk+bugs: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description Bob Henson 2012-06-11 11:36:01 PDT
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20100101 Firefox/13.0
Build ID: 20120601045813

Steps to reproduce:

I tried to display photographs on a Skydrive page using Firefox 13 running under Windows 7.

http://tinyurl.com/bv6uy2t

This was repeated on another set of photos on a similar Skydrive page. A further similar page was tried, but on Dropbox. 


Actual results:

The page displayed a row of thumbnails of the photos. Attempting to display the photos by clicking the thumbnails resulted in some of the photos being displayed, but not others. 

Refreshing the page did not help.

Using the back and forward arrows failed too.

It was the same photos that did not display each time. Confirmation by many other people revealed that all worked correctly and as expected with Firefox 12, but not with Firefox 13.

It did not help to run Firefox in Safe Mode, ruling out extensions and plug-ins as a source of problems.

It did not help to reinstall Firefox 13 (two people tried) 

I repeated the check in Linux with Firefox 12 - the photographs displayed just fine. I upgraded to Firefox 13 and immediately repeated the test - it now failed in identical manner to all the other attempts.

Clicking "View Original" correctly displayed the photos one at a time.

Viewing the photos in Internet Explorer 9 or Chrome displays them correctly, only Firefox and then only version 13 fails.

Numerous other people have had the same result. Some of the results are discussed in the mozilla.firefox.support newsgroup under the heading "Firefox 13 problem displaying photos".

The bug could not be reproduced on a similar page on Dropbox - it only happens on Skydrive pages (two different ones tried with identical results). 


Expected results:

The photos should all have displayed have displayed from the thumbnails and from the left/right arrows.

There should be no difference in this regard between Firefox 12 and Firefox 13.

There should be no difference between the way Firefox handled sets of photos on Dropbox and on Skydrive.
Comment 1 (mostly gone) XtC4UaLL [:xtc4uall] 2012-06-11 13:58:52 PDT
Is this reproducible in a new Profile too?
Comment 2 Daniel Holbert [:dholbert] 2012-06-13 10:59:01 PDT
It looks like Skydrive does some magic where they serve a low-res version of the photo and then serve a higher-res version of the photo.

(Whatever's going on, it's likely due to an interaction with this magic -- possibly from an intermediate broken-image/franken-image getting cached somehow?  Assuming this is either an ImageLib or a networking bug.  Kicking to ImageLib for now.)

I can hit this bug (or something like it) in Nightly -- paging through the photos quickly with leftarrow / rightarrow will usually hit at least one photo that doesn't load (the screen is just gray).

I also noticed a number of messages like this in my Error Console when testing this:
{
Error: Image corrupt or truncated: https://byfiles.storage.live.com/y1pBZxTIhDXcp0qWlb7D5iQW33zRM4-oN69YtJ_n1lvnNEenioyLEzAQ3KkLJc5n_11cNfwyV9i48U/Aviemore_004.jpg?psid=1
Source File: https://byfiles.storage.live.com/y1pBZxTIhDXcp0qWlb7D5iQW33zRM4-oN69YtJ_n1lvnNEenioyLEzAQ3KkLJc5n_11cNfwyV9i48U/Aviemore_004.jpg?psid=1
Line: 0
}

(Note: refreshing the page _did_ help for me, though -- unlike comment 0.  Bob: are you sure it doesn't help for you? What if you do a "full" refresh with Ctrl+Shift+R?  (no resources loaded from cache))
Comment 3 Daniel Holbert [:dholbert] 2012-06-13 11:00:05 PDT
My nightly build was: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/16.0 Firefox/16.0a1
and (per comment 1) I was indeed able to reproduce it in a fresh profile.
Comment 4 Asa Dotzler [:asa] 2012-06-13 12:12:49 PDT
works in Firefox 12, broken in Firefox 13 (and all later.) need a regression window.
Comment 5 Bob Henson 2012-06-16 05:24:46 PDT
> (Note: refreshing the page _did_ help for me, though -- unlike comment 0. 
> Bob: are you sure it doesn't help for you? What if you do a "full" refresh
> with Ctrl+Shift+R?  (no resources loaded from cache))

No - I did that, as was suggested to me, but it made no difference.
Comment 6 Alice0775 White 2012-06-16 06:46:13 PDT
Regression window(m-c)
Good:
http://hg.mozilla.org/mozilla-central/rev/c33438bd5706
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120311 Firefox/13.0a1 ID:20120311005632
Bad:
http://hg.mozilla.org/mozilla-central/rev/5ec9524de1af
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120312 Firefox/13.0a1 ID:20120312031136
Pushlog:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c33438bd5706&tochange=5ec9524de1af


Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/bd50a33d8c69
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120309 Firefox/13.0a1 ID:20120309133232
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/55e63a03ccad
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120309 Firefox/13.0a1 ID:20120309152032
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=bd50a33d8c69&tochange=55e63a03ccad
Suspected:
Bug 731419
Comment 7 Alice0775 White 2012-06-16 09:59:17 PDT
FYI
image.mem.discardable = false helps
Comment 8 Justin Lebar (not reading bugmail) 2012-06-18 11:33:47 PDT
As far as I can tell, this is a real bug, unrelated to bug 731419.

What happens is, setting the src of an image eventually triggers a call to nsImageLoadingContent::ClearCurrentRequest().  ClearCurrentRequest unlocks the image, and, *if we haven't finished decoding it yet* we discard it.  It's that dependence on having finished decoding which makes this bug appear intermittently.

Discarding cancels the decode, which causes us to throw the "Image corrupt or truncated:" error.  That error, in this case, should instead say "*we* truncated this image."

mozilla::image::Decoder::Finish at src/image/src/Decoder.cpp:106
mozilla::image::RasterImage::ShutdownDecoder at src/image/src/RasterImage.cpp:2325
mozilla::image::RasterImage::UnlockImage at src/image/src/RasterImage.cpp:2688
imgRequestProxy::UnlockImage at src/image/src/imgRequestProxy.cpp:336
nsDocument::RemoveImage at src/content/base/src/nsDocument.cpp:8249
nsImageLoadingContent::UntrackImage at src/content/base/src/nsImageLoadingContent.cpp:1161
nsImageLoadingContent::ClearCurrentRequest at src/content/base/src/nsImageLoadingContent.cpp:1046
nsImageLoadingContent::PrepareCurrentRequest at src/content/base/src/nsImageLoadingContent.cpp:997
nsImageLoadingContent::MakePendingRequestCurrent at src/content/base/src/nsImageLoadingContent.cpp:1021
nsImageLoadingContent::LoadImage at src/content/base/src/nsImageLoadingContent.cpp:753
nsImageLoadingContent::LoadImage at src/content/base/src/nsImageLoadingContent.cpp:656
nsHTMLImageElement::SetAttr at src/content/html/content/src/nsHTMLImageElement.cpp:428
nsGenericHTMLElement::SetAttr at src/content/html/content/src/nsGenericHTMLElement.h:163
nsGenericHTMLElement::SetAttrHelper at src/content/html/content/src/nsGenericHTMLElement.cpp:2830
nsHTMLImageElement::SetSrc at src/content/html/content/src/nsHTMLImageElement.cpp:115
Comment 9 Justin Lebar (not reading bugmail) 2012-06-18 14:21:21 PDT
Okay, this is in fact being tickled by bug 731419; if I take out the call to RasterImage::RequestDiscard in nsDocument::RemoveImage, the bug goes away.

But I don't think the problem is in the nsDocument code.  If I modify RasterImage::UnlockImage and take out

    ShutdownDecoder(eShutdownIntent_Interrupted);
    ForceDiscard();

then everything also works properly.

Since once an image isn't displayed, it's never displayed for the whole session, I think we're somehow getting into a state where we cancel the decode but don't clean out the RasterImage sufficiently and the image becomes permanently corrupted.

I'm scared to try to figure out what this page is doing, but it seems to start and cancel many image decodes every time we navigate the page.
Comment 10 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-18 14:56:02 PDT
Tracking this regression, do we need to look into backing out bug 731419 or will a forward fix be investigated for this and bug 762460?
Comment 11 Justin Lebar (not reading bugmail) 2012-06-18 14:58:49 PDT
> will a forward fix be investigated for this and bug 762460?

If it's not clear from comments 8 and 9, I'm investigating a forward fix here.  Worst case, I don't think we need to back out all of bug 731419, as I indicated in comment 9.
Comment 12 Justin Lebar (not reading bugmail) 2012-06-18 15:34:39 PDT
> Worst case, I don't think we need to back out all of bug 731419, as I indicated in comment 9.

Well, I guess I didn't really indicate as much there.

Bug 731419 actually makes two changes.  When a tab is closed, we immediately discard its image data.  And when an image is "removed from the document", we also immediately discard it.  It's only this second change which appears to be causing problems.

But again, I think there's a deeper problem here.  Joe et al., I could use some help understanding what's going on in comment 9.  Again, it seems like the decoder shutdown gets us into a borked state from which we never recover.
Comment 13 Justin Lebar (not reading bugmail) 2012-06-18 21:40:31 PDT
> If I modify RasterImage::UnlockImage and take out
> 
>     ShutdownDecoder(eShutdownIntent_Interrupted);
>     ForceDiscard();
> 
> then everything also works properly.

If I take out only the ForceDiscard(), the bug still manifests.  I can't take out ShutdownDecoder and leave ForceDiscard, because you can't discard while there's an active decoder, but this suggests that the problem is indeed with the decoder and not with discarding per se.
Comment 14 Justin Lebar (not reading bugmail) 2012-06-18 22:55:53 PDT
Okay, I think I have this.  I'll post a patch tonight or tomorrow.
Comment 15 Justin Lebar (not reading bugmail) 2012-06-18 23:21:25 PDT
Created attachment 634303 [details] [diff] [review]
Patch, v1

I have to admit, I'm not entirely sure how we get into this state to begin with, or why it's intermittent.  But it seems what's going wrong is that MakePendingRequestCurrent() is called when both the pending and current requests point to the same image.  We unlock the current request, killing its decoder, and then swap in the pending request, which is now borked (I'm not exactly sure how).

I think I could write this patch differently: Lock the current request only if the current and pending requests are both for the same image.  But even then, I'd have to check whether the image was locked; I don't want to lock an unlocked image, because that will kick off decodes and whatnot.  Not checking that the two requests are for the same image seemed simpler to me.

Because I don't quite understand how we get into this state, I haven't written a test.  We really should figure this out, though.

I'm also not confident we should take this patch on branches.  There's a simpler one-line fix to nsDocument, which I'll post here in a moment.
Comment 16 Justin Lebar (not reading bugmail) 2012-06-18 23:23:47 PDT
Created attachment 634304 [details] [diff] [review]
Patch for branches, v1
Comment 17 Justin Lebar (not reading bugmail) 2012-06-18 23:25:49 PDT
Patch v1: https://tbpl.mozilla.org/?tree=Try&rev=0c153567b552
Patch for branches v1: https://tbpl.mozilla.org/?tree=Try&rev=de3bec94dc8d
Comment 18 Boris Zbarsky [:bz] 2012-06-18 23:26:42 PDT
Comment on attachment 634303 [details] [diff] [review]
Patch, v1

r=me
Comment 19 Boris Zbarsky [:bz] 2012-06-18 23:27:10 PDT
Comment on attachment 634304 [details] [diff] [review]
Patch for branches, v1

Hmm.  Why is it ok to do this?
Comment 20 Justin Lebar (not reading bugmail) 2012-06-19 06:58:16 PDT
>  Why is it ok to do this?

The code this patch removes was added in bug 731419.  We won't discard these images as quickly, but we'll still eventually discard them.

(Alternatively, we could take the other patch on branches; it's not *too* scary, except for the interface changes.)
Comment 21 Boris Zbarsky [:bz] 2012-06-19 07:11:17 PDT
Comment on attachment 634304 [details] [diff] [review]
Patch for branches, v1

Ah.  r=me, I guess.  The interface changes worry me on beta, at least...

I'm not sure whether taking this patch on beta is better than living with the regression for that cycle, though.
Comment 22 Justin Lebar (not reading bugmail) 2012-06-19 11:12:40 PDT
Comment on attachment 634304 [details] [diff] [review]
Patch for branches, v1

[Approval Request Comment]
Bug caused by: bug 731419
User impact if declined: This bug, bug 762460.
Testing completed (on m-c, etc.): Pushed to try.  :)
Risk to taking this patch (and alternatives if risky): See comment 20.  This may increase memory usage temporarily for some users, which can lead to OOMs.  But it's no worse than we were in FF12.
String or UUID changes made by this patch: None.
Comment 23 Joe Drew (not getting mail) 2012-06-19 15:28:40 PDT
Comment on attachment 634303 [details] [diff] [review]
Patch, v1

Review of attachment 634303 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but I'd like it if we had an RAII class even more.

::: content/base/src/nsImageLoadingContent.cpp
@@ +1030,5 @@
> +    currentRequest->IsLocked(&currentRequestIsLocked);
> +  }
> +  if (currentRequestIsLocked) {
> +    currentRequest->LockImage();
> +  }

You should make an RAII class to do this!

::: image/public/imgIRequest.idl
@@ +166,5 @@
>     */
>    void unlockImage();
>  
>    /**
> +   * Checks whether the image is locked.

s/Checks w/W/
Comment 24 Justin Lebar (not reading bugmail) 2012-06-19 17:41:45 PDT
> s/Checks w/W/

s/Checks/Gets/?  I'd like it to be a complete sentence, since it's a method.  I guess we could make it an attribute...

> You should make an RAII class to do this!

It's kind of tricky because we only want to lock if the image is already locked.  So it's either a really specific RAII class, or otherwise it has to be one of those Init()'able RAII classes.
Comment 25 Joe Drew (not getting mail) 2012-06-19 19:10:33 PDT
(In reply to Justin Lebar [:jlebar] from comment #24)
> > s/Checks w/W/
> 
> s/Checks/Gets/?  I'd like it to be a complete sentence, since it's a method.
> I guess we could make it an attribute...

A readonly attribute, sure.

> > You should make an RAII class to do this!
> 
> It's kind of tricky because we only want to lock if the image is already
> locked.  So it's either a really specific RAII class, or otherwise it has to
> be one of those Init()'able RAII classes.

You could just lock the image unconditionally. Locking doesn't kick off decodes!
Comment 26 Justin Lebar (not reading bugmail) 2012-06-20 07:12:48 PDT
> You could just lock the image unconditionally. Locking doesn't kick off decodes!

Oh, duh.  Okay, that sounds much better.
Comment 27 Justin Lebar (not reading bugmail) 2012-06-20 08:51:50 PDT
Created attachment 634937 [details] [diff] [review]
Patch v2

RAII, unconditional locking.  Much smaller patch!

No interface changes, so I guess we could take this on beta if we wanted.
Comment 28 Joe Drew (not getting mail) 2012-06-20 10:37:08 PDT
Comment on attachment 634937 [details] [diff] [review]
Patch v2

Lovely
Comment 29 Justin Lebar (not reading bugmail) 2012-06-20 10:47:06 PDT
Comment on attachment 634937 [details] [diff] [review]
Patch v2

[Approval Request Comment]
See approval request comment in comment 22.  We don't need both this patch and the other in this bug approved for beta; just one or the other.  The other patch is a straight backout, so is less risky, but this should have fewer negative effects (increased memory usage).
Comment 30 Justin Lebar (not reading bugmail) 2012-06-20 10:48:55 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/f409b30d408d
Comment 31 Joe Drew (not getting mail) 2012-06-20 14:08:30 PDT
Comment on attachment 634303 [details] [diff] [review]
Patch, v1

Rescinding my review on the older patch, given how this bug developed.
Comment 32 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-20 15:29:22 PDT
Comment on attachment 634304 [details] [diff] [review]
Patch for branches, v1

[Triage comment]
Let's take this backout since it's less risky and puts us more comparably to where we're at with FF12 - thank you Justin for providing options.  I'll leave the approval-aurora nom on the other one and we can re-evaluate it after the patch in comment 29 lands on central.
Comment 33 Lukas Blakk [:lsblakk] use ?needinfo 2012-06-20 15:29:42 PDT
Comment on attachment 634937 [details] [diff] [review]
Patch v2

taking the patch in comment 22 for beta (backout) instead.
Comment 34 Justin Lebar (not reading bugmail) 2012-06-20 20:27:44 PDT
Fixed (well, partially backed out) on beta for FF14: https://hg.mozilla.org/releases/mozilla-beta/rev/ea720a642a88
Comment 35 Ed Morley [:emorley] 2012-06-21 04:06:28 PDT
https://hg.mozilla.org/mozilla-central/rev/f409b30d408d
Comment 36 Alex Keybl [:akeybl] 2012-06-24 13:27:13 PDT
Comment on attachment 634937 [details] [diff] [review]
Patch v2

[Triage Comment]
We can take the forward fix for 15.
Comment 37 Justin Lebar (not reading bugmail) 2012-06-24 13:38:59 PDT
Aurora 15: https://hg.mozilla.org/releases/mozilla-aurora/rev/696fa474722c
Comment 38 Simona B [:simonab ] -PTO- back Sept 5th 2012-08-14 07:33:02 PDT
When trying to verify this on Firefox 15 beta 4, I've noticed a number of error messages in the Error Console on Windows 7:

Timestamp: 8/14/2012 4:48:33 PM
Error: The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.
Source File: https://secure.wlxrs.com/wkbwMVPzjLD-5BZn35tBbw/Bucket2c.js
Line: 0

On Windows 7 - all the photos are properly displayed.

Moreover, as on Firefox 13, not all the photos from Skydrive are displayed on Firefox 15 beta 4 on Ubuntu 12.04 and Mac OS X 10.7. The same error as the one mentioned in Comment 2 is displayed in the Error Console.

Note You need to log in before you can comment on or make changes to this bug.