Closed Bug 942004 Opened 11 years ago Closed 10 years ago

Intermittent TEST-UNEXPECTED-FAIL | bug917595-exif-rotated.jpg | image comparison (==), max difference: 224, number of differing pixels: 433587

Categories

(Core :: Layout: Images, Video, and HTML Frames, defect, P5)

x86
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 --- fixed
firefox32 --- fixed
firefox33 --- fixed
firefox-esr24 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: KWierso, Assigned: seth)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(2 files, 1 obsolete file)

Attached image Attachment 1 —
https://tbpl.mozilla.org/php/getParsedLog.php?id=30917712&tree=Mozilla-Inbound
slave: t-w732-ix-079



14:10:19     INFO -  REFTEST TEST-START | file:///C:/slave/test/build/tests/reftest/tests/content/html/document/reftests/bug917595-exif-rotated.jpg
14:10:19     INFO -  REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/content/html/document/reftests/bug917595-exif-rotated.jpg | 9779 / 10059 (97%)
14:10:19     INFO -  ++DOMWINDOW == 179 (101E4F38) [pid = 3928] [serial = 27286] [outer = 09773890]
14:10:19     INFO -  REFTEST TEST-LOAD | file:///C:/slave/test/build/tests/reftest/tests/content/html/document/reftests/bug917595-pixel-rotated.jpg | 9779 / 10059 (97%)
14:10:19     INFO -  ++DOMWINDOW == 180 (1627F090) [pid = 3928] [serial = 27287] [outer = 09773890]
14:10:19     INFO -  --DOCSHELL 05E326B8 == 5 [pid = 3928] [id = 1198]
14:10:20     INFO -  REFTEST TEST-UNEXPECTED-FAIL | file:///C:/slave/test/build/tests/reftest/tests/content/html/document/reftests/bug917595-exif-rotated.jpg | image comparison (==), max difference: 224, number of differing pixels: 433587
14:10:20     INFO -  REFTEST   IMAGE 1 (TEST): [SEE ATTACHMENT 1 [details] [diff]]
14:10:21     INFO -  REFTEST   IMAGE 2 (REFERENCE): [SEE ATTACHMENT 2 [details] [diff]]
14:10:21     INFO -  REFTEST INFO | Loading a blank page
14:10:21     INFO -  ++DOMWINDOW == 181 (161F4908) [pid = 3928] [serial = 27288] [outer = 09773890]
14:10:21     INFO -  --DOCSHELL 05E34940 == 4 [pid = 3928] [id = 1199]
14:10:21     INFO -  REFTEST TEST-END | file:///C:/slave/test/build/tests/reftest/tests/content/html/document/reftests/bug917595-exif-rotated.jpg
Priority: -- → P5
There is probably a race here since the rotation is applied to image documents during the document load event.

Most likely the only possible fix is converting this to a mochitest.
In general the reftest harness should be correctly showing the result of any actions that are taken onload. I think a lot of reftests depend on that.
(In reply to Timothy Nikkel (:tn) from comment #3)
> In general the reftest harness should be correctly showing the result of any
> actions that are taken onload. I think a lot of reftests depend on that.

Not sure how this could be happening otherwise, though. I'm concerned image documents may be a case that hasn't been exercised up until now, since IIRC I added the first onload handler for them that could actually affect reftest results.
Hmm, actually looking back over the code I'm wondering if I need to manually trigger a repaint to be that one will happen.
Comments 13-15 were bug 952785.
(In reply to Seth Fowler [:seth] from comment #7)
> Hmm, actually looking back over the code I'm wondering if I need to manually
> trigger a repaint to be that one will happen.

Any more thoughts, Seth? :)
Flags: needinfo?(seth)
At this point my best guess is still that I need to manually trigger a repaint. Given the frequency of this test failure I don't have a lot of confidence about verifying a fix on try. I may just cook up a patch, push it in, and see whether we stop seeing the orange.
Flags: needinfo?(seth)
One thing that has changed since this was originally reported is that I can now reproduce it about 50% of the time on my dev machine. That has made it *much* easier to evaluate different fixes.

It looks like the problem was indeed that I needed to schedule a paint, or at least, scheduling a paint appears to fix it. I made it through about 10 runs with this patch applied without triggering the failure.

The patch is super trivial. Should hopefully have this fixed in a jiff.
Attachment #8401056 - Flags: review?(tnikkel)
Assignee: nobody → seth
Status: NEW → ASSIGNED
Comment on attachment 8401056 [details] [diff] [review]
Reliably rotate image document during load event.

So wouldn't we have the same bug for images that take their orientation from the exif information but are loaded as a part of a document (not as a top level document)?
(In reply to Timothy Nikkel (:tn) from comment #40)
> Comment on attachment 8401056 [details] [diff] [review]
> Reliably rotate image document during load event.
> 
> So wouldn't we have the same bug for images that take their orientation from
> the exif information but are loaded as a part of a document (not as a top
> level document)?

I don't think so, no, although it's possible I've missed something.

What's different about the ImageDocument case is that it directly manipulates its own size through unsavory methods (i.e., directly setting member variables), rather than going through the usual style/layout channels. Normally SchedulePaint is called automatically as part of the layout/DLBI machinery, but in this case we are bypassing that and we need to do it ourselves.
(In reply to Seth Fowler [:seth] from comment #41)
> What's different about the ImageDocument case is that it directly
> manipulates its own size through unsavory methods (i.e., directly setting
> member variables), rather than going through the usual style/layout
> channels.

Can you expand on this? Looking through ImageDocument.cpp it appears we set the width and height of the image via the nsIDOMHTMLImageElement interface and SetWidth/SetHeight functions. Those are regular content js accessible functions so updating rendering when those are changed should work.
Flags: needinfo?(seth)
Timothy, I think you have a point. Let's back up for a second, though.

(In reply to Timothy Nikkel (:tn) from comment #40)
> So wouldn't we have the same bug for images that take their orientation from
> the exif information but are loaded as a part of a document (not as a top
> level document)?

Keep in mind that this bug has nothing to do with the size that the nsImageFrame thinks it is. The problem is that the nsImageFrame always gets it right, but sometimes the ImageDocument has it wrong, so we see the image in the correct orientation but with the wrong aspect ratio. This probably isn't a bug in nsImageFrame itself.

(In reply to Timothy Nikkel (:tn) from comment #42)
> Can you expand on this? Looking through ImageDocument.cpp it appears we set
> the width and height of the image via the nsIDOMHTMLImageElement interface
> and SetWidth/SetHeight functions. Those are regular content js accessible
> functions so updating rendering when those are changed should work.

Yeah, this is true. I just spent a while tracing exactly what happens when the width and height attributes get set on the image content node. We end up in RestyleManager::AttributeChanged, where we find that HasAttributeDependentStyle returns eRestyle_Self. This results in a call to SetNeedsStyleFlush on the document, which takes care of it the next time we call RestyleManager::ProcessPendingRestyles. In the end nsStylePosition::CalcDifference ends up returning nsChangeHint_AllReflowHints, since the height has changed.

Clear we aren't doing quite enough to make the reflow actually happen in time for the reftest snapshot, though. Frustratingly I can no longer reproduce this locally (maybe there was more load on my machine the other day for some reason?) but I remember trying FlushPendingNotifications instead of contentFrame->SchedulePaint(), and it didn't work. Maybe I didn't use it with an aggressive-enough argument. SchedulePaint has the effect of calling SetNeedsLayoutFlush on the document, which may have something to do with why it actually works.
Flags: needinfo?(seth)
If the reflow isn't happening maybe it's because of bug 992324?
Possibly. I wish I could still reproduce this locally so I could check for that.
If interrupted reflows are indeed the cause maybe you can reproduce by making nsPresContext::CheckForInterrupt return true always or once in every n calls.
Comment on attachment 8401056 [details] [diff] [review]
Reliably rotate image document during load event.

If this only works because it causes another flush to happen then it's the wrong fix. So clearing review request for now.
Attachment #8401056 - Flags: review?(tnikkel)
(In reply to Timothy Nikkel (:tn) from comment #47)
> If this only works because it causes another flush to happen then it's the
> wrong fix. So clearing review request for now.

Yep, agreed.
(In reply to Timothy Nikkel (:tn) from comment #46)
> If interrupted reflows are indeed the cause maybe you can reproduce by
> making nsPresContext::CheckForInterrupt return true always or once in every
> n calls.

Just tried that, and that's not it.

Given that this happens in a VM, this is almost certainly a timing issue, but I'm not sure what other parts of this process are timing-dependent.
I spent a significant amount of time trying to reproduce this, trying things like enabling chaos mode, artificially inducing 100% load on all my cores, and checking out older versions of the code in case some checkin accidentally fixed this.

I had no luck reproducing it.

Based upon the explorations so far I have two conclusions:

(1) This may not be a bug in nsImageFrame *or* in ImageDocument, but in some other part of the system. We appear to be doing everything necessary to trigger a relayout and it's just not happening in time sometimes.

(2) Given that I can't reproduce this anymore and it's not exactly a frequent orange, I can't justify spending further time on it at this point.

I'll be more than happy to revisit this if any new information comes to light or it becomes easier to reproduce.
Well, I couldn't help myself from trying one more thing. Here's a try job with some verbose logging that I'm hoping will give us some more info about the failure:

https://tbpl.mozilla.org/?tree=Try&rev=1c96952473b5
Argh, forgot about the ancient version of GCC on these platforms. Another try:

https://tbpl.mozilla.org/?tree=Try&rev=3ce91f4ad1b5
Depends on: 1023618
I recently became able to reproduce this locally again and jumped on the opportunity to investigate. This led me to filing bug 1023618, which has now landed. I can no longer reproduce it locally with this patch in place. I hope that now means that the root cause of this bug (and possibly others) is now resolved, but with these intermittent oranges it's hard to be sure.

Ordinarily, my next step would be to watch this bug and see if we hit this anymore. However, it looks like we haven't seen this bug happen in the past two months in any case. I'm not sure what changed; it's possible an unrelated patch made the bug much less likely to occur.

Given that it doesn't seem like it'll be very useful to monitor this bug further, I'm inclined to resolve it as WORKSFORME at this point. (We can always reopen if someone actually hit this again.)

Ryan, do you have any objection to resolving this bug now?
Flags: needinfo?(ryanvm)
Attachment #8401056 - Attachment is obsolete: true
Works for me. Is that safe to backport to the other branches as well?
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(ryanvm)
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Depends on: 997604
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #54)
> Works for me. Is that safe to backport to the other branches as well?

Yes, it's an incredibly low risk patch, should be safe to backport anywhere. I'll go ahead and request it.
No longer depends on: 997604
Depends on: 1029285
Product: Core → Core Graveyard
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: