Closed Bug 910533 Opened 6 years ago Closed 5 years ago

Intermittent test_bug512435.html | img_b should be decoded before onload fires

Categories

(Core :: ImageLib, defect)

ARM
Android
defect
Not set

Tracking

()

RESOLVED WORKSFORME
mozilla28
Tracking Status
firefox26 --- fixed
firefox27 --- fixed
firefox28 --- fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- fixed

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Presumably from one of the patches you landed today.

https://tbpl.mozilla.org/php/getParsedLog.php?id=27148469&tree=Mozilla-Inbound

Android 4.0 Panda mozilla-inbound opt test mochitest-7 on 2013-08-28 19:17:56 PDT for push 85c661620e26
slave: panda-0736

19:34:00     INFO -  15394 INFO TEST-START | /tests/image/test/mochitest/test_bug512435.html
19:34:00     INFO -  15395 INFO TEST-PASS | /tests/image/test/mochitest/test_bug512435.html | img_a should be decoded before onload fires
19:34:00     INFO -  15396 ERROR TEST-UNEXPECTED-FAIL | /tests/image/test/mochitest/test_bug512435.html | img_b should be decoded before onload fires
19:34:00     INFO -  15397 INFO TEST-END | /tests/image/test/mochitest/test_bug512435.html | finished in 573ms
Thanks for that testing, Timothy. I've already backed out that patch. Will investigate further.
Alright. I obviously can't reproduce this locally, but I have formed two theories as to the cause of this issue. This patch tests them together:

https://tbpl.mozilla.org/?tree=Try&rev=7997c367fe74
Well, clearly that wasn't it.
Looks to me like we're still hitting this after the backout of bug 867183, though apparently with much lower frequency. Starting to think the root cause may be in bug 905468.
Double check to make sure the occurrences aren't coming from a tree where the backout hasn't made it to yet.
Comment 77 appears to be from m-c after the backout landed there.
Hmm.. actually, now that I think of it, though, I'm comparing the time when a patch hit m-c with the time when tests finished running from a push. It's natural that there'd be some lag. Checking deeper I take it back; looks like bug 867183 remains the likely culprit.
Pushed a new try job with some debug output and further modifications here:

https://tbpl.mozilla.org/?tree=Try&rev=368ee751bbc6
Any updates, Seth? :)
Assignee: nobody → seth
Flags: needinfo?(seth)
No, but I'll try to make progress on this one today.
Flags: needinfo?(seth)
Any news, Seth? This is our #1 orange on beta now :)
Flags: needinfo?(seth)
I now strongly believe that bug 905468 part 2 is the cause. Here are try results from a backout of that part:

https://tbpl.mozilla.org/?tree=Try&rev=925496115106
Flags: needinfo?(seth)
This try job is an experiment to try to track down where the oranges are coming from. If no oranges show up here then we've narrowed down the issue to one of two potential causes.

https://tbpl.mozilla.org/?tree=Try&rev=0d2e240e6882
Alright, it looks like we've probably found the source of the problem. Here's a modified version of the previous try job with one of the two fixes removed. If this is still green then it's possible fix 1 (when mInvalidRect gets cleared); if this turns orange then it's possible fix 2 (error/unblock onload notifications somehow getting sent out inappropriately for decode-only situations). My money is on possible fix 1.

https://tbpl.mozilla.org/?tree=Try&rev=88f36ee8bd6a
OK, here we're testing what could become the final fix:

https://tbpl.mozilla.org/?tree=Try&rev=aa4f40f38ca6
Here's another run with reftests on all platforms as a sanity check. Hopefully this fix doesn't turn something _else_ orange.

https://tbpl.mozilla.org/?tree=Try&rev=1130e6534e57
So it looks like this issue boils down to whether we clear mInvalidRect before or after calling SyncNotifyState.

IMO it makes more sense to do it _before_ notifying, but there seems to be a dependency on doing it after, so this patch changes the order. In bug 905468 we combined SyncNotifyState and SyncNotifyDecodeState, and this was actually different between the two - we did it before in the SyncNotifyState case and after in the SyncNotifyDecodeState case. At least now we're consistent.

I'd love to understand exactly what the problem is with clearing it before notifying, but given the difficulty of reproducing this bug and the ease of fixing it, it doesn't seem like a good use of time at the moment.
Attachment #829615 - Flags: review?(josh)
Attachment #829615 - Flags: review?(josh) → review?(tnikkel)
Flipping this over to Timothy since I suspect Josh is out of town at the moment.
Comment on attachment 829615 [details] [diff] [review]
Clear invalidation rect only after sync notifying in imgStatusTracker.

There are only a few places that look at mInvalidRect, so this is a little weird. Would be good to figure out why this is important. But it is just restoring what we had before for decode notifications, so should be fine.
Attachment #829615 - Flags: review?(tnikkel) → review+
Thanks for the review!

(In reply to Timothy Nikkel (:tn) from comment #221)
> Comment on attachment 829615 [details] [diff] [review]
> Clear invalidation rect only after sync notifying in imgStatusTracker.
> 
> There are only a few places that look at mInvalidRect, so this is a little
> weird. Would be good to figure out why this is important. But it is just
> restoring what we had before for decode notifications, so should be fine.

Definitely agreed. Frankly, the way mInvalidRect is handled in general makes me quite uncomfortable. I'd like to make the invariants stronger but to do this I need to separate the implementation of an imgStatusTracker that's merely "recording" from the normal case. I suspect once we have a smaller piece of code with stronger invariants we'll be able to reason about this much more easily and problems like this will be greatly simplified.
Marking this as [leave open] until we confirm that the orange is gone.

Needinfo'ing myself to check back in a week or so.
Flags: needinfo?(seth)
Whiteboard: [leave open]
OK, I'm going to call this resolved unless we start hitting this again.
Flags: needinfo?(seth)
Whiteboard: [leave open]
Comment on attachment 829615 [details] [diff] [review]
Clear invalidation rect only after sync notifying in imgStatusTracker.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 905468
User impact if declined: It's unlikely that this bug is user visible. However, it is a frequent intermittent orange that causes troubles for our sheriffs.
Testing completed (on m-c, etc.): On m-c for a week.
Risk to taking this patch (and alternatives if risky): None. This is a tiny patch, and it restores behavior that existed prior to bug 905468.
String or IDL/UUID changes made by this patch: None.
Attachment #829615 - Flags: approval-mozilla-beta?
Attachment #829615 - Flags: approval-mozilla-aurora?
Comment on attachment 829615 [details] [diff] [review]
Clear invalidation rect only after sync notifying in imgStatusTracker.

low risk patch, helps with intermittent-failures. Approving on branches.
Attachment #829615 - Flags: approval-mozilla-beta?
Attachment #829615 - Flags: approval-mozilla-beta+
Attachment #829615 - Flags: approval-mozilla-aurora?
Attachment #829615 - Flags: approval-mozilla-aurora+
Thanks for the quick approval!
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hmm, there's an obvious recent regression here.

It was a top orange, then a fix got pushed in comment 227, then we didn't see it at all for 2 months.

It reappeared about 3 months ago *one* time, then went away again for 2 months. I suspect that whatever caused this got fixed quickly, though it's hard to be sure.

In the last 3 weeks we've hit it 7 times.
Assignee: seth → nobody
This test was disabled on Android 2.3, but it seems to be running okay now -- enabled in bug 979921.
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.