Closed Bug 1319283 Opened 7 years ago Closed 7 years ago

Intermittent image/test/mochitest/test_bug1217571.html | containers for identical images in different iframes should be identical

Categories

(Core :: Graphics: ImageLib, defect, P5)

Tracking

RESOLVED FIXED
mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- wontfix
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: tnikkel)

References

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted][stockwell fixed])

Attachments

(1 file)

Whiteboard: [gfx-noted]
Priority: -- → P5
This has picked up; it looks like 30+ failures in the last week, and it appears to be android opt only. ni?'ing myself to get more info.
Flags: needinfo?(jmaher)
Whiteboard: [gfx-noted] → [gfx-noted][stockwell needswork]
Doing some retriggers here; I saw some done on the first instance, but it looks like this picked up almost 2 days later:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-18%20opt&tochange=3244e9568d540abe75bf836edba1eb7aba44718f&fromchange=2b1d378dc8b705aa9f44951acdcd428cad1797e3&selectedJob=91579237

there is really no information in the logs to help out :(

this is what I see:
[task 2017-04-14T22:01:21.413801Z] 22:01:21     INFO -  17 INFO TEST-START | image/test/mochitest/test_bug1217571.html
[task 2017-04-14T22:01:21.414232Z] 22:01:21     INFO -  18 INFO TEST-UNEXPECTED-FAIL | image/test/mochitest/test_bug1217571.html | containers for identical images in different iframes should be identical
[task 2017-04-14T22:01:21.414388Z] 22:01:21     INFO -      window.onload@image/test/mochitest/test_bug1217571.html:35:3
[task 2017-04-14T22:01:21.414900Z] 22:01:21     INFO -  19 INFO TEST-OK | image/test/mochitest/test_bug1217571.html | took 2841ms



this is where it fails in the test case:
https://dxr.mozilla.org/mozilla-central/source/image/test/mochitest/test_bug1217571.html?q=path%3Atest_bug1217571.html&redirect_type=single#36
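
A rough sketch of what that assertion is doing (a sketch only, not the exact test source; the frame handles and helper name below are placeholders, though nsIImageLoadingContent.getRequest() and imgIRequest.image are the real interfaces involved):

// Sketch: both iframes load the same image URL, and the test asserts that
// the two <img> elements ended up sharing one imgIContainer.
function getImageContainer(img) {
  const Ci = SpecialPowers.Ci;
  const request = SpecialPowers.wrap(img)
    .getRequest(Ci.nsIImageLoadingContent.CURRENT_REQUEST);
  return request.image; // the imgIContainer backing this request
}

const imgA = frameA.contentDocument.querySelector("img"); // placeholder frames
const imgB = frameB.contentDocument.querySelector("img");
ok(getImageContainer(imgA) === getImageContainer(imgB),
   "containers for identical images in different iframes should be identical");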

unfortunately I don't see any screenshots, possibly an issue on android.


:milan, can you help find someone to look at this? I see :froydnj authored the test; I have him cc'd here as well. I suspect this isn't critical, but it's worth getting some eyes on in the near future.
Flags: needinfo?(jmaher) → needinfo?(milan)
(In reply to Joel Maher ( :jmaher) from comment #5)
> unfortunately I don't see any screenshots, possibly an issue on android.

Right - screenshot-on-fail does not work on Android.
I don't believe my patch triggers this intermittent. The first orange classified for this from the dashboard is https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5e1026baaf23d0c2c96174d831e7d7f075bd3761&filter-searchStr=mochitest-18%20opt which landed 15hrs earlier than my patch. Also there is a merge just before my patch, so the intermittent may just be from there.
Flags: needinfo?(xidorn+moz)
I don't see any way the patch for bug 1355683 could be relevant here. None of the code it touches is even used in the testcase in question.
The earliest failure I can see is https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5e1026baaf23d0c2c96174d831e7d7f075bd3761, but it happens so infrequently on that push that finding a narrow regression range may not be practical.
There haven't been many patches pushed recently that I would expect to affect this, so even from a somewhat large regression range it might be possible to pick out a patch that caused this.

That said, it's probably not hard to figure this out via printfs and pushing to try if it reproduces with a reasonable number of re-triggers.
I still maintain that the root cause is in this range:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-18%20opt&tochange=3244e9568d540abe75bf836edba1eb7aba44718f&fromchange=2a494e1f39e78f9bb25681a3b6616c6697e48685

For the 5 revisions prior I have 20 retriggers with no failures; then, after Bug 1355683 lands, we see 2+ failures for every 20 retriggers on every push.

While the failure might have started much further in the past (possibly for a different reason), going from 100+ green runs to a 15%+ failure rate after Bug 1355683 makes it look like the root cause of this becoming more intermittent, or of it failing for a new reason.

Unless we are dealing with some build differences that are not related to the pushlog (clobbers, library ordering, etc.), this looks like a clear culprit.

:xidorn, as the patch author can you look into this, maybe with printfs on try? If not, can you find someone to take ownership of this?
Flags: needinfo?(xidorn+moz)
Could you do a try push with that commit backed out and see if the intermittent disappears?

Although that test seems completely unrelated to the patch, the patch may accelerate some code path and reveal a timing issue, but that seems really unlikely. And I don't really want to back out that bug, since that may get in the way of stylo.

Is that test important? Could we disable it on android opt? How is the test supposed to work? Why is it broken now?

froydnj, as the author of that test, do you have some idea about this issue?
Flags: needinfo?(xidorn+moz) → needinfo?(nfroyd)
This doesn't back out cleanly; when I have more time I could look into it, probably tomorrow.
I've started looking into this. It reproduces trivially locally.
(In reply to Timothy Nikkel (:tnikkel) from comment #17)
> I've started looking into this. It reproduces trivially locally.

Actually it doesn't reproduce locally so far, but I'm pushing to try.
(In reply to Xidorn Quan [:xidorn] UTC+10 (less responsive 15/Apr-3/May) from comment #14)
> Could you do a try push with that commit backed out and see if the
> intermittent disappears?
>
> froydnj, as the author of that test, do you have some idea about this issue?

I have no idea what's going on here, but I do see that Timothy is looking into the issue, which is great!

Redirecting the ni? to Joel so the request to do a try push is in his queue.
Flags: needinfo?(nfroyd) → needinfo?(jmaher)
Actually, I can't even reproduce this on try at all, even on a push with no changes to m-c.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2333879b213a0e3de25fec61ff4f89db95a1db96&group_state=expanded
This was happening between April 13 and April 21, but there have been no failures since. Accidentally fixed??
yeah, I see the same thing and there are no changes to the manifest file or test case.
Flags: needinfo?(jmaher)
Whiteboard: [gfx-noted][stockwell needswork] → [gfx-noted][stockwell unknown]
It likely has to do with how the mochitests get chunked together, because so far I've discovered that the problem happens when an image expires from the cache (after 25 seconds).
The problem is that damon.jpg (used in this test) is also used by other tests that run earlier. It expires from the cache 25 seconds after it is fetched (determined by what the server tells us). So there is nothing to stop one iframe from loading the image before those 25 seconds are up and the other iframe from loading it after they are, in which case we fetch the image again and create a new image container (in case the image changed).
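
A toy model of that race (illustrative only; the 25 second lifetime and the filename come from the description above, everything else is made up):

// Toy model of the race described above (not actual imagelib code).
// An entry fetched at t = 0 with a 25 second freshness lifetime is reused
// only while it is still fresh; once it expires, the image is re-fetched
// and a new container object is created.
const FRESHNESS_SECONDS = 25;

function getContainer(cache, url, now) {
  const entry = cache.get(url);
  if (entry && now < entry.fetchedAt + FRESHNESS_SECONDS) {
    return entry.container;                    // reuse the cached container
  }
  const container = { url, fetchedAt: now };   // "re-fetch": a new object
  cache.set(url, { fetchedAt: now, container });
  return container;
}

const cache = new Map();
getContainer(cache, "damon.jpg", 0);             // an earlier test warms the cache
const c1 = getContainer(cache, "damon.jpg", 24); // iframe A: entry still fresh
const c2 = getContainer(cache, "damon.jpg", 26); // iframe B: expired, re-fetched
console.log(c1 === c2); // false -> the "identical containers" check fails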
Flags: needinfo?(milan)
Assignee: nobody → tnikkel
Attachment #8862166 - Flags: review?(aosmond)
Attachment #8862166 - Flags: review?(aosmond) → review+
The other thing I learned from this is that the network cache considers cache items valid until after their expiration time (i.e. it only expires them when now > expiration), whereas the check in imagelib treats items as invalid once we reach the expiration time (i.e. now >= expiration). Or at least that's what it looked like from the logs; I don't know where in the network code this check happens.
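
If that read of the logs is right, the two sides disagree exactly at the boundary; a hypothetical illustration (not the real Necko or imagelib code):

// Hypothetical illustration of the mismatch described above. With an
// expiration time T, at now == T the two checks disagree about whether
// the cache entry is still usable.
function networkCacheIsExpired(now, expirationTime) {
  return now > expirationTime;   // entry still treated as valid at exactly T
}
function imagelibIsExpired(now, expirationTime) {
  return now >= expirationTime;  // entry already treated as expired at T
}

const T = 25;
console.log(networkCacheIsExpired(T, T)); // false
console.log(imagelibIsExpired(T, T));     // true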
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cf8d41c8b1f8
In test_bug1217571.html use an image that is only used in this test. r=aosmond
https://hg.mozilla.org/mozilla-central/rev/cf8d41c8b1f8
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Whiteboard: [gfx-noted][stockwell unknown] → [gfx-noted][stockwell fixed]