Intermittent image/test/mochitest/test_bug1217571.html | containers for identical images in different iframes should be identical

RESOLVED FIXED in Firefox 54

Status

Product: Core
Component: ImageLib
Priority: P5
Severity: normal
Status: RESOLVED FIXED
Reported: a year ago
Last modified: 2 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: tnikkel)

Tracking

Keywords: intermittent-failure
Version: unspecified
Target Milestone: mozilla55
Points: ---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 wontfix, firefox54 fixed, firefox55 fixed)

Details

(Whiteboard: [gfx-noted][stockwell fixed])

Attachments

(1 attachment)

(Reporter)

Description

a year ago
Filed by: wkocher [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=6965815&repo=autoland

https://queue.taskcluster.net/v1/task/HbFru0pOQ6iJ8bKiTcIUWQ/runs/0/artifacts/public/logs/live_backing.log
(Assignee)

Updated

a year ago
Whiteboard: [gfx-noted]
(Assignee)

Updated

a year ago
Priority: -- → P5

Updated

11 months ago
Duplicate of this bug: 1319312

Comment 2

10 months ago
6 failures in 749 pushes (0.008 failures/push) were associated with this bug in the last 7 days.  

Repository breakdown:
* mozilla-beta: 6

Platform breakdown:
* android-4-3-armv7-api15: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-01-23&endday=2017-01-29&tree=all

Comment 3

7 months ago
13 failures in 894 pushes (0.015 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 9
* mozilla-inbound: 3
* try: 1

Platform breakdown:
* android-4-3-armv7-api15: 13

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-04-10&endday=2017-04-16&tree=all

Comment 4

7 months ago
This has picked up; it looks like 30+ failures in the last week. It appears to be Android opt only... ni?'ing myself to get more info.
Flags: needinfo?(jmaher)
Whiteboard: [gfx-noted] → [gfx-noted][stockwell needswork]

Comment 5

7 months ago
Doing some retriggers here. I saw some done on the first instance, but it looks like this picked up almost 2 days later:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-18%20opt&tochange=3244e9568d540abe75bf836edba1eb7aba44718f&fromchange=2b1d378dc8b705aa9f44951acdcd428cad1797e3&selectedJob=91579237

There is really no information in the logs to help out :(

This is what I see:
[task 2017-04-14T22:01:21.413801Z] 22:01:21     INFO -  17 INFO TEST-START | image/test/mochitest/test_bug1217571.html
[task 2017-04-14T22:01:21.414232Z] 22:01:21     INFO -  18 INFO TEST-UNEXPECTED-FAIL | image/test/mochitest/test_bug1217571.html | containers for identical images in different iframes should be identical
[task 2017-04-14T22:01:21.414388Z] 22:01:21     INFO -      window.onload@image/test/mochitest/test_bug1217571.html:35:3
[task 2017-04-14T22:01:21.414900Z] 22:01:21     INFO -  19 INFO TEST-OK | image/test/mochitest/test_bug1217571.html | took 2841ms



This is where it fails in the test case:
https://dxr.mozilla.org/mozilla-central/source/image/test/mochitest/test_bug1217571.html?q=path%3Atest_bug1217571.html&redirect_type=single#36
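
For reference, the failing check is essentially an object-identity assertion on the two image containers. A minimal sketch of what that check does (a paraphrase, not the verbatim test source; the SpecialPowers/imgIRequest wiring below is an assumption):

// Sketch only: compare the container object behind each iframe's image.
function getContainer(iframe) {
  let img = iframe.contentDocument.querySelector("img");
  let request = SpecialPowers.wrap(img).getRequest(
    SpecialPowers.Ci.nsIImageLoadingContent.CURRENT_REQUEST);
  return request.image; // the container backing this request
}

window.onload = function() {
  let [frameA, frameB] = document.querySelectorAll("iframe");
  is(getContainer(frameA), getContainer(frameB),
     "containers for identical images in different iframes should be identical");
  SimpleTest.finish();
};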

Unfortunately I don't see any screenshots; this is possibly an issue on Android.

:milan, can you help find someone to look at this? I see :froydnj authored the test; I have him cc'd here as well. I suspect this isn't critical, but it's worth getting some eyes on it in the near future.
Flags: needinfo?(jmaher) → needinfo?(milan)

Comment 6

7 months ago
(In reply to Joel Maher ( :jmaher) from comment #5)
> Unfortunately I don't see any screenshots; this is possibly an issue on Android.

Right - screenshot-on-fail does not work on Android.

Comment 7

7 months ago
OK, going further back in history:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-18%20opt&tochange=3244e9568d540abe75bf836edba1eb7aba44718f&fromchange=0ac4960227da322feca7798925e11286c9cce5dd&selectedJob=91416929

I will follow up in a couple of hours if there is a clear push.

Comment 8

7 months ago
And a root cause:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-18%20opt&tochange=3244e9568d540abe75bf836edba1eb7aba44718f&fromchange=2a494e1f39e78f9bb25681a3b6616c6697e48685

bug 1355683:
https://hg.mozilla.org/integration/autoland/rev/94b5ea8bed5caa0dbea578a33e2b900264267846

:xidorn, can you look at this? It looks as if your patch caused this regression.
Flags: needinfo?(xidorn+moz)

Comment 9

7 months ago
I don't believe my patch triggers this intermittent. The first orange classified for this on the dashboard is https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5e1026baaf23d0c2c96174d831e7d7f075bd3761&filter-searchStr=mochitest-18%20opt, which landed 15 hours earlier than my patch. Also, there is a merge just before my patch, so the intermittent may just come from there.
Flags: needinfo?(xidorn+moz)

Comment 10

7 months ago
I don't see any way the patch for bug 1355683 could be relevant here. None of the code it touches is even used in the test case in question.

Comment 11

7 months ago
The earliest failure I can see is https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5e1026baaf23d0c2c96174d831e7d7f075bd3761, but it happens so infrequently on that push that finding a narrow regression range may not be practical.
(Assignee)

Comment 12

7 months ago
There haven't been many patches pushed recently that I would expect to affect this, so even from a somewhat large regression range it might be possible to pick out a patch that caused this.

That said, it's probably not hard to figure this out via printfs and pushing to try if it reproduces with a reasonable number of re-triggers.

Comment 13

7 months ago
I still stand by my assertion that the root cause is in this range:
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mochitest-18%20opt&tochange=3244e9568d540abe75bf836edba1eb7aba44718f&fromchange=2a494e1f39e78f9bb25681a3b6616c6697e48685

For the 5 revisions prior I have 20 retriggers with no failures; then, after bug 1355683 lands, we see 2+ failures per 20 retriggers on every push.

While the failure might have started much further in the past (possibly for a different reason), 100+ green runs followed by a 15%+ failure rate after bug 1355683 landed makes it look like the root cause of this becoming more intermittent, or of failing for a new reason.

Unless we are dealing with some build differences that are not related to the pushlog (clobbers, library ordering, etc.), this looks like a clear culprit.

:xidorn, as the patch author, can you look into this, maybe with printfs on try? If not, can you find someone to take ownership of this?
Flags: needinfo?(xidorn+moz)

Comment 14

7 months ago
Could you do a try push with that commit backed out and see if the intermittent disappears?

Although that test seems completely unrelated to the patch, the patch may accelerate some code path and thereby reveal a timing issue, but that seems really unlikely. And I don't really want to back that bug out, since that may get in the way of stylo.

Is that test important? Could we disable it on Android opt? How is the test supposed to work? Why is it broken now?

froydnj, as the author of that test, do you have some idea about this issue?
Flags: needinfo?(xidorn+moz) → needinfo?(nfroyd)

Comment 15

7 months ago
43 failures in 817 pushes (0.053 failures/push) were associated with this bug in the last 7 days. 

This is the #19 most frequent failure this week.  

** This failure happened more than 30 times this week! Resolving this bug is a high priority. **

** Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled. ** 

Repository breakdown:
* mozilla-inbound: 13
* autoland: 13
* graphics: 8
* try: 5
* mozilla-central: 4

Platform breakdown:
* android-4-3-armv7-api15: 42
* osx-10-10: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-04-17&endday=2017-04-23&tree=all

Comment 16

7 months ago
This doesn't back out cleanly; when I have more time I could look into it, probably tomorrow.
(Assignee)

Comment 17

7 months ago
I've started looking into this. It reproduces trivially locally.
(Assignee)

Comment 18

7 months ago
(In reply to Timothy Nikkel (:tnikkel) from comment #17)
> I've started looking into this. It reproduces trivially locally.

Actually it doesn't reproduce locally so far, but I'm pushing to try.

Comment 19

7 months ago
(In reply to Xidorn Quan [:xidorn] UTC+10 (less responsive 15/Apr-3/May) from comment #14)
> Could you do a try push with that commit backed out and see if the
> intermittent disappears?
>
> froydnj, as the author of that test, do you have some idea about this issue?

I have no idea what's going on here, but I do see that Timothy is looking into the issue, which is great!

Redirecting the ni? to Joel so the request to do a try push is in his queue.
Flags: needinfo?(nfroyd) → needinfo?(jmaher)
(Assignee)

Comment 20

7 months ago
Actually, I can't even reproduce this on try at all, even on a push with no changes to m-c.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2333879b213a0e3de25fec61ff4f89db95a1db96&group_state=expanded

Comment 21

7 months ago
This was happening between April 13 and April 21, but there have been no failures since. Accidentally fixed?

Comment 22

7 months ago
Yeah, I see the same thing, and there are no changes to the manifest file or the test case.
Flags: needinfo?(jmaher)
Whiteboard: [gfx-noted][stockwell needswork] → [gfx-noted][stockwell unknown]
(Assignee)

Comment 23

7 months ago
It likely has to do with how the mochitests get chunked together, because so far I've discovered that the problem happens when an image expires from the cache (after 25 seconds).
(Assignee)

Comment 24

7 months ago
The problem is that damon.jpg (used in this test) is also used by other tests that run earlier. It expires from the cache 25 seconds after being fetched (determined by what the server tells us). So there is nothing to stop us from loading the image before the 25-second mark in one iframe and after it in the other; in the latter case we fetch the image again and create a new image container (in case it changed on the server).
Flags: needinfo?(milan)
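
To make the race concrete, here is a toy model in plain JavaScript (not Gecko code) of the behavior described in comment 24: two lookups that straddle the 25-second expiration boundary get different container objects, which is exactly the identity the test asserts on.

// Toy cache: entries expire maxAgeMs after they were fetched.
function makeImageCache(maxAgeMs) {
  const entries = new Map();
  return {
    get(url, now) {
      const entry = entries.get(url);
      if (entry && now < entry.expires) {
        return entry.container;                  // still fresh: reuse it
      }
      const container = { url, fetchedAt: now }; // "re-fetch": a new object
      entries.set(url, { container, expires: now + maxAgeMs });
      return container;
    },
  };
}

const cache = makeImageCache(25000);
cache.get("damon.jpg", 0);                   // an earlier test seeds the cache
const a = cache.get("damon.jpg", 24000);     // iframe A: cache hit, old container
const b = cache.get("damon.jpg", 26000);     // iframe B: expired, new container
console.log(a === b);                        // false -> the identity check fails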
(Assignee)

Comment 25

7 months ago
Created attachment 8862166 [details] [diff] [review]
use an image only in this test
Assignee: nobody → tnikkel
Attachment #8862166 - Flags: review?(aosmond)
Attachment #8862166 - Flags: review?(aosmond) → review+
(Assignee)

Comment 26

7 months ago
The other thing I learned from this is that the network cache considers cache items valid until strictly after their expiration time (i.e., expired when now > expiration). The check in imagelib treats items as invalid once we reach the expiration time (i.e., now >= expiration). Or at least that's what it looked like from the logs; I don't know where in the network code this check happens.
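
Expressed as predicates (illustrative only; this paraphrases the description above rather than quoting either codebase), the disagreement is an off-by-one at the boundary instant:

// At now === expirationTime the network cache still says "valid" while
// imagelib already says "expired", so the two layers disagree there.
const networkCacheExpired = (now, expirationTime) => now > expirationTime;
const imagelibExpired = (now, expirationTime) => now >= expirationTime;

console.log(networkCacheExpired(25, 25)); // false: still valid
console.log(imagelibExpired(25, 25));     // true: already expired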

Comment 27

7 months ago
26 failures in 883 pushes (0.029 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 9
* try: 7
* mozilla-inbound: 5
* mozilla-central: 3
* graphics: 2

Platform breakdown:
* android-4-3-armv7-api15: 26

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-04-24&endday=2017-04-30&tree=all

Comment 28

7 months ago
16 failures in 146 pushes (0.11 failures/push) were associated with this bug yesterday.   

Repository breakdown:
* autoland: 7
* try: 4
* mozilla-inbound: 4
* graphics: 1

Platform breakdown:
* android-4-3-armv7-api15: 16

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-05-02&endday=2017-05-02&tree=all

Comment 29

7 months ago
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cf8d41c8b1f8
In test_bug1217571.html use an image that is only used in this test. r=aosmond

Comment 30

7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/cf8d41c8b1f8
Status: NEW → RESOLVED
Last Resolved: 7 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Whiteboard: [gfx-noted][stockwell unknown] → [gfx-noted][stockwell fixed]
status-firefox53: --- → wontfix
status-firefox54: --- → affected
status-firefox-esr52: --- → unaffected

Comment 31

7 months ago
https://hg.mozilla.org/releases/mozilla-beta/rev/c7d36791f6bc
status-firefox54: affected → fixed

Comment 32

7 months ago
20 failures in 770 pushes (0.026 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* try: 7
* autoland: 7
* mozilla-inbound: 4
* graphics: 2

Platform breakdown:
* android-4-3-armv7-api15: 20

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-05-01&endday=2017-05-07&tree=all

Comment 33

3 months ago
1 failure in 908 pushes (0.001 failures/push) was associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-inbound: 1

Platform breakdown:
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-08-21&endday=2017-08-27&tree=all

Comment 34

2 months ago
10 failures in 924 pushes (0.011 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 10

Platform breakdown:
* linux32: 7
* osx-10-10: 2
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1319283&startday=2017-09-04&endday=2017-09-10&tree=all