Intermittent extensions/cookie/test/browser_test_favicon.js | Test timed out -

NEW
Unassigned

Status

()

defect
P2
normal
2 years ago
4 months ago

People

(Reporter: intermittent-bug-filer, Unassigned)

Tracking

({intermittent-failure, leave-open, stalled})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [necko-triaged][stockwell unknown])

Attachments

(1 attachment, 1 obsolete attachment)

The issue first happened at 25 Sep 2017.  Kershaw, this seems not related to bug 1247843 because that patch was not uplifted to beta.  Ni you just in case you have some idea.
Flags: needinfo?(kechang)
(In reply to Shian-Yow Wu [:swu] from comment #3)
> The issue first happened at 25 Sep 2017.  Kershaw, this seems not related to
> bug 1247843 because that patch was not uplifted to beta.  Ni you just in
> case you have some idea.

Thanks!

This seems to have nothing to do with bug 1247843, but I'll keep this on watchlist.
Flags: needinfo?(kechang)
Honza, can this be influence with something you landed that delays some requests? I do nod remember when all pieces landed.
Flags: needinfo?(honzab.moz)
(In reply to Dragana Damjanovic [:dragana] from comment #6)
> Honza, can this be influence with something you landed that delays some
> requests? I do nod remember when all pieces landed.

The favicon regression that was fixed with bug 1401538 is on beta (uplift) since 2017-10-12.  So, it was  not the cause since comment 5 shows this test is still failing.

Bug 1358060 has landed on 2017-08-31 on m-c (57 that time).

Bug 1247843 has landed on 2017-09-22.

But this intermittent started and is steadily happening every day since 2017-09-25 on 58 m-c and on 2017-09-29 on 57 m-b, hence it has to be something else.  Note that 2017-09-25 was the 57 m-c -> m-b merge day.



I found following known to me bugs landed during that period that were on the repo at the time the failure was first observed:

https://bugzilla.mozilla.org/show_bug.cgi?id=1360603
https://bugzilla.mozilla.org/show_bug.cgi?id=1400001

Could this be bug 1360603?  Landed on m-c (58) on 2017-09-25 and was uplifted to beta (57) on 2017-09-27.  But it's so negligible change that I doubt that...
Flags: needinfo?(honzab.moz)
In the last 7 days there were 42 failure. Most of the failures occur on Linux but there are also a few occurrences on OS X.
The build type on which the failures mostly occur are opt, pgo and one asan.
Test suite: mochitest-browser-chrome-e10s.

This is an example of a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=140310315&lineNumber=4345

Here is a part of a log:
[task 2017-10-27T20:51:40.268Z] 20:51:40     INFO - checking window state
4342
[task 2017-10-27T20:51:40.297Z] 20:51:40     INFO - TEST-START | extensions/cookie/test/browser_test_favicon.js
4343
[task 2017-10-27T20:52:25.299Z] 20:52:25     INFO - TEST-INFO | started process screentopng
4344
[task 2017-10-27T20:52:25.928Z] 20:52:25     INFO - TEST-INFO | screentopng: exit 0
4345
[task 2017-10-27T20:52:25.928Z] 20:52:25     INFO - TEST-UNEXPECTED-FAIL | extensions/cookie/test/browser_test_favicon.js | Test timed out - 
4346
[task 2017-10-27T20:52:25.928Z] 20:52:25     INFO - GECKO(4576) | MEMORY STAT | vsize 2330MB | residentFast 235MB | heapAllocated 83MB
4347
[task 2017-10-27T20:52:25.928Z] 20:52:25     INFO - TEST-OK | extensions/cookie/test/browser_test_favicon.js | took 45021ms
4348
[task 2017-10-27T20:52:25.928Z] 20:52:25     INFO - checking window state

:jduell could you please take a look?
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [stockwell needswork]
Priority: P5 → P2
Whiteboard: [stockwell needswork] → [stockwell needswork][necko-triaged]
swu: feel free to bounce back to me if you can't find someone.
Flags: needinfo?(jduell.mcbugs) → needinfo?(swu)
I'd like to take this, since I might have a solution.
Flags: needinfo?(swu)
Posted patch Add cleanup function in test (obsolete) — Splinter Review
I think this bug might be caused by not doing cleanup before starting loading a favicon. So, I just add a cleanup function in the test case and call it before starting the test.
It looks like the try result [1] agrees with my theory. 

@Marco, could you take a look at this patch? Thanks.




[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6892553491fe287e1a88efad58eb23e218e5d5f&group_state=expanded
Assignee: nobody → kechang
Status: NEW → ASSIGNED
Attachment #8924630 - Flags: review?(mak77)
Comment on attachment 8924630 [details] [diff] [review]
Add cleanup function in test

Review of attachment 8924630 [details] [diff] [review]:
-----------------------------------------------------------------

::: extensions/cookie/test/browser_test_favicon.js
@@ +32,5 @@
> +  await clearAllPlacesFavicons();
> +  // Clear all image caches and network caches.
> +  clearAllImageCaches();
> +  // Clear Places history.
> +  await PlacesUtils.history.clear();

Off-hand some of these look pointless.
For example setIcon has no relation with Places, so both clearing history and clearing Places icons should have no effect.
The other thing that seems unlikely are the cache and image cache, this image is only loaded by this single test, and we create a new profile per each folder, so why should those matter?

I'd suggest to try just clearing the cookies, would it have an effect?

Another possibility is that the effect of all of these cleanups in in reality is just to space this test from the previous ones, and then it's not really about cleanups, but it's about inter-test interactions...

In a few words, even if this appears to work, we should try to understand why.
Attachment #8924630 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #14)
> Comment on attachment 8924630 [details] [diff] [review]
> Add cleanup function in test
> 
> Review of attachment 8924630 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: extensions/cookie/test/browser_test_favicon.js
> @@ +32,5 @@
> > +  await clearAllPlacesFavicons();
> > +  // Clear all image caches and network caches.
> > +  clearAllImageCaches();
> > +  // Clear Places history.
> > +  await PlacesUtils.history.clear();
> 
> Off-hand some of these look pointless.
> For example setIcon has no relation with Places, so both clearing history
> and clearing Places icons should have no effect.
> The other thing that seems unlikely are the cache and image cache, this
> image is only loaded by this single test, and we create a new profile per
> each folder, so why should those matter?
> 
> I'd suggest to try just clearing the cookies, would it have an effect?
> 
> Another possibility is that the effect of all of these cleanups in in
> reality is just to space this test from the previous ones, and then it's not
> really about cleanups, but it's about inter-test interactions...
> 
> In a few words, even if this appears to work, we should try to understand
> why.

Thanks for the feedback. I agree with you that we have to figure out the root cause.
After some experiments, it looks like that none of clearing Places icons, cookies, and caches works.

However, I found that the http channel used to load the favicon is not created in the failed test case. Please search the "nsHttpChannel::AsyncOpen" in the passed log [1] and failed log [2] and you can see this.

It looks like this is not a networking issue, since the http channel is not even created.
Tentatively change the component to tabbed browser.


[1] https://public-artifacts.taskcluster.net/XvaFBHF4QsmWNoEPiXjRDA/0/public/logs/live_backing.log
[2] https://public-artifacts.taskcluster.net/FK8TOpJETgaMama2xXhs_A/0/public/logs/live_backing.log
Assignee: kechang → nobody
Status: ASSIGNED → NEW
Component: Networking: Cookies → Tabbed Browser
Flags: needinfo?(mak77)
Product: Core → Firefox
Attachment #8924630 - Attachment is obsolete: true
(In reply to Kershaw Chang [:kershaw] from comment #16)
> Thanks for the feedback. I agree with you that we have to figure out the
> root cause.
> After some experiments, it looks like that none of clearing Places icons,
> cookies, and caches works.

Does this mean the previous green Try build was just by luck?

> However, I found that the http channel used to load the favicon is not
> created in the failed test case. Please search the
> "nsHttpChannel::AsyncOpen" in the passed log [1] and failed log [2] and you
> can see this.

The only thing that setIcon does to start the network load is aTab.setAttribute("image", sizedIconUrl);
I wonder if the problem is that we call gBrowser.setIcon too early. The test harness opens a tab and starts the test, maybe we setIcon and then the default tab setIcon call runs.

I can try doing a simple changes to the test (open a new tab and wait for its load) and see how it reacts.
Assignee: nobody → mak77
Status: NEW → ASSIGNED
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #17)
> (In reply to Kershaw Chang [:kershaw] from comment #16)
> > Thanks for the feedback. I agree with you that we have to figure out the
> > root cause.
> > After some experiments, it looks like that none of clearing Places icons,
> > cookies, and caches works.
> 
> Does this mean the previous green Try build was just by luck?
> 

Yes, it was just by luck. :(

> > However, I found that the http channel used to load the favicon is not
> > created in the failed test case. Please search the
> > "nsHttpChannel::AsyncOpen" in the passed log [1] and failed log [2] and you
> > can see this.
> 
> The only thing that setIcon does to start the network load is
> aTab.setAttribute("image", sizedIconUrl);
> I wonder if the problem is that we call gBrowser.setIcon too early. The test
> harness opens a tab and starts the test, maybe we setIcon and then the
> default tab setIcon call runs.
> 
> I can try doing a simple changes to the test (open a new tab and wait for
> its load) and see how it reacts.

Unfortunately, I still see a test failure from the link you provided. I see two "SETTING ATTRIBUTE" both in success and failure logs, so the problem must be at somewhere else.
Even if the attribute is written, in the timeout screenshot the tab has no icon, so I still suspect the 2 set attribute may be happening inverted.
Today I may not have time for further experiments, I'll look in the next days, in the worst case we could take the patch as a first step to reduce the number of failures but leave the bug open.
(In reply to Marco Bonardo [::mak] from comment #23)
> Even if the attribute is written, in the timeout screenshot the tab has no
> icon, so I still suspect the 2 set attribute may be happening inverted.

FWIW, the test is finished before the loading of favicon completed. So, it is normal without seeing the icon in the screenshot.

> Today I may not have time for further experiments, I'll look in the next
> days, in the worst case we could take the patch as a first step to reduce
> the number of failures but leave the bug open.

It seems that your change reduces the failure rate significantly. Besides the one time failure mentioned in comment 22, I can't reproduce this anymore on try server.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=37e1ca3146ca362fe33dad294da75b9b91a75bc4&group_state=expanded
Comment on attachment 8925698 [details]
Bug 1403009 - Intermittent failure in browser_test_favicon.js.

https://reviewboard.mozilla.org/r/196824/#review202780

r+ for reducing the failure rate.
Attachment #8925698 - Flags: review?(kechang) → review+
Pushed by mak77@bonardo.net:
https://hg.mozilla.org/integration/autoland/rev/9d06816dccb8
Intermittent failure in browser_test_favicon.js. r=kershaw
looking at the orange factor, the number of failures largely reduced on 9th of November, so the partial fix helped. The report in comment 28 is mostly for the previous period.
That said, the bug still hits us, but I'm lacking ideas atm.
I'm unassigning to indicate that I'm not immediately working on it, so that anyone with an idea could chime in.
Assignee: mak77 → nobody
Status: ASSIGNED → NEW
There are 27 failures in the last 7 days.
All of them are on Windows - debug:
- windows7-32 -> 11 failures 
- windows10-64 -> 16 failures

Recent failure: https://treeherder.mozilla.org/logviewer.html#?job_id=183062259&repo=autoland&lineNumber=20311

18:44:35     INFO -  852 INFO TEST-START | extensions/cookie/test/browser_test_favicon.js
18:44:35     INFO -  GECKO(5764) | Chrome file doesn't exist: Z:\task_1528912501\build\tests\mochitest\browser\extensions\cookie\test\head.js
18:44:35     INFO -  GECKO(5764) | ++DOCSHELL 09947800 == 3 [pid = 5924] [id = {0ba73d1c-2a4a-496b-98b0-999ecbaac6d4}]
18:44:35     INFO -  GECKO(5764) | ++DOMWINDOW == 6 (059A77A0) [pid = 5924] [serial = 6] [outer = 00000000]
18:44:35     INFO -  GECKO(5764) | ++DOMWINDOW == 7 (09949000) [pid = 5924] [serial = 7] [outer = 059A77A0]
18:44:35     INFO -  GECKO(5764) | [Child 5924, Main Thread] WARNING: site security information will not be persisted: file z:/build/build/src/security/manager/ssl/nsSiteSecurityService.cpp, line 553
18:44:35     INFO -  GECKO(5764) | [Child 3996, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file z:/build/build/src/xpcom/base/nsSystemInfo.cpp, line 119
18:44:35     INFO -  GECKO(5764) | ++DOCSHELL 05523800 == 1 [pid = 3996] [id = {3207a8cd-7600-409b-9720-280c08252dc9}]
18:44:35     INFO -  GECKO(5764) | [Child 3996, Main Thread] WARNING: 'NS_FAILED(rv)', file z:/build/build/src/widget/windows/nsLookAndFeel.cpp, line 847
18:44:35     INFO -  GECKO(5764) | ++DOMWINDOW == 1 (055BF2E0) [pid = 3996] [serial = 1] [outer = 00000000]
18:44:35     INFO -  GECKO(5764) | ++DOMWINDOW == 8 (0994BC00) [pid = 5924] [serial = 8] [outer = 059A77A0]
18:44:35     INFO -  GECKO(5764) | ++DOMWINDOW == 2 (0A2E8800) [pid = 3996] [serial = 2] [outer = 055BF2E0]
18:44:35     INFO -  GECKO(5764) | [Child 3996, Main Thread] WARNING: 'NS_FAILED(GetAccentColor(unused))', file z:/build/build/src/widget/windows/nsLookAndFeel.cpp, line 473
18:44:35     INFO -  GECKO(5764) | [Parent 5764, Main Thread] WARNING: NS_ENSURE_TRUE(aSecondURI) failed: file z:/build/build/src/dom/base/ThirdPartyUtil.cpp, line 98
18:44:39     INFO -  GECKO(5764) | --DOCSHELL 16402C00 == 6 [pid = 5764] [id = {d5ee518e-37df-4e83-ac3c-e31a52a3e38c}]
18:44:39     INFO -  GECKO(5764) | --DOCSHELL 1B135000 == 5 [pid = 5764] [id = {1e9529aa-87ed-49b1-9f0c-64ee237c571c}]
18:44:40     INFO -  GECKO(5764) | --DOCSHELL 05D0F400 == 1 [pid = 5752] [id = {d471b857-a37c-4ca7-ad96-f8a5d40ea793}]
18:44:41     INFO -  GECKO(5764) | --DOCSHELL 0590F400 == 2 [pid = 5924] [id = {740c7727-b498-4f5f-8c3b-fa253697daee}]
18:44:41     INFO -  GECKO(5764) | --DOCSHELL 09944400 == 1 [pid = 5924] [id = {d7366732-6acc-45ec-8b70-bef2730ad2fc}]
18:44:44     INFO -  GECKO(5764) | --DOMWINDOW == 6 (15716C00) [pid = 5752] [serial = 6] [outer = 00000000] [url = about:blank]
18:44:44     INFO -  GECKO(5764) | --DOMWINDOW == 5 (05D10C00) [pid = 5752] [serial = 2] [outer = 00000000] [url = about:blank]
18:44:46     INFO -  GECKO(5764) | --DOMWINDOW == 7 (05910C00) [pid = 5924] [serial = 2] [outer = 00000000] [url = about:blank]
18:44:46     INFO -  GECKO(5764) | --DOMWINDOW == 6 (09949000) [pid = 5924] [serial = 7] [outer = 00000000] [url = about:blank]
18:44:46     INFO -  GECKO(5764) | --DOMWINDOW == 16 (1640F400) [pid = 5764] [serial = 4] [outer = 00000000] [url = about:blank]
18:44:46     INFO -  GECKO(5764) | --DOMWINDOW == 15 (1F705000) [pid = 5764] [serial = 13] [outer = 00000000] [url = about:blank]
18:44:46     INFO -  GECKO(5764) | --DOMWINDOW == 14 (1F706800) [pid = 5764] [serial = 14] [outer = 00000000] [url = about:blank]
18:44:46     INFO -  GECKO(5764) | --DOMWINDOW == 13 (0ADEAFF0) [pid = 5764] [serial = 9] [outer = 00000000] [url = about:blank]
18:44:47     INFO -  GECKO(5764) | --DOMWINDOW == 2 (05720C00) [pid = 6112] [serial = 2] [outer = 00000000] [url = about:blank]
18:44:48     INFO -  GECKO(5764) | --DOMWINDOW == 4 (05D992E0) [pid = 5752] [serial = 1] [outer = 00000000] [url = about:blank]
18:44:49     INFO -  GECKO(5764) | --DOMWINDOW == 5 (059A7670) [pid = 5924] [serial = 4] [outer = 00000000] [url = about:blank]
18:44:49     INFO -  GECKO(5764) | --DOMWINDOW == 4 (059A72E0) [pid = 5924] [serial = 1] [outer = 00000000] [url = chrome://gfxsanity/content/sanitytest.html]
18:44:50     INFO -  GECKO(5764) | --DOMWINDOW == 2 (05BD3400) [pid = 5660] [serial = 2] [outer = 00000000] [url = about:blank]
18:44:50     INFO -  GECKO(5764) | --DOMWINDOW == 1 (057B92E0) [pid = 6112] [serial = 1] [outer = 00000000] [url = view-source:http://example.com/]
18:44:50     INFO -  GECKO(5764) | --DOMWINDOW == 12 (1C8E4000) [pid = 5764] [serial = 10] [outer = 00000000] [url = about:blank]
18:44:51     INFO -  GECKO(5764) | --DOMWINDOW == 3 (0999B400) [pid = 5924] [serial = 3] [outer = 00000000] [url = chrome://gfxsanity/content/sanitytest.html]
18:44:51     INFO -  GECKO(5764) | --DOMWINDOW == 2 (09946C00) [pid = 5924] [serial = 5] [outer = 00000000] [url = about:blank]
18:44:52     INFO -  GECKO(5764) | --DOMWINDOW == 3 (138AAC00) [pid = 5752] [serial = 4] [outer = 00000000] [url = about:blank]
18:44:52     INFO -  GECKO(5764) | --DOMWINDOW == 2 (05D0F800) [pid = 5752] [serial = 3] [outer = 00000000] [url = about:blank]
18:44:54     INFO -  GECKO(5764) | [Parent 5764, StreamTrans #17] WARNING: 'NS_FAILED(rv)', file z:/build/build/src/modules/libjar/nsJARChannel.cpp, line 417
18:44:54     INFO -  GECKO(5764) | --DOMWINDOW == 0 (0A6F2800) [pid = 6112] [serial = 3] [outer = 00000000] [url = view-source:http://example.com/]
18:44:58     INFO -  GECKO(5764) | --DOCSHELL 0571E000 == 0 [pid = 6112] [id = {5e5cf990-f94e-4e5a-b2d9-d8eb612fbd64}]
18:46:05     INFO -  TEST-INFO | started process screenshot
18:46:05     INFO -  TEST-INFO | screenshot: exit 0
18:46:05     INFO -  Buffered messages logged at 18:44:35
18:46:05     INFO -  853 INFO Entering test bound
18:46:05     INFO -  854 INFO Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 170}]
18:46:05     INFO -  Buffered messages finished
18:46:05    ERROR -  855 INFO TEST-UNEXPECTED-FAIL | extensions/cookie/test/browser_test_favicon.js | Test timed out -
18:46:05     INFO -  GECKO(5764) | MEMORY STAT | vsize 714MB | vsizeMaxContiguous 752MB | residentFast 203MB | heapAllocated 72MB
18:46:05     INFO -  856 INFO TEST-OK | extensions/cookie/test/browser_test_favicon.js | took 90114ms

:mak, can you take a look?
Flags: needinfo?(mak77)
I don't have a clear idea about this failure, and previous investigation didn't show anything obvious.

The only thing we may look into is the failure increase range, it looks like it increased between 31st of May and the 2nd of June, I'll check what landed in that timeframe.

We could also just disable this test on Windows-debug to lower the failures frequency.

Adding Kershaw who also did some network investigation here, in case he has further ideas to evaluate.
Flags: needinfo?(kershaw)
Update:
In the last 7 days, there have been 33 failures.
All of them occur on the debug build type and on the following platforms:
- windows7-32 - 19
- windows10-64 - 12
- osx-10-10 - 2

Here is a recent relevant log file and a snippet with the failure:
https://treeherder.mozilla.org/logviewer.html#?job_id=185176768&repo=mozilla-inbound&lineNumber=25884

14:54:01     INFO -  TEST-INFO | started process screenshot
14:54:01     INFO -  TEST-INFO | screenshot: exit 0
14:54:01     INFO -  Buffered messages logged at 14:52:31
14:54:01     INFO -  692 INFO Entering test bound
14:54:01     INFO -  Buffered messages logged at 14:52:32
14:54:01     INFO -  693 INFO Console message: [JavaScript Warning: "Use of nsIFile in content process is deprecated." {file: "resource://gre/modules/FileUtils.jsm" line: 170}]
14:54:01     INFO -  Buffered messages finished
14:54:01    ERROR -  694 INFO TEST-UNEXPECTED-FAIL | extensions/cookie/test/browser_test_favicon.js | Test timed out -
14:54:01     INFO -  GECKO(5720) | MEMORY STAT | vsize 729MB | vsizeMaxContiguous 512MB | residentFast 203MB | heapAllocated 72MB
14:54:01     INFO -  695 INFO TEST-OK | extensions/cookie/test/browser_test_favicon.js | took 90116ms
14:54:01     INFO -  GECKO(5720) | ++DOCSHELL 05A0C000 == 2 [pid = 3996] [id = {44ea01d0-569d-4e49-bc45-161a5f751365}]
14:54:01     INFO -  GECKO(5720) | ++DOMWINDOW == 3 (013CF410) [pid = 3996] [serial = 9] [outer = 00000000]
14:54:01     INFO -  GECKO(5720) | ++DOMWINDOW == 4 (0A535000) [pid = 3996] [serial = 10] [outer = 013CF410]
14:54:01     INFO -  696 INFO checking window state
(In reply to Marco Bonardo [::mak] from comment #57)
> I don't have a clear idea about this failure, and previous investigation
> didn't show anything obvious.
> 
> The only thing we may look into is the failure increase range, it looks like
> it increased between 31st of May and the 2nd of June, I'll check what landed
> in that timeframe.
> 
> We could also just disable this test on Windows-debug to lower the failures
> frequency.
> 
> Adding Kershaw who also did some network investigation here, in case he has
> further ideas to evaluate.

Sorry that I don't have any idea now.

It seems that the failure rate is pretty low recently. I'll look into this when this is more reproducible.
Flags: needinfo?(kershaw)
Flags: needinfo?(mak77)
Keywords: stalled
See Also: → 1548167
Duplicate of this bug: 1548167
You need to log in before you can comment on or make changes to this bug.