Closed Bug 1126299 Opened 9 years ago Closed 8 years ago

Intermittent browser_child_resource.js | uncaught exception - NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIMessageSender.sendAsyncMessage] at resource://gre/modules/PageThumbs.jsm:242

Categories

(Firefox :: New Tab Page, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 48
Tracking Status
e10s + ---
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: RyanVM, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [e10s-orangeblockers][disabled on linux 64 debug])

Attachments

(1 file)

18:13:38 INFO - 124 INFO TEST-PASS | netwerk/test/browser/browser_child_resource.js | Shouldn't resolve in main process
18:13:38 INFO - 125 INFO TEST-PASS | netwerk/test/browser/browser_child_resource.js | Shouldn't resolve in child process
18:13:38 INFO - 126 INFO Leaving test
18:13:38 INFO - 127 INFO Entering test
18:13:38 INFO - 128 INFO Waiting for load
18:13:38 INFO - 129 INFO Console message: [JavaScript Warning: "unsafe CPOW usage" {file: "resource://app/modules/sessionstore/TabState.jsm" line: 96}]
18:13:38 INFO - 130 INFO Console message: [JavaScript Warning: "unsafe CPOW usage" {file: "resource://app/modules/sessionstore/TabState.jsm" line: 96}]
18:13:38 INFO - 131 INFO Saw load
18:13:38 INFO - 132 INFO Set
18:13:38 INFO - 133 INFO Console message: [JavaScript Error: "The character encoding of the HTML document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the page must be declared in the document or in the transfer protocol." {file: "http://example.com/browser/netwerk/test/browser/dummy.html" line: 0}]
18:13:38 INFO - 134 INFO TEST-PASS | netwerk/test/browser/browser_child_resource.js | Should resolve in main process
18:13:38 INFO - 135 INFO TEST-PASS | netwerk/test/browser/browser_child_resource.js | Should resolve in child process
18:13:38 INFO - 136 INFO Waiting for AboutTabCrashedLoad
18:13:38 INFO - 137 INFO TEST-UNEXPECTED-FAIL | netwerk/test/browser/browser_child_resource.js | uncaught exception - NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIMessageSender.sendAsyncMessage] at resource://gre/modules/PageThumbs.jsm:242
18:13:38 INFO - Stack trace:
18:13:38 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:simpletestOnerror:1474
18:13:38 INFO - null:null:0
18:13:38 INFO - JavaScript error: resource://gre/modules/PageThumbs.jsm, line 242: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIMessageSender.sendAsyncMessage]
If you get to wondering how you went from around one failure per week to suddenly failing half the time on the tier-2 taskcluster runs, well, funny story involving a should-have-been-NPOTB push and a no-op for you DONTBUILD push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=b9dba72f9e97&group_state=expanded&filter-searchStr=f57cef87a778cb72d1e48c642bae87110164c4b4&tochange=96c92e9d6216
(In reply to Phil Ringnalda (:philor) from comment #26)
> If you get to wondering how you went from around one failure per week to
> suddenly failing half the time on the tier-2 taskcluster runs, well, funny
> story involving a should-have-been-NPOTB push and a no-op for you DONTBUILD
> push:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-
> inbound&fromchange=b9dba72f9e97&group_state=expanded&filter-
> searchStr=f57cef87a778cb72d1e48c642bae87110164c4b4&tochange=96c92e9d6216

I completely missed bc. Sorry.
I will disable them.
I disabled the jobs yesterday. We should not get any more OF notifications.
No more instances since Feb. 4th (4 days ago).
Armen, it seems this started appearing again in late February.  Any ideas?
Flags: needinfo?(armenzg)
Hi bkelly,
We re-enabled the jobs back on February 17th:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8f9f159cee2b704177973576215c4f7d83d0a90

From looking at brasstacks [0], I can see that TC accounts for 100 instances out of 116 (86%).

I would have looked at these, but I honestly confused this with another wpt intermittent bug, which we solved by making the TC instance larger.

These jobs run on the same AWS instance type as the Buildbot jobs [1][2] (m1.medium). However, we know that docker can add a bit of overhead, and the /tmp directory uses the aufs file system instead of ext4 (bug 1246947), which is slower.
We changed the mount point for ~/workspace and managed to cut the slowdown relative to Buildbot from ~30% to ~12%.
e10s bc jobs seem to run around 10% slower than on Buildbot [3].

What are dead CPOWs?

[0]
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1126299&startday=2016-02-22&endday=2016-03-06&tree=all

[1] Buildbot jobs:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=ubuntu%20x64%20debug%20mochitest-e10s-browser-chrome&group_state=expanded

[2] TaskCluster jobs
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=tc%20debug%20mochitest%20-browser-chrome%20e10s&group_state=expanded

[3] https://docs.google.com/spreadsheets/d/18OWl54b94Uda8AqdcHVFtydZwVlSUFhddJkip-2Iko8/edit#gid=0
Flags: needinfo?(armenzg)
:billm, I see that you have reviewed most of the changes to browser_child_resource.js:
https://hg.mozilla.org/mozilla-central/filelog/f0c0480732d36153e8839c7f17394d45f679f87d/netwerk/test/browser/browser_child_resource.js

Can you find an owner for this issue so we can get rid of one of our top intermittent bugs?
Flags: needinfo?(wmccloskey)
Whiteboard: [e10s-orangeblockers]
No response from :billm in 1 week; I am going to disable this test case for linux 64 debug e10s.
Comment on attachment 8732794 [details]
MozReview Request: Bug 1126299 - Intermittent browser_child_resource.js: disable test. r?ryanvm

Bill is on PTO through the end of the month. I'd suggest ni? someone else familiar with the test before disabling.
Attachment #8732794 - Flags: review?(ryanvm)
:mossop, can you help us figure out how to resolve this frequent intermittent? I see you had authored some patches for this test case.
Flags: needinfo?(wmccloskey) → needinfo?(dtownsend)
This looks like an issue in the thumbnail capturing code. Here it sets a timeout before trying to capture a browser's thumbnail: https://dxr.mozilla.org/mozilla-central/source/browser/base/content/browser-thumbnails.js#110. My guess is that between setting that timeout and the timeout firing, the test crashes the browser it refers to, so the code then attempts to send messages to a dead browser and throws. Jim seems to have done work around here so maybe he can help.
Flags: needinfo?(dtownsend) → needinfo?(jmathies)
Thanks for the insight :mossop!  I will wait for :jimm to weigh in here.
Mossop's description sounds right to me. We probably have some sort of tab crashed notification we could listen for here that would kill that timer.
Flags: needinfo?(jmathies)
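For readers following along, the race described above and the suggested fix (cancel the pending timer when the tab crashes) can be sketched roughly as follows. This is only an illustration under assumptions: `FakeBrowser`, `DelayedCapture`, `onTabCrashed` and the message name "Example:Capture" are hypothetical and are not taken from PageThumbs.jsm or browser-thumbnails.js, and the patch that actually landed may look quite different.

```javascript
// Hypothetical stand-in for a remote browser; sending to it after a crash throws,
// loosely mimicking the NS_ERROR_FAILURE from sendAsyncMessage in the log.
class FakeBrowser {
  constructor() {
    this.alive = true;
    this.sent = [];
  }
  crash() {
    this.alive = false;
  }
  sendAsyncMessage(name) {
    if (!this.alive) {
      throw new Error("NS_ERROR_FAILURE");
    }
    this.sent.push(name);
  }
}

// Hypothetical delayed-capture helper: one pending timer per browser.
class DelayedCapture {
  constructor(delayMs) {
    this.delayMs = delayMs;
    this.timers = new Map();
  }
  schedule(browser) {
    this.cancel(browser); // never keep two timers for the same browser
    const id = setTimeout(() => {
      this.timers.delete(browser);
      browser.sendAsyncMessage("Example:Capture");
    }, this.delayMs);
    this.timers.set(browser, id);
  }
  cancel(browser) {
    const id = this.timers.get(browser);
    if (id !== undefined) {
      clearTimeout(id);
      this.timers.delete(browser);
    }
  }
  // Called from whatever "tab crashed" notification is available (assumed to exist).
  onTabCrashed(browser) {
    this.cancel(browser);
  }
  hasPending(browser) {
    return this.timers.has(browser);
  }
}
```

Without the `onTabCrashed` hook, the timer callback would still run after the crash and call `sendAsyncMessage` on the dead browser, which is the uncaught exception the test harness reports.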
As a note, taskcluster linux64 debug seems to be the platform/build type that this failure is occurring on.

:RyanVM, I am not seeing traction on this bug. Can you own disabling this test or getting it fixed?
Flags: needinfo?(ryanvm)
Felipe and Blake are handling more of the test-specific issues for e10s. I'd be fine with disabling on Linux64 debug if we can't get the traction to fix whatever's broken here.
Flags: needinfo?(ryanvm) → needinfo?(felipc)
Flags: needinfo?(felipc)
Whiteboard: [e10s-orangeblockers] → [e10s-orangeblockers][disabled on linux 64 debug]
https://hg.mozilla.org/mozilla-central/rev/9eb806f21fb1
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 48