Closed Bug 1126299 Opened 6 years ago Closed 5 years ago
Intermittent browser_child_resource.js | uncaught exception - NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIMessageSender.sendAsyncMessage] at resource://gre/modules/PageThumbs.jsm:242
If you get to wondering how you went from around one failure per week to suddenly failing half the time on the tier-2 taskcluster runs, well, funny story involving a should-have-been-NPOTB push and a no-op for you DONTBUILD push: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=b9dba72f9e97&group_state=expanded&filter-searchStr=f57cef87a778cb72d1e48c642bae87110164c4b4&tochange=96c92e9d6216
(In reply to Phil Ringnalda (:philor) from comment #26)
> If you get to wondering how you went from around one failure per week to
> suddenly failing half the time on the tier-2 taskcluster runs, well, funny
> story involving a should-have-been-NPOTB push and a no-op-for-you DONTBUILD push:
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=b9dba72f9e97&group_state=expanded&filter-searchStr=f57cef87a778cb72d1e48c642bae87110164c4b4&tochange=96c92e9d6216

I completely missed bc. Sorry. I will disable them.
I disabled the jobs yesterday. We should not get any more OF (OrangeFactor) notifications.
No more instances since Feb. 4th (4 days ago).
Armen, it seems this started appearing again in late February. Any ideas?
Hi bkelly,

We re-enabled the jobs on February 17th:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8f9f159cee2b704177973576215c4f7d83d0a90

From looking at brasstacks, I can see that TC accounts for 100 instances out of 116 (86%). I would have looked at these sooner, but I honestly confused this with another wpt intermittent bug, which we solved by making the TC instance larger.

These jobs run on the same AWS instance type as the Buildbot jobs (m1.medium); however, we know that Docker adds a bit of overhead, plus the /tmp directory is using the aufs filesystem instead of ext4 (bug 1246947), which is slower. We changed the mount point for ~/workspace and reduced the run-time slowdown from ~30% to ~12% (compared to Buildbot). e10s bc jobs seem to run around 10% slower than on Buildbot.

What are dead CPOWs?

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1126299&startday=2016-02-22&endday=2016-03-06&tree=all

Buildbot jobs:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=ubuntu%20x64%20debug%20mochitest-e10s-browser-chrome&group_state=expanded

TaskCluster jobs:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=tc%20debug%20mochitest%20-browser-chrome%20e10s&group_state=expanded

https://docs.google.com/spreadsheets/d/18OWl54b94Uda8AqdcHVFtydZwVlSUFhddJkip-2Iko8/edit#gid=0
:billm, I see that you have reviewed most of the changes to browser_child_resource.js: https://hg.mozilla.org/mozilla-central/filelog/f0c0480732d36153e8839c7f17394d45f679f87d/netwerk/test/browser/browser_child_resource.js Can you find an owner for this issue so we can get rid of one of our top intermittent bugs?
No response from :billm in 1 week; I am going to disable this test case for Linux 64 debug e10s.
Review commit: https://reviewboard.mozilla.org/r/41367/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/41367/
Comment on attachment 8732794 [details] MozReview Request: Bug 1126299 - Intermittent browser_child_resource.js: disable test. r?ryanvm Bill is on PTO through the end of the month. I'd suggest ni? someone else familiar with the test before disabling.
:mossop, can you help us figure out how to resolve this frequent intermittent? I see you authored some patches for this test case.
Flags: needinfo?(wmccloskey) → needinfo?(dtownsend)
This looks like an issue in the thumbnail capturing code. Here it sets a timeout before trying to capture a browser's thumbnail: https://dxr.mozilla.org/mozilla-central/source/browser/base/content/browser-thumbnails.js#110. My guess is that between setting the timeout and the timeout firing, the test crashes the browser it refers to, so the callback ends up sending messages to a dead browser and throws. Jim seems to have done work around here, so maybe he can help.
Flags: needinfo?(dtownsend) → needinfo?(jmathies)
Thanks for the insight :mossop! I will wait for :jimm to weigh in here.
Mossop's description sounds right to me. We probably have some sort of tab crashed notification we could listen for here that would kill that timer.
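That suggestion could be sketched roughly as follows. This is an illustrative simulation, not the actual browser-thumbnails.js code: `scheduleCapture`, `onBrowserCrashed`, and `pendingCaptures` are hypothetical names, and in the real browser the crash signal would be something like Gecko's "oop-browser-crashed" event on the remote <browser> rather than a plain function call.

```javascript
// Hedged sketch: track the pending thumbnail-capture timeout per browser
// and cancel it when that browser's content process crashes, so we never
// sendAsyncMessage to a dead browser. All names here are illustrative.
const pendingCaptures = new Map(); // browser -> timeout ID

function scheduleCapture(browser, delayMs, capture) {
  const id = setTimeout(() => {
    pendingCaptures.delete(browser);
    capture(browser); // would invoke the PageThumbs capture internally
  }, delayMs);
  pendingCaptures.set(browser, id);
}

// Called from a tab-crashed listener in the real code.
function onBrowserCrashed(browser) {
  const id = pendingCaptures.get(browser);
  if (id !== undefined) {
    clearTimeout(id); // the target is gone; don't message a dead browser
    pendingCaptures.delete(browser);
  }
}

// Simulated flow: schedule a capture, then "crash" the browser first.
const fakeBrowser = { name: "tab1" };
scheduleCapture(fakeBrowser, 1000, b => {
  throw new Error(`captured dead browser ${b.name}`); // must never run
});
onBrowserCrashed(fakeBrowser);
console.log(pendingCaptures.size); // 0: timer cancelled, capture never fires
```

The key point is that cancellation happens on the crash notification itself, so the window where the timer can fire against a dead browser is closed.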
As a note, TaskCluster Linux64 debug seems to be the platform/build type this failure is occurring on. :RyanVM, I am not seeing traction on this bug; can you own disabling this test or getting it fixed?
Felipe and Blake are handling more of the test-specific issues for e10s. I'd be fine with disabling on Linux64 debug if we can't get the traction to fix whatever's broken here.
Flags: needinfo?(ryanvm) → needinfo?(felipc)
Whiteboard: [e10s-orangeblockers] → [e10s-orangeblockers][disabled on linux 64 debug]