Closed Bug 1149955 Opened 9 years ago Closed 8 years ago

Intermittent test_shared_all.py TestSharedUnits.test_units | TimeoutException: TimeoutException: Error loading page, timed out (onDOMContentLoaded)

Categories

(Testing :: Marionette Client and Harness, defect)

x86
Windows 8
defect
Not set
normal

Tracking

(firefox39 unaffected, firefox40 disabled, firefox41 disabled, firefox-esr31 unaffected, firefox-esr38 unaffected)

RESOLVED FIXED
mozilla47
Iteration:
47.1 - Feb 8
Tracking Status
firefox39 --- unaffected
firefox40 --- disabled
firefox41 --- disabled
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: standard8)


Details

(Keywords: intermittent-failure, Whiteboard: [test disabled on Windows])

Attachments

(1 file)

Windows 8 64-bit mozilla-inbound pgo test marionette

https://treeherder.mozilla.org/logviewer.html#?job_id=8343458&repo=mozilla-inbound

04:22:36 ERROR - TEST-UNEXPECTED-ERROR | test_shared_all.py TestSharedUnits.test_units | TimeoutException: TimeoutException: Error loading page, timed out (onDOMContentLoaded)
This is painful :(
Flags: needinfo?(cmanchester)
Looking at some failure logs, this looks less like a Marionette problem than the tests themselves getting stuck. PGO seems relevant here.
Flags: needinfo?(cmanchester)
Mark, can you please take a look at this frequent failure?
Flags: needinfo?(standard8)
Looking - I've set up a push to try to see if we can get a screenshot of the page that's failing.
Assignee: nobody → standard8
Component: Marionette → Client
Flags: needinfo?(standard8)
Product: Testing → Loop
At the moment, I'm trying to debug some things on try server.

However, 5-hour turnarounds, a couple of false starts, and multiple failures due to bug 1115490 mean it has already taken several days just to try to get some basic initial information - which I haven't achieved yet.
Depends on: 1115490
Ok, I don't think this is a Loop issue. I just made the test file a lot smaller on try server, and it didn't fix the issue.

The other thing is that the exception:

TimeoutException: Error loading page, timed out (onDOMContentLoaded)

is coming from this line:

self.marionette.navigate(urlparse.urljoin(self.server_prefix, page))

Which also implies a marionette issue.
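
To illustrate the kind of diagnostic wrapper that could help here, below is a minimal sketch that joins the URL the same way as the line above and captures a screenshot when navigation times out. It is not the code used in this bug: `navigate_with_diagnostics`, `save_screenshot` and `timeout_error` are hypothetical names, and the sketch assumes the driver exposes `navigate(url)` and a `screenshot()` that returns a base64-encoded PNG string. It uses Python 3's `urllib.parse.urljoin`, whereas the test itself used the Python 2 `urlparse` module.

```python
import base64
from urllib.parse import urljoin


def navigate_with_diagnostics(driver, server_prefix, page,
                              save_screenshot, timeout_error=Exception):
    """Navigate to `page`; on a timeout, save a screenshot and re-raise.

    Assumptions (not taken from the bug): `driver` has navigate(url) and
    screenshot() returning base64 PNG text; `save_screenshot(url, png_bytes)`
    is a caller-supplied callback; `timeout_error` is the exception class
    the driver raises on a page-load timeout.
    """
    url = urljoin(server_prefix, page)
    try:
        driver.navigate(url)
    except timeout_error:
        # Record what the browser was showing before the error propagates.
        save_screenshot(url, base64.b64decode(driver.screenshot()))
        raise
    return url
```

With a real Marionette session, `timeout_error` would presumably be the client's timeout exception class, and the callback could write the bytes to a file that the test harness uploads.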

The other interesting thing is that the first 4/5 timeouts were on mozilla-inbound, and then fx-team and mozilla-central appear - as all the Hello patches tend to land in fx-team, I'm thinking something in a mozilla-inbound landing could be at issue here.

David, any ideas how Marionette might be at issue here?
Assignee: standard8 → nobody
Component: Client → Marionette
Flags: needinfo?(dburns)
Product: Loop → Testing
Approximate regression range: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?startdate=March+30+2015&tochange=11960a87b918

though the start date could easily be wrong - that's a bit of a guess as it's an intermittent.

I couldn't see anything obvious, unless this could be related to infra changes?
It's neither Win8-specific nor PGO-specific, just more likely there. It looks more like a race that becomes more likely the faster we're running.
Summary: Intermittent Win8-PGOtest_shared_all.py TestSharedUnits.test_units | TimeoutException: TimeoutException: Error loading page, timed out (onDOMContentLoaded) → Intermittent test_shared_all.py TestSharedUnits.test_units | TimeoutException: TimeoutException: Error loading page, timed out (onDOMContentLoaded)
This test also should be disabled on Windows, which (as a Visibility Requirement) should be something a sheriff can do by following instructions linked from https://developer.mozilla.org/en-US/docs/Mozilla/QA/Automated_testing
(In reply to Mark Banner (:standard8) from comment #359)
> Ok, I don't think this is a Loop issue. I just made the test file a lot
> smaller on try server, and it didn't fix the issue.
> 
> The other thing is that the exception:
> 
> TimeoutException: Error loading page, timed out (onDOMContentLoaded)
> 
> is coming from this line:
> 
> self.marionette.navigate(urlparse.urljoin(self.server_prefix, page))
> 
> Which also implies a marionette issue.
> 
> The other interesting thing is that the first 4/5 timeouts were on
> mozilla-inbound, and then fx-team and mozilla-central appear - as all the
> Hello patches tend to land in fx-team, I'm thinking something in a
> mozilla-inbound landing could be at issue here.
> 
> David, any ideas how Marionette might be at issue here?

This is an issue we saw on b2g but could not recreate. Looking at the code, it would appear the webserver/test machine is at fault, because Firefox is not hitting DOMContentLoaded. I will try to look into this some more.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #458)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/37d5ec17d2fc

Excitingly, this pushed the failures to test_desktop_all.py instead. Rather than continuing to play whack-a-mole with this, I've skipped them on Windows for the time being.
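
For readers unfamiliar with platform skips, here is a minimal sketch of the general idea using Python's standard `unittest.skipIf`. The change referenced in this comment may well have been made in a test manifest rather than in code, so this is a swapped-in technique for illustration only; the class name `TestSharedUnitsSketch` and the `IS_WINDOWS` flag are hypothetical.

```python
import sys
import unittest

# True on any Windows Python build ("win32" covers 32- and 64-bit).
IS_WINDOWS = sys.platform.startswith("win")


class TestSharedUnitsSketch(unittest.TestCase):
    # Hypothetical stand-in for the real test; skipped only on Windows.
    @unittest.skipIf(IS_WINDOWS,
                     "intermittent page-load timeouts on Windows (bug 1149955)")
    def test_units(self):
        self.assertTrue(True)
```

A skipped test is reported as skipped rather than failed, which keeps the failure visible in results while stopping it from turning the job orange.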
https://treeherder.mozilla.org/logviewer.html#?job_id=906385&repo=mozilla-aurora

https://hg.mozilla.org/releases/mozilla-aurora/rev/a7d5b64df128
Depends on: 1224298
No longer depends on: 1224298
From doing a lot of PGO runs on try server this seems like it isn't happening any more:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=123cce86bbaa

Therefore I'm going with re-enabling it and we'll see what happens in the real world...
Attachment #8709440 - Flags: review?(mdeboer)
Attachment #8709440 - Flags: review?(edilee)
Assignee: nobody → standard8
Flags: needinfo?(dburns)
Whiteboard: [test disabled on Windows][leave open] → [test disabled on Windows]
Comment on attachment 8709440 [details] [diff] [review]
Renable test_shared_all.py as intermittents seem resolved.

Review of attachment 8709440 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry I missed this in my queue. If try is happy, I am too :)

This prolly bitrotted, btw.
Attachment #8709440 - Flags: review?(mdeboer)
Attachment #8709440 - Flags: review?(edilee)
Attachment #8709440 - Flags: review+
I've landed this directly into fx-team, as then it's out-of-band of a release, so if we get failures again, we're more likely to know they are to do with re-enabling rather than the release.

Requesting to leave open for now - I'll land this in our repo after it's merged, if all is still looking good.
Iteration: --- → 47.1 - Feb 8
Keywords: leave-open
Nothing has reoccurred so I'm calling this fixed. This doesn't need landing in the loop repo as we're not controlling the manifest.ini from there.
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Product: Testing → Remote Protocol
Moving bug to Testing::Marionette Client and Harness component per bug 1815831.
Component: Marionette → Marionette Client and Harness
Product: Remote Protocol → Testing