Huge spike in testrun failures.

RESOLVED DUPLICATE of bug 631175

Status

Mozilla QA Graveyard
Mozmill Automation
7 years ago
4 years ago

People

(Reporter: ashughes, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozmill-test-failure], URL)

(Reporter)

Description

7 years ago
Today there was a huge spike in test failures across all platforms and branches.

2011-01-30: ~35 failed tests
2011-01-31: ~370 failed tests

It appears as though many of the new failures are waitForPageLoad() timeouts. I don't think there was any scheduled downtime or outage, but who knows?

We should investigate what happened here.
Mozmill 1.5.2rc2 was installed yesterday -- something not to overlook.
We have run 1.5.2 a couple of times on our systems, so I don't expect this to be a regression in Mozmill. In the past I have noticed, at least on the Linux machine, that even loading pages from localhost timed out.

Could someone run the testrun_general script on the affected platforms? You can report to mozmill-archive. It would be good to know whether this is reproducible. Sadly I don't have time for that right now.
Example from today on Windows NT

http://mozmill-release.brasstacks.mozilla.com/#/general/report/fc2eabbf52c98c01bb8d9938c6002bc1

controller.waitForPageLoad() timeout on testStopReloadButtons, which uses local data
Can you reproduce it when you run this single script via Mozmill too? Or does it only occur for test-runs triggered by the automation scripts?

Updated

7 years ago
Blocks: 630551
Unable to reproduce locally with the automation script and mozmill test-runs
Remotely (qa-horus w/ 1.5.2rc2), multiple ports are in use during the test-run, possibly by multiple httpd instances. This is visible during the tests that run against localhost. The port number is incremented by one with each test run.
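To check whether several local servers really are left listening, something like the following could be run on the affected machine while a test-run is in progress. This is only a sketch: the function name and parameters are made up for illustration, and the starting port has to be taken from what is actually observed on the machine, since nothing in this bug states which port range is used.

```python
import socket


def listening_ports(host, start, count, timeout=0.2):
    """Return the ports in [start, start + count) that accept a TCP connection.

    Hypothetical helper for diagnosis; `start` must come from what is
    observed on the machine, it is not a known Mozmill default.
    """
    open_ports = []
    for port in range(start, start + count):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

If more than one consecutive port answers at the same time, that would support the theory that earlier httpd instances are not being shut down between tests.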
(In reply to comment #5)
> Unable to reproduce locally with the automation script and mozmill test-runs

Scratch that, was on 1.5.1 - with 1.5.2rc2 I see this.
Steps to Reproduce

1. Install and set up mozmill 1.5.2rc2
2. Clone http://hg.mozilla.org/qa/mozmill-tests/
3. mozmill -t firefox/testAwesomebar -b <path to binary>
Comments #6-#8 might be related to something else. Moving discussion to bug 630599
Depends on: 630599
Bug 630599 should have fixed that. I cannot reproduce these failures anymore. See the latest testruns:

http://mozmill-archive.brasstacks.mozilla.com/#/general/report/e2d85e9ae099daa31109068aed0513ad
http://mozmill-archive.brasstacks.mozilla.com/#/general/report/e2d85e9ae099daa31109068aed051c08
http://mozmill-archive.brasstacks.mozilla.com/#/general/report/e2d85e9ae099daa31109068aed052f23

Also the results are looking awesome! There is only one failure on the older branches for the testPasswordNotSaved.js test, but it looks different to bug 614973.

We should wait for the official test-run today.
(In reply to comment #10)
> Bug 630599 should have fixed that. I cannot reproduce these failures anymore.
> See the latest testruns:
> 
> http://mozmill-archive.brasstacks.mozilla.com/#/general/report/e2d85e9ae099daa31109068aed0513ad
> http://mozmill-archive.brasstacks.mozilla.com/#/general/report/e2d85e9ae099daa31109068aed051c08
> http://mozmill-archive.brasstacks.mozilla.com/#/general/report/e2d85e9ae099daa31109068aed052f23
> 
> Also the results are looking awesome! There is only one failure on the older
> branches for the testPasswordNotSaved.js test, but it looks different to bug
> 614973.
> 
> We should wait for the official test-run today.

whimboo: can you review these test results with ctalbert, and give us a better description of what exactly was being tested? I ask because we just recently had to disable mozmill in production because of problems caused when mozmill hung production machines.

Once these tests are passing in staging, without hanging, please file a bug in mozilla.org/RelEng to have us re-enable these tests in production at a time that everyone is around to carefully watch for hangs.
(In reply to comment #11)
> whimboo: can you review these test results with ctalbert, and give us a better
> description of what exactly was being tested? I ask because we just recently
> had to disable mozmill in production because of problems caused when mozmill
> hung production machines.

John, we discussed that in yesterday's Mozmill meeting, and we probably don't want to re-enable the Mozmill tests until we have released Mozmill 2.0. The issue you have seen is different from the one we have logged here.
No longer blocks: 630551
This huge spike could also be related to the new Mozmill feature that reports chrome JS errors. We don't report those correctly due to bug 631175, but it shows that we have some strange bugs in the browser itself:

http://mozmill-archive.brasstacks.mozilla.com/#/general/report/00edfccff35537bd5dcddb5129620f10

"[JavaScript Error: \"document is null\" {file: \"chrome://browser/content/browser.js\" line: 12728}]"

Seems to only happen on Linux, so I wonder if we have a regression there. Aaron, are you able to reproduce that and, if so, also in older releases?
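For comparing such errors across reports, the quoted string can be split into message, file, and line. A rough sketch, assuming the error text always has the shape quoted above (the regular expression and the function name are illustrative, not part of Mozmill):

```python
import re

# Pattern derived only from the single error string quoted in this bug.
ERROR_RE = re.compile(
    r'\[JavaScript Error: "(?P<message>.*?)" '
    r'\{file: "(?P<file>.*?)" line: (?P<line>\d+)\}\]'
)


def parse_js_error(text):
    """Return message, file and line of a chrome JS error string, or None."""
    # The report stores the string JSON-escaped, so undo the \" escapes first.
    match = ERROR_RE.search(text.replace('\\"', '"'))
    if match is None:
        return None
    return {
        "message": match.group("message"),
        "file": match.group("file"),
        "line": int(match.group("line")),
    }
```

Grouping the parsed results by file and line per platform would show quickly whether the "document is null" error is really limited to Linux.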
I just did a run via testrun_general, as well as with mozmill on its own, under Linux with the latest nightly and beta 10, w/ 1.5.2rc3, and did not see any spikes. In fact, I got quite the opposite:

INFO Passed: 216
INFO Failed: 0
INFO Skipped: 13
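To compare runs from different machines without reading the logs by hand, the summary lines can be pulled out of the captured output. A small sketch, assuming the summary always uses exactly the "INFO Passed/Failed/Skipped: N" form shown above (the helper name is hypothetical):

```python
import re

# Matches only the three summary lines in the form quoted in this comment.
SUMMARY_RE = re.compile(r"^INFO (Passed|Failed|Skipped): (\d+)$", re.MULTILINE)


def parse_summary(output):
    """Map 'passed', 'failed' and 'skipped' to their counts in the output."""
    return {name.lower(): int(count) for name, count in SUMMARY_RE.findall(output)}
```

For the run quoted above this would give 216 passed, 0 failed, and 13 skipped, which makes it easy to put next to the ~370 failures from the automated run.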
Could this be a combination of updates and general tests again? Would be good to know what testrun_all.py says.
This should have had the same cause as bug 631175. Not sure what happened, but the Python package was completely broken on that machine.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 631175
(Assignee)

Updated

4 years ago
Product: Mozilla QA → Mozilla QA Graveyard