Closed Bug 623992 Opened 11 years ago Closed 11 years ago

Linux test-runs run out of order

Categories

(Mozilla QA Graveyard :: Mozmill Automation, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: u279076, Assigned: u279076)

Details

This was originally discovered in bug 614973.

For whatever reason, the testrun order on Linux when run using the daily_testrun script is completely mixed up.  Under normal testing conditions, the tests should run in alphabetic order, but they don't when run on Linux using the testrun script:

testAddons
testAwesomeBar
testBookmarks
testCookies
testDownloading
testFindInPage
testFormManager
testGeneral
testInstallation
testLayout
testPasswordManager
testPopups
testPreferences
testPrivateBrowsing
testSearch
testSecurity
testSessionStore
testTabbedBrowsing
testTechnicalTools
testToolbar

- BECOMES - 

testSessionStore
testBookmarks
testCookies
testTechnicalTools
testInstallation
testPreferences
testFormManager
testLayout
testPasswordManager
testAwesomeBar
testSecurity
testFindInPage
testPopups
testSearch
testTabbedBrowsing
testToolbar
testAddons
testGeneral
testDownloading
testPrivateBrowsing
I believe this is the root cause of most, if not all, of the Linux-only failures we currently see.
Assignee: nobody → anthony.s.hughes
Blocks: 614973
In the previous bug you mentioned that you ran the tests with the testrun_general.py script, but in this bug you mention the testrun_daily.py script. Is there any difference between the two?
OS: Linux → Windows CE
OS: Windows CE → Linux
The daily testrun script is only a wrapper. It shouldn't affect anything.
So, I've run the tests using hotfix-1.5.2, and there are no failures
whatsoever.  Henrik, can you get the testrun_general.py script working so I can
try it with hotfix-1.5.2?

Right now, the script only starts Firefox, no tests run.
A patch is up and is waiting for review from Geo. You can apply it locally in the meantime.
I've been doing a little research into Linux filesystems and it turns out they have a feature called dir_index, which essentially maintains an index of the entries in each directory. All modern file systems use some sort of file indexing, but I'm wondering whether that index is not in alphabetical order and Python is processing the test files in index order...
CC'ing Jeff and Clint, who might be able to share their knowledge.
But the question remains why it only happens with our automation scripts. Those are setting the test (Mozmill 1.5.2: tests) property. It should be the same as when you specify the folder via the -t option from the command line.
I noticed recently that tests on a Linux VM ran in a different order when they were on the local file system than when they were on my host OS via shared folders. Not sure if that helps at all, but thought it might be worth mentioning.
Interesting fact Dave! Which one of those tests (local, host system) has the correct order for you?
I'm not sure what we're saying is 'correct'. I was running just the tests in firefox/testAwesomebar at the time. I can try replicating the issue and posting results here if it'll help?
Please run the complete firefox folder and check if we run the folder in ascending sorted order by name. That's what we expect and need for the moment.
Hard to offer an opinion without the mozmill command line used (probably not accessible in automation).  I have noticed that os.listdir and other directory listers on Linux are not guaranteed to correctly sort this list (that is, on some platforms they do, on some they don't....no idea of the pattern).  Not sure if that's the issue or not.  If so, should be easy to work around by insisting they are sorted wherever the relevant place that happens is.
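The workaround suggested above could look roughly like the following. This is a minimal sketch, not Mozmill's actual code: the helper name list_tests_sorted and the demo folder names are assumptions made for illustration. It only relies on os.listdir() returning entries in arbitrary order and on sorted() imposing a stable, platform-independent order.

```python
import os
import tempfile


def list_tests_sorted(directory):
    """Return the entries of `directory` in ascending order by name.

    Hypothetical helper: os.listdir() makes no ordering promise, so the
    result is sorted explicitly before anything iterates over it.
    """
    return sorted(os.listdir(directory))


# Demo on a throwaway directory; the folders are created out of order on purpose.
with tempfile.TemporaryDirectory() as root:
    for name in ("testSessionStore", "testBookmarks", "testAddons"):
        os.mkdir(os.path.join(root, name))
    ordered = list_tests_sorted(root)

print(ordered)
```

Note that sorted() compares strings by code point, so the order is case-sensitive. That matches the expected listing in this bug because every folder name starts with "test" followed by an uppercase letter.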
I was unable to replicate the issue I saw with tests running in a different order.
(In reply to comment #13)
> Hard to offer an opinion without the mozmill command line used (probably not
> accessible in automation).  I have noticed that os.listdir and other directory
> listers on Linux are not guaranteed to correctly sort this list (that is, on
> some platforms they do, on some they don't....no idea of the pattern).  Not
> sure if that's the issue or not.  If so, should be easy to work around by
> insisting they are sorted wherever the relevant place that happens is

Is there a way I can print out os.listdir on testrun?

By the way, this is the command I've been running:
./testrun_general.py --logfile=testrun.log <location_of_Firefox>
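One way to answer the question about printing os.listdir is a small standalone check run on the affected machine against the tests folder. This is only a sketch: describe_listing is a hypothetical helper, and the temporary directory stands in for the real checkout of the tests, whose path is not given in this bug.

```python
import os
import tempfile


def describe_listing(directory):
    """Return the raw os.listdir() result and whether it is already sorted."""
    raw = os.listdir(directory)
    return raw, raw == sorted(raw)


# Stand-in for the real tests folder; replace demo_dir with the actual path.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("testSearch", "testAddons", "testCookies"):
        os.mkdir(os.path.join(demo_dir, name))
    raw_order, already_sorted = describe_listing(demo_dir)

print("raw order:     ", raw_order)
print("sorted order:  ", sorted(raw_order))
print("already sorted:", already_sorted)
```

If the raw order printed on the Linux VMs matches the mixed-up run order reported above, that would point at the directory listing rather than at the testrun script.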
I've done some further investigation and it appears the issue here is not the actual execution order.  I've run the individual tests in the exact order they are reported here:
http://mozmill-release.brasstacks.mozilla.com/#/general/report/a57c8e0b757874f4760206c74b16b018

They all pass when run using mozmill -b <build> -t folder/test -t folder/test ...

The only time the testPasswordNotSaved.js test fails is when the tests are run through testrun_general.py.

Additionally, I watched the test run several times, and every time testPasswordNotSaved.js fails, I can visually see it clicking on the close button.  This tells me that the test visually passes but something (either in Mozmill, Mozmill-Automation, or the python configuration on this VM) is causing the failure.

Henrik, can you please share your thoughts on this?
Please keep the discussion for the password failure on bug 614973. This bug is only for the mixed-up ordering of tests.
No longer blocks: 614973
Not sure of the status of this bug.  Assuming this is 1.5.2, we use os.listdir: https://github.com/mozautomation/mozmill/blob/hotfix-1.5.2/mozmill/mozmill/__init__.py#L237

os.listdir is *not* guaranteed to be in order!

>>> import os
>>> os.listdir('.')
['mozinfo', 'mozmill', 'mozprocess', '.gitignore', 'README.md', '.git', 'mozprofile', 'mozrunner', 'setup_development.py', 'jsbridge']

On some systems, it is.  On others, it isn't.  Did you upgrade python, the operating system, or the filesystem recently on the failing system?
It runs out of order on both Linux VMs as per the dashboard logs.  They run in alphabetic order on Mac OSX and Windows VMs.  I'm guessing this is probably caused by os.listdir() on ext4 filesystems.

Based on this, and the fact that we are currently whittling down the failures week by week, I'm inclined to call this WONTFIX.

Henrik, please advise.
Anthony, can you please run an additional check on qa-horus? Please check out the tests to the Linux VM itself and run the daily automation script with that repository instead. I wonder if this is related to the fact that we are accessing a repo clone outside of the VM.
Oh wait. I was too quick. We don't reference a local repo for our daily tests but clone the repository each time. So personally I wouldn't spend too much time on it, and would rather focus on figuring out existing test failures and fixing those.
Yeah, like I said, I think it's more of an issue with os.listdir() than with Mozmill and mozmill-tests.  Resolving WONTFIX for now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
I think the WONTFIX is too hasty, Anthony. 

I do think it's preferable that our test runs are stable (in the sense of sorting) between platforms; it helps greatly with comparing one platform's results to the other by removing the chance it's just test order causing a platform-specific bug. 

Despite Jeff's reply, they -could- have (and IMO should have) just sorted the results of os.listdir().

I'll leave it up to Henrik as to whether to reopen, since he's the main interface to the A-Team on this sort of stuff. 

However, unless the assumption is that the next release includes manifests and manifests will fix this, I'd like to see the test order sorted. I see no harm in leaving the bug open to do that.
Yes, they probably should be sorted and in harth's fix for 2.0 they are.  

That said, depending on directory recursion for test order is fairly fragile.
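To illustrate what a sorted directory recursion might look like, here is a minimal sketch. It is not harth's fix for 2.0 and not Mozmill's actual collection code: collect_tests, the .js filter, and the demo tree are assumptions for illustration. It relies on the documented behaviour that, in top-down mode, os.walk() honours in-place changes to the dirnames list.

```python
import os
import tempfile


def collect_tests(root):
    """Collect .js files under `root` in a deterministic, name-sorted order."""
    tests = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so os.walk descends into subfolders in sorted order.
        dirnames.sort()
        for name in sorted(filenames):
            if name.endswith(".js"):
                tests.append(os.path.join(dirpath, name))
    return tests


# Demo tree, created out of order on purpose (hypothetical file names).
with tempfile.TemporaryDirectory() as root:
    for folder, filename in (("testB", "test2.js"),
                             ("testB", "test1.js"),
                             ("testA", "test1.js")):
        os.makedirs(os.path.join(root, folder), exist_ok=True)
        open(os.path.join(root, folder, filename), "w").close()
    collected = [os.path.relpath(path, root) for path in collect_tests(root)]

print(collected)
```

Even with sorting, the run order still depends on how files and folders happen to be named, which is the fragility mentioned above; an explicit manifest avoids that dependency.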
As stated by Clint, manifests should always be used for Mozmill 2.0; -t is only available for debugging purposes. I think the Mozmill team has more important things to work on so we can get 1.5.2 and finally 2.0 out the door.
Product: Mozilla QA → Mozilla QA Graveyard