623992 - Linux test-runs run out of order

Assignee

Description

•

14 years ago

This was originally discovered in bug 614973. For whatever reason, the testrun order on Linux when run using the daily_testrun script is completely mixed up. Under normal testing conditions, the tests should run in alphabetic order, but they don't when run on Linux using the testrun script:

testAddons testAwesomeBar testBookmarks testCookies testDownloading testFindInPage testFormManager testGeneral testInstallation testLayout testPasswordManager testPopups testPreferences testPrivateBrowsing testSearch testSecurity testSessionStore testTabbedBrowsing testTechnicalTools testToolbar

- BECOMES -

testSessionStore testBookmarks testCookies testTechnicalTools testInstallation testPreferences testFormManager testLayout testPasswordManager testAwesomeBar testSecurity testFindInPage testPopups testSearch testTabbedBrowsing testToolbar testAddons testGeneral testDownloading testPrivateBrowsing

u279076

Assignee

Comment 1

•

14 years ago

It is my belief that this is the root cause of most, if not all, of the current failures we see on Linux-only.

u279076

Assignee

Updated

•

14 years ago

Assignee: nobody → anthony.s.hughes

u279076

Assignee

Updated

•

14 years ago

Blocks: 614973

Aaron Train [:aaronmt]

Comment 2

•

14 years ago

In the previous bug you mentioned that you ran with the testrun_general.py script, but in this bug you mention the testrun_daily.py script. Is there any difference between the two?

OS: Linux → Windows CE

Aaron Train [:aaronmt]

Updated

•

14 years ago

OS: Windows CE → Linux

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 3

•

14 years ago

The daily testrun script is only a wrapper. It shouldn't affect anything.

u279076

Assignee

Comment 4

•

14 years ago

So, I've run the tests using hotfix-1.5.2, and there are no failures whatsoever. Henrik, can you get the testrun_general.py script working so I can try it with hotfix-1.5.2? Right now, the script only starts Firefox, no tests run.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 5

•

14 years ago

A patch is up and is waiting for review from Geo. You can apply it locally in the meantime.

u279076

Assignee

Comment 6

•

14 years ago

I've been doing a little research into Linux filesystems and it turns out they have a feature called dir_index which essentially creates an index of files on the filesystem in a database. All modern file systems use some sort of file indexing, but I'm wondering if it's possible that index is out of order and python is processing the test files based on the index ordering...

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 7

•

14 years ago

CC'ing Jeff and Clint, who might be able to share their knowledge.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 8

•

14 years ago

But the question remains why it only happens with our automation scripts. Those are setting the test (Mozmill 1.5.2: tests) property. It should be the same as when you specify the folder via the -t option from the command line.

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 9

•

14 years ago

I noticed recently that running tests on a Linux VM where the tests were on the local file system they ran in a different order to when the tests were on my host OS via shared-folders. Not sure if that helps at all, but thought it might be worth mentioning.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 10

•

14 years ago

Interesting fact Dave! Which one of those tests (local, host system) has the correct order for you?

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 11

•

14 years ago

I'm not sure what we're saying is 'correct'. I was running just the tests in firefox/testAwesomebar at the time. I can try replicating the issue and posting results here if it'll help?

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 12

•

14 years ago

Please run the complete firefox folder and check if we run the folder in ascending sorted order by name. That's what we expect and need for the moment.

Jeff Hammel

Comment 13

•

14 years ago

Hard to offer an opinion without the mozmill command line used (probably not accessible in automation). I have noticed that os.listdir and other directory listers on Linux are not guaranteed to return a sorted list (that is, on some platforms they do, on some they don't... no idea of the pattern). Not sure if that's the issue or not. If so, it should be easy to work around by insisting the entries are sorted wherever the relevant listing happens.
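A minimal sketch of that workaround, assuming the directory listing can be wrapped at the call site. `sorted_listdir` is a hypothetical helper for illustration, not part of Mozmill:

```python
import os


def sorted_listdir(path):
    """Return the entries of `path` in ascending order by name.

    os.listdir() makes no promise about order, so sort explicitly.
    """
    return sorted(os.listdir(path))


if __name__ == "__main__":
    # Print the entries of the current directory in a stable order.
    for name in sorted_listdir("."):
        print(name)
```

Note that sorted() compares strings by code point, so uppercase names sort before lowercase ones. That is still the same order on every platform, which is the point here.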

Dave Hunt [:davehunt] [he/him] ⌚BST

Comment 14

•

14 years ago

I was unable to replicate the issue I saw with tests running in a different order.

u279076

Assignee

Comment 15

•

14 years ago

(In reply to comment #13)
> Hard to offer an opinion without the mozmill command line used (probably not
> accessible in automation). I have noticed that os.listdir and other directory
> listers on linux are not guaranteed to correctly sort this list (that is, on
> some platforms they do, on some they don't....no idea of the pattern). Not
> sure if that's the issue or not. If so, should be easy to work around by
> insisting they are sorted wherever the relevant place that happens is

Is there a way I can print out os.listdir on testrun? By the way, this is the command I've been running:

./testrun_general.py --logfile=testrun.log <location_of_Firefox>
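One possible way to inspect the listing outside of the testrun script is a small standalone check like the one below. This is only a sketch: `describe_listing` is a hypothetical helper, and the script has to be run from (or pointed at) the folder that holds the tests.

```python
import os


def describe_listing(path):
    """Return (raw, ordered, is_sorted) for the entries of `path`."""
    raw = os.listdir(path)   # order as handed back by the OS / filesystem
    ordered = sorted(raw)    # ascending order by name
    return raw, ordered, raw == ordered


if __name__ == "__main__":
    raw, ordered, is_sorted = describe_listing(".")
    print("raw order:     ", raw)
    print("sorted order:  ", ordered)
    print("already sorted:", is_sorted)
```

Running it on the Linux VM and on a Mac or Windows VM against the same checkout would show whether the raw order differs between platforms.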

u279076

Assignee

Comment 16

•

14 years ago

I've done some further investigation and it appears the issue here is not the actual execution order. I've run the individual tests in the exact order they are reported here:

http://mozmill-release.brasstacks.mozilla.com/#/general/report/a57c8e0b757874f4760206c74b16b018

They all pass when run using mozmill -b <build> -t folder/test -t folder/test ...

The only time the testPasswordNotSaved.js test fails is when the tests are run through testrun_general.py. Additionally, I watched the test run several times, and every time testPasswordNotSaved.js fails, I can see it clicking on the close button. This tells me that the test visually passes but something (either in Mozmill, Mozmill-Automation, or the Python configuration on this VM) is causing the failure.

Henrik, can you please share your thoughts on this?

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 17

•

14 years ago

Please keep the discussion for the password failure on bug 614973. This bug is only for the mixed-up ordering of tests.

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

14 years ago

No longer blocks: 614973

Jeff Hammel

Comment 18

•

14 years ago

Not sure the status of this bug. Assuming this is 1.5.2, we use os.listdir:

https://github.com/mozautomation/mozmill/blob/hotfix-1.5.2/mozmill/mozmill/__init__.py#L237

os.listdir is *not* guaranteed to be in order!

>>> import os
>>> os.listdir('.')
['mozinfo', 'mozmill', 'mozprocess', '.gitignore', 'README.md', '.git', 'mozprofile', 'mozrunner', 'setup_development.py', 'jsbridge']

On some systems, it is. On others, it isn't. Did you upgrade python, the operating system, or the filesystem recently on the failing system?

u279076

Assignee

Comment 19

•

14 years ago

It runs out of order on both Linux VMs as per the dashboard logs. The tests run in alphabetical order on the Mac OS X and Windows VMs. I'm guessing this is probably caused by os.listdir() on ext4 filesystems. Based on this, and the fact that we are currently whittling down the failures week by week, I'm inclined to call this WONTFIX. Henrik, please advise.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 20

•

14 years ago

Anthony, can you please run an additional check on qa-horus? Please check out the tests to the Linux VM itself and run the daily automation script with that repository instead. I wonder if this is related to the fact that we are accessing a repo clone outside of the VM.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 21

•

14 years ago

Oh wait. I was too quick. We don't reference a local repo for our daily tests but clone the repository each time. So personally I wouldn't spend too much time on it, and would rather focus on figuring out the existing test failures and fixing those.

u279076

Assignee

Comment 22

•

14 years ago

Yeah, like I said, I think it's more of an issue with os.listdir() than with Mozmill and mozmill-tests. Resolving WONTFIX for now.

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → WONTFIX

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Comment 23

•

14 years ago

I think the WONTFIX is too hasty, Anthony. I do think it's preferable that our test runs are stable (in the sense of sorting) between platforms; it helps greatly with comparing one platform's results to the other by removing the chance it's just test order causing a platform-specific bug. Despite Jeff's reply, they -could- have (and IMO should have) just sorted the results of os.listdir(). I'll leave it up to Henrik as to whether to reopen, since he's the main interface to the A-Team on this sort of stuff. However, unless the assumption is that the next release includes manifests and manifests will fix this, I'd like to see the test order sorted. I see no harm in leaving the bug open to do that.
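As a rough illustration of what a platform-independent test order could look like when tests are found by walking a folder: the sketch below sorts both the subdirectories and the files at every level. `collect_tests` and the `.js` suffix filter are assumptions for illustration only, not Mozmill's actual collection code.

```python
import os


def collect_tests(root, suffix=".js"):
    """Return test file paths under `root` in a platform-independent order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place controls the order in which os.walk
        # descends into subdirectories (this works for top-down walks only).
        dirnames.sort()
        # Files of the current directory are collected in sorted order,
        # before os.walk moves on to the subdirectories.
        for name in sorted(filenames):
            if name.endswith(suffix):
                found.append(os.path.join(dirpath, name))
    return found
```

With this, the same checkout should yield the same list on Linux, Mac, and Windows, regardless of the order in which the filesystem returns directory entries.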

Jeff Hammel

Comment 24

•

14 years ago

Yes, they probably should be sorted and in harth's fix for 2.0 they are. That said, depending on directory recursion for test order is fairly fragile.

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 25

•

14 years ago

As Clint stated, manifests should always be used with Mozmill 2.0; -t is only available for debugging purposes. I think the Mozmill team has more important things to work on so we can get 1.5.2 and finally 2.0 out of the door.

Nobody; OK to take it and work on it

Updated

•

11 years ago

Product: Mozilla QA → Mozilla QA Graveyard