
Investigate why Windows xpcshell tests take so long to run on the tinderboxen

Status: NEW, Unassigned
Component: Testing :: XPCShell Harness
Reported: 5 years ago; last modified: 3 years ago
Reporter: Away for a while
Firefox tracking flags: (Not tracked)
Whiteboard: [buildfaster:p2][testing]
Attachments: 1

The backstory is in bug 617503 - dropping IPv6 took Win7 from 70 minutes to 50 (at that time; it's back pretty close to 70 now), but running locally on worse hardware was taking 7 minutes instead of 50. The same thing probably affects reftests (and crashtests, though crashtests are fast enough that it's only 5 minutes vs 15 minutes, rather than the noticeable 20 minutes vs 45 minutes for reftest and 30 minutes vs 70 minutes for xpcshell between Linux32 and Win7).
IMO the first step is to figure out which tests are taking so long. My first step when tackling bug 617503 was to create a spreadsheet of the test times. A badly written geolocation test which took a huge amount of time (and didn't even work) immediately stood out. It's quite possible something like that has snuck in again.
(Reporter) Comment 3, 5 years ago:
(In reply to comment #2)
> IMO the first step is to figure out which tests are taking so long. My first
> step when tackling bug 617503 was to create a spreadsheet of the test times. A
> badly written geolocation test which took a huge amount of time (and didn't
> even work) immediately stood out. It's quite possible something like that has
> snuck in again.

Yeah, I can definitely believe that!
Created attachment 657325: Script for parsing xpcshell log and getting test times

Here's the script I used last time for parsing the xpcshell log and generating a csv of test times. I'd be surprised if it didn't still work.
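The attachment itself isn't shown here, but a minimal sketch of the same idea follows. The log-line format it matches is an assumption for illustration, not the actual tinderbox format, and the output matches the `name,milliseconds` CSV shape seen in the times quoted below.

```python
# Sketch of a parser that extracts per-test run times from an xpcshell log
# into a CSV. NOTE: the log-line format below is a hypothetical example
# ("TEST-PASS | dir/test_foo.js | took 30121ms"); real harness output differs.
import csv
import re

LINE_RE = re.compile(
    r"^TEST-(?:PASS|UNEXPECTED-FAIL) \| (?P<name>\S+) \| took (?P<ms>\d+)ms"
)

def parse_times(lines):
    """Yield (test_name, milliseconds) pairs from matching log lines."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group("name"), int(m.group("ms"))

def write_csv(log_lines, out):
    """Write test times to `out` as CSV, slowest tests first."""
    writer = csv.writer(out)
    for name, ms in sorted(parse_times(log_lines), key=lambda t: -t[1]):
        writer.writerow([name, ms])

# Usage (hypothetical): write_csv(open("xpcshell.log"), open("times.csv", "w"))
```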
If you consider 7 minutes to be a normal time for xpcshell tests, then we're slow on all platforms, not only Windows.
Taking http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1346421170/mozilla-inbound_win7_test-xpcshell-bm23-tests1-windows-build1639.txt.gz, there are 1594 tests.
704 take more than 1s
460 more than 2s
336 more than 3s
220 more than 5s
96 more than 10s
6 more than 30s

These 6 are (times in milliseconds):
toolkit/components/passwordmgr/test/unit/test_storage_mozStorage_3.js,30121.000m
unit/test_partial.js,30850.000m
unit/test_errorhandler.js,37265.000m
toolkit/mozapps/extensions/test/xpcshell-unpack/test_blocklistchange.js,48786.000m
toolkit/mozapps/extensions/test/xpcshell/test_blocklistchange.js,49542.000m
dom/indexedDB/test/unit/test_overlapping_transactions.js,59310.000m
The log I just looked at took 69 minutes overall, 65 in the test-running buildstep, but there's a little copying of files in that step, so rather than 7 minutes I'd be perfectly happy with 11 or even 12. (12 is, probably not coincidentally, how long the 10.8 run I just looked at took.)
cumulative duration of all tests that run in less than:
1s -> 3:35
2s -> 9:35
3s -> 14:42
5s -> 21:57
10s -> 36:24
30s -> 60:43

Looks like it's a global problem
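The count and cumulative-time breakdowns above are straightforward to reproduce from a list of per-test times such as the attachment's CSV output. A minimal sketch, with made-up sample data:

```python
def count_over(times_ms, threshold_ms):
    """Number of tests whose run time exceeds threshold_ms."""
    return sum(1 for t in times_ms if t > threshold_ms)

def cumulative_under(times_ms, threshold_ms):
    """Total time (ms) spent in tests that each finished under threshold_ms."""
    return sum(t for t in times_ms if t < threshold_ms)

def fmt(ms):
    """Render milliseconds as M:SS, matching the figures quoted above."""
    total_s = int(ms) // 1000
    return f"{total_s // 60}:{total_s % 60:02d}"

# Hypothetical sample run times in milliseconds:
times = [500, 800, 1500, 2500, 4000, 12000, 35000]
for s in (1, 2, 3, 5, 10, 30):
    print(f"{count_over(times, s * 1000)} more than {s}s; "
          f"tests under {s}s -> {fmt(cumulative_under(times, s * 1000))}")
```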
(In reply to Mike Hommey [:glandium] from comment #8)
> cumulative duration of all tests that run in less than:
> 1s -> 3:35
> 2s -> 9:35
> 3s -> 14:42
> 5s -> 21:57
> 10s -> 36:24
> 30s -> 60:43
> 
> Looks like it's a global problem

Last year I recall finding some issues with disk I/O and some longer-running tests. See: https://bugzilla.mozilla.org/show_bug.cgi?id=675363

Unfortunately we didn't find any easy solutions to that problem. :(
(In reply to Mike Hommey [:glandium] from comment #8)
> cumulative duration of all tests that run in less than:
> 1s -> 3:35
> 2s -> 9:35
> 3s -> 14:42
> 5s -> 21:57
> 10s -> 36:24
> 30s -> 60:43
> 
> Looks like it's a global problem

FWIW, mochitests show this same sort of distribution too.  A couple of tests are responsible for a disproportionate amount of the testing time.
On Linux:
(http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-linux/1346429177/mozilla-inbound_fedora_test-xpcshell-bm24-tests1-linux-build1979.txt.gz)
1535 tests take 33:07
454 tests taking > 1s cumulate to 28:14
239 tests taking > 2s cumulate to 23:15
173 tests taking > 3s cumulate to 20:35
93 tests taking > 5s cumulate to 15:25
28 tests taking > 10s cumulate to 8:06
3 tests taking > 30s cumulate to 1:38
On OSX:
(http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-macosx64/1346425731/mozilla-inbound_snowleopard_test-xpcshell-bm11-tests1-macosx-build1889.txt.gz)
1492 tests take 13:06
126 tests taking > 1s cumulate to 6:50
66 tests taking > 2s cumulate to 5:26
43 tests taking > 3s cumulate to 4:31
31 tests taking > 5s cumulate to 3:44
4 tests taking > 10s cumulate to 0:56
Do these times match times when the tests are run locally?
What are the next steps here? 

Seems RelEng hasn't been very involved to this point. Should this bug live under Testing :: XPCShell Harness instead?
Component: Release Engineering → Release Engineering: Automation (General)
OS: Mac OS X → Windows 7
QA Contact: catlee
Whiteboard: [buildfaster:p2][testing]
I don't know if the discussion is on this bug, but we're pretty sure the root cause is just slow I/O.
(In reply to Chris AtLee [:catlee] from comment #13)
> Do these times match times when the tests are run locally?
Question still unanswered, afaict?

(In reply to Chris Cooper [:coop] from comment #14)
> What are the next steps here? 
> 
> Seems releng hasn't been very involved to this point. Should this bug live
> under Testing:XPCShell Harness instead?
It's been ~2 months since the last comment and ~4 months since this bug was filed. Based on comments 2 and 3, I'm kicking this over to Testing :: XPCShell Harness for investigation by someone who understands the test suite and can check whether anything inappropriate was added to it, causing the regression.

As usual, if it would help for RelEng to grant you access to a loaner machine from production, so you can do side-by-side debugging with local runs, please file a dependent bug in mozilla.org:Release Engineering.


(In reply to Ted Mielczarek [:ted.mielczarek] from comment #15)
> I don't know if the discussion is on this bug, but we're pretty sure the
> root cause is just slow I/O.

I don't see any discussion of slow hardware I/O in the earlier comments, so if you could add that info to this bug, it would be helpful.
Component: Release Engineering: Automation (General) → XPCShell Harness
Product: mozilla.org → Testing
QA Contact: catlee
Version: other → unspecified
Sorry, I made that comment and failed to actually look up where that discussion had taken place. Most of it is in bug 675363, which was spun off from bug 617503.