Closed Bug 1207900 Opened 8 years ago Closed 5 years ago

Intermittent Windows talos-h2/g2 command timed out: 7200 seconds elapsed, attempting to kill

Categories

(Testing :: Talos, defect, P3)

Tracking

(firefox44 affected, firefox58 fixed, firefox59 fixed)

RESOLVED WORKSFORME
mozilla59
Tracking Status
firefox44 --- affected
firefox58 --- fixed
firefox59 --- fixed

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 file)

      No description provided.
I'll try to figure it out and fix it.
Assignee: nobody → aschen
Status: NEW → ASSIGNED
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
In the last 7 days there have been 52 failures.

All the failures occur only on the Windows 7 platform.
The failures are split evenly by build type: 50% on opt and 50% on pgo.

Here is a recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=145290972&lineNumber=1687
And a relevant snippet from it:

03:56:49     INFO -      ----------------------------------------
03:56:49     INFO -    Rolling back uninstall of cryptography
03:56:54     INFO -  Command "c:\slave\test\py3venv\scripts\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\cltbld\\AppData\\Local\\Temp\\pip-build-zfi4tigm\\cryptography\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\cltbld\AppData\Local\Temp\pip-yas2b5pg-record\install-record.txt --single-version-externally-managed --compile --install-headers c:\slave\test\py3venv\include\site\python3.6\cryptography" failed with error code 1 in C:\Users\cltbld\AppData\Local\Temp\pip-build-zfi4tigm\cryptography\
03:56:54    ERROR - Return code: 1
03:56:54     INFO - Installing mitmproxy
03:56:54     INFO - Running command: C:\slave\test\py3venv\Scripts\pip install mitmproxy
03:56:54     INFO - Using env: {'ALLUSERSPROFILE': 'C:\\ProgramData',
03:56:54     INFO -  'APPDATA': 'C:\\Users\\cltbld\\AppData\\Roaming',
03:56:54     INFO -  'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files',
03:56:54     INFO -  'COMPUTERNAME': 'T-W732-IX-132',
03:56:54     INFO -  'COMSPEC': 'C:\\windows\\system32\\cmd.exe',
03:56:54     INFO -  'CYGWINBASE': 'C:\\cygwin',
03:56:54     INFO -  'DCLOCATION': 'SCL3',
03:56:54     INFO -  'DNSSUFFIX': 'wintest.releng.scl3.mozilla.com',
03:56:54     INFO -  'FP_NO_HOST_CHECK': 'NO',
03:56:54     INFO -  'HOMEDRIVE': 'C:',
03:56:54     INFO -  'HOMEPATH': '\\Users\\cltbld',
03:56:54     INFO -  'KTS_HOME': 'C:\\Program Files\\KTS',
03:56:54     INFO -  'KTS_VERSION': '1.19c',
03:56:54     INFO -  'LOCALAPPDATA': 'C:\\Users\\cltbld\\AppData\\Local',
03:56:54     INFO -  'LOGONSERVER': '\\\\T-W732-IX-132',
03:56:54     INFO -  'MONDIR': 'C:\\Monitor_config\\',
03:56:54     INFO -  'MOZBUILDDIR': 'C:\\mozilla-build\\',
03:56:54     INFO -  'MOZ_CRASHREPORTER_NO_REPORT': '1',
03:56:54     INFO -  'MOZ_NO_REMOTE': '1',
03:56:54     INFO -  'NO_EM_RESTART': '1',
03:56:54     INFO -  'NUMBER_OF_PROCESSORS': '8',
03:56:54     INFO -  'OS': 'Windows_NT',
03:56:54     INFO -  'OURDRIVE': 'C:',
03:56:54     INFO -  'PATH': 'C:\\Python24;C:\\Py
Flags: needinfo?(sdeckelmann)
Whiteboard: [stockwell needswork]
I'm not currently working on this.
Assignee: bmo → nobody
Status: ASSIGNED → NEW
Smells like a build system issue, python3 maybe? gps?
Flags: needinfo?(sdeckelmann) → needinfo?(gps)
Nearly all the recent failures are in Talos on Windows -- let's check in with jmaher to see if he knows what's happening.
Flags: needinfo?(jmaher)
We are disabling h2 on win7 because of file I/O times; it will remain on win10, osx, and linux64. Once that lands, this failure will go away. The trees were closed, so I will land it tomorrow in bug 1415858.
Flags: needinfo?(jmaher)
Disabled as per bug 1415858.
Flags: needinfo?(gps)
Whiteboard: [stockwell needswork] → [stockwell disabled]
There have been 33 failures in the last 7 days, all of them on the Windows 10-64 platform. The failure rate has been rising since the 12th of December.

Recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=151591977&lineNumber=12238

20:38:56     INFO -  TEST-START | tp6_facebook_heavy
20:38:56     INFO -  Initialising browser for tp6_facebook_heavy test...
20:40:25     INFO -  Local copy of 'simple' is fresh enough
20:40:25     INFO -  6 days old
20:40:25     INFO -  Cloning profile located at C:\Users\cltbld\.mozilla\profiles\simple
20:40:25     INFO -  => \bookmarkbackups
20:40:25     INFO -  => \browser-extension-data
20:40:25     INFO -  => \browser-extension-data\screenshots@mozilla.org
20:40:25     INFO -  => \cache2
20:40:25     INFO -  => \cache2\doomed
20:40:25     INFO -  => \cache2\entries

command timed out: 7200 seconds elapsed, attempting to kill
program finished with exit code 1
elapsedTime=7205.006000
========= master_lag: 0.39 =========
========= Finished 'c:/mozilla-build/python27/python -u ...' warnings (results: 1, elapsed: 2 hrs, 5 secs) (at 2017-12-13 21:00:25.107003)
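For readers unfamiliar with this kind of failure, the "command timed out ... attempting to kill" message comes from the harness enforcing a wall-clock budget on the whole job. A minimal sketch of that pattern in Python follows. It is not the buildbot implementation; the helper name `run_with_timeout`, the exit code of 1 on timeout (chosen to mirror the log above), and the short demo budget are all assumptions for illustration.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd under a wall-clock budget; return (exit_code, timed_out).

    Hypothetical helper, not taken from buildbot or mozharness.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return proc.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising,
        # so no extra cleanup is needed here. The exit code of 1 is an
        # assumption that mirrors "program finished with exit code 1".
        return 1, True


# Demo with a tiny budget instead of 7200 seconds: the child sleeps
# longer than the budget allows, so it is killed.
slow_child = [sys.executable, "-c", "import time; time.sleep(5)"]
exit_code, timed_out = run_with_timeout(slow_child, 0.5)
print("timed out:", timed_out, "exit code:", exit_code)
```

With a budget like this, a single slow step (here, the profile clone) can consume the whole allowance without any individual test failing.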
When I use a word, it means just what I choose it to mean--neither more nor less.
Component: General → Talos
Product: Core → Testing
Summary: Intermittent Windows xpcshell command timed out: 7200 seconds elapsed, attempting to kill → Intermittent Windows talos-h2 command timed out: 7200 seconds elapsed, attempting to kill
This has increased at a concerning rate over the last 2 days. There were 94 failures in the last week, all of them on windows10-64.

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1207900 

Here's a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=150948148&lineNumber=12512

:rwood, could you please take a look at this and disable the test if necessary? Thank you!
Flags: needinfo?(rwood)
This is an issue with cloning the heavy profile on Win 10. It is taking about 60 minutes each time:

12:22:55     INFO -  Cloning profile located at C:\Users\cltbld\.mozilla\profiles\simple
12:22:55     INFO -  => \bookmarkbackups
...
13:21:05     INFO -  Installing Add-ons
13:21:06     INFO -  Application command: C:\slave\test\build\application\firefox\firefox  http://localhost:49795/getInfo.html -profile c:\users\cltbld\appdata\local\temp\tmp4bqj_l\profile
13:21:06     INFO -  TEST-INFO | started 
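The duration can be checked directly from the two timestamps in the excerpt above. A small sketch, assuming the clone runs from the "Cloning profile" line to the "Installing Add-ons" line and that both timestamps fall on the same day (the helper name `elapsed_minutes` is made up for illustration):

```python
from datetime import datetime


def elapsed_minutes(start, end, fmt="%H:%M:%S"):
    """Minutes between two same-day log timestamps (hypothetical helper)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Timestamps copied from the log excerpt above.
clone_minutes = elapsed_minutes("12:22:55", "13:21:05")
print(f"profile clone took about {clone_minutes:.1f} minutes")  # about 58.2 minutes
```

Two clones of that length already use up most of the 7200 second budget.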

I'm going to go ahead and disable talos-h2 on Win 10 (it's already disabled on Win 7).

We could break up the talos h2 job into 4 separate jobs, but IMO that's too many resources (and configs etc.) for little payback. What do you guys think? I'm thinking we just permanently run the talos jobs only on linux/osx. Is it even worth having them enabled for try if they fail intermittently this often? Thanks for the input!
Flags: needinfo?(tarek)
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)
Talos h1 runs fine on Win 10 (50 min), so I'm only concerned with h2 here.
Flags: needinfo?(tarek)
Flags: needinfo?(jmaher)
Attachment #8937482 - Flags: review?(jmaher) → review?(armenzg)
Comment on attachment 8937482 [details]
Bug 1207900 - Disable talos-h2 on Win10 production as it takes too long to run;

https://reviewboard.mozilla.org/r/208154/#review214306
Attachment #8937482 - Flags: review?(armenzg) → review+
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f9f4c7a84ee1
Disable talos-h2 on Win10 production as it takes too long to run; r=armenzg
https://hg.mozilla.org/mozilla-central/rev/f9f4c7a84ee1
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=162787180&repo=autoland&lineNumber=13361
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Intermittent Windows talos-h2 command timed out: 7200 seconds elapsed, attempting to kill → Intermittent Windows talos-h2/g2 command timed out: 7200 seconds elapsed, attempting to kill
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME