Closed Bug 1207900 Opened 10 years ago Closed 7 years ago

Intermittent Windows talos-h2/g2 command timed out: 7200 seconds elapsed, attempting to kill

Categories

(Testing :: Talos, defect, P3)

Tracking

(firefox44 affected, firefox58 fixed, firefox59 fixed)

RESOLVED WORKSFORME
mozilla59

People

(Reporter: philor, Unassigned)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 file)

No description provided.
I'll try to figure it out and fix it.
Assignee: nobody → aschen
Status: NEW → ASSIGNED
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
In the last 7 days there have been 52 failures. All the failures occur only on the Windows 7 platform: 50% of the failures are on the opt build type and 50% are on pgo.

Here is a recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=145290972&lineNumber=1687

And a relevant snippet from it:

03:56:49 INFO - ----------------------------------------
03:56:49 INFO - Rolling back uninstall of cryptography
03:56:54 INFO - Command "c:\slave\test\py3venv\scripts\python.exe -u -c "import setuptools, tokenize;__file__='C:\\Users\\cltbld\\AppData\\Local\\Temp\\pip-build-zfi4tigm\\cryptography\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\cltbld\AppData\Local\Temp\pip-yas2b5pg-record\install-record.txt --single-version-externally-managed --compile --install-headers c:\slave\test\py3venv\include\site\python3.6\cryptography" failed with error code 1 in C:\Users\cltbld\AppData\Local\Temp\pip-build-zfi4tigm\cryptography\
03:56:54 ERROR - Return code: 1
03:56:54 INFO - Installing mitmproxy
03:56:54 INFO - Running command: C:\slave\test\py3venv\Scripts\pip install mitmproxy
03:56:54 INFO - Using env: {'ALLUSERSPROFILE': 'C:\\ProgramData',
03:56:54 INFO - 'APPDATA': 'C:\\Users\\cltbld\\AppData\\Roaming',
03:56:54 INFO - 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files',
03:56:54 INFO - 'COMPUTERNAME': 'T-W732-IX-132',
03:56:54 INFO - 'COMSPEC': 'C:\\windows\\system32\\cmd.exe',
03:56:54 INFO - 'CYGWINBASE': 'C:\\cygwin',
03:56:54 INFO - 'DCLOCATION': 'SCL3',
03:56:54 INFO - 'DNSSUFFIX': 'wintest.releng.scl3.mozilla.com',
03:56:54 INFO - 'FP_NO_HOST_CHECK': 'NO',
03:56:54 INFO - 'HOMEDRIVE': 'C:',
03:56:54 INFO - 'HOMEPATH': '\\Users\\cltbld',
03:56:54 INFO - 'KTS_HOME': 'C:\\Program Files\\KTS',
03:56:54 INFO - 'KTS_VERSION': '1.19c',
03:56:54 INFO - 'LOCALAPPDATA': 'C:\\Users\\cltbld\\AppData\\Local',
03:56:54 INFO - 'LOGONSERVER': '\\\\T-W732-IX-132',
03:56:54 INFO - 'MONDIR': 'C:\\Monitor_config\\',
03:56:54 INFO - 'MOZBUILDDIR': 'C:\\mozilla-build\\',
03:56:54 INFO - 'MOZ_CRASHREPORTER_NO_REPORT': '1',
03:56:54 INFO - 'MOZ_NO_REMOTE': '1',
03:56:54 INFO - 'NO_EM_RESTART': '1',
03:56:54 INFO - 'NUMBER_OF_PROCESSORS': '8',
03:56:54 INFO - 'OS': 'Windows_NT',
03:56:54 INFO - 'OURDRIVE': 'C:',
03:56:54 INFO - 'PATH': 'C:\\Python24;C:\\Py
Flags: needinfo?(sdeckelmann)
Whiteboard: [stockwell needswork]
I'm not currently working on this.
Assignee: bmo → nobody
Status: ASSIGNED → NEW
Smells like a build system issue, python3 maybe? gps?
Flags: needinfo?(sdeckelmann) → needinfo?(gps)
Nearly all the recent failures are in Talos on Windows -- let's check in with jmaher to see if he knows what's happening.
Flags: needinfo?(jmaher)
We are disabling h2 for win7 due to file I/O times; it will remain on win10, osx, and linux64, so this failure will go away. The trees were closed, so I will land the change tomorrow. This will be in bug 1415858.
Flags: needinfo?(jmaher)
disabled as per bug 1415858
Flags: needinfo?(gps)
Whiteboard: [stockwell needswork] → [stockwell disabled]
There are 33 failures in the last 7 days, all of them occurring on the Windows 10-64 platform. The failure rate goes up starting on the 12th of December.

Recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=151591977&lineNumber=12238

20:38:56 INFO - TEST-START | tp6_facebook_heavy
20:38:56 INFO - Initialising browser for tp6_facebook_heavy test...
20:40:25 INFO - Local copy of 'simple' is fresh enough
20:40:25 INFO - 6 days old
20:40:25 INFO - Cloning profile located at C:\Users\cltbld\.mozilla\profiles\simple
20:40:25 INFO - => \bookmarkbackups
20:40:25 INFO - => \browser-extension-data
20:40:25 INFO - => \browser-extension-data\screenshots@mozilla.org
20:40:25 INFO - => \cache2
20:40:25 INFO - => \cache2\doomed
20:40:25 INFO - => \cache2\entries
command timed out: 7200 seconds elapsed, attempting to kill
program finished with exit code 1
elapsedTime=7205.006000
========= master_lag: 0.39 =========
========= Finished 'c:/mozilla-build/python27/python -u ...' warnings (results: 1, elapsed: 2 hrs, 5 secs) (at 2017-12-13 21:00:25.107003)
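For readers unfamiliar with the "command timed out ... attempting to kill" message, here is a minimal Python sketch of a wall-clock timeout around a child process. It is not the harness's actual code: `run_with_timeout` is a hypothetical helper, and mapping a timeout to exit code 1 is an assumption made only to mirror the log above.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (exit_code, timed_out).

    Hypothetical helper for illustration only; the real harness
    behind the log message above is not shown in this bug.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode, False
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        # Returning 1 here is an assumption that mirrors "exit code 1".
        return 1, True


if __name__ == "__main__":
    # A child that sleeps far longer than the allowed 1 second.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, timeout_s=1))
```

In the failing jobs the limit is 7200 seconds, so a single step that stalls for about an hour can push the whole job over the limit.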
When I use a word, it means just what I choose it to mean--neither more nor less.
Component: General → Talos
Product: Core → Testing
Summary: Intermittent Windows xpcshell command timed out: 7200 seconds elapsed, attempting to kill → Intermittent Windows talos-h2 command timed out: 7200 seconds elapsed, attempting to kill
This has increased at a concerning rate over the last 2 days. There are 94 failures in the last week, all of them on windows10-64.

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1207900

Here's a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=150948148&lineNumber=12512

::rwood, could you please take a look at this and disable the test if necessary? Thank you!
Flags: needinfo?(rwood)
This is an issue with cloning the heavy profile on Win 10. It is taking about 60 minutes each time:

12:22:55 INFO - Cloning profile located at C:\Users\cltbld\.mozilla\profiles\simple
12:22:55 INFO - => \bookmarkbackups
...
13:21:05 INFO - Installing Add-ons
13:21:06 INFO - Application command: C:\slave\test\build\application\firefox\firefox http://localhost:49795/getInfo.html -profile c:\users\cltbld\appdata\local\temp\tmp4bqj_l\profile
13:21:06 INFO - TEST-INFO | started

I'm going to go ahead and disable talos-h2 on Win 10 (it's already disabled on Win 7). We could break up the talos h2 job into 4 separate jobs, but IMO that's too many resources (and configs etc.) for little payback (?)

What do you guys think? I'm thinking we just permanently run the talos jobs only on linux/osx. Is it even worth having them enabled for try if they fail intermittently so much?

Thanks for the input!
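As a quick check of the clone time, the two timestamps in the snippet above can be subtracted directly. This is a minimal Python sketch; `elapsed_between` is a hypothetical helper, and it assumes the log's HH:MM:SS timestamps cross midnight at most once.

```python
from datetime import datetime, timedelta


def elapsed_between(start_hms, end_hms):
    """Return the time between two HH:MM:SS log timestamps.

    Hypothetical helper; assumes at most one midnight rollover
    because the log lines carry no date.
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end_hms, fmt) - datetime.strptime(start_hms, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


# "Cloning profile located at ..." to "Installing Add-ons"
clone_time = elapsed_between("12:22:55", "13:21:05")
print(clone_time)  # 0:58:10
```

That is 3490 seconds spent on one profile clone, close to half of the 7200-second limit.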
Flags: needinfo?(tarek)
Flags: needinfo?(rwood)
Flags: needinfo?(jmaher)
Talos h1 runs fine on Win 10 (50 min) so just concerned with h2 here.
Flags: needinfo?(tarek)
Flags: needinfo?(jmaher)
Attachment #8937482 - Flags: review?(jmaher) → review?(armenzg)
Comment on attachment 8937482 [details] Bug 1207900 - Disable talos-h2 on Win10 production as it takes too long to run; https://reviewboard.mozilla.org/r/208154/#review214306
Attachment #8937482 - Flags: review?(armenzg) → review+
Pushed by rwood@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f9f4c7a84ee1 Disable talos-h2 on Win10 production as it takes too long to run; r=armenzg
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Intermittent Windows talos-h2 command timed out: 7200 seconds elapsed, attempting to kill → Intermittent Windows talos-h2/g2 command timed out: 7200 seconds elapsed, attempting to kill
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → WORKSFORME