Closed
Bug 1436237
Opened 7 years ago
Closed 6 years ago
Intermittent reftest Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1517946424\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1517946424\\build\\tests\\reftest\\runreftest.py'
Categories
(Testing :: Reftest, defect, P5)
Tracking
(firefox-esr60 fixed, firefox64 fixed, firefox65 fixed)
RESOLVED
FIXED
People
(Reporter: intermittent-bug-filer, Assigned: gbrown)
References
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disable-recommended])
Filed by: rgurzau [at] mozilla.com
https://treeherder.mozilla.org/logviewer.html#?job_id=160656014&repo=mozilla-inbound
https://queue.taskcluster.net/v1/task/Og_9HnMORqmzoy7tmHD6Eg/runs/0/artifacts/public/logs/live_backing.log
https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/Og_9HnMORqmzoy7tmHD6Eg/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
20:06:49 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1517946424\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1517946424\\build\\tests\\reftest\\runreftest.py', '--appname=Z:\\task_1517946424\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/TH62S7GNTVKzQzjapqX8zg/artifacts/public/build/target.crashreporter-symbols.zip', '--log-raw=Z:\\task_1517946424\\build\\blobber_upload_dir\\crashtest_raw.log', '--log-errorsummary=Z:\\task_1517946424\\build\\blobber_upload_dir\\crashtest_errorsummary.log', '--cleanup-crashes', '--marionette-startup-timeout=180', '--suite=crashtest', '--', 'tests/reftest/tests/testing/crashtest/crashtests.list']
[taskcluster 2018-02-06T20:47:07.613Z] Aborting task - max run time exceeded!
Comment 6 • 7 years ago
In the last 7 days, there have been 35 failures.
Most of the failures are on windows10-64, and a few of them on windows10-64-ccov.
Affected build types: debug, opt and pgo.
An example of a recent log file:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=164292055&lineNumber=1778
And the relevant part of the log:
11:18:54 INFO - ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to c:\users\genericworker\appdata\local\temp\tmpweg8qb.mozrunner\runreftest_leaks_tab_pid7796.log
11:18:54 INFO - [Child 7796, Main Thread] WARNING: No CID found when attempting to map contract ID: file z:/build/build/src/xpcom/components/nsComponentManager.cpp, line 508
11:18:59 INFO - --DOCSHELL 000001AD50760000 == 7 [pid = 1692] [id = {bf19be86-71be-4ee6-ba58-3e0737a37be4}]
11:18:59 INFO - --DOCSHELL 000001AD4CEC7000 == 6 [pid = 1692] [id = {76212ae8-d9a7-4e5c-a735-af8b48184559}]
11:18:59 INFO - --DOCSHELL 000001AD4FAE8000 == 5 [pid = 1692] [id = {61fd81ea-e623-4d94-a990-c3c345c00738}]
11:18:59 INFO - --DOCSHELL 000001AD52803000 == 4 [pid = 1692] [id = {2909ac40-b58a-4b02-870a-0ad7492dee21}]
11:19:00 INFO - --DOMWINDOW == 18 (000001AD50175C00) [pid = 1692] [serial = 8] [outer = 0000000000000000] [url = about:blank]
11:19:01 INFO - --DOCSHELL 0000021E8043F800 == 1 [pid = 5568] [id = {aeda12ed-9386-4d75-9417-332f2f52f82c}]
11:19:04 INFO - --DOMWINDOW == 17 (000001AD4D1E8400) [pid = 1692] [serial = 4] [outer = 0000000000000000] [url = about:blank]
11:19:04 INFO - --DOMWINDOW == 16 (000001AD52FA8000) [pid = 1692] [serial = 11] [outer = 0000000000000000] [url = about:blank]
11:19:04 INFO - --DOMWINDOW == 15 (000001AD4C7B4C00) [pid = 1692] [serial = 13] [outer = 0000000000000000] [url = about:blank]
11:19:04 INFO - --DOMWINDOW == 14 (000001AD4C7B6400) [pid = 1692] [serial = 14] [outer = 0000000000000000] [url = about:blank]
11:19:06 INFO - --DOMWINDOW == 5 (0000021E87F2C000) [pid = 5568] [serial = 5] [outer = 0000000000000000] [url = about:blank]
11:19:06 INFO - --DOMWINDOW == 4 (0000021E87B17C00) [pid = 5568] [serial = 2] [outer = 0000000000000000] [url = about:blank]
11:19:07 INFO - --DOMWINDOW == 13 (000001AD50972400) [pid = 1692] [serial = 9] [outer = 0000000000000000] [url = about:blank]
11:19:09 INFO - --DOMWINDOW == 3 (0000021E87B17800) [pid = 5568] [serial = 1] [outer = 0000000000000000] [url = chrome://gfxsanity/content/sanitytest.html]
11:19:11 INFO - --DOMWINDOW == 2 (0000027E741F4400) [pid = 3572] [serial = 2] [outer = 0000000000000000] [url = about:blank]
11:19:11 INFO - --DOMWINDOW == 2 (0000021E87E92000) [pid = 5568] [serial = 3] [outer = 0000000000000000] [url = chrome://gfxsanity/content/sanitytest.html]
11:19:13 INFO - --DOMWINDOW == 12 (000001AD5251A800) [pid = 1692] [serial = 10] [outer = 0000000000000000] [url = about:blank]
11:35:53 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1519643815\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1519643815\\build\\tests\\reftest\\runreftest.py', '--appname=Z:\\task_1519643815\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=Z:\\task_1519643815\\build\\symbols', '--log-raw=Z:\\task_1519643815\\build\\blobber_upload_dir\\crashtest_raw.log', '--log-errorsummary=Z:\\task_1519643815\\build\\blobber_upload_dir\\crashtest_errorsummary.log', '--cleanup-crashes', '--marionette-startup-timeout=180', '--suite=crashtest', '--', 'tests/reftest/tests/testing/crashtest/crashtests.list']
[taskcluster 2018-02-26T12:16:57.035Z] Exit Code: 0
[taskcluster 2018-02-26T12:16:57.035Z] User Time: 0s
[taskcluster 2018-02-26T12:16:57.035Z] Kernel Time: 0s
[taskcluster 2018-02-26T12:16:57.035Z] Wall Time: 59m54.3194423s
[taskcluster 2018-02-26T12:16:57.035Z] Peak Memory: 4894720
[taskcluster 2018-02-26T12:16:57.035Z] Result: IDLENESS_LIMIT_EXCEEDED
[taskcluster 2018-02-26T12:16:57.036Z] === Task Finished ===
[taskcluster 2018-02-26T12:16:57.036Z] Task Duration: 59m54.3239426s
:ahal As you are the triage owner of this component, could you please take a look at this?
Thank you!
Flags: needinfo?(ahalberstadt)
Whiteboard: [stockwell needswork]
Comment 7 • 7 years ago
:gbrown/:whimboo - this looks to be Windows 10 reftests starting the browser but never starting the tests. I'm not sure why we don't hit the 370-second timeout; maybe Win10 has other issues and there is a bug in mozprocess?
Flags: needinfo?(hskupin)
Flags: needinfo?(gbrown)
Flags: needinfo?(ahalberstadt)
Comment 8 (Assignee) • 7 years ago
I think the key question here is why we don't hit the 370-second timeout. If we hit that timeout, we'd get a crash report and screenshots and have a chance of finding a root cause. I'll make a quick check that we are setting a no-output timeout, but I don't have many cycles for diving in here right now.
Flags: needinfo?(gbrown)
Comment 9 (Assignee) • 7 years ago
Using Windows 10 debug crashtests, I verified that mozprocess is using a 370 second output timeout when running firefox. I don't know what's going wrong.
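For reference, a minimal sketch (not the actual runreftest.py/mozharness code) of how a no-output timeout is typically wired up with mozprocess, assuming the standard ProcessHandler API (processOutputLine/onTimeout callbacks and run(outputTimeout=...)); the 370-second value mirrors the output timeout mentioned above, and the callback names are illustrative:

```python
# Illustrative sketch only -- not the code used by the reftest harness.
from mozprocess import ProcessHandler

def run_with_output_timeout(cmd, output_timeout=370):
    """Run cmd and flag the case where it produces no output for output_timeout seconds."""

    def on_line(line):
        # Every line of output resets mozprocess's no-output timer.
        print(line)

    def on_timeout():
        # Invoked by mozprocess when no output has been seen for output_timeout seconds.
        print("No output for %d seconds" % output_timeout)

    proc = ProcessHandler(cmd,
                          processOutputLine=[on_line],
                          onTimeout=[on_timeout])
    proc.run(outputTimeout=output_timeout)
    return proc.wait()
```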
Comment 10 • 7 years ago
Do reftests and crashtests set the mozcrashreporter environment variable to shut down Firefox in case of content crashes? This looks kind of like we are seeing a crash of the tab that drives the tests.
Flags: needinfo?(hskupin)
Comment 11 • 7 years ago
Oh, and we should really take a screenshot when such a situation happens. I think that is what we do with mochitests, and that helped us with the above situation.
Comment 13 • 7 years ago
(In reply to Geoff Brown [:gbrown] from comment #9)
> Using Windows 10 debug crashtests, I verified that mozprocess is using a 370
> second output timeout when running firefox. I don't know what's going wrong.
We may be hanging indefinitely due to bug 1421289. If there were a crash and Firefox quit itself, the result might be different.
Updated • 7 years ago
Summary: Intermittent Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1517946424\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1517946424\\build\\tests\\reftest\\runreftest.py' → Intermittent reftest Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1517946424\\build\\venv\\Scripts\\python', '-u', 'Z:\\task_1517946424\\build\\tests\\reftest\\runreftest.py'
Comment 22 • 7 years ago
Sebastian, what can we do to make sure that this bug appears in the suggested bugs on Treeherder for such reftest failures?
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=071ee904485e21e19ca08456d32bce6825b77a26&selectedJob=171995989
Flags: needinfo?(aryx.bugmail)
Comment 23 • 7 years ago
Ed, can you provide information on how to get this to show up as a bug suggestion in Treeherder? Bug 1444831 is similar and gets suggested for https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=e7b736c64e87afa0cf39066f6956f11fadb1ce42&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable&selectedJob=167374011
Flags: needinfo?(aryx.bugmail) → needinfo?(emorley)
Comment 24 • 7 years ago
The failure line in the log is:
01:09:35 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['C:\\Users\\task_1522889138\\build\\venv\\Scripts\\python', '-u', 'C:\\Users\\task_1522889138\\build\\tests\\reftest\\runreftest.py', '--total-chunks', '2', '--this-chunk', '2', '--appname=C:\\Users\\task_1522889138\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=https://queue.taskcluster.net/v1/task/A7Uz_VKZS1qllD9i8-I1GQ/artifacts/public/build/target.crashreporter-symbols.zip', '--log-raw=C:\\Users\\task_1522889138\\build\\blobber_upload_dir\\reftest_raw.log', '--log-errorsummary=C:\\Users\\task_1522889138\\build\\blobber_upload_dir\\reftest_errorsummary.log', '--cleanup-crashes', '--marionette-startup-timeout=180', '--sandbox-read-whitelist=C:\\Users\\task_1522889138\\build', '--suite=reftest', '--', 'tests/reftest/tests/layout/reftests/reftest.list']
A few issues:
* The error line contains parts that will always be unique (eg "task_1522889138" in the path), so never get correct suggestions. The harness should be only outputting relative paths in error messages.
* It's a ridiculously long error line (albeit bug 1444831 is even worse). Changing to relative paths throughout would help shorten it. The errors suggestions logic trims the line to 100 characters so the rest never gets seen anyway (https://github.com/mozilla/treeherder/blob/856e96cf3f32008f2fd6197c993a7736fafcee82/treeherder/model/error_summary.py#L132-L140)
* Ideally the mozharness error prefix for this line would be ERROR not INFO (though this doesn't affect the log parser, since it specifically checks for "Automation Error:", though we want to move away from hardcoding every error string variation in Treeherder)
Flags: needinfo?(emorley)
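To make the first point above concrete, here is a small illustration using the two "Automation Error" lines quoted earlier in this bug: the task_<epoch> token falls inside the first 100 characters, so suggestion matching on the trimmed line can never pair them up (lines abbreviated for display):

```python
# The two error lines quoted earlier in this bug, abbreviated; only the
# embedded task id differs within the first 100 characters.
line_a = r"Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1517946424\\build\\venv\\Scripts\\python', '-u', ...]"
line_b = r"Automation Error: mozprocess timed out after 1000 seconds running ['Z:\\task_1519643815\\build\\venv\\Scripts\\python', '-u', ...]"
print(line_a[:100] == line_b[:100])  # False -- the trimmed lines never match
```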
Comment 25 • 7 years ago
(In reply to Ed Morley [:emorley] from comment #24)
> * The error line contains parts that will always be unique (eg
> "task_1522889138" in the path), so never get correct suggestions. The
> harness should be only outputting relative paths in error messages.
Those lines are not from the individual harnesses but from mozharness. I'm not sure why we log those with the absolute path and not the relative one. Maybe to see which exact binaries are used? If we could agree to have relative paths as output by mozharness, we could indeed make things better for Treeherder.
Chris, what do you think?
> * Ideally the mozharness error prefix for this line would be ERROR not INFO
> (though this doesn't affect the log parser, since it specifically checks for
> "Automation Error:", though we want to move away from hardcoding every error
> string variation in Treeherder)
Would it help if we made the harness name part of the error line so we have at least an idea of which harness the automation error is about? It could be something like: "Automation Error: mozprocess timed out after 1000 seconds running reftest: %command%".
Flags: needinfo?(catlee)
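A tiny illustration of that proposed format (hypothetical helper, not actual mozharness code; the function name is made up):

```python
# Hypothetical formatting helper -- not actual mozharness code.
def format_automation_error(suite, timeout, cmd):
    # Keep the constant "Automation Error:" prefix first so existing log
    # parsers still match, and name the suite before the long command line.
    return "Automation Error: mozprocess timed out after %d seconds running %s: %s" % (
        timeout, suite, " ".join(cmd))

print(format_automation_error("reftest", 1000,
                              ["python", "-u", "runreftest.py", "--suite=reftest"]))
```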
Comment 26 • 7 years ago
(In reply to Henrik Skupin (:whimboo) from comment #25)
> (In reply to Ed Morley [:emorley] from comment #24)
> > * The error line contains parts that will always be unique (eg
> > "task_1522889138" in the path), so never get correct suggestions. The
> > harness should be only outputting relative paths in error messages.
>
> Those lines are not from the individual harnesses but from mozharness. I'm
> not sure why we log those with the absolute path and not the relative one.
> Maybe to see which exact binaries are used? If we could agree to have
> relative paths as output by mozharness, we could indeed make things better
> for Treeherder.
>
> Chris, what do you think?
Without looking at the code, I would assume that we're logging the full command line being run, rather than making an explicit decision about whether to log absolute or relative paths.
If we can switch mozharness to use relative paths for the command line, then I'm sure the logging will be improved.
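One possible shape for that change, sketched as a hypothetical helper (not the actual mozharness code), which rewrites absolute paths under the task's build directory before the command line is logged; paths in the example are illustrative:

```python
# Hypothetical helper -- not actual mozharness code.
import os

def relativize_command(cmd, base_dir):
    """Rewrite any argument (or --flag=path value) under base_dir as a relative path."""
    rewritten = []
    for arg in cmd:
        # Handle both bare paths and "--flag=path" style arguments.
        prefix, sep, value = arg.partition("=")
        candidate = value if sep else arg
        if os.path.isabs(candidate) and candidate.startswith(base_dir):
            candidate = os.path.relpath(candidate, base_dir)
            arg = prefix + sep + candidate if sep else candidate
        rewritten.append(arg)
    return rewritten

print(relativize_command(
    ["python", "-u", "/builds/task_123/build/tests/reftest/runreftest.py",
     "--appname=/builds/task_123/build/application/firefox/firefox.exe"],
    "/builds/task_123/build"))
```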
> > * Ideally the mozharness error prefix for this line would be ERROR not INFO
> > (though this doesn't affect the log parser, since it specifically checks for
> > "Automation Error:", though we want to move away from hardcoding every error
> > string variation in Treeherder)
>
> Would it help if we make the harness name part of the error line so we have
> at least an idea about which harness the automation error is about? It could
> be something like: "Automation Error: mozprocess timed out after 1000
> seconds running reftest: %command%.
Yes, that would be much clearer IMO.
Flags: needinfo?(catlee)
Comment 28 • 7 years ago
As the intermittent-failures dashboard shows, these days we only see these reftest hangs for QuantumRender:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-04-05&endday=2018-04-12&tree=trunk&bug=1436237
Also, it happens after Marionette has installed the extensions (specialpowers, reftest). So I would assume we are seeing something similar to bug 1452913, where we might not handle a failure appropriately and call `DoneTests()`. As a result, we hang forever.
Geoff, the reftest extension can log actions to a file when --log-file is used when invoking reftests, but I don't see it happening, or the artifact is not uploaded. Could you check that and, if necessary, get it added? It will help us identify the last step the extension performed, and which step might have failed afterward.
Flags: needinfo?(gbrown)
Comment 29 (Assignee) • 7 years ago
(In reply to Henrik Skupin (:whimboo) from comment #28)
> Geoff, the reftest extension can log actions to a file when --log-file is
> used when invoking reftests, but I don't see it happening, or the artifact
> not uploaded. Could you check that and if necessary get it added? It will
> help us to identify the last step the extension did, and which step might
> have failed afterward.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fda489ab3fd6265850b224fb2c73de3c1b546693
Is this producing the log you are interested in, the "reftest-log.log" artifact? It looks to me like all of that information is already included in the normal raw log artifact (eg "crashtest_raw.log" for crashtests).
Flags: needinfo?(gbrown)
Comment 30 • 7 years ago
Oh, you are right; I totally missed that. Sorry. Also, with your patch we get rather huge logs (~17 MB per test job) due to debug-level logging. I think we are fine and don't need anything extra.
So I have taken another look at the logs and noticed the following. For a successful build, the following reftest extension log output is seen right after startup:
https://taskcluster-artifacts.net/SjXrJ914SDmH3bXODrKvzg/0/public/logs/log_info.log
> 02:49:53 INFO - REFTEST INFO | Reading manifest file://C:\Users\task_1523587360\build\tests\reftest\tests\layout\reftests\reftest.list
> 02:49:53 INFO - REFTEST INFO | Dumping JSON representation of sandbox
Note that this is NOT happening in the case of hangs! So it feels to me that the reftest extension might fail early during initialization/startup, maybe something similar to what I fixed yesterday in bug 1452913.
Here is where it is called:
https://dxr.mozilla.org/mozilla-central/rev/325ef357e5b73d63794e47c02c7f8e7cf58ccb48/layout/tools/reftest/reftest.jsm#391
The code lives in `ReadTests` which gets called at:
https://dxr.mozilla.org/mozilla-central/rev/325ef357e5b73d63794e47c02c7f8e7cf58ccb48/layout/tools/reftest/reftest.jsm#306-312
What is interesting is the check for `FOCUS_FILTER_NON_NEEDS_FOCUS_TESTS`. If `focusFilterMode` is anything else, we register a focus listener, which then calls `ReadTests`. Note that we would never reach `ReadTests` if the browser doesn't emit this event.
`focusFilterMode` is set via the `reftest.focusFilterMode` preference in the reftest harness, which comes directly from the command-line options:
https://dxr.mozilla.org/mozilla-central/rev/325ef357e5b73d63794e47c02c7f8e7cf58ccb48/layout/tools/reftest/runreftest.py#304
And we don't set this option in mozharness:
https://dxr.mozilla.org/mozilla-central/rev/325ef357e5b73d63794e47c02c7f8e7cf58ccb48/testing/mozharness/configs/unittests/win_taskcluster_unittest.py#121-135
As such it is None, and I'm fairly sure we hang due to the scenario described above: Firefox does not have focus.
Geoff, I would suggest that we add another `logger.info()` call which explains that we are waiting for focus. I will file a new bug and provide a patch.
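A simplified Python sketch of the option-to-pref plumbing described above (illustrative only; the function name and option handling are assumptions, not the actual runreftest.py code):

```python
# Illustrative sketch -- not the actual runreftest.py implementation.
from types import SimpleNamespace

def build_reftest_prefs(options):
    prefs = {}
    # The harness copies the command-line value straight into the pref that
    # the reftest extension reads; mozharness never passes the option, so
    # the value here ends up being None.
    prefs["reftest.focusFilterMode"] = options.focusFilterMode
    # On the extension side, any value other than
    # FOCUS_FILTER_NON_NEEDS_FOCUS_TESTS makes reftest.jsm register a focus
    # listener and only call ReadTests() once the window gains focus -- the
    # step that never happens in these hangs.
    return prefs

print(build_reftest_prefs(SimpleNamespace(focusFilterMode=None)))
```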
Comment 31 • 7 years ago
By the way, the mochitest harness seems to log some more helpful messages when focus cannot be set:
> 0:39.87 INFO Error: Unable to restore focus, expect failures and timeouts.
[..]
> 0:39.93 INFO must wait for load
> 0:39.93 INFO must wait for focus
Geoff, it looks like we might want to do some more work, which I don't have the time for. Maybe you could pick this up?
Flags: needinfo?(gbrown)
Comment 32 • 7 years ago
I actually filed bug 1453895 about that.
Depends on: 1453895
Flags: needinfo?(gbrown)
Comment 51 • 6 years ago
There are 41 failures in the last 7 days.
They occur on
6 windows7-32 (pgo, debug),
17 windows10-64-qr (debug, opt),
2 windows10-64-ccov (debug),
15 windows10-64 (debug, opt, pgo),
1 osx-10-10 (debug).
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=207451732&repo=autoland&lineNumber=1210
ahal: Hi, can you please point to someone who can have a look at this? Thank you!
Flags: needinfo?(ahal)
Whiteboard: [stockwell unknown] → [stockwell unknown][stockwell needswork]
Comment 52 (Assignee) • 6 years ago
Note that this became much more frequent around Oct 23. A more detailed regression window might be helpful.
Comment 53 • 6 years ago
Noting that no changes to the reftest harness or mozprocess landed between then and now.
Flags: needinfo?(ahal)
Comment 54 • 6 years ago
These are primarily Windows 10, and this is on hardware, not on a VM.
:pmoore - when did the generic-worker upgrade roll out to the Windows hardware?
:markco - were there other changes to the Windows hardware, such as firmware upgrades, OS tweaks, etc.?
Specifically in the last few days?
Flags: needinfo?(pmoore)
Flags: needinfo?(mcornmesser)
Comment 55 • 6 years ago
Please note that in all the cases I checked, Marionette was able to install the reftest extension. So I assume it's not getting correctly initialized and the tests are never started.
Comment 56 • 6 years ago
Also, if it were a configuration change for the Windows hardware, I would expect to see failures on other branches too, not only on trunk:
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-10-21&endday=2018-10-24&tree=all&page=1&page_size=100&bug=1436237
As this graph shows, we had no failures before Oct 23rd. So maybe one of the commits that landed in this range is related?
https://hg.mozilla.org/mozilla-central/pushloghtml?startdate=2018-10-22&enddate=2018-10-24
I would suggest that we do a backfill.
Comment 57 • 6 years ago
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #54)
> these are primarily windows 10, this is on hardware, not on vm.
>
> :pmoore- when did the generic-worker upgrade roll out to the windows
> hardware?
> :markco- were there other changes to windows hardware such as firmware
> upgrades, os tweaks, etc?
>
> specifically in the last few days/
Oct 15th was the last generic-worker upgrade, and Oct 18th was the last significant change to the OS configuration.
I am not aware of any changes to the firmware within the last few days. Dhouse?
Flags: needinfo?(pmoore)
Flags: needinfo?(mcornmesser)
Flags: needinfo?(dhouse)
Comment 58 • 6 years ago
Also note there is no Z: drive on Windows hardware nodes. I don't know if that is playing into the issue or not.
Comment 59 • 6 years ago
(In reply to Mark Cornmesser [:markco] from comment #57)
> (In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #54)
> > these are primarily windows 10, this is on hardware, not on vm.
> >
> > :pmoore- when did the generic-worker upgrade roll out to the windows
> > hardware?
> > :markco- were there other changes to windows hardware such as firmware
> > upgrades, os tweaks, etc?
> >
> > specifically in the last few days/
>
> Oct 15th was the last generic-worker upgrade, and Oct 18th was the last
> significant change to the OS configuration.
>
> I am not aware of any changes to the firmware within the last few days.
> Dhouse?
No firmware changes or other changes to the hardware have been made within the last week.
Flags: needinfo?(dhouse)
Comment 60 • 6 years ago
(In reply to Henrik Skupin (:whimboo) from comment #56)
> https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-10-21&endday=2018-10-24&tree=all&page=1&page_size=100&bug=1436237
>
> As this graph shows we had no failures before Oct 23rd. As such maybe one of
> those landed commits are related?
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-09-24&endday=2018-10-24&tree=trunk&page=1&page_size=20&bug=1436237
This looks like it started on Oct. 17th and didn't show up over the weekend.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 68 • 6 years ago
There are 222 failures in the last 7 days.
They occur on
112 windows10-64-qr (opt, debug)
99 windows10-64 (all build types)
6 windows10-64-ccov (debug)
2 windows10-64-ccov (debug)
2 windows10-64-devedition (opt)
1 windows10-64-nightly (opt)
Recent failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=209156247&repo=autoland&lineNumber=1003
Comment 69 (Assignee) • 6 years ago
I'm trying to narrow this down with debug logging on try. So far it looks like something is hanging in reftest.jsm.
Flags: needinfo?(gbrown)
Comment 70 (Assignee) • 6 years ago
This failure is easily reproduced on try, at least on windows-10/debug.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=31bb93611267c85883d29a0740cbfb75fe5cd32a
Debug logging in that try push shows that this code is run for both success and failure cases:
https://dxr.mozilla.org/mozilla-central/rev/c291143e24019097d087f9307e59b49facaf90cb/layout/tools/reftest/reftest.jsm#308-309
but ReadTests() is never called in the failure cases: reftests do not start because the focus event is never received.
We identified this as an intermittent cause of this failure months ago, but didn't follow up effectively: bug 1453895.
Flags: needinfo?(gbrown)
Comment 71 • 6 years ago
Thanks for looking into this Geoff!
Comment 76 (Assignee) • 6 years ago
Failures have become less frequent lately; I don't know why.
Most/all of the remaining failures should go away with bug 1453895's landing (today).
Comment 77 • 6 years ago
Note that there are no new test failures today, so this looks brilliant!
https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2018-10-31&endday=2018-11-07&tree=all&page=1&page_size=20&bug=1436237
Comment 79 • 6 years ago
The one remaining failure filed after the patch on bug 1453895 landed is a crash due to a shutdown hang, and is not related to this bug at all:
https://treeherder.mozilla.org/logviewer.html#?job_id=210599757&repo=autoland&lineNumber=29670-29678
I think it's safe to close this bug as fixed now.
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox64:
--- → fixed
status-firefox65:
--- → fixed
status-firefox-esr60:
--- → fixed
Resolution: --- → FIXED