Intermittent /css/css-shapes/shape-outside/shape-image/gradients/shape-outside-linear-gradient-014.html | Testing /shape-outside-linear-gradient-014.html == shape-outside-linear-gradient-014.html
Categories
(Core :: Layout: Floats, defect, P5)
Tracking
()
People
(Reporter: intermittent-bug-filer, Assigned: CosminS)
Details
(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])
Attachments
(1 file)
Filed by: apavel [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=286837691&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Ukf_aeZJTVau8Et6nVCn8g/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Ukf_aeZJTVau8Et6nVCn8g/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
[task 2020-01-29T11:52:32.805Z] 11:52:32 INFO - TEST-START | /css/css-shapes/shape-outside/shape-image/gradients/shape-outside-linear-gradient-014.html
[task 2020-01-29T11:52:32.805Z] 11:52:32 INFO - PID 7648 | 1580298752799 Marionette INFO Testing http://web-platform.test:8000/css/css-shapes/shape-outside/shape-image/gradients/shape-outside-linear-gradient-014.html == http://web-platform.test:8000/css/css-shapes/shape-outside/shape-image/gradients/reference/shape-outside-linear-gradient-001-ref.html
[task 2020-01-29T11:52:32.879Z] 11:52:32 INFO - PID 7648 | [Child 5608, Main Thread] WARNING: Trying to request nsIHttpChannel from DocumentChannel, this is likely broken: file z:/build/build/src/netwerk/ipc/DocumentChannel.cpp, line 63
[task 2020-01-29T11:52:33.036Z] 11:52:33 INFO - PID 7648 | 1580298753025 Marionette INFO No differences allowed
[task 2020-01-29T11:52:33.133Z] 11:52:33 INFO - TEST-UNEXPECTED-FAIL | /css/css-shapes/shape-outside/shape-image/gradients/shape-outside-linear-gradient-014.html | Testing http://web-platform.test:8000/css/css-shapes/shape-outside/shape-image/gradients/shape-outside-linear-gradient-014.html == http://web-platform.test:8000/css/css-shapes/shape-outside/shape-image/gradients/reference/shape-outside-linear-gradient-001-ref.html
[task 2020-01-29T11:52:33.133Z] 11:52:33 INFO - Found 40000 pixels different, maximum difference per channel 255
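The "Found 40000 pixels different, maximum difference per channel 255" line comes from the reftest harness's image comparison, which counts pixels where any channel differs and tracks the largest per-channel delta. A minimal sketch of that comparison logic (images represented as lists of RGBA tuples; this is an illustration, not the actual harness code):

```python
def compare_images(img_a, img_b):
    """Count differing pixels and the maximum per-channel difference.

    img_a, img_b: equal-length lists of (r, g, b, a) tuples, values 0-255.
    Mirrors the numbers the reftest log reports.
    """
    differing = 0
    max_channel_diff = 0
    for pa, pb in zip(img_a, img_b):
        diffs = [abs(ca - cb) for ca, cb in zip(pa, pb)]
        if any(diffs):
            differing += 1
            max_channel_diff = max(max_channel_diff, max(diffs))
    return differing, max_channel_diff

# A fully-black vs fully-white 200x200 region: 40000 differing pixels,
# maximum per-channel difference 255, matching the log line above.
black = [(0, 0, 0, 255)] * 40000
white = [(255, 255, 255, 255)] * 40000
print(compare_images(black, white))  # → (40000, 255)
```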
Comment hidden (Intermittent Failures Robot)
Comment 13•5 years ago
Comment hidden (Intermittent Failures Robot)
It seems like whatever made this a lot more frequent landed considerably more recently than that: probably 2020-04-22, around 18:00-22:00 UTC.
In other words, that is the range of interest for the increased frequency.
Comment hidden (Intermittent Failures Robot)
Assignee
Comment 18•5 years ago
These frequent failures might be related to issues with the Windows machines. Some examples of machines the tests are failing on:
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-win10-64-gpu-s/workers/aws/i-0f0d0ddf12600a870
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-win10-64-gpu-s/workers/aws/i-07500595c40f5831d
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-win10-64-gpu-s/workers/aws/i-066bb79c93076d308
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-win10-64-gpu-s/workers/aws/i-09508deb8b775b311
All the machines from the push where these failures started were later terminated, and the later retriggers are green.
Range 1 and Range 2 of retriggers. The failures here don't look in any way related to https://hg.mozilla.org/integration/autoland/rev/7f4f1d605c69b3a471727810b6fc876187e8211e
Maybe Rob has more info about the state of the Windows machines. There was also a spike of retries on Windows jobs for a couple of days.
Comment hidden (Intermittent Failures Robot)
Comment 20•5 years ago
We had an issue with Windows workers during a recent GitHub outage that resulted in a number of machines going idle (doing no work, but still counted among the running workers). I terminated a large number of machines over a period of several hours, using the script at https://gist.github.com/grenade/63bf380b79b995065cb6530df34725c8 to determine whether a machine was idle or productive: it queries the Taskcluster API to see whether the instance had recent task runs associated with it, and if the response indicated the instance had not run a task in the preceding thirty minutes, the instance was terminated.
Since we then saw a spike of retries, it is apparent that some of those terminations hit machines that were actually doing productive work. The script is linked above in case anyone wishes to scrutinise it for flaws.
I apologise for the inconvenience. We were in a situation where we had to get rid of a lot of unproductive workers in order to reduce a large task backlog, at a time when we were unable to create new capacity.
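The termination heuristic described above (terminate a worker if the Taskcluster API shows no task run resolved in the last thirty minutes) can be sketched as follows. This is an illustrative reconstruction, not the linked gist; the function names and the worker-map shape are assumptions for the example:

```python
from datetime import datetime, timedelta, timezone

IDLE_THRESHOLD = timedelta(minutes=30)

def is_idle(last_run_resolved, now=None):
    """Return True if the worker's most recent task run resolved more than
    IDLE_THRESHOLD ago, or if the worker has no recorded runs at all."""
    now = now or datetime.now(timezone.utc)
    if last_run_resolved is None:
        return True
    return now - last_run_resolved > IDLE_THRESHOLD

def workers_to_terminate(workers, now=None):
    """workers: mapping of worker-id -> datetime of last resolved run (or None).
    Returns the worker ids the heuristic would terminate."""
    return [wid for wid, last in workers.items() if is_idle(last, now)]
```

The flaw the comment acknowledges is visible in the sketch: a worker that is mid-task, but has not *resolved* a run within the threshold, looks idle and gets terminated anyway.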
Comment 21•5 years ago
/cc :jrmuizel
:grenade - can you find out what graphics cards are in the windows gpu workers? The failures are only happening on windows-qr builds, i.e. with webrender enabled. And the regression range ("range 1" from comment 18) points to either bug 1632239 or bug 1624988, both of which are plausible candidates as they touch webrender code. The first of the two is specific to Windows and certain graphics cards, so if the workers this is running on have those graphics cards, that might explain the problem.
Comment hidden (Intermittent Failures Robot)
Comment 23•5 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #21)
> /cc :jrmuizel
> :grenade - can you find out what graphics cards are in the windows gpu workers? The failures are only happening on windows-qr builds, i.e. webrender enabled. And the regression range ("range 1" from comment 18) points to either bug 1632239 or bug 1624988, both of which are plausible candidates as they touch webrender code. The first of the two is specific to Windows and certain graphics cards, so if the workers this is running have those graphics cards, that might explain the problem.
The best way to get current information about worker systems is to create a task on the worker type you are interested in, with commands that query the system for the information you need. To answer the question above, I created tasks with definitions similar to:
retries: 0
created: '2020-04-28T08:23:55.176Z'
deadline: '2020-05-01T08:43:34.349Z'
expires: '2021-05-02T08:43:34.349Z'
provisionerId: gecko-t
workerType: t-win10-64-gpu-s
priority: highest
tags: {}
scopes: []
payload:
  command:
    - wmic path win32_VideoController get name
    - 'wmic path win32_VideoController get /all /format:table'
    - 'wmic path win32_VideoController get /all /format:list'
    - 'wmic path win32_VideoController get /all /format:csv'
  maxRunTime: 60
extra: {}
metadata:
  name: determine video controller on gecko-t/t-win10-64-gpu-s
  description: |-
    ## query wmic for video controller info
    this task demonstrates how to query wmic for video controller metadata
    - determine the name of the video controller
    - get metadata about video controller in table format
    - get metadata about video controller in list format
    - get metadata about video controller in csv format
  owner: grenade@mozilla.com
  source: 'https://bugzilla.mozilla.org/show_bug.cgi?id=1612100#c21'
These tasks show that the video controllers present are:
- gecko-t/t-win10-64-gpu-s:
- NVIDIA Tesla M60
- Microsoft Basic Display Adapter
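For reference, a diagnostic task definition like the one above can also be built programmatically. The sketch below mirrors a subset of the YAML fields from this comment; the helper name is hypothetical, and the commented-out submission step (via the `taskcluster` Python client) is an assumption that would additionally need credentials and a live cluster:

```python
def make_wmic_probe_task(provisioner_id="gecko-t",
                         worker_type="t-win10-64-gpu-s"):
    """Build a one-off diagnostic task definition that queries wmic
    for video controller info, mirroring the YAML above."""
    return {
        "provisionerId": provisioner_id,
        "workerType": worker_type,
        "priority": "highest",
        "retries": 0,
        "payload": {
            "command": [
                "wmic path win32_VideoController get name",
                "wmic path win32_VideoController get /all /format:list",
            ],
            "maxRunTime": 60,
        },
        "metadata": {
            "name": "determine video controller on %s/%s"
                    % (provisioner_id, worker_type),
            "description": "query wmic for video controller metadata",
            "owner": "grenade@mozilla.com",
            "source": "https://bugzilla.mozilla.org/show_bug.cgi?id=1612100#c21",
        },
    }

# Hypothetical submission step (not run here):
# import taskcluster
# queue = taskcluster.Queue({"rootUrl": "https://firefox-ci-tc.services.mozilla.com"})
# queue.createTask(taskcluster.slugId(), make_wmic_probe_task())
```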
Comment 24•5 years ago
Thanks!
Jeff, your patch affected AMD cards, so I would imagine it shouldn't affect the cards above. So maybe it was Glenn's patch (bug 1624988) that caused it? Either way, it should be possible to do a try push with the changes backed out and see whether the problem still occurs.
Comment 25•5 years ago
Comment 26•5 years ago
... and of course both backouts still have the intermittent failures.
Comment 27•5 years ago
I did more retriggers on the regression range:
So far the results don't make any sense: both WebRender changes look green, so the window narrows to two seemingly-unrelated changes.
Comment hidden (Intermittent Failures Robot)
Comment 29•5 years ago
The try push with both changes backed out (https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=d7cc8f1d3b30f87303478c7c211e74777cd8e1ba) also has the problem. I'm doing more retriggers on the seemingly-unrelated changes.
Comment 30•5 years ago
As far as I can tell from retriggers, the culprit is the backout of bug 1631211, which makes no sense.
I'm giving up here. The problem is most likely a race condition in the WPT harness or similar, so feel free to disable the test on windows-qr.
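Disabling a wpt test on a platform is done through its metadata `.ini` file under `testing/web-platform/meta/`, which mirrors the test path. A sketch of what such an annotation might look like for this test (the condition variables and syntax follow the wpt metadata format, but the exact annotation used is an assumption):

```ini
[shape-outside-linear-gradient-014.html]
  disabled:
    if os == "win" and webrender: https://bugzilla.mozilla.org/show_bug.cgi?id=1612100
```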
Assignee
Comment 31•5 years ago
Updated•5 years ago
Comment 32•5 years ago
Assignee
Updated•5 years ago
Comment 33•5 years ago
James, any idea why the web-platform reftest failure started on a push which seems unrelated? See comment 30.
Comment 34•5 years ago
bugherder
Comment hidden (Intermittent Failures Robot)
Updated•2 years ago
Updated•2 years ago
Updated•2 years ago