Intermittent Linux64-qr http://localhost:37789/1549576556955/136/289480.html#top == http://localhost:37789/1549576556955/136/289480-ref.html | image comparison, max difference: 4, number of differing pixels: 1374

RESOLVED FIXED in Firefox 67

Status


defect
P5
normal
RESOLVED FIXED
5 months ago
5 months ago

People

(Reporter: intermittent-bug-filer, Assigned: NarcisB)

Tracking

({intermittent-failure, regression})

unspecified
mozilla67
x86_64
Linux
Points:
---

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox65 unaffected, firefox66 unaffected, firefox67 fixed)

Details

(Whiteboard: [retriggered][stockwell disable-recommended])

Attachments

(2 attachments)

Filed by: apavel [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=226964990&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/Zl3Z4W8yTcaJDFEQ-Gi6nA/runs/0/artifacts/public/logs/live_backing.log

https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/Zl3Z4W8yTcaJDFEQ-Gi6nA/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

[task 2019-02-07T21:57:58.008Z] 21:57:58 INFO - REFTEST TEST-START | http://localhost:37789/1549576556955/136/289480.html#top == http://localhost:37789/1549576556955/136/289480-ref.html
[task 2019-02-07T21:57:58.009Z] 21:57:58 INFO - REFTEST TEST-LOAD | http://localhost:35914/1549576590462/1/289480.html#top | 310 / 2056 (15%)
[task 2019-02-07T21:57:58.049Z] 21:57:58 INFO - [Child 1379, Main Thread] WARNING: site security information will not be persisted: file /builds/worker/workspace/build/src/security/manager/ssl/nsSiteSecurityService.cpp, line 506
[task 2019-02-07T21:57:58.067Z] 21:57:58 INFO - [Parent 1222, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/netwerk/url-classifier/AsyncUrlChannelClassifier.cpp, line 756
[task 2019-02-07T21:57:58.074Z] 21:57:58 INFO - --DOMWINDOW == 120 (0x7fcfdf6aac00) [pid = 1379] [serial = 758] [outer = (nil)] [url = about:blank]
[task 2019-02-07T21:57:58.075Z] 21:57:58 INFO - --DOCSHELL 0x7fcfddcc5000 == 4 [pid = 1379] [id = {fa083af8-bb12-4505-8d8b-217239fb21a5}]
[task 2019-02-07T21:57:58.076Z] 21:57:58 INFO - --DOCSHELL 0x7fcfddda2800 == 3 [pid = 1379] [id = {63aef58b-d3fb-44df-9939-e1fe51c62cfb}]
[task 2019-02-07T21:57:58.078Z] 21:57:58 INFO - --DOMWINDOW == 119 (0x7fcfdf912400) [pid = 1379] [serial = 761] [outer = (nil)] [url = data:text/html,<body style='font-weight: bold; text-align: center'>This is a test</body>]
[task 2019-02-07T21:57:58.080Z] 21:57:58 INFO - --DOMWINDOW == 118 (0x7fcfdf6a4000) [pid = 1379] [serial = 742] [outer = (nil)] [url = data:application/xhtml+xml;charset=UTF-16LE,<%00h%00t%00m%00l%00 %00x%00m%00l%00n%00s%00=%00'%00h%00t%00t%00p%00:%00/%00/%00w%00w%00w%00.%00w%003%00.%00o%00r%00g%00/%001%009%009%009%00/%00x%00h%00t%00m%00l%00'%00>%00<%00h%001%00>%00T%00e%00s%00t%00<%00/%00h%001%00>%00<%00/%00h%00t%00m%00l%00>%00]
[task 2019-02-07T21:57:58.080Z] 21:57:58 INFO - --DOCSHELL 0x7fcfddda2000 == 2 [pid = 1379] [id = {773804b3-a330-462b-b307-5a158fb5b359}]
[task 2019-02-07T21:57:58.081Z] 21:57:58 INFO - --DOCSHELL 0x7fcfdf568000 == 1 [pid = 1379] [id = {b7664c63-af28-49f9-b3d0-28c91ff6fdef}]
[task 2019-02-07T21:57:58.082Z] 21:57:58 INFO - --DOMWINDOW == 117 (0x7fcfdf90e800) [pid = 1379] [serial = 745] [outer = (nil)] [url = data:application/xhtml+xml;charset=us-ascii,<html xmlns='http://www.w3.org/1999/xhtml'><h1>Test</h1></html>]
[task 2019-02-07T21:57:58.160Z] 21:57:58 INFO - ++DOMWINDOW == 118 (0x7fcfddd0d400) [pid = 1379] [serial = 825] [outer = 0x7fcfe2714400]
[task 2019-02-07T21:57:58.507Z] 21:57:58 INFO - [Parent 1222, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/netwerk/url-classifier/AsyncUrlChannelClassifier.cpp, line 756
[task 2019-02-07T21:57:58.509Z] 21:57:58 INFO - [Parent 1222, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/netwerk/url-classifier/AsyncUrlChannelClassifier.cpp, line 756
[task 2019-02-07T21:57:59.380Z] 21:57:59 INFO - REFTEST TEST-LOAD | http://localhost:35914/1549576590462/1/289480-ref.html | 310 / 2056 (15%)
[task 2019-02-07T21:57:59.401Z] 21:57:59 INFO - [Parent 1222, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/netwerk/url-classifier/AsyncUrlChannelClassifier.cpp, line 756
[task 2019-02-07T21:57:59.441Z] 21:57:59 INFO - ++DOMWINDOW == 119 (0x7fcfdfc3bc00) [pid = 1379] [serial = 826] [outer = 0x7fcfe2714400]
[task 2019-02-07T21:57:59.761Z] 21:57:59 INFO - [Parent 1222, Main Thread] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/netwerk/url-classifier/AsyncUrlChannelClassifier.cpp, line 756
[task 2019-02-07T21:58:00.144Z] 21:58:00 INFO - REFTEST INFO | REFTEST fuzzy test (0, 0) <= (4, 1374) <= (6, 1124)
[task 2019-02-07T21:58:00.317Z] 21:58:00 INFO - REFTEST TEST-UNEXPECTED-FAIL | http://localhost:37789/1549576556955/136/289480.html#top == http://localhost:37789/1549576556955/136/289480-ref.html | image comparison, max difference: 4, number of differing pixels: 1374
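The `REFTEST fuzzy test` line just before the failure is the interesting part. Read as `(min bounds) <= (observed) <= (max bounds)`, with each tuple assumed to be (max per-channel difference, number of differing pixels), it says the observed max difference of 4 is inside the allowed 0-6 range, but 1374 differing pixels exceeds the allowed maximum of 1124. That is why the fix discussed below was a fuzz adjustment. The sketch below parses such a line and reports which bound was violated. It assumes the tuple layout just described; `FuzzyCheck` and `parse_fuzzy_line` are hypothetical helpers, not part of the reftest harness.

```python
import re
from typing import List, NamedTuple, Optional


class FuzzyCheck(NamedTuple):
    """Bounds and observed values from a reftest 'fuzzy test' log line (assumed layout)."""
    min_diff: int          # lower bound on max per-channel difference
    min_pixels: int        # lower bound on number of differing pixels
    max_difference: int    # observed max per-channel difference
    differing_pixels: int  # observed number of differing pixels
    max_diff: int          # upper bound on max per-channel difference
    max_pixels: int        # upper bound on number of differing pixels

    def violations(self) -> List[str]:
        """Names of the observed values that fall outside their [min, max] range."""
        out = []
        if not (self.min_diff <= self.max_difference <= self.max_diff):
            out.append("max_difference")
        if not (self.min_pixels <= self.differing_pixels <= self.max_pixels):
            out.append("differing_pixels")
        return out

    def passes(self) -> bool:
        return not self.violations()


_FUZZY_RE = re.compile(
    r"REFTEST fuzzy test \((\d+), (\d+)\) <= \((\d+), (\d+)\) <= \((\d+), (\d+)\)"
)


def parse_fuzzy_line(line: str) -> Optional[FuzzyCheck]:
    """Extract the three (difference, pixel count) tuples, or None if absent."""
    m = _FUZZY_RE.search(line)
    if m is None:
        return None
    return FuzzyCheck(*map(int, m.groups()))
```

Applied to the log line above, `parse_fuzzy_line(...).violations()` returns `["differing_pixels"]`, so widening only the pixel-count bound would be enough for this particular run.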

Bug 1525831 just added a null check to prevent crashes, so it can be the cause.

aosmond, could your changes in the regression range have caused this?

Flags: needinfo?(bugs) → needinfo?(aosmond)
Summary: Intermittent http://localhost:37789/1549576556955/136/289480.html#top == http://localhost:37789/1549576556955/136/289480-ref.html | image comparison, max difference: 4, number of differing pixels: 1374 → Intermittent Linux64-qr http://localhost:37789/1549576556955/136/289480.html#top == http://localhost:37789/1549576556955/136/289480-ref.html | image comparison, max difference: 4, number of differing pixels: 1374

Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/601af10b1100
Disabled 289480-ref.html on linux-qr r=jmaher

Keywords: checkin-needed
Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9b4bb0f83ee4
Add 289480.html for the reference file to fix the reftest failures. CLOSED TREE
Backout by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b8364a650e9
Backed out 2 changesets for reftest failures on reftest.list . CLOSED TREE
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2122008d8ac
Disable 289480-ref.html on linux-qr r=jmaher
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---

Can you please refer reviews that involve disabling tests for WebRender to me? These bugs are often filed in layout as opposed to the WebRender component, so I don't see them, and I can't find a way to auto-subscribe to reviews based on patch diff content. In this case the test just needed some fuzz adjustment, which I had a patch for in bug 1529288 once I became aware of the problem. Apparently this bug is a dupe of that one, or something - both bugs are getting starred for the same failure.

Flags: needinfo?(svoisen)
Flags: needinfo?(nbeleuzu)
Flags: needinfo?(jmaher)
Flags: needinfo?(aosmond)

I'm going to close this since I landed the fuzz change in bug 1529288, so there's no need to keep this open any more.

Assignee: nobody → nbeleuzu
Blocks: 1453935
Status: REOPENED → RESOLVED
Closed: 5 months ago → 5 months ago
Keywords: leave-open
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Resolution: --- → FIXED

I have updated the wiki to have :kats review in the case of a webrender specific disabling:
https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/disable-recommended#Requesting_review

Flags: needinfo?(nbeleuzu)
Flags: needinfo?(jmaher)

I'm a little confused why these changes are being made without owner/peer review, which I think in this case would have led to being redirected to somebody involved with webrender.

(I'd also note that this test is particularly important, since it's a copy of a very high-visibility test page.)

(In reply to David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-8 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it) from comment #23)

I'm a little confused why these changes are being made without owner/peer review, which I think in this case would have led to being redirected to somebody involved with webrender.

To be fair, two relevant people were needinfo'd and never responded, so I think it was reasonable to go ahead with the disable.

(In reply to David Baron :dbaron: 🏴󠁵󠁳󠁣󠁡󠁿 ⌚UTC-8 (if account gets disabled due to email bounces, ask a bugzilla admin to reenable it) from comment #23)

I'm a little confused why these changes are being made without owner/peer review, which I think in this case would have led to being redirected to somebody involved with webrender.

(I'd also note that this test is particularly important, since it's a copy of a very high-visibility test page.)

When we see a high failure rate for a particular test, we needinfo (ni) the triage owner, either to fix it or to assign someone. If there is no progress or response, we disable the test.

The failure rate here is 191 failures in the last 30 days, and we disable a test when it has failed more than 150 times in the last 30 days. If there is another approach to this, I assume the test can be re-enabled.
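The disable rule described above is a sliding-window threshold: count failures in the last 30 days and recommend disabling when the count is strictly greater than 150. A minimal sketch, assuming failures are available as timestamps; `recent_failures` and `should_recommend_disable` are hypothetical names, not the actual Stockwell tooling.

```python
from datetime import datetime, timedelta
from typing import Iterable

# Values taken from the comment above; treated here as assumptions about the policy.
WINDOW = timedelta(days=30)
THRESHOLD = 150


def recent_failures(failure_times: Iterable[datetime], now: datetime,
                    window: timedelta = WINDOW) -> int:
    """Count failures whose timestamp lies within [now - window, now]."""
    start = now - window
    return sum(1 for t in failure_times if start <= t <= now)


def should_recommend_disable(failure_times: Iterable[datetime], now: datetime,
                             threshold: int = THRESHOLD) -> bool:
    """True when the test failed *more than* `threshold` times in the window."""
    return recent_failures(failure_times, now) > threshold
```

With 191 failures inside the window the function returns `True`; exactly 150 failures would not trigger it, since the rule is "more than 150".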

The triage owner is different from the module owner and peers. That doesn't remove the code review requirement.

Something that would remove the code review requirement is backing out whatever triggered the regression and needinfo'ing the patch author. If the failure isn't high enough frequency to trigger a backout of its cause, then it should be able to wait for the normal code review requirements.

:dbaron, we have had this process in place for over a year and this is the first time there is a question about it. We disabled a test here and we have done this hundreds of times. I am happy to push on finding more appropriate triage owners, or asking for needinfo's to be resolved faster. The rules are simple and if we should consider changing them, I would like to bring this up in a larger forum. It sounds like the problem here is a lack of appropriate response to needinfo requests and proper triage.

Uhm, from experience, in 90% of the cases, triage owner = module owner. It's possible that there were recent changes and this was not updated in Bugzilla?

We did not back out because retriggers were our only evidence of where the failure started, and we did not receive any other replies on the bug to confirm it.

Anyway, if you have suggestions or want us to change the way we work, I think :jmaher is the right person to approach here.

Ah, sorry Joel, did not see your comment before I posted. Thanks.

Pretty sure I've asked questions about it before, so I don't think this is the first time. I think disabling tests without the developers responsible for the code knowing about it isn't OK, and this is not the first time I've brought it up. We have tests for a reason, and randomly disabling them doesn't serve that reason. Some of the tests are substantially more critical than others (in terms of importance of what they test and in terms of lack of other tests covering the same thing), and it's good to consult people with that knowledge before disabling them.

Based on a sample of about 10 components, triage owners in Core are mostly engineering managers; a minority of those engineering managers are peers or owners.

And, to be clear, the triage owners are the right people to triage and prioritize issues, but they're often not the right people to review code changes.

David, in this particular case, how could we have known which developer was responsible for the code?

Also, for future reference, how can we figure out which developer is responsible? (I'm referring to situations where we don't receive replies to our needinfo requests.)

The failure is still occurring even after the disable patch was landed.
Central: http://tinyurl.com/y2u2n4me

Mozilla-Inbound: https://tinyurl.com/y3lwxxzn

Kartikaya: should we reopen this or file a new bug?

Flags: needinfo?(kats)

Some combination of https://wiki.mozilla.org/Modules/All and file history in the repository.

(In reply to Andreea Pavel [:apavel] from comment #32)

Kartikaya: should we reopen this or file a new bug?

I landed another patch on bug 1529288 to increase the fuzz a bit more. Please let me know if it still happens again.

Flags: needinfo?(kats)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34)

(In reply to Andreea Pavel [:apavel] from comment #32)

Kartikaya: should we reopen this or file a new bug?

I landed another patch on bug 1529288 to increase the fuzz a bit more. Please let me know if it still happens again.

No other failures since the 23rd of February, when you landed this patch.

Thank you.
