Closed Bug 1124091 Opened 11 years ago Closed 11 years ago

Intermittent media-src-7_1.html | Violation report status OK. - Test timed out

Categories

(Core :: DOM: Security, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
firefox37 --- unaffected
firefox38 --- fixed
firefox39 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: cbook, Assigned: jgraham)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Windows 7 32-bit mozilla-inbound opt test web-platform-tests-2 https://treeherder.mozilla.org/logviewer.html#?job_id=5699377&repo=mozilla-inbound 00:30:18 INFO - TEST-UNEXPECTED-TIMEOUT | /content-security-policy/media-src/media-src-7_1.html | Violation report status OK. - Test timed out
Sid, this is insanely frequent. Can you please help find an owner?
Flags: needinfo?(sstamm)
Hmm... I'm not familiar with these tests, which appear to be imported from the w3c tests on github. https://github.com/w3c/web-platform-tests James: I think you did the import in bug 1118722. What's the plan with the imported tests that time out (See RyanVM's comment 502)?
Flags: needinfo?(sstamm) → needinfo?(james)
(In reply to Sid Stamm [:geekboy or :sstamm] from comment #508) > Hmm... I'm not familiar with these tests, which appear to be imported from > the w3c tests on github. https://github.com/w3c/web-platform-tests > > James: I think you did the import in bug 1118722. What's the plan with the > imported tests that time out (See RyanVM's comment 502)? Umm, I don't have a plan as such. Someone who understands both the spec and the code needs to look at these tests and determine if the timeouts are a problem with the tests or a problem with our implementation. It *might* just be that they need a longer timeout (I thought I added one, but it seems like there are new tests since then). It is, of course, possible to just disable them, but I would prefer to try to address problems rather than just sweeping them under the carpet, where possible.
Flags: needinfo?(james) → needinfo?(sstamm)
(In reply to James Graham [:jgraham] from comment #514) > It is, of course, possible to just disable them, but I would prefer to try > to address problems rather than just sweeping them under the carpet, where > possible. In the future, it would be nice to do a bit more to identify which tests are going to cause problems like this before landing them (via more try runs and/or some input from domain experts before landing instead of after landing). Can you disable the timing-out CSP tests and file a new bug to re-enable the disabled tests? This will fix the breakage on trunk and give us time to vet the accuracy/quality of the tests you imported. We have lots of CSP tests already, so I am thinking it's a bug in the test causing this timeout. Once this new bug is filed, need-info me on that bug and if I can't get to figuring out how to re-enable it, I'll bring in someone else to help. I understand that you need someone who knows CSP to take a peek, but we're all pretty busy on other stuff, and this is the best way for us to move forward.
Flags: needinfo?(sstamm) → needinfo?(james)
(In reply to Sid Stamm [:geekboy or :sstamm] from comment #530) > (In reply to James Graham [:jgraham] from comment #514) > > It is, of course, possible to just disable them, but I would prefer to try > > to address problems rather than just sweeping them under the carpet, where > > possible. > > In the future, it would be nice to do a bit more to identify which tests are > going to cause problems like this before landing them (via more try runs > and/or some input from domain experts before landing instead of after > landing). There is a tradeoff here. It's an explicit goal to make upgrading to latest upstream as painless as possible so that we can stay up to date and make local changes that will upstream easily. Blocking on domain expert's review when tests land would utterly destroy our ability to perform upgrades by adding delays of at least days and more probably weeks to each upgrade. When we do the upgrades today there are a minimum of two try runs on every supported platform. This more than is currently required to land e.g. mochitests, so I think it's entirely reasonable as a standard. Obviously the more try runs we do beforehand the lower the chance of problems later, but the downside is again the upgrades become significantly slower and more painful. At some point we just have to land, and sort out the small fraction of tests that cause problems afterwards. > Can you disable the timing-out CSP tests and file a new bug to > re-enable the disabled tests? This will fix the breakage on trunk and give > us time to vet the accuracy/quality of the tests you imported. We have lots > of CSP tests already, so I am thinking it's a bug in the test causing this > timeout. OK. The main issue seems to be that the tests are checking that they get a violation report. Do our existing tests do that? > Once this new bug is filed, need-info me on that bug and if I can't get to > figuring out how to re-enable it, I'll bring in someone else to help. I > understand that you need someone who knows CSP to take a peek, but we're all > pretty busy on other stuff, and this is the best way for us to move forward. In general I am wary of disabling tests and hoping that people will fix them, or the code, later. It turns out that people *always* have other things to work on and investigating a disabled test is very easy to put off. This isn't aimed at you specifically, of course. I'm just explaining why reaching for the "disable" button is more of a last resort than a first move.
Flags: needinfo?(james)
Thanks for the context, it's helpful. I hope you understand why I'm not very excited to look at a test I've not really verified before and is causing a frequent orange suddenly but due to no changes in our CSP code. :-/ Maybe I'm being a bit unreasonable, but there's gotta be a better way to handle intermittent wpt failures like this than a sheriff like RyanVM flagging me to investigate... (In reply to James Graham [:jgraham] from comment #531) > There is a tradeoff here. It's an explicit goal to make upgrading to latest > upstream as painless as possible so that we can stay up to date and make > local changes that will upstream easily. Blocking on domain expert's review > when tests land would utterly destroy our ability to perform upgrades by > adding delays of at least days and more probably weeks to each upgrade. I see, and agree. Blocking is bad. How do we handle bugs/problems in the imports? This is kind of the same question I asked you initially in this bug: if we import tests that are possibly broken, what do we do? Based on this bug, I believe our strategy is "import it, let the test go orange, and get someone to fix it." Can we adopt a process that's more like bug fixing and less like putting out a fire? > When we do the upgrades today there are a minimum of two try runs on every > supported platform. Ah, great, that's a completely reasonable standard. I assume the test code is reviewed? And someone (you?) owns the imported code and is responsible for fixing bugs and upstreaming fixes for issues that pop up (like this)? I'm not passing judgement here on what we're doing, I'm honestly curious who takes responsibility for the code quality now that we're building against these tests. > OK. The main issue seems to be that the tests are checking that they get a > violation report. Do our existing tests do that? Some tests here: http://mxr.mozilla.org/mozilla-central/source/dom/base/test/csp/ Some of them, but that's the slow/hard way, and violation report format is not clearly defined in the spec. Additionally, violation reporting is asynchronous and tests that use it are error-prone due to the need to wait for the reports. We have other ways to test if CSP blocked something that is faster and not as error-prone. Can the tests instead use DOM events or something to identify when something *should* have been loaded and then inspect to see if it was? Or like for script tags where we can use onload/onerror handlers? > In general I am wary of disabling tests and hoping that people will fix > them, or the code, later. Sure, you're absolutely right. But having this frequent orange isn't going to find me more time to fix this, and it's just gonna make our sherrifs (and the gecko engineer who helps fix the problem) grumpy. By the way, I think it is really awesome that we have these tests. I'm just grumpy. Flagging myself for needinfo so it stays on my radar.
(In reply to Sid Stamm [:geekboy or :sstamm] from comment #534) > Thanks for the context, it's helpful. I hope you understand why I'm not > very excited to look at a test I've not really verified before and is > causing a frequent orange suddenly but due to no changes in our CSP code. > :-/ Maybe I'm being a bit unreasonable, but there's gotta be a better way > to handle intermittent wpt failures like this than a sheriff like RyanVM > flagging me to investigate... Yeah, so the problem is that it's hard to know if it's a flaky test or flaky code. We have had wpt intermittents for both reasons (e.g. a very flaky wpt navigation-timing test turned out to be a serious bug in the implementation, others have simply turned out to be tests that were written in a bad way). Other than having people flagged to investigate the issues, I really don't know how to distinguish these situations. > (In reply to James Graham [:jgraham] from comment #531) > > There is a tradeoff here. It's an explicit goal to make upgrading to latest > > upstream as painless as possible so that we can stay up to date and make > > local changes that will upstream easily. Blocking on domain expert's review > > when tests land would utterly destroy our ability to perform upgrades by > > adding delays of at least days and more probably weeks to each upgrade. > > I see, and agree. Blocking is bad. How do we handle bugs/problems in the > imports? This is kind of the same question I asked you initially in this > bug: if we import tests that are possibly broken, what do we do? Based on > this bug, I believe our strategy is "import it, let the test go orange, and > get someone to fix it." Can we adopt a process that's more like bug fixing > and less like putting out a fire? If there's an obvious problem with the tests being imported, I usually just fix them. If the problem manifests later on in existing code, I think the procedure is basically what you see here i.e. needinfo me and some people with more insight into the implementation. If it becomes too much like a firedrill, the escape hatch is to disable the tests. I don't know what a better process is for new tests that target existing features. Do you have any suggested improvements? When developing new features the best approach is to treat the wpt as first class citzens, making sure that they are included in Try runs, that bugs in tests are fixed, and that as many as possible pass before launching the feature. Indeed where possible it's preferable to write web-platform-tests rather than mochitests because they can be used to ensure that we have interop with other vendors. > > When we do the upgrades today there are a minimum of two try runs on every > > supported platform. > > Ah, great, that's a completely reasonable standard. I assume the test code > is reviewed? Yes, all test code is (supposed to be) reviewed before it lands upstream. Although I just checked and it seems like the media-src tests in particular may not have gone through the normal review process, which makes me extremely cross. Having said that the person who would likely have reviewed them pushed some fixup commits, so I guess some sort of extraordinary review did occur. > And someone (you?) owns the imported code and is responsible > for fixing bugs and upstreaming fixes for issues that pop up (like this)? Yes, I own the tests in that sense, although obviously I'm not an expert on every single test. If I'm not around Ms2ger is also a good person to talk to. > I'm not passing judgement here on what we're doing, I'm honestly curious who > takes responsibility for the code quality now that we're building against > these tests. > > > OK. The main issue seems to be that the tests are checking that they get a > > violation report. Do our existing tests do that? > > Some tests here: > http://mxr.mozilla.org/mozilla-central/source/dom/base/test/csp/ > > Some of them, but that's the slow/hard way, and violation report format is > not clearly defined in the spec. Additionally, violation reporting is > asynchronous and tests that use it are error-prone due to the need to wait > for the reports. We have other ways to test if CSP blocked something that > is faster and not as error-prone. I think these tests try to do both i.e. they both check that something got blocked from script and then check that a violation report was sent. > > In general I am wary of disabling tests and hoping that people will fix > > them, or the code, later. > > Sure, you're absolutely right. But having this frequent orange isn't going > to find me more time to fix this, and it's just gonna make our sherrifs (and > the gecko engineer who helps fix the problem) grumpy. I agree that in this particular case the volume of oranges mean that something has to be done in the short term. Thanks for looking at this.
Attached patch csp.diff — — Splinter Review
Attachment #8584672 - Flags: review?(Ms2ger)
Attachment #8584672 - Flags: review?(Ms2ger) → review+
Assignee: nobody → james
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39

Retesting, it might be worth exploring to re-enable the test.
See for yourself at http://w3c-test.org/content-security-policy/media-src/media-src-7_1.html

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: