WPT expected-crashes get highlighted as intermittent failures, if they're followed by an intermittent failure
Categories
(Testing :: web-platform-tests, defect)
Tracking
(Not tracked)
People
(Reporter: dholbert, Unassigned)
References
Details
[Spinning this off from bug 1876001]
If you have a WPT that is annotated "expected-crash", then we shouldn't ever highlight a crash-in-that-test as something to file an intermittent bug about. This is fine for most WPT tasks because we in fact appear to skip WPTs if entirely, if they're annotated as expected-crash. [EDIT: this skipping seems intentional & reasonable; see bug 1572820]
However, for web-platform-tests-backlog tasks, we don't skip the test -- we run it and let it crash. This is still fine, because that doesn't turn the log orange -- BUT, if the task is orange for some other reason (e.g. another test intermittently fails), then **TreeHerder parses out the crash and highlights it as being one of the unexpected-problems in the log (possibly the first displayed issue), even though it's in fact entirely expected and unrelated to why the task was judged to be orange.
This resulted in the mistaken/unnecessary filing of bug 1876001 (as an example) which was filed for what-treeherder-shows-to-be-a-bug-worthy-crash (even though the tests in question are annotated as expected-crash).
I don't know exactly how far back this issue goes; maybe forever? I found at least one log from December 2023 that shows this problem (a PROCESS-CRASH mistakenly highlighted).
Ideally we should the log-parser or whatever, so this so that expected crashes don't get highlighted by treeherder in orange logs. (Not sure if that'd require a fix in our WPT harness, in TreeHerder, or something else completely. I'm filing this bug under the WPT component, but please feel free to reclassify as-appropriate; thanks!)
| Reporter | ||
Comment 1•1 year ago
•
|
||
I pushed some dummy tests to Try that I was hoping to trigger the issue with, but it turned out one of my tests got skipped for some reason, so my tests didn't actually trigger the issue. Other preexisting tests did happen to trigger the issue, though, helpfully :) so my Try run does still demonstrate the problem.
https://treeherder.mozilla.org/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&revision=ea66c4cddfe3c8bef3803b268bccbe97b11d0c66&searchStr=backlog&selectedTaskRun=XMpVlcYgRlamjOvTQdkwVw.0
Some of the W-b(wpt1) tests on my try run are orange, and they show this exact issue happening with tests that are in-tree. e.g. this log shows these five sheriff-attention-worthy / bug-worthy items as being highlighted from one of the orange logs:
[ x ] FATAL ERROR: Non-local network connections are disabled and a connection attempt to google.com (142.250.125.113) was made.
[ x ] PROCESS-CRASH | Attempting to connect to non-local address! opener is [unknown], uri is [https://google.com/] [@ mozilla::net::nsHttpChannel::OnStartRequest] | /html/semantics/embedded-content/the-iframe-element/sandbox-top-navigation-child.tentative.sub.window.html
[ x ] FATAL ERROR: Non-local network connections are disabled and a connection attempt to google.com (142.250.125.113) was made.
[ x ] PROCESS-CRASH | Attempting to connect to non-local address! opener is [unknown], uri is [https://google.com/] [@ mozilla::net::nsHttpChannel::OnStartRequest] | /html/semantics/embedded-content/the-iframe-element/sandbox-top-navigation-escalate-privileges.tentative.sub.window.html
[ x ] TEST-UNEXPECTED-TIMEOUT | /idle-detection/idle-detection-allowed-by-permissions-policy.https.sub.html | Inherited header permissions policy allows dedicated workers. - Test timed out
Only the last item there is an actual unexpected test outcome, and the other items are all expected, but there's no way for a sheriff to easily tell that.
Comment 2•1 year ago
|
||
typically sheriffs annotate the first failure, unless they know it is something to skip and then they go down to the first failure that is not expected. Yeah, confusing.
There are no failures with the NEW flag set on the highlighted job, so most likely those exact failures have been seen before and are not of concern.
| Reporter | ||
Comment 3•1 year ago
•
|
||
(adding relation to bug 1572820 which is where this wpt "backlog" task originated)
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #2)
typically sheriffs annotate the first failure, unless they know it is something to skip and then they go down to the first failure that is not expected. Yeah, confusing.
There are no failures with the
NEWflag set on the highlighted job, so most likely those exact failures have been seen before and are not of concern.
This status-quo seems like it prevents intermittent bugs from getting filed for the same (expected/not-actually-intermittent) crash, but it seems like it dooms us to potentially getting at least one new intermittent bug filed for any new expected-crash that gets added to WPT (e.g. by other browser vendors, with the crash coming from an inadvertent remote-resource-load, for example).
Once an expected-crash WPT has been imported: the first time it happens to coincide with another later-in-the-log intermittent failure, it'll potentially be the first highlighted log-entry and will have the NEW flag, and will result in a bug being filed, and a possible goose-chase on that bug to identify who introduced this new crash (which is not actually/exactly a new crash).
Comment 4•1 year ago
|
||
it seems like the proper resolution here is that if we are expecting crashes, then we shouldn't have them visible via treeherder.
| Reporter | ||
Comment 5•1 year ago
•
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #4)
it seems like the proper resolution here is that if we are expecting crashes, then we shouldn't have them visible via treeherder.
Yup, agreed. (And they're not, in green logs; they're only visible in logs that are orange. I think that's because we only kick off the the log-parser if the task overall is judged to have failed?)
In the meantime, I also wanted to double-check and document best-practices for what to do with intermittent bugs (like bug 1876001) that are spawned as-described-here. Should we resolve them as INVALID since the highlighted failure is not actually intermittent and not unexpected?
Comment 6•1 year ago
|
||
invalid is just fine!
Comment 7•1 year ago
|
||
for reference, the failures are still printed on wpt-b green tasks, we just don't mark the task as failing, so there is no default view of the failures.
wpt-b should have intermittents or changed conditions if there is a TEST-UNEXPECTED ?
| Reporter | ||
Comment 8•1 year ago
•
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #7)
for reference, the failures are still printed on wpt-b green tasks, we just don't mark the task as failing, so there is no default view of the failures.
Ah, right -- the (expected) crash is shown in the parsed-log, in green tasks, in my try run for my intentionally-crashing test in
https://treeherder.mozilla.org/jobs?repo=try&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Crunning%2Cpending%2Crunnable&revision=ea66c4cddfe3c8bef3803b268bccbe97b11d0c66&searchStr=backlog&selectedTaskRun=SCQguHzzQEeEo-jDa5xlpA.0
...but it doesn't show up on the actual TreeHerder view of the push.
wpt-b should have intermittents or changed conditions if there is a
TEST-UNEXPECTED?
I'm not fully up on our test logging magic-strings, but that sounds reasonable (filtering for TEST-UNEXPECTED, at least specifically for this test suite where we're anticipating lots of expected-crashes.)
On the flip side, see also bug 1881681 which I filed for crashes that happen to occur after we've navigated away from a test. Ideally we shouldn't make it harder to catch those (though if we do so in a wpt-backlog-specific manner, that's probably fine since these are known-cursed tests).
| Reporter | ||
Comment 9•1 year ago
•
|
||
Here's a Try run with a patch that can be used to intentionally trigger this bug:
https://treeherder.mozilla.org/jobs?repo=try&revision=3d532df5c91cf59731c93fe5fc97d7008345548d&searchStr=backlog
For the tests added in that Try patch...
- The Linux/macOS
W-b(wpt2)tasks show this issue. - The Windows
W-b(wpt1)task shows this issue. - The Android
W-b(wpt3)task includes these tests but does not show this issue (apparently because the crash doesn't get picked up at all by the log-highlighter there).
Comment 10•1 year ago
|
||
we in fact appear to skip WPTs if entirely, if they're annotated as expected-crash
FWIW I don't think we ought to do that (although I understand that we recently changed things to only schedule those tests in backlog jobs). It seems to me that it just makes it easier for us to end up with situations where we fix a crash without updating the metadata and the first time we know about it is when the job reaches central.
Anyway, from the wpt perspective it's hard to know how to improve here. We always want to log the crash because a crash is always a bug that we should fix, and having the crash dump in the logs is helpful (e.g. when we import a test that tries to access an external site, being able to easily see the crash logs immediately identifies the problem, without having to run a try job with the expectation removed). So we log those as TEST-CRASH (no UNEXPECTED) but also call mozcrash which logs the PROCESS CRASH part.
The log parser doesn't know anything about wpt jobs, or have any concept that a crash can be expected. That makes a lot of sense, if you see the string CRASH outside the context of a specific test harness that's probably an error you want to flag. Also, even for a passing job, if you want to investigate any expected crashes, flagging the PROCESS CRASH lines is helpful in allowing you to job directly to the stack in the log viewer.
So I think if you wanted to fix this without regressing other things you'd need to pass through the fact that the crash is expected to mozcrash and then to the log formatter, so the TBPL formatter could produce some different string. Ideally the log parser would then be updated so that string would be picked up but would get an additional flag so it wouldn't appear in the treeherder failure summary, but would appear in the parsed log view.
| Reporter | ||
Comment 11•1 year ago
|
||
(In reply to James Graham [:jgraham] from comment #10)
we in fact appear to skip WPTs if entirely, if they're annotated as expected-crash
FWIW I don't think we ought to do that (although I understand that we recently changed things to only schedule those tests in backlog jobs).
(Right, this^ was just me being confused & not realizing what the the backlog jobs were for. After running across bug 1572820, the setup makes more sense.)
Comment 12•1 year ago
|
||
The severity field is not set for this bug.
:jgraham, could you have a look please?
For more information, please visit BugBot documentation.
Updated•1 year ago
|
Comment 13•1 year ago
|
||
I think this is solvable, I had added to mozcrash a quiet parameter to most of the entry points. This is used and intended for "If you are expecting a crash, don't output stuff".
I see we call mozcrash.log_crashes() here. If we could determine at that point in the code if this was expected or not, then we could pass in quiet=True.
For reference, mozcrash has log_crashes defined here.
:jgraham, is there a way to determine if there is an expected status of crash?
Comment 14•1 year ago
|
||
More or less, yes. We end up there via check_crash. Looking at callers of that https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py#603 is really checking for startup crashes. Those shouldn't be expected. https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py#735 is checking for a crash after a test ran, so we do know the expected result of the crash.
I note that not logging at all the fact that we had a crash — even if it's marked as expected — seems like a serious regression to me. We really want these to be fixed, and being able to find them in the logs and get the signature and stacktrace is hugely useful. If we logged them as e.g. PROCESS KNOWN-CRASH and excluded those lines only from the treeherder logic used to identify intermittents, that would be a much better solution than just suppressing all the output.
Comment 15•1 year ago
|
||
I don't understand how having a bug indicating a crash exists, then basically ignoring it via manifest changes is different from not printing out info of the crash when we know it exists and a bug is on file. The developer is already notified and we can remove noise from everyone else to reduce confusion.
These are all on backlog jobs, so possibly we could set a flag to only do this on backlog jobs? If not desired, then we should resolve this bug as wontfix.
Description
•