Several Windows opt failures after the latest wpt-sync
Categories
(Testing :: web-platform-tests, defect)
Tracking
(Not tracked)
People
(Reporter: SerbanS, Assigned: Sasha)
Attachments
(1 file)
Hi Alexandra! Could you please take a look at these wpt failures? They seem to be fallout from the latest wpt-sync, and they fail in every instance but with several different failure lines, as can be seen here. This is happening only on Autoland. A separate instance is happening on main too, but on Windows 11 24H2 Shippable, as can be seen here.
Thank you!
Comment 1•2 months ago (Assignee)
Comment 2•2 months ago (Assignee)
I can see that /event-timing/click.html failures come from wptsync landing, but I'm not sure about everything else. I've created the patch for /event-timing/click.html.
Comment 4•2 months ago (bugherder)
Comment 6•2 months ago (Reporter)
The failures in /event-timing/click.html do seem to have been resolved, but the other failures are still present. After some more digging, Bug 1937025 looks to be the one that started this, as can be seen in these ranges of retriggers and backfills: r1, r2, r3.
Hi Bob! Could you please take a look at this?
Thank you!
Comment 7•2 months ago
I've run those jobs 5 times each on my final try push and didn't get any failures, so I'm a bit confused as to why these changes could have caused those failures.
https://treeherder.mozilla.org/jobs?repo=try&duplicate_jobs=visible&revision=a7b3fcfa81b63b10747c27fdb7e77b8a08371c66
Comment 8•2 months ago
Pushed to try both current central and central with the main patch from bug 1937025 backed out:
https://treeherder.mozilla.org/jobs?repo=try&revision=bbe636c27daa704736b76524dd0c809d4e86e89f
https://treeherder.mozilla.org/jobs?repo=try&revision=c554116ff2854fd88e3c7b1f7c4b2e5b2b20d213
There do appear to be fewer failures with the back-out.
One of the issues is that we don't seem to get crash dumps a lot of the time.
When we do, the failures seem to be bug 1979494 and bug 1977225.
Crashing here for 1979494 because oldCurrentEntry is null.
Crashing here for 1977225.
Looking at the dumps, the root cause in both cases seems to be GetCurrentEntry returning null because mCurrentEntryIndex is not set.
Hard to see why the chromium sandbox update would actually cause these failures, so perhaps it is some sort of timing issue.
smaug - what do you think?
Can these be fixed by null checks or is there a deeper issue?
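To make the null-check question concrete, here is a minimal Python model of the pattern described above. It is not the Gecko code: `NavigationModel`, `Entry` and `describe_current` are hypothetical names, and `current_entry_index` only stands in for mCurrentEntryIndex being unset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    url: str


@dataclass
class NavigationModel:
    """Toy stand-in for a navigation history (hypothetical, not Gecko)."""

    entries: List[Entry] = field(default_factory=list)
    # None models mCurrentEntryIndex never having been set.
    current_entry_index: Optional[int] = None

    def get_current_entry(self) -> Optional[Entry]:
        # Mirrors the reported symptom: no index means no current entry.
        if self.current_entry_index is None:
            return None
        return self.entries[self.current_entry_index]


def describe_current(nav: NavigationModel) -> str:
    entry = nav.get_current_entry()
    if entry is None:
        # The null check avoids the crash, but the state is still
        # inconsistent: entries exist while the index was never set.
        return "no current entry"
    return entry.url
```

In this model the null check only hides the symptom; whether the real fix belongs at the point where the index should have been set is exactly the open question.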
Comment 11•2 months ago
Given that there didn't appear to be an issue on my final try push, I've bisected on try to find what else it is interacting with to cause the failures.
It ends up on Bug 1977364 - Remove support for ducktyped errors.
Try push with my changes on top of first patch for that bug:
https://treeherder.mozilla.org/jobs?repo=try&revision=1523186cd8154e066d337bcca2adc919c8a756f4
Try push on top of second patch:
https://treeherder.mozilla.org/jobs?repo=try&revision=463208c564e47e1102ac8d629a3c98c08d1bd873
So, I've pushed two more try runs on current central, one with the pref set to true and one with both patches backed out:
https://treeherder.mozilla.org/jobs?repo=try&revision=a101b1c8f4720fd2b015ec670ea01753ac39904b
https://treeherder.mozilla.org/jobs?repo=try&revision=1774847716012e2757588acbede05b26670348f9
Looks like both of these resolve the crashes.
tschuster: any idea how these might interact?
In theory there shouldn't be any sandboxing changes here, but there could of course be timing changes.
The chromium update also produced some memory improvements: bug 1937025 comment 42.
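For reference, the bisection idea can be sketched as a binary search over an ordered list of revisions. This is a generic illustration, not a description of how the try bisection was actually run: `first_bad` and the `is_bad` callback are hypothetical, and on try `is_bad` would correspond to pushing a revision and inspecting the results.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first one is
    good, the last one is bad, and every revision after the first bad
    one is also bad.
    """
    lo, hi = 0, len(revisions) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]
```

Because the failures here are intermittent, a single green run does not prove a revision good, so `is_bad` would need to be based on several retriggers per revision.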
Comment 12•2 months ago (Reporter)
FWIW, this seems to be resolved now. The last push where these failures appeared was this one. The pushes that came after that were clear of these failures.
Comment 13•2 months ago
(In reply to Serban Stanca [:SerbanS] from comment #12)
> FWIW, this seems to be resolved now. The last push where these failures appeared was this one. The pushes that came after that were clear of these failures.
Seems it's still failing on some builds:
https://treeherder.mozilla.org/jobs?repo=autoland&revision=ba74f4e0d9bfcdbb4c2e3ec2267632a65dd8a2d5
https://treeherder.mozilla.org/jobs?repo=autoland&revision=d970c2612f53e692c05a34890ea5896f30ed5593
There are clean ones either side of these.
Comment 14•2 months ago
Sorry, I have no idea why this would crash. It's of course possible that we changed the reported error message somewhere and the code can't deal with it. Is this even a "real" crash?
INFO - NoSuchWindowException on command, setting status to CRASH
INFO - TEST-UNEXPECTED-CRASH | /html/document-isolation-policy/credentialless-shared-worker.https.tentative.window.html?request_origin=same_origin&worker_dip=none&window_dip=none | expected OK
INFO - TEST-INFO took 805ms
INFO - PID 8384 | JavaScript error: chrome://remote/content/shared/webdriver/Certificates.sys.mjs, line 79: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]
INFO - Browser exited with return code 572
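One way to check how often the CRASH status is assigned by the harness rather than backed by a crash dump is to pair each TEST-UNEXPECTED-CRASH line with a preceding "setting status to CRASH" line. The sketch below assumes the log format is exactly as in the excerpt above; `parse_unexpected` and `crashes_with_reason` are hypothetical helpers, not part of any Mozilla tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Pattern inferred from the excerpt only: "TEST-UNEXPECTED-<STATUS> | <test> | <message>".
_UNEXPECTED = re.compile(
    r"TEST-UNEXPECTED-(?P<status>[A-Z_]+) \| (?P<test>\S+) \| (?P<message>.*)"
)


class Unexpected(NamedTuple):
    status: str
    test: str
    message: str


def parse_unexpected(line: str) -> Optional[Unexpected]:
    """Extract status, test id and message from one log line, if present."""
    match = _UNEXPECTED.search(line)
    if match is None:
        return None
    return Unexpected(match["status"], match["test"], match["message"].strip())


def crashes_with_reason(lines: Iterable[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each unexpected CRASH with the harness line just before it, if any."""
    reason: Optional[str] = None
    found: List[Tuple[str, Optional[str]]] = []
    for line in lines:
        if "setting status to CRASH" in line:
            reason = line.split("INFO - ", 1)[-1].strip()
            continue
        result = parse_unexpected(line)
        if result is not None and result.status == "CRASH":
            found.append((result.test, reason))
        reason = None  # only an immediately preceding line counts
    return found
```

A CRASH entry that comes back with a NoSuchWindowException reason was reported by the harness when a command failed, which is a different signal from a crash dump being produced.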
Comment 16•2 months ago
So it does still happen. I don't really know how to proceed here.
Comment 17•2 months ago
farre, see comment 8.
Need to get some Navigation API crashes fixed ;)
Comment 18•2 months ago
(In reply to Olli Pettay [:smaug][bugs@pettay.fi] from comment #17)
> farre, see comment 8.
> Need to get some Navigation API crashes fixed ;)
Yeah, looking at it now. It's not a null check that's needed, it's a deeper issue.
Comment 19•1 month ago
I had another look at this. The following failure in Bob's Try push seems interesting:
TEST-UNEXPECTED-FAIL | /workers/interfaces/WorkerUtils/importScripts/report-error-cross-origin.sub.any.worker.html | WorkerGlobalScope error event: lineno - assert_equals: expected 8 but got 0
TEST-UNEXPECTED-FAIL | /workers/interfaces/WorkerUtils/importScripts/report-error-cross-origin.sub.any.sharedworker.html | WorkerGlobalScope error event: filename - assert_equals: expected "http://web-platform.test:8000/workers/interfaces/WorkerUtils/importScripts/report-error-helper.js" but got ""
These exact symptoms happen when JS::ErrorReportBuilder::maybeCreateReportFromDOMException returns nullptr. Of course it's not clear why this would happen non-deterministically. The ini file for this test does show that, even before my changes, it seems to have been highly unreliable.
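The link between a missing report and the two assertion failures can be illustrated with a small Python model. It is only a sketch of the symptom, not the SpiderMonkey or DOM code: `Report` and `error_event_fields` are hypothetical, and the assumption is that a missing report leaves the error event with default values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class Report:
    filename: str
    lineno: int


def error_event_fields(report: Optional[Report]) -> Dict[str, Union[str, int]]:
    """Build the fields the tests check from an optional report."""
    if report is None:
        # Assumed fallback: with no report there is no location
        # information, so the event carries empty defaults.
        return {"filename": "", "lineno": 0}
    return {"filename": report.filename, "lineno": report.lineno}
```

Under that assumption, a null report yields lineno 0 and an empty filename, which matches "expected 8 but got 0" and the empty string in the two failures above.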