Closed Bug 1509635 Opened 6 years ago Closed 6 years ago

Update webrender to 914d16f9a2fb8d007509894660bae9c61074ae31 (WR PR #3347)

Categories

(Core :: Graphics: WebRender, enhancement, P3)

65 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla65
Tracking Status
firefox65 --- fixed

People

(Reporter: kats, Assigned: u480271)

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1509592 +++
I'm filing this as a placeholder bug for the next webrender update. I may be running a cron script [1] that does try pushes with webrender update attempts, so that we can track build/test breakages introduced by webrender on a rolling basis. This bug will hold the try push links as well as dependencies filed for those breakages, so that we have a better idea going into the update of what needs fixing. I might abort the cron job because once things get too far out of sync it's hard to fully automate fixing all the breakages. When we are ready to actually land the update, we can rename this bug and use it for the update, and then file a new bug for the next "future update".
[1] https://github.com/staktrace/wrupdater/blob/master/try-latest-webrender.sh
WR @ commit 914d16f9a2fb8d007509894660bae9c61074ae31 - servo/webrender#3347 - on HG rev autoland: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c175b3b57eeac3ac40bed1c58095843bf4c4e972
WR @ commit 90fa51c71579ce434013953cee35a9bd159ab398 - servo/webrender#3342 - on HG rev autoland: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bedbe3dc536db03af0a351875a2f42917ff0e820
WR @ commit 90fa51c71579ce434013953cee35a9bd159ab398 - servo/webrender#3342 - on HG rev b599964cc3ee: https://treeherder.mozilla.org/#/jobs?repo=try&revision=747ce99d141f380805ddae9c6d049590f49787ba
WR @ commit 90fa51c71579ce434013953cee35a9bd159ab398 - servo/webrender#3342 - on HG rev c48931864919: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3d4330daee1621b2c91184d9ebd2abb2369b8cff
WR @ commit f450af9277e2474e2a2a2c1358689ca9486e2a09 - servo/webrender#3345 - on HG rev c48931864919: https://treeherder.mozilla.org/#/jobs?repo=try&revision=39f4cf8f39c1c17bb48dba99bcae328867ecdb88
WR @ commit f450af9277e2474e2a2a2c1358689ca9486e2a09 - servo/webrender#3345 - on HG rev def0fd8429f9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1b500d4283734a8f2bcfce88d6111c6ee3e3ba83
WR @ commit e2e52b1145ad959191c0612edd41b0b189cf6b59 - servo/webrender#3346 - on HG rev def0fd8429f9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=44a2cc0b652af6ff67df57b599ce6365b50b20ba
There appear to be failures on a particular Linux crashtest that started with servo/webrender#3342. Looks like it might just expose some pre-existing bug with invalidation? But we'll need to resolve it or disable the test before updating to that cset.
Yup, it does look like it's become a permanent failure, I mistakenly thought we were hitting the referenced intermittent. I can't reproduce locally unfortunately. I don't see any backtrace in the logs, if I'm looking in the right place?
WR @ commit 35027d93aded8c0a7887dadc8aef5e393171e802 - servo/webrender#3348 - on HG rev def0fd8429f9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=59203a81633a10a976a12cb59fb83d5dbbaf4951
(In reply to Glenn Watson [:gw] from comment #9)
> Yup, it does look like it's become a permanent failure, I mistakenly thought we were hitting the referenced intermittent.
> I can't reproduce locally unfortunately. I don't see any backtrace in the logs, if I'm looking in the right place?
It's not actually crashing, it just fails to complete. It was intermittent before (bug 1500458) so most likely it's done running assumption in the test that the WR code path fails to satisfy. The try push in comment 10 has the test disabled; assuming that is green and the failure doesn't just move elsewhere, I'm ok to land with that. I can try to reproduce the failure locally and investigate.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #11)
> so most likely it's done running assumption in the test
That should be "... so most likely there's some timing assumption ..." (silly phone autocorrect). Also, apparently it happens on two different tests (which I didn't notice before) and I only disabled one. I'll disable the other one too and do another try push. So far I haven't been able to reproduce the problem locally but I'll try harder tomorrow.
WR @ commit 35027d93aded8c0a7887dadc8aef5e393171e802 - servo/webrender#3348 - on HG rev def0fd8429f9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=26ace37bd2293e57c82aae2f1fe5c3834aa9864e
Also servo/webrender#3346 added a couple of failures on the windows reftests - looks fuzzable, :gw can you confirm that's ok to fuzz?
Yup, these are fine to fuzz - they are also fuzzed on other renderer backends.
WR @ commit 35027d93aded8c0a7887dadc8aef5e393171e802 - servo/webrender#3348 - on HG rev 14ae1910a4f5: https://treeherder.mozilla.org/#/jobs?repo=try&revision=554206afe4d7b6f8d59b9921abc756f23a7bd9bb
WR @ commit 05bdcae134d73aca7bb48358e91de1f8aef27773 - servo/webrender#3354 - on HG rev 14ae1910a4f5: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f59f054992a82dc8cf516289c7d976c9af3d79f
Disabling the two crashtests just moved the failure down to some other crashtests. I don't really want to disable a slew of crashtests so I'll try debugging it first. I might end up landing some WR PRs out of order while I try to sort this out. I still haven't successfully reproduced the failure locally so I'll do try pushes with logging to try and track it down.
I noticed that 972199-1.html seems to trigger a deadlock, an infinite loop, or something similar. The test does an AdvanceTimeAndRefresh, which triggers a sync IPC to the compositor; that in turn blocks on FlushRendering(), which never returns. When the harness kills firefox, the stacks show that the render thread is busy [1], which is presumably why FlushRendering() doesn't return and the test hangs. It might be that we're hitting a bug in mesa that is only triggered on the try servers, which would explain why I don't see it locally. Certainly the top frames of the renderer stack are in swrast_dri.so.
[1] https://treeherder.mozilla.org/logviewer.html#?job_id=213688710&repo=try&lineNumber=21591
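For anyone trying to picture the hang described above, here is a toy model of it in Python. This is only a sketch and not Gecko code: `render_thread`, `flush_rendering`, and the job queue are hypothetical stand-ins, and the timeout exists purely so the demo terminates (the real sync IPC just blocks).

```python
import queue
import threading
from typing import Callable, Optional

# A job is any callable; None tells the toy render thread to exit.
Job = Optional[Callable[[], object]]


def render_thread(jobs: "queue.Queue[Job]") -> None:
    """Run queued jobs strictly in order, like a single render thread."""
    while True:
        job = jobs.get()
        if job is None:
            return
        job()


def flush_rendering(jobs: "queue.Queue[Job]", timeout: float) -> bool:
    """Queue a flush marker and wait for the render thread to reach it.

    Returns True if the marker was processed within `timeout` seconds.
    """
    done = threading.Event()
    jobs.put(done.set)
    return done.wait(timeout)


jobs: "queue.Queue[Job]" = queue.Queue()
release = threading.Event()
worker = threading.Thread(target=render_thread, args=(jobs,), daemon=True)
worker.start()

# With an idle render thread the flush completes promptly.
flushed_before = flush_rendering(jobs, timeout=1.0)

# Stand-in for a draw that never finishes: it blocks until released.
jobs.put(release.wait)

# The flush is now queued behind the stuck draw and cannot complete.
flushed_after = flush_rendering(jobs, timeout=0.2)

# Clean up so the demo exits: unblock the stuck job and stop the thread.
release.set()
jobs.put(None)
worker.join()
```

The point of the model is that the flush itself is innocent: it hangs only because an earlier job is still occupying the render thread, so the test that reports the failure need not be the one that caused it.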
Looks like running just the failing tests in isolation makes the failure go away: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&searchStr=crashtest&selectedJob=213859306&revision=8c491545a63b500c181872baaa000e03059d79be So that's annoying. It means that some previous test is setting up some state that's triggering the failure. I'll try to trim down the set of tests that need to be run while also trying to reproduce this locally by running the tests inside the same docker image that is used on the try server.
WR @ commit 05bdcae134d73aca7bb48358e91de1f8aef27773 - servo/webrender#3354 - on HG rev 6c10213a8924: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b423cffb6a26ab172d9c254049944ffc0091235f
I can reproduce the problem in a docker container, so now I'm trying to bisect and narrow down the minimal set of tests that trigger the problem.
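The narrowing-down step can be sketched roughly as below. This is an illustrative greedy reduction, not the procedure actually used in the container; `minimize` and `still_fails` are hypothetical names, the test file names are made up, and it assumes the failure reproduces deterministically for a given ordered list of tests.

```python
from typing import Callable, List, Sequence


def minimize(tests: Sequence[str],
             still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop chunks of tests while the failure still reproduces.

    Order is preserved, since an earlier test may set up state that a
    later test trips over.
    """
    current = list(tests)
    chunk = len(current) // 2
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if still_fails(candidate):
                # The removed chunk was not needed; retry at the same index.
                current = candidate
            else:
                # Something in this chunk is needed; keep it and move on.
                i += chunk
        chunk //= 2
    return current


def simulated_run(subset: List[str]) -> bool:
    """Hypothetical stand-in for running a subset of crashtests.

    Here the failure reproduces only if setup.html runs before victim.html.
    """
    if "setup.html" not in subset or "victim.html" not in subset:
        return False
    return subset.index("setup.html") < subset.index("victim.html")


all_tests = ["a.html", "setup.html", "b.html", "c.html", "victim.html", "d.html"]
minimal = minimize(all_tests, simulated_run)
print(minimal)  # ['setup.html', 'victim.html']
```

In practice `still_fails` would have to launch the harness on the subset and report whether the hang occurred, and any nondeterminism in that run would make the reduction unreliable.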
WR @ commit f3e489eebe9ffd5229c93aa4e17f4c3a7e6cb31d - servo/webrender#3349 - on HG rev 6c10213a8924: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dd9d7bcbea2e8322b44c397364c8c64050f75b18
The bad news is that I'm having trouble getting a minimal set of tests. It seems like when crashtests run, the harness doesn't actually block on the compositor for the most part, so it just loads page after page and the compositor asynchronously goes about rendering things. But if the compositor is too slow then I guess we skip over some of the tests entirely. So this produces some sort of intermittent behaviour. One (or more) of the tests that may or may not get rendered in the compositor seems to get the renderer thread wedged in an infinite loop (or maybe just a really long-running loop) in draw_tile_frame. And then the next test that does any sync IPC to the compositor manifests the failure.
The good news is that I'm fairly sure this isn't an osmesa bug, because I was able to attach gdb to the firefox process while the renderer thread was doing its infinite loop, and I was able to `fin` my way out of the swrast_dri.so stack frames, but couldn't `fin` out of the draw_tile_frame stack frame. It was tricky enough getting gdb working in the docker image, but I'll see if I can do a build of FF inside the image and get some more info, because right now I'm not getting a lot of useful info out of gdb.
For those following along, here are some steps to reproduce what I did: https://gist.github.com/staktrace/a83dd0d66e29f0d049cc6b16d6cf71b2 (note that the "magic" bits like the task-id for the docker image, env vars, and the command to run can all be found on the task details page, e.g. https://tools.taskcluster.net/groups/EfFrMMd8QAyY9bg27ZoXcA/tasks/Sbohi5I9TH2QBypUncosTQ/details for this case)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #24)
> The bad news is that I'm having trouble getting a minimal set of tests. It seems like when crashtests run, the harness doesn't actually block on the compositor for the most part, so it just loads page after page and the compositor asynchronously goes about rendering things.
I'm going to try and fix this first since it should be easy and point us more directly to a culprit test.
WR @ commit 05d4eccfa6dd7f667a1f74b12134257a85bea047 - servo/webrender#3350 - on HG rev f38d34679027: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c370cb0f5f64fc6d75c3633dad7d8ba8bc937d33
WR @ commit 05d4eccfa6dd7f667a1f74b12134257a85bea047 - servo/webrender#3350 - on HG rev a60b595747ad: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3d60c565b962a9aaa275b824bdb73a6d4738b788
WR @ commit 235273012e08230c07a214e907175c535206098d - servo/webrender#3356 - on HG rev a60b595747ad: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3c23549ac8c022471ca52085f622c149cf0ec509
The problematic crashtests are actually the large-border-radius-* tests in layout/generic/crashtests. With those disabled the crashtests are green (see last couple of try pushes above, which include a patch to disable those tests). I discussed with :gw on IRC, and I'll land the WR updates with those tests disabled on linux for now, and he'll fix it in upstream WR.
Alias: wr-future-update
Assignee: nobody → dglastonbury
No longer depends on: 1500458, 1510026
Summary: Future webrender update bug → Update webrender to 914d16f9a2fb8d007509894660bae9c61074ae31 (WR PR #3347)
Depends on D13021
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/51e713e72d92
Update webrender to commit 914d16f9a2fb8d007509894660bae9c61074ae31 (WR PR #3347). r=kats
https://hg.mozilla.org/integration/autoland/rev/9f0228da2763
Re-generate FFI header. r=kats
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla65