Closed Bug 1385003 Opened 2 years ago Closed 2 years ago

Update webrender to e68c8acb021656440d26ac46e705e7ceb31891e6

Categories

(Core :: Graphics: WebRender, enhancement, P3)

57 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: kats, Assigned: kats)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [gfx-noted])

Attachments

(10 files, 4 obsolete files)

59 bytes, text/x-review-board-request
jrmuizel
: review+
Details
59 bytes, text/x-review-board-request
jrmuizel
: review+
Details
59 bytes, text/x-review-board-request
kvark
: review+
Details
59 bytes, text/x-review-board-request
sotaro
: review+
Details
59 bytes, text/x-review-board-request
kats
: review+
Details
59 bytes, text/x-review-board-request
jrmuizel
: review+
Details
59 bytes, text/x-review-board-request
jrmuizel
: review+
Details
59 bytes, text/x-review-board-request
nical
: review+
Details
59 bytes, text/x-review-board-request
nical
: review+
Details
59 bytes, text/x-review-board-request
kvark
: review+
Details
+++ This bug was initially created as a clone of Bug #1383041 +++

I'm filing this as a placeholder bug for the next webrender update. I may be running a cron script [1] that does try pushes with webrender update attempts, so that we can track build/test breakages introduced by webrender on a rolling basis. This bug will hold the try push links as well as dependencies filed for those breakages, so that we have a better idea going into the update of what needs fixing. I might abort the cron job because once things get too far out of sync it's hard to fully automate fixing all the breakages.

When we are ready to actually land the update, we can rename this bug and use it for the update, and then file a new bug for the next "future update".

[1] https://github.com/staktrace/moz-scripts/blob/master/try-latest-webrender.sh
WR @ 97af583573b34e46763476085e4423fda97d724e, with build bustage fixed:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=18ca96cadc5ec410f1e0626ffc0a27c5c17c5519
https://treeherder.mozilla.org/#/jobs?repo=try&revision=51063bc24d4dd4d819555ca9a4ab4aa3e1b638a1

Sadly hitting assertion failures in all the debug tests. I'll look into it.
Hmm. It looks like we just make a namespace id out of thin air [1] and use that in our image keys, but WR is now expecting the namespace id to actually match its internal namespace id [2]. So that doesn't work. I haven't been following the namespace stuff - Nical or Dzmitry, do you know what needs to be done here?

[1] http://searchfox.org/mozilla-central/rev/09c065976fd4f18d4ad764d7cb4bbc684bf56714/gfx/layers/wr/WebRenderBridgeParent.h#198
[2] https://hg.mozilla.org/try/rev/e7229fe6d9151e745d78862b72855f53ba23af58#l31.349
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(kvark)
That namespacing is a bit tricky, or even hacky, should I say, in the way it used to function. Even though a `RenderApi` produced a namespace, you were able to use those interchangeably with any resources from other namespaces. Then we got a cleanup routine, which I put in the `Drop` implementation of the API. This means a stronger bond between `RenderApi` and the namespace it is associated to, so you are supposed to do all the operations within the namespaced resource with the same `RenderApi`, or otherwise it's not going to be safe to clear that namespace.

I wanted to go further and make documents namespace-bound as well, but that idea faced resistance from Servo code.

If this scheme doesn't work for Gecko, we'll need to re-evaluate how namespaces work in WR. The sooner then better.
Flags: needinfo?(kvark)
Hm, ok.

I replaced the hacky AllocIdNamespace() function with calls to WebRenderAPI::GetIdNamespace() instead which seems more legit. The sanity reftests passed locally. I'll do a try push to verify and put those patches up to get feedback.
Flags: needinfo?(nical.bugzilla)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cd9ce86ac93b9347318d8cf8bae68f86b12fd5c6

Not looking so hot, there's a rust panic instead the renderbackend thread. Any help resolving this would be great.
WebGL tests are likely fixed by https://github.com/servo/webrender/pull/1530
Image request failures may be related to https://github.com/servo/webrender/issues/1519
Replacing the AllocIdNamespace() function even on m-c tip (i.e. without the multiple document API changes) also results in the same test failures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=17b3ca28e3b029d8f954e04874afa1bf53e84317

So I guess what I'm trying to do there is either wrong or exposing wrongness in WR.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #9)
> Replacing the AllocIdNamespace() function even on m-c tip (i.e. without the
> multiple document API changes) also results in the same test failures:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=17b3ca28e3b029d8f954e04874afa1bf53e84317
> 
> So I guess what I'm trying to do there is either wrong or exposing wrongness
> in WR.

So I spent some more time wrapping my head around this code. I think I know what the problem is, but I'm not sure how to resolve it. In a nutshell it's this:

Without the multiple-document API change, we allocate a new IdNamespace for WebRenderBridgeParent. This means that each parent-process window gets its own unique IdNamespace and each tab also gets a unique IdNamespace. And we would use the same webrender API instance to manage all the resources. The structure would look something like this:
- Toplevel browser window A, creates WR API api_a, has WebRenderBridgeParent with IdNamespace=1
  - tab 1, uses api_a, has WebRenderBridgeParent with IdNamespace=2
  - tab 2, uses api_a, has WebRenderBridgeParent with IdNamespace=3
  - tab 3, uses api_a, has WebRenderBridgeParent with IdNamespace=4
- Toplevel browser window B, creates WR API api_b, has WebRenderBridgeParent with IdNamespace=5
  - tab 4, uses api_b, has WebRenderBridgeParent with IdNamespace=6
  - tab 5, uses api_b, has WebRenderBridgeParent with IdNamespace=7

So api_a's UpdateImage function, for example, could be called a ImageKey that any IdNamespace in the range 1..4 because all of those were associated with api_a. Similarly api_b was dealing with all the things with IdNamespace in the range 5..7.

Now with the multidoc API change, we have assertions that ban this sort of behaviour [1]. Each API now asserts that all the IdNamespaces it's given match the IdNamespace for that API. To conform to this on the Gecko side, we need to do one of two things: (1) make each tab have a separate API or (2) make sure all the tabs that share an API also share the IdNamespace.

My initial fix for this went down the (2) path, but it resulted in test failures. I believe this happens because even though the IdNamespace is the same, each WebRenderLayerManager (i.e. each tab) keeps its own resource id counter [2] within that namespace, and so there are resource key collisions all over place. Previously these were in separate namespaces it wasn't a problem. Even if I made that mResourceIdCounter a static variable that would just make it unique inside the process, but tabs can live in different content processes, so there would still be collisions. The way to avoid collisions entirely would be to have the parent process coordinate the generation of resource ids, but then that leads to a bad architecture with the content processes constantly sending (sync?) IPC messages to the parent process to generate resource ids.

We might be able to implement option (1) instead and have each WebRenderBridgeParent maintain its own WebRenderAPI wrapper and separate underlying API instances. However this seems like a pretty big change and I'm not sure how that will interact with pipeline ids (will a display list in one API instance be able to embed a pipeline living in a different API instance?).

Do you guys have suggestions on what the best thing to do here is?

[1] https://github.com/servo/webrender/blob/f6d81d9349823cf89e2ccb1d3b2166918818db1b/webrender_api/src/api.rs#L336
[2] http://searchfox.org/mozilla-central/rev/09c065976fd4f18d4ad764d7cb4bbc684bf56714/gfx/layers/wr/WebRenderBridgeChild.h#170
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(nical.bugzilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #12)
> So api_a's UpdateImage function, for example, could be called a ImageKey
> that any IdNamespace in the range 1..4 because all of those were associated

This should say "... could be called with an ImageKey that could contain any IdNameSpace in the range 1..4 ..."

> failures. I believe this happens because even though the IdNamespace is the
> same, each WebRenderLayerManager (i.e. each tab) keeps its own resource id
> counter [2] within that namespace,

s/WebRenderLayerManager/WebRenderBridgeChild/
I prefer option (1), WebRenderBridgeChild and AsyncImagePipelineManager allocates image key independently.

I created a wip patch of option (1) based on the following try server source.

> https://treeherder.mozilla.org/#/jobs?repo=try&revision=14b36a04ae669d228d833d707fd6b33d369acf9a
Flags: needinfo?(sotaro.ikeda.g)
Implements option (1) "make each tab have a separate API". Created patch based on the following source.

> https://treeherder.mozilla.org/#/jobs?repo=tryrevision=14b36a04ae669d228d833d707fd6b33d369acf9a
Assignee: nobody → sotaro.ikeda.g
Assignee: sotaro.ikeda.g → nobody
It seems difficult to make option (2) work, since WebRenderBridgeChild allocates image keys independently.
  - "(2) make sure all the tabs that share an API also share the IdNamespace."
(In reply to Sotaro Ikeda [:sotaro] from comment #15)
> Created attachment 8892354 [details] [diff] [review]
> temporary patch - Make each tab have a separate API
> 
> Implements option (1) "make each tab have a separate API". Created patch
> based on the following source.
> 
> > https://treeherder.mozilla.org/#/jobs?repo=tryrevision=14b36a04ae669d228d833d707fd6b33d369acf9a

This patch seems to reuse the same document ID across the different RenderApi instances. I'm not sure that makes sense architecturally because the function to create the document exists on the RenderApi, so logically it seems that a RenderApi can have multiple documents but each document should be associated with a single RenderApi. I think to do this properly we will need to create a new document and RenderApi for each tab. I can update your patch to do that.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)
> This patch seems to reuse the same document ID across the different
> RenderApi instances. I'm not sure that makes sense architecturally because
> the function to create the document exists on the RenderApi, so logically it
> seems that a RenderApi can have multiple documents but each document should
> be associated with a single RenderApi. I think to do this properly we will
> need to create a new document and RenderApi for each tab. I can update your
> patch to do that.

Never mind, it appears that you can't embed one pipeline into another if they don't have the same document id. So your patch is closer to what we want.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #18)
> Never mind, it appears that you can't embed one pipeline into another if
> they don't have the same document id. So your patch is closer to what we
> want.

Yes, the document id needs to be same.
For the record, I did a try push with your patch rebased on top of mine at https://treeherder.mozilla.org/#/jobs?repo=try&revision=f58f00d4c4d8a2acb74f07463b91e4fd5c905c3c and it has a bunch of crashes/failures. I'm trying to investigate those but they're a little hard to reproduce locally and rr isn't working for me so it's going slow.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #20)
> For the record, I did a try push with your patch rebased on top of mine at
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=f58f00d4c4d8a2acb74f07463b91e4fd5c905c3c and it has a bunch of crashes/failures.


The above has the following failures. Both seems to be caused by changed wr::WebRenderAPI's lifetime.
 - ###!!! ABORT: X_GetGeometry: BadDrawable
 - application crashed [@ mozilla::wr::LockExternalImage] 

"X_GetGeometry: BadDrawable" typically happen when gl::GLContext is used after related widget destruction. The gl::GLContext needs to be deleted before the related widget destruction.

"crashed [@ mozilla::wr::LockExternalImage] " seemed to happen since AsyncImagePipelineManager is deleted before the root wr::WebRenderAPI destruction.

Both could be addressed by fixing wr::WebRenderAPI's lifetime.
>  - application crashed [@ mozilla::wr::LockExternalImage] 

From log, it happened because AsyncImagePipelineManager held non root wr::WebRenderAPI. Then the non root wr::WebRenderAPI held root wr::WebRenderAPI. Therefore AsyncImagePipelineManager is not deleted after the root wr::WebRenderAPI destruction.

>  ###!!! ABORT: X_GetGeometry: BadDrawable

From the log, it happened AsyncImagePipelineManager was destructed after CompositorBridgeParent::StopAndClearResources() because AsyncImagePipelineManager was held by non root WebRenderBridgeParent. WebRenderBridgeParent::ClearResources() does not clear mAsyncImageManager.
Attachment #8892354 - Attachment description: temporary patch - Make each tab have a separate API → temporarl patch - Make each tab have a separate API
Assignee: nobody → sotaro.ikeda.g
Assignee: sotaro.ikeda.g → nobody
That's looking a lot better: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f74ce0226f8fedb858c8827b84a95ed9c02b0ef8

Just a M(gpu) crash to deal with now. (The R7 failure was introduced by some other upstream WR change, in the range 97af583573b34e46763476085e4423fda97d724e..1ec6fb9ae47d8356920975a4be60552608dbfa1b)
The R7 failure was caused by servo/webrender#1525. I'm just going to mark the test failing for now.

The latest try push (third link in comment 26) doesn't have the same M(gpu) crash, so that was likely fixed elsewhere. There is, however, a debug crash in AsyncImagePipelineManager::Update in M-e10s(gl3). Sotaro, any thoughts on that one?
It seems that mWrBridge->AsyncImageManager() is nullptr. It could happen by attachment 8893221 [details] [diff] [review]. There is a case that CompositorBridgeParent::NotifyDidCompositeToPipeline() is called after root WebRenderBridgeParent's WebRenderBridgeParent::ClearResources() call.
This add checks of AsyncImageManager ptr. It seems address the crash.
Assignee: nobody → sotaro.ikeda.g
Assignee: sotaro.ikeda.g → nobody
Thanks! Things are looking pretty good now. I'd like to get this landed soon, so I made a PR for the wr_api change that we need.
Flags: needinfo?(nical.bugzilla)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #31)
> Hm, still getting an intermittent null pointer dereference even with
> attachment 8893682 [details] [diff] [review], such as in
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=121044999&repo=try&lineNumber=7976

This null pointer dereference seems to be addressed by attachment 8894052 [details] [diff] [review].
Assignee: sotaro.ikeda.g → nobody
I was hoping to update to WR @ a342a8eaa451793e0831213f5757dbe12ba94ed7 but that build is broken on 32-bit platforms:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c580180c077f22b60b4f623b98feaaaa423d687

#1557 fixed that, so if I update to WR @ b13a3b8f776b2ef4d5672173c9a3aa77c7b87936 that problem goes away. However there were more API changes in the meantime so it'll take another round of fixups before that's ready to land.
At b13a3b8f776b2ef4d5672173c9a3aa77c7b87936 we have much orange:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbf75eef7236f0e410ffcb766b403432f6c7797b

The regression range for this is:

*   b13a3b8 Auto merge of #1557 - glennw:fix-arm32, r=emilio
|\
| * 56ca39d Fix compilation error on arm32 platforms.
|/
*   285c405 Auto merge of #1556 - nical:tex-cache-dbg, r=glennw
|\
| * cea066f Tidy up render target and texture cache debugging.
| * c7a5b78 Merge the debug options into a single bitflags.
| * 901fd8f Add a debugging option to see the texture cache.
* |   1b54634 Auto merge of #1532 - nical:txn, r=glennw
|\ \
| |/
|/|
| * e3bfb73 Use a vector of generic updates instead of several vectors of specific updates.
| * eccae89 Add ResourceCache::update_resources.
| * 55acc96 Batch resource updates in the API.
|/
*   3102353 Auto merge of #1554 - nical:cache-delete-texture-2, r=glennw
|\
| * dba30cb Remove images from the texture cache when deleting the template.
*   bf0313b Auto merge of #1518 - Gankro:glyphs, r=glennw
|\
| * 5ab4db4 Add FastHashMap/Set to webrender_api
| * b8782e4 add a cache of all seen glyphs to the end of the display list
* 1b223cf Auto merge of #1545 - glennw:make-cache-fast-again, r=kvark
* 99043f4 Remove the requested_images and requested_glyphs hash sets.

I'll bisect a bit.
It happened as a result of b8782e4, because the BuiltDisplayListDescriptor structure is serialized across gecko IPC and the ParamTraits impl wasn't updated. I'm going to make all of those ParamTraits extend from PlainOldDataSerializer to avoid this problem happening again.
Try pushes are still going but I'd like to start getting reviews on these patches anyway.
Comment on attachment 8894991 [details]
Bug 1385003 - Use a different WebRenderAPI instance for each WebRenderBridgeParent.

https://reviewboard.mozilla.org/r/166114/#review171296
Attachment #8894991 - Flags: review?(bugmail) → review+
Comment on attachment 8892354 [details] [diff] [review]
temporarl patch - Make each tab have a separate API

Sotaro, I folded all your patches into part 5 of my series, "Use a different WebRenderAPI instance for each WebRenderBridgeParent". I kept you as the author and used myself as the reviewer. I'll mark these attachments obsolete.
Attachment #8892354 - Attachment is obsolete: true
Comment on attachment 8894989 [details]
Bug 1385003 - Update webrender bindings for API changes in WR cset f6d81d9.

https://reviewboard.mozilla.org/r/166110/#review171314

I see. So for now you temporarily bind a document together with an API. That sounds reasonable.
Attachment #8894989 - Flags: review?(kvark) → review+
Comment on attachment 8894987 [details]
Bug 1385003 - Use PlainOldDataSerializer for WR struct serialization to avoid getting them out of sync.

https://reviewboard.mozilla.org/r/166106/#review171320

Too bad PlainOldDataSerializer doesn't static_assert for IsPod yet.
Attachment #8894987 - Flags: review?(jmuizelaar) → review+
Try push uncovered another error. Gankro put up a PR to fix it, https://github.com/servo/webrender/pull/1559. I'm doing another try push with this applied.
Comment on attachment 8894988 [details]
Bug 1385003 - Update webrender to commit e68c8acb021656440d26ac46e705e7ceb31891e6.

https://reviewboard.mozilla.org/r/166108/#review171344
Attachment #8894988 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8894993 [details]
Bug 1385003 - Mark counter-style-rule-clone as failing from WR cset 921bde2.

https://reviewboard.mozilla.org/r/166118/#review171346
Attachment #8894993 - Flags: review?(jmuizelaar) → review+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3949f4c794576a0d9b1d5966f296628d30bc9cc6

is looking much better. Only three oranges, all of which are pre-existing problems. I think we're good to go once Gankro's PR gets merged (hopefully no problematic changes get merged in before it).
Comment on attachment 8894992 [details]
Bug 1385003 - Autogenerated changes to go with webrender update.

https://reviewboard.mozilla.org/r/166116/#review171370
Attachment #8894992 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8894990 [details]
Bug 1385003 - Drop the hacky gecko IdNamespace allocator and use the IdNamespace from WR to avoid mismatches.

https://reviewboard.mozilla.org/r/166112/#review171498
Attachment #8894990 - Flags: review?(sotaro.ikeda.g) → review+
Comment on attachment 8894994 [details]
Bug 1385003 - Update webrender bindings for API changes in WR cset 55acc967.

https://reviewboard.mozilla.org/r/166120/#review171666
Attachment #8894994 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8894995 [details]
Bug 1385003 - Update webrender bindings for API change in WR cset c7a5b78d.

https://reviewboard.mozilla.org/r/166122/#review171668
Attachment #8894995 - Flags: review?(nical.bugzilla) → review+
I'm going to update to e68c8acb021656440d26ac46e705e7ceb31891e6 to pick up Gankro's fix. I kicked off a try push with an update at that cset, but try is busted currently so I don't know if it's green or not :/
Assignee: nobody → bugmail
Summary: Future webrender update bug → Update webrender to e68c8acb021656440d26ac46e705e7ceb31891e6
Version: unspecified → 57 Branch
... and of course there's another problem. Looks like gw snuck in #1547 just before Gankro's fix merged, and that causes an assertion failure on all the debug builds.
Comment on attachment 8895393 [details]
Bug 1385003 - Add missing call to deinit the renderer before dropping it.

https://reviewboard.mozilla.org/r/166584/#review171716

Now I'm not sure why we aren't doing `deinit` in the destructor in the first place... but the change looks good!
Attachment #8895393 - Flags: review?(kvark) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2c214b5c0ab9
Use PlainOldDataSerializer for WR struct serialization to avoid getting them out of sync. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/f5b9bc1d9745
Update webrender to commit e68c8acb021656440d26ac46e705e7ceb31891e6. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/f612b8179bca
Update webrender bindings for API changes in WR cset f6d81d9. r=kvark
https://hg.mozilla.org/integration/autoland/rev/42a7096fd6ee
Drop the hacky gecko IdNamespace allocator and use the IdNamespace from WR to avoid mismatches. r=sotaro
https://hg.mozilla.org/integration/autoland/rev/76466d164397
Use a different WebRenderAPI instance for each WebRenderBridgeParent. r=kats
https://hg.mozilla.org/integration/autoland/rev/ebb1ba076d1c
Autogenerated changes to go with webrender update. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/bccc73cf1142
Mark counter-style-rule-clone as failing from WR cset 921bde2. r=jrmuizel
https://hg.mozilla.org/integration/autoland/rev/0f4cba614d83
Update webrender bindings for API changes in WR cset 55acc967. r=nical
https://hg.mozilla.org/integration/autoland/rev/c35f8054f844
Update webrender bindings for API change in WR cset c7a5b78d. r=nical
https://hg.mozilla.org/integration/autoland/rev/7f992aa3fcbb
Add missing call to deinit the renderer before dropping it. r=kvark
Depends on: 1390004
You need to log in before you can comment on or make changes to this bug.