Bug 1887287 Comment 5 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Original comment by Erich Gubler [:ErichDonGubler] on 2024-03-26 07:26:26 PDT

Observations while reproducing on my Windows 11 machine:

* Memory usage starts out relatively high on initial page load, climbing to ~170 MB in the content process.
* Using the same single tab from the last observation, I can refresh constantly and reach a peak of ~325 MB of memory usage for a single content process using WebGPU on <https://webgpu.github.io/webgpu-samples/?sample=helloTriangle>. I am unable to push the tab to allocate more than this. Within 15-20 seconds, memory returns to its stable baseline of ~115 MB.
* Now, the interesting part that's relevant to the OP: the GPU process accumulates _gigabytes_ of memory in less than ten seconds.

My assumption is that all of the content process's JS objects are being cycle-cleaned, so my hypotheses for the root cause are:

1. We're not correctly cleaning up resources internally in `wgpu-core`. This is unfortunately likely, since we've recently reworked the internals of resource tracking, and are still finding bugs. CC :nical.
2. Something has always been broken in how we plumb the clean-up of WGPU resources between JS objects and `wgpu-core`'s resource handles.

Given Mayank's note about this regressing with a specific re-vendoring of WGPU, I think (1) is more likely. Triaging as a P1, since this is likely to hamper the WebGPU team's ability to debug things, and the OOMs we experience in CI.

Revision 1 by Erich Gubler [:ErichDonGubler] on 2024-03-26 07:27:02 PDT

Observations while reproducing on my Windows 11 machine:

* Memory usage starts out relatively high on initial page load, climbing to ~170 MB in the content process.
* Using the same single tab from the last observation, I can refresh constantly and reach a peak of ~325 MB of memory usage for a single content process using WebGPU on <https://webgpu.github.io/webgpu-samples/?sample=helloTriangle>. I am unable to push the tab to allocate more than this. Within 15-20 seconds, memory returns to its stable baseline of ~115 MB.
* Now, the interesting part that's relevant to the OP: the GPU process accumulates _gigabytes_ of memory in less than ten seconds. This memory was not released until the process was ended.

I assume that all of the content process's JS objects are being cycle-cleaned, so my hypotheses for the root cause are:

1. We're not correctly cleaning up resources internally in `wgpu-core`. This is unfortunately likely, since we've recently reworked the internals of resource tracking, and are still finding bugs. CC :nical.
2. Something has always been broken in how we plumb the clean-up of WGPU resources between JS objects and `wgpu-core`'s resource handles.

Given Mayank's note about this regressing with a specific re-vendoring of WGPU, I think (1) is more likely. Triaging as a P1, since this is likely to hamper the WebGPU team's ability to debug things, and the OOMs we experience in CI.

Revision 2 by Erich Gubler [:ErichDonGubler] on 2024-03-26 07:32:15 PDT

Observations while reproducing on my Windows 11 machine:

* Memory usage starts out relatively high on initial page load, climbing to ~170 MB in the content process.
* Using the same single tab from the last observation, I can refresh constantly and reach a peak of ~325 MB of memory usage for a single content process using WebGPU on <https://webgpu.github.io/webgpu-samples/?sample=helloTriangle>. I am unable to push the tab to allocate more than this. Within 15-20 seconds, memory returns to its stable baseline of ~115 MB.
* Now, the interesting part that's relevant to the OP: the GPU process accumulates _gigabytes_ of memory in less than ten seconds. This memory was not released until the process was ended, despite forced GC and CC runs in `about:memory`.

I assume that all of the content process's JS objects are being cycle-cleaned, so my hypotheses for the root cause are:

1. We're not correctly cleaning up resources internally in `wgpu-core`. This is unfortunately likely, since we've recently reworked the internals of resource tracking, and are still finding bugs. CC :nical.
2. Something has always been broken in how we plumb the clean-up of WGPU resources between JS objects and `wgpu-core`'s resource handles.

Given Mayank's note about this regressing with a specific re-vendoring of WGPU, I think (1) is more likely. Triaging as a P1, since this is likely to hamper the WebGPU team's ability to debug things, and the OOMs we experience in CI.

Revision 3 by Erich Gubler [:ErichDonGubler] on 2024-03-26 10:06:48 PDT

Observations while reproducing on my Windows 11 machine:

* Memory usage starts out relatively high on initial page load, climbing to ~170 MB in the content process.
* Using the same single tab from the last observation, I can refresh constantly and reach a peak of ~325 MB of memory usage for a single content process using WebGPU on <https://webgpu.github.io/webgpu-samples/?sample=helloTriangle>. I am unable to push the tab to allocate more than this. Within 15-20 seconds, memory returns to its stable baseline of ~115 MB.
* Now, the interesting part that's relevant to the OP: the GPU process accumulates _gigabytes_ of memory in less than ten seconds. This memory was not released until the process was ended, despite forced GC and CC runs in `about:memory`.

I assume that all of the content process's JS objects are being cycle-cleaned, so my hypotheses for the root cause are:

1. We're not correctly cleaning up resources internally in `wgpu-core`. This is unfortunately likely, since we've recently reworked the internals of resource tracking, and are still finding bugs. CC :nical.
2. Something has always been broken in how we plumb the clean-up of WGPU resources between JS objects and `wgpu-core`'s resource handles.

Given Mayank's note about this regressing with a specific re-vendoring of WGPU, I think (1) is more likely. Triaging as a P1, since this is likely to hamper the WebGPU team's ability to debug things, and related to the OOMs we experience in CI.
