TypedArray.subarray() slow and leaks memory
Categories
(Core :: JavaScript Engine, defect, P3)
Tracking
()
People
(Reporter: ossman, Unassigned)
References
(Depends on 2 open bugs, Blocks 1 open bug)
Details
Attachments
(2 files)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:142.0) Gecko/20100101 Firefox/142.0
Steps to reproduce:
- Start up noVNC
- Connect to a server using RealVNC's JPEG encoding
- Run some heavy animation
Actual results:
Firefox starts consuming CPU and memory, and becomes unresponsive. Sometimes crashes because of memory issues.
Expected results:
Smooth, steady performance.
| Reporter | ||
Comment 1•6 months ago
|
||
We want to use that new more efficient encoding more, but Firefox performance is abysmal here which is a blocker for us.
The code in question can be found here:
https://github.com/novnc/noVNC/blob/master/core/decoders/jpeg.js
And here is a profile using that code:
https://share.firefox.dev/41C3rTj
It seems to be spending far too much time in TypedArray.subarray(). Which is surprising as we would have hoped it would be the most efficient method as it avoids copying data. But instead it is putting massive pressure on the memory.
As a comparison, this is our current JPEG encoding that fares much better:
https://github.com/novnc/noVNC/blob/master/core/decoders/tight.js
With its profile:
https://share.firefox.dev/4p5T2JC
This problem does not exist with Chrome which runs this code much more efficiently.
| Reporter | ||
Comment 2•6 months ago
|
||
Since the issue was memory pressure and the garbage collector, I tried rewriting the code to avoid the obvious constant reallocations:
https://github.com/CendioOssman/noVNC/blob/jpegopt/core/decoders/jpeg.js
This behaved much worse, both on Firefox and Chrome. But I don't understand why.
Here is the profile of it:
https://share.firefox.dev/4p3SjIX
For some reason, the creation of an HTMLImageElement (to decode the JPEG image) has shot up massively in time. Perhaps a side effect of even more memory pressure from the other code?
| Reporter | ||
Comment 3•6 months ago
|
||
Also note bug 1874699. These bugs might be the same thing, but I still opened a new one since it was also an issue of memory usage which wasn't mentioned there.
| Reporter | ||
Comment 4•6 months ago
|
||
Issue in noVNC's tracker: https://github.com/novnc/noVNC/issues/1987
Comment 5•6 months ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Widget: Gtk' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Comment 6•6 months ago
|
||
I would not assume that this is Linux specific? Tentatively moving to Javascript.
| Reporter | ||
Comment 7•6 months ago
|
||
I have not tested on any other platforms yet, so no idea from my side.
| Reporter | ||
Comment 8•6 months ago
|
||
(In reply to Pierre Ossman from comment #2)
For some reason, the creation of an HTMLImageElement (to decode the JPEG image) has shot up massively in time. Perhaps a side effect of even more memory pressure from the other code?
This was simply a bug in my end. I kept feeding that code a massive buffer, even though just part of it had valid data. With that fixed, the "optimized" version has about the same performance as the original one. I.e. TypedArray.subarray() is a big bottleneck.
And for posterity, TypedArray.slice() performs even worse. So no surprises there.
| Reporter | ||
Comment 9•6 months ago
|
||
I found a way to reduce the number of subarray() calls, but I'm still seeing performance problems and excessive memory load:
https://share.firefox.dev/4g9ekC3
If there is anything I can do to better pinpoint things, then please let me know.
Comment 10•6 months ago
•
|
||
Thanks for testing on Firefox and reporting the bug!
If possible, could you please share a standalone testpage that reproduces this issue? That would make diagnosing this much easier for the devs.
Updated•6 months ago
|
Comment 11•6 months ago
|
||
Also, just as a quick experiment:
In about:config if you set gfx.canvas.accelerated=False, restart firefox. Does the issue still reproduce?
| Reporter | ||
Comment 12•6 months ago
|
||
Not easily since I'm only guessing where the allocations are coming from. Perhaps the test case on bug 1874699 can be changed to a loop and see if it also provokes excessive memory usage?
| Reporter | ||
Comment 13•6 months ago
|
||
(In reply to Mayank Bansal from comment #11)
Also, just as a quick experiment:
In about:config if you set gfx.canvas.accelerated=False, restart firefox. Does the issue still reproduce?
No noticeable improvement, I'm afraid. I created a brand new profile to test it to make sure nothing else what influencing it.
Comment 14•6 months ago
•
|
||
Could you also try with the latest Firefox Nightly? There have been some recent improvements to Typedarrays in bug 1976558 and its dependencies.
For reference, these fixes improve teh TypedArray-subarray bits 100x faster than before!
| Reporter | ||
Comment 15•6 months ago
|
||
Firefox Nightly does indeed seem to be initially faster. It manager to keep up with the data without running out of CPU.
But after a few seconds, it starts stuttering and misbehaving.
I've uploaded a profile here:
https://share.firefox.dev/425DPyc
The flame graph shows that the time spent in subarray() is now negligible. But the memory graph still shows some issue, which eventually results in the garbage collector killing performance.
Comment 16•6 months ago
|
||
- You seem to have the devtools opened while profiling. That will slow down some js.
- Can you share another profile from Nightly with thefollowing changes:
- When profiling, please select the "Graphics" preset.
- When uploading the profile, please select all the 5 checkboxes.
- In Nightly, go to about:support, and copy-paste its contents here.
This may not be a js/subarray bug at all, and may be related to image/canvas/gfx. cc tnikkel and aosmond.
Comment 17•6 months ago
|
||
In that latest profile:
- The garbage collector seems pretty healthy. <7ms max pause, and it manages to clear out a fair amount of memory. If the memory use is growing too much before that, though, then we might have a problem where it should be triggering sooner.
- There is a lot of time in the content process labeled "CC (Finalize)", which appears to be finalization triggered by reference counts going to zero (nothing actually involving the cycle collector). In particular,
nsMaybeWeakPtrArray::RemoveWeakElementis taking a huge amount of time. It appears to be doing many many scans through an array? Something smells quadratic. I'll needinfo mccr8. - The parent process's memory usage seems to keep going up and up. It's not an enormous amount, about 100MB. There's a GC and CC in the middle of the profiled period, but it doesn't free very much. It would be good to confirm that this isn't just a side effect of having the devtools open.
Can you also grab a memory snapshot from about:memory?
For the speed, it seems to me like the weak pointer stuff is the smoking gun. I'd want to get that fixed before looking further. For the memory, I think getting an about:memory dump would be the next step.
Comment 18•6 months ago
|
||
The finalization there looks a little weird to me. We're calling release on nsSystemInfo which is then calling release on nsHashPropertyBag (is the nsSystemInfo being destroyed?), which is then calling the dtor for DataChannelChild, which then eventually ends up calling the dtor for nsLoadGroup, which ends up spending a bunch of time in RemoveObserver(this, "last-pb-context-exited") like you said. I think we've had bugs before about removing observers being quadratic but I can't find it right now.
Comment 19•6 months ago
|
||
Ah, right. I did improve the performance of nsMaybeWeakPtrArray::RemoveWeakElement in bug 1944426, but we still have the underlying array operation.
Comment 20•6 months ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #19)
Ah, right. I did improve the performance of nsMaybeWeakPtrArray::RemoveWeakElement in bug 1944426, but we still have the underlying array operation.
Yeah, that looks like a potential quadratic footgun. In theory this list could probably better be a DoublyLinkedList to make all operations O(1) (but that might require some extra plumbing and/or wrapper around the elements). Or if we do not need to maintain the order (but I assume we want to), we could use a HashSet which would be probably easier to drop in but maybe have more overhead in "normal" cases.
Would that qualify the entire bug here as XPCOM bug?
Comment 21•6 months ago
|
||
As a GC person, what jumps to mind is keeping a liveness bitmap and periodically sweeping the whole array in one go. But that might require checking and updating the bitmap a lot. I haven't looked to see how it is normally used.
Comment 22•6 months ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #20)
Would that qualify the entire bug here as XPCOM bug?
There's a bunch of potential badness here. Personally I think we should file separate bugs blocking this one until we've gotten it into an acceptable state.
Comment 23•6 months ago
|
||
(In reply to Pierre Ossman from comment #2)
Since the issue was memory pressure and the garbage collector, I tried rewriting the code to avoid the obvious constant reallocations:
https://github.com/CendioOssman/noVNC/blob/jpegopt/core/decoders/jpeg.js
This behaved much worse, both on Firefox and Chrome. But I don't understand why.
Here is the profile of it:
https://share.firefox.dev/4p3SjIX
For some reason, the creation of an HTMLImageElement (to decode the JPEG image) has shot up massively in time. Perhaps a side effect of even more memory pressure from the other code?
Are you using a data uri for the jpeg? It looks like its spending a lot of time processing the string of the uri of the image. Switching to a blob uri should avoid that.
| Reporter | ||
Comment 24•6 months ago
|
||
| Reporter | ||
Comment 25•6 months ago
|
||
(In reply to Mayank Bansal from comment #16)
- You seem to have the devtools opened while profiling. That will slow down some js.
- Can you share another profile from Nightly with thefollowing changes:
- When profiling, please select the "Graphics" preset.
- When uploading the profile, please select all the 5 checkboxes.
- In Nightly, go to about:support, and copy-paste its contents here.
This may not be a js/subarray bug at all, and may be related to image/canvas/gfx. cc tnikkel and aosmond.
I wasn't aware you could profile without devtools. But here should be a new profile as instructed:
https://share.firefox.dev/3VwHykJ
about:support has been attached.
| Reporter | ||
Comment 26•6 months ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #17)
Can you also grab a memory snapshot from about:memory?
For the speed, it seems to me like the weak pointer stuff is the smoking gun. I'd want to get that fixed before looking further. For the memory, I think getting an about:memory dump would be the next step.
Where in the process would you like this? Just anywhere when it is janky?
(In reply to Timothy Nikkel (:tnikkel) from comment #23)
Are you using a data uri for the jpeg? It looks like its spending a lot of time processing the string of the uri of the image. Switching to a blob uri should avoid that.
Yes, that is because of performance issues with blob uris:
https://issues.chromium.org/issues/41324363
https://issues.chromium.org/issues/41409681
https://bugzilla.mozilla.org/show_bug.cgi?id=1092837
| Reporter | ||
Comment 27•6 months ago
|
||
We're open to changing things in noVNC if that can somehow avoid the wrath of the memory allocator. But we'd need some guidance.
| Reporter | ||
Comment 28•6 months ago
|
||
(In reply to Pierre Ossman from comment #25)
I wasn't aware you could profile without devtools. But here should be a new profile as instructed:
My apologies. This profile was with the wrong test case. It also shows stuttering, though, but not as bad as the original test case, which you can find here:
Comment 29•6 months ago
|
||
Having an easy way to reproduce the issue would be most useful for diagnosis/recommendations. SOme standalone testcase or a webpage/portal which can be accessed by clicking on it would be ideal.
Comment 30•6 months ago
|
||
Unrelated to any GC or memory issues: The profile also shows that some time is spend in encode for Base64 encoding. Switching to https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Uint8Array/toBase64 (if supported by the browser) could improve this.
| Reporter | ||
Comment 31•6 months ago
|
||
This should be able to allow you to play around with the same thing:
Download and do your own setup:
$ git clone https://github.com/novnc/noVNC.git
$ mkdir noVNC/recordings
$ cd noVNC/recordings
$ curl -O https://www.cendio.com/~ossman/noVNC/recordings/jpgspheres.js
$ cd ..
$ ./utils/novnc_launch
Then open http://localhost/tests/vnc_playback.html?data=jpgspheres.js and click "Start"
Or, use this upload:
https://www.cendio.com/~ossman/noVNC/tests/vnc_playback.html?data=jpgspheres.js
I did a quick test run here:
- Firefox nightly: 12 - 16 seconds per run
- Firefox 142: 14 - 19 seconds per run
- Chrome 139: 10 seconds per run
| Reporter | ||
Comment 32•6 months ago
|
||
(In reply to Pierre Ossman from comment #31)
$ ./utils/novnc_launch
Sorry, that's actually ./utils/novnc_proxy.
Updated•6 months ago
|
Comment 33•6 months ago
|
||
(In reply to Pierre Ossman from comment #26)
(In reply to Steve Fink [:sfink] [:s:] from comment #17)
Can you also grab a memory snapshot from about:memory?
Where in the process would you like this? Just anywhere when it is janky?
Yeah, that would probably be good enough.
Though I'm most curious about what's piling up on the main thread. So I guess I'd like it to at least run long enough for a decent amount of crap to build up there.
| Reporter | ||
Comment 34•6 months ago
|
||
Here you go. The highlights are:
2,726.05 MB (100.0%) -- explicit
├──2,084.40 MB (76.46%) -- images
│ ├──1,995.31 MB (73.19%) -- content
│ │ ├──1,993.87 MB (73.14%) -- raster
│ │ │ ├──1,764.20 MB (64.72%) -- unused
│ │ │ │ ├──1,500.67 MB (55.05%) ++ progress=10f
│ │ │ │ └────263.52 MB (09.67%) -- <non-notable images>
│ │ │ │ ├──162.83 MB (05.97%) ── decoded-nonheap
│ │ │ │ ├───96.50 MB (03.54%) ── source
│ │ │ │ └────4.19 MB (00.15%) ── decoded-heap
│ │ │ └────229.68 MB (08.43%) -- used
│ │ │ ├──213.66 MB (07.84%) ++ progress=10f
│ │ │ └───16.02 MB (00.59%) ++ <non-notable images>
│ │ └──────1.44 MB (00.05%) ++ vector/used/progress=18f
│ └─────89.09 MB (03.27%) ── cache/overhead
├────504.91 MB (18.52%) ── heap-unclassified
| Reporter | ||
Comment 35•6 months ago
|
||
Again, if there is anything we can do from our end to better help Firefox free up this memory quickly, please let us know.
Comment 36•6 months ago
|
||
(In reply to Pierre Ossman from comment #34)
2,726.05 MB (100.0%) -- explicit ├──2,084.40 MB (76.46%) -- images │ ├──1,995.31 MB (73.19%) -- content │ │ ├──1,993.87 MB (73.14%) -- raster │ │ │ ├──1,764.20 MB (64.72%) -- unused
So you're using one img element to decode one jpeg and then using a new img element for the next jpeg? What if you set img.src = "" after you are done with the img element? I think that should discard the decoded frame.
Updated•6 months ago
|
| Reporter | ||
Comment 37•6 months ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #36)
So you're using one img element to decode one jpeg and then using a new img element for the next jpeg?
Correct, since decoding is asynchronous, and we can't control how many we get. So they need to be queued somewhere.
What if you set img.src = "" after you are done with the img element? I think that should discard the decoded frame.
That does seem to resolve that big memory leakage for the process as a whole. Thanks!
But it does nothing for performance, and memory usage in the profiler looks the same where things are constantly growing until the GC has to kick in.
The new largest offender in about:memory is simply the anonymous "heap-unclassified".
Comment 38•5 months ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #20)
(In reply to Andrew McCreight [:mccr8] from comment #19)
Ah, right. I did improve the performance of nsMaybeWeakPtrArray::RemoveWeakElement in bug 1944426, but we still have the underlying array operation.
I have a patch stack up on bug 1986846 that might help a bit with the nsMaybeWeakPtrArray part. The most relevant piece here would probably be the removal search from tail (together with the stale entries logic), but maybe the lesser number of array copies is welcome, too. There are release builds here (slightly earlier version, but the essence is the same).
Description
•