Performance issue on voice.google.com - Site can hang from seconds to minutes
Categories
(Core :: JavaScript: GC, defect, P3)
Tracking
()
Performance Impact | medium |
People
(Reporter: miketaylr, Unassigned)
References
(Blocks 1 open bug, )
Details
(Keywords: perf:responsiveness)
Attachments
(2 files)
Originally reported at https://github.com/webcompat/web-bugs/issues/54459
There is a profile from the original reporter here: https://share.firefox.dev/2OocUHB
[EDIT from dholbert: here's a better profile, from comment 4: https://share.firefox.dev/2BCBqCk ]
So it took a while to get because GVoice isn't always like that. I suspect this may be why you had a hard time reproducing it. It just ... find itself in that state sometimes. Sometimes an hour after my browser is open, sometimes a day. Usually within a day, but it wasn't happening this weekend. Today I restarted my browser and a few hours later it started doing it, so I grabbed a profile. Hopefully it's useful.
One one other note. In the original bug description I said 'several minutes' - which is true, that does happen. In this case it was several seconds, but a clear, solid hang for ... 4-5 seconds.
Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
|
||
(Looks like phil is the original reporter from GitHub.)
Phil: if you're able to figure out any specific actions or conditions that seem to be associated with causing the hangs, that could help determine next actions here.
Also: if you could capture another profile of the hang, that would be helpful as well. (One observation about your first profile -- unfortunately it's missing sample data for the first ~half of it. That's not your fault -- it's the fault of the profiler mechanism, and I've noticed that sometimes happens for the very first profile that I record in a Firefox session, and subsequent profiles are fine. So: when you're getting ready to try to reproduce this & to capture a profile, it may help to capture a brief "dummy profile" first, just to get the gears spinning so that it's ready for your "real" profile-capturing-operation when you experience the Google Voice hang and start profiling again.)
Comment 2•5 years ago
|
||
Hey Daniel,
Unfortunately there's no consistent action to cause this. I keep gvoice in a pinned tab. Often times, I back to that tab and the entire tab is unresponsive.
I'm happy to grab another profile. Is it still useful if I start the profile after I switch to that tab? Since I don't know if it's going to happen until after I switch to the tab that would make it easier than always having to remember to start a profile and throw a bunch of them away.
And yeah I can try to make sure anytime I've restarted the browser I do a dummy profile.
Comment 3•5 years ago
|
||
Thanks! Yes, it's still useful if you start the profile after switching to the tab.
For now, I'm going to classify this under our JavaScript engine, because we seem to be spending most of the hang-time running JavaScript in the profile that you posted so far.
Comment 4•5 years ago
|
||
Here's a good one: https://share.firefox.dev/2BCBqCk
So I clicked on another text thread and it just did nothing. After a while I clicked on another. I let it run over a minute. The text that was open still let me type text into, but no other part of the UI seemed to respond.
Comment 5•5 years ago
•
|
||
Thanks Phil, that one's really useful. Looks like our javascript engine is repeatedly doing ~8 second batches of garbage collection (which stop the world while they're happening). Usually garbage collection doesn't take that long.
Either our JS Engine is going a little haywire, or Google Voice is creating/discarding crazy amounts of objects continuously, or something in between.
Tentatively bumping to p1 from a qf perspective, since now we've got something more actionable, & it seems reasonable to assume this affects a decent number of users, since Google Voice is reasonably popular.
Comment 6•5 years ago
|
||
[ni=triage owner jonco to be sure this gets a look from somebody. See performance profile in comment 4, with what seem to be repeated ~8-second GC pauses]
Comment 7•5 years ago
|
||
FWIW, I stopped it before it ever responded (I don't know it ever would). The browser is still in that state, so if you want a longer profile, LMK.
Comment 9•5 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #5)
In the profile I see repeated ~8 second GCs. Heap size is 3.9GB, which is very large, and the GCs hardly free any memory.
It's possible that the page is leaking memory, although that should also affect chrome.
It's unlikely to be solely a GC bug but it could be an interaction with the rest of the browser that means that some things are not getting cleaned up. Is it possible to capture and about:memory report while this is happening?
Comment 10•5 years ago
|
||
Yeah I can do that next time this happens.
Comment 11•5 years ago
|
||
(In reply to Daniel Holbert [:dholbert] from comment #5)
Thanks Phil, that one's really useful. Looks like our javascript engine is repeatedly doing ~8 second batches of garbage collection (which stop the world while they're happening). Usually garbage collection doesn't take that long.
They do not stop the world. Those 8 second GCs have a max pause time of 100-200ms. Which isn't great, but it means that the pause is not explained by a synchronous GC. Something else is happening that is preventing interactivity for long periods of time.
At the same time, those 8s GCs are spending 5 of those 8 seconds actually doing the GC, the vast majority in marking. (By the way, there's something wrong with the profiler's summary view. It's helpfully pointing out that we spend 1.1ms marking roots, but neglects to mention that we spent 4798ms in regular marking. My guess is that it sees the overall "mark" phase as taking the longest time, so it displays it, but then it displays only the large of the "mark.*" children -- omitting what I would call "mark.other" that is not explicitly recorded in a subphase.) Which suggests an enormous object graph of some sort.
In short: what I'm seeing is enough steady allocation to trigger GCs (INCREMENTAL_ALLOC_TRIGGER
), but with an absolutely massive heap being collected. I wonder if 3.9GB being close to 4GB means we're bumping up against a 4GB limit?
An about:memory
report would be helpful. Or you could see a more detailed view by taking a snapshot with the devtools Memory tool. I'm having problems saving one out (I get a 0-length file), but just grabbing a screenshot of the initial views of all 3 different views ("Tree Map", "Aggregate", and "Dominators") might be helpful.
I use Google Voice, but I cannot immediately reproduce the performance problem. I scrolled around, listed to some voicemails, etc. I don't have it as a pinned tab, though.
Comment 12•5 years ago
|
||
I have a memory report and a gc and cc verbose report. Will upload
Comment 13•5 years ago
|
||
memory report
Comment 14•5 years ago
|
||
Even the concise gc/cc reports are too bug to upload, so .. only memory report for now. That work?
Comment 15•5 years ago
|
||
Yeah, the heap dumps are huge. The devtools memory report is probably huge too, I don't know exactly what it saves. But let me look at the main part of the memory report:
3,455.53 MB (100.0%) -- explicit
├──3,055.36 MB (88.42%) -- window-objects
│ ├──3,032.83 MB (87.77%) -- top(<anonymized-6442451543>, id=11)
│ │ ├──2,921.63 MB (84.55%) -- active
│ │ │ ├──2,917.13 MB (84.42%) -- window(<anonymized-6442451543>)
│ │ │ │ ├──2,879.18 MB (83.32%) -- js-realm(<anonymized-8>)
│ │ │ │ │ ├──2,859.52 MB (82.75%) -- classes
│ │ │ │ │ │ ├────860.74 MB (24.91%) -- class(Call)/objects
│ │ │ │ │ │ │ ├──860.53 MB (24.90%) ── gc-heap
│ │ │ │ │ │ │ └────0.21 MB (00.01%) ── malloc-heap/slots
│ │ │ │ │ │ ├────671.91 MB (19.44%) -- class(Array)/objects
│ │ │ │ │ │ │ ├──643.30 MB (18.62%) ── gc-heap
│ │ │ │ │ │ │ └───28.61 MB (00.83%) ++ malloc-heap
│ │ │ │ │ │ ├────660.35 MB (19.11%) -- class(Function)/objects
│ │ │ │ │ │ │ ├──657.90 MB (19.04%) ── gc-heap
│ │ │ │ │ │ │ └────2.45 MB (00.07%) ++ malloc-heap
│ │ │ │ │ │ ├────658.60 MB (19.06%) -- class(Object)/objects
│ │ │ │ │ │ │ ├──648.68 MB (18.77%) ── gc-heap
│ │ │ │ │ │ │ └────9.92 MB (00.29%) ++ malloc-heap
│ │ │ │ │ │ └──────7.93 MB (00.23%) ++ (22 tiny)
│ │ │ │ │ └─────19.66 MB (00.57%) ++ (9 tiny)
│ │ │ │ └─────37.95 MB (01.10%) ++ (3 tiny)
│ │ │ └──────4.49 MB (00.13%) ++ (4 tiny)
...
I think this probably tells a lot, to someone who knows more about how things work. It's showing a huge amount of Call objects, Array objects, plain Objects, and Function objects. That seems like... closures, maybe? I'll find somebody to ask.
Comment 16•5 years ago
|
||
Jason, maybe you would know?
I also think the devtools Memory snapshot thing might be worth a try (not to upload, just to look at the basic results.) Open up devtools, select the Memory tab, click on Take Snapshot. Then look at the 3 views.
Comment 17•5 years ago
|
||
The views aren't really copy-paste-able in any meaningful way, so I don't know how to provide them to you....
Comment 18•5 years ago
|
||
Something else struck me as strange about the profile - there's nothing happening between the GCs.
Normally in a report like this you see JS running (and presumably creating garbage) interspersed with the GC running to clean it up. Here the profile shows requestAnimationFrame callbacks, but no significant JS running.
Comment 19•5 years ago
•
|
||
(In reply to Phil Dibowitz from comment #17)
The views aren't really copy-paste-able in any meaningful way, so I don't know how to provide them to you....
Oh, sorry. I meant with a screenshot. Sure, it's kind of lame, but I'm hoping it'll give some clue to have the initial toplevel display of each view.
Comment 20•5 years ago
|
||
ah. Firefox's screenshot mechanism doesn't let you take a screenshot of developer tools. But next time this happens I'll fire up gimp.
Comment 21•5 years ago
|
||
Ah, I found the right tool for this job. The one tricky bit is that I suspect the profiler needs to be running while the memory is being allocated, which may be a little tricky.
- Change the settings for the profiler: from the dropdown, use the "Custom" preset and do "Edit Settings".
- Turn on "JS Allocations" and "Native Allocations".
- (You can turn off unwanted stuff like screenshots, and perhaps reduce the Buffer Size to something reasonable to upload)
- Start profiling
- Hopefully, do stuff that makes it allocate memory.
- Stop profiling
- Upload the profile
I am hopeful that this will show where the memory is being allocated even if it doesn't grow to a ridiculous size.
Comment 22•5 years ago
|
||
Oh, I should mention that this information will be displayed in the profiler if you change the "Summary:" value to "Retained Memory".
Comment 23•5 years ago
|
||
Also, I should note that the direct cause of the issues here looks to be what jonco mentioned in comment 18: we have a heap size limit of 4GB. When you near the limit, we get more and more aggressive about GCing. In this case, we end up doing lots of GCs that don't free up anything, without really doing anything useful in between.
In the short term, it would probably be better to crash the process for running out of memory. And have a way to know how often this happens.
But we'd also like to resolve whatever it is that's eating so much memory in this specific case, by tracking down the allocation.
I don't suppose you've checked whether Chrome uses large amounts of memory on this site? (If not, it may not prove much -- I wouldn't be terribly surprised if voice.google.com
ran different code in Chrome.)
Comment 24•5 years ago
|
||
I suspect that will be nearly impossible.
The problem is that the tab sits around for days and when I switch to it, it's fine. Then one day, I switch to it, and it's unresponsive. This is when I started the profile that you see. But it presumably allocated the memory sometime in the <however long> since I last looked at the tab, which is probably a day or two. The only way I could catch the allocs is if I just ran the profiler for a week or so until the bug popped up which isn't going to work out terribly well.
Comment 25•5 years ago
|
||
The severity field is not set for this bug.
:jonco, could you have a look please?
For more information, please visit auto_nag documentation.
Updated•5 years ago
|
![]() |
||
Updated•4 years ago
|
Updated•3 years ago
|
![]() |
||
Updated•3 years ago
|
Comment 26•3 years ago
|
||
The Performance Priority Calculator has determined this bug's performance priority to be P2.
Impact on site: Renders site effectively unusable
Websites affected: Major
[x] Causes severe resource usage
Comment 28•3 years ago
|
||
Yes, it happens anytime the tab is open for more than like an hour or so.
Comment 29•3 years ago
|
||
I'm constantly having to close and re-open the tab for the site to be functional. Just wanted to follow up here.
Comment 30•3 years ago
|
||
I have had Google Voice open in a pinned tab for the last year or so in hopes of observing this bug, but it hasn't ever happened that I've noticed.
I don't think I can see the list of addons in your profile (anonymization must strip that out?). What addons are you using?
An up to date profile and about:memory might be useful. The one in comment 4 looks to be pre-Fission. The steps in comment 21 might also reveal something.
Near as I can tell, something is using up a ton of memory when you're running voice.google.com in your browser, and it isn't doing the same thing in mine. (Nor probably in most people's, but I don't want to assume that just from the lack of complaints.) The discrepancy might be an addon you have installed, or it might just be something different about the data in your account. Or something else entirely.
Comment 31•3 years ago
|
||
Oh right, there's a memory report that contains the list of addons. I don't know if you still have the same ones, but I could imagine some of them potentially causing problems. I would assume Amazon Assistant and Translate Whole Page would at least need to access page contents and could cause stuff to be retained, in theory.
Would it be possible to test if this happens in safe mode?
Comment 32•3 years ago
|
||
Here's an updated memory report. After I saved the logs, FF blew up and my machine became unresponsive. Though it's my laptop which has much less memory than my desktop, so I'll see if I can get one from home soon.
Testing in safe mode is a bit difficult because it takes time for this to manifest, but I'll see what I can do.
Comment 33•3 years ago
|
||
Updated•7 months ago
|
Description
•