Closed Bug 1818758 Opened 2 years ago Closed 2 years ago

some restored tabs broken on launch (content not loading / endless spinner/throbber); some new tabs also broken: won't load pages, no reaction to Alt key to show the toolbar

Categories

(Firefox :: Untriaged, defect)

defect

Tracking


RESOLVED DUPLICATE of bug 1817498
Tracking Status
firefox-esr102 --- unaffected
firefox110 --- unaffected
firefox111 --- unaffected
firefox112 + fixed

People

(Reporter: aryx, Unassigned)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Firefox 112.0a1 20230224092408 on Windows 10

For a few days, some session-restored tabs in Nightly have been broken after launch: the URL is shown but the tab's throbber spins indefinitely without the content ever appearing. These tabs don't react to Ctrl+R to reload, and pressing the Alt key to show the menu bar also does not work.

Initially I suspected this was triggered by a networking issue, but the problem has not been observed with Firefox 110.0 on the same machine.

A comparison of gBrowser.selectedTab includes, e.g.:

  • pendingpaint="true" in outerHTML: '<browser xmlns="http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul" contextmenu="contentAreaContextMenu" message="true" messagemanagergroup="browsers" tooltip="aHTMLTooltip" type="content" maychangeremoteness="true" initiallyactive="false" usercontextid="6" remote="true" autocompletepopup="PopupAutoComplete" nodefaultsrc="true" primary="true" pendingpaint="true"/>'

See the attached files for the full content of gBrowser.selectedTab.

The Bugbug bot thinks this bug should belong to the 'Firefox::Tabbed Browser' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Tabbed Browser
Component: Tabbed Browser → Untriaged

Bisection with a profile with ~10 pinned tabs and a dozen windows points to bug 1814718 as the regressing change.

In addition to the behavior in comment 0, the following can be observed:

  • During startup, the parent process (firefox.exe) uses >5 GB of memory, which drops later; the UI can become temporarily unresponsive.
  • If tabs are broken, shutdown takes about one minute (some processes get terminated during shutdown).
  • High memory usage during shutdown (29 GB observed once, for the parent process if I remember correctly).
  • New tabs seem much more likely to be broken directly after launch.

This is with a profile of 3.6k tabs (only the pinned ones and the visible ones are loaded by default).

[Tracking Requested - why for this release]: Bad user experience

Flags: needinfo?(jlink)
Keywords: regression
Regressed by: 1814718
See Also: → 1817498

@aryx: You mention that you were able to use a reduced scenario with "~10 pinned tabs and a dozen windows" to narrow down where this started. Is this something that I would reasonably be able to reproduce locally? If so, can you give me any specific instructions on how to view this behavior? I tried locally with 10 tabs and didn't see any issues, but I'm not sure if I'm trying the right thing or looking in the right way.

Flags: needinfo?(jlink) → needinfo?(aryx.bugmail)

@chutten: I'm wondering if we might be running into some performance limitations related to glean here. From stepping through the code and looking at the disassembly it seems that there is quite a bit of code that gets executed for a single call to AccumulateSamples. I can easily increase the buffer size that TimerThread uses and thus send larger batches of samples less frequently (which will help amortize any fixed overhead) but I think that will only be side-stepping the problem and, even then, only partially so, because on shutdown there will still be a lot of final batches of samples to be sent at the same time which would still be sluggish (and maybe moreso with the larger buffers).

There is also the issue with some tabs apparently being unresponsive. Could that be some kind of deadlock related to the heavy usage of glean? (See also the ticket cited by Florian which may be related.)

Flags: needinfo?(chutten)

I was not able to reproduce the issue with a minimal profile of ~10 windows with ~10 normal tabs each and ~10 pinned tabs total. But feel free to reach out with questions or if you'd like me to run scripts in the Browser Console and provide the output.

Flags: needinfo?(aryx.bugmail)

(In reply to Justin Link from comment #5)

@chutten: I'm wondering if we might be running into some performance limitations related to glean here. From stepping through the code and looking at the disassembly it seems that there is quite a bit of code that gets executed for a single call to AccumulateSamples. I can easily increase the buffer size that TimerThread uses and thus send larger batches of samples less frequently (which will help amortize any fixed overhead) but I think that will only be side-stepping the problem and, even then, only partially so, because on shutdown there will still be a lot of final batches of samples to be sent at the same time which would still be sluggish (and maybe moreso with the larger buffers).

There is also the issue with some tabs apparently being unresponsive. Could that be some kind of deadlock related to the heavy usage of glean? (See also the ticket cited by Florian which may be related.)

Glean always aims to be cheap (but not free). It may be that you're pressing up against the edges of how expensive "cheap" can get at volume. What is the volume here? 3.6k tabs means... tens? hundreds? 2^10s of TimerThreads being run?

While I'm here, let's do some visual tracing and see if we can see an obvious bottleneck.

On the instrumenting thread, we:

  • Ask the instrumentor to allocate an nsTArray
  • Check if the metric's mirrored via GIFFT (which should be a jump table)
    • (if it were mirrored we'd iterate the samples and mirror to Telemetry, but in this case it's not, so we can ignore the loop)
  • Pass it without allocation through FFI (huzzah ThinVec)
  • FFI will copy the contents into a Rust Vec<i64> via to_vec if using AccumulateSamplesSigned or collect if using AccumulateSamples.
  • Which process are we in?
    • If we're parent, then we jump straight to Glean. The Rust SDK will take the samples over to the dispatcher thread, and the instrumenting thread's job is done.
    • Otherwise, IPC kicks in and we extend the IPC payload's samples vec for this metric, which is another iteration and perhaps some allocation. Then IPC will need to drain the next time there's some idle, but that doesn't happen on the instrumenting thread (it instead happens on the main thread, D:, for now). But this is then where the instrumenting thread's job is done.
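The sample-copy step in that FFI walkthrough can be sketched roughly like this. This is illustrative only: a plain slice stands in for the ThinVec, and `copy_samples` is a made-up name, not the real `fog_custom_distribution_accumulate_samples` entry point.

```rust
// Minimal sketch of the FFI copy: samples arrive as unsigned 64-bit
// values and get collected into a signed Vec<i64> for Glean. `collect`
// sizes the Vec from the iterator's length hint, so this is one
// allocation plus one pass over the samples.
fn copy_samples(samples: &[u64]) -> Vec<i64> {
    samples.iter().map(|&s| s as i64).collect()
}
```

This mirrors the `collect` path named in the call stack later in the thread: one allocation per call is expected just to cross the type boundary.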

On the whole it appears to me to be pretty minimal. There are a few too many copies for my tastes, but given the current shape of things (nsTArray becoming Rust Vec, IPC, requiring the nsTArray in the API) I don't see anything we could avoid. If we change the shape (maybe... Glean could use ThinVec for CustomDistributionMetric?) then perhaps there's something.

Now, this is by inspection. Perhaps there's more to learn if we look at the way it actually runs.

Flags: needinfo?(chutten)

I measured the actual runtime of the calls to AccumulateSamples() in my case to get some idea of whether those calls are actually taking a long time.

If I submit 128 samples at once via AccumulateSamples() (the current behavior), this takes about 0.019414ms/call (0.000151671875ms/sample). If I submit 1024 samples at once, this takes about 0.042370ms/call (0.000041376953ms/sample). Neither of these feels unreasonable to me, and that seems like a reasonable balance of variable vs. fixed cost.
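The amortization in those numbers is just per-call cost divided by batch size; a trivial helper (illustrative name, not Firefox code) makes the arithmetic explicit:

```rust
// Effective per-sample cost for a batched call: the fixed per-call
// overhead is spread across the batch, so larger batches drive the
// per-sample number down even as the per-call number grows.
fn per_sample_ms(per_call_ms: f64, batch: u64) -> f64 {
    per_call_ms / batch as f64
}
```

Plugging in the measured values, 0.019414/128 gives the ~0.000152ms/sample figure and 0.042370/1024 gives the ~0.000041ms/sample figure quoted above.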

That having been said, from stepping through in the debugger (on Windows) it does seem like there is some significant overhead/allocation related to the .collect() or with_metric!() lines of the Rust code. Here is a call stack that, from what I can tell, is setting up the call to with_metric!():

xul.dll!alloc::raw_vec::RawVec<tuple$<alloc::vec::Vec<u8,alloc::alloc::Global>,alloc::vec::Vec<u8,alloc::alloc::Global>>,alloc::alloc::Global>::needs_to_grow<tuple$<alloc::vec::Vec<u8,alloc::alloc::Global>,alloc::vec::Vec<u8,alloc::alloc::Global>>,alloc::alloc::Global>(unsigned __int64 self, unsigned __int64 len) Line 360 Rust
[Inline Frame] xul.dll!alloc::raw_vec::RawVec<i64,alloc::alloc::Global>::reserve(unsigned __int64 self, unsigned __int64 len) Line 288 Rust
[Inline Frame] xul.dll!alloc::vec::Vec<i64,alloc::alloc::Global>::reserve(unsigned __int64 self) Line 907 Rust
[Inline Frame] xul.dll!alloc::vec::spec_extend::impl$1::spec_extend(alloc::vec::Vec<i64,alloc::alloc::Global> * self, core::iter::adapters::map::Map<core::slice::iter::Iter<u64>,firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples::closure_env$0>) Line 36 Rust
[Inline Frame] xul.dll!alloc::vec::spec_from_iter_nested::impl$1::from_iter(core::iter::adapters::map::Map<core::slice::iter::Iter<u64>,firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples::closure_env$0>) Line 62 Rust
xul.dll!alloc::vec::spec_from_iter::impl$0::from_iter<i64,core::iter::adapters::map::Map<core::slice::iter::Iter<u64>,firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples::closure_env$0>>(core::iter::adapters::map::Map<core::slice::iter::Iter<u64>,firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples::closure_env$0>) Line 33 Rust
[Inline Frame] xul.dll!alloc::vec::impl$18::from_iter(core::iter::adapters::map::Map<core::slice::iter::Iter<u64>,firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples::closure_env$0>) Line 2757 Rust
[Inline Frame] xul.dll!core::iter::traits::iterator::Iterator::collect(core::iter::adapters::map::Map<core::slice::iter::Iter<u64>,firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples::closure_env$0>) Line 1836 Rust
xul.dll!firefox_on_glean::ffi::custom_distribution::fog_custom_distribution_accumulate_samples(unsigned int id, thin_vec::ThinVec<u64> * samples) Line 46 Rust
xul.dll!TimerThread::Run() Line 772 C++

I have two thoughts/questions at this point:

  1. @chutten: Does this look approximately like what you would have expected with regards to copies/allocations for this scenario?
  2. Since it seems like a single call to AccumulateSamples() isn't too slow, but there are some allocations happening, could we be running into a problem with locks in the allocator when a "coordinated attack" of thousands of TimerThreads is happening? (Of course, then I would expect everything to eventually become responsive, even if it took some observable amount of time.)
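The "coordinated attack" concern in point 2 can be illustrated with a toy model: many threads funneling through one shared lock (standing in here for an allocator lock) all make progress, but strictly serialized. This is a sketch of the hypothesis with invented names, not a measurement of the real allocator:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Toy contention model: `threads` workers each take the same lock
// `iters` times. All work completes, but every increment serializes on
// the one lock, which is the shape of the suspected slowdown.
fn contended_work(threads: usize, iters: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    *c.lock().unwrap() += 1; // every step takes the lock
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let n = *counter.lock().unwrap();
    n
}
```

Consistent with the comment above, a model like this predicts eventual responsiveness: nothing deadlocks, it just takes observably longer as the thread count grows.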
Flags: needinfo?(chutten)

with_metric! shouldn't allocate unless the metric is labeled (which custom_distributions like this cannot be (and even if it were, it should only ever allocate a few things once per label and then never again)). ...unless I'm wrong to assume lock acquisition (for read) and map gets shouldn't allocate.

The allocation in the stack seems about right for turning the ThinVec into the Vec, though comments in that file suggest our aim was to avoid allocation. (I'm not sure how, and the author of those comments is taking some well-deserved PTO, so we can't ask for at least a little while.)

I do think this supports the hypothesis that we're in a bit of a pathological case here: Glean's doing "okay", but for all these TimerThreads running at once "okay" just isn't good enough.

Flags: needinfo?(chutten)

As a reminder, soft code freeze is in two days... Just wondering if the plan here is to back anything out before beta?

The patch in bug 1817498 is currently landing, it would be nice to see if it improves things here.

5/5 launches did not reproduce the broken tabs or the one-minute shutdown. The first launch (after a Nightly update) hit 6 GB memory consumption during launch; later launches were unaffected. Thank you for fixing this.

Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: 1817498
Resolution: --- → DUPLICATE