Closed Bug 1759506 Opened 4 years ago Closed 4 years ago

startup Crash in [@ mozilla::net::LoadInfo::SetReservedClientInfo] on Amazon

Categories

(Core :: Networking, defect, P1)

defect

Tracking

()

RESOLVED FIXED
100 Branch
Tracking Status
firefox-esr91 99+ fixed
firefox98 + fixed
firefox99 + fixed
firefox100 + fixed

People

(Reporter: aryx, Assigned: kershaw)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [necko-triaged])

Crash Data

Attachments

(1 file)

~200 crashes with Firefox 91.7.0esr, ~400 with 91.7.0 (must be an unofficial build), no crashes with Firefox 91.6.*. Crashes on Windows and macOS, >40% in the first minute after launch. Comments mention it is frequently observed on amazon.com (>80% of users with en-US locale).

Eden, please investigate the issue.

Crash report: https://crash-stats.mozilla.org/report/index/38099100-6802-4daf-9e7f-766680220314

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(!isSome())

Top 10 frames of crashing thread:

0 xul.dll mozilla::net::LoadInfo::SetReservedClientInfo netwerk/base/LoadInfo.cpp:1771
1 xul.dll mozilla::dom::`anonymous namespace'::ClientChannelHelperParent::CreateClientForPrincipal dom/clients/manager/ClientChannelHelper.cpp:247
2 xul.dll mozilla::dom::`anonymous namespace'::ClientChannelHelperParent::CreateClient dom/clients/manager/ClientChannelHelper.cpp:213
3 xul.dll mozilla::dom::`anonymous namespace'::ClientChannelHelper::AsyncOnChannelRedirect dom/clients/manager/ClientChannelHelper.cpp:143
4 xul.dll mozilla::net::nsAsyncRedirectVerifyHelper::DelegateOnChannelRedirect netwerk/base/nsAsyncRedirectVerifyHelper.cpp:153
5 xul.dll mozilla::net::nsAsyncRedirectVerifyHelper::Run netwerk/base/nsAsyncRedirectVerifyHelper.cpp:259
6 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:805
7 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1152
8 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:85
9 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:324
Flags: needinfo?(echuang)

Also a spike in 98.0

Summary: esr startup Crash in [@ mozilla::net::LoadInfo::SetReservedClientInfo] on Amazon → startup Crash in [@ mozilla::net::LoadInfo::SetReservedClientInfo] on Amazon
Component: DOM: Core & HTML → Networking

Unfortunately, I can not reproduce it locally.

The assertion comes from https://searchfox.org/mozilla-central/rev/d4d7611ee4dd0003b492b865bc5988a4e6afc985/mfbt/Maybe.h#837 which complains that LoadInfo::mReservedClientInfo is not Nothing() while emplacing it.

Basically, we expect that when calling ClientChannelHelper::AsyncOnChannelRedirect, the passed new channel's reseervedClient/reservedClientInfo should be null/Nothing(). But it seems that the case violates our assumption. That means the newLoadInfo's SetReservedClientInfo/GiveReservedSource is called before calling ClientChannelHelper::AsyncOnChannelRedirect.

Flags: needinfo?(echuang)

Basically, we expect that when calling ClientChannelHelper::AsyncOnChannelRedirect, the passed new channel's reseervedClient/reservedClientInfo should be null/Nothing(). But it seems that the case violates our assumption. That means the newLoadInfo's SetReservedClientInfo/GiveReservedSource is called before calling ClientChannelHelper::AsyncOnChannelRedirect.

Is this an uncovered error when a client info is set for a new load info?
Can we just replace the already existed client info?

Flags: needinfo?(echuang)

The fact that this crash clearly showed up also in 91.7.0esr feels like a pretty solid clue to go off:
https://hg.mozilla.org/releases/mozilla-esr91/pushloghtml?fromchange=FIREFOX_91_6_0esr_RELEASE&tochange=FIREFOX_91_7_0esr_RELEASE

Not seeing anything super-obvious in that range, but maybe bug 1744352?

Well, the immediate cause is certainly bug 1754305. My guess would be that this showed up as something else before that.

Regressed by: 1754305

(bug 1754305 is in the range that Ryan posted in comment 4.)

Has Regression Range: --- → yes

If this was previously a diagnostic assert, that means we were effectively just ignoring this situation on Release in the past, right? Would it be a viable wallpaper solution to basically do the same more explicitly now (i.e. early return or something) to match the previous behavior? We could potentially ifdef it to builds without MOZ_DIAGNOSTIC_ASSERT_ENABLED set so it doesn't change anything in Nightly and early Beta builds.

That at least seems like it'd be a low-risk upliftable option to address this spike in the short-term while we investigate how we're getting into this state in the first place.

Assignee: nobody → kershaw
Status: NEW → ASSIGNED
Keywords: leave-open
Priority: -- → P1
Whiteboard: [necko-triaged]

(In reply to Ryan VanderMeulen [:RyanVM] from comment #8)

If this was previously a diagnostic assert, that means we were effectively just ignoring this situation on Release in the past, right? Would it be a viable wallpaper solution to basically do the same more explicitly now (i.e. early return or something) to match the previous behavior?

Before the assert, we would overwritten the value in the Maybe<> without calling the destructor. In this case, it looks like it wouldn't cause anything worse than a leak. So, probably not a crash, like I was guessing before.

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e6178b7ab160 Avoid crashing if mReservedClientInfo already has something, r=edenchuang

Comment on attachment 9268121 [details]
Bug 1759506 - Avoid crashing if mReservedClientInfo already has something, r=#necko

Beta/Release Uplift Approval Request

  • User impact if declined: Could crash.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: N/A
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The logic of this patch is straightforward.
  • String changes made/needed: N/A
Attachment #9268121 - Flags: approval-mozilla-release?
Attachment #9268121 - Flags: approval-mozilla-beta?

Comment on attachment 9268121 [details]
Bug 1759506 - Avoid crashing if mReservedClientInfo already has something, r=#necko

Approved for 99.0b5. Thanks.

Attachment #9268121 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9268121 [details]
Bug 1759506 - Avoid crashing if mReservedClientInfo already has something, r=#necko

We're going to want this on ESR too.

Attachment #9268121 - Flags: approval-mozilla-esr91?

Hi,

I'm attempting to verify the fix but unfortunately I'm unable able to reproduce the crash on my end.
Are there any more detailed steps through which I could try to reproduce it?

Thank you!

Flags: needinfo?(aryx.bugmail)

I hadn't been able to reproduce the issue and based on the fix there is nothing to verify here.

Flags: needinfo?(aryx.bugmail)

Based on previous comments, I'm removing the qe-verify+ flag

QA Whiteboard: [qa-triaged]
Flags: qe-verify+

New report from Beta showing the diagnostic assert being hit now as expected:
https://crash-stats.mozilla.org/report/index/c25f55a3-89bd-41cc-94e2-dbd3f0220320

And as expected, no reports from 99.0b6 so far since diagnostic asserts are tied to early beta or earlier.

Comment on attachment 9268121 [details]
Bug 1759506 - Avoid crashing if mReservedClientInfo already has something, r=#necko

Wallpapers us back to where we were before the regressing bug landed. Approved for 91.8esr.

Attachment #9268121 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Comment on attachment 9268121 [details]
Bug 1759506 - Avoid crashing if mReservedClientInfo already has something, r=#necko

P1/S2 crash. Fix works on Beta, approved for 98.0.2, thanks.

Attachment #9268121 - Flags: approval-mozilla-release? → approval-mozilla-release+

(In reply to Kershaw Chang [:kershaw] from comment #3)

Basically, we expect that when calling ClientChannelHelper::AsyncOnChannelRedirect, the passed new channel's reseervedClient/reservedClientInfo should be null/Nothing(). But it seems that the case violates our assumption. That means the newLoadInfo's SetReservedClientInfo/GiveReservedSource is called before calling ClientChannelHelper::AsyncOnChannelRedirect.

Is this an uncovered error when a client info is set for a new load info?

I think it depends on why the ReservedClientInfo had already been set.

Can we just replace the already existed client info?

According to the context, I think we can replace the existing client Info since it is a cross-origin redirect for document loading.
According to the cross-origin redirection for ClientInfo, the previous ClientInfo should not be valid anymore, so it should be okay to replace the existing one.
But we probably should also clean up the LoadInfo::mReservedClient.

Flags: needinfo?(echuang)

Let's close this bug and investigate further in bug 1761208.

Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → 100 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: