Crash in mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove

RESOLVED FIXED in Firefox 61

Status

()

defect
P2
critical
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: valentin, Assigned: bagder)

Tracking

({crash})

Trunk
mozilla61
Unspecified
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox59 unaffected, firefox60- disabled, firefox61 fixed)

Details

(Whiteboard: [trr], crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is
report bp-e618d262-9dea-47cc-8c19-f5f9f0180226.
=============================================================

Crashes on:
MOZ_RELEASE_ASSERT(mWorkerThread == GetCurrentVirtualThread()) (not on worker thread!)

Top 10 frames of crashing thread:

0 libxul.so mozilla::ipc::MessageChannel::Send [clone .cold.462] 
1 libxul.so mozilla::dom::PContentParent::SendDataStorageRemove ipc/ipdl/PContentParent.cpp:1411
2 libxul.so mozilla::DataStorage::Remove security/manager/ssl/DataStorage.cpp:795
3 libxul.so mozilla::net::TRRService::IsTRRBlacklisted netwerk/dns/TRRService.cpp:400
4 libxul.so nsHostResolver::TrrLookup netwerk/dns/nsHostResolver.cpp:1104
5 libxul.so nsHostResolver::NameLookup [clone .cold.41] 
6 libxul.so nsHostResolver::ResolveHost 
7 libxul.so nsDNSService::AsyncResolveExtendedNative 
8 libxul.so mozilla::net::nsSocketTransport::ResolveHost 
9 libxul.so mozilla::net::nsSocketTransport::OnSocketEvent 

=============================================================
:catalinb: when you have time give this a look
Flags: needinfo?(catalin.badea392)
Priority: -- → P2
(In reply to Valentin Gosu [:valentin] from comment #0)
> This bug was filed from the Socorro interface and is
> report bp-e618d262-9dea-47cc-8c19-f5f9f0180226.
> =============================================================
> 
> Crashes on:
> MOZ_RELEASE_ASSERT(mWorkerThread == GetCurrentVirtualThread()) (not on
> worker thread!)
> 
> Top 10 frames of crashing thread:
> 
> 0 libxul.so mozilla::ipc::MessageChannel::Send [clone .cold.462] 
> 1 libxul.so mozilla::dom::PContentParent::SendDataStorageRemove
> ipc/ipdl/PContentParent.cpp:1411
> 2 libxul.so mozilla::DataStorage::Remove
> security/manager/ssl/DataStorage.cpp:795
> 3 libxul.so mozilla::net::TRRService::IsTRRBlacklisted
> netwerk/dns/TRRService.cpp:400
> 4 libxul.so nsHostResolver::TrrLookup netwerk/dns/nsHostResolver.cpp:1104
> 5 libxul.so nsHostResolver::NameLookup [clone .cold.41] 
> 6 libxul.so nsHostResolver::ResolveHost 
> 7 libxul.so nsDNSService::AsyncResolveExtendedNative 
> 8 libxul.so mozilla::net::nsSocketTransport::ResolveHost 
> 9 libxul.so mozilla::net::nsSocketTransport::OnSocketEvent 
> 
> =============================================================

Why is this a service workers bug?
Flags: needinfo?(catalin.badea392) → needinfo?(valentin.gosu)
Oh, sorry, I filed this in a rush, and saw ContentParent + worker thread and thought "service worker" :)
Component: DOM: Service Workers → Security: PSM
Flags: needinfo?(valentin.gosu)
Adding a Mac signature which seems to be the same issue.
Crash Signature: [@ mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove] → [@ mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove] [@ mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove ]
This happens a few times a day, usually after a long period of inactivity.
To reproduce set network.trr.mode;0 and network.trr.uri;https://dns.cloudflare.com/.well-known/dns-query
It is odd that this happens, since DataStorage is supposed to be thread safe - here it is called on the socket thread.
Blocks: DoH
Flags: needinfo?(dkeeler)
Maybe there's a thread restriction on sending IPC messages now? Ehsan, any ideas?
Flags: needinfo?(dkeeler) → needinfo?(ehsan)
Assignee

Comment 7

a year ago
This one has gotten me into nice restart-loops twice now this evening. I run TRR mode 2 with the cloudflare server. This is my latest one: https://crash-stats.mozilla.com/report/index/75f9b90b-43f0-4a00-a3c5-8a4de0180308
Assignee

Updated

a year ago
Whiteboard: [trr]
daniel is going to proxy the Remove call to the main thread. I bet that will resolve this.
Comment hidden (mozreview-request)
Assignee

Updated

a year ago
Assignee: nobody → daniel
Reporter

Comment 10

a year ago
mozreview-review
Comment on attachment 8958098 [details]
bug 1441131 - TRR: proxy storage removal to the mainthread

https://reviewboard.mozilla.org/r/227050/#review232776

::: netwerk/dns/TRRService.cpp:362
(Diff revision 1)
> +    , mService(aService), mHashkey(aHashkey), mType(aType)
> +  { }
> +
> +  NS_IMETHOD Run() override
> +  {
> +    mService->mTRRBLStorage->Remove(mHashkey, mType);

I haven't looked throught all the code paths, but it seems that it is possible that mTRRBLStorage could be null here.
Either check if it is null, or have the runnable hold a ref to mTRRBLStorage instead (which I prefer)

::: netwerk/dns/TRRService.cpp:362
(Diff revision 1)
> +    , mService(aService), mHashkey(aHashkey), mType(aType)
> +  { }
> +
> +  NS_IMETHOD Run() override
> +  {
> +    mService->mTRRBLStorage->Remove(mHashkey, mType);

I don't see any way mTRRBLStorage would be null here, which is good, but I don't know if this will be true in the future.
We could instead have the runnable keep a ref to the storage instead. Your call.

::: netwerk/dns/TRRService.cpp:438
(Diff revision 1)
>        LOG(("Host [%s] is TRR blacklisted\n", nsCString(aHost).get()));
>        return true;
>      } else {
>        // the blacklisted entry has expired
> +      if (!NS_IsMainThread()) {
> +        NS_DispatchToMainThread(new ProxyStorageRemove(this, hashkey,

How about

nsCOMPtr<nsIRunnable> runnable = NS_NewRunnableFunction("proxyStorageRemove", [mTRRBLStorage, hashKey, privateBrowsing]() {
  mTRRBLStorage->Remove(mHashkey, privateBrowsing ? DataStorage_Private : DataStorage_Persistent);
})
if (!NS_IsMainThread()) {
  NS_DispatchToMainThread(runnable);
} else {
  runnable->Run();
}

This way we keep all the code right here, and we don't need to make the member public.
Attachment #8958098 - Flags: review?(valentin.gosu) → review+
(In reply to Valentin Gosu [:valentin] from comment #10)
> 
> I don't see any way mTRRBLStorage would be null here, which is good, but I
> don't know if this will be true in the future.
> We could instead have the runnable keep a ref to the storage instead. Your
> call.

Ignore these comments :) I wrote them before I thought of using NS_NewRunnableFunction, and they didn't get deleted from mozreview after :)
Comment hidden (mozreview-request)
Assignee

Comment 13

a year ago
mozreview-review-reply
Comment on attachment 8958098 [details]
bug 1441131 - TRR: proxy storage removal to the mainthread

https://reviewboard.mozilla.org/r/227050/#review232776

> How about
> 
> nsCOMPtr<nsIRunnable> runnable = NS_NewRunnableFunction("proxyStorageRemove", [mTRRBLStorage, hashKey, privateBrowsing]() {
>   mTRRBLStorage->Remove(mHashkey, privateBrowsing ? DataStorage_Private : DataStorage_Persistent);
> })
> if (!NS_IsMainThread()) {
>   NS_DispatchToMainThread(runnable);
> } else {
>   runnable->Run();
> }
> 
> This way we keep all the code right here, and we don't need to make the member public.

That's an excellent suggestion. I've pushed an update using this approach now.
hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: 
remote: 
remote: ************************** ERROR ****************************
remote: Rev 2359a21defd5 needs "Bug N" or "No bug" in the commit message.
remote: Daniel Stenberg <daniel@haxx.se>
remote: bug-1441131 TRR: proxy storage removal to the mainthread r=valentin
remote: 
remote: MozReview-Commit-ID: K4Ar0RbSRzS
remote: *************************************************************
remote: 
remote: 
remote: transaction abort!
remote: rollback completed
remote: pretxnchangegroup.c_commitmessage hook failed
abort: push failed on remote
Comment hidden (mozreview-request)

Comment 16

a year ago
Pushed by daniel@haxx.se:
https://hg.mozilla.org/integration/autoland/rev/4660a81a0228
TRR: proxy storage removal to the mainthread r=valentin

Comment 17

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/4660a81a0228
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61

Comment 18

a year ago
Sorry I was too late to respond here, looks like this was resolved before I got a chance to look at it...
Flags: needinfo?(ehsan)
Does this need a Beta approval request for 60?
Flags: needinfo?(daniel)
Assignee

Comment 20

a year ago
Comment on attachment 8958098 [details]
bug 1441131 - TRR: proxy storage removal to the mainthread

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1441131
[User impact if declined]: minor, this bug is only triggered for users who manually enable trr via about:config
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: no
[Why is the change risky/not risky?]: it has been verify used in nightly for a while with no further crashes detected
[String changes made/needed]:
Flags: needinfo?(daniel)
Attachment #8958098 - Flags: approval-mozilla-beta?
Sounds like beta doesn't need this.
Attachment #8958098 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Assignee

Updated

a year ago
See Also: → 1450630
[Tracking Requested - why for this release]:

This crash signature is still showing up in FF60/beta.

Could the fix be uplifted?

Signature report for mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove
Showing results from 14 days ago

406 Results

Windows 10 	237 	58.4%
OS X 10.13 	101 	24.9%
OS X 10.12 	33 	8.1%
Windows 7 	27 	6.7%
Windows 8.1 	5 	1.2%
OS X 10.10 	2 	0.5%
Windows 8 	1 	0.2%

Firefox 	60.0b11 	156 	45.1% 	36
Firefox 	60.0b10 	149 	43.1% 	37
Firefox 	60.0b12 	21 	6.1% 	9
Firefox 	60.0b9 	18 	5.2% 	2
Firefox 	60.0b8 	2 	0.6% 	2

amd64 	376 	92.6%
x86 	30 	7.4%

99770424-971a-428d-a697-5613c0180413 	2018-04-13 22:35:30 	Firefox 	60.0b12 	20180412172954 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x0 	2018-04-13 19:52:33
97434f42-815a-4c2b-8c5b-275110180413 	2018-04-13 22:28:14 	Firefox 	60.0b12 	20180412172954 	Windows NT 	EXCEPTION_BREAKPOINT 	0x7ffe502d85b7 	2018-04-13 16:14:05
71de442d-24e7-4a15-ba36-a222e0180413 	2018-04-13 21:52:51 	Firefox 	60.0b12 	20180412172954 	Windows NT 	EXCEPTION_BREAKPOINT 	0x7ffe608385b7 	2018-04-13 16:14:05
1e98f9ff-c8d5-44a9-94a6-a64ad0180413 	2018-04-13 21:52:25 	Firefox 	60.0b12 	20180412172954 	Windows NT 	EXCEPTION_BREAKPOINT 	0x7ffe608385b7 	2018-04-13 16:14:05
9cfcc0b8-acff-483e-b3fa-4b8c30180413 	2018-04-13 20:55:56 	Firefox 	60.0b11 	20180409184545 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x0 	2018-04-13 20:53:47
1b9def9b-1fb0-4454-8068-554590180413 	2018-04-13 20:53:57 	Firefox 	60.0b10 	20180404171943 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x0 	2018-04-09
Flags: needinfo?(daniel)
Assignee

Comment 23

a year ago
Well, the bug (I created) is only triggered by TRR which is only enabled by pref and we don't really tell people to use or try TRR on Firefox 60 (we've fixed a whole range of TRR bugs in 61 since). So an easy way to avoid this crash in 60 is just to not use TRR and instead upgrade to 61 if you want to play with TRR.

In addition, the fix for this problem wasn't enough for 61 either. This crash changed signature and is since 61 now known bug 1450630.
Flags: needinfo?(daniel)
The crashes in 60 seem to come from few installs with TRR manually enabled.  I think they get to keep both pieces.

Updated

a year ago
Duplicate of this bug: 1454957

Updated

a year ago
Duplicate of this bug: 1466995
You need to log in before you can comment on or make changes to this bug.