Closed Bug 1441131 Opened 6 years ago Closed 6 years ago

Crash in mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove

Categories

(Core :: Security: PSM, defect, P2)

Unspecified
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 - disabled
firefox61 --- fixed

People

(Reporter: valentin, Assigned: bagder)

References

Details

(Keywords: crash, Whiteboard: [trr])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is
report bp-e618d262-9dea-47cc-8c19-f5f9f0180226.
=============================================================

Crashes on:
MOZ_RELEASE_ASSERT(mWorkerThread == GetCurrentVirtualThread()) (not on worker thread!)

Top 10 frames of crashing thread:

0 libxul.so mozilla::ipc::MessageChannel::Send [clone .cold.462] 
1 libxul.so mozilla::dom::PContentParent::SendDataStorageRemove ipc/ipdl/PContentParent.cpp:1411
2 libxul.so mozilla::DataStorage::Remove security/manager/ssl/DataStorage.cpp:795
3 libxul.so mozilla::net::TRRService::IsTRRBlacklisted netwerk/dns/TRRService.cpp:400
4 libxul.so nsHostResolver::TrrLookup netwerk/dns/nsHostResolver.cpp:1104
5 libxul.so nsHostResolver::NameLookup [clone .cold.41] 
6 libxul.so nsHostResolver::ResolveHost 
7 libxul.so nsDNSService::AsyncResolveExtendedNative 
8 libxul.so mozilla::net::nsSocketTransport::ResolveHost 
9 libxul.so mozilla::net::nsSocketTransport::OnSocketEvent 

=============================================================
:catalinb: when you have time give this a look
Flags: needinfo?(catalin.badea392)
Priority: -- → P2
(In reply to Valentin Gosu [:valentin] from comment #0)
> This bug was filed from the Socorro interface and is
> report bp-e618d262-9dea-47cc-8c19-f5f9f0180226.
> =============================================================
> 
> Crashes on:
> MOZ_RELEASE_ASSERT(mWorkerThread == GetCurrentVirtualThread()) (not on
> worker thread!)
> 
> Top 10 frames of crashing thread:
> 
> 0 libxul.so mozilla::ipc::MessageChannel::Send [clone .cold.462] 
> 1 libxul.so mozilla::dom::PContentParent::SendDataStorageRemove
> ipc/ipdl/PContentParent.cpp:1411
> 2 libxul.so mozilla::DataStorage::Remove
> security/manager/ssl/DataStorage.cpp:795
> 3 libxul.so mozilla::net::TRRService::IsTRRBlacklisted
> netwerk/dns/TRRService.cpp:400
> 4 libxul.so nsHostResolver::TrrLookup netwerk/dns/nsHostResolver.cpp:1104
> 5 libxul.so nsHostResolver::NameLookup [clone .cold.41] 
> 6 libxul.so nsHostResolver::ResolveHost 
> 7 libxul.so nsDNSService::AsyncResolveExtendedNative 
> 8 libxul.so mozilla::net::nsSocketTransport::ResolveHost 
> 9 libxul.so mozilla::net::nsSocketTransport::OnSocketEvent 
> 
> =============================================================

Why is this a service workers bug?
Flags: needinfo?(catalin.badea392) → needinfo?(valentin.gosu)
Oh, sorry, I filed this in a rush, and saw ContentParent + worker thread and thought "service worker" :)
Component: DOM: Service Workers → Security: PSM
Flags: needinfo?(valentin.gosu)
Adding a Mac signature which seems to be the same issue.
Crash Signature: [@ mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove] → [@ mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove] [@ mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove ]
This happens a few times a day, usually after a long period of inactivity.
To reproduce set network.trr.mode;0 and network.trr.uri;https://dns.cloudflare.com/.well-known/dns-query
It is odd that this happens, since DataStorage is supposed to be thread safe - here it is called on the socket thread.
Blocks: 1434852
Flags: needinfo?(dkeeler)
Maybe there's a thread restriction on sending IPC messages now? Ehsan, any ideas?
Flags: needinfo?(dkeeler) → needinfo?(ehsan)
This one has gotten me into nice restart-loops twice now this evening. I run TRR mode 2 with the cloudflare server. This is my latest one: https://crash-stats.mozilla.com/report/index/75f9b90b-43f0-4a00-a3c5-8a4de0180308
Whiteboard: [trr]
daniel is going to proxy the Remove call to the main thread. I bet that will resolve this.
Assignee: nobody → daniel
Comment on attachment 8958098 [details]
bug 1441131 - TRR: proxy storage removal to the mainthread

https://reviewboard.mozilla.org/r/227050/#review232776

::: netwerk/dns/TRRService.cpp:362
(Diff revision 1)
> +    , mService(aService), mHashkey(aHashkey), mType(aType)
> +  { }
> +
> +  NS_IMETHOD Run() override
> +  {
> +    mService->mTRRBLStorage->Remove(mHashkey, mType);

I haven't looked throught all the code paths, but it seems that it is possible that mTRRBLStorage could be null here.
Either check if it is null, or have the runnable hold a ref to mTRRBLStorage instead (which I prefer)

::: netwerk/dns/TRRService.cpp:362
(Diff revision 1)
> +    , mService(aService), mHashkey(aHashkey), mType(aType)
> +  { }
> +
> +  NS_IMETHOD Run() override
> +  {
> +    mService->mTRRBLStorage->Remove(mHashkey, mType);

I don't see any way mTRRBLStorage would be null here, which is good, but I don't know if this will be true in the future.
We could instead have the runnable keep a ref to the storage instead. Your call.

::: netwerk/dns/TRRService.cpp:438
(Diff revision 1)
>        LOG(("Host [%s] is TRR blacklisted\n", nsCString(aHost).get()));
>        return true;
>      } else {
>        // the blacklisted entry has expired
> +      if (!NS_IsMainThread()) {
> +        NS_DispatchToMainThread(new ProxyStorageRemove(this, hashkey,

How about

nsCOMPtr<nsIRunnable> runnable = NS_NewRunnableFunction("proxyStorageRemove", [mTRRBLStorage, hashKey, privateBrowsing]() {
  mTRRBLStorage->Remove(mHashkey, privateBrowsing ? DataStorage_Private : DataStorage_Persistent);
})
if (!NS_IsMainThread()) {
  NS_DispatchToMainThread(runnable);
} else {
  runnable->Run();
}

This way we keep all the code right here, and we don't need to make the member public.
Attachment #8958098 - Flags: review?(valentin.gosu) → review+
(In reply to Valentin Gosu [:valentin] from comment #10)
> 
> I don't see any way mTRRBLStorage would be null here, which is good, but I
> don't know if this will be true in the future.
> We could instead have the runnable keep a ref to the storage instead. Your
> call.

Ignore these comments :) I wrote them before I thought of using NS_NewRunnableFunction, and they didn't get deleted from mozreview after :)
Comment on attachment 8958098 [details]
bug 1441131 - TRR: proxy storage removal to the mainthread

https://reviewboard.mozilla.org/r/227050/#review232776

> How about
> 
> nsCOMPtr<nsIRunnable> runnable = NS_NewRunnableFunction("proxyStorageRemove", [mTRRBLStorage, hashKey, privateBrowsing]() {
>   mTRRBLStorage->Remove(mHashkey, privateBrowsing ? DataStorage_Private : DataStorage_Persistent);
> })
> if (!NS_IsMainThread()) {
>   NS_DispatchToMainThread(runnable);
> } else {
>   runnable->Run();
> }
> 
> This way we keep all the code right here, and we don't need to make the member public.

That's an excellent suggestion. I've pushed an update using this approach now.
hg error in cmd: hg push -r tip ssh://hg.mozilla.org/integration/autoland: pushing to ssh://hg.mozilla.org/integration/autoland
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files
remote: 
remote: 
remote: ************************** ERROR ****************************
remote: Rev 2359a21defd5 needs "Bug N" or "No bug" in the commit message.
remote: Daniel Stenberg <daniel@haxx.se>
remote: bug-1441131 TRR: proxy storage removal to the mainthread r=valentin
remote: 
remote: MozReview-Commit-ID: K4Ar0RbSRzS
remote: *************************************************************
remote: 
remote: 
remote: transaction abort!
remote: rollback completed
remote: pretxnchangegroup.c_commitmessage hook failed
abort: push failed on remote
Pushed by daniel@haxx.se:
https://hg.mozilla.org/integration/autoland/rev/4660a81a0228
TRR: proxy storage removal to the mainthread r=valentin
https://hg.mozilla.org/mozilla-central/rev/4660a81a0228
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Sorry I was too late to respond here, looks like this was resolved before I got a chance to look at it...
Flags: needinfo?(ehsan)
Does this need a Beta approval request for 60?
Flags: needinfo?(daniel)
Comment on attachment 8958098 [details]
bug 1441131 - TRR: proxy storage removal to the mainthread

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1441131
[User impact if declined]: minor, this bug is only triggered for users who manually enable trr via about:config
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: 
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: no
[Why is the change risky/not risky?]: it has been verify used in nightly for a while with no further crashes detected
[String changes made/needed]:
Flags: needinfo?(daniel)
Attachment #8958098 - Flags: approval-mozilla-beta?
Sounds like beta doesn't need this.
Attachment #8958098 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
See Also: → 1450630
[Tracking Requested - why for this release]:

This crash signature is still showing up in FF60/beta.

Could the fix be uplifted?

Signature report for mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame | mozilla::ipc::MessageChannel::Send | mozilla::dom::PContentParent::SendDataStorageRemove
Showing results from 14 days ago

406 Results

Windows 10 	237 	58.4%
OS X 10.13 	101 	24.9%
OS X 10.12 	33 	8.1%
Windows 7 	27 	6.7%
Windows 8.1 	5 	1.2%
OS X 10.10 	2 	0.5%
Windows 8 	1 	0.2%

Firefox 	60.0b11 	156 	45.1% 	36
Firefox 	60.0b10 	149 	43.1% 	37
Firefox 	60.0b12 	21 	6.1% 	9
Firefox 	60.0b9 	18 	5.2% 	2
Firefox 	60.0b8 	2 	0.6% 	2

amd64 	376 	92.6%
x86 	30 	7.4%

99770424-971a-428d-a697-5613c0180413 	2018-04-13 22:35:30 	Firefox 	60.0b12 	20180412172954 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x0 	2018-04-13 19:52:33
97434f42-815a-4c2b-8c5b-275110180413 	2018-04-13 22:28:14 	Firefox 	60.0b12 	20180412172954 	Windows NT 	EXCEPTION_BREAKPOINT 	0x7ffe502d85b7 	2018-04-13 16:14:05
71de442d-24e7-4a15-ba36-a222e0180413 	2018-04-13 21:52:51 	Firefox 	60.0b12 	20180412172954 	Windows NT 	EXCEPTION_BREAKPOINT 	0x7ffe608385b7 	2018-04-13 16:14:05
1e98f9ff-c8d5-44a9-94a6-a64ad0180413 	2018-04-13 21:52:25 	Firefox 	60.0b12 	20180412172954 	Windows NT 	EXCEPTION_BREAKPOINT 	0x7ffe608385b7 	2018-04-13 16:14:05
9cfcc0b8-acff-483e-b3fa-4b8c30180413 	2018-04-13 20:55:56 	Firefox 	60.0b11 	20180409184545 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x0 	2018-04-13 20:53:47
1b9def9b-1fb0-4454-8068-554590180413 	2018-04-13 20:53:57 	Firefox 	60.0b10 	20180404171943 	Mac OS X 	EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 	0x0 	2018-04-09
Flags: needinfo?(daniel)
Well, the bug (I created) is only triggered by TRR which is only enabled by pref and we don't really tell people to use or try TRR on Firefox 60 (we've fixed a whole range of TRR bugs in 61 since). So an easy way to avoid this crash in 60 is just to not use TRR and instead upgrade to 61 if you want to play with TRR.

In addition, the fix for this problem wasn't enough for 61 either. This crash changed signature and is since 61 now known bug 1450630.
Flags: needinfo?(daniel)
The crashes in 60 seem to come from few installs with TRR manually enabled.  I think they get to keep both pieces.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: