Intermittent test_messageChannel_sharedWorker2.html | application crashed [@ DiscardTransferables] (Assertion failure: false (unknown ownership), at StructuredClone.cpp:430)

RESOLVED FIXED in Firefox 41

Status

()

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: RyanVM, Assigned: baku)

Tracking

({assertion, crash, intermittent-failure})

Trunk
mozilla42
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox39 unaffected, firefox40 unaffected, firefox41 fixed, firefox42 fixed, firefox-esr31 unaffected, firefox-esr38 unaffected)

Details

Attachments

(2 attachments, 3 obsolete attachments)

09:28:20 WARNING - PROCESS-CRASH | dom/messagechannel/tests/test_messageChannel_sharedWorker2.html | application crashed [@ DiscardTransferables]
09:28:20 INFO - Crash dump filename: /tmp/tmp6HiD5c/6f842484-a57e-49f5-25be5164-5cf0d4f6.dmp
09:28:20 INFO - Operating system: Android
09:28:20 INFO - 0.0.0 Linux 2.6.29-g41a03df #22 Thu Jun 26 10:59:09 CST 2014 armv7l Android/full/generic:4.0.4.0.4.0.4/OPENMASTER/eng.cltbld.20150618.104245:eng/test-keys
09:28:20 INFO - CPU: arm
09:28:20 INFO - 0 CPUs
09:28:20 INFO - Crash reason: SIGSEGV
09:28:20 INFO - Crash address: 0x0
09:28:20 INFO - Thread 0 (crashed)
09:28:20 INFO - 0 libxul.so!DiscardTransferables [StructuredClone.cpp:4ee70d6c74f3 : 430 + 0x16]
09:28:20 INFO - r4 = 0x00000001 r5 = 0x00000000 r6 = 0x4918c390 r7 = 0x4918c3a8
09:28:20 INFO - r8 = 0x00000000 r9 = 0x00000000 r10 = 0xbeb3cac8 fp = 0xbeb3cc57
09:28:20 INFO - sp = 0xbeb3c938 lr = 0x425371ed pc = 0x42545724
09:28:20 INFO - Found by: given as instruction pointer in context
09:28:20 INFO - 1 libxul.so!JSAutoStructuredCloneBuffer::clear [StructuredClone.cpp:4ee70d6c74f3 : 1991 + 0x9]
09:28:20 INFO - r4 = 0xbeb3c9f8 r5 = 0xbeb3c9cc r6 = 0x49146800 r7 = 0x49146800
09:28:20 INFO - r8 = 0x459d7150 r9 = 0xbeb3cad0 r10 = 0xbeb3cac8 fp = 0xbeb3cc57
09:28:20 INFO - sp = 0xbeb3c978 pc = 0x425497a5
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 2 libxul.so!mozilla::dom::workers::WorkerPrivateParent<mozilla::dom::workers::WorkerPrivate>::DispatchMessageEventToMessagePort [StructuredClone.h : 190 + 0x5]
09:28:20 INFO - r4 = 0x00000001 r5 = 0xbeb3c9cc r6 = 0x49146800 r7 = 0x49146800
09:28:20 INFO - r8 = 0x459d7150 r9 = 0xbeb3cad0 r10 = 0xbeb3cac8 fp = 0xbeb3cc57
09:28:20 INFO - sp = 0xbeb3c980 pc = 0x41ad01e5
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 3 libxul.so!MessageEventRunnable::WorkerRun [WorkerPrivate.cpp:4ee70d6c74f3 : 1343 + 0x11]
09:28:20 INFO - r4 = 0x00000000 r5 = 0x00000000 r6 = 0x4577f314 r7 = 0x49146800
09:28:20 INFO - r8 = 0x459d7150 r9 = 0xbeb3cad0 r10 = 0xbeb3cac8 fp = 0xbeb3cc57
09:28:20 INFO - sp = 0xbeb3ca90 pc = 0x41ad0287
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 4 libxul.so!mozilla::dom::workers::WorkerRunnable::Run [WorkerRunnable.cpp:4ee70d6c74f3 : 357 + 0xb]
09:28:20 INFO - r4 = 0x4577f2e0 r5 = 0x00000000 r6 = 0xbeb3cac4 r7 = 0x459d7150
09:28:20 INFO - r8 = 0xbeb3cae8 r9 = 0xbeb3cad0 r10 = 0xbeb3cac8 fp = 0xbeb3cc57
09:28:20 INFO - sp = 0xbeb3cab8 pc = 0x41ab7433
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 5 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:4ee70d6c74f3 : 848 + 0x9]
09:28:20 INFO - r4 = 0x40249394 r5 = 0x00000001 r6 = 0x40249350 r7 = 0xbeb3cc18
09:28:20 INFO - r8 = 0xbeb3cc0c r9 = 0x00000000 r10 = 0x00000000 fp = 0xbeb3cc57
09:28:20 INFO - sp = 0xbeb3cbf0 pc = 0x40bbecd1
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 6 libxul.so!NS_ProcessNextEvent [nsThreadUtils.cpp:4ee70d6c74f3 : 265 + 0xd]
09:28:20 INFO - r4 = 0x40249350 r5 = 0x00000000 r6 = 0x40256628 r7 = 0x40256620
09:28:20 INFO - r8 = 0x00000001 r9 = 0x40204860 r10 = 0xbeb3cd70 fp = 0xbeb3cd7c
09:28:20 INFO - sp = 0xbeb3cc50 pc = 0x40bd8271
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 7 libxul.so!mozilla::ipc::MessagePump::Run [MessagePump.cpp:4ee70d6c74f3 : 95 + 0x7]
09:28:20 INFO - r4 = 0x40256610 r5 = 0xbeb3cdb8 r6 = 0x40256628 r7 = 0x40256620
09:28:20 INFO - r8 = 0x00000001 r9 = 0x40204860 r10 = 0xbeb3cd70 fp = 0xbeb3cd7c
09:28:20 INFO - sp = 0xbeb3cc68 pc = 0x40d9895d
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 8 libxul.so!MessageLoop::RunInternal [message_loop.cc:4ee70d6c74f3 : 234 + 0xf]
09:28:20 INFO - r4 = 0xbeb3cdb8 r5 = 0x45a4eca0 r6 = 0x40249350 r7 = 0x00000003
09:28:20 INFO - r8 = 0xbeb3cdb8 r9 = 0x40204860 r10 = 0xbeb3cd70 fp = 0xbeb3cd7c
09:28:20 INFO - sp = 0xbeb3cc90 pc = 0x40d82f75
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 9 libxul.so!MessageLoop::Run [message_loop.cc:4ee70d6c74f3 : 227 + 0x5]
09:28:20 INFO - r4 = 0xbeb3cdb8 r5 = 0x45a4eca0 r6 = 0x40249350 r7 = 0x00000003
09:28:20 INFO - r8 = 0xbeb3cdb8 r9 = 0x40204860 r10 = 0xbeb3cd70 fp = 0xbeb3cd7c
09:28:20 INFO - sp = 0xbeb3cca8 pc = 0x40d82f8f
09:28:20 INFO - Found by: call frame info
09:28:20 INFO - 10 libxul.so!nsBaseAppShell::Run [nsBaseAppShell.cpp:4ee70d6c74f3 : 165 + 0x7]
09:28:20 INFO - r4 = 0x00000000 r5 = 0x45a4eca0 r6 = 0x40249350 r7 = 0x00000003
09:28:20 INFO - r8 = 0xbeb3cdb8 r9 = 0x40204860 r10 = 0xbeb3cd70 fp = 0xbeb3cd7c
09:28:20 INFO - sp = 0xbeb3ccc0 pc = 0x41bba347
09:28:20 INFO - Found by: call frame info
Flags: needinfo?(amarchesini)
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Posted patch force.patch (obsolete) — Splinter Review
Attachment #8624652 - Flags: review?(bent.mozilla)
Comment on attachment 8624652 [details] [diff] [review]
force.patch

Review of attachment 8624652 [details] [diff] [review]:
-----------------------------------------------------------------

Andrea, I need some context here. What is this patch trying to solve?
Flags: needinfo?(amarchesini)
> Andrea, I need some context here. What is this patch trying to solve?

The crash happens because the memory flag for the transfered ports is "CUSTOM" (set by writeTransfer).
When the structured clone algorithm fails, it tries to free the transfered object, and for CUSTOM things we must have a freeTransfer callback otherwise we assert.
This patch adds this callback and with that, it frees the transfered ports forcing a close().
Flags: needinfo?(amarchesini) → needinfo?(bent.mozilla)
Posted patch force.patch (obsolete) — Splinter Review
Attachment #8624652 - Attachment is obsolete: true
Attachment #8624652 - Flags: review?(bent.mozilla)
Attachment #8625442 - Flags: review?(bent.mozilla)
Comment on attachment 8625442 [details] [diff] [review]
force.patch

Review of attachment 8625442 [details] [diff] [review]:
-----------------------------------------------------------------

A few questions here before I r+:

::: dom/base/PostMessageEvent.cpp
@@ +238,5 @@
>                                                uint64_t aExtraData,
>                                                void* aClosure)
>  {
> +  if (aTag == SCTAG_DOM_MAP_MESSAGEPORT) {
> +    MOZ_ASSERT(aClosure);

Nit: MOZ_ASSERT(!aContent) ? You do that in the other two

::: dom/messagechannel/MessagePort.cpp
@@ +250,5 @@
>    }
>  };
>  
> +class ForceCloseRunnable final : public nsIRunnable
> +                               , public nsIIPCBackgroundChildCreateCallback

Nit: these get indented 2 on their own lines

@@ +259,5 @@
> +  explicit ForceCloseRunnable(const MessagePortIdentifier& aIdentifier)
> +    : mIdentifier(aIdentifier)
> +  {}
> +
> +  NS_IMETHOD Run() override

Nit: All these overrides can be private

@@ +268,5 @@
> +      mozilla::ipc::BackgroundChild::GetForCurrentThread();
> +    if (actor) {
> +      ActorCreated(actor);
> +    } else {
> +      if (NS_WARN_IF(

Nit: else if

@@ +284,5 @@
> +  }
> +
> +  void ActorCreated(mozilla::ipc::PBackgroundChild* aActor)
> +  {
> +    MOZ_ASSERT(NS_IsMainThread());

Nit: This seems unnecessary, as long as you have an actor you don't really care.

@@ +919,5 @@
> +{
> +  nsRefPtr<ForceCloseRunnable> runnable = new ForceCloseRunnable(aIdentifier);
> +
> +  // Just to avoid problems with the Worker life-time, we dispatch this
> +  // runnable to the main-thread.

What problems? If you're on a worker you a) don't need to do the async get for PBackground, and b) don't need to go to the main thread. This seems like unnecessary work, so I'd like to know what you're working around.

::: dom/messagechannel/MessagePortService.cpp
@@ +251,5 @@
>  
>    nsID destinationUUID = data->mDestinationUUID;
>    mPorts.Remove(aUUID);
>  
> +  nsRefPtr<MessagePortService> kungFuDeathGrip = this;

Is this really a problem that should be solved here? Or should you take the extra ref in ForceClose()?

@@ +343,5 @@
> +                               const nsID& aDestinationUUID,
> +                               const uint32_t& aSequenceID)
> +{
> +  MessagePortServiceData* data;
> +  if (!mPorts.Get(aUUID, &data)) {

Is there any possibility that this could race with a normal close()? If so you probably shouldn't kill the process...

::: dom/messagechannel/MessagePortService.h
@@ +19,5 @@
>  {
>  public:
>    NS_INLINE_DECL_REFCOUNTING(MessagePortService)
>  
> +  static MessagePortService* Get();

Instead of exposing this class instance why not just have |static bool ForceClose(...);|. That way you don't have to worry about people calling anything other than the one function you want exposed.

::: dom/messagechannel/MessagePortUtils.cpp
@@ +198,5 @@
> +void
> +FreeTransfer(uint32_t aTag, JS::TransferableOwnership aOwnership,
> +             void* aContent, uint64_t aExtraData, void* aClosure)
> +{
> +  auto* closure = static_cast<StructuredCloneClosureInternal*>(aClosure);

Nit: MOZ_ASSERT(closure) ?

@@ +201,5 @@
> +{
> +  auto* closure = static_cast<StructuredCloneClosureInternal*>(aClosure);
> +
> +  if (aTag != SCTAG_DOM_MAP_MESSAGEPORT) {
> +    return;

Nit: You do this backwards from the other two (those don't early-return). Be consistent, either way?

@@ +204,5 @@
> +  if (aTag != SCTAG_DOM_MAP_MESSAGEPORT) {
> +    return;
> +  }
> +
> +  MOZ_ASSERT(aContent == 0);

Nit: !aContent

::: dom/workers/WorkerPrivate.cpp
@@ +764,5 @@
>                 void *aContent, uint64_t aExtraData, void* aClosure)
>    {
> +    if (aTag == SCTAG_DOM_MAP_MESSAGEPORT) {
> +      MOZ_ASSERT(!aContent);
> +      auto* closure = static_cast<WorkerStructuredCloneClosure*>(aClosure);

MOZ_ASSERT(closure) ?
Attachment #8625442 - Flags: review?(bent.mozilla)
Flags: needinfo?(bent.mozilla)
> @@ +919,5 @@
> > +{
> > +  nsRefPtr<ForceCloseRunnable> runnable = new ForceCloseRunnable(aIdentifier);
> > +
> > +  // Just to avoid problems with the Worker life-time, we dispatch this
> > +  // runnable to the main-thread.
> 
> What problems? If you're on a worker you a) don't need to do the async get
> for PBackground, and b) don't need to go to the main thread. This seems like
> unnecessary work, so I'd like to know what you're working around.

Yes. You are right. 

> > +  nsRefPtr<MessagePortService> kungFuDeathGrip = this;
> 
> Is this really a problem that should be solved here? Or should you take the
> extra ref in ForceClose()?

Yeah... that was from a previous implementation. That line as to go away.

> > +  MessagePortServiceData* data;
> > +  if (!mPorts.Get(aUUID, &data)) {
> 
> Is there any possibility that this could race with a normal close()? If so
> you probably shouldn't kill the process...

I'll update the patch and submit a new one for a second review.
Posted patch force.patch (obsolete) — Splinter Review
Attachment #8625442 - Attachment is obsolete: true
Attachment #8625512 - Flags: review?(bent.mozilla)
Comment on attachment 8625512 [details] [diff] [review]
force.patch

Review of attachment 8625512 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/messagechannel/MessagePort.cpp
@@ +254,5 @@
> +{
> +public:
> +  NS_DECL_ISUPPORTS
> +
> +  static void Run(const MessagePortIdentifier& aIdentifier)

Nit: Call this "ForceClose"? "Run" implies an nsIRunnable.

@@ +286,5 @@
> +  {
> +    Run(mIdentifier);
> +  }
> +
> +  ~ForceCloseHelper() {}

Nit: Move next to constructor.

::: dom/messagechannel/MessagePortService.cpp
@@ +55,5 @@
>  
>  /* static */ MessagePortService*
> +MessagePortService::Get()
> +{
> +  return gInstance;

Can you assert here that you're in the main process and on the pbackground thread? Would be great to add those same asserts to GetOrCreate also.

@@ +254,5 @@
>  
>    CloseAll(destinationUUID);
>  
> +  // CloseAll calls itself recursively and it can happen that it deletes
> +  // itself. Before continuing we must check if we are still alive.

So is this true? Or is it only true when called from ForceClose? If so why not just add an extra ref in ForceClose so that you can never accidentally delete yourself?

::: dom/messagechannel/MessagePortUtils.cpp
@@ +204,5 @@
> +
> +  if (aTag == SCTAG_DOM_MAP_MESSAGEPORT) {
> +    MOZ_ASSERT(!aContent);
> +    MOZ_ASSERT(aExtraData < closure->mClosure.mMessagePortIdentifiers.Length());
> +    MessagePort::ForceClose(closure->mClosure.mMessagePortIdentifiers[(uint32_t)aExtraData]);

I don't think the (uint32_t) is necessary here. If it warns then you should also fix the other call in WorkerPrivate.cpp.

::: ipc/glue/BackgroundParentImpl.cpp
@@ +39,5 @@
>  using mozilla::dom::cache::PCacheParent;
>  using mozilla::dom::cache::PCacheStorageParent;
>  using mozilla::dom::cache::PCacheStreamControlParent;
>  using mozilla::dom::MessagePortParent;
> +using mozilla::dom::MessagePortService;

Not needed now
Attachment #8625512 - Flags: review?(bent.mozilla) → review+
Keywords: checkin-needed
Comment on attachment 8625560 [details] [diff] [review]
250015.diff

>+class ForceCloseHelper final : public nsIIPCBackgroundChildCreateCallback
>+{
[...]
>+  void ActorFailed() override
>+  {
>+    MOZ_CRASH("Failed to create a PBackgroundChild actor!");
>+  }
>+
>+  void ActorCreated(mozilla::ipc::PBackgroundChild* aActor)
>+  {
>+    ForceClose(mIdentifier);
>+  }

Here, "ActorCreated" was missing an 'override' annotation, which causes clang 3.6 & newer to spam a "-Winconsistent-missing-override" build warning, which busts --enable-warnings-as-errors builds (with clang 3.6+).

I pushed a followup to add that keyword, with blanket r+ that ehsan granted me for fixes of this sort over in bug 1126447 comment 2:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae700a7a4c5e
sorry had to back this out, starting with this followup we had frequent build failures like https://treeherder.mozilla.org/logviewer.html#?job_id=11046381&repo=mozilla-inbound
Flags: needinfo?(amarchesini)
https://hg.mozilla.org/mozilla-central/rev/bda3c545aa48
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Still happening :(
Status: RESOLVED → REOPENED
Flags: needinfo?(amarchesini)
Resolution: FIXED → ---
Target Milestone: mozilla41 → ---
Posted patch crash.patchSplinter Review
Flags: needinfo?(amarchesini)
Attachment #8629032 - Flags: review?(sphink)
Comment on attachment 8629032 [details] [diff] [review]
crash.patch

Review of attachment 8629032 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the delay in reviewing, I didn't notice you'd already posted it. But I didn't realize it was going to be this ugly. I mean, I understand why you're doing this, and it's sensible given the current setup. But is the scoped closure necessary here? Is there a case where one of these messages could get created but never dispatched, and thus skip the proper cleanup here?

Sorry, I have to run now but I want to look into this more tomorrow before r+'ing. For example, I want to look to see what the closure is being used for to see if it could be using extraData instead.

::: dom/workers/WorkerPrivate.cpp
@@ +3544,5 @@
>    JSAutoStructuredCloneBuffer buffer(Move(aBuffer));
> +  const JSStructuredCloneCallbacks* callbacks =
> +    WorkerStructuredCloneCallbacks(true);
> +
> +  class MOZ_STACK_CLASS RAII final

Name this AutoCloneBufferCleaner or something like that. I thought RAII must be some static analysis token like MOZ_STACK_CLASS.

@@ +3556,5 @@
> +      , mClosure(aClosure)
> +      , mBeenRead(false)
> +    {}
> +
> +    void BeenRead() { mBeenRead = true; }

Why is this needed? clear() checks if (data_), so it seems like you can do it unconditionally.

@@ +3606,5 @@
>                     &closure)) {
>      return false;
>    }
>  
>    buffer.clear();

buffer.clear() isn't strictly necessary here, and it's a little weird because it doesn't pass in the callbacks. If it weren't a no-op (guaranteed by read() succeeding), then it would crash. So I'd rather it were gone entirely or it received the callback params.

::: js/src/vm/StructuredClone.cpp
@@ +1992,1 @@
>      }

In JS-land, no braces around single-line consequent.
Note that if we do want to go this route, it can at least be made prettier with bug 1180299. (But I'm expecting that will take a while to get reviewed.)
I read through a bunch of the code, and it *seems* like there's no opportunity for things to get lost between MessageEventRunnable::Write's mBuffer.Write() and runnable->Dispatch(aCx), nor between runnable->Dispatch(aCx) and WorkerPrivateParent<Derived>::DispatchMessageEventToMessagePort, so the JSAutoStructuredCloneBuffer seems safe from that closure-less time window. Though the number of layers I had to look through makes me nervous.

So I guess I'm ok with signing off on this approach, though I still wonder if more of this couldn't be stored in the extraData -- instead of just holding an index into mMessagePortIdentifiers, could it store mMessagePortIdentifiers too? Or is that different between the threads?

Maybe I'm wrong though. Maybe there does need to be a handoff between sender and receiver where you can swap the callbacks/closure members. (Or better, change it into a subclass of some base class.) I just don't like having a JSAutoStructuredCloneBuffer ever existing without the appropriate callbacks and closure set. There should be some way of making that handoff explicit.
Attachment #8629032 - Flags: review?(sphink) → review+
We can change JSAutoStructuredCloneBuffer so that Read/Write/etc are virtual methods. What about this approach? Follow up?
https://hg.mozilla.org/mozilla-central/rev/2df2e0c0a7b4
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
Please nominate this for Aurora approval when you get a chance.
Flags: needinfo?(amarchesini)
Comment on attachment 8625560 [details] [diff] [review]
250015.diff

Approval Request Comment
[Feature/regressing bug #]: MessagePort/MessageChannel
[User impact if declined]: We crash
[Describe test coverage new/current, TreeHerder]: no tests because it's racy.
[Risks and why]: these 2 patches are relatively complex but we have 2 different reviewers for them. I think we don't have regression and treeHerder is fully green with them applied.
[String/UUID change made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8625560 - Flags: approval-mozilla-aurora?
Comment on attachment 8629032 [details] [diff] [review]
crash.patch

Read the previous comment.
Attachment #8629032 - Flags: approval-mozilla-aurora?
Comment on attachment 8625560 [details] [diff] [review]
250015.diff

This landed prior to the uplift.
Attachment #8625560 - Flags: approval-mozilla-aurora?
Comment on attachment 8629032 [details] [diff] [review]
crash.patch

Approving for uplift to Aurora, has been on m-c with no known issues.
Attachment #8629032 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.