Closed Bug 1254730 Opened 7 years ago Closed 7 years ago

Intermittent test_websocket1.html,test_websocket2.html,test_websocket3.html,test_websocket4.html | Test timed out. || test 34 custom server code | test 34 custom server reason

Categories

(Core :: DOM: Workers, defect)

47 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: KWierso, Assigned: baku)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=23252125&repo=mozilla-inbound
Summary: Intermittent test_websocket2.html | Test timed out. → Intermittent test_websocket2.html,test_websocket4.html | Test timed out.
https://treeherder.mozilla.org/logviewer.html#?job_id=23279784&repo=mozilla-inbound
Summary: Intermittent test_websocket2.html,test_websocket4.html | Test timed out. → Intermittent test_websocket1.html,test_websocket2.html,test_websocket4.html | Test timed out.
https://treeherder.mozilla.org/logviewer.html#?job_id=23284391&repo=mozilla-inbound
Summary: Intermittent test_websocket1.html,test_websocket2.html,test_websocket4.html | Test timed out. → Intermittent test_websocket1.html,test_websocket2.html,test_websocket4.html | Test timed out. || test 34 custom server code | test 34 custom server reason
Flags: needinfo?(amarchesini)
Summary: Intermittent test_websocket1.html,test_websocket2.html,test_websocket4.html | Test timed out. || test 34 custom server code | test 34 custom server reason → Intermittent test_websocket1.html,test_websocket2.html,test_websocket3.html,test_websocket4.html | Test timed out. || test 34 custom server code | test 34 custom server reason
Attached patch mutex.patchSplinter Review
The reason why we have this intermittent failure is that WebSocketChannel uses ChannelEventQueue from different threads. With this patch, I made ChannelEventQueue thread-safe, basically adding a mutex, removing ShouldEnqueue+Enqueue and adding RunOrEnqueue().

RunOrEnqueue() uses a mutex to check if we can run the ChannelEvent immediately or if we have to store it until a pending operation is completed.

This patch fixes another 'potential' error: PrependEvents() adds new events in the queue, but what about if this is called by a callback scheduled by ForceFlush()? Without this patch we use an index to schedule callbacks in order, but if we change the array in the meantime, that index points to wrong callbacks.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8730005 - Flags: review?(jduell.mcbugs)
Attachment #8730005 - Flags: review?(jduell.mcbugs) → review?(michal.novotny)
Comment on attachment 8730005 [details] [diff] [review]
mutex.patch

Review of attachment 8730005 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, r+ when you keep somehow the assertions.

::: netwerk/ipc/ChannelEventQueue.h
@@ +206,3 @@
>  }
>  
> +// Ensures that RunOrEnqueue() will collecting events during its lifetime

will be collecting

::: netwerk/protocol/ftp/FTPChannelChild.cpp
@@ -382,5 @@
> -  if (mEventQ->ShouldEnqueue()) {
> -    mEventQ->Enqueue(
> -      new FTPDataAvailableEvent(this, channelStatus, data, offset, count));
> -  } else {
> -    MOZ_RELEASE_ASSERT(!mDivertingToParent,

We're losing this check in the new code (also at other places in this patch). Given that it is release assert I assume it's important and should be kept. Maybe RunOrEnqueue could have an argument assertWhenNotQueued or something like that.
Attachment #8730005 - Flags: review?(michal.novotny) → review+
Is this something we need on other branches too?
> +  // @param aCallback - the ChannelEvent
> +  // @param aAssertWhenNotQueued - this optional param will be used in an
> +  //   assertion when the event is executed directly.
> +  inline void RunOrEnqueue(ChannelEvent* aCallback,
> +                           bool aAssertWhenNotQueued = true);

Description of the argument and the logic is correct, but the from the name I would expect that if the argument is true then the assertion is raised if the event is not queued. Maybe aAssertionWhenNotQueued would be better name?

And all the places below should IMO use the assertion argument as well.

> -  MOZ_RELEASE_ASSERT(mEventQ->ShouldEnqueue());
> -
> -  mEventQ->Enqueue(new HttpFlushedForDiversionEvent(this));
> +
> +  mEventQ->RunOrEnqueue(new HttpFlushedForDiversionEvent(this));

 
> -  if (mEventQ->ShouldEnqueue()) {
> -    mEventQ->Enqueue(new FTPFlushedForDiversionEvent(this));
> -  } else {
> -    MOZ_CRASH();
> -  }
> +  mEventQ->RunOrEnqueue(new FTPFlushedForDiversionEvent(this));


> -  if (mEventQ->ShouldEnqueue()) {
> -    mEventQ->Enqueue(new StopRequestEvent(this, channelStatus, timing));
> -  } else {
> -    MOZ_ASSERT(!mDivertingToParent, "ShouldEnqueue when diverting to parent!");
> -
> -    OnStopRequest(channelStatus, timing);
> -  }
> +  mEventQ->RunOrEnqueue(new StopRequestEvent(this, channelStatus, timing));
https://hg.mozilla.org/mozilla-central/rev/db5ef9e43187
https://hg.mozilla.org/mozilla-central/rev/a8ae52b3b96c
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8730005 [details] [diff] [review]
mutex.patch

Approval Request Comment
[Feature/regressing bug #]: WebSocket in workers
[User impact if declined]: a race condition can block the scheduling of events.
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: The patch makes ChannelEventQueue a thread-safe object. It's a small patch but it uses mutex.
[String/UUID change made/needed]: none

This approval request is for both m-i commits.
Attachment #8730005 - Flags: approval-mozilla-aurora?
Wes, Carsten: Can you please confirm whether this fixes the intermittent or not? It's a pretty big code change and I want to be sure this helps the situation. Thanks!
Flags: needinfo?(wkocher)
Flags: needinfo?(cbook)
Possibly too soon to know for sure, but there's been less occurrences since this landed: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1254730
Flags: needinfo?(wkocher)
Flags: needinfo?(cbook)
Comment on attachment 8730005 [details] [diff] [review]
mutex.patch

This seems to be helping the intermittent failure, Aurora47+
Attachment #8730005 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.