Closed
Bug 1131375
Opened 9 years ago
Closed 9 years ago
[e10s] Sync messages cause async messages to be processed out of order when dispatched from the in content child process
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
People
(Reporter: mossop, Assigned: billm, NeedInfo)
Details
Attachments
(2 files, 6 obsolete files)
3.84 KB,
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
16.09 KB,
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
When a frame script is loaded in the main process (for a tab that isn't loaded remotely) sending a sync messages causes async messages to be processed out of order. STR: In a framescript for a frame loaded in the main process: 1. Send some async messages through the frame message manager. 2. Send some async messages through the process message manager. 3. Send a sync message through the process message manager. In the parent we would expect to see the messages arrive in the order sent. Instead sending a sync message through the process message manager flushes all async messages from the process message manager immediately (https://dxr.mozilla.org/mozilla-central/source/dom/base/nsFrameMessageManager.cpp#1834) so we see the messages in the order 2, 3, 1. The same is true if we swap the frame and process message managers (https://dxr.mozilla.org/mozilla-central/source/dom/base/nsInProcessTabChildGlobal.cpp#38).
Updated•9 years ago
|
Updated•9 years ago
|
Assignee: nobody → wmccloskey
Assignee | ||
Updated•9 years ago
|
Attachment #8561752 -
Flags: review+
Assignee | ||
Comment 1•9 years ago
|
||
Thanks Dave. Always a pleasure to fix these bugs with tests attached :-). This patch adds a singleton class that holds all the messages queued by the <browser> message manager as well as the process message manager. That way everything can be flushed at once.
Attachment #8563104 -
Flags: review?(bugs)
Assignee | ||
Comment 2•9 years ago
|
||
I also noticed a problem where frame scripts that are loaded after a message is sent can be run before the handler for the message. I think we should fix this too. I just made frame script loading always go through the event loop like other messages.
Attachment #8563107 -
Flags: review?(bugs)
Assignee | ||
Comment 3•9 years ago
|
||
Here's a test for the frame script ordering issue.
Attachment #8563108 -
Flags: review?(dtownsend)
Assignee | ||
Comment 4•9 years ago
|
||
Found a bug on try.
Attachment #8563107 -
Attachment is obsolete: true
Attachment #8563107 -
Flags: review?(bugs)
Attachment #8563169 -
Flags: review?(mconley)
Comment 5•9 years ago
|
||
Hey billm - curious to know if you actually meant to retarget this review request to me. nsInProcessTabChildGlobal.cpp is something I'm only tangentially familiar with (at best) - smaug was probably the better choice. Or was there something particular in there you wanted me to gnaw on?
Flags: needinfo?(wmccloskey)
Reporter | ||
Updated•9 years ago
|
Attachment #8563108 -
Flags: review?(dtownsend) → review+
Assignee | ||
Comment 6•9 years ago
|
||
Comment on attachment 8563169 [details] [diff] [review] script-load-order v2 Sorry, yeah, this was meant for smaug.
Flags: needinfo?(wmccloskey)
Attachment #8563169 -
Flags: review?(mconley) → review?(bugs)
Assignee | ||
Updated•9 years ago
|
Attachment #8563104 -
Flags: review?(bugs)
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8563169 [details] [diff] [review] script-load-order v2 Let's wait on these until I have a green try run. This code is a little trickier than I thought.
Attachment #8563169 -
Flags: review?(bugs)
Assignee | ||
Comment 8•9 years ago
|
||
This one is green after fixing a leak. I'll have to debug the frame script loading patch some more.
Attachment #8563104 -
Attachment is obsolete: true
Attachment #8564455 -
Flags: review?(bugs)
Comment 9•9 years ago
|
||
Comment on attachment 8564455 [details] [diff] [review] message-order v2 >+SameProcessMessageQueue::~SameProcessMessageQueue() >+{ >+ MOZ_ASSERT(mQueue.IsEmpty()); It is not clear to me why mQueue couldn't be non-empty here. >+SameProcessMessageQueue::Push(Runnable* aRunnable) >+{ >+ mQueue.AppendElement(aRunnable); >+ NS_DispatchToCurrentThread(aRunnable); >+} >+ >+void >+SameProcessMessageQueue::Flush() >+{ >+ nsTArray<nsRefPtr<Runnable>> queue; >+ mQueue.SwapElements(queue); >+ for (size_t i = 0; i < queue.Length(); i++) { >+ nsRefPtr<Runnable> runnable = queue[i]; Why you need nsRefPtr here? nsTArray should guarantee the object stays alive >+ class Runnable : public nsIRunnable { nit, { goes to its own line. >+ public: >+ explicit Runnable(nsIRunnable* aRunnable); >+ >+ NS_DECL_ISUPPORTS >+ NS_DECL_NSIRUNNABLE >+ >+ private: >+ virtual ~Runnable() {} >+ >+ bool mDispatched; >+ nsCOMPtr<nsIRunnable> mRunnable; >+ }; So this setup is a bit odd. Wrapping other runnables into this one. Why not just make the relevant runnables classes to inherit this new Runnable class instead of the current nsRunnable. Less objects, less malloc'ing and free'ing. You may want to change Run() in those other classes to HandleMessage or some such, and check mDispatched before calling that method.
Attachment #8564455 -
Flags: review?(bugs) → review-
Assignee | ||
Comment 10•9 years ago
|
||
Attachment #8564455 -
Attachment is obsolete: true
Attachment #8565190 -
Flags: review?(bugs)
Comment 11•9 years ago
|
||
Comment on attachment 8565190 [details] [diff] [review] message-order v3 (at some point we may want to implement the queue as linked list)
Attachment #8565190 -
Flags: review?(bugs) → review+
Assignee | ||
Comment 12•9 years ago
|
||
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/820a0cabcb92 remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/b7d9158b53e7 I'll move the other patch to its own bug once it's ready.
Comment 13•9 years ago
|
||
Backed out for Windows bustage. https://hg.mozilla.org/integration/mozilla-inbound/rev/5ae55992592e https://treeherder.mozilla.org/logviewer.html#?job_id=6729046&repo=mozilla-inbound
Comment 14•9 years ago
|
||
And Gaia python integration test failures (every suite was hitting failures like the one below). https://treeherder.mozilla.org/logviewer.html#?job_id=6734899&repo=mozilla-inbound
Assignee | ||
Comment 15•9 years ago
|
||
Fixes the windows build failure.
Attachment #8565190 -
Attachment is obsolete: true
Attachment #8569545 -
Flags: review+
Assignee | ||
Updated•9 years ago
|
Attachment #8563169 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8563108 -
Attachment is obsolete: true
Assignee | ||
Comment 16•9 years ago
|
||
Fabrice, could you maybe help out here? These patches pass all tests except for the Gaia UI tests. All they do is to ensure that messages sent using process message managers are ordered correctly with respect to messages sent using frame message managers in single-process mode. Here's a try run that shows the failures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4993f1df9833 I followed all the instructions for building a b2g Linux desktop build and running the UI tests. I just can't get the problem to reproduce. And yet it seems to happen all the time on tinderbox. I have no idea how to debug this otherwise since the tests don't seem to generate any logs. I would very much appreciate any help. I've spent hours on this and I'm on the verge of abandoning this bug.
Flags: needinfo?(fabrice)
Comment 17•9 years ago
|
||
I can't reproduce locally either :( James & Gareth, who could help debug that?
Flags: needinfo?(jlal)
Flags: needinfo?(gaye)
Flags: needinfo?(fabrice)
Comment 18•9 years ago
|
||
Perhaps Geo or Davehunt?
Flags: needinfo?(jlal)
Flags: needinfo?(gmealer)
Flags: needinfo?(gaye)
Flags: needinfo?(dave.hunt)
Comment 19•9 years ago
|
||
There are logs. If you view the jobs on Treeherder the panel in the bottom left has some links: Job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b7d9158b53e7 HTML report: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/f618e832cc7120a80abbbc234422ba82adeab08503b6e37dcea13afefdbf01b149380a5e15c08638ff50799e0346163dd1e68b487716f39840370cbb3272097f Gecko log: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/ec6bbb5827335359b1df434f5fbd99cc62217a681056578c06a7861c72d1f6d5b6039dce6a925567f08b03e2abd7c8e76a6fa4d6c589d9d7bb179eacebdac41d Raw (JSON) log: http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/01c3e3e482bf7ecfb77b5fff54a57910c846f57fe3197048a26802428ed6fdacedc35d11c8bd19a244c679930b99bcdd380cc9ce2e0dd9117c3d702c3001a8ee It looks like we have issues launching certain apps like Gallery and Videos.
Flags: needinfo?(dave.hunt)
Comment 20•9 years ago
|
||
I've noticed similar crashes in today's test results, so it may not be related to this patch. See bug 1137653.
Assignee | ||
Comment 21•9 years ago
|
||
Any update here Dave? I'm having trouble figuring out what's going on in bug 1137653 and how it relates to this one.
Flags: needinfo?(dave.hunt)
Comment 22•9 years ago
|
||
I think the frequency of the crashes has reduced. Does try still reproduce this?
Flags: needinfo?(dave.hunt)
Assignee | ||
Comment 23•9 years ago
|
||
Looks like it still reproduces. https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7192f4e582e
Assignee | ||
Comment 24•9 years ago
|
||
Dave, could you please take another look at this? I looked through the different logs and nothing pops out. These tests fail consistently on try but so far no one has been able to reproduce the problem.
Flags: needinfo?(dave.hunt)
Comment 25•9 years ago
|
||
Looking at the latest try results, we appear to be timing out whilst executing an async script to launch an app. Here we're waiting for the 'appopen' event to fire, see https://github.com/mozilla-b2g/gaia/blob/864e60fd182528de9e3eba1d8153667b2caeba6f/tests/atoms/gaia_apps.js#L284 is it possible that with this patch the event is not being fired? Have you tried reproducing the failure with the same environment as buildbot?
Flags: needinfo?(dave.hunt)
Assignee | ||
Comment 26•9 years ago
|
||
This seems to have started passing: https://treeherder.mozilla.org/#/jobs?repo=try&revision=276bee5a43b9 https://hg.mozilla.org/integration/mozilla-inbound/rev/c2f12ead270c
Comment 27•9 years ago
|
||
Backed out for Windows bustage. https://hg.mozilla.org/integration/mozilla-inbound/rev/f2cf50a14bae https://treeherder.mozilla.org/logviewer.html#?job_id=8817988&repo=mozilla-inbound
Assignee | ||
Comment 28•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d90378f90513
Comment 29•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/d90378f90513
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment 30•9 years ago
|
||
I'm still seeing async messages processed out of order in nightly. I don't know if it is being caused by sync messages or something else. I'm listening to the messages in a javascript component and nowhere else. I'm sending several async messages from a stream listener in a frame script, passing the stream listener callbacks up to the component. Obviously the opnStartRequest/OnDataAvaialable/OnStopRequest happen in the correct order in content, although very rapidly, but the corresponding async messages are received in random order in the component.
You need to log in
before you can comment on or make changes to this bug.
Description
•