Created attachment 779399 [details] [diff] [review] FixBogusUrgentSyncMessageBug.patch SyncChannel::ProcessUrgentMessages() asserts if mRecvd is set, and this is firing on e10s builds. The assert looks bogus to me, I think it's left over from something else. It's valid for something like this to happen: (1) Child receives urgent message while processing events, adds to mUrgent queue. (2) Child sends a synchronous message. (3) Parent sends a synchronous reply. (4) Child receives synchronous reply. (5) Child wakes up, processes mUrgent before mRecvd. This patch removes the assert, and saves mRecvd locally since urgent messages will probably end up with nesting support for CPOWs. Arguably this is a little weird. Maybe we should process urgent messages before blocking on the outgoing sync call. I'm not sure.
This is what will fix bug 893714?
(In reply to Benjamin Smedberg [:bsmedberg] from comment #1) > This is what will fix bug 893714? Yeah.
First off, not to be a pedant, but please write a test. It's pretty easy to reliably force this race condition. (In reply to David Anderson [:dvander] from comment #0) > Created attachment 779399 [details] [diff] [review] > FixBogusUrgentSyncMessageBug.patch > > SyncChannel::ProcessUrgentMessages() asserts if mRecvd is set, and this is > firing on e10s builds. The assert looks bogus to me, I think it's left over > from something else. It's valid for something like this to happen: > > (1) Child receives urgent message while processing events, adds to mUrgent > queue. > (2) Child sends a synchronous message. > (3) Parent sends a synchronous reply. I followed up to #2. The sequence #2-#3 looks suspect. Looking at the code, the problem is that SyncChannel::EventOccurred() doesn't include a check for a nonempty mPending queue. So the worker will happily go into a |WaitForNotify()| when it knows (could know) there's an urgent message in the queue. But would it still be possible for the sync reply to race incoming urgent messages if we fixed that? I think so: SyncChannel::Send() was previously a single critical section |Lock(); Wait(); EventOccurred();|, but urgent messages necessarily force breaking out of the critical section. So there's always a "gap" where a new urgent message can arrive, after the worker replies to the previous one. So we do still have a bug to fix. Another "problem" with that situation is that the child can be permanently starved by incoming urgent messages. Maybe it could be argued that's by design. At any rate, we definitely have to remove the assertion about the sync reply in ProcessUrgentMessages(). Beyond that, I'd like to advertise again for the SyncChannel/RPCChannel refactoring :). I only took a quick look at the patch, but there are some immediate omissions: comments, test. I'm not sure the patch will properly handle the case where multiple urgent messages arrive in sequence around a sync reply. (Or if it does, let me know.) It would be pretty much impossible to force that race condition in a test without invading the SyncChannel code, so we're going to have to rely on a ~correctness proof in the comments.
Comment on attachment 779399 [details] [diff] [review] FixBogusUrgentSyncMessageBug.patch I'm not ready to r+, but I'm not sure yet that this patch has bugs. Would like to reevaluate wrt comment 3.
This was obsoleted by the MessageChannel refactoring.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.