Closed
Bug 591708
Opened 14 years ago
Closed 14 years ago
e10s HTTP: nytimes.com causes use-after-delete of IPDL actor?
Categories
(Core :: Networking: HTTP, defect)
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
fennec | 2.0+ | --- |
People
(Reporter: jduell.mcbugs, Assigned: jdm)
References
Details
Attachments
(1 file, 3 obsolete files)
5.55 KB,
patch
|
mayhemer
:
review+
|
Details | Diff | Splinter Review |
cjones reports he saw
- WARNING: NS_ENSURE_TRUE(mSuspendCount > 0) failed: file
/netwerk/base/src/nsInputStreamPump.cpp, line 241
WARNING: NS_ENSURE_TRUE(aOriginalURI) failed: file
/netwerk/protocol/http/HttpBaseChannel.cpp, line 246
###!!! [Parent][AsyncChannel] Error: Route error: message sent to unknown
actor ID
Haven't had time to verify
Assignee | ||
Comment 1•14 years ago
|
||
I saw it too. I had to load nytimes.com, go back to the homepage and then load it again.
Assignee | ||
Comment 2•14 years ago
|
||
(gdb) p msg
$2 = (const TupleTraits<mozilla::ipc::AsyncChannel::Message*>::Message &) @0xbfe84410: {
<Pickle> = {
static kPayloadUnit = 64,
header_ = 0x86dda88,
header_size_ = 28,
capacity_ = 64,
variable_buffer_offset_ = 0
},
members of IPC::Message:
_vptr.Message = 0x313c5c8,
file_descriptor_set_ = {
ptr_ = 0x0
},
name_ = 0x2fa0fc1 "???",
received_time_ = 0,
output_params_ = Traceback (most recent call last):
File "/usr/lib/../share/gdb/python/libstdcxx/v6/printers.py", line 557, in to_string
return self.val['_M_dataplus']['_M_p'].string (encoding, length = len)
LookupError: unknown encoding: UCS-4
,
log_data_ = 0x0,
dont_log_ = false
}
Assignee | ||
Comment 3•14 years ago
|
||
(gdb) p *msg.header()
$4 = {
<Pickle::Header> = {
payload_size = 4
},
members of IPC::Message::Header:
routing = -491,
type = 12289,
flags = 2,
num_fds = 0,
rpc_remote_stack_depth_guess = 4294967295,
rpc_local_stack_depth = 4294967295,
seqno = 0
}
Assignee | ||
Comment 4•14 years ago
|
||
According to the script at http://pastebin.mozilla.org/781741, this is a SetPriority call gone wrong.
Assignee | ||
Comment 5•14 years ago
|
||
The plot thickens. I tracked this down to an HttpChannelChild actor with a positive id, which should be impossible. It's not overflow, as other channels are only hanging around -200. The channel's not cached, it's loading a 1x1 gif and calling SetPriority(-1). What's also strange is that a subsequent call of SetPriority(0) doesn't trigger the same error.
Reporter | ||
Comment 6•14 years ago
|
||
We've got a known race with SetPriority and redirects (bug 587171). We don't remove the old channel from the loadgroup until the child receives Redirect3Complete(), and we call Send__delete in the parent right after we sent that. That might be the window where the SetPriority arrives? Perhaps we need to move the _delete to the child's RecvRedirect3Complete.
(Actually I'm puzzled as to why we delete the protocol here in the first place: the old channel gets canceled, and I would have thought we could just deleted it the usual way, after OnStopRequest.)
None of this explains the child actor with the positive ID, though, unless it's released memory that's been scribbled over.
Reporter | ||
Updated•14 years ago
|
tracking-fennec: --- → ?
Assignee | ||
Updated•14 years ago
|
Assignee: nobody → josh
Assignee | ||
Comment 7•14 years ago
|
||
I suspect it is the race mentioned in comment 6. The NSPR log for the relevant 1x1 image that causes the error shows this:
-1231029392[9a02c08]: http response [
-1231029392[9a02c08]: HTTP/1.1 302 Moved Temporarily
-1231029392[9a02c08]: Content-Length: 0
-1231029392[9a02c08]: Date: Tue, 14 Sep 2010 21:33:32 GMT
-1231029392[9a02c08]: Location: http://s0.2mdn.net/viewad/666472/130-1x1.gif
-1231029392[9a02c08]: Cache-Control: no-cache
-1231029392[9a02c08]: Pragma: no-cache
-1231029392[9a02c08]: Server: GFE/2.0
-1231029392[9a02c08]: Content-Type: text/html
-1231029392[9a02c08]: ]
Assignee | ||
Comment 8•14 years ago
|
||
Ok, the positive ID is a red herring. I mistook it to mean that actors each side share positive/negative versions of the same ID, but really it means that actors share the same ID, period. If an actor is created in the parent, both will have a positive ID, while if it's created in the child they'll both have a negative one.
Assignee | ||
Comment 9•14 years ago
|
||
FWIW, the actor complaining here is the newly-constructed channel after the redirect, not the redirected one. Still not sure what the underlying cause is, but I guess there's mysteriously no longer a corresponding parent channel actor.
Assignee | ||
Comment 10•14 years ago
|
||
Note to self - this is the stack that leads to the error-causing SetPriority call:
#0 RescheduleRequest (aRequest=<value optimized out>, delta=-1) at /home/t_mattjo/src/firefox/mobilebase/netwerk/base/src/nsLoadGroup.cpp:122
#1 0x00fa5e7c in nsLoadGroup::AddRequest (this=0x9aadd18, request=0x9e35e18, ctxt=0x0) at /home/t_mattjo/src/firefox/mobilebase/netwerk/base/src/nsLoadGroup.cpp:578
#2 0x01092fb9 in mozilla::net::HttpChannelChild::CompleteRedirectSetup (this=0x9e35dd0, listener=0x9aae228, aContext=0x0) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpChannelChild.cpp:732
#3 0x010930bc in mozilla::net::HttpChannelChild::Redirect3Complete (this=0x9aae968) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpChannelChild.cpp:708
#4 0x01093149 in mozilla::net::HttpChannelChild::RecvRedirect3Complete (this=0x9aae968) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpChannelChild.cpp:692
#5 0x0259cb23 in mozilla::net::PHttpChannelChild::OnMessageReceived (this=0x9aae968, __msg=...) at PHttpChannelChild.cpp:579
#6 0x02527c30 in mozilla::dom::PContentChild::OnMessageReceived (this=0x9577d60, __msg=...) at PContentChild.cpp:840
#7 0x02427635 in mozilla::ipc::AsyncChannel::OnDispatchMessage (this=0x9577d68, msg=...) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/AsyncChannel.cpp:262
#8 0x024300f8 in mozilla::ipc::RPCChannel::OnMaybeDequeueOne (this=0x9577d68) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/RPCChannel.cpp:438
#9 0x02431686 in DispatchToMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)()> (this=0x9578c50) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/tuple.h:383
#10 RunnableMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run (this=0x9578c50) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/task.h:307
#11 0x024316d9 in Run (this=0xb6c0d208) at ../../dist/include/mozilla/ipc/RPCChannel.h:448
#12 mozilla::ipc::RPCChannel::DequeueTask::Run (this=0xb6c0d208) at ../../dist/include/mozilla/ipc/RPCChannel.h:473
#13 0x0267d76c in MessageLoop::RunTask (this=0xbfea2c20, task=0xb6c0d208) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:343
#14 0x0267de58 in MessageLoop::DeferOrRunPendingTask (this=0xbfea2c20, pending_task=...) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:351
#15 0x0267ef07 in MessageLoop::DoWork (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:451
#16 0x0242ba5c in mozilla::ipc::MessagePump::Run (this=0x9576ee0, aDelegate=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/MessagePump.cpp:114
#17 0x0242bd6d in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x9576ee0, aDelegate=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/MessagePump.cpp:229
#18 0x0267dc85 in MessageLoop::RunInternal (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:219
#19 0x0267dca1 in MessageLoop::RunHandler (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:202
#20 0x0267dd0c in MessageLoop::Run (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:176
#21 0x022bf2d7 in nsBaseAppShell::Run (this=0x97ad088) at /home/t_mattjo/src/firefox/mobilebase/widget/src/xpwidgets/nsBaseAppShell.cpp:180
#22 0x00f79c9b in XRE_RunAppShell () at /home/t_mattjo/src/firefox/mobilebase/toolkit/xre/nsEmbedFunctions.cpp:629
#23 0x0242bcc6 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x9576ee0, aDelegate=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/MessagePump.cpp:215
#24 0x0267dc85 in MessageLoop::RunInternal (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:219
#25 0x0267dca1 in MessageLoop::RunHandler (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:202
#26 0x0267dd0c in MessageLoop::Run (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:176
#27 0x00f7990a in XRE_InitChildProcess (aArgc=1, aArgv=0xbfea2e44, aProcess=GeckoProcessType_Content) at /home/t_mattjo/src/firefox/mobilebase/toolkit/xre/nsEmbedFunctions.cpp:506
#28 0x08048651 in main (argc=2, argv=0xbfea2e44) at /home/t_mattjo/src/firefox/mobilebase/ipc/app/MozillaRuntimeMain.cpp:67
Assignee | ||
Comment 11•14 years ago
|
||
I do believe I was wrong, and we're seeing the redirect/SetPriority race on the old channel, not the new one.
Assignee | ||
Comment 12•14 years ago
|
||
Note to self - I believe this is the stack causing the error for realz this time:
#0 mozilla::net::HttpChannelChild::SetPriority (this=0xa4d51b8, aPriority=10) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpChannelChild.cpp:1044
#1 0x01069671 in mozilla::net::HttpBaseChannel::AdjustPriority (this=0xa4d51d0, delta=1) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpBaseChannel.cpp:967
#2 0x00fa4e4c in RescheduleRequest (aRequest=<value optimized out>, delta=1) at /home/t_mattjo/src/firefox/mobilebase/netwerk/base/src/nsLoadGroup.cpp:122
#3 0x00fa733b in nsLoadGroup::RemoveRequest (this=0xa80a620, request=0xa4d5200, ctxt=0x0, aStatus=2152398850) at /home/t_mattjo/src/firefox/mobilebase/netwerk/base/src/nsLoadGroup.cpp:664
#4 0x01093091 in mozilla::net::HttpChannelChild::Redirect3Complete (this=0xa4d51b8) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpChannelChild.cpp:704
#5 0x01093149 in mozilla::net::HttpChannelChild::RecvRedirect3Complete (this=0xa4d51b8) at /home/t_mattjo/src/firefox/mobilebase/netwerk/protocol/http/HttpChannelChild.cpp:692
#6 0x0259cb23 in mozilla::net::PHttpChannelChild::OnMessageReceived (this=0xa4d51b8, __msg=...) at PHttpChannelChild.cpp:579
#7 0x02527c30 in mozilla::dom::PContentChild::OnMessageReceived (this=0x9577d60, __msg=...) at PContentChild.cpp:840
#8 0x02427635 in mozilla::ipc::AsyncChannel::OnDispatchMessage (this=0x9577d68, msg=...) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/AsyncChannel.cpp:262
#9 0x024300f8 in mozilla::ipc::RPCChannel::OnMaybeDequeueOne (this=0x9577d68) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/RPCChannel.cpp:438
#10 0x02431686 in DispatchToMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)()> (this=0x9578c50) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/tuple.h:383
#11 RunnableMethod<mozilla::ipc::RPCChannel, bool (mozilla::ipc::RPCChannel::*)(), Tuple0>::Run (this=0x9578c50) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/task.h:307
#12 0x024316d9 in Run (this=0xb6c352d8) at ../../dist/include/mozilla/ipc/RPCChannel.h:448
#13 mozilla::ipc::RPCChannel::DequeueTask::Run (this=0xb6c352d8) at ../../dist/include/mozilla/ipc/RPCChannel.h:473
#14 0x0267d76c in MessageLoop::RunTask (this=0xbfea2c20, task=0xb6c352d8) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:343
#15 0x0267de58 in MessageLoop::DeferOrRunPendingTask (this=0xbfea2c20, pending_task=...) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:351
#16 0x0267ef07 in MessageLoop::DoWork (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:451
#17 0x0242ba5c in mozilla::ipc::MessagePump::Run (this=0x9576ee0, aDelegate=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/MessagePump.cpp:114
#18 0x0242bd6d in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x9576ee0, aDelegate=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/MessagePump.cpp:229
#19 0x0267dc85 in MessageLoop::RunInternal (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:219
#20 0x0267dca1 in MessageLoop::RunHandler (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:202
#21 0x0267dd0c in MessageLoop::Run (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:176
#22 0x022bf2d7 in nsBaseAppShell::Run (this=0x97ad088) at /home/t_mattjo/src/firefox/mobilebase/widget/src/xpwidgets/nsBaseAppShell.cpp:180
#23 0x00f79c9b in XRE_RunAppShell () at /home/t_mattjo/src/firefox/mobilebase/toolkit/xre/nsEmbedFunctions.cpp:629
#24 0x0242bcc6 in mozilla::ipc::MessagePumpForChildProcess::Run (this=0x9576ee0, aDelegate=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/glue/MessagePump.cpp:215
#25 0x0267dc85 in MessageLoop::RunInternal (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:219
#26 0x0267dca1 in MessageLoop::RunHandler (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:202
#27 0x0267dd0c in MessageLoop::Run (this=0xbfea2c20) at /home/t_mattjo/src/firefox/mobilebase/ipc/chromium/src/base/message_loop.cc:176
#28 0x00f7990a in XRE_InitChildProcess (aArgc=1, aArgv=0xbfea2e44, aProcess=GeckoProcessType_Content) at /home/t_mattjo/src/firefox/mobilebase/toolkit/xre/nsEmbedFunctions.cpp:506
#29 0x08048651 in main (argc=2, argv=0xbfea2e44) at /home/t_mattjo/src/firefox/mobilebase/ipc/app/MozillaRuntimeMain.cpp:67
Reporter | ||
Comment 13•14 years ago
|
||
I suspect we may have to move the __Delete_ of the old channel to some place where we're guaranteed that all child actors (including the load group) have been notified about the redirect. That may mean moving the _delete_ to the child (though TBH I'm not sure why we have to delete in the redirect code, as opposed to just letting the old channel go through OnStop, etc. normally).
Naturally, it'd also be really nice if we could forward any late-arriving SetPriority to the new channel.
Assignee | ||
Comment 14•14 years ago
|
||
The fundamental issue here is that the __delete__ message comes out of the blue. When the child deletes, we are in a known state in which the parent will not be sending any further messages. That is not the case in the redirection code in the parent, so we could either avoid any __delete__ from parent->child, move the __delete__ to another point when the child is guaranteed not to send any further messages, or we could switch to an acknowledged __delete__ strategy when deleting from the parent (ie. parent sends a DeleteSelf message, child responds with __delete__).
I tried removing the channel deletion in the case of successful redirection, and I can no longer reproduce this bug. I'll attach a patch, but I still haven't reasoned through whether the deletion in the case of unsuccessful redirection is race-free.
Assignee | ||
Comment 15•14 years ago
|
||
Patch to remove channel deletion on redirection success which makes the problem unreproducible.
Assignee | ||
Updated•14 years ago
|
Attachment #475595 -
Attachment description: Initiate all PHttpChannel protocol deletion from the child to avoid races. → Remove channel deletion on redirection success.
Comment 16•14 years ago
|
||
(In reply to comment #0)
> cjones reports he saw
>
> - WARNING: NS_ENSURE_TRUE(mSuspendCount > 0) failed: file
> /netwerk/base/src/nsInputStreamPump.cpp, line 241
This one particular warning is related to bug 513008. See bug 594882 for explanation.
Comment 17•14 years ago
|
||
(In reply to comment #6)
> (Actually I'm puzzled as to why we delete the protocol here in the first place:
> the old channel gets canceled, and I would have thought we could just deleted
> it the usual way, after OnStopRequest.)
You mean delete from HttpChannelChild::OnStopRequest? We cannot do that, because we won't get OnStopRequest call for the old child channel. That's how redirects work: you do AsyncOpen on a channel, it succeeds, so you expect OnStartRequest/OnStopRequest. You will get it, but only from a new channel after the redirect is made (if it is not another redirect). This switch is transparent to the channel consumer.
Assignee | ||
Comment 18•14 years ago
|
||
If that's the case, we should probably switch to using a request/delete system from the parent. I'll throw together a patch.
Comment 19•14 years ago
|
||
Comment on attachment 475595 [details] [diff] [review]
Remove channel deletion on redirection success.
This is a good start. But we will leak probably one of the channel (the old or the new one depending on the redirect result).
The correct discovery here is that the channel must not be deleted from the parent side.
We have either to move the deletion to the child: in case of a successful redirect, do it in HttpChannelChild::Redirect3Complete (on *this) after we get success result from mRedirectChannelChild->CompleteRedirectSetup ; in case of a failed redirect, we can delete the channel from the parent or we have to introduce Redirect3Failed event that will delete the new channel (the redirect target channel) on the child side.
Or as a general solution also for future, do what you suggests in comment 14 - DeleteSelf message.
Assignee | ||
Updated•14 years ago
|
Attachment #475595 -
Attachment is obsolete: true
Assignee | ||
Comment 20•14 years ago
|
||
Assignee | ||
Updated•14 years ago
|
Attachment #476266 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Updated•14 years ago
|
Attachment #476266 -
Attachment is obsolete: true
Attachment #476266 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 21•14 years ago
|
||
Bitrotted after bug 598076.
Assignee | ||
Updated•14 years ago
|
Attachment #478854 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Updated•14 years ago
|
Attachment #478854 -
Flags: review?(jduell.mcbugs)
Comment 23•14 years ago
|
||
I can steal it.
Assignee | ||
Updated•14 years ago
|
Attachment #478854 -
Attachment is obsolete: true
Attachment #478854 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Comment 24•14 years ago
|
||
Bitrot.
Assignee | ||
Updated•14 years ago
|
Attachment #481289 -
Flags: review?(jduell.mcbugs)
Assignee | ||
Updated•14 years ago
|
Attachment #481289 -
Flags: review?(jduell.mcbugs) → review?(honzab.moz)
Comment 25•14 years ago
|
||
Comment on attachment 481289 [details] [diff] [review]
Initiate all PHttpChannel protocol deletion from the child to avoid races.
r=honzab if the netwerk xpc-shell tests are passing with this patch, w/o leaks.
Attachment #481289 -
Flags: review?(honzab.moz) → review+
Assignee | ||
Comment 26•14 years ago
|
||
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•