Closed
Bug 1092220
Opened 10 years ago
Closed 10 years ago
Sometimes preload.js attempts to load the cookie service when we can't send messages
Categories
(Core :: DOM: Content Processes, defect)
Core
DOM: Content Processes
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: mccr8, Assigned: mccr8)
References
Details
Attachments
(1 file)
Bug 1091766 changes the behavior in child processes when we drop a message from calling exit(0) to returning. This causes an abort in B2G ICS Emulator opt M5 in PNeckoChild::SendPCookieServiceConstructor here:
bool __sendok = (mChannel)->Send(__msg);
if ((!(__sendok))) {
NS_RUNTIMEABORT("constructor for actor failed"); <-- failure here
return nullptr;
}
Before I guess we'd fail to send, and just quietly exit, whereas now we return, and the failure triggers an abort.
Looking further up the stack, what is happening is that we're running preload.js, which is loading the cookie service, which tries to send a message, apparently when we can't.
Now, the odd thing to me is that this is only happening on a single test. The two failures for this are happening shortly after test_audio2.html completes:
667 INFO TEST-START | /tests/dom/media/test/test_audio2.html
668 INFO TEST-OK | /tests/dom/media/test/test_audio2.html | took 7445ms
-*- NetworkService: NetworkService shutdown
[Parent 724] WARNING: waitpid failed pid:1000 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 267
JavaScript error: resource://gre/modules/IndexedDBHelper.jsm, line 121: UnknownError: The operation failed for reasons unrelated to the database itself and not covered by any other error code.
[Child 1014] ###!!! ABORT: constructor for actor failed: file PNeckoChild.cpp, line 267
I'm not sure if those messages after the TEST-OK are related or not.
this is the try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=6beb4e8478f0
Assignee | ||
Comment 1•10 years ago
|
||
The problem here appears to be that we're somehow peeling a RecvAppInfo message off the queue after we're already in XPCOM shutdown. This patch checks if we're in shutdown, and just fails in RecvAppInfo if it is late, instead of calling PreloadSlowThings(). Maybe that will fix it.
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → continuation
Assignee | ||
Comment 2•10 years ago
|
||
Well, that didn't change anything.
Assignee | ||
Comment 3•10 years ago
|
||
Kyle gave me some tips for more things to investigate.
This is being caused by dom/ipc/tests/test_NuwaProcessDeadlock.html
try run with that disabled is green:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=66fc55ac12f2
try run with it enabled is orange:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=23e9744322bd
The sequence of events seems to be that we run that test, then shortly there after we have a SendAppInfo/RecvAppInfo pair, but it does not hit the PreloadSlowThings() code. Then 6 seconds later, we do another SendAppInfo in the parent, then two seconds later we start a RecvAppInfo in the child. Then another 37 or so seconds later we seem to start running dom/ipc/preload.js and get to the line
Cu.import("resource://gre/modules/AppsServiceChild.jsm");
Then we don't see anything else from the child process for a while, then the parent process reaches xpcom-shutdown, then suddenly the child starts running preload.js again, until we try to get the cookie service, where we try to send a message to the parent that has shut down, and do an abort (prior to my patch, we'd just do an exit(0), is my theory).
I'm guessing that all of the pref setting that NuwaProcessDeadlock somehow doesn't quite get undone correctly, so we end up creating a new child process that is weird somehow, so it ends up doing something that makes it blocked on the parent process, or run at a very low priority, or something. Then when the parent process goes away, whatever it was hung on is unblocked, and it runs, but it fails because it is so late.
Assignee | ||
Comment 4•10 years ago
|
||
The deadlock test does a setPref on dom.ipc.processPrelaunch.testMode twice, and the documentation for SpecialPowers.pushPrefEnv says that is undefined, so maybe it isn't getting cleared on exit. When that pref is true, we do a sleep(5):
http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageLink.cpp#141
which somewhat matches up with the 6 seconds between the two SendAppInfo we see, so maybe it is related? I'm not sure what the point of that second setPref on processPrelauch.testMode is.
Cervantes, do you have any ideas what might be happening here? It looks like you wrote this test. Thanks.
Flags: needinfo?(cyu)
Comment 5•10 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #4)
> The deadlock test does a setPref on dom.ipc.processPrelaunch.testMode twice,
> and the documentation for SpecialPowers.pushPrefEnv says that is undefined,
> so maybe it isn't getting cleared on exit. When that pref is true, we do a
> sleep(5):
> http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageLink.cpp#141
> which somewhat matches up with the 6 seconds between the two SendAppInfo we
> see, so maybe it is related? I'm not sure what the point of that second
> setPref on processPrelauch.testMode is.
>
> Cervantes, do you have any ideas what might be happening here? It looks
> like you wrote this test. Thanks.
I have no idea about how the failure came to be, but it sounds like we can only perform one setPref() in the test case and then just finish the test case, letting the system to revert the change. If that's the case then let's see if it helps by removing the setPref() call during shutting down: https://tbpl.mozilla.org/?tree=Try&rev=50842d8ee0f1
If it doesn't, I can take a further look.
Flags: needinfo?(cyu)
Assignee | ||
Comment 6•10 years ago
|
||
Thanks. Unfortunately, that doesn't seem to have helped.
Comment 7•10 years ago
|
||
I ran the test locally in the hope of debugging locally, but this seems to be reproducible only on the server. The test finished without the crash:
14383 INFO MEMORY STAT vsize after test: 150413312
14384 INFO MEMORY STAT residentFast after test: 51818496
14385 INFO MEMORY STAT heapAllocated after test: 22514896
14386 INFO TEST-OK | /tests/dom/media/test/test_VideoPlaybackQuality_disabled.html | took 545ms
14387 INFO TEST-START | Shutdown
14388 INFO Passed: 32769
14389 INFO Failed: 0
14390 INFO Todo: 446
14391 INFO Slowest: 22278ms - /tests/dom/ipc/tests/test_NuwaProcessDeadlock.html
14392 INFO SimpleTest FINISHED
14393 INFO TEST-INFO | Ran 1 Loops
14394 INFO SimpleTest FINISHED
Assignee | ||
Comment 8•10 years ago
|
||
This seems to have gone away, somehow:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=de37d0493f54
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•