Closed Bug 1092220 Opened 10 years ago Closed 10 years ago

Sometimes preload.js attempts to load the cookie service when we can't send messages

Categories

(Core :: DOM: Content Processes, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

Attachments

(1 file)

Bug 1091766 changes the behavior in child processes when we drop a message from calling exit(0) to returning. This causes an abort in B2G ICS Emulator opt M5 in PNeckoChild::SendPCookieServiceConstructor here: bool __sendok = (mChannel)->Send(__msg); if ((!(__sendok))) { NS_RUNTIMEABORT("constructor for actor failed"); <-- failure here return nullptr; } Before I guess we'd fail to send, and just quietly exit, whereas now we return, and the failure triggers an abort. Looking further up the stack, what is happening is that we're running preload.js, which is loading the cookie service, which tries to send a message, apparently when we can't. Now, the odd thing to me is that this is only happening on a single test. The two failures for this are happening shortly after test_audio2.html completes: 667 INFO TEST-START | /tests/dom/media/test/test_audio2.html 668 INFO TEST-OK | /tests/dom/media/test/test_audio2.html | took 7445ms -*- NetworkService: NetworkService shutdown [Parent 724] WARNING: waitpid failed pid:1000 errno:10: file ../../../gecko/ipc/chromium/src/base/process_util_posix.cc, line 267 JavaScript error: resource://gre/modules/IndexedDBHelper.jsm, line 121: UnknownError: The operation failed for reasons unrelated to the database itself and not covered by any other error code. [Child 1014] ###!!! ABORT: constructor for actor failed: file PNeckoChild.cpp, line 267 I'm not sure if those messages after the TEST-OK are related or not. this is the try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=6beb4e8478f0
The problem here appears to be that we're somehow peeling a RecvAppInfo message off the queue after we're already in XPCOM shutdown. This patch checks if we're in shutdown, and just fails in RecvAppInfo if it is late, instead of calling PreloadSlowThings(). Maybe that will fix it.
Assignee: nobody → continuation
Well, that didn't change anything.
Kyle gave me some tips for more things to investigate. This is being caused by dom/ipc/tests/test_NuwaProcessDeadlock.html try run with that disabled is green: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=66fc55ac12f2 try run with it enabled is orange: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=23e9744322bd The sequence of events seems to be that we run that test, then shortly there after we have a SendAppInfo/RecvAppInfo pair, but it does not hit the PreloadSlowThings() code. Then 6 seconds later, we do another SendAppInfo in the parent, then two seconds later we start a RecvAppInfo in the child. Then another 37 or so seconds later we seem to start running dom/ipc/preload.js and get to the line Cu.import("resource://gre/modules/AppsServiceChild.jsm"); Then we don't see anything else from the child process for a while, then the parent process reaches xpcom-shutdown, then suddenly the child starts running preload.js again, until we try to get the cookie service, where we try to send a message to the parent that has shut down, and do an abort (prior to my patch, we'd just do an exit(0), is my theory). I'm guessing that all of the pref setting that NuwaProcessDeadlock somehow doesn't quite get undone correctly, so we end up creating a new child process that is weird somehow, so it ends up doing something that makes it blocked on the parent process, or run at a very low priority, or something. Then when the parent process goes away, whatever it was hung on is unblocked, and it runs, but it fails because it is so late.
The deadlock test does a setPref on dom.ipc.processPrelaunch.testMode twice, and the documentation for SpecialPowers.pushPrefEnv says that is undefined, so maybe it isn't getting cleared on exit. When that pref is true, we do a sleep(5): http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageLink.cpp#141 which somewhat matches up with the 6 seconds between the two SendAppInfo we see, so maybe it is related? I'm not sure what the point of that second setPref on processPrelauch.testMode is. Cervantes, do you have any ideas what might be happening here? It looks like you wrote this test. Thanks.
Flags: needinfo?(cyu)
(In reply to Andrew McCreight [:mccr8] from comment #4) > The deadlock test does a setPref on dom.ipc.processPrelaunch.testMode twice, > and the documentation for SpecialPowers.pushPrefEnv says that is undefined, > so maybe it isn't getting cleared on exit. When that pref is true, we do a > sleep(5): > http://mxr.mozilla.org/mozilla-central/source/ipc/glue/MessageLink.cpp#141 > which somewhat matches up with the 6 seconds between the two SendAppInfo we > see, so maybe it is related? I'm not sure what the point of that second > setPref on processPrelauch.testMode is. > > Cervantes, do you have any ideas what might be happening here? It looks > like you wrote this test. Thanks. I have no idea about how the failure came to be, but it sounds like we can only perform one setPref() in the test case and then just finish the test case, letting the system to revert the change. If that's the case then let's see if it helps by removing the setPref() call during shutting down: https://tbpl.mozilla.org/?tree=Try&rev=50842d8ee0f1 If it doesn't, I can take a further look.
Flags: needinfo?(cyu)
Thanks. Unfortunately, that doesn't seem to have helped.
I ran the test locally in the hope of debugging locally, but this seems to be reproducible only on the server. The test finished without the crash: 14383 INFO MEMORY STAT vsize after test: 150413312 14384 INFO MEMORY STAT residentFast after test: 51818496 14385 INFO MEMORY STAT heapAllocated after test: 22514896 14386 INFO TEST-OK | /tests/dom/media/test/test_VideoPlaybackQuality_disabled.html | took 545ms 14387 INFO TEST-START | Shutdown 14388 INFO Passed: 32769 14389 INFO Failed: 0 14390 INFO Todo: 446 14391 INFO Slowest: 22278ms - /tests/dom/ipc/tests/test_NuwaProcessDeadlock.html 14392 INFO SimpleTest FINISHED 14393 INFO TEST-INFO | Ran 1 Loops 14394 INFO SimpleTest FINISHED
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: