Closed
Bug 968604
Opened 9 years ago
Closed 9 years ago
[Tarako] The browser process needs to be forked from Nuwa
Categories
(Core :: IPC, defect)
Tracking
()
Tracking | Status | |
---|---|---|
b2g-v1.3T | --- | fixed |
People
(Reporter: llin, Assigned: cyu)
References
Details
Attachments
(5 files, 1 obsolete file)
65.60 KB,
image/png
|
Details | |
1.19 KB,
patch
|
khuey
:
review+
|
Details | Diff | Splinter Review |
6.31 KB,
patch
|
cyu
:
review+
|
Details | Diff | Splinter Review |
1.21 KB,
patch
|
fabrice
:
review+
|
Details | Diff | Splinter Review |
5.90 KB,
patch
|
cyu
:
review+
|
Details | Diff | Splinter Review |
After smoking test of Tarako, it was found at final memory usage check. Reproduce steps are still wanted. The screenshot of b2g-info is attached.
Comment 1•9 years ago
|
||
Browser processes are the ones used by browser tabs. So it can be normal to have more than one. The potential issue is that they are forked from the main b2g process, and not from a preallocated one.
Comment 2•9 years ago
|
||
Was the Nuwa process killed and re-born? Lawrence, could you provide logcat or information of creation/killing of processes?
Comment 3•9 years ago
|
||
Thinker, I just did a quick check: Before opening a page in the browser: fabrice@fabrice-x240:~/dev/tarako$ b2g-info | megabytes | NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER b2g 86 1 28.4 0 40.9 44.8 52.5 147.0 0 root (Nuwa) 321 86 0.9 0 0.8 3.4 9.5 49.5 0 root Homescreen 338 321 5.5 1 10.2 14.6 23.6 64.7 2 app_338 (Preallocated a 390 321 0.9 18 4.5 7.6 15.0 57.5 10 root And after opening a page: fabrice@fabrice-x240:~/dev/tarako$ b2g-info | megabytes | NAME PID PPID CPU(s) NICE USS PSS RSS VSIZE OOM_ADJ USER b2g 86 1 38.7 0 33.9 37.0 42.5 150.0 0 root (Nuwa) 321 86 1.1 0 0.3 1.1 3.4 49.5 0 root Homescreen 338 321 5.9 18 6.8 8.6 12.8 63.7 8 app_338 Browser 438 86 4.8 1 13.2 16.2 21.8 71.2 2 app_438 So we don't reuse the preallocated process, but just fork from b2g. The USS of the browser app is then pretty high (in this test I could not even get a resource because I have no network access...)
Reporter | ||
Comment 4•9 years ago
|
||
In the smoke test of Tarako, there is no browser tab related cases. It means, during the test, no additional tab will be created. The issue is that I cannot see or kill the additional broswer process via the application manager, and what the browser process being killed is the latest created. Even the application manager shows "No recent apps", the browser process (in the case, PID is 1024) is still there. Have we been got any reported zombie processes?
Reporter | ||
Comment 5•9 years ago
|
||
Possible reproduce steps: 1. Load reference workload (light) on the target phone 2. After reboot, launch Music and play a song 3. Launch Contacts 4. Launch Messages 5. Launch Gallery 6. Call the target phone from another phone 7. The dialer(Communications) is not launched and the call is redirected to the voice mail directly 8. Then, the instance of zombie browser is on the process list
Assignee | ||
Comment 6•9 years ago
|
||
I think there is a bug in creating the browser process so it forks from b2g. I will take a look.
Assignee: nobody → cyu
Assignee | ||
Updated•9 years ago
|
Summary: [Tarako] There might be two existing instances of Browser → [Tarako] The browser process needs to be forked from Nuwa
Assignee | ||
Comment 7•9 years ago
|
||
* Perform ContentChild::InitProcessAttributes() when PBrowser is first allocated in the child process. This is to reset child-side attributes for browser. * Add ContentParent::TransformPreallocatedIntoBrowser() to reset ContentParent's member fields and set OS privileges of the child process.
Attachment #8375423 -
Flags: review?(khuey)
Assignee | ||
Comment 8•9 years ago
|
||
Tests on try: https://tbpl.mozilla.org/?tree=Try&rev=8fb61c9e31a4
Comment on attachment 8375423 [details] [diff] [review] For the browser process from Nuwa Review of attachment 8375423 [details] [diff] [review]: ----------------------------------------------------------------- very nice. r=me
Attachment #8375423 -
Flags: review?(khuey) → review+
This patch causes dom/browser-element/mochitest/priority/test_Simple.html to fail though ... are we not setting the priority of the browser process?
Assignee | ||
Comment 11•9 years ago
|
||
We are. ProcessPriorityManager::SetProcessPriority(this, PROCESS_PRIORITY_FOREGROUND); should do this. I will take a look why it fails.
Assignee | ||
Updated•9 years ago
|
blocking-b2g: --- → 1.3T?
Updated•9 years ago
|
blocking-b2g: 1.3T? → 1.3T+
Assignee | ||
Comment 12•9 years ago
|
||
The test case passes when run alone, but fails when run in in the suite. This seems to be relevant to the test case failure. 1:03.11 [Parent 15327] WARNING: Ignoring duplicate observer.: file /home/cervantes/hg/mozilla-central/modules/libpref/src/nsPrefBranch.cpp, line 622 1:03.12 [Parent 15327] WARNING: Adding an observer twice!: '!mEventObservers.Contains(observer)', file /home/cervantes/hg/mozilla-central/xpcom/threads/nsThread.cpp, line 752 Still checking why there is a duplicate observer.
Assignee | ||
Comment 13•9 years ago
|
||
The failure is a false alarm not indicating to a bug in this fix. The problem is that (very unfortunately) test_Preallocated.html runs before test_Simple.html. The preallocated process created in test_Preallocated.html is used by test_Simple.html, then "process-priority-manager:TEST-ONLY:process-created" will not be fired because it already was in test_Preallocated.html. We need to shutdown the preallocated process in test_Preallocated.html so it's not affecting the test case running after it.
Assignee | ||
Comment 14•9 years ago
|
||
This fixes the test case that the preallocated process is not actually shut down in the cleanup() function: - PreallocatedProcessManagerImpl::AllocateNow() creates the ContentParent instance. - ContentParent notifies that the process is created - The test case calls cleanup() to flip the preference back to false and then PreallocatedProcessManagerImpl::Disable() is called. - PreallocatedProcessManagerImpl::Disable() won't call mPreallocatedAppProcess->Close() because the assignment in PreallocatedProcessManagerImpl::AllocateNow() hasn't been done. This fixes the test case by making cleanup() async.
Attachment #8381368 -
Flags: review?(khuey)
Assignee | ||
Comment 15•9 years ago
|
||
Tests: https://tbpl.mozilla.org/?tree=Try&rev=b966175e43ff
Attachment #8381368 -
Flags: review?(khuey) → review+
Assignee | ||
Comment 16•9 years ago
|
||
(In reply to Cervantes Yu from comment #15) > Tests: https://tbpl.mozilla.org/?tree=Try&rev=b966175e43ff Mochitest group 2 on Linux Debug and Linux X64 Debug still fail in a line. Seems not intermittent. I need to look at it.
Comment 17•9 years ago
|
||
Cervantes, do you need help fixing these tests issues?
Flags: needinfo?(cyu)
Assignee | ||
Comment 18•9 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #17) > Cervantes, do you need help fixing these tests issues? That would be great. :)
Flags: needinfo?(cyu)
Assignee | ||
Comment 19•9 years ago
|
||
From the log it looks like it completed all the test cases in mochitest-2, then it get stuck in shutting down. The test runner sends SIGABRT to force it to shut down. I still don't make the test get stuck. Is there any way to bisect the offending test case in mochitest-2?
Comment 20•9 years ago
|
||
ni? Fabrice to see if any help can be provided regarding comment 19
Flags: needinfo?(fabrice)
Assignee | ||
Comment 21•9 years ago
|
||
For more information, I ran mochitest-2 locally with the patches applied. The failure is intermittent. I still have no clue why it is permfail on try.
Updated•9 years ago
|
Component: General → IPC
Product: Firefox OS → Core
Assignee | ||
Comment 22•9 years ago
|
||
From the crash stack in the log, the failure indicates that it is performing PlatformThread::join(), called from ShutdownXPCOM() to delete sIOThread. It's still unclear how the change can make this pthread_join() to block until we got killed by the test runner.
Assignee | ||
Comment 23•9 years ago
|
||
Still wrestling with mochitest-2 failures. With https://bugzilla.mozilla.org/attachment.cgi?id=8381368 fixing the inherent bug in the test case, we saw consecutive mochitest-2 failure. It's hard to say that this is intermittent. I'll try to reorder or disabling this test case to see if we can pass the suite. If yes, can we unblock the patch to 1.3T and reenable the test case once we fix it. Fabrice, how does this sound?
Comment 25•9 years ago
|
||
(In reply to Cervantes Yu from comment #23) > Still wrestling with mochitest-2 failures. With > https://bugzilla.mozilla.org/attachment.cgi?id=8381368 fixing the inherent > bug in the test case, we saw consecutive mochitest-2 failure. It's hard to > say that this is intermittent. I'll try to reorder or disabling this test > case to see if we can pass the suite. If yes, can we unblock the patch to > 1.3T and reenable the test case once we fix it. Fabrice, how does this sound? Last resort we'll disable the test temporarily. Are you sure it's a test issue and not a bug in the patch itself? (In reply to James Zhang from comment #24) > When can we land this patch? Worst case we'll land Tuesday with the test disabled, sooner if Cervantes finds what's going wrong.
Flags: needinfo?(fabrice)
Assignee | ||
Comment 26•9 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #25) > (In reply to Cervantes Yu from comment #23) > Last resort we'll disable the test temporarily. Are you sure it's a test > issue and not a bug in the patch itself? > Seriously, I am not sure. But from the log the test cases finished, but the suite cannot finish and get killed by the test runner after no output for 330 seconds. mochitest-2 on Linux Debug and Linux 64 Debug are the only 2 suites that failed during shutdown. 3 more tests: 1. https://tbpl.mozilla.org/?tree=Try&rev=69bea5abc0f8 Only with test case fix. 1 passed, 2 failed. This should always pass in theory. 2. https://tbpl.mozilla.org/?tree=Try&rev=d96de9c1e324 Code and test fixes, plus a little change to not use preallocated process for the browser. All 3 failed. 3. 2. with test_Preallocated.html disable. 1 green (for the 1st time...), 2 more running. Because of 1. I tend to believe there is something wrong in how we run the tests. I need to dig into the tests further to fix that.
Assignee | ||
Comment 27•9 years ago
|
||
Oops, the URL for 3. in comment #26 is https://tbpl.mozilla.org/?tree=Try&rev=aa61c08dab3b
Assignee | ||
Comment 28•9 years ago
|
||
I guess the main thread fails to pthread_join() the IO thread because it is stuck in the following call: Thread 3 0 libpthread-2.15.so + 0xf88d 1 libxul.so!ChildReaper::WaitForChildExit [process_watcher_posix_sigchld.cc:69bea5abc0f8 : 68 + 0xa] 2 libxul.so!ChildLaxReaper::WillDestroyCurrentMessageLoop [process_watcher_posix_sigchld.cc:69bea5abc0f8 : 158 + 0x4] 3 libxul.so!MessageLoop::~MessageLoop() [message_loop.cc:69bea5abc0f8 : 155 + 0x5] 4 libxul.so!base::Thread::ThreadMain() [thread.cc:69bea5abc0f8 : 174 + 0xb] 5 libxul.so!ThreadFunc [platform_thread_posix.cc:69bea5abc0f8 : 39 + 0x2] 6 libpthread-2.15.so + 0x7e99 There is a loop in HANDLE_EINTR(waitpid()). It could be an infinite loop so the main thread can never shut down. Since this is not reproduced locally, I pushed a test with more debugging instruments. Let's see if it's really what I guess. If so then I think we have another bug. https://tbpl.mozilla.org/?tree=Try&rev=5d82ded68f39
Assignee | ||
Comment 29•9 years ago
|
||
waitpid() for the prealloc'd process doesn't return but instead block indefinitely. I think PreallocatedProcessManagerImpl::Disable() doesn't close the prealloc'd process correctly.
Updated•9 years ago
|
Flags: needinfo?(styang)
Comment 30•9 years ago
|
||
(In reply to Fabrice Desré [:fabrice] from comment #25) > (In reply to James Zhang from comment #24) > > When can we land this patch? > > Worst case we'll land Tuesday with the test disabled, sooner if Cervantes > finds what's going wrong. It seems to be difficult to debug the test case failure on try server, can we land first and follow up to fix the test case? thanks
Assignee | ||
Comment 31•9 years ago
|
||
Updated patch for 1.3t: * Check for dom.ipc.processPrelaunch.enabled before returning null from ContentParent::GetNewOrUsed() so it will work correctly for NUWA + no prelaunch config. * Don't #ifdef in ContentChild::RecvPBrowserConstructor() so it will work correctly for NUWA + no prelaunch config.
Attachment #8375423 -
Attachment is obsolete: true
Attachment #8396307 -
Flags: review+
Assignee | ||
Comment 32•9 years ago
|
||
(In reply to Joe Cheng [:jcheng] from comment #30) > It seems to be difficult to debug the test case failure on try server, can > we land first and follow up to fix the test case? thanks Fabrice, if you agree, we can land this on 1.3t with test_Preallocated.html disabled temporarily (the test is never correctly cleaned up, see comment #14). I will fix it and reenable the test in bug 987164.
Flags: needinfo?(cyu)
Assignee | ||
Comment 33•9 years ago
|
||
Attachment #8396314 -
Flags: review?(fabrice)
Comment 34•9 years ago
|
||
Comment on attachment 8396314 [details] [diff] [review] Disable test_Preallocated.html Review of attachment 8396314 [details] [diff] [review]: ----------------------------------------------------------------- ok, let's land.
Attachment #8396314 -
Flags: review?(fabrice) → review+
Updated•9 years ago
|
Flags: needinfo?(fabrice)
Assignee | ||
Comment 36•9 years ago
|
||
Attachment #8396937 -
Flags: review+
Assignee | ||
Comment 37•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/0c3c8215ef73 https://hg.mozilla.org/integration/mozilla-inbound/rev/e3ddffe9d504 https://hg.mozilla.org/integration/mozilla-inbound/rev/65092435a027
Comment 38•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/0c3c8215ef73 https://hg.mozilla.org/mozilla-central/rev/e3ddffe9d504 https://hg.mozilla.org/mozilla-central/rev/65092435a027
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Assignee | ||
Comment 40•9 years ago
|
||
Push to 1.3t branch. https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/c6ec532e89ca https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/c9715ee1bdd1 https://hg.mozilla.org/releases/mozilla-b2g28_v1_3t/rev/0e12c0414846
Flags: needinfo?(cyu)
Assignee | ||
Updated•9 years ago
|
Comment 41•9 years ago
|
||
Nom for 1.3? because we will need this to help fix bug 987556 which is a 1.3+ blocker.
blocking-b2g: 1.3T+ → 1.3?
This is relatively high risk, IMO.
Although we're going to have to fix any issues that come up for Tarako since we took this on 1.3T, so ...
Comment 45•9 years ago
|
||
Ok - in that case, we're probably going to need to punt this then. We can't take anything risky at this point. I'll go back & discuss the blocking bug here with Vance here to see if we can get a waiver for it.
Flags: needinfo?(cyu)
bkelly has a new theory about the blocking bug, fwiw
Comment 47•9 years ago
|
||
Yes, let me see if I can get browsermark to run without this. Its a bit up in the air at the moment. That sad truth is our baseline memory usage went up from b2g18 to b2g26. We more or less compensated with NUWA. That being said, the major issue with the blocking bug is with script source compression being disabled. I just need to verify if I can get browsermark to run with compression or another fix, but without NUWA.
Assignee | ||
Comment 48•9 years ago
|
||
You don't have to really test on Tarako. For a quick verification we can test it on m-c, ./edit-prefs.sh to disable "dom.ipc.processPrelaunch.enabled" and retest.
Comment 49•9 years ago
|
||
Triage decided to move this to backlog but reevaluate if this is deemed necessary to fix bug 987556.
blocking-b2g: 1.3? → backlog
Updated•8 years ago
|
blocking-b2g: backlog → ---
tracking-b2g:
--- → backlog
You need to log in
before you can comment on or make changes to this bug.
Description
•