Closed
Bug 821192
Opened 12 years ago
Closed 12 years ago
nsUpdateDriver needs to waitpid() all subprocesses before applying update
Categories
(Toolkit :: Startup and Profile System, defect, P1)
Tracking
()
People
(Reporter: cjones, Assigned: cjones)
References
Details
Attachments
(3 files, 1 obsolete file)
1.28 KB,
patch
|
dhylands
:
review+
|
Details | Diff | Splinter Review |
7.37 KB,
patch
|
bent.mozilla
:
review+
|
Details | Diff | Splinter Review |
1.93 KB,
patch
|
dhylands
:
review+
|
Details | Diff | Splinter Review |
Right now we simply exec() the updater binary over the b2g process to finish applying the background update. This is subject to the following race-condition-y problem
1. b2g process forks new subprocess
2. Immediately afterwards, b2g process exec("updater")s
At this point, CLOEXEC will take out the b2g process's socketpair with the new subprocess. However, the following two steps can race
3a. updater moves old /system/b2g directory out of the way
3b. updater moves new /system/b2g directory into place
4a. plugin-container exec()s
4b. plugin-container finishes loading dynamic libraries
If this happens, extremely scary things can happen in the plugin-container. Most of those things should be harmless crashes, but since Camera and Movie apps will have high privileges we need to defend against that.
The solution is to enumerate the live subprocesses just before we exec() in nsUpdateProcessor and join with all of them.
bb? to triage assignee.
Updated•12 years ago
|
Component: Application Update → Startup and Profile System
Updated•12 years ago
|
Assignee: nobody → dhylands
blocking-basecamp: ? → +
Priority: -- → P1
Target Milestone: --- → B2G C3 (12dec-1jan)
Assignee | ||
Comment 1•12 years ago
|
||
We have a helper to enumerate child processes here
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.h#82
I think we'll need a new interface ContentParent::KillAndWait() that uses the existing process_util.h helper. We also need to double check that that helper is bulletproof.
Comment 3•12 years ago
|
||
NP. I hadn't started any coding, just looking at where it would be coded. I'm happy to review...
Assignee | ||
Comment 4•12 years ago
|
||
Includes fixes for a couple of other unrelated bugs. Need to split.
Assignee | ||
Comment 5•12 years ago
|
||
(In reply to Dave Hylands [:dhylands] from comment #3)
> NP. I hadn't started any coding, just looking at where it would be coded.
> I'm happy to review...
No worries. I was afraid this one would get messy, and it did :/.
Comment 6•12 years ago
|
||
Comment on attachment 695587 [details] [diff] [review]
Patch
Review of attachment 695587 [details] [diff] [review]:
-----------------------------------------------------------------
At a conceptual level, this is more or less what was thinking would need to happen.
::: dom/ipc/ContentParent.cpp
@@ +315,5 @@
> + {
> + MonitorAutoLock lock(monitor);
> + while (!done) {
> + lock.Wait();
> + }
The only issue I can think of is that this is going to block the MainThread. If the IOThread happened to do something similar (Post a runnable to the main thread) at around the same time, then we could deadlock.
Assignee | ||
Comment 7•12 years ago
|
||
The IO thread isn't allowed to block on the main thread, because the main thread needs to block on the IO thread when sending sync IPC.
Assignee | ||
Comment 8•12 years ago
|
||
When we're (asynchronously) shutting content processes, their exits result in SIGCHLD being delivered to the parent, which interrupts the sleep. So we have so far basically always been committing SIGKILL seppuku right after attempting restart. That's very bad because it means our profile sync may not be completing, data loss issue.
Attachment #695841 -
Flags: review?(dhylands)
Assignee | ||
Comment 9•12 years ago
|
||
Attachment #695587 -
Attachment is obsolete: true
Attachment #695842 -
Flags: review?(bent.mozilla)
Assignee | ||
Comment 10•12 years ago
|
||
Fixes the "real bug" here. We only need to do this when restarting, because reboots and poweroffs will use the hardware itself to kill any running processes.
Attachment #695843 -
Flags: review?(dhylands)
Comment 11•12 years ago
|
||
Comment on attachment 695841 [details] [diff] [review]
part 1: Fix the watchdog timeout code
Review of attachment 695841 [details] [diff] [review]:
-----------------------------------------------------------------
::: hal/linux/LinuxPower.cpp
@@ +79,5 @@
> + if (sleepSeconds <= 0) {
> + break;
> + }
> + sleep(sleepSeconds);
> + }
This should use TimeTicks::Now which is a monotonically increasing time, whereas TimeStamp::Now is wall time, which could change (due to timezone changes or the user setting the time).
Updated•12 years ago
|
Attachment #695843 -
Flags: review?(dhylands) → review+
Assignee | ||
Comment 12•12 years ago
|
||
(In reply to Dave Hylands [:dhylands] from comment #11)
> Comment on attachment 695841 [details] [diff] [review]
> part 1: Fix the watchdog timeout code
>
> Review of attachment 695841 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: hal/linux/LinuxPower.cpp
> @@ +79,5 @@
> > + if (sleepSeconds <= 0) {
> > + break;
> > + }
> > + sleep(sleepSeconds);
> > + }
>
> This should use TimeTicks::Now which is a monotonically increasing time,
> whereas TimeStamp::Now is wall time, which could change (due to timezone
> changes or the user setting the time).
TimeStamp is monotonic [1], rather painfully so on some platforms :/. (I wrote the POSIX clock_gettime(CLOCK_MONOTONIC) implementation, which was delightfully non-painful :).)
[1] http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp.h#158
Comment 13•12 years ago
|
||
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12)
> > This should use TimeTicks::Now which is a monotonically increasing time,
> > whereas TimeStamp::Now is wall time, which could change (due to timezone
> > changes or the user setting the time).
>
> TimeStamp is monotonic [1], rather painfully so on some platforms :/. (I
> wrote the POSIX clock_gettime(CLOCK_MONOTONIC) implementation, which was
> delightfully non-painful :).)
>
> [1] http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp.h#158
Right you are - I was confused and was looking at Time::Now and not TimeStamp::Now.
Updated•12 years ago
|
Attachment #695841 -
Flags: review?(dhylands) → review+
Comment on attachment 695842 [details] [diff] [review]
part 2: Add an interface to join all live content processes
Review of attachment 695842 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/ipc/ContentParent.cpp
@@ +290,5 @@
> + // Don't touch any arguments to this function from now on.
> +}
> +
> +/*static*/ void
> +ContentParent::JoinAllSubprocesses()
Nit: I'd assert main thread here.
@@ +299,5 @@
> + printf_stderr("There are no live subprocesses.");
> + return;
> + }
> +
> + printf_stderr("Subprocesses are still alive. Doing emergency join.");
Nit: Need a newline.
Attachment #695842 -
Flags: review?(bent.mozilla) → review+
Assignee | ||
Comment 15•12 years ago
|
||
Done and done.
https://tbpl.mozilla.org/?tree=Try&rev=40abbd7ff692
Assignee | ||
Comment 16•12 years ago
|
||
Comment 17•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 18•12 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•