Closed Bug 821192 Opened 12 years ago Closed 12 years ago

nsUpdateDriver needs to waitpid() all subprocesses before applying update

Categories

(Toolkit :: Startup and Profile System, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
B2G C3 (12dec-1jan)
blocking-basecamp +
Tracking Status
firefox19 --- fixed
firefox20 --- fixed
b2g18 --- fixed

People

(Reporter: cjones, Assigned: cjones)

References

Details

Attachments

(3 files, 1 obsolete file)

Right now we simply exec() the updater binary over the b2g process to finish applying the background update. This is subject to the following race-condition-y problem 1. b2g process forks new subprocess 2. Immediately afterwards, b2g process exec("updater")s At this point, CLOEXEC will take out the b2g process's socketpair with the new subprocess. However, the following two steps can race 3a. updater moves old /system/b2g directory out of the way 3b. updater moves new /system/b2g directory into place 4a. plugin-container exec()s 4b. plugin-container finishes loading dynamic libraries If this happens, extremely scary things can happen in the plugin-container. Most of those things should be harmless crashes, but since Camera and Movie apps will have high privileges we need to defend against that. The solution is to enumerate the live subprocesses just before we exec() in nsUpdateProcessor and join with all of them. bb? to triage assignee.
Component: Application Update → Startup and Profile System
Assignee: nobody → dhylands
blocking-basecamp: ? → +
Priority: -- → P1
Target Milestone: --- → B2G C3 (12dec-1jan)
We have a helper to enumerate child processes here http://mxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.h#82 I think we'll need a new interface ContentParent::KillAndWait() that uses the existing process_util.h helper. We also need to double check that that helper is bulletproof.
Stealing.
Assignee: dhylands → jones.chris.g
NP. I hadn't started any coding, just looking at where it would be coded. I'm happy to review...
Attached patch Patch (obsolete) — Splinter Review
Includes fixes for a couple of other unrelated bugs. Need to split.
(In reply to Dave Hylands [:dhylands] from comment #3) > NP. I hadn't started any coding, just looking at where it would be coded. > I'm happy to review... No worries. I was afraid this one would get messy, and it did :/.
Comment on attachment 695587 [details] [diff] [review] Patch Review of attachment 695587 [details] [diff] [review]: ----------------------------------------------------------------- At a conceptual level, this is more or less what was thinking would need to happen. ::: dom/ipc/ContentParent.cpp @@ +315,5 @@ > + { > + MonitorAutoLock lock(monitor); > + while (!done) { > + lock.Wait(); > + } The only issue I can think of is that this is going to block the MainThread. If the IOThread happened to do something similar (Post a runnable to the main thread) at around the same time, then we could deadlock.
The IO thread isn't allowed to block on the main thread, because the main thread needs to block on the IO thread when sending sync IPC.
When we're (asynchronously) shutting content processes, their exits result in SIGCHLD being delivered to the parent, which interrupts the sleep. So we have so far basically always been committing SIGKILL seppuku right after attempting restart. That's very bad because it means our profile sync may not be completing, data loss issue.
Attachment #695841 - Flags: review?(dhylands)
Fixes the "real bug" here. We only need to do this when restarting, because reboots and poweroffs will use the hardware itself to kill any running processes.
Attachment #695843 - Flags: review?(dhylands)
Comment on attachment 695841 [details] [diff] [review] part 1: Fix the watchdog timeout code Review of attachment 695841 [details] [diff] [review]: ----------------------------------------------------------------- ::: hal/linux/LinuxPower.cpp @@ +79,5 @@ > + if (sleepSeconds <= 0) { > + break; > + } > + sleep(sleepSeconds); > + } This should use TimeTicks::Now which is a monotonically increasing time, whereas TimeStamp::Now is wall time, which could change (due to timezone changes or the user setting the time).
Attachment #695843 - Flags: review?(dhylands) → review+
(In reply to Dave Hylands [:dhylands] from comment #11) > Comment on attachment 695841 [details] [diff] [review] > part 1: Fix the watchdog timeout code > > Review of attachment 695841 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: hal/linux/LinuxPower.cpp > @@ +79,5 @@ > > + if (sleepSeconds <= 0) { > > + break; > > + } > > + sleep(sleepSeconds); > > + } > > This should use TimeTicks::Now which is a monotonically increasing time, > whereas TimeStamp::Now is wall time, which could change (due to timezone > changes or the user setting the time). TimeStamp is monotonic [1], rather painfully so on some platforms :/. (I wrote the POSIX clock_gettime(CLOCK_MONOTONIC) implementation, which was delightfully non-painful :).) [1] http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp.h#158
(In reply to Chris Jones [:cjones] [:warhammer] from comment #12) > > This should use TimeTicks::Now which is a monotonically increasing time, > > whereas TimeStamp::Now is wall time, which could change (due to timezone > > changes or the user setting the time). > > TimeStamp is monotonic [1], rather painfully so on some platforms :/. (I > wrote the POSIX clock_gettime(CLOCK_MONOTONIC) implementation, which was > delightfully non-painful :).) > > [1] http://mxr.mozilla.org/mozilla-central/source/xpcom/ds/TimeStamp.h#158 Right you are - I was confused and was looking at Time::Now and not TimeStamp::Now.
Attachment #695841 - Flags: review?(dhylands) → review+
Comment on attachment 695842 [details] [diff] [review] part 2: Add an interface to join all live content processes Review of attachment 695842 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/ipc/ContentParent.cpp @@ +290,5 @@ > + // Don't touch any arguments to this function from now on. > +} > + > +/*static*/ void > +ContentParent::JoinAllSubprocesses() Nit: I'd assert main thread here. @@ +299,5 @@ > + printf_stderr("There are no live subprocesses."); > + return; > + } > + > + printf_stderr("Subprocesses are still alive. Doing emergency join."); Nit: Need a newline.
Attachment #695842 - Flags: review?(bent.mozilla) → review+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: