/system/bin/b2g.sh needs to defend against failed critical section during updating

RESOLVED FIXED in B2G C3 (12dec-1jan)

Status

Firefox OS
GonkIntegration
P1
critical
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: cjones, Assigned: bjacob)

Tracking

unspecified
B2G C3 (12dec-1jan)
ARM
Gonk (Firefox OS)

Firefox Tracking Flags

(blocking-basecamp:+)

Details

(Whiteboard: [b2g-gfx])

Attachments

(4 attachments, 3 obsolete attachments)

In updater.cpp, there's a critical section that looks like

 mv /system/b2g /system/b2g.bak
 mv /system/b2g.bak/updated /system/b2g

This has to be bulletproof.  If it fails, we leave behind a brick.

This isn't atomic, so we have one known bricking failure mode currently

 1, mv /system/b2g /system/b2g.bak
 2. [changes are flushed to disk]
 3. crash / power loss etc.

In this case, we're left with only /system/b2g.bak, and b2g won't start back up.

If we were updating a /system/b2g symlink, we could do this atomically.  But in the current update algorithm, the only way to fix this is through /system/bin/b2g.sh.  On startup, it needs to make a check like

 if !exists(/system/b2g):
   mv /system/b2g.bak /system/b2g

This blocks using gecko updates.

mwu, want to grab?

Updated

6 years ago
Assignee: nobody → mwu

Comment 1

6 years ago
Hm this is trickier than just moving b2g.bak. We also need to remount system rw so we can move the directory, and then remount system ro. I don't know of any way to do this within a shell script, so we may need a dedicated program to do this.
Assignee: mwu → bjacob
Whiteboard: [b2g-gfx]

Comment 2

6 years ago
That change wasn't intentional, right?
Assignee: bjacob → mwu
Whiteboard: [b2g-gfx]

Comment 3

6 years ago
(In reply to Michael Wu [:mwu] from comment #2)
> That change wasn't intentional, right?

It was, thought we could take some load off you, but if you prefer to handle it yourself, understood.

Comment 4

6 years ago
(Understood you were OOO until at least Thursday)
(Assignee)

Comment 5

6 years ago
(Happy to try this bug, let me know; hadn't started work on it yet.)

Comment 6

6 years ago
Ah ok. This was marked as b2g-gfx but it's not a graphics bug so I thought the change was a mistake.

bjacob, feel free to take this bug. Basically I think we need a program that:

1. Checks for the existence of /system/b2g
2. Quits if it exists, otherwise
3. Remounts the system partition rw
4. Moves /system/b2g.bak to /system/b2g
5. Remounts the system partition ro

See http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/updater/automounter_gonk.cpp for how remounting works.

This program would need to be run from b2g.sh in gonk-misc. The program can be added to gonk-misc and an appropriate module added to gonk-misc/Android.mk. Then PRODUCT_PACKAGSE in gonk-misc/b2g.mk and build/product/target/b2g.mk needs to have the new module/program added.

It's relatively straight forward but I just haven't been able to get to it due to travel.
(Assignee)

Comment 7

6 years ago
Thanks for the explanation! Happy if I can save you some time.

Updated

6 years ago
Assignee: mwu → bjacob
Right, we're using [b2g-gfx] as a way to track what the graphics team is looking at, rather than just the bugs in graphics. Makes it easier for JP to chase me down :-)
Whiteboard: [b2g-gfx]
(Assignee)

Comment 9

6 years ago
Created attachment 693576 [details] [diff] [review]
proof of concept, currently implemented in the updater itself to avoid build-system questions for now

Since I don't understand the build system around here very well, and I had to link to automounter_gonk.cpp anyway, I figured that it would be simplest to make a proof of concept directly in the updater binary --- even if in the end that doesn't make any sense as this program must be installed outside of /system/b2g.

Usage:

  > updater --b2g-ftw

I did minimal testing that it detects and renames directories correctly, but I just blindly trusted that GonkAutoMounter does its job.

For next steps,
 - where should this program go, both in our source tree and in the device filesystem?
 - do you have an example to follow for the build-system changes?
 - what's the plan to link to automounter_gonk.cpp? Just have the source in the same directory?
Attachment #693576 - Flags: feedback?(mwu)

Comment 10

6 years ago
(In reply to Benoit Jacob [:bjacob] from comment #9)
> For next steps,
>  - where should this program go, both in our source tree and in the device
> filesystem?
>  - do you have an example to follow for the build-system changes?

See the fakeperm module in https://github.com/mozilla-b2g/gonk-misc/blob/master/Android.mk#L56 for one example of how to put in a new module in the Android build system. It should go into the gonk-misc repo. The module then needs to be listed in PRODUCT_PACKAGES in gonk-misc/b2g.mk and build/product/target/b2g.mk.

>  - what's the plan to link to automounter_gonk.cpp? Just have the source in
> the same directory?

We can just have a copy of automounter_gonk in gonk-misc. The only issue is that it's MPL licensed and code that's in gonk-misc should generally be apache 2 licensed.
Thanks! Another question for my local testing. I don't why it worked the first time I tried, but as expected, doing this in adb shell,

   cd /system
   mv b2g b2g.bak

fails as the partition is read-only. I can't easily unmount it as it's "busy". How can I remount /system read-write and then read-only from adb shell?

Comment 12

6 years ago
"adb remount" makes the system partition readwrite. Dunno if there's a way to make it readonly again.
Update: I now have a stand-alone executable in /system/bin, that works. I need to clean my patch a bit (currently it duplicates more gecko/ code than necessary; if I can use exported Gecko headers I'll be able to make a smaller patch). I've tested it by renaming b2g to b2g.bak, rebooting, and running my program; it just works. Will upload clean patch asap.

Question:
when this program terminates successfully, we probably want to reboot, right? Should this program take care of rebooting?

Comment 14

6 years ago
I don't think we need to reboot. This program can be called in https://github.com/mozilla-b2g/gonk-misc/blob/master/b2g.sh before attempting to start b2g.
Created attachment 693734 [details] [diff] [review]
implementation without code duplication #including .cpp files

How do you like this approach?

Pro:
  - minimal size patch;
  - no duplicate files;
  - no modification to gecko files such as the automounter;
  - no non-Apache-2 code in gonk-misc (and MPL2 is unambiguously not propagating copyleft through #includes).

Con:
  - #including cpp files from gecko/ into gonk-misc is ugly.

I also have alternative patches if you prefer:
  - either duplicating 3 files, automounter_gonk.* and updatelogging.cpp, into gonk-misc.
  - or duplicating only automounter_gonk.* and tweaking them to remove the need for updatelogging.*.

Also, what name would you like for this program?
Attachment #693576 - Attachment is obsolete: true
Attachment #693576 - Flags: feedback?(mwu)
Attachment #693734 - Flags: feedback?(mwu)
Note: this patch is only gonk-misc; also need this patch to project build/ :

$ ./repo diff

project build/
diff --git a/target/product/b2g.mk b/target/product/b2g.mk
index 7a11bd3..a036032 100644
--- a/target/product/b2g.mk
+++ b/target/product/b2g.mk
@@ -31,6 +31,7 @@ PRODUCT_PACKAGES := \
        MozTT-Regular.ttf \
        MozTT-Medium.ttf \
        MozTT-Bold.ttf \
+       b2gftw \
        $(NULL)
Also, regarding this line in the patch:

LOCAL_CFLAGS       := -D__BEGIN_DECLS='extern "C" {' -D__END_DECLS='}'

This is because android_reboot.h, which is included by the automounter code, has __BEGIN_DECLS and __END_DECLS and it isn't clear to me where they are supposed to be defined, but it seemed clear that that was what they were supposed to expand to, from the linking errors that I got.
Created attachment 693741 [details] [diff] [review]
implementation without code duplication #including .cpp files

Figured that the issue I mentioned in my previous comment could be better addressed by #including this system header, which defines __{BEGIN,END}_DECLS.
Attachment #693734 - Attachment is obsolete: true
Attachment #693734 - Flags: feedback?(mwu)
Attachment #693741 - Flags: feedback?(mwu)
Created attachment 693889 [details] [diff] [review]
implementation without code duplication #including .cpp files

Unless there is an objection to the design (see comment 15) this seems ready for review. Renamed to 'recover_from_backup'; removed another unneeded line from Android.mk.
Attachment #693741 - Attachment is obsolete: true
Attachment #693741 - Flags: feedback?(mwu)
Attachment #693889 - Flags: review?(mwu)
Created attachment 693956 [details] [diff] [review]
Use it in b2g.sh to actually recover

The only tricky part, as explained in the comment, is that we must do that _before_ setting LD_PRELOAD or anything else that would add a dependency on /system/b2g.

Tested on an Otoro device:
 - if I rename b2g to b2g.bak and reboot, it successfully recovers and boots
 - if I rename b2g to something else and reboot, it fails with the expected error messages
Attachment #693956 - Flags: review?(mwu)
Created attachment 693958 [details] [diff] [review]
The build/ part

Trivial but for the sake of completeness. Separate patch because separate git repo.
Attachment #693958 - Flags: review?(mwu)

Updated

6 years ago
Attachment #693958 - Flags: review?(mwu) → review+

Comment 22

6 years ago
Comment on attachment 693889 [details] [diff] [review]
implementation without code duplication #including .cpp files

Review of attachment 693889 [details] [diff] [review]:
-----------------------------------------------------------------

Overall, looks fine. Requesting review from dhylands for a sanity check on the recovery_from_backup.cpp implementation.

I don't like including those files from gecko, but I can't think of alternatives that are much better.

::: Android.mk
@@ +67,5 @@
> +LOCAL_MODULE_CLASS := EXECUTABLES
> +LOCAL_SRC_FILES    := recover_from_backup.cpp
> +LOCAL_C_INCLUDES   := $(GECKO_OBJDIR)/dist/include \
> +                      $(GECKO_OBJDIR)/dist/include/nspr \
> +                      gecko/toolkit/mozapps/update/common \

s/gecko/$(GECKO_PATH)/ (you may need to move this to make it work)
Attachment #693889 - Flags: review?(mwu)
Attachment #693889 - Flags: review?(dhylands)
Attachment #693889 - Flags: review+

Comment 23

6 years ago
Comment on attachment 693956 [details] [diff] [review]
Use it in b2g.sh to actually recover

Review of attachment 693956 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g.sh
@@ +10,5 @@
> +#
> +# Important: this must be done before we set LD_PRELOAD, as this must run
> +# without any dependency on /system/b2g.
> +if [ ! -d /system/b2g ]; then
> +  echo "No /system/b2g directory. Attempting recovery."

echo doesn't put any messages in the system log. We should also log to the system log when this happens. I think "log" is the command you'll want to use.
Comment on attachment 693889 [details] [diff] [review]
implementation without code duplication #including .cpp files

Review of attachment 693889 [details] [diff] [review]:
-----------------------------------------------------------------

All of my comments are really nits, so r=me with nits addressed.

::: recover_from_backup.cpp
@@ +15,5 @@
> + */
> +
> +#include "sys/cdefs.h" // for __BEGIN_DECLS needed under automounter_gonk.cpp
> +
> +#include "automounter_gonk.cpp"

I think its worth a comment explaining why you're #including the .cpp files

@@ +16,5 @@
> +
> +#include "sys/cdefs.h" // for __BEGIN_DECLS needed under automounter_gonk.cpp
> +
> +#include "automounter_gonk.cpp"
> +#include "updatelogging.cpp"

As far as I can tell, the only reason we're including this is because the automounter_gonk.cpp file calls LogFlush.

We don't seem to call LogInit anywhere so you might be better off to just #define LogFlush() and remove the #include

It might be even better to modify automounter_gonk.cpp to do:

#if !defined(LogFlush)
#define LogFlush()
#endif

and an appropriate comment

@@ +24,5 @@
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <unistd.h>
> +
> +#undef LOG_TAG

It's probably worth mentioning that we're #undef ing these because we #included automounter_gonk.cpp

@@ +45,5 @@
> +int main()
> +{
> +  struct stat buf;
> +
> +  if (stat(system_b2g_path, &buf) == 0) {

Do we care if /system/b2g isn't a directory?

Probably not, since we're only trying to cover off the case that the updater failed, and I'm not sure if we're trying to be super paranoid here or not.

@@ +71,5 @@
> +         "We're probably going to fail, but we have nothing to lose trying...\n");
> +    // continue, we have nothing to lose.
> +  }
> +
> +  if (rename(system_b2g_backup_path, system_b2g_path) == -1) {

You should use != 0 to be consistent with your comparisons to stat above
Attachment #693889 - Flags: review?(dhylands) → review+
(In reply to Michael Wu [:mwu] from comment #23)
> Comment on attachment 693956 [details] [diff] [review]
> Use it in b2g.sh to actually recover
> 
> Review of attachment 693956 [details] [diff] [review]:
> -----------------------------------------------------------------
...snip...
> echo doesn't put any messages in the system log. We should also log to the
> system log when this happens. I think "log" is the command you'll want to
> use.

log -p E "Some message here"

will log with error severity.
I think that this could all be done from the script level:

if [ ! -d /system/b2g -a -d /system/b2g.bak ]; then
  log -p W "No /system/b2g directory. Attempting recovery."
  mount -w -o remount /system
  mv /system/b2g.bak /system/b2g
  mount -r -o remount /system
  if [ -d /system/b2g ]; then
    log "Recovery successful."
  else
    log -p E "Recovery failed."
  fi
fi

Comment 27

6 years ago
Comment on attachment 693956 [details] [diff] [review]
Use it in b2g.sh to actually recover

Clearing review, waiting for a patch with better logging.

Alternately, if dhylands' suggestion about using the mount command works, we should just do that.
Attachment #693956 - Flags: review?(mwu)
Yup, the suggestion in comment 26 works from quickly trying in |adb shell|. Looks like CC'ing Dave on this would have saved a bit of time :-P  Turning comment 26 into a patch now.
Created attachment 695018 [details] [diff] [review]
do it all in b2g.sh per comment 26

Works!

Tried to add meaningful logging.
Attachment #695018 - Flags: review?(dhylands)
Comment on attachment 695018 [details] [diff] [review]
do it all in b2g.sh per comment 26

Review of attachment 695018 [details] [diff] [review]:
-----------------------------------------------------------------

Looks reasonable to me
Attachment #695018 - Flags: review?(dhylands) → review+
Comment on attachment 695018 [details] [diff] [review]
do it all in b2g.sh per comment 26

Review of attachment 695018 [details] [diff] [review]:
-----------------------------------------------------------------

::: b2g.sh
@@ +4,5 @@
>  mkdir -p $TMPDIR
>  chmod 1777 $TMPDIR
>  
> +if [ ! -d /system/b2g ]; then
> +

And I'd probably remove these blank lines.

@@ +11,5 @@
> +  if [ -d /system/b2g.bak ]; then
> +
> +    mount -w -o remount /system
> +    if [ $? -ne 0 ]; then
> +      log -p E "Failed to remount /system read-write"

Actually, a tip here. You can do:

if ! mount -w -o remount /system; then
   deal with error
fi
Merged:
https://github.com/mozilla-b2g/gonk-misc/commit/ffc42185726384eaa80ea685bd9c95b7dab467c3
Component: Application Update → Builds
Product: Toolkit → Boot2Gecko
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.