Closed Bug 1056424 Opened 10 years ago Closed 10 years ago

Stuck at loading screen during power cut off test

Categories

(Core Graveyard :: DOM: Apps, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

(blocking-b2g:-, b2g-v1.3T affected, b2g-v1.4 affected, b2g-v2.0 affected)

RESOLVED FIXED
mozilla35
blocking-b2g -
Tracking Status
b2g-v1.3T --- affected
b2g-v1.4 --- affected
b2g-v2.0 --- affected

People

(Reporter: zhenqing.liu, Assigned: fabrice)

References

Details

(Whiteboard: [sprd335053])

Attachments

(3 files)

Cut off the power at different stages when booting the phone. And sometimes it will be stuck at the loading screen.

This bug can be consistently reproduced if we delete the info about |system.gaiamobile.org| in |local/webapps/webapps.json|.

So it's about the integrity of |local/webapps/webapps.json|. I think we'd better add some check out to make sure the file be rewritten when broken up,  instead of stuck at the loading screen.
Whiteboard: [sprd335053]
If we delete system.gaiamobile.org information from data/local/webapps/webapps.json, we can reproduce this issue on 1.3t/1.4/2.0.
Hi Zhenqing,
Could you explain what is "power cut off test"? We want to know the test steps.
Flags: needinfo?(zhenqing.liu)
1. Power on the phone.
2. Remove the battery directly after 15s.
3. Wait for 3s and power on the phone again.
4. Repeat Step 1-3 and increase the time in Step 2 for 2ms every test.

This is an auto test and The probability to reproduce this bug is not very high. But there is indeed a risk.
Flags: needinfo?(zhenqing.liu)
Hi Danny,
Is there any proper solution about this bug?
Flags: needinfo?(dliang)
Hi Fabrice,

Not sure if this is something you can look at or trigger a discussion on, but I believe we do need some sort of protection or fallback mechanism to avoid these cases.
i.e. power cut (battery falls out etc) when certain files are being written to.

In such cases the phone cannot be powered up again correctly.
Flags: needinfo?(dliang) → needinfo?(fabrice)
Hi Wayne, yes this is something we need to fix. I think I know what to do and will post a patch,
Component: Stability → DOM: Apps
Flags: needinfo?(fabrice)
Product: Firefox OS → Core
Summary: [Tarako]Stuck at loading screen during power cut off test → Stuck at loading screen during power cut off test
Assignee: nobody → fabrice
Hi Fabrice,

Could we know any progress on this problem? This bug was reproduced again!
Flags: needinfo?(fabrice)
[Blocking Requested - why for this release]:
Can we enter recovery mode when we meet this issue?
blocking-b2g: --- → 1.4?
Hi Fabrice, May we know the progress of the issue ?
Since it might impact mass production, and once it happened, user couldn't the reboot the phone again.
Could you please kindly provide your findings and investigation on this ?
Thank you very much.
James, can you test with this patch?
Attachment #8481103 - Flags: feedback?(james.zhang)
Flags: needinfo?(fabrice)
Thanks, Fabrice, I am testing this patch,and I'll do a reply if anything turns up.
Comment on attachment 8481103 [details] [diff] [review]
corrupted-webapps.patch

Validation failed.
Attachment #8481103 - Flags: feedback?(james.zhang) → feedback-
Hi Fabrice,

Thanks for following up, sorry I have been in and out of office.

From what you're seeing and working on, do you think the occurrence of this is likely? 
I understand unless the battery fails abruptly, we do a graceful device shut down when the battery is 1% (or 2?), which should avoid this right?

Trying to understand the impact on a blocking decision.

Thanks again
Flags: needinfo?(fabrice)
You can delete |system.gaiamobile.org| in |data/local/webapps/webapps.json| to verify, you can also delete |data/local/webapps/webapps.json| .
I have seen this once after updating with too little space on /data. It would try to copy webapps added in the update to the user profile and run out of space. 

There's a warning displayed at the bottom over the splash screen in Gaia about low available space, then it would be stuck at the splash screen forever. The power button brought up the power menu and I could reboot from there. Solved it for me.
Hi Fabrice, I am sceptical about this line in the patch
      
+        Services.obs.notifyObservers(null, "webapps-before-update-merge", null);

Is "webapps-before-update-merge" a new event? I can't find the handling process. May I know how to process it?
(In reply to James Zhang (Spreadtrum) from comment #14)
> You can delete |system.gaiamobile.org| in |data/local/webapps/webapps.json|
> to verify, you can also delete |data/local/webapps/webapps.json| .

That's what I did... Can you send me your webapps.json?

(In reply to Shiwei Zhang from comment #16)
> Hi Fabrice, I am sceptical about this line in the patch
>       
> +        Services.obs.notifyObservers(null, "webapps-before-update-merge",
> null);
> 
> Is "webapps-before-update-merge" a new event? I can't find the handling
> process. May I know how to process it?

My patch just corrects a whitespace issue. This observer notification was introduced in bug 938171, where you will find the xpcom component that uses it.
Flags: needinfo?(fabrice)
(In reply to Wayne Chang [:wchang] from comment #13)
> Hi Fabrice,
> 
> Thanks for following up, sorry I have been in and out of office.
> 
> From what you're seeing and working on, do you think the occurrence of this
> is likely?

It's not extremely likely, but can definitely happen, eg. if the user removes the battery.

> I understand unless the battery fails abruptly, we do a graceful device shut
> down when the battery is 1% (or 2?), which should avoid this right?

Yes, we shutdown cleanly in these cases.
Fabrice, this patch failed on v1.3t and v1.4, not master branch.
(In reply to James Zhang (Spreadtrum) from comment #19)
> Fabrice, this patch failed on v1.3t and v1.4, not master branch.

Interesting. I'll do a new 1.3t build then!
With your patch on 1.3t or 1.4, delete webapps.json and reboot, you can see {} in webapps.json and the phone still stucks at boot animation.
(In reply to Fabrice Desré [:fabrice] from comment #17)

> > Is "webapps-before-update-merge" a new event? I can't find the handling
> > process. May I know how to process it?
> 
> My patch just corrects a whitespace issue. This observer notification was
> introduced in bug 938171, where you will find the xpcom component that uses
> it.

We meet the issue in v1.3t, So I think there is no need to add this line

+        Services.obs.notifyObservers(null, "webapps-before-update-merge", null);

According to these lines above it,

+          // We configured a system app but can't find it. That prevents us
+          // from starting so we clear our registry to start again from scratch.
+          if (!systemAppFound) {
+            runUpdate = true;
+          }
+      } catch(e) {}
+
       if (runUpdate) {

Can webapps.json be restored if runUpdate is true?
(In reply to Shiwei Zhang from comment #22)
> (In reply to Fabrice Desré [:fabrice] from comment #17)
> 
> > > Is "webapps-before-update-merge" a new event? I can't find the handling
> > > process. May I know how to process it?
> > 
> > My patch just corrects a whitespace issue. This observer notification was
> > introduced in bug 938171, where you will find the xpcom component that uses
> > it.
> 
> We meet the issue in v1.3t, So I think there is no need to add this line
> 
> +        Services.obs.notifyObservers(null, "webapps-before-update-merge",
> null);

My patch was not against 1.3t, but master. I said in comment 20 that I will look at the issue on tarako.

> According to these lines above it,
> 
> +          // We configured a system app but can't find it. That prevents us
> +          // from starting so we clear our registry to start again from
> scratch.
> +          if (!systemAppFound) {
> +            runUpdate = true;
> +          }
> +      } catch(e) {}
> +
>        if (runUpdate) {
> 
> Can webapps.json be restored if runUpdate is true?

That's what will happen, yes. Actually the issue that makes this patch fail on 1.3t is that the preference that we use to find the system app manifest url has changed. Replacing b2g.system_manifest_url by browser.manifestURL in my patch should work on 1.3t/1.4.
Fabrice, this patch works with "browser.manifestURL".
Can you land this patch on all branch? Thank you.
Status: NEW → ASSIGNED
Flags: needinfo?(fabrice)
Triage - minus due to the unlikelihood of this issue occurring during normal usage, but we should aim at having this fixed in the long term.

James, if you feel the need and the patch is beneficial to you, please patch it locally.
blocking-b2g: 1.4? → -
Attachment #8481103 - Flags: feedback- → review?(myk)
Flags: needinfo?(fabrice)
Comment on attachment 8481103 [details] [diff] [review]
corrupted-webapps.patch

Review of attachment 8481103 [details] [diff] [review]:
-----------------------------------------------------------------

This needs rebasing over bug 1058101, but that's easy enough.

Is it worth adding a test for this specific case, or is existing test coverage sufficient?

::: dom/apps/src/Webapps.jsm
@@ +635,5 @@
>  
> +      try {
> +        let systemManifestURL =
> +          Services.prefs.getCharPref("b2g.system_manifest_url");
> +          let systemAppFound = false;

Nit: from this line to the end of the block, the code is indented two extra spaces.

@@ +638,5 @@
> +          Services.prefs.getCharPref("b2g.system_manifest_url");
> +          let systemAppFound = false;
> +          for (let id in this.webapps) {
> +            if (this.webapps[id].manifestURL == systemManifestURL) {
> +              systemAppFound = true;

Nit: this should also break out of the loop so it doesn't unnecessarily loop over the rest of the apps after finding the system one.

But Array.some would be even easier:

  let systemAppFound = this.webapps.some(v => v.manifestURL == systemManifestURL);

@@ +646,5 @@
> +          // from starting so we clear our registry to start again from scratch.
> +          if (!systemAppFound) {
> +            runUpdate = true;
> +          }
> +      } catch(e) {}

Nit: blocks like this should include a comment explaining what might get thrown and why it's ok to ignore the exception(s).
Attachment #8481103 - Flags: review?(myk) → review+
https://hg.mozilla.org/mozilla-central/rev/f187c2c42c50
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Ni andreas, strongly suggest that you merge this patch to your branch, we can reproduce it on dev of sprocomm.
Flags: needinfo?(pehrsons)
We cannot get this in place for users' first boot anyway. Perhaps uplift to 2.0?
Flags: needinfo?(pehrsons)
If reset phone and power cut off when boot animation, you will also meet this issue.
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #32)
> We cannot get this in place for users' first boot anyway. Perhaps uplift to
> 2.0?

Hi Andreas,

This problem has high reproduce rate in our v1.4.I think it is matter with profiles.ini and profile directories in /data/b2g/mozilla.
Thanks.
(In reply to ben.song from comment #34)
> (In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #32)
> > We cannot get this in place for users' first boot anyway. Perhaps uplift to
> > 2.0?
> 
> Hi Andreas,
> 
> This problem has high reproduce rate in our v1.4.I think it is matter with
> profiles.ini and profile directories in /data/b2g/mozilla.
> Thanks.

For customers? What's triggering it?
Flags: needinfo?(ben.song)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #35)
> (In reply to ben.song from comment #34)
> > (In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #32)
> > > We cannot get this in place for users' first boot anyway. Perhaps uplift to
> > > 2.0?
> > 
> > Hi Andreas,
> > 
> > This problem has high reproduce rate in our v1.4.I think it is matter with
> > profiles.ini and profile directories in /data/b2g/mozilla.
> > Thanks.
> 
> For customers? What's triggering it?

Hi Andreas,

Not for customers,but in our power cut test.After four days test phone also stays at loading screen,we could find there are too many profile directories like ********.default and in designated profile directory there are too many prefs.js files.

If we remove or empty profiles.ini, phone would return to normal.And if we empty directory profile specified in profiles.ini, phone would return to normal.
Flags: needinfo?(ben.song) → needinfo?(pehrsons)
I don't think we merged this fix to our 1.4 repo - are you testing with or without it?

It essentially happens only after power cut on first boot as well, right?
Afaik, first boot was done in the factory, so if at all exposed to users it should be very rare (after app installs when webapps.json is modified I could imagine).
Flags: needinfo?(pehrsons) → needinfo?(ben.song)
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #37)
> I don't think we merged this fix to our 1.4 repo - are you testing with or
> without it?
> 
> It essentially happens only after power cut on first boot as well, right?
> Afaik, first boot was done in the factory, so if at all exposed to users it
> should be very rare (after app installs when webapps.json is modified I
> could imagine).

Hi Andreas,

We have merged this fix in v1.4, and we have tested it with and without it.But i don't know the problem without it is or not same with the problem with it.I think this new problem is diferent with the last problem fixed.For in last problem if we remove or empty profiles.ini, phone would not return to normal.
Flags: needinfo?(ben.song)
OK, please file a new bug for that issue. Include (as clear as possible) reproduction steps and consequences. We should also figure out if it could happen for 2.0 and up. Perhaps Fabrice can take a look at it since he fixed this bug.
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #39)
> OK, please file a new bug for that issue. Include (as clear as possible)
> reproduction steps and consequences. We should also figure out if it could
> happen for 2.0 and up. Perhaps Fabrice can take a look at it since he fixed
> this bug.

OK,Thanks.We have filed a new bug in Bug 1092059.
See Also: → 1092059
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: