Closed Bug 849186 Opened 11 years ago Closed 11 years ago

[Buri][Contacts]Failed to import above 1000 FB friends at one time

Categories

(Firefox OS Graveyard :: Gaia::Contacts, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(blocking-b2g:tef+, b2g18 unaffected, b2g18-v1.0.0 unaffected, b2g18-v1.0.1 fixed)

VERIFIED FIXED
blocking-b2g tef+
Tracking Status
b2g18 --- unaffected
b2g18-v1.0.0 --- unaffected
b2g18-v1.0.1 --- fixed

People

(Reporter: sync-1, Assigned: jmcf)

References

Details

Attachments

(5 files)

Firefox v1.0.1
 Mozilla build ID: 20130304070202.
 
 +++ This bug was initially created as a clone of Bug #417235 +++
 
 DEFECT DESCRIPTION:
 (1)Try to import one Facebook Account, who has about 1200 friends
 (2)When it is importing, the backlight of the mobile will switched off. when try to light it by power key, unlock the screen, I found the contacts APP is not there any more.Go to contacts, try to update the contacts, I found there are 491 friends imported from 1289.---- NOK1
 
 (3)Try to update all contacts again. Once again, LCD switched off during updating, and when back, COntacts quit. NOK2, here we have FB1.logcat
 
 (4)Try the 3rd time,and try to keep the LCD always on, finally importing can be finished.  , here we have FB2.logcat
 
 REPRODUCING PROCEDURES:
 
 EXPECTED BEHAVIOUR:
 All the friends can be imported at once. No failure in the middle.
 
 ASSOCIATE SPECIFICATION:
 
 TEST PLAN REFERENCE:
 
 TOOLS AND PLATFORMS USED:
 
 USER IMPACT:
 big.
 
 REPRODUCING RATE:
 100%
 
 For FT PR, Please list reference mobile's behavior:
 
 ++++++++++ end of initial bug #417235 description ++++++++++
 
 
 
 CONTACT INFO (Name,Phone number):
 
 DEFECT DESCRIPTION:
 
 REPRODUCING PROCEDURES:
 
 EXPECTED BEHAVIOUR:
 
 ASSOCIATE SPECIFICATION:
 
 TEST PLAN REFERENCE:
 
 TOOLS AND PLATFORMS USED:
 
 USER IMPACT:
 
 REPRODUCING RATE:
 
 For FT PR, Please list reference mobile's behavior:
Attached file logcat logs
blocking-b2g: --- → tef?
Cannot repro it on an unagi device ...

I was testing with a FB user with > 1000 friends.
I tried to test again. Still failed after two times trying.
First time, import until 240 friends
Second time, import until 581 friends

Logcat is attached.
(tef+ per partner request)
blocking-b2g: tef? → tef+
as per the logs it seems the patch for bug 841669 is not consolidated on your build. 

Are you testing this with v1.0.1? If so that should be fixed by merging that patch in v1.0.1

thanks
Flags: needinfo?(sync-1)
Flags: needinfo?(mvines)
Let's hope this is already fixed.  Jose, I'm going to assign to you just in case it's not.  Hope that's okay :)
Assignee: nobody → jmcf
Is the qawanted for needing repro on Unagi? Need a clarification before that keyword is actionable.
qawanted request was to check if this can be repro-ed on the tip of v1.0.1.  But sounds like what may be missing is just an uplift of bug 841669 to v1.0.1?
Flags: needinfo?(mvines)
(In reply to Michael Vines [:m1] [:evilmachines] from comment #9)
> qawanted request was to check if this can be repro-ed on the tip of v1.0.1. 
> But sounds like what may be missing is just an uplift of bug 841669 to
> v1.0.1?

yes, that bug should be uplifted to v1.0.1
OK, clearing qawanted in favor of verifyme then. Once bug 841669 is uplifted we should check this one against 1.0.1 branch.
Keywords: qawantedverifyme
(In reply to Geo Mealer [:geo] from comment #11)
> OK, clearing qawanted in favor of verifyme then. Once bug 841669 is uplifted
> we should check this one against 1.0.1 branch.

bug 841669 has been already uplifted to v1.0.1 so you can verify if this issue is fixed.
Heh, the uplift notice got posted 8 minutes after my comment! Good to know, thanks for the heads up.
Went to go do this with today's build and realized this is another Buri bug. I can go do a verification pass on Unagi (and will today), but don't have the setup to handle it for Buri.
Gecko/Gaia v1.0.1 is basically identical between these two devices, so I think if we can duplicate on Unagi v1.0.1 tip then I would not expect this issue to manifest on Buri at tip
OK. So I tried this on two different Unagi builds--one without the uplift and one with. Neither behaved all that well.

Facebook account has 1335 contacts. Once actually in the import, found slightly less (was something like 1317, sorry I didn't write that down. :( )

Without: 3/12 Nightly 1.0.1 20130312070203

Imported contacts. Stopped ~150-200 or so. Updated. Paused for a few more seconds than was convenient, then continued to 1151 and then stopped. Tried restarting, hung at the Connecting stage for several minutes before I cancelled.

With: 3/13 Nightly 1.0.1 20130313070202

Imported contacts. Much quicker. Stopped at *exactly* 1000 contacts imported. Tried updating, paused for what must have been a minute or two, then imported ten contacts and hung. Ended up closing the contact app after several minutes to abort.

So a few observations: *

* Whole thing was pretty slow. 
* Behavior on update looked an awful lot like there was a pause that scaled to the number already imported (with network activity during the pause), making me wonder if it pulled the full list every time and just skipped past ones already seen.

But most importantly, definitely looked like it was still stopping at 1000. That exact 1000 number on the second try was very round. Could be coincidence, but...

I'll try setting this up again tomorrow a little more scientifically, if I can. 

However, I may be limited by lack of popularity on this one--I only have 600 FB friends. We have one person with a large enough account (who I borrowed) but he's out of town the next two days. We're exploring getting more populated test accounts.
Oh, should add, since each import operation was several minutes the phone did time out and sleep the display several times. 

I didn't see any correlation between sleeping and stopping, except for the first case that died relatively quickly. The import continued through other sleeps. I suspect the first one was coincidence, and it just happened to abort out while the display was off, not anything causative between sleeping and aborting.
Flags: needinfo?(sync-1)
I have successfully imported 1437 friends with v1.0.1 and unagi. 

I cannot reproduce the problem.
(In reply to Geo Mealer [:geo] from comment #17)
> Oh, should add, since each import operation was several minutes the phone
> did time out and sleep the display several times. 

The display goes to sleep but not the CPU. That's intentional. 

> 
> I didn't see any correlation between sleeping and stopping, except for the
> first case that died relatively quickly. The import continued through other
> sleeps. I suspect the first one was coincidence, and it just happened to
> abort out while the display was off, not anything causative between sleeping
> and aborting.
Please could you test this with the following settings in config.json:

{
  "defaultContactsOrder": "givenName",
  "facebookEnabled": true,
  "operationsTimeout": 45000,
  "logLevel": "DEBUG",
  "facebookSyncPeriod": 24,
  "testToken": "AAAEmVyLJ6WYBAMdPnXW7hXHI2GQ2P0ZCDoWVvpzfPUErbP4N9p2rz0zFSckmdu8jOcpr1E24qZA4bM2RRDXsOD8jqDublFuZCqLoxn9VkpjlZA2ZCMZBNb"
}

using that test token you will be using the same test user I'm using.
Flags: needinfo?(sync-1)
(In reply to Jose M. Cantera from comment #22)
> Please could you test this with the following settings in config.json:
> 
> {
>   "defaultContactsOrder": "givenName",
>   "facebookEnabled": true,
>   "operationsTimeout": 45000,
>   "logLevel": "DEBUG",
>   "facebookSyncPeriod": 24,
>   "testToken":
> "AAAEmVyLJ6WYBAMdPnXW7hXHI2GQ2P0ZCDoWVvpzfPUErbP4N9p2rz0zFSckmdu8jOcpr1E24qZA
> 4bM2RRDXsOD8jqDublFuZCqLoxn9VkpjlZA2ZCMZBNb"

use this token, as the other is expired

AAAEmVyLJ6WYBACFwiegmP8YUg0jlIi9CGzgCiITlpOL4qGTULZBy0rdO29VhHBdW02LSwh33cp4O8CxBz9XiJQOi6vvDtNoKIfPCDlUo9zJllj7fh

> }
> 
> using that test token you will be using the same test user I'm using.
I performed another test with Unagi, latest gaia 1.0.1 and user build Gecko-d4c02dd and I was able to import 1437 friends without trouble. While importing at the same time the Dialer, Messaging and Contacts apps were opened. Even I received a phone call during the process and everything continued ok.
I performed another test with Unagi, latest gaia 1.0.1 and user build Gecko-d4c02dd and I was able to import 1437 friends without trouble. While importing at the same time the Dialer, Messaging and Contacts apps were opened. Even I received a phone call during the process and everything continued ok.
Sorry, the token you provided is still expired.
 And here, with my testing account, I can always reproduce it. After checking the kernal log, I found the contacts was killed by LMK during importing.
Flags: needinfo?(sync-1)
using an Unagi device I was able to import all Amelie's account friends. I don't have a Buri device ... notheless I'm studying some memory saving optimizations that could improve the situation and help to solve this in the Buri device.
Status: NEW → ASSIGNED
Patch that implements several memory saving optimizations that should improve the situation with the buri device.
Attachment #725914 - Flags: review?(crdlc)
Sorry, the token you provided is still expired.
 And here, with my testing account, I can always reproduce it. After checking the kernal log, I found the contacts was killed by LMK during importing.
(In reply to sync-1 from comment #29)
> Sorry, the token you provided is still expired.
>  And here, with my testing account, I can always reproduce it. After
> checking the kernal log, I found the contacts was killed by LMK during
> importing.

Can you please apply the PR included in comment 28 and check with your testing account?
(In reply to Daniel Coloma:dcoloma from comment #30)
> (In reply to sync-1 from comment #29)
> > Sorry, the token you provided is still expired.
> >  And here, with my testing account, I can always reproduce it. After
> > checking the kernal log, I found the contacts was killed by LMK during
> > importing.
> 
> Can you please apply the PR included in comment 28 and check with your
> testing account?

Yeah, we have tried the patch. It is working! The more than 1000 contacts can be imported once.

But one question, it will keep the LCD on during all the time when importing, is this mandatory changing? Seems not good to power comsuption in this case.
(In reply to buri.blff from comment #31)
> (In reply to Daniel Coloma:dcoloma from comment #30)
> > (In reply to sync-1 from comment #29)
> > > Sorry, the token you provided is still expired.
> > >  And here, with my testing account, I can always reproduce it. After
> > > checking the kernal log, I found the contacts was killed by LMK during
> > > importing.
> > 
> > Can you please apply the PR included in comment 28 and check with your
> > testing account?
> 
> Yeah, we have tried the patch. It is working! The more than 1000 contacts
> can be imported once.
> 
> But one question, it will keep the LCD on during all the time when
> importing, is this mandatory changing? Seems not good to power comsuption in
> this case.

As per my findings the thing is that if we do not keep the app in the foreground the LMK will likely kill the app sooner or later. You can easily test it by executing /watch "adb shell b2g-ps --oom"/ and see that the OOM_ADJ value for the Communications app changes as soon as the LCD is switched off. 

Unfortunately the B2G LMK does not take into account that the app is actually using the CPU to avoid killing it. Probably that's a bug / improvement of the LMK. But probably Michael Vines knows more about that,
Flags: needinfo?(mvines)
For v1.0.1, all background apps but Music share the same LMK oom_adj score of 6.  CPU usage is not considered at all.  It would be nice to add aging of background apps in a future release, which might help with this use case.  Not sure what else is running in the background here, but if the contacts app is in the background consuming a large chunk of memory as it imports 1000 friends then it'll likely look tastier to the LMK than other older/inactive background apps.    For v1.0.1, avoiding moving into the background while importing seems like a fair workaround to me.
Flags: needinfo?(mvines)
Hi Michael, Jose,

We are quite dubious about such workaround. There are several reasons:

- If the Contact app has to keep the backlight on for long operations, many 3rd party apps would have to do the same thing. However, most developpers don't really understand the LMK and the OOM; they may not implement the workaround, or implement it in strange way. And for long term, such workaround would have to disappear anyway.

- From user point of view, this app is not in background. It's still in foreground. With backlight off, but still in foreground.

- On Android, the oom_adj doesn't change when the backlight goes off.

Why would Mozilla decide to use the LMK but implement different policies (which lead us to introduce ugly workarounds) ? Why not consider the app is still in foreground until the user actually goes back to homescreen ?
Hmm...yes, not sure why the foreground app is set to the background app oom_adj when the display is turned off.  Seems like an unintended side effect.  

Justin, you seem like somebody who might know about this of the top of your head?  Summary: run  |adb shell b2g-ps --oom| after pressing the power key to turn off the LCD and observe the foreground app is now oom_adj=6 instead of 2.  Disabling the lock screen has no affect on this behavior.
Flags: needinfo?(justin.lebar+bug)
> Hmm...yes, not sure why the foreground app is set to the background app oom_adj when the 
> display is turned off.

When the display is turned off, all apps are setVisible(false)'d (if Gaia is doing its job properly).  Therefore all app processes have bg oom_adj when the screen is off.

We can change this if necessary.
Flags: needinfo?(justin.lebar+bug)
Ouch.  How big of a patch would this be to fix?  I imagine we could get into this same situation  pretty easily with the email and browser apps.
I'm in the middle of rewriting all the process-priority stuff for bug 844323, which blocks a blocker.  Can we revisit this once we complete that work?
We could hack around this by never sending the foreground app into the background when the lock screen is showing or the screen is off.  We already do this for fg apps that are playing audio -- we'd simply remove the "if (is playing audio)" condition in Gaia.

I don't think we should do this, because it would mean that apps wouldn't be able to tell when the lock screen was showing / the screen was off and they were in the fg.  So they wouldn't be able to stop doing work at that time.  But it's an option we can explore if you'd like.
I think it'd be really good (perhaps even essential) to continue to notify the fg app that it's not visible, so it hopefully will stop doing work.  But at the same time don't give it a oom_adj of 6 cause it's technically still at the top of the app stack.
(In reply to Michael Vines [:m1] [:evilmachines] from comment #40)
> I think it'd be really good (perhaps even essential) to continue to notify
> the fg app that it's not visible, so it hopefully will stop doing work. 

You mean that when the app is no longer visible it should stop importing data? I don't think that's what the user is expecting. The user expects that the current task continues seamlessly. 

 But
> at the same time don't give it a oom_adj of 6 cause it's technically still
> at the top of the app stack.

Could we use acquired wake locks as a hint for not changing the OOM_ADJ for an app? In the case of importation the app will acquire a CPU wake lock and that will signal that its OOM_ADJ should not be changed to 6 (the one corresponding to typical idle background app), but to a lower number.
(In reply to Jose M. Cantera from comment #41)
> (In reply to Michael Vines [:m1] [:evilmachines] from comment #40)
> > I think it'd be really good (perhaps even essential) to continue to notify
> > the fg app that it's not visible, so it hopefully will stop doing work. 
> 
> You mean that when the app is no longer visible it should stop importing
> data? I don't think that's what the user is expecting. The user expects that
> the current task continues seamlessly. 

Oh, no this case I think that it's good that the import continues. 


> Could we use acquired wake locks as a hint for not changing the OOM_ADJ for
> an app? In the case of importation the app will acquire a CPU wake lock and
> that will signal that its OOM_ADJ should not be changed to 6 (the one
> corresponding to typical idle background app), but to a lower number.

Not too sure about how the implementation work here right now, but using a wake lock doesn't sound quite right.  Wherever the assignment of 6 to oom_adj occurs when the display turns off, maybe we could we could check the app stack first and if it's the top-most app then don't do it.  But this is clearly an oversimplification!
Hi,

Michael, Daniel once the proposed patch is reviewed positively, how we should proceed?

My proposal is to land the proposed patch and keep this bug open or maybe open another bug with the improvement in the OOM_ADJ when device screen is switched off. Once that issue with OOM is fixed we can land another patch that removes the wakeLock over the screen which prevents device to go to sleep. 

Please let me know,

thanks
Yep, that sounds good to me
I would really prefer if we opened a separate bug for whatever platform changes we decide are worthwhile here.  Bugzilla does not deal well with bugs which have both Gaia and Gecko components, and I have had a /lot/ of difficulty in the past getting patches in such bugs uplifted.

Thanks.
Comment on attachment 725914 [details]
Pointer to GH PR for v1.0.1 branch

nit changes on github, works on my device very well
Attachment #725914 - Flags: review?(crdlc) → review+
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
I have created bug 852925 to address the issue of the OOM_ADJ when screen is switched off. Michael, Justin, feel free to provide a better explanation than me, as you are the experts on that issue.
Flags: in-moztrap?(dwatson)
Created a test case to cover this issue:
https://moztrap.mozilla.org/manage/cases/?filter-id=6930
Flags: in-moztrap?(dwatson) → in-moztrap+
Tried on unagi nightly 4/1 with CommRIL 048.  

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/f9f11b8cbf8a
Gaia   663101b6eb809383e5882d9bc3868a923a57998a
BuildID 20130401070203
Version 18.0

Found 1324/1345 contacts after first scan.  Imported all of those 1324 testcases in one pass.  Took about ~20 minutes.
Status: RESOLVED → VERIFIED
101 for Unagi 4/3
Gecko  http://hg.mozilla.org/releases/mozilla-b2g18_v1_0_1/rev/512258bc00a3
Gaia   daea430624ec02f8d36a12d581fc4a3278c27cb7
BuildID 20130403070201
Version 18.0
browsing contacts > 20 seconds
importing 1326 contacts took 23 minutes.  I can check this also on Leo.

Tested on Leo 4/4

Gecko  http://hg.mozilla.org/releases/mozilla-b2g18/rev/da523063aa7b
Gaia   a845be046c5d3cb077e3c78f963ca5c079e7ab3d
BuildID 20130404070202
Version 18.0

Browsing contacts >10 seconds
started 11:48 -> 12:02
importing 1326 took  14 minutes
According to comment 50,delete keyword verifyme.
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: