Closed Bug 904973 Opened 11 years ago Closed 11 years ago

[zffos1.1][Category] Export Contacts from SD card error

Categories

(Firefox OS Graveyard :: Gaia::Contacts, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: zhang.dapeng, Assigned: bkelly, NeedInfo)

References

Details

(Keywords: perf, Whiteboard: [c= p=4 s= u=])

Attachments

(5 files, 1 obsolete file)

User Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; TCO_20130814092455)

Steps to reproduce:

Export Contacts from SD card (SD card contains a large capacity vcard)


Actual results:

Blocking in the reading interface,The previous version can import correctly,Attach the vcard file that can reproduce the problem


Expected results:

can import correctly
Blocks: 899451
leo+ requested per bug 899451.
blocking-b2g: --- → leo?
Two questions:
1. Need more info on if this issue happens on other kinds of vcard?
2. Does this also happen if we don't use this import but import through outlook?
Flags: needinfo?(zhang.dapeng)
Wilfred, was this a 1.1 committed feature?
Flags: needinfo?(wmathanaraj)
yes - v1.1 must have import contacts from SD card. This was an operator launch blocker. Implementation was done by Telenor eng team.
Flags: needinfo?(wmathanaraj) → needinfo?(sergi.mansilla)
Assignee: nobody → sergi.mansilla
Flags: needinfo?(sergi.mansilla)
Broken new 1.1 feature = leo+
blocking-b2g: leo? → leo+
Big vcf file, could the issue be in fact the same as in bug 904313 ?
Hi Sergi,

This one seems to be the same as 904313 according to its symptom and description. Will you use this bug or 904313 for the investigation and fixings. I suggest using this one since this is leo+ for ZTE v1.1 release. What do you think?
Flags: needinfo?(sergi.mansilla)
Hi Ivan,

That makes sense, I will update investigation and fixings here.

Cheers.
Flags: needinfo?(sergi.mansilla)
Summary: [zffos1.1][Category] Export Contacts from SD card error → [zffos1.1][Category] Import Contacts from SD card error
Bug 905656 might be a dupe of this one.
Hi Sergi,

Thank you for helping on this case. May I know if there is any target date to resolve this issue?
Flags: needinfo?(sergi.mansilla)
Hi Sergi,,

May I know when this issue will be resolved? Is it possible to be done this Friday? Our partner is targeting on 9/9 to integrate our latest gaia for the TA build. Thank you!
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Ivan

please check if ZTE really needs this bug to be fixed.
Flags: needinfo?(itsay)
Since this is not IOT certification bug, it is not blocker for ZTE release now
Flags: needinfo?(itsay)
Any update on this one? This is blocking some of the performance tasks
Blocks: 894751
Minus, based on Comment 14.

Hub,
Please point out clearly what performance tasks are blocked.
blocking-b2g: leo+ → -
Flags: needinfo?(hub)
Bug 894751. It is not a hard blocker, but I really wanted to know where it was at. I know Sergi was on holiday for a bit, etc.

Thanks
Hi,

I was on holiday but I am working on this now. So far I have a streaming parser that shouldn't consume as much memory as before, but after parsing the first batch (of 100 contacts) quite fast, it gets stuck with the second batch. Looking into whether the problem is in my code or in mozContacts. I'd appreciate if somebody has suggestions or can shed some light on it.

The branch is https://github.com/comoyo/gaia/compare/vcard_streaming_4
Flags: needinfo?(sergi.mansilla)
Flags: needinfo?(hub)
Attached file Profiler JSON trace
I attach a profiler trace from the App Manager taken after 2 rounds of batch processing (when the app starts stalling). Looks like the load is mostly on the back-end side of Contacts.
I pulled vcard_streaming_5 (_4 didn't work on m-c) and it is stuck at 1/25000.

But given comment 18 I'm not surprised.
And the cancel button is unresponsive.
Hey,

After more investigation, it is pretty clear that there is a considerable memory leak. Trying with 6000 contacts, it starts very fast and it slows down gradually. I am watching b2g-ps in the devices shell and memory clearly increases and doesn't free up enough. Branch vcard_streaming_5 shows it more clearly (https://github.com/comoyo/gaia/tree/vcard_streaming_5).

My experience debugging memory issues in B2G/Gaia is pretty limited though. I need some pointers or help in how to find out what's leaking exactly. Julien, can you point me in the right direction or to somebody that can help with these kind of issues?
Flags: needinfo?(felash)
Keywords: perf
Whiteboard: [c= p= s= u=1.2]
Sergi, is it really leaking, or is the memory released at the end of the import ?

My thought here is that we're maybe not waiting that one import is finished before doing the next one, and therefore the memory piles up, and this probably gets slow because of parallel database operations.

But I didn't look at the code at all so I may be completely wrong here !

Gabriele will probably have pointers on how to debug the memory usage.
Flags: needinfo?(felash) → needinfo?(gsvelto)
I suggest having a look at the tutorial written by Justin Lebar a few months ago to diagnose if this is an OOM issue and what can be done about it:

https://wiki.mozilla.org/B2G/Debugging_OOMs

If you run into problems while running one of the steps contained in the tutorial feel free to ping me on IRC or send me an e-mail; I'll be glad to help.
Flags: needinfo?(gsvelto)
I confirmed it is not a OOM, but I couldn't get gdb to work (complains about the lack of gdb-server).

However, I found the bottleneck that causes importing to slow down and possibly eventually crash. It is navigator.mozContacts.save. Upon substituting mozContacts.save by a timeout, the importing runs flawlessly and consistently fast, but of course it doesn't save the final contacts in the phone.

Who can I talk to about internal mozContacts.save performance and how to trace exactly what happens there? It is hard to believe that this would be because of too many saves, since it is done by batches and sequentially.
So by the scenario pointed out above, it could still be what Julien suggests. I will investigate.
The contact saves are definitely happening sequentially, and each one is saved only when the previous one calls its `onsuccess` handler, which should mean that the contact has been saved to the DB and thus saving the next one should not overload it.

I mentioned it above, but I am currently working in the branch https://github.com/comoyo/gaia/tree/vcard_streaming_5 in case anybody wants to give it a try.
(In reply to Sergi Mansilla from comment #27)
> Who can I talk to about internal mozContacts.save performance and how to
> trace exactly what happens there? It is hard to believe that this would be
> because of too many saves, since it is done by batches and sequentially.

Gregor will know.
Flags: needinfo?(anygregor)
mozContacts.save is not highly optimized. 
It's also an async operation and if you call it in a loop without waiting for the first results to get completed you might run into problems.

So it seems the API stops working along the way right? I haven't looked at your code. How are you using mozcontacts.save? Have you tried adding contacts in chunks?

I cced Reuben and Ben here. They have a lot of experience with contacts performance.
Flags: needinfo?(anygregor)
ni? to myself as a reminder to look at this per comment 31.
Flags: needinfo?(bkelly)
I can't load profile_1.json in Cleopatra. It just says (empty) and everything in the sidebar is says undefined. I wonder if we're somehow not deleting things from the _cachedContacts array.
Thanks Gregor.

I am indeed adding contacts in chunks, and each chunk is processed only when the previous one has finished. I would appreciate a second pair of eyes on the code, though :)
Sergi, can you try removing the mozContacts.oncontactchanged handler in contacts.js?  The slow down may be due to work done in the oncontactchanged handler which will be triggered for each mozContacts.save().
Whiteboard: [c= p= s= u=1.2] → [c= p= s= u=]
koi+ since this blocks bug 894751 whic is also a koi (1.2) blocker.
Assignee: sergi.mansilla → bkelly
blocking-b2g: - → koi+
Flags: needinfo?(bkelly)
Whiteboard: [c= p= s= u=] → [c= p= s= u=1.2]
Priority: -- → P1
As of bug 894751 comment 64 this is no longer blocking a blocker.  Therefore I'm dropping the koi+ here as well.
blocking-b2g: koi+ → ---
Whiteboard: [c= p= s= u=1.2] → [c= p= s= u=]
Priority: P1 → --
Sergi, I was able to get this working against the v1.2 branch here:

  https://github.com/wanderview/gaia/tree/contacts-stream-vcard-v1.2

The changes I made are in this commit:

  https://github.com/wanderview/gaia/commit/6cd5602f5db5c425dc1f4287cfce3f555833b240

Unfortunately your branch did not apply cleanly to master, so I have not tried there.

Can you confirm if these changes fix the issues you were having with import?  If so, do you have time to get this rebased and landed against master?
Flags: needinfo?(sergi.mansilla)
Hi Ben,

Thanks for your work! I am testing your patch right now against a 1.2 Peak and 5000 contacts. It has processed ~440 contacts already without crashing but it gets gradually slower over time. At 400 contacts processed it takes about 10 seconds to process 5 more contacts, which is pretty slow. On the other hand, it hasn't crashed yet. It could just be that the contacts API just doesn't perform, which would be another problem out of the scope of this issue.

After that I will try with a different `CONCURRENCY` value instead of 5 and see if it performs a big better, but I doubt it.

Thoughts?
Flags: needinfo?(sergi.mansilla)
Ben, unfortunately it crashed past 1000 contacts.
To know if this is a mozContact issue, just import like 500 contacts, and then start again to see if this is still slow of if this is fast again.
Sergi can you attach the vcard file you are using?  Does it have a bunch of contacts that start with the same name?  Like "Test1", "Test2", etc?  If so, then the slowness is probably due to O(n) insertion sort here:

  https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/views/list.js#L1125

To test this theory you can just disable the code by hacking out the contactchange event handler here:

  https://github.com/mozilla-b2g/gaia/blob/master/apps/communications/contacts/js/contacts.js#L684

In regards to the crash, I'll probably need your vcard file to look at it.

Thanks!
Flags: needinfo?(sergi.mansilla)
(In reply to Sergi Mansilla from comment #39)
> At 400 contacts processed it takes
> about 10 seconds to process 5 more contacts, which is pretty slow.

If you're running into this kind of problems it might be worth profiling the application while this is happening. The profile will show you immediately what's being slow and it's faster than trying to guess it from the code.
This is an old stress-test file with 20000 generated vcard contacts. I my contacts from this file (stopping at the first n I need). Since they are generated incrementally, their names have the same root, as Ben asks in https://bugzilla.mozilla.org/show_bug.cgi?id=904973#c42.

I don't have the phone with sdcard with me right now, so I can't properly test what Julien suggests, but I'll try as soon as I can.

Gabriele, are you referring to OOM profiling in the device or in the browser?
Flags: needinfo?(sergi.mansilla)
Thanks Sergi.  I will look at this today.  I can profile it as well.
(In reply to Sergi Mansilla from comment #39)
> Thanks for your work! I am testing your patch right now against a 1.2 Peak
> and 5000 contacts. It has processed ~440 contacts already without crashing
> but it gets gradually slower over time. At 400 contacts processed it takes
> about 10 seconds to process 5 more contacts, which is pretty slow. On the
> other hand, it hasn't crashed yet. It could just be that the contacts API
> just doesn't perform, which would be another problem out of the scope of
> this issue.

So I think the problem here is the workload.  The 20000 vcard file in attachment 828015 [details] uses the same telephone number for all the contacts.  This hits a pathological case in the de-duplication code.  It goes to all the work to find contacts with the same phone number (all of them) and then discards them because the name is different.  This is essentially O(n^2) with an expensive search mixed in for good measure.

Let me see if I can get another large file to test or change this one to use different tel numbers.
Thanks to some perl evil-ness:

  perl -ne 'my $val = sprintf("(%03d) %03d-%04d", int(rand(999)), int(rand(999)), int(rand(9999))); s/\([0-9]{3}\) [0-9]{3}-[0-9]{4}/$val/; print' test.vcf > test2.vcf

Here is a vcard file with 20000 contacts and different phone numbers.

I haven't had a chance to re-run the test, yet.
So with the vcard file using different tel numbers it runs much faster.  It does start to slow down around 1500 to 2000 contacts due to the issue I mentioned in comment 42.  Let me see if I can hack the insertion sort to use a binary search.
Oh, I'm also at 4400+ contacts and still importing without a crash.
Perl ! You know regular expressions ! :D

Ben, why is there this pathological testcase O(n²) ? Aren't we supposed to have an index to lookup the phone number?
(In reply to Julien Wajsberg [:julienw] from comment #50)
> Ben, why is there this pathological testcase O(n²) ? Aren't we supposed to
> have an index to lookup the phone number?

Well, the search itself is not O(n^2).  Each mozContacts.find() seems to run fairly quickly.

The problem is that when you combine the import process, the de-dupe process, and the find all together you get:

  for N vcard records
    find contacts with matching tel/mail
    for X matching contacts
      if (vcard name matches contact name)
        ignore vcard entry
    if (vcard name did not match any contact names)
      save new contact from vcard

When all of the entries in the vcard have the same phone number, but different names, then mozContacts.find() will always be the number of contacts previously imported.  Essentially the inner loop X equals N-1.  So N iterations each containing N-1 iterations; i.e. O(n^2).

I would say we should do a find() on the name instead since it would let us short circuit faster, but unfortunately the name matching logic requires some normalization which I don't think mozContacts can't handle.

Overall the algorithm appears to be working correctly, but was handed an unexpected and unusual set of data.  I think running that data slowly without crashing is a reasonable response.
Thanks for the explanation, this is very clear!

I agree with everything you said :)

I'm asking for a 1.3 blocker to fix the crash.
blocking-b2g: --- → 1.3?
So I found a bunch of small optimizations to the way we update contacts in the app after initial load is complete:

  - Insert contacts into their group list using a binary search insertion
  - Avoid calling imgLoader.reload() on every contact which causes a sync reflow
  - Avoid calling FixedHeader.refresh() on every contact
  - Add the 'hide' style to the list while we are importing to avoid reflowing continuously

With these changes it no longer slows down at 1500 to 2000 contacts.  Its still importing at ~3 contacts per second.

I'm going to write these issues up as separate, dependent bugs.
Whiteboard: [c= p= s= u=] → [c= p=4 s= u=]
Depends on: 936170
Depends on: 936202
Depends on: 936206
Depends on: 936215
Ok, I've added bugs and pull requests for all the dependent bugs.

We still need to rebase the original patch master.  Sergi, do you have time to do this tomorrow morning?  Unfortunately I've run out of time for tonight, but want to avoid missing review tomorrow if we can.  If you don't have time I'll do the rebase tomorrow when I get online.

Thanks!
Flags: needinfo?(sergi.mansilla)
Ben, about the FixedHeader issue, you might want to have a look at the FixedHeader rewrite I did for SMS at [1]. It's well-tested (see [2]) and I tried to make it as efficient as possible. I wanted to put it in shared and use it in the other apps but I never found the time, but now could be a good time...

[1] https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/js/fixed_header.js
[2] https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/test/unit/fixed_header_test.js
(In reply to Julien Wajsberg [:julienw] from comment #55)
> Ben, about the FixedHeader issue, you might want to have a look at the
> FixedHeader rewrite I did for SMS at [1]. It's well-tested (see [2]) and I
> tried to make it as efficient as possible. I wanted to put it in shared and
> use it in the other apps but I never found the time, but now could be a good
> time...
> 
> [1]
> https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/js/fixed_header.js
> [2]
> https://github.com/mozilla-b2g/gaia/blob/master/apps/sms/test/unit/
> fixed_header_test.js

Thanks Julien.  This definitely looks interesting.

I think it might be worth pursuing that as a separate enhancement, though.  It seems like it would require more testing/work to know the impact for contacts.  For example, it looks like the sms version might trigger a sync reflow on refresh() by touching scrollTop and offsetTop in findFixedHeader().

Anyway, I guess I just feel like there is so much going on with this bug already I'd like to avoid expanding the scope unless necessary.
(In reply to Ben Kelly [:bkelly] from comment #56)

> Thanks Julien.  This definitely looks interesting.
> 
> I think it might be worth pursuing that as a separate enhancement, though. 
> It seems like it would require more testing/work to know the impact for
> contacts.  For example, it looks like the sms version might trigger a sync
> reflow on refresh() by touching scrollTop and offsetTop in findFixedHeader().

Except it's done after a setTimeout so chances are the reflow has been done already ;)
But see why I want it in shared! More eyes :)

but yes, in the end, I want "position: fixed" and get rid of this lib ;)

> 
> Anyway, I guess I just feel like there is so much going on with this bug
> already I'd like to avoid expanding the scope unless necessary.

Good for me!
Note, we appear to be under greater memory pressure on master/m-c.  I don't know if its because of gecko or gaia differences.  I think we should still proceed with this fix, though.
Attached patch GitHub PR (obsolete) — Splinter Review
Setting bkelly as the reviewer of the patch since most of the code is mine.
Attachment #829332 - Flags: review?(bkelly)
Flags: needinfo?(sergi.mansilla)
Comment on attachment 829332 [details] [diff] [review]
GitHub PR

Sergi, this is a really good start and works well.  That being said, as I dug into the code I had a number of nits and questions.  I posted these on Github.

Since this is such a large PR and the code is complex I let myself go a bit deeper into style/readability issues than I normally due.  I'm sorry if I was a bit too nit picky.  :-)

Due to the size and complexity of the code, I think I'd like to see this again.  I feel pretty confident this will get an r+ on the next pass.

Thanks!
Attachment #829332 - Flags: review?(bkelly) → review-
Oh, can you also update the branch so that you show as the commit author?  (I'm not actually sure how to do that in git...)
Attached patch GitHub PRSplinter Review
Updated Github pull request with all the issues pointed by the reviewer fixed. I left the different commits so it is clear what has been fixed. After the review is approved I will squash them and change authorship.
Attachment #829332 - Attachment is obsolete: true
Attachment #830176 - Flags: review?(bkelly)
Comment on attachment 830176 [details] [diff] [review]
GitHub PR

Looks good!  Thanks!

r=me, but please squash commits, update the author to be you, and make sure travis is green.  (I can't seem to get to travis tonight to see what the issue is.)
Attachment #830176 - Flags: review?(bkelly) → review+
Squashed, edited author, re-tested and merged, thanks!
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
this is resovled fixed. already in 1.3 master
blocking-b2g: 1.3? → ---
Looks like the commit rev never got added to this bug.  For reference it appears to be:  10c4f80dc0b6326044085af08bbd399fc54c88fd
Summary: [zffos1.1][Category] Import Contacts from SD card error → [zffos1.1][Category] Export Contacts from SD card error
Blocks: 940334
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: