Closed Bug 1048841 Opened 8 years ago Closed 7 years ago

[Messages] investigate performance hit from IndexedDB at startup (eg: Contact lookup)

Categories

(Firefox OS Graveyard :: Gaia::SMS, defect, P3)

ARM
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WONTFIX
2.1 S3 (29aug)

People

(Reporter: julienw, Assigned: steveck)

References

Details

(Keywords: perf, Whiteboard: [p(2.1S3)=1][p(2.1S2)=2][c=startup p= s= u=])

Attachments

(3 files)

In this bug, we want to investigate how doing less IDB at startup could help the initial rendering. We can try to cache contact lookup for the first panel.
Target Milestone: --- → 2.1 S2 (15aug)
Assignee: nobody → schung
Some brief profiling result about rendering time with first panel of threads(10 latest threads):

          Original  |  without facebook db request  |  without both facebook db and contacts db
        ----------------------------------------------------------------------------------------
AVG time for        |                               |     
rendering   752 ms  |          742 ms               |              703 ms
1st panel           |                               |
            
The result shows that reducing the contact look up step only decrease 50ms in average. Considering the  effort for caching the contacts, we might need some aggressive way to improve this part. Maybe we can try to cache threads data in 1st panel, but this changes also means we need to deal with the cache update and make sure no outdated item in thread list view. I'll make a simple experiment to see how much time will take if we cache the threads data.
Status: NEW → ASSIGNED
Keywords: perf
Whiteboard: [p=2] → [p=2][c=startup p= s= u=]
Priority: -- → P3
Attached file Link to github
Hi Julien, it's a WIP that simply show the possible performance if we cache latest threads for rendering. If you think result is good enough that we should give it a try for QC blocker, I can refine the patch for review next time.

It only takes ~150ms from app loaded to first view ready. It's much faster comparing the original one (around 750ms), and entering share activity could also get some benefits from this change. I might move the cache threads into another script, and apply event dispatch to theads object for updating the cache by monitoring threads changes. The only concern is outdated threads cached in thread but unable to store in localStorage because of OOM or something.
Attachment #8469964 - Flags: feedback?(felash)
Comment on attachment 8469964 [details] [review]
Link to github

yeah, this looks really good
Attachment #8469964 - Flags: feedback?(felash) → feedback+
Comment on attachment 8469964 [details] [review]
Link to github

I update the patch with cache update mechanism, so thread changes will be store in the cache now. Although the localStorage read speed faster than asyncStorage, it actually put the cache loading effort in the app preload time, that means load time will be increased obviously. The average load time became 700~750ms after applying the localStorage cache. But more serious issue is in cache updating. It didn't block UI while set the cache, however, it increase load time if cache is updated. The most case I've ever seen could up to 1 sec(load time), and this may become worse than original version, which seems not a convincing performance to apply the localStorage cache...

So I made another asyncStorage cache here: https://github.com/steveck-chung/gaia/tree/message-cache-asyncStorage. I moved cache init before load event(right inside ThreadCache script actually). It'll make the cache return earlier, but this move also affect the load time. Another thing is updated cache will affect the performance as well, the cache return time will have a delay around 200~250 ms if cache updated.
Attached file Link to github
Hi Julien, this patch is using asyncStorage for cache. I move the cache initialize as early as possible, and overall first panel rendering performance should be almost the same as localStorage version. Please note the first item rendering would be delay comparing the original, but overall first panel time could be faster since cache data is in memory. A noticeable drawback is in thread update, it might cause overall time become slower on certain devices like buri. Seems we could not avoid it ...
Attachment #8471505 - Flags: feedback?(felash)
Comment on attachment 8471505 [details] [review]
Link to github

I'm quite sad: on my buri, master (after bug 1051852 landed) is faster than both of the patches (localStorage or asyncStorage) :(

Note that in the localStorage patch, you need to add "navigator.mozL10n.once()" around "appendThread" calls so that l10n does not throw you an error.

One theory is that the DB in Gecko stays in memory between launches, while the DB in asyncStorage does not, so in the end it's faster if you merely relaunch the app.

However... I've reverted bug 1051852 and now I can see that your patch + revert bug 1051852 is faster than master (with bug 1051852).

From a user point of view, I think the behavior on master feels better (there is no "white" flash). I don't know if we can make your patch better :/

Also I tested on Buri so maybe on a device with 2 CPU it's different, so I'm waiting for your own tests too.
Attachment #8471505 - Flags: feedback?(felash)
Another lead could be Doug's patch in bug 865743. Never had the time to finally look at it properly but maybe it can make a faster rendering without caching?
(In reply to Julien Wajsberg [:julienw] from comment #6)
> Comment on attachment 8471505 [details] [review]
> Link to github
> 
> I'm quite sad: on my buri, master (after bug 1051852 landed) is faster than
> both of the patches (localStorage or asyncStorage) :(
> 
> Note that in the localStorage patch, you need to add
> "navigator.mozL10n.once()" around "appendThread" calls so that l10n does not
> throw you an error.
> 
> One theory is that the DB in Gecko stays in memory between launches, while
> the DB in asyncStorage does not, so in the end it's faster if you merely
> relaunch the app.
> 
> However... I've reverted bug 1051852 and now I can see that your patch +
> revert bug 1051852 is faster than master (with bug 1051852).
> 
> From a user point of view, I think the behavior on master feels better
> (there is no "white" flash). I don't know if we can make your patch better :/
> 
> Also I tested on Buri so maybe on a device with 2 CPU it's different, so I'm
> waiting for your own tests too.

Here is the slow motion video for buri and flame comparison (both with latest build and bug 1051852 applied, device at left applied cache patch):

Buri: https://www.youtube.com/watch?v=xs1WUqpMLRs&list=UUkM1XSMB9c0p9684qCa6ZAA
Flame: https://www.youtube.com/watch?v=823dS2_TnF4&list=UUkM1XSMB9c0p9684qCa6ZAA


On the Buri, yes the effect is not obvious, so maybe this changes didn't improve much the performance on single cpu devices... :-/ So maybe we should abandon the cache idea?

Without caching, I got another idea that moving the getThreads operation as early as possible, before init anything or even load complete. And we render data in Threads map at proper timing, wdyt?
(In reply to Julien Wajsberg [:julienw] from comment #7)
> Another lead could be Doug's patch in bug 865743. Never had the time to
> finally look at it properly but maybe it can make a faster rendering without
> caching?

What about Yan's patch in bug 972731? I've been needinfo'd there for a ton of time but didn't find some extra time to do more proper testing on that but it looked very promising.
Hi Julien, I've tried many ways I could think of for improving the launch time. But there is no easy and obvious method in gaia to be honest. Ignoring the contacts and draft for rendering first view might saving ~150ms, but is not acceptable form UX POV because of flashing screen. If we cache contact, message app could not know any contact changes while app is close, which means we still need to looks up the contacts DB and update after launch, it's still not a idea behavior for user.

I also tried some experiment on batch rendering for less reflow, for the best case, it could be 1 or 2 threads faster for first view, and result is when rendering 1 panel at once. If we narrow down the batch window(For example, batching 3 threads for rendering), it looks no faster at all. However, If we batch length of panel, it's still not guarantee faster for all cases because the threads in first panel is not fixed, it could be less because of time header, so it's still possible become slower if we have 7 threads in first view but batch window size is 9. 

After some discussion with vicamo, he suggested that we could try to push action into callback and keep the cursor continue as early as possible, but the effect it still not obvious. It just improves 30~50ms for 1st view in average. But it safer and much less changes comparing the contact/threads cache and batch rendering. I also move the threads rendering to init and remove some steps before rendering the threads, Although it could only reduced ~100ms or less in launch time, this is the only way I could think of for less side effect for now... :-/
Attachment #8474384 - Flags: feedback?(felash)
Comment on attachment 8474384 [details] [diff] [review]
quick patch for tiny improment

Review of attachment 8474384 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see any difference when comparing on my Buris :(

Some of the changes could be a nice cleanup (for example, where we remove the content when there was no content before), but I don't see that it improves anything performance-wise...
Attachment #8474384 - Flags: feedback?(felash)
That said, if you see a tiny difference on your side, I'm not against the change itself. It looks safe enough.
(In reply to Steve Chung [:steveck] from comment #10)
> I also tried some experiment on batch rendering for less reflow, for the
> best case, it could be 1 or 2 threads faster for first view, and result is
> when rendering 1 panel at once. If we narrow down the batch window(For
> example, batching 3 threads for rendering), it looks no faster at all.
> However, If we batch length of panel, it's still not guarantee faster for
> all cases because the threads in first panel is not fixed, it could be less
> because of time header, so it's still possible become slower if we have 7
> threads in first view but batch window size is 9. 

We have already explored batching and it does have its benefits but it also comes with some downsides which have us hold it off for now.
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → ARM
Whiteboard: [p=2][c=startup p= s= u=] → [p(2.1S3)=1][p(2.1S2)=2][c=startup p= s= u=]
Target Milestone: 2.1 S2 (15aug) → 2.1 S3 (29aug)
We found no useful improvements following this lead.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.