Closed Bug 1045842 Opened 10 years ago Closed 10 years ago

mozApps.mgmt.getAll() is ~200ms slower in 2.0 than 2.1

Categories

(Core Graveyard :: DOM: Apps, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(b2g-v2.0 affected, b2g-v2.1 fixed)

RESOLVED WORKSFORME
Tracking Status
b2g-v2.0 --- affected
b2g-v2.1 --- fixed

People

(Reporter: kgrandon, Assigned: huseby)

References

Details

(Keywords: perf, Whiteboard: [c=progress p= s= u=2.0])

Noticed this while profiling bug 1037706. Here are the numbers I'm seeing for an engineering build with 47 apps.

Master Mozapps getAll() performance: 140 - 320
2.0 Mozapps getAll() performance: 300 - 460
[Blocking Requested - why for this release]:bug 1037706 is a 2.0 blocker and the issue reported in this bug is a strong candidate for significantly reducing the problem reported in bug 1037706.

Kevin,
Are you taking this issue or should I find someone on our fxOS Perf team to own it?
Severity: normal → blocker
blocking-b2g: --- → 2.0?
Flags: needinfo?(kgrandon)
Priority: -- → P1
Whiteboard: [c=progress p= s= u=2.0]
(In reply to Mike Lee [:mlee] from comment #1)
> Kevin,
> Are you taking this issue or should I find someone on our fxOS Perf team to
> own it?

I would prefer someone on perf or an apps owner/peer drive this. I am happy to assist in debugging this though.
Flags: needinfo?(kgrandon)
Blocks: 1045852
Dave, need you to take this issue and work with Kevin to identify the root and solution.
Assignee: nobody → dhuseby
Status: NEW → ASSIGNED
Well, the cause is because 2.1 has bug 903291 but not 2.0. However we won't uplift it so I'm not sure there's much to do here.
(In reply to Fabrice Desré [:fabrice] from comment #4)
> Well, the cause is because 2.1 has bug 903291 but not 2.0. However we won't
> uplift it so I'm not sure there's much to do here.

Is there some way of achieving the same thing but with a less-risky patch?
blocking-b2g: 2.0? → 2.0+
So the main speed cost is in reading the manifest data.  And since we aren't going to uplift the manifest cache patch in Bug 903291, we have to get creative.

1. we can strip down the patch in Bug 903291 to just do the manifest caching and not everything else.
2. kevin, is there a way that the homescreen can know what apps it wants to show icons for above the fold?  maybe cache them?  because, fabrice, then we can talk about adding optional parameters to getAll() or add a getByName() or getByIndexOffest() functions to the API to work around the O(n) complexity of getAll().

to make this faster we have to read fewer manifest file and/or cache the manifest file data after it is read the first time.  I just thought of another idea:

3. kevin, maybe the homescreen app can cache the manifest data and not call getAll() every time?  as long as it is aware of when apps are installed/removed, then it can keep its own cache consistent.
Flags: needinfo?(kgrandon)
Flags: needinfo?(fabrice)
BTW, getAll() is *much* faster now than it was 6 months ago.  I'm measuring between 10-11ms per app.  The main performance problem is the O(n) complexity of getAll().
Not sure what you expect of me there. Feel free to try something of course, but don't make api changes.
If you do 2. you can likely use checkInstalled(in DOMString manifestUrl).
Flags: needinfo?(fabrice)
Cache invalidation in the new homescreen is going to be extremely difficult to implement in 2.0. I would imagine that this would be much more risky than an uplift of bug 903291.

If we can do some magic manifest caching, I think it would be worthwhile to implement and perform some benchmarking.
Flags: needinfo?(kgrandon)
I'm going to attempt to strip down the patch(es) in Bug 903291 to only do manifest caching.  Fabrice, what is it in Bug 903291 that is too risky for uplift?  Will I have any chance of landing a stripped down patch that does manifest caching?
Flags: needinfo?(fabrice)
I don't know if you can write a correct & not risky patch that does just the manifest part. In  Bug 903291 we changed the way we manage the DOM objects and the ipc traffic to reduce ipc from the parent to a given child, and make sure that dom objects's state is consistent between windows in a given process. So this is a very big change, and while it looks ok now on master that's not something to take on 2.0 that late in the game.
Flags: needinfo?(fabrice)
Also, I'm interested in new timings since bug 972076 landed.
Seems like there's a few things we should do here.

1 - Perform timings since bug 972076 landed.

2 - See if there is some manifest caching we can do without the mass refactor.

3 - Reconsider uplifting. We're running out of time, so we need to act fast. Uplifting at this point seems like the most risk-free imo.
(In reply to Kevin Grandon :kgrandon from comment #13)
> 3 - Reconsider uplifting. We're running out of time, so we need to act fast.
> Uplifting at this point seems like the most risk-free imo.

We also have a ton of gaia tests around apps now. Since they're all passing as of now we should be pretty confident in our ability to make changes to this code.
I'm re-running the timing right now.
I tested the timing by changing the JS to collect and output the time it takes to call mozApps.mgmt.getAll().  I rebooted b2g 15 times to measure the execution time of getAll().  I first tested the before changeset (f483f43) and here's what I found:

312,330,292,290,312,237,256,245,296,284,257,370,483,320,307
median: 306 ms
std dev: 60 ms
time per app: 6.9 ms

Then I tested the after (9feaa11) and found:

282,211,207,235,219,231,210,260,302,327,320,350,336,307,327
media: 274 ms
std dev: 52 ms
time per app: 6.2 ms

The patch in Bug 972076 speeds up the call to getAll() by about 10%.  But I want to point out that making getAll() faster is not what we should ultimately do.  Even if we made getAll() only take 1ms per app, if a phone has 100 apps, that is 100ms.  Currently, with the patch from Bug 972076, it takes 6.2 ms per app.  If a phone has 100 apps, it will take 620 ms for this call to return back with data.

I'm going to run the same test against v2.0 tip and see what the results are.
I recommend that we uplift the 2.1 patch.
So I just tested the tip of v2.0 (02126fb) and there is trouble.  Running the same test as in Comment 16 I found this:

474,488,443,473,522,448,433,541,448,439,455,554,437,512,564
median: 482ms
std dev: 45ms
time per app: 11ms

This is much worse than just after the Bug 972076 patch landed.  I want to test v2.1 and then bisect this regression to see what causes things to get much slower between 9feaa11 and 02126fb.
2.1 has bug 903291 and bug 972076 that both provide benefits. 2.0 has none of these so it's not surprising to see a difference between 2.0 and 2.1. I'm even surprised that you don't see more improvement after bug 972076.
In comment 16 you said you changed the JS to collect timinig. Was it on the gaia side or in Webapps.js ?
It was gaia side:

diff --git a/apps/verticalhome/js/sources/application.js b/apps/verticalhome/js/sources/application.js
index 329ec36..0554f97 100644
--- a/apps/verticalhome/js/sources/application.js
+++ b/apps/verticalhome/js/sources/application.js
@@ -6,7 +6,9 @@
   var appMgr = navigator.mozApps.mgmt;
   var apps = null;
 
+  var start_time = performance.now();
   appMgr.getAll().onsuccess = function onsuccess(event) {
+    console.log("XXXXXX: getAll() " + (performance.now() - start_time));
     apps = event.target.result;
     window.dispatchEvent(new CustomEvent('navigator-mozapps-ready'));
   };
Flags: needinfo?(fabrice)
ok. I was measuring chrome side. Could be interesting to know how much the event dispatching itself costs, but it's mostly out of curiosity.
Flags: needinfo?(fabrice)
Flags: needinfo?(fabrice)
So what is the verdict on this bug?  How are we going to decide what to do?  If we uplift, we take on risk.  If I try to make things faster with new code, we take on risk.  Is this really something we want to block on?
Flags: needinfo?(kgrandon)
I would not block <broken_record_mode>, and certainly not uplift the 2.1 patches<broken_record_mode>.
Flags: needinfo?(fabrice)
(In reply to Fabrice Desré [:fabrice] from comment #23)
> I would not block <broken_record_mode>, and certainly not uplift the 2.1
> patches<broken_record_mode>.

Does this mean we're accepting the regression?
Flags: needinfo?(fabrice)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #25)
> Where is there a regression?

I'm not sure if there is a regression in the mozApps API or not. We would need to compare to 1.4 and I don't think we have yet. There *is* a regression in the homescreen launch time. This is all being tracked in bug 1037706, and there are more details there. There are a few reasons why we have a regression, and we can do more to fix it - but the safest thing to do is to leverage the patches we already have around in 2.1 instead of trying to re-create these gains with new patches.


(In reply to Dave Huseby [:huseby] from comment #22)
> So what is the verdict on this bug?  How are we going to decide what to do? 

I think at this point if we want to ship the vertical homescreen in 2.0 we have no choice but to uplift these patches. Unless bug 1037706 is no longer deemed a CAF blocker, then we are fine.
Flags: needinfo?(kgrandon)
Flags: needinfo?(fabrice)
So here's timing data collected with the patch in Comment 20 on master this morning:

281
264
246
263
294
240
294
305
254
270
270
249
255
272
277

average is 269
std dev is 19
avg per app is 6.1ms
Here's numbers for v1.4:

409
386
332
369
402
381
289
415
403
387
441
434
397
361
454

average is 391
std dev is 42
avg per app is 8.9
Thanks Dave. According to your numbers we do have a slight regression in 2.0 of approximately ~90ms. (Based on your latest numbers and findings in comment 18).

I don't really think it's worthwhile to try to narrow down the regression if we already have patches in master that fix this gain and quite a bit more (~200 ms or so based on comment 27 and comment 18).
Summary: mozApps.mgmt.getAll() is ~150ms slower in 2.0 than 2.1 → mozApps.mgmt.getAll() is ~200ms slower in 2.0 than 2.1
QA Whiteboard: [2.0-signoff-need-]
[Blocking Requested - why for this release]:

Given https://bugzilla.mozilla.org/show_bug.cgi?id=1037706 was good enough for 2.0 and any further improvements would be done in 2.1, moving this nom to 2.1 as well.
blocking-b2g: 2.0+ → 2.1?
I don't think we need to block unless we need some specific investigation here still. 2.1 is a lot faster already, so thanks for all of the work and investigation here. Long-term, getAll() is not really a good solution for us and third party app developers, so we should explore some other API solutions for them.
blocking-b2g: 2.1? → ---
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.