Closed Bug 915138 Opened 6 years ago Closed 6 years ago

Cards view can have a maximum of 3 cards

Categories

(Firefox OS Graveyard :: General, defect, P1)

Other
Gonk (Firefox OS)
defect

Tracking

(b2g18 unaffected, b2g-v1.2 affected)

RESOLVED FIXED
Tracking Status
b2g18 --- unaffected
b2g-v1.2 --- affected

People

(Reporter: AlinT, Assigned: jhylands)

References

Details

(Keywords: perf, regression, Whiteboard: [c=memory p=2 s= u=] [fromAutomation] burirun3 burirun1.3-1)

Attachments

(4 files, 6 obsolete files)

Attached file Logcat (obsolete) —
This is reproducible only on master, it does not affect v1-train

STR:
1. Tap 4 (or more) apps on the homescreen: tap one, then tap the home button, tap the second, and so on, so that they remain in the history
2. Long-press the home button

Expected results:
In the cards view there should be 4 items (or more, depending on how many apps were opened)

Actual results:
There are only 3 cards.

The issue has started with the following build:
Gecko  http://hg.mozilla.org/mozilla-central/rev/a468b2e34b04
Gaia  753bed59566ad14c5e032e45d2b320ef9529ca9a
BuildID 20130909195156
Version 26.0a1

The logcat is attached.
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → Other
Whiteboard: [fromAutomation]
This is a tough bug to action. Background processes can be killed by the OOM killer if we run out of application memory. That means we could end up with a maximum of 3 cards in some cases, depending on which apps run and how much memory they consume.

Can you give more information on the gaia ui tests failing here? What's the regression range?
OK that seems to correlate well with the test failure we are seeing.

We have a test that opens 3 apps (Clock, Gallery and Calendar), then opens the cards view and closes them in the reverse of the order listed above.
When closing the Calendar we intermittently get the Gallery app closed at the same time and our test fails.

Jason thinks this might mean a memory spike/regression in one of the three apps (or even in the Cards View code) we're using in the test.

Regression range is since the nightly Unagi build noted in comment #0.
Okay, that implies there's a memory regression on startup of the gallery app.
Component: Gaia::System → Gaia::Gallery
Whiteboard: [fromAutomation] → [fromAutomation][MemShrink]
blocking-b2g: --- → koi?
Do we have clearer steps to reproduce (what sequence of apps to launch, for instance) or a regression range here?  There's not much to go on for investigating this yet.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> Do we have clearer steps to reproduce (what sequence of apps to launch, for
> instance) or a regression range here?  There's not much to go on for
> investigating this yet.

Regression window is in comment 0 with the first failure in a nightly build on:

Gecko  http://hg.mozilla.org/mozilla-central/rev/a468b2e34b04
Gaia  753bed59566ad14c5e032e45d2b320ef9529ca9a
BuildID 20130909195156
Version 26.0a1

STR can be followed using the test in question in Gaia UI Automation.

Zac - Which gaia ui test was used to reproduce this bug?
Flags: needinfo?(zcampbell)
A single data point is not a window.  What is the last known good revision?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #6)
> A single data point is not a window.  What is the last known good revision?

We can't do that easily - B2G does not support per changeset builds unless it's done via manual generation of builds. We can only reduce regression windows down by nightly builds available. Bisections should only be requested if there's absolutely no way to figure out the issue. You've right now got a reproducible automated test + regression window.

Note - That implies that the regression occurred over the weekend between Sept 6th - Sept 9th.
Again, a regression window involves two points, a known good revision and a known bad revision.  You've provided a bad revision.  We need a good revision too.  I'm asking for the last *known* good revision, not the last good revision.  There's no need to build anything, just provide the cset ids for the last nightly that works.
Looks like we first failed with build:

https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-central-unagi-eng/2013/09/2013-09-09-19-51-56/

Which implies we then last passed with build:

https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-central-unagi-eng/2013/09/2013-09-06-04-02-04/

Which means the last known good revision is:

Gecko  http://hg.mozilla.org/mozilla-central/rev/ab5f29823236
Gaia   94e5f269874b02ac0ea796b64ab995fce9efa4b3
Version 26.0a1
Great, thanks.

The gecko change range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=ab5f29823236&tochange=a468b2e34b04
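
For reference, the pushlog link above can be rebuilt from any pair of known-good and known-bad Gecko revisions. This is a minimal sketch that assumes the pushloghtml URL shape shown above; pushlog_url is a hypothetical helper name, not part of any Mozilla tool.

```python
from urllib.parse import urlencode

# Base URL copied from the pushlog link in this comment.
PUSHLOG_BASE = "http://hg.mozilla.org/mozilla-central/pushloghtml"


def pushlog_url(good_rev, bad_rev):
    """Build a pushlog link for the range between good_rev and bad_rev."""
    query = urlencode({"fromchange": good_rev, "tochange": bad_rev})
    return PUSHLOG_BASE + "?" + query


# Last known good and first known bad Gecko revisions from this bug.
print(pushlog_url("ab5f29823236", "a468b2e34b04"))
```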

There's nothing in that range that really jumps out at me.  Bug 817700 might be the culprit, though it has been backed out on trunk so this would be WFM then.  Bug 907745 is a little suspicious, although I don't really understand what it changes.
I think you meant to cc Nical, not me. Unless I'm missing something.
FWIW - Looking at the Gaia::Gallery patches in that regression range at https://github.com/mozilla-b2g/gaia/commits/master/apps/gallery, there are only three commits that were pushed on 9/6:

https://github.com/mozilla-b2g/gaia/commit/946a506e9d312b82822e9593591e0993c8cb0943

https://github.com/mozilla-b2g/gaia/commit/bfe763146301349e30fea0bd5265db28a78b2be2

https://github.com/mozilla-b2g/gaia/commit/3628d9f5829f087782c708981c4863a7d885a96a

Not sure if any of those look suspicious. djf would probably know if any of them are.
None of those commits look suspicious to me. The first is just tests. The third is trivial.  You could try reverting the second, but I doubt it will make a difference.

I'm also not convinced in comment #3 that there is a gallery regression here but maybe I'm just not following the logic. 

Bug 914412 landed 6 days ago and modified the window management code. I don't understand it, but there are comments in that bug about the OOM killer and the danger that the browser app would get killed. It is probably not related to this, because it was only supposed to affect cases where inline activities were launched.  I'm mentioning it only because I don't have any other ideas.  It was a gaia patch, so it is probably easy to revert and test without, if you want to try.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #10)
> Bug 907745 is a little suspicious, although I don't really understand
> what it changes.

It switches the code paths related to compositing video and non-accelerated canvas 2D on b2g (most canvas 2D is now accelerated on b2g since skia-GL canvas landed) to a new architecture that doesn't have all the problems we often face with using Gralloc. I have been instrumenting that code a lot for the last few weeks, and I would be very surprised if it made any difference as far as memory consumption is concerned.
Actually, it is the third commit above that is more substantial than the second. But neither one would even have any effect until your gallery app is scanning photos when you launch it.  (And if it is, that is a memory intensive process, and 3 apps is probably actually pretty good.)

Be sure, when you test, that the gallery is in a stable state and is not finding and parsing metadata for new photos when you launch it, or it really isn't a fair test. If you're not scanning and are just leaving the gallery at its list-of-thumbnails screen, then the commits listed above shouldn't even be running.

And those commits are from the bug that Kyle links to in comment 13, so same thing there.

Gallery does use a lot of memory for images, so it should always be suspect for OOMs.  But in this case it feels like a red herring to me.
(In reply to David Flanagan [:djf] from comment #14)
> None of those commits look suspicious to me. The first is just tests. The
> third is trivial.  You could try reverting the second, but I doubt it will
> make a difference.
> 
> I'm also not convinced in comment #3 that there is a gallery regression here
> but maybe I'm just not following the logic. 

I thought this was related to the gallery because that was the app always getting killed in the test, intermittently, from 9/9/2013 onward. We also know that scanning is a memory intensive operation. Note that this test has not failed like this for quite some time, so I do think there's a regression present here. If it had always been intermittent, then I'd agree that this wasn't a valid test to use as an assessment.

> 
> Bug 914412 landed 6 days ago and modified the window management code. I
> don't understand it but there are comments in that bug about the OOM killer
> and the danger that the browser app would get killed. Probably not related
> to this because it was only supposed to affect cases where inline activities
> were launched.  I'm mentioning it only because I don't have any other ideas.
> It was a gaia patch so probably easy to revert and test without, if you want
> to try.

bug 914412 however doesn't appear to fall in the regression range.

(In reply to David Flanagan [:djf] from comment #16)
> Actually, it is the third commit above that is more substantial than the
> second. But neither one would even have any effect until your gallery app is
> scanning photos when you launch it.  (And if it is, that is a memory
> intensive process, and 3 apps is probably actually pretty good.)

If scanning photos is memory intensive, then why couldn't that be a reason this issue is happening? Could the memory cost of scanning photos have increased, leading to the intermittent test failure?

> 
> Be sure, when you test, that the gallery is a stable state and is not
> finding and parsing metadata for new photos when you launch it, or it really
> isn't a fair test. If you're not scanning and are just leaving the gallery
> in at its list of thumbnails screen, then the commits listed above shouldn't
> even be running.

Why wouldn't a test that includes scanning metadata in the background be a valid test?

> 
> And those commits are from the bug that Kyle links to in comment 13, so same
> thing there.
> 
> Gallery does use a lot of memory for images, so it should always be suspect
> for OOMs.  But in this case it feels like a red herring to me.

The problem I have here is that this test hasn't intermittently failed like this for some time, but now is. The discussion above seems to imply that there may be value to study memory consumption over time during startup of the gallery app with metadata parsing happening and compare against the two target builds mentioned above to see if there's an increase of memory using the same target dataset the automation uses.
DJF, when we run this test (automated) the SD card is wiped before the test, so there are no photos at all in the Gallery. So scanning shouldn't be a factor, unless the scanning takes some time despite finding nothing.

However Jason is correct in that this test was very stable for a long time before this intermittent.
Flags: needinfo?(zcampbell)
Pardon me I didn't fill the needinfo properly!

The test to replicate it is here:
https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/tests/functional/cards_view/test_cards_view_with_three_apps.py

The initial bug didn't mention that this was replicated on an Unagi device.

If it is definitely a memory usage bug it could be sensitive to the device used.
After talking with rwood, the approach we are considering next is:

Run the Gaia UI Test on the last working build and the first affected build while running adb shell b2g-ps and B2G/tools/get_about_memory.py, especially in cases where the test fails on the first affected build. With that information, we'll be able to better identify where the problem is.
Zac - Can someone on your team do the following:

With the last working build & first affected build:

1. Run adb shell b2g-ps & about_memory.py from https://github.com/mozilla-b2g/B2G/blob/master/tools/get_about_memory.py before you run the test & dump the results into log files to keep around
2. Run the Gaia UI Test in question
3. Run adb shell b2g-ps & about_memory.py from https://github.com/mozilla-b2g/B2G/blob/master/tools/get_about_memory.py after you run the test & dump the results into log files to keep around

Note - for the first affected build, make sure you can reproduce the test failure when doing this analysis.

After you do this, include the results as attachments to the bug. That will give Kyle enough information to go on to diagnose this bug.
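
The b2g-ps half of steps 1 and 3 could be scripted roughly as follows. This is only a sketch: it assumes adb is on the PATH with a single device attached, the function names and log file naming are made up for illustration, and it does not cover get_about_memory.py, whose options are not described in this bug.

```python
import os
import subprocess
import time


def b2g_ps_snapshot(run=subprocess.check_output):
    """Return the text printed by `adb shell b2g-ps`.

    `run` is injectable so the function can be exercised without a device.
    """
    output = run(["adb", "shell", "b2g-ps"])
    if isinstance(output, bytes):
        output = output.decode("utf-8", "replace")
    return output


def save_snapshot(label, text, log_dir="."):
    """Write a snapshot to a timestamped log file and return its path."""
    # Hypothetical naming scheme: b2g-ps-<label>-<unix time>.log
    path = os.path.join(log_dir, "b2g-ps-%s-%d.log" % (label, int(time.time())))
    with open(path, "w") as log:
        log.write(text)
    return path
```

Typical use would be save_snapshot("before", b2g_ps_snapshot()) ahead of the test run and save_snapshot("after", b2g_ps_snapshot()) once it finishes, on both the last working and the first affected build.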
Flags: needinfo?(zcampbell)
We can do that tomorrow!
Attached file about:memory for passing build (obsolete) —
In comment #2 I had the test steps wrong, my apologies. The test merely opens the 3 apps and asserts the order they appear in the Cards View is the inverse order they were opened (so most recent first). Which is good, the test is simpler than I thought.
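
The assertion described here boils down to comparing the cards against the reversed launch order. A tiny sketch of that check, assuming the apps are opened in the order Clock, Gallery, Calendar; expected_card_order is an illustrative name, not a function from gaiatest.

```python
def expected_card_order(opened_apps):
    """Cards view is expected to list the most recently opened app first."""
    return list(reversed(opened_apps))


# Assumed launch order for the three-app test.
opened = ["Clock", "Gallery", "Calendar"]
print(expected_card_order(opened))
```

If a background app is killed, the list read from the cards view is shorter than this expected list, which is how the test ends up exposing the memory problem.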

Now for data:

Passing build
=============

b2g-ps:
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      2895  1     174448 67820 ffffffff 40064330 S /system/b2g/b2g
Usage            app_2932  2932  2895  65560  26576 ffffffff 400cd330 S /system/b2g/plugin-container
Homescreen       app_2951  2951  2895  77144  30040 ffffffff 400f2330 S /system/b2g/plugin-container
Clock            app_3007  3007  2895  67144  27384 ffffffff 4005b330 S /system/b2g/plugin-container
Gallery          app_3029  3029  2895  68176  27368 ffffffff 400d8330 S /system/b2g/plugin-container
Calendar         app_3042  3042  2895  71240  29556 ffffffff 4006d330 S /system/b2g/plugin-container
(Preallocated a  root      3043  2895  62924  22612 ffffffff 40106330 S /system/b2g/plugin-container
Attached file about:memory for failing build (obsolete) —
Failing build  (2013-09-09)
=============
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      743   1     190728 68608 ffffffff 4010c330 S /system/b2g/b2g
Usage            app_781   781   743   66708  23944 ffffffff 40127330 S /system/b2g/plugin-container
Homescreen       app_800   800   743   80660  27124 ffffffff 40100330 S /system/b2g/plugin-container
Clock            app_856   856   743   74860  27948 ffffffff 400ba330 S /system/b2g/plugin-container
Calendar         app_890   890   743   77740  30508 ffffffff 4010f330 S /system/b2g/plugin-container
(Preallocated a  root      892   743   64008  22564 ffffffff 40049330 S /system/b2g/plugin-container
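
To make the two b2g-ps listings easier to compare, here is a rough sketch that parses them and reports which apps disappeared and how the RSS column changed. The data is copied from the two tables above; the parsing approach (splitting from the right so that names containing spaces survive) and all function names are assumptions for illustration only.

```python
PASSING = """\
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      2895  1     174448 67820 ffffffff 40064330 S /system/b2g/b2g
Usage            app_2932  2932  2895  65560  26576 ffffffff 400cd330 S /system/b2g/plugin-container
Homescreen       app_2951  2951  2895  77144  30040 ffffffff 400f2330 S /system/b2g/plugin-container
Clock            app_3007  3007  2895  67144  27384 ffffffff 4005b330 S /system/b2g/plugin-container
Gallery          app_3029  3029  2895  68176  27368 ffffffff 400d8330 S /system/b2g/plugin-container
Calendar         app_3042  3042  2895  71240  29556 ffffffff 4006d330 S /system/b2g/plugin-container
(Preallocated a  root      3043  2895  62924  22612 ffffffff 40106330 S /system/b2g/plugin-container
"""

FAILING = """\
APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      743   1     190728 68608 ffffffff 4010c330 S /system/b2g/b2g
Usage            app_781   781   743   66708  23944 ffffffff 40127330 S /system/b2g/plugin-container
Homescreen       app_800   800   743   80660  27124 ffffffff 40100330 S /system/b2g/plugin-container
Clock            app_856   856   743   74860  27948 ffffffff 400ba330 S /system/b2g/plugin-container
Calendar         app_890   890   743   77740  30508 ffffffff 4010f330 S /system/b2g/plugin-container
(Preallocated a  root      892   743   64008  22564 ffffffff 40049330 S /system/b2g/plugin-container
"""


def parse_rss(b2g_ps_text):
    """Map application name to the RSS value as printed by b2g-ps."""
    rss_by_app = {}
    for line in b2g_ps_text.splitlines():
        # Nine columns follow the application name; split from the right
        # so a name such as "(Preallocated a" stays in one piece.
        fields = line.rsplit(None, 9)
        if len(fields) != 10 or not fields[5].isdigit():
            continue  # header or unrecognized line
        rss_by_app[fields[0]] = int(fields[5])
    return rss_by_app


passing = parse_rss(PASSING)
failing = parse_rss(FAILING)

# Apps present in the passing run but gone in the failing run.
missing = sorted(set(passing) - set(failing))
# Change in the RSS column for apps present in both runs.
deltas = {app: failing[app] - passing[app] for app in passing if app in failing}

print("missing in failing build:", missing)
for app, delta in sorted(deltas.items()):
    print("%-16s %+d" % (app, delta))
```

On this data the only missing process is Gallery, and the RSS of the surviving processes moves by well under 3000 in either direction, so the listings alone do not point at one obviously bloated process.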
Although I have replicated this on the 2013-09-06 build, it is harder to replicate there; the failure gets more frequent in later builds.

On the 2013-09-09 build it gets worse and is easier to replicate. Perhaps it is the cumulative effect of more than one commit in this range?

http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=aa9ec17cf912&tochange=740094c07328
Comment on attachment 807706 [details]
Pointer to Github pull request: https://github.com/mozilla-b2g/gaia/pull/12334

Sorry all, wrong bug.
Attachment #807706 - Attachment is obsolete: true
Duplicate of this bug: 919684
The dupe here implies that this isn't a gallery regression - it's likely a System regression now. Moving to Gaia::System as such.

Note - that also confirms this is possible to reproduce outside of automation.
Component: Gaia::Gallery → Gaia::System
This is not a smoketest blocker - this just so happened to be caught in today's smoketest. There's already a regression window included in the above comments.
Keywords: perf
Mike, can your team take a look here and do the koi? triage?
Flags: needinfo?(mlee)
Duplicate of this bug: 923316
No longer blocks: b2g-central-dogfood
Since this is only in master, we switched this to 1.3?  Kyle, what's your take?
blocking-b2g: koi? → 1.3?
Flags: needinfo?(khuey)
(In reply to Dave Huseby [:huseby] from comment #34)
> Since this is only in master, we switched this to 1.3?  Kyle, what's your
> take?

That's incorrect. The bug as filed was filed when master was 1.2 (9/11/2013). See the affected flag which indicates this reproduces on 1.2.
blocking-b2g: 1.3? → koi?
The about:memory logs here are not useful.  It appears that someone ran get_about_memory.py, and then opened about:memory inside the desktop Firefox browser that popped up and copied that?

Please zip and attach the entire folder that get_about_memory.py creates.  It should be something like $CURDIR/about-memory-N/ IIRC.
Flags: needinfo?(khuey)
Zac - Can you help address Kyle's comment in comment 36 when getting about:memory logs?
Flags: needinfo?(zcampbell)
Flags: needinfo?(mlee)
Whiteboard: [fromAutomation][MemShrink] → [c=memory p= s= u=] [fromAutomation] [MemShrink]
I can provide the info but it's very easy to do yourself. 

Since we first raised this bug we have had to disable more tests, as the memory problem got a bit worse.

Kyle, which build do you want the info for?
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #36)
> The about:memory logs here are not useful.  It appears that someone ran
> get_about_memory.py, and then opened about:memory inside the desktop Firefox
> browser that popped up and copied that?
> 

Actually, the get_about_memory.py script loads that file in Firefox automatically, and that is what led me to believe it was the correct information. My apologies.
Yeah, I can understand why you got confused by it.

Can you get a single report from a trunk build?  I suspect this is the same underlying issue as bug 919864.  Seeing high heap-unclassified on trunk would make me more confident in that suspicion.
Attached file about-memory-test_cards_view.zip (obsolete) —
Here is a zip of the about-memory directory created after running the script.

I've run it on yesterday's Hamachi build out of convenience but if you want to dig deeper with a bit more time I can go back to the original test cases of 1.2/Unagi device.

See how this looks!

Device Hamachi
Gecko  http://hg.mozilla.org/mozilla-central/rev/64b497e6f593
Gaia  122ff8c6363227501f4121e5a3892ba41d4c0417
BuildID 20131008064334
Version 27.0a1
Attachment #802970 - Attachment is obsolete: true
Attachment #807129 - Attachment is obsolete: true
Attachment #807144 - Attachment is obsolete: true
Flags: needinfo?(zcampbell)
(In reply to Zac C (:zac) from comment #41)
> Created attachment 815312 [details]
> about-memory-test_cards_view.zip
> 
> Here is a zip of the about-memory directory created after running the script.
> 
> I've run it on yesterday's Hamachi build out of convenience but if you want
> to dig deeper with a bit more time I can go back to the original test cases
> of 1.2/Unagi device.
> 
> See how this looks!
> 
> Device Hamachi
> Gecko  http://hg.mozilla.org/mozilla-central/rev/64b497e6f593
> Gaia  122ff8c6363227501f4121e5a3892ba41d4c0417
> BuildID 20131008064334
> Version 27.0a1

Perfect, that's the right stuff.

I'll take a look at it once I'm fully awake.
Ok, this looks like something different from bug 919864.  We have zombie content parents floating around too.  I'll have to reproduce this and dig in here.
Assignee: nobody → khuey
Blocks: 801898
No longer blocks: 801898
Whiteboard: [c=memory p= s= u=] [fromAutomation] [MemShrink] → [c=memory p= s= u=] [fromAutomation] [MemShrink] [xfail]
Status: NEW → ASSIGNED
Target Milestone: --- → 1.2 C3(Oct25)
Whiteboard: [c=memory p= s= u=] [fromAutomation] [MemShrink] [xfail] → [c=memory p= s= u=] [fromAutomation] [MemShrink:P1] [xfail]
Keywords: qablocker
blocking-b2g: koi? → koi+
Priority: -- → P1
Whiteboard: [c=memory p= s= u=] [fromAutomation] [MemShrink:P1] [xfail] → [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] [xfail]
Updating Target Milestone for FxOS Perf koi+'s.
Target Milestone: 1.2 C3(Oct25) → 1.2 C4(Nov8)
Duplicate of this bug: 933747
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] [xfail] → [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] [xfail] burirun3
I've been looking into this but haven't managed to reproduce it yet.  Going to try with a lower memory limit in the emulator tomorrow, and if that doesn't work I'll try an actual device.
Hi Kyle, that does not surprise me! For the test automation that caught this bug, we have it enabled for desktopb2g and marked as an expected failure for on-device testing.
I've never been able to work out how much RAM the desktopb2g uses; I'd still like to know.
No doubt lowering the memory limit on the emulator will achieve the same effect.
Actually Kyle this test has started passing again on Hamachi devices. Maybe some memory improvements were made in the last couple of days.

We'll enable the test but I still think it's worth looking into this using an older known buggy build so we can close this bug and know why we're closing it.
QA Wanted - Can we confirm this no longer reproduces? See the dupes for example STR to use to test this.
Keywords: qawanted
I'm still seeing this issue when opening what I'd assume are more memory intensive apps like Music or Gallery. I've had as many as 5 or 6 apps open when it's only things like Settings and an empty Dialer, Messages, and Contacts on a fresh flash.
Opening something like Gallery or Music will knock the task manager down to only 2 or 3 open apps when more have been opened previously.

I get the same results in 1.2 and 1.3.

Environmental Variables:
Device: Buri v1.2 Mozilla RIL
BuildID: 20131107004003
Gaia: 590eb598aacf1e2136b2b6aca5c3124557a365ca
Gecko: 26f1e160e696
Base Image: 20131104

and

Environmental Variables:
Device: Buri v1.3 Mozilla RIL
BuildID: 20131107040200
Gaia: 42bbe26a72e030faf07a6fc297f61a3a8ccda25b
Gecko: 70de5e24d79b
Version: 28.0a1
Base Image: 20131104
Keywords: qawanted
QA Contact: jzimbrick
A contributing factor here could be a change that was merged into gaiatest today aimed at reducing the memory used by the test framework during perf/endurance tests. In hindsight these test cases were probably suffering from the same problem, memory sucked by the test framework!
(In reply to Zac C (:zac) from comment #51)
> A contributing factor here could be a change that was merged into gaiatest
> today aimed at reducing the memory used by the test framework during
> perf/endurance tests. In hindsight these test cases were probably suffering
> from the same problem, memory sucked by the test framework!

I think you are referring to the patch from bug 924565.  That might help here, but those objects only would have started truly leaking when bug 915598 landed.  That was around 10/5 and it looks like this bug was reported a month earlier.
Yes, I was talking about bug 924565, thanks Ben. Sounds like there is more than one contributing factor.

Anyway this test case was never intended to catch a memory issue - it just lucked into it.
(In reply to J Zimbrick from comment #50)
> I'm still seeing this issue when opening what I'd assume are more memory
> intensive apps like Music or Gallery, I've had as many as 5 or 6 apps open
> when it's only things like Settings, and an empty Dialer, Messages, and
> Contacts on a fresh flash.
> Opening something like Gallery or Music will knock the task manager down to
> only 2 or 3 open apps when more have been opened previously.
> 
> I get the same results in 1.2 and 1.3.
> 
> Environmental Variables:
> Device: Buri v1.2 Mozilla RIL
> BuildID: 20131107004003
> Gaia: 590eb598aacf1e2136b2b6aca5c3124557a365ca
> Gecko: 26f1e160e696
> Base Image: 20131104
> 
> and
> 
> Environmental Variables:
> Device: Buri v1.3 Mozilla RIL
> BuildID: 20131107040200
> Gaia: 42bbe26a72e030faf07a6fc297f61a3a8ccda25b
> Gecko: 70de5e24d79b
> Version: 28.0a1
> Base Image: 20131104

Can you specifically test this by opening the Clock app, Gallery app, and Calendar and then closing the apps in reverse order? That's the bug here - there's always going to be cases with apps getting killed in the background.
Keywords: qawanted
Actually, let me be really specific - test the following cases:

1. Launch the Clock app, Gallery app, and Calendar app and then close those apps in reverse order.

2. Launch the browser app, contacts app, and settings app and then close those apps in reverse order.

If any of the apps get killed in the background during testing the above scenarios, indicate which one gets killed. Repeat the above test cases 3 times.
Removing xfail as the relevant automated test is now passing.
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] [xfail] burirun3 → [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] burirun3
Repeating procedures 1 and 2 stated in Comment 55 produced the following results:

On 1.1:

1. Only Calendar and Clock were displayed all three times.
2. All apps were displayed on all three tries.

On 1.2:

1. All three apps displayed one time, the other two tries only displayed Calendar and Clock.
2. All apps were displayed on all three tries.

On 1.3:

1. All three apps displayed one time, the other two tries only displayed Calendar and Clock.
2. All apps were displayed on all three tries.


1.1's environmental variables are as follows:
Environmental Variables:
Device: Buri v1.1 Mozilla RIL
BuildID: 20131107041203
Gaia: 39b0203fa9809052c8c4d4332fef03bbaf0426fc
Gecko: 31fa87bfba88
Version: 18.0
Base Image: 20131104


The environmental variables for the 1.2 and 1.3 builds are the same as stated in Comment 50.
Keywords: qawanted
Thanks for the detailed analysis. I'll go talk with Sandip about this to find out whether we still need to block on this given comment 57.
(In reply to J Zimbrick from comment #57)
> On 1.2:
> 
> 1. All three apps displayed one time, the other two tries only displayed
> Calendar and Clock.
> 2. All apps were displayed on all three tries.
> 
> On 1.3:
> 
> 1. All three apps displayed one time, the other two tries only displayed
> Calendar and Clock.
> 2. All apps were displayed on all three tries.

Was there any pattern here?  For example, was it the first try that all the apps survived on both v1.2 and v1.3?  Or always the last try?
If I remember correctly, the first try on 1.2 displayed all three, and the next two tries only displayed two apps.

And on 1.3 it was the opposite, where the first two tries were the ones to only show two apps, and the third showed all three.
(In reply to J Zimbrick from comment #60)
> If I remember correctly, the first try on 1.2 displayed all three, and the
> next two tries only displayed two apps.
> 
> And on 1.3 it was the opposite, where the first two tries were the ones to
> only show two apps, and the third showed all three.

Darn.  I thought there might be a clue there.  :-)  Thanks for the info!
Target Milestone: 1.2 C4(Nov8) → 1.2 C5(Nov22)
Flags: needinfo?(ffos-product)
Discussed in triage, but this isn't going to happen for 1.2. Product can comment here on what we can do for a future release, but I don't see this happening in 1.2.
blocking-b2g: koi+ → ---
Since our performance commitment for v1.2 is not to degrade from 1.1 to 1.2, based on comment #57, this should be a blocker for v1.2. Existing 1.1 users would notice this degradation once their devices get v1.2 software updates. By the way, can QA report how many cards could be opened without OOM issues in 1.1? Was that 4, or more?

However, I do realize that memory pressure may have increased in v1.2 with newer features. Is it only the gallery app (with its memory intensive operations) that is most impacted? If we can zero in on the overall degradation here, is there a way to educate the user about this (either via UI, or with a call out in the release notes)?
Flags: needinfo?(ffos-product)
I think the patches in bug 924565 might help here since it will allow DOMRequestHelper objects to get cleaned up under memory pressure.
Depends on: 924565
Alright - moving back to the koi+ then per comment 63
blocking-b2g: --- → koi+
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] burirun3 → [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] burirun3 [xfail]
Wait a second, the results in comment 57 don't make sense. That would indicate the gallery was never present in the background on 1.1, but the test automation seems to indicate otherwise. We never had this test fail on 1.1.
See comment 67 - I think you need to recheck your 1.1 test results here.
Flags: needinfo?(jzimbrick)
I got the same results as comment 67 if the gallery was still searching for pictures: only Clock and Calendar would be shown in the task manager every time.

If the gallery is opened and allowed to sit for a minute or so and finish loading all of the pictures, all three apps will display in the task manager.

Environmental Variables:
Device: Buri v1.1 Mozilla RIL
BuildID: 20131115041203
Gaia: 4fa6e6362b6a35fd18c7a631dfdaca748cc22c18
Gecko: 7c3cfc0936ca
Version: 18.0
Base Image: 20131104
Flags: needinfo?(jzimbrick)
(In reply to J Zimbrick from comment #69)
> Got the same results as comment 67 if the gallery was still searching for
> pictures, only Clock and Calendar would be shown in the task manager every
> time.
> 
> If the gallery is opened and allowed to sit for a minute or so and finish
> loading all of the pictures, all three apps will display in the task manager.
> 
> Environmental Variables:
> Device: Buri v1.1 Mozilla RIL
> BuildID: 20131115041203
> Gaia: 4fa6e6362b6a35fd18c7a631dfdaca748cc22c18
> Gecko: 7c3cfc0936ca
> Version: 18.0
> Base Image: 20131104

That's not the right way to reproduce the bug. The test here requires that there are no pictures loaded in the sdcard. Retest w/o any pictures in the SD card.
Flags: needinfo?(jzimbrick)
I must have missed comment 16.

All apps stay open when there is nothing on the SD across 1.1, 1.2, and 1.3.

Environmental Variables:
Device: Buri v1.1 Mozilla RIL
BuildID: 20131115041203
Gaia: 4fa6e6362b6a35fd18c7a631dfdaca748cc22c18
Gecko: 7c3cfc0936ca
Version: 18.0
Base Image: 20131104

Device: Buri v1.2 Mozilla RIL
BuildID: 20131115004003
Gaia: a6484b1e6fc07cf6bd8d6fcf9aeebb14b7e8869d
Gecko: ff2c7c9d01d6
Version: 26.0
Base Image: 20131104

Device: Buri v1.3 Mozilla RIL
BuildID: 20131115040200
Gaia: ac42cb33f21b3f13595432c965f44615daae2225
Gecko: b2fab608772f
Version: 28.0a1
Base Image: 20131104
Flags: needinfo?(jzimbrick)
Okay - based on the above comment, that means this is an automation only bug, likely since we're using additional memory with marionette present. We don't need to block on this.
blocking-b2g: koi+ → ---
Assignee: khuey → mrbkap
(In reply to Jason Smith [:jsmith] from comment #72)
> Okay - based on the above comment, that means this is an automation only
> bug, likely since we're using additional memory with marionette present. We
> don't need to block on this.

Yes I can 'fix' this on the automation side by loading the apps more slowly.

We'll patch it to get the functional test back.
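
The "load the apps more slowly" workaround could look something like the sketch below. It is not the actual gaiatest patch: launch stands in for whatever launch call the test framework provides, and the 5 second settle time is a guess rather than a measured value.

```python
import time


def launch_apps_slowly(app_names, launch, settle_seconds=5.0, sleep=time.sleep):
    """Launch each app in turn, pausing after each launch.

    The pause gives the previous app time to finish starting up before the
    next one competes with it for memory. `launch` and `sleep` are passed in
    so the helper can be tried out without a device.
    """
    for name in app_names:
        launch(name)
        sleep(settle_seconds)
```

This only hides the startup memory spike from the functional test; it does not explain why launching apps in rapid succession gets a background app killed.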
Assignee: mrbkap → zcampbell
Attached file github pr (obsolete) —
Attachment #815312 - Attachment is obsolete: true
Attachment #833164 - Flags: review?(florin.strugariu)
Attachment #833164 - Flags: review?(bob.silverberg)
(In reply to Zac C (:zac) from comment #74)
> Created attachment 833164 [details] [review]
> github pr

Shouldn't we do this patch on bug 922708?
Spoke with Aus & Blake in IRC - there's a suspicion that this might be because of a marionette client update within the regression range, implying that this regression actually is on the marionette side. The regression would also be related to memory spikes during launching of apps in rapid succession.

Dave - Do you know if there was a marionette client update within the target regression range (9/6/2013 - 9/9/2013)?
Flags: needinfo?(dave.hunt)
Attachment #833164 - Attachment is obsolete: true
Attachment #833164 - Flags: review?(florin.strugariu)
Attachment #833164 - Flags: review?(bob.silverberg)
Assignee: zcampbell → mrbkap
Here are the dates the Python marionette_client package versions were published to PyPI:

v0.5.36: 2013-08-01
v0.5.37: 2013-09-05
v0.6.0:  2013-10-16
Flags: needinfo?(dave.hunt)
(In reply to Dave Hunt (:davehunt) from comment #77)
> Here are the dates the Python marionette_client package versions were
> published to PyPI:
> 
> v0.5.36: 2013-08-01
> v0.5.37: 2013-09-05
> v0.6.0:  2013-10-16

Hmm... so it's plausible that the Marionette client changes in v0.5.37 caused this regression, since that release was published just before the target regression range.

Given that the above analysis indicates this is likely a regression coming from marionette & there's plausibility that v0.5.37 caused this, I'm moving this over to the Marionette component to have someone from the ateam investigate this.
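To make the date reasoning above concrete, here is a minimal sketch that compares the PyPI publication dates from comment 77 against the target regression range (9/6/2013 - 9/9/2013). The function names are hypothetical and only for illustration; the dates are the ones quoted in this bug.

```python
from datetime import date

# Publication dates quoted in comment 77.
releases = {
    "v0.5.36": date(2013, 8, 1),
    "v0.5.37": date(2013, 9, 5),
    "v0.6.0": date(2013, 10, 16),
}

# Target regression range quoted in this bug (9/6/2013 - 9/9/2013).
range_start = date(2013, 9, 6)
range_end = date(2013, 9, 9)


def days_before_range(published, start=range_start):
    """Days between a release and the start of the range (negative if later)."""
    return (start - published).days


def latest_release_before(end=range_end):
    """Newest release published on or before the end of the range."""
    candidates = {v: d for v, d in releases.items() if d <= end}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for version, published in sorted(releases.items(), key=lambda kv: kv[1]):
        print(version, published.isoformat(), days_before_range(published))
    print("newest client available during the range:", latest_release_before())
```

This only shows that v0.5.37 was the newest client available when the range opened. It says nothing about whether the devices under test were actually running it.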
Assignee: mrbkap → nobody
Component: Gaia::System → Marionette
No longer depends on: 924565
Priority: P1 → --
Product: Firefox OS → Testing
QA Contact: jzimbrick
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] [MemShrink:P1] burirun3 [xfail] → [c=memory p= s= u=1.2] [fromAutomation] burirun3 [xfail]
Target Milestone: 1.2 C5(Nov22) → ---
I can replicate this on an engineering build just by loading the apps really quickly.
I'm going to switch back to a user build and see if I get the same thing.
It's pretty unscientific, but the engineering build definitely cannot load as many apps:

Base: V1.2_US_20131115.cfg
Gaia:     71063dd91bc8cbb15ba335236ed67a1c5058bd58
Gecko:    http://hg.mozilla.org/mozilla-central/rev/cf378dddfac8    
BuildID   20131121040202
Version   28.0a1

My STR:
1. Flash the build
2. Complete the FTU
3. Pan to the next screen
4. Tap Calendar, tap Home, tap Clock, tap Home, tap Settings. All of this should take less than 2 seconds
5. Wait 4-5 seconds
6. Open cards view and swipe across to and enter the Clock app
7. Open cards view again and check the number of apps open

Kill all apps and repeat again to get a few samples.

The user build will handle 3 apps and the engineering build will not.
I really don't think this has anything to do with Marionette, given the manual STR given by Zac.  There are several differences between user and eng builds; Marionette isn't the only one.
(In reply to Jonathan Griffin (:jgriffin) from comment #81)
> I really don't think this has anything to do with Marionette, given the
> manual STR given by Zac.  There are several differences between user and eng
> builds; Marionette isn't the only one.

Okay, I'll move it back to general.

At this point, I think the best debugging strategy we have is to do the same manual STR on a user build & eng build, get the memory report during the STR, and compare the results.
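A rough sketch of how the two memory reports could be compared once they are captured. It assumes each report is a JSON file (optionally gzipped) with a top-level "reports" list whose entries carry "process", "path" and "amount" fields, and that process names end in a "(pid N)" suffix; those details are assumptions about the report format and should be checked against the actual files. Summing every "explicit/" entry is only an approximation of per-process explicit memory, and all function names here are hypothetical.

```python
import gzip
import json
import re
from collections import defaultdict


def load_report(path):
    """Load a memory report from plain or gzipped JSON (assumed format)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def process_name(raw):
    """Drop a trailing '(pid N)' so processes match across two runs."""
    return re.sub(r"\s*\(pid \d+\)$", "", raw)


def totals_by_process(report, path_prefix="explicit/"):
    """Sum the amounts under path_prefix for each process."""
    totals = defaultdict(int)
    for entry in report["reports"]:
        if entry["path"].startswith(path_prefix):
            totals[process_name(entry["process"])] += entry["amount"]
    return dict(totals)


def diff_totals(user_totals, eng_totals):
    """Return (process, eng - user) pairs, largest increase first."""
    names = set(user_totals) | set(eng_totals)
    deltas = {n: eng_totals.get(n, 0) - user_totals.get(n, 0) for n in names}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
```

Usage would be along the lines of `diff_totals(totals_by_process(load_report(user_path)), totals_by_process(load_report(eng_path)))`, where `user_path` and `eng_path` are whatever files the two runs produce. Processes with the largest positive delta are the first place to look for the extra memory used on the eng build.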
Component: Marionette → General
Product: Testing → Firefox OS
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] burirun3 [xfail] → [c=memory p= s= u=1.2] [fromAutomation] burirun3
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] burirun3 → [c=memory p= s= u=1.2] [fromAutomation] burirun3 [xfail]
Assignee: nobody → dhuseby
Priority: -- → P1
Whiteboard: [c=memory p= s= u=1.2] [fromAutomation] burirun3 [xfail] → [c=memory p= s= u=1.3] [fromAutomation] burirun3 [xfail]
Whiteboard: [c=memory p= s= u=1.3] [fromAutomation] burirun3 [xfail] → [c=memory p=2 s= u=1.3] [fromAutomation] burirun3 [xfail]
Whiteboard: [c=memory p=2 s= u=1.3] [fromAutomation] burirun3 [xfail] → [c=memory p=2 s= u=1.3] [fromAutomation] burirun3 [xfail] burirun1.3-1
Whiteboard: [c=memory p=2 s= u=1.3] [fromAutomation] burirun3 [xfail] burirun1.3-1 → [c=memory p=2 s= u=1.3] [fromAutomation] burirun3 burirun1.3-1
Assignee: dhuseby → mchang
Whiteboard: [c=memory p=2 s= u=1.3] [fromAutomation] burirun3 burirun1.3-1 → [c=memory p=2 s= u=] [fromAutomation] burirun3 burirun1.3-1
Assignee: mchang → bkelly
I'm dropping this in favor of bug 951806 which is a 1.3+ blocker.
Assignee: bkelly → nobody
Status: ASSIGNED → NEW
Assignee: nobody → jhylands
Status: NEW → UNCONFIRMED
Ever confirmed: false
Attached file Memory Report 1
Memory Report 1 is a user-build Hamachi, running 1.3 (built from source), on Thursday Feb 6, 2014. It is a baseline with only the settings app open.
Attached file Memory Report 3
Memory Report 3 is a user-build Hamachi, running 1.3 (built from source), on Thursday Feb 6, 2014. It was taken after the Calendar, Clock, and Settings apps were opened per Comment 80.
Attached file Memory Report 4
Memory Report 4 is an eng-build Hamachi, running 1.3 (built from source), on Thursday Feb 6, 2014. It was taken after the Calendar, Clock, and Settings apps were opened per Comment 80, but before the Calendar app was killed.
Attached file Memory Report 5
Memory Report 5 is an eng-build Hamachi, running 1.3 (built from source), on Thursday Feb 6, 2014. It was taken after the Calendar, Clock, and Settings apps were opened per Comment 80, after the Calendar app was killed.
See Also: → 968297
Keywords: qablocker
Bug 968297 has now landed and I've re-tested this opening the dialer, messages, clock and settings app without issues. I suggest re-testing and closing as a dup of bug 968297 if the problem went away as this was very likely caused by that issue.
(In reply to Gabriele Svelto [:gsvelto] from comment #88)
> Bug 968297 has now landed and I've re-tested this opening the dialer,
> messages, clock and settings app without issues. I suggest re-testing and
> closing as a dup of bug 968297 if the problem went away as this was very
> likely caused by that issue.

If you've confirmed this is fixed, then we don't need to worry about retesting. I'll close this out as fixed by that bug.
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED