Closed Bug 1058831 Opened 7 years ago Closed 7 years ago

[Flame][KK]e-mail with pictures crashes after repeated pinch zoom / scrolling

Categories

(Core :: Layout, defect, P2)

ARM
Gonk (Firefox OS)
defect

Tracking

()

VERIFIED FIXED
2.1 S5 (26sep)
blocking-b2g 2.0+
Tracking Status
b2g-v1.4 --- wontfix
b2g-v2.0 --- verified
b2g-v2.0M --- verified
b2g-v2.1 --- unaffected
b2g-v2.2 --- unaffected

People

(Reporter: njpark, Assigned: asuth)

References

Details

(4 keywords, Whiteboard: [MemShrink:P2][c=memory p= s= u=2.0])

Attachments

(6 files)

Attached file pinch.txt
This bug is reproducible in 2.0 build with KK base image (v165) (w/ 319MB memory)
logcat attached.  

One full zoom in/out combo is sufficient to exit the E-mail application

Version info:
Gaia      a51633e29a7826b6bf07cb1c5ad81b3217b9820a
Gecko     8f121f8fa5fa1a3e82967f91e2afe337465fc593
BuildID   20140826105138
Version   32.0
ro.build.version.incremental=18
ro.build.date=Thu Aug 14 19:29:46 CST 2014

+++ This bug was initially created as a clone of Bug #1019693 +++

STR:
Open an HTML email with pictures (attached e-mail)
Pinch zoom in/out, and scroll.  Repeat these actions.

Expected:
the image zooms in/out and scrolls normally
Actual:
the image freezes, and eventually the e-mail app exits suddenly.

Video location:
https://www.dropbox.com/s/lyl1w3ij5h37uvj/VID_20140603_111520.mp4


Version Info:
│ Gaia      61cd07a8b5fa017777db6d345e00afb4fb8789b7                         │
│ Gecko     https://hg.mozilla.org/mozilla-central/rev/f28005b84ed0          │
│ BuildID   20140603040210                                                   │
│ Version   32.0a1                                                           │
│ ro.build.version.incremental=94                                            │
│ ro.build.date=Tue May 20 09:29:20 CST 2014
Email that causes the app to exit unexpectedly when one full set of zoom in/out is performed
Severity: blocker → normal
Priority: P1 → --
Hi Milan, is there any additional information you need?  I'll test it on 2.1 KK build shortly.
Flags: needinfo?(milan)
Additional context from the previous bug and from IRC discussion between :njpark and :tchung and myself:

In bug 1019693 the fix that was landed on trunk/v2.1 resulted in a transient spike of ~30MiB for the given email.  In my v2.0 testing, the transient spike was ~80MiB.  I measured by running on a flame device with a full 1GiB of memory and using firewatch and repeated pinching/zooming to see what the tallest spikes where.  It's also notable that in some situations memory spikes would not occur at all, but pinching/zooming would be accompanied by graphical corruption.

The options as I (working on the email app) broadly see them are:
- The "correct" way:
  - See if there's something on v2.1 that would be safe/obvious to uplift to v2.0, assuming :njpark's tests show v2.1 is still fine.
- The "mitigation" ways.  Assuming the memory spikes are a function of the total area of the rendered HTML document and the zoom factor, have the email app do some combination of:
  - limiting the zoom factor based on apparent HTML area to limit memory spikes.  Right now we go up to 2x.  (Magic inline constant in apps/email/js/iframe_shims.js at the bottom of the file under the comment  "Never zoom in farther than 2x".)
  - Disable our manual pinch/zoom logic when the area of the HTML document exceeds some threshold.  Probably just stick to a 1x display rather than our initially-zoomed-out-to-fit-the-page-width zoom.
  - Some combination of the above.

Note that we only enable pinch/zoom on a HTML mail when we think the message wants to be wider than the screen area.  This usually means some combination of ads (that may stick to a single screen's worth or be very lengthy) or HTML newsletters/blasts where they have graphic designers and really want a 600px wide rendering area.  So it might make sense to apply aggressive heuristics for any pinch/zoom scenario.
What's the relationship of this bug to bug 1019693?  This is for KK and the other one was for JB, and the patch from bug 1019693 fixed JB, but the issue still exists on KK?
Flags: needinfo?(milan)
Just to note that with v2.1 KK, crash does not occur under the same condition.  the pinch zoom action does lag a bit, but I could not make the app to exit unexpectedly.
(In reply to Milan Sreckovic [:milan] from comment #4)
> What's the relationship of this bug to bug 1019693?  This is for KK and the
> other one was for JB, and the patch from bug 1019693 fixed JB, but the issue
> still exists on KK?

bug 1019693 was found under JB, with 512MB.
This bug was found under KK v2.0, with 319MB of memory.

Per comment 5, this does not reproduce in KK 2.1 with 319MB of memory.
(In reply to Milan Sreckovic [:milan] from comment #4)
> What's the relationship of this bug to bug 1019693?  This is for KK and the
> other one was for JB, and the patch from bug 1019693 fixed JB, but the issue
> still exists on KK?

Bug 1019693 addressed the "pinch/zoom uses so much memory that the email app gets OOM-killed on lower memory devices" by fixing things so that a tiled layer was properly used.  However, not all memory issues were addressed for v2.0 since we saw a swing of 80 MiB instead of values like 240 MiB  we were seeing before.

Per Kats' comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1019693#c80 it made sense to land that patch and close that bug and to file a new one for rendering glitches.  I realize now I slightly misread that to also mean we should still use a new bug even if there was still non-trivial memory usage.  Given that it's also been almost 2 weeks since the uplift, it also seemed reasonable to open a new bug.

From my perspective, the main question is whether there's something obvious we can uplift to v2.1 to v2.0 to make 2.0 happy like v2.1 or whether we need to turn to mitigations.  Unfortunately, because the bug 1019693 fix changed code-paths, a trivial bisection won't work to identify the change in 2.1.  Bisection would need to apply the bug 1019693 to each step of the bisection and then build.
No-Jun - Given this is a followup to a 2.0 blocker, should this be nominated as well?
Flags: needinfo?(npark)
[Blocking Requested - why for this release]:
Nominating for 2.0 blocking per discussion with asuth and tchung for following reason:
- Original bug found on 512MB was verified, but production environment will have 319MB
- It is more readily reproducible in 319MB KK v2.0 (one single full pinch zoom in and out is sufficient to cause application to exit)
blocking-b2g: --- → 2.0?
Flags: needinfo?(npark)
NI :matt for further input given this is related to 1019693.

 Given comment #9, i think we cshould block on this though.
Flags: needinfo?(matt.woodrow)
Given that 2.1 is ok here, I think it would be best to try find out what changes were made that fixed this.

Once we know that we can make a decision about whether to uplift them or to try mitigate this problem in other ways.
Flags: needinfo?(matt.woodrow)
Flagging window wanted to bisect to find out when this first passed & last failed.
(In reply to Jason Smith [:jsmith] from comment #12)
> Flagging window wanted to bisect to find out when this first passed & last
> failed.

Just realized we can't bisect this - we don't have daily builds on kitkat, so we're going to have to investigate this the old fashioned way.
blocking-b2g: 2.0? → 2.0+
Priority: -- → P2
Milan, I talked with Jason and he'd prefer that engineering do the investigation on this one since we don't have dailies on KK to work from. Can someone from your team take the lead on it?
Flags: needinfo?(milan)
Layout issue, passing to Jet, in case we can't wait for Monday when Matt is back.
Flags: needinfo?(milan) → needinfo?(bugs)
I don't have a flame, so probably need someone else to track this down.
Spoke offline to matt and we are trying to get QA help to do the bisection here to move forward here. :tchung/no-jun, will be helping here who may have to do their own nightlies and help out here.
Flags: needinfo?(npark)
Keywords: qawanted
(In reply to bhavana bajaj [:bajaj] from comment #17)
> Spoke offline to matt and we are trying to get QA help to do the bisection
> here to move forward here. :tchung/no-jun, will be helping here who may have
> to do their own nightlies and help out here.

No-Jun - Can you report back what happens on 1.4? Tony is going to followup offline to discuss the bisection more.
After loading base image v166 (comes with 1.4), the bug does not reproduce.  I'll check the latest 2.0 kk first and start doing the bisection.
I can repro the bug on the tip of 2.0 flame-kk gaia and gecko
Gaia      d056733f8a7a1a152f5458b323f092c47dbffa48
Gecko     775580c1762d2c71c24e7b835d4fceadd4923637
BuildID   20140903121226
Version   32.0
ro.build.version.incremental=23
ro.build.date=Thu Aug 28 15:16:38 CST 2014
Flags: needinfo?(npark)
Flags: needinfo?(npark)
Able to repro on today's Flame 2.0 JB version with 319 MB.  Zooming in and out on an email with an HTML image will cause the Email application to close unexpectedly.

BuildID: 20140903060552
Gaia: d056733f8a7a1a152f5458b323f092c47dbffa48
Gecko: 5137a70489dd
Platform Version: 32.0
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Jason, do you want us to do a regression window from this?
Flags: needinfo?(jsmith)
Comment on attachment 8479261 [details]
FW_ Helpful Windows tips for your Surface Pro 3..eml

deprecated.  Use the 2 new .eml files uploaded
Attached file Bug_causing_mail_1.eml
Attached file Bug_causing_mail_2.eml
Flags: needinfo?(npark)
To reproduce the bug, try both emails in comment 23 and comment 24.  If the app does not crash after repeated pinch/zooming (does not have to be fast) on one email, try the other one.  Only time I could not reproduce was on 1.4 that came with v166 base image.

Looks like this bug may be present before 2.0.

Note that the parent bug Bug 1019693 is no longer reproducible, but that bug was caused by a different HTML email as well as being tested on 512MB device.
For test email comparisons note that (with external images displayed):
- The email from bug 1019693 resulted in an iframe that was 609px by 4278px
- Bug_causing_mail_1.eml results in an iframe that is 608px by 3314px
- Bug_causing_mail_2.eml results in an iframe that is 608px by 3130px

So in general we expect the memory usage of these reproduction cases to be less (for the case where we are rasterizing the iframe in its entirety, tiled though it may be.)

Also note that I don't think it matters which email is tested with; I'd say just pick one and stick with it.
If it's reproducible on JB, then let's move forward with a bisection.
Flags: needinfo?(jsmith)
To make sure there is no confusion: what we want from this bug is to identify the fix that v2.1 has (and that v2.0 lacks!) that made the bug 1019693 fix make things (more) happy on v2.1.  

The only way to identify a fix via bisection is to bisect the fixes that landed on v2.1 prior to the landing of the bug 1019693 fix, *with the fix from bug 1019693 applied*.

Bisecting using pre-existing builds will not tell us anything.  Custom spun try builds/etc. should work.
Summary: [Flame] e-mail with pictures crashes after repeated pinch zoom / scrolling → [Flame] v2.0-specific bug: e-mail with pictures crashes after repeated pinch zoom / scrolling, we need to locate and uplift a fix from v2.1
Adding qawanted to get No-jun some help.   

Joshua, Chris, please try to see if you can find the regression range of when it was working/stopped working around the time of the fix in bug 1019693.   You can use the attached emails in the examples above, or generate your own HTML based emails to reproduce.  According to https://bugzilla.mozilla.org/show_bug.cgi?id=1019693#c84, the patch was fixed on 8/13/2014.  

Therefore, start with the following tests:

Test and report against
* 319Mb, JB image, Flame 2.0 master build from 8/12/2014
* 319Mb, JB image, Flame 2.0 master build from 8/14/2014
* If the above are still broken, keep going back until you find the build where it was working on master.

Thanks!
Tony
Keywords: qawanted
Keywords: qawanted
Just to note, the bug does reproduce on build 20140812160208 of master (on both JB and KK), but it does not happen so easily as the tip of the 2.0 branch does.  This build is right before the fix to the bug 1019693 was added.

Also, I just realized, I have the browser app running in the backgroup when the crash happened.  
(open browser, open home button, then go to e-mail)
QA Contact: ckreinbring
(In reply to No-Jun Park [:njpark] from comment #30)
> Just to note, the bug does reproduce on build 20140812160208 of master (on
> both JB and KK), but it does not happen so easily as the tip of the 2.0
> branch does.  This build is right before the fix to the bug 1019693 was
> added.

Are you suggesting that we should back out 1019693 on 2.0 but leave it on 2.1?

> Also, I just realized, I have the browser app running in the backgroup when
> the crash happened.  
> (open browser, open home button, then go to e-mail)

Does it reproduce when it's just the e-mail app running?
Flags: needinfo?(bugs)
(In reply to Jet Villegas (:jet) from comment #31)
> Are you suggesting that we should back out 1019693 on 2.0 but leave it on
> 2.1?
> 
No not at all. the bug 1019693 was a fix for the same issue as the current bug, but it was for 512MB JB, so I assumed before the fix was introduced, the state of the OOM was much worse.  As asuth says, something made this bug to disappear in 2.1, so I thought around the time when the fix was added to the 512MB OOM is a good starting point for bisection. (with custom try builds as Comment 28 says)

> Does it reproduce when it's just the e-mail app running?
Yes it does (especially on the latest 2.0 branch), but it seems easier to reproduce when you have the browser running on the background.
Hi Chris, were you able to get the regression range for this bug on JB device? jsmith told me that you'll be working on it.
Flags: needinfo?(ckreinbring)
Unable to get a reverse regression window on JB Master with 319 MB memory.  The lastest build that was available reproed the bug, meaning that this issue is not fixed.

BuildID: 20140908062801
Gaia: c71fd5d8c9c7cb021c97e5e9fbb29f92b50a084d
Gecko: f7a27a866c47
Platform Version: 35.0a1
Firmware Version: V123
User Agent: Mozilla/5.0 (Mobile; rv:35.0) Gecko/35.0 Firefox/35.0
Flags: needinfo?(ckreinbring)
I'll test it on latest m-c with KK build and see whether I can repro the bug as well.
On today's mozilla-aurora flame eng build for kk, I was able to reproduce this with browser app running in the background.  Without the browser app on, I was unable to repro it.

https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/tinderbox-builds/mozilla-aurora-flame-kk-eng/20140909090754/
So... 2.1 is not better off than 2.0, but otherwise we're still trying to fix the extra 80mb usage and leaving it as 2.0+ blocker?
Attached image firewatch screenshot
The orange bar represents the memory usage.  You see a sharp jump before 850s and again in 860s, that was me doing zoom in / zoom out in one shot.  When I did it for the 3rd time around 870s, that's when the E-mail app died.  Looks like each set of pinch zoom in/out costs around 15MB of memory, and if the memory isn't all freed before the next action (i.e. I do pinch zoom really fast repeatedly- unlikely scenario but possible), it might be out of the memory.

This was done in 2.1 with 319MB of RAM:
Gaia      ee335e6340dab596239d335bb571b732775dc84e
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/48c37c8442cb
BuildID   20140912023307
Version   34.0a2
ro.build.date   Fri Sep 12 07:32:14 EDT 2014
ro.bootloader   L1TC10011800
regression window tag should have been removed with comment 34
QA Whiteboard: [QAnalyst-Triage+]
See Also: → 1058900
Milan, Any update on this bug?
This bug is still marked 2.0+
Flags: needinfo?(milan)
Discovered an interesting phenomenon while trying to come up with useful mitigations on v2.0.  In a nutshell:

- Perform the transform update inside the touch event handler => very, very low probability of memory spike
- Wrap the transform update inside a setTimeout(func, 10) => very, very HIGH probability of memory spike.

I noticed this while trying to have logic that throttles transform updates to avoid upsetting the graphics subsystem, and noticed it happening even when I set my delay to an absurd 8000ms.  And on the flip-side, if I set the throttle delay much lower (ex: 500 ms), I was fine if I re-triggered the transform update within a second as long as it was triggered by the event handler.

Is it possible there's something important about having the touch event delivery on the stack and/or some kind of opportunistic buffer reuse as long as the compositor doesn't actively have it locked or something?
(In reply to Sri Kasetti from comment #40)
> Milan, Any update on this bug?
> This bug is still marked 2.0+

I believe the idea is to deal with bug 1058900 first.
Flags: needinfo?(milan)
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]
Editing the title since it is also reproducible in 2.1 with less repro rate (Comment 38)
Summary: [Flame] v2.0-specific bug: e-mail with pictures crashes after repeated pinch zoom / scrolling, we need to locate and uplift a fix from v2.1 → [Flame][KK]e-mail with pictures crashes after repeated pinch zoom / scrolling
Can we get a 2.0 re-test once bug 1058900 is uplifted?
ni?ing myself to retest in 2.0 when bug 1058900 is uplifted.
Flags: needinfo?(npark)
Tested 2.0 and 2.1 after the fix.  The system is more responsive, and the app does not crash anymore (Or it is much harder to make it crash).  I did notice the temporarily unresponsive scroll UI, and raised Bug 1072982.  This only happens in 2.0
Flags: needinfo?(npark)
Since this issue is now resolved due to the fix for bug 1058900, marking it as resolved.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Assignee: nobody → bugmail
Target Milestone: --- → 2.1 S5 (26sep)
Issue is verified fixed in Flame 2.0 build (Full Flash, nightly, 319 MB memory). 

Actual Results: Zooming-in and out on attached photo in email app does not crash the application. 

Device: Flame 2.0
BuildID: 20141105000201
Gaia: fe2167fa5314c7e71c143a590914cbf3771905a8
Gecko: 093de6b632c5
Gonk: 48835395daa6a49b281db62c50805bd6ca24077e
Version: 32.0 (2.0)
Firmware: V188-1
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage+][lead-review+] → [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Verify passed, this issue can't be repro on Woodduck 2.0.
Attached: Verify_Woodduck_Mailpicture.mp4
Reproducing rate: 0/5

Woodduck build:
Gaia-Rev        cc690f8016b672475dc186bc7fd58aef45e684b7
Gecko-Rev       03d3ab62d5b07b915434f2d1d68495ad5915ecd2
Build-ID        20141118184148
Version         32.0
Blocks: Woodduck
You need to log in before you can comment on or make changes to this bug.