Closed Bug 984460 Opened 10 years ago Closed 10 years ago

[B2G][E-mail] Zooming 3+ times causes the E-Mail app to close

Categories

(Core :: Panning and Zooming, defect, P1)

30 Branch
ARM
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla31
blocking-b2g 1.4+
Tracking Status
firefox29 --- wontfix
firefox30 --- fixed
firefox31 --- fixed
b2g-v1.3 --- unaffected
b2g-v1.4 --- verified
b2g-v2.0 --- fixed

People

(Reporter: rkunkel, Assigned: kats)

References

Details

(Keywords: memory-footprint, perf, regression, Whiteboard: [c=memory p= s= u=1.4] dogfood1.4 [MemShrink:P2])

Attachments

(8 files, 4 obsolete files)

12.48 KB, text/plain
Details
1.08 MB, video/ogg
Details
70.29 KB, text/plain
Details
3.64 MB, application/x-zip-compressed
Details
443 bytes, message/rfc822
Details
1.64 KB, text/plain
Details
5.55 KB, text/plain
Details
5.57 KB, patch
mattwoodrow
: review+
Details | Diff | Splinter Review
Attached file logcat
Description:
When the user zooms in on the body of an E-mail 3 or more times, the E-mail app will force close. 

If the app does not close, the phone enters a state where the screen remains black and the user can not enter card view. User must restart the phone.

Repro Steps:
1) Update Buri to v1.4 BuildID: 20140317040204
2) Setup an existing E-mail on the device
3) Send an email to the device's linked E-mail
4) Double tap on the text in the body to zoom, do this three times so the text takes up roughly 1/3 of the screen
5)Try to scroll left or right

Actual:
The E-mail app force closes

Expected:
The E-mail app gracefully handles user interactions

Environmental Variables:
Device: Buri 1.4 MOZ RIL
BuildID: 20140317040204
Gaia: 8f802237927c7d5e024fb7dca054dd5efef6b2e6
Gecko: 25cfa01ba054
Version: 30.0a1
Firmware Version: V1.2-device.cfg

Repro frequency: 100%
See attached: logcat, video
OS: Linux → Gonk (Firefox OS)
Hardware: x86 → ARM
Attached video Video_0001.ogg
This issue does not reproduce on the latest v1.3 Buri Build. Zooming the third time returns the text to it's original size. 

Environmental Variables:
Device: Buri 1.3 MOZ RIL
BuildID: 20140317004001
Gaia: 0ab8a9cbcef5f23cec904a3d7f7675e44de29951
Gecko: f824e9d91a2d
Version: 28.0
Firmware Version: V1.2-device.cfg
blocking-b2g: --- → 1.4?
Component: Gaia::Browser → Panning and Zooming
Product: Firefox OS → Core
Version: unspecified → 30 Branch
Does this reproduce with tiling disabled on 1.4?
(In reply to Jason Smith [:jsmith] from comment #3)
> Does this reproduce with tiling disabled on 1.4?

Also - let's get an about:memory report for this & a dmesg log.
Keywords: perf
Whiteboard: dogfood1.4 → dogfood1.4 [MemShrink]
blocking-b2g: 1.4? → 1.4+
QA Contact: pbylenga
Dylan, can you have Andrew or someone else working on Email take a look at this?
Flags: needinfo?(doliver)
Priority: -- → P1
Whiteboard: dogfood1.4 [MemShrink] → [c=memory p= s= u=1.4] dogfood1.4 [MemShrink]
No-Jun, Jason's question on "is this reproducible with tiling disabled" got lost - can you take a quick look?  Make sure you restart the device after changing the pref.
Flags: needinfo?(npark)
Actually, this does not reproduce on Buri with following build (with tiling enabled/disabled).  As RolandK says, tapping it for the 3rd time reverts back to the original size:

Gaia      c03a6af9028c4b74a84b5a98085bbb0c07261175
Gecko     https://hg.mozilla.org/mozilla-central/rev/082761b7bc54
BuildID   20140318160201
Version   31.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
Flags: needinfo?(npark)
Actually, I forgot that I should have loaded 1.4.  I'll retest.  Sorry about that.
Flags: needinfo?(npark)
Hmm, actually , in the final copy of 1.4 build, this does not reproduce either (with or without tiling):

Gaia      c03a6af9028c4b74a84b5a98085bbb0c07261175
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/b07ecc057abf
BuildID   20140319000200
Version   30.0a2
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
Flags: needinfo?(npark)
I'm able to reproduce the issue with and without tiling on the latest Buri v1.4.

After further investigation this issue occurs only when a very long string is in the email without space breaks.  A hyperlink doesn't reproduce the bug, but a broken long url will.  Such that if a user taps multiple times on a long url missing 'www' they will encounter this issue.

Environmental Variables:
Device: Buri 1.4 MOZ RIL
BuildID: 20140319000200
Gaia: c03a6af9028c4b74a84b5a98085bbb0c07261175
Gecko: b07ecc057abf
Version: 30.0a2
Firmware Version: V1.2-device.cfg
Keywords: qawanted
Attached file dmesg log
Attached file about-memory-4.zip
Note: There were several errors displayed in the console that I don't usually see while capturing the about memory report, they begin just as the issue is reproduced. I tried the about memory pull three separate times while reproducing this bug and got a similar result each time. 
  I've included a copy of the console output here displaying the errors, as well as what was populated in the about-memory folder itself as the attachment. Hopefully these are of some use.

---

$ MOZ_IGNORE_NUWA_PROCESS=1 ./get_about_memory.py
Warning: Child 361 exited during memory reporting
Warning: Child 523 exited during memory reporting
Warning: Child 460 exited during memory reporting
Got 1/1 files.
Pulled files into about-memory-4.

To view this report, open Firefox on this machine and load the
following URL:

  about:memory?file=/home/Root/B2G/tools/about-memory-4/memory-reports

Pulling GC/CC logs...
Got 6/6 files.
Command adb pull /data/local/tmp/memory-reports/gc-edges.137.1395268527.log failed with error code 1
Traceback (most recent call last):
  File "./get_about_memory.py", line 291, in <module>
    get_and_show_info(args)
  File "./get_about_memory.py", line 231, in get_and_show_info
    get_gc_cc_log.get_logs(args, out_dir=out_dir, get_procrank_etc=False)
  File "/home/Root/B2G/tools/get_gc_cc_log.py", line 90, in get_logs
    utils.run_and_delete_dir_on_exception(do_work, out_dir)
  File "/home/Root/B2G/tools/include/device_utils.py", line 147, in run_and_delete_dir_on_exception
    return fun()
  File "/home/Root/B2G/tools/get_gc_cc_log.py", line 83, in do_work
    out_dir=out_dir)
  File "/home/Root/B2G/tools/include/device_utils.py", line 265, in notify_and_pull_files
    new_files = _pull_remote_files(all_outfiles_prefixes, old_files, out_dir)
  File "/home/Root/B2G/tools/include/device_utils.py", line 325, in _pull_remote_files
    shell('adb pull %s' % f, cwd=out_dir)
  File "/home/Root/B2G/tools/include/device_utils.py", line 78, in shell
    raise subprocess.CalledProcessError(proc.returncode, cmd, err)
subprocess.CalledProcessError: Command 'adb pull /data/local/tmp/memory-reports/gc-edges.137.1395268527.log' returned non-zero exit status 1

---

I'll start on the regression window, and can capture another log if requested.
QA Contact: pbylenga → jzimbrick
(In reply to Peter Bylenga from comment #10)
> I'm able to reproduce the issue with and without tiling on the latest Buri
> v1.4.
> 
> After further investigation this issue occurs only when a very long string
> is in the email without space breaks.  A hyperlink doesn't reproduce the
> bug, but a broken long url will.  Such that if a user taps multiple times on
> a long url missing 'www' they will encounter this issue.

Thanks for narrowing this down Peter.  I wonder if we doing something special with the broken (or otherwise) urls when it comes to navigation?  Or is it just a long string that causes layout to give us a different result?

Assigning to me to signify tracking.
Assignee: nobody → milan
The long string will change the results of the double-tap-to-zoom algorithm that is invoked as per the STR. Specifically, it might change a zoom-out to a zoom-in, and so we might be zooming in a lot more with long strings than without.
(In reply to Milan Sreckovic [:milan] from comment #13)
> (In reply to Peter Bylenga from comment #10)
> > I'm able to reproduce the issue with and without tiling on the latest Buri
> > v1.4.
> > 
> > After further investigation this issue occurs only when a very long string
> > is in the email without space breaks.  A hyperlink doesn't reproduce the
> > bug, but a broken long url will.  Such that if a user taps multiple times on
> > a long url missing 'www' they will encounter this issue.
> 
> Thanks for narrowing this down Peter.  I wonder if we doing something
> special with the broken (or otherwise) urls when it comes to navigation?  Or
> is it just a long string that causes layout to give us a different result?
> 
> Assigning to me to signify tracking.

I can confirm that it works with long strings that aren't broken urls, I used "blahblahblah..." that spanned passed 2 line breaks and reproduced the issue.  Broken url scenario was the most user relevant scenario I could think of.
Flags: needinfo?(doliver)
This bug can be reproduced in a way on the first 1.4 build I have available using the steps provided in Comment 0 with a multi-line unbroken text string as noted in Comment 10, the entire phone crashes and then reboots.

APZ must be enabled in the dev menu to produce a crash, otherwise the email can be zoomed in and out without issue (This is true in the oldest builds as well as the latest).

First 1.4 Environmental Variables (Phone crashes after following repro steps with APZ enabled in the dev menu):
Device: Buri
BuildID: 20131216040201
Gaia: 358cd74fd2b2ef5d541f71a5d53d65d6a7335424
Gecko: f67feb33a974
Version: 29.0a1
Base Image: V1.2-device.cfg

At some point in February the results changed from the phone crashing to only the app crashing (as noted in the initial writeup), I'm still looking in to the window on when that change occurred.

Or since Comment 2 states that this issue does not occur at all in the latest 1.3 build, would it be more beneficial to look for a reverse regression window in that branch?
(In reply to J Zimbrick from comment #16)
> This bug can be reproduced in a way on the first 1.4 build I have available
> using the steps provided in Comment 0 with a multi-line unbroken text string
> as noted in Comment 10, the entire phone crashes and then reboots.

That's not the same bug as this issue. This issue deals with a process OOM, not a full phone crash.

> Or since Comment 2 states that this issue does not occur at all in the
> latest 1.3 build, would it be more beneficial to look for a reverse
> regression window in that branch?

No - we're looking for what regressed here in 1.4.
(In reply to J Zimbrick from comment #16)
> At some point in February the results changed from the phone crashing to
> only the app crashing (as noted in the initial writeup), I'm still looking
> in to the window on when that change occurred.

That's probably due to bug 965945 being fixed.
(In reply to npark from comment #9)
> Hmm, actually , in the final copy of 1.4 build, this does not reproduce
> either (with or without tiling):
> 
> Gaia      c03a6af9028c4b74a84b5a98085bbb0c07261175
> Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/b07ecc057abf
> BuildID   20140319000200
> Version   30.0a2
> ro.build.version.incremental=eng.tclxa.20131223.163538
> ro.build.date=Mon Dec 23 16:36:04 CST 2013

In above version, on Buri, when i double tap on unbroken multiline string, it does not zoom in, but it does not crash either.  Tapping action does not cause any behavior.

I have the email as attachement to this bug.
Mozilla-Inbound Regression Window:

Last Working Environmental Variables (App crash): 
Device: Buri
BuildID: 20140312043011
Gaia: e61dc0019d9d6135d88ba15153c37f73a952567e
Gecko: 22d46873261d
Version: 30.0a1
Base Image: V1.2-device.cfg

First Broken Environmental Variables (OOM Force Close):
Device: Buri
BuildID: 20140312051710
Gaia: e61dc0019d9d6135d88ba15153c37f73a952567e
Gecko: 8b449c3a4caa
Version: 30.0a1
Base Image: V1.2-device.cfg

The Gaia is the same on both builds, pointing to this being a gecko issue.

Gecko Pushlog: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=22d46873261d&tochange=8b449c3a4caa

Note: The build marked "Last Working" here actually produces an app crash with a crash log when following the repro steps, the build marked "First Broken" results in the OOM force close that this issue was logged for.

Also, in response to Comment 20, the string has to be longer than that to produce this issue. It should be displayed on several lines when viewed on the phone, we've generally just been copy-pasting something like "blahblahblah" several times in to an email while testing this issue.
(In reply to J Zimbrick from comment #21)
> Gecko Pushlog:
> http://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=22d46873261d&tochange=8b449c3a4caa

Pretty clearly bug 962629, cc'ing Chris.
Blocks: 962629
(In reply to Botond Ballo [:botond] from comment #22)
> (In reply to J Zimbrick from comment #21)
> > Gecko Pushlog:
> > http://hg.mozilla.org/integration/mozilla-inbound/
> > pushloghtml?fromchange=22d46873261d&tochange=8b449c3a4caa
> 
> Pretty clearly bug 962629, cc'ing Chris.

Not sure how bug 962629 affects this... It turned it from a phone crash to a process crash... Isn't that a good thing?

All that patch does is stop us from using gralloc surfaces above the maximum texture size (and fall back to shared memory in this case) - I'd say this is pretty firmly bug 957668, as previously identified.
(In reply to Chris Lord [:cwiiis] from comment #23)
> (In reply to Botond Ballo [:botond] from comment #22)
> > (In reply to J Zimbrick from comment #21)
> > > Gecko Pushlog:
> > > http://hg.mozilla.org/integration/mozilla-inbound/
> > > pushloghtml?fromchange=22d46873261d&tochange=8b449c3a4caa
> > 
> > Pretty clearly bug 962629, cc'ing Chris.
> 
> Not sure how bug 962629 affects this... It turned it from a phone crash to a
> process crash... Isn't that a good thing?
> 
> All that patch does is stop us from using gralloc surfaces above the maximum
> texture size (and fall back to shared memory in this case) - I'd say this is
> pretty firmly bug 957668, as previously identified.

Makes sense, thanks. Just wanted to make sure.
With bug 957668 (it just hit central this morning) I'm not able to reproduce this. The email attached in comment 20 was not useful because it is a text/plain email and doesn't seem to zoom. However I just sent myself a text/html email with a really long unbroken string to test and didn't see any crashing.

qawanted to confirm and dupe this bug to 957668 if it is fixed.
Keywords: qawanted
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #25)
> With bug 957668 (it just hit central this morning) I'm not able to reproduce
> this. The email attached in comment 20 was not useful because it is a
> text/plain email and doesn't seem to zoom. However I just sent myself a
> text/html email with a really long unbroken string to test and didn't see
> any crashing.
> 
> qawanted to confirm and dupe this bug to 957668 if it is fixed.

I was able to repro this issue on the latest 1.4 Buri build

Environmental Variables:
Device: Buri 1.4 MOZ
BuildID: 20140328000202
Gaia: e07a5915737727ba22c06f270ee7af6572f746f3
Gecko: 33e7fd745c1b
Version: 30.0a2
Firmware Version: V1.2-device.cfg
Keywords: qawanted
Can you provide the email you used to reproduce this?
Flags: needinfo?(jschmitt)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> Can you provide the email you used to reproduce this?

I don't he's got the patch here - the gecko commit points to a changeset that landed at 3pm yesterday, but the aurora landing of bug 957668 landed at 9:46pm.
(In reply to Jason Smith [:jsmith] from comment #28)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
> > Can you provide the email you used to reproduce this?
> 
> I don't he's got the patch here - the gecko commit points to a changeset
> that landed at 3pm yesterday, but the aurora landing of bug 957668 landed at
> 9:46pm.

don't think*
Let's retest this on Monday on 1.4. Can't test on trunk today due to the fact that trunk is broken on Firefox OS right now.
Keywords: qawanted
Flags: needinfo?(jschmitt)
Actually I misread the commits - the build today does include that patch.

Resetting needinfo to get the email in question.
Flags: needinfo?(jschmitt)
Keywords: qawanted
Attached file BrokenUrl_Email
I have uploaded a txt file called BrokenUrl_Email

This is the email I used to repro this issue.
Flags: needinfo?(jschmitt)
Just to clarify, in order to repro, copy the text from BrokenUrl_Email into the body of an email.
Ok, I can reproduce this by sending myself a text/html email with the content you attached in comment 32. The first double-tap seems to zoom in a lot, and the second double-tap sometimes crashes the email app. If it doesn't crash there then a third double-tap will do it.

I turned on APZ logging and it's not APZ that's requesting the repaint or zooming. The email app actually handles double-tap events itself and implements "zooming" using CSS transforms [1]. So what's likely happening is that the margins code is adding the margins onto a base rect that doesn't take the CSS transform into account, and then the transform gets applied on top of that and ends up creating a giant texture and OOMing.

[1] http://mxr.mozilla.org/gaia/source/apps/email/js/iframe_shims.js#337
tn, any ideas on how to deal with this? Seems like case we (or at least I) didn't consider for the margins change.
Flags: needinfo?(tnikkel)
I'll have to give it some thought.

I specifically made it a non-goal of my work on layer margins to consider transforms because there are assumptions about transforms baked deep into the code. For example LayerPixels aren't actually layer pixels when transforms are involved.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #34)
> Ok, I can reproduce this by sending myself a text/html email with the
> content you attached in comment 32. The first double-tap seems to zoom in a
> lot, and the second double-tap sometimes crashes the email app. If it
> doesn't crash there then a third double-tap will do it.
> 
> I turned on APZ logging and it's not APZ that's requesting the repaint or
> zooming. The email app actually handles double-tap events itself and
> implements "zooming" using CSS transforms [1]. So what's likely happening is
> that the margins code is adding the margins onto a base rect that doesn't
> take the CSS transform into account, and then the transform gets applied on
> top of that and ends up creating a giant texture and OOMing.
> 
> [1] http://mxr.mozilla.org/gaia/source/apps/email/js/iframe_shims.js#337

Can somebody comment on whether we want the E-mail app to be doing this zooming itself, or if we should just leave it up to APZ to handle zooming?  Or is it a CSS case that APZ doesn't support?
It's something that APZ will have to support in general at some point, transforms are a web feature that are used in many places. Whether we can get away with changing the email app for now is an open question I think.
(In reply to Milan Sreckovic [:milan] (TPE timezone) from comment #37)
> Can somebody comment on whether we want the E-mail app to be doing this
> zooming itself, or if we should just leave it up to APZ to handle zooming? 
> Or is it a CSS case that APZ doesn't support?

We can't do this in the APZ without supporting subframe zooming, or pushing the email body into a separate process. There is some more discussion on this at bug 982888 comment 16 and bug 982888 comment 17.
Whiteboard: [c=memory p= s= u=1.4] dogfood1.4 [MemShrink] → [c=memory p= s= u=1.4] dogfood1.4 [MemShrink:P2]
So we can get the actual scale factor that FrameLayerBuilder has decided to render the layer at (including resolution and transfroms) from the ContainerParameters in BuildLayer. Then we can either use that to adjust things in RecordFrameMetrics or just record that information and use it in the display port calculation (depends what exactly we want to do).
Flags: needinfo?(tnikkel)
(In reply to Timothy Nikkel (:tn) from comment #40)
> So we can get the actual scale factor that FrameLayerBuilder has decided to
> render the layer at (including resolution and transfroms) from the
> ContainerParameters in BuildLayer. Then we can either use that to adjust
> things in RecordFrameMetrics or just record that information and use it in
> the display port calculation (depends what exactly we want to do).

We probably want to use it in the display port calculation so that we can keep tile boundary alignment(?)
My thinking is that the code in nsLayoutUtils::GetDisplayPort needs to use the cumulative ContainerParameters scale rather than the cumulative resolution, because that will include the CSS transform scales as well. We need to do this in order to prevent the crash because we don't get a chance to update the displayport margins between when CSS sets the scale and when the process crashes.

The worst case I can think of is that content JS sets a transform:scale(100000000) on an ancestor of the displayport'd element. Even if we set the (CSS) displayport base to 1x1 it'll get blown up too big. And if we set it to 0x0 then nothing gets painted.
You'll also want to take into account the container params scale when determining the root composition size, otherwise when we do calculate a new display port it will be too big.
Displayport margins should deal fine with a 0 by 0 sized base rect, giving you a reasonably sized non-zero displayport rect.
Kats, please reassign if this is better handled by somebody else, or you can't get to it.
Assignee: milan → bugmail.mozilla
I spent some time yesterday looking into this bug. After some discussion on IRC I was thinking we could pick up the CSS scale using nsLayoutUtils::GetTransformToAncestor and divide it out in either nsLayoutUtils::GetDisplayPort or in the DisplayportExceedsMaxTextureSize code. There turned out to be a few problems.

First of all, the DisplayportExceedsMaxTextureSize approach, which I preferred, suffers from bug 994501 and so is nontrivial to use without fixing that first.

The next problem is when I first load the email, the CSS scale is set to something tiny like 0.041. On the first double-tap it goes up to 1.0 and on the second double-tap is goes to 2.0. So this is actually something like a 50x magnification over the original view, but GetTransformToAncestor returns 2, and dividing that out from the displayport margins didn't prevent the OOM/crash. I'll have to analyze the composition bounds and computed displayport rect in more detail to figure out what's going on.
I spent some more time digging through this code and the values that get set. I discovered that in most cases it is actually the displayport base rect that is set to a large value (width =~ 6225 CSS pixels) and that results in the OOM and crash. The base rect is set from both APZCCallbackHelper.cpp and from nsSubDocumentFrame, and in both places it is *sometimes* set to the large value. When the email is displayed zoomed out or at zoom 1.0, what generally happens is that the large value is clobbered by a smaller value before repainting, and so there is no crash. At zoom 2.0 however the large value is not clobbered and so it crashes.

There are still some aspects about all this that are unclear to me, but I was able to fix the APZCCallbackHelper part so that it no longer sets a giant base rect. The nsSubDocumentFrame code [1] still has a giant dirty rect though and that still triggers the crash. I think the dirty rect clipping code may be missing something, not really sure. I can attach the patches that fix the APZCCH side of things.

[1] http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsSubDocumentFrame.cpp?rev=a9e22794850e#386
Some more logging shows that the dirty rect is set to the giant value at http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsFrame.cpp?rev=7be71c699b85#1940 - I will try implementing what the comment says to see if it fixes the problem.
Comment on attachment 8406371 [details] [diff] [review]
Part 1 - refactoring to reuse code

This is good except for the naming: this isn't actually the display port base rect. It's a rect that I don't think we really have a name for yet that we use kind of like the display port base rect to calculate the display port, and then the display port margins.
Comment on attachment 8406372 [details] [diff] [review]
Part 2 - Use the right base rect in APZCCH

Do we actually use the base rect set here for anything painting related? That would be a bug I think. Painting is careful to always set the base rect itself (it's the source of truth for the base rect) and then use the base rect. (Image visibility will use display port stuff, but it doesn't allocate surfaces, so it shouldn't matter.)

Anyways, I think I intended to only set the base rect from APZCCallback if it was not already set. I can't recall if the reason that didn't make it into the code is: I forgot to do it, I decided it wasn't important, or something more important.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #47)
> There are still some aspects about all this that are unclear to me, but I
> was able to fix the APZCCallbackHelper part so that it no longer sets a
> giant base rect. The nsSubDocumentFrame code [1] still has a giant dirty
> rect though and that still triggers the crash. I think the dirty rect
> clipping code may be missing something, not really sure. I can attach the
> patches that fix the APZCCH side of things.
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/layout/generic/
> nsSubDocumentFrame.cpp?rev=a9e22794850e#386

Before we use dirty at the base rect in subdocument frame I think we need to restrict it to the content area of the subdocument frame.
(In reply to Timothy Nikkel (:tn) from comment #51)
> This is good except for the naming: this isn't actually the display port
> base rect. It's a rect that I don't think we really have a name for yet that
> we use kind of like the display port base rect to calculate the display
> port, and then the display port margins.

Ok, I can come up with some other name for it.

(In reply to Timothy Nikkel (:tn) from comment #52)
> Do we actually use the base rect set here for anything painting related?

I think so, yeah. I added logging to UpdateSubFrame when it sets the base rect, more logging to SetDisplayPortBase to catch any other calls to set the base rect, and a dump in RecordFrameMetrics which I took to indicate the end of a paint. I was seeing the UpdateSubFrame code get triggered, and then without any other calls to set the base rect (for the same content object) it would go to RecordFrameMetrics, where the displayport would be massive.

> That would be a bug I think. Painting is careful to always set the base rect
> itself (it's the source of truth for the base rect) and then use the base
> rect. (Image visibility will use display port stuff, but it doesn't allocate
> surfaces, so it shouldn't matter.)
> 
> Anyways, I think I intended to only set the base rect from APZCCallback if
> it was not already set. I can't recall if the reason that didn't make it
> into the code is: I forgot to do it, I decided it wasn't important, or
> something more important.

Ok, I can double-check my findings and file a separate bug for this.

(In reply to Timothy Nikkel (:tn) from comment #53)
> Before we use dirty at the base rect in subdocument frame I think we need to
> restrict it to the content area of the subdocument frame.

Is this not the same as what it says in the code comment referenced in comment 50? With mattwoodrow's guidance on IRC I implemented what that says and it does seem to have solved the problem. Will attach patch in a sec.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #54)
> (In reply to Timothy Nikkel (:tn) from comment #52)
> > Do we actually use the base rect set here for anything painting related?
> 
> I think so, yeah. I added logging to UpdateSubFrame when it sets the base
> rect, more logging to SetDisplayPortBase to catch any other calls to set the
> base rect, and a dump in RecordFrameMetrics which I took to indicate the end
> of a paint. I was seeing the UpdateSubFrame code get triggered, and then
> without any other calls to set the base rect (for the same content object)
> it would go to RecordFrameMetrics, where the displayport would be massive.

That sounds like a problem that should be investigated (not necessarily by you).

> (In reply to Timothy Nikkel (:tn) from comment #53)
> > Before we use dirty at the base rect in subdocument frame I think we need to
> > restrict it to the content area of the subdocument frame.
> 
> Is this not the same as what it says in the code comment referenced in
> comment 50? With mattwoodrow's guidance on IRC I implemented what that says
> and it does seem to have solved the problem. Will attach patch in a sec.

No, I think they are separate issues, but perhaps not affecting this situation then.
(In reply to Timothy Nikkel (:tn) from comment #56)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #54)
> > (In reply to Timothy Nikkel (:tn) from comment #52)
> > > Do we actually use the base rect set here for anything painting related?
> > 
> > I think so, yeah. I added logging to UpdateSubFrame when it sets the base
> > rect, more logging to SetDisplayPortBase to catch any other calls to set the
> > base rect, and a dump in RecordFrameMetrics which I took to indicate the end
> > of a paint. I was seeing the UpdateSubFrame code get triggered, and then
> > without any other calls to set the base rect (for the same content object)
> > it would go to RecordFrameMetrics, where the displayport would be massive.
> 
> That sounds like a problem that should be investigated (not necessarily by
> you).

Actually I was wrong, it does seem to always get overwritten by layout. That is, the part 3 patch alone is sufficient to fix this bug.
Comment on attachment 8406372 [details] [diff] [review]
Part 2 - Use the right base rect in APZCCH

Obsoleting this patch as per above comment.
Attachment #8406372 - Attachment is obsolete: true
Comment on attachment 8406371 [details] [diff] [review]
Part 1 - refactoring to reuse code

Obsoleting this one too since it's not needed for the actual fix. I'll move it to a new bug or something since it's still nice to have.
Attachment #8406371 - Attachment is obsolete: true
Attachment #8406445 - Attachment description: Part 3 - Ensure the dirty rect is computed properly → Ensure the dirty rect is computed properly when CSS transforms are in effect
Attachment #8406445 - Flags: review?(matt.woodrow)
Comment on attachment 8406445 [details] [diff] [review]
Ensure the dirty rect is computed properly when CSS transforms are in effect

Review of attachment 8406445 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +1943,5 @@
> +        if (nsDisplayTransform::UntransformRect(dirtyRect, overflow, this, offset, &untransformedDirtyRect)) {
> +          dirtyRect = untransformedDirtyRect;
> +        } else {
> +          NS_WARNING("Unable to untransform dirty rect!");
> +          // Dunno what to do here...

This should only happen if the transform is singular, in which case nothing would be visible.
Attachment #8406445 - Flags: review?(matt.woodrow) → review+
The try push [1] was pretty sad, with some sets of reftests failing consistently across all platforms. There were also some crashtest failures with the crashing stack looking like the attached. This seems to imply that GetTransform() and GetResultingTransformMatrix() don't really do the same thing, and that using the latter instead of the former is a bad idea. I've started another try push [2] where I leave UntransformVisibleRect as it was before and just add the new function independently. Hopefully that will make these errors go away, but if you have other ideas please let me know because I'm out of my depth in this code.

[1] https://tbpl.mozilla.org/?tree=Try&rev=041e2e5c3008
[2] https://tbpl.mozilla.org/?tree=Try&rev=af8a7f75f459
Flags: needinfo?(matt.woodrow)
Keywords: footprint
Comment on attachment 8406445 [details] [diff] [review]
Ensure the dirty rect is computed properly when CSS transforms are in effect

>+        dirtyRect = dirtyRect.Intersect(overflow);

trans and dirtyRect are in the transformed coord space, which is just the normal coord system. overflow is the overflow rect relative to self (of the frame with a transform), which means it doesn't take into account the transform on that frame (the other overflow getters of the frame do, so that "RelativeToSelf" part is very important), which means it is in the untransformed space. So not in the space space as trans and dirtyRect. You want the intersection of dirtyRect and trans here I think.
Thanks! that seemed to fix the reftests - https://tbpl.mozilla.org/?tree=Try&rev=39efb05a0647

I'll wait on a few retriggers just to make sure and test the updated fix on device as well.
Flags: needinfo?(matt.woodrow)
Ok, verified this fixes the original problem on device. Since it's a fairly small change I'm just going to carry the original r+.
Attachment #8406445 - Attachment is obsolete: true
Attachment #8409653 - Flags: review+
Comment on attachment 8409653 [details] [diff] [review]
Ensure the dirty rect is computed properly when CSS transforms are in effect

Review of attachment 8409653 [details] [diff] [review]:
-----------------------------------------------------------------

Actually since the original r+'d version I also removed the refactoring of UntransformVisibleRect so maybe this needs re-review.
Attachment #8409653 - Flags: review+ → review?(matt.woodrow)
Comment on attachment 8409653 [details] [diff] [review]
Ensure the dirty rect is computed properly when CSS transforms are in effect

Review of attachment 8409653 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +1939,5 @@
>        dirtyRect += offset;
>        if (dirtyRect.Intersects(trans)) {
> +        dirtyRect = dirtyRect.Intersect(trans);
> +        nsRect untransformedDirtyRect;
> +        if (nsDisplayTransform::UntransformRect(dirtyRect, overflow, this, offset, &untransformedDirtyRect)) {

We shouldn't need to call TransformRect/Intersect at all any more, we should just call UntransformRect.

The call to gfx3DMatrix::UntransformBounds handles both of these things interally.
(In reply to Matt Woodrow (:mattwoodrow) from comment #66)
> We shouldn't need to call TransformRect/Intersect at all any more, we should
> just call UntransformRect.
> 
> The call to gfx3DMatrix::UntransformBounds handles both of these things
> interally.

I'm not following this. From a code inspection it doesn't look like removing the TransformRect and Intersect calls will produce the same result. And I verified that they produce different results by leaving in both code paths and comparing the resulting dirtyRect values. I'll try to find you on IRC so we can discuss this.
Updated as per IRC conversation. The dirty rect does come out a little bigger in some cases but matt says that should be fine. The difference in size appears to be due to the Is2D() shortcut in gfx3DMatrix::UntransformBounds, which doesn't do an intersection with the child bounds.

Try push for this patch at https://tbpl.mozilla.org/?tree=Try&rev=29ab783293b7
Attachment #8409653 - Attachment is obsolete: true
Attachment #8409653 - Flags: review?(matt.woodrow)
Attachment #8410546 - Flags: review?(matt.woodrow)
Attachment #8410546 - Flags: review?(matt.woodrow) → review+
(In reply to Timothy Nikkel (:tn) from comment #53)
> (In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #47)
> > There are still some aspects about all this that are unclear to me, but I
> > was able to fix the APZCCallbackHelper part so that it no longer sets a
> > giant base rect. The nsSubDocumentFrame code [1] still has a giant dirty
> > rect though and that still triggers the crash. I think the dirty rect
> > clipping code may be missing something, not really sure. I can attach the
> > patches that fix the APZCCH side of things.
> > 
> > [1]
> > http://mxr.mozilla.org/mozilla-central/source/layout/generic/
> > nsSubDocumentFrame.cpp?rev=a9e22794850e#386
> 
> Before we use dirty at the base rect in subdocument frame I think we need to
> restrict it to the content area of the subdocument frame.

Filed bug 1000322 for this.
(In reply to Timothy Nikkel (:tn) from comment #52)
> Anyways, I think I intended to only set the base rect from APZCCallback if
> it was not already set. I can't recall if the reason that didn't make it
> into the code is: I forgot to do it, I decided it wasn't important, or
> something more important.

I filed bug 1000350 for this.
https://hg.mozilla.org/mozilla-central/rev/5ff928574d65
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla31
Verified fix on 1.4 branch, using the same testcase from bug 996991.  woohoo!

Gaia      d23e479e8a4ce0bc620acb2d7e2f82801aa4d0ea
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/36f67ce46855
BuildID   20140428000206
Version   30.0a2
ro.build.version.incremental=76
ro.build.date=Mon Apr 14 14:02:50 CST 2014
You need to log in before you can comment on or make changes to this bug.