947523 - [B2G][Browser] The phone will crash if the user keeps zooming in on MoPad webpage

KTucker [:KTucker][Inactive 3/4/2016]

Reporter

Description

•

12 years ago

Attached file Zoomingincrashlog.txt — Details

Description: If the user keeps zooming in on an etherpad webpage, the phone will crash. Repro Steps: 1) Updated Buri to Build ID: 20131206040203 2) Tap on the "Browser" icon. 3) Navigate to https://etherpad.mozilla.org 4) Tap on "Create new public pad". 5) Keep zooming in on the webpage as far as the user can go. If the crash doesn't occur keeping zooming in on the webpage while changing the phone's orientation. Actual: The device will either crash and reset or the browser will crash. Expected: The user can zoom in without a crash occurring. Environmental Variables Device: Buri v 1.3.0 Mozilla RIL Build ID: 20131206040203 Gecko: http://hg.mozilla.org/mozilla-central/rev/1401e4b394ad Gaia: 8fca2ca67e8a6022fe6ed8cb576e5d59dfb5237f Platform Version: 28.0a1 Notes: Repro frequency: 100% See attached: video clip, logcat

KTucker [:KTucker][Inactive 3/4/2016]

Reporter

Updated

•

12 years ago

status-firefox28: --- → affected

Keywords: regression, regressionwindow-wanted

Whiteboard: dogfood1.2

KTucker [:KTucker][Inactive 3/4/2016]

Reporter

Comment 1

•

12 years ago

This issue does not occur on Leo v 1.1.0 COM RIL Environmental Variables Device: Leo v 1.1.0 COM RIL Build ID: 20131024041202 Gecko: http://hg.mozilla.org/releases/mozilla-b2g18/rev/e4cb5a852e3d Gaia: 39b0203fa9809052c8c4d4332fef03bbaf0426fc Platform Version: 18.1 RIL Version: 01.01.00.019.264 The browser crash does not occur when zooming in on the page.

KTucker [:KTucker][Inactive 3/4/2016]

Reporter

Comment 2

•

12 years ago

Attached video BrowserCrash.mp4 — Details

KTucker [:KTucker][Inactive 3/4/2016]

Reporter

Comment 3

•

12 years ago

This issue occurs on the Buri v 1.2.0 COM RIL Environmental Variables Device: Buri v 1.2.0 COM RIL Build ID: 20131204004003 Gecko: http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/758f3fb32dda Gaia: 8d762f3376318fd6be390432db750ae4904c9ab6 Platform Version: 26.0 RIL Version: 01.02.00.019.102 The browser or device will crash when zooming in on the MoPad webpage.

Jason Smith [:jsmith]

Comment 4

•

12 years ago

Uh oh. That's an OS restart crash, which is really bad. Can we get a crash report ID for this bug?

blocking-b2g: --- → koi?

Keywords: regressionwindow-wanted → crash, qawanted, reproducible

Whiteboard: dogfood1.2 → dogfood1.2 [b2g-crash] [osrestartcrash]

Jason Smith [:jsmith]

Updated

•

12 years ago

Severity: normal → critical

Component: Gaia::Browser → General

gbennett

Updated

•

12 years ago

QA Contact: gbennett

gbennett

Comment 5

•

12 years ago

Attached file 947523_dmesg — Details

During the few hours of reproing this issue I was only able to get the OS crash three times and there was no dump in the Crash Reports folder on the phone. I talked to Kevin and he said that it could be an OOM issue so I snagged a dmesg. Hopefully that should help you guys out. I was able to get the OS crash on today's 1.2 and 1.3, but the attached dmesg is from the build below. Environmental Variables: Device: Buri 1.3 mozRIL BuildID: 20131209053402 Gaia: 1d45d1dc3201059d5c8f2efdeb92c04576d8e161 Gecko: 9f12a9fab080 Version: 28.0a1 Firmware Version: 20131115

gbennett

Updated

•

12 years ago

Keywords: qawanted

Jason Smith [:jsmith]

Updated

•

12 years ago

Keywords: crash

Whiteboard: dogfood1.2 [b2g-crash] [osrestartcrash] → dogfood1.2 [osrestartoom]

Jason Smith [:jsmith]

Comment 6

•

12 years ago

Does this reproduce on a 1.1 Buri device?

Keywords: qawanted

gbennett

Comment 7

•

12 years ago

This does not repro on a 1.1 Buri. The same functionality is seen in Leo 1.1 as it simply doesn't let you zoom in far enough and keeps rubber-banding back to to a set zoom if you try to keep pinching to zoom in. Environmental Variables: Device: Buri 1.1 mozRIL BuildID: 20131209041202 Gaia: 6ff3a607f873320d00cb036fa76117f6fadd010f Gecko: 05117f42088f Version: 18.0 Firmware Version: 20131115

Keywords: qawanted

Kai-Chih Hu [:khu]

Comment 8

•

12 years ago

Just wondering, should we also add "crash" in the keyword field?

Jason Smith [:jsmith]

Comment 9

•

12 years ago

(In reply to Kevin Hu [:khu] from comment #8) > Just wondering, should we also add "crash" in the keyword field? Nope - this is an OOM of the parent process, not a crash.

Keywords: perf, regressionwindow-wanted

Whiteboard: dogfood1.2 [osrestartoom] → dogfood1.2 [osrestartoom] [MemShrink]

Nicholas Nethercote [inactive]

Comment 10

•

12 years ago

<6>[ 434.353606] [ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name <6>[ 434.353648] [ 98] 0 98 78 41 0 -16 -941 ueventd <6>[ 434.353668] [ 128] 1000 128 213 15 0 -16 -941 servicemanager <6>[ 434.353686] [ 129] 0 129 1009 45 0 -16 -941 vold <6>[ 434.353703] [ 132] 0 132 834 29 0 -16 -941 fakeperm <6>[ 434.353719] [ 136] 0 136 76681 39295 0 0 0 b2g <6>[ 434.353738] [ 137] 0 137 1958 67 0 -16 -941 netd <6>[ 434.353754] [ 138] 0 138 180 14 0 -16 -941 debuggerd <6>[ 434.353773] [ 139] 1001 139 5093 370 0 -16 -941 rild <6>[ 434.353789] [ 140] 1013 140 4665 340 0 -16 -941 mediaserver <6>[ 434.353808] [ 141] 1002 141 338 27 0 -16 -941 dbus-daemon <6>[ 434.353826] [ 142] 1017 142 437 35 0 -16 -941 keystore <6>[ 434.353843] [ 152] 1000 152 481 23 0 -16 -941 mm-pp-daemon <6>[ 434.353861] [ 155] 1000 155 763 76 0 -16 -941 mm-qcamera-daem <6>[ 434.353879] [ 160] 0 160 297 27 0 -16 -941 gpu_dcvsd <6>[ 434.353896] [ 161] 0 161 863 2 0 -16 -941 adbd <6>[ 434.353914] [ 187] 0 187 218 40 0 -16 -941 location-mq <6>[ 434.353933] [ 188] 1001 188 1508 58 0 -16 -941 qmuxd <6>[ 434.353949] [ 196] 1001 196 1549 72 0 -16 -941 netmgrd <6>[ 434.353968] [ 197] 0 197 3651 73 0 -16 -941 xtwifi-inet-age <6>[ 434.353984] [ 205] 2000 205 198 16 0 -16 -941 sh <6>[ 434.354003] [ 208] 0 208 3046 83 0 -16 -941 xtwifi-client <6>[ 434.354021] [ 408] 1007 408 176 14 0 -16 -941 logwrapper <6>[ 434.354039] [ 409] 1010 409 662 78 0 -16 -941 wpa_supplicant <6>[ 434.354058] [ 421] 1000 421 337 27 0 -16 -941 qosmgr <3>[ 434.354074] Out of memory: Kill process 136 (b2g) score 827 or sacrifice child <3>[ 434.354268] Killed process 136 (b2g) total-vm:306724kB, anon-rss:52324kB, file-rss:104856kB Near the top it says total_vm is 76681 KB, but the "Killed process" line says 306724 KB...

Nicholas Nethercote [inactive]

Updated

•

12 years ago

Whiteboard: dogfood1.2 [osrestartoom] [MemShrink] → dogfood1.2 [osrestartoom] [MemShrink:P2]

Matthew Vaughan

Comment 11

•

12 years ago

This issue looks to have started reproducing on the 11/08 1.2 Buri build. On builds before 11/08, the tab would need to be reloaded every so often, but B2G would remain stable. - Works - Environmental Variables: Device: Buri v1.2 MOZ RIL BuildID: 20131107004003 Gaia: 590eb598aacf1e2136b2b6aca5c3124557a365ca Gecko: 26f1e160e696 Version: 26.0 RIL Version: 01.02.00.019.102 Firmware Version: 20131115 - Broken - Environmental Variables: Device: Buri v1.2 MOZ RIL BuildID: 20131108004004 Gaia: 4cf40fb30c7b8380ea83ed0d4efe043b8b81737f Gecko: a886c641a306 Version: 26.0 Firmware Version: 20131115

Keywords: regressionwindow-wanted

Jason Smith [:jsmith]

Comment 12

•

12 years ago

http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/pushloghtml?fromchange=26f1e160e696&tochange=a886c641a306

Jason Smith [:jsmith]

Comment 13

•

12 years ago

The regression here is likely on the gecko side, as this involves killing the parent process. A suspected patch for causing this regression is bug 927633 because is involving IPC. Can we get a regression range on 1.3 as well? That will help isolate with bug caused this. Mason - Any ideas if bug 927633 caused this?

Flags: needinfo?(mchang)

Keywords: regressionwindow-wanted

Jason Smith [:jsmith]

Comment 14

•

12 years ago

blocking+ - we should never cause the full phone to restart. Also marking security sensitive because a web page should be able to cause the full phone to restart, which could be exploited by a web developer to force phone restarts.

Group: core-security

blocking-b2g: koi? → koi+

Andrew McCreight [:mccr8]

Updated

•

12 years ago

Keywords: csectype-dos, sec-low

Matthew Vaughan

Comment 15

•

12 years ago

Something similar to this issue looks to have started on the first working 1.3 build from 9/19. The OS will crash and display the crash report page on each build up to 9/23. From 9/24 and on, there is no crash report screen displayed and appears to instead be an OOM issue... exactly as specified in comment 5 and comment 9. There are builds between 9/24 and 12/11 where it is really difficult to get a repro and instead the Browser tab will need to be reloaded since the webpage becomes unresponsive many times. I am making the window be between 9/23 and 9/24 since that is when it looks to have become and OOM issue which is what this bug is about. If the crash is considered relevant to this bug, then it occurred on the first 1.3 build from 9/19. - "Works" (OS crashes) - Environmental Variables: Device: Buri v1.3 MOZ RIL BuildID: 20130923040214 Gaia: 3408cc3f6b190c8cd31832fbb8cd2ae571041f29 Gecko: f97307cb4c95 Version: 27.0a1 Firmware Version: 20131115 - Broken (OS restarts with an OOM) - Environmental Variables: Device: Buri v1.3 MOZ RIL BuildID: 20130924040201 Gaia: a22ba4a3a9efd99f94adf9ece8a2b391d4df295b Gecko: 1fda74e33e06 Version: 27.0a1 Firmware Version: 20131115

Keywords: regressionwindow-wanted

Mason Chang [Inactive] [:mchang]

Comment 16

•

12 years ago

I just tried the broken gaia and gecko revisions from comment 15. I was able to reproduce the crash with and without bug 927633. The patch also reduced memory usage by 1.5 mb per process, so if we're getting an OOM, it should just happen sooner without bug 927633.

Flags: needinfo?(mchang)

Jason Smith [:jsmith]

Comment 17

•

12 years ago

Andrew - Can you find someone to look into this full phone OOM?

Flags: needinfo?(overholt)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 18

•

12 years ago

Denial of service is not a security issue. Especially when user interaction is required to trigger the bug.

Group: core-security

Andrew Overholt [:overholt]

Comment 19

•

12 years ago

Kyle suggests testing this with skiagl off to narrow out that as a cause => qawanted. Dale, with Ben out can you tell us how the gaia browser zooming functionality is implemented? Do you think it could be triggering a gecko crash with graphics changes around the beginning of September?

Flags: needinfo?(overholt) → needinfo?(dale)

Keywords: qawanted

Jason Smith [:jsmith]

Comment 20

•

12 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #18) > Denial of service is not a security issue. Especially when user interaction > is required to trigger the bug. I disagree with this. DOSs are definitely security issues if it's outside of the user's control. I'm not sure if this particular bug is outside of the user's control however, as it's dependent whether the root cause can happen in other user scenarios that are triggered by visiting a malicious website/app aiming to reboot the phone. (In reply to Andrew Overholt [:overholt] from comment #19) > Kyle suggests testing this with skiagl off to narrow out that as a cause => > qawanted. I'm doubtful this is related to Skia GL. This is OOMing the parent process, which shouldn't technically ever be possible on the phone with anything driven by untrusted web content. This is likely a process management regression, which is why this why filed in IPC.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 21

•

12 years ago

(In reply to Jason Smith [:jsmith] from comment #20) > (In reply to Andrew Overholt [:overholt] from comment #19) > > Kyle suggests testing this with skiagl off to narrow out that as a cause => > > qawanted. > > I'm doubtful this is related to Skia GL. This is OOMing the parent process, > which shouldn't technically ever be possible on the phone with anything > driven by untrusted web content. This is likely a process management > regression, which is why this why filed in IPC. If you do want to rule out the extra memory usage due to GL acceleration, just set: pref("gfx.canvas.azure.accelerated", false);

Dale Harvey (:daleharvey)

Comment 22

•

12 years ago

The browser doesnt do any zooming, its purely implemented inside Gecko/APZC, the graphics changes are very likely to effect it

Flags: needinfo?(dale)

Preeti Raghunath(:Preeti)

Comment 23

•

12 years ago

Milan Please have this assigned. We would like to have a patch for the same by 12/20.

Flags: needinfo?(milan)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 24

•

12 years ago

Why does this block? Do we expect that users will commonly try to zoom in like this?

Jason Smith [:jsmith]

Comment 25

•

12 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #24) > Why does this block? Do we expect that users will commonly try to zoom in > like this? I think it's not the likelihood that's caused to be marked as a blocker - instead, it was marked as a blocker because it should never be possible to cause a full reboot of the phone. During 1.01, a lot our partners were pretty adamant on having a no known full phone crashes on release policy, as that ensures a user never sees an unexpected reboot of the phone to think that something is fundamentally broken with the phone.

Jason Smith [:jsmith]

Comment 26

•

12 years ago

(In reply to Jason Smith [:jsmith] from comment #25) > (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #24) > > Why does this block? Do we expect that users will commonly try to zoom in > > like this? > > I think it's not the likelihood that's caused to be marked as a blocker - > instead, it was marked as a blocker because it should never be possible to > cause a full reboot of the phone. During 1.01, a lot our partners were > pretty adamant on having a no known full phone crashes on release policy, as > that ensures a user never sees an unexpected reboot of the phone to think > that something is fundamentally broken with the phone. Another way to explain this - the blocker here is the full phone reboot, not the fact that we're OOMing in the STR in general. If we fixed this bug by causing the OOM to happen in the child process, rather than the parent process, then that would fix the blocking factor on this bug.

Sarah Parsons

Comment 27

•

12 years ago

I tried to reproduce this issue after setting pref("gfx.canvas.azure.accelerated", false); and I could not reproduce this issue on the latest Buri 1.3 Build ID: 20131217004001 I also checked without setting the preference on the latest Buri 1.3 and could easily reproduce the bug. Gaia dca0a3dcf062ce3e422a9c56d141c14543c816fb SourceStamp 1f7db4cc788e BuildID 20131217004001 Version 28.0a2

Keywords: qawanted

Jason Smith [:jsmith]

Comment 28

•

12 years ago

(In reply to Sarah Parsons from comment #27) > I tried to reproduce this issue after setting > pref("gfx.canvas.azure.accelerated", false); and I could not reproduce this > issue on the latest Buri 1.3 Build ID: 20131217004001 > > I also checked without setting the preference on the latest Buri 1.3 and > could easily reproduce the bug. > > Gaia dca0a3dcf062ce3e422a9c56d141c14543c816fb > SourceStamp 1f7db4cc788e > BuildID 20131217004001 > Version 28.0a2 Guess I wasn't right - the OOM is being caused by Skia GL. Still puzzled why the OOM is hitting the parent process rather than the child process.

Jason Smith [:jsmith]

Updated

•

12 years ago

Component: General → Graphics

Product: Firefox OS → Core

Jason Smith [:jsmith]

Comment 29

•

12 years ago

There's only two gfx bugs in the regression range here: * bug 844819 * bug 934886 Which one caused this regression? Maybe bug 934886? Peter - What do you think?

Flags: needinfo?(pchang)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 30

•

12 years ago

(In reply to Jason Smith [:jsmith] from comment #28) > Guess I wasn't right - the OOM is being caused by Skia GL. Still puzzled why > the OOM is hitting the parent process rather than the child process. Perhaps because the browser app runs in the parent process?

Jason Smith [:jsmith]

Comment 31

•

12 years ago

(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #30) > (In reply to Jason Smith [:jsmith] from comment #28) > > Guess I wasn't right - the OOM is being caused by Skia GL. Still puzzled why > > the OOM is hitting the parent process rather than the child process. > > Perhaps because the browser app runs in the parent process? Well right, but I thought each browser tab is supposed to run in a separate child process. So that means if the child process runs out of memory, then the tab should get killed with the sad face error page.

Peter Chang[:pchang]

Comment 32

•

12 years ago

(In reply to Jason Smith [:jsmith] from comment #29) > There's only two gfx bugs in the regression range here: > > * bug 844819 > * bug 934886 > > Which one caused this regression? Maybe bug 934886? > > Peter - What do you think? bug 934886 is trying to fix graphic artifact when taking screenshot. I didn't see the related function called (GetThebesSurfaceForDrawTarget) when I zoom in/out the browser pages.

Flags: needinfo?(pchang)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 33

•

12 years ago

If this is "just" OOM, then anything that causes us to use more memory would cause the regression. The memory used is no different than that used by non-graphics code.

Flags: needinfo?(milan)

Dave Huseby [:huseby]

Updated

•

12 years ago

Whiteboard: dogfood1.2 [osrestartoom] [MemShrink:P2] → dogfood1.2 [osrestartoom] [MemShrink:P2][c=memory p= s= u=1.2]

Ben Kelly [:bkelly, not reviewing]

Comment 34

•

12 years ago

I will investigate why the parent process is getting killed and if there is anything we can do to address it.

Assignee: nobody → bkelly

Status: NEW → ASSIGNED

Priority: -- → P1

Whiteboard: dogfood1.2 [osrestartoom] [MemShrink:P2][c=memory p= s= u=1.2] → dogfood1.2 [osrestartoom] [MemShrink:P2][u=memory p=5 s= u=1.2]

Ben Kelly [:bkelly, not reviewing]

Comment 35

•

12 years ago

I've reproduced the browser crash. I zoomed in a fair amount and ran b2g-info: | megabytes | NAME PID NICE USS PSS RSS VSIZE OOM_ADJ USER b2g 1531 0 49.5 63.2 77.0 208.1 0 root Browser 1975 1 23.2 36.9 50.6 104.1 2 app_1975 System memory info: Total 179.7 MB Used - cache 94.6 MB B2G procs (PSS) 100.1 MB Non-B2G procs -5.4 MB Free + cache 85.1 MB Free 52.4 MB Cache 32.6 MB I then zoomed just a little bit more and the browser tab was killed. I got the "oops, this is embarrassing" dialog. The logcat indicated that "memory pressure" was over, so that suggests it was an OOM. I find it interesting that just a tiny bit of zooming could cause us to consume over 50MB of free memory, though. I will continue investigating tomorrow.

Ben Kelly [:bkelly, not reviewing]

Comment 36

•

12 years ago

Adding instrumentation I can see that zooming results in extremely large gralloc buffers getting created. For example, with a zoom of 3.75 in APZC I see a gralloc buffer request for 2238x13249. This asks the system for ~110MB of mlocked memory which ends up killing b2g altogether. Sometimes we get moderately smaller requests for gralloc buffers around ~30MB which only end up killing the browser tab. I'm trying to trace where this size is getting calculated. Are we zooming the entire layer and trying to alloc a buffer for the entire thing? Even if that is the case it seems the zoom is being over-accumulated over time or something.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 37

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #36) > Adding instrumentation I can see that zooming results in extremely large > gralloc buffers getting created. For example, with a zoom of 3.75 in APZC I > see a gralloc buffer request for 2238x13249. This asks the system for > ~110MB of mlocked memory... Let's get the explanation - going a bit trigger happy on the ni...

Flags: needinfo?(sotaro.ikeda.g)

Flags: needinfo?(jmuizelaar)

Flags: needinfo?(bugmail.mozilla)

Flags: needinfo?(bgirard)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 38

•

12 years ago

The gralloc buffer request is probably coming from layout/gfx code as it tries to rasterize the requested area. The "requested area" is going to be a function of the resolution and the displayport. In 1.2 the code that sets these values are in TabChild.cpp, at [1], [2] and [3]. We should verify that as the resolution increases the displayport size (in CSS pixels) decreases. [1] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=dom/ipc/TabChild.cpp;h=8325a51b1091cc8c6a162f139e3d0d860b4ca969;hb=cd74e4ad7d9dd3d699b13f7262ee898e4cf3f628#l1639 [2] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=dom/ipc/TabChild.cpp;h=8325a51b1091cc8c6a162f139e3d0d860b4ca969;hb=cd74e4ad7d9dd3d699b13f7262ee898e4cf3f628#l1682 [3] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=dom/ipc/TabChild.cpp;h=8325a51b1091cc8c6a162f139e3d0d860b4ca969;hb=cd74e4ad7d9dd3d699b13f7262ee898e4cf3f628#l1648

Flags: needinfo?(bugmail.mozilla)

Ben Kelly [:bkelly, not reviewing]

Comment 39

•

12 years ago

It appears that the display port being set from ProcessUpdateFrame() shrinks with the scale, but not when its set from ProcessUpdateSubFrame(). Here is my instrumentation while zooming the etherpad: I/Gecko (12250): ### ### TabChild::ProcessUpdateSubFrame() calling SetDisplayPortForElement() width:598.000000 height:2525.000000 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetResolution() scale:0.339769 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetDisplayPortForElement() width:980.000000 height:1117.816650 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetResolution() scale:0.466176 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetDisplayPortForElement() width:980.000000 height:1117.816650 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetResolution() scale:0.466176 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetDisplayPortForElement() width:980.000000 height:1117.816650 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetResolution() scale:1.090141 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetDisplayPortForElement() width:439.500000 height:835.000000 I/Gecko (12250): ### ### TabChild::ProcessUpdateSubFrame() calling SetDisplayPortForElement() width:598.000000 height:2525.000000 I/Gecko (12250): ### ### TabChild::ProcessUpdateSubFrame() calling SetDisplayPortForElement() width:598.000000 height:2525.000000 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetResolution() scale:2.474219 I/Gecko (12250): ### ### TabChild::ProcessUpdateFrame() calling SetDisplayPortForElement() width:193.500000 height:367.500000 I also started to instrument the ThebesLayerData::Accumulate method and got this in my last run: I/Gecko (12250): ### ### ThebesLayerData::Accumulate() OR mVisibleRegion:0/0 aVisibleRect:0/0 I/Gecko (12250): ### ### ThebesLayerData::Accumulate() OR mVisibleRegion:0/0 aVisibleRect:0/0 I/Gecko (12250): ### ### ThebesLayerData::Accumulate() OR mVisibleRegion:0/0 aVisibleRect:1408/612 8 I/Gecko (12250): ### ### ThebesLayerData::Accumulate() OR mVisibleRegion:1408/6128 aVisibleRect:13 88/6126 I/Gecko (12250): ### ### ThebesLayerData::Accumulate() OR mVisibleRegion:1408/6128 aVisibleRect:69 /6128 I/Gecko (12250): ### ### ContainerState::PopThebesLayer() calling SetVisibleRegion() width:1477 he ight:6128 I still can't map the height of 6128 in this case back to the display port values.

Ben Kelly [:bkelly, not reviewing]

Comment 40

•

12 years ago

Oh... the display port of 598x2525 in ProcessUpdateSubFrame() does map directly to the visible region if you multiply it by the scale: 598x2525 * 2.474219 = 1480x6247

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 41

•

12 years ago

Well that's not good. Seems like a sub-apzc bug.

Flags: needinfo?(sotaro.ikeda.g)

Flags: needinfo?(jmuizelaar)

Flags: needinfo?(bgirard)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 42

•

12 years ago

So part of the problem here is that the composition bounds on the child APZC is getting scaled up as we zoom. This doesn't happen for the parent because the parent's composition bounds is clamped to the screen bounds [1]. (We have plans to remove that behaviour but it doesn't affect this bug on 1.2). The next contributing factor is that the displayport calculation at [2] doesn't account for this increase in composition bounds. For the parent APZC the composition bounds don't change, so the division by the zoom at line [2] has the desired effect of shrinking the displayport as the zoom increases. For the child APZC the division merely cancels out the increase in composition bounds, and so we end up with a constant-sized displayport (in CSS pixels) regardless of zoom. So the straightforward fix here would be to just divide again by the zoom. However, we can't do that because in general the child APZC doesn't request content repaints (and hence doesn't recalculate its displayport) while the parent is getting zoomed. So if the child last set a displayport of 500x1000 CSS pixels, and then the parent zooms to 5x what it was before, we would want the child to shrink its displayport down to 100x200 CSS pixels but it never gets a chance. The parent sets the resolution without the child knowing about it, and at that point layout tries to render the child's 500x1000 displayport at a 5x resolution which causes the OOM. For a fix to work we would either need the parent APZC to notify all other APZCs in its subtree that it's about to change the zoom (and then have the children recalculate their displayport) or, in TabChild::UpdateRootFrame, find all the displayports inside a presShell and shrink them by the right factor. I'll try to see if either of these approaches are viable. [1] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=layout/base/nsDisplayList.cpp;h=7e11b99bedc4b3e80b216e4ba05a8c73f40d6a2b;hb=cd74e4ad7d9dd3d699b13f7262ee898e4cf3f628#l748 [2] http://git.mozilla.org/?p=releases/gecko.git;a=blob;f=gfx/layers/ipc/AsyncPanZoomController.cpp;h=a6878295d0c476f71855003f73b403930e339e7f;hb=cd74e4ad7d9dd3d699b13f7262ee898e4cf3f628#l911

Ben Kelly [:bkelly, not reviewing]

Comment 43

•

12 years ago

Throwing this over to :kats as he is working on a patch to propagate the zoom info to child APZCs as described in comment 42. Please let me know if you would like me to test anything. Thanks!

Assignee: bkelly → bugmail.mozilla

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 44

•

12 years ago

So I tried propagating the zoom info to child APZCs as I described above but ran into a threading problem. Specifically, I need to propagate the new zoom to child APZCs in ScheduleContentRepaint *before* posting the message to the GeckoContentController. However propagating the zoom to child APZCs requires holding the tree lock, and that results in a deadlock scenario because we're already holding the APZC lock there. In addition this solution felt pretty hacky and brittle so I'm working on a different solution now which modifies how the displayport is stored. In nsDOMWindowUtils::SetDisplayPortForElement I'm going to multiply in the cumulative resolution and in nsLayoutUtils::GetDisplayPort I'm going to divide out the cumulative resolution. This way when the resolution of intermediate presshells change, the retrieved displayport is automatically scaled accordingly. I *think* this should work because the displayport is stored relative to the scroll offset and so scaling it won't move it around improperly, but I'm building the patch now to test it and see how it fares.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 45

•

12 years ago

Attached patch Change how the displayport is stored (obsolete) — Details — Splinter Review

Since I don't have a working 1.2 build could you guys apply this patch and see if it fixes the problem without introducing any new regressions? Possible regressions would include the painted content (the displayport) not being positioned correctly, specially when zoomed in or out. I did some light testing of this patch on master and I couldn't repro the OOM although there does still appear to be a fair amount of memory pressure and I think the HWC is dropping layers or something.

Attachment #8355612 - Flags: feedback?(botond)

Attachment #8355612 - Flags: feedback?(bkelly)

Botond Ballo [:botond]

Comment 46

•

12 years ago

Attached patch Change how the displayport is stored (obsolete) — Details — Splinter Review

While attempting to build the 1.2 branch with the attached patch applied, I encountered the following compiler errors: /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp: In function 'void DestroyGfxRect(void*, nsIAtom*, void*, void*)': /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: reference to 'gfx' is ambiguous /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/ipc/chromium/src/chrome/common/transport_dib.h:20: error: candidates are: namespace gfx { } ../../dist/include/mozilla/gfx/Types.h:15: error: namespace mozilla::gfx { } /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: 'rect' was not declared in this scope /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: reference to 'gfx' is ambiguous /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/ipc/chromium/src/chrome/common/transport_dib.h:20: error: candidates are: namespace gfx { } ../../dist/include/mozilla/gfx/Types.h:15: error: namespace mozilla::gfx { } /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: expected type-specifier before 'gfx' /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: expected '>' before 'gfx' /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: expected '(' before 'gfx' /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: expected primary-expression before '>' token /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:308: error: expected ')' before ';' token /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:309: error: type '<type error>' argument given to 'delete', expected pointer /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp: In member function 'virtual nsresult nsDOMWindowUtils::SetDisplayPortForElement(float, float, float, float, nsIDOMElement*)': /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:391: error: reference to 'gfx' is ambiguous /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/ipc/chromium/src/chrome/common/transport_dib.h:20: error: candidates are: namespace gfx { } ../../dist/include/mozilla/gfx/Types.h:15: error: namespace mozilla::gfx { } /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:391: error: expected type-specifier before 'gfx' /home/botond/dev/mozilla/b2g-hamachi-1.2/gecko/dom/base/nsDOMWindowUtils.cpp:391: error: expected ')' before 'gfx' Attached is an updated patch that fixes these errors.

Attachment #8355612 - Attachment is obsolete: true

Attachment #8355612 - Flags: feedback?(botond)

Attachment #8355612 - Flags: feedback?(bkelly)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 47

•

12 years ago

Comment on attachment 8356186 [details] [diff] [review] Change how the displayport is stored Review of attachment 8356186 [details] [diff] [review]: ----------------------------------------------------------------- :tn, what do you think of this change? In a nutshell it changes the displayport stored in layout to be in CSS pixels multiplied by the resolution, rather than just CSS pixels. That way when we increase the resolution on an ancestor layer, the descendants texture sizes automatically shrink and we don't OOM.

Attachment #8356186 - Flags: feedback?(tnikkel)

Timothy Nikkel (:tnikkel)

Comment 48

•

12 years ago

Comment on attachment 8356186 [details] [diff] [review] Change how the displayport is stored My first thought is that we will now have to update every display port under a presshell (and it's child documents) anytime it's resolution changes. Is this feasible/desirable?

Botond Ballo [:botond]

Comment 49

•

12 years ago

Comment on attachment 8356186 [details] [diff] [review] Change how the displayport is stored I tested this on my 1.2 hamachi device, and I cannot reproduce the crash with it. I also didn't notice any regressions.

Attachment #8356186 - Flags: feedback+

Ben Kelly [:bkelly, not reviewing]

Comment 50

•

12 years ago

With this patch I still see Browser crash while zooming the gaia meeting etherpad: https://etherpad.mozilla.org/gaia-meeting-notes I would like to test a bit more with instrumentation to see if its the same OOM issue, but probably won't be able to until tomorrow morning.

Botond Ballo [:botond]

Comment 51

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #50) > With this patch I still see Browser crash while zooming the gaia meeting > etherpad: > > https://etherpad.mozilla.org/gaia-meeting-notes On this page, I see the crash as well with the patch (I saw a phone crash, not a browser crash), though it seemed it required zooming in further to get the crash with the patch than without.

Timothy Nikkel (:tnikkel)

Comment 52

•

12 years ago

Comment on attachment 8356186 [details] [diff] [review] Change how the displayport is stored Talked to kats on irc about this. If we don't change AZPC code too then it will just end up setting the same display port after zooming again, so it won't fix the main problem. We need to at least pair this with an AZPC change. So clearing feedback flag for now.

Attachment #8356186 - Flags: feedback?(tnikkel)

Ben Kelly [:bkelly, not reviewing]

Comment 53

•

12 years ago

On the buri you can monitor the texture memory usage by looking at the Mlocked field in /proc/meminfo: while true; do sleep 1; adb shell cat /proc/meminfo | grep Mlocked; done This shows how much memory the device has mlocked as ashmem.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 54

•

12 years ago

Attached patch Change how the displayport is stored (obsolete) — Details — Splinter Review

Updated patch to shrink the displayport for subframes in APZC code.

Attachment #8356186 - Attachment is obsolete: true

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 55

•

12 years ago

I've been discussing this issue with :tn and :bkelly and thinking about it, and we might need to take a step back here with the approach taken. The fundamental problem is that when async subframe scrolling is enabled, we need more memory to render content. This is expected behaviour, but it was always unclear just how much more memory we would need, how it would be bounded, and whether we would often run into OOM situations. For the most part we went with a "let's do it and see" approach. Turns out that with our current implementation (in 1.2+) the memory usage is unbounded, and can easily run into OOM situations. A simple conceptual example is a page that looks something like this: <body> <iframe style="height: 10000px" src="reallylongpage.html"></iframe> </body> The iframe is 10,000 pixels tall, and we end up requesting a giant displayport (some multiple of 10000px) for this, because we could scroll either the outer page or the iframe, and need to have enough pre-rendered content to paint all of that. This, of course, will trigger an OOM. Even if the patches I uploaded to this bug worked perfectly we would still probably OOM on this page. Going forward I think we have to use the visible region of the iframe (i.e. clipped by parent frames and the screen) rather than the full composition bounds as the base for the displayport. However I think that doing this is too risky for 1.2. For 1.2 I see four main options: 1) Disable subframe apzc. This eliminates the memory requirement and also reduces responsiveness. I suspect this is off the table because of commitments we made to partners. 2) Apply some tweaks to make the OOMs less frequent. This is what I've been trying to do with the above patches, but it's hard to reduce the OOM threshold enough to stop OOMing on this etherpad page because the iframe is huge. It also doesn't guarantee OOMs will not happen. 3) Try some other method to clamp the memory usage, such as putting a hard limit on texture sizes. (I don't know if this can be easily implemented, and it would almost certainly result in scenarios where parts of subframes remain blank.) Another option is to just kill the entire tab if the requested displayport is too big. That limits the crash to a browser tab rather than a phone reboot. 4) Do nothing for 1.2; leave this bug unfixed and ship subframe APZC as-is. Focus efforts on getting a proper fix in for 1.3+ Based on comment 26, I'm thinking of abandoning my current attempt at option #2 and going with option #3 instead. If anybody has other thoughts please let me know.

Jason Smith [:jsmith]

Comment 56

•

12 years ago

[3] sounds okay - the blocking factor is the fact that we're OOMing the phone, rather than the browser tab. If we OOM the browser tab here instead, then that's okay enough to remove the blocking factor on the bug.

Ben Kelly [:bkelly, not reviewing]

Comment 57

•

12 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #55) > 3) Try some other method to clamp the memory usage, such as putting a hard > limit on texture sizes. (I don't know if this can be easily implemented, and > it would almost certainly result in scenarios where parts of subframes > remain blank.) Another option is to just kill the entire tab if the > requested displayport is too big. That limits the crash to a browser tab > rather than a phone reboot. I think clamping would be preferable to predictably killing the browser tab. While parts of the screen may not be visible, at least they can zoom back out to a more reasonable level without losing state. The user doesn't get any feedback or options to proceed if the tab dies. Would it be possible to make the existing scale threshold dynamic based on the iframe size? http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/AsyncPanZoomController.cpp#189 So for very large iframes, perhaps zoom is capped at 2.0 or even 1.0?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 58

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #57) > I think clamping would be preferable to predictably killing the browser tab. > While parts of the screen may not be visible, at least they can zoom back > out to a more reasonable level without losing state. The user doesn't get > any feedback or options to proceed if the tab dies. I was discussing this a bit with bjacob and we came to the conclusion it would be better to kill the tab because it's a much simpler codepath and we lose a lot of information by the time we get to the code that allocates the texture. > > Would it be possible to make the existing scale threshold dynamic based on > the iframe size? > > > http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/ > AsyncPanZoomController.cpp#189 > > So for very large iframes, perhaps zoom is capped at 2.0 or even 1.0? The zoom is always done on the root frame of the page, and it doesn't always know how large the iframes are. So we'd have to plumb all of that in to go with this approach, and even a 2.0 zoom might trigger an OOM if the iframe is large enough. So it still wouldn't completely guard against this case. I have a set of patches now which crash the tab if the texture size would be too large. I will upload them shortly; there is a rollup at http://people.mozilla.org/~kgupta/tmp/rollup.patch to test with.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 59

•

12 years ago

Attached patch 1/4 - Add a GetCumulativeResolution function (obsolete) — Details — Splinter Review

Attachment #8356789 - Flags: review?(tnikkel)

Ben Kelly [:bkelly, not reviewing]

Comment 60

•

12 years ago

I'll test this new patch. Just to play devil's advocate, maybe we should consider disabling sub-apzc. I am seeing a lot of jumpiness and general instability on this page in v1.2 compared to v1.1: v1.1 on helix: https://www.dropbox.com/s/hwhgfz4t8nne00m/20140107_apzc_etherpad_v1_1.mp4 v1.2 on buri: https://www.dropbox.com/s/g2y35jd5llpnh7o/20140107_apzc_etherpad_v1_2.mp4 It just seems there are other regressions in usability here beyond the OOM.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 61

•

12 years ago

Attached patch 2/4 - Change the displayport storage units (obsolete) — Details — Splinter Review

We still need this patch in order to not blow up in the "transient" state, where the parent has increased its resolution but the child has not yet requested a new displayport. If we paint in this state then we will apply the old (small) displayport with the new (high) resolution which could OOM. So this patch prevents that.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

12 years ago

Attachment #8356790 - Flags: review?(tnikkel)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 62

•

12 years ago

Attached patch 3/4 - Shrink the displayport multiplier at high zooms (obsolete) — Details — Splinter Review

We could probably do without this patch but it should help reduce the occurrence of the tab crashes. This contracts the subframe displayport multipliers if those subframes are getting rendered at a resolution > 1.

Attachment #8356791 - Flags: review?(botond)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 63

•

12 years ago

Attached patch 4/4 - Crash the tab if the texture size is going to exceed the max GL texture size (obsolete) — Details — Splinter Review

If the texture size exceeds 4k then we can't render it anyway so that seems like a sensible limit to hard-code to, and crash if we exceed. We actually deal ok with a page with many small iframes; it's only when the iframe gets big on one or both axes that the total texture area blows up a lot.

Attachment #8356593 - Attachment is obsolete: true

Attachment #8356793 - Flags: review?(botond)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 64

•

12 years ago

Attached patch Rollup parts 1-4 (obsolete) — Details — Splinter Review

Please let me know if you still see the OOM with this. I didn't see it on my master build for Gaia; I will try to borrow a hamachi and test there as well.

Attachment #8356796 - Flags: feedback?(bkelly)

Timothy Nikkel (:tnikkel)

Comment 65

•

12 years ago

Comment on attachment 8356789 [details] [diff] [review] 1/4 - Add a GetCumulativeResolution function IID bump on presshell please.

Attachment #8356789 - Flags: review?(tnikkel) → review+

Timothy Nikkel (:tnikkel)

Comment 66

•

12 years ago

Comment on attachment 8356790 [details] [diff] [review] 2/4 - Change the displayport storage units What's the plan for this longer term? Make this be stored in layer pixels? Only land this on 1.2? We should have a comment explaining why we do this.

Botond Ballo [:botond]

Comment 67

•

12 years ago

Comment on attachment 8356791 [details] [diff] [review] 3/4 - Shrink the displayport multiplier at high zooms Review of attachment 8356791 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/AsyncPanZoomController.cpp @@ +898,5 @@ > +ScaleDisplayPortMultiplier(float aMultiplier, const FrameMetrics& aMetrics) > +{ > + // This will only have an effect on subframes where the parent frame > + // is zoomed in to > 1.0 (for the root frame GetParentResolution() will > + // return 1.0). The effect is to scale down the given displayport multiplier Is this patch meant for trunk too? On trunk it's not necessarily the case that the root frame's GetParentResolution() returns 1.0, at least it won't be after bug 951936 and bug 935219 land, because then the resolution will be on the non-scrollable root layer which will be the parent of the rootmost scrollable layer. Perhaps we can condition on IsRootForLayersId() instead (or mIsRoot, although see bug 957221)?

Timothy Nikkel (:tnikkel)

Comment 68

•

12 years ago

Comment on attachment 8356793 [details] [diff] [review] 4/4 - Crash the tab if the texture size is going to exceed the max GL texture size This feels fairly specific to b2g; some environments (metro when it goes with remote tabs) might not be bound by such a limit.

Botond Ballo [:botond]

Updated

•

12 years ago

Attachment #8356791 - Flags: review?(botond) → review+

Timothy Nikkel (:tnikkel)

Comment 69

•

12 years ago

Comment on attachment 8356790 [details] [diff] [review] 2/4 - Change the displayport storage units kats tell me this is for 1.2 only. So just add a comment about thus and r+.

Attachment #8356790 - Flags: review?(tnikkel) → review+

Botond Ballo [:botond]

Comment 70

•

12 years ago

Comment on attachment 8356793 [details] [diff] [review] 4/4 - Crash the tab if the texture size is going to exceed the max GL texture size Review of attachment 8356793 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/ipc/TabChild.cpp @@ +1682,5 @@ > + if (aContent->GetPrimaryFrame()) { > + resolution = aContent->GetPrimaryFrame()->PresContext()->PresShell()->GetCumulativeResolution(); > + } > + if (aMetrics.mDisplayPort.width * resolution.width > 4096 || > + aMetrics.mDisplayPort.height * resolution.height > 4096) { Should we be multiplying per aMetrics.mDevPixelsPerCSSPixel as well here?

Attachment #8356793 - Flags: review?(botond) → review+

Ben Kelly [:bkelly, not reviewing]

Comment 71

•

12 years ago

Comment on attachment 8356793 [details] [diff] [review] 4/4 - Crash the tab if the texture size is going to exceed the max GL texture size Review of attachment 8356793 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/ipc/TabChild.cpp @@ +1686,5 @@ > + aMetrics.mDisplayPort.height * resolution.height > 4096) { > + // The displayport is too big. At this size we can't even turn it into a GL texture > + // and all it's going to do is OOM the phone so we are better off if we just crash > + // this child process now. See bug 947523. > + MOZ_CRASH(); Would it be possible to log something indicating why the crash occurred? I could see this saving us some investigation time on future bug reports.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 72

•

12 years ago

(In reply to Timothy Nikkel (:tn) from comment #65) > IID bump on presshell please. Will do. (In reply to Botond Ballo [:botond] from comment #67) > Is this patch meant for trunk too? On trunk it's not necessarily the case > that the root frame's GetParentResolution() returns 1.0, at least it won't > be after bug 951936 and bug 935219 land, because then the resolution will be > on the non-scrollable root layer which will be the parent of the rootmost > scrollable layer. > > Perhaps we can condition on IsRootForLayersId() instead (or mIsRoot, > although see bug 957221)? The patch is 1.2-only. I have a much bigger change in mind for 1.3 to try and fix this properly. (In reply to Timothy Nikkel (:tn) from comment #68) > This feels fairly specific to b2g; some environments (metro when it goes > with remote tabs) might not be bound by such a limit. Agreed. I can wrap it in an #ifdef if you prefer but since I only plan on having the code land on b2g-1.2 I don't think it matters. (In reply to Timothy Nikkel (:tn) from comment #69) > kats tell me this is for 1.2 only. So just add a comment about thus and r+. Will do. (In reply to Botond Ballo [:botond] from comment #70) > Should we be multiplying per aMetrics.mDevPixelsPerCSSPixel as well here? Yeah, good catch. I'll add that in. (In reply to Ben Kelly [:bkelly] from comment #71) > Would it be possible to log something indicating why the crash occurred? I > could see this saving us some investigation time on future bug reports. Yup, I'll do that. I was able to consistently reproduce the OOM on the hamachi I borrowed from Botond using a 1.2 build without my patch. The STR basically involved zooming in the page as much as possible using the area to the left of the iframe (because putting focus in the frame brings up the keyboard and that thing is impossible to work with) and then trying to scroll the iframe. Usually it would OOM before I even tried scrolling the iframe. With my patch the OOM crash was replaced by a tab crash, and I wasn't able to trigger an OOM with any other panning/zooming actions on the page. So unless Ben has any objections I'll address the comments and get it landed.

Ben Kelly [:bkelly, not reviewing]

Comment 73

•

12 years ago

Comment on attachment 8356796 [details] [diff] [review] Rollup parts 1-4 I also was unable to get b2g to crash with the patch. The tab reliably got killed. I did, however, see some content fail to draw with this patch. See time 45s to 50s in this movie: https://www.dropbox.com/s/vguya4xf6rgeqwh/20140107_apzc_etherpad_v1_2_patch.mov It appears to occur when you zoom in while simultaneously dragging up a bit causing a scroll. I could not reproduce without the patch. On IRC kats indicated this might be explainable given that the patch shrinks the displayport. Is this regression fixable? If not, is this regression serious enough to hold back the patch? Jason, what do you think? I will try to see if I can reproduce it on any other websites tomorrow. Maybe its isolated to sites with very large iframes like this etherpad.

Attachment #8356796 - Flags: feedback?(bkelly)

Flags: needinfo?(jsmith)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 74

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #73) > On IRC kats indicated this might be explainable given that the patch shrinks > the displayport. Is this regression fixable? I believe this regression could be caused by part 3 of this patch series (although I'm not 100% sure). If we take out that part then it might make the regression go away at the expense of killing the tab far more frequently. And if this is regression is a result of my patch then it should be limited to subframes and at high zoom levels.

Jason Smith [:jsmith]

Comment 75

•

12 years ago

I think it would be better to OOM the tab more often and not risk the painting regression seen in the video mainly because I'm concerned about the fallout risk that this has other sites. The blocking factor on the bug is to ensure that we don't OOM the phone. If we end up OOMing the tab a lot, but not the phone, then that's still okay.

Flags: needinfo?(jsmith)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 76

•

12 years ago

I was able to reproduce the painting problem - my hypothesis in comment 74 was wrong. It's caused by part 2; as we zoom in the outer page the displayport (in CSS pixels) of the inner frame will shrink (by design). It remains the same size in layer pixels, but the composition bounds of the inner frame will expand in layer pixels. This results in the displayport shrinking to be smaller than the composition bounds until the APZC for the inner frame triggers a repaint for it. From the user's point of view the net effect is that if you zoom in on a page with an iframe, the iframe might end up clipped like in Ben's video, but it will repaint properly as soon as you touch it or scroll it.

Ben Kelly [:bkelly, not reviewing]

Comment 77

•

12 years ago

Would it be possible detect the condition and force a repaint after the zoom completes? As a user I think having some artifacts during zoom would be ok if they corrected once I stopped touching the screen. This is similar to how tiled map applications work, so users are probably used to the behavior.

Ben Kelly [:bkelly, not reviewing]

Comment 78

•

12 years ago

I seem to be able to get the painting defect fairly easily on this iframe example site (see #3): http://nunzioweb.com/iframes-example.htm Sometimes when the painting defect occurs I can no longer interact with the screen. If I go to the homescreen and then back into the browser, though, touch input starts working again. I can also cause the tab to get killed reliably by zooming in. The iframe in question is not particularly large.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 79

•

12 years ago

(In reply to Ben Kelly [:bkelly] from comment #77) > Would it be possible detect the condition and force a repaint after the zoom > completes? As a user I think having some artifacts during zoom would be ok > if they corrected once I stopped touching the screen. This is similar to > how tiled map applications work, so users are probably used to the behavior. I tried to do this quickly but was unsuccessful. At this point I'm not going to work on this bug any further - unfortunately there are too many pressing 1.3 issues for me to spend more time on this. We can either land the patchset as-is, which fixes the OOMs, and file follow-up bugs for the regressions. Or we can leave it out.

Ben Kelly [:bkelly, not reviewing]

Comment 80

•

12 years ago

I think given the options it would be better to prevent the b2g OOM. So I would vote for landing the patchset in v1.2. Jason, does this sound ok to you? I believe bug 957668 is the follow-up to address the memory issues in v1.3.

Flags: needinfo?(jsmith)

Comment 81

•

12 years ago

Attached patch 1/4 - Add a GetCumulativeResolution function v2 — Details — Splinter Review

Updated versions of patches with comments addressed and reviewer info included; ready to land.

Attachment #8356789 - Attachment is obsolete: true

Attachment #8357242 - Flags: review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 82

•

12 years ago

Attached patch 2/4 - Change the displayport storage units v2 — Details — Splinter Review

Attachment #8356790 - Attachment is obsolete: true

Attachment #8357244 - Flags: review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 83

•

12 years ago

Attached patch 3/4 - Shrink the displayport multiplier at high zooms v2 — Details — Splinter Review

Attachment #8356791 - Attachment is obsolete: true

Attachment #8357246 - Flags: review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 84

•

12 years ago

Attached patch 4/4 - Crash the tab if the texture size is going to exceed the max GL texture size v2 — Details — Splinter Review

Attachment #8356793 - Attachment is obsolete: true

Attachment #8356796 - Attachment is obsolete: true

Attachment #8357247 - Flags: review+

Jason Smith [:jsmith]

Comment 85

•

12 years ago

Yup, that sounds fine. Let's land this.

Flags: needinfo?(jsmith)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 86

•

12 years ago

RyanVM, could you please land this on 1.2? Parts 1-4. Thanks!

Flags: needinfo?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Comment 87

•

12 years ago

https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/8e7017c626fd https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/24b1e5cdad90 https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/bb652f96ab95 https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/ec909be2a849 Kats told me that 1.3+ will be addressed by bug 957668, so resolving.

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

status-b2g-v1.1hd: --- → unaffected

status-b2g-v1.2: affected → fixed

status-b2g-v1.3: --- → wontfix

status-firefox27: --- → wontfix

status-firefox28: affected → wontfix

status-firefox29: --- → wontfix

Flags: needinfo?(ryanvm)

Resolution: --- → FIXED

Zoomingincrashlog.txt 12 years ago KTucker [:KTucker][Inactive 3/4/2016] 469.84 KB, text/plain		Details
BrowserCrash.mp4 12 years ago KTucker [:KTucker][Inactive 3/4/2016] 660.86 KB, video/mp4		Details
947523_dmesg 12 years ago gbennett 26.35 KB, text/plain		Details
Change how the displayport is stored 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 6.12 KB, patch		Details \| Diff \| Splinter Review
Change how the displayport is stored 12 years ago Botond Ballo [:botond] 4.38 KB, patch	botond : feedback+	Details \| Diff \| Splinter Review
Change how the displayport is stored 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 7.69 KB, patch		Details \| Diff \| Splinter Review
1/4 - Add a GetCumulativeResolution function 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 2.83 KB, patch	tnikkel : review+	Details \| Diff \| Splinter Review
2/4 - Change the displayport storage units 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 3.61 KB, patch	tnikkel : review+	Details \| Diff \| Splinter Review
3/4 - Shrink the displayport multiplier at high zooms 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 3.37 KB, patch	botond : review+	Details \| Diff \| Splinter Review
4/4 - Crash the tab if the texture size is going to exceed the max GL texture size 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.53 KB, patch	botond : review+	Details \| Diff \| Splinter Review
Rollup parts 1-4 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 7.48 KB, patch		Details \| Diff \| Splinter Review
1/4 - Add a GetCumulativeResolution function v2 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 3.64 KB, patch	kats : review+	Details \| Diff \| Splinter Review
2/4 - Change the displayport storage units v2 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 4.04 KB, patch	kats : review+	Details \| Diff \| Splinter Review
3/4 - Shrink the displayport multiplier at high zooms v2 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 3.38 KB, patch	kats : review+	Details \| Diff \| Splinter Review
4/4 - Crash the tab if the texture size is going to exceed the max GL texture size v2 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.71 KB, patch	kats : review+	Details \| Diff \| Splinter Review