Closed Bug 965945 Opened 12 years ago Closed 12 years ago

[B2G][Browser] Accessing files on github.com causes the phone to oom and restart

Tracking

(blocking-b2g:1.3+, firefox28 wontfix, firefox29 wontfix, firefox30 fixed, b2g-v1.2 unaffected, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

Status:

RESOLVED FIXED

Milestone:

1.4 S2 (28feb)

Project Flags:

blocking-b2g

1.3+

Tracking Flags:

Tracking

Status

firefox28

---

wontfix

firefox29

---

wontfix

firefox30

---

fixed

b2g-v1.2

---

unaffected

b2g-v1.3

---

fixed

b2g-v1.3T

---

fixed

b2g-v1.4

---

fixed

People

(Reporter: rkunkel, Assigned: kats)

References

Details

(Keywords: perf, regression, Whiteboard: [c=memory p= s= u=1.3] dogfood1.3)

Attachments

(4 files)

dmesg 12 years ago Roland Kunkel [:RolandK] 129.85 KB, text/plain		Details
logcat 12 years ago Roland Kunkel [:RolandK] 179.08 KB, text/plain		Details
about-memory-2.zip 12 years ago J Zimbrick 2.24 MB, application/x-zip-compressed		Details
Kill child before it kills the parent via OOM 12 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.44 KB, patch	bjacob : review+ fabrice : approval-mozilla-b2g28+	Details \| Diff \| Splinter Review

Roland Kunkel [:RolandK]

Reporter

Description

•

12 years ago

Attached file dmesg — Details

When the user accesses a file shared on github.com and scrolls down to view more of the intended file, the browser app will freeze and become unresponsive. If the user continues to tap on the frozen webpage, the webpage's display will turn black and the phone will eventually restart. Repro Steps: 1) Update Buri to BuildID: 20140127004002 2) Open the browser app. 3) Go to url: http://bit.ly/1dbggkX. 4) When the intended webpage loads scroll down. 5) Once the webpage freezes, continue tapping on the screen. Actual: Webpage freezes, the screen turns black and the phone eventually restarts. Expected: Webpage is displayed without any issues. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20140127004002 Gaia: 25a45a836a4a21a30f63fa7b544b42e8b781180a Gecko: c40099a42c1f Version: 28.0a2 Firmware Version: V1.2-device.cfg Notes: The smallest file this issue was reproduced on was 9,500 kb in size. Repro frequency: 100% See attached: dmesg, logcat

Roland Kunkel [:RolandK]

Reporter

Comment 1

•

12 years ago

I was unable to reproduce this issue on the latest v1.2 build. When navigating to the link in Comment #0 the webpage immediately displayed a notification that it had crashed. The device never restarted as the page was never able to load. Environmental Variables: Device: Buri 1.2 MOZ BuildID: 20140128004004 Gaia: 539a25e1887b902b8b25038c547048e691bd97f6 Gecko: d10e1f965d0c Version: 26.0 Firmware Version: v1.2-device.cfg

status-b2g-v1.2: --- → unaffected

status-b2g-v1.3: --- → affected

Roland Kunkel [:RolandK]

Reporter

Comment 2

•

12 years ago

Attached file logcat — Details

Jason Smith [:jsmith]

Updated

•

12 years ago

blocking-b2g: --- → 1.3?

Component: Gaia::Browser → General

Keywords: perf, regression, regressionwindow-wanted

Whiteboard: dogfood1.3 → dogfood1.3 [MemShrink]

J Zimbrick

Comment 3

•

12 years ago

I'm able to reproduce this issue on the earliest 1.3 build I have available. Device: Buri v1.3 Mozilla RIL BuildID: 20130919040201 Gaia: 0dedb5b3789afc638a0c7c67652937fcb30e77d2 Gecko: 803189f35921 Version: 27.0a1 Base Image: V1.2-device.cfg

Keywords: regressionwindow-wanted

Jason Smith [:jsmith]

Comment 4

•

12 years ago

Can we get an about:memory report? https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Getting_About_Memory_Logs_for_a_Bug

Keywords: qawanted

J Zimbrick

Comment 5

•

12 years ago

Attached file about-memory-2.zip — Details

Attaching the about memory report, hopefully this contains some useful information.

J Zimbrick

Updated

•

12 years ago

Keywords: qawanted

Jason Smith [:jsmith]

Comment 6

•

12 years ago

Mike - Can you assign someone from the perf team to investigate this?

blocking-b2g: 1.3? → 1.3+

Flags: needinfo?(mlee)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 7

•

12 years ago

So to be clear the regression is that this used to just crash the tab and now crashes the phone? I'll take this for investigation.

Assignee: nobody → khuey

Flags: needinfo?(mlee)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 8

•

12 years ago

This looks very similar to bug 947523 and is probably the same underlying issue. Bug 957668 is aiming to fix this properly for 1.3+.

Depends on: 957668

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

12 years ago

Assignee: khuey → nobody

Mike Lee [:mlee]

Comment 9

•

12 years ago

Kyle did you unassign yourself because of comment 8? If so any reason why we shouldn't close this as a duplicate of Bug 957668?

Flags: needinfo?(khuey)

Priority: -- → P1

Whiteboard: dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3 [MemShrink]

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 10

•

12 years ago

(In reply to Mike Lee [:mlee] from comment #9) > Kyle did you unassign yourself because of comment 8? If so any reason why we > shouldn't close this as a duplicate of Bug 957668? Yes. Whether we dupe it is up to kats. I think we should keep it open because we can verify that the fix in bug 957668 fixes this once it is ready.

Flags: needinfo?(khuey)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 11

•

12 years ago

I would rather leave it as blocking than dupe it.

Nicholas Nethercote [inactive]

Comment 12

•

12 years ago

Let's make bug 957668 the MemShrink-tagged one.

Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3

Mike Lee [:mlee]

Comment 13

•

12 years ago

Ok. Kats, assigning this to you. Please close out as needed once bug 957668 is verified as fixing this.

Assignee: nobody → bugmail.mozilla

Status: NEW → ASSIGNED

Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 → [c=memory p= s= u=1.3] dogfood1.3 [ETA:Feb14]

Mike Lee [:mlee]

Updated

•

12 years ago

Target Milestone: --- → 1.4 S1 (14feb)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

12 years ago

Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:Feb14] → [c=memory p= s= u=1.3] dogfood1.3 [ETA:see dep bug]

Jason Smith [:jsmith]

Comment 14

•

12 years ago

I'm going to unblock the dependency here, as discussions offline concluded that patch will not be taken to 1.3.

No longer depends on: 957668

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 15

•

12 years ago

I guess I better come up with an alternate fix for this then.

Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:see dep bug] → [c=memory p= s= u=1.3] dogfood1.3 [ETA:2/26]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 16

•

12 years ago

So I tried to reproduce the problem on latest 1.3 hamachi code. I wasn't able to get the exact behaviour described but I saw one of two things happen when I scrolled the page in question: 1) The browser tab crashed with the "Sorry" message. I could close or reload the tab no problem. 2) The iframe on the page went black. When this happened, b2g-ps showed that the Preallocated process and the homescreen had been killed, and the b2g root process and the browser process were using up large amounts of memory. Further scrolling or interaction at this point usually caused all the b2g processes to die, even though the phone never actually rebooted itself. (Maybe this is a separate bug; once the b2g root process gets OOM-killed it should probably restart the phone). In the latter case I looked at the about:memory dumps and confirmed what I was expecting, which is that the gralloc memory reporter showed a large (~60MB) increase. This is because the APZ code sets a gigantic displayport (something to be addressed properly in bug 957668) and the parent process merrily tries to allocate a giant gralloc buffer for it. This is actually very similar to a problem I was discussing with bjacob yesterday. I have a patch that addresses this (by killing the child process instead of OOMing the parent) but I will run it by bjacob first.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 17

•

12 years ago

Attached patch Kill child before it kills the parent via OOM — Details — Splinter Review

After discussion with jrmuizel and bjacob the attached patch seems like a good fix for 1.3. However they pointed out another issue in the flow I described in comment 16. That is, when the OOM killer decides to kill the browser and the root process, instead of killing both at the same time (or so it appears to me) it should kill the browser process first, wait for cleanup to happen and for the root process to free up memory, and only then kill the root process. I will file a follow-up bug for this.

Attachment #8378367 - Flags: review?(bjacob)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 18

•

12 years ago

Filed bug 974445 as a follow-up for the LMK behaviour.

Benoit Jacob [:bjacob] (mostly away)

Comment 19

•

12 years ago

Comment on attachment 8378367 [details] [diff] [review] Kill child before it kills the parent via OOM Review of attachment 8378367 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp @@ +255,5 @@ > } > > + // If the requested size is too big (i.e. exceeds the commonly used max GL texture size) > + // then we risk OOMing the parent process. It's better to just deny the allocation and > + // kill the child process, which is what the following code does. I would add a comment: TODO: make that check the actual GL_MAX_TEXTURE_SIZE instead of hardcoding 4096.

Attachment #8378367 - Flags: review?(bjacob) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 20

•

12 years ago

Added TODO and landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/679e69e79fe2

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 21

•

12 years ago

Comment on attachment 8378367 [details] [diff] [review] Kill child before it kills the parent via OOM NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): APZ User impact if declined: load some webpages (those with large iframes or scrollable subframes) can cause the parent process to get OOM-killed Testing completed: locally Risk to taking this patch (and alternatives if risky): there is a risk that this patch causes child proceses to get killed more frequently. However this should only happen in cases where we otherwise risk OOMing the parent, so I think it's better. String or UUID changes made by this patch: none

Attachment #8378367 - Flags: approval-mozilla-b2g28?

[:fabrice] Fabrice Desré

Updated

•

12 years ago

Attachment #8378367 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+

Ryan VanderMeulen [:RyanVM]

Comment 22

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/679e69e79fe2

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: 1.4 S1 (14feb) → 1.4 S2 (28feb)

Ryan VanderMeulen [:RyanVM]

Comment 23

•

12 years ago

https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/c36fb08a0ec0

status-b2g-v1.3: affected → fixed

status-b2g-v1.4: --- → fixed

status-firefox28: --- → wontfix

status-firefox29: --- → wontfix

status-firefox30: --- → fixed

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 24

•

12 years ago

Backed out on m-c for causing C2 failure on B2G emulator: https://hg.mozilla.org/mozilla-central/rev/660b62608951 Backouts to follow for other trees.

Status: RESOLVED → REOPENED

status-b2g-v1.4: fixed → affected

status-firefox30: fixed → affected

Resolution: FIXED → ---

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 25

•

12 years ago

Backout on b2g28: https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/32fd5d798477

status-b2g-v1.3: fixed → affected

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 26

•

12 years ago

Here is an example failure log: https://tbpl.mozilla.org/php/getParsedLog.php?id=34935180&full=1&branch=mozilla-inbound The crashtest that is failing is layout/base/crashtests/653133-1.html which sets a displayport 6000 pixels tall and expects to not crash. That sounds a little unreasonable to me but I'll have to look into the history of that test before I can disable it with a clear conscience.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 27

•

12 years ago

Based on the history of the test I think it should still be a valid test if I drop the displayport height to 4096. Try run with that change: https://tbpl.mozilla.org/?tree=Try&rev=90968afa290c

Botond Ballo [:botond]

Updated

•

12 years ago

Comment 28

•

12 years ago

Try run is green but I'd like to wait on the results of the investigation in bug 965389 before re-landing, since this might result in the gallery app crashing more frequently.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 29

•

12 years ago

The gallery app part of bug 965389 doesn't happen on 1.3 so I'm going to go ahead and land this again with the test change. It will likely cause the gallery app to crash on nexus 4 and (less frequently) on hamachi, but that's ok because if it didn't crash it would just show black or OOM the phone which isn't much better. And hopefully we'll have bug 957668 done in the 1.4 timeframe and so it'll get taken care of properly.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 30

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5ac5ee30517d

Carsten Book [:Tomcat]

Comment 31

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/5ac5ee30517d

Status: REOPENED → RESOLVED

Closed: 12 years ago → 12 years ago

Resolution: --- → FIXED

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

12 years ago

status-b2g-v1.4: affected → fixed

status-firefox30: affected → fixed

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 32

•

12 years ago

I had a b2g28 tree handy so I pushed the uplift too: https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/34278682f4c6

status-b2g-v1.3: affected → fixed

Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:2/26] → [c=memory p= s= u=1.3] dogfood1.3

Sotaro Ikeda [:sotaro]

Updated

•

12 years ago

Depends on: 977423

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

status-b2g-v1.3T: --- → fixed

You need to log in before you can comment on or make changes to this bug.