Closed Bug 965945 Opened 6 years ago Closed 6 years ago
[B2G][Browser] Accessing files on github
.com causes the phone to oom and restart
129.85 KB, text/plain
179.08 KB, text/plain
2.24 MB, application/x-zip-compressed
1.44 KB, patch
|Details | Diff | Splinter Review|
When the user accesses a file shared on github.com and scrolls down to view more of the intended file, the browser app will freeze and become unresponsive. If the user continues to tap on the frozen webpage, the webpage's display will turn black and the phone will eventually restart. Repro Steps: 1) Update Buri to BuildID: 20140127004002 2) Open the browser app. 3) Go to url: http://bit.ly/1dbggkX. 4) When the intended webpage loads scroll down. 5) Once the webpage freezes, continue tapping on the screen. Actual: Webpage freezes, the screen turns black and the phone eventually restarts. Expected: Webpage is displayed without any issues. Environmental Variables: Device: Buri 1.3 MOZ BuildID: 20140127004002 Gaia: 25a45a836a4a21a30f63fa7b544b42e8b781180a Gecko: c40099a42c1f Version: 28.0a2 Firmware Version: V1.2-device.cfg Notes: The smallest file this issue was reproduced on was 9,500 kb in size. Repro frequency: 100% See attached: dmesg, logcat
I was unable to reproduce this issue on the latest v1.2 build. When navigating to the link in Comment #0 the webpage immediately displayed a notification that it had crashed. The device never restarted as the page was never able to load. Environmental Variables: Device: Buri 1.2 MOZ BuildID: 20140128004004 Gaia: 539a25e1887b902b8b25038c547048e691bd97f6 Gecko: d10e1f965d0c Version: 26.0 Firmware Version: v1.2-device.cfg
I'm able to reproduce this issue on the earliest 1.3 build I have available. Device: Buri v1.3 Mozilla RIL BuildID: 20130919040201 Gaia: 0dedb5b3789afc638a0c7c67652937fcb30e77d2 Gecko: 803189f35921 Version: 27.0a1 Base Image: V1.2-device.cfg
Can we get an about:memory report? https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Getting_About_Memory_Logs_for_a_Bug
Attaching the about memory report, hopefully this contains some useful information.
Mike - Can you assign someone from the perf team to investigate this?
blocking-b2g: 1.3? → 1.3+
So to be clear the regression is that this used to just crash the tab and now crashes the phone? I'll take this for investigation.
Assignee: nobody → khuey
This looks very similar to bug 947523 and is probably the same underlying issue. Bug 957668 is aiming to fix this properly for 1.3+.
Depends on: 957668
Assignee: khuey → nobody
Kyle did you unassign yourself because of comment 8? If so any reason why we shouldn't close this as a duplicate of Bug 957668?
Priority: -- → P1
Whiteboard: dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3 [MemShrink]
(In reply to Mike Lee [:mlee] from comment #9) > Kyle did you unassign yourself because of comment 8? If so any reason why we > shouldn't close this as a duplicate of Bug 957668? Yes. Whether we dupe it is up to kats. I think we should keep it open because we can verify that the fix in bug 957668 fixes this once it is ready.
I would rather leave it as blocking than dupe it.
Let's make bug 957668 the MemShrink-tagged one.
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3
Ok. Kats, assigning this to you. Please close out as needed once bug 957668 is verified as fixing this.
Assignee: nobody → bugmail.mozilla
Status: NEW → ASSIGNED
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 → [c=memory p= s= u=1.3] dogfood1.3 [ETA:Feb14]
6 years ago
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:Feb14] → [c=memory p= s= u=1.3] dogfood1.3 [ETA:see dep bug]
I'm going to unblock the dependency here, as discussions offline concluded that patch will not be taken to 1.3.
No longer depends on: 957668
I guess I better come up with an alternate fix for this then.
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:see dep bug] → [c=memory p= s= u=1.3] dogfood1.3 [ETA:2/26]
So I tried to reproduce the problem on latest 1.3 hamachi code. I wasn't able to get the exact behaviour described but I saw one of two things happen when I scrolled the page in question: 1) The browser tab crashed with the "Sorry" message. I could close or reload the tab no problem. 2) The iframe on the page went black. When this happened, b2g-ps showed that the Preallocated process and the homescreen had been killed, and the b2g root process and the browser process were using up large amounts of memory. Further scrolling or interaction at this point usually caused all the b2g processes to die, even though the phone never actually rebooted itself. (Maybe this is a separate bug; once the b2g root process gets OOM-killed it should probably restart the phone). In the latter case I looked at the about:memory dumps and confirmed what I was expecting, which is that the gralloc memory reporter showed a large (~60MB) increase. This is because the APZ code sets a gigantic displayport (something to be addressed properly in bug 957668) and the parent process merrily tries to allocate a giant gralloc buffer for it. This is actually very similar to a problem I was discussing with bjacob yesterday. I have a patch that addresses this (by killing the child process instead of OOMing the parent) but I will run it by bjacob first.
After discussion with jrmuizel and bjacob the attached patch seems like a good fix for 1.3. However they pointed out another issue in the flow I described in comment 16. That is, when the OOM killer decides to kill the browser and the root process, instead of killing both at the same time (or so it appears to me) it should kill the browser process first, wait for cleanup to happen and for the root process to free up memory, and only then kill the root process. I will file a follow-up bug for this.
Attachment #8378367 - Flags: review?(bjacob)
Filed bug 974445 as a follow-up for the LMK behaviour.
Comment on attachment 8378367 [details] [diff] [review] Kill child before it kills the parent via OOM Review of attachment 8378367 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp @@ +255,5 @@ > } > > + // If the requested size is too big (i.e. exceeds the commonly used max GL texture size) > + // then we risk OOMing the parent process. It's better to just deny the allocation and > + // kill the child process, which is what the following code does. I would add a comment: TODO: make that check the actual GL_MAX_TEXTURE_SIZE instead of hardcoding 4096.
Attachment #8378367 - Flags: review?(bjacob) → review+
Added TODO and landed: https://hg.mozilla.org/integration/mozilla-inbound/rev/679e69e79fe2
Comment on attachment 8378367 [details] [diff] [review] Kill child before it kills the parent via OOM NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): APZ User impact if declined: load some webpages (those with large iframes or scrollable subframes) can cause the parent process to get OOM-killed Testing completed: locally Risk to taking this patch (and alternatives if risky): there is a risk that this patch causes child proceses to get killed more frequently. However this should only happen in cases where we otherwise risk OOMing the parent, so I think it's better. String or UUID changes made by this patch: none
Attachment #8378367 - Flags: approval-mozilla-b2g28?
Attachment #8378367 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: 1.4 S1 (14feb) → 1.4 S2 (28feb)
Backed out on m-c for causing C2 failure on B2G emulator: https://hg.mozilla.org/mozilla-central/rev/660b62608951 Backouts to follow for other trees.
Backout on b2g28: https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/32fd5d798477
Here is an example failure log: https://tbpl.mozilla.org/php/getParsedLog.php?id=34935180&full=1&branch=mozilla-inbound The crashtest that is failing is layout/base/crashtests/653133-1.html which sets a displayport 6000 pixels tall and expects to not crash. That sounds a little unreasonable to me but I'll have to look into the history of that test before I can disable it with a clear conscience.
Based on the history of the test I think it should still be a valid test if I drop the displayport height to 4096. Try run with that change: https://tbpl.mozilla.org/?tree=Try&rev=90968afa290c
Try run is green but I'd like to wait on the results of the investigation in bug 965389 before re-landing, since this might result in the gallery app crashing more frequently.
The gallery app part of bug 965389 doesn't happen on 1.3 so I'm going to go ahead and land this again with the test change. It will likely cause the gallery app to crash on nexus 4 and (less frequently) on hamachi, but that's ok because if it didn't crash it would just show black or OOM the phone which isn't much better. And hopefully we'll have bug 957668 done in the 1.4 timeframe and so it'll get taken care of properly.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
6 years ago
I had a b2g28 tree handy so I pushed the uplift too: https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/34278682f4c6
You need to log in before you can comment on or make changes to this bug.