Closed
Bug 965945
Opened 12 years ago
Closed 11 years ago
[B2G][Browser] Accessing files on github.com causes the phone to oom and restart
Categories
(Firefox OS Graveyard :: General, defect, P1)
Tracking
(blocking-b2g:1.3+, firefox28 wontfix, firefox29 wontfix, firefox30 fixed, b2g-v1.2 unaffected, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)
People
(Reporter: rkunkel, Assigned: kats)
References
Details
(Keywords: perf, regression, Whiteboard: [c=memory p= s= u=1.3] dogfood1.3)
Attachments
(4 files)
129.85 KB,
text/plain
|
Details | |
179.08 KB,
text/plain
|
Details | |
2.24 MB,
application/x-zip-compressed
|
Details | |
1.44 KB,
patch
|
bjacob
:
review+
fabrice
:
approval-mozilla-b2g28+
|
Details | Diff | Splinter Review |
When the user accesses a file shared on github.com and scrolls down to view more of the intended file, the browser app will freeze and become unresponsive. If the user continues to tap on the frozen webpage, the webpage's display will turn black and the phone will eventually restart.
Repro Steps:
1) Update Buri to BuildID: 20140127004002
2) Open the browser app.
3) Go to url: http://bit.ly/1dbggkX.
4) When the intended webpage loads scroll down.
5) Once the webpage freezes, continue tapping on the screen.
Actual:
Webpage freezes, the screen turns black and the phone eventually restarts.
Expected:
Webpage is displayed without any issues.
Environmental Variables:
Device: Buri 1.3 MOZ
BuildID: 20140127004002
Gaia: 25a45a836a4a21a30f63fa7b544b42e8b781180a
Gecko: c40099a42c1f
Version: 28.0a2
Firmware Version: V1.2-device.cfg
Notes:
The smallest file this issue was reproduced on was 9,500 kb in size.
Repro frequency: 100%
See attached: dmesg, logcat
Reporter | ||
Comment 1•12 years ago
|
||
I was unable to reproduce this issue on the latest v1.2 build. When navigating to the link in Comment #0 the webpage immediately displayed a notification that it had crashed. The device never restarted as the page was never able to load.
Environmental Variables:
Device: Buri 1.2 MOZ
BuildID: 20140128004004
Gaia: 539a25e1887b902b8b25038c547048e691bd97f6
Gecko: d10e1f965d0c
Version: 26.0
Firmware Version: v1.2-device.cfg
status-b2g-v1.2:
--- → unaffected
status-b2g-v1.3:
--- → affected
Reporter | ||
Comment 2•12 years ago
|
||
Updated•12 years ago
|
blocking-b2g: --- → 1.3?
Component: Gaia::Browser → General
Whiteboard: dogfood1.3 → dogfood1.3 [MemShrink]
Comment 3•12 years ago
|
||
I'm able to reproduce this issue on the earliest 1.3 build I have available.
Device: Buri v1.3 Mozilla RIL
BuildID: 20130919040201
Gaia: 0dedb5b3789afc638a0c7c67652937fcb30e77d2
Gecko: 803189f35921
Version: 27.0a1
Base Image: V1.2-device.cfg
Keywords: regressionwindow-wanted
Comment 4•12 years ago
|
||
Can we get an about:memory report?
https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#Getting_About_Memory_Logs_for_a_Bug
Keywords: qawanted
Comment 5•12 years ago
|
||
Attaching the about memory report, hopefully this contains some useful information.
Comment 6•12 years ago
|
||
Mike - Can you assign someone from the perf team to investigate this?
blocking-b2g: 1.3? → 1.3+
Flags: needinfo?(mlee)
So to be clear the regression is that this used to just crash the tab and now crashes the phone?
I'll take this for investigation.
Assignee: nobody → khuey
Flags: needinfo?(mlee)
Assignee | ||
Comment 8•12 years ago
|
||
This looks very similar to bug 947523 and is probably the same underlying issue. Bug 957668 is aiming to fix this properly for 1.3+.
Depends on: 957668
Assignee: khuey → nobody
Comment 9•12 years ago
|
||
Kyle did you unassign yourself because of comment 8? If so any reason why we shouldn't close this as a duplicate of Bug 957668?
Flags: needinfo?(khuey)
Priority: -- → P1
Whiteboard: dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3 [MemShrink]
(In reply to Mike Lee [:mlee] from comment #9)
> Kyle did you unassign yourself because of comment 8? If so any reason why we
> shouldn't close this as a duplicate of Bug 957668?
Yes.
Whether we dupe it is up to kats. I think we should keep it open because we can verify that the fix in bug 957668 fixes this once it is ready.
Flags: needinfo?(khuey)
Assignee | ||
Comment 11•12 years ago
|
||
I would rather leave it as blocking than dupe it.
![]() |
||
Comment 12•12 years ago
|
||
Let's make bug 957668 the MemShrink-tagged one.
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [MemShrink] → [c=memory p= s= u=1.3] dogfood1.3
Comment 13•12 years ago
|
||
Ok. Kats, assigning this to you. Please close out as needed once bug 957668 is verified as fixing this.
Assignee: nobody → bugmail.mozilla
Status: NEW → ASSIGNED
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 → [c=memory p= s= u=1.3] dogfood1.3 [ETA:Feb14]
Updated•12 years ago
|
Target Milestone: --- → 1.4 S1 (14feb)
Assignee | ||
Updated•12 years ago
|
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:Feb14] → [c=memory p= s= u=1.3] dogfood1.3 [ETA:see dep bug]
Comment 14•11 years ago
|
||
I'm going to unblock the dependency here, as discussions offline concluded that patch will not be taken to 1.3.
No longer depends on: 957668
Assignee | ||
Comment 15•11 years ago
|
||
I guess I better come up with an alternate fix for this then.
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:see dep bug] → [c=memory p= s= u=1.3] dogfood1.3 [ETA:2/26]
Assignee | ||
Comment 16•11 years ago
|
||
So I tried to reproduce the problem on latest 1.3 hamachi code. I wasn't able to get the exact behaviour described but I saw one of two things happen when I scrolled the page in question:
1) The browser tab crashed with the "Sorry" message. I could close or reload the tab no problem.
2) The iframe on the page went black. When this happened, b2g-ps showed that the Preallocated process and the homescreen had been killed, and the b2g root process and the browser process were using up large amounts of memory. Further scrolling or interaction at this point usually caused all the b2g processes to die, even though the phone never actually rebooted itself. (Maybe this is a separate bug; once the b2g root process gets OOM-killed it should probably restart the phone).
In the latter case I looked at the about:memory dumps and confirmed what I was expecting, which is that the gralloc memory reporter showed a large (~60MB) increase. This is because the APZ code sets a gigantic displayport (something to be addressed properly in bug 957668) and the parent process merrily tries to allocate a giant gralloc buffer for it. This is actually very similar to a problem I was discussing with bjacob yesterday. I have a patch that addresses this (by killing the child process instead of OOMing the parent) but I will run it by bjacob first.
Assignee | ||
Comment 17•11 years ago
|
||
After discussion with jrmuizel and bjacob the attached patch seems like a good fix for 1.3.
However they pointed out another issue in the flow I described in comment 16. That is, when the OOM killer decides to kill the browser and the root process, instead of killing both at the same time (or so it appears to me) it should kill the browser process first, wait for cleanup to happen and for the root process to free up memory, and only then kill the root process. I will file a follow-up bug for this.
Attachment #8378367 -
Flags: review?(bjacob)
Assignee | ||
Comment 18•11 years ago
|
||
Filed bug 974445 as a follow-up for the LMK behaviour.
Comment 19•11 years ago
|
||
Comment on attachment 8378367 [details] [diff] [review]
Kill child before it kills the parent via OOM
Review of attachment 8378367 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp
@@ +255,5 @@
> }
>
> + // If the requested size is too big (i.e. exceeds the commonly used max GL texture size)
> + // then we risk OOMing the parent process. It's better to just deny the allocation and
> + // kill the child process, which is what the following code does.
I would add a comment: TODO: make that check the actual GL_MAX_TEXTURE_SIZE instead of hardcoding 4096.
Attachment #8378367 -
Flags: review?(bjacob) → review+
Assignee | ||
Comment 20•11 years ago
|
||
Added TODO and landed:
https://hg.mozilla.org/integration/mozilla-inbound/rev/679e69e79fe2
Assignee | ||
Comment 21•11 years ago
|
||
Comment on attachment 8378367 [details] [diff] [review]
Kill child before it kills the parent via OOM
NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): APZ
User impact if declined: load some webpages (those with large iframes or scrollable subframes) can cause the parent process to get OOM-killed
Testing completed: locally
Risk to taking this patch (and alternatives if risky): there is a risk that this patch causes child proceses to get killed more frequently. However this should only happen in cases where we otherwise risk OOMing the parent, so I think it's better.
String or UUID changes made by this patch: none
Attachment #8378367 -
Flags: approval-mozilla-b2g28?
Updated•11 years ago
|
Attachment #8378367 -
Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Comment 22•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: 1.4 S1 (14feb) → 1.4 S2 (28feb)
Comment 23•11 years ago
|
||
status-b2g-v1.4:
--- → fixed
status-firefox28:
--- → wontfix
status-firefox29:
--- → wontfix
status-firefox30:
--- → fixed
Assignee | ||
Comment 24•11 years ago
|
||
Backed out on m-c for causing C2 failure on B2G emulator:
https://hg.mozilla.org/mozilla-central/rev/660b62608951
Backouts to follow for other trees.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 25•11 years ago
|
||
Backout on b2g28: https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/32fd5d798477
Assignee | ||
Comment 26•11 years ago
|
||
Here is an example failure log: https://tbpl.mozilla.org/php/getParsedLog.php?id=34935180&full=1&branch=mozilla-inbound
The crashtest that is failing is layout/base/crashtests/653133-1.html which sets a displayport 6000 pixels tall and expects to not crash. That sounds a little unreasonable to me but I'll have to look into the history of that test before I can disable it with a clear conscience.
Assignee | ||
Comment 27•11 years ago
|
||
Based on the history of the test I think it should still be a valid test if I drop the displayport height to 4096. Try run with that change:
https://tbpl.mozilla.org/?tree=Try&rev=90968afa290c
Assignee | ||
Comment 28•11 years ago
|
||
Try run is green but I'd like to wait on the results of the investigation in bug 965389 before re-landing, since this might result in the gallery app crashing more frequently.
Assignee | ||
Comment 29•11 years ago
|
||
The gallery app part of bug 965389 doesn't happen on 1.3 so I'm going to go ahead and land this again with the test change. It will likely cause the gallery app to crash on nexus 4 and (less frequently) on hamachi, but that's ok because if it didn't crash it would just show black or OOM the phone which isn't much better. And hopefully we'll have bug 957668 done in the 1.4 timeframe and so it'll get taken care of properly.
Assignee | ||
Comment 30•11 years ago
|
||
Comment 31•11 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•11 years ago
|
Assignee | ||
Comment 32•11 years ago
|
||
I had a b2g28 tree handy so I pushed the uplift too:
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/34278682f4c6
Whiteboard: [c=memory p= s= u=1.3] dogfood1.3 [ETA:2/26] → [c=memory p= s= u=1.3] dogfood1.3
Updated•11 years ago
|
status-b2g-v1.3T:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•