Content process crash when zooming into www.mozilla.org on the Vibrant

VERIFIED FIXED

Status

Product: Fennec Graveyard
Component: General
Priority: --
Severity: critical
Status: VERIFIED FIXED
Reported: 7 years ago
Last modified: 7 years ago

People

(Reporter: aakashd, Assigned: cjones)

Tracking

Version: Trunk
Hardware: ARM
OS: Android

Details

Attachments

(2 attachments)

(Reporter)

Description

7 years ago
Build Id:
Mozilla/5.0 (Android; Linux armv71; rv:2.0b8pre) Gecko/20101027 Namoroka/4.0b8pre Fennec/4.0b2pre (Vibrant)

Note: This does not happen on the Nexus One. Also, I've seen the entire browser crash as well performing these steps.

Steps to Reproduce:
1. Go to about:home
2. Go to addons.mozilla.org (it'll open in a new tab)
3. Go to www.mozilla.org in a new tab
4. Click the favicon and select "Find in page"
5. Type in "mozilla"
6. Tap the down button twice

Actual Results:
The content process crashes, killing the tab.

Expected Results:
I should see the third instance of the string "mozilla" in the page.
(Reporter)

Comment 1

7 years ago
An additional note:

* You can reproduce this by trying to multitouch zoom into the page once or twice as well. Find in Page is not the only way for this to happen.
tracking-fennec: --- → ?
I tried steps in comment 0 (typing "mo" instead of "mozilla" though) and pinch-zooming around and couldn't force a crash.  This is with a local build off m-c 591315017365.  Will try a nightly.
Grr, can't force a crash with a nightly either.  Can people who can make this happen check logcat after crashes to see if there are any error messages there?
Created attachment 486387 [details] [diff] [review]
Enable gdb (bypass custom dlopen)

I've been building locally with the attached patch, so that I can run gdb if need be.  I just made a local build without that patch, and during the last 15 minutes of my normal browsing I *am* now seeing a lot of crashes: three content-process crashes and one chrome crash.  With the patch applied, I haven't seen a crash in either process for a very long time.

This presents rather a problem: I can't use gdb to get stacks and investigate without this patch, but with this patch I don't crash anymore.  Maybe mwu has thoughts.
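For context, the reason the dlopen bypass matters is that Android's custom linker keeps gdb from seeing the loaded libraries.  A typical remote-attach flow for Android of this era looks roughly like the following sketch; the port, process name, and cross-gdb binary name are assumptions, and this only works once the bypass patch lets the libraries load via the system dlopen:

```shell
# Sketch of attaching gdb to the Fennec content process (plugin-container)
# on-device. Assumes a rooted device or debuggable local build with
# gdbserver available, and a host cross-gdb for ARM.
adb forward tcp:5039 tcp:5039
PID=$(adb shell ps | grep plugin-container | awk '{print $2}')
adb shell gdbserver :5039 --attach "$PID" &
# Then, on the host:
#   arm-linux-androideabi-gdb -ex 'target remote :5039'
```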
Boy, fennec is not very usable with these crashes.  I think this should block beta2, possibly only wrt galaxy s's if we're only crashy there.  Either need to fix or drop support.
I don't see anything in logcat on a content-process crash.  There are some low-memory messages in the log though; unfortunately I don't know if these happen during normal browsing because I don't look at logcat.
I was able to repro a content process crash with attachment 486387 applied.

0xafe0d9c4 in __futex_wait () from /home/cjones/android/gdb/lib/libc.so
(gdb) c
Continuing.

Program terminated with signal SIGKILL, Killed.

Looks like plugin-container being OOM-killed.  Will try to collect a few more, see what's up.
Should have mentioned --- I tried the patch in bug 607534 but it didn't seem to help.

With attachment 486387 applied, I noodled around until I collected 5 crashes.  The STR didn't work for me; I just browsed around to large, image-heavy sites and zoomed, panned, and navigated.  3 of the crashes were SIGKILLs of the content process, like comment 7.  One was bug 607723.  The last was a SIGKILL of the chrome process.  The SIGKILLs I'm pretty sure are fennec being OOM-killed.

The OOM crashes I used to see were abort()s in the content process after failing to allocate ashmem (and only the content process, since it's the only process that allocates).  These showed up in logcat too.  I don't really have an explanation for why we would start being SIGKILLed all of a sudden.
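The two failure modes should look different in the log: the low-memory killer announces its kills, while the old ashmem failures aborted with an error message.  A logcat filter along these lines can help tell them apart (the tag names are an assumption and vary across Android versions):

```shell
# Sketch: watch for low-memory-killer kills vs. ashmem allocation
# failures affecting the content process. Tag names are assumed and
# may differ on other Android builds.
adb logcat | grep -Ei 'lowmemorykiller|ashmem|plugin-container'
```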
Bug 607534 didn't seem to help much on its own.  Another patch to reduce our prefetch area by 4x (2x in each dimension) didn't help much either, just resulted in the chrome process being OOM-killed before content.  With both together, however, I had to work quite hard to get fennec killed off, about as hard as with the dlopen-bypass patch.

Shrinking the prefetch region isn't free though, because we have to repaint more often.  I had wanted to abstain from twiddling with it until we had a good benchmark, but since we're down to the eleventh hour here, I'm going to post the shrinker patch along with a rollup of bug 603885.  If the panning UX isn't affected too much, I suggest we take all three: dlopen sharing, prefetch shrink, paint optimizations.
I put up a build with bug 606534, bug 603885, and
-pref("toolkit.browser.cachePixelX", 800);
-pref("toolkit.browser.cachePixelY", 2000);
+pref("toolkit.browser.cachePixelX", 400);
+pref("toolkit.browser.cachePixelY", 1000);

at http://people.mozilla.com/~cjones/fennec-4.0b2pre.en-US.eabi-arm.apk.  On news.google.com there's not much observable difference when scrolling the main content area.  Trying some other sites.
I tested nytimes.com by zooming to the top-left article, then scrolling straight downward at the speed I could read headlines in the main area, then at the speed I could read the "World", "US", "Politics", etc. headings in the article index.  In the main content area, I didn't see a difference between the build linked in comment 10 and vanilla tip.  In the article index however, vanilla tip had noticeably less checkerboarding (though both were bad at around "Real Estate").

I also tested horizontal scrolling in the main content area.  Both builds had a lot of checkerboarding, but I found the UX of the build in comment 10 better because new pixels came in more quickly.
On timecube.com, flicking down on every beat from http://webmetronome.com/, vanilla tip has a lot of checkerboarding all the way down.  The build from comment 10 seems to do a lot better.

It doesn't appear to me we're going to lose any perceived performance from a smaller prefetch+painting optimizations, and with shared-dlopen + smaller prefetch we have less upfront memory overhead plus 4x surface memory in the worst case.  I vote for taking these for beta2, if other folks can confirm what I'm seeing wrt perceived perf.
(In reply to comment #12)
> with shared-dlopen + smaller prefetch we have less upfront memory overhead plus 4x surface memory in the worst case.

4x *less*.  A significant typo.
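The "4x less" figure follows directly from the pref change in comment 10: halving the prefetch region in each dimension quarters its area.  A quick sanity check, assuming surface memory scales with cachePixelX * cachePixelY at 4 bytes per pixel (32-bit color; actual buffer sizes may differ):

```python
# Back-of-the-envelope surface-memory estimate for the prefetch region.
# Assumes memory scales with cachePixelX * cachePixelY and 4 bytes per
# pixel; these are illustrative assumptions, not measured values.
BYTES_PER_PIXEL = 4

def prefetch_bytes(cache_x, cache_y):
    return cache_x * cache_y * BYTES_PER_PIXEL

old = prefetch_bytes(800, 2000)   # original prefs
new = prefetch_bytes(400, 1000)   # halved in each dimension
print(old // new)                 # halving both dimensions quarters the area
```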
The smaller prefetch seems too small to me. I get a lot more checkerboarding on timecube. It does fill in faster though.
100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment #0 as well as comment #12's url
(In reply to comment #15)
> 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> #0 as well as comment #12's url

Aaron - Can you try the build in comment 10?
(In reply to comment #15)
> 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> #0 as well as comment #12's url

You were crashing on http://webmetronome.com/ ?
(In reply to comment #16)
> (In reply to comment #15)
> > 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> > #0 as well as comment #12's url
> 
> Aaron - Can you try the build in comment 10?
Interesting, this build isn't crashing for me. No crash on timecube.com, and so far no crash with the STR from comment #0. Will continue to use this build and experiment.

(In reply to comment #17)
> (In reply to comment #15)
> > 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> > #0 as well as comment #12's url
> 
> You were crashing on http://webmetronome.com/ ?
No, timecube.com. I can now pan down, though I see grey background if I go too quickly.
Thanks Aaron!

(In reply to comment #10)
> I put up a build with bug 606534

That was supposed to say "bug 607534".
Followup to my comment. To keep it simple, the build Chris posted is the most stable build I've used since early October on my device. Although a complicated and busy site such as nytimes.com no longer crashes on load, a crash still happens when I zoom. On the other hand, dozens of other sites are now not crashing the browser.
I played around with this some more last night, and the build in comment 10 doesn't do well on sites with text that's wider than the screen.  From slowly scrolling horizontally to see the ends of lines, it appeared that we weren't caching anything outside of the window.  Probably worth twiddling if we want to take these for b2.
Let's start heading towards a decision.  Here are the improvements I think we know the build in comment 10 yields:
 -- on lower-memory devices, fennec becomes usable, though still crashy
 -- on higher-memory devices, fennec is less crashy (and friendlier to the OS)

Here's what we seem to be trading off
 (1) on very-slow-to-rerender pages, the smaller prefetch region is strictly worse for vertical scrolling (e.g. html5test.com)
 (2) more frequent checkerboarding but faster refilling on fast-to-rerender pages (maybe a wash)
 (3) no horizontal prefetch (not sure why).  So, e.g., reading horizontally-overflowed text is very unpleasant

Am I missing anything?

If we were to ship something like comment 10, I think we would definitely need to twiddle (3).  I'm not so sure that we need to do anything for (1) or (2).  Thoughts?
(In reply to comment #22)
>  (3) no horizontal prefetch (not sure why).  So, e.g., reading
> horizontally-overflowed text is very unpleasant

With cachePixelX set to 400, we have an 800-pixel-wide displayport.  With a 480-pixel-wide portrait display, this leaves only 160 pixels buffered to either side.  With an 800 or 854-pixel-wide landscape display, it leaves no buffer at all.

I would suggest increasing cachePixelX to 500, which should give a more noticeable buffer for horizontal scrolling, in exchange for 25% higher surface memory.
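The arithmetic in this comment can be sketched as follows, assuming the displayport is 2 * cachePixelX wide and centered on the visible screen (screen widths are the device values mentioned above):

```python
# Horizontal prefetch buffer per side, assuming a displayport that is
# 2 * cachePixelX wide and centered on the screen. A non-positive
# result means no off-screen buffer at all.
def side_buffer(cache_x, screen_width):
    return (2 * cache_x - screen_width) // 2

print(side_buffer(400, 480))  # 160 px per side on a 480-wide portrait display
print(side_buffer(400, 800))  # 0 px on an 800-wide landscape display
print(side_buffer(500, 480))  # 260 px per side after bumping cachePixelX to 500
# Surface memory scales linearly with cachePixelX: 500/400 = 1.25, i.e. +25%.
```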
Newer build off of m-c 16f18e41dcf3, m-b bec84542c46b with bug 607534, bug 603885, and tweaked cachePixelX/Y = 500/1000.

http://people.mozilla.com/~cjones/fennec-4.0b2pre.en-US.eabi-arm.apk
With *just* the newest patch in bug 607534, behavior on my Vibrant is about the same (AFAICT) as with beta1.  I couldn't force chrome or content to crash during my torture tests.  That build doesn't play as nicely with the rest of the OS as the builds in comment 24 and comment 10 did, but neither did beta1.

The sharing patch isn't terribly "safe" in an abstract sense, but from my tests just that patch alone keeps behavior closest to beta1.
I tried a couple more cachePixelX/Y tweaks (along with latest in bug 607534).  In my testing, at
 -- 600x1200, we cause background stuff to die during both torture testing and normal browsing (though it takes longer for stuff to die as compared to b1)
 -- 580x1000, we cause background stuff to die during torture testing, but not normal browsing

Build is at http://people.mozilla.com/~cjones/fennec-4.0b2pre.en-US.eabi-arm.apk .

Still flying by the seat of my pants, but 580x1000 seems to me like a reasonable compromise for b2.  Input from folks on other devices very much desired.
(580x1000 turns out coincidentally to result in almost exactly 3x less memory use than 800x1600.)
Created attachment 486765 [details] [diff] [review]
Shrink our prefetch region to reduce memory usage
Attachment #486765 - Flags: review?(mbrubeck)
Attachment #486765 - Flags: review?(mbrubeck) → review+
http://hg.mozilla.org/mobile-browser/rev/c80baa8d65d6

Nothing more to do here for b2.  This patch should be enough to make the crashy go away, and bug 607534 will reduce memory usage yet more.
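Based on the 580x1000 compromise arrived at in the earlier testing, the landed prefetch shrink would look something like the following pref diff (values inferred from this thread, not verified against changeset c80baa8d65d6):

```
-pref("toolkit.browser.cachePixelX", 800);
-pref("toolkit.browser.cachePixelY", 2000);
+pref("toolkit.browser.cachePixelX", 580);
+pref("toolkit.browser.cachePixelY", 1000);
```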
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Depends on: 607534
Resolution: --- → FIXED

Updated

7 years ago
tracking-fennec: ? → 2.0b2+
(Reporter)

Comment 30

7 years ago
verified FIXED on build:

Mozilla/5.0 (Android; Linux armv71; rv:2.0b8pre) Gecko/20101029 Namoroka/4.0b8pre Fennec/4.0b2pre

It's terribly choppy while sync is syncing though.
Status: RESOLVED → VERIFIED

Updated

7 years ago
Assignee: nobody → jones.chris.g