Content process crash when zooming into www.mozilla.org on the Vibrant

VERIFIED FIXED

Status

Product: Fennec Graveyard
Component: General
Priority: --
Severity: critical
Status: VERIFIED FIXED
Reported: 7 years ago
Last modified: 7 years ago

People

(Reporter: aakashd, Assigned: cjones)

Tracking

Version: Trunk
Hardware: ARM
OS: Android

Details

Attachments

(2 attachments)

(Reporter)

Description

7 years ago
Build Id:
Mozilla/5.0 (Android; Linux armv71; rv:2.0b8pre) Gecko/20101027 Namoroka/4.0b8pre Fennec/4.0b2pre (Vibrant)

Note: This does not happen on the Nexus One. Also, I've seen the entire browser crash as well performing these steps.

Steps to Reproduce:
1. Go to about:home
2. Go to addons.mozilla.org (it'll open in a new tab)
3. Go to www.mozilla.org in a new tab
4. Click the favicon and select "Find in page"
5. Type in "mozilla"
6. Tap the down button twice

Actual Results:
The content process crashes, killing the tab.

Expected Results:
I should see the third instance of the string "mozilla" in the page.
(Reporter)

Comment 1

7 years ago
An additional note:

* You can reproduce this by trying to multitouch zoom into the page once or twice as well. Find in Page is not the only way for this to happen.
tracking-fennec: --- → ?
I tried steps in comment 0 (typing "mo" instead of "mozilla" though) and pinch-zooming around and couldn't force a crash.  This is with a local build off m-c 591315017365.  Will try a nightly.
Grr, can't force a crash with a nightly either.  Can people who can make this happen check logcat after crashes to see if there are any error messages there?
Created attachment 486387 [details] [diff] [review]
Enable gdb (bypass custom dlopen)

I've been building locally with the attached patch, so that I can run gdb if need be.  I just made a local build without that patch, and during the last 15 minutes of my normal browsing I *am* now seeing a lot of crashes: three content-process crashes and one chrome crash.  With the patch applied, I haven't seen a crash in either process for a very long time.

This presents rather a problem: I can't use gdb to get stacks and investigate without this patch, but with this patch I don't crash anymore.  Maybe mwu has thoughts.
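For context, the reason the dlopen bypass matters is that Android's custom linker keeps gdb from seeing the loaded libraries.  A typical remote-attach flow for Android of this era looks roughly like the following sketch; the port, process name, and cross-gdb binary name are assumptions, and this only works once the bypass patch lets the libraries load via the system dlopen:

```shell
# Sketch of attaching gdb to the Fennec content process (plugin-container)
# on-device. Assumes a rooted device or debuggable local build with
# gdbserver available, and a host cross-gdb for ARM.
adb forward tcp:5039 tcp:5039
PID=$(adb shell ps | grep plugin-container | awk '{print $2}')
adb shell gdbserver :5039 --attach "$PID" &
# Then, on the host:
#   arm-linux-androideabi-gdb -ex 'target remote :5039'
```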
Boy, fennec is not very usable with these crashes.  I think this should block beta2, possibly only wrt galaxy s's if we're only crashy there.  Either need to fix or drop support.
I don't see anything in logcat on a content-process crash.  There are some low-memory messages in the log though; unfortunately I don't know if these happen during normal browsing because I don't look at logcat.
I was able to repro a content process crash with attachment 486387 applied.

0xafe0d9c4 in __futex_wait () from /home/cjones/android/gdb/lib/libc.so
(gdb) c
Continuing.

Program terminated with signal SIGKILL, Killed.

Looks like plugin-container being OOM-killed.  Will try to collect a few more, see what's up.
Should have mentioned --- I tried the patch in bug 607534 but it didn't seem to help.

With attachment 486387 applied, I noodled around until I collected 5 crashes.  The STR didn't work for me; I just browsed around to large, image-heavy sites and zoomed, panned, and navigated.  3 of the crashes were SIGKILLs of the content process, like comment 7.  One was bug 607723.  The last was a SIGKILL of the chrome process.  The SIGKILLs I'm pretty sure are fennec being OOM-killed.

The OOM crashes I used to see were abort()s in the content process after failing to allocate ashmem (and only the content process, since it's the only process that allocates).  These showed up in logcat too.  I don't really have an explanation for why we would start being SIGKILLed all of a sudden.
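The two failure modes should look different in the log: the low-memory killer announces its kills, while the old ashmem failures aborted with an error message.  A logcat filter along these lines can help tell them apart (the tag names are an assumption and vary across Android versions):

```shell
# Sketch: watch for low-memory-killer kills vs. ashmem allocation
# failures affecting the content process. Tag names are assumed and
# may differ on other Android builds.
adb logcat | grep -Ei 'lowmemorykiller|ashmem|plugin-container'
```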
Bug 607534 didn't seem to help much on its own.  Another patch to reduce our prefetch area by 4x (2x in each dimension) didn't help much either, just resulted in the chrome process being OOM-killed before content.  With both together, however, I had to work quite hard to get fennec killed off, about as hard as with the dlopen-bypass patch.

Shrinking the prefetch region isn't free though, because we have to repaint more often.  I had wanted to abstain from twiddling with it until we had a good benchmark, but since we're down to the eleventh hour here, I'm going to post the shrinker patch along with a rollup of bug 603885.  If the panning UX isn't affected too much, I suggest we take all three: dlopen sharing, prefetch shrink, paint optimizations.
I put up a build with bug 606534, bug 603885, and
-pref("toolkit.browser.cachePixelX", 800);
-pref("toolkit.browser.cachePixelY", 2000);
+pref("toolkit.browser.cachePixelX", 400);
+pref("toolkit.browser.cachePixelY", 1000);

at http://people.mozilla.com/~cjones/fennec-4.0b2pre.en-US.eabi-arm.apk.  On news.google.com there's not much observable difference when scrolling the main content area.  Trying some other sites.
I tested nytimes.com by zooming to the top-left article, then scrolling straight downward at the speed I could read headlines in the main area, then at the speed I could read the "World", "US", "Politics", etc. headings in the article index.  In the main content area, I didn't see a difference between the build linked in comment 10 and vanilla tip.  In the article index however, vanilla tip had noticeably less checkerboarding (though both were bad at around "Real Estate").

I also tested horizontal scrolling in the main content area.  Both builds had a lot of checkerboarding, but I found the UX of the build in comment 10 better because new pixels came in more quickly.
On timecube.com, flicking down on every beat from http://webmetronome.com/, vanilla tip has a lot of checkerboarding all the way down.  The build from comment 10 seems to do a lot better.

It doesn't appear to me we're going to lose any perceived performance from a smaller prefetch+painting optimizations, and with shared-dlopen + smaller prefetch we have less upfront memory overhead plus 4x surface memory in the worst case.  I vote for taking these for beta2, if other folks can confirm what I'm seeing wrt perceived perf.
(In reply to comment #12)
> with shared-dlopen + smaller prefetch we have less upfront memory overhead plus 4x surface memory in the worst case.

4x *less*.  A significant typo.
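The "4x less" figure follows directly from the pref change in comment 10: halving the prefetch region in each dimension quarters its area.  A quick sanity check, assuming surface memory scales with cachePixelX * cachePixelY at 4 bytes per pixel (32-bit color; actual buffer sizes may differ):

```python
# Back-of-the-envelope surface-memory estimate for the prefetch region.
# Assumes memory scales with cachePixelX * cachePixelY and 4 bytes per
# pixel; these are illustrative assumptions, not measured values.
BYTES_PER_PIXEL = 4

def prefetch_bytes(cache_x, cache_y):
    return cache_x * cache_y * BYTES_PER_PIXEL

old = prefetch_bytes(800, 2000)   # original prefs
new = prefetch_bytes(400, 1000)   # halved in each dimension
print(old // new)                 # halving both dimensions quarters the area
```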
The smaller prefetch seems too small to me. I get a lot more checkerboarding on timecube. It does fill in faster though.
100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment #0 as well as comment #12's url
(In reply to comment #15)
> 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> #0 as well as comment #12's url

Aaron - Can you try the build in comment 10?
(In reply to comment #15)
> 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> #0 as well as comment #12's url

You were crashing on http://webmetronome.com/ ?
(In reply to comment #16)
> (In reply to comment #15)
> > 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> > #0 as well as comment #12's url
> 
> Aaron - Can you try the build in comment 10?
Interesting, this build isn't crashing for me. No crash on timecube.com, and so far no crash with the STR from comment #0. Will continue to use this build and experiment.

(In reply to comment #17)
> (In reply to comment #15)
> > 100% crash every time on Motorola Milestone (10/27 nightly) with STR in comment
> > #0 as well as comment #12's url
> 
> You were crashing on http://webmetronome.com/ ?
No, timecube.com. I can now pan down, though I see grey background if I go too quickly.
Thanks Aaron!

(In reply to comment #10)
> I put up a build with bug 606534

That was supposed to say "bug 607534".
Followup to my comment. To keep it simple, the build Chris posted is the most stable build I've used since early October on my device. Although a complicated and busy site such as nytimes.com no longer crashes on load, a crash still happens when I zoom. On the other hand, dozens of other sites are now not crashing the browser.
I played around with this some more last night, and the build in comment 10 doesn't do well on sites with text that's wider than the screen.  From slowly scrolling horizontally to see the ends of lines, it appeared that we weren't caching anything outside of the window.  Probably worth twiddling if we want to take these for b2.
Let's start heading towards a decision.  Here are the improvements I think we know the build in comment 10 yields:
 -- on lower-memory devices, fennec becomes usable, though still crashy
 -- on higher-memory devices, fennec is less crashy (and friendlier to the OS)

Here's what we seem to be trading off
 (1) on very-slow-to-rerender pages, the smaller prefetch region is strictly worse for vertical scrolling (e.g. html5test.com)
 (2) more frequent checkerboarding but faster refilling on fast-to-rerender pages (maybe a wash)
 (3) no horizontal prefetch (not sure why).  So, e.g., reading horizontally-overflowed text is very unpleasant

Am I missing anything?

If we were to ship something like comment 10, I think we would definitely need to twiddle (3).  I'm not so sure that we need to do anything for (1) or (2).  Thoughts?
(In reply to comment #22)
>  (3) no horizontal prefetch (not sure why).  So, e.g., reading
> horizontally-overflowed text is very unpleasant

With cachePixelX set to 400, we have an 800-pixel-wide displayport.  With a 480-pixel-wide portrait display, this leaves only 160 pixels buffered to either side.  With an 800 or 854-pixel-wide landscape display, it leaves no buffer at all.

I would suggest increasing cachePixelX to 500, which should give a more noticeable buffer for horizontal scrolling, in exchange for 25% higher surface memory.
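The arithmetic in this comment can be sketched as follows, assuming the displayport is 2 * cachePixelX wide and centered on the visible screen (screen widths are the device values mentioned above):

```python
# Horizontal prefetch buffer per side, assuming a displayport that is
# 2 * cachePixelX wide and centered on the screen. A non-positive
# result means no off-screen buffer at all.
def side_buffer(cache_x, screen_width):
    return (2 * cache_x - screen_width) // 2

print(side_buffer(400, 480))  # 160 px per side on a 480-wide portrait display
print(side_buffer(400, 800))  # 0 px on an 800-wide landscape display
print(side_buffer(500, 480))  # 260 px per side after bumping cachePixelX to 500
# Surface memory scales linearly with cachePixelX: 500/400 = 1.25, i.e. +25%.
```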
Newer build off of m-c 16f18e41dcf3, m-b bec84542c46b with bug 607534, bug 603885, and tweaked cachePixelX/Y = 500/1000.

http://people.mozilla.com/~cjones/fennec-4.0b2pre.en-US.eabi-arm.apk
With *just* the newest patch in bug 607534, behavior on my Vibrant is about the same (AFAICT) as with beta1.  I couldn't force chrome or content to crash during my torture tests.  That build doesn't play as nicely with the rest of the OS as the builds in comment 24 and comment 10 did, but neither did beta1.

The sharing patch isn't terribly "safe" in an abstract sense, but from my tests just that patch alone keeps behavior closest to beta1.
I tried a couple more cachePixelX/Y tweaks (along with latest in bug 607534).  In my testing, at
 -- 600x1200, we cause background stuff to die during both torture testing and normal browsing (though it takes longer for stuff to die as compared to b1)
 -- 580x1000, we cause background stuff to die during torture testing, but not normal browsing

Build is at http://people.mozilla.com/~cjones/fennec-4.0b2pre.en-US.eabi-arm.apk .

Still flying by the seat of my pants, but 580x1000 seems to me like a reasonable compromise for b2.  Input from folks on other devices very much desired.
(580x1000 turns out coincidentally to result in almost exactly 3x less memory use than 800x1600.)
Created attachment 486765 [details] [diff] [review]
Shrink our prefetch region to reduce memory usage
Attachment #486765 - Flags: review?(mbrubeck)
Attachment #486765 - Flags: review?(mbrubeck) → review+
http://hg.mozilla.org/mobile-browser/rev/c80baa8d65d6

Nothing more to do here for b2.  This patch should be enough to make the crashy go away, and bug 607534 will reduce memory usage yet more.
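Based on the 580x1000 compromise arrived at in the earlier testing, the landed prefetch shrink would look something like the following pref diff (values inferred from this thread, not verified against changeset c80baa8d65d6):

```
-pref("toolkit.browser.cachePixelX", 800);
-pref("toolkit.browser.cachePixelY", 2000);
+pref("toolkit.browser.cachePixelX", 580);
+pref("toolkit.browser.cachePixelY", 1000);
```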
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Depends on: 607534
Resolution: --- → FIXED

Updated

7 years ago
tracking-fennec: ? → 2.0b2+
(Reporter)

Comment 30

7 years ago
verified FIXED on build:

Mozilla/5.0 (Android; Linux armv71; rv:2.0b8pre) Gecko/20101029 Namoroka/4.0b8pre Fennec/4.0b2pre

It's terribly choppy while sync is syncing though.
Status: RESOLVED → VERIFIED

Updated

7 years ago
Assignee: nobody → jones.chris.g