Closed Bug 841976 Opened 11 years ago Closed 11 years ago

Gaia apps randomly freeze during UI tests

Category: Firefox OS Graveyard :: General
Type: defect; Priority: Not set; Severity: normal
Tracking: Not tracked
Status: RESOLVED WORKSFORME
Reporter: jgriffin (Unassigned)
Whiteboard: [MemShrink:P1]
Attachments: 10 files

While running the Gaia UI tests, Gaia apps will randomly freeze for a few minutes, during which time the UI is (mostly) unresponsive.  This seems to be less frequent today than it was earlier in the week, but still occurs.  This completely breaks the tests when it happens.  This tends to happen towards the end of the test cycle after many apps have been exercised.

Per the discussion in bug 837187, I gathered a gdb thread dump and an about:memory report with MOZ_DMD enabled, and am attaching those (plus the logcat) here.

Note that the memory-dump script seemed to hang at "Processing DMD files.  This may take a minute or two."  I killed it after 20 minutes, but am attaching the raw files.
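For completeness, here is a rough sketch of how these were collected; the helper scripts (tools/get_about_memory.py and run-gdb.sh from the B2G checkout) and the DMD-enabled build are assumptions based on the bug 837187 workflow rather than something spelled out here:

  # Device logs
  adb logcat -d > logcat.txt
  adb shell dmesg > dmesg.txt
  # about:memory + DMD dumps, pulled over adb (helper script from the B2G checkout)
  tools/get_about_memory.py
  # gdb thread dump of the main b2g process; inside gdb, run "thread apply all bt"
  ./run-gdb.sh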
Attached file logcat
Attached file about:memory report
Attached file dmd-b2g-109.txt.gz
I don't see anything particularly wrong in the gdb output, but maybe cjones will.  What concerns me the most is that the main process has an RSS here of 87 MB, which is very bad.

I'm not sure that the memory problem is causing your hang, though.  I'd expect to see all of the apps on the system killed before we hang, but you still have the homescreen and a browser process alive.

It might be worth checking the output of adb shell dmesg next time this happens.
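A minimal way to do that check, assuming the log is pulled to the host first (the grep pattern for low-memory-killer/OOM messages is just a guess at what to look for):

  adb shell dmesg > dmesg.txt
  # Look for low-memory-killer or OOM activity around the time of the freeze
  grep -iE 'lowmem|oom|kill' dmesg.txt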

On the upside, the main process does not have high heap-unclassified (and also the DMD report for the main process was fully processed), but on the downside, most of the memory falls into the one System Principal compartment, which is mostly opaque.  This is due to bug 798491.

It actually might be very interesting if you could reproduce this problem with bug 798491 disabled.  We'd get a very different memory report, I expect.  I think this is just a matter of flipping the "jsloader.reuseGlobal" pref to false in b2g.js.
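For anyone trying to reproduce that configuration, a sketch of the pref flip; the file path (gecko/b2g/app/b2g.js in a standard B2G checkout), the exact pref line, and build.sh/flash.sh as the build entry points are assumptions:

  # Flip the JS loader pref and reflash Gecko (adjust paths to your tree).
  sed -i 's/pref("jsloader.reuseGlobal", *true);/pref("jsloader.reuseGlobal", false);/' gecko/b2g/app/b2g.js
  ./build.sh gecko && ./flash.sh gecko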

FWIW the memory report here looks pretty different from the one in bug 837187, although there may be some overlap, because they're both using too much memory in the system compartment.
OK, this guy is *very* suspicious:

Unreported: 61 blocks in stack trace record 4 of 819
 499,712 bytes (386,252 requested / 113,460 slop)
 1.16% of the heap (5.96% cumulative);  4.09% of unreported (20.93% cumulative)
 Allocated at
   replace_malloc /home/jgriffin/mozilla-inbound/src/memory/replace/dmd/DMD.cpp:1228 (0x4008b75e libdmd.so+0x375e)
   malloc /home/jgriffin/mozilla-inbound/src/memory/build/replace_malloc.c:152 (0x401ff2fa libmozglue.so+0x42fa)
   moz_xmalloc /home/jgriffin/mozilla-inbound/src/memory/mozalloc/mozalloc.cpp:55 (0x411bebfa libxul.so+0xfa7bfa)
   Channel /home/jgriffin/mozilla-inbound/src/ipc/chromium/src/chrome/common/ipc_channel_posix.cc:838 (0x40d780f4 libxul.so+0xb610f4)
   mozilla::ipc::OpenDescriptor(mozilla::ipc::TransportDescriptor const&, IPC::Channel::Mode) /home/jgriffin/mozilla-inbound/src/ipc/glue/Transport_posix.cpp:56 (0x40b8eadc libxul.so+0x977adc)
   mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const&) /home/jgriffin/unagi/objdir-gecko/ipc/ipdl/PContentParent.cpp:2371 (0x410be68a libxul.so+0x9a168a)

This is the ipc::Transport we allocate *per process* for the graphics pipeline.  Having 61 of them alive when there are 2 content processes is very worrying ...
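For reference, the per-block math works out to exactly one 8 KiB Channel buffer per live Transport, and the relevant records can be pulled straight out of the attached report (the -B7 context window assumes the record layout shown above):

  # 499,712 bytes / 61 blocks = 8,192 bytes per block
  # (6,332 requested + 1,860 slop, rounded up to 8 KiB by the allocator).
  zcat dmd-b2g-109.txt.gz | grep -B7 'ipc_channel_posix.cc:838'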
This is awesome jgriffin, thanks!

(In reply to Jonathan Griffin (:jgriffin) from comment #7)
> Created attachment 714700 [details]
> output of 'thread apply all bt'

Unfortunately nothing looks wedged here.  In particular, all the chromium threads are sitting at epoll, and the compositor thread is one of those.
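For the record, this is easy to confirm from the attached backtrace with a quick filter; the filename here is a guess, use whatever the attachment was saved as locally:

  # List thread headers alongside any frames parked in epoll_wait
  grep -E '^Thread [0-9]+|epoll_wait' thread-apply-all-bt.txt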
(In reply to Justin Lebar [:jlebar] from comment #8)
> 
> It actually might be very interesting if you could reproduce this problem
> with bug 798491 disabled.  We'd get a very different memory report, I
> expect.  I think this is just a matter of flipping the
> "jsloader.reuseGlobal" pref to false in b2g.js.
> 

Ok, I'll do that.

(In reply to Chris Jones [:cjones] [:warhammer] from comment #9)
> OK, this guy is *very* suspicious:
> 
> This is the ipc::Transport we allocate *per process* for the graphics
> pipeline.  Having 61 of them alive when there are 2 content processes is
> very worrying ...

FWIW, the freeze usually occurs during the start-app transition, with the transition half-complete on the screen.
Whiteboard: [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P1]
With jsloader.reuseGlobal=false, I get somewhat different symptoms.  Instead of a hard freeze followed by normal behavior, the tests just gradually get slower and slower until the phone becomes entirely non-operational.  I've attached the memory and DMD dumps from this.  I'll attach the output of 'adb shell dmesg' as well.
If these tests are exercising NITZ or the time API, there's an outside chance that bug 842550 could have contributed to hangs.
Depends on: 842550
jgriffin, you are a hero for all this debugging info.  Sorry I haven't had a chance to look at it.  I may not be able to get to it until next week.
Are the UI freezes still reproducing?
We've changed the automation to restart B2G between each test, to avoid hitting this problem.  I'll run them locally without the restart and see if the freeze still occurs.
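For context, a hypothetical local invocation of the runner with and without that restart; the flag names and paths are assumptions about the gaiatest runner of that era rather than the exact automation config:

  # --restart relaunches B2G between tests; drop it to reproduce the
  # long-running state this bug describes.
  gaiatest --address=localhost:2828 --restart \
           --testvars=gaia-ui-tests/testvars.json \
           gaia-ui-tests/gaiatest/tests/manifest.ini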
I ran these tests a couple of times locally.  I'd say that the freezing problem is fixed, but other bad problems remain if we don't restart B2G between each test.  These problems include all the icons disappearing from the homescreen, IPC errors, and child-process crashes.
Please to be filing! :)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME