Closed Bug 824526 Opened 12 years ago Closed 12 years ago

b2g process crash during MW0

Categories

(Core :: General, defect)

Hardware: ARM
OS: Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED DUPLICATE of bug 822398
blocking-basecamp +

People

(Reporter: cjones, Assigned: dougt)


STR
 (1) follow steps at https://wiki.mozilla.org/B2G/Memory_acceptance_criteria#MW0:_Every_app_is_successfully_launched_into_the_foreground

I get a crash after step 22, segfaulting on null.  No additional information.
Assignee: nobody → doug.turner
blocking-basecamp: ? → +
I am using a debug build and can reproduce something similar to this problem after step 9.

I see lots of OOM warnings, then:


I/Gecko   (  481): [Parent 481] ###!!! ASSERTION: op == PL_DHASH_LOOKUP || RECURSION_LEVEL(table) == 0: 'op == PL_DHASH_LOOKUP || RECURSION_LEVEL(table) == 0', file /Users/dougt/builds/B2G/objdir-gecko/xpcom/build/pldhash.cpp, line 574
F/MOZ_Assert(  481): Assertion failure: chars[length] == 0, at /Users/dougt/builds/B2G/gecko/js/src/vm/String-inl.h:284
F/libc    (  481): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)


A few ms later:

I/Gecko   (  575): [Child 575] WARNING: NS_ENSURE_TRUE(IsChromeProcess()) failed: file /Users/dougt/builds/B2G/gecko/content/base/src/nsFrameMessageManager.cpp, line 687
D/memalloc(  481): /dev/pmem: Allocated buffer base:0x4b400000 size:4096 offset:2990080 fd:129
D/memalloc(  534): /dev/pmem: Mapped buffer base:0x45356000 size:2994176 offset:2990080 fd:35
D/memalloc(  481): /dev/pmem: Allocated buffer base:0x4b400000 size:122880 offset:2994176 fd:132
I/Gecko   (  534): [Child 534] ###!!! ABORT: unexpected type tag: '(mType) == (aType)', file ../../ipc/ipdl/_ipdlheaders/mozilla/layers/LayersSurfaces.h, line 83
I/Gecko   (  647): [Child 647] WARNING: shutting down early because of crash!: file /Users/dougt/builds/B2G/gecko/dom/ipc/ContentChild.cpp, line 830
I/Gecko   (  647): [Child 647] WARNING: content process _exit()ing: file /Users/dougt/builds/B2G/gecko/dom/ipc/ContentChild.cpp, line 879
F/libc    (  534): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)
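
For triage, the fatal signals in a logcat dump like the one above can be grouped by PID. The following is a minimal sketch, assuming the `LEVEL/TAG( PID): message` line layout seen in this log; the function name and regular expressions are illustrative and not part of any B2G tooling.

```python
import re

# Assumed logcat layout, as in the excerpts above: "F/libc    (  481): message"
LOGCAT_LINE = re.compile(
    r"^(?P<level>[VDIWEF])/(?P<tag>[^(]+?)\s*\(\s*(?P<pid>\d+)\):\s*(?P<msg>.*)$"
)
FATAL = re.compile(
    r"Fatal signal (?P<num>\d+) \((?P<name>\w+)\) at (?P<addr>0x[0-9a-fA-F]+)"
)

# Lines copied from the log in this bug.
SAMPLE = [
    "F/MOZ_Assert(  481): Assertion failure: chars[length] == 0, at /Users/dougt/builds/B2G/gecko/js/src/vm/String-inl.h:284",
    "F/libc    (  481): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)",
    "F/libc    (  534): Fatal signal 11 (SIGSEGV) at 0x00000000 (code=1)",
]


def fatal_signals(lines):
    """Map each crashing PID to (signal number, signal name, fault address)."""
    crashes = {}
    for line in lines:
        parsed = LOGCAT_LINE.match(line)
        if not parsed:
            continue  # not a logcat line in the assumed layout
        fatal = FATAL.search(parsed.group("msg"))
        if fatal:
            crashes[int(parsed.group("pid"))] = (
                int(fatal.group("num")),
                fatal.group("name"),
                fatal.group("addr"),
            )
    return crashes


if __name__ == "__main__":
    for pid, (num, name, addr) in sorted(fatal_signals(SAMPLE).items()):
        print(f"pid {pid}: signal {num} ({name}) at {addr}")
```

On the lines above this reports PIDs 481 (the parent) and 534 (a child), both with SIGSEGV at address zero.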


Snapshot of ps output about 100 ms before the crash:

APPLICATION      USER     PID   PPID  VSIZE  RSS     WCHAN    PC         NAME
b2g              root      105   1     172192 69104 ffffffff b0003430 t /system/b2g/b2g
Homescreen       app_351   351   105   93872  24548 ffffffff 4009d6ec S /system/b2g/plugin-container
Messages         app_1482  1482  105   73680  18856 ffffffff 40082af4 R /system/b2g/plugin-container
Browser          app_3891  3891  105   67404  15592 ffffffff 4011f330 S /system/b2g/plugin-container
Feedback         app_4325  4325  105   69584  16916 ffffffff 400b6330 S /system/b2g/plugin-container
Gallery          app_6673  6673  105   72664  22220 ffffffff 400f6330 S /system/b2g/plugin-container
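
To put the snapshot in perspective, the RSS column can be totalled. This is a rough sketch: it assumes the RSS values are in kilobytes (as Android's ps usually prints them) and that the unlabelled column before NAME is the process state. RSS counts shared pages once per process, so the sum overstates real usage.

```python
# Rows copied from the ps snapshot above (header omitted).
SNAPSHOT = """\
b2g              root      105   1     172192 69104 ffffffff b0003430 t /system/b2g/b2g
Homescreen       app_351   351   105   93872  24548 ffffffff 4009d6ec S /system/b2g/plugin-container
Messages         app_1482  1482  105   73680  18856 ffffffff 40082af4 R /system/b2g/plugin-container
Browser          app_3891  3891  105   67404  15592 ffffffff 4011f330 S /system/b2g/plugin-container
Feedback         app_4325  4325  105   69584  16916 ffffffff 400b6330 S /system/b2g/plugin-container
Gallery          app_6673  6673  105   72664  22220 ffffffff 400f6330 S /system/b2g/plugin-container
"""

RSS_COLUMN = 5  # APPLICATION, USER, PID, PPID, VSIZE, RSS, ...


def rss_by_app(snapshot):
    """Return {application name: RSS} for each row of the snapshot."""
    usage = {}
    for row in snapshot.splitlines():
        fields = row.split()
        if len(fields) > RSS_COLUMN:
            usage[fields[0]] = int(fields[RSS_COLUMN])
    return usage


if __name__ == "__main__":
    usage = rss_by_app(SNAPSHOT)
    total = sum(usage.values())
    children = total - usage["b2g"]
    print(f"total RSS: {total} kB, content processes only: {children} kB")
```

The parser splits on whitespace, so it only works for single-word application names like the ones in this snapshot.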


MemFree in /proc/meminfo stays around 1.5 MB during the last few MW0 steps and never dips below 1 MB.
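
A figure like this can be sampled on the device by parsing /proc/meminfo. A minimal sketch follows, assuming the usual `Key:   value kB` layout; the helper names are illustrative, and any sample text fed to the parser is hypothetical rather than taken from this bug.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Values are in kB for fields that carry a "kB" suffix.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def read_memfree_kb(path="/proc/meminfo"):
    """Return the current MemFree value in kB (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())["MemFree"]


if __name__ == "__main__":
    print(f"MemFree: {read_memfree_kb()} kB")
```

MemFree excludes page cache and other reclaimable memory, so on its own it is only a coarse indicator of how close the system is to running out.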


Could we just not be killing off plugin-container fast enough?
> Could we just not be killing off plugin-container fast enough?

We're not killing either process here -- they're both segfaulting, instead of being SIGKILL'ed.

I thought the kernel should kill a process before causing malloc to fail, but maybe that is incorrect.
Nope - the kernel has no control over malloc, that's a user-mode thing. And whether to kill a process or not would be a user-mode policy type decision, which the kernel goes to great lengths to avoid.

So in this case, the assertion failure is causing the segfault: NS_ASSERTION eventually winds up at MOZ_CRASH, which writes to memory location zero, and that write causes the segfault.
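
The mechanism described here, a deliberate write to address zero ending the process with SIGSEGV, can be reproduced outside Gecko. The sketch below is an illustration only, not Gecko's MOZ_CRASH: it assumes a POSIX system and uses `ctypes.memset` in a child process to perform the null write, and the parent then inspects how the child died.

```python
import signal
import subprocess
import sys

# Child program: write one byte to address 0 through ctypes.
# This mimics a deliberate null write; it is not Gecko code.
CHILD = "import ctypes; ctypes.memset(0, 0, 1)"


def crash_by_null_write():
    """Run the child and return its exit status as seen by the parent.

    On POSIX, subprocess reports death by signal N as returncode -N.
    """
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


if __name__ == "__main__":
    code = crash_by_null_write()
    if code < 0:
        print("child killed by", signal.Signals(-code).name)
    else:
        print("child exited with status", code)
```

A crash produced this way looks the same in the log as a genuine null dereference, which is why "Fatal signal 11 (SIGSEGV) at 0x00000000" alone does not say whether an assertion or a real bug brought the process down.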
> Nope - the kernel has no control over malloc, that's a user-mode thing.

Sorry, what I mean is, malloc returns null iff mmap fails.  But mmap should only fail if we run out of virtual address space, right?  If we're running low on physical memory but have sufficient virtual memory, mmap should succeed.  Then when we touch one of those pages, the kernel should notice we're low on memory and kill something.
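
The distinction drawn here, between reserving virtual memory and actually touching it, can be observed directly. The following is a minimal sketch, assuming Linux with its default overcommit settings; the function names are illustrative. It maps a large anonymous region, then writes to a few pages and reports resident memory at each step. It illustrates the reasoning in this comment and is not a diagnosis of this bug.

```python
import mmap

PAGE = mmap.PAGESIZE


def resident_pages():
    """Resident page count from /proc/self/statm, or None if unavailable."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1])
    except OSError:
        return None


def reserve_then_touch(size, pages_to_touch):
    """Map `size` bytes of anonymous memory, then write to a few pages.

    Returns resident page counts (before, after mapping, after touching).
    """
    before = resident_pages()
    # Reserving address space: no physical pages are needed yet.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    after_map = resident_pages()
    for i in range(pages_to_touch):
        region[i * PAGE] = 1  # first write to a page makes the kernel back it
    after_touch = resident_pages()
    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    before, after_map, after_touch = reserve_then_touch(size, 4)
    print("resident pages before / after mmap / after touching:",
          before, after_map, after_touch)
```

Under these assumptions the mapping succeeds without resident memory growing by anything close to the mapped size; physical pages are only committed as they are written. Whether a large mapping is refused up front depends on the kernel's overcommit policy, so the behavior on a given device may differ.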

NS_ASSERTION is not fatal in release builds, but I guess it might be fatal in dougt's debug build...
If the NS_ASSERTIONs are causing us to crash (and if they are indeed not fatal in release builds, which I'm pretty sure is true), then dougt's logcat is not necessarily useful in understanding cjones's bug.
The assertion in comment 1 would in fact be exactly what that patch fixes.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE