Bug 1036482 (Closed): opened 10 years ago, closed 10 years ago

Crash in low memory situation while stability testing

Categories

(Core :: DOM: Content Processes, defect)

Version: 32 Branch
Platform: ARM, Gonk (Firefox OS)
Type: defect
Priority: Not set
Severity: critical

Tracking

RESOLVED DUPLICATE of bug 1037360
blocking-b2g 2.0+
Tracking Status
b2g-v2.0 --- affected

People

(Reporter: ggrisco, Assigned: cyu)

Details

(Keywords: crash, Whiteboard: [caf-crash 252][caf priority: p1][CR 691088][b2g-crash])

Attachments

(4 files)

Attached file decoded minidump
During stability testing involving calls, SMS, Camera, Camcorder, music and video, we found the following crash:

[@ nsRefPtr<mozilla::dom::ContentParent>::~nsRefPtr() | nsTArray_Impl<nsRefPtr<mozilla::dom::ContentParent>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) | mozilla::PreallocatedProcessManager::MaybeForgetSpare(mozilla::dom::ContentParent*) | mozilla::dom::ContentParent::OnChannelError() ]
Crash Signature: [@ nsRefPtr<mozilla::dom::ContentParent>::~nsRefPtr() | nsTArray_Impl<nsRefPtr<mozilla::dom::ContentParent>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned int, unsigned int) | mozilla::PreallocatedProcessManager::MaybeForgetSpare(mozilla::dom::ContentParent*) | mozilla::dom::ContentPa…
Component: General → DOM: Content Processes
Keywords: crash
Product: Firefox OS → Core
Whiteboard: [CR 691088] → [CR 691088][b2g-crash]
Version: unspecified → 32 Branch
Whiteboard: [CR 691088][b2g-crash] → [caf priority: p1][CR 691088][b2g-crash]
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.026
Moz BuildID: 20140707000200
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=ef67af27dff3130d41a9139d6ae7ed640c34d922
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=d3eae03cdf4e6944e646d05938a22dc1380a0d95
blocking-b2g: 2.0? → 2.0+
overholt, given that the crash seems to be in DOM, can you help assign this to someone to start investigating the attached minidump?
Flags: needinfo?(overholt)
This seems Nuwa-related. Thinker/Cervantes?
Flags: needinfo?(tlee)
Flags: needinfo?(overholt)
Flags: needinfo?(cyu)
From the log, it appears that the kernel OOM killer, rather than the low-memory killer, is active and has started killing processes. This is bad, since the process priorities and OOM score adjustments (oom_score_adj) should let the low-memory killer reclaim memory first and keep the OOM killer from ever running.

Some questions:
* How reproducible is it? How can I run the test locally to reproduce it?
* What device was used? Can you post the output of b2g-info? We need to look at the oom_adj and min_free settings.
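To make the oom_adj/min_free question concrete: on kernels built with the Android low-memory killer driver, these tunables are exposed as two comma-separated lists under /sys/module/lowmemorykiller/parameters/ (adj, and minfree expressed in pages). The Python sketch below pairs the two lists and converts minfree to MiB. The 4 KiB page size, the helper names and the sample values are assumptions for illustration, not data from the device in this bug; on a real device the same files can be read with `adb shell cat`.

```python
from pathlib import Path

# sysfs location of the Android low-memory-killer tunables
# (assumes a kernel built with the lowmemorykiller driver).
LMK_PARAMS = Path("/sys/module/lowmemorykiller/parameters")


def lmk_thresholds(adj_text, minfree_text, page_size=4096):
    """Pair each LMK adj level with its minfree threshold in MiB.

    minfree is expressed in pages; page_size defaults to 4 KiB,
    which is an assumption about the target device.
    """
    adjs = [int(v) for v in adj_text.strip().split(",") if v]
    pages = [int(v) for v in minfree_text.strip().split(",") if v]
    if len(adjs) != len(pages):
        raise ValueError("adj and minfree must have the same number of entries")
    return [(a, p * page_size / (1024 * 1024)) for a, p in zip(adjs, pages)]


def read_lmk_thresholds(params_dir=LMK_PARAMS):
    """Read adj/minfree directly from sysfs (only works on the device itself)."""
    return lmk_thresholds((params_dir / "adj").read_text(),
                          (params_dir / "minfree").read_text())


# Hypothetical sample values, NOT taken from the msm8610 device in this bug.
for adj, mib in lmk_thresholds("0,1,2,4,9,15", "1024,1280,1536,2048,4096,8192"):
    print(f"below {mib:.0f} MiB free: kill processes with adj >= {adj}")
```

If free memory regularly drops past the lowest threshold without the low-memory killer acting, the kernel falls back to the OOM killer, which is the situation described above.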
Flags: needinfo?(tlee)
Flags: needinfo?(ggrisco)
Flags: needinfo?(cyu)
We have seen this issue 3 times so far, once on the most recent build tested. Comment 2 shows the details for that build, including the device tested (8x10). I'll see if we can run b2g-info the next time we reproduce this crash, but I don't have that info now.
Flags: needinfo?(ggrisco)
Attached file b2g-ps log
Actually, we did capture b2g-ps output on the last crash, so I'm posting the logs here.
Attached file dmesg log
Passing this on to Cervantes, as he's helping with the investigation here and we do not want critical stability bugs like this one to stay unassigned. Cervantes, we can change the assignee if needed.
Assignee: nobody → cyu
From the b2g-ps log, it is obvious that there is a huge memory leak in the b2g process. The vsize of the b2g process grows significantly during the test.

At the beginning of the test, vsize is around 260 MB (268588 KB):
b2g              0     0        96          0          root      228   1     268588 10932 ffffffff b6ec48ac S /system/b2g/b2g

Just before the first launch of b2g crashes, vsize is around 860 MB (880868 KB):
b2g              0     0        226         0          root      228   1     880868 18524 ffffffff b4c41a78 R /system/b2g/b2g
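The vsize comparison can be reproduced mechanically. This is a minimal Python sketch that pulls PID, VSIZE and RSS out of the two b2g-ps lines quoted in this comment. It assumes the trailing columns read `PID PPID VSIZE RSS WCHAN PC STATE NAME`, with sizes in KB as printed by Android's ps; the helper name is hypothetical.

```python
def parse_b2g_ps_line(line):
    """Return (pid, vsize_kb, rss_kb) from one b2g-ps process line.

    Columns are counted from the right (... PID PPID VSIZE RSS WCHAN PC STATE NAME),
    an assumption based on the lines quoted in this bug, so application names
    containing spaces do not shift the indices.
    """
    fields = line.split()
    return int(fields[-8]), int(fields[-6]), int(fields[-5])


start = "b2g              0     0        96          0          root      228   1     268588 10932 ffffffff b6ec48ac S /system/b2g/b2g"
end = "b2g              0     0        226         0          root      228   1     880868 18524 ffffffff b4c41a78 R /system/b2g/b2g"

_, v0, r0 = parse_b2g_ps_line(start)
_, v1, r1 = parse_b2g_ps_line(end)
print(f"vsize: {v0 / 1024:.0f} MiB -> {v1 / 1024:.0f} MiB (+{(v1 - v0) / 1024:.0f} MiB)")
print(f"rss:   {r0 / 1024:.0f} MiB -> {r1 / 1024:.0f} MiB")
```

Run on the two quoted lines, this prints a vsize growth from about 262 MiB to about 860 MiB (+598 MiB), while RSS only moves from about 11 MiB to about 18 MiB.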

Since the device has zram enabled, the RSS of the b2g process doesn't reflect the amount of memory it actually uses: pages swapped out to zram no longer count toward RSS.

dmesg also shows a shortage so severe that the kernel fails to allocate memory for a compressed page:

<6>[58080.354207] zram: Error allocating memory for compressed page: 29312, size=1963
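When a dmesg capture is long, a small Python sketch like this one can list every zram allocation failure and its timestamp. It assumes each failure is logged in exactly the format of the line quoted above; reading the two numbers as the page index and the compressed size in bytes is an interpretation of the zram driver message, not something confirmed in this bug. The function name and the second sample line are hypothetical.

```python
import re

# Matches the zram driver message quoted above, e.g.
# <6>[58080.354207] zram: Error allocating memory for compressed page: 29312, size=1963
# Field meanings (page index, compressed size in bytes) are an interpretation.
ZRAM_ALLOC_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+zram: Error allocating memory for compressed page: "
    r"(?P<index>\d+), size=(?P<size>\d+)"
)


def zram_alloc_failures(dmesg_text):
    """Return (timestamp_s, page_index, compressed_size) for each zram allocation failure."""
    return [
        (float(m["ts"]), int(m["index"]), int(m["size"]))
        for m in ZRAM_ALLOC_FAIL.finditer(dmesg_text)
    ]


log = (
    "<6>[58080.354207] zram: Error allocating memory for compressed page: 29312, size=1963\n"
    "<6>[58081.000000] hypothetical unrelated kernel message\n"
)
failures = zram_alloc_failures(log)
print(f"{len(failures)} zram allocation failure(s), first at {failures[0][0]} s")
```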

I think the crash is most likely the result of a memory leak. Please attach a memory report and a DMD report from the test.
See Also: → 1037360
Observed on: 

Device: msm8610
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.029
Moz BuildID: 20140710000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=35a9b715e7348ec738ff6c8a59f50190390a06f2
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=2fb60c777d3f82d580cba249e5e01a167a01de39
Greg, since bug 1037360 already has a patch, could you apply the patch and check if the crash in this bug is also fixed?

If the crash is fixed, we'll mark this bug as a duplicate of bug 1037360.
If the crash still happens, then there is another memory leak in b2g that we must fix. We'll need more specific STR (such as the test script) to reproduce the problem locally; otherwise, we need you to capture a memory report and a DMD report to see what is consuming memory in the b2g process.

Thanks.
Flags: needinfo?(ggrisco)
(In reply to Cervantes Yu from comment #12)
> Greg, since bug 1037360 already has a patch, could you apply the patch and
> check if the crash in this bug is also fixed?

Yes, we are trying the patch and will report results when we have them.
Is it a dup of 1037360?
That's what the last three comments are about ...
(In reply to Kevin Hu [:khu] from comment #14)
> Is it a dup of 1037360?

Refer to comment #12. We do not have enough info yet. If the crash is fixed, we will dupe it; hence the wait here.
We haven't seen this crash since the build referenced in Comment 11, so that's a good sign.
Flags: needinfo?(ggrisco)
Sounds like a dupe of bug 1037360 after all, please reopen if it reoccurs.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → DUPLICATE
Whiteboard: [caf priority: p1][CR 691088][b2g-crash] → [caf-crash 252][caf priority: p1][CR 691088][b2g-crash]